# Session 1

## Index 

1. [Intended learning outcomes](#outcomes)
2. [Using Jupyter Notebooks](#notebooks)
3. [Literals, operators, and data types](#operators)
4. [Variables and keywords](#variables)
5. [SciPy and arrays](#arrays)
6. [Basic statistics with arrays](#stats)
7. [Reading and writing files](#files)
8. [Matplotlib: plotting data](#plotting)
9. [Log plots and histograms](#histograms)


## 1. Intended learning outcomes<a id="outcomes"></a>
After this session, you should be able to:
- create variables and carry out arithmetic operations on them;
- import the SciPy and Matplotlib packages;
- create and manipulate arrays, and do basic statistics on them;
- read in a data file and save your data to an output file;
- create line plots, scatter plots, log plots, and histograms;
- amend the plot format to create a figure of a publishable standard.

## 2. Using Jupyter Notebooks<a id="notebooks"></a>
Throughout your 4 computing sessions we will be using Jupyter Notebooks as interactive lab scripts. These notebooks include both text and code. Once you have saved a copy on your own hard drive, you can type code in the code cells (the cells preceded by "In [ ]:") and execute it either by pressing shift+enter or by pressing the 'run cell' button (play symbol) in the toolbar above. You can add more code cells by pressing the 'insert cell below' button (plus symbol). By default any new cells you add are code cells; however, you can change these to Markdown in the drop-down list in the toolbar to allow you to make your own notes in the lab scripts. For a cheat sheet on Markdown, see [here](http://assemble.io/docs/Cheatsheet-Markdown.html).

Make sure you add new code cells every time you want to try out something new, instead of editing your previous code. This way you have a record of everything you have done, which both you and your demonstrator can refer to. Save your work regularly - sometimes you may be forced to close and reopen your file, and you don't want to lose any of your work!

A couple of notes on usage: when you run a cell, the notebook will automatically scroll to the next cell. Note however that you must first click the next cell to explicitly highlight it before you start typing (otherwise unexpected things will happen!). If you double-click on a Markdown cell it will change into edit mode.

**Exercise 1: try this to see what it looks like. To revert to the normal view you can run the cell in the same way as you would run a code cell.**

The lab scripts use various colour-coded cells:

<div style="background-color: #FFF8C6">

This is an optional cell, which includes additional information or optional exercises. The optional exercises are meant to be interesting relevant exercises that enhance your understanding; however you should only attempt these if you are clearly ahead in time, or have extra time to spend on this at home. Your first priority should be to complete the entire non-optional lab script for each session before the next session begins.

<div style="background-color: #00FF00"> These cells indicate you should discuss your results with a demonstrator. Make sure you put your hand up and talk to a demonstrator at this point whether or not you feel you understood the preceding material. Of course you can ask further questions of your demonstrators at any point during the session.

Every now and then the lab script will remind you to make notes of what you have learnt in your lab book. Your computing lab book should be a running commentary of your work: for example, answers to questions posed in the lab script, new functions you have found, salient points you have learnt, etc. Later on it should also include block diagrams of the programmes you write, and solutions to common bugs you encounter. You should be making notes in your computing lab book not only when you encounter a reminder, but continuously as you go along.

Before you start the computing work in the next section, make sure you are comfortable using Jupyter Notebooks, and you have saved your Jupyter Notebook on your H:\ drive in a dedicated folder for each session of the computing course (or in any sensible location if you are using a private computer).  You will be saving further files to the same location as you work through the scripts below.

## 3. Literals, operators, and data types<a id="operators"></a>

On a very basic level, Python can be used as a calculator. 

**Exercise 2: in the cell below, try typing:**

<span style="color:blue">8 + 2</span>

**Now run the cell by pressing shift+enter or by clicking the 'run cell button' (play symbol) in the toolbar at the top of this notebook.**

The numbers <span style="color:blue">8</span> and <span style="color:blue">2</span> you typed above are called literals. Literals are data inserted directly into your code. Here, we have used integer literals (e.g. 1, 2, -3, ... ), but more frequently we will use float literals (numbers with decimal points or given in scientific notation, even if they represent integers: 1.3, 10.0, 1e10, ...). These numbers are called “floating point” because of the manner in which they are encoded in the computer’s binary memory. 

The <span style="color:blue">+</span> symbol in the first code cell is called an *operator*. Operators operate on the code on either side of them (called the operands), and produce some sort of result (e.g. the integer 10 in your code above). Other examples of arithmetic operators are: <span style="color:blue">\- </span>for subtraction, <span style="color:blue">\*</span> for multiplication, <span style="color:blue">/</span> for division, <span style="color:blue">\*\*</span> for exponent, <span style="color:blue">%</span> for modulus (returns the remainder of a division), and <span style="color:blue">//</span> for floor division (returns the result of the division rounded down to an integer). 

Some operators treat the values on either side of them the same way, e.g. <span style="color:blue">8 + 2</span> gives the same result as <span style="color:blue">2 + 8</span>. Others however are directional, for example the result of <span style="color:blue">3\*\*2</span> is different from <span style="color:blue">2\*\*3</span>. 
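As a quick reference, the sketch below applies each arithmetic operator to the same pair of integers. The results are stored under illustrative names (variables are introduced in Section 4) only so that they can be printed together.

```python
# Each arithmetic operator applied to the operands 7 and 2
total = 7 + 2        # addition
difference = 7 - 2   # subtraction
product = 7 * 2      # multiplication
quotient = 7 / 2     # division (gives a float in Python 3)
power = 7 ** 2       # exponent
remainder = 7 % 2    # modulus: remainder of 7 divided by 2
floored = 7 // 2     # floor division: result rounded down to an integer

print(total, difference, product, quotient, power, remainder, floored)
# prints: 9 5 14 3.5 49 1 3
```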

**Exercise 3: in the cells below, try out the behaviour of each of the operators given above - use one cell for each arithmetic statement. You can add as many cells as you like by using the 'insert cell below' (plus symbol) button. Make sure you understand their behaviour (if not, do ask one of your demonstrators - they will be happy to explain).** 

<div style="background-color: #FFF8C6">

Note that in older versions of Python (2.x), as well as in many other programming languages, the operator <span style="color:blue">/</span> would return a floor division if both operands (i.e. the values either side) are integers, and a normal division if either or both operands are floats. 


The order of operations is as usual, i.e. multiplication and division before addition and subtraction. For operations on the same level, Python reads code from left to right, i.e. <span style="color:blue">20/5\*2</span> will give 8.0. To change the order, or make the order explicit (and hence code more readable) use round brackets, for example <span style="color:blue">20/(5\*2)</span> to give 2.0. The shorthand for scientific notation is  <span style="color:blue">1.234e5</span>, which means $1.234\times10^5$.
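The precedence rules described above can be checked with a few lines. This is only a sketch; the names are illustrative and not part of the lab script.

```python
left_to_right = 20 / 5 * 2     # same level: evaluated as (20 / 5) * 2
bracketed = 20 / (5 * 2)       # brackets force the multiplication first
mixed = 2 + 3 * 4              # multiplication before addition
scientific = 1.234e5           # shorthand for 1.234 x 10^5

print(left_to_right, bracketed, mixed, scientific)
# prints: 8.0 2.0 14 123400.0
```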

So far we have encountered integer and float data types - another type of literal is a string literal, which is a piece of text that does not constitute any code. You specify a string literal by surrounding it in matched single or double quotation marks, e.g.:

In [0]:
"This is a string"

'This is a string'

You can find out the data type of any literal by using the <span style="color:blue">type()</span> command. 

**Exercise 4: run the examples in the following code cells:**

In [0]:
type(1)

int

In [0]:
type(1.0)

float

In [0]:
type("1")

str

In [0]:
type(1+1j)

complex

In [0]:
type([1,2])

list

The cells above illustrate some of the most common data types in Python. For a bit more information on lists and some other data types, have a look at the optional part at the end of the next section.

**Important:** it is expected that you will cause your system to grind to a halt when you do calculations with extremely large numbers. Don't worry - this is fine! If your system becomes unresponsive, restart your kernel. Change any offending code cells to Markdown so that they won't be executed again, click on the next code cell, and choose 'Cell' --> 'Run all above' to re-run all your code cells so far.

It is now time to combine what we have learned so far and put it into action. Experiment with operators, brackets, scientific notation, and different data types in the code cell below. From now on, the notebook will only display one code cell when it is time for you to try your coding skills; it is up to you to add as many cells as you need. It is strongly recommended you don't delete any of your code but keep as many examples in different cells as possible so both you and your demonstrator can easily refer back to what you have tried.

**Exercise 5: coding has many quirks that you will get used to with practice. Try and answer the following questions for yourself whilst experimenting with arithmetic operators:**

- **Do all answers make sense?**
- **Can you use numbers as big as you like? Is there an upper limit to the exponent in scientific notation?**
- **Can you use numbers as small as you like? Is there a lower limit to the exponent?**
- **Can numbers be as precise as you like? How many zeros do you need before 1.00000000000000000001 gets truncated?**
- **What happens if you don’t balance your brackets?**
- **What happens if you mix operands, for example add an integer to a float?**
- **Can you apply arithmetic operators to strings? If so, which, and what is the result?**

**Make sure to note down the answers (along with anything interesting or unexpected you find out) in your lab book!**

<div style="background-color: #FFF8C6">
You may have noticed floats do not have unlimited accuracy. This is an important feature of computer programming, not just a bug in Python. To read more about why this happens, have a look at [this tutorial.](https://docs.python.org/3/tutorial/floatingpoint.html) 
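A minimal sketch of this limited accuracy, using only the standard library; the variable names are illustrative.

```python
import sys

sum_of_tenths = 0.1 + 0.2
print(sum_of_tenths)              # not exactly 0.3
print(sum_of_tenths == 0.3)       # prints False

# Machine epsilon: the gap between 1.0 and the next representable float
print(sys.float_info.epsilon)

# An addition much smaller than that gap is lost entirely
unchanged = 1.0 + 1e-17
print(unchanged == 1.0)           # prints True
```

Because of this, comparing two floats with an exact equality test is rarely a good idea; checking that their difference is small is usually safer.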

## 4. Variables and keywords<a id="variables"></a>

The <span style="color:blue">=</span> operator (sometimes called the assignment operator) allows you to store data in a *variable*. Variables are ubiquitous in computer programming, and are much like variables in maths. For example, if we want to create a variable x which has the value 4 we can simply write <span style="color:blue">x = 4</span>. Note that, unlike in algebra, the operator <span style="color:blue">=</span> is directional: the variable on the left of the <span style="color:blue">=</span> is always assigned the value of what is on the right of the <span style="color:blue">=</span>, not the other way around. You can retrieve a stored value simply by using its name. 

**Exercise 6: run the cells below and see what happens:**

In [0]:
x = 4
print(x)

4


In [0]:
x + 2

6

In [0]:
y = x + 2
print(y)

6


Note that in the code above we used the <span style="color:blue">print()</span> command to print the value of the variables to screen. In Python, you can also simply type the name of a variable to do this. However, this should be used with caution as this only works properly when a cell has only one output. 

**Exercise 7: to illustrate this, try running the code cells below and pay careful attention to the output of each cell.**

In [0]:
a = 1
b = 2
a

1

In [0]:
a
b

2

In [0]:
print(a)
print(b)

1
2


So far, we have used single letters to name variables. This is usually not good practice - if you simply assigned every variable a letter of the alphabet, it would be very hard to decipher your code at a later date and understand what each variable stands for. It is therefore important to choose your variable names carefully. 

**Exercise 8: below, try assigning names to data provided by literals, or to the results of computation using operators. Try and find out the answers to the following questions:**

- **Can you identify the rules that govern the possible names? Some names to try: my_glorious_variable_3, True, 1value, my favourite value, A#B...**
- **Are the names case sensitive, i.e., is name the same as naMe?**
- **What happens when you give the same name to two different values?**
- **What happens when you give two different names to the same value?**
- **What happens if you store the result of a calculation involving a particular variable as that very variable?**

The reason why <span style="color:blue">A#B</span> didn’t work as a name was that <span style="color:blue">#</span> is Python’s comment character, which means “Take everything after this character until the end of the line and completely ignore it”. Comments are used to annotate source code to make it more human-readable, for example to describe in natural language what a complicated line of code does, to make it easier to understand.

The reason why <span style="color:blue">True</span> didn’t work as a variable name is because it is a Python keyword, one of the few words that has a special meaning to the language. A list of keywords (as of Python 3.7) is:
```python 
False      class      from       or
None       continue   global     pass
True       def        if         raise
and        del        import     return
as         elif       in         try
assert     else       is         while
async      except     lambda     with
await      finally    nonlocal   yield
break      for        not
```

Note that this list changes with different versions of Python. Jupyter Notebooks helpfully change the colour of a keyword to bold green, so you will immediately notice it if you use a keyword inadvertently. We will cover some of these keywords in this course, but not all of them.
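Because the list depends on the Python version, it can be useful to ask Python itself. The sketch below uses the standard-library `keyword` module; the variable names are illustrative.

```python
import keyword

# All keywords of the Python version that is currently running
print(keyword.kwlist)

# Check individual candidate names
true_is_keyword = keyword.iskeyword("True")
name_is_keyword = keyword.iskeyword("my_glorious_variable_3")
print(true_is_keyword, name_is_keyword)
```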

<div style="background-color: #FFF8C6">

### Lists, tuples and dictionaries
It is often inconvenient to assign every piece of data a separate name - often data can be grouped together to make dealing with it easier. Several data types exist in Python that help with this, the most common of which are lists, tuples, and dictionaries.

Lists in Python are defined using square brackets surrounding zero or more comma separated literals: 
```python
some_primes = [2,3,5,7,11,13]
names_of_cats = ["Ginger", "Princess", "Zorxo the Clawful"]
```
Tuples behave very similarly to lists, but are immutable (i.e. they cannot be changed). Tuple literals are created by writing a sequence of items separated by commas, optionally surrounded by parentheses. To get a tuple with only one element, you need to have a comma after the element.<br>
```python
my_tuple = 1,2,3
my_tuple = (1,2,3)        # equivalent
not_a_tuple = 1           # same as: not_a_tuple=1
a_tuple = 1,
a_tuple = ("first!",)     # here the first and only element of the tuple is "first!".
```
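To make the difference between mutable lists and immutable tuples concrete, here is a small sketch. The names are illustrative, and the `try`/`except` construct (not covered in this session) is used only to catch the error so that the cell keeps running.

```python
cats = ["Ginger", "Princess"]
cats.append("Zorxo the Clawful")   # lists can grow
cats[0] = "Marmalade"              # and their items can be replaced

fixed = ("Ginger", "Princess")
try:
    fixed[0] = "Marmalade"         # tuples do not support item assignment
except TypeError as error:
    print("Tuple unchanged:", error)

print(cats)
print(fixed)
```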

Many aspects of Python are implicit tuples. For instance, the assignment operator = will happily assign tuples of names to tuples of values:<br>
```python
A,B,C = 1,2,3
```
which is the same as:
```python
(A,B,C) = (1,2,3)
```
which is the same as:
```python
A = 1
B = 2
C = 3
```

This behaviour can be easily used to swap the names of data:<br>
```python
A,B = 1,2
A,B = B,A
print(A,B)   # prints 2 1
```


The third most common collection type used in Python is the dictionary, or dict, which stores mappings from keys to values. For every key, there is a value, which can be almost any Python object. Keys are usually strings, but it is possible to use certain other objects as keys. Dictionary literals are written as a comma-separated list of key:value pairs, with a colon separating key from value, surrounded by (curly) braces. Dict items are accessed using the key within square brackets.<br>
```python
student_grades = {"Simon": 60, "Jenny":68, "Laura":112}
student_grades["Laura"] = 100 # Change Laura's grade.
student_grades["Pug"] = 58    # New student!
print(student_grades["Jenny"])  # prints 68
```
<br>

## 5. SciPy and arrays<a id="arrays"></a>

Everything we have covered so far has been part of the core Python programming language. However, the core Python programming language does not include many mathematical functions that you might expect to use. So, for example, if you needed to use trigonometric functions such as sin, cos, etc., you would have to write your own code to implement these. If you wanted to do numerical integration you would have to write the code. If you wanted to display results as plots, you would have to write (quite a lot of) code to do it, and so on. This would be tiresome and very time consuming. Fortunately, there are libraries of code that provide for most of these common requirements, and much more!

SciPy is a large collection of open-source libraries and tools brought together to give a powerful high-level environment for mathematical and scientific computing. To be able to use it, you first need to import the package as follows: 

In [0]:
import scipy as sp

<div style="background-color: #FFF8C6">

Note that many Python programmes make use of the NumPy library (often imported as np) instead of SciPy. For many mathematical operations NumPy is enough. SciPy however includes a host of routines that are useful for scientific programming, such as special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. SciPy includes the NumPy routines, so in this course we will simply import SciPy instead of using NumPy.

Generally you will need to include this line at the top of your code or notebook (and make sure to run the cell). Be careful: when you reopen your notebook at a later time to continue or review your work, you will have to run the cell above again to be able to use SciPy's functions again. A good way of resuming work is to select the "Run All Above" option from the Cell menu, to ensure all previous cells have been executed before you carry on.  

You can now use all of SciPy's routines by calling them by their name preceded by <span style="color:blue">sp.</span> - for example to create an array A comprising the numbers 10 to 100 in steps of 10:

In [0]:
A = sp.array([10,20,30,40,50,60,70,80,90,100])

As you can see, an array is a series of objects of the same type (integers in the example above). Each individual object in the array is called an element. You can access individual elements of an array by specifying the index of the element in the array within square brackets. For example, the cell below first prints the entire array A and subsequently only the element with index 1:

In [0]:
print(A)
print(A[1])

[ 10  20  30  40  50  60  70  80  90 100]
20


When you run the above cell, the second output line might not be what you expected! This is because indices start from 0, so the first element has index 0, the second element index 1, and so forth. You can access any selection of elements from an array using indices; this is called slicing. 
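As a hedged illustration of the `start:stop:step` pattern, the sketch below slices a small, hypothetical array. It uses NumPy directly (imported as `np`, an assumption; these are the same array routines the script reaches through `sp.`).

```python
import numpy as np

squares = np.array([0, 1, 4, 9, 16, 25])   # hypothetical example array

first = squares[0]          # index 0 is the first element
middle = squares[2:4]       # indices 2 and 3; the stop index is excluded
start = squares[:3]         # from the beginning up to, not including, index 3
every_other = squares[1::2] # every second element, starting at index 1

print(first)
print(middle)
print(start)
print(every_other)
```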

**Exercise 9: look at the following list - can you predict what the result will be before running these statements? Pay careful attention to which elements are included in the slices.**
- <span style="color:blue">A[10]</span>
- <span style="color:blue">A[11]</span>
- <span style="color:blue">A[-1]</span>
- <span style="color:blue">A[1:3]</span>
- <span style="color:blue">A[1:]</span>
- <span style="color:blue">A[:5]</span>
- <span style="color:blue">A[0:6:2]</span>
- <span style="color:blue">A[::2]</span>

**Slicing is a very important concept in programming, so take some time experimenting with this in the cell below (again, add as many cells as you like in the notebook). Can you figure out the rules of slicing? Challenge: use slice notation to reverse array A in one line.**

You can even create 2D (or higher-dimensional) arrays, by nesting several 1D arrays within one array: one for each row. Run the following cell to see an example:

In [0]:
twoDarray=sp.array([[1,2],[10,20],[100,200]])
print(twoDarray)

[[  1   2]
 [ 10  20]
 [100 200]]


We can now access individual cells by taking the slice [row_index,column_index]. For example, the command below will print the element that is in the third row and second column of the 2D array above:

In [0]:
print(twoDarray[2,1])

200
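The same `[row_index, column_index]` notation also accepts slices. A minimal sketch, assuming NumPy imported as `np` (the same routines the script reaches through `sp.`) and an illustrative array name:

```python
import numpy as np

grid = np.array([[1, 2], [10, 20], [100, 200]])

print(grid.shape)          # (3, 2): three rows, two columns

first_row = grid[0]        # a whole row as a 1D array
second_column = grid[:, 1] # ':' selects every row, 1 selects the second column
last_two_rows = grid[1:, :]  # still a 2D array

print(first_row)
print(second_column)
print(last_two_rows)
```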


<div style="background-color: #FFF8C6"> Experiment with taking slices out of a 2D array - these can be 2D arrays, 1D arrays, or single elements. Can you take it a step further and create a 3D array (i.e. a data cube) and take slices from it?

<div style="background-color: #00FF00"> Take a moment to discuss the rules of slicing with your demonstrator - and don't forget to note down your findings in your logbook!</div>

The <span style="color:blue">sp.array()</span> method works fine for short arrays, or to input a small number of measurement data points by hand, but would become tedious for creating longer arrays. A quicker way of defining arrays with a fixed increment is using <span style="color:blue">sp.arange()</span> or <span style="color:blue">sp.linspace()</span>. 
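As a rough sketch of the two functions on a small, hypothetical range: `arange` is driven by a step size, `linspace` by a number of points. NumPy is imported directly as `np` here (an assumption; these are the same routines the script calls through `sp.`).

```python
import numpy as np

by_step = np.arange(0, 10, 2)       # start, stop (excluded), step
by_count = np.linspace(0, 10, 6)    # start, stop (included), number of points

print(by_step)     # integers 0, 2, 4, 6, 8
print(by_count)    # floats 0, 2, 4, 6, 8, 10
```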

**Exercise 10: use Google or the help documentation on these two functions (by running e.g. <span style="color:blue">help(sp.arange)</span>) to find out how they work. Subsequently create an array which includes the numbers 0 - 100 (make sure to store your array in a variable). Next, try and create an array consisting of the numbers 100 - 200 in steps of 0.01. What is the difference between using <span style="color:blue">sp.arange()</span> and <span style="color:blue">sp.linspace()</span>?**

Arrays are very useful data structures, which we will use throughout this course. When you analyse the data you take in lab with Python, you should always store your data in arrays. One reason arrays are so powerful within Python is because you can do arithmetic operations on them. For example, to multiply every element of array A by two, simply run the following:

In [0]:
2*A

array([ 20,  40,  60,  80, 100, 120, 140, 160, 180, 200])

You can even multiply two arrays, for example:

In [0]:
A*A

array([  100,   400,   900,  1600,  2500,  3600,  4900,  6400,  8100, 10000])

**Exercise 11: refer back to section 3 and experiment using the various arithmetic operators on arrays, both with scalar values and arrays. For example, try:**
- <span style="color:blue">A + 10</span>
- <span style="color:blue">A + A</span>
- <span style="color:blue">A**3</span>

**Can you use all arithmetic operators on arrays, both with scalar values and two arrays? What happens if you try and apply an operator to two arrays of different lengths (i.e. a different number of elements)?**
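A minimal sketch of element-wise arithmetic and of what happens when the lengths do not match. The arrays are hypothetical, NumPy is imported directly as `np` (an assumption), and `try`/`except` is used only to catch the error so that the cell keeps running.

```python
import numpy as np

short = np.array([1, 2, 3])
same_length = np.array([10, 20, 30])
longer = np.array([1, 2, 3, 4])

summed = short + same_length   # element by element
halved = short / 2             # the scalar is applied to every element
print(summed)
print(halved)

mismatch_failed = False
try:
    short + longer             # lengths 3 and 4 cannot be combined
except ValueError as error:
    mismatch_failed = True
    print("Cannot combine:", error)
```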

## 6. Basic statistics with arrays<a id="stats"></a>

Here is a set of values obtained for the wavelength of a sound wave.

* 0.76 m
* 0.79 m
* 0.84 m
* 0.75 m
* 0.80 m
* 0.79 m

We can now calculate the mean of these data with a simple set of commands:

In [0]:
x=(0.76,0.79,0.84,0.75,0.80,0.79)
mean_value=sp.mean(x)
print('The mean value is', mean_value)

The mean value is 0.788333333333


Note that you would not simply copy this value into your report - you would need to choose the correct number of significant figures to quote! 

Similarly, we can calculate the standard deviation of the sample by calling the function <span style="color:blue">sp.std()</span>. The sample standard deviation $s$ is defined by the following formula (note that by default <span style="color:blue">sp.std()</span> divides by $n$ rather than $n-1$):

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\overline{x})^2$$ 

Here $x_i$ are the individual data points, $\overline{x}$ is the mean value of the data set, and $n$ is the number of data points.
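To connect the formula to code, the sketch below evaluates it term by term for a hypothetical data set (not the wavelength data) and compares the result with the library function. NumPy is imported directly as `np` (an assumption; `np.std` is the routine the script calls as `sp.std`).

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # hypothetical data

n = len(values)
mean_value = np.mean(values)

# Sample standard deviation written out from the formula above
s_manual = np.sqrt(np.sum((values - mean_value) ** 2) / (n - 1))

s_sample = np.std(values, ddof=1)    # denominator n - 1
s_population = np.std(values)        # default ddof=0: denominator n

print(s_manual, s_sample, s_population)
```

The first two numbers agree, while the default call gives a slightly smaller value because it divides by $n$.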

**Exercise 12: below, calculate the standard deviation of our data set using the <span style="color:blue">std()</span> function which is in the SciPy package (i.e. call it by using <span style="color:blue">sp.std()</span>). Once again, try using the inbuilt help or Google to help you on your way. Tip: to ensure you use the *sample* standard deviation, set the keyword <span style="color:blue">ddof=1</span> when you call the <span style="color:blue">sp.std()</span> function. This ensures the denominator in the equation above is set to $n-1$ rather than $n$.**

We have calculated the mean $\overline{x}$ which is an estimate of the true value of the quantity we are measuring.  How accurate is this estimate?  The sample standard deviation $s$ is *not* a measure of the error in this estimate.  We need the standard error of the mean, written as $\sigma_m$, which tells you the accuracy with which the mean of the data points gives the *true* value of the quantity you are measuring. The standard error of the mean reduces with the number of data points taken, and is given by:

$$\sigma_m=\frac{s}{\sqrt{n}}$$

To calculate $\sigma_m$ with Python you need to know how many data points there are in your array, which you can get using the Python function <span style="color:blue">len()</span>. You also need to use the <span style="color:blue">sqrt()</span> function, which is in the SciPy package. 
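A minimal sketch of how these pieces fit together, again for a hypothetical data set rather than data set $x$. NumPy is imported directly as `np` (an assumption; the same functions are reached through `sp.` in this script).

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # hypothetical data

n = len(values)                      # number of data points
s = np.std(values, ddof=1)           # sample standard deviation
standard_error = s / np.sqrt(n)      # standard error of the mean

print(standard_error)
```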

**Exercise 13: calculate the standard error of the mean for data set $x$.**

## 7. Reading and writing files<a id="files"></a>

In the previous section you have seen that Python is a powerful tool for statistics, once you have stored your measurement data in arrays - particularly because you can execute the same block of code to calculate the mean, standard deviation, and standard error of the mean on any data set you take in your different labs. However, you won't usually *record* your data directly in Python. Normally, you will save your data in a file, which you can read in using your Python code.

There are numerous ways in Python to read in data from files, each with their own pros and cons. For now, we will use the <span style="color:blue">loadtxt()</span> function, which is included in the SciPy package. It offers a straightforward way of reading in data sets which consist of columns of measurement points. 

**Exercise 14: to try this out, open the file [Resistivity.txt](Resistivity.txt). Looking at this data file (preferably in your browser rather than Windows Notepad), you will see it includes 3 columns, separated by spaces. The first column is temperature, the second the measured resistivity of copper, and the third that of aluminium. You may recognise these from your Measurements & Uncertainties tutorial if you have already done this! To read in this data file and print the data, run the following cell:**

In [0]:
data=sp.loadtxt("Resistivity.txt")
print(data)

[[  2.00000000e+02   1.12000000e-08   1.76000000e-08]
 [  2.20000000e+02   1.18000000e-08   2.02000000e-08]
 [  2.40000000e+02   1.37000000e-08   2.18000000e-08]
 [  2.60000000e+02   1.52000000e-08   2.30000000e-08]
 [  2.80000000e+02   1.55000000e-08   2.68000000e-08]
 [  3.00000000e+02   1.73000000e-08   2.86000000e-08]
 [  3.20000000e+02   1.92000000e-08   3.05000000e-08]
 [  3.40000000e+02   1.99000000e-08   3.33000000e-08]
 [  3.60000000e+02   2.16000000e-08   3.49000000e-08]]


**You will see that the data has been read into a 2D array. The <span style="color:blue">loadtxt()</span> function incorporates more features: it can also skip header rows and deal with different types of delimiters. As an example, look at the file [Resistivity.csv](Resistivity.csv), which is a csv file of the same data. Note that now the data is delimited by commas, and includes two header rows. Now run the cell below:**

In [0]:
T, R_Cu, R_Al = sp.loadtxt("Resistivity.csv", skiprows=2, delimiter=',', unpack=True)
print(T)
print(R_Cu)
print(R_Al)

We have changed three things: we use the <span style="color:blue">skiprows</span> keyword to skip the first two header rows, we use the <span style="color:blue">delimiter</span> keyword to set the delimiter to comma (a tab would be <span style="color:blue">delimiter='\t'</span>), and we have used the <span style="color:blue">unpack</span> keyword to store each column of the data in a separate array (T, R_Cu, and R_Al in this case).
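The effect of these three keywords can be tried without any data file. The sketch below reads hypothetical file contents from an in-memory text buffer (`io.StringIO`) instead of a file on disk, and uses NumPy directly as `np` (an assumption; `np.loadtxt` is the routine the script calls as `sp.loadtxt`).

```python
import io

import numpy as np

# Hypothetical file contents: two header rows, then comma-separated columns
text = """Example data
x,y
1.0,10.0
2.0,20.0
3.0,30.0
"""

# skiprows drops the headers, delimiter splits on commas,
# unpack=True returns one 1D array per column
x_column, y_column = np.loadtxt(io.StringIO(text), skiprows=2, delimiter=",", unpack=True)

print(x_column)
print(y_column)
```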

<div style="background-color: #FFF8C6">

When reading in more complex data, the function <span style="color:blue">genfromtxt()</span> in the SciPy package can be more appropriate. It incorporates more flexibility, such as dealing with missing data.

Conversely, you may want to write data to an output file. To do this, we can use the <span style="color:blue">savetxt()</span> function included in SciPy. Below is a simple example of how to use this. 

**Exercise 15: run this and have a look at the resulting file. Tip: Windows Notepad does not recognize the column delimiters - it is better to open the file in your browser (using File --> Open File ... and choosing the relevant text file).**

In [0]:
sp.savetxt('Test_outputfile1.txt',data)

Note that <span style="color:blue">savetxt()</span> only takes one argument for the data to be printed, so if you want to print various 1D arrays (instead of the single 2D array in the example above), you need to combine them into a single 2D array. To do this successfully, you will need to use the SciPy <span style="color:blue">column_stack()</span> function. 

**Exercise 16: the two examples below illustrate this - extend the code to print out the two arrays and have a look at the result.**

In [0]:
combined_1 = [T,R_Cu,R_Al]
combined_2 = sp.column_stack([T,R_Cu,R_Al])
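One way to see the difference between the two approaches is to inspect what each one produces. This sketch uses short stand-in arrays rather than the resistivity data, and NumPy imported directly as `np` (an assumption).

```python
import numpy as np

temperature = np.array([200.0, 220.0, 240.0])   # stand-in values
resistivity = np.array([1.1e-8, 1.2e-8, 1.4e-8])

as_list = [temperature, resistivity]                    # a plain list holding two arrays
stacked = np.column_stack([temperature, resistivity])   # one 2D array, one column per input

print(type(as_list), len(as_list))
print(type(stacked), stacked.shape)   # shape (3, 2): three rows, two columns
```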

**Exercise 17: in the cell below, find the mean of the resistivity of copper and subtract it from the copper resistivity data set, so you are left with the residuals. Do the same for aluminium. Now save your new data set (which includes temperature, copper resistivity residuals, and aluminium resistivity residuals) to a file. Make sure to choose a sensible name for your output file! Use the <span style="color:blue">help()</span> function to find out what arguments and keywords the <span style="color:blue">savetxt()</span> function takes. Can you include a header line with the column names and space the columns with tabs?**

<div style="background-color: #FFF8C6">

You may have noticed that the formatting of your file may not be exactly to your liking. For example, the header names might not line up with your data columns, and the data may have too many decimal places. All this can be fixed by changing the formatting of the output that is written to file. For example, we can force the number of decimal places in the data by setting the following keyword in the <span style="color:blue">savetxt()</span> function: <span style="color:blue">fmt="%.i %.2e %.2e"</span>.
This ensures the first column is printed as an integer, and the other two columns are printed using scientific notation with two decimal places. If you have time, experiment with changing the format of the data in your output file. 

For a more comprehensive guide on formatting output, have a look at [this tutorial](https://www.python-course.eu/python3_formatted_output.php). Knowing how to format Python output appropriately will come in very handy in the future.
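As a hedged sketch of the `fmt` keyword in action, the code below writes a small stand-in table to a temporary file and reads the text back so the formatting can be inspected. The file name and data are hypothetical, and NumPy is imported directly as `np` (an assumption; `np.savetxt` is the routine the script calls as `sp.savetxt`).

```python
import os
import tempfile

import numpy as np

table = np.column_stack([[200.0, 220.0], [1.12e-8, 1.18e-8]])   # stand-in data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "formatted_example.txt")   # hypothetical file name
    # First column as an integer, second in scientific notation with two decimals
    np.savetxt(path, table, fmt="%i %.2e")
    with open(path) as handle:
        contents = handle.read()

print(contents)
```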

<div style="background-color: #FFF8C6">

A very powerful data package that can also handle Excel files is pandas. You can import pandas and read your data using 

```python
import pandas as pd
df=pd.read_csv('Resistivity.csv')
```

This stores the data in a data frame, which has many plotting and statistical functions attached to it. For those of you who already have a lot of Python programming experience, it is worth exploring the pandas package further.

## 8. Matplotlib: plotting data<a id="plotting"></a>

When you take measurements in lab, you will want to create a graph of your data. Python has a package that specialises in plotting: <span style="color:blue">matplotlib</span>. To use the plotting routines of this package, we only need to import the <span style="color:blue">pyplot</span> part of the matplotlib package. 

In [0]:
import matplotlib.pyplot as plt
%matplotlib inline

The second line in the cell above is necessary within Jupyter Notebooks to ensure the plots are created within the notebook itself. In most other environments (such as the Spyder IDE which you will use in Session 3) plots will be created in a separate window.

We can now use the <span style="color:blue">plot()</span> function to create a plot of our resistivity data:

In [0]:
plt.plot(T,R_Cu)
plt.show()

In essence, the first line in the cell above creates a plot object, and the second line shows the plot (akin to creating a variable and printing it to screen with the <span style="color:blue">print()</span> command). To be able to use a figure in your report, you will want to save it as an image or pdf file. To do this, instead of the <span style="color:blue">plt.show()</span> command, use the <span style="color:blue">plt.savefig()</span> command.

Looking at the plot above, you may guess that the resistivity increases linearly with temperature, but there is some scatter in the data. A line plot is therefore not the most suitable graph; instead we want to use a scatter plot. In fact, even if the data did follow a perfect line, we would still want to plot the data points themselves as well - otherwise we would not be able to tell if the graph is the result of two data points or two hundred! We can do that by specifying a plotting symbol as a third argument in the <span style="color:blue">plot()</span> function. Below we plot the resistivity for both copper and aluminium using different plotting symbols. We also save the image as a png file in the current working directory (check this for yourself!).

In [0]:
plt.plot(T,R_Cu,'x')
plt.plot(T,R_Al,'+')
plt.savefig("Resistivity_plot.png")

In general, when you create plots they can have three different functions:
1. A quick look at your data or model in the middle of your work to see what it looks like (i.e. work in progress).
2. A graph that you can show to someone else or save for later use.
3. A graph that you can use in a presentation, report, or publication.

For point 1, the graph we created above might suffice - it's a 'quick and dirty' plot that shows your data but doesn't include any supporting information such as axis labels or units. For point 2, you would need to include at the very least enough information so that someone else (or yourself at a later point) can make sense of it. For example, the plot below includes axis titles and a legend for this purpose.

![Resistivity_plot2](Images/Resistivity_plot2.png)
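
One possible way to add this kind of information is sketched below. The data, label text, and file name are placeholders chosen for illustration, not the values used for the figure above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (not the real resistivity measurements)
x_demo = np.array([100, 150, 200, 250, 300])
y_a = np.array([0.5, 0.8, 1.1, 1.4, 1.7])
y_b = np.array([0.7, 1.1, 1.5, 1.9, 2.3])

fig = plt.figure()
plt.plot(x_demo, y_a, 'x', label="Sample A")
plt.plot(x_demo, y_b, '+', label="Sample B")
plt.xlabel("x quantity (units)")
plt.ylabel("y quantity (units)")
plt.legend()  # builds the legend from the label= text of each plot() call
plt.savefig("Labelled_plot_demo.png")
```

Always include the units in the axis titles; without them a reader cannot interpret the numbers on the axes.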

For point 3, the layout of your graphs is incredibly important: they are the main showcase of your results! In your lab reports, you will need to create figures to a publishable standard (following the IEEE guidelines). In presentations, you may want to create bigger images with oversized fonts and thick markers of contrasting colours so that they will be projected legibly. An example of an improved Resistivity figure for this purpose is below. 

![Resistivity_plot_improved_final](Images/Resistivity_plot_improved_final.png)

**Exercise 18: take some time to try and recreate the figure above as exactly as possible - to do this you will need to find out a number of commands for yourself! To get you started, some code is given below to change the size of the plotting symbols, the various text elements, and the figure itself. Experiment with these before adding on the various features that are shown in the example plot. There are many more features you can change if you have time to research them.**

In [0]:
# Plot parameters: experiment with different values
params = {
   'axes.labelsize': 8,
   'font.size': 8,
   'legend.fontsize': 8,
   'xtick.labelsize': 8,
   'ytick.labelsize': 8,
   'figure.figsize': [6, 4]
   } 
plt.rcParams.update(params)

# Try and find out what happens when you vary the values of the mew and ms keywords
plt.plot(T,R_Cu, 'x', mew=1, ms=5, color='red') 
plt.plot(T,R_Al, '+', mew=1, ms=5, color='blue')
plt.savefig("Resistivity_plot_improved.png")


For your lab report, a useful way of proceeding is to first determine the size your figure will be in your report (e.g. the width of one column) and then create a figure of exactly that size in your code, using the same font size and type as the rest of your report. This ensures that you will not need to rescale your figure, which can lead to undesirable results. 
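
As a rough sketch of this approach, the cell below assumes a column width of 3.5 inches and an 8 pt font. Both numbers are placeholders - replace them with the values for your own report template.

```python
import matplotlib.pyplot as plt

column_width = 3.5   # inches; assumed value, check your report template
aspect = 0.75        # height/width ratio; a matter of taste

fig = plt.figure(figsize=(column_width, column_width * aspect))
plt.plot([1, 2, 3], [2, 4, 6], 'x')    # placeholder data
plt.xlabel("x (units)", fontsize=8)
plt.ylabel("y (units)", fontsize=8)
plt.tight_layout()   # keeps the axis labels inside the figure area
plt.savefig("Column_width_demo.pdf")
```

Because the figure is created at its final size, the 8 pt labels in the saved file will still be 8 pt when the figure is placed in the report without rescaling.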

Note that once you have run <span style="color:blue">plt.rcParams.update(params)</span>, your subsequent plots will retain the same parameters. You can reset the plotting style in your notebook by running the following:

In [0]:
plt.style.use("default")
%matplotlib inline

<div style="background-color: #FFF8C6">
The <span style="color:blue">plt.style.use("default")</span> function loads the default style sheet. There are other pre-defined style sheets that change the look of your plots. To list all available style sheets, run: 

```python
print(plt.style.available)
```

Experiment with a few different style sheets. You can even create your own style sheet and load it in this way!
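
If you only want to try a style sheet on a single plot, one option is the <span style="color:blue">plt.style.context()</span> context manager, sketched below with placeholder data. The choice of the "ggplot" style and the file name are arbitrary examples.

```python
import matplotlib.pyplot as plt

# The style sheet is only active inside the with-block;
# plots made afterwards use your previous settings again.
with plt.style.context("ggplot"):
    fig = plt.figure()
    plt.plot([1, 2, 3], [1, 4, 9], 'o')   # placeholder data
    plt.savefig("Style_demo.png")
```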

<div style="background-color: #00FF00"> Discuss your figure with a demonstrator. </div>

## 9. Log plots and histograms<a id="histograms"></a>

Two other types of plots you will need to create frequently are logarithmic plots and histograms. Logarithmic plots are particularly good at highlighting different types of relationships between two parameters. Three examples are:
1. Sound intensity $\beta$ in dB: $$\beta = 10 \log_{10}{\frac{I}{I_0}},$$ where $I$ is the intensity in $\rm W\,m^{-2}$ and $I_0 = 1.00\times10^{-12}\,\rm W\,m^{-2}$ for air.
<p>
2. Radioactive decay: $$N = N_0 e^{-t/\tau},$$ 
<p>where $N$ is the number of particles in the sample, $N_0$ is the initial number of particles, and $\tau$ is the lifetime of the particles.
<p>
3. The period of oscillation $T$ of a pendulum (for small amplitudes): $$T = 2\pi \sqrt{\frac{L}{g}}.$$ Here $L$ is the length of the pendulum and $g$ is the acceleration due to gravity.

Below is the code to create a plot for example 1. 

In [0]:
I_0 = 1e-12
I = sp.linspace(0.1,10,20) # Array with 20 intensities spanning 0.1 - 10 W m^-2
beta = 10*sp.log10(I/I_0) # The sound intensity in dB (note the base-10 logarithm)
plt.plot(I,beta)
plt.plot(I,beta,'x')
plt.xlabel("Intensity (W/m^2)")
plt.ylabel("Sound intensity beta (dB)")
plt.xscale('log')
plt.show()

When you run the cell above, you will notice that the x-axis is scaled logarithmically, which allows the relationship between $\beta$ and $I$ to be displayed as a straight line. This is done using the <span style="color:blue">xscale('log')</span> command from the pyplot package. The sound intensity is plotted both as a line and as crosses. You will see that the data points are all clustered in the top right of the graph. This is because we have spaced the intensity linearly. It would be better to space the data points evenly in log space: to do this we can use the <span style="color:blue">logspace()</span> function that is included in SciPy. 
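
The difference between the two kinds of spacing can be seen by printing a few values. The short sketch below uses NumPy directly (imported as `np`, an assumption, since this script otherwise uses the `sp` prefix) and only five points per array to keep the output readable.

```python
import numpy as np

linear = np.linspace(0.1, 10, 5)      # equal steps in the value itself
logarithmic = np.logspace(-1, 1, 5)   # equal steps in the exponent: 10**-1 up to 10**1

print(linear)
print(logarithmic)
print(np.diff(np.log10(logarithmic)))  # constant step in log space
```

Note that the first two arguments of <span style="color:blue">logspace()</span> are the exponents of the start and end values, not the values themselves.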

**Exercise 19: below, recreate the above plot but with data points evenly spaced in log space.**

For examples 2 and 3 we need two different types of logarithmic plots in order to display the relationships as a straight line. 
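
To see why, it can help to take logarithms of both expressions (a short derivation from the formulas given above):

$$\ln N = \ln N_0 - \frac{t}{\tau}, \qquad \log_{10} T = \frac{1}{2}\log_{10} L + \log_{10}\frac{2\pi}{\sqrt{g}}.$$

The first expression is a straight line in $t$ when only the $N$ axis is logarithmic (with gradient $-1/\tau$ in $\ln N$), whereas the second is a straight line only when both axes are logarithmic (with gradient $1/2$).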

**Exercise 20: create both plots yourself below.**

<div style="background-color: #FFF8C6">
Often in axis labels you might want to use Greek symbols or superscripts (e.g. m$^2$). This can be achieved by using LaTeX's math text commands. For a simple example, replace the label commands of the first example plot in this section (sound intensity $\beta$ versus $I$) with:

```python
plt.xlabel(r"Intensity (W/m$^2$)")
plt.ylabel(r"Sound intensity $\beta$ (dB)")

```
Note the insertion of the 'r' in front of the label string, and the dollar signs around the parts of the string that use math text commands. For a more comprehensive introduction, take a look at [this guide](https://matplotlib.org/users/mathtext.html).

### Histograms
Histograms are frequently used when displaying a set of repeated measurements. With the <span style="color:blue">hist()</span> function of the matplotlib pyplot package it is straightforward to create a histogram for any array. For the following example, we use the [Dataset.txt](Dataset.txt) file, which includes 20 measurements of the speed of light (in units of $10^8\,\mathrm{m\,s^{-1}}$). Below, we create a histogram of the data.

In [0]:
data = sp.loadtxt('Dataset.txt')
plt.ylabel("Number of measurements")
plt.xlabel("Speed (10^8 m/s)") # the data are stored in units of 10^8 m/s
plt.hist(data)
plt.show()


We can immediately see there appears to be one outlier in the data. Again, you may have done this by hand in the Measurement & Uncertainties tutorial (if not yet, then you soon will!) - it should be clear that it is much easier to spot outliers when you plot a histogram than by looking at the individual measurements.

The question is whether to disregard the outlier when we calculate the mean and standard error of the mean. 

**Exercise 21: inspect the data and calculate the mean and standard deviation of the entire sample to get an idea of how far away the point is. Would it be reasonable to assume a mistake was made in this measurement?**

We can use Python to make a new data sample without this data point using the SciPy <span style="color:blue">delete()</span> function (remember that the elements in the array start counting at 0). Note however that we do not 'delete' the data point from our data set altogether - we still record the outlier and keep it in our data file. We use the <span style="color:blue">delete()</span> function to create a new array without the outlier, so we can do further statistics on this sample.

In [0]:
clean_data=sp.delete(data,5)
clean_data
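
The behaviour of <span style="color:blue">delete()</span> can be checked on a small made-up array, as sketched below. The values are invented, and the sketch uses NumPy directly (imported as `np`, an assumption) rather than the `sp` prefix.

```python
import numpy as np

original = np.array([2.9, 3.0, 3.1, 9.9, 3.0])  # made-up values; index 3 is the odd one out
cleaned = np.delete(original, 3)                # returns a NEW array without element 3

print(original)  # unchanged, still 5 elements
print(cleaned)   # 4 elements, outlier removed
```

Because the original array is left untouched, you can always go back and compare the statistics with and without the outlier.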

**Exercise 22: recalculate the mean and standard deviation, and display a histogram of the new data sample. Has the result changed?**

Once you have got used to calculating the mean, standard deviation, and standard error with Python, you will probably not want to go back to your calculator for repeated calculations of this kind!

<div style="background-color: #FFF8C6">

It can be instructive to plot two graphs side by side. Investigate the pyplot <span style="color:blue">subplot()</span> function to create 2 subplots so your histograms are displayed side by side.
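
A minimal sketch of one way to do this is given below, using two made-up samples in place of your own data. The figure size, titles, and file name are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up samples standing in for the full and the cleaned data sets
sample_a = np.array([2.9, 3.0, 3.1, 9.9, 3.0, 3.0, 2.9])
sample_b = np.delete(sample_a, 3)

fig = plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)   # 1 row, 2 columns, select the first panel
plt.hist(sample_a)
plt.title("All data")

plt.subplot(1, 2, 2)   # select the second panel
plt.hist(sample_b)
plt.title("Outlier removed")

plt.tight_layout()     # prevents the panels from overlapping
plt.savefig("Histograms_side_by_side.png")
```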

<div style="background-color: #00FF00">Show all your plots to a demonstrator. </div>

<div style="background-color: #FFF8C6">

If you have time and have already completed a lab experiment, use Python to analyse your data and create suitable plots.

## Please complete the [Mentimeter Poll](https://www.menti.com/a08152) for this session 