# Data Analysis

# 1   Revision and some new commands

In this section, before moving on to fitting functions, you will revise some of the array and plotting instructions you learnt in Python 3, as well as learn a few new commands. Remember that an array is an object that is part of the `numpy` package (see Python 3, Section 1) and that the plotting instructions are part of the module `pyplot`, contained in the package `matplotlib` (see Python 3 Notebook 2, Section 2.1).

## 1.1  Revision: operating with arrays

First, let’s revise some of the things you learnt about how to create and operate with arrays in Python 3.

### Exercise 1.1

Write a program that creates two arrays with the following numbers:

x = (0.053, 0.042, 0.029, 0.025, 0.017, 0.010, 0.008, 0.002)

F = (7.05, 5.93, 4.08, 4.01, 2.83, 2.05, 1.393, 0.452)

and use the instructions you learnt in Python 3 to print the following:

1. the sum of all the elements in array `F`
2. the number of elements in array `x`
3. the ratio of arrays `F` and `x`.

Once you've answered the exercise, click on the <u>**+ 1 cell hidden** </u> button below to see a possible solution.

In [None]:
# Example program that defines  two arrays and prints:
# (i) the sum of all the elements in one array, (ii) 
# the number of elements in one array and (iii) the 
# ratio of both arrays.
import numpy as np

# Create the arrays
x = np.array([0.053,0.042,0.029,0.025,0.017,0.010,0.008,0.002],float)
F = np.array([7.05,5.93,4.08,4.01,2.83,2.05,1.393,0.452],float)

# Print the sum of all elements in array F
print ('The sum of all elements in array F is:',sum(F))

# Print the length of array x
print ('The number of elements in array x is:',len(x))

# Print the ratio of both arrays:
print ('The array resulting from the ratio of the two arrays is:',F/x)
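
The same three results can also be obtained with the array's own attributes and methods. The following is a minimal sketch of that alternative, assuming the same data as in the solution above; the variable names `total_F`, `n_x` and `ratio` are my own choice for illustration.

```python
import numpy as np

# Same data as in the example solution above
x = np.array([0.053, 0.042, 0.029, 0.025, 0.017, 0.010, 0.008, 0.002], float)
F = np.array([7.05, 5.93, 4.08, 4.01, 2.83, 2.05, 1.393, 0.452], float)

total_F = F.sum()   # sum of all the elements in F (equivalent to np.sum(F))
n_x = x.size        # number of elements in x (same as len(x) for a 1D array)
ratio = F / x       # element-by-element division of the two arrays

print('The sum of all elements in array F is:', total_F)
print('The number of elements in array x is:', n_x)
print('The array resulting from the ratio of the two arrays is:', ratio)
```

Note that `F / x` divides each element of `F` by the element of `x` in the same position, so the two arrays must have the same number of elements.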

### &nbsp;

In [None]:
# Write your python code here.




## 1.2  Plotting: the appearance of the plot

In Python 3 you learnt how to plot a set of points: remember that if `x` and `y` are arrays, you can use the instruction

<code>plt.plot(x,y)</code>

This plots a line joining consecutive points, which will look smooth or jagged depending on, among other things, how many points you are plotting. You may, however, want to plot the individual data points instead of a line, so that you can ‘read’ the values that you have actually provided or calculated. In that case, you need to add a third argument to the instruction above that indicates what style and colour the symbols should be. For example, if I want to plot a set of red squares, I will write the following.

In [None]:
# Example program that plots a set of points using 
# symbols instead of a line

import numpy as np
import matplotlib.pyplot as plt

# Create two arrays
x_values = np.array([1,2,3,4,5,6,7],float)
y_values = np.array([1.9,2.1,3.6,4.2,4.9,5.7,6.4],float)

# Plot the points
plt.plot(x_values,y_values,'sr')

Here in '`sr`', the ‘`s`’ indicates I want squares to be used and ‘`r`’ that I want them to be red. The order in which these are written is not important. There are many types of symbols that can be used (if you search on the web for `matplotlib` markers, you will find a comprehensive list). Some of them are:

o : a circle

^ : a triangle pointing up

< and > : triangles pointing left and right respectively

<code>*</code> : a star

D : a diamond.

In terms of colours, you can use:

b : blue

g : green

r : red

c : cyan

m : magenta

y : yellow

k : black

w : white.
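
As an illustration of how these codes combine, here is a short sketch that plots the same points three times with different symbols and colours. The vertical offsets of 1 and 2 are only there to stop the symbols overlapping, and the names `circles`, `triangles` and `squares` are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same data as in the example above
x_values = np.array([1, 2, 3, 4, 5, 6, 7], float)
y_values = np.array([1.9, 2.1, 3.6, 4.2, 4.9, 5.7, 6.4], float)

# Green circles
circles = plt.plot(x_values, y_values, 'og')

# Blue triangles pointing up, shifted up by 1 so they do not overlap
triangles = plt.plot(x_values, y_values + 1, '^b')

# Black squares, shifted up by 2; here the colour is written before the symbol
squares = plt.plot(x_values, y_values + 2, 'ks')
```

Each call to `plt.plot` adds a new set of symbols to the same figure, which is why all three sets appear together.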

### Exercise 1.2

Write a program that:

<ol>
   <li> plots the points with coordinates (x,F) from the sets in Exercise 1.1 </li> 
    <ul> 
 <li style="list-style-type: none;"> x = (0.053, 0.042, 0.029, 0.025, 0.017, 0.010, 0.008, 0.002),</li>
  <li style="list-style-type: none;"> F = (7.05, 5.93, 4.08, 4.01, 2.83, 2.05, 1.393, 0.452)</li>
   </ul> 
  <li> plots a straight line with gradient = 134 and intercept = 0.3.</li> 
</ol>

Once you've answered the exercise, click on the <u>**+ 1 cell hidden** </u> button below to see a possible solution.

In [None]:
# Example program that plots a set 
# of points and a linear function

import numpy as np
import matplotlib.pyplot as plt

# Create the arrays
x = np.array([0.053,0.042,0.029,0.025,0.017,0.010,0.008,0.002],float)
F = np.array([7.05,5.93,4.08,4.01,2.83,2.05,1.393,0.452],float)

# Define the values of the gradient and the intercept
gradient = 134
intercept = 0.3

# Plot the points
plt.plot(x,F,'sr')
plt.plot(x,x*gradient+intercept)

### &nbsp;
:::{hint} Hint
:class: dropdown
You can use the x-coordinate values provided in item 1 to plot the line.
:::

In [None]:
# Write your python code here.




## 1.3   Saving a plot to a file

Something especially useful is that you can save the plots that you produce to a file. You can then download this file and incorporate it in a document, for example, a TMA.

To do this, use the instruction `plt.savefig("file name")`. The file name should be enclosed in quotation marks and include an extension that indicates the format in which the figure should be saved. For example, to save the figure in PNG format, the file name should be "name.png"; to save it in JPEG format, "name.jpeg"; and in PDF format, "name.pdf".

The file to which the figure is saved will appear on the menu on the left where the Notebooks for this week are listed. Note that the figure will also appear in the Notebook. To download the file, right-click on its name in the menu and then select Download from the menu that appears.

In [None]:
# Example program that plots a set of points using matplotlib
# and saves the figure in a file in PNG format

import numpy as np
import matplotlib.pyplot as plt

# Create two arrays
x_values = np.array([1,2,3,4,5,6,7,8],float)
y_values = np.array([0.3,1.2,2.7,3.6,4.2,4.6,5.2,7.4],float)

# Plot the points
plt.plot(x_values,y_values,'>b')

plt.savefig("figure.png")
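
The sketch below shows the same plot saved in two different formats. The file names `figure_example.pdf` and `figure_example_300dpi.png` are made up for this illustration, and the optional `dpi` argument (dots per inch) is one way of requesting a higher-resolution image. The last lines use the `os` module simply to confirm that the files were written.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

# Same data as in the example above
x_values = np.array([1, 2, 3, 4, 5, 6, 7, 8], float)
y_values = np.array([0.3, 1.2, 2.7, 3.6, 4.2, 4.6, 5.2, 7.4], float)

# Plot the points
plt.plot(x_values, y_values, '>b')

# The extension in the file name selects the format (hypothetical file names)
plt.savefig("figure_example.pdf")

# dpi sets the resolution of image formats such as PNG
plt.savefig("figure_example_300dpi.png", dpi=300)

# Confirm that both files now exist and report their sizes in bytes
for file_name in ["figure_example.pdf", "figure_example_300dpi.png"]:
    print(file_name, os.path.getsize(file_name), "bytes")
```

The call to `plt.savefig` should come after all the plotting instructions, so that everything you have drawn is included in the saved file.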

## 1.4   A few more plotting commands

A couple of instructions that are very useful are `plt.xlabel` and `plt.ylabel`. These allow you to include axis labels in your plots; the example below shows you how to do this.

In [None]:
# Example program that includes axis labels in a plot

import numpy as np
import matplotlib.pyplot as plt

# Create two arrays
x_values = np.array([1,2,3,4,5,6,7],float)
y_values = np.array([1.9,2.1,3.6,4.2,4.9,5.7,6.4],float)

# Plot the points
plt.plot(x_values,y_values,'sr')
plt.xlabel('time / seconds')
plt.ylabel('position / metres')

## 1.5   Another way of inputting data: reading from a csv file

So far, whenever you’ve needed to use some data (a set of numbers) in a program, you have written a list or an array containing it. However, there may be situations where you are provided with a file containing the required data. You don’t want to retype the data in your program! You want to be able to read it directly from the file. 

A lot of data is provided as Comma-Separated Values, or csv. Fortunately, Python has a module called `csv` for reading and operating with csv files. You will notice that, in addition to the notebooks for this week, the folder <code>Python 4</code> in the OCL contains a file named <code>Extension_force.csv</code>. The example below shows how to read this file and store the two columns of data in it in separate lists.

In [None]:
# Example of how to read the content of a csv file.

import csv                                  # import the csv module. Since the name is short, 
                                            # no need to give it a 'nickname' using as 

# Create empty lists to store values 
extension = []
force = []

with open('Extension_force.csv', mode='r') as input_file: # open CSV file from which data will be read
  data_of_extension = csv.DictReader(input_file)       # read and store data

  for i_row in data_of_extension:            # Iterating over each row to create two lists
    extension.append(i_row['Extension'])     # Append value in column labelled Extension to list extension
    force.append(i_row['Force'])             # Append value in column labelled Force to list force

# Print each column
print("Extension:", extension)
print("Force:", force)

Here, I have used the instruction `csv.DictReader` to read the file so that each row is stored in a data type called a dictionary, in which every value is labelled by its column heading. The details of why dictionaries are useful are not important for this example, but you may use this data type in the future. I go through the rows one at a time using the instruction <code>for i_row in data_of_extension</code>, and use the instruction `i_row[  ]` to pick out the value in a given column. I save the value from the column labelled Extension by appending it to the list `extension`. Similarly, I save the value from the column labelled Force by appending it to the list `force`. Note that the values are read as text (strings), which is why they appear in quotation marks when printed. (Things are a bit more complicated if the columns are not labelled, but you won't encounter a case like this in SM123.)
Note again that the line that creates `data_of_extension` is indented because it must be inside the `with` block to work, and so is the `for` loop that splits `data_of_extension` into two lists.
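
To see what each row looks like, here is a minimal sketch that reads a small piece of csv text instead of a file. The numbers in `csv_text` are made up for illustration and are not the contents of <code>Extension_force.csv</code>; `io.StringIO` is used only so that the text can be read as if it were a file. The sketch also converts each value to a number with `float` as it is read, which is one possible way of handling the fact that csv values arrive as text.

```python
import csv
import io

# Hypothetical csv content (made-up values, standing in for a real file)
csv_text = "Extension,Force\n0.010,1.5\n0.020,3.0\n"

# Create empty lists to store values
extension = []
force = []

with io.StringIO(csv_text) as input_file:   # behaves like an opened file
    data_of_extension = csv.DictReader(input_file)

    for i_row in data_of_extension:
        print(i_row)                         # each row is labelled by the column headings
        print(type(i_row['Force']))          # the stored values are strings
        extension.append(float(i_row['Extension']))   # convert text to a number
        force.append(float(i_row['Force']))

# Print each column
print("Extension:", extension)
print("Force:", force)
```

Because of the conversion with `float`, the two lists printed at the end contain numbers rather than text.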

### Exercise 1.3

Write a program that:

<ol>
   <li> plots the points with coordinates (extension,force) from the file <code>Extension_force.csv</code> </li> 
    <li> prints the result of dividing force by extension for every data pair in the input file</li>
   <li> saves the plot into a file with an appropriate name</li>  
</ol>
Ensure the axes in your plot are correctly labelled. 

Once you've answered the exercise, click on the <u>**+ 1 cell hidden** </u> button below to see a possible solution.

In [None]:
# Example of how to read the content of a csv file, store it, operate with the data 
# and then plot it

import csv                                        # import the csv module
import numpy as np                                # import numpy
import matplotlib.pyplot as plt                   # import pyplot

# Create empty lists to store values
extension = []
force = []

with open('Extension_force.csv', mode='r') as input_file: # open CSV file from which data will be read
    data_of_extension = csv.DictReader(input_file)       # read and store data
    
    # Iterating over each row
    for i_row in data_of_extension:              # Read information from each row in data_of_extension
        extension.append(i_row['Extension'])     # Append value in column labelled Extension to list extension
        force.append(i_row['Force'])             # Append value in column labelled Force to list force

# Create arrays with the force and extension data to simplify the calculation of the ratio
xarray=np.array(extension,float)
Farray=np.array(force,float)
# Print the ratio force/extension
print("force/extension:", Farray/xarray)

# Plot the points
plt.plot(xarray,Farray,'sr')                  # Use the arrays: the lists hold text (strings), not numbers
plt.xlabel('Extension / m')
plt.ylabel('Force / N')
plt.savefig("Force_and_extension_plot.png")

### &nbsp;
You will see that in the suggested solution above I converted the lists into arrays before using them, both for the calculation and for the plot. This matters because the values read by `csv.DictReader` are stored as text (strings), not numbers. If you pass the lists directly to `plt.plot`, `matplotlib` treats each string as a category label rather than a number, so the points are not placed at their true numerical positions. Dividing force by extension using the lists instead of the arrays will not work at all (you can check this!). We recommend you follow best practice and always convert lists into arrays when you are going to use them for plotting, as input into built-in functions, etc.
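
You can check the statement about division with a short sketch like the one below. The values are made up for illustration and are written as strings, because that is how they arrive when read with `csv.DictReader`; the names `ratio` and `array_ratio` are my own.

```python
import numpy as np

# Hypothetical values, stored as strings as they would be after reading a csv file
extension = ['0.010', '0.020', '0.040']
force = ['1.5', '3.0', '6.0']

# Dividing one list by another is not defined in Python
try:
    ratio = force / extension
except TypeError as error:
    ratio = None
    print('Dividing the lists fails:', error)

# Converting to arrays of floats makes element-by-element division possible
xarray = np.array(extension, float)
Farray = np.array(force, float)
array_ratio = Farray / xarray
print('Dividing the arrays works:', array_ratio)
```

The `try` and `except` instructions are used here only so that the program carries on after the error and can show the working version.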

**In this notebook you revised some commands related to arrays and learnt a few new ones that allow  better plots to be generated.  You also learnt  how to save a plot into a file and how to read data from a file to use it in a program. You should now  return to the VLE to read about fitting a straight line.**