## Assessment: Least squares analysis

In the least squares analysis method, a linear function *y = mx + c* is fitted to a series of data points $(x_i, y_i)$.

The gradient *m* and the intercept *c* are calculated using

$$ m = \large{\large{\frac{S_{x}S_{y}-NS_{xy}}{S^{2}_{x}-NS_{xx}}}}$$ 

$$ c = \large{\large{\frac{S_{y}-mS_{x}}{N}}} $$

where *N* is the number of data points and

$$ \large{\large{S_{x}=\sum_{i=1}^{N}x_{i} }} $$ 

$$ \large{\large{S_{y}=\sum_{i=1}^{N}y_{i} }} $$

$$ \large{\large{S_{xy}=\sum_{i=1}^{N}x_{i}y_{i} }}$$

$$ \large{\large{S_{xx}=\sum_{i=1}^{N}x_{i}^{2} }}$$
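As a quick check of the formulas, consider three illustrative points (not taken from the data file) that lie exactly on the line *y = 2x + 1*: (1, 3), (2, 5) and (3, 7). Then *N* = 3 and

$$ S_{x} = 6, \quad S_{y} = 15, \quad S_{xy} = 34, \quad S_{xx} = 14 $$

$$ m = \frac{6 \times 15 - 3 \times 34}{6^{2} - 3 \times 14} = \frac{-12}{-6} = 2, \qquad c = \frac{15 - 2 \times 6}{3} = 1 $$

which recovers the gradient and intercept of the original line.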

You are to write a program that performs a least squares analysis on the $(x_i, y_i)$ data points stored in the file [PopulationData.csv](PopulationData.csv) (click on the link to open the file).

First read the $(x_i, y_i)$ data points into a NumPy array. Note that the population data is recorded in thousands, whereas all population results should be presented in billions.
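One possible way to read the file is sketched below. The layout of PopulationData.csv is not described here, so the sketch assumes two comma-separated columns (year, then population in thousands) and tolerates an optional header row. The helper name `load_points` and the toy CSV text are hypothetical and only serve to make the example runnable without the real file.

```python
import io

import numpy as np

THOUSANDS_PER_BILLION = 1.0e6  # 1 billion = 1,000,000 thousands


def load_points(source):
    """Hypothetical helper: read comma-separated (x, y) rows into a 2-D array.

    Assumes two numeric columns. Any row that cannot be parsed as numbers
    (for example a text header) becomes NaN and is dropped.
    """
    data = np.atleast_2d(np.genfromtxt(source, delimiter=","))
    return data[~np.isnan(data).any(axis=1)]


# Toy stand-in for the real file; these values are placeholders, not real data.
toy_csv = io.StringIO("Year,Population\n2000,1000000\n2001,2000000\n")
points = load_points(toy_csv)

# Convert the population column from thousands to billions.
pop_bn = points[:, 1] / THOUSANDS_PER_BILLION
```

For the real data, `load_points("PopulationData.csv")` would be the equivalent call, provided the file matches the assumed layout.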

## Task 1: Create fit function [20%]
Pass your data structure to a function called **LSfit**.

The function calculates and returns the values of the fit parameters:
- *m* (gradient) 
- *c* (intercept) 

using the equations above.
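A minimal sketch of such a function follows, assuming the data structure is a 2-D NumPy array with $x_i$ in the first column and $y_i$ in the second. The toy array is illustrative only and is not taken from the data file.

```python
import numpy as np


def LSfit(data):
    """Return (m, c) for the least squares line through the rows of data.

    Assumes data[:, 0] holds x_i and data[:, 1] holds y_i.
    """
    x = data[:, 0]
    y = data[:, 1]
    N = len(x)

    # The four sums defined in the equations above.
    Sx = x.sum()
    Sy = y.sum()
    Sxy = (x * y).sum()
    Sxx = (x * x).sum()

    m = (Sx * Sy - N * Sxy) / (Sx**2 - N * Sxx)
    c = (Sy - m * Sx) / N
    return m, c


# Toy points lying exactly on y = 2x + 1.
toy = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
m, c = LSfit(toy)
```

Because the toy points lie exactly on a line, the returned values should match that line's gradient and intercept, which makes this a convenient sanity check before running on the population data.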

## Task 2: Create residuals function [20%]

This function called **LSresid** takes three inputs:

1. your data structure
2. *m* (gradient) 
3. *c* (intercept) 

and for each data point, calculates:

- the fitted value, $y_i^{fit}=mx_i+ c$
- the error, $e_i= y_i - y_i^{fit}$

The function should return all this information in a single NumPy array with 4 columns:
1. $x_i$
2. $y_i$
3. $y_i^{fit}$
4. $e_i$
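The residuals calculation might be sketched as follows, again assuming a 2-D NumPy array with $x_i$ and $y_i$ in the first two columns. The toy values are illustrative only.

```python
import numpy as np


def LSresid(data, m, c):
    """Return an array with columns x_i, y_i, y_i^fit and e_i."""
    x = data[:, 0]
    y = data[:, 1]
    y_fit = m * x + c   # fitted value for each point
    e = y - y_fit       # error: data minus fit
    return np.column_stack((x, y, y_fit, e))


# Toy points scattered around the line y = 2x + 1.
toy = np.array([[1.0, 3.5], [2.0, 5.0], [3.0, 6.5]])
result = LSresid(toy, 2.0, 1.0)
```

Using `np.column_stack` keeps one row per data point, so the result can be passed on unchanged to a printing or plotting function.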

## Task 3: Create a function to print results [20%]

This function, called **LSprint**, prints to the screen:
- the fit parameter $m$ in units of millions per year
- a table of the individual data points, the fitted values and the percentage errors

The rows in the table should be formatted **exactly** as follows:

    ---------------------------------
     Year Pop.[Bn]  Fit[Bn]   Error % 
    ---------------------------------
     1950   2.54     2.26      12.4
     1951   2.58     2.33      10.7
     1952   2.63     2.41       9.1
     …etc
     2014   7.30     7.20       1.4
     2015   7.38     7.28       1.4
    ---------------------------------
    
Note that the desired units of population in the table are 'billions' abbreviated as 'Bn'.
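A sketch of one way to produce this layout is given below. Several things in it are assumptions rather than part of the specification: the function signature (the 4-column array from **LSresid** plus *m*), the helper name `format_row`, the wording of the gradient line, and the definition of the percentage error as $100\,e_i / y_i^{fit}$, which is consistent with the sample rows shown but is not stated explicitly. The column widths were read off the sample table and are worth checking against it.

```python
import numpy as np

THOUSANDS_PER_MILLION = 1.0e3
THOUSANDS_PER_BILLION = 1.0e6
RULE = "-" * 33


def format_row(year, pop_bn, fit_bn, err_pct):
    """Hypothetical helper: format one table row to match the sample layout."""
    return f" {int(year):4d}   {pop_bn:4.2f}     {fit_bn:4.2f}{err_pct:10.1f}"


def LSprint(results, m):
    """Print the gradient and a table built from the 4-column results array.

    Assumes populations (and therefore m) are in thousands, as in the data file.
    """
    print(f"Gradient m = {m / THOUSANDS_PER_MILLION:.1f} million per year")
    print(RULE)
    print(" Year Pop.[Bn]  Fit[Bn]   Error % ")
    print(RULE)
    for x, y, y_fit, e in results:
        err_pct = 100.0 * e / y_fit  # assumed definition of the % error
        print(format_row(x, y / THOUSANDS_PER_BILLION,
                         y_fit / THOUSANDS_PER_BILLION, err_pct))
    print(RULE)


# Toy results row (placeholder values in thousands, not real data).
toy_results = np.array([[2000.0, 2.0e6, 1.9e6, 1.0e5]])
LSprint(toy_results, 8.0e4)
```

Keeping the row formatting in its own small helper makes it easy to compare a single formatted row against the required layout character by character.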

## Task 4: Create a function to visualise results [30%]
This function called **LSplot** takes, as input:
- your data structure
- values of *m* and *c* calculated in **LSfit**

The function creates two subplots:

- subplot a:
    - the population data as red triangles
    - the fitted line shown in green 
    - legend shown
    
- subplot b:
    - the fit residuals versus $x_i$ 
    - as a dashed black line
    - no legend needed

The function then saves the plot in a graphical format suitable for publication.
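The plotting requirements could be met along the following lines using matplotlib. The output filename, the choice of PDF as the publishing format, the vertical arrangement of the subplots, and the axis and legend labels are all assumptions made for this sketch. The data is plotted in whatever units it is passed in.

```python
import matplotlib.pyplot as plt
import numpy as np


def LSplot(data, m, c, filename="LSplot.pdf"):
    """Plot data with the fitted line (a) and the residuals (b), then save.

    Assumes data[:, 0] holds x_i and data[:, 1] holds y_i. The default
    filename is a placeholder; PDF is one vector format often used in print.
    """
    x = data[:, 0]
    y = data[:, 1]
    y_fit = m * x + c

    fig, (ax_a, ax_b) = plt.subplots(2, 1, sharex=True, figsize=(6, 7))

    # Subplot a: data as red triangles, fitted line in green, with a legend.
    ax_a.plot(x, y, "r^", label="Population data")
    ax_a.plot(x, y_fit, "g-", label="Least squares fit")
    ax_a.set_ylabel("Population")
    ax_a.legend()

    # Subplot b: residuals versus x as a dashed black line, no legend.
    ax_b.plot(x, y - y_fit, "k--")
    ax_b.set_xlabel("Year")
    ax_b.set_ylabel("Residual")

    fig.tight_layout()
    fig.savefig(filename)
    return fig
```

Returning the figure is optional, but it allows the caller to display or adjust the plot after it has been saved.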