In [None]:
# In Python it is standard practice to import the modules we need at the very top of our scripts
import numpy as np
import matplotlib.pyplot as plt

# SciPy

SciPy, like NumPy, is one of many modules that can be used to extend Python's functionality. In this exercise we will explore some of the functions available in SciPy to fit functions to data.

SciPy includes functions for integration, interpolation, signal processing, linear algebra and more. Between NumPy, SciPy and Matplotlib you should be able to accomplish any task you need for the P2 Skills course. The SciPy documentation is identical in style to the NumPy documentation so you should already feel comfortable finding your way around and reading documentation for functions.

## Importing SciPy Submodules

You have probably noticed that to call a function from, for example, NumPy we use a dot, e.g. `np.zeros()`, and that sometimes we call a function from a submodule, e.g. `np.random.randint()`. SciPy works the same way but, because the package is so large, its submodules are normally imported individually rather than all at once.

As such, you are unlikely ever to write just `import scipy as sp`; you will more often import specific submodules, as in the cell below where we import just the `stats` submodule.

Having done so, we can access functions in that submodule by name. For example, `stats.sem` calculates the standard error of the mean.
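
As a minimal sketch of this import pattern, the snippet below calls `stats.sem` on a small array and compares it with the same quantity computed by hand. The `measurements` values are made up purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical repeated measurements of the same quantity (illustrative values only)
measurements = np.array([9.8, 10.1, 10.0, 9.9, 10.2])

# stats.sem uses ddof=1 by default: sample standard deviation divided by sqrt(N)
standard_error = stats.sem(measurements)

# The same quantity computed by hand with NumPy, for comparison
manual_standard_error = np.std(measurements, ddof=1) / np.sqrt(len(measurements))

print(standard_error, manual_standard_error)
```

The two printed numbers should agree, which is a quick way to check that you understand what a library function is computing.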

In [None]:
from scipy import stats

# Least Squares Fitting of a Function to Data

An important tool in data analysis is the ability to fit a function to raw data in order to extract parameters of a model; likewise, measuring the 'goodness of fit' of a function can be used to test an underlying hypothesis.

There are many ways to do fitting in Python, from writing your own code to using pre-built routines such as those found in SciPy. In this exercise we will explore how to use linear least-squares regression to fit a linear function, $y = mx + c$, to some example data. We will extract and plot the fit residuals, defined as the difference between the measured data and the values predicted by our fitted function at the same $x$, and measure the goodness of fit using the correlation coefficient. We will also briefly touch on the concept of confidence intervals.
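
In symbols, for $N$ data points $(x_i, y_i)$, the residuals and the quantity that a least-squares fit minimises can be written as follows. The symbols $r_i$ and $S$ are introduced here for illustration only.

$$r_i = y_i - (m x_i + c), \qquad S(m, c) = \sum_{i=1}^{N} r_i^2$$

The best-fit slope and intercept are the values of $m$ and $c$ that make $S$ as small as possible.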

For this we will use the `stats.linregress` function. You should start by reading the SciPy documentation for this function ([here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress)). Other alternatives are available, both in SciPy and in other modules.
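
To give a feel for the interface before you read the documentation, here is a minimal sketch of calling `stats.linregress` on a tiny, made-up dataset (deliberately not the exercise data). The names `x_demo` and `y_demo` and their values are hypothetical; the `slope`, `intercept` and `rvalue` attributes are those listed in the SciPy documentation.

```python
import numpy as np
from scipy import stats

# Hypothetical demonstration data (invented values, roughly linear)
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Fit y = slope * x + intercept by linear least squares
result = stats.linregress(x_demo, y_demo)

# Values predicted by the fitted line at each x, and the fit residuals
y_model = result.slope * x_demo + result.intercept
residuals = y_demo - y_model

print("slope:", result.slope)
print("intercept:", result.intercept)
print("correlation coefficient r:", result.rvalue)
print("residuals:", residuals)
```

For a straight-line fit with an intercept, the residuals sum to zero (up to rounding), which is a useful sanity check on any fit you perform.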

For the purposes of this exercise we will use some simple data that we can be fairly sure fits a linear function:

In [None]:
# Example Data
x = np.arange(10)
m = 5
c = 10
y_jittered = m*x + c + 2*np.random.randn(10)  # we include some random 'jitter' for realism

plt.scatter(x, y_jittered)
plt.xlabel('Independent Variable')
plt.ylabel('Dependent Variable')
plt.title('My Example Data')
plt.show()

# Exercise 6: Fitting a Linear Function (3 Marks)

The SciPy `stats.linregress` function does all the work for us. Read the documentation for this function and do the following steps in the cells below.

1. Determine the linear least-squares regression for this dataset. What values do you expect for the slope and intercept? Print your expected and actual slope and intercept.
2. On the same axes, plot the raw data, the line of best fit and the line corresponding to the theoretical (jitter-free) data. Save this figure as `linear_fitting.png`.
3. Use your search engine of choice, or Wikipedia, to investigate the 'Coefficient of Determination' or 'r-squared' value. In a Markdown cell describe what the r-squared value represents in a linear least-squares regression. In a Python cell, print the r-squared value for this regression.
4. In a new Python cell, recreate this plot for different amounts of jitter. How does r-squared vary with the amount of noise?

In [None]:
# Put your answer to question 1 here



In [None]:
# Put your answer to question 2 here



## What $r^2$ Represents

Write your answer here



In [None]:
# Put your answer to question 3 here



In [None]:
# Put your answer to question 4 here



**Comment on the outcomes of question 4**

