![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)

# Curve Fitting in Python 

Curve fitting is the business of finding the line (or curve) that best passes through a set of data. Once a function has been fitted, we can use it to interpolate and extrapolate values that we may not have measured. Let's take a look at how we can do that in Python. To begin, we import all the functions we shall need.

In [None]:
# Numerical python package to allow us to do math quickly 
import numpy as np
# Plotting library
import matplotlib.pyplot as plt
%matplotlib inline
# SciPy - package which allows us to fit curves to data
from scipy.optimize import curve_fit
# Helper function to make graphs without cluttering up this notebook
from plotter import plot_fit

To begin fitting functions, we first need some data to fit! As this is simply a tutorial, let's generate some data to test our fitting functions. This way we'll _know_ exactly what parameters our curve fitting functions should find. In this case, we define a linear equation of the form

$$ f(x) = m \; x + b $$

in Python below

In [None]:
def linear_function(x, m, b):
    return m * x + b

Fantastic! Now, let's generate some $x$ points for our data. To do this we will use the `np.linspace` function, which creates an equally spaced set of points. We do this below

In [None]:
# Here, we're creating 15 equally spaced numbers from 0 to 10
x_data = np.linspace(0,10,15)
print(x_data)
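
As a quick sanity check, the spacing of the points produced by `np.linspace` can be inspected with `np.diff`. The sketch below uses a separate, illustrative name (`x_check`) so that it does not overwrite `x_data`. With 15 points spanning 0 to 10 there are 14 gaps, so each step should be 10/14.

```python
import numpy as np

# Illustrative name, chosen so the tutorial's x_data is left untouched
x_check = np.linspace(0, 10, 15)

# Differences between neighbouring points; all of them should be equal
steps = np.diff(x_check)

print("Number of points:", x_check.size)
print("First and last point:", x_check[0], x_check[-1])
print("Step size matches 10/14:", np.allclose(steps, 10 / 14))
```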

Now that we have our $x$ values, let's _generate_ a set of $y$ values using our `linear_function` Python function

In [None]:
y_data = linear_function(x=x_data, m=0.5, b=-1)
print(y_data)

We can check that this worked by plotting our data below

In [None]:
plt.scatter(x_data, y_data)
plt.xlabel("$x$", size =15)
plt.ylabel("$y$", size = 15)
plt.show()

That's fantastic! We've generated a set of linear points. However, those points are perfectly in line! We'll certainly never see that in an actual measurement scenario. Let's add some noise to our $y$ points.

In [None]:
# Here we're adding normally distributed noise to our y values
noise_strength = 0.1
y_noise = noise_strength*np.random.normal(size=x_data.size)
y_data = y_data + y_noise
plt.scatter(x_data, y_data)
plt.xlabel("$x$", size =15)
plt.ylabel("$y$", size = 15)
plt.show()
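
Because the noise is random, every run of the cell above gives slightly different points. If repeatable results are wanted, one option (not used in the original cells) is NumPy's `np.random.default_rng` with a fixed seed. The seed value `42` and the `*_seeded` names below are arbitrary choices for illustration.

```python
import numpy as np

# A generator with a fixed seed produces the same "random" numbers each run
rng = np.random.default_rng(42)

x_seeded = np.linspace(0, 10, 15)
y_clean = 0.5 * x_seeded - 1  # same line as above: m = 0.5, b = -1
y_seeded = y_clean + 0.1 * rng.normal(size=x_seeded.size)

# Re-creating the generator with the same seed reproduces the noise exactly
rng_again = np.random.default_rng(42)
y_again = y_clean + 0.1 * rng_again.normal(size=x_seeded.size)

print("Identical noise on both runs:", np.array_equal(y_seeded, y_again))
```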

There we go! Now that's a little more realistic for function fitting. Now that we've generated some data, let's see how we fit a function using Python.

## The `curve_fit` Function
We will be using the `curve_fit` function that we imported earlier. `curve_fit` takes the following inputs:
1. A Python function which describes the function we'd like to fit, in this example `linear_function(x,m,b)`
2. A set of $x$ data points
3. A set of $y$ data points 
    * We need to have the same number of points in $x$ and $y$
    
Once we call this function, `curve_fit` will return two things:
1. `values`: An array of best fit parameters, in our case `[m, b]`
2. `fit_quality`: The estimated covariance matrix of the best fit parameters. Its diagonal entries are the variances of the parameters, so the square roots of the diagonal give us the uncertainty surrounding each best fit parameter

Let's take a look at that function in action


In [None]:
# Here we're fitting linear_function to x_data and y_data
values, fit_quality = curve_fit(linear_function, x_data, y_data)
# Square roots of the covariance diagonal: one standard deviation per parameter
fit_quality = np.sqrt(np.diag(fit_quality))

print("Our slope and intercept are:", values)
print("With uncertainty", fit_quality)
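
The second value returned by `curve_fit` is a full covariance matrix, and the cell above keeps only its diagonal. As a minimal sketch, assuming freshly generated data with an arbitrary seed and illustrative names (`popt`, `pcov`, `x_cov`, `y_cov`), the off-diagonal entry can be turned into a correlation coefficient between the slope and the intercept:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_function(x, m, b):
    return m * x + b

# Illustrative data: same line as the tutorial, seed chosen arbitrarily
rng = np.random.default_rng(0)
x_cov = np.linspace(0, 10, 15)
y_cov = linear_function(x_cov, 0.5, -1) + 0.1 * rng.normal(size=x_cov.size)

popt, pcov = curve_fit(linear_function, x_cov, y_cov)

# Diagonal entries are the parameter variances; their square roots are
# the one-standard-deviation uncertainties
perr = np.sqrt(np.diag(pcov))

# Off-diagonal entry, normalised to a correlation between m and b
correlation = pcov[0, 1] / (perr[0] * perr[1])

print("Covariance matrix:\n", pcov)
print("Uncertainties in m and b:", perr)
print("Correlation between m and b:", correlation)
```

For data whose $x$ values are all positive, as here, the correlation should come out negative: a slightly steeper slope can be partly compensated by a lower intercept.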

That's pretty good! With an original slope of $m = 0.5$, our fitted slope $m^\prime$ came out to be $m^\prime = 0.5 \pm 0.05$, and our original intercept of $b=-1$ was fitted to be $b^\prime = -1.08 \pm 0.09$ (Note: Your values may be different because the noise added is different every time!). Let's take a look at what the plot looks like

In [None]:

plot_fit(linear_function, values, fit_quality, "$f(x) = mx + b$",x_data,y_data)
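
With the fitted parameters in hand, the fitted function can be evaluated at $x$ values that were never measured, which is the interpolation and extrapolation mentioned at the start. A minimal sketch, assuming freshly generated data with an arbitrary seed and illustrative names (`x_fit`, `y_fit`, `x_new`):

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_function(x, m, b):
    return m * x + b

# Illustrative data: same line as the tutorial, seed chosen arbitrarily
rng = np.random.default_rng(7)
x_fit = np.linspace(0, 10, 15)
y_fit = linear_function(x_fit, 0.5, -1) + 0.1 * rng.normal(size=x_fit.size)

popt, _ = curve_fit(linear_function, x_fit, y_fit)

# 2.5 lies inside the measured range (interpolation),
# 12.0 lies outside it (extrapolation)
x_new = np.array([2.5, 12.0])
y_new = linear_function(x_new, *popt)

print("Predicted y at x = 2.5 :", y_new[0])
print("Predicted y at x = 12.0:", y_new[1])
```

The noise-free line gives 0.25 and 5.0 at these points, so the predictions should land close to those values. Extrapolated values generally deserve less trust the further they sit from the measured range.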

And that's all there is to it! Play around with the amount of noise you add to the function before fitting it. How does that affect your estimates for best fit parameters as well as your estimates for uncertainty?
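
One way to explore this question is to repeat the fit for several noise strengths. The sketch below is only one possible setup: the strengths `0.1`, `0.5` and `2.0`, the seed, and the names (`x_sweep`, `unit_noise`, `slope_errors`) are illustrative choices. The noise pattern is drawn once and then scaled, so that only its strength changes between fits.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_function(x, m, b):
    return m * x + b

rng = np.random.default_rng(1)
x_sweep = np.linspace(0, 10, 15)
y_true = linear_function(x_sweep, 0.5, -1)

# Draw the noise pattern once, then scale it, so only the strength changes
unit_noise = rng.normal(size=x_sweep.size)

slope_errors = []
for strength in [0.1, 0.5, 2.0]:
    y_sweep = y_true + strength * unit_noise
    popt, pcov = curve_fit(linear_function, x_sweep, y_sweep)
    perr = np.sqrt(np.diag(pcov))
    slope_errors.append(perr[0])
    print(f"noise={strength}: m = {popt[0]:.3f} +/- {perr[0]:.3f}, "
          f"b = {popt[1]:.3f} +/- {perr[1]:.3f}")
```

The reported uncertainties should grow as the noise strength grows, while the fitted values drift further from $m = 0.5$ and $b = -1$.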

# Fitting Non-Linear Functions

Fitting non-linear functions to data with Python works exactly the same way as fitting linear functions! Let's take a look at how we can fit data to a parabolic function of the form

$$ f(x) = a\;(x + b)^2 + c $$ 

in the cell below. Note we'll move a little faster this time and add our noise in the same cell!

In [None]:
def quadratic(x, a, b, c):
    return a * (x + b) **2 + c

x_quad = np.linspace(-20,20, 40)
y_quad = quadratic(x_quad, .5, 5, 1)

noise_strength = 5
y_noise = noise_strength * np.random.normal(size=x_quad.size)

y_quad = y_quad + y_noise

plt.scatter(x_quad, y_quad)
plt.xlabel("$x$", size =15)
plt.ylabel("$y$", size = 15)
plt.show()


And now we fit it exactly as we have before

In [None]:
values_q, fit_quality_q = curve_fit(quadratic, x_quad, y_quad)
fit_quality_q = np.sqrt(np.diag(fit_quality_q))

print("Values for a, b and c:", values_q)
print("Uncertainty for a, b and c:", fit_quality_q)

In [None]:
plot_fit(quadratic, values_q, fit_quality_q, "$f(x) = a (x + b)^2 + c$", x_quad, y_quad)

How well did that fit work? Do the values we recovered from the curve fitting parameters line up with the parameters we used to generate the data set? 
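
One way to answer these questions numerically is to express the gap between each fitted value and the value used to generate the data in units of the reported uncertainty. The following is a sketch under stated assumptions: the data is regenerated with an arbitrary seed, and the names (`true_params`, `x_chk`, `y_chk`, `n_sigma`) are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def quadratic(x, a, b, c):
    return a * (x + b) ** 2 + c

# Parameters used to generate the data, as in the cell above
true_params = np.array([0.5, 5.0, 1.0])

rng = np.random.default_rng(2)
x_chk = np.linspace(-20, 20, 40)
y_chk = quadratic(x_chk, *true_params) + 5 * rng.normal(size=x_chk.size)

popt, pcov = curve_fit(quadratic, x_chk, y_chk)
perr = np.sqrt(np.diag(pcov))

# Distance between fitted and true values, in units of the uncertainty
n_sigma = np.abs(popt - true_params) / perr

for name, fit, err, ns in zip("abc", popt, perr, n_sigma):
    print(f"{name} = {fit:.3f} +/- {err:.3f} ({ns:.1f} sigma from the true value)")
```

Gaps of one or two standard deviations are typical for a healthy fit; much larger gaps would suggest that the fit did not converge or that the uncertainties are underestimated.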


## Fitting More Non-Linear Functions

We can also fit even _more_ non-linear functions. For example, let's generate and then fit some data described by a normal distribution defined by

$$ f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \; \exp \left({-\frac{(x -\mu)^2}{2 \sigma^2} }\right)$$

In [None]:
def normal_function(x, sigma, mu):
    return 1/(np.sqrt(2.0 * np.pi * sigma **2 ))* np.exp(- (x - mu)**2/(2.0 * sigma**2))

x = np.linspace(-5,5,50)
y = normal_function(x, 1,0)
y_noise =  .01 *np.random.normal(size=x.size)
y = y + y_noise
plt.plot(x,y)

In [None]:
values, fit_quality  = curve_fit(normal_function, x, y)
fit_quality = np.sqrt(np.diag(fit_quality))
print("Sigma and mu values:", values)
print("Uncertainty in sigma and mu:", fit_quality)
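
`curve_fit` starts its search from an initial guess, and when none is given it uses 1 for every parameter. That happens to be close to the true values here, but for other data a poor starting point can make a non-linear fit fail. The optional `p0` argument of `curve_fit` supplies a starting point. In the sketch below, the guesses, the seed, and the names (`x_p0`, `y_p0`, `initial_guess`) are arbitrary values picked for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def normal_function(x, sigma, mu):
    return 1 / np.sqrt(2.0 * np.pi * sigma ** 2) * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

# Illustrative data: sigma = 1 and mu = 0, as in the cell above
rng = np.random.default_rng(3)
x_p0 = np.linspace(-5, 5, 50)
y_p0 = normal_function(x_p0, 1, 0) + 0.01 * rng.normal(size=x_p0.size)

# Starting point for the search, in the same order as the function's
# parameters: [sigma, mu]
initial_guess = [2.0, 0.5]
popt, pcov = curve_fit(normal_function, x_p0, y_p0, p0=initial_guess)
perr = np.sqrt(np.diag(pcov))

# The function depends only on sigma squared, so the sign of the fitted
# sigma carries no meaning; report its absolute value
print("Fitted sigma:", abs(popt[0]), "+/-", perr[0])
print("Fitted mu:   ", popt[1], "+/-", perr[1])
```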

In [None]:
plot_fit(normal_function, values, fit_quality, "Normal Distribution", x, y)

Play around with different amounts of noise and different functions to get a feel for how it works. 

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)