![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&urlpath=notebooks/curriculum-notebooks/Mathematics/CurveFitting/curve-fitting.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>

# Curve Fitting with Python 

Curve fitting means finding the line or curve that best matches a set of data points. Once we have fitted a function to our data, we can use it to [interpolate](https://en.wikipedia.org/wiki/Interpolation) (estimate values between measured points) and [extrapolate](https://en.wikipedia.org/wiki/Extrapolation) (estimate values beyond them).

Let's take a look at how we can do that in Python. First we will import some code libraries.

In [None]:
# Numerical python package to allow us to do math quickly 
import numpy as np
# Plotting library
import matplotlib.pyplot as plt
%matplotlib inline
# SciPy package which allows us to fit curves to data
from scipy.optimize import curve_fit
# A plotting function we'll use later
def plot_fit(func, fit_params, err_params, func_type, x, y):
    f = plt.figure(figsize = (12,8))
    ax = f.add_subplot(111)
    ax.scatter(x,y, label = "data")
    ax.plot(x, func(x, *fit_params), label = "fit")
    plt_string = "Best Fit Parameters:\n "
    for i in range(len(fit_params)):
        plt_string += str(i+1) + ": %+.3f $\\pm$ %.3f \n " % (fit_params[i], err_params[i])
    plt.text(.65, 0.1,plt_string,
         horizontalalignment='left',
         verticalalignment='center',
         transform = ax.transAxes, fontsize = 16)
    ax.set_xlabel("$x$", size = 20)
    ax.set_ylabel("$y$", size =20)
    ax.legend(prop={'size': 20})
    try:
        plt.title(func_type, size = 20)
    except Exception:
        # Skip the title if it cannot be set, but let interrupts through
        pass
    plt.show()
print('Libraries imported and plot_fit defined')

To begin our function fitting we first need to have some data to fit. As this is simply a tutorial, let's just generate some data in order to test our fitting functions. This way we'll _know_ exactly what parameters our curve fitting functions should find. In this case, we'll define a linear equation:

$$ y = m \; x + b $$

Then we generate some $x$ points for our data. To do this we will use the `np.linspace` function, which creates an equally spaced set of numbers. Here we're creating 15 equally spaced numbers from 0 to 10, with both endpoints included.
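
If you'd like to convince yourself that the points really are equally spaced, here is a small sketch. The names `x_demo` and `spacing` are just for this illustration and aren't used elsewhere in the notebook. With 15 points there are 14 gaps, so each gap should be $10/14$.

```python
import numpy as np

# 15 equally spaced points from 0 to 10 (both endpoints are included)
x_demo = np.linspace(0, 10, 15)

# Differences between neighbouring points: 15 points means 14 gaps
spacing = np.diff(x_demo)

print("Number of points:", x_demo.size)
print("First and last point:", x_demo[0], x_demo[-1])
print("Every gap equals 10/14:", np.allclose(spacing, 10 / 14))
```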

In [None]:
def linear_function(x, m, b):
    return m * x + b

x_data = np.linspace(0, 10, 15)
print(x_data)

Now that we have our $x$ values, let's _generate_ a set of $y$ values using our Python function called `linear_function`.

In [None]:
y_data = linear_function(x = x_data, m = 0.5, b = -1)
print(y_data)

We can check that this worked by plotting our data below.

In [None]:
plt.scatter(x_data, y_data)
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("y vs x")
plt.show()

That's fantastic! We've generated a set of linear points. However, those points are perfectly in line! We won't usually see that in an actual measurement scenario. Let's add some noise to our $y$ points. Here we're adding normally distributed noise.
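
Because the noise is random, it will be different every time you run the notebook. If you'd like results you can repeat exactly, one option is a seeded random number generator. This is only a sketch: the seed `42` and the names `rng`, `x_demo`, `y_clean` and `y_noisy` are our own choices for illustration, and the rest of the notebook uses `np.random.normal` directly.

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run
# (the seed value 42 is an arbitrary choice for this illustration)
rng = np.random.default_rng(42)

x_demo = np.linspace(0, 10, 15)
y_clean = 0.5 * x_demo - 1          # the noise-free line y = 0.5 x - 1

noise_strength = 0.1
# Normally distributed noise with mean 0 and standard deviation noise_strength
y_noisy = y_clean + noise_strength * rng.normal(size=x_demo.size)

print("Largest distance from the true line:", np.max(np.abs(y_noisy - y_clean)))
```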

In [None]:
noise_strength = 0.1
y_noise = noise_strength*np.random.normal(size=x_data.size)
y_data = y_data + y_noise
plt.scatter(x_data, y_data)
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("noisy y vs x")
plt.show()

There we go! Now that's a little more realistic for function fitting. Now that we've generated some data, let's see how we fit a function using Python.

### The `curve_fit` Function
We will be using the `curve_fit` function [from SciPy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) that we imported earlier. It takes the following data:
1. A Python function which describes the function we'd like to fit, in this example `linear_function(x,m,b)`
2. A set of $x$ data points
3. A set of $y$ data points 
    * We need to have the same number of data points in $x$ and $y$
    
Once we call this function, `curve_fit` will return two things:
1. `values`: An array of best fit parameters. In our case, it will return an array of `[m, b]`
2. `fit_quality`: The estimated covariance matrix of the best fit parameters. The square roots of its diagonal entries give us the uncertainty (one standard deviation) in each best fit parameter

Let's take a look at that function in action. Here we're fitting `linear_function` to `x_data` and `y_data`.

In [None]:
values, fit_quality = curve_fit(linear_function, x_data, y_data)
# Turn the covariance matrix into one-standard-deviation uncertainties
fit_quality = np.sqrt(np.diag(fit_quality))

print("Slope:", values[0], "with uncertainty:", fit_quality[0])
print("y-intercept:", values[1], "with uncertainty:", fit_quality[1])

That's pretty good!

With an original slope of $m = 0.5$, our fitted slope came out to be $m^\prime = 0.5 \pm 0.05$. Our original intercept of $b=-1$ was fitted to be $b^\prime = -1.03 \pm 0.06$. (Note: your values may be different because the noise added is different every time.)
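
To see where those uncertainties come from, it can help to look at the second return value of `curve_fit` before taking the square root. The sketch below uses its own seeded data, and the names `popt`, `pcov` and `std_errors` are our choices, so the numbers will not match the ones above.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_function(x, m, b):
    return m * x + b

# Seeded noise so this sketch gives the same numbers each run (seed is arbitrary)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 15)
y_demo = linear_function(x_demo, 0.5, -1) + 0.1 * rng.normal(size=x_demo.size)

popt, pcov = curve_fit(linear_function, x_demo, y_demo)

# pcov is a 2 x 2 matrix because we fitted two parameters.
# Its diagonal holds the variances of m and b; the square root of a
# variance is a standard deviation, which we report as the uncertainty.
std_errors = np.sqrt(np.diag(pcov))

print("Covariance matrix:\n", pcov)
print("Uncertainty in m and b:", std_errors)
# The off-diagonal entry describes how errors in m and b move together
print("Covariance between m and b:", pcov[0, 1])
```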

Let's take a look at what the plot looks like, using the `plot_fit` function we defined earlier.

In [None]:
plot_fit(linear_function, values, fit_quality, "$y = mx + b$", x_data, y_data)

And that's all there is to it! Play around with the amount of noise you add to the function before fitting it. How does that affect your estimates for best fit parameters as well as your estimates for uncertainty?
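
If you'd like a starting point for that experiment, here is one possible sketch. It repeats the fit for a few noise strengths and prints the results side by side. The list of noise strengths, the seed and the variable names are assumptions made for this illustration. You should see the uncertainties grow as the noise grows.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_function(x, m, b):
    return m * x + b

rng = np.random.default_rng(1)      # arbitrary seed, for repeatable output
x_demo = np.linspace(0, 10, 15)
y_clean = linear_function(x_demo, 0.5, -1)

slope_errors = []
for noise_strength in [0.1, 1.0, 10.0]:
    # Fresh noise at this strength, then fit exactly as before
    y_noisy = y_clean + noise_strength * rng.normal(size=x_demo.size)
    popt, pcov = curve_fit(linear_function, x_demo, y_noisy)
    perr = np.sqrt(np.diag(pcov))
    slope_errors.append(perr[0])
    print(f"noise = {noise_strength:5.1f}   "
          f"m = {popt[0]:+.3f} +/- {perr[0]:.3f}   "
          f"b = {popt[1]:+.3f} +/- {perr[1]:.3f}")
```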

# Fitting Non-Linear Functions

Fitting non-linear functions is similar. Let's take a look at how we can fit data to a parabolic function of the form 

$$ y = a\;(x + b)^2 + c $$ 

Note that we'll move a little faster this time and add our noise in the same cell where we generate the data!

In [None]:
def quadratic(x, a, b, c):
    return a * (x + b)**2 + c

x_quad = np.linspace(-20, 20, 40)
y_quad = quadratic(x_quad, .5, 5, 1)

noise_strength = 5
y_noise = noise_strength * np.random.normal(size=x_quad.size)

y_quad = y_quad + y_noise

plt.scatter(x_quad, y_quad)
plt.xlabel("$x$", size = 15)
plt.ylabel("$y$", size = 15)
plt.show()

Now we fit it just like we did before.

In [None]:
values_q, fit_quality_q = curve_fit(quadratic, x_quad, y_quad)
fit_quality_q = np.sqrt(np.diag(fit_quality_q))

print("Values for a, b and c:", values_q)
print("Uncertainty for a, b and c:", fit_quality_q)

In [None]:
plot_fit(quadratic, values_q, fit_quality_q, "$y = a (x + b)^2 + c$", x_quad, y_quad)

How well did that fit work? Do the values we recovered from the curve fitting parameters line up with the parameters we used to generate the data set? 
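
One way to answer that question with numbers is to measure how far each fitted parameter is from the true one, in units of its own uncertainty. The sketch below does this on its own seeded data set, so its output will differ from your cells above. The names `true_params` and `z_scores` are ours. As a rough guide, a difference of less than about two or three uncertainties is what we'd expect from noise alone.

```python
import numpy as np
from scipy.optimize import curve_fit

def quadratic(x, a, b, c):
    return a * (x + b)**2 + c

# The parameters used to generate the data: a, b, c
true_params = np.array([0.5, 5.0, 1.0])

rng = np.random.default_rng(2)      # arbitrary seed, for repeatable output
x_demo = np.linspace(-20, 20, 40)
y_demo = quadratic(x_demo, *true_params) + 5 * rng.normal(size=x_demo.size)

popt, pcov = curve_fit(quadratic, x_demo, y_demo)
perr = np.sqrt(np.diag(pcov))

# Difference between fitted and true values, measured in uncertainties
z_scores = (popt - true_params) / perr

for name, fit, err, z in zip("abc", popt, perr, z_scores):
    print(f"{name}: fitted {fit:+.3f} +/- {err:.3f}, "
          f"{abs(z):.1f} uncertainties from the true value")
```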


## Fitting More Non-Linear Functions

We can also fit even _more_ non-linear functions. For example, let's generate some data that follows the curve of a normal distribution, and then fit it. The curve is defined by

$$ y = \frac{1}{\sqrt{2 \pi \sigma^2}} \; \exp \left({-\frac{(x -\mu)^2}{2 \sigma^2} }\right)$$
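
A function like this can be harder to fit than a line, because `curve_fit` has to search for the best parameters starting from a guess. If we don't supply one, it starts every parameter at 1. That works well when the true values are nearby, but a peak far from the starting point can be missed. The sketch below shows one possible way to pass a starting guess through the `p0` argument. The peak at $\mu = 3$ with $\sigma = 0.5$, the seed, and the idea of reading the guesses off the data are all assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def normal_function(x, sigma, mu):
    return 1/(np.sqrt(2.0*np.pi*sigma**2))*np.exp(-(x-mu)**2/(2.0*sigma**2))

rng = np.random.default_rng(3)      # arbitrary seed, for repeatable output
x_demo = np.linspace(-5, 5, 50)
# Illustration data: a narrow peak centred away from the default guess
y_demo = normal_function(x_demo, 0.5, 3.0) + 0.01 * rng.normal(size=x_demo.size)

# Rough starting guesses read off the data:
# the peak sits near mu, and its height is 1 / sqrt(2 pi sigma^2)
mu_guess = x_demo[np.argmax(y_demo)]
sigma_guess = 1.0 / (np.sqrt(2.0 * np.pi) * np.max(y_demo))

# p0 follows the order of the function's parameters: sigma, then mu
popt, pcov = curve_fit(normal_function, x_demo, y_demo,
                       p0=[sigma_guess, mu_guess])
perr = np.sqrt(np.diag(pcov))

print("Starting guess for sigma and mu:", sigma_guess, mu_guess)
print("Fitted sigma and mu:", popt)
print("Uncertainty in sigma and mu:", perr)
```

Note that the formula only depends on $\sigma^2$, so a fit can return a negative `sigma` that describes exactly the same curve.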

In [None]:
def normal_function(x, sigma, mu):
    return 1/(np.sqrt(2.0*np.pi*sigma**2))*np.exp(-(x-mu)**2/(2.0*sigma**2))

x = np.linspace(-5, 5, 50)
y = normal_function(x, 1, 0)
y_noise = .01*np.random.normal(size=x.size)
y = y + y_noise
plt.plot(x, y)
plt.xlabel("$x$", size = 15)
plt.ylabel("$y$", size = 15)
plt.show()

In [None]:
values, fit_quality = curve_fit(normal_function, x, y)
fit_quality = np.sqrt(np.diag(fit_quality))
print("Sigma and mu values:", values)
print("Uncertainty in sigma and mu:", fit_quality)
plot_fit(normal_function, values, fit_quality, "Normal Distribution", x, y)

# Conclusion

This notebook provided an introduction to curve fitting in Python using the `SciPy` library, and visualizing the curve fits using `matplotlib`.

Next you can try [curve fitting with real-world data](./curve-fitting-data.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)