## Fitting and Plotting a Curve in Python

If we just want a simple linear fit, we can use a spreadsheet.  Sometimes, though, we might want to fit a curve to data that is not governed by a linear, polynomial, or exponential equation.

<br> This is the case with the viscosity-temperature relationship, which is often expressed as an Arrhenius equation.  This equation has the form:

$$\mu = \mu_0 e^{B/T} $$

<br> where $B$ is a constant related to intermolecular energies and $\mu_0$ is the limiting viscosity that the curve approaches as $T$ becomes very large. The temperature $T$ must be in Kelvin.
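
<br> As a quick sanity check of the formula, here is a minimal sketch that evaluates it at two temperatures. The constants `mu_0 = 0.01` and `B = 1300` are illustrative guesses only (not fitted values), and `celsius_to_kelvin` and `arrhenius_point` are hypothetical helper names used just for this example:

```python
import numpy as np

def celsius_to_kelvin(T_celsius):
    # The Arrhenius equation needs absolute temperature
    return T_celsius + 273.15

def arrhenius_point(mu_0, B, T_celsius):
    # mu = mu_0 * exp(B / T), with T in Kelvin
    return mu_0 * np.exp(B / celsius_to_kelvin(T_celsius))

# Illustrative constants only (assumed, not fitted)
mu_0, B = 0.01, 1300

mu_cold = arrhenius_point(mu_0, B, 20.0)
mu_hot = arrhenius_point(mu_0, B, 80.0)
print(mu_cold, mu_hot)   # viscosity drops as temperature rises
```

Skipping the Kelvin conversion would divide $B$ by 20 instead of 293.15 and give a wildly different number, which is why the units matter here.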

<br> In this notebook, we'll use this equation to fit a curve to data for the viscosity of water.

### Organizing the Data

Move the data from Moodle to a place on your Google Drive.  We'll then "mount" that drive in Colab so that the notebook can read the data.  A pop-up window will ask for permission to access your Drive; click "accept" in that window:

In [None]:
from google.colab import drive
drive.mount('/gdrive')



Change the `home` variable to the location of the data file on your Google Drive.  If you changed the name of the file, you'll need to change `filename` as well:

In [None]:
import pandas as pd
import numpy as np

home = '/gdrive/MyDrive/Colab Notebooks/ENGR_290/Curve_fitting/'
filename = 'viscosity_test_data'
file = home + filename + ".xlsx"
vis_data = pd.read_excel(file)
vis_data

The data is stored as a DataFrame called `vis_data`.  For ease of use, we'll change the titles of the columns, and put the data for water into a Series.

In [None]:
vis_data.columns = ['water_temp','water_vis','ethanol_temp','ethanol_vis',
                    'diethyl_temp','diethyl_vis']
water = pd.Series(data=vis_data.water_vis.values,index=vis_data.water_temp.values)
water.index

### Defining the Arrhenius curve

Here's a function that computes the points of an Arrhenius curve, which we can then plot.  `T_array` will be an array of our x-axis values (temperatures in °C).

<br> There are four possible lines to calculate the equation, and four possible return lines.  Choose the correct one of each, and remove the other three (either delete them or comment them out):

In [None]:
def arrhenius(mu_0,B,T_array):
    # Choose the best way to calculate the viscosity value
    mu = mu_0 * np.exp(B/(T_array+273.15))

    # Choose the appropriate return line
    return pd.Series(index = T_array,data=mu)


### Using `leastsq` to find the best fit

The shape of our curve is determined by the values of $B$ and $\mu_0$.  We want to find the values of these parameters that best fit the data, and we'll use the least squares method to do that.  Python, fortunately, can save us a lot of trouble.

<br> Remember that to use least squares, we need to define an error function: a function that finds the difference between the curve and the data at each point (this is written as $y_i - y_c$ in our class notes).
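
<br> Written out for our model (a sketch in the notation above, where $y_i$ is the measured viscosity at temperature $T_i$, the curve value there is $y_c = \mu_0 e^{B/T_i}$, and $N$ is assumed to be the number of data points), least squares looks for the $\mu_0$ and $B$ that minimize the sum of the squared errors:

$$S(\mu_0, B) = \sum_{i=1}^{N} \left( y_i - \mu_0 e^{B/T_i} \right)^2$$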

In [None]:
def error_func(params, data):
    arrh = arrhenius(params[0],params[1],data.index)
    errors = arrh - data  # y_c - y_i; the sign does not matter once the errors are squared
    return errors


We can run this with some estimated parameters:

In [None]:
params = [0.01,1300]
arrh = arrhenius(params[0],params[1],water.index)
#water.plot(ylabel = 'viscosity (cP)', xlabel = 'Temperature (C)',
#           title = 'Viscosity of Water', style = 'o', label='water',legend=True);
arrh.plot()

error_func(params,water)

First, look at this plot and convince yourself that the list of errors matches the plot.  If that seems in order, run `leastsq`, which is a SciPy algorithm that will minimize the sum of the squares of the errors returned by the function we just wrote:

In [None]:
import scipy.optimize as spo
# args is passed on to error_func; the second return value is an integer status flag
best_params, fit_details = spo.leastsq(error_func, params, args=(water,))
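
To see what `leastsq` does without needing the data file, here is a minimal, self-contained sketch. It assumes synthetic, noise-free "data" generated from made-up parameters (`true_params`), so we know what answer to expect; `arrhenius_values` and `residuals` are hypothetical names that mirror the functions above but work on plain NumPy arrays:

```python
import numpy as np
from scipy.optimize import leastsq

def arrhenius_values(mu_0, B, T_celsius):
    # Arrhenius viscosity with temperature converted to Kelvin
    return mu_0 * np.exp(B / (T_celsius + 273.15))

def residuals(params, T_celsius, mu_measured):
    # Difference between the model curve and the data at each point
    return arrhenius_values(params[0], params[1], T_celsius) - mu_measured

# Synthetic, noise-free "data" generated from known (made-up) parameters
true_params = np.array([0.012, 1250.0])
T_demo = np.linspace(0.0, 100.0, 11)
mu_demo = arrhenius_values(true_params[0], true_params[1], T_demo)

# Start from a rough guess and let leastsq refine it
guess = [0.01, 1300.0]
fitted, status = leastsq(residuals, guess, args=(T_demo, mu_demo))

print(fitted)   # should land close to true_params
print(status)   # 1, 2, 3 or 4 means a solution was found
```

With real measurements the fitted curve will not pass exactly through every point, so the recovered parameters are only the best compromise, not the "true" values.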


Now we'll plot the fitted curve with the known data and the error at each known point:

In [None]:
arrh_best = arrhenius(best_params[0],best_params[1],water.index)
water.plot(ylabel = 'viscosity (cP)', xlabel = 'Temperature (C)',
           title = 'Viscosity of Water', style = 'o', label='Data',legend=True);
arrh_best.plot(label='Fitted Arrhenius', legend =True)
error_func(best_params,water)


Easy-peasy, mac-and-cheesy!  We just found the parameters that give us the best fitted curve.  

✅ ✅ Create a cell below this one to print out these "best_params".

### Finding the standard error of the fit

Now you can do a little work 😀  Look at the equation for the standard error of the fit in the class notes, and calculate that for our model.  Follow each step, and print out your answer for each step to make sure it makes sense.  If you forgot how to do these things, look them up! (e.g., Google "How to square each element of an array python")

In [None]:
# Make an array of y_i - y_c for each known value
error_array = water - arrh_best


# Now square each point in the error_array
sqerr_array = error_array**2

# Find the sum of the squared errors
sum_err = sum(sqerr_array)

# Find nu (assume that the order of the fit m = 2)
nu = len(error_array)-(2+1)

# Divide sum_err by nu and take the square root
s_yx = np.sqrt(sum_err/nu)
s_yx
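
Since we'll want this number again for other data sets, the steps above can be bundled into one function. This is only a sketch: `standard_error_of_fit` is a hypothetical helper name, the default `m = 2` follows the assumption made in the cell above, and the numbers fed to it are made up purely to exercise the function:

```python
import numpy as np

def standard_error_of_fit(y_data, y_curve, m=2):
    # Residual at each known point: y_i - y_c
    residual = np.asarray(y_data) - np.asarray(y_curve)
    # Degrees of freedom: number of points minus (m + 1)
    nu = residual.size - (m + 1)
    # Square, sum, divide by nu, and take the square root
    return np.sqrt(np.sum(residual**2) / nu)

# Made-up numbers, only to exercise the function
y_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_curve = [1.1, 1.9, 3.2, 3.8, 5.0]
print(standard_error_of_fit(y_data, y_curve))
```

In the notebook, the equivalent call would pass the data and fitted curve values (for example `water.values` and `arrh_best.values`).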


### Plotting with error bars

The standard error of the fit does *not* tell us the error in each data point.  Instead, it tells us the potential error in the fit: how far off the fit might be.   So we want to draw our error bars not from the data points, but from the curve itself.

<br> We can do this using standard error bars.  Notice that we've only added one keyword argument to this plot command:

In [None]:
water.plot(ylabel = 'viscosity (cP)', xlabel = 'Temperature (C)',
           title = 'Viscosity of Water', style = 'o', label='Data',legend=True);
arrh_best.plot(label='Fitted Arrhenius', legend =True, yerr = s_yx);

But since the error is in the curve, it actually makes more sense to show bounds for the entire curve.  We can do that using Matplotlib, which is a powerful tool for plotting in Python (and in fact is the library on which our Series and DataFrame `plot()` functions are built).

<br> `fill_between` defines an upper and lower bound for the curve, and shades in the region.  `alpha` defines the transparency of the shaded region.

In [None]:
import matplotlib.pyplot as plt
water.plot(ylabel = 'viscosity (cP)', xlabel = 'Temperature (C)',
           title = 'Viscosity of Water', style = 'o', label='Data',legend=True);
arrh_best.plot(label='Fitted Arrhenius', legend =True);
plt.fill_between(arrh_best.index, arrh_best.values - s_yx, arrh_best.values + s_yx,
                 color = 'gray', alpha = 0.3);

So now that you have a model for this type of plot, you can use it to display your own viscosity data.  Yippee!

### Exercise

✅ ✅ Create a plot showing the error bounds for a fitted Arrhenius curve for the ethanol data.   You will not need to do a lot of coding to do this: recognize what tools we've already created and use those tools!