SciPy is the first place to look for general-purpose scientific functionality.  The SciPy library is enormous and varied, so covering all of its features would be a course unto itself.  That said, some of the most commonly used features in SciPy are its fitting routines, one of which is demonstrated below.

## Curve fitting and optimization  

SciPy comes with a number of fitting routines.  One that can be extremely useful is `curve_fit`, which fits a function to a given set of data using least-squares minimization.  Here, we'll fit some (made-up) data to the Michaelis-Menten equation:  

$$V = \frac{V_{max}[S]}{K_m+[S]}$$  

First, we define the Michaelis-Menten equation:

In [None]:
def michaelis_menton(s, km, vmax):
    return (vmax*s) / (km + s)

Next, we choose the substrate concentrations at which we would take our measurements:

In [None]:
import numpy as np

In [None]:
substrate_concentrations = np.array([0.01, 0.1, 0.2, 0.5, 0.8, 1.5, 3])

For convenience when plotting, we'll create a numpy array of 1000 values between 0 and 3 for our x-axis:

In [None]:
substrate_concentration_range = np.linspace(0, 3, 1000)

OK, let's make sure this looks right:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
km = 0.1
vmax = 0.5

initial_velocities = michaelis_menton(s=substrate_concentrations, km=km, vmax=vmax)
mm_curve = michaelis_menton(substrate_concentration_range, km, vmax)


# Plot all the things!
plt.scatter(substrate_concentrations, initial_velocities)
plt.plot(substrate_concentration_range, mm_curve)

plt.ylabel('Velocity')
plt.xlabel('initial [S]')

# In M-M kinetics, the Km is the substrate concentration where you've reached half-max rate
plt.vlines(km, ymin=0, ymax=vmax/2, linestyle='dashed')
plt.hlines(vmax/2, xmin=0, xmax=km, linestyle='dashed')

plt.xlim(xmin=0, xmax=3)
plt.ylim(ymin=0, ymax=0.6)
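
The dashed lines rely on the half-maximum property of the equation: substituting $[S] = K_m$ gives $V = V_{max}/2$. A quick numerical check, as a minimal sketch that redefines the function so the snippet runs on its own (the name `half_max` is introduced here for illustration only):

```python
import numpy as np


def michaelis_menton(s, km, vmax):
    return (vmax * s) / (km + s)


km, vmax = 0.1, 0.5

# Evaluate the rate at a substrate concentration equal to Km
half_max = michaelis_menton(km, km, vmax)

# The rate at [S] = Km should be half of Vmax
assert np.isclose(half_max, vmax / 2)
```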

OK, so let's now simulate some noisy data (with normally distributed noise):

In [None]:
number_of_concentrations = len(substrate_concentrations)
ten_percent_noise = (np.random.normal(scale=0.1, size=number_of_concentrations)) + 1

simulated_data = michaelis_menton(substrate_concentrations, km, vmax) * ten_percent_noise
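
Because the random generator above is not seeded, the simulated data will differ on every run. If reproducible noise is wanted, one option is a seeded NumPy `Generator`. The sketch below assumes an arbitrary seed of 42; the names `rng`, `noise_a` and `noise_b` are illustrative only. Drawing with `loc=1.0` is equivalent to adding 1 to zero-centred noise.

```python
import numpy as np

substrate_concentrations = np.array([0.01, 0.1, 0.2, 0.5, 0.8, 1.5, 3])
n = len(substrate_concentrations)

# A seeded generator produces the same sequence every time (seed value is arbitrary)
rng = np.random.default_rng(seed=42)
noise_a = rng.normal(loc=1.0, scale=0.1, size=n)

# Re-creating the generator with the same seed reproduces the same noise factors
rng = np.random.default_rng(seed=42)
noise_b = rng.normal(loc=1.0, scale=0.1, size=n)

assert np.array_equal(noise_a, noise_b)
```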

OK, so how does our simulation look?

In [None]:
plt.plot(substrate_concentration_range, mm_curve, label='ground truth')
plt.scatter(substrate_concentrations, simulated_data, label='simulated data', color='green')
plt.legend()

Now we can try to fit our original equation.  In order to do this, we need to provide a guess for the parameters that the algorithm can start from.  For something as simple as the M-M equation, even quite bad guesses will do.

In [None]:
initial_guess = (100, simulated_data[-1])  # Km and Vmax.  Note the Km is a truly horrible guess, given our data

In addition, we can provide the algorithm with bounds: ranges of allowed values for each parameter.  In this case it's not necessary, but it's always good to have a sanity check (here, neither parameter can be negative).

In [None]:
lower_bounds = (0, 0)  # Km, Vmax
upper_bounds = (np.inf, np.inf)  # Km, Vmax
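
To see what bounds actually do, here is a minimal sketch that fits noise-free data with a deliberately too-tight upper bound on Vmax. The bound of 0.4, the starting guess and the names `tight_lower`, `tight_upper` and `constrained` are artificial choices for illustration, not part of the analysis in this notebook. When bounds are supplied, `curve_fit` uses a trust-region method (`'trf'`) by default instead of Levenberg-Marquardt.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menton(s, km, vmax):
    return (vmax * s) / (km + s)


s = np.array([0.01, 0.1, 0.2, 0.5, 0.8, 1.5, 3])
v = michaelis_menton(s, km=0.1, vmax=0.5)  # noise-free data, true Vmax = 0.5

# Artificially tight bounds: Vmax is not allowed to exceed 0.4
tight_lower = (0, 0)         # Km, Vmax
tight_upper = (np.inf, 0.4)  # Km, Vmax

constrained, _ = curve_fit(michaelis_menton, s, v,
                           p0=(1.0, 0.3),  # a starting guess inside the bounds
                           bounds=(tight_lower, tight_upper))

print('Km:', constrained[0])
print('Vmax:', constrained[1])  # stays at or below the 0.4 upper bound
```

The fitted Vmax is held at or below the bound even though the data were generated with a larger value, which is why bounds should encode only what you genuinely know about the parameters.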

In [None]:
from scipy.optimize import curve_fit

initial_guess = (100, simulated_data[-1])  # Km and Vmax.  Note the Km is a terrible guess, given our data

fitted, covariance = curve_fit(f=michaelis_menton,
                               xdata=substrate_concentrations,
                               ydata=simulated_data,
                               p0=initial_guess,
                               bounds=(lower_bounds, upper_bounds)
                              )
print('Km:', fitted[0])
print('Vmax:', fitted[1])

Remember, always look at your data as much as possible!

In [None]:
calculated_curve = michaelis_menton(substrate_concentration_range, fitted[0], fitted[1])

plt.plot(substrate_concentration_range, mm_curve, label='ground truth')
plt.scatter(substrate_concentrations, simulated_data, label='simulated data', color='green')
plt.plot(substrate_concentration_range, calculated_curve, label='fitted')

plt.legend()

### Exercise 1:  Compute the error
Compute the error of the fit.  
*Hint: Read the docstring of `curve_fit` by running `curve_fit?`, and look at what the function returns*

__Bonus:__
Remake the plot, but shade the error bounds.  
1. The maximum positive error is when you subtract the standard deviation from the Km and add it to the Vmax
2. Calculate the curve for the maximum and minimum errors as we did above
3. Ask Google how to fill between curves in matplotlib
4. Try setting `alpha=0.2` in the plotting function.