Non-Linear Curve Fitting, Part 1
=========================

<div class="overview-this-is-a-title overview">
<p class="overview-title">Overview</p>
<p>Questions:</p>
    <ul>
        <li>How can I analyze enzyme kinetics data in Python?</li>
        <li>What is the process for non-linear least squares curve fitting in Python?</li>
    </ul>
<p>Objectives:</p>
    <ul>
        <li> Create a pandas dataframe with enzyme kinetics data from a .csv file</li>
        <li> Add velocity calculations to the dataframe</li>
        <li> Perform the non-linear regression calculations</li>
    </ul>
</div>

In [35]:
# import the libraries we need
import os # to build the path to the .csv file
import pandas as pd # for importing the .csv file and creating a dataframe
import numpy as np # for calculations and datatyping
from scipy import stats # for linear regression (linregress) on the kinetic traces

In [2]:
pwd

'/Users/pac8612/Desktop/python-scripting-biochemistry/biochemist-python/chapters'

In [4]:
cd ../..  # Move to the standard python-scripting-biochemistry directory as starting point

/Users/pac8612/Desktop/python-scripting-biochemistry


In [6]:
datafile = os.path.join('biochemist-python', 'chapters', 'data', 'AP_kinetics.csv') # relative path to the data file
print(datafile)  # path confirmed

biochemist-python/chapters/data/AP_kinetics.csv


In [9]:
AP_kinetics_df = pd.read_csv(datafile)  # Use pandas to create a dataframe of the alkaline phosphatase kinetics data
AP_kinetics_df  # dataframe confirmed

Unnamed: 0,pNPP (mM),0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,...,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5
0,20.0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,...,0.813158,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469
1,10.0,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,...,0.794725,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431
2,7.0,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,...,0.745633,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987
3,4.0,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,...,0.673469,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551
4,2.0,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,...,0.551897,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483
5,1.0,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,...,0.447237,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895
6,0.7,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,...,0.371766,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813
7,0.4,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,...,0.261462,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538
8,0.2,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,...,0.1575,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727
9,0.2,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,...,0.156,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909


### Datatype
Now that we have imported our data, we need to check the datatypes of the numbers. They must be floats, rather than strings, so that we can do calculations on them.

Notice that the `df.dtypes` command lists the datatype of each column. The final line, `dtype: object`, describes the Series that `dtypes` returns (a collection of dtype objects), not the data stored in the dataframe.

In [12]:
AP_kinetics_df.dtypes # checking to see if the numbers are strings or floats

pNPP (mM)    float64
0.25         float64
0.5          float64
0.75         float64
1            float64
1.25         float64
1.5          float64
1.75         float64
2            float64
2.25         float64
2.5          float64
2.75         float64
3            float64
3.25         float64
3.5          float64
3.75         float64
4            float64
4.25         float64
4.5          float64
4.75         float64
5            float64
dtype: object
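
Every column here is already `float64`, so no conversion is needed. If a column had been read in as text instead, it could be coerced to numbers along the following lines. This is a minimal sketch on a small made-up dataframe (the values and the `demo_df` and `numeric_df` names are illustrative assumptions, not part of the AP kinetics file):

```python
import pandas as pd

# Hypothetical example data: one absorbance reading was entered as text
demo_df = pd.DataFrame({
    'pNPP (mM)': [20.0, 10.0, 7.0],
    '0.25': ['0.072', '0.067', 'n/a'],  # text, so pandas does not treat this column as numeric
})
print(demo_df.dtypes)

# Convert every column to numbers; anything that cannot be parsed becomes NaN
numeric_df = demo_df.apply(pd.to_numeric, errors='coerce')
print(numeric_df.dtypes)
```

With `errors='coerce'`, unparseable entries turn into `NaN` rather than raising an error, so it is worth checking for missing values afterwards.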

In [15]:
display(list(AP_kinetics_df.columns.values)) 

['pNPP (mM)',
 '0.25',
 '0.5',
 '0.75',
 '1',
 '1.25',
 '1.5',
 '1.75',
 '2',
 '2.25',
 '2.5',
 '2.75',
 '3',
 '3.25',
 '3.5',
 '3.75',
 '4',
 '4.25',
 '4.5',
 '4.75',
 '5']

### Calculating initial velocities

The first column in our dataframe is the pNPP concentration in mM ('pNPP (mM)'). The other column headers are the times in minutes for the kinetic data. Notice that these headers are stored as strings. To calculate initial velocities, they need to be converted to floats.

We need to set up the column headers as our x values. For the y values, we need to skip the first value ('pNPP (mM)') and then use the remaining values (A-405 as a function of time) to calculate slopes and get our initial velocities.

In [16]:
display(list(AP_kinetics_df.columns.values[1:])) 

['0.25',
 '0.5',
 '0.75',
 '1',
 '1.25',
 '1.5',
 '1.75',
 '2',
 '2.25',
 '2.5',
 '2.75',
 '3',
 '3.25',
 '3.5',
 '3.75',
 '4',
 '4.25',
 '4.5',
 '4.75',
 '5']

In [17]:
AP_kinetics_df.columns.values[1:].astype('float64')

array([0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75,
       3.  , 3.25, 3.5 , 3.75, 4.  , 4.25, 4.5 , 4.75, 5.  ])

In [96]:
xdata = AP_kinetics_df.columns.values[1:].astype('float64')
print(xdata)

[0.25 0.5  0.75 1.   1.25 1.5  1.75 2.   2.25 2.5  2.75 3.   3.25 3.5
 3.75 4.   4.25 4.5  4.75 5.  ]


In [144]:
AP_kinetics_df.drop(columns = 'pNPP (mM)', inplace=True)
AP_kinetics_df

Unnamed: 0,0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,2.5,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5
0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,0.753589,0.813158,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469
1,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,0.674312,0.794725,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431
2,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,0.64462,0.745633,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987
3,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,0.612245,0.673469,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551
4,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,0.506897,0.551897,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483
5,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,0.394737,0.447237,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895
6,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,0.318281,0.371766,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813
7,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,0.230769,0.261462,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538
8,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,0.135,0.1575,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727
9,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,0.137727,0.156,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909


In [146]:
AP_kinetics_df.iloc[:,:] # iloc[:,:] selects every row and column; narrowing the row index extracts the A-405 values for one concentration

Unnamed: 0,0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,2.5,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5
0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,0.753589,0.813158,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469
1,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,0.674312,0.794725,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431
2,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,0.64462,0.745633,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987
3,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,0.612245,0.673469,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551
4,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,0.506897,0.551897,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483
5,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,0.394737,0.447237,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895
6,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,0.318281,0.371766,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813
7,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,0.230769,0.261462,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538
8,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,0.135,0.1575,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727
9,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,0.137727,0.156,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909


In [150]:
ydata = AP_kinetics_df.iloc[0]
print(ydata)

0.25    0.071770
0.5     0.147847
0.75    0.206699
1       0.284211
1.25    0.373206
1.5     0.413397
1.75    0.502392
2       0.585646
2.25    0.620096
2.5     0.753589
2.75    0.813158
3       0.818182
3.25    0.895694
3.5     0.954545
3.75    1.087321
4       1.159809
4.25    1.171292
4.5     1.227273
4.75    1.377273
5       1.478469
Name: 0, dtype: float64


We could do this one row at a time, but the best approach is to add a column to the dataframe that gives the slope of the line. We'll create that column first. Then we'll use the extinction coefficient for p-nitrophenol to create a second column where the initial velocity is given in mM/min.

slope, intercept, rvalue, pvalue, stderr = stats.linregress(xdata,ydata)
print('Slope', slope)

I only want the slope from the `linregress` function, not the other values it returns. Rather than writing out the code for least-squares linear regression myself, I'm going to introduce a small function that calls `linregress` but returns only the slope.

In [151]:
def slope_only(xdata, ydata):
    """Return only the slope from a least-squares linear regression."""
    slope, intercept, rvalue, pvalue, stderr = stats.linregress(xdata, ydata)
    return slope

slope = slope_only(xdata, ydata)  # slope of A-405 vs. time for the first row
print(slope)
AP_kinetics_df  # the dataframe itself is unchanged

0.284376731281203


Unnamed: 0,0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,2.5,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5
0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,0.753589,0.813158,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469
1,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,0.674312,0.794725,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431
2,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,0.64462,0.745633,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987
3,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,0.612245,0.673469,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551
4,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,0.506897,0.551897,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483
5,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,0.394737,0.447237,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895
6,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,0.318281,0.371766,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813
7,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,0.230769,0.261462,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538
8,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,0.135,0.1575,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727
9,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,0.137727,0.156,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909
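
As an alternative to unpacking all five return values, the object returned by `linregress` also exposes its fields by name, so the slope can be read directly. A minimal sketch on made-up points that lie exactly on a line (the `demo_x`, `demo_y`, and `demo_slope` names and values are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Made-up time points (min) and readings lying exactly on y = 0.3x + 0.01
demo_x = np.array([0.25, 0.5, 0.75, 1.0])
demo_y = 0.3 * demo_x + 0.01

result = stats.linregress(demo_x, demo_y)  # result object with named fields
demo_slope = result.slope                  # read the slope by name
print(demo_slope)
```

Either approach gives the same number; the helper function simply makes the intent explicit when it is reused for every row.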


In [153]:
AP_kinetics_df['new'] = slope_only(xdata, ydata)  # trial column: the single slope value is repeated in every row
AP_kinetics_df.drop(columns = 'new', inplace=True)  # remove the trial column again
AP_kinetics_df  # Now I know how to add a column to the dataframe

Unnamed: 0,0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,2.5,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5
0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,0.753589,0.813158,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469
1,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,0.674312,0.794725,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431
2,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,0.64462,0.745633,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987
3,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,0.612245,0.673469,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551
4,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,0.506897,0.551897,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483
5,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,0.394737,0.447237,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895
6,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,0.318281,0.371766,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813
7,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,0.230769,0.261462,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538
8,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,0.135,0.1575,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727
9,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,0.137727,0.156,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909


In [109]:
AP_kinetics_df

Unnamed: 0,pNPP (mM),0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,...,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5,Slope
0,20.0,0.07177,0.147847,0.206699,0.284211,0.373206,0.413397,0.502392,0.585646,0.620096,...,0.818182,0.895694,0.954545,1.087321,1.159809,1.171292,1.227273,1.377273,1.478469,0.016002
1,10.0,0.066743,0.137615,0.196101,0.277982,0.333716,0.404587,0.457569,0.550459,0.631651,...,0.833945,0.93922,0.963303,1.083716,1.100917,1.193119,1.226147,1.255046,1.417431,0.016002
2,7.0,0.067785,0.127595,0.201361,0.268481,0.34557,0.418671,0.488449,0.536962,0.568196,...,0.829367,0.820728,0.958291,1.016772,1.010127,1.163639,1.244051,1.199525,1.368987,0.016002
3,4.0,0.061224,0.126122,0.181837,0.24,0.3,0.382041,0.424286,0.470204,0.562041,...,0.742041,0.795918,0.822857,0.955102,0.950204,1.040816,1.124082,1.128367,1.17551,0.016002
4,2.0,0.053793,0.098276,0.152069,0.211034,0.266379,0.304138,0.36931,0.426207,0.446897,...,0.63931,0.706034,0.745862,0.791379,0.811034,0.844138,0.903103,0.982759,1.034483,0.016002
5,1.0,0.038289,0.082895,0.114868,0.162632,0.205263,0.246316,0.279079,0.318947,0.373026,...,0.454737,0.528553,0.541579,0.586184,0.644211,0.704605,0.738947,0.7575,0.757895,0.016002
6,0.7,0.033797,0.06825,0.101391,0.13125,0.162422,0.206719,0.225094,0.26775,0.286453,...,0.378,0.405234,0.454781,0.4725,0.546,0.557813,0.608344,0.629672,0.662813,0.016002
7,0.4,0.024,0.043846,0.067154,0.090462,0.109615,0.137077,0.164769,0.175385,0.216,...,0.282462,0.297,0.319846,0.332308,0.380308,0.392308,0.436154,0.416538,0.461538,0.016002
8,0.2,0.012955,0.027818,0.038864,0.056182,0.068864,0.077727,0.091636,0.106909,0.1215,...,0.160364,0.184364,0.194727,0.206591,0.207273,0.236455,0.250364,0.246136,0.272727,0.016002
9,0.2,0.014318,0.027273,0.040091,0.056727,0.065455,0.084273,0.098318,0.111273,0.117818,...,0.171818,0.1755,0.187091,0.2025,0.220364,0.224864,0.250364,0.269455,0.280909,0.016002
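
Assigning a single number to a new column repeats that number in every row, which is why each row above shows the same Slope. One way to get a separate slope for each concentration is to apply the helper function row by row. The sketch below uses only the first three time points of the first two rows shown above. The extinction coefficient (18 mM⁻¹ cm⁻¹) and the 1 cm path length are assumed placeholder values, not taken from this lesson, so substitute the values appropriate for your own assay conditions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# First three time points (min) for the 20 mM and 10 mM rows shown above
times = np.array([0.25, 0.5, 0.75])
demo_df = pd.DataFrame(
    [[0.07177, 0.147847, 0.206699],
     [0.066743, 0.137615, 0.196101]],
    columns=['0.25', '0.5', '0.75'],
)

def slope_only(xdata, ydata):
    """Return only the slope from a least-squares straight-line fit."""
    return stats.linregress(xdata, ydata).slope

# axis=1 hands each row to the function as a Series of A-405 readings.
# The slopes are computed before the new column is added, so only the
# absorbance columns take part in the fit.
slopes = demo_df.apply(lambda row: slope_only(times, row.to_numpy(dtype=float)), axis=1)
demo_df['Slope'] = slopes  # units: absorbance per minute

# ASSUMED values, for illustration only
EXT_COEFF = 18.0    # mM^-1 cm^-1 (placeholder extinction coefficient)
PATH_LENGTH = 1.0   # cm (placeholder path length)

# Beer-Lambert law: concentration = A / (epsilon * l),
# so rate in mM/min = slope / (epsilon * l)
demo_df['velocity (mM/min)'] = demo_df['Slope'] / (EXT_COEFF * PATH_LENGTH)
print(demo_df)
```

Computing the slopes into a separate Series first keeps the fit restricted to the time-course columns, which matters once extra columns such as the concentration or the slope itself sit in the same dataframe.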
