# Lab 10: Enzyme Kinetics (Part 1)
## Instructions
The two notebooks named `enzyme-kinetics-part1.ipynb` and `enzyme-kinetics-part2.ipynb` contain all the code needed to fully analyze the steady-state and kinetic measurements from Experiment 10. Make sure to read the instructions in each block carefully, and also read the comments on the code.

Here are all the parts across **both** notebooks:

Part 1:
- A: Plotting all the UV-VIS scans for p-nitrophenol
- B: Finding the molar absorption coefficient of p-nitrophenol
- C: Extrapolating the rate of reaction from the kinetic trials

Part 2:
- D: Fitting the Michaelis-Menten curve to the data
- E: Fitting the Lineweaver-Burk equation to the data

***The code cells must be run in order unless specified otherwise.*** Because the data analysis for this experiment is more involved, I've split the code into two parts. Using Python for this first part is ***optional***: the molar absorption coefficient and the rates of reaction can instead be obtained in **Excel**. ***However***, we *highly* encourage you to use Python, as it is a *faster* and more efficient way to get your rates.

***Part 2 must be run in Python***, as Excel does not have the functions needed to perform the Michaelis-Menten fit.

I have indicated all points where you need to interact/change the code with the heading below:

## Task #0

There are **thirteen** tasks in total for *this notebook*; this *includes* points where you need to change filenames. **Note:** Task #8 has two parts.

**Part C** is where all the kinetic data is fitted. Since most of the code will remain the same for each of the seven trials, I've indicated the parts that need to be changed with larger comment blocks, which look like this:  
`'''CHANGE CODE BELOW'''`  

`'''CHANGE CODE ABOVE'''`

## Part A: Plotting the UV-VIS Scans

## Importing the UV-VIS Scan Data
## Task #1

Follow these instructions carefully:

1. First, find the csv file that contains the scan data of the p-nitrophenol absorbance for each different concentration. This is **not** the kinetic trial; it should be the first measurement you took in lab.
2. Upload it to the JupyterHub into the *same* folder that this notebook is stored in.
3. Add the filename of your datafile by replacing the `ADD YOUR FILENAME HERE` in the code below with your filename. But make sure to keep the .csv and the quotes!

In [None]:
# We need to import two modules
import pandas as pd 
import numpy as np

# This is the scan data, wavelength and absorbance
# Make sure to change the filepath by replacing "ADD YOUR FILENAME HERE" with the name of your file
# Keep the quotes and the .csv at the end!
scan_data = pd.read_csv(filepath_or_buffer="ADD YOUR FILENAME HERE.csv", sep="\t", encoding="UTF-16 LE").dropna(axis=1)

# We'll print the dataframe to see what our data looks like
print(scan_data)
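If you are wondering why the `read_csv` call ends with `.dropna(axis=1)`, here is a minimal sketch of one likely reason. It assumes the instrument export ends every line with a trailing tab, which makes `pandas` create an extra, empty column. The text in `raw_text` is made up for illustration and is not your data.

```python
import io
import pandas as pd

# Hypothetical export: tab-separated, with a trailing tab on every line
raw_text = (
    "Wavelength(nm)\tSample 1\t\n"
    "398\t0.512\t\n"
    "400\t0.530\t\n"
)

# Reading it as-is produces a third, empty column filled with NaN
raw = pd.read_csv(io.StringIO(raw_text), sep="\t")
print(raw.columns.tolist())

# dropna(axis=1) removes every column that contains a missing value
cleaned = raw.dropna(axis=1)
print(cleaned.columns.tolist())
```

Keep in mind that `dropna(axis=1)` drops *any* column containing a missing value, so a scan column with a blank cell would disappear too. If a column you expect is missing, check the original file for gaps.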

### Plotting the scans
This is the same procedure we have been following, where we use `matplotlib` to plot the data.

#### Storing each measurement as a variable
Since all the measurements share the same x-axis, we can define a variable that stores the wavelength column, so we don't have to call it every time using the `pandas` notation.

In [None]:
import matplotlib.pyplot as plt
# Here we define a wavelength variable
# Then, after the equals sign, we call the dataframe variable, which is scan_data
# To get just the wavelength column, we first add square brackets [] and inside 
# the brackets add the *exact* name of that column in quotation marks
wavelength = scan_data["Wavelength(nm)"]
print(wavelength)

## Task #2

Run the cell below to find out what the names of all of the columns in your file are:

In [None]:
# Print the first few rows of the dataframe
# so we can see the column names
print(scan_data.head())
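Column headings sometimes contain spaces you cannot see in a normal printout, and the name inside the square brackets has to match *exactly*. As a small sketch (the dataframe `demo` and its headings are made up for illustration), printing each name with `repr` shows the quotes around it, so any stray spaces become visible:

```python
import pandas as pd

# Made-up dataframe; note the hidden trailing space in the second heading
demo = pd.DataFrame({
    "Wavelength(nm)": [398, 400],
    "Sample 1 ": [0.51, 0.53],
})

# repr() shows each name inside quotes, revealing the trailing space
for name in demo.columns:
    print(repr(name))

# The exact heading (with the space) selects the column
first_scan = demo["Sample 1 "]
print(first_scan.tolist())
```

Copying and pasting the heading from this kind of printout avoids a `KeyError` caused by a name that is only almost right.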

Now, for each scan that you want to plot, you can define the absorbance variable similarly: 
- By naming the variable (using unique names for each scan) and then identifying it in the dataframe by using the *exact column heading*. I suggest copying and pasting it directly from the printout of the dataframe *above* (the code block right underneath the Task #2 label prints the first few rows, including all the column headings, so it's easier for you). 
- I've set up four scans for you, **except** you will need to change the column name inside the quotes to the appropriate column in your data.
- If you have more than four scans you want to plot, just copy and paste the code for p_nitrophenol_trial_1 and change the trial number at the end and the column name inside the quotes.

In [None]:
# Copy and paste this code for however many scans you want to plot
# Only things you need to change for each scan are:
# 1. The variable name
# 2. The column name (to pick the correct scan to plot) inside the quotations
p_nitrophenol_trial_1 = scan_data["ENTER THE NAME OF THE FIRST SAMPLE HERE"]

# Add the rest of the samples below this line:
p_nitrophenol_trial_2 = scan_data["ENTER THE NAME OF THE SECOND SAMPLE HERE"]
p_nitrophenol_trial_3 = scan_data["ENTER THE NAME OF THE THIRD SAMPLE HERE"]
p_nitrophenol_trial_4 = scan_data["ENTER THE NAME OF THE FOURTH SAMPLE HERE"]



## Task #3
#### Adding data to the plot
By now you should be very familiar with how to do this, so try adding just the data for *your first sample* to an *unformatted* graph.

In [None]:
# Plotting the first data
# replace the x and y with the variables you defined above
test_plot, = plt.plot(x, y, label='1')

# We'll also add a legend so we can see the label
plt.legend()

# show the plot
plt.show()

Now to combine all the information we've learned to create a nicely formatted plot by adding the data for each trial individually.

## Task #4
You will add your data to the graph below the large comment:
 `'''PLOTTING THE DATA'''`

We will need to copy and paste the line that reads `plt.plot(x, y, label='Sample 1')` for each of your concentration trials.

For ease, I've added the code to plot Samples 1-4. If you have more scans to plot, you will need to copy and paste the lines as mentioned above.

Make sure to replace the "x" and the "y" with the appropriate variables and edit the text after `label=` with the sample name you would like on the legend of the plot. ***Note:*** labels *have to* be in-between quotation marks. I have set up the first plot for you as an example.

In [None]:
# Create a canvas for the figure that's 8 inches wide and 6 inches tall
plt.figure(figsize=(8, 6))

# This changes the fontsize of the ticks on the axis
plt.tick_params(labelsize=14)

# This labels the axis and also changes their fontsize
plt.xlabel("Wavelength (nm)", fontsize=16)
plt.ylabel("Absorbance (a.u.)", fontsize=16)

''' PLOTTING THE DATA '''
# copy and paste the line below for all the scans replacing the x and y with the variables you defined above and the label with the sample name
plt.plot(x, y, label='Sample 1')

# I've added up to sample 4, you can add more if you have more samples
plt.plot(x, y, label='Sample 2')
plt.plot(x, y, label='Sample 3')
plt.plot(x, y, label='Sample 4')

# Don't need to edit this but I will explain everything I am doing here
# Set limits for the x and y axis
plt.xlim(325, 525)
plt.ylim(-0.05, 3)

# Add the plot legend
plt.legend(fontsize=16)

# This makes sure that the plot is formatted correctly
plt.tight_layout()
plt.show()

## Part B: Calculating the molar absorption coefficient of p-nitrophenol
The molar absorptivity ($\epsilon$) of a material can be found using the Beer-Lambert Law:
$$A=\epsilon_{400} lc$$
where 
- $A$ is the absorbance
- $l$ is the path length in cm
- $c$ is the concentration in M  

The UV-VIS scans you collected in lab include the absorbance at $\lambda = 400$ nm for different concentrations. To find $\epsilon_{400}$, we can plot absorbance vs. concentration and perform a linear regression. With a path length of 1 cm, the fitted slope is the molar absorptivity.
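To see why the slope gives $\epsilon_{400}$, here is a minimal sketch using made-up numbers. The value of `epsilon_true`, the concentrations in `conc_demo`, and every variable name are assumptions for illustration only, not your data or the expected answer. We generate perfect Beer-Lambert absorbances and check that the regression recovers the coefficient we started with.

```python
import numpy as np
from scipy.stats import linregress

# Made-up values for illustration only
epsilon_true = 18000.0   # hypothetical molar absorptivity
path_length = 1.0        # cm
conc_demo = np.array([2e-5, 4e-5, 6e-5, 8e-5, 1e-4])  # M

# Beer-Lambert law: A = epsilon * l * c
abs_demo = epsilon_true * path_length * conc_demo

# Fit absorbance (y) against concentration (x)
fit_demo = linregress(x=conc_demo, y=abs_demo)

# The slope equals epsilon * l, so divide by the path length
epsilon_fit = fit_demo.slope / path_length
print(epsilon_fit, fit_demo.intercept)
```

With real measurements the points will scatter around the line, so the intercept will not be exactly zero and the slope will come with a standard error.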

### Finding the absorbance at 400 nm
We will basically be doing Python's version of hitting **CTRL/CMD + F** to find a particular value. The syntax may look complicated at first, but the only thing we need to worry about is the number after the `==`. The code below asks Python to select the row where the wavelength equals 398 nm, which returns the value in every column for that row.

In [None]:
# This will print all of the column values where the wavelength is 398
print(scan_data.loc[scan_data["Wavelength(nm)"] == 398])
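If you are curious what is happening inside the square brackets, the sketch below breaks the selection into two steps on a tiny made-up dataframe (`demo_scan` and its values are assumptions for illustration). The comparison produces a column of `True`/`False` values, and `.loc` keeps only the rows marked `True`.

```python
import pandas as pd

# Made-up scan with three wavelengths
demo_scan = pd.DataFrame({
    "Wavelength(nm)": [498, 500, 502],
    "Sample 1": [0.10, 0.12, 0.11],
})

# Step 1: the comparison gives one True/False per row
mask = demo_scan["Wavelength(nm)"] == 500
print(mask.tolist())

# Step 2: .loc keeps only the rows where the mask is True
row = demo_scan.loc[mask]
print(row)

# Pull a single number out of the selected row
value = row["Sample 1"].iloc[0]
print(value)
```

If the selection comes back empty for your data, the wavelength you asked for is probably not one of the values in the column. Printing the wavelength column will show which values are available.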

## Task #5
We need to do the same thing but for *400 nm* so edit the line below to reflect that change.

In [None]:
# This is the absorbance at 400 nm for the samples
# This code is incomplete, you need to fill in the question mark
abs_400 = scan_data.loc[scan_data["Wavelength(nm)"] == ?]
                        
# This will print the absorbance at 400 nm for the samples
print(abs_400)

## Task #6
#### Store the Absorbance Data in Excel
To make things easier, you can store the absorbance at 400 nm and the concentration of each run in the Excel sheet labelled **"molar_absorptivity_determination.xlsx"** which should be stored in the same folder as this notebook. Note that no units have been added to the column headings, it is your responsibility to know what the correct units are.

**DO NOT CHANGE ANY OF THE COLUMN HEADINGS. THIS WILL RESULT IN AN ERROR IN THE CODE** 

The procedure is the same as lab 8:

1. Download the excel file
2. Enter your data in the respective columns
3. Save the file
4. Reupload the file into the same folder. **Do not change the name of the file**
5. Click "overwrite" when the dialog asks.

#### Uploading the Absorbance Data
As always, we will be uploading the data using `pandas`

In [None]:
# Upload the calibration data from the excel file and store it as a dataframe
calibration_data = pd.read_excel("molar_absorptivity_determination.xlsx", engine="openpyxl")

# Storing the concentration and absorbance values in variables
concentration = np.asarray(calibration_data["Concentration"])
absorbance = np.asarray(calibration_data["Absorbance"])

# Test if this works
print(concentration, absorbance)

### Fitting the data using a simple linear model to obtain the Molar Absorption Coefficient

This is exactly the same procedure as in the previous two labs, so you should be very familiar with it! You'll also notice that most of the code below was copied and pasted directly from lab 8, and just the variable names are changed!

## Task #7
Add the x and y variables that we want to fit for this regression (after the equals signs)

In [None]:
from scipy.stats import linregress

# performing the linear regression
regression_results = linregress(x=, y=)

# this will print if the linear regression worked
print('Regression is successful!')

#### Fit results
For ease, we'll store the fitted slope, intercept, the errors, and the $r^2$ as variables, and then print the results.  
***Make sure to save your value for the slope, this is the molar absorption coefficient: $\epsilon_{400}$***

In [None]:
# Here's the fitted slope and intercept
slope = regression_results.slope
intercept = regression_results.intercept

# And the errors from the fit as well
slope_err = regression_results.stderr
intercept_err = regression_results.intercept_stderr

# Find the r^2
r_squared = regression_results.rvalue**2

# Printing the results
print(f"Slope = {slope} ± {slope_err}")
print(f"Intercept = {intercept} ± {intercept_err}")
print(f"r^2 = {r_squared}")

#### Plotting the experimental data and linear fit

This is the same as the previous two labs.

## Task #8
This task is in two parts
### Task 8.1
Below the section headed `'''CHANGE AXIS LABELS'''`, there are two commands labeling the x-axis (`plt.xlabel`) and y-axis (`plt.ylabel`). The actual labels are in-between quotation marks, and inside the quotes I've added a question mark in place of where the **units** for the axes should be. Replace this question mark only with the appropriate units.
### Task 8.2
Add the data to the plot. Below the section headed `'''PLOTTING THE DATA'''`, replace the `x` and `y` with the appropriate variables to plot *both* your experimental data and the fitted data.

In [None]:
# This is to create 500 equally spaced points from 0 to your maximum concentration + 20%
concentration_fit = np.linspace(start=0, stop=0.00015 * 1.2, num=500)
# Now we can calculate the fitted absorbances using the parameters from the fit
absorbance_fit = concentration_fit * slope + intercept

# Create a canvas for the figure that's 8 inches wide and 6 inches tall
plt.figure(figsize=(8, 6))

# This changes the fontsize of the ticks on the axis
plt.tick_params(labelsize=14)

'''CHANGE AXIS LABELS'''
# This labels the axis and also changes their fontsize
# Replace the ? with the correct units
plt.xlabel("Concentration (?)", fontsize=16)
plt.ylabel("Absorbance (?)", fontsize=16)

# This changes the limits of the x and y axis
plt.xlim(0, 0.00015)
plt.ylim(0, 3.5)

'''PLOTTING THE DATA'''
# Plot the experimental data as a scatter plot hence the "o"
# Replace the x and y with the appropriate variables you defined above
experimental_plot, = plt.plot(x, y, "o", markersize=8, color="red", label="Experimental Data")
# The fitted data is plotted as a smooth line, hence the "-"
# Replace the x and y with the appropriate variables you defined above
fitted_plot, = plt.plot(x, y, "-", color="black", label="Linear Fit")

# add text to the plot in the top left corner
plt.text(0.05, 0.95, 
         f"m = {round(slope, -2)} ± {round(slope_err, -2)}" + 
         f"\nc = {round(intercept, 3)} ± {round(intercept_err, 3)}" +
         f"\n$r^2$ = {r_squared:0.4g}",
         ha="left", va="top", transform=plt.gca().transAxes, fontsize=16)

# A legend to help identify our plots
plt.legend(fontsize=16)

# This makes sure that the plot is formatted correctly
plt.tight_layout()

## Part C: Extrapolating the Rate of Reaction

This block will take the longest to run as we need to calculate and store the rates of reaction from all seven kinetics trials. Therefore, to make things easier, we will re-import all the necessary modules in this part. This will allow you to **restart your progress from this part without having to run the cells above**.

### Absorbance to concentration using Beer's Law
In order to get the rate of reaction in the correct units, we'll need to convert the absorbance values into concentrations using Beer's Law and the molar absorption coefficient we calculated in Part B.

## Task #9
We've stored Beer's Law in something known as a *function*. Functions are used in Python (for our data science purposes) to define any equation or formula. This way, we do not have to type out the formula each time we want to use it; we can instead just call the function, and it will apply the formula to our set of inputs. You can read more about functions in the Python tutorial.

The `beers_law` function takes these "inputs":
- `A` which is the absorbance
- `e` which is the molar absorption coefficient
- `l` which is the path length, but this is set constant at 1

It will then "output" `c` which is the concentration.
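As a quick illustration of how inputs, default values, and outputs work, here is a sketch with a *different* formula, the equation of a straight line. The function name `straight_line` and its arguments are made up for this example and are not part of the lab code.

```python
# y = m*x + b, where the intercept b defaults to 0 if we don't supply it
def straight_line(x, m, b=0.0):
    y = m * x + b
    return y

# Using the default intercept: 3.0 * 2.0 + 0.0
print(straight_line(2.0, 3.0))

# Overriding the default intercept: 3.0 * 2.0 + 1.0
print(straight_line(2.0, 3.0, b=1.0))
```

The same pattern applies to `beers_law`: `l=1.0` means the path length is 1 unless you pass a different value when calling the function. Functions written this way also work on whole `numpy` arrays, applying the formula to every element at once.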

I have purposefully left out one part of the Beer-Lambert law formula in the function below. Make sure to complete it.

In [None]:
# Beer's law but we keep the path length constant at 1 cm
# The formula for Beer's law below is incomplete, make sure to complete it
def beers_law(A, e, l=1.0):
    c = A
    return c

## Task #10
Now we need to add the molar absorption coefficient. Replace the `?` with the molar absorption coefficient we calculated in Part B.

In [None]:
# Replace the question mark with the molar absorptivity 
# which is the slope of the fit we did in Part B
molar_absorptivity = ?

## Task #11
### Importing the kinetic data
1. First, upload all the kinetic data you collected into the JupyterHub.
2. Next, change the filepath below to the specific kinetic trial you are analyzing.

In [None]:
# Re-importing the modules we need
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

'''CHANGE THE FILEPATH BELOW'''
kinetic_data = pd.read_csv(filepath_or_buffer="ENTER YOUR FILENAME HERE", sep="\t", encoding="UTF-16 LE")
'''CHANGE THE FILEPATH ABOVE'''

print(kinetic_data)

### Finding the linear regime
We'll want to extract the "Time" and "Absorbance" columns as numpy arrays and store them in separate variables. We'll see why this is important soon.

In [None]:
# From the dataframe, we'll collect the time and absorbance column
# then using .to_numpy() we'll convert it to an array
time = kinetic_data["Time(sec)"].to_numpy()
absorbance = kinetic_data["ABS"].to_numpy()

It will also be helpful to quickly plot the data to see what it looks like. This is a very rough plot only meant for visualization, so we will not format it.

In [None]:
# Rough plot with dots for the points
plt.plot(time, absorbance, ".")
plt.xlabel("Time (s)")
plt.ylabel("Absorbance (a.u.)")
plt.show()

We need to extract only the *initial* rate of reaction, as that is the regime where we can assume that the substrate concentration remains approximately at its starting value:
$$[S]_t = [S]_0$$
In other words, we want the slope of only the *linear* portion of our data, which occurs at early times. If we try to fit the whole data, we'll get the wrong rate of reaction as our assumption no longer holds true as the reaction progresses. Therefore, we need a way to truncate our data to shorter times.  
This is where having an array is very helpful. We can tell python to create a *subset* of our total time and absorbance arrays that satisfies a specific condition:  

`short_array = main_array[condition]`

In this case, the condition selects the data taken before the substrate concentration starts changing appreciably and the dependence becomes non-linear. For example, if my reaction was only linear for the first 50 s, I would make my shorter arrays for time and absorbance using this formulation:
- `time_short = time[(time >= initial_time) & (time < max_time)]`
- `absorbance_short = absorbance[(time >= initial_time) & (time < max_time)]`

To make this even easier, I defined a `max_time` variable that sets the maximum time we want to keep. This way you only need to change one line for each kinetic trial. I also added an `initial_time` condition in case something went wrong in the first few seconds of your measurement.
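Here is a small sketch of the truncation on made-up arrays, so you can see exactly which points survive. All the `demo_` names and values are assumptions for illustration. Note that both arrays are filtered with the *same* condition built from the time array, which keeps each absorbance paired with its time.

```python
import numpy as np

# Made-up measurement: one point every 10 s from 0 to 90 s
demo_time = np.arange(0, 100, 10)
demo_abs = 0.01 * demo_time

demo_initial_time = 10
demo_max_time = 50

# One True/False per point: True where the time falls inside the window
keep = (demo_time >= demo_initial_time) & (demo_time < demo_max_time)

# Apply the same condition to both arrays so they stay the same length
demo_time_short = demo_time[keep]
demo_abs_short = demo_abs[keep]

print(demo_time_short)
print(demo_abs_short)
```

Because the condition uses `>=` for the start and `<` for the end, a point exactly at `initial_time` is kept, while a point exactly at `max_time` is left out.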

## Task #12
Look at the graph above and find the portion of time where the data looks linear. You want to truncate the data so that only the linear portion remains. Change the `initial_time` and `max_time` variables to what you think are the limits of the linear regime (i.e. the earliest and latest times between which the data looks linear) and plot it.

In [None]:
# The maximum amount of time we want to plot
# Change this for each trial depending on what your data looks like
'''CHANGE THE INITIAL_TIME AND MAX_TIME BELOW'''
initial_time = 0
max_time = 120
'''CHANGE THE INITIAL_TIME AND MAX_TIME ABOVE'''

# We'll create subsets of the arrays, so they are 
# truncated to the maximum time we set above
absorbance_short = absorbance[(time >= initial_time) & (time < max_time)]
time_short = time[(time >= initial_time) & (time < max_time)]

# Now let's plot this
plt.plot(time_short, absorbance_short, ".")

plt.xlabel("Time (s)")
plt.ylabel("Absorbance (a.u.)")
plt.show()

### Fitting
This will be exactly the same as before, but now we have different independent and dependent variables. Therefore, we'll use the `linregress` function again to obtain the slope, intercept, and $r^2$ values. I've added all the variables you will need. Note that we also convert the absorbances to concentrations in order to get the rate of reaction in the correct units.

In [None]:
# First we need to convert the absorbances to concentrations
# We'll use Beer's law and our calculated molar absorptivity
# Then we'll convert the concentrations to mM
concentration = beers_law(absorbance_short, molar_absorptivity) * 1000

# Fitting the kinetic data
regression_results = linregress(x=time_short, y=concentration)

# Here's the fitted slope and intercept
slope = regression_results.slope
intercept = regression_results.intercept

# And the errors from the fit as well
slope_err = regression_results.stderr
intercept_err = regression_results.intercept_stderr

# Find the r^2
r_squared = regression_results.rvalue**2

# Printing the results
print(f"Slope = {slope} ± {slope_err}")
print(f"Intercept = {intercept} ± {intercept_err}")
print(f"r^2 = {r_squared}")

We'll also want to see how this fit looks, so we'll plot the fit and experimental data the same as before, repeating the steps from Part B.

In [None]:
# We won't need to create a faux time scale, we can just use the full one from the measurement :)
fitted_concentration = time * slope + intercept

# Create a canvas for the figure that's 8 inches wide and 6 inches tall
plt.figure(figsize=(8, 6))

# This changes the fontsize of the ticks on the axis
plt.tick_params(labelsize=14)

# This labels the axis and also changes their fontsize
plt.xlabel("Time (s)", fontsize=16)
plt.ylabel("Concentration (mM)", fontsize=16)

# now we plot
# plotting the experimental data as a scatter plot hence the "o"
plt.plot(time_short, concentration, "o", markersize=8, color="red", label="Experimental Data")
# the fitted data is plotted as a smooth line, hence the "-"
plt.plot(time, fitted_concentration, "-", color="black", label="Linear Fit")

# add text to the plot in the bottom right corner
# ha="right" and va="bottom" anchor the text box by its bottom right corner
plt.text(0.95, 0.05, 
         f"m = {slope:0.5g} ± {slope_err:0.5g}" + 
         f"\n$r^2$ = {r_squared:0.4g}",
         ha="right", va="bottom", transform=plt.gca().transAxes, fontsize=16)

# A legend to help identify our plots
plt.legend(fontsize=16)

# This makes sure that the plot is formatted correctly
plt.tight_layout()

## Task #13
### Repeat for all kinetic trials
These steps can be repeated for all the other kinetic measurements. Most of the code will remain the same; all we need to change is:
- The filename in `pd.read_csv`
- The `initial_time` and `max_time` variables, which will vary for each trial

Do this for all the trials to obtain the rates of reaction. You can store your data in the Excel sheet labeled **"enzyme_kinetics_data.xlsx"**. Columns have been labelled for you, but without any units.
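If you would rather not re-run the cells by hand for every trial, one possible approach is to wrap the truncate-and-fit steps in a function and loop over the trials. The sketch below is only an illustration under stated assumptions: `initial_rate` is a hypothetical helper (not part of this notebook), and the trials are made-up concentration curves rather than files, each with its own time window.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical helper: truncate to the linear window, then fit a line
def initial_rate(time, concentration, initial_time, max_time):
    keep = (time >= initial_time) & (time < max_time)
    fit = linregress(x=time[keep], y=concentration[keep])
    return fit.slope, fit.stderr

# Made-up trials: concentration rises linearly, then levels off at 0.2
demo_time = np.arange(0, 200, 5.0)
demo_trials = {
    # name: (concentration array, initial_time, max_time)
    "trial_1": (np.minimum(0.002 * demo_time, 0.2), 0, 60),
    "trial_2": (np.minimum(0.004 * demo_time, 0.2), 0, 40),
}

# Fit every trial with its own window and collect the slopes
rates = {}
for name, (conc, t0, t1) in demo_trials.items():
    slope, slope_err = initial_rate(demo_time, conc, t0, t1)
    rates[name] = slope
    print(f"{name}: rate = {slope:0.5g} ± {slope_err:0.2g}")
```

You would still need to look at each trial's plot to choose sensible window limits, since the linear regime is different for every substrate concentration.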