# Plotting and Fitting a Dose-Response Curve

## Introduction 

Today we will plot and fit a dose-response curve. The data come from whole-cell electrophysiology recordings; an example trace is shown below. We will plot the current responses to different concentrations of the mu-opioid receptor (MOR) agonist morphine for males versus females.

When MORs are activated, they in turn activate G protein-gated inwardly rectifying potassium (GIRK) channels, and it is the GIRK current that the responses below actually measure. These channels allow potassium to leave the cell, which creates the electrical current we record. Because the number of GIRK channels can vary from cell to cell, the morphine current responses will be normalized to the current responses induced by the alpha2-adrenergic receptor agonist UK, which also acts on GIRKs. Because UK is very potent, it will give us the maximum GIRK response possible.

I have already manually calculated the current responses for each drug, so the file we will be working with is a CSV file of these values.

Below is an example trace showing morphine application, block by the MOR antagonist naloxone (NLX), and then application of UK followed by the appropriate (alpha2) antagonist, ida.

<img src="trace1.png" width = "800" >



## Loading the data 

First, we need to import the modules we will need and load the data. We have seen all of these before except for the `curve_fit` function from [scipy.optimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html), which will allow us to fit a curve to our dose response plot.  

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

pd.options.display.max_rows = 7

### Exercise

Next, we need to import the file that we'll be working with, `dose_response_male.csv`. Load the CSV file into the variable `dose_response_male`. The first column of the file is the concentration, which makes an appropriate index, so set this first column as the index column.

In [None]:
# Answer
filename = "dose_response_male.csv"

dose_response_male = pd.read_csv(filename, index_col=0)
dose_response_male

## Working with our data

### Exercise

Great, now we have loaded our dataframe and can work with it. Later on, when we fit a dose-response curve, we will need the base-10 log of each concentration. Create a new column in our dataframe titled `'Log[M]'` that contains these log values. Hint: I included a column in the dataframe that contains each concentration in molar.

In [None]:
# Answer
dose_response_male['Log[M]'] = np.log10(dose_response_male['concentration_in_M'])
dose_response_male

### Exercise

Now, as I mentioned, we will be plotting normalized values. The morphine current will be plotted as a percentage of the UK current. So let's add another column to our dataframe titled `'percent_UK'` that contains the normalized values. 

In [None]:
# Answer 
dose_response_male['percent_UK'] = 100 * dose_response_male['morphine_current'] / dose_response_male['UK_current']
dose_response_male
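
As a quick check of the two column calculations above, here is a minimal sketch on a made-up three-row table. The column names match the ones used in this notebook, but the numbers are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not from the real dataset)
toy = pd.DataFrame({
    "concentration_in_M": [1e-8, 1e-7, 1e-6],
    "morphine_current": [10.0, 40.0, 60.0],
    "UK_current": [100.0, 80.0, 120.0],
})

# Base-10 log of the molar concentration
toy["Log[M]"] = np.log10(toy["concentration_in_M"])

# Morphine current as a percentage of the UK current from the same row
toy["percent_UK"] = 100 * toy["morphine_current"] / toy["UK_current"]

print(toy[["Log[M]", "percent_UK"]])
```

Each row is normalized to its own UK current, so a cell with more GIRK channels does not automatically look like a stronger morphine responder.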

### Exercise

Our data contains multiple replicates for each concentration. However, we want to plot the average normalized current for each concentration. How might we group (hint) the data based on concentration and then calculate the average? Store your final answer in the variable `mean_responses_male`.

In [None]:
# Answer 
grouping_male = dose_response_male.groupby(['concentration_in_M'])
mean_responses_male = grouping_male['percent_UK'].mean()
mean_responses_male

## Plotting our data

### Exercise

Now let's plot our data using `matplotlib.pyplot`! For this plot, it will be best if the x axis is a log scale. Use `mean_responses_male.plot?` for help in doing this. In addition, the default is that there will be a line connecting the data, but we want to fit our own line later on. To do this you will need to add an argument for the type of `marker` to use for each point as well as an argument to change the default `linestyle` to `''`. [Here](https://matplotlib.org/api/markers_api.html) is a list of markers you could use. Don't forget to add labels for each axis and a plot title. 

In [None]:
# Answer 
mean_responses_male.plot(logx=True, marker='o', linestyle='')
plt.xlabel('Morphine concentration (M)')  # x holds molar concentrations drawn on a log axis
plt.ylabel('% UK')
plt.title('Morphine Dose Response')
plt.show()

### Exercise

Great, now let's add some error bars. First, calculate the standard error of the mean (SEM) for each concentration and store the result in the variable `sem_responses_male`.

In [None]:
# Answer 
sem_responses_male = grouping_male['percent_UK'].sem()
sem_responses_male
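
For reference, the SEM is the sample standard deviation divided by the square root of the number of replicates. The sketch below, using three made-up replicate values, shows that the pandas `sem` method gives the same number as the calculation by hand.

```python
import numpy as np
import pandas as pd

# Made-up replicate values for one concentration (illustration only)
replicates = pd.Series([40.0, 50.0, 60.0])

# pandas computes the SEM with the sample standard deviation (ddof=1)
sem_pandas = replicates.sem()

# The same quantity by hand: sample standard deviation / sqrt(n)
sem_manual = replicates.std(ddof=1) / np.sqrt(len(replicates))

print(sem_pandas, sem_manual)
```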

### Exercise 

Now let's add error bars to our graph. By default, error bars are drawn without caps, so let's add some.

In [None]:
# Answer 
mean_responses_male.plot(logx=True, marker='o', linestyle='', yerr=sem_responses_male, capsize=3, elinewidth=1)
plt.xlabel('Morphine concentration (M)')  # x holds molar concentrations drawn on a log axis
plt.ylabel('% UK')
plt.title('Morphine Dose Response')
plt.show()

## Fitting a Curve 

### Exercise

Now we will fit a curve to the data. But first we need to have the x-values and the y-values as separate numpy arrays. Let's start with the y-values. What are the y-values in this dataset? Create a numpy array of these values and store it in the variable `y_data_male`.

In [None]:
# Answer 
y_data_male = np.array(mean_responses_male)
y_data_male

### Exercise 

Now let's handle the x-values. The x-values are the concentrations for each response. However, there is a problem: we currently have an array of the mean responses, but we don't have an array of the concentrations used. We can't just use the concentration column from the dataframe, because each concentration appears multiple times. To fix this, let's make an array from the `'Log[M]'` column that contains only the unique (hint) values. Store this in the variable `log_values`.

In [None]:
# Answer
log_values = dose_response_male['Log[M]'].unique()
log_values
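
One thing to keep in mind: `groupby` returns its groups sorted by key, while `unique()` returns values in order of first appearance. The two only line up if the file lists concentrations in ascending order. A more defensive approach, sketched here on made-up and deliberately unsorted data, takes the x-values from the index of the grouped means.

```python
import numpy as np
import pandas as pd

# Made-up rows, deliberately NOT sorted by concentration
toy = pd.DataFrame({
    "concentration_in_M": [1e-6, 1e-8, 1e-6, 1e-8],
    "percent_UK": [60.0, 10.0, 70.0, 20.0],
})

means = toy.groupby("concentration_in_M")["percent_UK"].mean()

# unique() keeps the order of first appearance: 1e-6 first, then 1e-8
x_unique = np.log10(toy["concentration_in_M"].unique())

# The groupby index is sorted and lines up with the means row by row
x_from_index = np.log10(means.index.to_numpy())

print(x_unique)         # order of appearance
print(x_from_index)     # sorted, aligned with `means`
print(means.to_numpy())
```

If the two x arrays differ, pairing `x_unique` with the means would assign responses to the wrong concentrations.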

### Exercise

OK, now we can fit the data! But with what equation? A very important step in curve fitting is knowing what curve to fit to your data, and knowing enough about your system/experiment to know what sort of models (or curves) fit. We will give you the equation for this dataset, but your own dataset may be different, and may require a different fit equation.

For this data (and dose-response curves in general), the Hill equation is appropriate. It is commonly used to fit a variety of dose-response curves in biological and biochemical systems. More on the Hill equation can be found [here](https://en.wikipedia.org/wiki/Hill_equation_(biochemistry)).

First we need to define a function for the Hill equation, which `curve_fit` will call later. The function takes `x`, followed by all of the constants in the equation we are using `(bottom, top, EC50, hillslope)`. These constants determine the shape of the curve, and finding their values is what `curve_fit` will do for us.

-  `EC50` is the concentration of agonist that gives a response halfway between `bottom` and `top`.
-  `top` and `bottom` are the plateaus, in the units of the y-axis.
-  The equation we will use assumes that the dose-response curve has a standard slope, that is, a `hillslope` (or slope factor) of 1.0.

<img src='hill_eq.png' width='500' >

Can you **def**ine a function that takes the appropriate parameters and returns the equation above?

Note: by convention, the scipy `curve_fit` function we are using wants `x`, followed by the parameters/constants.

In [None]:
# Answer 
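def hill_eq(x, bottom, top, EC50, hillslope):
    # x is log10 of the molar concentration; EC50 is in molar.
    # The slope factor is fixed at 1.0, so `hillslope` does not appear in the formula.
    y = bottom + (top - bottom) / (1 + 10 ** (np.log10(EC50) - x))
    return y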
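
A quick sanity check for a function like this is to evaluate it at `x = log10(EC50)`, where the response should sit exactly halfway between the two plateaus. The sketch below does that with made-up parameter values. It also includes a variable-slope variant, which is an assumption on my part and not the equation in the figure; the function names `hill_fixed_slope` and `hill_variable_slope` are hypothetical.

```python
import numpy as np

def hill_fixed_slope(x, bottom, top, EC50):
    """Hill equation with the slope factor fixed at 1.0; x is log10 of the molar concentration."""
    return bottom + (top - bottom) / (1 + 10 ** (np.log10(EC50) - x))

def hill_variable_slope(x, bottom, top, EC50, hillslope):
    """Hypothetical variant in which the slope factor is a free parameter."""
    return bottom + (top - bottom) / (1 + 10 ** ((np.log10(EC50) - x) * hillslope))

# Made-up parameter values for illustration
bottom, top, EC50 = 0.0, 80.0, 1e-7

# At x = log10(EC50) the response is halfway between bottom and top
midpoint = hill_fixed_slope(np.log10(EC50), bottom, top, EC50)
print(midpoint)

# With hillslope = 1 the two versions agree
x = np.linspace(-9, -5, 5)
same = np.allclose(hill_fixed_slope(x, bottom, top, EC50),
                   hill_variable_slope(x, bottom, top, EC50, 1.0))
print(same)
```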

### Exercise

Now we can use SciPy's `curve_fit` function, which we imported from `scipy.optimize` earlier. `curve_fit` returns two arrays, `popt` and `pcov`, that contain the optimal parameters and their estimated covariance, respectively. We mostly care about `popt`, since it holds the constants that describe the curve (`bottom, top, EC50, hillslope`). You can name the two returned values whatever you like, but you need to assign both when calling `curve_fit`: the first variable stores the `popt` values and the second the `pcov` values.

See if you can figure out how to construct the curve fit statement. Store the values `curve_fit` returns in the variables `male_fit` and `pcov`. Use `curve_fit?` for help writing the statement.

Note: you can ignore the `RuntimeWarning`. If you print out `male_fit`, you should still see the array it returns.

In [None]:
# Answer
male_fit, pcov = curve_fit(hill_eq, log_values, y_data_male)
male_fit
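
Two details are worth knowing about this call. When no starting guess is given, `curve_fit` starts every parameter at 1, which is far from a typical EC50, and a parameter that never appears in the formula (here `hillslope`) generally leaves the covariance undetermined. The sketch below shows one possible alternative on synthetic data: a three-parameter function that fits `log10(EC50)` directly and is given a rough starting guess through `p0`. The function name `hill_log_ec50` and all the numbers are assumptions made for illustration, not part of the original analysis.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill_log_ec50(x, bottom, top, log_ec50):
    """Hill equation (slope fixed at 1.0) parameterized by log10(EC50)."""
    return bottom + (top - bottom) / (1 + 10 ** (log_ec50 - x))

# Synthetic data: known parameters plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-9, -5, 9)
y = hill_log_ec50(x, 0.0, 80.0, -7.0) + rng.normal(0, 1.0, size=x.size)

# Rough starting guesses read off the data
p0 = [y.min(), y.max(), np.median(x)]
popt, pcov = curve_fit(hill_log_ec50, x, y, p0=p0)

# Approximate standard errors of the fitted parameters
perr = np.sqrt(np.diag(pcov))

print("bottom, top, log10(EC50):", popt)
print("standard errors:", perr)
print("EC50 in nM:", 10 ** popt[2] * 1e9)
```

Fitting the log of the EC50 also means the optimizer never has to take the log of a negative trial value, which is one common source of runtime warnings in fits like this.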

## Plotting the curve fit

### Exercise

OK, now let's plot the values and curve using `matplotlib.pyplot`. To plot the curve, simply use `log_values` for x, and use the following format to plot the curve: `plt.plot(x, function(x, *popt))`.

In [None]:
# Answer 
plt.plot(log_values, hill_eq(log_values, *male_fit))

### Exercise 

Now we will add the data we plotted earlier to visualize how good our fit is. To do this, use `plt.errorbar`. Don't forget your labels!

In [None]:
plt.plot(log_values, hill_eq(log_values, *male_fit), color='deepskyblue')

# Answer 
plt.errorbar(log_values, mean_responses_male, yerr=sem_responses_male, marker='o', linestyle='', capsize=3, color = 'blue')
plt.xlabel('Log[M]')
plt.ylabel('% UK')
plt.title('Morphine Dose Response')
plt.show()

## Adding the female dose response

Great! Our fit looks good! Now let's do the same thing for the female dose response. 

### Exercise 

First, load the file (`'dose_response_female.csv'`).

In [None]:
# Answer
dose_response_female = pd.read_csv('dose_response_female.csv', index_col=0)
dose_response_female

### Exercise

Now create a column with the normalized morphine current responses. 

In [None]:
# Answer 
dose_response_female['percent_UK'] = 100 * dose_response_female['morphine_current'] / dose_response_female['UK_current']
dose_response_female

### Exercise

Now find the average for each concentration. 

In [None]:
# Answer 
grouping_female = dose_response_female.groupby(['concentration_in_M'])
mean_responses_female = grouping_female['percent_UK'].mean()
mean_responses_female

### Exercise 

Now find the SEM values.

In [None]:
# Answer 
sem_responses_female = grouping_female['percent_UK'].sem()
sem_responses_female
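
Since the female analysis repeats the male steps, the mean and SEM can also be computed in a single call with `agg`. This is just an alternative sketch on made-up data, not a replacement for the answers above.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame({
    "concentration_in_M": [1e-8, 1e-8, 1e-6, 1e-6],
    "percent_UK": [10.0, 20.0, 60.0, 70.0],
})

# One row per concentration, with a 'mean' column and a 'sem' column
summary = toy.groupby("concentration_in_M")["percent_UK"].agg(["mean", "sem"])
print(summary)
```

Keeping both statistics in one dataframe guarantees that they share the same concentration index.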

### Exercise 

Now create an array of the female mean responses.

In [None]:
# Answer
y_data_female = np.array(mean_responses_female)
y_data_female

### Exercise 

We can use the same Hill equation function we defined above to find the optimal parameters for the female dose-response curve. This time use the variables `female_fit` and `pcov_f`.

In [None]:
# Answer 
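# Reuses log_values from the male data, which assumes the same concentrations were tested in females
female_fit, pcov_f = curve_fit(hill_eq, log_values, y_data_female)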
female_fit

### Exercise 

Now let's plot the mean responses for males and females along with their curve fits all on the same plot! And let's add a legend this time. 

In [None]:
# Answer
plt.plot(log_values, hill_eq(log_values, *male_fit), color='deepskyblue')
plt.plot(log_values, hill_eq(log_values, *female_fit), color='plum')
plt.errorbar(log_values, mean_responses_male, yerr=sem_responses_male, marker='o', linestyle='', capsize=3, color = 'blue')
plt.errorbar(log_values, mean_responses_female, yerr=sem_responses_female, marker='o', linestyle='', capsize=3, color='purple')
plt.xlabel('Log[M]')
plt.ylabel('% UK')
plt.title('Morphine Dose Response')
plt.legend(['Male Fit', 'Female Fit', 'Male', 'Female'])
plt.show()

## Working with the curve_fit constants

### Exercise 

Now we have a beautiful plot, but it would be nice to know what values were used to fit the curve. These values are stored in the `popt` array, but let's convert them to a series to make them easier to read. First make a list of the labels (`['bottom', 'top', 'EC50 (M)', 'hill slope']`) and then use `pd.Series` to make a series with an index of the fit labels from your list. Do this for both males and females.

In [None]:
# Answer 
fit_labels = ['bottom', 'top', 'EC50 (M)', 'hill slope']
fit_series_m = pd.Series(male_fit, index = fit_labels)
fit_series_f = pd.Series(female_fit, index = fit_labels)
print("Males: \n" + str(fit_series_m))
print("\nFemales: \n" + str(fit_series_f))

To make the EC50 values easier to read, we can convert them to nM and then print out a statement that rounds each one to two decimal places:

In [None]:
ec50_f = fit_series_f['EC50 (M)'] * 1e9  # convert M to nM
ec50_m = fit_series_m['EC50 (M)'] * 1e9  # convert M to nM
print("The EC50 for females is " + str(round(ec50_f, 2)) + " nM")
print("The EC50 for males is " + str(round(ec50_m, 2)) + " nM")
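
An f-string with a format specifier is another way to write the same kind of statement. The sketch below uses a made-up EC50 value, not a result from this dataset.

```python
# Made-up EC50 in molar, for illustration only
ec50_molar = 1.2345e-7

ec50_nM = ec50_molar * 1e9  # 1 M = 1e9 nM
message = f"The EC50 is {ec50_nM:.2f} nM"
print(message)
```

Unlike `round`, the `.2f` specifier always prints two digits after the decimal point, so a value such as 120 nM would be shown as `120.00`.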