<a href="https://colab.research.google.com/github/TheTrappist/teaching/blob/main/Biochem6761/AU23/02_Fitting_AU23.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Let's do some simple plotting!

# The following are import statements. They load additional
# functions stored in libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

x = np.linspace(0, 10, 100)

# This is a definition of a function
def exponent(x,a):
  y = np.power(x,a)
  return y

y = exponent(x,2)

plt.style.use("bmh")

# Let's make a simple plot
fig, ax = plt.subplots()
ax.plot(x,y)

#print(plt.style.available)

**This is a text cell. It won't trigger any code to run and can be used to annotate (describe) segments of code.**

# This is a cell storing text and equations.

The simple hyperbolic binding formula assumes that the concentration of receptor is very low (well below $K_D$), so that binding barely depletes the free ligand:

$f_{bound}=\frac{[L]_{total}}{[L]_{total}+K_D}$


The fuller (and more accurate) quadratic form explicitly accounts for the concentration of protein/receptor:

$f_{bound}=\frac{([R]_{total}+[L]_{total}+K_D)-\sqrt{([R]_{total}+[L]_{total}+K_D)^2-4\times[R]_{total}\times{[L]_{total}}}}{2\times[R]_{total}}$

In both equations:

- $f_{bound}$ is the fraction of receptors that have a ligand bound
- $K_D$ is the dissociation constant for the ligand
- $[L]_{total}$ is the total concentration of the ligand in solution
- $[R]_{total}$ is the total concentration of protein (receptor) in solution
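
The quadratic form follows from mass conservation. Here is a short derivation sketch, writing $[RL]$ for the concentration of the receptor-ligand complex (a symbol introduced here only for illustration). Starting from the definition of the dissociation constant:

$K_D=\frac{([R]_{total}-[RL])\times([L]_{total}-[RL])}{[RL]}$

Rearranging gives a quadratic equation in $[RL]$:

$[RL]^2-([R]_{total}+[L]_{total}+K_D)\times[RL]+[R]_{total}\times[L]_{total}=0$

Only the root with the minus sign keeps $[RL]$ below both $[R]_{total}$ and $[L]_{total}$. Dividing that root by $[R]_{total}$ gives $f_{bound}=[RL]/[R]_{total}$, which is the quadratic expression written above. When $[R]_{total} \ll K_D$, almost all of the ligand stays free and the expression reduces to the hyperbolic formula.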


A fun paper for reference that blasts most published binding analyses: https://elifesciences.org/articles/57264

In [None]:
# Rewrite the functions described in the previous cell in actual code

def f_bound_hyperbolic(L_tot,K_D):
  # written as in the formula above; also avoids dividing by zero when L_tot = 0
  f_bound = L_tot / (L_tot + K_D)
  return f_bound

def f_bound_quadratic(L_tot, K_D, R_tot):
  # recast into the standard quadratic form a*f**2 + b*f + c = 0 for clarity
  a = R_tot
  b = -(R_tot+L_tot+K_D)
  c = L_tot
  f_bound = (-b - np.sqrt(b**2-4*a*c)) / (2*a)
  return f_bound
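
A quick numerical check can show when the hyperbolic approximation is safe. The sketch below redefines both functions so that it runs on its own, then reports the largest gap between the two curves for a few receptor concentrations. The values of `K_D` and `R_tot` are arbitrary illustrative choices, and `max_diff` is a name introduced only for this example.

```python
import numpy as np

def f_bound_hyperbolic(L_tot, K_D):
  # Simple hyperbolic binding curve (same formula as in the text cell)
  return L_tot / (L_tot + K_D)

def f_bound_quadratic(L_tot, K_D, R_tot):
  # Quadratic binding curve (same formula as in the text cell)
  s = R_tot + L_tot + K_D
  return (s - np.sqrt(s**2 - 4 * R_tot * L_tot)) / (2 * R_tot)

L_tot = np.linspace(0.1, 500, 500)
K_D = 50  # illustrative value

# Largest difference between the two curves for several receptor concentrations
max_diff = {}
for R_tot in [0.01, 1, 50, 300]:
  diff = np.abs(f_bound_quadratic(L_tot, K_D, R_tot) - f_bound_hyperbolic(L_tot, K_D))
  max_diff[R_tot] = diff.max()
  print(f"R_tot = {R_tot:>6}: max |quadratic - hyperbolic| = {max_diff[R_tot]:.4f}")
```

The gap should shrink toward zero as `R_tot` drops far below `K_D` and grow large once `R_tot` is comparable to or greater than `K_D`.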



In [None]:
# Use the functions we just defined to make some plots!

# Define a concentration range and parameters.
L_tot = np.linspace(0.1,500,500)
K_D = 50
R_tot = 300

f_hyper = f_bound_hyperbolic(L_tot,K_D)
f_quad = f_bound_quadratic(L_tot, K_D, R_tot)

fig, axs = plt.subplots(2,1)
axs[0].plot(L_tot, f_hyper)
axs[1].plot(L_tot, f_quad)
axs[1].set_xlabel('$L_{tot}$')


# Can you make the quadratic plot for a wide range of R_tot values?

# Put your code here!
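
One possible way to approach the question above is to loop over log-spaced receptor concentrations and draw every curve on the same axes. This is only a sketch: the range of `R_tot_values` is an arbitrary choice, the `curves` dictionary is introduced just for this example, and the binding function is repeated so the block runs on its own.

```python
import numpy as np
from matplotlib import pyplot as plt

def f_bound_quadratic(L_tot, K_D, R_tot):
  # Same quadratic binding formula as defined earlier in the notebook
  s = R_tot + L_tot + K_D
  return (s - np.sqrt(s**2 - 4 * R_tot * L_tot)) / (2 * R_tot)

L_tot = np.linspace(0.1, 500, 500)
K_D = 50

# Log-spaced receptor concentrations from far below to well above K_D
# (an arbitrary range chosen for illustration)
R_tot_values = np.logspace(-1, 3, 5)

fig, ax = plt.subplots()
curves = {}
for R in R_tot_values:
  curves[R] = f_bound_quadratic(L_tot, K_D, R)
  ax.plot(L_tot, curves[R], label=f"$R_{{tot}}$ = {R:g}")

ax.set_xlabel('$L_{tot}$')
ax.set_ylabel('$f_{bound}$')
ax.legend()
```

At low `R_tot` the curves should overlap the hyperbolic shape; at high `R_tot` they flatten into a nearly linear titration that carries little information about `K_D`.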

In [None]:
# Now, let's simulate some binding data and see if we can fit it

def simulate_quad_data(ligand_conc_array, R_tot, K_D, noise_std):
  f_bound_array = f_bound_quadratic(ligand_conc_array, K_D, R_tot)
  noise = np.random.normal(scale=noise_std, size = f_bound_array.size)
  result_with_noise = f_bound_array + noise
  return result_with_noise


# simulate data
sim_array = np.logspace(0.01,2.7,6) # x-values for simulated data: 6 log-spaced points from ~1 to ~500
noise_std = 0.03 # Set the noise level of the data
sim_y_values = simulate_quad_data(sim_array, R_tot, K_D, noise_std)

# plot the simulated data
fig2, ax2 = plt.subplots()
ax2.plot(sim_array, sim_y_values, ls='', marker='o', color='g')
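
The simulation above draws its noise from NumPy's global random state, so every run produces different data. If reproducible simulated data is wanted, one option is to pass a seed to a dedicated random number generator. The sketch below assumes a hypothetical variant named `simulate_quad_data_seeded` (not part of the original notebook), and the seed values are arbitrary.

```python
import numpy as np

def f_bound_quadratic(L_tot, K_D, R_tot):
  # Same quadratic binding formula as defined earlier in the notebook
  s = R_tot + L_tot + K_D
  return (s - np.sqrt(s**2 - 4 * R_tot * L_tot)) / (2 * R_tot)

def simulate_quad_data_seeded(ligand_conc_array, R_tot, K_D, noise_std, seed=None):
  # Hypothetical variant of simulate_quad_data that accepts a seed
  rng = np.random.default_rng(seed)
  f_bound_array = f_bound_quadratic(ligand_conc_array, K_D, R_tot)
  noise = rng.normal(scale=noise_std, size=f_bound_array.size)
  return f_bound_array + noise

sim_array = np.logspace(0.01, 2.7, 6)
run_1 = simulate_quad_data_seeded(sim_array, R_tot=300, K_D=50, noise_std=0.03, seed=0)
run_2 = simulate_quad_data_seeded(sim_array, R_tot=300, K_D=50, noise_std=0.03, seed=0)
run_3 = simulate_quad_data_seeded(sim_array, R_tot=300, K_D=50, noise_std=0.03, seed=1)

# Same seed gives identical noise; a different seed gives a new data set
print(np.array_equal(run_1, run_2), np.array_equal(run_1, run_3))
```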


In [None]:
# Load the nonlinear least squares fit function:
from scipy.optimize import curve_fit


# plot the simulated data again
fig2, ax2 = plt.subplots()
ax2.plot(sim_array, sim_y_values, ls='', marker='o', color='g')

# fit the simulated data

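# this is the actual fitting step. With no initial guess (p0) given,
# curve_fit starts every parameter at 1 and fits both K_D and R_tot:
popt, pcov = curve_fit(f_bound_quadratic, sim_array, sim_y_values)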

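# If you want to constrain one of the parameters, use "bounds", given as
# ([lower limits], [upper limits]) in parameter order (K_D, R_tot).
# Here K_D can be anything >= 0, while R_tot is pinned to its known value:
#popt, pcov = curve_fit(f_bound_quadratic, sim_array, sim_y_values,
#                       bounds=([0,R_tot],[np.inf,R_tot*1.0001]))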

# print fit results
print("Kd from quadratic fit:", popt[0])

# plot fit results
f_quad_fit = f_bound_quadratic(L_tot, popt[0], popt[1])
ax2.plot(L_tot, f_quad_fit)

# does fitting with a hyperbolic function work as well?
#popt, pcov = curve_fit(f_bound_hyperbolic, sim_array, sim_y_values)
#f_hyper_fit = f_bound_hyperbolic(L_tot, popt[0])
#ax2.plot(L_tot, f_hyper_fit)
#print("Kd from hyperbolic fit:", popt[0])
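
An alternative to squeezing `R_tot` between very tight bounds is to wrap the model so that `curve_fit` only sees `K_D` as a free parameter. The sketch below also reads an uncertainty estimate off the covariance matrix `pcov`. The helper `fit_K_D_fixed_R`, its `K_D_guess` argument, and the simulated values are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
from scipy.optimize import curve_fit

def f_bound_quadratic(L_tot, K_D, R_tot):
  # Same quadratic binding formula as defined earlier in the notebook
  s = R_tot + L_tot + K_D
  return (s - np.sqrt(s**2 - 4 * R_tot * L_tot)) / (2 * R_tot)

def fit_K_D_fixed_R(x, y, R_tot_known, K_D_guess=10.0):
  # Hypothetical helper: fit K_D only, holding R_tot at a known value
  def model(L_tot, K_D):
    return f_bound_quadratic(L_tot, K_D, R_tot_known)
  # bounds keep K_D non-negative; p0 is the starting guess for K_D
  popt, pcov = curve_fit(model, x, y, p0=[K_D_guess], bounds=(0, np.inf))
  K_D_err = np.sqrt(np.diag(pcov))[0]  # one-standard-deviation estimate
  return popt[0], K_D_err

# Example with simulated data (illustrative parameter values)
rng = np.random.default_rng(1)
x = np.logspace(0.01, 2.7, 6)
y = f_bound_quadratic(x, 50, 300) + rng.normal(scale=0.03, size=x.size)

K_D_fit, K_D_err = fit_K_D_fixed_R(x, y, R_tot_known=300)
print(f"K_D = {K_D_fit:.1f} +/- {K_D_err:.1f}")
```

With `R_tot` well above `K_D`, as in these simulated data, the reported uncertainty on `K_D` can be large even when the fitted curve looks good.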

In [None]:
# Now, let's learn how to read and analyze some actual data!

# Pull the file from the GitHub repository

binding_data = pd.read_csv('https://raw.githubusercontent.com/TheTrappist/teaching/main/Biochem6761/AU23/Sample_data/AU23_bindingData_01.csv')
print(binding_data.head())

In [None]:
# plot the loaded data
fig3, ax3 = plt.subplots()
ligand_nM = np.array(binding_data['ligand_nM'])
frac_bound = np.array(binding_data['frac_bound'])
ax3.plot(ligand_nM, frac_bound, ls='', marker='o', color='g')


In [None]:
# Let's fit these data


# Write your own code here to fit the data from the CSV

In [None]:
# How confident should we be in the fit? Let's try resampling the data
# points with replacement (bootstrapping) and refitting each resample

num_samples = 10
rng = np.random.default_rng() # initialize random number generator

fig, ax = plt.subplots()

points_per_sample = len(ligand_nM)

Kd_values = np.zeros(num_samples) # initialize array for storing Kd values
for i in range(num_samples):
  indices = rng.integers(points_per_sample, size=points_per_sample)
  x_vals = ligand_nM[indices]
  y_vals = frac_bound[indices]

  # Create a fit
  popt, pcov = curve_fit(f_bound_quadratic, x_vals, y_vals)

  Kd_values[i] = popt[0] # Save the Kd value from this fit
  ax.plot(x_vals, y_vals, ls='', marker='o')

  # plot the fit
  #x_sim = np.arange(0,160,5)
  #y_sim = f_bound_quadratic(x_sim, popt[0], popt[1])
  #ax.plot(x_sim, y_sim)


#print(Kd_values)
#fig, ax = plt.subplots()
#ax.hist(Kd_values)


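# Estimate confidence intervals
# (use many more resamples than 10, e.g. num_samples = 1000, for a meaningful interval)

#sorted_Kd = np.sort(Kd_values) # avoid the name "sorted", which is a Python built-in
#lower_ind = int(num_samples * 0.025)
#upper_ind = int(num_samples * 0.975)
#print(sorted_Kd[lower_ind], sorted_Kd[upper_ind]) # 95% conf. interval!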
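
The resampling idea above can also be packaged as a small function, with `np.percentile` replacing the manual index arithmetic. This is a minimal sketch under several assumptions: the helper `bootstrap_K_D` is hypothetical, `R_tot` is held at a known value (unlike the loop above, which fits it), and the data are simulated with arbitrary parameter values rather than loaded from the CSV.

```python
import numpy as np
from scipy.optimize import curve_fit

def f_bound_quadratic(L_tot, K_D, R_tot):
  # Same quadratic binding formula as defined earlier in the notebook
  s = R_tot + L_tot + K_D
  return (s - np.sqrt(s**2 - 4 * R_tot * L_tot)) / (2 * R_tot)

def bootstrap_K_D(x, y, R_tot_known, n_boot=200, seed=None):
  # Hypothetical helper: refit K_D on resampled (x, y) pairs, R_tot held fixed
  rng = np.random.default_rng(seed)
  def model(L_tot, K_D):
    return f_bound_quadratic(L_tot, K_D, R_tot_known)
  K_D_samples = []
  for _ in range(n_boot):
    idx = rng.integers(len(x), size=len(x))  # resample with replacement
    try:
      popt, _ = curve_fit(model, x[idx], y[idx], p0=[10.0], bounds=(0, np.inf))
    except RuntimeError:
      continue  # skip resamples where the fit did not converge
    K_D_samples.append(popt[0])
  return np.array(K_D_samples)

# Simulated stand-in data (illustrative values, not the CSV data)
rng = np.random.default_rng(2)
x = np.logspace(0, 3, 12)
y = f_bound_quadratic(x, 50, 20) + rng.normal(scale=0.03, size=x.size)

K_D_samples = bootstrap_K_D(x, y, R_tot_known=20, n_boot=200, seed=3)
ci_low, ci_high = np.percentile(K_D_samples, [2.5, 97.5])
print(f"95% bootstrap interval for K_D: {ci_low:.1f} to {ci_high:.1f}")
```

For real data, `x` and `y` would be the `ligand_nM` and `frac_bound` arrays, and `R_tot_known` would need to come from the experiment itself.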
