# Update folding rates

We will load the folding rates for protein G that we computed in dbfold_test. We will then recompute the abc to bc unfolding rate, and likewise the bc to abc folding rate, using new unfolding simulations that were initialized from the abc state. However, we do not actually use this data in the paper, since these initial states were drawn from simulation snapshots prior to convergence. These snapshots were quite heterogeneous and led to an appearance of aberrantly fast unfolding of substructure a, which was in fact a result of misclassification. We obtain better predictions of the true refolding rates if we use the unfolding simulations in dbfold_test, which were initialized from the native state. Nonetheless, this notebook provides a working example of how folding rates can be recomputed with new data.

In [1]:
import dbfold
from dbfold.dbfold import Protein as Protein
import dbfold.analyze_structures as analyze_structures
import dbfold.load_data as load_data
import dbfold.compute_PMF as compute_PMF
import dbfold.folding_rates as folding_rates
import dbfold.kinetic_model as kinetic_model
import dbfold.nonnative_states as nonnative_states
import matplotlib.pyplot as plt



In [3]:
proteinG = Protein('protein G','1igd_data/1igd_0.200_20_Emin.pdb')
proteinG.obtain_PMFs(eq_dir = '1igd_data')
proteinG.obtain_folding_rates()

hi
The equilibrium log datapath has been set as 1igd_data/Equilibrium_log_data.dat and the scorepath as 1igd_data/Equilibrium_scores.dat
PMFs already exist! Loading now...
PMFs successfully loaded
Substructures set for protein
Value of f = 1.7 set for protein
Folding info already exists! Loading now...
Folding info successfully loaded


Load the unfolding data starting from the abc state:

In [1]:
%matplotlib tk
proteinG.runHMM(starting_state='abc', unfolding_dir='1igd_data', score_filename='Unfolding_scores_start_abc.dat' )


In [5]:
proteinG.form_clusters(T_A = 10)

A value of T_A = 10 will be used to form clusters
Doing loop clustering...
We have obtained the following clusters: 
 {0: ['abcd'], 1: ['bcd'], 2: ['acd'], 3: ['abc', 'ac'], 4: ['cd'], 5: ['bc', 'c'], 6: ['ab', 'a'], 7: ['b', '∅']} 
 Please make sure that unfolding rates fit well to Arrhenius equation before proceeding 


In [7]:
activation_energies, prefactors, mean_transition_rates, Ns = folding_rates.Arrhenius_fit(
    [0.9, 0.925, 0.95, 0.975, 1],   # temperatures
    [(3, 5), (5, 7)],               # cluster transitions to fit
    proteinG.temp_unfolding_info['combined_trajs'],
    proteinG.temp_unfolding_info['PDB files'],
    'Pathway 1',
    min_trans=1,
)

Computing temperature 0.9
Computing temperature 0.925
Computing temperature 0.95
Computing temperature 0.975
Computing temperature 1
0.9653364152724645
0.969197324924611
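
The clustering step above asks us to check that the unfolding rates fit the Arrhenius equation well. As a rough, standalone illustration of what such a check involves, the sketch below fits ln k = ln A − Ea/T by ordinary least squares and reports R². It is not the implementation inside `folding_rates.Arrhenius_fit`; the helper name `arrhenius_fit`, the synthetic rates, and the assumption that temperatures are in units where k_B = 1 are all illustrative.

```python
import math


def arrhenius_fit(temperatures, rates):
    """Least-squares fit of ln(k) = ln(A) - Ea / T.

    Returns (Ea, A, r_squared). Hypothetical helper for illustration only;
    assumes temperatures are expressed in units where k_B = 1.
    """
    x = [1.0 / T for T in temperatures]      # inverse temperature
    y = [math.log(k) for k in rates]         # log rate
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                        # equals -Ea
    intercept = y_mean - slope * x_mean      # equals ln(A)
    r_squared = sxy ** 2 / (sxx * syy)       # goodness of fit
    return -slope, math.exp(intercept), r_squared


# Synthetic rates generated from a known Ea and A (made-up values).
temps = [0.9, 0.925, 0.95, 0.975, 1.0]
true_Ea, true_A = 12.0, 1e-3
synthetic_rates = [true_A * math.exp(-true_Ea / T) for T in temps]

Ea, A, r2 = arrhenius_fit(temps, synthetic_rates)
print(Ea, A, r2)
```

An R² close to 1 indicates that ln k is nearly linear in 1/T over the temperatures sampled, which is the condition for extrapolating unfolding rates to lower temperatures.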


In [11]:
proteinG.folding_info['clusters_dic']

{0: ['abcd', 'acd'],
 1: ['bcd', 'cd'],
 2: ['abd', 'ad'],
 3: ['abc', 'ac'],
 4: ['bd', 'd'],
 5: ['bc', 'c'],
 6: ['ab', 'a'],
 7: ['b', '∅']}

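In this case, the old and new cluster numbering correspond exactly for the clusters we need (3, 5, and 7), so the cluster mapping passed below is simply {3:3, 5:5, 7:7}. Note that some of the other clusters (0, 1, 2, and 4) differ between the two clusterings.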
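
The correspondence between the two cluster dictionaries printed above can also be checked programmatically rather than by eye. The sketch below pairs each new cluster with the old cluster that has exactly the same members. The helper `match_clusters` is hypothetical and not part of dbfold; the two dictionaries are copied from the outputs above.

```python
def match_clusters(new_clusters, old_clusters):
    """Map each new cluster index to the old index with identical membership.

    Clusters with no exact counterpart are left out of the mapping.
    Hypothetical helper, not part of dbfold.
    """
    old_by_members = {frozenset(members): idx
                      for idx, members in old_clusters.items()}
    mapping = {}
    for new_idx, members in new_clusters.items():
        old_idx = old_by_members.get(frozenset(members))
        if old_idx is not None:
            mapping[new_idx] = old_idx
    return mapping


# Clusters obtained from the new unfolding data (output of form_clusters).
new_clusters = {0: ['abcd'], 1: ['bcd'], 2: ['acd'], 3: ['abc', 'ac'],
                4: ['cd'], 5: ['bc', 'c'], 6: ['ab', 'a'], 7: ['b', '∅']}

# Clusters stored in proteinG.folding_info['clusters_dic'].
old_clusters = {0: ['abcd', 'acd'], 1: ['bcd', 'cd'], 2: ['abd', 'ad'],
                3: ['abc', 'ac'], 4: ['bd', 'd'], 5: ['bc', 'c'],
                6: ['ab', 'a'], 7: ['b', '∅']}

mapping = match_clusters(new_clusters, old_clusters)
print(mapping)  # {3: 3, 5: 5, 6: 6, 7: 7}
```

Clusters 3, 5, and 7, which are the ones involved in the transitions being updated, all have exact counterparts with the same index.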

In [11]:
proteinG.update_folding_rates([0.9, 0.925, 0.95, 0.975, 1], [(3, 5), (5, 7)], {3: 3, 5: 5, 7: 7}, N_trials=1000, min_trans=1)

Computing temperature 0.9
Computing temperature 0.925
Computing temperature 0.95
Computing temperature 0.975
Computing temperature 1
0.9653364152724645
0.969197324924611
Running 1000 bootstrap trials to compute error on unfolding rates...
0 trials completed
100 trials completed
200 trials completed
300 trials completed
400 trials completed
500 trials completed
600 trials completed
700 trials completed
800 trials completed
900 trials completed
Bootstrap complete
Computing cluster free energies...
Inferring unknown folding rates from detailed balance...
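
The last step in the output above infers folding rates from detailed balance. As a minimal sketch of that idea (not the dbfold implementation), the reverse rate follows from the forward rate and the free energies of the two clusters: since equilibrium populations scale as exp(−G/T), the condition p_i k_ij = p_j k_ji gives k_ji = k_ij exp((G_j − G_i)/T). The function name and all numerical values below are made up for illustration, and free energies are assumed to be in the same units as T (k_B = 1).

```python
import math


def rate_from_detailed_balance(k_forward, G_start, G_end, T):
    """Return the reverse rate k(end -> start) implied by detailed balance.

    k_forward is the rate for start -> end; G_start and G_end are the free
    energies of the two states, in the same units as T (k_B = 1 assumed).
    Hypothetical helper for illustration only.
    """
    return k_forward * math.exp((G_end - G_start) / T)


# Made-up example: unfolding from cluster 3 (abc) to cluster 5 (bc).
T = 0.9
G_3, G_5 = -3.0, -1.0          # hypothetical cluster free energies
k_unfold = 2e-9                # hypothetical 3 -> 5 unfolding rate

k_fold = rate_from_detailed_balance(k_unfold, G_3, G_5, T)  # 5 -> 3

# Check: equilibrium fluxes in the two directions balance.
p_3 = math.exp(-G_3 / T)
p_5 = math.exp(-G_5 / T)
print(k_fold, math.isclose(p_3 * k_unfold, p_5 * k_fold))
```

Because cluster 5 has the higher free energy in this example, the inferred folding rate out of it is larger than the unfolding rate into it.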


We can now plot these rates:

In [12]:
fontsize = 20
labelsize = 20
TM = 0.879

folding_rates.plot_folding_rates('1igd_data/Folding_info.dat', r'$\emptyset/b$ -> (b)c', [(7,5)], fontsize = fontsize, labelsize = labelsize,
                                                               legend_fontsize = fontsize, colors = ['gold'], temp_norm = TM, legend = False)

plt.xlim((0.82, 1.14))
plt.ylim((5*10**(-11), 10**(-7)))

folding_rates.plot_folding_rates('1igd_data/Folding_info.dat', '(b)c -> a(b)c', [(5,3)], fontsize = fontsize, labelsize = labelsize,              
                                                               legend_fontsize = fontsize, colors = ['mediumblue'], temp_norm = TM, legend = False)
plt.xlim((0.82, 1.14))
plt.ylim((5*10**(-11), 10**(-7)))



  deltaG[i,j]=G[j]-G[i]


(4.9999999999999995e-11, 1e-07)

The (b)c to a(b)c transition, in particular, shows very different quantitative and qualitative behavior as a function of temperature compared to the prediction obtained when Unfolding_scores.dat is used. However, we expect these new predictions to be inaccurate for the reasons explained at the beginning of this notebook.

In [10]:
import importlib

importlib.reload(dbfold.folding_rates)



<module 'dbfold.folding_rates' from '/Users/amirbitran/Dropbox/CurrentStuff/Harvard/Shakhnovich Lab/dbfold/dbfold/folding_rates.py'>