# Protein Folding: Exploring the Folding Funnel Hypothesis


## Introduction

### The protein folding problem
- General idea: structure --> function (note the exception of intrinsically disordered proteins).  
- Why it's difficult - huge search space.  Explain "energy landscape"
- Native state (the folded state) is assumed to be the global energy minimum

### The folding funnel
- Folding funnel hypothesis - nature has selected robust folding pathways, such that regardless of what shape a protein starts at in its unfolded state, it finds a way to the native state.  Analogy of a landscape, where all rivers eventually find their way to the ocean, despite getting stuck temporarily in a local valley.
- Include image from Wikipedia on the folding funnel

### Approaches
- Energy functions 




## Reframe of the funnel visualization

Perhaps another way to describe the funnel hypothesis is that, as one travels through the energy landscape and gets closer to the native state, the energy should decrease.  This offers us a more literal visualization that can be used to describe the actual space of possible conformations.  Take the following 2D example:

*Include image of path through 2D heatmap representing path through energy landscape, going from high to low energy.*

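If each torsion angle is treated as one dimension, the conformational space has a total volume of 360^n (in degrees^n), where n is the number of torsion angles.  That is enormous, even for small proteins.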
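To get a feel for the scale, here is a rough count.  It assumes (my simplification, not a measured value) that every angle is discretized into 10-degree bins and that each residue contributes two free backbone angles (phi and psi).  The function names are made up for illustration.

```python
import math


def conformation_count(n_residues, bin_degrees=10, angles_per_residue=2):
    """Number of grid points when every backbone angle is binned.

    Assumes phi and psi are the only free angles per residue.
    """
    bins_per_angle = 360 // bin_degrees
    n_angles = n_residues * angles_per_residue
    return bins_per_angle ** n_angles


def log10_conformation_count(n_residues, bin_degrees=10, angles_per_residue=2):
    """Same count as a base-10 exponent, which is easier to read."""
    bins_per_angle = 360 // bin_degrees
    n_angles = n_residues * angles_per_residue
    return n_angles * math.log10(bins_per_angle)


for n_residues in (5, 20, 100):
    exponent = log10_conformation_count(n_residues)
    print(f"{n_residues:>3} residues: about 10^{exponent:.0f} grid points")
```

Even with this coarse grid the count grows exponentially with chain length, which is why exhaustive search is off the table for anything but tiny peptides.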



##  Testing assumptions

(Clarify that each energy calculation is preceded by an energy minimization, to eliminate steric clashes from arbitrary coordinate assignment.  Include a PyMOL visualization of each structure; could this even be interactive in a Jupyter notebook rather than just a screenshot?)

We'll need **Biopython** and **OpenMM**.
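As a starting point, here is a minimal sketch of the "minimize, then read the energy" step with OpenMM.  Several things are assumptions on my part and not decided anywhere in these notes: the `amber14-all.xml` force field, running in vacuum with no cutoff, stripping waters, and the function name itself.  Structures with ligands, non-standard residues, or missing heavy atoms will likely need extra cleanup before `createSystem` accepts them.

```python
def minimized_energy_kj_per_mol(pdb_path, max_iterations=0):
    """Return the potential energy (kJ/mol) of a structure after local minimization.

    max_iterations=0 tells OpenMM to minimize until its convergence criterion is met.
    """
    # Imported inside the function so this cell can be defined before OpenMM is installed.
    from openmm import VerletIntegrator
    from openmm.app import ForceField, Modeller, NoCutoff, PDBFile, Simulation
    from openmm.unit import kilojoule_per_mole, picoseconds

    pdb = PDBFile(pdb_path)
    forcefield = ForceField("amber14-all.xml")  # assumed force field, no solvent model

    # Crystal structures usually lack hydrogens, so add them before building the system.
    modeller = Modeller(pdb.topology, pdb.positions)
    modeller.deleteWater()
    modeller.addHydrogens(forcefield)

    system = forcefield.createSystem(modeller.topology, nonbondedMethod=NoCutoff)
    # A Simulation requires an integrator even though no dynamics steps are taken here.
    integrator = VerletIntegrator(0.001 * picoseconds)
    simulation = Simulation(modeller.topology, system, integrator)
    simulation.context.setPositions(modeller.positions)

    # Local minimization relaxes steric clashes before the energy is read.
    simulation.minimizeEnergy(maxIterations=max_iterations)

    state = simulation.context.getState(getEnergy=True)
    return state.getPotentialEnergy().value_in_unit(kilojoule_per_mole)
```

Usage would look like `minimized_energy_kj_per_mol("structure.pdb")`, where `structure.pdb` is a placeholder for a locally downloaded file.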


###   1.   Do structures from the Protein Data Bank correspond to the global minimum?

(do this in a script and then import the data for plotting in a code block)

Just read an abstract related to this, go back and check it out in full later. 

https://chemrxiv.org/articles/A_Pareto-optimal_approach_for_protein_structure_evaluation_using_Amber_and_Rosetta_energy_functions_/5314828


###   2.   Does the energy level increase reliably with the distance from the native structure?

(do this in a script and then import the data for plotting in a code block)
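This test needs a definition of "distance from the native structure".  One candidate, sketched below, is the root-mean-square difference between torsion-angle vectors with wrap-around handled, so that 179 and -179 degrees count as 2 degrees apart.  Choosing torsion space over Cartesian RMSD is an assumption here, and the function names are hypothetical.

```python
import math


def angle_difference(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(angles_a, angles_b):
    """Root-mean-square angular deviation between two torsion-angle vectors (degrees)."""
    if len(angles_a) != len(angles_b):
        raise ValueError("both conformations must have the same number of angles")
    squared = [angle_difference(a, b) ** 2 for a, b in zip(angles_a, angles_b)]
    return math.sqrt(sum(squared) / len(squared))


# Two conformations that look far apart numerically but are close on the circle.
print(torsion_distance([179.0, -60.0], [-179.0, -60.0]))
```

Plotting the minimized energy against this distance for many perturbed structures would then show whether the trend is reliably increasing.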


## Possible search algorithm

If both of those assumptions turn out to be correct, then it might be possible to "follow the funnel" to the global minimum.  Like gradient descent, but instead of local, it exploits the hypothesized global "funneledness".  

I'm no statistician, but it seems possible to determine the relative importance of each parameter in a model (in this case, the parameters are the backbone angles).  If 20-30 samples per parameter are enough, this could well be doable (https://stats.stackexchange.com/questions/276/how-large-should-a-sample-be-for-a-given-estimation-technique-and-parameters).  

(Could be drastically sped up on an 8-GPU VM on GCP.  Could even create a broker for multiple GPU-equipped machines if it still needs more processing power.  Check out the benchmarks - http://openmm.org/about.html#benchmarks.)

Outline of the algorithm (very general idea):

1. Start at the origin
1. Sample n vectors of the form (x_1, x_2, ..., x_m), where each x_i is randomly (or optimally?) selected from {0, r, -r}.  Let's just say r = 10 degrees.  m is the number of backbone angles.
1. Calculate the energy level of the protein at each position after energy minimization.  (Be sure to update the coordinate to reflect the local minimum's position)
1. Use some statistical test (MLE?) to determine which angles are important for minimizing the energy calculation, and translate this (how?) into "best guess vector" for what direction the global minimum is in.  
1. Move according to the "best guess vector" and repeat steps 2-4.
1. TODO - How to determine stopping criteria?  
1. TODO - How to refine (reduce magnitude) the best guess vector so that it can better estimate the global minimum.
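The outline above can be tried on a toy problem before involving real proteins.  The sketch below makes several assumptions that are not part of the outline: the energy is a made-up, perfectly funneled function (a sum of `1 - cos` terms around a known "native" vector) with no minimization step, the statistical test in step 4 is replaced by a plain comparison of mean energies between the +r and -r samples for each angle, and the stopping rule is simply "stop when the best-guess move no longer lowers the energy".  All names are hypothetical.

```python
import math
import random


def toy_energy(angles, native):
    """Stand-in funnel: smooth, periodic, with a single minimum at `native` (degrees)."""
    return sum(1.0 - math.cos(math.radians(a - n)) for a, n in zip(angles, native))


def sample_steps(m, r, n_samples, rng):
    """Step 2: random vectors whose entries are drawn from {0, +r, -r}."""
    return [[rng.choice((0.0, r, -r)) for _ in range(m)] for _ in range(n_samples)]


def best_guess_vector(steps, energies, r):
    """Step 4 (simplified): per angle, move toward the side with the lower mean energy."""
    guess = []
    for i in range(len(steps[0])):
        plus = [e for s, e in zip(steps, energies) if s[i] > 0]
        minus = [e for s, e in zip(steps, energies) if s[i] < 0]
        if not plus or not minus:
            guess.append(0.0)  # not enough information for this angle
            continue
        contrast = sum(plus) / len(plus) - sum(minus) / len(minus)
        if contrast > 0:
            guess.append(-r)
        elif contrast < 0:
            guess.append(r)
        else:
            guess.append(0.0)
    return guess


def follow_funnel(start, native, r=10.0, n_samples=30, max_rounds=200, seed=0):
    """Repeat steps 2-5 until the best-guess move stops lowering the energy."""
    rng = random.Random(seed)
    position = list(start)
    energy = toy_energy(position, native)
    for _ in range(max_rounds):
        steps = sample_steps(len(position), r, n_samples, rng)
        energies = [
            toy_energy([p + d for p, d in zip(position, step)], native)
            for step in steps
        ]
        guess = best_guess_vector(steps, energies, r)
        candidate = [p + g for p, g in zip(position, guess)]
        candidate_energy = toy_energy(candidate, native)
        if candidate_energy >= energy:
            break  # placeholder stopping rule, see the TODO above
        position, energy = candidate, candidate_energy
    return position, energy


native = [60.0, -45.0, 120.0, -90.0, 90.0, -60.0]
final_position, final_energy = follow_funnel([0.0] * len(native), native)
print(final_position, final_energy)
```

On a real force field the energy call would be replaced by a minimize-then-evaluate step, and the landscape will be far rougher than this toy, so this only checks that the bookkeeping of the loop makes sense.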


## Follow up - the applicability of the folding funnel

The folding funnel model is assumed to be a result of natural selection **(citation needed)**.  As such, randomly selected amino acid sequences should (in theory) not exhibit a folding funnel.  If we try the search algorithm on random peptides, it shouldn't be able to converge.  This could provide evidence of the natural selection hypothesis.  If it does converge, this would suggest that, somehow, the global funnel is intrinsic to polypeptides (how??).
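Generating the random peptides for this control is the easy part.  A minimal sketch, assuming uniform sampling over the 20 standard amino acids (natural sequences are not uniformly distributed, so a composition-matched sampler might be a fairer control); the function name is hypothetical.

```python
import random

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, seed=None):
    """Return a random sequence of one-letter amino acid codes."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


print(random_peptide(20, seed=42))
```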

Another option might be to create a very small amino acid sequence with a guaranteed minimum energy, either by clever manipulation of the force field (e.g. AMBER), or by exhaustive search via gradient descent (which might be possible for very small proteins).

### References

###### Protein Structure website
Good overview of basic protein structure concepts, especially torsion angles
https://proteinstructures.com/Structure/Structure/Ramachandran-plot.html

###### Topography of funneled landscapes determines the thermodynamics and kinetics of protein folding*
Explores how the slope of the funnel relates to the folding rate
http://www.pnas.org/content/109/39/15763

###### The protein folding funnel and its discontents*
Discusses how the folding funnel can be a misleading construct.
http://wavefunction.fieldofscience.com/2011/06/protein-folding-funnel-and-its.html
Original paper (thanks Sam!) - https://www.nature.com/articles/nchembio.565

###### BioPython reference
Used for loading PDB files, manipulating structure, and passing coordinates into OpenMM
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. Bioinformatics 19: 2308–2310

###### Practical conversion from torsion space to Cartesian space for in silico protein synthesis
Provides an algorithm for converting between specified angles and Cartesian coordinates.  Need to check this out; possibly use the ChemCoord package, which cites this study.  Need to check whether it supports polypeptide chains.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8235&rep=rep1&type=pdf

###### PyRosetta
Trying to use this for loading PDB files, manipulating structure, and passing coordinates into OpenMM (as an alternative to BioPython, since BioPython can't convert between Cartesian coordinates and dihedral angles)
https://www.rosettacommons.org/about/pubs

###### AMBER function finding global minimum vs. Rosetta
https://chemrxiv.org/articles/A_Pareto-optimal_approach_for_protein_structure_evaluation_using_Amber_and_Rosetta_energy_functions_/5314828

###### OpenMM references
Used for energy minimization and calculation.
https://simtk.org/plugins/publications/index.php/?group_id=161

**TODO** - cite the actual papers (presumably all of them)