# FRETpredict tutorial (Hsp90)

In [1]:
import MDAnalysis
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import os
import pandas as pd
import seaborn as sns
import yaml
from FRET import FRETpredict
import warnings

warnings.filterwarnings('ignore')

### Quick biological background

Cartoon representation of the open structure of Hsp90 (apoprotein). Chains A and B are colored in red and blue, respectively. On each chain, the residues that will be used for FRET Efficiency calculations are represented as spheres.

![title](Hsp90_chains.png)

Closed structure of Hsp90 (holoprotein, with ATP bound).

![title](Hsp90_closed_chains.png)

# Single protein structure

In this part of the tutorial, we will calculate the FRET efficiency for a single protein structure, placing the rotamer libraries on one pair of residues.<br>
First, let's look at the available rotamer libraries and their names:

In [2]:
with open('lib/libraries.yml') as f:
    # safe_load avoids the Loader argument that recent PyYAML versions require for yaml.load
    libraries = yaml.safe_load(f)
libraries

{'ATTO 390 C2R': {'author': 'D Montepietra, G Tesei, JM Martins, MBA Kunze, RB Best, K Lindorff-Larsen',
  'citation': 'TBD',
  'filename': 'T39_C2R_cutoff30',
  'licence': 'GPLv3',
  'mu': ['C1', 'C10 and resname T39'],
  'negative': [],
  'positive': [],
  'r': ['C7 and resname T39']},
 'ATTO 390 L1R': {'author': 'D Montepietra, G Tesei, JM Martins, MBA Kunze, RB Best, K Lindorff-Larsen',
  'citation': 'TBD',
  'filename': 'T39_L1R_cutoff30',
  'licence': 'GPLv3',
  'mu': ['C1', 'C10 and resname T39'],
  'negative': [],
  'positive': [],
  'r': ['C7 and resname T39']},
 'ATTO 425 C2R': {'author': 'D Montepietra, G Tesei, JM Martins, MBA Kunze, RB Best, K Lindorff-Larsen',
  'citation': 'TBD',
  'filename': 'T42_C2R_cutoff30',
  'licence': 'GPLv3',
  'mu': ['C1', 'C10 and resname T42'],
  'negative': [],
  'positive': [],
  'r': ['C7 and resname T42']},
 'ATTO 425 L1R': {'author': 'D Montepietra, G Tesei, JM Martins, MBA Kunze, RB Best, K Lindorff-Larsen',
  'citation': 'TBD',
  'file

Every rotamer library name has three parts: the manufacturer (AlexaFluor, ATTO, Lumiprobe), the peak wavelength (e.g. 488, 550, 647), and the linker that connects the dye to the protein (C1R, C2R, C3R, L1R, L2R, B1R).<br>
To learn more about the rotamer libraries, see the tutorial "Generate new rotamer libraries".
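
The naming convention above can be sketched in a few lines of Python. The helper `split_library_name` is hypothetical (it is not part of FRETpredict) and assumes the three parts are separated by single spaces, as in `'ATTO 390 C2R'`.

```python
def split_library_name(name):
    """Split a rotamer library name into (manufacturer, peak, linker).

    Hypothetical helper for illustration only; not part of FRETpredict.
    Assumes exactly three space-separated parts.
    """
    parts = name.split(' ')
    if len(parts) != 3:
        raise ValueError(f'expected three space-separated parts, got {name!r}')
    manufacturer, peak, linker = parts
    return manufacturer, peak, linker


# Names taken from this tutorial
print(split_library_name('ATTO 390 C2R'))
print(split_library_name('AlexaFluor 594 C1R'))
```

The peak is kept as a string because the sketch makes no assumption that every library uses a purely numeric label.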

Now, we will select the parameters for the FRET efficiency calculation:
- The residues on which to place the rotamer libraries, their protein chains, and the corresponding experimental data file. For this tutorial, we use a residue pair whose FRET efficiency was measured on an open (apo) structure of Hsp90.
- The rotamer libraries we will use: AlexaFluor 594 and AlexaFluor 568 dyes, both with C1R linkers.
- The Universe object for the protein structure.

In [3]:
res1 = 452
chain_res1 = 'A'
dye_1 = 'AlexaFluor 594'
linker_1 = 'C1R'

res2 = 637
chain_res2 = 'B'
dye_2 = 'AlexaFluor 568'
linker_2 = 'C1R'

# Experimental data file to compare our predictions against.
exp_data_file = '452_532_637_647N_APO.txt'

# Create MDAnalysis.Universe object for the protein (open Hsp90)
u = MDAnalysis.Universe('test_systems/Hsp90/openHsp90.pdb')

Let's create an instance of the FRETpredict class

In [5]:
# Clustering cutoff used for the creation of the rotamer library. It can be 10, 20, or 30.
cutoff = 10

FRET = FRETpredict(protein=u, residues=[res1, res2], temperature=293, 
                   chains=[chain_res1, chain_res2], 
                   donor=dye_1, acceptor=dye_2, electrostatic=True,
                   libname_1=f'{dye_1} {linker_1} cutoff{cutoff}',
                   libname_2=f'{dye_2} {linker_2} cutoff{cutoff}', 
                   output_prefix=f'tutorials/test/E{cutoff}')
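
As a quick check of the strings passed to the constructor, the sketch below rebuilds the library names and the results path without running FRETpredict. The file name pattern `{output_prefix}-data-{res1}-{res2}.pkl` is inferred from the results cell later in this tutorial; treat it as an assumption if your FRETpredict version differs.

```python
# Values copied from the parameter cells of this tutorial
res1, res2 = 452, 637
dye_1, linker_1 = 'AlexaFluor 594', 'C1R'
dye_2, linker_2 = 'AlexaFluor 568', 'C1R'
cutoff = 10

# Library names are plain strings: "<dye> <linker> cutoff<N>"
libname_1 = f'{dye_1} {linker_1} cutoff{cutoff}'
libname_2 = f'{dye_2} {linker_2} cutoff{cutoff}'

# Assumed output naming, inferred from the pd.read_pickle call used later
output_prefix = f'tutorials/test/E{cutoff}'
data_file = f'{output_prefix}-data-{res1}-{res2}.pkl'

print(libname_1)
print(libname_2)
print(data_file)
```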

Run FRET efficiency calculations.

In [6]:
FRET.run()


Frame 1/1

Done.


Retrieve experimental data from file for the comparison.

In [6]:
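filename = 'test_systems/Hsp90/MDA/histograms/{}'.format(exp_data_file)

if os.path.isfile(filename):
    # Read the whitespace-separated histogram (raw string avoids an invalid-escape warning)
    df = pd.read_csv(filename, skiprows=6, nrows=40, header=None, sep=r'\s+')
    # Experimental value: mean of column 0 weighted by column 1
    Ex = np.average(df[0], weights=df[1])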
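
To make the weighted average concrete, here is a minimal sketch on a made-up four-bin histogram. The bin centers and counts are invented for illustration and are not taken from the Hsp90 data file.

```python
import numpy as np

# Hypothetical histogram: FRET efficiency bin centers and their counts
bin_centers = np.array([0.2, 0.4, 0.6, 0.8])
counts = np.array([1, 2, 4, 1])

# Same call as in the cell above: each bin center is weighted by its count
mean_e = np.average(bin_centers, weights=counts)

# Equivalent explicit formula: sum(w * x) / sum(w)
mean_e_manual = (bin_centers * counts).sum() / counts.sum()

print(mean_e, mean_e_manual)
```

Both expressions give the mean of the histogram, which is the single number compared against the predictions.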

Create DataFrame of the data (experimental and predicted) and show the results

In [7]:
results = []

data = pd.read_pickle(r'tutorials/test/E{:d}-data-{:d}-{:d}.pkl'.format(cutoff, res1, res2))
results.append({'res': '{:d}-{:d}'.format(res1, res2),
                'k2': float(data.loc['k2']),
                'Ex': Ex,
                'Es': float(data.loc['Estatic']),
                'Ed': float(data.loc['Edynamic1']),
                'Ed2': float(data.loc['Edynamic2']),
                'diffS': np.abs(Ex-data.loc['Estatic']),
                'diffD': np.abs(Ex-data.loc['Edynamic1']), 
                'diffD2': np.abs(Ex-data.loc['Edynamic2'])})
        
results_df = pd.DataFrame(results).set_index('res')

In [8]:
# Show results
results_df

| res | conformation | k2 | Ex | Es | Ed | Ed2 | chi2S | chi2D | chi2D2 |
|---|---|---|---|---|---|---|---|---|---|
| 452-637 | open | 0.936203 | 0.592377 | 0.72088 | 0.948272 | 0.985024 | 0.128502 | 0.355895 | 0.392647 |


**For predictions on a single frame or structure, the best averaging method is "Static" (Es).**
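
Averaging schemes for FRET can differ in which quantity is averaged. The sketch below is a generic illustration based on the standard Förster relation E = 1 / (1 + (r/R0)^6); it is not FRETpredict's implementation, and the Förster radius and distances are hypothetical values chosen for the example. It contrasts averaging the efficiencies of individual dye configurations with computing one efficiency from the averaged transfer rate.

```python
def forster_efficiency(r, r0):
    """Standard Förster relation for donor-acceptor distance r and Förster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


r0 = 5.0                     # hypothetical Förster radius (nm)
distances = [4.0, 5.0, 6.0]  # hypothetical dye-dye distances (nm)

# Option 1: average the efficiencies of the individual configurations
e_mean_of_efficiencies = sum(forster_efficiency(r, r0) for r in distances) / len(distances)

# Option 2: average the relative transfer rate (r0/r)^6 first, then convert to an efficiency
mean_rate = sum((r0 / r) ** 6 for r in distances) / len(distances)
e_from_mean_rate = mean_rate / (1.0 + mean_rate)

print(e_mean_of_efficiencies, e_from_mean_rate)
```

Because the efficiency is a concave function of the transfer rate, the second option never gives a lower value than the first for the same set of distances.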

# Protein trajectory (open conformations)

Now, we will predict FRET efficiency values for a protein trajectory. The procedure is exactly the same; the only difference is that the Universe object now contains multiple structures.<br>
We will use the same residues and chromophores as in the previous example (repeated here for clarity).

In [8]:
res1 = 452
chain_res1 = 'A'
dye_1 = 'AlexaFluor 594'
linker_1 = 'C1R'

res2 = 637
chain_res2 = 'B'
dye_2 = 'AlexaFluor 568'
linker_2 = 'C1R'

# Experimental data file to compare our predictions against.
exp_data_file = '452_532_637_647N_APO.txt'

# Create universe for protein trajectory (3 different open conformations of Hsp90).
u_traj = MDAnalysis.Universe('test_systems/Hsp90/conf.pdb', 
                             'test_systems/Hsp90/Hsp90_open_all.xtc')

Create instance of the FRETpredict class

In [9]:
# Clustering cutoff used for the creation of the rotamer library. It can be 10, 20, or 30.
cutoff = 10

FRET = FRETpredict(protein=u_traj, residues=[res1, res2], temperature=293, 
                   chains=[chain_res1, chain_res2], 
                   donor=dye_1, acceptor=dye_2, electrostatic=True,
                   libname_1=f'{dye_1} {linker_1} cutoff{cutoff}',
                   libname_2=f'{dye_2} {linker_2} cutoff{cutoff}', 
                   output_prefix=f'tutorials/test/E{cutoff}_traj')

Run FRET Efficiency calculations

In [10]:
FRET.run()


Frame 1/3

Frame 2/3

Frame 3/3

Done.


Retrieve experimental data from file for the comparison

In [11]:
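filename = 'test_systems/Hsp90/MDA/histograms/{}'.format(exp_data_file)

if os.path.isfile(filename):
    # Read the whitespace-separated histogram (raw string avoids an invalid-escape warning)
    df = pd.read_csv(filename, skiprows=6, nrows=40, header=None, sep=r'\s+')
    # Experimental value: mean of column 0 weighted by column 1
    Ex = np.average(df[0], weights=df[1])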

Create DataFrame for the data 

In [12]:
results_traj = []

data_traj = pd.read_pickle(r'tutorials/test/E{:d}_traj-data-{:d}-{:d}.pkl'.format(cutoff, 
                                                                                  res1, res2))
results_traj.append({'res': '{:d}-{:d}'.format(res1, res2),
                     'k2': data_traj.loc['k2', 'Average'],
                     'Ex': Ex,
                     'Es': data_traj.loc['Estatic', 'Average'],
                     'Ed': data_traj.loc['Edynamic1', 'Average'],
                     'Ed2': data_traj.loc['Edynamic2', 'Average'],
                     'diffS': np.abs(Ex-data_traj.loc['Estatic', 'Average']),
                     'diffD': np.abs(Ex-data_traj.loc['Edynamic1', 'Average']), 
                     'diffD2': np.abs(Ex-data_traj.loc['Edynamic2', 'Average'])})
        
results_traj_df = pd.DataFrame(results_traj).set_index('res')

In [13]:
# Show results
results_traj_df

| res | k2 | Ex | Es | Ed | Ed2 | diffS | diffD | diffD2 |
|---|---|---|---|---|---|---|---|---|
| 452-637 | 0.603643 | 0.592377 | 0.023215 | 0.025707 | 0.950529 | 0.569163 | 0.56667 | 0.358151 |


**For trajectory data or multiple protein structures, the best averaging method is "Dynamic2" (Ed2).**
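
The comparison behind this statement can be reproduced from the trajectory results shown above. The sketch below is a minimal illustration in plain Python: it recomputes the absolute differences from the experimental value and picks the estimate closest to it. The variable names are chosen for the example.

```python
# Values copied from the trajectory results table
Ex = 0.592377
predictions = {'Es': 0.023215, 'Ed': 0.025707, 'Ed2': 0.950529}

# Absolute deviation of each prediction from the experimental value
diffs = {name: abs(Ex - value) for name, value in predictions.items()}

# The estimate with the smallest deviation
best = min(diffs, key=diffs.get)

for name, diff in diffs.items():
    print(f'{name}: {diff:.6f}')
print('closest to experiment:', best)
```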