# <span style="color:teal">RBFE Network Tutorial - Setup</span>
This is the RBFE (Relative Binding Free Energy) analysis Jupyter notebook for the September 2022 CCPBioSim Workshop.
It includes core as well as <span style="color:purple">extra</span> options.

**<span style="color:teal">Authors</span>**
 - Anna Herz (@annamherz)
 - This is adapted from the FEP BioSimSpace Tutorial written by Jenke Scheen (https://github.com/michellab/BioSimSpaceTutorials/tree/main/04_fep).

**<span style="color:teal">Reading Time:</span>**
~ XX

##### <span style="color:teal">Required knowledge</span> 
 - Basic python
 - Part 1 of this workshop (Introduction to Alchemistry with BioSimSpace)
    - this should include basic knowledge of the principles behind RBFE

##### <span style="color:teal">Learning objectives</span>  
 - Set up an FEP (Free Energy Perturbation) pipeline using BioSimSpace and SOMD.
 - Analyse and plot the results.

### <span style="color:teal">Table of Contents</span>  
1. [Introduction](#intro)
    1.1 [RBFE: A Brief Overview](#theory)     
    1.2 [Implementation in BioSimSpace](#implementation)    
    1.3 [Loading the System](#loading)      
2. [Setup of RBFE simulations](#abfe)   
    2.1 [Theory: A Brief Overview](#theory)     
    2.2 [Implementation in BioSimSpace](#implementation)    
    2.3 [Loading the System](#loading)     

<span style="color:pink">Further reading</span> notes refer to sections of the [LiveCoMS Best Practices for Alchemical Free Energy Calculations](https://livecomsjournal.org/index.php/livecoms/article/view/v2i1e18378).

**<span style="color:teal">Jupyter Cheat Sheet</span>**
- To run the currently highlighted cell and move focus to the next cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To run the currently highlighted cell and keep focus in the same cell, hold <kbd>Ctrl</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;
- You can find the full documentation at [biosimspace.org](https://biosimspace.org).


In [1]:
# import libraries
import BioSimSpace as BSS
import os
import glob
import csv
import numpy as np
from alchemlyb.visualisation import plot_mbar_overlap_matrix as _plot_mbar_overlap_matrix
from alchemlyb.visualisation import plot_ti_dhdl as _plot_ti_dhdl
import math
import pandas as pd

# define all the folder locations
main_folder =  os.getcwd()
print(main_folder)
# other folders


## <span style="color:teal">Analysis</span>

Once we have our results, we want to analyse them. The basics of this analysis were covered in the introduction to alchemistry part of this workshop. Because analysing every run now would take some time, the runs have already been analysed and the RBFE results saved as CSV files in the 'output' folder. It is best practice to run repeats of each simulation, which is why there are multiple results files, one per repeat.
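
To illustrate why repeats matter, the sketch below averages the ΔΔG of one perturbation over three repeats and reports the standard error of the mean. The numbers are hypothetical placeholders rather than values from the workshop output files, and nothing is assumed about the layout of the real CSV files.

```python
import numpy as np

# hypothetical ddG values (kcal/mol) for one perturbation from three repeats
repeat_ddG = np.array([-0.52, -0.61, -0.43])

mean_ddG = repeat_ddG.mean()
# standard error of the mean, using the sample standard deviation (ddof=1)
sem_ddG = repeat_ddG.std(ddof=1) / np.sqrt(len(repeat_ddG))

print(f"ddG = {mean_ddG:.2f} +/- {sem_ddG:.2f} kcal/mol")
```

A large spread between repeats is a sign that the perturbation may not be converged and deserves a closer look.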

First we will look at 

In [None]:
# import from path (github clone) as a conda install is not available yet for freenrgworkflows.
import sys
sys.path.insert(1, '../freenrgworkflows/networkanalysis/')

import networkanalysis
import experiments
import stats


In [None]:
# analyse all the runs in the network and place them into a dictionary for later
bound_pmf_dict = {}  # for the initial results
free_pmf_dict = {}
bound_matrix_dict = {}  # for the overlap matrix
free_matrix_dict = {}
diff_dict = {}  # for the result for that transformation

# we will also create a list of all the perturbation names for the analysis as well
perturbations = []

# each line of network.dat starts with the two ligand names of one perturbation
with open("./execution_model/network.dat", "r") as network_file:
    for line in network_file:
        if not line.strip():
            continue  # skip blank lines
        lig_0, lig_1 = line.split()[:2]
        pert = f"{lig_0}~{lig_1}"
        # analyse the free and bound legs, then take the difference to get the RBFE
        pmf_free, overlap_matrix_free = BSS.FreeEnergy.Relative.analyse(f'outputs/SOMD/{pert}/free')
        pmf_bound, overlap_matrix_bound = BSS.FreeEnergy.Relative.analyse(f'outputs/SOMD/{pert}/bound')
        freenrg_rel = BSS.FreeEnergy.Relative.difference(pmf_bound, pmf_free)
        bound_pmf_dict[pert] = pmf_bound
        bound_matrix_dict[pert] = overlap_matrix_bound
        free_pmf_dict[pert] = pmf_free
        free_matrix_dict[pert] = overlap_matrix_free
        diff_dict[pert] = freenrg_rel
        perturbations.append(pert)


Some of the results in this set are artificially poor so that we can look at them more closely. Keep these outputs to hand and consider them carefully.


In [None]:
# check the overlap for each perturbation
for pert in perturbations:
    bound_overlap = bound_matrix_dict[pert]
    BSS.FreeEnergy.Relative.check_overlap(bound_overlap)
    free_overlap = free_matrix_dict[pert]
    BSS.FreeEnergy.Relative.check_overlap(free_overlap)
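
To make the idea behind an overlap check concrete, the sketch below inspects the first off-diagonal of an overlap matrix, which holds the overlap between neighbouring lambda windows. It is an illustration only and does not reproduce what `check_overlap` does internally. The matrix is hypothetical, the helper `adjacent_overlaps` is not part of BioSimSpace, and the 0.03 cutoff is a commonly quoted rule of thumb that is assumed here.

```python
import numpy as np

def adjacent_overlaps(overlap_matrix):
    """Return the overlap between each pair of neighbouring lambda windows."""
    matrix = np.asarray(overlap_matrix, dtype=float)
    # the first off-diagonal holds the overlap of window i with window i+1
    return np.diagonal(matrix, offset=1)

# hypothetical 4-window overlap matrix (each row sums to 1)
example_matrix = [
    [0.70, 0.28, 0.02, 0.00],
    [0.28, 0.50, 0.21, 0.01],
    [0.02, 0.21, 0.76, 0.01],
    [0.00, 0.01, 0.01, 0.98],
]

threshold = 0.03  # assumed rule-of-thumb cutoff, not a BioSimSpace setting
overlaps = adjacent_overlaps(example_matrix)
# collect the window pairs whose overlap falls below the cutoff
poor_pairs = [(i, i + 1) for i, value in enumerate(overlaps) if value < threshold]
print(overlaps, poor_pairs)
```

In this made-up example the last two windows barely overlap, which would suggest adding lambda windows in that region.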

##### <span style="color:teal">Comparing to experimental binding affinities</span>

Next, we want to visualise our results and compare them to experimental values.
In this example, the experimental binding affinities for TYK2 are reported as inhibition constants (Ki), which can be converted to binding free energies using ΔG = RT ln Ki. It is important at this stage to make sure that the units match (kcal/mol).

The binding affinities have already been converted in the experimental.csv file. We will use these values to calculate the experimental ΔΔG for each perturbation.
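
As a worked example of the conversion, the sketch below applies ΔG = RT ln Ki to a single value. The function name `ki_to_dg`, the temperature of 298.15 K, and the 1 nM example are assumptions made for illustration; the conditions used for the values in experimental.csv are not stated here.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def ki_to_dg(ki_molar, temperature=298.15):
    """Convert an inhibition constant in mol/L to a binding free energy in kcal/mol."""
    # Ki is taken relative to a 1 M standard concentration, so the logarithm is unitless
    return R_KCAL * temperature * math.log(ki_molar)

dg_example = ki_to_dg(1e-9)  # hypothetical 1 nM binder
print(f"{dg_example:.2f} kcal/mol")
```

A tighter binder has a smaller Ki and therefore a more negative ΔG.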

In [None]:
# create a dictionary for the experimental values
exper_val_dict = {}
# now we can also create a dictionary with all the experimental values for the perturbations
exper_diff_dict = {}

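# open the file with the experimental values (assumes no header row)
with open('experimental.csv', 'r') as exper_file:
    for line in exper_file:
        if not line.strip():
            continue  # skip blank lines
        lig = line.split(",")[0]
        exper = float(line.split(",")[1])  # convert from text so it can be subtracted later
        exper_err = 0.4  # uncertainty assigned to every experimental value
        exper_val_dict[lig] = (exper, exper_err)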

# calculate the experimental RBFEs
for line in open("./execution_model/network.dat", "r"):
    lig_0 = line.split()[0]
    lig_1 = line.split()[1]
    pert = f"{lig_0}~{lig_1}"
    exper_ddG = exper_val_dict[lig_1][0] - exper_val_dict[lig_0][0]
    exper_err = math.sqrt(math.pow(exper_val_dict[lig_0][1], 2) + math.pow(exper_val_dict[lig_1][1], 2))
    exper_value = (exper_ddG, exper_err)
    exper_diff_dict.update({pert:exper_value})
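
For reference, the quantities computed in the cell above can be written as follows, where $\sigma_0$ and $\sigma_1$ are the uncertainties assigned to the experimental values of the two ligands (0.4 kcal/mol each in this notebook):

$$
\Delta\Delta G^{\mathrm{exp}}_{0 \rightarrow 1} = \Delta G^{\mathrm{exp}}_{1} - \Delta G^{\mathrm{exp}}_{0},
\qquad
\sigma_{\Delta\Delta G} = \sqrt{\sigma_0^{2} + \sigma_1^{2}}
$$

With $\sigma_0 = \sigma_1 = 0.4$ kcal/mol, this gives $\sigma_{\Delta\Delta G} = \sqrt{0.32} \approx 0.57$ kcal/mol for every perturbation.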

In [None]:
# now we have diff_dict, exper_diff_dict, and a list of all our perturbations.
# create a data frame with all of these and their errors

data = []

for pert in perturbations:
    # .value() extracts the plain number from each BioSimSpace energy object
    data.append([pert, diff_dict[pert][0].value(), diff_dict[pert][1].value(), exper_diff_dict[pert][0], exper_diff_dict[pert][1]])

df = pd.DataFrame(data, columns=['perturbation','calc_deltadeltaG','calc_err','exper_deltadeltaG','exper_err'])

print(df)

# this can also be saved as a csv

Now, we can plot our results against the experimental data.

In [None]:
# plot a scatter plot
import matplotlib.pyplot as plt

plt.rc('font', size=12)
fig, ax = plt.subplots(figsize=(8,8))

# get these based on which column the data is in.
x = np.array(df.iloc[:, 3])  # experimental ΔΔG
y = np.array(df.iloc[:, 1])  # calculated ΔΔG

scatterplot = [plt.scatter(x, y, zorder=10)]

#plotting error bars
y_er = np.array(df.iloc[:, 2])
x_er = np.array(df.iloc[:, 4])
plt.errorbar(x , y,
            yerr=y_er,
            # xerr=x_er,   # uncomment this line to show experimental error bars;
                        # they are hidden by default as they can overcrowd the plot.
            ls="none",
            lw=0.5, 
            capsize=2,
            color="black",
            zorder=5
            )

# shade the region within 0.5 kcal/mol of the y = x line, starting with the inner ±0.25 band:
plt.fill_between(
                x=[-100, 100], 
                y2=[-100.25,99.75],
                y1=[-99.75, 100.25],
                lw=0, 
                zorder=-10,
                alpha=0.3,
                color="grey")
# upper band (+0.25 to +0.5 kcal/mol above y = x):
plt.fill_between(
                x=[-100, 100], 
                y2=[-99.5,100.5],
                y1=[-99.75, 100.25],
                lw=0, 
                zorder=-10,
                color="grey", 
                alpha=0.2)
# lower band (-0.5 to -0.25 kcal/mol below y = x):
plt.fill_between(
                x=[-100, 100], 
                y2=[-100.25,99.75],
                y1=[-100.5, 99.5],
                lw=0, 
                zorder=-10,
                color="grey", 
                alpha=0.2)

# get the bounds. This can be done with min/max or simply by hand.
# gather the experimental and calculated values into a single flat list
all_freenrg_values = df['exper_deltadeltaG'].tolist() + df['calc_deltadeltaG'].tolist()

min_lim = min(all_freenrg_values)
max_lim = max(all_freenrg_values)

# for a scatterplot we want the axis ranges to be the same. 
plt.xlim(min_lim*1.3, max_lim*1.3)
plt.ylim(min_lim*1.3, max_lim*1.3)

plt.axhline(color="black", zorder=1)
plt.axvline(color="black", zorder=1)

#plt.xlabel('ΔΔG for experimental (kcal/mol)')
#plt.ylabel('ΔΔG for calculated (kcal/mol)')
# raw strings stop Python treating the LaTeX backslashes as escape sequences
plt.ylabel(r"Computed $\Delta\Delta$G$_{bind}$ / kcal$\cdot$mol$^{-1}$")
plt.xlabel(r"Experimental $\Delta\Delta$G$_{bind}$ / kcal$\cdot$mol$^{-1}$")

plt.savefig('r2_correlation.png')
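
A scatter plot like this is often summarised with a couple of numbers. The sketch below computes the mean unsigned error (MUE) and the squared Pearson correlation with NumPy. The arrays are hypothetical stand-ins for the `exper_deltadeltaG` and `calc_deltadeltaG` columns of `df`, and this is one simple way to get these statistics rather than the method used by freenrgworkflows.

```python
import numpy as np

# hypothetical values in kcal/mol; in the notebook these would come from df
exper = np.array([-1.0, 0.5, 1.2, -0.3])
calc = np.array([-0.8, 0.9, 1.0, -0.6])

mue = np.mean(np.abs(calc - exper))          # mean unsigned error
pearson_r = np.corrcoef(exper, calc)[0, 1]   # Pearson correlation coefficient
r_squared = pearson_r ** 2

print(f"MUE = {mue:.2f} kcal/mol, R2 = {r_squared:.2f}")
```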


We can also plot a bar graph of the results.
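
A minimal sketch of such a bar graph is given below, with computed and experimental ΔΔG values side by side for each perturbation. The perturbation names and all numbers are hypothetical; in the notebook they would be taken from the columns of `df`.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical values (kcal/mol); in the notebook these would be columns of df
labels = ["ligA~ligB", "ligB~ligC", "ligA~ligC"]
calc, calc_err = [-0.8, 0.9, 0.2], [0.2, 0.3, 0.2]
exper, exper_err = [-1.0, 0.5, -0.5], [0.57, 0.57, 0.57]

positions = np.arange(len(labels))
width = 0.35  # width of each bar

fig, ax = plt.subplots(figsize=(8, 4))
# offset the two sets of bars so that they sit next to each other
calc_bars = ax.bar(positions - width / 2, calc, width,
                   yerr=calc_err, capsize=2, label="Computed")
exper_bars = ax.bar(positions + width / 2, exper, width,
                    yerr=exper_err, capsize=2, label="Experimental")

ax.axhline(color="black", lw=0.5)
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_ylabel(r"$\Delta\Delta$G$_{bind}$ / kcal$\cdot$mol$^{-1}$")
ax.legend()
fig.tight_layout()
```

Grouping the bars by perturbation makes it easy to spot which transformations deviate most from experiment.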

 <span style="color:pink">Further reading</span>: Section 8.7 of the LiveCoMS Best Practices guide.
