# Requirements
This tutorial uses the DynaSig-ML Python package. Installation instructions are available at: https://dynasigml.readthedocs.io/en/latest/install_guide.html

The following Python libraries are also required ('json' is part of the Python standard library and needs no installation):
- 'nrgten'
- 'numpy'
- 'json'
- 'sklearn'
- 'matplotlib'
- 'scipy'

In [None]:
from dynasigml.dynasig_df import DynaSigDF
from dynasigml.dynasig_ml_model import DynaSigML_Model
from nrgten.encom import ENCoM  # used by encom_flexaid() further down
import numpy as np
import json
from sklearn.metrics import r2_score, roc_auc_score

### Preparing files for DynaSig-ML

After running FlexAID, we selected the 10 poses with the top CF values for each ligand docked to the mu opioid receptor.

The CF value of each pose is written in the file 'mu_data.txt', and the pose files are stored in the directory './mu_poses/'.

The Emax values were obtained from the ChEMBL database and are also listed in 'mu_data.txt'. The ChEMBL ID for the mu opioid receptor is CHEMBL233.

The tables with all Emax values are in the files './mu_data.txt' and './kappa_data.txt'.
In this first step we define the following variables:
   * file_names:
           List with the path to each pose file
   * exp_data:
           List of Emax and CF values for each pose (must be in the same order as file_names)
   * list_of_atypes:
           List of the .atomtypes files of all ligands
   * list_of_masses:
           List of the .masses files of all ligands; each file specifies the ligand centroid and the name of the mass
   * output_name_file:
           String with the name of the data frame file generated by DynaSigDF
   * exp_labels:
           List of strings with the labels for the values present in exp_data
   * betas:
           Values of beta we are going to test

In [None]:
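file_names=[]
exp_data=[]
list_of_atypes=[]
list_of_masses=[]
# one label per value stored in each exp_data row; only the second column is read below,
# so to also keep CF, append that column to each row and add 'CF' here
exp_labels=['Emax']
betas=[1]
output_name_file='./dsdf/mu_dsdf'  # data frame file written by DynaSigDF
with open('./mu_data.txt','r') as t1:
    for line in t1:
        fields=line.split()
        if not fields:
            continue  # skip blank lines
        pose_file=fields[0]
        stem=pose_file[:-4]  # pose file name without its 4-character extension
        file_names.append('./mu_files/{}'.format(pose_file))
        exp_data.append([float(fields[1])])  # second column, stored as a number
        list_of_atypes.append('./mu_files/{}_flexaid.atomtypes'.format(stem))
        list_of_masses.append('./mu_files/{}_flexaid.masses'.format(stem))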
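
Because every list has to follow the order of file_names, it can help to check that they are aligned before building the data frame. The sketch below is a hypothetical helper (check_inputs is not part of DynaSig-ML) and assumes that each exp_data row should hold exactly one value per entry of exp_labels. The toy file names and values are made up for illustration.

```python
def check_inputs(file_names, exp_data, exp_labels, *per_file_lists):
    """Raise ValueError if the per-pose lists are not aligned; return the pose count."""
    n = len(file_names)
    if len(exp_data) != n:
        raise ValueError('exp_data has {} rows for {} files'.format(len(exp_data), n))
    for i, row in enumerate(exp_data):
        # assumption: one experimental value per label
        if len(row) != len(exp_labels):
            raise ValueError('row {} has {} values for {} labels'.format(
                i, len(row), len(exp_labels)))
    for extra in per_file_lists:
        if len(extra) != n:
            raise ValueError('a per-file list has {} entries for {} files'.format(
                len(extra), n))
    return n


# Toy inputs (hypothetical names and values, not taken from mu_data.txt)
toy_files = ['ligA_0.pdb', 'ligA_1.pdb']
toy_data = [[50.0], [50.0]]
toy_atypes = ['ligA_0_flexaid.atomtypes', 'ligA_1_flexaid.atomtypes']
n_checked = check_inputs(toy_files, toy_data, ['Emax'], toy_atypes)
```

With the real lists, the call would be check_inputs(file_names, exp_data, exp_labels, list_of_atypes, list_of_masses).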

### Reading the FlexAID matrix

This function reads the FlexAID contact matrix from 'FlexAID.dat' and inverts the sign of each contact interaction value.

In [None]:
def flex_aid_matrix():
    # 41x41 matrix of pairwise interaction values, indexed by the integer pairs in the file
    matrix=np.zeros((41, 41))

    with open("FlexAID.dat","r") as t1:
        for line in t1:
            # each line has the form 'i-j=value'
            pair, value=line.split("=")
            i, j=pair.split("-")
            matrix[int(i)][int(j)]=-float(value)  # store the value with inverted sign
    return matrix
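
To make the parsing step easier to follow, here is a minimal pure-Python sketch of the same logic applied to a few in-memory lines. The helper names (parse_interaction_line, fill_matrix) and the toy lines are hypothetical; the only thing taken from the function above is the 'i-j=value' line format and the sign inversion.

```python
def parse_interaction_line(line):
    """Split a line of the form 'i-j=value' into (i, j, value)."""
    # split on '=' first so that a minus sign in the value is left untouched
    pair, value = line.strip().split('=')
    i, j = pair.split('-')
    return int(i), int(j), float(value)


def fill_matrix(lines, size=41):
    """Build a size x size nested list holding the sign-inverted values."""
    matrix = [[0.0] * size for _ in range(size)]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        i, j, value = parse_interaction_line(line)
        matrix[i][j] = -value
    return matrix


# Toy lines (made-up values, not taken from FlexAID.dat)
toy_lines = ['1-2=0.5', '2-1=0.5', '3-3=-1.25']
toy_matrix = fill_matrix(toy_lines)
```

Only the entry [i][j] is written for each line, so the resulting matrix is symmetric only if the file lists both orderings of every pair. It is worth checking this on the real file before passing the matrix to a model.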

### Running DynaSig-ML

We use the DynaSigDF class to create a data frame called 'mu_dsdf' that contains the dynamical signatures and the experimental Emax values. This data frame will then be used to train a LASSO model.

In [None]:
dsdf = DynaSigDF(file_names, exp_data, exp_labels, output_name_file,
                 beta_values=betas,
                 added_atypes_list=list_of_atypes,
                 added_massdef_list=list_of_masses)

### Running the leave-one-ligand-out performance test

At the time of this publication, the leave-one-out test is not implemented in DynaSig-ML, so we define functions that build the training set and the test set for every ligand. In this validation, all 10 poses of one ligand form the test set and the poses of all other ligands form the training set; the analysis is repeated for each ligand.
The functions we are going to define are:

In [None]:
def encom_flexaid(filename, added_atypes='', added_massdef=''):
    # build an ENCoM model that uses the FlexAID interaction matrix
    print(filename)
    matrix=flex_aid_matrix()
    return ENCoM(filename, interact_mat=matrix, added_atypes=added_atypes, added_massdef=added_massdef)

output_name_file='./dsdf/mu_flexaid_dsdf'
exp_labels=['Emax']
# 13 beta values, evenly spaced on a log scale from e**-3 to e**3
betas=[np.e**(x/2) for x in range(-6, 7)]

dsdf = DynaSigDF(file_names, exp_data, exp_labels, output_name_file,
                 beta_values=betas,
                 added_atypes_list=list_of_atypes,
                 added_massdef_list=list_of_masses,
                 models=[encom_flexaid], models_labels=["ENCoM_flexaid"])
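
The leave-one-ligand-out split described above can be sketched as follows. This is a minimal illustration, not a DynaSig-ML function: leave_one_ligand_out is a hypothetical helper, and it assumes you can supply one ligand identifier per pose, in the same order as file_names. The toy identifiers are made up.

```python
def leave_one_ligand_out(ligand_ids):
    """Yield (held_out_ligand, train_indices, test_indices) once per ligand."""
    # unique ligands in order of first appearance
    ligands = list(dict.fromkeys(ligand_ids))
    for held_out in ligands:
        # every pose of the held-out ligand goes to the test set
        test_idx = [i for i, lig in enumerate(ligand_ids) if lig == held_out]
        # every pose of the other ligands goes to the training set
        train_idx = [i for i, lig in enumerate(ligand_ids) if lig != held_out]
        yield held_out, train_idx, test_idx


# Toy example: one ligand identifier per pose (hypothetical names)
toy_ids = ['ligA', 'ligA', 'ligB', 'ligB', 'ligC']
splits = list(leave_one_ligand_out(toy_ids))
```

Each tuple gives the row indices to select from the dynamical signatures and the Emax values for one round of training and testing. scikit-learn's LeaveOneGroupOut splitter produces the same kind of grouped split if you prefer a library implementation.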
