# Model-guided ligand optimization 


## This notebook exemplifies our framework for model-guided ligand optimization.

### Optimization Objectives for Sequences: 
#### 1) Selective potency at GCGR 
#### 2) Selective potency at GLP-1R
#### 3) High potency at both receptors

## Imports

In [1]:
from pathlib import Path
from peptide_models.optimize_main import main

## Running the Software: Parameter Specifications

### To effectively use the software, please configure three key parameters:
- __Number of Generations__ (``NUM_GENERATIONS``): the total number of optimization cycles to run.
- __Number of Seed Sequences__ (``NUM_SEEDS``): the number of seed sequences from which samples will be generated in each optimization cycle. This sets the initial pool for each generation.
- __Sequences to Save__ (``NUM_SEQ_TO_SAVE``): how many of the best sequences, ranked by predicted potency, are retained in each of the three groups (the three optimization objectives listed above) in every generation.

In [2]:
NUM_GENERATIONS = 3
NUM_SEEDS = 10
NUM_SEQ_TO_SAVE = 50
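
To make the roles of these three parameters concrete, here is a minimal toy sketch of a generational loop. It is not the `peptide_models` implementation: `toy_score`, `mutate`, `toy_optimize`, `samples_per_seed`, and the starting sequence are all hypothetical, the score is a placeholder rather than a model prediction, and only a single group of sequences is tracked instead of three.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(seq):
    """Hypothetical stand-in for a predicted potency: fraction of alanines."""
    return seq.count("A") / len(seq)


def mutate(seq, rng):
    """Substitute one randomly chosen position with a random residue."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def toy_optimize(start_pool, num_generations, num_seeds, num_seqs_to_save,
                 samples_per_seed=20, rng_seed=0):
    """Run a toy generational loop and return the saved sequences per generation."""
    rng = random.Random(rng_seed)
    pool = list(start_pool)
    history = []
    for _ in range(num_generations):
        # Pick the top-scoring sequences of the current pool as seeds.
        seeds = sorted(pool, key=toy_score, reverse=True)[:num_seeds]
        # Generate samples from every seed; keep the seeds themselves too.
        samples = set(seeds)
        for s in seeds:
            for _ in range(samples_per_seed):
                samples.add(mutate(s, rng))
        # Retain only the best-scoring sequences of this generation.
        ranked = sorted(sorted(samples), key=toy_score, reverse=True)
        best = ranked[:num_seqs_to_save]
        history.append(best)
        pool = best
    return history


# Arbitrary starting sequence, chosen only for illustration.
history = toy_optimize(["ACDEFGHIKLMNPQRSTVWY"],
                       num_generations=3, num_seeds=10, num_seqs_to_save=50)
for gen, best in enumerate(history):
    print(gen, len(best), round(toy_score(best[0]), 3))
```

In this sketch, `num_seeds` bounds how many sequences are used to generate new samples, `num_seqs_to_save` bounds how many are carried forward, and `num_generations` sets how often the cycle repeats. Because the seeds are kept among the candidates, the best score cannot decrease from one generation to the next.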

In [3]:
# Path to the training data (FASTA file)
path2training_data = Path('../data/FASTA_files/training_data_msa.fasta')
# Path to the directory with the trained models
path2models = Path('../models')

# Path to save FASTA files with the samples
out_path_FASTA = Path('../results', 'ligand_design', 'samples', 'FASTA_files')
# Path to save Excel spreadsheets with predictions
out_path_predictions = Path('../results', 'ligand_design', 'samples', 'predictions')
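
Before starting a long run, it can help to confirm that the input paths resolve from the notebook's working directory. The following is a minimal sketch using only `pathlib`. The helper `check_inputs` is hypothetical and not part of `peptide_models`, and the `*.h5` pattern is an assumption based on the model file names printed in the log further down. Only the inputs are checked, since the log suggests that output directories are created automatically.

```python
from pathlib import Path


def check_inputs(training_data, models_dir, model_pattern="*.h5"):
    """Return a list of human-readable problems; empty if nothing is wrong."""
    problems = []
    if not Path(training_data).is_file():
        problems.append(f"training data file not found: {training_data}")
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        problems.append(f"models directory not found: {models_dir}")
    elif not any(models_dir.glob(model_pattern)):
        # The directory exists but holds no files matching the assumed pattern.
        problems.append(f"no {model_pattern} files in: {models_dir}")
    return problems


# Same relative paths as in the cell above.
for problem in check_inputs("../data/FASTA_files/training_data_msa.fasta",
                            "../models"):
    print("WARNING:", problem)
```

An empty result means both inputs were found. Relative paths such as `../models` are resolved against the current working directory, so a warning here often just means the notebook was started from a different folder.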

In [6]:
main(training_data_path=path2training_data,
     path_to_trained_models=path2models,
     out_path_ensemble_predictions=out_path_predictions,
     out_path_fasta_files=out_path_FASTA,
     num_seeds=NUM_SEEDS,
     num_generations=NUM_GENERATIONS,
     num_seqs_to_save=NUM_SEQ_TO_SAVE)

Created path to store data.
Created path to store data.
Loading data ...
Created path to store data.
Created path to store data.
Training set size:125,
Initial number of samples:71304,
Number of repetitions:1665,
Number of samples in the 0 generation:69639
Data set size:69639
Loading models...
Number of models in the ensemble:12
loading model:../models/model_multi0.h5
loading model:../models/model_multi1.h5
loading model:../models/model_multi10.h5
loading model:../models/model_multi11.h5
loading model:../models/model_multi2.h5
loading model:../models/model_multi3.h5
loading model:../models/model_multi4.h5
loading model:../models/model_multi5.h5
loading model:../models/model_multi6.h5
loading model:../models/model_multi7.h5
loading model:../models/model_multi8.h5
loading model:../models/model_multi9.h5
Predicting for samples ...
Total no. of sequences :69639
Found 5 peptides with high potency at both receptors.
Found 137 peptides with high potency against hGCGR.
Found 4144 peptides with