# Model Training and Evaluation 

This notebook provides a training guide for the ensemble of deep multi-task neural networks
described in our paper.

By default, training uses the provided dataset of human GCG/GLP-1 analogs. The steps below describe how to train on your own dataset instead.

## Imports

In [None]:
from pathlib import Path
import warnings
from peptide_models.train_main import main

warnings.filterwarnings('ignore')

## To train the ensemble of deep multi-task neural network models with your own dataset, please follow steps 1-3 and run the code cells below.

### 1) Please update the data path to reflect the location of your training data file. For instance:

``data_path = Path('../data/<my_training_data.xlsx>')``

### 2) Please be aware that the data should be stored in an Excel spreadsheet, using the '.xlsx' extension, and organised as follows:
- __column 1: header - alias__ 
column with names of your molecules (string)
- __column 2: header - sequence__
amino acid sequences of your molecules (string) 
- __column 3: header - EC50_LOG_T1__
1st target values in the log scale (float)
- __column 4: header - EC50_LOG_T2__
2nd target values in the log scale (float)
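
Before training on your own data, it can help to check that the rows match the layout above. The following is a minimal sketch, not part of `peptide_models`: the helper `find_format_problems` and the example rows are hypothetical placeholders. Whether the training code tolerates missing target values is not stated here, so the finite-number check is an assumption you may want to relax.

```python
import math

# Column headers taken from the layout described above.
REQUIRED_COLUMNS = ["alias", "sequence", "EC50_LOG_T1", "EC50_LOG_T2"]


def find_format_problems(records):
    """Return a list of readable problems; an empty list means the rows look fine.

    `records` is a list of dicts, one per spreadsheet row.
    Hypothetical helper, not part of peptide_models.
    """
    problems = []
    for row_number, row in enumerate(records, start=1):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {row_number}: missing columns {missing}")
            continue
        # Names and sequences must be non-empty strings.
        for column in ("alias", "sequence"):
            value = row[column]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {row_number}: '{column}' must be a non-empty string")
        # Assumption: both targets must be finite numbers (no NaN, no text).
        for column in ("EC50_LOG_T1", "EC50_LOG_T2"):
            value = row[column]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                problems.append(f"row {row_number}: '{column}' must be a finite number")
    return problems


# Made-up placeholder rows, for illustration only.
example_rows = [
    {"alias": "analog_1", "sequence": "HSQGTF", "EC50_LOG_T1": -9.5, "EC50_LOG_T2": -8.1},
    {"alias": "analog_2", "sequence": "", "EC50_LOG_T1": float("nan"), "EC50_LOG_T2": -7.3},
]
for problem in find_format_problems(example_rows):
    print(problem)
```

If pandas and an Excel engine such as openpyxl are installed, the rows of a real file can be obtained with `pd.read_excel(data_path).to_dict("records")` and passed to the same helper.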

### 3) Please modify the output path to designate where your training results, including the models, will be saved. For instance:

``output_path = Path('../my_results', 'training')`` 


In [None]:
# Path to the training dataset
data_path = Path('../data/training_data.xlsx')
# Path to store the trained models and training metadata
output_path = Path('../results', 'training')
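
Because a full training run is long, it may be worth failing early on a mistyped path. The sketch below uses only `pathlib`; the helper `check_paths` is hypothetical, and pre-creating the output folder assumes this does not interfere with how the training code writes its results.

```python
from pathlib import Path


def check_paths(data_path: Path, output_path: Path) -> None:
    """Fail early on an unusable input path and make sure the output folder exists."""
    if data_path.suffix.lower() != ".xlsx":
        raise ValueError(f"Expected an .xlsx file, got: {data_path.name}")
    if not data_path.is_file():
        raise FileNotFoundError(f"Training data not found: {data_path.resolve()}")
    # Assumption: creating the folder in advance is harmless for the training code.
    output_path.mkdir(parents=True, exist_ok=True)
```

Calling `check_paths(data_path, output_path)` right after defining the two paths raises an error immediately if the spreadsheet is missing or has the wrong extension.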

## Default Configuration

Please note that the default configuration of the ensemble includes 12 multi-task neural network models, and the training results are assessed with 6-fold cross-validation to ensure robustness and reliability.
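
To make the 6-fold scheme concrete, here is a generic sketch of how a dataset of 125 molecules (the size of the provided dataset) can be divided into six train/validation splits. It illustrates k-fold cross-validation in general and makes no claim about how `peptide_models` builds its folds or how the 12 models relate to them; the function name `k_fold_indices` and the seed value are hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds, seed):
    """Yield (train_indices, validation_indices) for each fold of a shuffled split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads the remainder so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 125 samples and 6 folds mirror the numbers in this notebook; the seed is arbitrary.
for fold_number, (train, validation) in enumerate(k_fold_indices(125, 6, seed=0), start=1):
    print(f"fold {fold_number}: {len(train)} train / {len(validation)} validation")
```

With 125 samples and 6 folds, five folds hold out 21 molecules and one holds out 20, and every molecule is used for validation exactly once.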

## Training Duration 

Training on the provided dataset, which consists of 125 human GCG/GLP-1 analogs, is estimated to take approximately 2 hours on a 2.3 GHz 8-core Intel Core i9 processor.

In [None]:
main(out_path=output_path, training_data_path=data_path, seed=21)