# BioPKS Pipeline Tutorial

In this tutorial, we explain the main settings and parameters that users can tune when running BioPKS Pipeline.

In [None]:
from biopks_pipeline import biopks_pipeline
from DORA_XGB import DORA_XGB
import warnings
warnings.simplefilter('ignore')

import os
os.chdir('../../BioPKS-Pipeline/notebooks')
print("Now working in:", os.getcwd())

Users need to specify the following input parameters when attempting to synthesize a molecule *in silico* with BioPKS Pipeline:

`pathway_sequence`: this can be either `['pks']` or `['pks', 'bio']`. It determines whether only PKSs are used to synthesize a target molecule, or whether post-PKS modifications are allowed as well. The goal of BioPKS Pipeline is to expand the accessible chemical space by combining PKSs with regular, monofunctional enzymes, so using both types of enzymes is likely to give a higher chance of reaching one's target chemical than using either alone.

`target_smiles`: the SMILES string of the desired target chemical. Regardless of whether the input SMILES string contains stereochemical information about a molecule or not, BioPKS Pipeline will automatically convert the input SMILES to its canonical form and remove any chirality. This is because in this work, we have focused only on getting the correct 2D structure of a target molecule rather than its 3D structure. We anticipate future releases of our tool to be able to achieve the correct 3D structure of a target.

`target_name`: the name of the input target. This will be used to save BioPKS Pipeline's results in the folder `data/results_logs`.

`pks_release_mechanism`: the mechanism by which the PKS product is released. This can be either `'cyclization'` or `'thiolysis'`.

In [1]:
pathway_sequence = ['pks','bio']  # choose between ['pks'] or ['pks','bio']
target_smiles = 'C(C1C(C(C(C(=O)O1)O)O)O)O'
target_name = 'gluconic_lactone'
pks_release_mechanism = 'thiolysis' # choose from 'cyclization' or 'thiolysis'
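
Because these arguments only accept a few values, it can help to check them before building the pipeline object. The sketch below is one way to do that. `check_inputs` and the two `ALLOWED_*` constants are hypothetical helpers written for this tutorial, not part of the `biopks_pipeline` package, and the allowed values are taken only from the descriptions above.

```python
# Hypothetical pre-flight check; not part of the biopks_pipeline API.
# Allowed values are copied from the parameter descriptions in this tutorial.
ALLOWED_PATHWAY_SEQUENCES = (["pks"], ["pks", "bio"])
ALLOWED_RELEASE_MECHANISMS = ("cyclization", "thiolysis")


def check_inputs(pathway_sequence, target_smiles, target_name, pks_release_mechanism):
    """Raise ValueError if an argument falls outside the documented options."""
    if list(pathway_sequence) not in ALLOWED_PATHWAY_SEQUENCES:
        raise ValueError(f"pathway_sequence must be ['pks'] or ['pks', 'bio'], got {pathway_sequence!r}")
    if pks_release_mechanism not in ALLOWED_RELEASE_MECHANISMS:
        raise ValueError(f"pks_release_mechanism must be 'cyclization' or 'thiolysis', got {pks_release_mechanism!r}")
    if not target_smiles.strip():
        raise ValueError("target_smiles must be a non-empty SMILES string")
    if not target_name.strip():
        raise ValueError("target_name must be a non-empty string")


# The values used in this tutorial pass the check without raising.
check_inputs(["pks", "bio"], "C(C1C(C(C(C(=O)O1)O)O)O)O", "gluconic_lactone", "thiolysis")
```

This check only covers the option lists and empty strings. It does not verify that the SMILES string is chemically valid.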

In addition to the parameters above, users can specify further parameters via a config file. The config file used in this tutorial is `notebooks/input_config_file_tutorial_1.json`. The following parameters can be tuned within the config file:

`pks_starters_filepath`: path to the file listing the PKS starter units available to RetroTide (default: `../biopks_pipeline/retrotide/data/starters.smi`)

`pks_extenders_filepath`: path to the file listing the PKS extender units available to RetroTide (default: `../biopks_pipeline/retrotide/data/extenders.smi`)

`pks_starters`: list of PKS **starter units** to run BioPKS Pipeline with. Users can edit this list to control which starting acyl-CoA derivatives can be used when designing synthesis pathways for small molecules. In our manuscript, for non-aromatic molecules, we constrained the list of starter units to only malonyl-CoA ("mal"), methylmalonyl-CoA ("mmal"), methoxymalonyl-CoA ("mxmal"), hydroxymalonyl-CoA ("hmal"), and allylmalonyl-CoA ("allylmal"). This list of starter units can be written as: `["mal", "mmal", "mxmal", "hmal", "allylmal"]`. For aromatic molecules, however, we allowed all starter units to be used by BioPKS Pipeline. This can be enabled by writing "all" for the `pks_starters` field.

`pks_extenders`: similar to above, this is the list of PKS **extender units** to run BioPKS Pipeline with. Users can edit this list to control which extender acyl-CoA derivatives can be used when designing synthesis pathways for small molecules. In our manuscript, for non-aromatic molecules, we constrained the list of extender units to only malonyl-CoA ("mal"), methylmalonyl-CoA ("mmal"), methoxymalonyl-CoA ("mxmal"), hydroxymalonyl-CoA ("hmal"), and allylmalonyl-CoA ("allylmal"). This list of extender units can be written as: `["mal", "mmal", "mxmal", "hmal", "allylmal"]`. For aromatic molecules, however, we allowed all extender units to be used by BioPKS Pipeline. This can be enabled by writing "all" for the `pks_extenders` field.

`pks_similarity_metric`: chemical similarity metric to use for ranking RetroTide's PKS products. We recommend using `mcs_without_stereo`, which prioritizes the maximum common substructure (MCS) between an intermediate PKS molecule and the final, target product. In performing this comparison, we ignore any stereochemical considerations. Other options include `atompairs` and `atomatompath`. Users are also welcome to add their own chemical similarity metrics of interest in `biopks_pipeline/retrotide/retrotide.py`.

`non_pks_similarity_metric`: chemical similarity metric to use for ranking DORAnet's post-PKS products. We also recommend using `mcs_without_stereo` here. This metric enables BioPKS Pipeline to retrieve the most chemically similar post-PKS product with respect to the target chemical in the event that the final target is not reached.
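
To make the fields above concrete, the following is a minimal sketch of a config file built from a Python dictionary. The keys, defaults and recommended values are those described in this section. The actual `input_config_file_tutorial_1.json` may contain additional keys or different values, so treat this as an illustration and not as the file's real contents.

```python
import json

# Illustrative config only; the real tutorial config may differ.
# Keys and values are taken from the parameter descriptions above.
example_config = {
    "pks_starters_filepath": "../biopks_pipeline/retrotide/data/starters.smi",
    "pks_extenders_filepath": "../biopks_pipeline/retrotide/data/extenders.smi",
    # Constrained unit lists used for non-aromatic targets.
    "pks_starters": ["mal", "mmal", "mxmal", "hmal", "allylmal"],
    "pks_extenders": ["mal", "mmal", "mxmal", "hmal", "allylmal"],
    "pks_similarity_metric": "mcs_without_stereo",
    "non_pks_similarity_metric": "mcs_without_stereo",
}

# Serialize to JSON text and read it back to confirm it round-trips.
config_text = json.dumps(example_config, indent=4)
reloaded = json.loads(config_text)
print(config_text)
```

To use such a config, the JSON text would be saved to a file and its path passed as `config_filepath`.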

In [None]:
# Path to the config file, relative to the notebooks directory set above
config_filepath = 'input_config_file_tutorial_1.json'

In [None]:
post_pks_rxn_model = DORA_XGB.feasibility_classifier(cofactor_positioning = 'add_concat',
                                                     model_type = "spare")

In [None]:
biopks_pipeline_object = biopks_pipeline.biopks_pipeline(
                                             pathway_sequence = pathway_sequence,
                                             target_smiles = target_smiles,
                                             target_name = target_name,
                                             feasibility_classifier = post_pks_rxn_model,
                                             pks_release_mechanism = pks_release_mechanism,
                                             config_filepath = config_filepath)

In [None]:
### ----- Start synthesis -----
if __name__ == "__main__":
    biopks_pipeline_object.run_combined_synthesis(max_designs = 4)
    #biopks_pipeline_object.save_results_logs()