# DORAnet Example: Hybrid Pathways
Follow the presentation using the Colab notebook below!

Install doranet from PyPI using `pip`.

In [None]:
!pip install doranet

The `enzymatic` module wraps the assumptions required for an enzymatic network expansion. Similarly, the `synthetic` module can be used for a chemical network expansion. The `post_processing` module wraps functions that produce various kinds of output (PDFs, etc.).

In [None]:
import doranet.modules.enzymatic as enzymatic
import doranet.modules.synthetic as synthetic
import doranet.modules.post_processing as post_processing

Next, we select the molecules to be used for the expansions.

Molecules are written as SMILES (simplified molecular-input line-entry system) strings. A SMILES string describes the structure of a molecule. The SMILES of a specific molecule can be found on PubChem, Wikipedia, and many other websites.

https://pubchem.ncbi.nlm.nih.gov/

https://www.wikipedia.org/


Here we define the starters, helpers, and target as SMILES in sets. We can also use names of files which contain the SMILES. Each file can be a txt, csv, or other similar text file, and within the file each line is one SMILES.

For example: `starters = "starters.txt"`
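A minimal sketch of preparing such a file, assuming only the layout described above (plain text, one SMILES per line). The functions `write_smiles_file` and `read_smiles_file` are hypothetical helpers written for this example, not part of DORAnet; DORAnet reads the file itself when given its name.

```python
from pathlib import Path
import tempfile

def write_smiles_file(path, smiles):
    """Write one SMILES per line, the layout described above."""
    Path(path).write_text("\n".join(sorted(smiles)) + "\n")

def read_smiles_file(path):
    """Hypothetical helper (not part of DORAnet): read the file back as a set."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    starters_file = Path(tmp) / "starters.txt"
    write_smiles_file(starters_file, {"CCO", "CC(O)=O"})
    loaded = read_smiles_file(starters_file)

print(loaded)
```

Reading the file back is only a sanity check that each line holds exactly one SMILES.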

"helpers" are only used in chemical expansions. They are the molecules that can react with starters, but they can not be the only reactants of a reaction. Helpers are optional, but it should be noted that many chemical rules require common helpers like oxygen and water.

Helpers are also treated differently in the final PDF file: they are not drawn as molecule images but are noted in the text alongside the reaction names.
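The rule that helpers cannot be the only reactants can be illustrated with a small check. This is only a sketch of the idea: `has_non_helper_reactant` is a hypothetical function, not DORAnet's implementation, and it compares SMILES as plain strings, which assumes both sets use the same notation for each molecule.

```python
def has_non_helper_reactant(reactants, helpers):
    """Return True if at least one reactant is not a helper."""
    return any(smiles not in helpers for smiles in reactants)

helpers = {"O", "O=O"}  # water and oxygen

# Ethanol plus oxygen: a non-helper molecule takes part, so the rule is satisfied.
ok = has_non_helper_reactant({"CCO", "O=O"}, helpers)
# Water plus oxygen: only helpers, so the rule is not satisfied.
only_helpers = has_non_helper_reactant({"O", "O=O"}, helpers)
print(ok, only_helpers)
```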



In [None]:
user_starters = {   }  # try adding ethanol, 'CCO'
user_helpers = {  }      # try adding water, oxygen, 'O', 'O=O'
user_target = {  }        # try adding acetic acid, 'CC(O)=O'
job_name = "Acetic_acid_hybrid"
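The cell above leaves the sets empty for you to fill in. As a sketch, here is a filled-in version using the molecules suggested in the comments. One Python detail is worth knowing: braces with elements create a set, but empty braces create a dict, so an intentionally empty set should be written as `set()`.

```python
user_starters = {"CCO"}        # ethanol
user_helpers = {"O", "O=O"}    # water, oxygen
user_target = {"CC(O)=O"}      # acetic acid
job_name = "Acetic_acid_hybrid"

# Braces with elements create a set, but empty braces create a dict.
print(type(user_starters).__name__)
print(type({}).__name__)
print(type(set()).__name__)
```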

Next, a bio reaction network can be generated using the `enzymatic.generate_network()` function in the `"forward"` direction. The expansion depth is a single generation (`gen = 1`). The forward expansion here uses the enzymatic operators (reaction rules) that come with DORAnet.

The enzymatic operators are stored in a tsv file under `doranet/modules/enzymatic`. Each entry contains the UniProt IDs of the known reactions from which the operator was derived.

In [None]:
forward_network = enzymatic.generate_network(
    job_name = job_name,
    starters = user_starters,
    gen = 1,
    direction = "forward",
    )

A retrosynthetic network with chemical rules can be generated using the `synthetic.generate_network()` function in the `"retro"` direction, similar to the forward network above.  In this case, the "starter" molecule set is simply the target molecule.

In [None]:
retro_network = synthetic.generate_network(
    job_name = job_name,
    starters = user_target,
    helpers = user_helpers,
    gen = 1,
    direction = "retro",
    )

Next, the `post_processing.one_step()` function is called, which searches the two networks for connections between the starting set of molecules and the target. This function produces various kinds of output, including a PDF document.

For the best graph layout in the pathway PDF file, it is recommended to install pygraphviz and Graphviz. They can be installed using this single command:

`conda install conda-forge::pygraphviz`

Otherwise, a custom layout will be used for the PDF file, which may not handle complex pathways well.

If using pre-existing network files on disk, we can use the names of the network files here.

For example, `networks = {"Acetic_acid_hybrid__forward_saved_network"}`

In [None]:
post_processing.one_step(
    networks = {
        forward_network,
        retro_network
        },
    total_generations = 2,
    starters = user_starters,
    helpers = user_helpers,
    target = user_target,
    job_name = job_name,
    )

# Extra 1 Post-processing in Steps

The one-step post-processing function runs four steps internally:
1. `pretreat_networks`: combines multiple networks, sanitizes the reactions, and saves a JSON file on disk.
2. `pathway_finder`: searches for pathways and saves a txt file with all pathways on disk. Also saves files for the Reaxys query.
3. `pathway_ranking`: ranks pathways and saves a txt file with ranked pathways on disk.
4. `pathway_visualization`: saves a PDF file with all pathways on disk.

For a new run, it can be difficult to get everything right the first time. What if you want to redo the ranking using different weights but don't want to redo the pretreatment and pathway search? You can run the post-processing step by step instead of using the `one_step` function. This is also necessary if you are using Reaxys hits for ranking, because the Reaxys query and results must be handled manually between the steps.




```python
# Post processing in steps
post_processing.pretreat_networks(
    networks = {
        forward_network,
        retro_network,
        },
    total_generations = 2,
    starters = user_starters,
    helpers = user_helpers,
    job_name = job_name,
    )

post_processing.pathway_finder(
    starters = user_starters,
    helpers = user_helpers,
    target = user_target,
    search_depth = 2,
    max_num_rxns = 2,
    min_rxn_atom_economy = 0.5,
    job_name = job_name,
    )

post_processing.pathway_ranking(
    starters = user_starters,
    helpers = user_helpers,
    target = user_target,
    num_process = 2,
    job_name = job_name,
    )

post_processing.pathway_visualization(
    starters = user_starters,
    helpers = user_helpers,
    num_process = 2,
    job_name = job_name,
    )
```
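To redo only the ranking with different weights, one option is to start from the default weights listed in the `pathway_ranking` signature in Extra 3 and override the entries of interest. The values chosen below are arbitrary and for illustration only, and passing a complete dictionary (rather than only the changed entries) is an assumption made here to stay on the safe side. The `pathway_ranking` call is left as a comment because it needs the files written by the earlier steps.

```python
# Default ranking weights as listed in the pathway_ranking signature (Extra 3).
default_weights = {
    "reaction_thermo": 2,
    "number_of_steps": 4,
    "by_product_number": 2,
    "atom_economy": 1,
    "salt_score": 0,
    "in_reaxys": 0,
    "coolness": 0,
}

# Illustrative choice: emphasize atom economy over the number of steps.
custom_weights = {**default_weights, "atom_economy": 4, "number_of_steps": 1}
print(custom_weights)

# With the files from pretreat_networks and pathway_finder already on disk,
# only the ranking (and then the visualization) needs to be rerun, for example:
# post_processing.pathway_ranking(
#     starters = user_starters,
#     helpers = user_helpers,
#     target = user_target,
#     weights = custom_weights,
#     num_process = 2,
#     job_name = job_name,
#     )
```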



# Extra 2 Thermodynamic Filters

Thermodynamic calculators can be used during the expansion to filter out reactions whose thermodynamic change is above the limit set by `max_rxn_thermo_change`. DORAnet does not contain such calculators, but users can supply their own. The calculators work differently for chemical and bio expansions.

In a chemical expansion, the calculator function takes the SMILES of a molecule and returns its thermodynamic value (for example, the enthalpy of formation of this molecule).

In a bio expansion, the calculator function takes a dictionary, which contains the reactants and products of a reaction, and produces the thermodynamic change of this reaction (for example, dG of this reaction).

```python
# Chemical expansion
def mol_dH(SMILES): # example thermodynamic calculator for chemical expansion
    # do something to get the thermodynamic value of this molecule
    return 0  # value of this molecule

retro_network = synthetic.generate_network(
    job_name = job_name,
    starters = user_target,
    helpers = user_helpers,
    gen = 1,
    direction = "retro",
    molecule_thermo_calculator = mol_dH,
    max_rxn_thermo_change = 15,
    )

# Bio expansion
def rxn_dG(rxn_dict): # example thermodynamic calculator for bio expansion
    reactants = rxn_dict["reactants"]
    products = rxn_dict["products"]
    # do something to get the thermodynamic change of this reaction
    return 0  # value of this reaction

retro_network = enzymatic.generate_network(
    job_name = job_name,
    starters = {"OC(=O)C(=O)CCCO"},
    gen = 1,
    direction = "retro",
    rxn_thermo_calculator = rxn_dG,
    max_rxn_thermo_change = 0,
)
```
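As a sketch of what a molecule-level calculator could look like, the example below builds one from a lookup table and shows how a reaction's change follows from molecule values. Everything here is an assumption for illustration: `make_lookup_calculator` and `reaction_change` are hypothetical helpers, the numbers are placeholders rather than real thermodynamic data, stoichiometric coefficients are ignored, and the way DORAnet combines molecule values internally is not shown in this notebook.

```python
def make_lookup_calculator(table, default=0.0):
    """Build a molecule calculator from a SMILES -> value table.

    Molecules missing from the table get `default` (an assumption of this sketch).
    """
    def calculator(smiles):
        return table.get(smiles, default)
    return calculator

def reaction_change(reactants, products, calculator):
    """Sum over products minus sum over reactants."""
    return sum(map(calculator, products)) - sum(map(calculator, reactants))

# Placeholder numbers for illustration only, not measured or computed data.
placeholder_values = {"CCO": -10.0, "O=O": 0.0, "CC(O)=O": -30.0, "O": -20.0}
mol_dH = make_lookup_calculator(placeholder_values)

# CCO + O=O -> CC(O)=O + O
change = reaction_change(["CCO", "O=O"], ["CC(O)=O", "O"], mol_dH)
max_rxn_thermo_change = 15
print(change, change <= max_rxn_thermo_change)
```

A function such as `mol_dH` has the same shape as the calculator passed to `molecule_thermo_calculator` above: it takes a SMILES and returns a number.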

# Extra 3 Optional Arguments



The listings below show all the arguments accepted by the network generation and post-processing functions.

synthetic.generate_network
```
generate_network(
    job_name="default_job",
    starters=False,
    helpers=False,
    gen=1,
    direction="forward",
    molecule_thermo_calculator=None,
    max_rxn_thermo_change=15,
    max_atoms=None,  # Use a dictionary of atom symbols and the max number of atoms, for example: {"C": 20, "O": 10} means any product molecule can have at most 20 carbon and 10 oxygen atoms.
    allow_multiple_reactants="default",  # By default True in forward expansion, False in retro expansion. If False, a reactant can react with itself or helpers, but not with other reactants.
    targets=None,  # String or list, set, etc. At the end of expansion, check if targets are in the network.
    )
```
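The meaning of `max_atoms` can be illustrated with a small check on per-element atom counts. This is a sketch only: `within_atom_limits` is a hypothetical function, the atom counts are entered by hand instead of being derived from SMILES, and treating unlisted elements as unrestricted is an assumption.

```python
def within_atom_limits(atom_counts, max_atoms):
    """Return True if no listed element exceeds its maximum count.

    Elements absent from max_atoms are treated as unrestricted
    (an assumption of this sketch).
    """
    return all(
        atom_counts.get(symbol, 0) <= limit
        for symbol, limit in max_atoms.items()
    )

max_atoms = {"C": 20, "O": 10}

acetic_acid = {"C": 2, "H": 4, "O": 2}   # C2H4O2
large_product = {"C": 24, "H": 50}       # a C24 hydrocarbon, above the carbon limit

small_ok = within_atom_limits(acetic_acid, max_atoms)
large_ok = within_atom_limits(large_product, max_atoms)
print(small_ok, large_ok)
```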

enzymatic.generate_network

Note: there are no user-defined helpers in an enzymatic expansion. DORAnet comes with a list of cofactors for bio rules, and they act similarly to helpers.


```
generate_network(
    job_name="default_job",
    starters=False,
    gen=1,
    direction="forward",
    rxn_thermo_calculator=None,
    max_rxn_thermo_change=15,
    max_atoms=None,  # For example: {"C": 20, "O": 10}
    allow_multiple_reactants=False,
    targets=None,  # string or list, set, etc.
)
```

post_processing



```
pretreat_networks(
    networks=None,
    total_generations=1,
    starters=None,
    helpers=None,
    job_name="default_job_name",
    remove_pure_helpers_rxns=False, # If True, reactions with only helpers as reactants are removed.
    sanitize=True, # If True, molecules that cannot be reached from the starters/helpers within total_generations are removed.
    transform_enols_flag=False, # If True, a reaction product that is an enol is transformed into its keto form.
    molecule_thermo_calculator=None, # Can be used to calculate the thermodynamic change of the enol transformation.
)

pathway_finder(
    starters=None,
    helpers=None,
    target=None,
    search_depth=1, # Should not be larger than the total_generations in pretreat_networks
    max_num_rxns=1, # Max number of reactions in a pathway
    min_rxn_atom_economy=0.3, # Min atom economy of any reaction in a pathway. Between 0-1.
    job_name="default_job_name",
    consider_name_difference=True, # If True, two reactions that differ only in name are considered different reactions.
)

pathway_ranking(
    starters=None,
    helpers=None,
    target=None,
    weights=None, # Default ranking weights:
                  # {"reaction_thermo": 2,
                  #  "number_of_steps": 4,
                  #  "by_product_number": 2,
                  #  "atom_economy": 1,
                  #  "salt_score": 0,
                  #  "in_reaxys": 0,
                  #  "coolness": 0}
    num_process=1, # Number of processes for multi-processing
    reaxys_result_name=None, # Name of the csv file
    job_name="default_job_name",
    cool_reactions=None,
    molecule_thermo_calculator=None,  # For by-product calculator
    max_rxn_thermo_change=15,
)

pathway_visualization(
    starters=None,
    helpers=None,
    num_process=1, # Number of processes for multi-processing
    reaxys_result_name="default",
    job_name="default_job_name",
    exclude_smiles=None, # A set, list, etc. Pathways with such molecules won't be visualized.
    reaxys_rxn_color="blue",
    normal_rxn_color="black",
)

```
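To give a feel for the `min_rxn_atom_economy` threshold, here is a worked example using the textbook definition of atom economy: the molar mass of the desired product divided by the total molar mass of the reactants. Whether DORAnet handles helpers and stoichiometry in exactly this way is not stated in this notebook, so treat this as an illustration of the quantity rather than of the library's calculation. `atom_economy` is a hypothetical helper.

```python
def atom_economy(target_mass, reactant_masses):
    """Molar mass of the desired product divided by the total molar mass of reactants."""
    return target_mass / sum(reactant_masses)

# Approximate molar masses in g/mol.
ethanol, oxygen, acetic_acid = 46.07, 32.00, 60.05

# CCO + O=O -> CC(O)=O + O (ethanol oxidized to acetic acid, water as by-product)
economy = atom_economy(acetic_acid, [ethanol, oxygen])
print(round(economy, 3))  # 0.769
print(economy >= 0.3)     # True: above the default min_rxn_atom_economy of 0.3
```

The remaining mass of the reactants ends up in the water by-product.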



# Extra 4 Reaxys Batch Query

Reference for submitting batch query: https://service.elsevier.com/app/answers/detail/a_id/26151/supporthub/reaxys/p/10958/

`pathway_finder` generates a txt file containing all reactions in all pathways, which can be used as the batch query for Reaxys. It also generates a csv file with 0s as placeholders for the Reaxys results.

If you have access to Reaxys, you can upload the batch query and copy the result log into the csv file. The completed csv file can then be used in the pathway ranking and visualization steps.

# Extra 5 Install DORAnet on Your Machine


It is recommended to install DORAnet in a virtual environment. Conda is a popular tool for managing virtual environments.

1. Install Miniconda or Anaconda

  Miniconda   https://docs.anaconda.com/free/miniconda/

  Anaconda    https://docs.anaconda.com/free/anaconda/install/


2. Run the installer

  Follow the on-screen instructions


3. Use conda from the command line

  Linux and macOS: open the built-in terminal

  Windows: open the Anaconda Powershell Prompt


4. Create and Activate an Environment

  If you already have an environment to install DORAnet in:

  `conda activate your_env_name`

  If you want to create a new environment to install DORAnet in:

  `conda create -n your_env_name python=3.10`

  `conda activate your_env_name`


5. Install DORAnet

  `pip install doranet`

  Update DORAnet:

  `pip install doranet --upgrade`
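After installing or upgrading, one way to confirm which version is present in the active environment is to query the package metadata from Python. This sketch uses only the standard library; `installed_version` is a helper defined here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Prints the version string if doranet is installed in this environment, else None.
print(installed_version("doranet"))
```

If this prints `None`, the package is not installed in the environment that is currently active.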


# Extra 6 Common Issues

If you are running post-processing on Windows, it is recommended to run your code under `if __name__ == "__main__":`. This helps avoid potential issues with multiprocessing.



```python
if __name__ == "__main__":
    post_processing.one_step(...)
```

