# Tutorial C - search parameters

In this tutorial you will learn how to control and modify the retrosynthesis search algorithm.

After the completion of this tutorial, you will know:
* How to modify the stock
* How to add custom stock rules
* How to use common search parameters
* How to select and use different search algorithms

We will start by installing packages from PyPI

In [None]:
!pip install --quiet aizynthfinder
!pip install --quiet reaction-utils[models]
!pip install --ignore-installed Pillow==9.0.0

### Setup

As in the basic tutorial, we will work with public data and models

In [None]:
!mkdir --parents data && download_public_data data

We will also set up the `aizynthfinder` interface in the same way as in the basic tutorial...




In [None]:
import logging
from aizynthfinder.utils.logging import setup_logger
setup_logger(logging.INFO)

from aizynthfinder.aizynthfinder import AiZynthFinder
from rdkit import Chem
from rdkit.Chem import Descriptors

In [None]:
finder = AiZynthFinder("data/config.yml")
finder.stock.select_all()
finder.expansion_policy.select("uspto")

... and set it up to do retrosynthesis on amenamevir

In [None]:
finder.target_smiles = "Cc1cccc(C)c1N(CC(=O)Nc1ccc(-c2ncon2)cc1)C(=O)C1CCS(=O)(=O)CC1"
display(finder.target_mol.rd_mol)
finder.tree_search(show_progress=True)

In [None]:
finder.build_routes()
finder.routes.reaction_trees[0].to_image()

In [None]:
for leaf in finder.routes.reaction_trees[0].leafs():
    print(leaf.smiles, Descriptors.ExactMolWt(leaf.rd_mol))

### Modify the stock

We see that some of the starting materials contain several rings and are rather heavy. Perhaps you have a use case where you want to constrain the starting materials much more tightly without modifying your stock file.

We will look at a few different ways to do this.

First, you can use built-in functionality to constrain the stock by:
- Amount in the stock
- Price from the stock
- Count of elements

The amount and price constraints require a stock that contains this information. Because we use a version of ZINC without it, we will look at the last option.

In [None]:
finder.stock.set_stop_criteria({"counts": {"C": 8}})

Here we constrain the stock to molecules with eight carbon atoms or fewer.
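
To make the meaning of the `counts` criterion concrete, here is a minimal pure-Python sketch of the check it describes. It assumes the value for each element is an inclusive upper bound, as the sentence above states; the helper `within_count_limits` and the atom lists are hypothetical illustrations and not part of `aizynthfinder`.

```python
from collections import Counter


def within_count_limits(atom_symbols, limits):
    """Return True if no element exceeds its maximum count in `limits`.

    Hypothetical helper: elements that are absent from `limits` are unrestricted.
    """
    counts = Counter(atom_symbols)
    return all(counts[element] <= maximum for element, maximum in limits.items())


limits = {"C": 8}
ethanol = ["C", "C", "O"]  # heavy atoms of CCO
decane = ["C"] * 10        # heavy atoms of CCCCCCCCCC

print(within_count_limits(ethanol, limits))
print(within_count_limits(decane, limits))
```

Ethanol passes the check, while decane, with ten carbon atoms, does not.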

In [None]:
finder.prepare_tree() # This is important to reset the previous search!
finder.tree_search(show_progress=True)
finder.build_routes()
finder.routes.reaction_trees[0].to_image()

This is a bit clunky and imprecise, so instead we will build our own stock class that implements a constraint based on mass.

We need to subclass `StockQueryMixin`, which provides some default functionality for a stock. Some of this functionality can be overridden, but the only method that must be implemented is `__contains__`.

This method takes a single argument, a `Molecule` object internal to `aizynthfinder`, and should return True if the molecule is in stock or False otherwise.

You can use the `.rd_mol` or `.smiles` properties of the molecule object to access the RDKit molecule object or the SMILES string.

In [None]:
from aizynthfinder.context.stock.queries import StockQueryMixin

class MassCriteriaStock(StockQueryMixin):

    def __init__(self, mass_limit=180):
        self._mass_limit = mass_limit

    def __contains__(self, mol):
        return Descriptors.ExactMolWt(mol.rd_mol) < self._mass_limit

This stock will only return True for molecules with a mass less than a given limit.
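
The membership test can be tried in isolation, because Python's `in` operator calls `__contains__`. The sketch below runs without RDKit or `aizynthfinder`. `ToyMolecule` and `ToyMassStock` are hypothetical stand-ins for the `Molecule` class and the stock above, and the masses are approximate values entered by hand instead of being computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for aizynthfinder's Molecule, with a precomputed mass."""
    smiles: str
    mass: float


class ToyMassStock:
    """Same logic as MassCriteriaStock, but reading the mass from the stand-in."""

    def __init__(self, mass_limit=180):
        self._mass_limit = mass_limit

    def __contains__(self, mol):
        return mol.mass < self._mass_limit


stock = ToyMassStock()
benzene = ToyMolecule("c1ccccc1", 78.05)                      # approximate mass
caffeine = ToyMolecule("Cn1cnc2c1c(=O)n(C)c(=O)n2C", 194.08)  # approximate mass

print(benzene in stock)   # `in` dispatches to ToyMassStock.__contains__
print(caffeine in stock)
```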

Let's load it into our `finder` object and use it in the search.

In [None]:
mass_stock = MassCriteriaStock()
finder.stock.load(mass_stock, "mass")
finder.stock.select("mass")

In [None]:
finder.prepare_tree() # This is important to reset the previous search!
finder.tree_search(show_progress=True)
finder.build_routes()
finder.routes.reaction_trees[0].to_image()

We can also combine our custom stock with the ZINC stock, so that a molecule counts as in stock only if it is below the mass limit and also found in ZINC

In [None]:
class MassCriteriaStock2(StockQueryMixin):

    def __init__(self, molecule_stock, mass_limit=180):
        self._molecule_stock = molecule_stock
        self._mass_limit = mass_limit

    def __contains__(self, mol):
        if Descriptors.ExactMolWt(mol.rd_mol) >= self._mass_limit:
            return False
        return mol in self._molecule_stock

mass_stock2 = MassCriteriaStock2(finder.stock["zinc"])
finder.stock.load(mass_stock2, "mass2")
finder.stock.select("mass2")

In [None]:
finder.prepare_tree() # This is important to reset the previous search!
finder.tree_search(show_progress=True)
finder.build_routes()
finder.routes.reaction_trees[0].to_image()


**Exercises**

- Change the mass limit and explore its effect
- Implement a stock class that constrains the number of rings in the starting material

### Modify search parameters

Now we will explore some common search parameters
- Number of iterations
- Search depth
- Search width

and see how they affect the search
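
As a rough intuition for how these three parameters interact, the toy calculation below bounds the size of a search tree. It rests on a simplifying assumption that is not taken from `aizynthfinder`: every node has at most `width` children and the tree has at most `depth` levels below the root. The function `max_tree_nodes` is a hypothetical helper, and the values 50, 6 and 100 are the default width, depth and iteration limit that appear later in this tutorial.

```python
def max_tree_nodes(width, depth):
    """Upper bound on the node count of a tree with at most `width`
    children per node and `depth` levels below the root."""
    return sum(width ** level for level in range(depth + 1))


small = max_tree_nodes(2, 3)      # 1 + 2 + 4 + 8 nodes
default = max_tree_nodes(50, 6)   # width 50, depth 6

print(small)
print(default > 100)  # compare with an iteration limit of 100
```

Under this assumption, the number of possible nodes is far larger than the iteration limit, so a search can only visit a small part of the tree. This is one reason why raising the depth or width often needs to be paired with more iterations.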

For this we will use the regular ZINC stock and a new molecule that `aizynthfinder` has more trouble breaking down

In [None]:
finder.stock.select("zinc")

In [None]:
finder.target_smiles = "Cc1ccc(F)c(C(=O)NC(CNC(=O)Cn2c(=O)[nH]c3nc(F)c(F)cc32)C)c1"
display(finder.target_mol.rd_mol)

With the default search settings, we do not find any solved routes.

In [None]:
# These are the default parameters, in case you want to restore the earlier state
# finder.config.search.time_limit = 120
# finder.config.search.iteration_limit = 100
# finder.config.search.max_transforms = 6
finder.prepare_tree()
finder.tree_search(show_progress=True)
finder.build_routes()
finder.extract_statistics()

We will start by increasing the number of iterations in the search. For this it is also advisable to effectively disable the time limit by setting it to a large value

In [None]:
finder.config.search.time_limit = 3600
finder.config.search.iteration_limit = 200
# finder.config.search.max_transforms = 6
finder.prepare_tree()
finder.tree_search(show_progress=True)
finder.build_routes()
finder.extract_statistics()

Try increasing the iteration limit further...

In [None]:
for limit in [300, 400, 500, 1000]:
    finder.config.search.iteration_limit = limit
    finder.prepare_tree()
    finder.tree_search(show_progress=True)
    finder.build_routes()
    print("Number of solved routes: ", finder.extract_statistics()["number_of_solved_routes"])

Display the first route and try to understand why `aizynthfinder` cannot break down the molecule

In [None]:
finder.routes.images[0]

Next, we will try to increase the search depth

In [None]:
finder.config.search.time_limit = 3600
finder.config.search.iteration_limit = 300
finder.config.search.max_transforms = 12
finder.prepare_tree()
finder.tree_search(show_progress=True)
finder.build_routes()
finder.extract_statistics()

Adjust the maximum search depth and the iteration limit, and display the routes. Try to figure out why `aizynthfinder` cannot break down this compound into commercial starting materials


Changing the search width is a bit more involved, because it depends on the expansion model we use. Here, we will adjust it for the template-based model that we are using, but be aware that other expansion models might work differently

In [None]:
finder.expansion_policy["uspto"].cutoff_number

In [None]:
finder.expansion_policy["uspto"].cutoff_number = 100
finder.config.search.time_limit = 3600
finder.config.search.iteration_limit = 300
finder.config.search.max_transforms = 12
finder.prepare_tree()
finder.tree_search(show_progress=True)
finder.build_routes()
finder.extract_statistics()
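
The cells above change several settings in place, so it is easy to lose track of the current state. One possible way to keep experiments isolated is a small context manager that restores the previous values on exit. This is a generic Python sketch, not an `aizynthfinder` feature: `temporary_settings` is a hypothetical helper, and the `SimpleNamespace` below only stands in for `finder.config.search`, using the default values listed earlier.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_settings(target, **overrides):
    """Set attributes on `target`, then restore the previous values on exit."""
    previous = {name: getattr(target, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(target, name, value)
        yield target
    finally:
        for name, value in previous.items():
            setattr(target, name, value)


# Stand-in for finder.config.search, holding the defaults listed earlier
search = SimpleNamespace(time_limit=120, iteration_limit=100, max_transforms=6)

with temporary_settings(search, iteration_limit=300, max_transforms=12):
    inside = (search.iteration_limit, search.max_transforms)
after = (search.iteration_limit, search.max_transforms)

print(inside, after)
```

Since the tutorial assigns these settings as plain attributes, the same pattern should in principle work with `temporary_settings(finder.config.search, iteration_limit=300)` wrapped around the search calls.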

### Search algorithm

The default search algorithm in `aizynthfinder` is Monte Carlo Tree Search (MCTS), but there are other alternatives available.

Here we will explore two alternatives:
- [Retro*](https://arxiv.org/abs/2006.15820)
- [Multi-objective MCTS](https://www.sciencedirect.com/science/article/pii/S2667318525000066)

We will start with Retro*, which requires a trained model that scores potential solutions that the search produces

In [None]:
!wget https://github.com/MolecularAI/PaRoutes/raw/refs/heads/main/publication/retrostar_value_model.pickle -O retrostar_value_model.pickle
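
For intuition about why a scoring model is needed, recall that the linked paper describes Retro* as an A*-like best-first search. The toy below is textbook A* on a plain graph, not `aizynthfinder`'s implementation, which works on a more elaborate tree of molecules and reactions. The names `graph`, `estimates` and `best_first_search` are made up for illustration, and the `estimates` dictionary merely stands in for the trained value model.

```python
import heapq


def best_first_search(start, is_goal, expand, estimate):
    """A*-style search: always expand the node with the lowest sum of
    cost so far and estimated remaining cost."""
    frontier = [(estimate(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path, cost
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was found already
        for child, step_cost in expand(node):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(child, float("inf")):
                best_cost[child] = new_cost
                priority = new_cost + estimate(child)
                heapq.heappush(frontier, (priority, new_cost, child, path + [child]))
    return None, float("inf")


# Toy graph: node -> list of (child, step cost); all values are made up
graph = {
    "target": [("A", 1.0), ("B", 1.0)],
    "A": [("stock", 5.0)],
    "B": [("C", 1.0)],
    "C": [("stock", 1.0)],
}
# Stand-in for the trained model: estimated remaining cost per node
estimates = {"target": 2.0, "A": 5.0, "B": 2.0, "C": 1.0, "stock": 0.0}

path, cost = best_first_search(
    "target",
    is_goal=lambda node: node == "stock",
    expand=lambda node: graph.get(node, []),
    estimate=lambda node: estimates[node],
)
print(path, cost)
```

The estimates steer the search towards the branch through `B`, so the expensive branch through `A` is never expanded.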

Then we change the search algorithm in the configuration of our `finder` object

In [None]:
finder.config.search.algorithm = "aizynthfinder.search.retrostar.search_tree.SearchTree"
finder.config.search.algorithm_config = {
    "molecule_cost": {
        "cost": "aizynthfinder.search.retrostar.cost.RetroStarCost",
        "model_path": "retrostar_value_model.pickle"
    }
}
finder.prepare_tree()
finder.tree

We will return to amenamevir and the default search parameters

In [None]:
finder.target_smiles = "Cc1cccc(C)c1N(CC(=O)Nc1ccc(-c2ncon2)cc1)C(=O)C1CCS(=O)(=O)CC1"
finder.config.search.iteration_limit = 100
finder.config.search.max_transforms = 6
finder.expansion_policy["uspto"].cutoff_number = 50
finder.tree_search(show_progress=True)

In [None]:
finder.build_routes()
finder.routes.images[0]

**Exercise**

Run a search with both MCTS and Retro* and compare the output of the `finder.extract_statistics()` method.

Next, we will set up a multi-objective MCTS search with two objectives:

- The fraction of starting material in stock
- The average template occurrence as a simple proxy for route quality

Take it as an exercise to try out other objectives. In principle, any of the scores that we explored in the previous tutorial can be used as an objective

In [None]:
from aizynthfinder.context.scoring import FractionInStockScorer, AverageTemplateOccurrenceScorer
scorer1 = FractionInStockScorer(finder.config)
scorer2 = AverageTemplateOccurrenceScorer(
    finder.config,
    #scaler_params={"name": "squash", "slope": 0.001, "xoffset": 5000, "yoffset": 0}
)
finder.scorers.load(scorer1)
finder.scorers.load(scorer2)

The `AverageTemplateOccurrenceScorer` is an unbounded scorer, on a completely different scale from the `FractionInStockScorer`. But that should not matter for the MO-MCTS algorithm.

However, you can try uncommenting the `scaler_params` line in the cell above, which suggests a sigmoid-like function to scale the scorer between 0 and 1.

Next, we will set up the algorithm. The search algorithm is selected simply by choosing "mcts", because `aizynthfinder` will figure out from the other settings that we want a multi-objective search.
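
To get a feeling for what such a scaling does, the sketch below evaluates a standard logistic curve with the parameter values from the commented-out line. The formula is an assumption: it is a common sigmoid form, and the exact expression that `aizynthfinder` uses for its "squash" scaler may differ. The function `squash` defined here is a local illustration, not the library's.

```python
import math


def squash(value, slope, xoffset, yoffset):
    """Logistic curve centred on `xoffset`; `yoffset` shifts the result down.

    Assumed form for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-slope * (value - xoffset))) - yoffset


params = {"slope": 0.001, "xoffset": 5000, "yoffset": 0}

for occurrence in (0, 5000, 20000):
    print(occurrence, round(squash(occurrence, **params), 3))
```

With these parameters the curve passes through 0.5 at an occurrence of 5000 and approaches 0 and 1 for very rare and very common templates, respectively.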

In principle we only need to set `search_rewards` to a list of names of scorers. But since we ran Retro* above, we need to update the default settings of the other parameters for MCTS as well.

In [None]:
finder.config.search.algorithm = "mcts"
finder.config.search.algorithm_config = {
    "search_rewards": ["fraction in stock", "average template occurrence"],
    "C": 1.4,
    "default_prior": 0.5,
    "use_prior": True,
    "prune_cycles_in_search": True,
    "immediate_instantiation": (),
    "mcts_grouping": None,
    "search_rewards_weights": [],
}
finder.prepare_tree()
finder.tree

We will run retrosynthesis for amenamevir with default settings, but feel free to try other target compounds and/or settings

In [None]:
finder.target_smiles = "Cc1cccc(C)c1N(CC(=O)Nc1ccc(-c2ncon2)cc1)C(=O)C1CCS(=O)(=O)CC1"
finder.config.search.iteration_limit = 100
finder.config.search.max_transforms = 6
finder.expansion_policy["uspto"].cutoff_number = 50
finder.tree_search(show_progress=True)

When extracting routes from a multi-objective search it is also advantageous to extract routes on the Pareto front(s) of the objectives used in the search.

This can be accomplished by providing a list of scorer names to the `scorer` argument of the `build_routes` method.

In [None]:
finder.build_routes(
    scorer=["fraction in stock", "average template occurrence"]
)
finder.extract_statistics()

If you check the scores of the routes, you will see that two scores are computed. All of the extracted routes are solved, but the "average template occurrence" shows some variance.

In [None]:
import pandas as pd
pd.DataFrame(finder.routes.all_scores)

We can also plot these routes on a two-dimensional plot for the two objectives.

In [None]:
from aizynthfinder.interfaces.gui.utils import pareto_fronts_plot
pareto_fronts_plot(finder.routes)
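
To make the notion of a Pareto front concrete, the following pure-Python sketch picks out the non-dominated points from a list of score pairs, assuming both objectives are to be maximised. The function `pareto_front` is a hypothetical helper and the score pairs are made up for illustration; they are not output from the search above.

```python
def pareto_front(points):
    """Return the points that no other point dominates (all objectives maximised)."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (fraction in stock, average template occurrence) pairs
scores = [(1.0, 120.0), (1.0, 800.0), (0.5, 2000.0), (0.5, 300.0), (0.0, 50.0)]
front = pareto_front(scores)
print(front)
```

Only two of the five points survive: each of the others is beaten or matched on both objectives by at least one other point.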

**Exercise**

Run MO-MCTS with other objectives and/or target compounds and plot the Pareto fronts of the found solutions.

That is all for this tutorial!