# Virtual Reaction Enumeration

Starting from a core scaffold, we can use the RDKit to perform virtual reaction enumeration: apply one reaction to a list of reagents in silico and collect the products as a library for further screening.


Packages required for this notebook include `rdkit`, `pandas`, and `matplotlib` (used by the pandas plotting calls).


In [None]:
# Import required libraries
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdChemReactions

In [None]:
# Load core scaffold: a pyrazole carboxylic acid bearing an aminobenzisoxazole
core_scaffold_smiles = 'Cc1nn(c2ccc3onc(N)c3c2)c(C(O)=O)c1'
core_scaffold = Chem.MolFromSmiles(core_scaffold_smiles)

In [None]:
core_scaffold

In [None]:
# Load amine substituents from CSV
amine_df = pd.read_csv('amine_substituents.csv')

amine_df.head(20)

In [None]:
# Convert SMILES to molecule objects
amine_df['mol'] = amine_df['SMILES'].apply(Chem.MolFromSmiles)

# Create molecule grid display
mols = amine_df['mol'].tolist()
legends = amine_df['Name'].tolist()

# Display grid of amine reagents
Draw.MolsToGridImage(mols, molsPerRow=4, legends=legends, subImgSize=(200, 150))

In [None]:
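# Set up the reaction: aryl carboxylic acid + amine nitrogen >> aryl amide
# Note: [N:4] matches any aliphatic nitrogen, so a reagent with several
# nitrogens can give several products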
smirks_pattern = '[c:1][C:2](=O)[OH].[N:4]>>[c:1][C:2](=O)[N:4]'
rxn = rdChemReactions.ReactionFromSmarts(smirks_pattern)
rxn
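The pattern above accepts any aliphatic nitrogen as the coupling partner. Here is a minimal sketch of a stricter variant, assuming we only want neutral primary or secondary amines that are not already acylated. The names `selective_smirks`, `selective_rxn`, and `couple` are made up for this illustration, and benzoic acid stands in for the scaffold to keep the example small.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Assumption (not part of the original notebook): only N-H amines that are
# not already part of an amide or thioamide should react.
selective_smirks = (
    '[c:1][C:2](=[O:3])[OX2H1]'
    '.[NX3;H1,H2;!$(N-C=[O,S]):4]'
    '>>[c:1][C:2](=[O:3])[N:4]'
)
selective_rxn = rdChemReactions.ReactionFromSmarts(selective_smirks)


def couple(acid_smiles, amine_smiles):
    """Return the sorted, unique product SMILES for one acid/amine pair."""
    acid = Chem.MolFromSmiles(acid_smiles)
    amine = Chem.MolFromSmiles(amine_smiles)
    unique = set()
    for product_set in selective_rxn.RunReactants((acid, amine)):
        for mol in product_set:
            Chem.SanitizeMol(mol)
            # Canonical SMILES lets us drop duplicate products
            unique.add(Chem.MolToSmiles(mol))
    return sorted(unique)


print(couple('OC(=O)c1ccccc1', 'C1COCCN1'))   # morpholine: secondary amine
print(couple('OC(=O)c1ccccc1', 'C1CNCCN1'))   # piperazine: two equivalent N-H sites
print(couple('OC(=O)c1ccccc1', 'CN1CCOCC1'))  # N-methylmorpholine: tertiary amine
print(couple('OC(=O)c1ccccc1', 'CC(N)=O'))    # acetamide: nitrogen already acylated
```

The last two calls should return empty lists, and piperazine should collapse to a single unique product because both nitrogens give the same amide. Whether this stricter pattern is appropriate depends on the reagent list, so treat it as a starting point.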

In [None]:
# Let's run an example amide formation

# Define reactants
reactant1 = core_scaffold
reactant2 = Chem.MolFromSmiles('C1COCCN1')  # morpholine

# Run the reaction; the result is a tuple of product sets, one per match
products = rxn.RunReactants((reactant1, reactant2))

# Sanitize the products, since RunReactants returns unsanitized molecules
for product_set in products:
    for mol in product_set:
        Chem.SanitizeMol(mol)

# Display the first product
products[0][0]

In [None]:
# OK, now let's run all of our amines through the reaction
amines = amine_df['mol'].tolist()

all_products = []
prod_smi = []
for amine in amines:
    products = rxn.RunReactants((core_scaffold, amine))
    if not products:
        # Nothing reacted: store None so the lists stay aligned with amine_df
        all_products.append(None)
        prod_smi.append(None)
        continue
    # Keep one product per amine so each row of amine_df gets exactly one entry
    mol = products[0][0]
    Chem.SanitizeMol(mol)
    all_products.append(mol)
    prod_smi.append(Chem.MolToSmiles(mol))

amine_df['product mol'] = all_products
amine_df['product smiles'] = prod_smi

# Display grid of amide products
Draw.MolsToGridImage(all_products, molsPerRow=4, legends=legends, subImgSize=(200, 150))

In [None]:
amine_df.head()

# Now What???

Different projects have different problems to solve. We can screen the enumerated molecules by docking, property prediction, or synthetic accessibility assessment.

The RDKit has many built-in descriptors (see https://www.rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html).
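To get a feel for what is available, here is a small sketch that lists the descriptors registered in the Python-level `rdkit.Chem.Descriptors` module, a companion to the `rdMolDescriptors` functions linked above. It relies on `Descriptors._descList`, a list of `(name, function)` pairs that is widely used in tutorials but is technically a private attribute, so it may change between releases. Ethanol is just a stand-in molecule.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Descriptors._descList holds (name, function) pairs for the registered descriptors
descriptor_functions = dict(Descriptors._descList)
print(f'{len(descriptor_functions)} descriptors available')

# Look up a few descriptors by name and evaluate them on a stand-in molecule
example = Chem.MolFromSmiles('CCO')  # ethanol
for name in ('MolWt', 'MolLogP', 'TPSA'):
    print(f'{name}: {descriptor_functions[name](example):.2f}')
```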

AQME and ROBERT can also be used to compute properties and evaluate trends.

Let's take a look at some of the properties of our product molecules.

In [None]:
product_df = amine_df[['Name','product smiles','product mol']]
product_df.head()

In [None]:
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem import Descriptors
from rdkit.Chem import QED

In [None]:
# Take the first product in the table (by position, not by index label)
mol = product_df['product mol'].iloc[0]

# Simple Properties
mw = Descriptors.MolWt(mol)                      # Molecular weight
logp = Descriptors.MolLogP(mol)                  # LogP
hbd = rdMolDescriptors.CalcNumHBD(mol)           # H-bond donors
hba = rdMolDescriptors.CalcNumHBA(mol)           # H-bond acceptors

# Topological Polar Surface Area
tpsa = Descriptors.TPSA(mol)

# Number of rotatable bonds
rotbonds = Descriptors.NumRotatableBonds(mol)

# Number of aromatic rings
aromatic_rings = Descriptors.NumAromaticRings(mol)

# QED (drug-likeness score)
qed = QED.qed(mol)

# Summary 
print(f'MW:\t\t{mw:.2f}')
print(f'logP:\t\t{logp:.2f}')
print(f'#HBD:\t\t{hbd}')
print(f'#HBA:\t\t{hba}')
print(f'TPSA:\t\t{tpsa:.2f}')
print(f'#RotBonds:\t{rotbonds}')
print(f'#AromRings:\t{aromatic_rings}')
print(f'QED:\t\t{qed:.2f}')

Now let's compute the same properties for all of the products.

In [None]:
# Function to compute properties
def calculate_properties(mol):
  if mol is None:
      return {}
  try:
      return {
          'SMILES': Chem.MolToSmiles(mol),
          'MW': Descriptors.MolWt(mol),
          'LogP': Descriptors.MolLogP(mol),
          'HBD': rdMolDescriptors.CalcNumHBD(mol),
          'HBA': rdMolDescriptors.CalcNumHBA(mol),
          'TPSA': Descriptors.TPSA(mol),
          'RotBonds':Descriptors.NumRotatableBonds(mol),
          'AromaticRings':Descriptors.NumAromaticRings(mol),
          'QED': QED.qed(mol),
          }
  except Exception as err:
      # Report what failed instead of silently swallowing every error
      print(f'Something went wrong: {err}')
      return {}

In [None]:
products = product_df['product mol']

# Loop through products, computing properties using the function we made
properties_list = []
for mol in products: 
  props = calculate_properties(mol)
  if props:
      properties_list.append(props)
df_properties = pd.DataFrame(properties_list)

df_properties

### _Visualize_

In [None]:
_ = df_properties.hist(figsize=(20,20))

In [None]:
from pandas.plotting import scatter_matrix

In [None]:
# Plot distributions and scatter matrix
_ = scatter_matrix(df_properties, diagonal="kde", figsize=(15, 15))

In [None]:
# Example filter: keep only products with LogP above 2.0
filter_df = df_properties.loc[df_properties['LogP'] > 2.0]
print(f'{len(filter_df)}/{len(df_properties)} pass filter')
filter_df.head()
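A single LogP cutoff is only one possible filter. As a sketch of a multi-property filter, the function below counts violations of Lipinski's rule of five (MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10). This rule is an assumption brought in for illustration; the notebook itself does not apply it. The function name `rule_of_five_violations` is hypothetical, and the two rows are made-up values, not computed products.

```python
def rule_of_five_violations(props):
    """Count Lipinski rule-of-five violations for one row of properties."""
    checks = [
        props['MW'] > 500,
        props['LogP'] > 5,
        props['HBD'] > 5,
        props['HBA'] > 10,
    ]
    return sum(checks)


# Made-up property values for illustration only
toy_rows = [
    {'Name': 'toy_A', 'MW': 342.4, 'LogP': 2.1, 'HBD': 1, 'HBA': 6},
    {'Name': 'toy_B', 'MW': 512.6, 'LogP': 5.4, 'HBD': 2, 'HBA': 8},
]
for row in toy_rows:
    print(row['Name'], rule_of_five_violations(row))
```

Because the dictionary keys match the column names in `df_properties`, the same function could be applied row by row with `df_properties.apply(rule_of_five_violations, axis=1)`.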

In [None]:
filter_df.to_csv('amide_products_filtered.csv')

# Free-Wilson Analysis Information

A **Free-Wilson analysis** is a classical structure-activity relationship (SAR) method developed by Spencer Free and James Wilson in 1964. It is a **linear additive model** that decomposes molecular activity into contributions from individual substituents. In drug discovery, it can help you fill gaps in your SAR by estimating the activity of substituent combinations you have not yet made.

### Key Concepts:
- **Assumption**: Activity is the sum of contributions from individual substituents
- **Equation**: `Activity = Constant + Σ(substituent contributions)`
- **Applications**: Lead optimization, substituent prioritization, virtual library design
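The additive equation above can be sketched as an ordinary least-squares fit on indicator (one-hot) columns. The data below are synthetic toy values made up for illustration, and the encoding treats H as the reference substituent at each position, which is closer to the Fujita-Ban variant than to the original formulation. The names `training` and `encode` are hypothetical.

```python
import numpy as np

# Synthetic toy data (made up): substituents at two positions and an
# activity value such as pIC50
training = [
    ('H',  'H',  5.0),
    ('Me', 'H',  5.5),
    ('H',  'Cl', 6.0),
]


def encode(r1, r2):
    """Intercept plus indicator columns for R1=Me and R2=Cl (H is the reference)."""
    return [1.0, float(r1 == 'Me'), float(r2 == 'Cl')]


X = np.array([encode(r1, r2) for r1, r2, _ in training])
y = np.array([activity for _, _, activity in training])

# Least-squares fit: coefficients = [constant, Me contribution, Cl contribution]
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
constant, me_contribution, cl_contribution = coefficients

# Predict the combination that is missing from the training set: R1=Me, R2=Cl
predicted = float(np.array(encode('Me', 'Cl')) @ coefficients)
print(f'constant={constant:.2f}, Me={me_contribution:.2f}, Cl={cl_contribution:.2f}')
print(f'predicted activity for (Me, Cl): {predicted:.2f}')
```

With real data there are usually more compounds than coefficients, so the fit is no longer exact and the residuals indicate how well the additivity assumption holds.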

### Pat Walters' Blog: **Practical Cheminformatics**
- Pat Walters has a command-line tool to assist in Free-Wilson analyses
- New Blog: https://patwalters.github.io/
- Old Blog: https://practicalcheminformatics.blogspot.com/
- Python Tutorials: https://github.com/PatWalters/practical_cheminformatics_tutorials
- Free-Wilson Analysis: https://practicalcheminformatics.blogspot.com/2018/05/free-wilson-analysis.html
- Updates to Free-Wilson Analysis: https://practicalcheminformatics.blogspot.com/2018/09/a-few-updates-to-free-wilson.html

### Other Useful Resources
- Is life worth living? Blog: https://iwatobipen.wordpress.com/
- The RDKit Blog: https://greglandrum.github.io/rdkit-blog/
- Getting Started with the RDKit in Python: https://www.rdkit.org/docs/GettingStartedInPython.html
- StackOverflow, GitHub Pages, Medium Blogs
