# 5: Chemspace streamlined

**Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole**

## Overview

Building and scoring molecules can be further streamlined by employing our established protocol. Here we show how to quickly build a library and then score every molecule in it.

In [None]:
import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace, Linkers, RGroups

rgroups = RGroups()
linkers = Linkers()

## Prepare the ligand template

The provided core structure `lig.pdb` has been extracted from a crystal structure of Mpro in complex with compound **4** from the Jorgensen study (PDB: 7L10), and a Cl atom has been removed to allow growth into the S3/S4 pocket. The template structure of the ligand has been protonated with [Open Babel](http://openbabel.org/wiki/Main_Page), so it is loaded with its hydrogens kept (`removeHs=False`):

In [None]:
init_mol = Chem.SDMolSupplier("sarscov2/mini.sdf", removeHs=False)[0]

# get the FEgrow representation of the rdkit Mol
scaffold = fegrow.RMol(init_mol)

In [None]:
# Show the 2D (with indices) representation of the core. This is used to select the desired growth vector.
scaffold.rep2D(idx=True, size=(500, 500))

Using the 2D drawing, select an index for the growth vector. Note that it is currently only possible to grow from hydrogen atom positions. In this case, we select the hydrogen atom with index 8, as used in the cell below, to enable growth into the S3/S4 pocket of Mpro.

In [None]:
# specify the connecting point: the chosen atom becomes a dummy atom (atomic number 0)
scaffold.GetAtomWithIdx(8).SetAtomicNum(0)
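
To make the attachment-point step more concrete, here is a minimal pure-Python sketch of the idea. It does not use FEgrow or RDKit: the molecule is a hand-written toy (methanol), and the helper names `hydrogen_growth_vectors` and `mark_attachment_point` are hypothetical, introduced only for illustration. The sketch lists every hydrogen together with the heavy atom it is bonded to (the candidate growth vectors), then replaces the chosen hydrogen with a dummy `*`, which is roughly what setting the atomic number to 0 does in RDKit.

```python
# Toy connectivity for methanol (CH3-OH), written out by hand for illustration.
# Indices are 0-based, as in RDKit.
symbols = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]


def hydrogen_growth_vectors(symbols, bonds):
    """Map each hydrogen index to the index of the heavy atom it is bonded to."""
    vectors = {}
    for a, b in bonds:
        # Check the bond in both directions, since either end may be the hydrogen.
        for h, other in ((a, b), (b, a)):
            if symbols[h] == "H" and symbols[other] != "H":
                vectors[h] = other
    return vectors


def mark_attachment_point(symbols, index):
    """Return a copy of symbols with the chosen hydrogen replaced by a dummy '*'."""
    if symbols[index] != "H":
        raise ValueError(f"atom {index} is {symbols[index]}, not a hydrogen")
    marked = list(symbols)
    marked[index] = "*"
    return marked


vectors = hydrogen_growth_vectors(symbols, bonds)
marked = mark_attachment_point(symbols, 5)

print(vectors)  # hydrogen index -> bonded heavy-atom index
print(marked)   # the hydroxyl hydrogen is now the attachment point
```

In the real workflow the indices come from the 2D drawing of the scaffold, so the equivalent of `vectors` is read off the picture rather than computed.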

In [None]:
# create the chemical space
cs = ChemSpace()

In [None]:
cs.add_scaffold(scaffold)

## Build a quick library

In [None]:
# build molecules by attaching the 3 most frequently used R-groups
cs.add_rgroups(rgroups.Mol[:3].to_list())

# build more molecules by combining the linkers and R-groups
cs.add_rgroups(linkers.Mol[:3].to_list(), rgroups.Mol[:3].to_list())
cs
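
As a rough guide to how large the library becomes, the sketch below counts the combinations in plain Python. It assumes that the second `add_rgroups` call pairs every linker with every R-group (a full cross product), which is not confirmed here, and the names `R1`..`R3` and `L1`..`L3` are hypothetical placeholders rather than actual entries of the FEgrow libraries.

```python
from itertools import product

# Hypothetical placeholder names standing in for the first three entries
# of the R-group and linker libraries.
rgroup_names = ["R1", "R2", "R3"]
linker_names = ["L1", "L2", "L3"]

# First call: each R-group attached directly to the scaffold (no linker).
direct = [(None, rgroup) for rgroup in rgroup_names]

# Second call (assumed behaviour): every linker combined with every R-group.
linked = list(product(linker_names, rgroup_names))

library = direct + linked
print(len(direct), len(linked), len(library))
```

Under that assumption the library size grows as the product of the two list lengths, so the slices are worth keeping short when trying the workflow out.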

### Prepare the protein

The protein-ligand complex structure is downloaded, and [PDBFixer](https://github.com/openmm/pdbfixer) is used to protonate the protein and perform other simple repairs:

In [None]:
# get the protein-ligand complex structure
!wget -nc https://files.rcsb.org/download/7L10.pdb

# load the complex with the ligand
# (named `system` to avoid shadowing Python's built-in `sys` module)
system = prody.parsePDB("7L10.pdb")

# remove any unwanted molecules
rec = system.select("not (nucleic or hetatm or water)")

# save the processed protein
prody.writePDB("rec.pdb", rec)

# fix the receptor file (missing residues, protonation, etc)
fegrow.fix_receptor("rec.pdb", "rec_final.pdb")

# load back into prody
rec_final = prody.parsePDB("rec_final.pdb")
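
The selection string used above to strip non-protein atoms can be pictured as a simple filter over PDB records. The following is a rough pure-Python analogue, not ProDy's actual selection engine: the residue-name sets are partial and illustrative, the helper names `pdb_line` and `keep_protein_atoms` are hypothetical, and the toy records (including the ligand residue name `LIG`) are made up for the example.

```python
WATER_NAMES = {"HOH", "WAT", "H2O"}  # partial, illustrative set
NUCLEIC_NAMES = {"DA", "DC", "DG", "DT", "A", "C", "G", "U"}  # partial, illustrative set


def pdb_line(record, serial, name, resname, chain, resseq):
    """Build the leading fixed-width columns of a PDB atom record (toy helper)."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


def keep_protein_atoms(lines):
    """Keep ATOM records whose residue is neither water nor a nucleotide."""
    kept = []
    for line in lines:
        record = line[:6].strip()      # columns 1-6: record type
        resname = line[17:20].strip()  # columns 18-20: residue name
        if record != "ATOM":
            continue  # drops HETATM records such as ligands and ions
        if resname in WATER_NAMES or resname in NUCLEIC_NAMES:
            continue
        kept.append(line)
    return kept


toy_records = [
    pdb_line("ATOM", 1, "N", "SER", "A", 1),
    pdb_line("ATOM", 2, "CA", "SER", "A", 1),
    pdb_line("HETATM", 3, "C1", "LIG", "A", 401),  # hypothetical ligand
    pdb_line("HETATM", 4, "O", "HOH", "A", 501),   # crystallographic water
    pdb_line("ATOM", 5, "P", "DA", "B", 1),        # nucleotide
]

kept = keep_protein_atoms(toy_records)
for line in kept:
    print(line)
```

Only the two serine atoms survive the filter, which mirrors the intent of keeping just the receptor before it is repaired and protonated.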

In [None]:
# make your chemical space aware of your receptor (important for the next step!)
cs.add_protein("rec_final.pdb")

In [None]:
# build and score the entire chemical space
cs.evaluate()

In [None]:
# verify that the score has been computed
cs

In [None]:
# access the Pandas dataframe directly
cs.df

In [None]:
# you can save the entire ChemSpace to an SDF file, from which the ChemSpace can later be recovered
cs.to_sdf("cs_optimised_molecules.sdf")

# or access individual molecules directly and write them to a file
cs[0].to_file("best_conformers0.pdb")

In [None]:
# recreate the chemical space
cs = ChemSpace.from_sdf("cs_optimised_molecules.sdf")

In [None]:
# search the Enamine database for the best 3 scoring molecules in your chemical space
# and add them to the chemical space to enrich it
# (relies on https://sw.docking.org/)
# cs.add_enamine_molecules(3)