# 6: Chemspace with SMILES

**Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole**

## Overview

Building and scoring molecules can be further streamlined by employing our established protocol. Here we show how to quickly build a library from prepared SMILES and score the entire library in one step.

In [None]:
import pandas as pd
from rdkit import Chem

from fegrow import ChemSpace
from fegrow.testing import core_5R83_path, rec_5R83_path, smiles_5R83_path

## Prepare the ligand template

In [None]:
scaffold = Chem.SDMolSupplier(core_5R83_path)[0]

Because we are using prepared SMILES that already contain the scaffold as a substructure, there is no need to set a growing vector.

<div class="alert alert-block alert-warning">
    Ensure that your code is guarded by <b>if __name__ == "__main__":</b> when creating a cluster in your scripts,
    particularly when using processes=True. The guard is not needed in a Jupyter notebook.
</div>
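
For a standalone script, the guard described in the warning above could look like the following minimal sketch. The `square` helper and the `main` function are hypothetical names used only for illustration, and `n_workers=2` is an arbitrary choice; `LocalCluster` and `Client` are the only names taken from `dask.distributed`.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    # Hypothetical stand-in for real work (e.g. scoring one molecule).
    return x * x


def main():
    # Process-based workers, one thread each, mirroring the notebook setup.
    # n_workers=2 is an arbitrary value chosen for this sketch.
    with LocalCluster(processes=True, n_workers=2, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            futures = client.map(square, range(4))
            return client.gather(futures)


if __name__ == "__main__":
    # Without this guard, worker processes that re-import the script
    # could try to create clusters of their own.
    print(main())
```

Keeping the cluster creation inside `main()` means that importing the module has no side effects, which is what process-based workers rely on.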

<div class="alert alert-block alert-danger">
    When using ANI=True for processing, the Dask cluster has to use processes because ANI is currently not thread-safe. We therefore create a LocalCluster here and ask ChemSpace to use it.
</div>

In [None]:
from dask.distributed import LocalCluster

lc = LocalCluster(processes=True, n_workers=None, threads_per_worker=1)

In [None]:
# create the chemical space
cs = ChemSpace(dask_cluster=lc)

In [None]:
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(scaffold)
cs.add_protein(rec_5R83_path)

In [None]:
# load 50k smiles dataset from the study
smiles = pd.read_csv(smiles_5R83_path).Smiles.to_list()

# for testing, sort by size and pick small
smiles.sort(key=len)
# take 5 smallest smiles
smiles = smiles[:5]
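
Sorting by string length, as in the cell above, uses the length of the SMILES string only as a rough proxy for molecule size. As a small aside, the same selection can be sketched without sorting the whole list by using `heapq.nsmallest` from the standard library. The `example_smiles` list below is made up for illustration and is not part of the dataset.

```python
import heapq

# Hypothetical SMILES strings, not taken from the 5R83 dataset.
example_smiles = ["CCCCCCCC", "CCO", "c1ccccc1O", "C", "CC(=O)O", "CCNC"]

# Pick the three shortest strings without sorting the full list.
smallest = heapq.nsmallest(3, example_smiles, key=len)
print(smallest)
```

For a 50k entry list either approach is fast, so this is a matter of taste rather than a required change.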

In [None]:
# here we add SMILES which should already have been matched
# to the scaffold (rdkit Mol.HasSubstructMatch)
cs.add_smiles(smiles, protonate=False)
cs
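
The substructure check mentioned in the comment above can be sketched with RDKit as follows. This is only an illustration of `Mol.HasSubstructMatch`, not the filtering used in the study: the benzene scaffold, the candidate list, and the `contains_scaffold` helper are all hypothetical.

```python
from rdkit import Chem

# Hypothetical scaffold (benzene) and candidate SMILES for illustration only.
scaffold = Chem.MolFromSmiles("c1ccccc1")
candidates = ["Cc1ccccc1", "CCO", "Oc1ccccc1", "not_a_smiles"]


def contains_scaffold(smiles, scaffold):
    # MolFromSmiles returns None for strings it cannot parse.
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(scaffold)


# Keep only the candidates that parse and contain the scaffold.
matched = [s for s in candidates if contains_scaffold(s, scaffold)]
print(matched)
```

Filtering in this way before calling `add_smiles` is one possible way to make sure that every molecule can be superimposed on the scaffold.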

In [None]:
cs.evaluate()

In [None]:
cs