# FEgrow: An Open-Source Molecular Builder and Free Energy Preparation Workflow

**Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole**

## Overview

Building and scoring molecules can be streamlined further with the `ChemSpace` protocol. Here we show how to quickly build a library of molecules and score it.

In [None]:
import pandas as pd
import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace

from fegrow.testing import core_5R83_path, smiles_5R83_core_path, rec_5R83_path

# Prepare the ligand template

In [None]:
scaffold = Chem.SDMolSupplier(core_5R83_path)[0]

Because the prepared Smiles already contain the scaffold as a substructure, there is no need to set a growing vector.

In [None]:
# create the chemical space
cs = ChemSpace()

In [None]:
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(scaffold)
cs.add_protein(rec_5R83_path)

In [None]:
# load 50k Smiles
smiles = pd.read_csv(smiles_5R83_core_path).Smiles.to_list()

# take only the first 20
smiles = smiles[:20]

# here we add Smiles which should already have been matched
# to the scaffold (RDKit Mol.HasSubstructMatch)
cs.add_smiles(smiles)
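
The substructure requirement mentioned above can be checked before the Smiles are added. The following is a minimal sketch of such a pre-filter, not part of FEgrow itself: the helper `contains_scaffold` and the benzene `toy_scaffold` are hypothetical and stand in for the real 5R83 core. A scaffold read from an SDF file may carry explicit hydrogens, which can change what matches, so treat this only as an illustration of `Mol.HasSubstructMatch`.

```python
from rdkit import Chem


def contains_scaffold(smiles: str, scaffold: Chem.Mol) -> bool:
    """Return True if the molecule parsed from `smiles` contains `scaffold`."""
    mol = Chem.MolFromSmiles(smiles)
    # MolFromSmiles returns None for Smiles it cannot parse
    return mol is not None and mol.HasSubstructMatch(scaffold)


# hypothetical toy scaffold (benzene), used here instead of the 5R83 core
toy_scaffold = Chem.MolFromSmiles("c1ccccc1")

# toluene, ethanol, benzoic acid
candidates = ["Cc1ccccc1", "CCO", "OC(=O)c1ccccc1"]

# keep only the candidates that contain the scaffold
matching = [s for s in candidates if contains_scaffold(s, toy_scaffold)]
```

Ethanol has no aromatic ring, so it is the only candidate dropped from `matching`.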

In [None]:
cs

# Active Learning

In [None]:
# There is nothing to train the model on, so initially "first_random" is used by default
random1 = cs.active_learning(2, first_random=True)
random2 = cs.active_learning(2, first_random=True)

# note the different indices selected (unless you're lucky!)
print(random1.index.to_list(), random2.index.to_list())

## Warning! Change the logger settings to see what is happening inside `ChemSpace.evaluate`. There is too much information to print to the screen.
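
One way to follow the evaluation without flooding the notebook is to send the log records to a file. The sketch below uses only the Python standard library. It assumes that FEgrow logs through the `logging` module under a logger named `"fegrow"`; that name, the helper `route_logger_to_file` and the file name `fegrow_evaluate.log` are assumptions made for illustration, so check the package source for the actual logger names.

```python
import logging


def route_logger_to_file(name: str, path: str, level: int = logging.INFO) -> logging.Logger:
    """Send records from the logger `name` to the file `path` instead of the console."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # delay=True: the file is only created when the first record is written
    handler = logging.FileHandler(path, delay=True)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

    # stop records from reaching the root logger, which prints to the screen
    logger.propagate = False
    return logger


# assumption: "fegrow" is the top-level logger name used by the package
fegrow_log = route_logger_to_file("fegrow", "fegrow_evaluate.log")
```

Lowering `level` to `logging.DEBUG` gives more detail at the cost of a larger file.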

In [None]:
# now evaluate the first selection
random1_results = cs.evaluate(random1)

In [None]:
# check the scores, note that they were updated in the master dataframe too
random1_results

In [None]:
# by default a Gaussian Process with a greedy approach is used
# note that this time the two selections should match, unlike the random picks above
greedy1 = cs.active_learning(2)
greedy2 = cs.active_learning(2)
print(greedy1.index.to_list(), greedy2.index.to_list())
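
The difference between the random and the greedy selection can be illustrated without FEgrow. This is a simplified sketch in plain Python, not the `ChemSpace.active_learning` implementation: the functions `random_select` and `greedy_select`, the `higher_is_better` flag and the predicted scores are all hypothetical. A random pick varies between calls, whereas a greedy pick over unchanged model predictions returns the same molecules each time.

```python
import random


def random_select(untested, n, seed=None):
    """Pick n untested indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(untested), n))


def greedy_select(predicted, n, higher_is_better=True):
    """Pick the n indices with the best predicted score.

    `predicted` maps a molecule index to the score predicted by a model.
    """
    ranked = sorted(predicted, key=predicted.get, reverse=higher_is_better)
    return ranked[:n]


# hypothetical model predictions for four untested molecules
predicted_scores = {3: 5.1, 7: 6.4, 11: 4.8, 15: 6.0}

# the greedy choice depends only on the predictions, so repeating it changes nothing
picked = greedy_select(predicted_scores, 2)
picked_again = greedy_select(predicted_scores, 2)
```

The selection only changes once new scores have been computed and the model is retrained on them, which is what the evaluation step in each cycle provides.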

In [None]:
# learn in cycles
for cycle in range(2):
    greedy = cs.active_learning(2)
    greedy_results = cs.evaluate(greedy)
    
    # save the new results
    greedy_results.to_csv(f'notebook6_iteration{cycle}_results.csv')

# save the entire chemical space with all the results
cs.to_sdf('notebook6_chemspace.sdf')

In [None]:
computed = cs.df[~cs.df.score.isna()]
print('Computed cases in total: ', len(computed))

In [None]:
cs