### A Simple Interactive Tool for Exploring Functional Group Filters

This notebook provides a simple tool for visualizing and exploring functional group filters. 

Uncomment and execute the cell below to install the required dependencies. 

In [None]:
#!pip install useful_rdkit_utils pandas datamol ipywidgets tqdm

In [1]:
import useful_rdkit_utils as uru
import pandas as pd
import datamol as dm
from ipywidgets import Dropdown, IntSlider, interact
from tqdm.auto import tqdm

Enable **progress_apply** in pandas, which adds a progress bar to **apply**.

In [2]:
tqdm.pandas()

Use the menu below to set the active rule set.

In [3]:
reos = uru.REOS()
set_selector = Dropdown(options=reos.get_available_rule_sets(),
                         description="Rule Set:")
def pick_rule_set(x):
    reos.set_active_rule_sets(x)
    print(f"Active rule set is {reos.get_active_rule_sets()}")
interact(pick_rule_set,x=set_selector);

interactive(children=(Dropdown(description='Rule Set:', options=('Glaxo', 'Dundee', 'BMS', 'PAINS', 'SureChEMB…

For the visualization we need the SMARTS that matched each molecule.  Set the REOS object to return the SMARTS when we call **process_smiles** or **process_mol**.

In [4]:
reos.set_output_smarts(True)

Read a file with drug molecules from ChEMBL.  We'll use this as demo input.  To use your own data, point **pd.read_csv** at any SMILES file with the same layout: one space-separated SMILES and name per line, with no header row.

In [5]:
#df = pd.read_csv("https://raw.githubusercontent.com/PatWalters/datafiles/main/chembl_drugs.smi", names=["SMILES","Name"],sep=" ")
df = pd.read_csv("../data/chembl_drugs.smi",sep=" ",names=["SMILES","Name"])
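
As a rough sketch of what reading your own file might look like, the example below parses a small in-memory SMILES file with the same `pd.read_csv` arguments. The two molecules and the `io.StringIO` stand-in for a file on disk are illustrative assumptions, not part of the ChEMBL set.

```python
import io

import pandas as pd

# Hypothetical two-line SMILES file; io.StringIO stands in for a path on disk.
smiles_text = "CCO ethanol\nc1ccccc1 benzene\n"

# A .smi file has no header row, so the column names are supplied explicitly.
demo_df = pd.read_csv(io.StringIO(smiles_text), sep=" ", names=["SMILES", "Name"])
print(demo_df)
```

Names that contain spaces would need a different separator or some extra handling.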

Run the filters on the drug set. 

In [6]:
df[['rule_set','rule','smarts']] = df.SMILES.progress_apply(reos.process_smiles).tolist()

  0%|          | 0/1203 [00:00<?, ?it/s]
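
The assignment above works because the filter returns three values per molecule, and converting the resulting Series of tuples to a list lets pandas spread them across three new columns. Here is a minimal sketch of that pattern. It uses a hypothetical `toy_filter` in place of the REOS object; the string test, rule name, and SMARTS it returns are made up for illustration and do no real substructure matching.

```python
import pandas as pd


def toy_filter(smiles):
    """Hypothetical stand-in for reos.process_smiles: returns (rule_set, rule, smarts)."""
    # A plain substring test, used only to produce one "hit" for the demo.
    if "C(=O)Cl" in smiles:
        return ("Toy", "acid halide", "C(=O)Cl")
    return ("ok", "ok", "ok")


demo_df = pd.DataFrame({"SMILES": ["CCO", "CC(=O)Cl"]})

# apply() yields a Series of 3-tuples; tolist() turns it into rows that
# pandas can assign to three new columns in one step.
demo_df[["rule_set", "rule", "smarts"]] = demo_df.SMILES.apply(toy_filter).tolist()
print(demo_df)
```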

Take a quick look at the results. 

In [7]:
df

| | SMILES | Name | rule_set | rule | smarts |
|---|---|---|---|---|---|
| 0 | `Nc1ccc(S(=O)(=O)Nc2ccccn2)cc1` | CHEMBL700 | ok | ok | ok |
| 1 | `CCC(C)C1(CC)C(=O)[N-]C(=O)NC1=O.[Na+]` | CHEMBL1200982 | ok | ok | ok |
| 2 | `Cl.N=C(N)N` | CHEMBL1200728 | ok | ok | ok |
| 3 | `CC1=CC(=O)c2ccccc2C1=O` | CHEMBL590 | Glaxo | N1 Quinones | `O=C1[#6]~[#6]C(=O)[#6]~[#6]1` |
| 4 | `Cn1c(=O)c2[nH]cnc2n(C)c1=O.Cn1c(=O)c2[nH]cnc2n...` | CHEMBL1370561 | ok | ok | ok |
| ... | ... | ... | ... | ... | ... |
| 1198 | `Cl.Cl.N#Cc1cccc(C(NCC2CC2)c2ccc(F)c(NC(=O)c3cc...` | CHEMBL4594272 | ok | ok | ok |
| 1199 | `CN1CCC(COc2cnc(-c3cccc(Cn4nc(-c5cccc(C#N)c5)cc...` | CHEMBL4594292 | ok | ok | ok |
| 1200 | `Nc1ncn([C@@H]2O[C@H](CO)[C@@H](O)[C@H]2O)c(=O)n1` | CHEMBL1489 | ok | ok | ok |
| 1201 | `COC(=O)Nc1c(N)nc(-c2nn(Cc3ccccc3F)c3ncc(F)cc23...` | CHEMBL4066936 | ok | ok | ok |


Summarize the data as a list of (label, [rule, count]) pairs, sorted by the number of matching molecules, that we'll use for the visualization.

In [8]:
vc = df.query("rule != 'ok'").rule.value_counts()
rule_freq = vc.reset_index().values.tolist()
rule_freq = [(f"{a} ({b})",[a,b]) for a,b in rule_freq]
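
To make the shape of **rule_freq** concrete, here is the same summary run on a toy results table. The rule names are placeholders rather than real REOS output.

```python
import pandas as pd

# Toy results table; "rule A" and "rule B" are placeholder rule names.
demo_df = pd.DataFrame({"rule": ["ok", "rule A", "rule B", "rule A", "ok", "rule A"]})

# Count how often each rule fired, ignoring molecules that passed every filter.
vc = demo_df.query("rule != 'ok'").rule.value_counts()

# Each entry pairs a menu label with the [rule, count] value behind it.
rule_freq = [(f"{a} ({b})", [a, b]) for a, b in vc.reset_index().values.tolist()]
print(rule_freq)
```

The first entry is the most frequent rule, labelled `rule A (3)`. A Dropdown built from these pairs shows the label and hands the `[rule, count]` list to the callback.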

Set up the interactive visualization.  The trick here is registering a callback with the **observe** method of **rule_selector**, whose options hold each rule and the number of molecules matching it.  Every time the selection in **rule_selector** changes, the callback resets the range of the **mol_selector** slider below it to the number of molecules matching the newly selected rule.
* Use the menu below to examine molecules matching specific functional group filters.
* Use the slider below the menu to examine specific molecules.  You can also click on the slider, then use the left and right arrow keys to move through the molecules.

In [9]:
rule_selector = Dropdown(layout={'width': 'initial'},options=rule_freq,
                         description="Rule:")
mol_selector = IntSlider(min=0,max=rule_freq[0][1][1]-1,
                        description="Molecule:")

def update_slider_range(*args):
    mol_selector.max = rule_selector.value[1]-1
    mol_selector.value = 0

rule_selector.observe(update_slider_range,'value')

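def show_molecule(x,y):
    # x is the [rule, count] pair selected in the menu; y is the slider position
    rule_name = x[0]
    match_df = df.query("rule == @rule_name")
    row = match_df.iloc[y]
    return dm.viz.lasso_highlight_image(target_molecules=row.SMILES,
                                        search_molecules=row.smarts,
                                        use_svg=False)
interact(show_molecule,x=rule_selector, y=mol_selector);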

interactive(children=(Dropdown(description='Rule:', layout=Layout(width='initial'), options=(('I16 Betalactams…
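
The link between the menu and the slider can be sketched without a notebook front end. The `Observable` and `Slider` classes below are hypothetical stand-ins, not the ipywidgets API. They only mimic the idea that changing a value fires the registered callbacks. The real `observe` passes a change dictionary to the callback, which the `*args` signature in the cell above simply ignores.

```python
class Observable:
    """Hypothetical stand-in for a widget: runs callbacks when value changes."""

    def __init__(self, value):
        self._value = value
        self._callbacks = []

    def observe(self, callback):
        self._callbacks.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for callback in self._callbacks:
            callback(new_value)


class Slider:
    """Hypothetical stand-in for a slider with a current value and an upper bound."""

    def __init__(self, maximum):
        self.max = maximum
        self.value = 0


# Placeholder (label, [rule, count]) pairs in the same shape as rule_freq.
rule_freq = [("rule A (3)", ["rule A", 3]), ("rule B (1)", ["rule B", 1])]
selector = Observable(rule_freq[0][1])
slider = Slider(maximum=rule_freq[0][1][1] - 1)


def update_slider_range(new_value):
    slider.max = new_value[1] - 1  # last valid index for the newly selected rule
    slider.value = 0               # restart at the first molecule


selector.observe(update_slider_range)

slider.value = 2                  # the user has moved to the third molecule
selector.value = rule_freq[1][1]  # simulates picking "rule B" in the menu
print(slider.max, slider.value)   # prints: 0 0
```

Resetting the slider to 0 whenever the rule changes keeps the index valid, since the newly selected rule may match fewer molecules than the previous one.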

This cell is similar to the cell above, except that it displays up to six molecules at once, each with the substructure matching the alert highlighted.  Use the menu to examine molecules matching the alerts.

In [10]:
rule_selector = Dropdown(layout={'width': 'initial'},options=rule_freq,
                         description="Rule:")
def show_grid(x):
    # x is the [rule, count] pair selected in the menu
    rule_name = x[0]
    match_df = df.query("rule == @rule_name")
    # Slicing caps the selection at six molecules, even when fewer match
    smiles_list = match_df.SMILES.tolist()[:6]
    name_list = match_df.Name.tolist()[:6]
    # Use the first row's SMARTS as the highlight pattern for the whole grid
    smarts = match_df.smarts.tolist()[0]
    return dm.viz.lasso_highlight_image(target_molecules=smiles_list,
                                        search_molecules=smarts,
                                        legends=name_list,
                                        n_cols=3,
                                        color_list = [(1.0,0.0,0.0)]*len(smiles_list),
                                        use_svg=False)
    
interact(show_grid,x=rule_selector);

interactive(children=(Dropdown(description='Rule:', layout=Layout(width='initial'), options=(('I16 Betalactams…