# Makes SMIRKS from Molecules

This notebook will showcase how `chemper` `ClusterGraph`s store SMIRKS decorators for a set of molecular subgraphs. 
Remember the ultimate goal here is to take clustered molecular subgraphs and create a set of SMIRKS patterns that would put those molecular subgraphs into the same groups. 

For example, if your initial clusters had 4 types of carbon-carbon bonds (single, aromatic, double, and triple), you would expect the final SMIRKS patterns to reflect those four categories. Keeping in mind that sorting by just bond order is unlikely to be detailed enough. 

The first step here is to store possible decorators for atoms and bonds in a given cluster. This notebook will attempt to walk through the code that currently exists and goals for how to move forward

In [1]:
# import statements
from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.cluster_graph import ClusterGraph

In [2]:
def smi_file_to_mols(smi_file): 
    """
    returns a list of Mol objects from a SMILES file
    """
    f = open(smi_file, 'r')
    lines = f.readlines()
    f.close()
    
    mols = list()
    
    for l in lines:
        # TODO: add mol title function or property to mol_toolkit.Mol
        (smiles, name) = l.split()
        mols.append(mol_toolkit.MolFromSmiles(smiles))
    
    print("Parsed %i molecules from SMILES file %s" % (len(mols), smi_file))
    return mols

In [3]:
def smirks_list_from_file(smarts_file):
    """
    returns a smirks_list - list of tuples (SMIRKS, label)
    """
    f = open(smarts_file)
    lines = f.readlines()
    f.close()
    
    smirks_list = list()
    for l in lines:
        splits = l.split()
        if splits[0][0] != '#':
            smirks_list.append( (splits[0], splits[1]) )
            
    return smirks_list

In [4]:
def get_smirks_dict(mol, smirks_list):
    """
    mol - chemper Mol object
    smirks_list - list of tuples (SMIRKS, label)
    smirks_indices - indices to track from SMIRKS
    
    Returns a dictionary of listes
    {label: [ {smirks_index: atom_index} ] }
    """
    temp_dict = dict()
    for smirks, label in smirks_list:
        for dic in mol.smirks_search(smirks):
            atom_tuple = (dic[i+1] for i in range(len(dic)))
            temp_dict[atom_tuple] = label
    
    label_dict = dict()
    for atom_tuple, label in temp_dict.items():
        if label not in label_dict:
            label_dict[label] = list()
            
        label_dict[label].append({i+1: atom.get_index() for i, atom in enumerate(atom_tuple) })
    
    return label_dict

In [5]:
def make_cluster_graphs(molecules, smirks_list):
    """
    molecules - list of chemper mols
    smirks_list - list of tuples (SMIRKS, label)
    
    returns a dictionary of chemper ClusterGraphs:
    {label: ClusterGraph} object
    """
    graph_dict = dict()
    for mol in molecules:
        label_dict = get_smirks_dict(mol, smirks_list)
        for label, atom_list in label_dict.items():
            if label not in graph_dict:
                graph_dict[label] = ClusterGraph([mol], [atom_list])
                
            else:
                graph_dict[label].add_mol(mol, atom_list)
    
    return graph_dict

In [6]:
mols = smi_file_to_mols('carbon.smi')
smirks_list = smirks_list_from_file('angles.smarts')
clusters = make_cluster_graphs(mols, smirks_list)

Parsed 43 molecules from SMILES file carbon.smi
72
36
72
36
72
36
72
36
72
36
72
36
72
36
72
36
72
36
36
24
12
72
36
36
24
12
36
24
12
36
24
12
72
36
36
24
72
36
36
24
72
36
36
72
36
36
24
60
72
36
24
48
36
24
72
36
36
24
60
72
36
36
6
72
36
6
72
36
6
72
36
6
6
72
36
6
72
36
60
6
36
24
12
6
72
36
6
72
36
6
36
24
72
36
36
24
72
36
36
24
72
36
36
24
72
36
36
24
72
36
36
24
72
36
36
24
72
36
36
24
36
24
72
36
36
24
72
36
36
24
72
36
36
24


In [7]:
for label, c in clusters.items():
    print(label, c.as_smirks())

RecursionError: maximum recursion depth exceeded while calling a Python object

In [8]:
for a in c.get_atoms():
    print(a.as_smirks())

[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0r0+0:2]
[#1AH0X1x0r0+0:3]
[#1AH0X1x0r0+0:1]
[#6AH3X4x0