# Enhance feasibility of solutions generated from *Limeade*

Without more specific restrictions, the search space of *Limeade* is very large, since it only enforces the basic rules of molecular structure, which makes it hard to generate reasonable molecules in practice. This notebook uses the widely used Morgan fingerprint as a measure of feasibility and shows how to improve this kind of feasibility in *Limeade* by adding just two lines of code.

This notebook lists some examples showing how to use *Limeade* to meet practical requirements.

The required Python libraries used in this notebook are as follows:
- `Limeade`: the package this notebook demonstrates. It can encode molecule space with given requirements into mathematical equations and generate feasible solutions quickly.
- `rdkit`: used to compute Morgan fingerprints and extract substructures from generated molecules.
- `numpy`: used to load the dataset.

In [1]:
from limeade import MIPMol
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

We define the following function to check feasibility. A molecule is considered infeasible if it contains a radius-1 Morgan fingerprint substructure that occurs fewer than 5 times in ChEMBL. The function also records these "strange" substructures for later use.

In [2]:
fps = np.load("data/chembl_fps.npy", allow_pickle=True).item()

# return True if every radius-1 Morgan substructure of mol is in the reference set fps;
# otherwise return False and tally each strange substructure (SMILES -> count) in strange_parts
def has_chembl_substruct(mol, strange_parts):
    fpgen = AllChem.GetMorganGenerator(radius=1)
    ao = AllChem.AdditionalOutput()
    ao.CollectBitInfoMap()
    fp = fpgen.GetSparseCountFingerprint(mol, additionalOutput=ao)
    info = ao.GetBitInfoMap()
    res = True
    for bit in fp.GetNonzeroElements().keys():
        if bit not in fps:
            res = False
            idx = info[bit][0][0]
            env = Chem.FindAtomEnvironmentOfRadiusN(mol, 1, idx)
            submol = Chem.PathToSubmol(mol, env, atomMap={})
            smiles = Chem.MolToSmiles(submol)
            if smiles not in strange_parts:
                strange_parts[smiles] = 1
            else:
                strange_parts[smiles] += 1
    return res
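
Once the reference counts are thresholded, the check above reduces to a set-membership test. The following is a minimal sketch of that idea on toy data. It assumes (the notebook does not show this) that the reference set was built by keeping only fingerprint identifiers seen at least 5 times; the names `build_reference` and `is_feasible` and the integer identifiers are hypothetical stand-ins.

```python
from collections import Counter

MIN_COUNT = 5  # threshold from the text: fewer than 5 occurrences counts as "strange"


def build_reference(corpus_bits, min_count=MIN_COUNT):
    """Keep fingerprint identifiers that occur at least `min_count` times.

    `corpus_bits` is an iterable of per-molecule identifier lists. The toy
    integers used here are hypothetical stand-ins for Morgan identifiers.
    """
    counts = Counter(bit for bits in corpus_bits for bit in bits)
    return {bit for bit, n in counts.items() if n >= min_count}


def is_feasible(mol_bits, reference):
    """A molecule is feasible only if every identifier is in the reference set."""
    return all(bit in reference for bit in mol_bits)


# toy corpus: identifier 1 occurs 6 times, identifier 2 five times, identifier 3 once
corpus = [[1, 2]] * 5 + [[1, 3]]
reference = build_reference(corpus)

print(is_feasible([1, 2], reference))  # all identifiers are common enough
print(is_feasible([1, 3], reference))  # identifier 3 is too rare
```

A single rare identifier is enough to reject a molecule, which is why the feasibility counts in the first rounds are so low.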

For a list of molecules, we count the number of feasible molecules using the following function:

In [3]:
# count the number of feasible molecules
# a dictionary of strange substructures is also returned
def count_feasible(mols):
    # dictionary of strange substructures
    strange_parts = {}
    # number of feasible molecules
    cnt = 0
    for mol in mols:
        mol.UpdatePropertyCache()
        cnt += has_chembl_substruct(mol, strange_parts)
    strange_parts = sorted(strange_parts.items(), key=lambda item: -item[1])
    return cnt, strange_parts
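
The tallying and sorting in `count_feasible` can also be expressed with `collections.Counter`. The sketch below works on plain lists of SMILES strings instead of RDKit molecules, so the helper name `tally_strange_parts` and the toy input are illustrative assumptions rather than part of the notebook.

```python
from collections import Counter


def tally_strange_parts(per_molecule_parts):
    """Count feasible molecules and rank strange substructures by frequency.

    `per_molecule_parts` holds, for each molecule, the list of strange
    substructure SMILES found in it (an empty list means feasible).
    """
    tally = Counter()
    n_feasible = 0
    for parts in per_molecule_parts:
        if not parts:
            n_feasible += 1
        tally.update(parts)
    # most_common() sorts by descending count, like the sorted(...) call above
    return n_feasible, tally.most_common()


# toy input: two feasible molecules and two with strange substructures
toy = [[], ["CC(C)S"], ["CC(C)S", "CC=S"], []]
n_feasible, ranked = tally_strange_parts(toy)
print(n_feasible, ranked)
```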

## Round 1: without excluding substructures

We first count how many feasible molecules are generated without excluding any substructures.

In [4]:
# set the number of atoms and types of atoms
N = 20
Mol = MIPMol(atoms=["C", "N", "O", "S"], N_atoms=N)

# set the bounds for the number of each type of atom (optional)
lb = [N // 2, None, None, None]
ub = [None, N // 4, N // 4, N // 4]
Mol.bounds_atoms(lb, ub)

# set the bounds for number of double/triple bonds, and rings (optional)
Mol.bounds_double_bonds(None, N // 2)
Mol.bounds_triple_bonds(None, N // 2)
Mol.bounds_rings(None, 0)

mols = Mol.solve(NumSolutions=1000)

cnt, strange_parts = count_feasible(mols)

print("Number of feasible molecules:", cnt)
print("Strange substructures appearing among generated molecules:")
print(strange_parts)

100%|██████████| 10/10 [00:01<00:00,  5.01it/s]


1000 molecules are generated after 2.0 seconds.
There are 957 molecules left after removing symmetric and invalid molecules.
Number of feasible molecules: 2
Strange substructures appearing among generated molecules:
[('CC(C)(C)N', 717), ('CC(C)(C)C', 309), ('CC(C)(N)O', 282), ('CC(C)(N)N', 213), ('CC(C)(C)O', 194), ('CC(C)(C)S', 185), ('CC(C)(O)S', 177), ('CC(C)(N)S', 155), ('CC(C)(O)O', 99), ('C=C(C)C', 85), ('CC(N)(N)O', 58), ('CC(C)=S', 56), ('CN(C)S', 55), ('CC(C)S', 54), ('CC(N)(O)O', 47), ('C=NC', 46), ('CC(C)(S)S', 42), ('CN(C)O', 42), ('CN=S', 39), ('CN(C)N', 38), ('CN(O)S', 35), ('CC(C)C', 32), ('OS', 30), ('COS', 30), ('C=C(C)O', 30), ('CC(N)(N)S', 30), ('CN=N', 29), ('CC(N)(O)S', 26), ('C=C(C)S', 25), ('CC(C)=N', 25), ('CN=O', 24), ('CC(N)N', 23), ('CC(C)N', 22), ('CC(O)(O)O', 19), ('C=C(C)N', 18), ('CC(O)(O)S', 17), ('CC(N)(N)N', 15), ('CN(N)S', 15), ('CC(=N)S', 14), ('CC=S', 14), ('CON', 13), ('NSO', 12), ('CC(N)S', 11), ('C=C(N)S', 11), ('NCS', 9), ('CN(N)O', 9), ('CC(N)O

## Round 2: exclude carbon without any implicit hydrogen atom

*Limeade* generates only 2 feasible molecules in Round 1. The strange substructures listed in the Round 1 output show that most molecules are infeasible because they contain a carbon without any implicit hydrogen atom. Let us exclude this substructure and check whether we get more feasible molecules.
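
To make "a carbon without any implicit hydrogen atom" concrete, here is a toy sketch on a hand-written molecular graph. It assumes neutral carbon with valence 4 and is only an illustration of what the pattern describes; it is not how *Limeade* encodes the exclusion, and the helper name `carbons_without_hydrogen` is hypothetical.

```python
CARBON_VALENCE = 4  # assumption: neutral carbon


def carbons_without_hydrogen(atoms, bonds):
    """Return indices of carbons whose valence is fully used by heavy-atom bonds.

    `atoms` is a list of element symbols; `bonds` is a list of
    (atom_index, atom_index, bond_order) tuples.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [
        idx
        for idx, symbol in enumerate(atoms)
        if symbol == "C" and used[idx] == CARBON_VALENCE
    ]


# central carbon of the environment CC(C)(C)N: four single bonds, no hydrogen left
atoms_a = ["C", "C", "C", "C", "N"]
bonds_a = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(carbons_without_hydrogen(atoms_a, bonds_a))

# central carbon of the environment C=C(C)C: one double bond and two single bonds
atoms_b = ["C", "C", "C", "C"]
bonds_b = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]
print(carbons_without_hydrogen(atoms_b, bonds_b))
```

Both toy environments appear among the most frequent strange substructures in the Round 1 output, and in both the central carbon has no valence left for hydrogen.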

In [5]:
# exclude substructure [CH0]
Mol.exclude_substructures(["[CH0]"])

mols = Mol.solve(NumSolutions=1000)

cnt, strange_parts = count_feasible(mols)

print("Number of feasible molecules:", cnt)
print("Strange substructures appearing among generated molecules:")
print(strange_parts)

100%|██████████| 10/10 [00:02<00:00,  4.61it/s]


1000 molecules are generated after 2.17 seconds.
There are 970 molecules left after removing symmetric and invalid molecules.
Number of feasible molecules: 62
Strange substructures appearing among generated molecules:
[('CN(C)N', 427), ('CC(N)N', 361), ('CC(N)O', 195), ('CC(N)S', 182), ('CN(N)N', 178), ('CN(N)O', 177), ('CN(C)S', 106), ('CSN', 93), ('CN=S', 84), ('CN(C)O', 69), ('CC(C)S', 63), ('NC(O)S', 57), ('NC(N)O', 54), ('CC(O)S', 54), ('NON', 48), ('CN(O)S', 44), ('CN(N)S', 34), ('NOS', 32), ('NN(N)O', 30), ('OS', 30), ('NOO', 29), ('NC(O)O', 26), ('NC(N)N', 25), ('CON', 24), ('CC=S', 23), ('COS', 22), ('NN(O)O', 22), ('NC(N)S', 18), ('CN(O)O', 18), ('NN(N)N', 16), ('OC(S)S', 11), ('CN(S)S', 11), ('NN=S', 10), ('NC(S)S', 9), ('CC(S)S', 8), ('CNS', 7), ('NN(N)S', 7), ('NN(O)S', 6), ('NCS', 6), ('CC(O)O', 5), ('NCN', 4), ('OC(O)O', 4), ('CSC', 4), ('N=CO', 4), ('CSO', 3), ('ON(O)S', 3), ('NNO', 2), ('OC(O)S', 2), ('SCS', 2), ('CSS', 2), ('OOO', 2), ('NNN', 2), ('N=CS', 2), ('N=CN',

## Round 3: exclude chains of heteroatoms

Round 2 does yield more feasible molecules than Round 1, but most molecules are still infeasible because they contain a heteroatom-heteroatom bond or a carbon bonded to two heteroatoms. We exclude both substructures and check the improvement.
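
The two patterns can be illustrated with a small check on a toy molecular graph. This is a sketch of what the patterns describe, not of how *Limeade* handles them internally; the helper name `violates_heteroatom_rules` and the graph representation are assumptions made for the example.

```python
HETERO = {"N", "O", "S"}  # heteroatom types used in this notebook


def violates_heteroatom_rules(atoms, bonds):
    """Check the two patterns excluded in this round on a toy molecular graph.

    Returns (has_hetero_hetero_bond, has_carbon_with_two_hetero_neighbors).
    Bond order is ignored, so any bond type counts.
    """
    neighbors = [[] for _ in atoms]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    hetero_hetero = any(
        atoms[i] in HETERO and atoms[j] in HETERO for i, j in bonds
    )
    carbon_bridge = any(
        atoms[idx] == "C"
        and sum(atoms[n] in HETERO for n in neighbors[idx]) >= 2
        for idx in range(len(atoms))
    )
    return hetero_hetero, carbon_bridge


# C-N-N contains a heteroatom-heteroatom bond
print(violates_heteroatom_rules(["C", "N", "N"], [(0, 1), (1, 2)]))
# N-C-O contains a carbon bonded to two heteroatoms
print(violates_heteroatom_rules(["N", "C", "O"], [(0, 1), (1, 2)]))
# C-C-N-C contains neither pattern
print(violates_heteroatom_rules(["C", "C", "N", "C"], [(0, 1), (1, 2), (2, 3)]))
```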

In [6]:
# exclude heteroatom-heteroatom or heteroatom-carbon-heteroatom
Mol.exclude_substructures(["[N,O,S]~[N,O,S]", "[N,O,S]~C~[N,O,S]"])

mols = Mol.solve(NumSolutions=1000)

cnt, strange_parts = count_feasible(mols)

print("Number of feasible molecules:", cnt)
print("Strange substructures appearing among generated molecules:")
print(strange_parts)

100%|██████████| 10/10 [00:04<00:00,  2.46it/s]

1000 molecules are generated after 4.08 seconds.
There are 855 molecules left after removing symmetric and invalid molecules.
Number of feasible molecules: 787
Strange substructures appearing among generated molecules:
[('CC(C)S', 58), ('CC=S', 13), ('CSC', 1)]





Now we have 787 feasible molecules out of 855 unique molecules after generating 1000 solutions.

Let us see how these constraints perform at a larger scale.

In [7]:
mols = Mol.solve(NumSolutions=100000)

cnt, strange_parts = count_feasible(mols)

print("Number of feasible molecules:", cnt)
print("Strange substructures appearing among generated molecules:")
print(strange_parts)

100%|██████████| 1000/1000 [07:07<00:00,  2.34it/s]


100000 molecules are generated after 427.42 seconds.
There are 63973 molecules left after removing symmetric and invalid molecules.
Number of feasible molecules: 55498
Strange substructures appearing among generated molecules:
[('CC(C)S', 7837), ('CC=S', 737), ('CSC', 158), ('CC(C)N', 55), ('CN(C)C', 11), ('C=NC', 3)]


We generate nearly 64k unique molecules, of which more than 86% are feasible.
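
For comparison across rounds, the feasible fraction can be computed from the counts printed in the outputs above. The snippet below is a small arithmetic sketch; the dictionary labels and the helper name `feasible_fraction` are chosen here for illustration.

```python
# (feasible, unique) counts as reported in the outputs of this notebook
results = {
    "Round 1 (1000 solutions)": (2, 957),
    "Round 2 (1000 solutions)": (62, 970),
    "Round 3 (1000 solutions)": (787, 855),
    "Round 3 (100000 solutions)": (55498, 63973),
}


def feasible_fraction(feasible, unique):
    """Fraction of unique molecules that pass the feasibility check."""
    return feasible / unique


for label, (feasible, unique) in results.items():
    fraction = feasible_fraction(feasible, unique)
    print(f"{label}: {feasible}/{unique} = {fraction:.1%}")
```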