# RECAP
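RECAP (Retrosynthetic Combinatorial Analysis Procedure) fragments a molecule at bonds that are likely disconnection points in a retrosynthetic analysis, that is, bonds likely to be formed when the molecule is synthesized. The rules cover 11 types of substructures, such as amide and ester groups.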

https://qiita.com/ottantachinque/items/c2b2e2528f097afe8eeb
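
As a minimal sketch of what the decomposition looks like for a single molecule, the snippet below applies `Recap.RecapDecompose` to one example SMILES and lists the leaf fragments. The example molecule is an assumption chosen for illustration and is not taken from the dataset. The `*` atoms in the fragment SMILES mark the positions where bonds were cut.

```python
from rdkit import Chem
from rdkit.Chem import Recap

# Illustrative molecule (an assumption, not part of molecule.csv):
# it contains an ether linkage and an ester linkage
example_smiles = "c1ccccc1OCCOC(=O)CC"
example_mol = Chem.MolFromSmiles(example_smiles)

# RecapDecompose returns the root node of a fragment hierarchy
hierarchy = Recap.RecapDecompose(example_mol)

# Leaves are the fragments that are not cut any further;
# GetLeaves() maps fragment SMILES -> hierarchy node
leaf_smiles = sorted(hierarchy.GetLeaves().keys())
for fragment_smiles in leaf_smiles:
    print(fragment_smiles)
```

Each hierarchy node also exposes the fragment as a molecule through its `mol` attribute, which is what the cells below use.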

In [3]:
import pandas as pd
from rdkit import rdBase, Chem
from rdkit.Chem import AllChem, Draw, BRICS, Recap

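# Input CSV: sample names in the first column (used as the index), SMILES in the next column
filename = "molecule.csv"
# Upper limit on the number of structures to generate
number_of_generated_structures = 1000

dataset = pd.read_csv(filename, index_col=0)
# First column after the index holds the SMILES strings
smiles = dataset.iloc[:, 0]
display(smiles)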

sample_1                C=CC#N
sample_2                C=CC=O
sample_3             C=CC(=O)O
sample_4                C=CCCl
sample_5                  C=CC
                    ...       
sample_290    CCCCCCCCCCCCCCCC
sample_291             CC(C)Br
sample_292          CC(C)(C)Cl
sample_293           Cc1ccncc1
sample_294           CCCCCOC=O
Name: SMILES, Length: 294, dtype: object

In [11]:
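# Parse each SMILES string into an RDKit molecule
molecules = [Chem.MolFromSmiles(smiles_i) for smiles_i in smiles]
print("Number of molecules loaded:", len(molecules))
print("Decomposing into fragments")
fragments = set()
for mol in molecules:
    # Build the RECAP hierarchy and keep only its leaves (terminal fragments)
    hierarchy = Recap.RecapDecompose(mol, minFragmentSize=1)
    leaf_mols = [leaf.mol for leaf in hierarchy.GetLeaves().values()]
    # Collecting canonical SMILES in a set removes duplicates across molecules
    fragments.update(Chem.MolToSmiles(leaf_mol) for leaf_mol in leaf_mols)
print("Number of fragments generated:", len(fragments))
# Sets are unordered, so which fragment comes first can vary between runs
print("First fragment:", list(fragments)[0])

Number of molecules loaded: 294
Decomposing into fragments
Number of fragments generated: 75
First fragment: *CC(C)(C)C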
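
The loop above assumes every SMILES string parses. A hedged variant is sketched below: it skips entries for which `Chem.MolFromSmiles` returns `None` and counts how often each fragment occurs instead of only collecting unique fragments. The input list `smiles_list` and the names `fragment_counts` and `skipped` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

from rdkit import Chem, RDLogger
from rdkit.Chem import Recap

RDLogger.DisableLog("rdApp.*")  # silence the parse error for the bad entry

# Hypothetical input; the last entry is deliberately not a valid SMILES
smiles_list = ["CCOC(=O)C", "CC(=O)NCC", "CCOC(=O)CC", "not_a_smiles"]

fragment_counts = Counter()
skipped = []
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # MolFromSmiles returns None when parsing fails
        skipped.append(smi)
        continue
    hierarchy = Recap.RecapDecompose(mol, minFragmentSize=1)
    # The leaf keys are already the fragment SMILES
    fragment_counts.update(hierarchy.GetLeaves().keys())

# Sorting gives a reproducible order, unlike iterating over a set
for fragment, count in sorted(fragment_counts.items()):
    print(fragment, count)
print("Skipped:", skipped)
```

Counting fragments this way makes it easy to see which building blocks dominate the dataset before they are recombined.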


In [17]:
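# Generate chemical structures by recombining the fragments
# Note: BRICSBuild joins fragments at BRICS-style isotope-labeled dummy atoms (e.g. [1*]);
# RECAP leaves carry unlabeled dummy atoms (*), so the generator may yield no structures.
fragment_mols = [Chem.MolFromSmiles(fragment) for fragment in fragments]
generated_structures = BRICS.BRICSBuild(fragment_mols)
smiles_of_generated_structures = []
for idx, structure in enumerate(generated_structures):
    print(f"{idx + 1}/{number_of_generated_structures}")
    # BRICSBuild returns unsanitized molecules; refresh valence information before writing SMILES
    structure.UpdatePropertyCache(True)
    smiles_of_generated_structures.append(Chem.MolToSmiles(structure))
    # Stop once the requested number of structures has been collected
    if idx + 1 >= number_of_generated_structures:
        break
smiles_of_generated_structures = pd.DataFrame(smiles_of_generated_structures, columns=["SMILES"])
smiles_of_generated_structures.to_csv("generate_structure_RECAP.csv", header=False, index=False)
smiles_of_generated_structures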

<generator object BRICSBuild at 0x000001E095AB41C0>


Unnamed: 0,SMILES
