This document describes the initial setup of the experiments that establish a baseline for the MDFPs as last generated by ShuZe Wang, using an updated (2 Oct 2023) conda environment (mdfp_carl) and updated OpenFF, OpenMM and force field versions.

In [1]:
import openff.toolkit
import openmm
print("ff_name: openff_unconstrained-2.1.0.offxml")
print("ff_version: ", openff.toolkit.__version__)
print("simulation_type: tMD water solution")
print("md_engine: openMM")
print("version: ", openmm.__version__)
print("steps_time: 5.0")

ff_name: openff_unconstrained-2.1.0.offxml
ff_version:  0.14.3
simulation_type: tMD water solution
md_engine: openMM
version:  8.0
steps_time: 5.0
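The cell above prints the run metadata by hand. Note that the value labelled `ff_version` is the OpenFF Toolkit version (`openff.toolkit.__version__`); the force field version is only encoded in the file name (2.1.0). A minimal sketch of collecting the same information into a dict that could be saved next to the results is shown below. It assumes the distributions are registered as `openff-toolkit` and `openmm`, and the helper `collect_versions` is hypothetical, not part of any library used here.

```python
# Sketch: collect run metadata and package versions into one dict.
# Assumption: distribution names "openff-toolkit" and "openmm"; adjust if the
# environment registers them differently. collect_versions is a hypothetical helper.
import json
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            # not installed, or installed under a different distribution name
            versions[pkg] = None
    return versions


run_metadata = {
    "ff_name": "openff_unconstrained-2.1.0.offxml",
    "simulation_type": "tMD water solution",
    "md_engine": "openMM",
    "steps_time": 5.0,
    "versions": collect_versions(["openff-toolkit", "openmm"]),
}
print(json.dumps(run_metadata, indent=2))
```

Keeping the toolkit and engine versions under a separate key avoids confusing them with the force field version.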


Our starting point is the database that was created by ShuZe, combining REACH, OCHEM, PUBCHEM and CRC.

In [4]:
import pandas as pd
df = pd.read_csv('/localhome/cschiebroek/ShuZe/vp/data cleaning/cleaned_vp_all.tsv', sep='\t')
df = df[df['Temperature'] == 298.15 ]
all_smiles = df['SMILES'].tolist()
df.head()

Unnamed: 0,hash_code,SMILES,Temperature,Vapour Pressure (log10 kPa),Source
0,000c0dc393452e54c19cae2c6501b956,[CH3]-[CH2]-[CH2]-[CH2]-[CH2]-[CH2]-[CH2]-[CH2...,298.15,-14.744727,REACH
2,0026cd05c00286506e4c0051abcde83f,[CH3]-[CH]=[CH]-[CH2]-[CH2]-[CH2]-[CH2]-[CH2]-...,298.15,-0.301067,PUBCHEM
3,0029fb907f0382cb8ba7913301e791ec,[O]=[C](-[OH])-[c]1:[cH]:[cH]:[c]2:[cH]:[cH]:[...,298.15,-6.053323,OCHEM
4,0050ebd8f62aa0dd9dd972b02e277f5b,[Cl]-[CH2]-[O]-[CH2]-[Cl],298.15,0.591625,OCHEM
5,0067f638b002221144b4e1108f7d3ae5,[CH3]-[C]#[C]-[CH](-[CH3])-[CH3],298.15,1.227887,CRC
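Later, the registration loop filters the DataFrame three times per molecule to fetch its vapour pressure, temperature and source. A sketch of building a SMILES to record lookup once instead is shown below; the toy DataFrame only copies two rows and the column names from the output above, and `build_lookup` is a hypothetical helper.

```python
# Sketch: build a SMILES -> record lookup once instead of filtering the
# DataFrame repeatedly inside the loop. toy only mimics the real columns.
import pandas as pd

toy = pd.DataFrame({
    "SMILES": ["[Cl]-[CH2]-[O]-[CH2]-[Cl]", "[CH3]-[C]#[C]-[CH](-[CH3])-[CH3]"],
    "Temperature": [298.15, 298.15],
    "Vapour Pressure (log10 kPa)": [0.591625, 1.227887],
    "Source": ["OCHEM", "CRC"],
})


def build_lookup(frame):
    """Map each SMILES to a dict of the remaining columns of its first row."""
    # keep the first row per SMILES, matching .tolist()[0] in the loop
    first = frame.drop_duplicates(subset="SMILES", keep="first")
    return first.set_index("SMILES").to_dict(orient="index")


lookup = build_lookup(toy)
record = lookup["[Cl]-[CH2]-[O]-[CH2]-[Cl]"]
print(record["Vapour Pressure (log10 kPa)"], record["Source"])
```

Each lookup is then a dictionary access rather than a scan over the whole DataFrame.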


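For each SMILES, we embed one conformer, assign stereochemistry from the 3D coordinates, and register the resulting molecule. This is necessary because stereochemistry is not defined for all molecules; the embedding fixes any undefined stereocentres or stereo double bonds to whichever configuration the embedding produced. Alternatively, one could have used EnumerateStereoisomers. The experimental data for each molecule is retrieved from the DataFrame and stored in the experimental_data table.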
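A minimal sketch of the EnumerateStereoisomers alternative, using RDKit's `rdkit.Chem.EnumerateStereoisomers` module: only unassigned stereo elements are varied, and every resulting stereoisomer is returned. The helper name `enumerate_unassigned` is hypothetical.

```python
# Sketch: enumerate all stereoisomers over the unassigned stereo elements,
# instead of taking one configuration from a 3D embedding.
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)


def enumerate_unassigned(smiles):
    """Return sorted canonical isomeric SMILES for all stereoisomers,
    varying only stereo elements that are not already specified."""
    mol = Chem.MolFromSmiles(smiles)
    opts = StereoEnumerationOptions(onlyUnassigned=True, unique=True)
    return sorted(
        Chem.MolToSmiles(iso, isomericSmiles=True)
        for iso in EnumerateStereoisomers(mol, options=opts)
    )


# butan-2-ol has one unspecified stereocentre, so two enantiomers are returned
print(enumerate_unassigned("CC(O)CC"))
# an already specified centre is left alone, so only one isomer is returned
print(enumerate_unassigned("C[C@H](O)CC"))
```

Enumeration can yield several molecules per SMILES while the dataset holds one experimental value per SMILES, which is presumably why a single stereoisomer per entry is registered here.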

In [None]:
# register the remaining molecules and their experimental vapour pressures
from rdkit import Chem
from rdkit.Chem import AllChem, rdmolops
import json
import lwreg
from lwreg import standardization_lib
from lwreg import utils
config = utils.defaultConfig()
# set the name of the database we'll work with:
config['dbtype'] = 'postgresql'
config['dbname'] = 'cs_mdfps'
config['host'] = 'lebanon'
config['user'] = 'cschiebroek'
config['password'] = '' # password is saved in our .pgpass
# minimal standardization: only remove Hs, no further cleanup
config['standardization'] = standardization_lib.RemoveHs()
# we want to store conformers
config['registerConformers'] = True
cn = utils._connect(config)
cur = cn.cursor()
for smi in all_smiles:
    print(smi)
    mol = Chem.AddHs(Chem.MolFromSmiles(smi, sanitize=False))
    # embed one conformer; EmbedMolecule returns -1 on failure
    if AllChem.EmbedMolecule(mol, enforceChirality=True, randomSeed=0xf00d) == -1:
        print('embedding failed for: ', smi)
        continue
    # fix undefined stereo elements to the embedded geometry
    rdmolops.AssignStereochemistryFrom3D(mol)
    used_smiles = Chem.MolToSmiles(mol, isomericSmiles=True)
    mol.SetProp("_Name", used_smiles)
    mol.UpdatePropertyCache(strict=False)
    # register mol (and its conformer)
    try:
        lwreg.register(config=config, mol=mol)
    except Exception as e:
        print('registration failed for: ', smi, e)
        continue
    # get molregno
    hits = lwreg.query(smiles=used_smiles, config=config)
    if not hits:
        print('query failed for: ', smi)
        continue
    molregno = hits[0]
    # get experimental data for this molecule (first matching row)
    row = df[df['SMILES'] == smi].iloc[0]
    VP = float(row['Vapour Pressure (log10 kPa)'])
    metadata = {'Temperature': float(row['Temperature']), 'Source': row['Source'], 'SMILES': smi}
    # register experimental data; in PostgreSQL a failed statement aborts the
    # whole transaction, so the savepoint lets a failed insert (e.g. molregno
    # already in the table) be skipped without losing the later inserts
    cur.execute('savepoint exp_insert')
    try:
        cur.execute('insert into cs_mdfps_schema.experimental_data values (%s, %s, %s, %s, %s)',
                    (str(molregno), str(0), json.dumps({}), str(VP), json.dumps(metadata)))
        cur.execute('release savepoint exp_insert')
    except Exception:
        cur.execute('rollback to savepoint exp_insert')
cn.commit()
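Instead of catching a failed insert, the "only insert if molregno is not in table" rule can be expressed in SQL with `ON CONFLICT ... DO NOTHING`, which both PostgreSQL and SQLite (3.24 or later) accept. The sketch below demonstrates it with the standard-library `sqlite3` module against a throwaway in-memory table. Assumptions: the real `experimental_data` table has a unique constraint on its first column, and the column names here are placeholders because the actual schema is not shown; `insert_experimental` is a hypothetical helper.

```python
# Sketch: idempotent insert with ON CONFLICT DO NOTHING, shown with sqlite3.
# Column names are placeholders; the real schema is an assumption.
import json
import sqlite3

demo_cn = sqlite3.connect(":memory:")
demo_cur = demo_cn.cursor()
demo_cur.execute(
    "create table experimental_data ("
    " molregno integer primary key,"  # unique key that triggers the conflict
    " col2 text, col3 text, vp real, metadata text)"
)

INSERT_SQL = (
    "insert into experimental_data values (?, ?, ?, ?, ?) "
    "on conflict (molregno) do nothing"  # skip molregnos already present
)


def insert_experimental(cur, molregno, vp, metadata):
    """Insert one row; return True if a new row was written."""
    cur.execute(INSERT_SQL, (molregno, "0", json.dumps({}), vp, json.dumps(metadata)))
    return cur.rowcount == 1


meta = {"Temperature": 298.15, "Source": "OCHEM", "SMILES": "[Cl]-[CH2]-[O]-[CH2]-[Cl]"}
print(insert_experimental(demo_cur, 1, 0.591625, meta))  # new row
print(insert_experimental(demo_cur, 1, 0.591625, meta))  # duplicate, skipped
demo_cn.commit()
print(demo_cur.execute("select count(*) from experimental_data").fetchone()[0])
```

With PostgreSQL and psycopg2 the placeholders would be `%s` instead of `?`; the conflict clause itself is unchanged.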



These conformers are then used as input for mdfp_from_confid, and the resulting MDFPs are used to predict vapour pressure (see Analysis/001_Baseline.ipynb).

confgen_uuid: 906589dd-76fa-4d7b-aa9f-1ee90abe3835