
Potential duplicate molecules in FreeSolv Set #40

@bannanc

Description


While typing FreeSolv molecules with smirnoff99Frosst, I found four molecules that appear to be duplicated in the FreeSolv set (four pairs of entries that produce the same isomeric SMILES). Below is the code snippet I used to find them:

import glob
import os

from openeye import oechem
from openforcefield.utils import read_molecules

# Directory produced by untarring mol2files_sybyl.tar.gz
DBpath = "/FreeSolv/mol2files_sybyl/*.mol2"

isosmiles_to_mol = {}  # isomeric SMILES -> molecule last seen with that SMILES
smi_to_file = {}       # isomeric SMILES -> mol2 file that molecule came from

for file in glob.glob(DBpath):
    mol = read_molecules(file, verbose=False)[0]
    f = os.path.basename(file)
    c_mol = oechem.OEMol(mol)
    oechem.OEAddExplicitHydrogens(c_mol)
    smi = oechem.OECreateIsoSmiString(mol)
    if smi in isosmiles_to_mol:
        # This SMILES was already seen: print both entries side by side
        print("File:   %35s %35s" % (f, smi_to_file[smi]))
        print("Title:  %35s %35s" % (c_mol.GetTitle(), isosmiles_to_mol[smi].GetTitle()))
        print("SMILES: %35s %35s" % (smi, oechem.OECreateIsoSmiString(isosmiles_to_mol[smi])))
        print('\n')

    isosmiles_to_mol[smi] = c_mol
    smi_to_file[smi] = f

# OUTPUT: 

#File:                   mobley_4689084.mol2                  mobley_352111.mol2
#Title:               2-acetoxyethyl acetate              2-acetoxyethyl acetate
#SMILES:                    CC(=O)OCCOC(=O)C                    CC(=O)OCCOC(=O)C
#
#
#File:                   mobley_9897248.mol2                  mobley_819018.mol2
#Title:  (2Z)-3,7-dimethylocta-2,6-dien-1-ol (2E)-3,7-dimethylocta-2,6-dien-1-ol
#SMILES:                   CC(=CCCC(=CCO)C)C                   CC(=CCCC(=CCO)C)C
#
#
#File:                   mobley_9913368.mol2                 mobley_4465023.mol2
#Title:             (E)-1,2-dichloroethylene            (Z)-1,2-dichloroethylene
#SMILES:                           C(=CCl)Cl                           C(=CCl)Cl
#
#
#File:                   mobley_9979854.mol2                  mobley_628086.mol2
#Title:      (2R)-1,1,1-trifluoropropan-2-ol     (2S)-1,1,1-trifluoropropan-2-ol
#SMILES:                       CC(C(F)(F)F)O                       CC(C(F)(F)F)O
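Two things in this output may be worth triaging. First, the loop above overwrites the dictionary entry on every hit, so three or more files sharing one SMILES would be reported as consecutive pairs rather than as one group. Second, three of the four pairs have titles that differ only in stereodescriptors ((2Z)/(2E), (E)/(Z), (2R)/(2S)), while the printed isomeric SMILES contain no stereo marks (no `/`, `\` or `@`). That suggests, though does not prove, that stereochemistry was not perceived when the mol2 files were read, so those three pairs may be stereoisomers rather than true duplicates. Below is a minimal plain-Python sketch, assuming only the (file, title, SMILES) records printed above: it collects every file per SMILES key and labels each group by comparing titles. The helper names `group_by_smiles` and `classify_group` are hypothetical, and the title comparison is a heuristic, not a structural check.

```python
from collections import defaultdict


def group_by_smiles(entries):
    """Group (filename, title, smiles) records by their SMILES string.

    Returns only SMILES shared by more than one file, mapped to the list of
    (filename, title) pairs that produced them, in input order.
    """
    groups = defaultdict(list)
    for filename, title, smiles in entries:
        groups[smiles].append((filename, title))
    return {smi: members for smi, members in groups.items() if len(members) > 1}


def classify_group(members):
    """Heuristically label a group of files that share one SMILES.

    Identical titles suggest a true duplicate entry; differing titles
    (e.g. (E) vs (Z), (2R) vs (2S)) suggest stereoisomers whose SMILES
    lost their stereo marks. This compares names only, not structures.
    """
    titles = {title for _, title in members}
    return "likely duplicate" if len(titles) == 1 else "check stereochemistry"


# Records copied from the output above: (file, title, SMILES as printed)
entries = [
    ("mobley_4689084.mol2", "2-acetoxyethyl acetate", "CC(=O)OCCOC(=O)C"),
    ("mobley_352111.mol2", "2-acetoxyethyl acetate", "CC(=O)OCCOC(=O)C"),
    ("mobley_9897248.mol2", "(2Z)-3,7-dimethylocta-2,6-dien-1-ol", "CC(=CCCC(=CCO)C)C"),
    ("mobley_819018.mol2", "(2E)-3,7-dimethylocta-2,6-dien-1-ol", "CC(=CCCC(=CCO)C)C"),
    ("mobley_9913368.mol2", "(E)-1,2-dichloroethylene", "C(=CCl)Cl"),
    ("mobley_4465023.mol2", "(Z)-1,2-dichloroethylene", "C(=CCl)Cl"),
    ("mobley_9979854.mol2", "(2R)-1,1,1-trifluoropropan-2-ol", "CC(C(F)(F)F)O"),
    ("mobley_628086.mol2", "(2S)-1,1,1-trifluoropropan-2-ol", "CC(C(F)(F)F)O"),
]

# One line per shared SMILES: the key, the heuristic label, and all files
for smi, members in group_by_smiles(entries).items():
    files = ", ".join(f for f, _ in members)
    print(f"{smi:20s} {classify_group(members):22s} {files}")
```

Grouping into lists instead of overwriting a single dictionary slot keeps every colliding file together, which also makes the report independent of the order in which `glob` returns the files.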
