Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Stereoisomer Enumeration #21

Merged
merged 14 commits into from
Mar 14, 2024
Merged

Conversation

Feriolet
Copy link
Contributor

No description provided.

Comment on lines 142 to 143
def get_isomers(mol):
opts = StereoEnumerationOptions(tryEmbedding=True,maxIsomers=32,rand=0xf00d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_isomers should be added to arguments and elevated to the level of script arguments (argparse arguments) with the default value 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add please spaces after the commas

if bond.GetStereo() == Chem.BondStereo.STEREOANY:
bond.SetStereo(Chem.rdChem.BondStereo.STEREONONE)
isomers = tuple(EnumerateStereoisomers(mol,options=opts))
return isomers

def init_db(db_fname, input_fname, prefix=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_isomers should be added to arguments with the default value 1

Comment on lines 162 to 170
isomers = get_isomers(mol)
for stereo_index, stereo_mol in enumerate(isomers):
smi = Chem.MolToSmiles(stereo_mol, isomericSmiles=True)
if prefix:
mol_name = f'{prefix}-{mol_name}'
if mol_is_3d(mol):
data_mol.append((mol_name, stereo_index, smi, Chem.MolToMolBlock(stereo_mol)))
else:
data_smi.append((mol_name, stereo_index, smi))
Copy link
Contributor

@DrrDom DrrDom Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        if prefix:
            mol_name = f'{prefix}-{mol_name}'
        if mol_is_3d(mol):
            data_mol.append((mol_name, 0, smi, Chem.MolToMolBlock(stereo_mol)))
        else:
            isomers = get_isomers(mol, max_isomers=max_isomers)
            for stereo_index, stereo_mol in enumerate(isomers):
                smi = Chem.MolToSmiles(stereo_mol, isomericSmiles=True)
                data_smi.append((mol_name, stereo_index, smi))
  1. Enumeration is only needed if input is not a 3D molecule
  2. I added max_isomers to input arguments
  3. I added 0 as a default stereo_id value

@DrrDom
Copy link
Contributor

DrrDom commented Feb 27, 2024

I looked through the changes and made some particular notes. Below are more general ones;

  1. We may remove complex_id argument and use mol_id and stereo_id by default to name individual records. This will simplify code. Breaking backward compatibility should be a minor issue. Default value of stereo_id should be 0
  2. Let's make implementation of alternative protonation scheme in a separate pull request. It is a little bit more complex. We have to add a command line argument, and a user may specify which protonation utility should be used from the list of supported ones. Dimorphite is not the best one and in future I'm thinking to implement support of different tools. Therefore, implementation should not be hard coded.
  3. Please insert spaces where they usually needed and replace stereo_index with stereo_id to be more consistent with database field names.

@Feriolet Feriolet changed the title Adding Stereoisomer Enumeration and Modifying ChemaXon protonation to Dimorphite_dl Adding Stereoisomer Enumeration Feb 28, 2024
@Feriolet
Copy link
Contributor Author

Feriolet commented Mar 4, 2024

I have edited some of the commits based on your suggestions. Is there any more edits that I have missed?

@@ -136,23 +139,38 @@ def restore_setup_from_db(db_fname):

return d, tmpfiles

def get_isomers(mol, max_isomers):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add please default value max_isomers=1


def init_db(db_fname, input_fname, prefix=None):
def init_db(db_fname, input_fname, max_isomers, prefix=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add please default value max_isomers=1
This will make the code more compatible with previous versions, because previously by default we generated one random stereoisomer

if prefix:
mol_name = f'{prefix}-{mol_name}'
if mol_is_3d(mol):
data_mol.append((mol_name, smi, Chem.MolToMolBlock(mol)))
smi = Chem.MolToSmiles(mol, isomericSmiles=True)
data_mol.append((mol_name, 0, smi, Chem.MolToMolBlock(stereo_mol)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be mol, not stereo_mol

@@ -305,15 +324,16 @@ def add_protonation(db_fname, tautomerize=True, table_name='mols', add_sql=''):
fd, output = tempfile.mkstemp() # use output file to avoid overflow of stdout in extreme cases
try:
for smi, _, mol_id in data_list:
tmp.write(f'{smi}\t{mol_id}\n')
tmp.write(f'{smi}\t{mol_id}\t{stereo_id}\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to join mol_id and stereo_id, otherwise stereo_id may not become a part of a title of an output molecule

tmp.write(f'{smi}\t{mol_id}_{stereo_id}\n')

for mol in Chem.SDMolSupplier(output, sanitize=False):
if mol:
mol_name = mol.GetProp('_Name')
mol_name, stereo_id = mol.GetProp('_Name').rsplit(maxsplit=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be adjusted to the previous comment

mol_name, stereo_id = mol.GetProp('_Name').rsplit('_', 1)

@@ -173,6 +173,8 @@ def main():
help='number of cpus. This affects only docking on a single server.')
parser.add_argument('-v', '--verbose', action='store_true', default=False,
help='print progress to STDERR.')
parser.add_argument('--max_isomers', metavar='N_STEREO', type=int, required=False, default=1,
help='maximum number of isomers to enumerate. The default is set to 1.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please replace isomer with stereoisomer to be maximally explicit

@DrrDom
Copy link
Contributor

DrrDom commented Mar 8, 2024

After those comments will be fixed, I think we will test and accept the PR.

Regarding replacement of chemaxon. I'll implement a generic interface which should enable easy integration of third-party protonation tools and you will be able to commit Dimorphite-DL. This will be really useful - #17.

@Feriolet
Copy link
Contributor Author

Alright, I have edited the naming. Can you check if that is alright before testing it?

@DrrDom
Copy link
Contributor

DrrDom commented Mar 12, 2024

It looks OK, I do not see any issues for main functionality. It can be tested.

The will be one more change required - get_sdf_from_dock_db.py. It should be also adapted to stereoisomers, but I cannot decide how to do this to keep backward compatibility and easiness to use. I suggest to do that after we will implement the current changes. Probably I'll fix this script by myself. It has a little bit complex logic because we use it to extract data from similar database structures but not identical and keeping this compatibility is very convenient for other projects.

@DrrDom
Copy link
Contributor

DrrDom commented Mar 14, 2024

It seems that everything works fine. Hope so)
Thanks a lot for the contribution!
I hope I'll soon implement a new interface to easily integrate custom protonation tools

@DrrDom DrrDom merged commit c2af6c3 into ci-lab-cz:master Mar 14, 2024
@Feriolet Feriolet deleted the new_branch branch March 14, 2024 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants