Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better sanitize handling in read_sdf #41

Merged
merged 16 commits into from Jun 10, 2021
Merged

better sanitize handling in read_sdf #41

merged 16 commits into from Jun 10, 2021

Conversation

hadim
Copy link
Contributor

@hadim hadim commented Jun 5, 2021

Checklist:

  • Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
  • Added a news entry.
    • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst) and edit it.

Calling dm.sanitize_mol delete the conformers which should not happens by default since you often want the 3d structure when you read an SDF file.

(bug spotted while running unit tests on berth with a recent version of datamol)

@hadim
Copy link
Contributor Author

hadim commented Jun 5, 2021

Note that this PR restore the original behaviour in 0.3.7. (in that sense 0.3.8 is backward incompatible).

@hadim hadim requested a review from maclandrol June 7, 2021 12:51
datamol/io.py Outdated Show resolved Hide resolved
@hadim
Copy link
Contributor Author

hadim commented Jun 10, 2021

It's tricky to copy/paste the conformers in sanitize_mol because of mol = to_mol(dm.to_smiles(mol), sanitize=True) that very often modify the order of the atoms.

I could map the atoms but I am afraid it would be a bit CPU intensive for what it is.

I also tried using cxsmiles=True but it only conserves the first conformer :-(

@hadim
Copy link
Contributor Author

hadim commented Jun 10, 2021

I don't see any other way than using the rdkit sanitize flag and discard the use of dm.sanitize_mol in dm.read_sdf. dm.sanitize_mol will be able to copy props but not conformers (and we should trigger a warning message when a conformer is detected).

If you have a better idea, let me know!

@hadim
Copy link
Contributor Author

hadim commented Jun 10, 2021

Or we could make mol = to_mol(dm.to_smiles(mol), sanitize=True) optional in dm.sanitize_mol.

@maclandrol
Copy link
Member

I don't see any other way than using the rdkit sanitize flag and discard the use of dm.sanitize_mol in dm.read_sdf. dm.sanitize_mol will be able to copy props but not conformers (and we should trigger a warning message when a conformer is detected).

Sanitizing with rdkit is probably the better idea, then let the user call dm.sanitize_mol himself if he needs it.

@hadim
Copy link
Contributor Author

hadim commented Jun 10, 2021

I also tried using cxsmiles=True but it only conserves the first conformer :-(

I just realized that an SDF file can only have one single conformer per molecule, meaning the cxsmiles approach should be fine. That being said I think I prefer to stick to the sanitize from rdkit as said in your last message.

Plan is:

  • remove dm.sanitize_mol from dm.read_sdf and use rdkit sanitize.
  • copy props in dm.sanitize_mol
  • add the cxsmiles trick in sanitize_mol to preserve the first conformer (could be useful) and trigger a warning when more than one conformer is detected.

@hadim hadim requested a review from maclandrol June 10, 2021 16:44
Copy link
Member

@maclandrol maclandrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am perso fine with the current version.

datamol/io.py Show resolved Hide resolved
datamol/mol.py Show resolved Hide resolved
datamol/mol.py Outdated Show resolved Hide resolved
@hadim
Copy link
Contributor Author

hadim commented Jun 10, 2021

Thanks for the review!

@hadim hadim merged commit c056200 into master Jun 10, 2021
@hadim hadim deleted the sanitize-sdf branch June 10, 2021 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants