-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
environment.yml
Unblocking Python 3.11 - rdkit to conda-forge channel, remove chemprop for now
#2553
environment.yml
Unblocking Python 3.11 - rdkit to conda-forge channel, remove chemprop for now
#2553
Conversation
@xiaoruiDong there are 3 failures. Two of them are inchi translation related - rdkit gives a dofferent adjacency list than we expect. The other is different - we expect rdkit to fail to translate a molecule, but it suceeds. For the first two errors, we need to check and see if the inchi are equivalent. For the last one, I think we need to just find a new un-translatable molecule. Thoughts? |
I will also add that we have moved about 2 years of RDKit updates: https://github.com/ReactionMechanismGenerator/RMG-Py/actions/runs/6395007324/job/17357683191#step:4:354 |
Thanks. I will look into those InChI Generation. For the un-translatable molecule, I don't have an immediate thought of which example we can use as a replacement. P.S. I hope to reproduce it on my M2 Mac, with a fresh env installation and the conda trick to install osx-64 deps. However, surprisingly, I keep getting dep conflict errors, though the CI for MacOS installation is error free.... I have no trouble installing the new environment on Linux. |
As a follow-up, @JacksonBurns do you have any suggestions for rewriting the tests while avoiding if-elses (for example, for the following case)? RMG-Py/test/rmgpy/molecule/translatorTest.py Lines 234 to 242 in c9cb854
Regarding the other error, It seems that RDKit fixes the bug and now gives the same InChI as openbabel |
@xiaoruiDong thanks for the quick update! For the first two failues - glad to see that the translation are equivalent. But this presents a more interesting problem. The unit test calls RMG-Py/test/rmgpy/molecule/translatorTest.py Lines 72 to 80 in c9cb854
I think RMG is calling (inside of the That would mean that this failure is related to the second one. We do not need to test for RDKit failing to translate to InChI here, since it seems to do better now. Here is my proposal:
Could you implement this, and/or let me know what you think would be better? |
As a record, Inside of the For the example here, RDKit with a version before 2020.03.01 returns the
Sounds good
Sounds good
See my comments above. I also checked the inchi generated by Openbabel, and it is
I can implement this. I just quickly wrote some changes, and will check the CI tomorrow. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2553 +/- ##
==========================================
- Coverage 55.16% 54.94% -0.22%
==========================================
Files 125 125
Lines 37020 37021 +1
==========================================
- Hits 20422 20342 -80
- Misses 16598 16679 +81 ☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
c0472d6
to
5ac1d14
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as outdated.
This comment was marked as outdated.
87092b0
to
970a9fd
Compare
Thanks to @rwest this PR can actually move forward - I'm going to copy paste the solution to the weird problem (see the hidden comments from me) for reference: # since RDKit 2022.03.1, logging is done using the Python logger instead of the
# Cout streams. This does not affect running RMG normally, but this testing file
# only works properly if it is the only logger
# see https://github.com/rdkit/rdkit/pull/4846 for the changes in RDKit
# clear all other existing loggers
# https://stackoverflow.com/a/12158233
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
# once moved to a more recent python (at least 3.8), just add force=true to this statement
# and remove the above I'm going to let the CI go, and this should now pass, or at least run 😅 |
This comment was marked as outdated.
This comment was marked as outdated.
The patch required from switching RDKit versions is also now causing tons of useless warnings to be printed in the regression testing output. This was resolved in later versions of descripatastorus (the package raising the errors, see here), so I am switching to |
Sounds here like that could put the warnings into a different logger namespace, so may need additional tweaking after the switch. Good luck! |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Stuck between a rock and a hard place. Upgrading to the latest RDKit also allowed upgrading descriptastorus, a dependency of chemprop, that we have our own custom build of (which we should stop using in favor of conda-forge). The latest version of descriptastorus that we can use (that supports Python 3.7) raises obnoxious warnings on import time that we can't (easily) avoid. We can't upgrade to the actual latest version that doesn't do this, because it requires SciPy 1.9 or newer, which is Python 3.8+ only. Temporary patch to get this PR through is to pull the even older version of descriptastorus on the RMG channel, for now. The new version of chemprop no longer uses it at all, so once that transition is made this will be a moot point. Actually transitioning to the latest chemprop (which supports python 3.11/12 will be difficult too). |
This comment was marked as outdated.
This comment was marked as outdated.
Ok, so the version of descriptastorus on the RMG channel still has the annoying warnings problem. I am going to change it back to conda-forge, but expand this PR to also fix the last few conda package issues in the environment file. |
environment.yml
Unblocking Python 3.11 - rdkit, chemprop to conda-forge channel, diffeqpy from pip
environment.yml
Unblocking Python 3.11 - rdkit, chemprop to conda-forge channel, diffeqpy from pipenvironment.yml
Unblocking Python 3.11 - rdkit to conda-forge channel, diffeqpy from pip, remove chemprop for now
49712c5
to
c7bf3e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me and I see the tests are passing. Please see some small comments below.
Philosophically, what is the idea behind exposing the translator functions to Molecule
at all? Looking in the codebase, it seems like there is an awkward mixture of importing the to_inchi
functions from translator
and calling Molecule.to_inchi
in the unit tests. It's not clear what the "canonical" implementation is, especially if there is the chance for the default arguments to diverge.
To me, it feels like it would make more sense to just use the translator
functions in all cases (like the paradigm of using Chem.MolToSmiles
rather than mol.ToSmiles
). Including these functions in Molecule
risks unnecessarily duplicating code and docstrings that could easily become out-of-date. However, maybe people find it more convenient. Let me know what you think.
rmgpy/molecule/molecule.py
Outdated
Convert a molecular structure to an InChI string. Uses | ||
`OpenBabel <http://openbabel.org/>`_ to perform the conversion. | ||
|
||
Available options for InChI backend: 'rdkit-first' (default), | ||
'try-all', 'rdkit', or 'openbabel'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring for molecule/translator.py
needs to be updated as well so that it includes rdkit-first (default)
. For InChI and InchI key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this respect, following off of the previous comment, it also feels awkward to need to repeat the arguments for the translator
function in the docstring here. If the supported keyword arguments change in translator
we have to remember to update it here.
@@ -1863,62 +1863,74 @@ def to_single_bonds(self, raise_atomtype_exception=True): | |||
new_mol.update_atomtypes(raise_exception=raise_atomtype_exception) | |||
return new_mol | |||
|
|||
def to_inchi(self): | |||
def to_inchi(self, backend='rdkit-first'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels kind of awkward that the default arguments are duplicated both here in molecule.py
as well as translator.py
. If one default value changes, it doesn't guarantee that the other will mirror that change.
Good question. I guess Jackson may have a more valuable insight from a more professional point of view. To me, the motivation is really about "convenience for users" when implementing something like this in python. Like You don't have to import different modules and usually it ends up with a slightly shorter code. It is also very prevalent, like in PyTorch, there are |
dfb28e7
to
a23b5f3
Compare
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:05 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Passed Edge Comparison ✅Original model has 106 species.
Observables Test Case: Aromatics Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:10 liquid_oxidation Failed Core Comparison ❌Original model has 37 species. liquid_oxidation Failed Edge Comparison ❌Original model has 202 species.
Observables Test Case: liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:22 nitrogen Passed Core Comparison ✅Original model has 41 species. nitrogen Failed Edge Comparison ❌Original model has 132 species. Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(O2s-CdN3d) + group(N3d-OCd) + group(Cd-HN3dO) + ring(oxirene) + radical(CdJ-NdO) Non-identical kinetics! ❌
kinetics:
Observables Test Case: NC Comparison
✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:27 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species.
Observables Test Case: Oxidation Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:00:54 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species.
Observables Test Case: SO2 Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:34 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:02:26 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species.
Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:06:08 RMS_CSTR_liquid_oxidation Failed Core Comparison ❌Original model has 37 species. RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌Original model has 206 species.
Observables Test Case: RMS_CSTR_liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_CSTR_liquid_oxidation Passed Observable Testing ✅Regression test fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:00:40 fragment Passed Core Comparison ✅Original model has 10 species. fragment Passed Edge Comparison ✅Original model has 33 species.
Observables Test Case: fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! fragment Passed Observable Testing ✅Regression test RMS_constantVIdealGasReactor_fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:03:06 RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅Original model has 10 species. RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅Original model has 27 species.
Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅beep boop this comment was written by a bot 🤖 |
@JacksonBurns Any thoughts about @jonwzheng 's question? |
I see both perspectives here in terms of convenience for users and ease of development. I think we should probably clean up the code internally so that we don't have copies of defaults anywhere that could possibly go out of date (and at the same time ensure that the class methods are calling those functions and pointing to their docs), but that's a later problem and I believe out of scope of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me and I agree any internal changes would be outside the scope of this PR. Nice work and thank you for working on this issue. Setting this to approved.
This PR is very close to ready. @rwest left a comment here (#2628 (comment)) approving of the changes to the inchi strings, which is probably the only big 'theory' question in this PR. Since this will be a semi-large change (removing a feature), I think we should add one more review to be safe. @hwpang please take a look over this PR and let us know if anything needs changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Mostly LGTM with some questions
@@ -54,7 +57,7 @@ dependencies: | |||
- coverage | |||
- cython >=0.25.2 | |||
- scikit-learn | |||
- scipy | |||
- scipy <1.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. I believe I originally added this because of descriptastorus (which was incompatible with later versions of scipy) but it might not be required. I will remove it and we can see what happens in the CI.
rmgpy/molecule/molecule.py
Outdated
Convert a molecular structure to an InChI string. Uses | ||
`OpenBabel <http://openbabel.org/>`_ to perform the conversion. | ||
|
||
Available options for InChI backend: 'rdkit-first' (default), | ||
'try-all', 'rdkit', or 'openbabel'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an explanation in the docstring on what packages try-all
uses and their sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoruiDong can you do this?
@@ -238,7 +239,7 @@ def test_ch2o2(self): | |||
3 O 1 {1,S} | |||
""" | |||
|
|||
aug_inchi = "InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2" | |||
aug_inchi = "InChI=1/CH2O2/c2-1-3/h1-2H/u1,3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the inchi different? Is it due to RDKit update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoruiDong explains it in excellent details here, but in short yes RDKit fixed something: #2553 (comment)
By the way, I see that the documentation build is failing, but it doesn't seem to be related to this PR? Do we know what is causing it? |
I also see that the title suggests we get diffeqpy from pip now, but I don't see relevant code changes, could you update the title to reflect what has been changed? |
The docs build is fixed here: #2628 which we can merge after this PR |
I think this might have got wiped out in a force push. I will add it back and see what happens in the CI. |
8516cbc
to
62b2792
Compare
environment.yml
Unblocking Python 3.11 - rdkit to conda-forge channel, diffeqpy from pip, remove chemprop for nowenvironment.yml
Unblocking Python 3.11 - rdkit to conda-forge channel, remove chemprop for now
@hwpang turns out that one of our other dependencies requires diffeqpy during the conda install, so we end up with two copies of it if we ask for it from pip. We will have to deal with this in a separate PR. |
former is not needed since the descriptastorus (which was incompatible with latest scipy and required this limitation) is no longer in the dep list
62b2792
to
d8d430f
Compare
try-all is a bit of confusing, while its actual behavior is using openbabel-first whenever possible. Although at rmgpy.molecule.translator line 41-46, there is a try/except to check if openbabel is available, and sometimes only RDKit is included in the BACKEND; Given openbabel is by default installed in the RMG-Py environment, it should be reasonable to call it openbabel-first.
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:05 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Passed Edge Comparison ✅Original model has 106 species.
Observables Test Case: Aromatics Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:09 liquid_oxidation Failed Core Comparison ❌Original model has 37 species. Non-identical kinetics! ❌
kinetics: liquid_oxidation Failed Edge Comparison ❌Original model has 202 species. Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics:
Observables Test Case: liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:23 nitrogen Passed Core Comparison ✅Original model has 41 species. nitrogen Passed Edge Comparison ✅Original model has 132 species.
Observables Test Case: NC Comparison
✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:24 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species.
Observables Test Case: Oxidation Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:00:53 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species.
Observables Test Case: SO2 Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:34 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:02:28 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species.
Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:06:07 RMS_CSTR_liquid_oxidation Failed Core Comparison ❌Original model has 37 species. RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌Original model has 206 species.
Observables Test Case: RMS_CSTR_liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_CSTR_liquid_oxidation Passed Observable Testing ✅Regression test fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:00:40 fragment Passed Core Comparison ✅Original model has 10 species. fragment Passed Edge Comparison ✅Original model has 33 species.
Observables Test Case: fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! fragment Passed Observable Testing ✅Regression test RMS_constantVIdealGasReactor_fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:03:04 RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅Original model has 10 species. RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅Original model has 27 species.
Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅beep boop this comment was written by a bot 🤖 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Approved and merged as agreed upon in RMG subgroup |
Resolves #2462