environment.yml Unblocking Python 3.11 - rdkit to conda-forge channel, remove chemprop for now #2553

Merged
merged 15 commits into from
Mar 13, 2024

Conversation

JacksonBurns
Contributor

Resolves #2462

@JacksonBurns
Contributor Author

@xiaoruiDong there are 3 failures. Two of them are inchi translation related - rdkit gives a different adjacency list than we expect. The other is different - we expect rdkit to fail to translate a molecule, but it succeeds.

For the first two errors, we need to check and see if the inchi are equivalent. For the last one, I think we need to just find a new un-translatable molecule. Thoughts?

@JacksonBurns
Contributor Author

I will also add that we have moved forward through about 2 years of RDKit updates: https://github.com/ReactionMechanismGenerator/RMG-Py/actions/runs/6395007324/job/17357683191#step:4:354

@xiaoruiDong
Contributor

xiaoruiDong commented Oct 3, 2023

Thanks. I will look into the InChI generation failures. For the un-translatable molecule, I don't have an immediate idea of which example we can use as a replacement.

P.S. I hoped to reproduce it on my M2 Mac, with a fresh env installation and the conda trick to install osx-64 deps. However, surprisingly, I keep getting dependency conflict errors, even though the CI for the macOS installation is error free. I have no trouble installing the new environment on Linux.

@xiaoruiDong
Contributor

xiaoruiDong commented Oct 3, 2023

> @xiaoruiDong there are 3 failures. Two of them are inchi translation related - rdkit gives a different adjacency list than we expect. The other is different - we expect rdkit to fail to translate a molecule, but it succeeds.
>
> For the first two errors, we need to check and see if the inchi are equivalent. For the last one, I think we need to just find a new un-translatable molecule. Thoughts?

As a follow-up, 'InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2' (the original test target) and 'InChI=1/CH2O2/c2-1-3/h1-2H/u1,3' (generated after upgrading rdkit) correspond to the same molecule (tested with and without the change of this PR)
[image]

@JacksonBurns do you have any suggestions for rewriting the tests while avoiding if-elses (for example, for the following case)?

def test_ch2o2(self):
    adjlist = """
    1 C 1 {2,S} {3,S}
    2 O 0 {1,S}
    3 O 1 {1,S}
    """
    aug_inchi = "InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2"
    self.compare(adjlist, aug_inchi)

Regarding the other error, it seems that RDKit fixed the bug and now gives the same InChI as OpenBabel, 'InChI=1S/CH2N2/c1-3-2/h1H2'. The earlier version of RDKit generated a wrong InChI ('InChI=1S/CN2/c1-3-2') and couldn't pass the check function translator._check_output. I think it can be hard to find a new one. Can we temporarily disable this unit test until someone reports a new one?

@JacksonBurns
Contributor Author

@xiaoruiDong thanks for the quick update!

For the first two failures - glad to see that the translations are equivalent. But this presents a more interesting problem. The unit test calls self.compare, which does this:

def compare(self, adjlist, aug_inchi):
    spc = Species(molecule=[Molecule().from_adjacency_list(adjlist)])
    spc.generate_resonance_structures()
    ignore_prefix = r"(InChI=1+)(S*)/"
    exp = re.split(ignore_prefix, aug_inchi)[-1]
    comp = re.split(ignore_prefix, spc.get_augmented_inchi())[-1]
    assert exp == comp
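
As a standalone illustration of the prefix-stripping step in compare, the sketch below applies the same regular expression to the augmented InChI strings quoted in this thread. It uses only the standard library, and strip_prefix is a hypothetical helper name, not part of RMG-Py.

```python
import re

# Same pattern as in compare(): drops the "InChI=1/" or "InChI=1S/" prefix.
IGNORE_PREFIX = r"(InChI=1+)(S*)/"


def strip_prefix(inchi):
    """Return the InChI body with the version prefix removed (hypothetical helper)."""
    return re.split(IGNORE_PREFIX, inchi)[-1]


old_target = "InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2"  # original test target
new_output = "InChI=1/CH2O2/c2-1-3/h1-2H/u1,3"        # generated after the RDKit upgrade

# The prefix is ignored, so standard and non-standard prefixes compare equal.
assert strip_prefix("InChI=1S/CH2N2/c1-3-2/h1H2") == strip_prefix("InChI=1/CH2N2/c1-3-2/h1H2")

# The remaining layers are compared as plain strings, so the two strings that
# this thread reports as the same molecule still compare as different.
print(strip_prefix(old_target) == strip_prefix(new_output))  # prints False
```

The comparison is purely textual, which is why an equivalent but differently written hydrogen layer is enough to fail the test.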

I think RMG is calling (inside of the get_augmented_inchi) OpenBabel, which is returning the original test target.

That would mean that this failure is related to the second one. We do not need to test for RDKit failing to translate to InChI here, since it seems to do better now.

Here is my proposal:

  1. Change the environment file to specify a version of RDKit which does not have the bug the previous version had.
    • we could easily just do >=2022.9.1 which is the version the CI picked
  2. Add a pytest.skip to the third failing unit test, with reason="this unit test checks for a bug which has been patched in the RDKit version above or newer"
  3. Change the code for Molecule.get_augmented_inchi to call RDKit instead of OpenBabel

Could you implement this, and/or let me know what you think would be better?
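
A minimal sketch of how the skip in step 2 could be tied to the RDKit version rather than applied unconditionally. This goes slightly beyond the proposal and is only an illustration: parse_version, rdkit_has_fix, MIN_FIXED_RDKIT, and INSTALLED_RDKIT are hypothetical names, the version string is hard-coded so the snippet runs without RDKit installed, and unittest.skipIf from the standard library stands in for pytest.skip.

```python
import re
import unittest

# Lower bound taken from the proposal above (">=2022.9.1").
MIN_FIXED_RDKIT = (2022, 9, 1)


def parse_version(version):
    """Turn a version string such as '2022.09.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def rdkit_has_fix(version):
    """True if the given RDKit version is at or above the assumed fixed release."""
    return parse_version(version) >= MIN_FIXED_RDKIT


# Assumption: in a real test module this string would come from rdkit.__version__.
INSTALLED_RDKIT = "2022.09.1"


class TranslationFailureTest(unittest.TestCase):
    @unittest.skipIf(
        rdkit_has_fix(INSTALLED_RDKIT),
        "this unit test checks for an RDKit bug patched in 2022.9.1 or newer",
    )
    def test_rdkit_fails_to_translate(self):
        # Placeholder body: the real test would assert the translation failure.
        self.fail("only meaningful on RDKit versions that still have the bug")
```

With the environment pinned as in step 1, the condition is always true and this behaves like the plain skip described above.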

@JacksonBurns JacksonBurns added the Python 3.11 Transition PRs and Issues related to transitioning from Python 3.7 to 3.11 label Oct 3, 2023
@xiaoruiDong
Contributor

xiaoruiDong commented Oct 4, 2023

> @xiaoruiDong thanks for the quick update!
>
> For the first two failures - glad to see that the translations are equivalent. But this presents a more interesting problem. The unit test calls self.compare, which does this:
>
>     def compare(self, adjlist, aug_inchi):
>         spc = Species(molecule=[Molecule().from_adjacency_list(adjlist)])
>         spc.generate_resonance_structures()
>         ignore_prefix = r"(InChI=1+)(S*)/"
>         exp = re.split(ignore_prefix, aug_inchi)[-1]
>         comp = re.split(ignore_prefix, spc.get_augmented_inchi())[-1]
>         assert exp == comp
>
> I think RMG is calling (inside of the get_augmented_inchi) OpenBabel, which is returning the original test target.

As a record,

Inside get_augmented_inchi, it is translator.to_inchi that is being called. For radical molecules, the normal InChI is created by _write, with the backend being 'rdkit-first', and the u layer is created by inchiutil's methods create_augmented_layers and compose_aug_inchi. In _write, RMG converts an RMG molecule to an rdkitmol and uses Chem.inchi.MolToInchi on it to obtain the InChI.

For the example here, RDKit with a version before 2020.03.01 returns InChI=1/CH2O2/c2-1-3/h1H,(H,2,3), while some later version (I don't know exactly which) returns InChI=1/CH2O2/c2-1-3/h1-2H. The u indices are created based on the obtained normal InChI and the molecule's radical sites.
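
To illustrate how the u layer sits on top of the normal InChI, here is a small standalone sketch that separates an augmented InChI into its base InChI and its list of radical-site indices. split_u_layer is a hypothetical helper written for this comment. It is not the inchiutil implementation and only understands the /u layer shown in this thread.

```python
def split_u_layer(aug_inchi):
    """Split an augmented InChI into (base InChI, list of radical atom indices)."""
    base, sep, rest = aug_inchi.partition("/u")
    if not sep:
        return aug_inchi, []  # no unpaired-electron layer present
    # Ignore any further layers that might follow the /u layer.
    indices = rest.split("/")[0]
    return base, [int(index) for index in indices.split(",")]


print(split_u_layer("InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2"))
print(split_u_layer("InChI=1/CH2O2/c2-1-3/h1-2H/u1,3"))
```

The two strings differ in both the hydrogen layer and the u indices, which is consistent with the indices being derived from whichever normal InChI the backend returned.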

> That would mean that this failure is related to the second one. We do not need to test for RDKit failing to translate to InChI here since it seems to do better now.
>
> Here is my proposal:
>
>   1. Change the environment file to specify a version of RDKit that does not have the bug the previous version had.
>     • we could easily just do >=2022.9.1 which is the version the CI picked

Sounds good.

>   2. Add a pytest.skip to the third failing unit test, with reason="this unit test checks for a bug which has been patched in the RDKit version above or newer"

Sounds good.

>   3. Change the code for Molecule.get_augmented_inchi to call RDKit instead of OpenBabel

See my comments above. I also checked the InChI generated by OpenBabel, and it is InChI=1/CH2O2/c2-1-3/h1-2H, the same one generated by the later version of RDKit.

translator.to_inchi accepts a backend argument, while Molecule.get_inchi and Molecule.get_augmented_inchi don't. I would prefer to expose the backend argument on them so the user can choose which backend to use.

> Could you implement this, and/or let me know what you think would be better?

I can implement this. I just quickly wrote some changes, and will check the CI tomorrow.

@codecov

codecov bot commented Oct 4, 2023

Codecov Report

Attention: Patch coverage is 20.83333% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 54.94%. Comparing base (bfaee1c) to head (7496009).

Files                           Patch %   Lines
rmgpy/molecule/molecule.py      0.00%     10 Missing ⚠️
rmgpy/species.py                0.00%     4 Missing ⚠️
rmgpy/molecule/translator.py    0.00%     3 Missing ⚠️
rmgpy/ml/estimator.py           66.66%    1 Missing ⚠️
rmgpy/qm/molecule.py            50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2553      +/-   ##
==========================================
- Coverage   55.16%   54.94%   -0.22%     
==========================================
  Files         125      125              
  Lines       37020    37021       +1     
==========================================
- Hits        20422    20342      -80     
- Misses      16598    16679      +81     


@JacksonBurns

This comment was marked as outdated.

@JacksonBurns

This comment was marked as outdated.

@JacksonBurns

This comment was marked as resolved.

@JacksonBurns

This comment was marked as outdated.

@JacksonBurns JacksonBurns force-pushed the rdkit-conda-forge branch 2 times, most recently from 87092b0 to 970a9fd Compare October 16, 2023 15:58
@JacksonBurns
Contributor Author

Thanks to @rwest this PR can actually move forward - I'm going to copy paste the solution to the weird problem (see the hidden comments from me) for reference:

# since RDKit 2022.03.1, logging is done using the Python logger instead of the
# Cout streams. This does not affect running RMG normally, but this testing file
# only works properly if it is the only logger
# see https://github.com/rdkit/rdkit/pull/4846 for the changes in RDKit

# clear all other existing loggers
# https://stackoverflow.com/a/12158233
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

# once moved to a more recent python (at least 3.8), just add force=True to this statement
# and remove the above
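
For reference, a self-contained sketch of the two variants mentioned in the comments above: clearing the root handlers by hand, and the force=True shortcut that logging.basicConfig accepts from Python 3.8 onward. The function names and the log level are placeholders chosen for this illustration, not names from the RMG-Py test file.

```python
import logging


def reset_logging_manually(level=logging.INFO):
    """Remove every handler from the root logger, then configure it again."""
    for handler in logging.root.handlers[:]:  # iterate over a copy while removing
        logging.root.removeHandler(handler)
    logging.basicConfig(level=level)


def reset_logging_forced(level=logging.INFO):
    """Python 3.8+: basicConfig(force=True) removes existing root handlers itself."""
    logging.basicConfig(level=level, force=True)
```

Without one of these, basicConfig does nothing when the root logger already has a handler, which is how a handler installed by another library can get in the way.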

I'm going to let the CI go, and this should now pass, or at least run 😅

@github-actions

This comment was marked as outdated.

@JacksonBurns
Contributor Author

The patch required by switching RDKit versions is also now causing tons of useless warnings to be printed in the regression testing output. This was resolved in later versions of descriptastorus (the package raising the errors, see here), so I am switching to conda-forge to try and get a later version.

@rwest
Member

rwest commented Oct 16, 2023

Sounds here like that could put the warnings into a different logger namespace, so may need additional tweaking after the switch. Good luck!

@JacksonBurns

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@JacksonBurns
Contributor Author

Stuck between a rock and a hard place.

Upgrading to the latest RDKit also allowed upgrading descriptastorus, a dependency of chemprop for which we have our own custom build (which we should stop using in favor of conda-forge). The latest version of descriptastorus that we can use (that supports Python 3.7) raises obnoxious warnings at import time that we can't (easily) avoid. We can't upgrade to the actual latest version that doesn't do this, because it requires SciPy 1.9 or newer, which is Python 3.8+ only.

The temporary patch to get this PR through is to pull the even older version of descriptastorus on the RMG channel, for now. The new version of chemprop no longer uses it at all, so once that transition is made this will be a moot point. Actually transitioning to the latest chemprop (which supports Python 3.11/12) will be difficult too.

@github-actions

This comment was marked as outdated.

@JacksonBurns
Contributor Author

Ok, so the version of descriptastorus on the RMG channel still has the annoying warnings problem. I am going to change it back to conda-forge, but expand this PR to also fix the last few conda package issues in the environment file.

@JacksonBurns JacksonBurns changed the title switch rdkit to the conda-forge channel environment.yml Unblocking Python 3.11 - rdkit, chemprop to conda-forge channel, diffeqpy from pip Oct 17, 2023
@JacksonBurns JacksonBurns changed the title environment.yml Unblocking Python 3.11 - rdkit, chemprop to conda-forge channel, diffeqpy from pip environment.yml Unblocking Python 3.11 - rdkit to conda-forge channel, diffeqpy from pip, remove chemprop for now Oct 17, 2023
Contributor

@jonwzheng jonwzheng left a comment

The changes look good to me and I see the tests are passing. Please see some small comments below.

Philosophically, what is the idea behind exposing the translator functions to Molecule at all? Looking in the codebase, it seems like there is an awkward mixture of importing the to_inchi functions from translator and calling Molecule.to_inchi in the unit tests. It's not clear what the "canonical" implementation is, especially if there is the chance for the default arguments to diverge.

To me, it feels like it would make more sense to just use the translator functions in all cases (like the paradigm of using Chem.MolToSmiles rather than mol.ToSmiles). Including these functions in Molecule risks unnecessarily duplicating code and docstrings that could easily become out-of-date. However, maybe people find it more convenient. Let me know what you think.

Convert a molecular structure to an InChI string. Uses
`OpenBabel <http://openbabel.org/>`_ to perform the conversion.

Available options for InChI backend: 'rdkit-first' (default),
'try-all', 'rdkit', or 'openbabel'.
Contributor

The docstrings in molecule/translator.py need to be updated as well so that they include rdkit-first (default). This applies to both the InChI and InChI key functions.

Contributor

In this respect, following off of the previous comment, it also feels awkward to need to repeat the arguments for the translator function in the docstring here. If the supported keyword arguments change in translator we have to remember to update it here.

@@ -1863,62 +1863,74 @@ def to_single_bonds(self, raise_atomtype_exception=True):
         new_mol.update_atomtypes(raise_exception=raise_atomtype_exception)
         return new_mol
 
-    def to_inchi(self):
+    def to_inchi(self, backend='rdkit-first'):
Contributor

It feels kind of awkward that the default arguments are duplicated both here in molecule.py as well as translator.py. If one default value changes, it doesn't guarantee that the other will mirror that change.

@xiaoruiDong
Contributor

xiaoruiDong commented Mar 7, 2024

> The changes look good to me and I see the tests are passing. Please see some small comments below.
>
> Philosophically, what is the idea behind exposing the translator functions to Molecule at all? Looking in the codebase, it seems like there is an awkward mixture of importing the to_inchi functions from translator and calling Molecule.to_inchi in the unit tests. It's not clear what the "canonical" implementation is, especially if there is the chance for the default arguments to diverge.
>
> To me, it feels like it would make more sense to just use the translator functions in all cases (like the paradigm of using Chem.MolToSmiles rather than mol.ToSmiles). Including these functions in Molecule risks unnecessarily duplicating code and docstrings that could easily become out-of-date. However, maybe people find it more convenient. Let me know what you think.

Good question. I guess Jackson may have more valuable insight from a more professional point of view. To me, the motivation is really about "convenience for users" when implementing something like this in Python. You don't have to import different modules, and it usually ends up as slightly shorter code. It is also very prevalent: in PyTorch, for example, there are both torch.sum(tensor) and Tensor.sum(). Implementing both should satisfy users with different coding styles.
From a developer's point of view, this indeed results in some extra work, but it is still manageable.

github-actions bot commented Mar 8, 2024

Regression Testing Results

⚠️ One or more regression tests failed.
Please download the failed results and run the tests locally or check the log to see why.

Detailed regression test results.

Regression test aromatics:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:05
Current: Execution time (DD:HH:MM:SS): 00:00:01:05
Reference: Memory used: 3031.39 MB
Current: Memory used: 2973.07 MB

aromatics Passed Core Comparison ✅

Original model has 15 species.
Test model has 15 species. ✅
Original model has 11 reactions.
Test model has 11 reactions. ✅

aromatics Passed Edge Comparison ✅

Original model has 106 species.
Test model has 106 species. ✅
Original model has 358 reactions.
Test model has 358 reactions. ✅

Observables Test Case: Aromatics Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:10
Current: Execution time (DD:HH:MM:SS): 00:00:02:09
Reference: Memory used: 3153.06 MB
Current: Memory used: 3108.54 MB

liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 216 reactions.
Test model has 215 reactions. ❌
The original model has 1 reactions that the tested model does not have. ❌
rxn: CCO[O](31) <=> [OH](22) + CC=O(69) origin: intra_H_migration

liquid_oxidation Failed Edge Comparison ❌

Original model has 202 species.
Test model has 202 species. ✅
Original model has 1618 reactions.
Test model has 1613 reactions. ❌
The original model has 6 reactions that the tested model does not have. ❌
rxn: CCO[O](31) <=> [OH](22) + CC=O(69) origin: intra_H_migration
rxn: [CH2]CCOO(73) + CCCCCOO(105) <=> CCCOO(35) + CC[CH]CCOO(114) origin: H_Abstraction
rxn: [CH2]CCOO(73) + CCCCCOO(105) <=> CCCOO(35) + CCC[CH]COO(113) origin: H_Abstraction
rxn: [CH2]CCOO(73) + CCCCCOO(105) <=> CCCOO(35) + C[CH]CCCOO(115) origin: H_Abstraction
rxn: [CH2]CCOO(73) + CCCCCOO(105) <=> CCCOO(35) + CCCC[CH]OO(138) origin: H_Abstraction
rxn: CCCOO(35) + [CH2]CCCCOO(116) <=> [CH2]CCOO(73) + CCCCCOO(105) origin: H_Abstraction
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](30) <=> C[CH]OO(70) origin: intra_H_migration

Observables Test Case: liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:22
Current: Execution time (DD:HH:MM:SS): 00:00:01:20
Reference: Memory used: 3164.01 MB
Current: Memory used: 3103.51 MB

nitrogen Passed Core Comparison ✅

Original model has 41 species.
Test model has 41 species. ✅
Original model has 360 reactions.
Test model has 360 reactions. ✅

nitrogen Failed Edge Comparison ❌

Original model has 132 species.
Test model has 132 species. ✅
Original model has 997 reactions.
Test model has 997 reactions. ✅

Non-identical thermo! ❌
original: O1[C]=N1
tested: O1[C]=N1

Hf(300K) S(300K) Cp(300K) Cp(400K) Cp(500K) Cp(600K) Cp(800K) Cp(1000K) Cp(1500K)
141.64 58.66 12.26 12.27 12.09 11.96 12.26 12.72 12.15
116.46 53.90 11.62 12.71 13.49 13.96 14.14 13.85 13.58

thermo: Thermo group additivity estimation: group(O2s-CdN3d) + group(N3d-OCd) + group(Cd-HN3dO) + ring(oxirene) + radical(CdJ-NdO)
thermo: Thermo group additivity estimation: group(O2s-CdN3d) + group(N3d-OCd) + group(Cd-HN3dO) + ring(Cyclopropene) + radical(CdJ-NdO)

Non-identical kinetics! ❌
original:
rxn: NCO(66) <=> O1[C]=N1(126) origin: Intra_R_Add_Endocyclic
tested:
rxn: NCO(66) <=> O1[C]=N1(126) origin: Intra_R_Add_Endocyclic

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): -66.25 -46.19 -34.19 -26.21 -16.28 -10.36 -2.54 1.31
k(T): -49.54 -33.65 -24.16 -17.85 -10.01 -5.35 0.80 3.82

kinetics: Arrhenius(A=(6.95187e+18,'s^-1'), n=-1.628, Ea=(111.271,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Backbone0_N-2R!H-inRing_N-1R!H-inRing_Sp-2R!H-1R!H""")
kinetics: Arrhenius(A=(6.95187e+18,'s^-1'), n=-1.628, Ea=(88.327,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Backbone0_N-2R!H-inRing_N-1R!H-inRing_Sp-2R!H-1R!H""")
Identical kinetics comments:
kinetics: Estimated from node Backbone0_N-2R!H-inRing_N-1R!H-inRing_Sp-2R!H-1R!H

Observables Test Case: NC Comparison

✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!

nitrogen Passed Observable Testing ✅

Regression test oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:27
Current: Execution time (DD:HH:MM:SS): 00:00:02:21
Reference: Memory used: 3014.90 MB
Current: Memory used: 2963.55 MB

oxidation Passed Core Comparison ✅

Original model has 59 species.
Test model has 59 species. ✅
Original model has 694 reactions.
Test model has 694 reactions. ✅

oxidation Passed Edge Comparison ✅

Original model has 230 species.
Test model has 230 species. ✅
Original model has 1526 reactions.
Test model has 1526 reactions. ✅

Observables Test Case: Oxidation Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

oxidation Passed Observable Testing ✅

Regression test sulfur:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:54
Current: Execution time (DD:HH:MM:SS): 00:00:00:54
Reference: Memory used: 3118.06 MB
Current: Memory used: 3085.64 MB

sulfur Passed Core Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 74 reactions.
Test model has 74 reactions. ✅

sulfur Failed Edge Comparison ❌

Original model has 89 species.
Test model has 89 species. ✅
Original model has 227 reactions.
Test model has 227 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary
The tested model has 1 reactions that the original model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary

Observables Test Case: SO2 Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

sulfur Passed Observable Testing ✅

Regression test superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:34
Current: Execution time (DD:HH:MM:SS): 00:00:00:34
Reference: Memory used: 3202.81 MB
Current: Memory used: 3186.71 MB

superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 21 reactions.
Test model has 21 reactions. ✅

superminimal Passed Edge Comparison ✅

Original model has 18 species.
Test model has 18 species. ✅
Original model has 28 reactions.
Test model has 28 reactions. ✅

Regression test RMS_constantVIdealGasReactor_superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:26
Current: Execution time (DD:HH:MM:SS): 00:00:02:25
Reference: Memory used: 3714.03 MB
Current: Memory used: 3666.17 MB

RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅

Regression test RMS_CSTR_liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:06:08
Current: Execution time (DD:HH:MM:SS): 00:00:06:05
Reference: Memory used: 3651.81 MB
Current: Memory used: 3570.18 MB

RMS_CSTR_liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 232 reactions.
Test model has 233 reactions. ❌
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](36) <=> [OH](21) + CC=O(61) origin: intra_H_migration

RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌

Original model has 206 species.
Test model has 206 species. ✅
Original model has 1508 reactions.
Test model has 1508 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: CCO[O](35) <=> C[CH]OO(62) origin: intra_H_migration
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCO[O](36) <=> [OH](21) + CC=O(61) origin: intra_H_migration

Observables Test Case: RMS_CSTR_liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_CSTR_liquid_oxidation Passed Observable Testing ✅

Regression test fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:40
Current: Execution time (DD:HH:MM:SS): 00:00:00:39
Reference: Memory used: 2950.00 MB
Current: Memory used: 2901.35 MB

fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

fragment Passed Edge Comparison ✅

Original model has 33 species.
Test model has 33 species. ✅
Original model has 47 reactions.
Test model has 47 reactions. ✅

Observables Test Case: fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

fragment Passed Observable Testing ✅

Regression test RMS_constantVIdealGasReactor_fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:06
Current: Execution time (DD:HH:MM:SS): 00:00:03:05
Reference: Memory used: 3844.47 MB
Current: Memory used: 3798.12 MB

RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 24 reactions.
Test model has 24 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅

beep boop this comment was written by a bot 🤖

@xiaoruiDong
Contributor

@JacksonBurns Any thoughts about @jonwzheng's question?
@jonwzheng The docstrings in the translator module are corrected.

@JacksonBurns
Contributor Author

I see both perspectives here in terms of convenience for users and ease of development. I think we should probably clean up the code internally so that we don't have copies of defaults anywhere that could possibly go out of date (and at the same time ensure that the class methods are calling those functions and pointing to their docs), but that's a later problem and I believe out of scope of this PR.
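
One possible shape for that cleanup, sketched with stand-in code rather than the real RMG-Py modules: the default backend lives in one module-level constant, and the method forwards to the function without restating the default. DEFAULT_INCHI_BACKEND, KNOWN_BACKENDS, the stub to_inchi body, and the Molecule stub are all hypothetical. Only the backend names are taken from the docstring under review.

```python
DEFAULT_INCHI_BACKEND = "rdkit-first"  # hypothetical single source of truth
KNOWN_BACKENDS = ("rdkit-first", "try-all", "rdkit", "openbabel")


def to_inchi(mol, backend=DEFAULT_INCHI_BACKEND):
    """Stand-in for the translator function; returns a label instead of an InChI."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return f"<InChI of {mol.name} via {backend}>"


class Molecule:
    """Stub class; only the delegation pattern matters here."""

    def __init__(self, name):
        self.name = name

    def to_inchi(self, **kwargs):
        """Convert to InChI; see the module-level to_inchi for accepted arguments."""
        # No default is repeated here, so it cannot drift from the function's.
        return to_inchi(self, **kwargs)


mol = Molecule("CH2O2")
print(mol.to_inchi())                     # uses the module-level default
print(mol.to_inchi(backend="openbabel"))  # explicit override still works
```

The trade-off is that the method signature no longer advertises the backend argument, so the docstring has to point at the function, as suggested above.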

jonwzheng
jonwzheng previously approved these changes Mar 8, 2024
Contributor

@jonwzheng jonwzheng left a comment

Changes look good to me and I agree any internal changes would be outside the scope of this PR. Nice work and thank you for working on this issue. Setting this to approved.

@JacksonBurns
Contributor Author

This PR is very close to ready.

@rwest left a comment here (#2628 (comment)) approving of the changes to the inchi strings, which is probably the only big 'theory' question in this PR.

Since this will be a semi-large change (removing a feature), I think we should add one more review to be safe. @hwpang please take a look over this PR and let us know if anything needs changes.

Contributor

@hwpang hwpang left a comment

Thanks for the PR! Mostly LGTM with some questions

@@ -54,7 +57,7 @@ dependencies:
   - coverage
   - cython >=0.25.2
   - scikit-learn
-  - scipy
+  - scipy <1.11
Contributor

Why do we add this?

Contributor Author

Thanks for catching this. I believe I originally added this because of descriptastorus (which was incompatible with later versions of scipy) but it might not be required. I will remove it and we can see what happens in the CI.

Convert a molecular structure to an InChI string. Uses
`OpenBabel <http://openbabel.org/>`_ to perform the conversion.

Available options for InChI backend: 'rdkit-first' (default),
'try-all', 'rdkit', or 'openbabel'.
Contributor

Could you add an explanation in the docstring on what packages try-all uses and their sequence?

Contributor Author

@xiaoruiDong can you do this?

@@ -238,7 +239,7 @@ def test_ch2o2(self):
         3 O 1 {1,S}
         """
 
-        aug_inchi = "InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/u1,2"
+        aug_inchi = "InChI=1/CH2O2/c2-1-3/h1-2H/u1,3"
Contributor

Why is the inchi different? Is it due to RDKit update?

Contributor Author

@xiaoruiDong explains it in excellent detail here, but in short, yes, RDKit fixed something: #2553 (comment)

@hwpang
Contributor

hwpang commented Mar 13, 2024

By the way, I see that the documentation build is failing, but it doesn't seem to be related to this PR? Do we know what is causing it?

@hwpang
Contributor

hwpang commented Mar 13, 2024

I also see that the title suggests we get diffeqpy from pip now, but I don't see relevant code changes, could you update the title to reflect what has been changed?

@JacksonBurns
Contributor Author

> I also see that the title suggests we get diffeqpy from pip now, but I don't see relevant code changes, could you update the title to reflect what has been changed?

The docs build is fixed here: #2628 which we can merge after this PR

@JacksonBurns
Contributor Author

> I also see that the title suggests we get diffeqpy from pip now, but I don't see relevant code changes, could you update the title to reflect what has been changed?

I think this might have got wiped out in a force push. I will add it back and see what happens in the CI.

@JacksonBurns JacksonBurns changed the title environment.yml Unblocking Python 3.11 - rdkit to conda-forge channel, diffeqpy from pip, remove chemprop for now environment.yml Unblocking Python 3.11 - rdkit to conda-forge channel, remove chemprop for now Mar 13, 2024
@JacksonBurns
Contributor Author

@hwpang turns out that one of our other dependencies requires diffeqpy during the conda install, so we end up with two copies of it if we ask for it from pip. We will have to deal with this in a separate PR.

The former is not needed, since descriptastorus (which was incompatible with the latest scipy and required this limitation) is no longer in the dependency list.
The name try-all is a bit confusing, since its actual behavior is to use openbabel first whenever possible. At rmgpy.molecule.translator lines 41-46 there is a try/except that checks whether openbabel is available, so sometimes only RDKit is included in the BACKEND. However, given that openbabel is installed by default in the RMG-Py environment, it should be reasonable to call it openbabel-first.
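
The ordering described above can be pictured with a small stand-in dispatcher. This is not the code in rmgpy.molecule.translator: the backend functions are fakes, and AVAILABLE, ORDER, and write_inchi are hypothetical names. The order for 'try-all' follows the description above, and the order for 'rdkit-first' is an assumption based on its name.

```python
def fake_rdkit(mol):
    """Pretend RDKit backend that always succeeds."""
    return f"rdkit:{mol}"


def fake_openbabel(mol):
    """Pretend OpenBabel backend that always fails, to exercise the fallback."""
    raise ValueError("pretend OpenBabel cannot handle this molecule")


# Stand-in for the import check: a backend that failed to import is simply absent.
AVAILABLE = {"rdkit": fake_rdkit, "openbabel": fake_openbabel}

ORDER = {
    "rdkit-first": ["rdkit", "openbabel"],  # assumed from the option name
    "try-all": ["openbabel", "rdkit"],      # effectively "openbabel-first"
    "rdkit": ["rdkit"],
    "openbabel": ["openbabel"],
}


def write_inchi(mol, backend="rdkit-first"):
    """Try each backend in order and return the first successful result."""
    errors = []
    for name in ORDER[backend]:
        func = AVAILABLE.get(name)
        if func is None:
            continue  # backend not installed
        try:
            return func(mol)
        except ValueError as exc:
            errors.append(f"{name}: {exc}")
    raise ValueError(f"no backend could translate {mol!r}: {errors}")


print(write_inchi("CH2O2", backend="try-all"))  # falls back to the fake RDKit
```

In this sketch the only difference between 'try-all' and 'rdkit-first' is which backend gets the first attempt, which is the point being made about the naming.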

Regression Testing Results

⚠️ One or more regression tests failed.
Please download the failed results and run the tests locally or check the log to see why.

Detailed regression test results.

Regression test aromatics:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:05
Current: Execution time (DD:HH:MM:SS): 00:00:01:04
Reference: Memory used: 3046.57 MB
Current: Memory used: 3001.39 MB

aromatics Passed Core Comparison ✅

Original model has 15 species.
Test model has 15 species. ✅
Original model has 11 reactions.
Test model has 11 reactions. ✅

aromatics Passed Edge Comparison ✅

Original model has 106 species.
Test model has 106 species. ✅
Original model has 358 reactions.
Test model has 358 reactions. ✅

Observables Test Case: Aromatics Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:09
Current: Execution time (DD:HH:MM:SS): 00:00:02:08
Reference: Memory used: 3156.41 MB
Current: Memory used: 3108.76 MB

liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 215 reactions.
Test model has 215 reactions. ✅

Non-identical kinetics! ❌
original:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation
tested:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](127) origin: Peroxyl_Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 7.83 7.49 7.23 7.02 6.68 6.42 5.95 5.61
k(T): 3.77 4.45 4.86 5.14 5.48 5.68 5.96 6.09

kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing""")
kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(3.756,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R
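
For readers checking the numbers: the k(T) rows above appear consistent with log10 of the rate coefficient after converting from cm^3/(mol*s) to m^3/(mol*s). The sketch below evaluates the modified Arrhenius form k = A (T/T0)^n exp(-Ea/(R T)) with the parameters printed in the report. The function name and the unit interpretation are assumptions made for illustration, not part of the regression tooling.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log10_k_si(A_cm3, n, Ea_kcal, T, T0=1.0):
    """log10 of k = A (T/T0)^n exp(-Ea/(R T)), converted to m^3/(mol*s)."""
    k_cm3 = A_cm3 * (T / T0) ** n * math.exp(-Ea_kcal / (R_KCAL * T))
    return math.log10(k_cm3 * 1e-6)  # 1 cm^3 = 1e-6 m^3


temperatures = [300, 400, 500, 600, 800, 1000, 1500, 2000]
original = [log10_k_si(3.18266e20, -2.694, 0.0, T) for T in temperatures]
tested = [log10_k_si(3.2e12, 0.0, 3.756, T) for T in temperatures]
print(" ".join(f"{value:.2f}" for value in original))
print(" ".join(f"{value:.2f}" for value in tested))
```

At 300 K this gives roughly 7.83 for the original estimate and 3.77 for the tested one, in line with the first column of the two k(T) rows.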

liquid_oxidation Failed Edge Comparison ❌

Original model has 202 species.
Test model has 202 species. ✅
Original model has 1618 reactions.
Test model has 1610 reactions. ❌
The original model has 9 reactions that the tested model does not have. ❌
rxn: [CH2]CCOO(79) + CCCCCOO(105) <=> CCCOO(36) + CC[CH]CCOO(114) origin: H_Abstraction
rxn: [CH2]CCOO(79) + CCCCCOO(105) <=> CCCOO(36) + CCC[CH]COO(113) origin: H_Abstraction
rxn: [CH2]CCOO(79) + CCCCCOO(105) <=> CCCOO(36) + C[CH]CCCOO(115) origin: H_Abstraction
rxn: [CH2]CCOO(79) + CCCCCOO(105) <=> CCCOO(36) + CCCC[CH]OO(134) origin: H_Abstraction
rxn: CCCOO(36) + [CH2]CCCCOO(116) <=> [CH2]CCOO(79) + CCCCCOO(105) origin: H_Abstraction
rxn: C[CH]CCCO(157) + CCCCCO[O](103) <=> CC=CCCO(192) + CCCCCOO(105) origin: Disproportionation
rxn: C[CH]CCCO(157) + CCCCCO[O](103) <=> C=CCCCO(193) + CCCCCOO(105) origin: Disproportionation
rxn: C[CH]CCCO(157) + C[CH]CCCO(157) <=> CC=CCCO(192) + CCCCCO(130) origin: Disproportionation
rxn: C[CH]CCCO(157) + C[CH]CCCO(157) <=> C=CCCCO(193) + CCCCCO(130) origin: Disproportionation
The tested model has 1 reactions that the original model does not have. ❌
rxn: CCCCCO[O](103) + CCCCCO[O](103) <=> oxygen(1) + CCCCC=O(106) + CCCCCO(130) origin: Peroxyl_Termination

Non-identical kinetics! ❌
original:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](128) origin: Peroxyl_Disproportionation
tested:
rxn: CCCC(C)O[O](20) + CCCCCO[O](103) <=> oxygen(1) + CCCC(C)[O](64) + CCCCC[O](127) origin: Peroxyl_Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 7.83 7.49 7.23 7.02 6.68 6.42 5.95 5.61
k(T): 3.77 4.45 4.86 5.14 5.48 5.68 5.96 6.09

kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing""")
kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(3.756,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R

Non-identical kinetics! ❌
original:
rxn: CCCCCO[O](103) + CC(CC(C)OO)O[O](104) <=> oxygen(1) + CCCCC[O](128) + CC([O])CC(C)OO(127) origin: Peroxyl_Disproportionation
tested:
rxn: CCCCCO[O](103) + CC(CC(C)OO)O[O](104) <=> oxygen(1) + CCCCC[O](127) + CC([O])CC(C)OO(129) origin: Peroxyl_Disproportionation

k(1bar) 300K 400K 500K 600K 800K 1000K 1500K 2000K
k(T): 3.52 4.27 4.71 5.01 5.39 5.61 5.91 6.06
k(T): 7.79 7.46 7.21 7.00 6.67 6.41 5.94 5.60

kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(4.096,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0.053,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing
Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.
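
In the kinetics comparisons above, the original and tested reaction strings differ only in the numeric species indices (for example 128 versus 127). A small sketch of how one might confirm that by hand is to strip the trailing indices before comparing. The helper name `strip_species_indices` is hypothetical and is not part of the regression scripts.

```python
import re

# Matches a trailing species index such as "(103)" that ends a species label.
_INDEX = re.compile(r"\(\d+\)(?=\s|$)")


def strip_species_indices(reaction):
    """Remove the numeric species indices from a reaction string."""
    return _INDEX.sub("", reaction)


original = "CCCCCO[O](103) + CC(CC(C)OO)O[O](104) <=> oxygen(1) + CCCCC[O](128) + CC([O])CC(C)OO(127)"
tested = "CCCCCO[O](103) + CC(CC(C)OO)O[O](104) <=> oxygen(1) + CCCCC[O](127) + CC([O])CC(C)OO(129)"

same_species = strip_species_indices(original) == strip_species_indices(tested)
print(same_species)
```

Branches inside a SMILES string such as `(C)` are left alone because only parenthesized digits at the end of a label are removed.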

Observables Test Case: liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:23
Current: Execution time (DD:HH:MM:SS): 00:00:01:22
Reference: Memory used: 3155.44 MB
Current: Memory used: 3111.79 MB

nitrogen Passed Core Comparison ✅

Original model has 41 species.
Test model has 41 species. ✅
Original model has 360 reactions.
Test model has 360 reactions. ✅

nitrogen Passed Edge Comparison ✅

Original model has 132 species.
Test model has 132 species. ✅
Original model has 997 reactions.
Test model has 997 reactions. ✅

Observables Test Case: NC Comparison

✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!

nitrogen Passed Observable Testing ✅

Regression test oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:24
Current: Execution time (DD:HH:MM:SS): 00:00:02:21
Reference: Memory used: 3022.33 MB
Current: Memory used: 2950.56 MB

oxidation Passed Core Comparison ✅

Original model has 59 species.
Test model has 59 species. ✅
Original model has 694 reactions.
Test model has 694 reactions. ✅

oxidation Passed Edge Comparison ✅

Original model has 230 species.
Test model has 230 species. ✅
Original model has 1526 reactions.
Test model has 1526 reactions. ✅

Observables Test Case: Oxidation Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

oxidation Passed Observable Testing ✅

Regression test sulfur:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:53
Current: Execution time (DD:HH:MM:SS): 00:00:00:54
Reference: Memory used: 3132.54 MB
Current: Memory used: 3059.71 MB

sulfur Passed Core Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 74 reactions.
Test model has 74 reactions. ✅

sulfur Failed Edge Comparison ❌

Original model has 89 species.
Test model has 89 species. ✅
Original model has 227 reactions.
Test model has 227 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary
The tested model has 1 reactions that the original model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary

Observables Test Case: SO2 Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

sulfur Passed Observable Testing ✅

Regression test superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:34
Current: Execution time (DD:HH:MM:SS): 00:00:00:34
Reference: Memory used: 3249.75 MB
Current: Memory used: 3155.55 MB

superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 21 reactions.
Test model has 21 reactions. ✅

superminimal Passed Edge Comparison ✅

Original model has 18 species.
Test model has 18 species. ✅
Original model has 28 reactions.
Test model has 28 reactions. ✅

Regression test RMS_constantVIdealGasReactor_superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:28
Current: Execution time (DD:HH:MM:SS): 00:00:02:26
Reference: Memory used: 3704.70 MB
Current: Memory used: 3656.96 MB

RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅

Regression test RMS_CSTR_liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:06:07
Current: Execution time (DD:HH:MM:SS): 00:00:06:04
Reference: Memory used: 3643.07 MB
Current: Memory used: 3587.56 MB

RMS_CSTR_liquid_oxidation Failed Core Comparison ❌

Original model has 37 species.
Test model has 37 species. ✅
Original model has 233 reactions.
Test model has 232 reactions. ❌
The original model has 1 reactions that the tested model does not have. ❌
rxn: CCO[O](36) <=> [OH](21) + CC=O(61) origin: intra_H_migration

RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌

Original model has 206 species.
Test model has 206 species. ✅
Original model has 1508 reactions.
Test model has 1508 reactions. ✅
The original model has 2 reactions that the tested model does not have. ❌
rxn: CCO[O](36) <=> [OH](21) + CC=O(61) origin: intra_H_migration
rxn: CCCO[O](34) <=> CC[CH]OO(51) origin: intra_H_migration
The tested model has 2 reactions that the original model does not have. ❌
rxn: CCCO[O](36) <=> [OH](21) + CCC=O(50) origin: intra_H_migration
rxn: CCO[O](35) <=> C[CH]OO(63) origin: intra_H_migration

Observables Test Case: RMS_CSTR_liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_CSTR_liquid_oxidation Passed Observable Testing ✅

Regression test fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:40
Current: Execution time (DD:HH:MM:SS): 00:00:00:40
Reference: Memory used: 2955.03 MB
Current: Memory used: 2903.81 MB

fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

fragment Passed Edge Comparison ✅

Original model has 33 species.
Test model has 33 species. ✅
Original model has 47 reactions.
Test model has 47 reactions. ✅

Observables Test Case: fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

fragment Passed Observable Testing ✅

Regression test RMS_constantVIdealGasReactor_fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:04
Current: Execution time (DD:HH:MM:SS): 00:00:03:04
Reference: Memory used: 3868.01 MB
Current: Memory used: 3801.56 MB

RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 24 reactions.
Test model has 24 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅

beep boop this comment was written by a bot 🤖

Contributor

@hwpang hwpang left a comment

LGTM

@hwpang
Contributor

hwpang commented Mar 13, 2024

Approved and merged as agreed upon in RMG subgroup

@hwpang hwpang merged commit 22f5930 into ReactionMechanismGenerator:main Mar 13, 2024
3 of 6 checks passed
Labels
Python 3.11 Transition PRs and Issues related to transitioning from Python 3.7 to 3.11
Development

Successfully merging this pull request may close these issues.

RDKit should be upgraded, potentially requiring API changes
5 participants