Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add four more sources of constants data - Wikidata, NIST Webbook, Common Chemistry, and the Joback group contribution method #39

Merged
merged 56 commits into from
Dec 29, 2021

Conversation

CalebBell
Copy link
Owner

Creating a PR to observe if tests pass on other platforms; not ready yet.

@yoelcortes
Copy link
Collaborator

yoelcortes commented Dec 28, 2021

Hi Caleb! The updates look great overall!

I made some minor changes to prevent attempting CAS_to_int if the user passes an integer (which is an update hope to make in thermo during chemical creation). Let me know if there are any issues with my implementation.

I added a "low memory" run to the github workflow, which shows some issues with the test. A few of the issues are trivial (like dataframe shapes not matching), but others may need a little more time to solve. But all the tests are passing without "low memory".

I'll be flying back to the US and will be a little busy until January 7, so I won't get a chance to address the test issues until then.

Wish you happy holidays and a happy new year!
Thanks,

@CalebBell
Copy link
Owner Author

Hello Yoel!

Thank you very much for the multitude of improvements. If I understand what you are after, you would like to be able to do "Tc(CASRN=64175)" and get the critical temperature? I think that is pretty doable with a few more changes if so. Updating the documentation and adding a few more tests is probably the hardest part to making that happen.
I'm glad you noticed the CHEDL_LOW_MEMORY environmental variable I was playing around with. I was curious how much memory could be saved if more strings were removed, so I added a flag. It wasn't my intention to be providing this as a feature to users. The flag saves about 1/3 of the memory used, or 6 MB.
Longer term, I hope to be able to reduce memory use for most users in the following two ways:

  • Adding a "best available" data source, which contains the default (quasi-recommended) value for every property for every chemical for which the data is available. If like most users they aren't specifying a method, this will work fine.
  • Breaking up the lazy-loading to be even more granular, so each data frame is loaded only when it needs to be searched; instead of all dataframes for a particular property being loaded first.

Your point about the tests not passing with that variable set was expected for me. The flag essentially removes some of the functionality of the library, so some of the tests should be expected to fail.
I was able to make them pass through a few fixes and replacing the previously deleted column with an empty sparse column.
I like the pytest option to run the tests with a changed environmental variable; nice!
I definitely could have communicated better what I was trying to do with the flag though. With my hopefully-better explanation, do you think it is still necessary to add the CI for the low memory mode? I wouldn't want to update documentation and point users towards this flag at this point.

I wish you a happy holidays and new year as well!
Caleb

@CalebBell CalebBell merged commit defcd27 into master Dec 29, 2021
@yoelcortes
Copy link
Collaborator

Hi Caleb,

OK! I like the flag and may use it myself in the near future. The points you mentioned on reducing memory use would be wonderful too.

It's great that you were able to get the tests passing with low memory! I agree with not updating documentation and pointing users towards the new flag yet. If we intend to keep the CHEDL_LOW_MEMORY variable, I think keeping the flag in the CI may be better in the long run. But if there is a good chance we remove it in favor of something else, we can remove it from the CI (your call).

Thinking for BioSTEAM, I might include a function to delete all the dataframes in chemicals after loading the thermodynamic property packages. The function would also set "_loaded" attributes to false. This would save almost 100% of the memory. But using the flag may be useful for more dynamic efforts to keep the dataframes.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants