Merged
…naming. Add in per-atom versions of dipole, quadrupoles and octupoles
… to the changes to the api
…cases and failures
…l validation for several properties
…l validation for several properties
…mpares against scf dipole for a molecule from spice2
…r of conformers) to the sourcedataset, so they can be easily and uniformly used in the codes.
…n that takes as input the file to output and the max number of conformers. The idea is to operate on the entirely processed dataset (which should be faster for the scripts running curation). This also simplifies the code that needs to be written
…nality to check species added to record, and sourcedataset function to return subset of records that match). Also added to functions that limit configurations.
…ing in routine to record to remove high force configurations.
…des total_records to include, total_conformers, max_conformers_per_record, atomic_species_to_limit, and max_force. These are part of the SourceDataset and can be automatically applied when writing to the hdf5 file from the baseclass. These routines do not need to be written for each dataset.
…than python memory.
…ow accepts the max_force_key in case a different name is used for the forces. Tests for this added.
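As a rough illustration of the limits named in the commits above (total_records, total_conformers, max_conformers_per_record, atomic_species_to_limit, max_force), here is a minimal standalone sketch. It is not the SourceDataset implementation: the `Record` class, the `select_records` function, and the exact semantics of each limit are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical stand-in for a dataset record (not the real API).
    name: str
    atomic_numbers: List[int]
    # Largest force magnitude found in each configuration of the record.
    max_force_per_config: List[float]


def select_records(
    records: Iterable[Record],
    total_records: Optional[int] = None,
    total_conformers: Optional[int] = None,
    max_conformers_per_record: Optional[int] = None,
    atomic_species_to_limit: Optional[Iterable[int]] = None,
    max_force: Optional[float] = None,
) -> List[Record]:
    """Apply the limits in one pass; None means 'no limit' (assumed semantics)."""
    allowed = set(atomic_species_to_limit) if atomic_species_to_limit is not None else None
    selected: List[Record] = []
    n_conformers = 0
    for rec in records:
        if total_records is not None and len(selected) >= total_records:
            break
        if total_conformers is not None and n_conformers >= total_conformers:
            break
        # Skip records that contain any species outside the allowed set.
        if allowed is not None and not set(rec.atomic_numbers) <= allowed:
            continue
        kept = rec.max_force_per_config
        # Drop configurations whose largest force exceeds the threshold.
        if max_force is not None:
            kept = [f for f in kept if f <= max_force]
        if max_conformers_per_record is not None:
            kept = kept[:max_conformers_per_record]
        if total_conformers is not None:
            kept = kept[: total_conformers - n_conformers]
        if not kept:
            continue
        selected.append(Record(rec.name, rec.atomic_numbers, kept))
        n_conformers += len(kept)
    return selected
```

Keeping all limits in one function mirrors the point made in the commits: the filtering is written once and reused, rather than reimplemented for each dataset.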
MarshallYan
requested changes
Mar 11, 2025
… reading from file if prepare_dataset has been called. Also, explicitly state weights_only=False, as that is now necessary.
…, not number of configurations (initially was implemented/tested for qm9, where those are the same). Added self energies for tmqm xtb
Note: this also addresses Issue #342, whereby the atomic self-energy regression only worked when n_configs = 1 for a dataset.
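To make the n_configs point concrete, here is a small sketch of a least-squares self-energy fit that emits one row per configuration rather than one per record. This is an illustration only, not the code from this PR: the function name `fit_self_energies` and the dict-based record layout are assumptions.

```python
import numpy as np


def fit_self_energies(records):
    """Fit per-element self energies by linear least squares.

    `records` is a list of dicts (hypothetical layout) with
    'atomic_numbers' (one entry per atom) and 'energies' (one per configuration).
    """
    species = sorted({z for r in records for z in r["atomic_numbers"]})
    column = {z: i for i, z in enumerate(species)}
    rows, targets = [], []
    for r in records:
        # Element counts are shared by every configuration of a record.
        counts = np.zeros(len(species))
        for z in r["atomic_numbers"]:
            counts[column[z]] += 1
        # One row per configuration, so n_configs > 1 is handled naturally.
        for energy in r["energies"]:
            rows.append(counts)
            targets.append(energy)
    A = np.vstack(rows)
    b = np.asarray(targets, dtype=float)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return {z: float(solution[column[z]]) for z in species}
```

Sizing the design matrix by the total number of configurations, not the number of records, is what keeps the fit consistent when records carry more than one configuration.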
Pull Request Summary
This creates an API for dataset curation that relies on pydantic to ensure we have lots of validation at the time of dataset construction.
Note: Even though new curation scripts have been added as part of this PR, I won't remove the old ones (or replace yaml files) in this PR. That will be done in a separate PR.
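For readers unfamiliar with the approach, a minimal sketch of what pydantic validation at construction time looks like. The `RecordSketch` model and its fields are hypothetical and are not the classes added in this PR; only `BaseModel`, `Field`, and `ValidationError` are real pydantic names.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class RecordSketch(BaseModel):
    # Hypothetical fields, chosen only to illustrate validation.
    name: str
    n_configs: int = Field(gt=0)  # must be a positive integer
    atomic_numbers: List[int]
    energies: List[float]


def try_build(**kwargs):
    """Return a validated RecordSketch, or None if validation fails."""
    try:
        return RecordSketch(**kwargs)
    except ValidationError:
        return None
```

With a model like this, a bad value (for example `n_configs=0`, or element symbols passed where atomic numbers are expected) is rejected when the object is built, rather than surfacing later when the dataset is written or loaded.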
Key changes
Notable points that this PR has either accomplished or will accomplish.
To Do:
Associated Issue(s)
Pull Request Checklist