epsilon value to norm function in aimnet2 by chrisiacovella · Pull Request #391 · choderalab/modelforge

chrisiacovella · 2026-01-07T21:52:20Z

Pull Request Summary

As discussed in #390 aimnet2 was failing when training with forces. I tracked this down to be where we take the norm of the vector contributions. Adding in a small epsilon value (1e-8) seems to correct the issue.

This also includes a few additional example scripts, e.g., to calculate the self-energy of a dataset and plotting a dataset.

Note: CI was failing due to an update to pytorch lightning. See #392 . Basically, we now also need to pass weights_only to various pytorch lightning functions we are calling.

Key changes

Notable points that this PR has either accomplished or will accomplish.

fix issue causing crashes when training with force

Associated Issue(s)

Pull Request Checklist

Issue(s) raised/addressed and linked
Includes appropriate unit test(s)
Appropriate docstring(s) added/updated
Appropriate .rst doc file(s) added/updated
PR is ready for review

…alkethoh

codecov-commenter · 2026-01-07T21:54:34Z

Codecov Report

❌ Patch coverage is 77.35849% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.84%. Comparing base (8995841) to head (6719d6d).

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

chrisiacovella · 2026-01-07T21:59:29Z

Running any real training tests (with forces) that would help to detect these errors is going to be too expensive for CI and require too much memory. I think I should create a separate test (that will be skipped on CI) to do these checks...they will not necessarily be that expensive to run on a machine with GPU locally. While I was seeing failure during the first epoch with the full dataset, I'm not sure if the n=1000 test set will also fail so quickly (or fail at all). I need to do some additional local tests on the unmodified version to ensure failure with the test set (so that any actual testing with the patched version is actually meaningful).

Edit: The smaller subsets do not seem to fail..right now the failure in the unpatched version is only occurring when using the "full" tmqm openff dataset...I will test all the potentials using this (separately) to see if any others fail for similar reasons (i.e., dividing by zero).

…(as this caused a very old version to be installed to work with the newest version of pint, yet those two are incompatible). Update the trainer model to specify weights_only for the calls to pytorch lightning (may need to pin this to the latest).

Copilot

Pull request overview

This PR addresses a critical bug in AIMNet2 that caused training failures when using forces, by adding numerical stability to the vector norm calculation. Additionally, it includes PyTorch Lightning compatibility updates for the weights_only parameter, updates atomic self-energies in dataset YAML files from hartree to kilojoule_per_mole units, and adds example scripts for dataset visualization and self-energy calculation.

Fixed numerical stability issue in AIMNet2's vector contribution calculation by adding epsilon to norm computation
Added weights_only=False parameter to PyTorch Lightning's validate and test methods to address compatibility issues
Updated atomic self-energy values in dataset YAML files (spice1, spice2, phalkethoh) with unit conversions

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
modelforge/potential/aimnet2.py	Added epsilon value to torch.norm for numerical stability and NaN detection
modelforge/train/training.py	Added weights_only=False parameter to trainer.validate() and trainer.test() calls
modelforge/train/losses.py	Added scale_by_number_of_atoms=True to PerAtomChargeError initialization
modelforge/potential/potential.py	Added jit parameter to load_from_wandb method signature
modelforge/potential/featurization.py	Added commented debugging code for anomaly detection
modelforge/utils/misc.py	Added file path expansion for tilde and environment variables in OpenWithLock
modelforge/dataset/utils.py	Fixed typo in comment ("lineare" to "linear")
modelforge/dataset/dataset.py	Added histogram plotting for shifted energies (appears to be debug code)
modelforge/dataset/yaml_files/spice1.yaml	Updated atomic self-energies from hartree to kilojoule_per_mole units
modelforge/dataset/yaml_files/spice2.yaml	Updated atomic self-energies from hartree to kilojoule_per_mole units
modelforge/dataset/yaml_files/spice2_openff.yaml	Added trailing comma to Br self-energy entry
modelforge/dataset/yaml_files/phalkethoh.yaml	Updated atomic self-energies from hartree to kilojoule_per_mole units
modelforge-curate/modelforge/curate/sourcedataset.py	Added iter_records and records_names methods, file locking for HDF5 reads
modelforge-curate/modelforge/curate/record.py	Fixed docstring indentation, added calculate_reference_energy function
modelforge-curate/modelforge/curate/datasets/tmqm_openff_curation.py	Added fallback logic for molecule name parsing with underscore separator
modelforge-curate/modelforge/curate/datasets/yaml_files/tmqm_openff_curation.yaml	Added v0_ext_sm1 version configuration
scripts/self_energy_calculation.py	New script for calculating self-energies across multiple datasets
scripts/plot_dataset.py	New script for plotting dataset energy distributions
devtools/conda-envs/*.yaml	Updated openff-units version requirement to >=0.3.1 across all environments
devtools/conda-envs/env.yaml	Renamed environment, added multiple new dependencies for testing and JAX support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

fixing typo Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

renaming full environment file Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

removing commas from yaml Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

updating file locking for reading in source datasets Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

removing unused imports Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…wever, it might be good to add in a function and/or example script that wraps this functionality (calling the torch dataset) so that it is easy for users to interrogate what happens to a dataset after it has been loaded and preprocessed.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…per_scripts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

chrisiacovella added 2 commits December 11, 2025 10:54

updating self energies in .yaml files, primarily for spice 1,2 and ph…

26ba827

…alkethoh

Add in epsilon to norm value to avoid errors when training with forces.

5579884

chrisiacovella requested review from MarshallYan and Copilot January 8, 2026 06:49

Copilot started reviewing on behalf of chrisiacovella January 8, 2026 06:50 View session

Copilot AI reviewed Jan 8, 2026

View reviewed changes

chrisiacovella and others added 14 commits January 8, 2026 08:08

Update modelforge/dataset/dataset.py

835fca9

fixing typo Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update devtools/conda-envs/env.yaml

6a4c3b0

renaming full environment file Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update modelforge/dataset/yaml_files/phalkethoh.yaml

4f5aa24

removing commas from yaml Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update modelforge-curate/modelforge/curate/sourcedataset.py

5990ca2

updating file locking for reading in source datasets Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

removing unused imports

2221a28

removing unused imports Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fixing linting complaint after removing some code.

a71928c

Update devtools/conda-envs/env.yaml

50a3a2b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

switched to torch.linalg.norm as torch.norm is deprecated.

7baec73

Merge remote-tracking branch 'origin/dev-helper_scripts' into dev-hel…

6a076c7

…per_scripts

Adding doc string to calculate_reference_energy in record.py

937b6ee

Update modelforge/dataset/yaml_files/spice1.yaml

f9b53db

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

rename wrapper for record_names

da09b1a

fixing typing

6719d6d

MarshallYan approved these changes Jan 12, 2026

View reviewed changes

chrisiacovella merged commit 0f4ed92 into choderalab:main Jan 12, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

epsilon value to norm function in aimnet2#391

epsilon value to norm function in aimnet2#391
chrisiacovella merged 17 commits intochoderalab:mainfrom
chrisiacovella:dev-helper_scripts

chrisiacovella commented Jan 7, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jan 7, 2026 •

edited

Loading

Uh oh!

chrisiacovella commented Jan 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

chrisiacovella commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Summary

Key changes

Associated Issue(s)

Pull Request Checklist

Uh oh!

codecov-commenter commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

chrisiacovella commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

chrisiacovella commented Jan 7, 2026 •

edited

Loading

codecov-commenter commented Jan 7, 2026 •

edited

Loading

chrisiacovella commented Jan 7, 2026 •

edited

Loading