Replace weather eval with additional inference #1096
Merged
mcgibbon merged 7 commits on Apr 29, 2026
Weather evaluation's `get_inference_data` was using window data requirements derived from the inline inference config's `forward_steps_in_memory` instead of its own. This also applied to prognostic state requirements, which happen to be config-independent today but would break if that changed. Removes the shared helper methods and inlines the correct config-specific values at each call site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The aggregator was built once and reused for every epoch's inference run, causing stale accumulated state. Move construction inside the per-epoch closure so a fresh aggregator is created each time, matching how inline inference already works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
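The stale-state bug described in this commit can be illustrated with a minimal sketch. `ToyAggregator` and both callback factories below are hypothetical stand-ins written for illustration; they are not the actual `TrainBuilders` code, and the real aggregator tracks metrics rather than a batch count.

```python
from typing import Callable


class ToyAggregator:
    """Hypothetical stand-in for an aggregator that accumulates state."""

    def __init__(self) -> None:
        self.n_batches = 0

    def record_batch(self) -> None:
        self.n_batches += 1


def make_shared_callback(batches_per_epoch: int) -> Callable[[int], int]:
    # Buggy pattern: the aggregator is built once, outside the closure,
    # so state from earlier epochs leaks into later ones.
    aggregator = ToyAggregator()

    def end_of_epoch(epoch: int) -> int:
        for _ in range(batches_per_epoch):
            aggregator.record_batch()
        return aggregator.n_batches

    return end_of_epoch


def make_fresh_callback(batches_per_epoch: int) -> Callable[[int], int]:
    def end_of_epoch(epoch: int) -> int:
        # Fixed pattern: a new aggregator is built on every call.
        aggregator = ToyAggregator()
        for _ in range(batches_per_epoch):
            aggregator.record_batch()
        return aggregator.n_batches

    return end_of_epoch


shared = make_shared_callback(batches_per_epoch=3)
fresh = make_fresh_callback(batches_per_epoch=3)
shared_counts = [shared(epoch) for epoch in range(2)]  # grows across epochs
fresh_counts = [fresh(epoch) for epoch in range(2)]    # identical each epoch
```

In the shared version the second epoch reports batches from both epochs, which is the kind of contamination the commit removes.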
`WeatherEvaluationConfig` was identical to `InlineInferenceConfig`. Replace all usages with `InlineInferenceConfig` and delete the duplicate class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure rename of the field, variable names, docstrings, error messages, and log label. No behavioral changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single optional InlineInferenceConfig with a list of
AdditionalInferenceConfig entries, each carrying a name and config.
The name is used as the wandb log prefix and to create distinct
output subdirectories under output_dir/additional_inference/{name}/.
Duplicate names are rejected in __post_init__.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
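A rough sketch of the shape this commit describes is given below. It assumes the duplicate-name check lives in the `__post_init__` of the containing train config; the `InlineInferenceConfig` placeholder, its `n_forward_steps` field, and the `inference_output_dir` helper are hypothetical and only serve to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class InlineInferenceConfig:
    """Placeholder; the real class has fields not described in this PR."""

    n_forward_steps: int = 1  # hypothetical field, for illustration only


@dataclass
class AdditionalInferenceConfig:
    name: str  # wandb log prefix and output subdirectory name
    config: InlineInferenceConfig


@dataclass
class TrainConfig:
    """Reduced stand-in for the real TrainConfig."""

    output_dir: str
    additional_inference: list[AdditionalInferenceConfig] = field(
        default_factory=list
    )

    def __post_init__(self) -> None:
        # Reject repeated names so log prefixes and output
        # subdirectories cannot collide.
        names = [entry.name for entry in self.additional_inference]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            raise ValueError(
                f"duplicate additional_inference names: {duplicates}"
            )

    def inference_output_dir(self, name: str) -> str:
        # Hypothetical helper mirroring the layout in the commit message.
        return f"{self.output_dir}/additional_inference/{name}"


train_config = TrainConfig(
    output_dir="out",
    additional_inference=[
        AdditionalInferenceConfig("weather_eval", InlineInferenceConfig()),
        AdditionalInferenceConfig("inference_era5", InlineInferenceConfig()),
    ],
)
weather_dir = train_config.inference_output_dir("weather_eval")
```

Passing two entries with the same `name` raises `ValueError` at construction time, before any training or inference work starts.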
mcgibbon commented on Apr 28, 2026
    class AdditionalInferenceConfig:
        name: str
Wandb logs will be stored under `{name}/`, so this could be things like `weather_eval`, `inference_era5`, or `inf_era5_pre_industrial`. When I add a similar additional validation config, I'll need to check for duplicate names across both lists.
spencerkclark approved these changes on Apr 29, 2026
Nice that you could repurpose / generalize this after fixing a few bugs. Looks good to me!
One thing that this does not make easier yet is automating checkpoint selection based on inline inference targeting multiple datasets, but that would probably need more design discussion.
The `additional_inference` path was not passing `n_ensemble_per_ic` to `aggregator.build()`, so ensemble metrics (CRPS, SSR) were silently dropped for stochastic inference runs. Also adds `n_ensemble_per_ic=2` to the test's `additional_inference` config and asserts that ensemble metrics appear in the `weather_eval` logs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
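One way an omitted argument can silently drop metrics is sketched below. This is a toy model under stated assumptions: `build_metric_names`, its default of `1`, and the metric name strings are all hypothetical, and the real `aggregator.build()` signature is not shown in this PR.

```python
def build_metric_names(n_ensemble_per_ic: int = 1) -> list[str]:
    """Hypothetical builder: ensemble metrics need more than one member."""
    names = ["rmse", "bias"]
    if n_ensemble_per_ic > 1:
        # CRPS and spread-skill ratio are only defined for an ensemble.
        names += ["crps", "ssr"]
    return names


# Omitting the argument falls back to the default and raises no error,
# so the missing ensemble metrics are easy to overlook.
dropped = build_metric_names()

# Forwarding the configured value restores them.
kept = build_metric_names(n_ensemble_per_ic=2)
```

A test that asserts the ensemble metric names are present, as this commit adds for the `weather_eval` logs, catches this class of omission.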
Replaces the single optional `weather_evaluation` config with a list of named `additional_inference` entries, each able to run an independent inference evaluation during training. Also fixes two bugs in the existing weather evaluation code path.

Changes:

- `TrainBuilders.get_evaluation_inference_data`, `TrainBuilders.get_end_of_epoch_callback`: fix weather evaluation using inline inference's data requirements instead of its own
- `TrainBuilders.get_end_of_epoch_callback`: fix aggregator being built once and reused across epochs instead of rebuilt each time
- `WeatherEvaluationConfig`: removed, was identical to `InlineInferenceConfig`
- `TrainConfig.weather_evaluation` renamed to `TrainConfig.additional_inference`, changed from `InlineInferenceConfig | None` to `list[AdditionalInferenceConfig]`
- `AdditionalInferenceConfig`: new dataclass with `name` (used as wandb log prefix and output subdirectory) and `config: InlineInferenceConfig`, with duplicate name validation
- `fme.ace.AdditionalInferenceConfig`: added to public API

- Tests added
- If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated