Skip to content

Replace weather eval with additional inference#1096

Merged
mcgibbon merged 7 commits into
mainfrom
feature/replace-weather-eval-with-additional-inference
Apr 29, 2026
Merged

Replace weather eval with additional inference#1096
mcgibbon merged 7 commits into
mainfrom
feature/replace-weather-eval-with-additional-inference

Conversation

@mcgibbon
Copy link
Copy Markdown
Contributor

@mcgibbon mcgibbon commented Apr 28, 2026

Replaces the single optional weather_evaluation config with a list of named additional_inference entries, each able to run an independent inference evaluation during training. Also fixes two bugs in the existing weather evaluation code path.

Changes:

  • TrainBuilders.get_evaluation_inference_data, TrainBuilders.get_end_of_epoch_callback: fix weather evaluation using inline inference's data requirements instead of its own

  • TrainBuilders.get_end_of_epoch_callback: fix aggregator being built once and reused across epochs instead of rebuilt each time

  • WeatherEvaluationConfig: removed, was identical to InlineInferenceConfig

  • TrainConfig.weather_evaluation renamed to TrainConfig.additional_inference, changed from InlineInferenceConfig | None to list[AdditionalInferenceConfig]

  • AdditionalInferenceConfig: new dataclass with name (used as wandb log prefix and output subdirectory) and config: InlineInferenceConfig, with duplicate name validation

  • fme.ace.AdditionalInferenceConfig: added to public API

  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

mcgibbon and others added 5 commits April 28, 2026 19:42
Weather evaluation's `get_inference_data` was using window data
requirements derived from the inline inference config's
`forward_steps_in_memory` instead of its own. This also applied
to prognostic state requirements which happened to be config-
independent today but would break if that changed.

Removes the shared helper methods and inlines the correct
config-specific values at each call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The aggregator was built once and reused for every epoch's inference
run, causing stale accumulated state. Move construction inside the
per-epoch closure so a fresh aggregator is created each time, matching
how inline inference already works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WeatherEvaluationConfig was identical to InlineInferenceConfig. Replace
all usages with InlineInferenceConfig and delete the duplicate class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure rename of the field, variable names, docstrings, error messages,
and log label. No behavioral changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single optional InlineInferenceConfig with a list of
AdditionalInferenceConfig entries, each carrying a name and config.
The name is used as the wandb log prefix and to create distinct
output subdirectories under output_dir/additional_inference/{name}/.
Duplicate names are rejected in __post_init__.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mcgibbon mcgibbon requested a review from spencerkclark April 28, 2026 20:16
@mcgibbon mcgibbon marked this pull request as ready for review April 28, 2026 20:16
)
)
class AdditionalInferenceConfig:
name: str
Copy link
Copy Markdown
Contributor Author

@mcgibbon mcgibbon Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wandb logs will be stored under {name}/, so this could be things like weather_eval, inference_era5, or inf_era5_pre_industrial. When I add a similar additional validation config, I'll need to check for duplicate names across both lists.

Copy link
Copy Markdown
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you could repurpose / generalize this after fixing a few bugs. Looks good to me!

One thing that this does not make easier yet is automating checkpoint selection based on inline inference targeting multiple datasets, but that would probably need more design discussion.

@mcgibbon mcgibbon enabled auto-merge (squash) April 29, 2026 13:36
@mcgibbon mcgibbon disabled auto-merge April 29, 2026 13:38
The additional_inference path was not passing n_ensemble_per_ic to
aggregator.build(), so ensemble metrics (CRPS, SSR) were silently
dropped for stochastic inference runs. Also adds n_ensemble_per_ic=2
to the test's additional_inference config and asserts that ensemble
metrics appear in the weather_eval logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mcgibbon mcgibbon enabled auto-merge (squash) April 29, 2026 13:55
@mcgibbon mcgibbon merged commit 71a8326 into main Apr 29, 2026
7 checks passed
@mcgibbon mcgibbon deleted the feature/replace-weather-eval-with-additional-inference branch April 29, 2026 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants