Replace weather eval with additional inference by mcgibbon · Pull Request #1096 · ai2cm/ace

mcgibbon · 2026-04-28T20:15:30Z

Replaces the single optional weather_evaluation config with a list of named additional_inference entries, each able to run an independent inference evaluation during training. Also fixes two bugs in the existing weather evaluation code path.

Changes:

TrainBuilders.get_evaluation_inference_data, TrainBuilders.get_end_of_epoch_callback: fix weather evaluation using inline inference's data requirements instead of its own
TrainBuilders.get_end_of_epoch_callback: fix aggregator being built once and reused across epochs instead of rebuilt each time
WeatherEvaluationConfig: removed, was identical to InlineInferenceConfig
TrainConfig.weather_evaluation renamed to TrainConfig.additional_inference, changed from InlineInferenceConfig | None to list[AdditionalInferenceConfig]
AdditionalInferenceConfig: new dataclass with name (used as wandb log prefix and output subdirectory) and config: InlineInferenceConfig, with duplicate name validation
fme.ace.AdditionalInferenceConfig: added to public API
Tests added
If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Weather evaluation's `get_inference_data` was using window data requirements derived from the inline inference config's `forward_steps_in_memory` instead of its own. This also applied to prognostic state requirements which happened to be config- independent today but would break if that changed. Removes the shared helper methods and inlines the correct config-specific values at each call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The aggregator was built once and reused for every epoch's inference run, causing stale accumulated state. Move construction inside the per-epoch closure so a fresh aggregator is created each time, matching how inline inference already works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

WeatherEvaluationConfig was identical to InlineInferenceConfig. Replace all usages with InlineInferenceConfig and delete the duplicate class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pure rename of the field, variable names, docstrings, error messages, and log label. No behavioral changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the single optional InlineInferenceConfig with a list of AdditionalInferenceConfig entries, each carrying a name and config. The name is used as the wandb log prefix and to create distinct output subdirectories under output_dir/additional_inference/{name}/. Duplicate names are rejected in __post_init__. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mcgibbon · 2026-04-28T20:19:38Z

-        )
-    )
+class AdditionalInferenceConfig:
+    name: str


Wandb logs will be stored under {name}/, so this could be things like weather_eval, inference_era5, or inf_era5_pre_industrial. When I add a similar additional validation config, I'll need to check for duplicate names across both lists.

spencerkclark

Nice that you could repurpose / generalize this after fixing a few bugs. Looks good to me!

One thing that this does not make easier yet is automating checkpoint selection based on inline inference targeting multiple datasets, but that would probably need more design discussion.

…-inference

The additional_inference path was not passing n_ensemble_per_ic to aggregator.build(), so ensemble metrics (CRPS, SSR) were silently dropped for stochastic inference runs. Also adds n_ensemble_per_ic=2 to the test's additional_inference config and asserts that ensemble metrics appear in the weather_eval logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mcgibbon and others added 5 commits April 28, 2026 19:42

Remove duplicate WeatherEvaluationConfig class

25b9ebf

WeatherEvaluationConfig was identical to InlineInferenceConfig. Replace all usages with InlineInferenceConfig and delete the duplicate class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rename weather_evaluation to additional_inference

03f30c6

Pure rename of the field, variable names, docstrings, error messages, and log label. No behavioral changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mcgibbon requested a review from spencerkclark April 28, 2026 20:16

mcgibbon marked this pull request as ready for review April 28, 2026 20:16

mcgibbon commented Apr 28, 2026

View reviewed changes

spencerkclark approved these changes Apr 29, 2026

View reviewed changes

Merge branch 'main' into feature/replace-weather-eval-with-additional…

e5a1058

…-inference

mcgibbon enabled auto-merge (squash) April 29, 2026 13:36

mcgibbon disabled auto-merge April 29, 2026 13:38

mcgibbon enabled auto-merge (squash) April 29, 2026 13:55

mcgibbon merged commit 71a8326 into main Apr 29, 2026
7 checks passed

mcgibbon deleted the feature/replace-weather-eval-with-additional-inference branch April 29, 2026 14:09

elynnwu mentioned this pull request May 4, 2026

Add additional inference to coupled training #1108

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace weather eval with additional inference#1096

Replace weather eval with additional inference#1096
mcgibbon merged 7 commits into
mainfrom
feature/replace-weather-eval-with-additional-inference

mcgibbon commented Apr 28, 2026 •

edited

Loading

Uh oh!

mcgibbon Apr 28, 2026 •

edited

Loading

Uh oh!

spencerkclark left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mcgibbon commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcgibbon Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spencerkclark left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mcgibbon commented Apr 28, 2026 •

edited

Loading

mcgibbon Apr 28, 2026 •

edited

Loading