Downscaling model handles static input instead of DataLoader by frodre · Pull Request #954 · ai2cm/ace

frodre · 2026-03-10T23:33:23Z

Previously, the DataLoader was responsible for subsetting and device-placing static inputs (e.g. fine-resolution topography) to match each batch's spatial extent before passing them to the model. This moves that responsibility into DiffusionModel itself: the model stores the full-domain static inputs on construction and subsets them per-batch using the batch's coordinate metadata.

Changes:

- DiffusionModel.__init__ now calls .to_device() on static inputs at construction
- New DiffusionModel._subset_static_inputs encapsulates fine lat/lon subsetting for train_on_batch and generate_on_batch
- generate_on_batch_no_target derives the fine coordinate interval from coarse batch coordinates via adjust_fine_coord_range, then subsets stored static inputs
- BatchData gains lat_interval and lon_interval properties for use in subsetting
- adjust_fine_coord_range now raises a clear ValueError when the coordinate range is too close to the domain boundary for the required number of fine points to exist (documents the implicit ±88° latitude requirement)
- The static_inputs parameter on all public model methods is retained but ignored; removal is deferred to a follow-on PR
- [x] Tests added
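
The subsetting idea described above can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: `StaticField`, `batch_interval`, and this simplified `ClosedInterval` are hypothetical stand-ins, and plain lists replace tensors. The point is only that a full-domain field stored once can be cut down per batch from the batch's coordinate span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedInterval:
    """Inclusive [start, stop] range; a simplified stand-in, not the repo class."""

    start: float
    stop: float

    def contains(self, value: float) -> bool:
        return self.start <= value <= self.stop


@dataclass
class StaticField:
    """Hypothetical full-domain static input (e.g. topography) with 1D coords."""

    lat: list[float]
    lon: list[float]
    values: list[list[float]]  # indexed as values[lat_index][lon_index]

    def subset(
        self, lat_interval: ClosedInterval, lon_interval: ClosedInterval
    ) -> "StaticField":
        # Keep only the rows/columns whose coordinate falls inside the intervals.
        rows = [i for i, v in enumerate(self.lat) if lat_interval.contains(v)]
        cols = [j for j, v in enumerate(self.lon) if lon_interval.contains(v)]
        return StaticField(
            lat=[self.lat[i] for i in rows],
            lon=[self.lon[j] for j in cols],
            values=[[self.values[i][j] for j in cols] for i in rows],
        )


def batch_interval(coords: list[float]) -> ClosedInterval:
    """Span of a batch's coordinates, in the spirit of lat_interval/lon_interval."""
    return ClosedInterval(min(coords), max(coords))


# Full-domain field held by the "model"; each value encodes 10 * row + col.
full = StaticField(
    lat=[0.5, 1.5, 2.5, 3.5],
    lon=[10.5, 11.5, 12.5, 13.5],
    values=[[10 * i + j for j in range(4)] for i in range(4)],
)

# Coordinates a batch might carry for its fine-resolution patch.
patch = full.subset(batch_interval([1.5, 2.5]), batch_interval([11.5, 12.5, 13.5]))
```

Here `patch.values` holds rows 1 to 2 and columns 1 to 3 of the full field, so the data loader never needs to see the static input at all.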

frodre · 2026-03-10T23:36:10Z

fme/downscaling/data/datasets.py


+    @property
+    def lat_interval(self) -> ClosedInterval:
+        lat = self.latlon_coordinates.lat[0]  # all batch members identical; use first


Given that we're no longer doing per-item random patch definitions, we could update BatchData to just use LatLonCoordinates

AnnaKwa

I really like these changes, this is a really clean way to remove the static inputs from the data loading! Just to be super sure, can you add a test that the generated batches (when patching is used) have lat lon coords subset as expected? I think the current tests only check data values since we haven't used the coord info before.

AnnaKwa · 2026-03-11T18:45:06Z

fme/downscaling/models.py

        self,
        coarse_data: TensorMapping,
-        static_inputs: StaticInputs | None,
+        static_inputs: StaticInputs | None,  # TODO: remove in follow-on PR


This should be kept since generate will be passed subsetted static_inputs from generate_on_batch or generate_on_batch_no_target, right?

AnnaKwa · 2026-03-11T18:49:45Z

fme/downscaling/models.py

        n_samples: int = 1,
    ) -> tuple[TensorDict, torch.Tensor, list[torch.Tensor]]:
+        # static_inputs receives an internally-subsetted value from the calling method;
+        # external callers should use generate_on_batch / generate_on_batch_no_target.


Out of scope for this PR, but the only external caller is CascadePredictor.generate, which uses this instead of generate_on_batch_no_target because it was simpler to just pass the first model's output tensor instead of making a new BatchData object out of it to input to the next model. It would be worth adding a helper function to construct a new BatchData object out of generate's output; then we could make this method private.

I mentioned this in #959, but I ran into the problem that inference needs knowledge of the output sizes, which forces some awkward handling within that module. I think this could be solved by passing a richer set of output information (e.g., coordinates) along in the generation/prediction. That would also allow for some smarter handling in CascadedModels.

AnnaKwa · 2026-03-11T18:54:22Z

fme/downscaling/models.py

-        generated, _, _ = self.generate(batch.data, static_inputs, n_samples)
+        # Ignore the passed static_inputs; derive the fine lat/lon interval from coarse
+        # batch coordinates via adjust_fine_coord_range, then subset self.static_inputs.
+        if self.config.use_fine_topography:


Is this check necessary? Could this instead just check if self.static_inputs is None?

Since the previous PR removed the old option of loading HGTsfc from the fine dataset, this config option can be deprecated and downstream checks could be removed.

Yes, I agree and was thinking this and the DataRequirements.use_fine_topography would be handled in another PR to simplify things.
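
As a rough illustration of deriving a fine coordinate interval from coarse batch coordinates, here is a sketch under stated assumptions. `fine_interval_from_coarse` is a hypothetical function, not the actual `adjust_fine_coord_range`; it assumes sorted 1D fine coordinates and that each coarse cell centre sits in the middle of `downscale_factor` fine points. It raises a `ValueError` when the fine grid does not extend far enough past the coarse range.

```python
from bisect import bisect_left, bisect_right


def fine_interval_from_coarse(
    coarse_min: float,
    coarse_max: float,
    fine_coords: list[float],
    downscale_factor: int,
) -> tuple[float, float]:
    """Hypothetical sketch: fine range covering the coarse cells in a range.

    Assumes the fine range must extend `downscale_factor // 2` fine points
    beyond the outermost coarse centres on each side.
    """
    half = downscale_factor // 2
    # Fine points lying between the outermost coarse centres: [lo, hi).
    lo = bisect_left(fine_coords, coarse_min)
    hi = bisect_right(fine_coords, coarse_max)
    start, stop = lo - half, hi + half
    if start < 0 or stop > len(fine_coords):
        raise ValueError(
            f"Coarse range [{coarse_min}, {coarse_max}] is too close to the "
            "fine domain boundary for the required number of fine points."
        )
    return fine_coords[start], fine_coords[stop - 1]


# Fine grid at 0.5 spacing; coarse centres would be 0.5, 1.5, 2.5, 3.5.
fine = [0.25 + 0.5 * i for i in range(8)]
interior = fine_interval_from_coarse(1.5, 2.5, fine, downscale_factor=2)

# Drop the first fine point so the lowest coarse cell is no longer covered.
try:
    fine_interval_from_coarse(0.5, 1.5, fine[1:], downscale_factor=2)
    boundary_error = None
except ValueError as err:
    boundary_error = str(err)
```

With two coarse cells and a factor of 2, `interior` spans four fine points. The exact alignment rules in the real function may differ, particularly for odd factors where fine points coincide with coarse centres.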

AnnaKwa · 2026-03-11T19:07:59Z

fme/downscaling/data/datasets.py

+    def lon_interval(self) -> ClosedInterval:
+        lon = self.latlon_coordinates.lon[0]  # all batch members identical; use first
+        return ClosedInterval(lon.min().item(), lon.max().item())
+


Could you add a test that the coordinates are as expected when BatchData.generate_from_patches is called? It looks like the existing tests for that usage only check data values.

Added tests for the data and coordinates for generate_from_patches under test_datasets.py. I did not adjust the tests in test_static.py since the patch generation will be removed in #956.

frodre · 2026-03-12T20:42:13Z

fme/downscaling/data/datasets.py


    def __getitem__(self, k):
        return BatchItem(
-            {key: value[k].squeeze() for key, value in self.data.items()},


Tests uncovered an error for patches of length 1 in the x/y dimension!
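
A small shape-only sketch of why an argument-free `.squeeze()` is risky here. It uses plain tuples rather than tensors, and both helper functions are hypothetical; it mimics the documented behaviour that `squeeze()` with no arguments drops every size-1 axis. The removed line's replacement is not shown in the diff above, so this only illustrates the failure mode, not the fix.

```python
def item_shape(batch_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Shape of value[k] for an integer k: the leading batch axis is consumed."""
    return batch_shape[1:]


def squeeze_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Shape after an argument-free squeeze: every size-1 axis is dropped."""
    return tuple(n for n in shape if n != 1)


# A regular patch keeps both spatial axes after indexing and squeezing.
regular = squeeze_shape(item_shape((4, 16, 16)))

# A patch of length 1 in y silently loses that axis and becomes 1D.
thin = squeeze_shape(item_shape((4, 1, 16)))
```

The `thin` item ends up with one spatial axis instead of two, so anything downstream that expects a (y, x) layout breaks for such patches.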

AnnaKwa

LGTM, thanks!

This PR finalizes the removal of `StaticInput` handling from the data pipeline. Passing static_input objects is removed from the data configuration, batch iteration, and model call signatures in favor of the direct model handling introduced in the previous downscaling PR (#954).

Changes:

- add `get_fine_coords_for_batch` to translate an input batch domain to output coordinates using the model's stored information. For now, this relies on the model's `static_inputs`, but it will be switched to the model's stored coordinates in #971
- inference `Downscaler` now takes the batch `input_shape` instead of `static_inputs` to check the domain size and model type (regular `DiffusionModel` or `PatchPredictor`)
- downscaling `torch.datasets` generators for `BatchData` no longer include `StaticInputs`
- removed `_apply_patch` and `_generate_from_patches` from `StaticInputs`
- `config.py` no longer references static inputs as an argument
- [x] Tests added

frodre commented Mar 10, 2026

Base automatically changed from refactor/remove-topography-dataloader-pathway to main March 11, 2026 00:12

frodre added 4 commits March 10, 2026 17:43

First shot at model handling static input subsetting

5cf4452

Add explicit coverage for fine domain from coarse input index error

000dcf7

Simplify monotonic coordinate handling

76903d7

Fix model tests with monotonic coords, simplify static inputs usage

f144258

frodre force-pushed the refactor/static-input-handled-by-model branch from 555cb92 to f144258 Compare March 11, 2026 00:44

frodre marked this pull request as ready for review March 11, 2026 04:17

AnnaKwa requested changes Mar 11, 2026

This was referenced Mar 11, 2026

Refactor/private generate method #958

Closed

Refactor CascadePredictor.generate so generate methods can be private #959

Closed

frodre added 3 commits March 11, 2026 21:41

remove unneeded TODO

cac8ebf

Add testing of coordinate subsetting within generate_from_patches

7debcc0

Add dataset patch tests

e3192af

frodre commented Mar 12, 2026

frodre requested a review from AnnaKwa March 12, 2026 20:43

AnnaKwa approved these changes Mar 13, 2026

Merge branch 'main' into refactor/static-input-handled-by-model

ba5502a

frodre enabled auto-merge (squash) March 13, 2026 18:04

frodre merged commit a7d8555 into main Mar 13, 2026
7 checks passed

frodre deleted the refactor/static-input-handled-by-model branch March 13, 2026 18:18

frodre mentioned this pull request Mar 16, 2026

Remove static_input from data pipeline and model call signatures #956

Merged

frodre merged 8 commits into main from refactor/static-input-handled-by-model
