Conversation
Force-pushed ffe6dd8 to 018276a
…et to work with three data members.
…orbed WithObs variant.
…ind model training added to examples README.
…`` class to enable passing ``xr.Dataset`` and like objects directly to ``BatchHandlers`` without needing to invoke ``.data``
…ueueing, since batches can be sampled directly if there are none in the queue).
…d dequeueing since this would cast to tensors.
…h ``max_workers > 1``.
… just sampling is, except for macos, but training is not.
… files and replacement.
… files and replacement.
… files and replacement.
…classes
fix: update tensor initialization to use tf.cast for consistency
refactor: simplify Loader initialization in ForwardPassStrategy
refactor: improve docstring handling in Loader class
test: convert observation mask to numpy array in test_fixed_wind_obs
logger.info(msg)
return total_grad, loss_details


def _get_parallel_grad(
Does any of this need to be decorated with @tf.function? Am I crazy, or did you remove this? If so, why?
This is actually a new helper function that was factored out of run_gradient_descent. It couldn't have a @tf.function decorator since it was calling np.array_split.
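To illustrate why such a helper resists @tf.function: np.array_split is a plain NumPy call, so the chunking wrapper stays eager while only the per-chunk work can be graph-compiled. A minimal NumPy-only sketch of the pattern (function and parameter names are hypothetical, not sup3r's actual signatures):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def get_parallel_grad(batch, grad_fn, max_workers=2):
    """Split a batch into chunks with np.array_split and average the
    per-chunk results. Because np.array_split is a Python/NumPy call,
    a wrapper like this cannot itself be traced by @tf.function; only
    grad_fn (the per-chunk gradient computation) could carry that
    decorator."""
    chunks = np.array_split(batch, max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        grads = list(pool.map(grad_fn, chunks))
    # Aggregate chunk gradients (here a simple mean, as a stand-in).
    return np.mean(grads, axis=0)
```

A trivial stand-in for grad_fn (e.g. a per-chunk mean) shows the split-map-aggregate flow without pulling in TensorFlow.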
class Sup3rGanWithObs(Sup3rGan):
    """Sup3r GAN model which includes mid network observation fusion. This
    model is useful for when production runs will be over a domain for which
    observation data is available."""
Somewhere, probably in this header, can you describe the nuances of training with (fake) observations vs. inference with real observations? I think the functions described with "obs for current batch" are for fake observations from training data, but it's not really clear.
I also want to know how someone would pass in real observations at inference time. You don't subclass the generate() function here, so I don't see anywhere that would describe how to provide real observations in a fwp.
Yeah, thanks for this feedback. It's hard to get this perspective when you've been "in it" for a while. Obs are used in fwp the same way as other exogenous data sources, but the ObsRasterizer should be specified (this allows NaNs to make it through the rasterization), or obs feature names should include the suffix "_obs" to automatically use the ObsRasterizer.
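The suffix-based dispatch described above could be sketched roughly like this; the function and class names here are placeholders for illustration, not sup3r's actual API:

```python
def pick_rasterizer(feature, rasterizer=None):
    """Illustrative dispatch only: pick an observation-aware rasterizer
    (which lets NaNs pass through) when the feature name ends in
    '_obs', otherwise fall back to the standard exo rasterizer. An
    explicitly specified rasterizer always wins."""
    if rasterizer is not None:
        return rasterizer
    if feature.endswith('_obs'):
        return 'ObsRasterizer'   # NaN-preserving observation path
    return 'ExoRasterizer'       # standard exogenous-data path
```

So at inference time a user would either name the feature e.g. 'u_10m_obs' or pass the ObsRasterizer explicitly, and the observation data then flows through the forward pass like any other exo source.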
    step and will be converted to a list
    """  # noqa : D301
    if isinstance(steps, dict):
        for feat, entry in steps.items():
This whole init statement is really convoluted and confusing, and I'm pretty sure the entire purpose of this logic is to accommodate inputs that are not correctly formatted. Do we really need to be this flexible? I think decisions like this make things way too complex. My suggestion would be to just have a ton of assert statements in this init to make sure things are formatted correctly, and then have static methods that could massage things into the correct format.
Also, nitpick: no vertical spacing for this entire block is insane. No way this is one logical group of statements.
Yeah, this is definitely fair. I got tired of not being able to pass exo data to generate with just a dict like {'topography': np.array(...)}, instead needing the nested dict described in the current doc. The nested dict was motivated by the need to work with multi-step models, which require a lot more complexity, but we've moved away from multi-step models a bit in favor of "multi-step" pipelines (e.g. spatial config, temporal config). Maybe this could allow us to simplify?
Is this a classic example of the challenge of having a single object do all the things? Trying to do: exo data for multiple features, multiple model steps, and multiple types of concat? Would having more atomic objects help with this? Like having an exo class for each of these options, or having a class template for a single exo layer that can be stacked by another class.
This might also be a good example of where flat is better than nested. A flat list of exo objects would be really easy to parse through and could be handled the same for a single GAN and a multi-step GAN.
What if the ExoDataSingle class was just a data class, ExoDataSingle(data, fname, step_ind, ctype, model_ind=0), and ExoData was just a list of these? They would be super easy to parse through: for single in ExoData, if single.attrs == desired_attrs then use this object; raise if desired_attrs is not found.
Sounds like a good opportunity for a refactor, but probably in a separate PR after we do a release with all of this PR's work. Right now just glad you simplified the init logic, thanks!
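The flat ExoDataSingle idea proposed in this thread could be sketched as follows; this is a minimal sketch under the field names given in the comment, not sup3r's implementation:

```python
from dataclasses import dataclass


@dataclass
class ExoDataSingle:
    """One exo layer as a plain record, per the review proposal:
    ExoDataSingle(data, fname, step_ind, ctype, model_ind=0)."""
    data: object
    fname: str
    step_ind: int
    ctype: str
    model_ind: int = 0


class ExoData(list):
    """A flat list of ExoDataSingle records. Parsing is a single loop,
    handled identically for a single GAN or a multi-step pipeline."""

    def get(self, **attrs):
        # Return the first record matching all desired attrs;
        # raise if no record matches.
        for single in self:
            if all(getattr(single, k) == v for k, v in attrs.items()):
                return single
        raise KeyError(f'No exo entry matching {attrs}')
```

A consumer would then just do `exo.get(fname=..., step_ind=...)` instead of walking a nested dict.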
bnb32 left a comment:
Would you prefer more doc info elsewhere or do you think these updates are sufficient? Also, lmk what you think on the ExoData response.
…added tf.function back to _get_parallel_grad.
…e / training only designation
The commit history is a mess, but the cumulative effect is almost entirely in sup3r/preprocessing/rasterizers/exo.py and sup3r/models/with_obs.py. Lots of tiny edits re: refactoring, adding docstrings, etc.