Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor stacked climate dataset #103

Merged
merged 6 commits into from
May 18, 2023
Merged

Conversation

prakhar6sharma
Copy link
Collaborator

@prakhar6sharma prakhar6sharma commented May 18, 2023

Issues fixed by this PR:

  • StackedClimateDataset is hard coded to work for Downscaling only because it has a different return signature than the ERA5 and ClimateDataset. Specifically, Downscaling assumes that the raw_data it would receive would be a list with 2 items, one for input and other for output. Thus limiting us with a single dataset for input and single dataset for output.
  • Imports under the data/ folder were all absolute imports.
  • Forecasting's create_copy() allows illegal values for some of it's attributes

Solution implemented:

  • All ClimateDataset has a new attribute called name. Thus, all variables are indexed by f"{dataset_name}:{variable}".
  • StackedClimateDataset now allows recursively stacking arbitrary ClimateDataset's while keeping the same return signature. Specifically, if it has two child datasets named "child1" and "child2" and it's own name is "parent", now the name of the children are changed to "parent:child1" and "parent:child2".
  • Because of the above change, the in_vars, out_vars, and constants of the Task should have the dataset name followed by a colon followed by the variable name. Example: ["2m_temperature", "geopotential"] to ["era5:2m_temperature", "era5:geopotential"].
  • Imports are changed under the data/ folder to follow relative paths.
  • Alphabetize imports.
  • Implemented create_copy() for Forecasting.

@prakhar6sharma
Copy link
Collaborator Author

prakhar6sharma commented May 18, 2023

To Do:

  • Update the notebooks.

@prakhar6sharma prakhar6sharma force-pushed the refactor_stacked_climate_dataset branch from 09beb4a to c278d11 Compare May 18, 2023 03:28
Copy link
Collaborator

@jasonjewik jasonjewik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 50-50 on this, so you could convince me either way, but why does name need to be a parameter for ClimateDatasetArgs. For example, if I use ERA5Args, it just makes sense that the name is "era5". A compromise solution could be that the default name is "era5", but we still allow the user to specify a different name.

Otherwise, good PR.

@prakhar6sharma
Copy link
Collaborator Author

I'm 50-50 on this, so you could convince me either way, but why does name need to be a parameter for ClimateDatasetArgs. For example, if I use ERA5Args, it just makes sense that the name is "era5". A compromise solution could be that the default name is "era5", but we still allow the user to specify a different name.

Why name need to be a parameter for ClimateDatasetArgs: Suppose we are doing downscaling. For the typical setup that we support, both high_res and low_res data are from ERA5. Thus keeping the name as "era5" for both of them would lead to ambiguity. Right now, there is a default name for every different ClimateDatasetArgs class, for ERA5Args that happen to be "era5".

@jasonjewik jasonjewik merged commit 95dc85c into main May 18, 2023
6 checks passed
@jasonjewik jasonjewik deleted the refactor_stacked_climate_dataset branch May 18, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants