Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2D ensemble weights #231

Merged
merged 21 commits into from Aug 9, 2023
Merged

2D ensemble weights #231

merged 21 commits into from Aug 9, 2023

Conversation

RondeauG
Copy link
Contributor

@RondeauG RondeauG commented Aug 2, 2023

Pull Request Checklist:

What kind of change does this PR introduce?

  • Major changes to generate_weights:
    • The function itself was mostly rewritten, in an effort to more easily support independence levels.
    • The results for independence_level: GCM were changed substantially. See Incorrect weights for independence_level="GCM" #230.
    • New independence level: institution
    • New argument: split_experiments: Pretty self-explanatory
    • New argument: skipna:
      • If True (default), then the weights are built using the attributes only.
      • If False, the weights are built by taking into account the values themselves and putting a weight of 0 when NaNs are encountered. Requires a time or horizon dimension. If the dimension is not the same across all datasets, they are first reindexed to cover all horizon/time.
  • Updated produce_horizon so it can accept multiple periods or warming levels.
  • Small bugfix in cleanup to prevent raising an error when no cat: attribute exists in the datasets.
  • Adds tests to most of xs.ensemble_stats.
  • Adds tests to xs.generate_weights
  • Updated the notebooks to reflect the changes (Getting Started - Ensemble stats & Warming Levels)

Does this PR introduce a breaking change?

  • Removed a FutureWarning in ensemble_stats that was there since xscen 0.4.0. stats_kwargs is not fully abandoned.
  • independence_level: all was renamed model for better consistency.
  • The results for independence_level: GCM were changed substantially. See Incorrect weights for independence_level="GCM" #230.
  • period in produce_horizon has been deprecated and replaced with periods.
  • Some automated to_level were updated to reflect more recent changes.

Other information:

@RondeauG
Copy link
Contributor Author

RondeauG commented Aug 2, 2023

I can't turn the PR into a draft, but you can wait a little before reviewing this. While updating the Notebooks, I realized that produce_horizon could be updated to allow multiple periods/warming levels, now that we can support that further down the road.

@RondeauG
Copy link
Contributor Author

RondeauG commented Aug 3, 2023

Alright, this is good for review.

xscen/aggregate.py Outdated Show resolved Hide resolved
xscen/utils.py Show resolved Hide resolved
docs/notebooks/2_getting_started.ipynb Show resolved Hide resolved
"</div>\n",
"\n",
"Next, the weights and the datasets can be passed to `xs.ensemble_stats` to calculate the ensemble statistics."
"Next, the weights and the datasets can be passed to `xs.ensemble_stats` to calculate the ensemble statistics. Note that if `split_experiments=True` is used, the datasets should be split before being sent to `ensemble_statistics`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence. If we need to do it separatly doesn't that defeat the purpose ?

Copy link
Contributor Author

@RondeauG RondeauG Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You bring out a going counterpoint that even though we can now more easily weight the experiments separately, it's not really useful. For now, I've removed the text related to this from the notebooks, and split_experiments == False is the default anyway.

Two options:

  1. I change it so that an equal weight is given to all experiments. i.e. if SSP585 has 30 simulations and SSP245 only has 5, the total weight is still 1 each. This would allow for some fancy ensemble statistics that are not possible if you compute each experiment separately.
  2. I keep it as it is, it just won't be too useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see when we would want 1... so I would vote for 2 !

Copy link
Contributor Author

@RondeauG RondeauG Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A case for 1 (which is kinda what I had in mind when designing the option) would be if you want to make statistics on the full scope of possible climate change. @sarahclaude will need to do that for her project (although in her case, we would need to allow the weight between experiments to differ from 1).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will be useful to be able to associate weights to experiement in an ensemble! I am not using it directly in ensemble_stats.

xscen/aggregate.py Outdated Show resolved Hide resolved
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@RondeauG RondeauG merged commit be660df into main Aug 9, 2023
11 checks passed
@RondeauG RondeauG deleted the even-moar-tests branch August 9, 2023 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants