Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy implementation of multi_model_statistics and ensemble_statistics preprocessors #968

Merged
merged 81 commits into from
Jun 6, 2023

Conversation

Peter9192
Copy link
Contributor

@Peter9192 Peter9192 commented Jan 28, 2021

Description

Make the multi_model_statistics and ensemble_statistics preprocessors lazy.

Note that they are only lazy if

  • at least one of the input cubes is lazy
  • the statistics operator is lazy (all but MEDIAN)
  • the input cubes do not have to be interpolated to get the right time points

Closes #781


Before you get started

Checklist


To help with the number pull requests:

@Peter9192 Peter9192 changed the title First attempt at replacing multimodel stats with a native iris altern… Lazy implementation of _multimodel statistics preprocessor Jan 28, 2021
@stefsmeets stefsmeets self-assigned this Feb 22, 2021
@stefsmeets
Copy link
Contributor

stefsmeets commented Feb 23, 2021

Hi @Peter9192 , I just noticed #1009 got fixed in a separate PR (#1010) which is causing a merge conflict. Just wanted to check with you if this is still an issue in the changes we are making here (meaning can we ignore it)?

@Peter9192
Copy link
Contributor Author

Hi @Peter9192 , I just noticed #1009 got fixed in a separate PR (#1010) which is causing a merge conflict. Just wanted to check with you if this is still an issue in the changes we are making here (meaning can we ignore it)?

If this is a feature that's needed than there should be a test for it. I think the new implementation will work only if either no or all cubes have this 'altitude' coordinate, not sure what the current master does...

@bouweandela
Copy link
Member

No. So far, we have not been able to find additional funding to work on this, unfortunately.

@bouweandela bouweandela modified the milestones: v2.5.0, v2.6.0 Feb 3, 2022
@rcomer
Copy link

rcomer commented Mar 25, 2022

SciTools/iris#3901 has now been merged, so the lazy percentile aggregation will be available at Iris v3.3, which is due out in the summer.

@valeriupredoi
Copy link
Contributor

many thanks for the heads up @rcomer

@sloosvel
Copy link
Contributor

No. So far, we have not been able to find additional funding to work on this, unfortunately.

Should this PR be removed from the 2.6 milestone then?

@bouweandela bouweandela modified the milestones: v2.6.0, v2.7.0 May 17, 2022
@ESMValGroup ESMValGroup deleted a comment from zklaus Jun 30, 2022
@bouweandela
Copy link
Member

Example recipe for testing:

---
documentation:
  description: Test recipe for multimodel statistics
  authors:
    - andela_bouwe

  references:
    - gleckler08jgr

  projects:
    - c3s-magic

preprocessors:

  preproc:
    regrid:
      target_grid: 0.5x0.5
      scheme: linear
    multi_model_statistics:
      span: overlap
      statistics: [mean]
      keep_input_datasets: true

diagnostics:

  tas:
    variables:
      tas:
        preprocessor: preproc
        product: output1
        mip: Amon
        project: CMIP5
        exp: historical
        ensemble: r1i1p1
        start_year: 1850
        end_year: 2005
    additional_datasets:
      - {dataset: ACCESS1-0}
      - {dataset: ACCESS1-3}
      - {dataset: bcc-csm1-1}
      - {dataset: bcc-csm1-1-m}
      - {dataset: BNU-ESM}
      - {dataset: CanESM2}
      - {dataset: CCSM4}
      - {dataset: CESM1-BGC}
      - {dataset: CESM1-CAM5}
      - {dataset: CESM1-FASTCHEM}
      - {dataset: CESM1-WACCM}
      - {dataset: CSIRO-Mk3-6-0}
      - {dataset: FIO-ESM}
      - {dataset: inmcm4}
      - {dataset: IPSL-CM5A-LR}
      - {dataset: IPSL-CM5A-MR}
      - {dataset: IPSL-CM5B-LR}
      - {dataset: MPI-ESM-LR}
      - {dataset: MPI-ESM-MR}
      - {dataset: MPI-ESM-P}
      - {dataset: NorESM1-M}
      - {dataset: NorESM1-ME}
      - {project: CMIP6, dataset: TaiESM1, ensemble: r1i1p1f1, grid: gn}
    scripts: null

@bouweandela bouweandela removed the request for review from jvegreg September 23, 2022 13:18
@bouweandela bouweandela removed this from the v2.7.0 milestone Sep 23, 2022
@bouweandela
Copy link
Member

@fnattino and I have some time to work on this now, but it seems unlikely that it will be ready in time for v2.7.

@bouweandela
Copy link
Member

Tests by @fnattino appear to show that this works quite well when used with #1714.

@bouweandela bouweandela added this to the v2.9.0 milestone Nov 21, 2022
@bouweandela bouweandela changed the title Lazy implementation of _multimodel statistics preprocessor Lazy implementation of multi_model_statistics and ensemble_statistics preprocessors Jun 2, 2023
@bouweandela bouweandela added preprocessor Related to the preprocessor dask related to improvements using Dask labels Jun 2, 2023
Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many thanks to all who contributed to this - it looks great and I bet it's also adding quite a bit of performance improvement (I've not tested though); I am approving, but I have a few, minor questions. Cheers 🍺

esmvalcore/preprocessor/_multimodel.py Show resolved Hide resolved
esmvalcore/preprocessor/_multimodel.py Show resolved Hide resolved
esmvalcore/preprocessor/_regrid.py Show resolved Hide resolved
@valeriupredoi valeriupredoi added the enhancement New feature or request label Jun 2, 2023
@bouweandela bouweandela merged commit 83650df into main Jun 6, 2023
@bouweandela bouweandela deleted the mmstats_lazy branch June 6, 2023 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask related to improvements using Dask enhancement New feature or request preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making the multi-model statistics preprocessor lazy
10 participants