Custom aggregation #147

aulemahal · 2023-01-31T20:50:12Z

Pull Request Checklist:

- This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #xyz
- (If applicable) Documentation has been added / updated (for bug fixes / features)
- HISTORY.rst has been updated (with summary of main changes)
  - Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

New to_dataset method on DataCatalog.

Same as to_dask, but it exposes options to control the aggregation:

- concat_on: list of columns over which the datasets are concatenated.
- ensemble_on: list of columns over which a realization dimension is created.

The goal of this function is to reduce code complexity in some common cases where one wants a single dataset with all members and experiments (for example).
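As a rough mental model of the two options, here is a minimal sketch in plain xarray. It is not the implementation in xscen/catalog.py: the `rows` list, the `toy_dataset` and `aggregate` helpers, and the "InstB"/"ModelB" names are hypothetical, and the sketch only handles a single concat column. Each unique combination of the `ensemble_on` columns becomes one entry along a `realization` dimension, and the results are then concatenated along the `concat_on` column.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=4, freq="MS")


def toy_dataset(value):
    """Build a tiny dataset filled with a constant, standing in for real data."""
    return xr.Dataset(
        {"tas": ("time", np.full(time.size, float(value)))},
        coords={"time": time},
    )


# Hypothetical catalog rows: one dataset per (institution, source, experiment).
rows = [
    {"institution": "CCCma", "source": "CanESM2", "experiment": "rcp45", "ds": toy_dataset(1)},
    {"institution": "CCCma", "source": "CanESM2", "experiment": "rcp85", "ds": toy_dataset(2)},
    {"institution": "InstB", "source": "ModelB", "experiment": "rcp45", "ds": toy_dataset(3)},
    {"institution": "InstB", "source": "ModelB", "experiment": "rcp85", "ds": toy_dataset(4)},
]


def aggregate(rows, concat_on, ensemble_on):
    """Toy aggregation: stack members on 'realization', then concatenate on one column."""
    column = concat_on[0]  # this sketch supports a single concat column only
    values = sorted({row[column] for row in rows})
    per_value = []
    for value in values:
        members = [row for row in rows if row[column] == value]
        # The realization name joins the ensemble_on columns, e.g. "CCCma_CanESM2".
        names = ["_".join(row[col] for col in ensemble_on) for row in members]
        per_value.append(
            xr.concat(
                [row["ds"] for row in members],
                dim=pd.Index(names, name="realization"),
            )
        )
    return xr.concat(per_value, dim=pd.Index(values, name=column))


ds = aggregate(rows, concat_on=["experiment"], ensemble_on=["institution", "source"])
print(ds.sizes)
```

The resulting toy dataset has `experiment`, `realization` and `time` dimensions, which mirrors the layout of the real output shown in the example of this PR.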

I also added a "good to know" page to the doc: a place to list all sorts of miscellaneous information that users of xscen should be aware of. The first section is about how to open data.

Example:

import xscen as xs

cat = xs.DataCatalog("/you/know/where/ESPO-extra.json")

ds = cat.search(
    bias_adjust_project='ScenGen', xrfreq='QS-DEC'
).to_dataset(concat_on=['experiment'], ensemble_on=['institution', 'source'])

The output:

<xarray.Dataset>
Dimensions:                   (experiment: 2, realization: 11, time: 605, lat: 320, lon: 416)
Coordinates:
  * lat                       (lat) float32 66.62 66.54 66.46 ... 40.12 40.04
  * lon                       (lon) float32 -89.05 -88.96 ... -54.55 -54.46
  * time                      (time) datetime64[ns] 1949-12-01 ... 2100-12-01
  * experiment                (experiment) object 'rcp45' 'rcp85'
  * realization               (realization) object 'CCCma_CanESM2' ... 'NOAA-...
...

The equivalent code with plain xscen:

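import xarray as xr
import xclim.ensembles
import xscen as xs

# `indicators` is assumed to be defined beforehand (the names to search for).
dss = []
for exp in ['rcp45', 'rcp85']:
    cats = xs.search_data_catalogs(
        "/you/know/where/ESPO-extra.json",
        variables_and_freqs={ind: 'QS-DEC' for ind in indicators},
        # Restrict the search to the current experiment.
        other_search_criteria={'domain': 'QC', 'experiment': exp},
    )
    dsd = {}
    for dsid, cat in cats.items():
        dsd[dsid] = xs.extract_dataset(cat)['QS-DEC']
    # One realization per dataset for this experiment.
    dss.append(xclim.ensembles.create_ensemble(dsd, calendar='standard', resample_freq='QS-DEC'))
ds = xr.concat(dss, xr.DataArray(['rcp45', 'rcp85'], dims=('experiment',), name='experiment'))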

The added value of this PR seems evident to me here.

Where?

As it was developed for an ensemble case, I implemented it as a single-dataset output. But that seems a bit limited. In an xscen world, this could be implemented at the search_data_catalogs level.

However, it felt to me that the search_data_catalogs/extract_dataset combo is best used for raw data. Once at the ensemble step, simple DataCatalog.search calls are often enough. Thus, the need to have this on DataCatalog.

Also, this could be moved upstream to intake-esm, but the ensemble_on and calendar args seem a bit too xclim-specific...

juliettelavoie

I think the good to know page is really neat!
I am having issues using to_dataset. It is unclear to me whether I don't understand what it is supposed to do or whether it is not working...

aulemahal · 2023-02-08T16:41:46Z

@RondeauG I re-edited the docstring because it felt a bit redundant.

RondeauG

I think it's good to go!

aulemahal added 3 commits January 31, 2023 15:31

to_dataset() - mv ensure_correct_time to func

ea9ddb4

merge

9756669

Add doc about opening data

5657988

aulemahal requested review from juliettelavoie and RondeauG February 3, 2023 23:09

aulemahal added the documentation and enhancement labels Feb 3, 2023

aulemahal marked this pull request as ready for review February 3, 2023 23:09

juliettelavoie reviewed Feb 6, 2023

docs/goodtoknow.rst (resolved)

docs/goodtoknow.rst (outdated, resolved)

xscen/catalog.py (outdated, resolved)

xscen/catalog.py (resolved)

aulemahal and others added 3 commits February 6, 2023 16:45

Apply suggestions from code review

51d9cff

Co-authored-by: juliettelavoie <juliette.lavoie@hotmail.ca>

Fix to_dataset

81092af

Fix to_dataset again - upd hist and doc

220dbe8

RondeauG reviewed Feb 7, 2023

xscen/catalog.py (outdated, resolved)

fix id processing

d4fccc3

RondeauG reviewed Feb 8, 2023

xscen/catalog.py (outdated, resolved)

xscen/catalog.py (outdated, resolved)

xscen/catalog.py (outdated, resolved)

xscen/catalog.py (resolved)

aulemahal and others added 3 commits February 8, 2023 11:21

Apply suggestions from code review

b33ba05

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

bc3a1dd

for more information, see https://pre-commit.ci

rename ensemble_on, further edits to the docstring

2cf37e7

RondeauG reviewed Feb 8, 2023

xscen/catalog.py (outdated, resolved)

Update xscen/catalog.py

dcf49c0

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

RondeauG approved these changes Feb 8, 2023

aulemahal merged commit 8e44005 into main Feb 8, 2023

aulemahal deleted the custom-agg branch February 8, 2023 17:16
