ENH: Improve percentile based indices' descriptions #1050

bzah · 2022-04-08T14:02:28Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes In {cold, warm}_spell_duration_index indicators description, the percentile threshold is not dynamically set. #1047
Tests for the changes have been added (for bug fixes / features)
- (If applicable) Documentation has been added / updated (for bug fixes / features)
HISTORY.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added
bumpversion patch has been called on this branch
The relevant author information has been added to .zenodo.json

What kind of change does this PR introduce?

This PR makes the csdi and wsdi indicators use the actual parameters passed to percentile_doy when formatting their description.

This concerns the base percentile value, the length of the rolling window of days, and the period over which percentiles are computed.

A limit of the current approach is that if percentiles are not computed using percentile_doy, the description will use default, non-configurable values.

Does this PR introduce a breaking change?

Yes, the signatures of all indices and indicators that take a DataArray parameter holding percentiles have been modified.
Previously, the parameter name assumed a specific percentile value, as in t90.
These parameters are now named tas_per, tasmax_per, tasmin_per, or pr_per, depending on the expected variable type.
Their type is also changed to PercentileDataArray, which is now returned by percentile_doy.
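
To illustrate why the rename is breaking: positional calls keep working, but any caller passing the percentile array by keyword has to switch to the new name. A minimal sketch with stand-in functions (`count_above_old` and `count_above_new` are hypothetical and not part of xclim; plain floats stand in for DataArrays):

```python
def count_above_old(tas, t90):
    """Stand-in for an index using the old, percentile-specific name."""
    return sum(value > t90 for value in tas)


def count_above_new(tas, tas_per):
    """Same computation with the renamed percentile parameter."""
    return sum(value > tas_per for value in tas)


tas = [271.0, 275.5, 280.2, 290.1]

# Positional calls are unaffected by the rename.
same_result = count_above_old(tas, 279.0) == count_above_new(tas, 279.0)

# Keyword calls that still use the old name now raise a TypeError.
try:
    count_above_new(tas, t90=279.0)
    keyword_call_failed = False
except TypeError:
    keyword_call_failed = True
```

Callers that pass the percentile array by keyword would need to be updated when upgrading.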

Other information:

With per = percentile_doy(da_per, window=2, per=[42, 27]) the description (reformatted for github) now looks like:

'description': 
"Annual number of days with at least 6 consecutive days where
 the daily minimum temperature is below the [42, 27]th percentile(s). 
a 2 day(s) window, centred on each calendar day in the ['2015-01-01', '2018-12-31'] period,
 is used to compute the [42, 27]th percentile(s)."

Previously, the descriptions could be incorrect when the user computed percentiles with custom parameters using `percentile_doy`. This concerns the base percentile value, the length of the rolling window of days, and the period over which percentiles are computed. A limit of the current approach is that the percentiles **must** be computed using `percentile_doy`, because some metadata are expected in `DataArray.attrs`.
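
As a rough illustration of the idea, the description can be built by filling a template from the attributes carried by the percentile array. This is only a sketch: the template field names and the `describe` helper are hypothetical, and a plain dict stands in for `DataArray.attrs`. The `window` and `climatology_bounds` attribute names are the ones mentioned later in this thread.

```python
# Hypothetical template; the field names are illustrative only.
DESCRIPTION = (
    "Annual number of days with at least {window} consecutive days where "
    "the daily minimum temperature is below the {per_thresh}th percentile(s). "
    "A {per_window} day(s) window, centred on each calendar day in the "
    "{per_period} period, is used to compute the {per_thresh}th percentile(s)."
)


def describe(per_attrs, percentiles, window=6):
    """Fill the template from the attributes attached to the percentiles."""
    return DESCRIPTION.format(
        window=window,
        per_thresh=percentiles,
        per_window=per_attrs["window"],
        per_period=per_attrs["climatology_bounds"],
    )


# A dict standing in for the attrs set when the percentiles are computed.
attrs = {"window": 2, "climatology_bounds": ["2015-01-01", "2018-12-31"]}
text = describe(attrs, [42, 27])
print(text)
```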

aulemahal

EDIT (I realized this review was quite coldly worded): Thank you for this work. I have done some work on the Indicator class trying to generalize it so it's easier for users to understand how to use it. But I feel it has become quite complicated to extend, and the long-term vision is still unclear... So thank you for diving in.

Instead of default_params and preformatted_attrs, I think all the parsing could be done in Indicator.format. This way, it would be easier to have more than one of those percentile thresholds. My best suggestion for now would be to return "N.A." for the cases where the correct attributes can't be parsed (replacing the default_params), and maybe issuing a warning?
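
The fallback suggested above could look roughly like this. It is a sketch under assumptions: `percentile_metadata` is a hypothetical helper operating on a plain dict, not the function that ended up in xclim.

```python
import warnings

EXPECTED_KEYS = ("window", "climatology_bounds")


def percentile_metadata(attrs):
    """Return the expected entries, using 'N.A.' for those that are missing."""
    missing = [key for key in EXPECTED_KEYS if key not in attrs]
    if missing:
        warnings.warn(
            f"Percentile attributes not found: {missing}; "
            "using 'N.A.' in the description.",
            UserWarning,
            stacklevel=2,
        )
    return {key: attrs.get(key, "N.A.") for key in EXPECTED_KEYS}


# Record the warning so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    partial = percentile_metadata({"window": 5})
```

Parsing every entry independently like this means one missing attribute does not prevent the others from being reported.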

xclim/core/calendar.py

xclim/core/indicator.py

xclim/indicators/atmos/_temperature.py

xclim/indices/_multivariate.py

Zeitsperre

I just had a few things to mention. I trust @huard and @aulemahal on this. Be sure to mention these call signature changes as breaking changes in the History!

xclim/indicators/atmos/_temperature.py

xclim/indices/_multivariate.py

xclim/indicators/atmos/_temperature.py

bzah · 2022-04-08T17:14:23Z

Just a note on bootstrapping (yeah I love this topic).
It's quite unnecessary to bootstrap percentiles when using non-extreme percentiles such as the 25th or the 75th.

This PR encourages users to use various percentile thresholds on bootstrappable indices.
Given the cost of this algorithm, it would make sense to warn users when they try to bootstrap indices on non-extreme percentiles.

I think this should be addressed in another PR though.

For ref:

[...] This bias is relatively greater for higher percentiles. For example, there would be only one exceedance over the 99th percentile, giving an exceedance rate of 1/150 = 0.67%, which is much smaller than the nominal rate of 1%.
Zhang et al.
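
The arithmetic in the quote can be checked directly. This sketch takes the sample size of 150 values from the denominator in the quote and compares the observed exceedance rate with the nominal one:

```python
n_values = 150        # sample size, taken from the denominator in the quote
exceedances = 1       # values above the estimated 99th percentile
nominal_rate = 0.01   # expected exceedance rate for the 99th percentile

observed_rate = exceedances / n_values
relative_bias = (observed_rate - nominal_rate) / nominal_rate

print(
    f"observed: {observed_rate:.2%}, "
    f"nominal: {nominal_rate:.0%}, "
    f"relative bias: {relative_bias:.0%}"
)
```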

- Updated metadata of many indices to benefit from the configurable per params
- Added a long_name to wsdi
- Added # noqa for bootstrap argument

bzah · 2022-04-11T14:02:38Z

I made an attempt at creating a new InputKind, PERCENTILE_VARIABLE = 2, but it's still a work in progress.

Indicators with multiple percentile DataArrays as inputs were lacking a metadata update. This also ensures the variables in metadata have a name corresponding to their parameter.

bzah · 2022-04-11T17:23:22Z

The last "unit test" failure is a bit mysterious to me.
It's on test_temperature.py, in test_warm_spell_duration_index.

It seems I somehow changed how the output axes are ordered, because now
np.testing.assert_array_equal(out[0, :, 0], np.array([np.nan, 3, 0, 0, np.nan])) raises an error,
but
np.testing.assert_array_equal(out[:, 0, 0], np.array([np.nan, 3, 0, 0, np.nan])) does not...

aulemahal · 2022-04-11T17:32:50Z

The last "unit test" failure is a bit mysterious to me.

To me too... I quickly checked and I can't see where a change could have modified the dimension order. This being said, we had similar issues before and the current opinion is that the dimension order is not guaranteed. This unit test was written before those discussions and it is kinda wrong to assume the order.

This test was relying on ordered dimensions, which is discouraged when using xarray.
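
A small sketch of why selecting by dimension name is more robust than positional indexing. The array and the dimension names here are made up for illustration and are not the data from the failing test:

```python
import numpy as np
import xarray as xr

# A toy output with named dimensions.
data = np.arange(24, dtype=float).reshape(4, 3, 2)
out = xr.DataArray(data, dims=("time", "lat", "lon"))

# The same data with a different dimension order.
shuffled = out.transpose("lat", "time", "lon")

# Positional indexing silently depends on the dimension order:
fragile_a = out[:, 0, 0].values       # iterates over time
fragile_b = shuffled[:, 0, 0].values  # now iterates over lat

# Selecting by name gives the same time series either way.
robust_a = out.isel(lat=0, lon=0).values
robust_b = shuffled.isel(lat=0, lon=0).values
```

A test written with `isel` (or `sel`) keeps passing if an upstream change reorders the output dimensions.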

Now default values are used instead of raising an error.

This also includes the handling of non-doy percentiles. Plus, some French translations were missing for DAYS_OVER_PRECIP_DOY_THRESH and FRACTION_OVER_PRECIP_DOY_THRESH.

bzah · 2022-04-12T09:50:09Z

xclim/data/fr.json

-    "abstract": "Nombre de jours d'une période où la précipitation est au-dessus d'un percentile quotidien et d'un seuil fixe.",
-    "description": "Nombre {freq:m} de jours où la précipitation au-dessus d'un percentile quotidien. Seuls les jours avec au moins {thresh} sont comptés.",
-    "long_name": "Nombre de jours pluvieux où la précipitation est au-dessus d'un percentile quotidien"
+    "title": "Nombre de jours pluvieux où la précipitation est au-dessus du {pr_per_thresh}e percentile",


Do you say "la précipitation" (singular) in Québec?

Haha, that's funny, we just had this debate. I don't think the choice came from a Québec-France difference. Our reasoning was that "la précipitation" is synonymous with "total precipitation", that is, the accumulation of all forms of precipitation, whereas saying "les précipitations" implies a potential distinction between the different phases.

That said, if you think one version is better than the other, or if you know of references we could rely on, we're all ears!

It just felt odd when I read it, but I don't think my opinion is very relevant on this topic.

Haha, I understand. But I do insist: I think we would like to know what other research groups think. Do you have translations (official or not) for the icclim indices?

There is a glossary on Météo-France's DRIAS portal, but it is a bit thin (nothing on singular vs. plural "précipitation").

However, I just came across a Météo-France glossary.

On "précipitation", they say in particular:

In meteorology, a precipitation is an organized set of liquid or solid water particles falling in free fall through the atmosphere.

This term is often used in the plural, which reflects the diversity of precipitation types, the most common being rain, drizzle, snow, hail and small hail (grésil); the precipitation types also include snow grains, snow pellets, diamond dust, and falls of ice pellets and ice prisms.

[...]

As for translations of the indices, @pagecp, do you know if we have anything?

As far as I know, no, we have never worked on translating the indices.

bzah · 2022-04-25T09:03:40Z

@Zeitsperre the coverage should be better now. I initially commented out test_cli because of an error I get locally on @cli.result_callback(), but I didn't mean to commit that.

…and_wsdi_description

xclim/core/utils.py

Using __future__ should be addressed in a specific PR, after discussing it in an issue.

…and_wsdi_description

Also, rephrased them.

bzah · 2022-05-16T14:58:09Z

@aulemahal is there something else you would like to discuss on these changes ?

aulemahal

Sorry not to have followed the development of this branch!
Now that I re-read the changes, I'm not sure I understand the need for the PercentileDataArray class.

The annotation is useful so that the base Indicator class knows how to format the parameter and how to inject the right attributes. But if percentile_doy already sets all the needed attributes, why do we need a whole sub-object? The logic in get_metadata could live directly in Indicator.format or in a standalone function in formatting.py, no?

As I understand it, right now, the computation works if the user uses a normal DataArray, but all three metadata entries for the percentile variable are "unknown". Could we cast the input as a PercentileDataArray somewhere in the indicator's pipeline to ensure all available information is added?

Finally, I think it would be useful to add the following line to all percentile parameters' docstrings:

Xclim expects an output from :py:func:`~xclim.core.calendar.percentile_doy`.

(or something along those lines).

xclim/core/formatting.py

xclim/core/indicator.py

Simplified format function. Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

bzah · 2022-05-16T16:15:58Z

@aulemahal

Now that I re-read the changes, I'm not sure I understand the need for the PercentileDataArray class.

I'm quoting you for why I added the PercentileDataArray class:

I am not convinced parsing the history is the safest way to check this. I think I'd rather prefer a new InputKind entry, that's what they were meant for. The problem here is that a "percentile threshold" is not differentiable from a normal input in the world of typing, and thus I think we'll need to use the same workaround as for DateStr.

I'm not sure how else it would be possible to recognize percentile inputs.
However, PercentileDataArray could probably be hidden and not exposed in the API if that's what you meant by
"cast the input as a PercentileDataArray somewhere in the indicator's pipeline".
I'll look into that!

On the other hand, I feel like all the InputKind logic is some sort of typing layered over the Python typing system. I don't think it's really necessary, and IMHO it could be replaced by classes and TypedDict(s). But that's my Java background speaking: if I don't see classes, POJOs and Singletons every now and then, it gets itchy.
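
The annotation-based recognition discussed above can be sketched with `typing.NewType`. Everything here is a stand-in: a plain `list` plays the role of a DataArray, and `PercentileSeries`, `cold_spell_days` and `percentile_parameters` are hypothetical names, not xclim's.

```python
from typing import NewType, get_type_hints

# Stand-in types: a plain list plays the role of a DataArray.
Series = list
PercentileSeries = NewType("PercentileSeries", list)


def cold_spell_days(
    tasmin: Series, tasmin_per: PercentileSeries, window: int = 6
) -> int:
    """Hypothetical index; the body is irrelevant for this sketch."""
    return 0


def percentile_parameters(func):
    """List the parameters annotated as percentile inputs."""
    hints = get_type_hints(func)
    return [name for name, hint in hints.items() if hint is PercentileSeries]


found = percentile_parameters(cold_spell_days)
```

At runtime a `NewType` value is indistinguishable from its base type, so the distinction only exists in the annotation. That is the limitation raised in the quote.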

- PercentileDataArray is no longer needed to retrieve the attributes used to fill the output metadata (window, climatology_bounds...) of a DataArray (such as `tasmin_per`).
- Removed the corresponding InputKind and replaced valid-input detection with attribute and coordinate recognition.
- Moved the percentile_metadata formatter into the formatting module.

bzah · 2022-05-17T10:26:31Z

Alright, I went for an in-between implementation.
I removed the PERCENTILE_VARIABLE InputKind and replaced it with a simple logic check to see if a DataArray is compatible with PercentileDataArray.

The PercentileDataArray class still exists, but it is mainly used to easily create a valid percentiles input with the from_da method.
I think it's useful in the case where percentiles are not computed with percentile_doy, as in the precipitation example:

per = pr.quantile(0.8, "time", keep_attrs=True)
per = PercentileDataArray.from_da(per, climatology_bounds=build_climatology_bounds(pr))

In that case, we need to let the user fill in the climatology_bounds parameter, because in the Indicator pipeline we have no way to determine the period over which the percentiles were computed.
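
A rough, library-free sketch of the two pieces described above: deriving the climatology bounds from the time axis, and a duck-typing check on attributes and coordinates. The helper names, the `percentiles` coordinate name and the exact check are assumptions for illustration. Plain dicts stand in for `attrs` and `coords`.

```python
from datetime import date


def climatology_bounds(times):
    """First and last dates of the series, as ISO strings."""
    return [min(times).isoformat(), max(times).isoformat()]


def looks_like_percentiles(attrs, coords):
    """Assumed check: bounds are recorded and a percentiles coordinate exists."""
    return attrs.get("climatology_bounds") is not None and "percentiles" in coords


times = [date(2015, 1, 1), date(2016, 6, 15), date(2018, 12, 31)]

# Percentiles computed "by hand" carry no climatology metadata at first.
attrs = {}
coords = {"percentiles": [80]}
before = looks_like_percentiles(attrs, coords)

# Once the user supplies the bounds, the input is recognized.
attrs["climatology_bounds"] = climatology_bounds(times)
after = looks_like_percentiles(attrs, coords)
```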

aulemahal

This last version works for me!

bzah added 2 commits April 8, 2022 15:29

ENH: Add default values for format args

0c6bd06

bzah requested a review from aulemahal April 8, 2022 14:02

aulemahal requested changes Apr 8, 2022

xclim/core/calendar.py (outdated, resolved)

xclim/core/indicator.py (outdated, resolved)

xclim/indicators/atmos/_temperature.py (outdated, resolved)

xclim/indices/_multivariate.py (outdated, resolved)

Zeitsperre added the "standards / conventions" and "enhancement" labels Apr 8, 2022

Zeitsperre reviewed Apr 8, 2022

xclim/indicators/atmos/_temperature.py (outdated, resolved)

xclim/indices/_multivariate.py (outdated, resolved)

bzah commented Apr 8, 2022

xclim/indicators/atmos/_temperature.py (resolved)

Merge branch 'master' into fix/#1047-fix_csdi_and_wsdi_description

e734dc0

bzah added 6 commits April 11, 2022 15:59

FIX: default values

69bb608

TST: Add test for custom percentiles in wsdi

7de7e35

ENH: Simplify things

be3b340

ENH: Rename per parameters

daa6b58

- Updated metadata of many indices to benefit from the configurable per params
- Added a long_name to wsdi
- Added # noqa for bootstrap argument

Remove default values

7ca31fe

ENH: Add PercentileDataArray InputKind

b9a030b

bzah added 3 commits April 11, 2022 16:11

ENH: Update French translations

2bf252a

MAINT: simplify from_da in PercentileDataArray

f2c9465

ENH: Update for cold_and_dry_days and co.

12fab09

Indicators with multiple percentile DataArrays as inputs were lacking a metadata update. This also ensures the variables in metadata have a name corresponding to their parameter.

bzah changed the title ~~ENH: Improve csdi/wdsi descriptions~~ ENH: Improve percentile based indices' descriptions Apr 11, 2022

FIX: JSON serialization

5fe520d

bzah added 4 commits April 11, 2022 19:36

MAINT: i18n for CD,CW,WW,WD

6b51061

TST: Rework test

191673b

This test was relying on ordered dimensions, which is discouraged when using xarray.

FIX: unit test

7c5eb88

Now default values are used instead of raising an error.

ENH: Improve precip indicators metadata

946f0f6

This also includes the handling of non-doy percentiles. Plus, some French translations were missing for DAYS_OVER_PRECIP_DOY_THRESH and FRACTION_OVER_PRECIP_DOY_THRESH.

bzah commented Apr 12, 2022

Zeitsperre requested a review from aulemahal April 22, 2022 18:38

bzah added 2 commits April 25, 2022 11:00

Fix typo

0c3a0d1

DOC: Add docstring to public API

5d18a6f

Merge remote-tracking branch 'origin/master' into fix/#1047-fix_csdi_…

b5a2ed3

…and_wsdi_description

Zeitsperre approved these changes Apr 27, 2022

xclim/core/utils.py (outdated, resolved)

bzah added 4 commits April 27, 2022 18:01

MAINT: Rollback to former typing

1523af7

Using __future__ should be addressed in a specific PR, after discussing it in an issue.

FIX: from_da typing

2e64f7b

FIX: retry typing rollback

7518ef1

FIX: typing rollback, again

9585847

bzah mentioned this pull request Apr 27, 2022

Proposal: migrate xclim typing to python 3.10 style #1065

Closed

1 task

bzah added 4 commits May 2, 2022 16:28

Merge branch 'master' into fix/#1047-fix_csdi_and_wsdi_description

ef70cdb

Merge branch 'master' into fix/#1047-fix_csdi_and_wsdi_description

553e35d

Merge remote-tracking branch 'origin/master' into fix/#1047-fix_csdi_…

3d28d2a

…and_wsdi_description

[skip-ci] DOC: Move history changes to 0.37

0a1c894

Also, rephrased them.

aulemahal approved these changes May 16, 2022

xclim/core/formatting.py (outdated, resolved)

xclim/core/indicator.py (outdated, resolved)

Update xclim/core/formatting.py

559f249

Simplified format function. Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

bzah added 2 commits May 16, 2022 18:34

MAINT: Remove useless filter

f16e3be

bzah force-pushed the fix/#1047-fix_csdi_and_wsdi_description branch from 997dff6 to 98c58d4 Compare May 17, 2022 10:26

bzah requested a review from aulemahal May 17, 2022 12:00

Merge branch 'master' into fix/#1047-fix_csdi_and_wsdi_description

eeb044d

bzah mentioned this pull request May 25, 2022

per-gridcell heat wave indice calculations? #1093

Closed

2 tasks

Merge branch 'master' into fix/#1047-fix_csdi_and_wsdi_description

5d2c321

aulemahal approved these changes May 26, 2022

bzah merged commit 6723a07 into master May 26, 2022

bzah deleted the fix/#1047-fix_csdi_and_wsdi_description branch May 26, 2022 16:10
