-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Improve percentile based indices' descriptions #1050
Conversation
Previously, their definitions were possibly incorrect in the case where the user would compute percentiles with custom parameters using `percentile_doy`. This concerns, the base percentile value, the length of rolling window of days and the period on which percentiles are computed. A limit of the current approach is that the percentiles **must** be computed using `percentile_doy` because some metadata are expected in `DataArray.attrs`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT (I realized this review was quite cold worded) : Thank you for this work. I have done some work on the Indicator class trying to generalize it so it's easier for users to understand how to use it. But I feel it has become quite complicated to extend., and the long-term vision is still unclear... So thank you for diving in.
Instead of default_params
and preformatted_attrs
, I think all the parsing could be done in Indicator.format
. This way, it would be easier to have more than one of those percentile thresholds. My best suggestion for now would be to return "N.A." for the cases where the correct attributes can't be parsed (replacing the default_params
), and maybe issuing a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a few things to mention. I trust @huard and @aulemahal on this. Be sure to mention these call signature changes as breaking changes in the History!
Just a note on bootstrapping (yeah I love this topic). This PR encourages users to use various percentile thresholds on bootstrappable indices. I think this should be addressed in another PR though. For ref:
|
- Updated metadata of many indices to benefit from the configurable per params - Added a long_name to wsdi - Added # noqa for bootstrap argument
I made an attempt of creating a new InputKind |
Indicators with multiple percentile DataArray as inputs were lacking metadata update. This also ensure the variables in metadata have a name corresponding to their parameter.
The last "unit test" failure is a bit mysterious to me. It seems I somehow changed how the output axes are ordered because, now |
To me too... I quickly checked and I can't see where a change could have modified the dimension order. This being said, we had similar issues before and the current opinion is that the dimension order is not guaranteed. This unit test was written before those discussions and it is kinda wrong to assume the order. |
This test was relying on ordered dimensions, which is discouraged when using xarray.
Now default values are used instead of raising an error.
This also includes the handling of non doy percentiles. Plus, some french translations were missing for DAYS_OVER_PRECIP_DOY_THRESH and FRACTION_OVER_PRECIP_DOY_THRESH
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
"abstract": "Nombre de jours d'une période où la précipitation est au-dessus d'un percentile quotidien et d'un seuil fixe.", | ||
"description": "Nombre {freq:m} de jours où la précipitation au-dessus d'un percentile quotidien. Seuls les jours avec au moins {thresh} sont comptés.", | ||
"long_name": "Nombre de jours pluvieux où la précipitation est au-dessus d'un percentile quotidien" | ||
"title": "Nombre de jours pluvieux où la précipitation est au-dessus du {pr_per_thresh}e percentile", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vous dites la précipitation au Québec ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, c'est drôle on vient d'avoir ce débat. Je ne pense pas que le choix venait d'une différence Québec-France. On se disait que "la précipitation" était synonyme de "précipitation totale", c'est à dire en cumulant toutes les formes de précipitation. On se disait que de dire "les précipitations" impliquait une distinction potentielle entre les différentes phase.
Cela étant dit, si tu penses qu'une version est meilleure que l'autre, si tu connais des références sur lesquelles se baser, nous sommes à l'écoute!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ça m'a juste fait bizarre en lisant la
, mais je ne pense pas que mon avis soit très pertient sur ce sujet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, je comprends. Mais j'insiste par contre, je pense qu'on aimerait savoir ce que d'autres groupes de recherche en pensent. Avez-vous des traductions (officielles ou non) pour les indices de icclim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Il y a un glossaire sur le portail DRIAS de Meteo France, mais il est un peu pauvre (rien sur la/les precip).
Par contre, je viens de tomber sur un glossaire de Méteo France
Sur précipitation ils disent notamment:
Une précipitation, en météorologie , est un ensemble organisé de particules d'eau liquide ou solide tombant en chute libre au sein de l' atmosphère .
Ce terme est souvent employé au pluriel, ce qui traduit la diversité des types de précipitation, dont les plus communs sont la pluie , la bruine , la neige , la grêle et le grésil ; on recense aussi dans les types de précipitation la neige en grains , la neige roulée , le poudrin de glace et les chutes de granules de glace et de prismes de glace .
[...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pour ce qui est des traductions d'indices, @pagecp tu sais si on a quelque chose ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A priori non, on n'a jamais travaillé sur la traduction des indices.
@Zeitsperre the coverage should be better now. I initially commented out |
…and_wsdi_description
Using __future__ should be addressed in a specific PR, after discussing it in an issue.
…and_wsdi_description
Also, rephrased them.
@aulemahal is there something else you would like to discuss on these changes ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry not to have followed the development of this branch!
Now that I re-read the changes, I'm not sure I understand the need for the need for the PercentileDataArray
class.
The annotation is useful so that the base Indicator
class knows how to format the param, how to inject the good attributes. But if percentile_doy
already sets all the needed attributes, why do we need a suboject entirely? The stuff in get_metadata
could be directly in Indicator.format
or in a standalone function in formatting.py
, no?
As I understand it, right now, the computation works if the user uses a normal DataArray, but all three metadata entries for the percentile variable are "unknown". Could we cast the input as a PercentileDataArray
somewhere in the indicator's pipeline to ensure all available informations are added?
Finally, I think it would useful to add the following line to all percentile parameter's docstrings:
Xclim expects an output from :py:func:`~xclim.core.calendar.percentile_doy`.
(or something in the like).
Simplified format function. Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>
I'm quoting you for why I added the
I'm not sure how else it would be possible to recognize percentiles inputs. On the other hand, I feel like all InputKind logic is some sort of typing over the python typing system. I don't think it's really necessary and IMHO it could be replaced byclasses and TypedDict(s). But, that's my java background speaking, if I don't see classes, POJOs and Singletons every now and then it gets itchy. |
- PercentileDataArray is no longer needed to retrieve the attributs used to fill output metadata (window, climatology_bounds...) of DataArray (such as `tasmin_per`). - Removed the corresponding InputKind and replace valid input by attributes and coordinate recognition. - Moved percentile_metadata formatter into formatting module.
997dff6
to
98c58d4
Compare
Alright, I went for a in-between implementation. The
In that case, we need to let the user fill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last version works for me!
Pull Request Checklist:
{cold, warm}_spell_duration_index
indicators description, the percentile threshold is not dynamically set. #1047number
) and pull request (:pull:number
) has been addedbumpversion patch
has been called on this branch.zenodo.json
What kind of change does this PR introduce?
This PR makes csdi and wsdi indicators' use the proper parameterization used in
percentile_doy
to format their description.This concerns, the base percentile value, the length of rolling window of days and the period on which percentiles are computed.
A limit of the current approach is that if percentiles are not computed using
percentile_doy
the description will use default, non configurable values.Does this PR introduce a breaking change?
Yes, the signatures of all indices and indicators having a DataArray parameter holding percentiles have been modified.
Before, the parameter name was assuming a percentile value such as in
t90
.Now, they are renamed into one of
tas_per
,tasmax_per
,tasmin_per
,pr_per
depending on the expected variable type.Their typing is also changed to
PercentileDataArray
which is now outputted bypercentile_doy
.Other information:
With
per = percentile_doy(da_per, window=2, per=[42, 27])
the description (reformatted for github) now looks like: