Simple indicators #1358
Hi Ludwig, thanks for the nice work. I'm not sure what can cause the fixture problem, I cloned your PR and don't see a problem. There has been some reorganization, the tests in xclim are not where they used to be. Maybe that's the root of the problem? To test a new feature, I activate xclim with
Maybe this way of proceeding would solve your fixtures problem?
xclim/indices/_threshold.py
if unit_thresh._units == unit._units:
prsn = prsn_to_mm_per_day(prsn, snr=snr, const=const)
thresh = convert_units_to(thresh, prsn, context="hydro")
Similar problem here with `context="hydro"` and having `prsn` with units `mm day-1`.
If we have `prsn [mm day-1]` and `thresh [kg m-2 day-1]`, then `thresh = convert_units_to(thresh, prsn, context="hydro")` will do a conversion that is not consistent with the choice of `const = 312 kg m-3`. That is, `prsn_to_mm_per_day` uses `const` as a conversion factor, whereas `convert_units_to(..., context="hydro")` automatically uses the conversion factor 1000 kg m-3.
On the other hand, having both `prsn [mm day-1]` and `thresh [mm day-1]` means that we would use `prsn [mm day-1]` as an input in `prsn_to_mm_per_day`, so this is a concrete occurrence of the problem outlined in my comment above.
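To make the mismatch concrete, here is a minimal arithmetic sketch in plain Python (no xclim or pint involved). The helper name `mass_flux_to_depth_rate` is hypothetical; the two densities are the ones quoted in this comment.

```python
# Densities quoted in the comment above (kg m-3).
WATER_DENSITY = 1000.0  # factor implied by context="hydro"
SNOW_DENSITY = 312.0    # default `const` used by prsn_to_mm_per_day


def mass_flux_to_depth_rate(flux_kg_m2_day, density_kg_m3):
    """Convert a flux in kg m-2 day-1 to a depth rate in mm day-1.

    Hypothetical helper for illustration only: flux / density gives
    m day-1, and the factor 1000 converts metres to millimetres.
    """
    return flux_kg_m2_day / density_kg_m3 * 1000.0


thresh = 0.5  # kg m-2 day-1

# The same threshold lands on two different depth rates depending on
# which density is applied.
thresh_hydro = mass_flux_to_depth_rate(thresh, WATER_DENSITY)
thresh_snow = mass_flux_to_depth_rate(thresh, SNOW_DENSITY)

print(f"hydro context: {thresh_hydro:.3f} mm/day")
print(f"const=312:     {thresh_snow:.3f} mm/day")
```

With these numbers the two thresholds differ by a factor of 1000/312, roughly 3.2, so `prsn` and `thresh` would end up being compared on different scales.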
We could
1. Restrict the use of the indicator `first_snowfall` to `prsn` with [mass]/[area]/[time] units (also, in this case, `context="hydro"` would not be necessary).
2. Allow `prsn` with [length]/[time] units. We should then figure out what we do when we have `prsn [mm day-1]` and `thresh [kg m-2 day-1]`. We could convert `prsn` with a similar function `prsn_to_kg_per_m2_per_day` (with some better name).
Maybe taking `thresh` with units [kg m-2 day-1], so maybe it just complicates things for nothing. Or maybe it's really unusual to have `prsn` with [mm day-1] to start with, and sticking with option 1) would make sense.
Anyways, I think we need to restrict use cases (1) or make sure every case is covered correctly (2).
I implemented some if statements to convert `thresh`, `low`, `high` and `prsn` if one of them has units [length] / [time]. What do you think about this approach? Maybe we can add a function `str_to_DataArray` to `xclim.core.units` to trim the code.
Many thanks!
solved my problem.
In #1359, we discussed the The same can be said of
sfcWindmax_max = Wind(
    ...
    # compute=indices.sfcWindmax_max,
    compute=indices.generic.statistics,
    parameters={"reducer": "max"},
)
Anyways, it's nothing specific to these implementations, there are many indices in XClim such as
@Zeitsperre, what is your take on this (and the similar discussion in #1359): Should we indeed reduce the amount of needed indices to a minimum like this or keep more specific indices? Also, we could have a generic function that covers indices like
xclim/indices/_threshold.py
def first_snowfall(
    prsn: xarray.DataArray,
    thresh: Quantified = "0.5 mm/day",
    snr: xarray.DataArray | None = None,
    const: Quantified = "312 kg m-3",
    freq: str = "AS-JUL",
) -> xarray.DataArray:
    r"""First day with solid precipitation above a threshold.
@Zeitsperre Could you give your insight on this? I realize that this index (and the next one) differs from how `pr` indices usually work, and I want to know if we want to allow this extra layer of complexity.
Indices with `pr` usually simply keep the units of `pr` and simply adapt the threshold if needed. This is OK since the conversion is always uniform, same density 1000 kg m-3 everywhere regardless. The case of `prsn` with possibly space-dependent density (`prsn`) is more complex.
Should we:
- Allow the conversion of `prsn` inside the index as things are now?
- Only let the index convert `thresh` with a constant conversion factor if needed as in `pr` indices. In this case, one would need to convert `prsn` if [length]/[time] units are desired.
Maybe the confusion also comes from the fact that [precipitation] can mean both [mass]/[area]/[time] and [length]/[time]. In case of actual `pr`, this is not so ambiguous, since the conversion factor is uniform. But for `prsn`, things are more complex. We were discussing how `prsn` should be reserved for [mass]/[area]/[time], and there should be another variable name when units are [length]/[time]. In this case, is the [precipitation] unit type too permissive to work with `prsn` (and its [length]/[time] equivalent)?
(Sorry for the back and forth on this Ludwig. Regardless of the final form of the index, the modifications you made have already been very useful, I used the ideas in the generic `flux_and_rate_converter`)
I can see how quickly this case is becoming complicated; I think we need to agree on some approaches for handling precipitation. The CF Conventions are pretty specific on what counts as Precipitation Flux (`pr`; [mass]/[area]/[time]) and Precipitation Amount (no official acronym; [mass]/[area]).
I don't like the idea of hiding conversions within an indice, especially if this conversion step might be used in multiple indices. We don't currently log these conversion steps into the metadata history either, making matters worse. I do think we should be able to support amounts and fluxes within an indicator/index, but how we handle the conversion needs to be clearly communicated (via warnings?) to the user.
The current behaviour of using a rule-of-thumb liquid water equivalent conversion only if `snr` is not provided feels like a good approach, so I wouldn't change that.
The more I think about it, the more I agree that we need to define two variables for snow amount and flux. Or at the very least have some code to handle the case of prsn as an amount.
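As a rough sketch of that behaviour combined with the warning idea above (plain Python floats, not xclim code): the function name `prsn_to_depth_rate` is hypothetical, and treating `snr` as a snow density in kg m-3 is an assumption made for illustration.

```python
import warnings

SECONDS_PER_DAY = 86400.0
MM_PER_M = 1000.0


def prsn_to_depth_rate(prsn, snr=None, const=312.0):
    """Convert a snowfall flux (kg m-2 s-1) to a depth rate (mm day-1).

    Hypothetical helper: `snr` is assumed to be a snow density in kg m-3.
    When it is missing, the constant `const` is used instead and the
    fallback is announced to the user rather than applied silently.
    """
    if snr is None:
        warnings.warn(
            f"No snr given; using constant snow density {const} kg m-3.",
            UserWarning,
            stacklevel=2,
        )
        density = const
    else:
        density = snr
    # flux / density -> m s-1, then scale to mm day-1
    return prsn / density * MM_PER_M * SECONDS_PER_DAY


# With an explicit density no warning is raised.
rate = prsn_to_depth_rate(1e-5, snr=100.0)
print(f"{rate:.2f} mm/day")
```

Calling `prsn_to_depth_rate(1e-5)` without `snr` emits the `UserWarning` and falls back to `const`.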
There are definitely a few key indices for temperature that we opted to not remove when we were doing our last major refactor/simplification, simply because they're always needed (e.g. Striking a balance between what is popular and what reduces redundancy is hard, but we should opt for more succinct indices that can be built from (generics) and more offerings of customized indicators.
surface snow vs. snow_precips equivalences
@ludwiglierhammer I would say:
Do you know if
snowfall instead of surface snow
Maybe things are clearer if we remove the "surface" distinction for the snow variables? A snowfall amount here is just some "snow_precipitation rate" integrated over time, isn't it? I find other cf_names in this case:
I would say that the time integral of
xclim name
Looking for another name than
Additional thoughts
The index corresponding to
But they do define density of falling snow (100 kg m-3) which differs from their density of snow on ground (250 kg m-3), so there might be something I'm not understanding. You probably need "density of falling snow" to obtain
If I may: Indeed, I think a
Thanks, indeed, I think that (those two aspects are corrected above for clarity)
I think we're about done. A few bugs need to be addressed before merging, though.
tests/test_indices.py
mmday2ms = 86400000
prsnd = prsnd_series(
    (30 - abs(np.arange(366) - 180)) / mmday2ms, start="2000-01-01"
)
Maybe it defeats the point of the tests, but wouldn't we want to use pint here instead?
Unfortunately, I cannot follow your question. Can you give an example of how you would do the test? We test whether we get the same results for both `prsn` and `prsnd` input.
My bad, what I was wondering was if we should use the existing units-conversion tools to perform these calculations to set up our time series, i.e. create an array of values in mm/day then use `xclim.core.units.convert_units_to` to convert these values to m/s.
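For reference, the hard-coded `mmday2ms` factor in the test is just seconds per day times millimetres per metre. The sketch below spells that out in plain Python; it only illustrates the arithmetic that a unit-aware conversion would replace, and the helper name `mm_per_day_to_m_per_s` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MM_PER_M = 1000
MMDAY_TO_MS = SECONDS_PER_DAY * MM_PER_M  # same value as mmday2ms in the test


def mm_per_day_to_m_per_s(value_mm_day):
    """Convert a depth rate from mm day-1 to m s-1 (hypothetical helper)."""
    return value_mm_day / MMDAY_TO_MS


# Same triangular series as in the test, written first in mm/day ...
series_mm_day = [30 - abs(day - 180) for day in range(366)]
# ... and then converted, instead of dividing by a magic number inline.
series_m_s = [mm_per_day_to_m_per_s(v) for v in series_mm_day]
```

Writing the series in mm/day first keeps the intended values readable, and the conversion step becomes the only place where the factor appears.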
Ah, that makes sense to me. I adjusted the tests with `xclim.core.units.convert_units_to` and added some more tests for the snowfall indices:
- test with prsnd [mm day-1]
- test with prsnd [m s-1]
- test with prsn [kg m-2 s-1]
All those tests yield the same results.
In `test_indices.py` there is this `K2C`. We could delete it and call the temperature series with:
da = tas_series(a, units="C")
Then we would be more independent of hard-coded conversions.
True! I know in some cases we are testing to see whether `convert_units_to` is working properly, but for tests of indices specifically, we should migrate, absolutely. Not necessary in this PR!
xclim/indicators/atmos/_precip.py
@@ -251,12 +256,27 @@ class HrPrecip(Hourly):
  description="{freq} total precipitation.",
  abstract="Total accumulated precipitation. If the average daily temperature is given, the phase parameter can be "
  "used to restrict the calculation to precipitation of only one phase (liquid or solid). Precipitation is "
- "considered solid if the average daily temperature is below 0°C (and vice versa).",
+ "considered solid if the average daily temperature is below {thresh} (and vice versa).",
- "considered solid if the average daily temperature is below {thresh} (and vice versa).",
+ "considered solid if the average daily temperature is below 0°C (and vice versa).",
The convention we landed on for these templates is to only have fillable fields in the `description` and `long_name`.
I added "a given threshold" instead of "0°C" (see other threshold indicators) and added {phase} and {thresh} to `long_name` and `description`. In `indices/_multivariate.py`, should we set the default `phase` to "solid and liquid"? This would yield the default `long_name` "Total accumulated solid and liquid precipitation". I think we should mention this phase and threshold thing in the metadata (`long_name`, `description`) too.
We can definitely set the defaults at the indicator or use the defaults that are inherited from `indices`. Generally, the description and long_name should have information within it that more specifically details the way that this indicator was called (containing fillable fields, like `{thresh}`), while the abstract would be what the user expects to see when reading the docstring for the function (before having called it).
I think we should explain what is returned when calling it without having modified the call signature.
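The split between a static abstract and fillable call-time metadata can be illustrated with plain `str.format`. This is only a sketch of the convention; xclim's own template machinery is not shown, and the strings and parameter values below are made up for illustration.

```python
# Static text: what a user reads in the docstring before calling anything.
abstract = (
    "Total accumulated precipitation. Precipitation is considered solid "
    "if the average daily temperature is below a given threshold."
)

# Templates with fillable fields: filled in from the actual call.
long_name_template = "Total accumulated {phase} precipitation"
description_template = (
    "{freq} total {phase} precipitation, estimated as precipitation "
    "falling when temperature is below {thresh}."
)

# Hypothetical call parameters, chosen only for this example.
call_params = {"freq": "Annual", "phase": "solid", "thresh": "0 degC"}

long_name = long_name_template.format(**call_params)
description = description_template.format(**call_params)

print(long_name)
print(description)
```

The abstract never changes between calls, whereas `long_name` and `description` record the threshold and phase that were actually used.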
I think I found the knot in my brain.
The indicators `xclim.indicators.atmos.precip_{accumulation|average}` are explicitly defined as indicators respecting both liquid and solid precipitation. If one wants to compute an indicator for either liquid or solid precipitation, one can call `xclim.indicators.atmos.{liquid|solid}_precip_{accumulation|average}`.
Thus, we don't need this liquid-solid differentiation in `xclim.indicators.atmos.precip_{accumulation|average}` since we can find it in both `long_name` and `description` of `xclim.indicators.atmos.{liquid|solid}_precip_{accumulation|average}`.
Is that right?
Yes, precisely! Indicators make it simpler for users to return a specific variable with a different default call signature from the indice, which includes all CF-checks and without needing to change any settings.
There are a lot of layers to consider, but the way I envision it is:
- `indices.generic`: Building blocks for creating climate indexes (exposed but no checks for units, nothing specifically tied to a variable/unit).
- `indices.*`: Climate indexes, includes checks for proper units on both input and output datasets, lots of user control and granularity in settings (operators, thresholds, methods, etc.)
- `indicators.*`: Most user-friendly way of calling `indices.*` with all checks integrated. This is where very specific indexes can be defined with fixed default values and lots of finely-tuned metadata.
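A toy version of these three layers, using plain Python lists instead of xarray objects, might look like the following. All function names and the unit check are hypothetical stand-ins and do not reproduce xclim's actual classes or signatures.

```python
from statistics import mean

REDUCERS = {"max": max, "min": min, "mean": mean}


def generic_statistics(values, reducer):
    """Layer 1 (generic): a building block with no unit or variable checks."""
    return REDUCERS[reducer](values)


def wind_statistic(values, units, reducer="max"):
    """Layer 2 (index): checks units, leaves the reducer up to the user."""
    if units != "m s-1":
        raise ValueError(f"Expected wind speed in m s-1, got {units!r}")
    return generic_statistics(values, reducer)


def sfc_windmax_max(values, units="m s-1"):
    """Layer 3 (indicator): fixed defaults plus finely tuned metadata."""
    return {
        "value": wind_statistic(values, units, reducer="max"),
        "units": units,
        "long_name": "Maximum of daily maximum wind speed",
    }


result = sfc_windmax_max([3.0, 7.5, 5.2])
print(result["long_name"], "=", result["value"], result["units"])
```

In this arrangement the specific indicator adds only defaults and metadata, which mirrors the idea of building `sfcWindmax_max` from `indices.generic.statistics` with `reducer="max"`.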
Co-authored-by: Éric Dupuis <71575674+coxipi@users.noreply.github.com>
Co-authored-by: Trevor James Smith <10819524+Zeitsperre@users.noreply.github.com>
@Zeitsperre: Thanks for your explanation. I'll try this PR (Ouranosinc/xclim-testdata#25). If this is done are there any further parts we should adjust in
Feel free to ignore the failures from The reason it is failing is that this PR is coming from a forked branch (nothing can really be done about that).
No, I think we're in the home stretch! I'm waiting on a review of #1388 to fix the final CI-related failures, and I think that this PR will follow right afterwards. Thanks so much once again for all your time and effort!
Thanks for the nice work @ludwiglierhammer @Zeitsperre
Hi guys,
this is my first PR regarding issue #1352.
It contains the implementation of the following indices:
- `snowfall_frequency`
- `snowfall_intensity`
- `mean_daily_windspeed` named `sfcWind_mean` according to `tg_mean`
- `maximum_daily_maximum_wind_speed` named `sfcWindmax_max` according to `tx_max`
- `mean_precipitation` named `precip_average` according to `precip_accumulation`
- `solid_precip_average` and `liquid_precip_average`
I also implemented a conversion from a snowfall flux (`prsn`) in `kg/m**2*s` to a snowfall flux in `mm/day`. I use this conversion to calculate `snowfall_frequency` and `snowfall_intensity`. In addition, you can find this conversion in `first_snowfall`, `last_snowfall` and `days_with_snow`. This allows thresholds to be given in both `kg/m**2*s` and `mm/day`.
Unfortunately, I can't test the new code snippets in my local repository. I get this error message:
Can you help me with this issue?
I'll add some more tests and adjust `CHANGES.rst`.
Cheers, Ludwig