Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple indicators #1358

Merged
merged 125 commits into from Jun 9, 2023
Merged

Conversation

ludwiglierhammer
Copy link

Hi guys,
this is my first PR regarding issue #1352.
It contains the implementation of the following indices:

  • snowfall_frequency
  • snowfall_intensity
  • mean_daily_windspeed named sfcWind_mean according to tg_mean
  • maximum_daily_maximum_wind_speed named sfcWindmax_max according to tx_max
  • mean_precipitation named precip_average according to precip_accumulation
  • solid_precip_average and liquid_precip_average

I also implemented a conversion from a snowfall flux (prsn) in kg/m**2*s to a snowfall flux in mm/day. I use this conversion to calculate snowfall_frequency and snowfall_intensity. In additon you you can find this conversion in first_snowfall, last_snowfall and days_with_snow. This allows to give thresholds in both kg/m**2*s and mm/day.

Unfortunately, I can't test the new code snippets in my local repository. I get this error message:

fixture 'xdoctest_namespace' not found

Can you help me with this issue?

I'll add some more tests and adjust CHANGES.rst

Cheers, Ludwig

docs/references.bib Outdated Show resolved Hide resolved
@coxipi
Copy link
Contributor

coxipi commented May 2, 2023

Hi Ludwig, thanks for the nice work.

I'm not sure what can cause the fixture problem, I cloned your PR and don't see a problem.

There has been some reorganization, the tests in xclim are not where they used to be. Maybe that's the root of the problem? To test a new feature, I activate xclim with conda activate xclim then update with

conda env update xclim && pip install ".[dev]"

Maybe this way of proceeding would solve your fixtures problem?

xclim/indices/_threshold.py Outdated Show resolved Hide resolved
xclim/indices/_conversion.py Outdated Show resolved Hide resolved
Comment on lines 1374 to 1376
if unit_thresh._units == unit._units:
prsn = prsn_to_mm_per_day(prsn, snr=snr, const=const)
thresh = convert_units_to(thresh, prsn, context="hydro")
Copy link
Contributor

@coxipi coxipi May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar problem here with context="hydro" and having prsn with units mm day-1.

If we have prsn [mm day-1] and thresh [kg m-2 day-1], then thresh = convert_units_to(thresh, prsn, context="hydro") will do a conversion that is not consistent with the choice of const=312 kg m-3. That is, prsn_to_mm_per_day uses const as a conversion factor, whereas convert_units_to(... context="hydro") automatically uses the conversion factor 1000 kg m-3.

On the other hand, having both prsn [mm day-1] and thresh [mm day-1] means that we would use prsn [mm day-1] as an input in prsn_to_mm_per_day, so this is a concrete occurence of the problem outlined in my comment above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could

  1. Restrict the use of the indicator first_snowfall to prsn with [mass]/[area]/[time] units. (also, in this case, `context="hydro" would not be necessary).
  2. Allow prsn with [length]/[time] units. We should then figure out what we do when we have prsn [mm day-1] and thresh [kg m-2 day-1]. We could convert prsn with a similar function prsn_to_kg_per_m2_per_day (with some better name)

Maybe taking thresh with units [kg m-2 day-1], so maybe it just complicates things for nothing. Or maybe it's really unusual to have prsn with [mm day-1] to start with, and sticking with option 1) would make sense.

Anyways, I think we need to restrict use cases (1) or make sure every case is covered correctly (2).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented some if statements to convert thresh, low, high and prsn if one of them has units [length] / [time]. What do you think about this approach? Maybe we can add a function str_to_DataArray to xclim.core.units to trim the code.

xclim/indices/_threshold.py Outdated Show resolved Hide resolved
@ludwiglierhammer
Copy link
Author

Hi Ludwig, thanks for the nice work.

I'm not sure what can cause the fixture problem, I cloned your PR and don't see a problem.

There has been some reorganization, the tests in xclim are not where they used to be. Maybe that's the root of the problem? To test a new feature, I activate xclim with conda activate xclim then update with

conda env update xclim && pip install ".[dev]"

Maybe this way of proceeding would solve your fixtures problem?

Many thanks!

pip install ".[dev]"

solved my problem.

@coxipi
Copy link
Contributor

coxipi commented May 3, 2023

In #1359, we discussed the maximum_consecutive_*_days is just a special case of *_spell_max_length.

The same can be said of sfcWindmax_max and sfcWind_mean (although the generalizing function is a bit more abstract here). The function xclim.indices.generic.statistics covers those cases. For instance, the indicator of sfcWindmax_max could be written as:

sfcWindmax_max = Wind(
...
    # compute=indices.sfcWindmax_max,
    compute=indices.generic.statistics,
    parameters={"reducer": "max"},
)

Anyways, it's nothing specific to these implementations, there are many indices in XClim such as tx_max that could also be replaced by a suitable use of indices.generic.statistics. It should be addressed in the future.

@Zeitsperre , what is your take on this (and the similar discussion in #1359): Should we indeed reduce the amount of needed indices to a minimum like this or keep more specific indices? Also, we could have a generic function that cover indices like XYZ_days, but maybe too much abstraction is not what we aim, I'm not sure.

@github-actions github-actions bot added docs Improvements to documenation indicators Climate indices and indicators labels May 3, 2023
Comment on lines 1325 to 1332
def first_snowfall(
prsn: xarray.DataArray,
thresh: Quantified = "0.5 mm/day",
snr: xarray.DataArray | None = None,
const: Quantified = "312 kg m-3",
freq: str = "AS-JUL",
) -> xarray.DataArray:
r"""First day with solid precipitation above a threshold.
Copy link
Contributor

@coxipi coxipi May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zeitsperre Could you give your insight on this. I realize that this index (and the next one) differs from how pr indices usually work, and I want to know if we want to allow this extra layer of complexity.

Indices with pr usually simply keep the units of pr and simply adapt the threshold if needed. This is OK since the conversion is always uniform, same density 1000 kg m-3 everywhere regardless. The case of prsn with possibly space-dependent density (prsn) is more complex.

Should we:

  1. Allow the conversion of prsn inside the index as things are now?
  2. Only let the index convert thresh with a constant conversion factor if needed as in pr indices. In this case, one would need to convert prsn if [length]/[time] units are desired.

Maybe the confusion also comes from the fact that [precipitation] can mean both [mass]/[area]/[time] and [length]/[time]. In case of actual pr, this is not so ambiguous, since the conversion factor is uniform. But for prsn, things are more complex. We were discussing how prsn should be reserved for [mass]/[area]/[time], and there should be another variable name when units are [length]/[time]. In this case, is the [precipitation] unit type too permissive to work with prsn (and its [length]/[time] equivalent)?

(Sorry for the back and forth on this Ludwig. Regardless of the final form of the index, the modifications you made have already been very useful, I used the ideas in the generic flux_and_rate_converter)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how quickly this case is becoming complicated; I think we need to agree on some approaches for handling precipitation. The CF Conventions are pretty specific on what counts as Precipitation Flux (pr; [mass]/[area]/[time]) and Precipitation Amount (no official acronym; [mass]/[area]).

I don't like the idea of hiding conversions within an indice, especially if this conversion step might be used in multiple indices. We don't currently log these conversion steps into the metadata history either, making matters worse. I do think we should be able to support amounts and fluxes within an indicator/index, but how we handle the conversion needs to be clearly communicated (via warnings?) to the user.

The current behaviour to only use a rule-of-thumb liquid water equivalent conversion only if snr is not provided feels like a good approach, so I wouldn't change that.

The more I think about it, the more I agree that we need to define two variables for snow amount and flux. Or at the very least have some code to handle the case of prsn as an amount.

@Zeitsperre
Copy link
Collaborator

@Zeitsperre , what is your take on this (and the similar discussion in #1359): Should we indeed reduce the amount of needed indices to a minimum like this or keep more specific indices? Also, we could have a generic function that cover indices like XYZ_days, but maybe too much abstraction is not what we aim, I'm not sure.

There are definitely a few key indices for temperature that we opted to not remove when we were doing our last major refactor/simplification, simply because they're always needed (e.g. t{n|g|x}_{min|mean|max}). One reason to keep these around is that indicators can intelligently select the proper variable from a dataset if needed. @aulemahal showed a small application of this off yesterday.

Striking a balance between what is popular and what reduces redundancy is hard, but we should opt for more succinct indices that can be built from (generics) and more offerings of customized indicators.

@coxipi
Copy link
Contributor

coxipi commented May 4, 2023

surface snow vs. snow_precips equivalences

@ludwiglierhammer I would say: snw is to snd what prsn is to pr_solid, do you agree?

xclim name snw (kg m-2) swe (m) snd (m)
cfname surface_snow_amount lwe_thickness_of_surface_snow_amount surface_snow_thickness
xclim name prsn (kg m-2 s-1) lwe_prsn (tentative) (m s-1) pr_solid (tentative) (m s-1)
cfname snowfall_flux lwe_snowfall_rate ???

Do you know if pr_solid has cf_name?

snowfall instead of surface snow

Maybe things are clearer if we remove the "surface" distinction for the snow variables? A snowfall amount here is just some "snow_precipitation rate" integrated over time, isn't it? I find other cf_names in this case:

Conversion with Water density Conversion with Snow density
snow vars lwe_thickness_of_snowfall_amount thickness_of_snowfall_amount
snow_precip vars lwe_snowfall_rate snowfall_rate (tentative, didn't find this in CF lists)

snowfall_rate is not defined in CF_lists, but that is the name I would deduce from other entries in the table above.

I would say that the time integral of pr_solid should indeed yield thickness_of_snowfall_amount defined as:

thickness_of_snowfall_amount: "Amount" means mass per unit area. The construction thickness_of_[X_]snowfall_amount means the accumulated "depth" of snow which fell i.e. the thickness of the layer of snow at its own density. There are corresponding standard names for liquid water equivalent (lwe) thickness.

xclim name

Looking for another name than pr_solid: Would prsnd be an acceptable variable name @Zeitsperre ? My logic is that prsn is the precipitation directly related to snw (the time integration of prsn yield the increase in snw). We could have namedprsn as prsnw instead? Logically, the precipitation analog of snd would be prsnd?

Additional thoughts

The index corresponding to first_snowfall in cbcl_climate_2020 seems to use thickness_of_snowfall_amount (m) rather than snowfall_rate (m s-1), no? I'm saying this because the threshold reported is in (cm)

The timing of first snowfall (defined as the first date in the fall when snowfall ≥1 cm)

But they do define density of falling snow (100 kg m-3) which differs from their density of snow on ground (250 kg m-3), so there might be something I'm not understanding. You probably need "density of falling snow" to obtain snowfall_rate and "density of snow on ground" to obtain thickness_of_snowfall_amount, correct?

@aulemahal
Copy link
Collaborator

If I may : snowfall_rate doesn't exist I think. A [mass]/([area]*[time]) is a flux in CF's vocabulary, thus prsn would be snowfall_flux.

Indeed, I think a snowfall_rate would be [length]/[time], the pr_solid that doesn't exist, your thickness_of_snowfall_rate (I think here "thickness of rate" is the part that doesn't make sense in CF).

@coxipi
Copy link
Contributor

coxipi commented May 4, 2023

If I may : snowfall_rate doesn't exist I think. A [mass]/([area]*[time]) is a flux in CF's vocabulary, thus prsn would be snowfall_flux.

Indeed, I think a snowfall_rate would be [length]/[time], the pr_solid that doesn't exist, your thickness_of_snowfall_rate (I think here "thickness of rate" is the part that doesn't make sense in CF).

Thanks, indeed, snowfall_flux is the right expression with units (kg m-2 s-1).

I think that thickness_of_snowfall_rate should have simply been snowfall_rate if I follow the logic of other entries in the second table.

(those two aspects are corrected above for clarity)

Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're about done. A few bugs need to be addressed before merging, though.

tests/test_indices.py Outdated Show resolved Hide resolved
Comment on lines 2495 to 2498
mmday2ms = 86400000
prsnd = prsnd_series(
(30 - abs(np.arange(366) - 180)) / mmday2ms, start="2000-01-01"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it defeats the point of the tests, but wouldn't we want to use pint here instead ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I cannot follow your question. Can you give an example how you would do the test? We test whether we get the same results for both prsn and prsnd input.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, what I was wondering was if we should use the existing units-conversion tools to perform these calculations to set up our time series, i.e. create an array of values in mm/day then use xclim.core.units.convert_units_to to convert these values to m/s.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense to me. I adjusted the tests with xclim.core.units.convert_units_to and added some more tests for the snowfall indices:

  • test with prsnd [mm day-1]
  • test with prsnd [m s-1]
  • test with prsn [kg m-2 s-1]

All those test yield the same results.

In test_indices.py there is this K2C. We could delete it and call the temperature series with:

da = tas_series(a, units="C")

Then we were more independent of hard-coded conversion things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! I know in some cases we are testing to see whether the convert_units_to is working properly, but for examples testing indices specifically, we should migrate, absolutely. Not necessary in this PR!

tests/test_indices.py Outdated Show resolved Hide resolved
tests/test_indices.py Outdated Show resolved Hide resolved
@@ -251,12 +256,27 @@ class HrPrecip(Hourly):
description="{freq} total precipitation.",
abstract="Total accumulated precipitation. If the average daily temperature is given, the phase parameter can be "
"used to restrict the calculation to precipitation of only one phase (liquid or solid). Precipitation is "
"considered solid if the average daily temperature is below 0°C (and vice versa).",
"considered solid if the average daily temperature is below {thresh} (and vice versa).",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"considered solid if the average daily temperature is below {thresh} (and vice versa).",
"considered solid if the average daily temperature is below 0°C (and vice versa).",

The convention we landed on for these templates is to only have fillable fields in the description and long_name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added "a given threshold" instead of "0°C" (see other threshold indicators) and added {phase} and {thresh} to long_name and description. In indices/_multivariate.py, should we set default phase to "solid and liquid". This would yield to the default long_name "Total accumulated solid and liquid precipitation". I think we should mention this phase and threshold thing in the metadata (long_name, description) too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely set the defaults at the indicator or use the defaults that are inherited from indices. Generally, the description and long_name should have information within it that more specifically details the way that this indicator was called (containing fillable fields, like {thresh}), while the abstract would be what the user expects to see when reading the docstring for the function (before having called it).

I think we should explain what is returned when calling it without having modified the call signature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the knot in my brain.

The indicators xclim.indicators.atmos.precip_{accumulation|average} are explicitly defined as indicators respecting both liquid and solid precipitation. If one will compute an indicator for either liquid or solid precipitation one can call xclim.indicators.atmos.{liquid|solid}_precip_{accumulation|average}.

Thus, we don't need this liquid-solid differentiation in xclim.indicators.atmos.precip_{accumulation|average} since we can find it in both long_name and description of xclim.indicators.atmos.{liquid|solid}_precip_{accumulation|average}.

Isn't it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, precisely! Indicators make it simpler for users to return a specific variable with a different default call signature from the indice, which includes all CF-checks and without needing to change any settings.

There are a lot of layers to consider, but the way I envision it is:

  • indices.generic: Building blocks for creating climate indexes (exposed but no checks for units, nothing specifically tied to a variable/unit).
  • indices.*: Climate indexes, includes checks for proper units on both input and output datasets, lots of user control and granularity in settings (operators, thresholds, methods, etc.)
  • indicators.*: Most user-friendly way of calling indices.* with all checks integrated. This is where very specific indexes can be defined with fixed default values and lots of finely-tuned metadata.

xclim/indicators/atmos/_precip.py Outdated Show resolved Hide resolved
xclim/indicators/atmos/_precip.py Outdated Show resolved Hide resolved
xclim/indicators/atmos/_precip.py Outdated Show resolved Hide resolved
xclim/indices/_conversion.py Outdated Show resolved Hide resolved
xclim/indices/_conversion.py Show resolved Hide resolved
Zeitsperre and others added 4 commits June 7, 2023 16:51
Co-authored-by: Éric Dupuis <71575674+coxipi@users.noreply.github.com>
Co-authored-by: Trevor James Smith <10819524+Zeitsperre@users.noreply.github.com>
@ludwiglierhammer
Copy link
Author

ludwiglierhammer commented Jun 8, 2023

@ludwiglierhammer @coxipi This PR is looking very good!

I can explain why some of the doctests are failing; The examples are running pieces of code against actual objects, so if you refer to path_to_sfcWind_file and path_to_sfcWindmax_file, they need to be defined within the doctest setup here:

def add_example_file_paths(cache_dir: Path) -> dict[str]:

The file ERA5/daily_surface_cancities_1990-1993.nc contains a few variables that could be used: sfcWind and wsgsmax (standard name: "wind_speed_of_gust").

If you'd like to rename wsgsmax to sfcWindmax (not necessary), this can be done by modifying the file and opening a pull request here: https://github.com/Ouranosinc/xclim-testdata

@Zeitsperre: Thanks for your explanation. I'll try this PR (Ouranosinc/xclim-testdata#25). If this is done are there any further parts we should adjust in xclim?

@github-actions github-actions bot added the CI Automation and Contiunous Integration label Jun 8, 2023
@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Jun 8, 2023

Feel free to ignore the failures from Verify Testing Data; This is simply a workflow to ensure that the modified testdata tag in our CI matches the latest tag (it does).

The reason it is failing is that this PR is coming from a forked branch (nothing can really be done about that).

@Zeitsperre
Copy link
Collaborator

@Zeitsperre: Thanks for your explanation. I'll try this PR (Ouranosinc/xclim-testdata#25). If this is done are there any further parts we should adjust in xclim?

No, I think we're in the home stretch! I'm waiting on a review of #1388 to fix the final CI-related failures, and I think that this PR will follow right afterwards. Thanks so much once again for all your time and effort!

@github-actions github-actions bot removed the CI Automation and Contiunous Integration label Jun 9, 2023
@coxipi coxipi merged commit 6d76ebf into Ouranosinc:master Jun 9, 2023
19 checks passed
@coxipi
Copy link
Contributor

coxipi commented Jun 9, 2023

Thanks for the nice work @ludwiglierhammer @Zeitsperre

coxipi added a commit to ludwiglierhammer/xclim that referenced this pull request Jun 13, 2023
@ludwiglierhammer ludwiglierhammer deleted the simple_indicators branch June 14, 2023 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests docs Improvements to documenation indicators Climate indices and indicators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants