Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial and relative unit declarations #1414

Merged
merged 26 commits into from Aug 31, 2023
Merged

Partial and relative unit declarations #1414

merged 26 commits into from Aug 31, 2023

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented Jul 5, 2023

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • declare_units can accept relative declaration. For example thresh="<tas>" means that thresh should have the same dimensions as tas. More complex declarations are possible : thresh="<tas> / <pr>" or thresh="[length] / <tas> ".
  • It can also be only partially declared with partial=True. In that case, the passed dimensions are stored in the wrapper, but the check_units code is not injected.

Both additions are useful for "generic" indices. For example:

@declare_units(partial=True, threshold="<data>")
def spell_length(
    data: xr.DataArray, threshold: Quantified, reducer: str, freq: str, op: str
) 

Here, spell_length already "knows" that the threshold should have the same units (dimensions) as data, but we still haven't declare those. Different indicators can be created from the function, each one either explicitly re-wrapping with declare_units(data="...") or simply using Indicator's input argument to map data to a known variable.

Another change this brings is an additional dimensions field in variable.yml. I realized that Indicators using input to map generic inputs to precipitation variables were not "aware" that the "hydro" context could be used. This was because the mapping was passing the "canonical units" to declare_units. Units of "kg m-2 s-1" are not necessarily refering to liquid water... By passing a "[precipitation]" string, this allowed check_units and infer_context to properly detect that the variable could use the "hydro" context.

  • There is a small fix in gen_indicator_docstring for the case with no default. The bug was there before this PR.
  • The internals of Indicator.__new__ were changed a bit so that the "unit" info of the parameters is parsed at the end, after the potential final declare_units, when relative dimensions are resolved.

Does this PR introduce a breaking change?

I removed a check in declare_units, "Quantified" variables that have no declared units will raise errors even if they don't have a default. The previous check was there to allow "generic indices" to pass the test. But with the present changes, the idea would be to pass a relative dimension instead.

I'm guessing declare_units is not being used much outside of xclim, with even fewer cases being broken.

EDIT: Also removed the has_units arg of xclim.core.utils.infer_kind_from_parameter. The recent changes in declare_units and this PR remove the need for this check.

Other information:

This is a first step in my plan to reorganize all generic indices so that it becomes easier to construct Indicators directly from these. And eventually remove many similar indices.

The next step, I think, will be to sync xclim with clix-meta: implement the new "index functions" (as generic indices) and update cf.yml.

@github-actions github-actions bot added the indicators Climate indices and indicators label Jul 5, 2023
Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really neat idea. LGTM

xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
xclim/data/variables.yml Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added the approved Approved for additional tests label Jul 5, 2023
Copy link
Collaborator

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm wondering about is whether we should create another function, e.g. declare_relative_units, instead of adding another layer of logic inside declare_units.

The fact that in this implementation declare_units can return different wrappers based on partial makes it hard to understand.

Another issue is that this requires data's dimension to be ultimately defined. In my mind, we'd have a mechanism to check that thresh has the same units as data, no matter what the dimensions of data are. I suggest we start by discussing this, and then the questions above might resolve themselves.

tests/test_units.py Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Show resolved Hide resolved
xclim/core/units.py Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
@huard
Copy link
Collaborator

huard commented Jul 18, 2023

See PRs #1419 and #1420 for code experimenting with a few ideas.

@Zeitsperre Zeitsperre added this to the v0.45.0 milestone Jul 25, 2023
@aulemahal
Copy link
Collaborator Author

@huard This new version implements your suggestion if splitting the two decorators: declare_relative_units and declare_units. Also change of wording: relative instead of partial.

It's missing some doc (WIP), but the idea is that "generic" indices can be wrapped by declare_relative_units. Calls will check that things like "thresholds" have compatible units, but no checks would be performed on the "reference" variable (i.e. da). A further declare_units decoration can be applied to have a "non-generic" index. Also, creating an indicator with the generic indice and a input mapping will have the same effect.

You can't stack those decorator though (you could in my previous implementation). It's either a single declare_units or declare_units(declare_relative_units(func)).

A minor question I asked myself : what do we do if a declare_units wraps a declare_relative_units where the former assigns units to an arg already declared in the latter ?
Ex:

@declare_relative_units(thresh='<da>')
def func(da, thresh):
    pass

func2 = declare_units(da="[temperature]", thresh="[temperature]")(func)

There's a double declaration here for thresh. The current code simply overrides with the new spec.

@aulemahal aulemahal requested a review from huard July 31, 2023 14:10
Copy link
Collaborator

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
xclim/core/units.py Show resolved Hide resolved
xclim/core/units.py Outdated Show resolved Hide resolved
@coveralls
Copy link

Coverage Status

coverage: 90.667% (-0.01%) from 90.681% when pulling 3060ac9 on generic-indices into 0bcd891 on master.

@aulemahal aulemahal merged commit 0f71622 into master Aug 31, 2023
14 checks passed
@aulemahal aulemahal deleted the generic-indices branch August 31, 2023 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests indicators Climate indices and indicators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants