Clean-up of confusing meta data values #368

huard · 2020-02-14T15:28:12Z

Zeitsperre · 2020-08-05T14:48:46Z

@Ouranosinc/xclim-core We need to have a meeting about this specifically.

huard · 2020-09-01T12:45:38Z

@Zeitsperre Please propose a date, time and agenda.

Zeitsperre · 2020-09-01T20:45:28Z

@Ouranosinc/xclim-core
Proposed talking points (feel free to modify this message as needed):

What metadata do we presently affix into Indicators/indices?
How do we currently handle external metadata?
Which metadata standards and fields should we provide explicit support for?
- How should we handle exceptions to these standards/accepted fields?
What fields should we be adding?
- How can these fields support current project?

Zeitsperre · 2020-09-10T21:12:20Z

Some thoughts:

CF-standard global attributes should not be found in variable attributes. Their presence there is an error, or at least misleading. I would remove these if/when they are found. This complicates how we write our history. From the summary write-up of CF-1.9 (proposed):

2.6.1 Identification of Conventions
Requirements:
> The Conventions attribute must be a single text string containing a list of convention names separated by blank space or commas, one of which shall be the full CF string as described below.
> Files that conform to the CF version 1.8 conventions must indicate this by setting the global Conventions attribute to contain the CF string value "CF-1.8".

2.6.2 Description of File Contents
Requirements:
> The title, history, institution, source, references, and comment attributes are all type string.
Recommendations:
> The title and history attributes are only defined as global or groups attributes. If they are used as per variable attributes a CF compliant application should treat them exactly as it would treat any other unrecognized attribute.
We should be following the most recent (stable) CF-Conventions guidelines. It's up to the users to ensure they adjust their practices to adhere to more modern standards.
Terms that fall outside the standard should be carried over as much as possible, excepting fields that would provide confusing information (creation_date is not a useful attribute when the indicator is newly created; we should be updating this).
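To make the 2.6.1 requirement quoted above concrete, here is a minimal sketch of a check on a global Conventions string. The function names are hypothetical (not part of xclim), and the comma-versus-blank splitting rule is my reading of the quoted text, not a validated CF parser.

```python
def convention_names(conventions: str) -> list[str]:
    """Split a global Conventions string into individual convention names.

    Assumption: if the string contains commas, names are comma-separated
    (and may contain blanks); otherwise they are separated by blank space.
    """
    if "," in conventions:
        parts = conventions.split(",")
    else:
        parts = conventions.split()
    return [p.strip() for p in parts if p.strip()]


def declares_cf(conventions: str, version: str = "1.8") -> bool:
    """Return True if the Conventions string lists the given CF version."""
    return f"CF-{version}" in convention_names(conventions)
```

For example, `declares_cf("CF-1.8, ACDD-1.3")` is true, while `declares_cf("CF-1.7")` is false for the default version.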

Attributes handled within variables:

branch_time --> Carried
contact --> Carried
~~Conventions~~ --> Removed
creation_date --> Updated
experiment --> Carried
experiment_id --> Carried
forcing --> Carried
~~frequency~~ --> Removed
~~history~~ --> Becomes notes
initialization_method --> Carried
institute_id --> Carried
institution --> Carried
model_id --> Carried
modeling_realm --> Carried
parent_experiment_id --> Carried
parent_experiment_rip --> Carried
physics_version --> Carried
product --> Carried
project_id --> Carried
realization --> Carried
source --> Carried
table_id --> Carried
~~title~~ --> Removed
tracking_id --> Carried

Given that Indicators don't touch global attributes (AFAIK), should the onus be on users to ensure that they write out the proper scaffolding (history and Conventions) when processing their files? The one case I can see where this might present problems is when it comes to using the CLI (global attributes are not carried over to the output file). Is addressing this opening a can of worms?

Feel free to chime in with ideas, opinions or potential "gotchas". I'm starting a PR to address some of these problems next week.

huard · 2020-09-11T14:46:45Z

Shouldn't frequency reflect the output's frequency?
Not a fan of "notes" for history, but I can live with it if there is a rationale for it. We already use notes for the math formula in the global attributes.

I note that history adds a <No available history>\n if the input file has no history. I don't think this is useful.

We only get the global attributes if we call the json method, but there is stuff in there that would not go in global attributes. I think we should come up with a clean way to convert a computation into a dataset that includes global attributes, as it would clarify some of the issues raised here.

aulemahal · 2020-09-11T16:07:16Z

To help the discussion, here are all indicator attributes and their translation.

Carried to variables (unique to each output):

standard_name
long_name (formatted)
units
cell_methods (merged with those from the inputs)
description (formatted)
comment
history (generated and merged with those from the inputs)

Not carried to variables (available through Ind.json, unique to the indicator):

title
abstract (no corresponding CF attr)
keywords (no corresponding CF attr)
references
notes (could be used as the global comment CF attr)

I would prefer keeping history in the variable attributes, unless we make the Indicator produce datasets instead of DataArrays. It could be useful to have a method that transforms an Indicator output to a dataset, that was the idea behind the dropped "dataset_output" option we tried when implementing the multi-output Indicators. Without this option, the process has to be in two parts with the user referencing the indicator twice, ex:

out = xc.atmos.Indicator(in1, in2, *params)  # Computes data and adds variable attributes
ds = xc.out_to_ds(out, xc.atmos.Indicator)  # Moves "history" and adds global attributes

About global attributes: I'm not sure the generic Indicator "title" really is fit as a global attribute of a computed dataset. Isn't it generally too general about the used parameters while being too specific about the indicator? Also, notes (-> comment) is usually long and has a docstring layout that might not be fit for a netCDF attribute? (rst and tex markup, multiple lines) That leaves, "references" (often absent) + moving history, does that merit a new function?

Zeitsperre · 2020-09-11T17:00:49Z

> I would prefer keeping history in the variable attributes, unless we make the Indicator produce datasets instead of DataArrays.

As it stands, CF dictates that having history in a variable's attributes is non-standard. This is true even for reanalysis datasets that don't typically follow CF. If we had the option to produce Datasets as well as DataArrays, we

> It could be useful to have a method that transforms an Indicator output to a dataset, that was the idea behind the dropped "dataset_output" option we tried when implementing the multi-output Indicators. Without this option, the process has to be in two parts with the user referencing the indicator twice, ex:
> out = xc.atmos.Indicator(in1, in2, *params)  # Computes data and adds variable attributes
> ds = xc.out_to_ds(out, xc.atmos.Indicator)  # Moves "history" and adds global attributes

I like where this proposal is going, but I can see the problem. It would be interesting to be able to send a Dataset to Indicators (e.g. xc.atmos.corn_heat_units(*, ds: Optional[xr.Dataset], da: Optional[xr.DataArray], tas="tas"... etc.)). It breaks all function conventions we currently use, but we are currently breaking some CF conventions, so... worth considering for a serious breaking version (v1.0?).

Maybe we can extend xarray's to_netcdf methods to look for the presence of an xclim Indicator?

> About global attributes: I'm not sure the generic Indicator "title" really is fit as a global attribute of a computed dataset. Isn't it generally too general about the used parameters while being too specific about the indicator? Also, notes (-> comment) is usually long and has a docstring layout that might not be fit for a netCDF attribute? (rst and tex markup, multiple lines) That leaves, "references" (often absent) + moving history, does that merit a new function?

Titles are usually descriptive of the source data, e.g.

// global attributes:
		:title = "IPSL  model output prepared for IPCC Fourth Assessment SRES A2 experiment" ;

I think the history information is perfectly fine, but we need to prefix it to the global attributes' history, which can be quite long (for some heavily corrected data, anyway). I do think we need to be able to touch/modify global attributes. If we have that capability, everything becomes a lot easier to standardize.

huard · 2020-09-11T18:15:29Z

Note that some of the fields are intended to feed into WPS process descriptions:

Title | Title of the process, input, and output. Normally available for display to a human. | ows:Title | One (mandatory)
Abstract | Brief narrative description of a process, input, and output. Normally available for display to a human. | ows:Abstract | Zero or one (optional) Include when available and useful.
Keywords | Keywords that characterize a process, its inputs, and outputs. | ows:Keywords | Zero or more (optional) Include when available and useful.
Identifier | Unambiguous identifier of a process, input, and output. | ows:Identifier Value is a URI or HTTP-URI | One (mandatory)
Metadata | Reference to additional metadata about this item. | ows:Metadata Allowed values are specified in Table 5. | Zero or more (optional)

huard · 2020-09-11T18:16:59Z

We could simply have Indicator.to_dataset(da, thresh=...), which could run __call__ internally and then create the dataset.
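A toy version of that idea, to show the shape of it: `to_dataset` runs `__call__`, then moves history out of the variable attributes and into the global ones. `ToyIndicator` is a hypothetical stand-in built on plain dicts, not xclim's Indicator class, and which indicator attributes become global ones (here only references) is an assumption.

```python
class ToyIndicator:
    """Stand-in for an indicator; counts values above a threshold."""

    identifier = "toy_count"
    references = "hypothetical reference"

    def __call__(self, values, thresh=0):
        # Compute the result and attach variable-level attributes.
        data = sum(v > thresh for v in values)
        attrs = {"units": "1", "history": f"{self.identifier}(thresh={thresh})"}
        return {"data": data, "attrs": attrs}

    def to_dataset(self, values, **params):
        # Run the computation, then promote history to the global attributes.
        out = self(values, **params)
        var_attrs = dict(out["attrs"])
        history = var_attrs.pop("history")
        return {
            "attrs": {"history": history, "references": self.references},
            "variables": {
                self.identifier: {"data": out["data"], "attrs": var_attrs}
            },
        }
```

With this layout the user references the indicator once, e.g. `ToyIndicator().to_dataset([1, 5, 10], thresh=4)`.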

aulemahal · 2020-09-18T15:23:39Z

As #559 is merged, moving the rest of this to a later milestone.

Zeitsperre self-assigned this Feb 14, 2020

Zeitsperre added the enhancement (New feature or request) and standards / conventions (Suggestions on ways forward) labels Feb 20, 2020

Zeitsperre modified the milestones: v0.15, v0.16 Feb 20, 2020

nilshempelmann mentioned this issue Mar 11, 2020

output filename conform to DRS filename convention bird-house/finch#69

Open

Zeitsperre modified the milestones: v0.16, v0.17 Apr 23, 2020

huard modified the milestones: v0.17, v0.18, v0.19 May 12, 2020

Zeitsperre modified the milestones: v0.19, v0.20 Aug 5, 2020

aulemahal mentioned this issue Sep 18, 2020

Rename history to xclim_history #559

Merged

6 tasks

aulemahal modified the milestones: v0.20, v0.22 Sep 18, 2020

huard removed this from the v0.22 milestone Oct 5, 2020

Zeitsperre added this to the v1.0 milestone Jan 4, 2021

huard closed this as completed Jan 10, 2022

huard commented Feb 14, 2020

Description

CMIP5 attributes
