Common agreement on loading CF non-compliant NetCDF files #5165

Open
2 tasks
trexfeathers opened this issue Feb 20, 2023 · 1 comment
Labels
Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info Feature: NetCDF + CF-conventions

trexfeathers commented Feb 20, 2023

Iris needs a public statement on how it handles NetCDF files that deviate from the CF conventions. This will bring several benefits:

  • More certainty when discussing if/how Iris should load a particular file.
  • Clearer direction when developing the codebase.
  • Set user expectations.

Writing this statement will involve making some difficult decisions. A working group is tackling this now: @tkknight, @bjlittle, @lbdreyer, @pp-mo, @trexfeathers, @stephenworsley, @ESadek-MO, @scottrobinson02, @HGWright

Factors at play

  • More CF compliance means smoother collaboration between institutions, and Iris can play a part in raising awareness.
  • CF evolves over time, so may develop 'opinions' on things that previously didn't matter and invalidate older files.
  • The available tooling can make it difficult to address non-compliances in a file.
  • UX - being strict/verbose about CF compliance makes the user experience more awkward.
  • Iris has a place in the scientific Python community: people choose Iris / Xarray / raw netCDF4 / something else for different purposes, and CF handling plays a part in that.
  • Continuing to work in the face of CF non-compliances could require more defensive code.

Items affected

(please edit if you know of others)

Tasks

  1. Dragon Sub-Task 🦎 Experience: Medium Feature: NetCDF + CF-conventions Status: Needs Info Type: Enhancement Type: Question
  2. Feature: ESMValTool Feature: NetCDF + CF-conventions Status: Blocked Type: Bug
@trexfeathers trexfeathers self-assigned this Feb 20, 2023
@trexfeathers trexfeathers added the Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info label Jul 10, 2023
@trexfeathers trexfeathers removed their assignment Jul 19, 2023
trexfeathers commented

Summary from working group conversations

2023-02-02, 2023-02-14, 2023-03-22

Note that this issue is not intended as a debate, which is why it is not posted as a discussion. The conversations below took place in real time, with a group deliberately sized to aid decision making.

Outcome - our ideal implementation

When loading NetCDF files, Iris will load all CF-compliant elements. A container of non-compliant variables and attributes will be attached to the Cube(s).
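
As a rough illustration of that outcome, the sketch below partitions a file's variables into a loaded set and a side container of rejects. Everything in it is hypothetical: `FileVariable`, `LoadResult`, `check_variable` and `partition_variables` are not Iris classes or functions, and the single rule checked (a `units` attribute, when present, must be a string) only stands in for real CF checks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileVariable:
    """Hypothetical stand-in for a variable read from a NetCDF file."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class LoadResult:
    """Hypothetical result: what was loaded, plus a container of what was not."""
    compliant: list = field(default_factory=list)
    non_compliant: dict = field(default_factory=dict)  # variable name -> reason


def check_variable(var: FileVariable) -> Optional[str]:
    """Return a reason string if the variable fails the toy check, else None."""
    units = var.attributes.get("units")
    if units is not None and not isinstance(units, str):
        return f"'units' should be a string, got {type(units).__name__}"
    return None


def partition_variables(variables) -> LoadResult:
    """Keep acceptable variables; set the rest aside instead of failing."""
    result = LoadResult()
    for var in variables:
        reason = check_variable(var)
        if reason is None:
            result.compliant.append(var)
        else:
            result.non_compliant[var.name] = reason
    return result
```

Recording a reason alongside each rejected variable is one way to keep the container useful for later fixing, rather than silently discarding information.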

Encourage users:

If this causes you problems, please reach out to us to see if we can collaborate on a solution.

Implementation considerations

  • How to contain things that can't be represented properly?
  • Associate things with Cubes or isolated in own list?
  • Activate behaviour with a FUTURE flag?
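
The last question, opting in through a flag, could look roughly like the sketch below. This is a self-contained toy, not the real `iris.FUTURE` object: the `Future` class, the `skip_non_cf` flag name and the `load_mode` helper are all assumptions made for illustration.

```python
from contextlib import contextmanager


class Future:
    """Hypothetical opt-in flags; not the real iris.FUTURE object."""

    def __init__(self):
        # Hypothetical flag name: off by default so existing behaviour is kept.
        self.skip_non_cf = False

    @contextmanager
    def context(self, **flags):
        """Temporarily set flags, restoring the previous values on exit."""
        # getattr raises AttributeError for unknown flag names.
        previous = {name: getattr(self, name) for name in flags}
        try:
            for name, value in flags.items():
                setattr(self, name, value)
            yield self
        finally:
            for name, value in previous.items():
                setattr(self, name, value)


FUTURE = Future()


def load_mode():
    """Report which (hypothetical) loading behaviour is active."""
    return "skip-and-contain" if FUTURE.skip_non_cf else "legacy"
```

With this shape, `with FUTURE.context(skip_non_cf=True): ...` would enable the new behaviour for one block of code only, leaving existing workflows untouched until users opt in.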

Working group summary comments

  • @trexfeathers: embrace imperfection, skipping non-compliances sounds good if warnings work.
  • @stephenworsley: CF compliance is a good aim, but can't always be expected.
  • @pp-mo: CF offers optional ways of doing things, Iris ought to do its best, but not insist. Discourage 'bad CF'.
  • @bjlittle: KISS. Make users' lives simple, don't be awkward.
  • @lbdreyer: we'll always break someone's workflow. Need a plan to help those who are left behind.
  • @scottrobinson02: spirit of compromise. Accept that going in.
  • @tkknight: KISS. Informative messages when things don't work.
  • @HGWright: if we can do something we should do something. Don't throw toys from pram. Make our actions clear.
  • @ESadek-MO: no easy solution, communicate well, focus on warnings.

Discussion topics

Encouraging compliance in the community

  • We know examples where Iris' strictness has resulted in more compliant - more interoperable - files.
  • CF is a convention, not a standard.
  • CF is the only available convention and is therefore used by anyone looking for help making files interoperable.
  • Iris' scope is wider than CF, and Iris doesn't implement all of CF.
    Need to avoid inventing our own rules.
  • CF's longevity is relevant.

Files changing from acceptable to unacceptable

  • While CF is intended to be backwards compatible, checks (within Iris, cf-checker, whatever) are not a complete implementation and may evolve over time, invalidating previously acceptable files.

Ease of massaging files to be compliant

  • Always going to be somewhat difficult.
  • If Iris can't cope with non-CF, then users are forced onto another tool.
    • Could edit the file directly using ncedit or NetCDF4, but this can be challenging, and editing a copy may be unrealistic.
    • All the rich tools (Iris, Xarray, cf-python) have their own opinions.
    • ncdata has the potential to make this much easier.
  • Should Iris include a non-CF layer, lower than a Cube, to help with fixing?

User experience (UX)

  • Should not be underestimated.
  • Undesirable to flatly refuse to load.
  • Need clarity on what Iris expects.
  • Need user education.
  • Warnings are an opportunity to encourage compliance and help, without 'being awkward'.
    • Really important not to ruin UX with even more warnings.
    • Classify warnings, giving users granular control over what they care about and what they ignore?
  • CF brings some inevitable complexity, some user effort required.
  • Compromises are necessary.
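
The idea of classifying warnings can be sketched with Python's standard `warnings` module, where filters match on warning subclasses. The category names (`CFLoadWarning`, `CFUnitsWarning`, `CFGridMappingWarning`) and the `fake_load` and `load_ignoring` helpers are hypothetical and only illustrate the mechanism; they are not taken from Iris.

```python
import warnings


class CFLoadWarning(UserWarning):
    """Hypothetical base category for all CF-related load warnings."""


class CFUnitsWarning(CFLoadWarning):
    """Hypothetical category: a problem with a 'units' attribute."""


class CFGridMappingWarning(CFLoadWarning):
    """Hypothetical category: a problem with a grid mapping."""


def fake_load():
    """Stand-in for a loader that meets two different problems."""
    warnings.warn("skipping variable 'a': invalid units", CFUnitsWarning)
    warnings.warn("skipping variable 'b': unknown grid mapping", CFGridMappingWarning)


def load_ignoring(category):
    """Run fake_load with one category silenced; return the categories still raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")            # show everything ...
        warnings.simplefilter("ignore", category)  # ... except this category
        fake_load()
    return [w.category for w in caught]
```

Because filters match subclasses, a user could silence one specific category, or pass the base class to silence every CF-related warning at once.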

Iris' place in the world

  • Interoperability allows using other, more tolerant tools.
  • Learning/adopting other tools is nevertheless not as good as getting everything from one place.
  • We should aim to avoid duplication within the geoscience community.

Ease of software development

  • Defensive code takes extra effort.
  • Iris could be written to work with things it doesn't explicitly understand.
  • API changes could make things easier:
    • Interchange between Cube and _DimensionalMetadata.
    • Easier construction of Cubes from scratch.
  • Might be easier to include user-level fixing tools in Iris, rather than making Iris cope better.

Preferred approaches

Determined via voting.

  1. Iris only loads the CF-compliant parts of a file, skipping non-compliant parts (maybe raising a warning?).
  2. Iris allows the user to configure how it will interpret a malformed file.
