Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Establish common data store conventions #330

Closed
3 tasks done
forman opened this issue Sep 1, 2020 · 9 comments · Fixed by #331
Closed
3 tasks done

Establish common data store conventions #330

forman opened this issue Sep 1, 2020 · 9 comments · Fixed by #331
Assignees
Labels
code Issue is related to code refactoring or optimisation in progress The assignee is working on it

Comments

@forman
Copy link
Member

forman commented Sep 1, 2020

Is your feature request related to a problem? Please describe.

The xcube data store framework should establish common conventions so we have a consistent behaviour across all store implementations. The current store implementations (SH, CCI, CDS) behave slightly different.

  1. Open parameters that constrain the dataset to be opened should all have the same names, have same or compatible type, and should be interpreted in the same way.
  2. Stores should share the same idea of what is a coordinate variable and what a data variable.

Describe the solution you'd like

1. Open parameters

For a constraining open parameter P we introduce the following convention:

  • If P is None means, parameter not given, hence no constraint applies, hence full containment.
  • If not P means, we exclude what would otherwise be fully included.
  • Else, the given constraint applies.

Common names are variable_names: List[str], bbox:Union[str,Tuple[float, float, float, float]], crs:str, spatial_res: float, time_range: Tuple[Optional[str], Optional[str]], time_period: str. A store may require all, or individual parameters, or none at all.

E.g. applied to variable_names, this means

  • variable_names is None - include all data variables
  • variable_names == [] - do not include data variables (schema only)
  • variable_names == ["<var_1>", "<var_2>", ...] only include data variables named "<var_1>", "<var_2>", ...

Describe this convention in docstrings.

EDIT

A store implementation should document its available open parameters

  1. programmatically in the DataStore.get_open_data_params_schema()method
  2. in the docstring of the implementing class. Each parameter description should include its name, Python type and description.

2. Coordinate vs Data Variables

  • The open parameter variable_names shall refer to data variables only.
  • A store implementation shall add any available coordinate variables at least for every dimension of the included data variables.

Describe this convention in docstrings.

Additional context

We'll create linked issues and PRs in the store code repos where we fix implementations wrt this issue.

EDIT

This issue resolves only with the following:

@forman forman added in progress The assignee is working on it code Issue is related to code refactoring or optimisation labels Sep 1, 2020
@forman forman self-assigned this Sep 1, 2020
@dzelge
Copy link
Contributor

dzelge commented Sep 1, 2020

Is it possible that a store requires open parameters that are not listed in this convention?

@forman
Copy link
Member Author

forman commented Sep 1, 2020

Is it possible that a store requires open parameters that are not listed in this convention?

Yes, of course. A store can have any open parameters. This convention is about a set of common parameters.

@pont-us
Copy link
Member

pont-us commented Sep 18, 2020

Regarding bbox:Union[str,Tuple[float, float, float, float]] and time_range: Tuple[Optional[str], Optional[str]]: I believe that currently all stores use Python lists rather than tuples here, and tuples won't validate against xcube's JsonArraySchema (however, I have a fix in https://github.com/dcs4cop/xcube/tree/pont-334-json-schema-date-limits to remedy this).

I agree that tuples are more suitable here, but we should be careful about not breaking anything with the change. Even if we merge my branch into xcube before updating the stores, we also need to confirm that the generator UI is OK with tuples.

@pont-us
Copy link
Member

pont-us commented Sep 18, 2020

Also -- in bbox:Union[str,Tuple[float, float, float, float]] , what is the envisaged format of str?

@forman
Copy link
Member Author

forman commented Sep 30, 2020

I believe that currently all stores use Python lists rather than tuples here

What do you mean by use? Examples?

with the change

Which change? You may confuse the JSON representation of things with the object instances they are deserialized into. A JSON array schema that uses the tuple-item form will usually deserialize a JSON-array into a Python tuple rather than a list.

To shorten discussion I'm also fine with Union[Tuple[float, float, float, float], Sequence[float]].

@forman
Copy link
Member Author

forman commented Sep 30, 2020

Also -- in bbox:Union[str,Tuple[float, float, float, float]] , what is the envisaged format of str?

WKT. But I agree, we should avoid it for time being.

@pont-us
Copy link
Member

pont-us commented Sep 30, 2020

What do you mean by use? Examples?

"use" in the sense of "only accept arrays as the bbox argument" (no longer true, but was when I wrote it). e.g. in xcube-sh: JsonArraySchema(items=(JsonNumberSchema(), JsonNumberSchema(), JsonNumberSchema(), JsonNumberSchema())). No longer a problem now that PR 336 is merged, but before that a tuple bbox argument would not have validated.

To shorten discussion I'm also fine with Union[Tuple[float, float, float, float], Sequence[float]].

I think that plain Tuple[float, float, float, float] is preferable now that tuples validate.

@forman
Copy link
Member Author

forman commented Sep 30, 2020

I think that plain Tuple[float, float, float, float] is preferable now that tuples validate.

No, that is not true. They only validate by JSON schema, that is, only when parameters come as JSON objects. Users can still call our methods without invoking JSON schema validation.

@forman
Copy link
Member Author

forman commented Sep 30, 2020

To be more pythonic, let's please use Union[Tuple[float, float, float, float], Sequence[float]].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Issue is related to code refactoring or optimisation in progress The assignee is working on it
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants