Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FB] [PI-3478] Common metadata API #3583

Merged
merged 10 commits into from
Dec 19, 2019

Conversation

bjlittle
Copy link
Member

@bjlittle bjlittle commented Dec 5, 2019

This PR introduces a common metadata API for cubes, coordinates and auxiliary factories, and also addresses a lot of connective tissue technical debt. This PR underpins the forthcoming changes for cube arithmetic.

Essentially, this PR proposes a metadata interface, which provides a unified, consistent treatment of metadata within iris:

  • the _as_defn interface has been replaced with the metadata interface
  • metadata state now lives with the metadata behaviour within a MetadataManager class:
    • a factory function creates MetadataManager classes at runtime, see iris.common.metadata.MetadataManagerFactory
    • the factory function requires one of the following namedtuple metadata subclasses, which specifies the behaviour and state associated with the runtime MetadataManager:
      • iris.common.metadata.AncillaryVariableMetadata
      • iris.common.metadata.BaseMetadata
      • iris.common.metadata.CellMeasureMetadata
      • iris.common.metadata.CoordMetadata
  • the metadata interface has been added to:
    • iris.aux_factory.AuxCoordFactory
    • iris.coords._DimensionalMetadata
    • iris.coords.AncillaryVariable
    • iris.coords.CellMeasure
    • iris.coords.Coord
    • iris.cube.Cube
  • the CFVariableMixin methods have been rationalised
    • common member setter and getters now use the metadata interface
    • added a new metadata member setter and getter
      • metadata may be set either with:
        • an iterable
        • a dictionary
        • a relevant metadata namedtuple subclass, see iris.common.metadata
  • there is a new iris.common module, for common, non-core iris behaviour:
    • the _cube_coord_common has been rationalised here, see iris.common.mixin
    • the metadata has been rationalised here, see iris.common.metadata
  • the mixin name method behaviour is now appropriate and specific to its container i.e., the STASH attribute used by name is now only applicable to an iris.cube.Cube
  • the recently introduced mixin names property behaviour is now private and specific to an iris.cube.Cube
  • added new unit test coverage for LimitedAttributeDict, which has always been missing
  • For iris.aux_factory.AuxCoordFactory:
    • the units are no longer a read-only property, but part of the containers metadata
    • climatological is now a read-only property of False

Still to do:

  • Test coverage for new iris.common.metadata.MetadataFactory
  • Test coverage for new iris.common.metadata._BaseMeta
  • Test coverage for existing iris.common.mixin.LimitedAttributeDict

Meta-class to support the convenience of creating a namedtuple from
names/members of the metadata class hierarchy.

"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"attributes",
)

__slots__ = ()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return (standard_name, long_name, var_name, stash_name)


def MetadataFactory(cls, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, this is a function, not a class...

match = self.values == other.values
return match

def __getstate__(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match = not match
return match

def __reduce__(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)
return "{}({})".format(self.__class__.__name__, args)

def __setstate__(self, state):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace["_names"] = cls._names

# Dynamically create the class.
Metadata = type(name, bases, namespace)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abooton abooton changed the title Common metadata API PI-3478: Common metadata API Dec 6, 2019
Dynamically created classes at runtime cannot be pickled, due to not
being defined at the top level of a module. As a result, we require to
use the __reduce__ interface to allow 'pickle' to recreate this class
instance, and dump and load instance state successfully.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a big PR! But definitely an improvement to what we had before!

Overall I'm very much in support of the changes. Most of my comments relate to a few (mostly) minor points:

  • Naming
    When reviewing this, I did find I was getting confused between what was returned by the MetadataFactory (returned by cube._metadata) and the instances of, for example, CubeMetadata (returned by cube.metadata). I think to provide more clarity between these I would suggest we rename a few things. (see individual comments below)

  • Test names
    There was a bit of inconsistency with the naming of tests when the name of class/function being tests starts with an underscore.

  • The number of asserts per test
    A lot of the tests grouped multiple self.assert*. I would prefer these are split up such that each independent test (i.e. test_standard_name) tests a single scenario

  • I believe we are missing some test coverage for _BaseMeta.__lt__

lib/iris/common/metadata.py Show resolved Hide resolved
@property
def _names(self):
"""
A tuple containing the value of each name participating in the identify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo? I would expect it to say identity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good spot 👀

@@ -4,15 +4,15 @@
# See COPYING and COPYING.LESSER in the root of the repository for full
# licensing details.
"""
Unit tests for the :func:`iris._cube_coord_common.get_valid_standard_name`.
Unit tests for the :func:`iris.common.mixin.get_valid_standard_name`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Unit tests for the :func:`iris.common.mixin.get_valid_standard_name`.
Unit tests for the :func:`iris.common.mixin._get_valid_standard_name`.

And shouldn't this file be called .../common/mixin/test__get_valid_standard_name.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Yup. Originally it wasn't a private function. I ported it across into iris.common.mixin wrote the tests, then later changed it to be a private function. Forgot to also change the associated comment and test file name. Another good spot!

self.item = CFVariableMixin()
self.item._metadata = metadata

def test(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split this into two tests; one for renaming to air_temperature and one for renaming to nope nope

self.item = CFVariableMixin()
self.item._metadata = metadata

def test_standard_name(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split this up into three tests?
For readability it is nicer this way, but you are testing different scenarios here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also do this for the other tests in this test class?

_TOKEN_PARSE = re.compile(r"""^[a-zA-Z0-9][\w\.\+\-@]*$""")


class _BaseMeta(ABCMeta):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this name a bit confusing as we have both METAdata and METAclass so it's unclear what the Meta refers to.
But I don't have a suggestion just yet. I will keep thinking....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Yup, on reflection I do agree. It is confusing 😕

After a bit of thought (and mutual agreement) we're going with _NamedTupleMeta instead.

from iris.common.metadata import BaseMetadata


class Test(tests.IrisTest):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to include tests for __lt__ on BaseMetadata

@@ -5,43 +5,20 @@
# licensing details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about the name of this module. mixindoesn't tell you much about what this module is supposed to contain. Unless you intend all mixin classes to live here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer My intent here was simple, as in, it's a namespace for common mixin classes to live.

It might just be the case that CFVariableMixin will be our only mixin class, but at least we have scope for collecting such classes together in one place, rather than having them sprinkled around the codebase. HTH 😉

raise ValueError(emsg.format(cls.__name__, bad))

# Define the name, (inheritance) bases and namespace of the dynamic class.
name = "Metadata"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this MetadataManager instead?

And perhaps call this function MetadataManagerFactory instead of MetadataFactory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Agreed. Let's do both for clarity.

lib/iris/cube.py Outdated
@@ -775,6 +747,9 @@ def __init__(
if isinstance(data, str):
raise TypeError("Invalid data type: {!r}.".format(data))

# Configure the metadata manager.
self._metadata = MetadataFactory(CubeMetadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call these self._metadata_manager?

Currently we have:
self.metadata - returns an instance of BaseMetadata which is a souped up namedtuple
self._metadata - returns the instance that MetadataFactory returns.
I find that a bit confusing.

Another good thing about self._metadata_manager is follows a similar naming pattern to self._data_manager

Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bjlittle !

I have now fully completed reviewing this PR so it's ready for you to get started on making changes

self._metadata.climatological = False
if not self.units.is_time_reference():
self._metadata.climatological = False
return self._metadata.climatological
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have come around to the idea of this.
We do a is_time_reference check in the setter, so we are being careful with what is being set as having climatological bounds.
There is a risk is if someone modifies an already existing climatological coordinate to something like a latitude coordinate, without updating the climatological property. In these cases, I would prefer we ensure we save things out correctly (i.e. pick up that this coordinate shouldn't be climatological). It would only be a problem if they were to access coord._metadata.climatological directly, but they shouldn't be doing that
.
So this just needs some testing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Thanks for taking the time to think about this one, much appreciated.

It was a bit cheeky of me to include this in this PR, but since I was in the space, I thought I'd try to address something that reeked of a potential issue for users.

I'll add some tests 😄

Co-Authored-By: lbdreyer <lbdreyer@users.noreply.github.com>
Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Happy with the changes. Now just to wait for travis then this can be merged!

@lbdreyer lbdreyer merged commit 569f6b6 into SciTools:cube-arithmetic Dec 19, 2019
@bjlittle
Copy link
Member Author

@lbdreyer and @alastair-gemmell Awesome, thanks! 🍻 🎉

@bjlittle bjlittle changed the title PI-3478: Common metadata API [FB] PI-3478: Common metadata API Jun 4, 2020
@bjlittle bjlittle changed the title [FB] PI-3478: Common metadata API [FB] [PI-3478] Common metadata API Jun 5, 2020
@bjlittle bjlittle mentioned this pull request Jun 15, 2020
26 tasks
pp-mo pushed a commit to pp-mo/iris that referenced this pull request Jun 23, 2020
* common metadata api

* rationalise _cube_coord_common into common

* move state into metadata

* MetadataFactory test coverage

* temporarily pin back iris-grib

* test coverage for iris.common.metadata._BaseMeta

* test coverage for iris.common.mixin.LimitedAttributeDict

* remove temporary iris-grib pin

* review actions

* Update lib/iris/tests/unit/common/metadata/test_BaseMetadata.py

Co-Authored-By: lbdreyer <lbdreyer@users.noreply.github.com>
bjlittle added a commit to bjlittle/iris that referenced this pull request Aug 14, 2020
* common metadata api

* rationalise _cube_coord_common into common

* move state into metadata

* MetadataFactory test coverage

* temporarily pin back iris-grib

* test coverage for iris.common.metadata._BaseMeta

* test coverage for iris.common.mixin.LimitedAttributeDict

* remove temporary iris-grib pin

* review actions

* Update lib/iris/tests/unit/common/metadata/test_BaseMetadata.py

Co-Authored-By: lbdreyer <lbdreyer@users.noreply.github.com>
abooton pushed a commit that referenced this pull request Aug 14, 2020
* PI-3478: Common metadata API (#3583)

* common metadata api

* rationalise _cube_coord_common into common

* move state into metadata

* MetadataFactory test coverage

* temporarily pin back iris-grib

* test coverage for iris.common.metadata._BaseMeta

* test coverage for iris.common.mixin.LimitedAttributeDict

* remove temporary iris-grib pin

* review actions

* Update lib/iris/tests/unit/common/metadata/test_BaseMetadata.py

Co-Authored-By: lbdreyer <lbdreyer@users.noreply.github.com>

* [FB] [PI-3478] Lenient metadata (#3739)

* add lenient infra-structure

* add metadata lenient __eq__ support

* complete __eq__, combine and difference support

* explicit inherited lenient_service + support equal convenience

* fix attributes difference + lenient kwargs

* make lenient public + minor tidy

* rename MetadataManagerFactory to metadata_manager_factory

* extend lenient_client decorator to support services registration

* add lenient test coverage

* purge qualname usage in metadata.py

* support global enable for lenient services

* support partial mapping metadata assignment

* purge Lenient.__setattr__ from api

* add BaseMetadata compare test coverage

* metadata rationalisation

* add BaseMetadata difference test coverage

* added context manager ephemeral comment clarification

* add BaseMetadata __ne__ test coverage

* standardise lenient decorator closure names

* add BaseMetadata equal test coverage

* half dunder context

* add AncillaryVariableMetadata test coverage

* add additional AncillaryVariableMetadata test coverage

* add CellMeasureMetadata test coverage

* Clarify lenient_service operation + simplify code.

* add CoordMetadata test coverage

* add CubeMetadata test coverage

* metadata tests use self.cls

* fix typo

* fix context manager ephemeral services

* add logging

* Pin pillow to make graphics tests work again. (#3630)

* Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (#3655)

* Switched use of datetime.weekday() to datetime.dayofwk. (#3687)

* New image hashes for mpl 3x2 (#3682)

* New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0.

* Further images changes for mpl3x2.

* Yet more updated image results.

* fix sentinel uniqueness test failure

* remove redundant cdm mapping test

* difference returns None for no difference

* protect Lenient and LENIENT private

* privitise lenient framework and add API veneer

* add explicit maths feature default

* review actions

* review actions

* trexfeathers review actions

* stephenworsley review actions

Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>
Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* [FB] [PI-3478] Lenient cube arithmetic (#3774)

* initial cube arithmetic

* support in-place cube resolve

* fix non in-place broadcasting

* remove temporary resolve scenario test

* lenient/strict support for attributes dicts with numpy arrays

* lenient/strict treatment of scalar coordinates

* strict points/bounds matching

* lenient/strict prepare local dim/aux/scalar coordinates

* support extended broadcasting

* always raise exception on points/bounds mismatch

* ignore scalar points/bounds mismatches, lenient only

* remove todos

* tidy logger debugs

* qualify src/tgt cube references in debug

* Numpy rounding fix (#3758)

ensure rounding is numpy like (maintains type)

* avoid unittest.mock.sentinel copy issue

* fast load np.int32

* fix cube maths doctest

* fix iris.common.resolve logging configuration

* fix prepare points/bounds + extra metadata cml

* support mapping reversal based on free dims

* var_name fix for lenient equality

* add support for DimCoordMetadata

* fix circular flag + support CoordMetadata and DimCoordMetadata exchange

* fix circular issue for concatenate DimCoord->AuxCoord demotion

* fix concatenate._CubeSignature sorted

* minor tweaks

* keep lenient_client private in maths

* tidy maths

* tidy iris.analysis.maths.IFunc

* refactor IFunc test

* polish in-place support

* tidy metadata_resolve

Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com>

* rebase master fix-up for cube arithmetic

* add missing new dependency to readthedocs.yml requirements

Co-authored-by: lbdreyer <lbdreyer@users.noreply.github.com>
Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>
Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com>
@bjlittle bjlittle deleted the common-metadata-api branch February 15, 2021 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants