Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

lru metadata manager factory #4227

Merged
merged 10 commits into from Aug 9, 2021

Conversation

bjlittle
Copy link
Member

@bjlittle bjlittle commented Jul 6, 2021

馃殌 Pull Request

Description

This PR cycles back to the Common Metadata API and includes some small but significant optimisations to the iris.common.metadata.metadata_manager_factory function.

The metadata_manager_factory function leverages the benefit of creating metadata manager classes dynamically at runtime, but this is a relatively expensive operation compared to creating equivalent but static classes.

Employing a simple LRU caching strategy for the metadata manager classes created at runtime provides significant savings, along with careful avoidance of property lookups and type checking when creating instances of the dynamic classes.

The following asv metrics provide support for this proposed change:

Capture

In particular, note the performance gains when creating common iris first-class objects, such as:

  • iris.coords.AncillaryVariable (significant)
  • iris.coords.AuxCoord (significant)
  • iris.coords.DimCoord (meh)
  • iris.coords.CellMeasure (significant)
  • iris.cube.Cube (significant)

Consult Iris pull request check list

@bjlittle bjlittle marked this pull request as draft July 7, 2021 08:00
@jamesp
Copy link
Member

jamesp commented Jul 7, 2021

Lovely stuff @bjlittle, speed ups look significant!

@jamesp
Copy link
Member

jamesp commented Jul 7, 2021

Probably worth another quick set of eyes on this from someone who is more familiar with metadata than I am... @trexfeathers or @pp-mo?

Tests are passing and the code looks sensible so I'm happy.

Copy link
Contributor

@vsherratt vsherratt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of a look at this function a while ago (following your comment on #4142) so possibly qualified to comment if you don't mind? I'm excited to see these changes anyway, it was a little convoluted.

The bit that stood out most was definitely that metadata_manager_factory went to the trouble of defining a whole new class, just to return a single instance of it and forget about the class. Which I see is addressed by factoring out _factory_cache.

Personally, I think I wouldn't have used lru_cache at all, just created all managers upfront

for cls in (CubeMetadata, CoordMetadata, ...):
    manager = _manager_class_factory(cls)
    globals()[manager.__name__] = manager

But that's back to the old question of how much should happen at import time vs when actually needed so down to you. The cache is definitely nicer than assigning to globals directly; that's a different tradeoff, for being able to use slightly more idiomatic code everywhere else, ie

self.metadata = iris.common.CubeMetadataManager(**kwargs)

(Assuming the error handling of kwargs is also moved per inline comment)

lib/iris/common/metadata.py Show resolved Hide resolved
@bjlittle
Copy link
Member Author

bjlittle commented Jul 7, 2021

Here's the asv report for the latest commit, with additional metric coverage:

asv

Which still looks not too shabby 馃槃

@bjlittle bjlittle marked this pull request as ready for review July 7, 2021 11:15
@bjlittle
Copy link
Member Author

bjlittle commented Jul 7, 2021

Personally, I think I wouldn't have used lru_cache at all, just created all managers upfront

@bsherratt There are several ways to skin this particular cat, so to speak... but I'm still attracted to on-demand creation of the metadata manager classes. In particular, not building them upfront has the advantage that there isn't a definitive list that goes stale when new metadata classes are created/removed/changed. I was trying to hit the sweet spot for less maintenance burden on developers and I think this is reasonably close (but I'm sure everyone has an opinion on that 馃槃) ... and this approach has already paid dividends recently when we plugged in new metadata classes for our forthcoming UGRID support i.e., it just worked, thank goodness 馃槄

To be honest, I'm not wedded to this implementation, and there are certainly further optimisations and improvements to be had for sure. As long as we continue to cycle back around and start to polish and optimise (as per this PR), then I think ultimately we'll tend towards a reasonably robust and hopefully optimal solution.

@bjlittle bjlittle force-pushed the lru-metadata-manager-factory branch from 4293187 to 7a410a6 Compare July 17, 2021 10:56
@lbdreyer lbdreyer self-assigned this Aug 4, 2021
@lbdreyer lbdreyer added this to In Progress in Iris v3.1.0 via automation Aug 4, 2021
@lbdreyer lbdreyer moved this from In Progress to In Review in Iris v3.1.0 Aug 4, 2021
@bjlittle bjlittle added this to the v3.1.0 milestone Aug 4, 2021
Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice, simple improvement to our metadata code with proven speed ups 馃憤

This should definitely go into Iris 3.1 so that our users can benefit from the optimisations.

By merging this in, we by no means close the door on this! We can reconsider #4241 (or indeed other alternatives) for Iris 3.2, but they will require more discussion and performance evaluations (via asv) so we compare them properly, and I don't want to hold up getting the optimisations to the user. We can always refine it later.

So I'm going to merge this in, and put #4241 under Iris 3.2.

@lbdreyer lbdreyer merged commit 48712d9 into SciTools:main Aug 9, 2021
Iris v3.1.0 automation moved this from In Review to Done Aug 9, 2021
@bjlittle bjlittle deleted the lru-metadata-manager-factory branch September 8, 2021 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Iris v3.1.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants