New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
lru metadata manager factory #4227
lru metadata manager factory #4227
Conversation
Lovely stuff @bjlittle, speed ups look significant! |
Probably worth another quick set of eyes on this from someone who is more familiar with metadata than I am... @trexfeathers or @pp-mo? Tests are passing and the code looks sensible so I'm happy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a bit of a look at this function a while ago (following your comment on #4142) so possibly qualified to comment if you don't mind? I'm excited to see these changes anyway, it was a little convoluted.
The bit that stood out most was definitely that metadata_manager_factory
went to the trouble of defining a whole new class, just to return a single instance of it and forget about the class. Which I see is addressed by factoring out _factory_cache
.
Personally, I think I wouldn't have used lru_cache
at all, just created all managers upfront
for cls in (CubeMetadata, CoordMetadata, ...):
manager = _manager_class_factory(cls)
globals()[manager.__name__] = manager
But that's back to the old question of how much should happen at import time vs when actually needed so down to you. The cache is definitely nicer than assigning to globals
directly; that's a different tradeoff, for being able to use slightly more idiomatic code everywhere else, ie
self.metadata = iris.common.CubeMetadataManager(**kwargs)
(Assuming the error handling of kwargs
is also moved per inline comment)
@bsherratt There are several ways to skin this particular cat, so to speak... but I'm still attracted to on-demand creation of the metadata manager classes. In particular, not building them upfront has the advantage that there isn't a definitive list that goes stale when new metadata classes are created/removed/changed. I was trying to hit the sweet spot for less maintenance burden on developers and I think this is reasonably close (but I'm sure everyone has an opinion on that 馃槃) ... and this approach has already paid dividends recently when we plugged in new metadata classes for our forthcoming UGRID support i.e., it just worked, thank goodness 馃槄 To be honest, I'm not wedded to this implementation, and there are certainly further optimisations and improvements to be had for sure. As long as we continue to cycle back around and start to polish and optimise (as per this PR), then I think ultimately we'll tend towards a reasonably robust and hopefully optimal solution. |
4293187
to
7a410a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice, simple improvement to our metadata code with proven speed ups 馃憤
This should definitely go into Iris 3.1 so that our users can benefit from the optimisations.
By merging this in, we by no means close the door on this! We can reconsider #4241 (or indeed other alternatives) for Iris 3.2, but they will require more discussion and performance evaluations (via asv) so we compare them properly, and I don't want to hold up getting the optimisations to the user. We can always refine it later.
So I'm going to merge this in, and put #4241 under Iris 3.2.
馃殌 Pull Request
Description
This PR cycles back to the Common Metadata API and includes some small but significant optimisations to the
iris.common.metadata.metadata_manager_factory
function.The
metadata_manager_factory
function leverages the benefit of creating metadata manager classes dynamically at runtime, but this is a relatively expensive operation compared to creating equivalent but static classes.Employing a simple LRU caching strategy for the metadata manager classes created at runtime provides significant savings, along with careful avoidance of property lookups and type checking when creating instances of the dynamic classes.
The following
asv
metrics provide support for this proposed change:In particular, note the performance gains when creating common
iris
first-class objects, such as:iris.coords.AncillaryVariable
(significant)iris.coords.AuxCoord
(significant)iris.coords.DimCoord
(meh)iris.coords.CellMeasure
(significant)iris.cube.Cube
(significant)Consult Iris pull request check list