liberfa Python further simplification #3181
Conversation
Note: The build for this was expected to fail since I deliberately left out the generated erfa.json file. It should, presumably, be autogenerated at build time, assuming we go that route.
It still tries to load the erfa.json file.

Oops. Fixed.
GitHub doesn't seem to let me do line comments on this, so a few notes: in the loop that converts all of the inputs, and in the shape comparison below that, this exception should also include the argument name. Including the func_name is probably redundant/irrelevant. I don't know if it matters, so chalk this up to premature optimization, but it seems there are a number of values that are computed on every call and could be computed once up front.
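A sketch of what the suggested error message might look like; the names `check_shape`, `arg_name`, and `iter_shape` are guesses based on the surrounding diff, not the actual code:

```python
import numpy as np

def check_shape(arg_name, arg, iter_shape):
    # Include the offending argument's name in the message; func_name
    # is dropped as (probably) redundant.
    if arg.shape != iter_shape:
        raise ValueError(
            "argument {0!r} has shape {1}, which does not match the "
            "expected shape {2}".format(arg_name, arg.shape, iter_shape))

check_shape('jd1', np.zeros(3), (3,))  # passes silently
```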
I wanted to run my usual timing script on this (based on @jwoillez's), but there seems to be a regression here. The following works on master, but with this branch I get: [script and traceback elided]

Similar errors running the tests. Numpy 1.8.2.
If time is too short to get all this working before the 1.0 feature freeze, could we at least merge @jwoillez's changes, since they significantly reduce the size of the generated code?
Should be easy to fix.

This should probably be added to the unit tests then. All the existing tests pass.
Indeed, the same is true in #3176. I think if it did turn out to be an issue it could be easily optimized. But I don't think this would add any overhead except in cases where the Python wrapper is being called repeatedly in a loop, and in most cases that would defeat the purpose of the vectorized wrappers and could be considered "doing it wrong".
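To illustrate the vectorization point, here is a sketch with a stand-in function (not an actual ERFA wrapper): calling it once on an array pays the per-call setup once, while calling it in a Python loop pays it per element.

```python
import numpy as np

def wrapper(jd):
    """Stand-in for a vectorized ERFA wrapper: per-call setup, then a fast loop."""
    jd = np.asarray(jd, dtype=float)  # the per-call overhead under discussion
    return jd + 0.5                   # stand-in for the underlying C loop

jd = np.linspace(2450000.0, 2450001.0, 100000)

out = wrapper(jd)                              # overhead paid once
out_slow = np.array([wrapper(j) for j in jd])  # "doing it wrong": paid per element
```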
make_output_scalar -> make_outputs_scalar
…er than a Python module; the next update will include a replacement for the generated .py file. For now I am *not* including a copy of the generated erfa.json file, as it is more than half a MB, and I think we're still trying to suss out what to do with these generated files...
…py.templ entirely--instead this reads metadata about the ERFA functions in from erfa.json (generated by cython_generator.py, which perhaps should be renamed) and creates the Python functions from that. A future enhancement could allow generating the functions on an as-needed basis rather than all at once at import time.
… partially restoring what was core.py.templ, but now it's a separate constants.py.templ just for the constants. The only reason to do this is so that their docstrings can (in principle) be picked up by Sphinx. However, I think it *may* be possible at the C level to give these constants docstrings as well, in which case this can be done in the Cython module (or C module if we switch to just using C directly). But I need to check about that.
…comment) and fixes the regression
Force-pushed from cc80e3c to 504f0af.
Rebased this PR on master, along with some additional cleanup to erfa_generator.py to make it a bit easier to follow. This also fixes the issue raised by @mdboom in this comment and incorporates #3137 for supporting ERFA constants. For now this does reintroduce a (much smaller) .py template, since it was the easiest way I could think of to support docstrings for the constants, though as I mentioned in aea2a42 it might be possible to assign docstrings to module-level constants at the C level; I'm not sure about that.
It would be really nice to milestone this as 1.0, don't you think, @embray?
The review comments below refer to this diff excerpt:

```python
iter_args.append(arg)
# ...
n_in = len([m for m in arg_meta if m['direction'] == 'in'])
```
This seemed a bit indirect. Slightly faster and perhaps clearer:
```python
n_in = [m['direction'] for m in arg_meta].count('in')
```
Eh, I don't think that's any more obvious. In fact it requires two loops where what's there only requires one.
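For what it's worth, the micro-optimization question is easy to settle empirically. A sketch with a toy arg_meta (the real metadata list differs):

```python
import timeit

# Toy stand-in for the real arg_meta list of argument-metadata dicts.
arg_meta = [{'direction': 'in'}] * 4 + [{'direction': 'out'}] * 2

setup = 'from __main__ import arg_meta'
# Single pass with a filtering list comprehension:
print(timeit.timeit(
    "len([m for m in arg_meta if m['direction'] == 'in'])", setup=setup))
# Two passes: build the list of directions, then count:
print(timeit.timeit(
    "[m['direction'] for m in arg_meta].count('in')", setup=setup))
```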
I'm not sure I see where that timing script is coming from or how you're running it. That would be useful to have, but I don't see where it's mentioned. I'm really surprised it would be that much slower though, especially for larger arrays. It's still within an order of magnitude so I'm not super worried, but I have to wonder where all that additional overhead is coming from. I also wonder if you would see it for, say, 10^5 elements or greater?
Another thing I want to look into is import time--is this actually faster to import than, say, the generated Python module? That said, I also have ideas in mind for making large portions of this "lazy", which would make the import time negligible.
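One possible shape for the "lazy" idea, sketched with a module-level __getattr__ (PEP 562, Python 3.7+; at the time of this PR the same effect would have needed a module subclass installed into sys.modules). The metadata schema here is invented for illustration, not the actual erfa.json format:

```python
# Toy metadata standing in for the parsed contents of erfa.json.
_METADATA = {'aper': {'doc': 'Apply earth rotation (stub).'}}

def _build_function(name, meta):
    # Stand-in for the real code generation from the metadata.
    def func(*args):
        raise NotImplementedError(name)
    func.__name__ = name
    func.__doc__ = meta['doc']
    return func

def __getattr__(name):
    # Only called for names not already in the module, so each function
    # is generated on first access and cached in the module namespace.
    try:
        meta = _METADATA[name]
    except KeyError:
        raise AttributeError(name)
    func = globals()[name] = _build_function(name, meta)
    return func
```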
Sorry -- that timing script is a little buried. Here it is: [script link elided]
Welp, some initial tests suggest that, contrary to my expectation, this is actually significantly slower than importing the large Python module. It turns out the JSON parser is slower to read the JSON file than the Python parser is to read a Python module. This is surprising to me and makes the JSON approach less compelling.
Interesting. That surprises me as well.
Most of the overhead is in the JSON parsing itself.
I guess loading Python modules has been highly optimized over time...
Actually, it turns out in my previous testing that I was giving the normal Python module way too many advantages :) Most important being the generation of the .pyc file. Once that's done there's virtually no overhead to loading the module--no parsing, etc. So this code could have all the same advantages if the resulting functions are also saved out to a .pyc, which should be doable (but also getting into the realm of overcomplexity? Not sure...). Lazy loading of functions would also, of course, mitigate this. That takes care of that question though. In fact it turns out that once I factor out things like imports at the top of the module, my code does import 3 times faster than the large Python module. It just doesn't have the .pyc to take advantage of later on...
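A rough way to reproduce the comparison being discussed; the file names erfa.json and erfa_generated.py are placeholders for the real artifacts, and this is only a sketch:

```python
import importlib
import py_compile
import timeit

# Cost of parsing the JSON metadata file on every import:
t_json = timeit.timeit(
    "json.load(open('erfa.json'))", setup='import json', number=10)

# Cost of (re-)executing the generated module once a .pyc exists:
py_compile.compile('erfa_generated.py')   # produce the bytecode up front
t_import = timeit.timeit(
    'importlib.reload(mod)',
    setup='import importlib; import erfa_generated as mod', number=10)

print('json: %.4fs  import: %.4fs' % (t_json / 10, t_import / 10))
```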
OK, so it was optimized by just doing the work once... I think in the end the run-time is really what matters, and it does seem odd that there is such a large difference. Any idea what the bottleneck is in the setup?
@mhvk I'm checking that out now. As I suspected, for larger data sizes (1e5, 1e6, etc.) this code is marginally faster. Whereas for <= 10000 the version in master is faster (again only marginally though).
There are a few things I've been able to identify to reduce the run time for small values slightly. It's really tweaking around the margins though, because we're talking about things like 9 ms vs. 16 ms. On a log scale the difference looks much less impressive. Most of the advantage the version in master has is the equivalent of loop unrolling, which is why it's marginally faster.
@embray - actually, that sounds relatively promising; it should be possible to mimic the setup of the cython loop more closely so that the speed remains roughly the same for small numbers as well. I do think it is worth trying to reduce, as it is more than a factor of 2 in the plot above. More broadly, just so I get this straight: in the end (with lazy loading), the advantages of this approach over the one currently in master would be that for those who do not use erfa, much less memory is used and startup is faster, correct? And for those who do use erfa via Time or Coordinates, only the memory needed for the relevant routines would be used.
That's the idea. It's just less unnecessary code floating around.
Too many conflicts and unlikely to be completed. |
Too bad, this was going in a good direction, but I agree I don't know when I would ever finish it.
I'll go be a little annoying and re-open it; I think it is worth keeping the idea around for a while at least. If nothing happens by 3.0, this should really be closed...
For cases like this where the PR will not be completed, I would prefer to close the PR and open an issue to keep track of the concept, including a link to the PR. We shouldn't keep PRs open just for future reference; otherwise we end up with the current build-up :)
OK, very good point. I've made the new issue (#6073), which refers to this one, which I'll now close.
This builds on top of PR #3176 by @jwoillez by removing the need to build Python functions from a text template entirely. Instead it dumps the information gathered by the cython_generator.py script into a JSON file (which should obviously be distributed/installed with the source code). The metadata in this file is then used to generate the Python functions in the erfa module.

If the JSON file seems like too much indirection, another alternative would be to just dump this dict directly into a Python file, but I think this is pretty nice since it could in principle be used in other contexts (documentation comes to mind...).
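A minimal sketch of the generate-from-metadata idea; the schema here (name/args/doc) is invented for illustration and the function body is a stub, so it only shows the mechanism, not the real erfa.json format:

```python
import json

meta_json = ('{"name": "aper", "args": ["theta", "astrom"], '
             '"doc": "Apply earth rotation to the astrom context."}')

def make_function(meta):
    # Build source for a wrapper and compile it into a real function
    # object, exactly once, from the metadata dict.
    src = 'def {name}({args}):\n    """{doc}"""\n'.format(
        name=meta['name'], args=', '.join(meta['args']), doc=meta['doc'])
    namespace = {}
    exec(compile(src, '<generated>', 'exec'), namespace)
    return namespace[meta['name']]

aper = make_function(json.loads(meta_json))
print(aper.__doc__)   # -> Apply earth rotation to the astrom context.
```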
A further enhancement that I want to make to this, if it seems worthwhile, is to make the erfa module into a special module subtype that would be loaded via an import hook, so that erfa functions can be generated on demand when accessed, rather than all at once when the module is imported.

One downside to using JSON is that the JSON file, while smaller than the generated Python module, is still fairly large, and generating a function from it still means either: …
Another possibility would be to use a different storage format that is better for on-demand key lookups, but the options for that tend to be problematic, and there are other advantages to having a plain-text format. At the end of the day we're still talking half a megabyte, so it's not a huge deal.
The current PR still saves memory at the end of the day, because once the JSON file has been read and the functions generated, the JSON data can be freed from memory, so all that is kept in memory by the Python interpreter is the code objects themselves (and all the docstrings, which take up the majority of space). This is in contrast to having the code objects and the source code in memory, which the Python interpreter does by default.
Another possible enhancement would be to make the docstring for these functions into a property that reads the function's docstring in from the JSON database on demand, but doesn't otherwise load the docstrings.
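A sketch of the lazy-docstring idea; LazyDocFunction and _DOCS are invented names, and a real version would read from the erfa.json database rather than a dict:

```python
_DOCS = {'aper': 'Apply earth rotation to the astrom context (stub).'}

class LazyDocFunction:
    """Callable wrapper whose __doc__ is looked up only when accessed."""

    def __init__(self, func, name):
        self._func = func
        self._name = name

    def __call__(self, *args, **kwargs):
        return self._func(*args, **kwargs)

    @property
    def __doc__(self):
        # A real implementation would read this from the JSON database.
        return _DOCS[self._name]

aper = LazyDocFunction(lambda theta, astrom: None, 'aper')
print(aper.__doc__)   # fetched on demand, not stored on the function
```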
All this aside, I should be clear that none of this impacts the performance of the functions themselves--we're just talking about import times and the like.