
liberfa Python further simplification #3181

Closed
wants to merge 8 commits

Conversation

@embray (Member) commented Dec 5, 2014

This builds on top of PR #3176 by @jwoillez by removing the need to build Python functions from a text template entirely. Instead it dumps the information gathered by the cython_generator.py script into a JSON file (which should obviously be distributed/installed with the source code). The metadata in this file is then used to generate the Python functions in the erfa module (see the sketch below).
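To make the idea concrete, here is a minimal sketch of that generation step. The metadata layout, the _call_erfa stub, and the make_erfa_func internals shown here are illustrative assumptions, not the PR's actual schema or code:

import json

# Hypothetical metadata for one function; the real erfa.json schema may differ.
ERFA_JSON = """
{
    "aper": {
        "doc": "Wrapper for the ERFA function eraAper.",
        "args": [{"name": "theta",  "direction": "in"},
                 {"name": "astrom", "direction": "inout"}]
    }
}
"""

def _call_erfa(name, meta, *args):
    # Stand-in for the dispatch into the compiled Cython/C layer.
    print("would call era%s with %d argument(s)" % (name.title(), len(args)))

def make_erfa_func(name, meta):
    # Build one Python-level wrapper function from its metadata entry.
    def erfa_func(*args):
        return _call_erfa(name, meta, *args)
    erfa_func.__name__ = str(name)
    erfa_func.__doc__ = meta["doc"]
    return erfa_func

# Generate all wrappers at import time; the parsed JSON can be discarded after.
_meta = json.loads(ERFA_JSON)
globals().update((n, make_erfa_func(n, m)) for n, m in _meta.items())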

If the JSON file seems like too much indirection, another alternative would be to just dump this dict directly into a Python file, but I think the JSON is pretty nice since it could in principle be used in other contexts (documentation comes to mind...).

A further enhancement I want to make, if it seems worthwhile, is to turn the erfa module into a special module subtype loaded via an import hook, so that erfa functions can be generated on demand when accessed, rather than all at once when the module is imported.
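One way such on-demand generation can work -- a sketch of the mechanism only, not necessarily what this PR would do -- is to install a module subclass whose attribute lookup builds missing functions lazily:

import json
import sys
import types

class LazyErfaModule(types.ModuleType):
    # Module subtype that creates wrapper functions on first attribute access.

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard against recursion
        # before _func_meta has been assigned.
        meta = self.__dict__.get('_func_meta', {}).get(name)
        if meta is None:
            raise AttributeError(name)
        func = make_erfa_func(name, meta)  # see the sketch above
        setattr(self, name, func)          # cache it; later lookups are normal
        return func

# At the bottom of the real module, something like:
#     _lazy = LazyErfaModule(__name__)
#     _lazy._func_meta = json.load(open('erfa.json'))
#     sys.modules[__name__] = _lazy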

One downside to using JSON is that the JSON file, while smaller than the generated Python module, is still fairly large. And generating a function from it still means either:

  1. Reading/parsing the entire JSON file every time a function is used for the first time.
  2. Keeping the data structure read from JSON in memory.

Another possibility would be to use a different storage format that is better suited to on-demand key lookups, but the options for that tend to be problematic, and there are other advantages to having a plain-text format. At the end of the day we're still only talking about half a megabyte, so it's not a huge deal.

The current PR still saves memory, because once the JSON file has been read and the functions generated, the JSON data can be freed; all that the Python interpreter keeps in memory is the code objects themselves (and the docstrings, which take up the majority of the space). This is in contrast to keeping both the code objects and the source code in memory, which the Python interpreter does by default.

Another possible enhancement would be to make the docstring for these functions into a property that reads the function's docstring in from the JSON database on demand, but doesn't otherwise load the docstrings.
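Since a plain function's __doc__ can't be a computed attribute, that would probably mean a small callable wrapper class. A sketch, in which the load_docstring helper and the erfa.json path are assumptions:

import json

def load_docstring(name, path='erfa.json'):
    # Hypothetical helper: re-read just this entry from the JSON database.
    with open(path) as f:
        return json.load(f)[name]['doc']

class ErfaFunction(object):
    # Callable wrapper whose docstring is fetched only on demand.

    def __init__(self, name, func):
        self._name = name
        self._func = func

    def __call__(self, *args, **kwargs):
        return self._func(*args, **kwargs)

    @property
    def __doc__(self):
        return load_docstring(self._name)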

All this aside, I should be clear that none of this impacts the performance of the functions themselves--we're just talking about import times and the like.

@embray (Member, Author) commented Dec 8, 2014

Note: The build for this was expected to fail since I deliberately left out the generated erfa.json file. It should, presumably, be autogenerated at build time, assuming we go that route.

@mdboom (Contributor) commented Dec 8, 2014

It still tries to load the erfa.py.templ file, even though it's been removed:

Traceback (most recent call last):
  File "cython_generator.py", line 478, in <module>
    main(args.srcdir, args.output, args.template_loc)
  File "cython_generator.py", line 392, in main
    erfa_py_in  = env.get_template('erfa.py.templ')
  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 791, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 765, in _load_template
    template = self.loader.load(self, name, globals)
  File "/usr/lib/python2.7/site-packages/jinja2/loaders.py", line 113, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/usr/lib/python2.7/site-packages/jinja2/loaders.py", line 178, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: erfa.py.templ

@embray (Member, Author) commented Dec 8, 2014

Oops. Fixed.

@mdboom (Contributor) commented Dec 9, 2014

GitHub doesn't seem to let me do line comments on erfa.py because the diff is too large... so I also have no way of knowing whether I'm commenting on @embray's work or @jwoillez's earlier work on which this is based.

In the loop that converts all of the inputs, the np.array call can fail if the inputs are the wrong type. It would be nice to catch and annotate the exception to include the argument name (or index); otherwise the user just gets "some object of many can't be converted to an array", which is a real problem because some of the ERFA functions have dozens of arguments. (The C implementation in #3164 does this, and it proved really handy there.)

In the shape comparison, below that:

 if arg.shape[-len(shape):] != shape:

this exception should also include the argument name. Including the func_name is probably redundant/irrelevant. A sketch of both annotations follows.
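Something along these lines, where the loop structure and the meta key names ('name', 'dtype', 'shape') are assumptions rather than the PR's actual code:

import numpy as np

def _convert_args(func_name, arg_meta, args):
    # Convert each argument, annotating failures with the argument name.
    converted = []
    for meta, arg in zip(arg_meta, args):
        try:
            arr = np.asarray(arg, dtype=meta['dtype'])
        except (TypeError, ValueError) as exc:
            raise TypeError("invalid argument %r to %s(): %s"
                            % (meta['name'], func_name, exc))
        shape = meta['shape']
        if shape and arr.shape[-len(shape):] != shape:
            raise ValueError("argument %r must have trailing dimensions %r, "
                             "got %r" % (meta['name'], shape, arr.shape))
        converted.append(arr)
    return converted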

I don't know if it matters, so chalk this up to premature optimization, but it seems there are a number of values computed within erfa_func_wrapper that are effectively constants and could be computed outside of it -- perhaps by adding them to the func_meta dictionary in make_erfa_func (see the sketch after this list):

  • The conversion from strings to dtypes through the CTYPE_TO_DTYPE mapping
  • n_in = len([m for m in arg_meta if m['direction'] == 'in'])
  • op_flags = [['readonly']]*n_in + [['readwrite']]*(len(iter_args) - n_in) (len(iter_args) should always equal len(arg_meta) by this point in the function)
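Roughly what that per-function precomputation could look like; the key names here ('ctype', 'args', etc.) are assumptions:

def precompute_meta(func_meta, ctype_to_dtype):
    # Done once in make_erfa_func instead of on every call.
    arg_meta = func_meta['args']
    for m in arg_meta:
        m['dtype'] = ctype_to_dtype[m['ctype']]   # string -> dtype, once
    n_in = len([m for m in arg_meta if m['direction'] == 'in'])
    func_meta['n_in'] = n_in
    func_meta['op_flags'] = ([['readonly']] * n_in +
                             [['readwrite']] * (len(arg_meta) - n_in))
    return func_meta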

In make_erfa_functions it seems odd, though not wrong, for functions to be initialized inside the with statement.

@mdboom (Contributor) commented Dec 9, 2014

I wanted to run my usual timing script on this (based on @jwoillez's), but there seems to be a regression here. The following works on master and #3176, but not on this PR:

import numpy as np
import astropy.erfa as erfa

jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)

I get:

Traceback (most recent call last):
  File "erfa_timing.py", line 9, in <module>
    out = erfa.aper(jd, x)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 163, in aper
    name=func_name)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 155, in erfa_func
    return erfa_func_wrapper(func_name, func_meta, *args)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 245, in erfa_func_wrapper
    it = np.nditer(iter_args, op_axes=op_axes, op_flags=op_flags)
ValueError: output operand requires a reduction, but reduction is not enabled

Similar errors when running the tests. Numpy 1.8.2.
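For reference, this nditer failure mode can be reproduced without ERFA at all: a writable operand is not allowed to broadcast, since writing many iterations back into a length-1 array would amount to a reduction. The shapes below are chosen to mirror the example above:

import numpy as np

a = np.arange(100.0)   # shape (100,), the read-only operand
b = np.zeros(1)        # shape (1,), a writable operand that must broadcast

# This line raises: ValueError: output operand requires a reduction, but
# reduction is not enabled -- writable operands may not broadcast by default.
it = np.nditer([a, b], op_flags=[['readonly'], ['readwrite']])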

@astrofrog (Member):

If time is too short to get all this working before the 1.0 feature freeze, could we at least merge @jwoillez's changes since they significantly reduce the size of the erfa.py file?

@embray (Member, Author) commented Dec 11, 2014

Should be easy to fix.

import numpy as np
import astropy.erfa as erfa

jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)

This should probably be added to the unit tests, then. All the existing tests pass.
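A possible regression test along those lines -- a sketch assuming a pytest-style test module; the call itself is the regression check, since the exact output semantics depend on the wrapper:

import numpy as np

import astropy.erfa as erfa

def test_aper_broadcasts_struct_argument():
    # Regression test for the nditer broadcasting failure reported above:
    # a length-100 jd against a length-1 astrom record must not raise.
    jd = np.linspace(2456855.5, 2456855.5 + 1.0 / 24.0 / 60.0, 100)
    astrom = np.zeros((1,), erfa.dt_eraASTROM)
    out = erfa.aper(jd, astrom)
    assert out is not None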

@embray (Member, Author) commented Dec 11, 2014

I don't know if it matters, so chalk this up to premature optimization, but it seems there are a number of values computed within erfa_func_wrapper that are effectively constants that could be computed outside of it

Indeed, in #3176 the n_in variable is pre-computed, though the op_flags one isn't. I deliberately did not precompute them, just to keep the signature for erfa_func_wrapper simple and not require passing in additional values that can already be determined from the values passed in (namely the func_meta argument). Considering that these are loops over typically 3- to 10-element lists, I think worrying about them is in the category of premature optimization.

I think if it did turn out to be an issue it could easily be optimized. But I don't think this adds any overhead except in cases where the Python wrapper is being called repeatedly in a loop--and in most cases that would defeat the purpose of the vectorized wrappers and could be considered "doing it wrong".

@astrofrog astrofrog added the erfa label Jan 13, 2015
jwoillez and others added 8 commits January 13, 2015 12:57
make_output_scalar -> make_outputs_scalar
…er than a Python module; the next update will include a replacement for the generated .py file. For now I am *not* including a copy of the generated erfa.json file, as it is more than half a MB, and I think we're still trying to suss out what to do with these generated files...
…py.templ entirely--instead this reads metadata about the ERFA functions in from erfa.json (generated by cython_generator.py which perhaps should be renamed) and creates the Python functions from that. A future enhancement could allow generating the functions on an as-needed basis rather than all at once at import time.
… partially restoring what was core.py.templ, but now it's a separate constants.py.templ just for the constants. The only reason to do this is so that their docstrings can (in principle) be picked up by Sphinx. However, I think it *may* be possible at the C level to give these constants docstrings as well, in which case this can be done in the Cython module (or C module if we switch to just using C directly). But I need to check on that.
@embray embray force-pushed the erfa-factorization branch 2 times, most recently from cc80e3c to 504f0af Compare January 13, 2015 23:48
@embray (Member, Author) commented Jan 13, 2015

Rebased this PR on master, along with some additional cleanup to erfa_generator.py to make it a bit easier to follow.

This also fixes the issue raised by @mdboom in this comment and incorporates #3137 for supporting ERFA constants. For now this does reintroduce a (much smaller) .py template, since it was the easiest way I could think of to support docstrings for the constants. As I mentioned in aea2a42, it might be possible to assign docstrings to module-level constants at the C level instead, though I'm not sure about that.

@mdboom (Contributor) commented Jan 14, 2015

It would be really nice to milestone this as 1.0, don't you think, @embray?

Inline review on this diff context:

iter_shape)
iter_args.append(arg)

n_in = len([m for m in arg_meta if m['direction'] == 'in'])
Contributor:

This seemed a bit indirect. Slightly faster and perhaps clearer:

n_in = [m['direction'] for m in arg_meta].count('in')

Member Author:

Eh, I don't think that's any more obvious. In fact it requires two loops where what's there only requires one.
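For reference, a single-pass count without building the intermediate list is also possible:

n_in = sum(1 for m in arg_meta if m['direction'] == 'in')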

@mdboom (Contributor) commented Jan 14, 2015

FYI: This is about 4x slower than master for small input arrays, and still around 2x slower for large input arrays. (This is the same timing script from @jwoillez in #3176.)

[plot: c_vs_cython timing comparison]

Not that that is a reason to reject this approach -- it has some other advantages, of course.

@embray (Member, Author) commented Jan 14, 2015

I'm not sure I see where that timing script is coming from or how you're running it. That would be useful to have, but I don't see where it's mentioned.

I'm really surprised it would be that much slower though, especially for larger arrays. It's still within an order of magnitude so I'm not super worried, but I have to wonder where all that additional overhead is coming from. I also wonder if you would see it for, say, 10^5 elements or greater?

@embray (Member, Author) commented Jan 14, 2015

Another thing I want to look into is import time--is this actually faster to import than, say, the generated core.py? I think it will be--parsing the JSON file should be significantly faster than parsing/executing a large Python module. Even though right now it builds all the functions up front, I don't think that adds significant overhead over the old way.

That said I also have ideas in mind for making large portions of this "lazy" which would make the import time negligible.
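A rough way to compare the two costs -- a sketch; the file names are placeholders, and exec'ing the source approximates a first import with no .pyc cache:

import timeit

# Placeholders for the real generated artifacts:
json_stmt = "json.load(open('erfa.json'))"
py_stmt = "exec(compile(open('erfa.py').read(), 'erfa.py', 'exec'), {})"

t_json = timeit.timeit(json_stmt, setup="import json", number=10) / 10.0
t_py = timeit.timeit(py_stmt, number=10) / 10.0

print("parse JSON: %.4f s    exec module source: %.4f s" % (t_json, t_py))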

@mdboom (Contributor) commented Jan 14, 2015

Sorry -- that timing script is a little buried. Here it is:

import timeit
import numpy as np
import pylab as plt
import astropy.erfa as erfa


jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)


class Test:
    def __init__(self, N):
        self.jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, N)

    def run(self):
        x = np.zeros((1,), erfa.dt_eraASTROM)
        out = erfa.aper(self.jd, x)

N = [1, 3, 10, 30, 100, 300, 1000]
T = []

for n in N:
    t = timeit.timeit(
        'test.run()',
        setup="from __main__ import Test; test = Test({0})".format(n), number=100)/100.0
    print("[%s, %s]," % (n, t))
    T.append(t)

fig, axarr = plt.subplots(1, 1)
axarr.loglog(N, T)
axarr.loglog(N, T, '.b')
plt.show()

@embray (Member, Author) commented Jan 14, 2015

Welp, some initial tests suggest that, contrary to my expectation, this is actually significantly slower than importing the large Python module. It turns out the JSON parser is slower to read the JSON file than the Python parser is to read a Python module. This is surprising to me and makes the json module suspect...

@mdboom (Contributor) commented Jan 14, 2015

Welp, some initial tests suggest that, contrary to my expectation, this is actually significantly slower than importing the large Python module.

Interesting. That surprises me as well.

@embray (Member, Author) commented Jan 14, 2015

Most of the overhead is in the make_function_with_signature calls. I think I can speed that up a bit (and possibly almost eliminate it on Python 3). The lazy loading idea would also make it a non-issue. But no matter what, parsing the JSON file is slower than executing the entire Python module, and that's the bit that surprises me.

@mhvk (Contributor) commented Jan 14, 2015

I guess loading Python modules has been highly optimized over time...

@embray (Member, Author) commented Jan 14, 2015

Actually, it turns out that in my previous testing I was giving the normal Python module way too many advantages :) Most important being the generation of the .pyc file. Once that's done there's virtually no overhead to loading the module--no parsing, etc.

So this code could have all the same advantages if the resulting functions were also saved out to a .pyc, which should be doable (though that's also getting into the realm of over-complexity? Not sure...). Lazy loading of functions would also, of course, mitigate this.

That takes care of that question though. In fact it turns out that once I factor out things like imports at the top of the module my code does import 3 times faster than the large Python module. It just doesn't have the .pyc to take advantage of later on...
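The .pyc trick is essentially marshalling compiled code objects to disk, and the same could be done by hand for the generated functions. A sketch of just the mechanism (the file names are placeholders):

import marshal

source = "def aper(theta, astrom):\n    pass\n"  # stand-in for generated code

# First run: compile once and cache the code object, as .pyc files do.
code = compile(source, '<erfa generated>', 'exec')
with open('erfa_cache.marshal', 'wb') as f:
    marshal.dump(code, f)

# Later runs: load the cached code object, skipping parsing entirely.
with open('erfa_cache.marshal', 'rb') as f:
    cached = marshal.load(f)
namespace = {}
exec(cached, namespace)  # recreates the generated functions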

@mhvk (Contributor) commented Jan 14, 2015

OK, so it was optimized by just doing the work once... I think in the end the run time is really what matters, and it does seem odd that there is such a large difference. Any idea what the bottleneck is in the setup?

@embray (Member, Author) commented Jan 14, 2015

@mhvk I'm checking that out now. As I suspected, for larger data sizes (1e5, 1e6, etc.) this code is marginally faster, whereas for <= 10000 elements the version in master is faster (again, only marginally).

@embray (Member, Author) commented Jan 14, 2015

There are a few things I've been able to identify that reduce the run time for small inputs slightly. It's really tweaking around the margins, though, because we're talking about things like 9 ms vs. 16 ms; on a log scale the difference looks much less impressive. Most of the advantage the version in master has is the equivalent of loop unrolling, which is why it's marginally faster.

@mhvk (Contributor) commented Jan 14, 2015

@embray - actually, that sounds relatively promising; it should be possible to mimic the setup of the cython loop more closely so that the speed stays roughly the same for small numbers too. I do think it is worth trying to reduce, as it is more than a factor of 2 in the plot above.

More broadly, just so I get this straight: in the end (with lazy loading), the advantage of this approach over what's currently in master would be that for those who do not use erfa, much less memory is used and startup is faster, correct? And for those who do use erfa via Time or Coordinates, only the memory needed for the relevant routines would be used.

@embray (Member, Author) commented Jan 14, 2015

That's the idea. It's just less unnecessary code floating around.

@pllim (Member) commented May 11, 2017

Too many conflicts and unlikely to be completed.

@pllim pllim closed this May 11, 2017
@embray (Member, Author) commented May 15, 2017

Too bad--this was going in a good direction, but I agree; I don't know when I would ever finish it.

@mhvk (Contributor) commented May 15, 2017

I'll go be a little annoying and re-open it; I think it is worth keeping the idea around for a while at least. If nothing happens by 3.0, this should really be closed...

@mhvk mhvk reopened this May 15, 2017
@mhvk mhvk added this to the v3.0.0 milestone May 15, 2017
@astrofrog (Member):

For cases like this where the PR will not be completed, I would prefer to close the PR and open an issue to keep track of the concept, with a link back to the PR. We shouldn't keep PRs open just as future reference; otherwise we end up with the current build-up :)

@mhvk (Contributor) commented May 15, 2017

OK, very good point. I've made the new issue (#6073), which refers to this one, which I'll now close.

@mhvk mhvk closed this May 15, 2017