
liberfa Python further simplification #3181

Closed
wants to merge 8 commits

Conversation

@embray (Member) commented Dec 5, 2014

This builds on top of PR #3176 by @jwoillez by removing the need to build Python functions from a text template entirely. Instead it dumps the information gathered by the cython_generator.py script into a JSON file (which should obviously be distributed/installed with the source code). The metadata in this file is then used to generate the Python functions in the erfa module (see the sketch below).
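To make the idea concrete, here is a minimal sketch of that generation step. The metadata layout, the _call_erfa stub, and the make_erfa_func internals shown here are illustrative assumptions, not the PR's actual schema or code:

import json

# Hypothetical metadata for one function; the real erfa.json schema may differ.
ERFA_JSON = """
{
    "aper": {
        "doc": "Wrapper for the ERFA function eraAper.",
        "args": [{"name": "theta",  "direction": "in"},
                 {"name": "astrom", "direction": "inout"}]
    }
}
"""

def _call_erfa(name, meta, *args):
    # Stand-in for the dispatch into the compiled Cython/C layer.
    print("would call era%s with %d argument(s)" % (name.title(), len(args)))

def make_erfa_func(name, meta):
    # Build one Python-level wrapper function from its metadata entry.
    def erfa_func(*args):
        return _call_erfa(name, meta, *args)
    erfa_func.__name__ = str(name)
    erfa_func.__doc__ = meta["doc"]
    return erfa_func

# Generate all wrappers at import time; the parsed JSON can be discarded after.
_meta = json.loads(ERFA_JSON)
globals().update((n, make_erfa_func(n, m)) for n, m in _meta.items())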

If the JSON file seems like too much indirection, another alternative would be to just dump this dict directly into a Python file, but I think the JSON is pretty nice since it could in principle be used in other contexts (documentation comes to mind...).

A further enhancement I want to make, if it seems worthwhile, is to turn the erfa module into a special module subtype loaded via an import hook, so that erfa functions can be generated on demand when accessed, rather than all at once when the module is imported.
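One way such on-demand generation can work -- a sketch of the mechanism only, not necessarily what this PR would do -- is to install a module subclass whose attribute lookup builds missing functions lazily:

import json
import sys
import types

class LazyErfaModule(types.ModuleType):
    # Module subtype that creates wrapper functions on first attribute access.

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard against recursion
        # before _func_meta has been assigned.
        meta = self.__dict__.get('_func_meta', {}).get(name)
        if meta is None:
            raise AttributeError(name)
        func = make_erfa_func(name, meta)  # see the sketch above
        setattr(self, name, func)          # cache it; later lookups are normal
        return func

# At the bottom of the real module, something like:
#     _lazy = LazyErfaModule(__name__)
#     _lazy._func_meta = json.load(open('erfa.json'))
#     sys.modules[__name__] = _lazy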

One downside to using JSON is that the JSON file, while smaller than the generated Python module, is still fairly large. And generating a function from it still means either:

  1. Reading/parsing the entire JSON file every time a function is used for the first time.
  2. Keeping the data structure read from JSON in memory.

Another possibility would be to use a different storage format that is better suited to on-demand key lookups, but the options for that tend to be problematic, and there are other advantages to having a plain-text format. At the end of the day we're still only talking about half a megabyte, so it's not a huge deal.

The current PR still saves memory, because once the JSON file has been read and the functions generated, the JSON data can be freed; all that the Python interpreter keeps in memory is the code objects themselves (and the docstrings, which take up the majority of the space). This is in contrast to keeping both the code objects and the source code in memory, which the Python interpreter does by default.

Another possible enhancement would be to make the docstring for these functions into a property that reads the function's docstring in from the JSON database on demand, but doesn't otherwise load the docstrings.
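Since a plain function's __doc__ can't be a computed attribute, that would probably mean a small callable wrapper class. A sketch, in which the load_docstring helper and the erfa.json path are assumptions:

import json

def load_docstring(name, path='erfa.json'):
    # Hypothetical helper: re-read just this entry from the JSON database.
    with open(path) as f:
        return json.load(f)[name]['doc']

class ErfaFunction(object):
    # Callable wrapper whose docstring is fetched only on demand.

    def __init__(self, name, func):
        self._name = name
        self._func = func

    def __call__(self, *args, **kwargs):
        return self._func(*args, **kwargs)

    @property
    def __doc__(self):
        return load_docstring(self._name)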

All this aside, I should be clear that none of this impacts the performance of the functions themselves--we're just talking about import times and the like.

@embray (Member, Author) commented Dec 8, 2014

Note: The build for this was expected to fail since I deliberately left out the generated erfa.json file. It should, presumably, be autogenerated at build time, assuming we go that route.

@mdboom (Contributor) commented Dec 8, 2014

It still tries to load the erfa.py.templ file, even though it's been removed:

Traceback (most recent call last):
  File "cython_generator.py", line 478, in <module>
    main(args.srcdir, args.output, args.template_loc)
  File "cython_generator.py", line 392, in main
    erfa_py_in  = env.get_template('erfa.py.templ')
  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 791, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 765, in _load_template
    template = self.loader.load(self, name, globals)
  File "/usr/lib/python2.7/site-packages/jinja2/loaders.py", line 113, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/usr/lib/python2.7/site-packages/jinja2/loaders.py", line 178, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: erfa.py.templ

@embray (Member, Author) commented Dec 8, 2014

Oops. Fixed.

@mdboom (Contributor) commented Dec 9, 2014

GitHub doesn't seem to let me do line comments on erfa.py because the diff is too large... so I also have no way of knowing whether I'm commenting on @embray's work or @jwoillez's earlier work on which this is based.

In the loop that converts all of the inputs, the np.array call can fail if the inputs are the wrong type. It would be nice to catch and annotate the exception to include the argument name (or index); otherwise the user just gets "some object of many can't be converted to an array", which is a real problem because some of the ERFA functions have dozens of arguments. (The C implementation in #3164 does this, and it proved really handy there.)

In the shape comparison, below that:

 if arg.shape[-len(shape):] != shape:

this exception should also include the argument name. Including the func_name is probably redundant/irrelevant. A sketch of both annotations follows.
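Something along these lines, where the loop structure and the meta key names ('name', 'dtype', 'shape') are assumptions rather than the PR's actual code:

import numpy as np

def _convert_args(func_name, arg_meta, args):
    # Convert each argument, annotating failures with the argument name.
    converted = []
    for meta, arg in zip(arg_meta, args):
        try:
            arr = np.asarray(arg, dtype=meta['dtype'])
        except (TypeError, ValueError) as exc:
            raise TypeError("invalid argument %r to %s(): %s"
                            % (meta['name'], func_name, exc))
        shape = meta['shape']
        if shape and arr.shape[-len(shape):] != shape:
            raise ValueError("argument %r must have trailing dimensions %r, "
                             "got %r" % (meta['name'], shape, arr.shape))
        converted.append(arr)
    return converted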

I don't know if it matters, so chalk this up to premature optimization, but it seems there are a number of values computed within erfa_func_wrapper that are effectively constants and could be computed outside of it -- perhaps by adding them to the func_meta dictionary in make_erfa_func (see the sketch after this list):

  • The conversion from strings to dtypes through the CTYPE_TO_DTYPE mapping
  • n_in = len([m for m in arg_meta if m['direction'] == 'in'])
  • op_flags = [['readonly']]*n_in + [['readwrite']]*(len(iter_args) - n_in) (len(iter_args) should always equal len(arg_meta) by this point in the function)
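Roughly what that per-function precomputation could look like; the key names here ('ctype', 'args', etc.) are assumptions:

def precompute_meta(func_meta, ctype_to_dtype):
    # Done once in make_erfa_func instead of on every call.
    arg_meta = func_meta['args']
    for m in arg_meta:
        m['dtype'] = ctype_to_dtype[m['ctype']]   # string -> dtype, once
    n_in = len([m for m in arg_meta if m['direction'] == 'in'])
    func_meta['n_in'] = n_in
    func_meta['op_flags'] = ([['readonly']] * n_in +
                             [['readwrite']] * (len(arg_meta) - n_in))
    return func_meta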

In make_erfa_functions it seems odd, though not wrong, for functions to be initialized inside the with statement.

@mdboom (Contributor) commented Dec 9, 2014

I wanted to run my usual timing script on this (based on @jwoillez's), but there seems to be a regression here. The following works on master and #3176, but not on this PR:

import numpy as np
import astropy.erfa as erfa

jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)

I get:

Traceback (most recent call last):
  File "erfa_timing.py", line 9, in <module>
    out = erfa.aper(jd, x)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 163, in aper
    name=func_name)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 155, in erfa_func
    return erfa_func_wrapper(func_name, func_meta, *args)
  File "/home/mdboom/python/lib/python2.7/site-packages/astropy-1.0.dev10645-py2.7-linux-x86_64.egg/astropy/erfa/erfa.py", line 245, in erfa_func_wrapper
    it = np.nditer(iter_args, op_axes=op_axes, op_flags=op_flags)
ValueError: output operand requires a reduction, but reduction is not enabled

Similar errors when running the tests. Numpy 1.8.2.
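For reference, this nditer failure mode can be reproduced without ERFA at all: a writable operand is not allowed to broadcast, since writing many iterations back into a length-1 array would amount to a reduction. The shapes below are chosen to mirror the example above:

import numpy as np

a = np.arange(100.0)   # shape (100,), the read-only operand
b = np.zeros(1)        # shape (1,), a writable operand that must broadcast

# This line raises: ValueError: output operand requires a reduction, but
# reduction is not enabled -- writable operands may not broadcast by default.
it = np.nditer([a, b], op_flags=[['readonly'], ['readwrite']])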

@astrofrog (Member):

If time is too short to get all this working before the 1.0 feature freeze, could we at least merge @jwoillez's changes since they significantly reduce the size of the erfa.py file?

@embray (Member, Author) commented Dec 11, 2014

Should be easy to fix.

import numpy as np
import astropy.erfa as erfa

jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)

This should probably be added to the unit tests, then. All the existing tests pass.
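A possible regression test along those lines -- a sketch assuming a pytest-style test module; the call itself is the regression check, since the exact output semantics depend on the wrapper:

import numpy as np

import astropy.erfa as erfa

def test_aper_broadcasts_struct_argument():
    # Regression test for the nditer broadcasting failure reported above:
    # a length-100 jd against a length-1 astrom record must not raise.
    jd = np.linspace(2456855.5, 2456855.5 + 1.0 / 24.0 / 60.0, 100)
    astrom = np.zeros((1,), erfa.dt_eraASTROM)
    out = erfa.aper(jd, astrom)
    assert out is not None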

@embray (Member, Author) commented Dec 11, 2014

I don't know if it matters, so chalk this up to premature optimization, but it seems there are a number of values computed within erfa_func_wrapper that are effectively constants that could be computed outside of it

Indeed, in #3176 the n_in variable is pre-computed, though the op_flags one isn't. I deliberately did not precompute them, just to keep the signature for erfa_func_wrapper simple and not require passing in additional values that can already be determined from the values passed in (namely the func_meta argument). Considering that these are loops over typically 3- to 10-element lists, I think worrying about them is in the category of premature optimization.

I think if it did turn out to be an issue it could easily be optimized. But I don't think this adds any overhead except in cases where the Python wrapper is being called repeatedly in a loop--and in most cases that would defeat the purpose of the vectorized wrappers and could be considered "doing it wrong".

@astrofrog astrofrog added the erfa label Jan 13, 2015
jwoillez and others added 8 commits January 13, 2015 12:57
make_output_scalar -> make_outputs_scalar
…er than a Python module; the next update will include a replacement for the generated .py file. For now I am *not* including a copy of the generated erfa.json file, as it is more than half a MB, and I think we're still trying to suss out what to do with these generated files...
…py.templ entirely--instead this reads metadata about the ERFA functions in from erfa.json (generated by cython_generator.py which perhaps should be renamed) and creates the Python functions from that. A future enhancement could allow generating the functions on an as-needed basis rather than all at once at import time.
… partially restoring what was core.py.templ, but now it's a separate constants.py.templ just for the constants. The only reason to do this is so that their docstrings can (in principle) be picked up by Sphinx. However, I think it *may* be possible at the C level to give these constants docstrings as well, in which case this can be done in the Cython module (or C module if we switch to just using C directly). But I need to check on that.
@embray embray force-pushed the erfa-factorization branch 2 times, most recently from cc80e3c to 504f0af Compare January 13, 2015 23:48
@embray (Member, Author) commented Jan 13, 2015

Rebased this PR on master, along with some additional cleanup to erfa_generator.py to make it a bit easier to follow.

This also fixes the issue raised by @mdboom in this comment and incorporates #3137 for supporting ERFA constants. For now this does reintroduce a (much smaller) .py template, since it was the easiest way I could think of to support docstrings for the constants. As I mentioned in aea2a42, it might be possible to assign docstrings to module-level constants at the C level instead, though I'm not sure about that.

@mdboom (Contributor) commented Jan 14, 2015

It would be really nice to milestone this as 1.0, don't you think, @embray?

Inline review on this diff context:

iter_shape)
iter_args.append(arg)

n_in = len([m for m in arg_meta if m['direction'] == 'in'])
Contributor:

This seemed a bit indirect. Slightly faster and perhaps clearer:

n_in = [m['direction'] for m in arg_meta].count('in')

Member Author:

Eh, I don't think that's any more obvious. In fact it requires two loops where what's there only requires one.
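For reference, a single-pass count without building the intermediate list is also possible:

n_in = sum(1 for m in arg_meta if m['direction'] == 'in')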

@mdboom (Contributor) commented Jan 14, 2015

FYI: This is about 4x slower than master for small input arrays, and still around 2x slower for large input arrays. (This is the same timing script from @jwoillez in #3176.)

[plot: c_vs_cython timing comparison]

Not that that is a reason to reject this approach -- it has some other advantages, of course.

@embray (Member, Author) commented Jan 14, 2015

I'm not sure I see where that timing script is coming from or how you're running it. That would be useful to have, but I don't see where it's mentioned.

I'm really surprised it would be that much slower though, especially for larger arrays. It's still within an order of magnitude so I'm not super worried, but I have to wonder where all that additional overhead is coming from. I also wonder if you would see it for, say, 10^5 elements or greater?

@embray (Member, Author) commented Jan 14, 2015

Another thing I want to look into is import time--is this actually faster to import than, say, the generated core.py? I think it will be--parsing the JSON file should be significantly faster than parsing/executing a large Python module. Even though right now it builds all the functions up front, I don't think that adds significant overhead over the old way.

That said I also have ideas in mind for making large portions of this "lazy" which would make the import time negligible.
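A rough way to compare the two costs -- a sketch; the file names are placeholders, and exec'ing the source approximates a first import with no .pyc cache:

import timeit

# Placeholders for the real generated artifacts:
json_stmt = "json.load(open('erfa.json'))"
py_stmt = "exec(compile(open('erfa.py').read(), 'erfa.py', 'exec'), {})"

t_json = timeit.timeit(json_stmt, setup="import json", number=10) / 10.0
t_py = timeit.timeit(py_stmt, number=10) / 10.0

print("parse JSON: %.4f s    exec module source: %.4f s" % (t_json, t_py))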

@mdboom (Contributor) commented Jan 14, 2015

Sorry -- that timing script is a little buried. Here it is:

import timeit
import numpy as np
import pylab as plt
import astropy.erfa as erfa


jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, 100)
x = np.zeros((1,), erfa.dt_eraASTROM)
out = erfa.aper(jd, x)


class Test:
    def __init__(self, N):
        self.jd = np.linspace(2456855.5, 2456855.5+1.0/24.0/60.0, N)

    def run(self):
        x = np.zeros((1,), erfa.dt_eraASTROM)
        out = erfa.aper(self.jd, x)

N = [1, 3, 10, 30, 100, 300, 1000]
T = []

for n in N:
    t = timeit.timeit(
        'test.run()',
        setup="from __main__ import Test; test = Test({0})".format(n), number=100)/100.0
    print("[%s, %s]," % (n, t))
    T.append(t)

fig, axarr = plt.subplots(1, 1)
axarr.loglog(N, T)
axarr.loglog(N, T, '.b')
plt.show()

@embray (Member, Author) commented Jan 14, 2015

Welp, some initial tests suggest that, contrary to my expectation, this is actually significantly slower than importing the large Python module. It turns out the JSON parser is slower to read the JSON file than the Python parser is to read a Python module. This is surprising to me and makes the json module suspect...

@mdboom (Contributor) commented Jan 14, 2015

Welp, some initial tests suggest that, contrary to my expectation, this is actually significantly slower than importing the large Python module.

Interesting. That surprises me as well.

@embray (Member, Author) commented Jan 14, 2015

Most of the overhead is in the make_function_with_signature calls. I think I can speed that up a bit (and possibly almost eliminate it on Python 3). The lazy loading idea would also make it a non-issue. But no matter what, parsing the JSON file is slower than executing the entire Python module, and that's the bit that surprises me.

@mhvk (Contributor) commented Jan 14, 2015

I guess loading Python modules has been highly optimized over time...

@embray (Member, Author) commented Jan 14, 2015

Actually, it turns out that in my previous testing I was giving the normal Python module way too many advantages :) Most important being the generation of the .pyc file. Once that's done there's virtually no overhead to loading the module--no parsing, etc.

So this code could have all the same advantages if the resulting functions were also saved out to a .pyc, which should be doable (though that's also getting into the realm of over-complexity? Not sure...). Lazy loading of functions would also, of course, mitigate this.

That takes care of that question though. In fact it turns out that once I factor out things like imports at the top of the module my code does import 3 times faster than the large Python module. It just doesn't have the .pyc to take advantage of later on...
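The .pyc trick is essentially marshalling compiled code objects to disk, and the same could be done by hand for the generated functions. A sketch of just the mechanism (the file names are placeholders):

import marshal

source = "def aper(theta, astrom):\n    pass\n"  # stand-in for generated code

# First run: compile once and cache the code object, as .pyc files do.
code = compile(source, '<erfa generated>', 'exec')
with open('erfa_cache.marshal', 'wb') as f:
    marshal.dump(code, f)

# Later runs: load the cached code object, skipping parsing entirely.
with open('erfa_cache.marshal', 'rb') as f:
    cached = marshal.load(f)
namespace = {}
exec(cached, namespace)  # recreates the generated functions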

@mhvk (Contributor) commented Jan 14, 2015

OK, so it was optimized by just doing the work once... I think in the end the run time is really what matters, and it does seem odd that there is such a large difference. Any idea what the bottleneck is in the setup?

@embray (Member, Author) commented Jan 14, 2015

@mhvk I'm checking that out now. As I suspected, for larger data sizes (1e5, 1e6, etc.) this code is marginally faster, whereas for <= 10000 elements the version in master is faster (again, only marginally).

@embray (Member, Author) commented Jan 14, 2015

There are a few things I've been able to identify that reduce the run time for small inputs slightly. It's really tweaking around the margins, though, because we're talking about things like 9 ms vs. 16 ms; on a log scale the difference looks much less impressive. Most of the advantage the version in master has is the equivalent of loop unrolling, which is why it's marginally faster.

@mhvk (Contributor) commented Jan 14, 2015

@embray - actually, that sounds relatively promising; it should be possible to mimic the setup of the cython loop more closely so that the speed stays roughly the same for small numbers too. I do think it is worth trying to reduce, as it is more than a factor of 2 in the plot above.

More broadly, just so I get this straight: in the end (with lazy loading), the advantage of this approach over what's currently in master would be that for those who do not use erfa, much less memory is used and startup is faster, correct? And for those who do use erfa via Time or Coordinates, only the memory needed for the relevant routines would be used.

@embray (Member, Author) commented Jan 14, 2015

That's the idea. It's just less unnecessary code floating around.

@pllim (Member) commented May 11, 2017

Too many conflicts and unlikely to be completed.

@pllim pllim closed this May 11, 2017
@embray (Member, Author) commented May 15, 2017

Too bad--this was going in a good direction, but I agree; I don't know when I would ever finish it.

@mhvk (Contributor) commented May 15, 2017

I'll go be a little annoying and re-open it; I think it is worth keeping the idea around for a while at least. If nothing happens by 3.0, this should really be closed...

@mhvk mhvk reopened this May 15, 2017
@mhvk mhvk added this to the v3.0.0 milestone May 15, 2017
@astrofrog (Member):

For cases like this where the PR will not be completed, I would prefer to close the PR and open an issue to keep track of the concept, with a link back to the PR. We shouldn't keep PRs open just as future reference; otherwise we end up with the current build-up :)

@mhvk (Contributor) commented May 15, 2017

OK, very good point. I've made the new issue (#6073), which refers to this one, which I'll now close.

@mhvk mhvk closed this May 15, 2017