general questions #91

nden · 2015-04-08T14:04:34Z

@mdboom Two questions that came up while testing this with some real data.

I had a file written with a previous version of pyasdf and I can't read it with the latest version. I can write it and read it back correctly. I understand this is still in development but I'm wondering if there's a way to handle this in the future. I am not suggesting "Once ASDF always ASDF" but I suspect we'll have to handle this in some way in the future so I'm raising it for consideration. One thing that can be done now is perhaps write out the version of Pyasdf that created the file.
The second question is about performance. I have 3 files: "dist.asdf" has a compound model consisting of a few polynomials, "foc2sky.asdf" is the typical WCS transformation consisting of linear transformation, tan deprojection and sky rotation, and "image.asdf" is the above two combined into one file. Here's the timing I get from reading them:

timeit f=AsdfFile.read('dist.asdf')
1 loops, best of 3: 422 ms per loop

timeit f=AsdfFile.read('foc2sky.asdf')
10 loops, best of 3: 54.9 ms per loop

timeit f=AsdfFile.read('image.asdf')
1 loops, best of 3: 8.99 s per loop

Why the big difference?

The text was updated successfully, but these errors were encountered:

mdboom · 2015-04-08T14:17:37Z

One thing that can be done now is perhaps write out the version of Pyasdf that created the file.

Yes -- the size of the block header recently changed to support compression. Once we have a first stable numbered release of ASDF, then all versions of pyasdf going forward will support all versions of ASDF, but right now, I'm not dealing with such complications. The ASDF version number is at the top of every ASDF file, and each tag within an ASDF file is individually versioned, so this won't be difficult going forward. One of the major drawbacks of FITS is that it has never had a version number.

Why the big difference?

I suspect the difference is the inclusion of arrays in dist.asdf, but I can't really say. Can you share the files with me?

mdboom · 2015-04-08T16:44:57Z

On the performance issues. There was one obvious bug in pyasdf that was causing runtime to increase exponentially with the number of items in the tree. That should be fixed by #92.

After doing that, the breakdown on the image.asdf example is essentially:

25% YAML parsing (in pyyaml)
25% JSON schema validation (about 25% of which is in pyasdf, the reset in jsonschema)
50% Construction of models

There may be some low-hanging fruit for optimization in models. And there's probably some extra in pyasdf, but not as much.

nden · 2015-04-08T17:03:20Z

This sounds great. Thanks.

embray · 2015-04-09T19:16:51Z

I'm making a little bit of progress on the construction of models. I just eliminated some overhead in creation of model classes, which happens every time you build a compound model. Will do better still once it's possible to convert an expression tree directly to a model class without making intermediate classes.

mdboom · 2015-04-15T01:28:26Z

I'm going to close this because I think I've got all of the low-hanging performance fruit in pyasdf here.

mdboom mentioned this issue Apr 8, 2015

Don't let seen set get out of control #92

Merged

mdboom closed this as completed Apr 15, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

general questions #91

general questions #91

nden commented Apr 8, 2015

mdboom commented Apr 8, 2015

mdboom commented Apr 8, 2015

nden commented Apr 8, 2015

embray commented Apr 9, 2015

mdboom commented Apr 15, 2015

general questions #91

general questions #91

Comments

nden commented Apr 8, 2015

mdboom commented Apr 8, 2015

mdboom commented Apr 8, 2015

nden commented Apr 8, 2015

embray commented Apr 9, 2015

mdboom commented Apr 15, 2015