Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

general questions #91

Closed
nden opened this issue Apr 8, 2015 · 5 comments
Closed

general questions #91

nden opened this issue Apr 8, 2015 · 5 comments

Comments

@nden
Copy link
Contributor

nden commented Apr 8, 2015

@mdboom Two questions that came up while testing this with some real data.

  • I had a file written with a previous version of pyasdf and I can't read it with the latest version. I can write it and read it back correctly. I understand this is still in development but I'm wondering if there's a way to handle this in the future. I am not suggesting "Once ASDF always ASDF" but I suspect we'll have to handle this in some way in the future so I'm raising it for consideration. One thing that can be done now is perhaps write out the version of Pyasdf that created the file.
  • The second question is about performance. I have 3 files: "dist.asdf" has a compound model consisting of a few polynomials, "foc2sky.asdf" is the typical WCS transformation consisting of linear transformation, tan deprojection and sky rotation, and "image.asdf" is the above two combined into one file. Here's the timing I get from reading them:
timeit f=AsdfFile.read('dist.asdf')
1 loops, best of 3: 422 ms per loop

timeit f=AsdfFile.read('foc2sky.asdf')
10 loops, best of 3: 54.9 ms per loop

timeit f=AsdfFile.read('image.asdf')
1 loops, best of 3: 8.99 s per loop

Why the big difference?

@mdboom
Copy link
Contributor

mdboom commented Apr 8, 2015

One thing that can be done now is perhaps write out the version of Pyasdf that created the file.

Yes -- the size of the block header recently changed to support compression. Once we have a first stable numbered release of ASDF, then all versions of pyasdf going forward will support all versions of ASDF, but right now, I'm not dealing with such complications. The ASDF version number is at the top of every ASDF file, and each tag within an ASDF file is individually versioned, so this won't be difficult going forward. One of the major drawbacks of FITS is that it has never had a version number.

Why the big difference?

I suspect the difference is the inclusion of arrays in dist.asdf, but I can't really say. Can you share the files with me?

@mdboom
Copy link
Contributor

mdboom commented Apr 8, 2015

On the performance issues. There was one obvious bug in pyasdf that was causing runtime to increase exponentially with the number of items in the tree. That should be fixed by #92.

After doing that, the breakdown on the image.asdf example is essentially:

  • 25% YAML parsing (in pyyaml)
  • 25% JSON schema validation (about 25% of which is in pyasdf, the reset in jsonschema)
  • 50% Construction of models

There may be some low-hanging fruit for optimization in models. And there's probably some extra in pyasdf, but not as much.

@nden
Copy link
Contributor Author

nden commented Apr 8, 2015

This sounds great. Thanks.

@embray
Copy link
Contributor

embray commented Apr 9, 2015

I'm making a little bit of progress on the construction of models. I just eliminated some overhead in creation of model classes, which happens every time you build a compound model. Will do better still once it's possible to convert an expression tree directly to a model class without making intermediate classes.

@mdboom
Copy link
Contributor

mdboom commented Apr 15, 2015

I'm going to close this because I think I've got all of the low-hanging performance fruit in pyasdf here.

@mdboom mdboom closed this as completed Apr 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants