Major rework of BEAST files #299

Open
karllark opened this issue Mar 5, 2019 · 11 comments

Comments

@karllark
Member

karllark commented Mar 5, 2019

From the Feb 2019 BEAST HackDay, the idea emerged that a major rework of the BEAST file data formats could provide significant benefits. One particular issue is the challenge of using the HDF5 format, as it has been hard for multiple BEAST developers to understand or manipulate. The benefits could be files that are easier to understand/manipulate, less code for reading/writing -> less code maintenance, and formats better suited for Mega-BEAST (and BEAST) needs.

Currently, the formats are a mix of ASCII csv, hdf5, and fits files.

So, what would the format(s) of the BEAST files look like w/o considering the current formats? In other words, if we started with a clean slate, what would we do? We understand the needs of the BEAST and Mega-BEAST much better now.

Please comment with ideas/proposals.

Existing issues that touch on this topic:
Output file format: #53
Size of the files: #295
HDF5 issues: #186, #262
unclosed files: #64
eztables: #9, #10

@karllark
Member Author

karllark commented Mar 5, 2019

A file format that is more understandable than hdf5 would be nice. Possibilities are FITS or ASDF, but there are concerns about speed.

@karllark
Member Author

karllark commented Mar 5, 2019

Fewer files would be easier to manage. Possibilities:

  1. Merge spec and spec_w_priors
  2. Merge physics and observation models, but then it is harder to have separate observation models (noisemodels) for a single physicsgrid file.
  3. Merge output stats, pdf1d, and lnp files into a single file. Ideally have one "record" for each star fit.
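
A minimal sketch of the "one record per star" idea in item 3, assuming nothing about the eventual on-disk format. The `StarFit` class, its field names, and the JSON-lines serialization are all hypothetical and chosen only so the example runs with the standard library; the thread itself is considering FITS, ASDF, or HDF5.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class StarFit:
    """All fit outputs for one star (field names are hypothetical)."""
    star_id: int
    # summary statistics, e.g. {"Av_Best": 0.5}
    stats: dict = field(default_factory=dict)
    # 1D PDFs per parameter, e.g. {"Av": {"bins": [...], "pdf": [...]}}
    pdf1d: dict = field(default_factory=dict)
    # sparse lnp: model grid indices and their log-probabilities
    lnp_indices: list = field(default_factory=list)
    lnp_values: list = field(default_factory=list)


def dump_records(records):
    """Serialize records as one JSON line per star."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


def load_records(text):
    """Inverse of dump_records: rebuild StarFit objects from JSON lines."""
    return [StarFit(**json.loads(line)) for line in text.splitlines() if line]


fits = [
    StarFit(
        star_id=0,
        stats={"Av_Best": 0.5},
        pdf1d={"Av": {"bins": [0.0, 1.0], "pdf": [0.25, 0.75]}},
        lnp_indices=[10, 42],
        lnp_values=[-1.5, -3.0],
    )
]
roundtrip = load_records(dump_records(fits))
```

The point of the sketch is only that stats, pdf1d, and lnp for a star travel together, so a reader never has to match rows across three files.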

@karllark
Member Author

karllark commented Mar 5, 2019

For tables, we should use astropy.table to take advantage of the extensive work done by the larger community.

@lea-hagen
Member

I'd like to change the input file format: make datamodel a text file rather than .py (similar to what is done now for the megabeast).

@karllark
Member Author

karllark commented Mar 5, 2019

Good point. Would be great to be able to just use a text file. One impact would be that it would be easy to set up the BEAST to run from any directory with the BEAST scripts installed on the system.
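
A minimal sketch of reading a text-file datamodel, assuming an INI-style layout parsed with Python's standard-library `configparser`. The section name, keys, and values below are hypothetical and are not the BEAST's actual settings; the sketch only shows that plain text can stand in for an importable `.py` module.

```python
import configparser

# Hypothetical settings text; in practice this would live in a text file
# in the run directory (file name and keys are assumptions).
SETTINGS_TEXT = """
[beast]
project = example_run
filters = HST_WFC3_F275W, HST_WFC3_F336W, HST_ACS_WFC_F475W
distance_min = 24.0
distance_max = 25.0
n_subgrid = 4
"""


def parse_settings(text):
    """Parse settings text into a plain dict with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["beast"]
    return {
        "project": section.get("project"),
        # comma-separated string -> list of stripped strings
        "filters": [f.strip() for f in section.get("filters").split(",")],
        "distance_range": (
            section.getfloat("distance_min"),
            section.getfloat("distance_max"),
        ),
        "n_subgrid": section.getint("n_subgrid"),
    }


settings = parse_settings(SETTINGS_TEXT)
print(settings["project"], settings["filters"], settings["distance_range"])
```

Because the settings are data rather than code, nothing has to be importable from the working directory; a script could read the file from whatever path it is given.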

@karllark
Member Author

karllark commented Mar 5, 2019

HDF5 info

python h5py closer to numpy than pytables:
http://docs.h5py.org/en/stable/faq.html#what-s-the-difference-between-h5py-and-pytables

pytables more database orientated and includes non-numpy data types:
http://www.pytables.org/FAQ.html#how-does-pytables-compare-with-the-h5py-project

@karllark
Member Author

karllark commented Mar 5, 2019

ASDF info

standard:
https://asdf-standard.readthedocs.io/en/stable/

python implementation:
https://github.com/spacetelescope/asdf

@karllark
Member Author

karllark commented Mar 5, 2019

@meredith-durbin : I have a memory of you comparing asdf and hdf5 for speed, but can't find where your results were documented (an issue maybe?). Can you provide a pointer to your results? Or am I remembering something incorrectly and you did not do this?

@lea-hagen
Member

Another possibly relevant piece of information: I got this warning when running one of the large files for PHAT production runs. I have no idea if switching file formats would make this better.

/user/lhagen/miniconda3/envs/ac/lib/python3.5/site-packages/tables/group.py:489: PerformanceWarning: group ``/`` is exceeding the recommended maximum number of children (16384); be ready to see PyTables asking for *lots* of memory and possibly slow I/O.

@karllark
Member Author

That does not sound good. Any more information on which file this is for? Maybe the lnp file?

@karllark
Member Author

karllark commented May 3, 2019


2 participants