including input table as metadata in Table like in astropy.io.fits and io.fits HDUList.data/header #6429

richardgmcmahon · 2017-08-07T19:28:44Z

astropy.io.fits has a method HDUList.filename() which returns the name of the file that was read in.

From a data provenance point of view this is very useful, and it would be helpful if the same functionality were also available in astropy Table.

It would also be convenient if the filename were carried forward into the Header and Data 'objects' in astropy.io.fits.

i.e. print(data.filename()) would print the filename that the header came from.

http://docs.astropy.org/en/stable/io/fits/api/hdulists.html

Thanks

MSeifert04 · 2017-08-08T14:24:17Z

I think it's not worth it to subclass np.ndarray just to add a filename attribute to the data property of HDUs. So that's a 👎
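For context, here is a rough sketch of what such a subclass would involve, following the usual NumPy subclassing pattern. The class name `FileArray` and its `filename` attribute are hypothetical and are not part of astropy or NumPy; this only illustrates the machinery being weighed here.

```python
import numpy as np


class FileArray(np.ndarray):
    """Hypothetical ndarray subclass that carries a filename attribute."""

    def __new__(cls, input_array, filename=None):
        # View the input as this subclass and attach the extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.filename = filename
        return obj

    def __array_finalize__(self, obj):
        # Called when views and slices are created; copy the attribute
        # over from the parent array when there is one.
        if obj is None:
            return
        self.filename = getattr(obj, "filename", None)


data = FileArray([[1, 2], [3, 4]], filename="image.fits")
sub = data[0]             # a slice is still a FileArray and keeps the attribute
plain = np.asarray(data)  # a base ndarray view has no filename attribute

print(sub.filename)
print(hasattr(plain, "filename"))
```

The attribute silently follows slices but disappears as soon as the data is converted back to a plain ndarray, so the extra class buys fairly fragile provenance.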

However, I'm undecided about having it on HDUs or headers. It probably wouldn't be hard to add, but it could be tricky if one puts HDUs from one file into a new HDUList. That could lead to inconsistent filenames inside one HDUList. It depends on how rigorously we want to keep the filenames in sync.

No idea about Tables. I thought they already had some meta in case it was read from a file.

richardgmcmahon · 2017-08-08T17:46:40Z

Thanks for the response. Just to add that, in principle, any user can already use the meta attribute of Table to store the input filename:

table.meta['filename'] = filename
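
A minimal sketch of how that could be wrapped into a helper, assuming only that the reader returns an object with a dict-like `meta` attribute (as astropy Table does). The helper name `read_with_filename` and the stand-in reader are hypothetical; storing the absolute path is just one possible choice.

```python
import os
from types import SimpleNamespace


def read_with_filename(path, reader):
    """Call reader(path) and record the absolute input path in the result's meta."""
    table = reader(path)
    table.meta["filename"] = os.path.abspath(path)
    return table


def fake_reader(path):
    # Stand-in so the sketch runs without astropy installed; in real use one
    # would pass astropy.table.Table.read as the reader instead.
    return SimpleNamespace(meta={})


t = read_with_filename("catalogue.fits", fake_reader)
print(t.meta["filename"])
```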

I am advocating that, from a scientific point of view, it would be good practice for the input filename to be a defined attribute. I would argue that it is just as important as image or table column units; the input filename is a fundamental descriptor.

mhvk · 2017-08-08T17:59:57Z

Agreed on the importance as well as on the difficulty: what happens when you create a new table, or join two tables? Or read from a filehandle? And should this particular bit of metadata be saved if you write to a file? If so, what should the filename be when the table is read back in? MIDAS partially solved this by having a HISTORY keyword that logged everything that happened, and even though this had obvious limitations, I found it incredibly useful. But it worked so well partly because every larger object was always a file, so it was meaningful to refer to things by their name. For Python/astropy, this is much less clear: column names are well-defined, but table names are not.
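
To make the join question concrete, here is one possible policy, sketched in plain Python: keep a list of input filenames in the metadata and take the ordered union when two tables are combined. The `filenames` key and the `merge_filenames` helper are hypothetical and do not describe how astropy merges metadata.

```python
def merge_filenames(meta_left, meta_right):
    """Return the de-duplicated input filenames of two meta dicts, in order."""
    merged = []
    for meta in (meta_left, meta_right):
        for name in meta.get("filenames", []):
            if name not in merged:
                merged.append(name)
    return merged


left_meta = {"filenames": ["a.fits"]}
right_meta = {"filenames": ["b.fits", "a.fits"]}

# Metadata for the joined table lists every contributing file exactly once.
joined_meta = {"filenames": merge_filenames(left_meta, right_meta)}
print(joined_meta["filenames"])
```

Even this simple policy leaves the other questions open, such as what to record for a table built in memory or read from a filehandle.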

MSeifert04 · 2017-08-08T22:16:28Z

I'm not arguing that the filename isn't important. That's one thing I really liked about ccdproc's ImageFileCollection.

However, cascading the filename can't be the only option. I agree that it might be important for Table and NDData, but for the low-level io.fits objects having the filename on the HDUList is (in my opinion) enough.

But I'm open to discussion about this, except for adding it to the HDU's data, because that would require subclassing np.ndarray just to add that attribute.

pllim · 2017-08-09T13:44:05Z

Most STScI FITS files have a FILENAME keyword in the primary header (EXT 0), which, if read into a Table, would be in mytable.meta['header']['FILENAME']. However, if I recall correctly, that keyword was added manually (please correct me if I am wrong) and can be outdated if a file is renamed.

But my point is, if you want the filename in Table metadata, the current solution is to use a FILENAME keyword in your FITS header.

I agree with the above point that it is difficult to keep a "filename" attribute in sync if it is carried around "officially". If I read in a file and then modify its buffer without saving it back to the same filename, the "filename" attribute can be misleading.

richardgmcmahon · 2017-11-19T10:12:25Z

For info, I have been reading an HDF5 file with h5py, and the file object has a filename attribute which stores the name of the input file. I think it would be good practice to have the same filename metadata in a Table.

e.g.
import h5py
h5 = h5py.File(infile, 'r')  # open the input file read-only
print('h5.filename:', h5.filename)

I accept the point that there is some danger if the table content is changed. Maybe this needs to be managed with another piece of metadata that indicates that the table has been changed. This could just be a boolean flag: True/False.

MSeifert04 · 2017-11-20T20:25:58Z

Yeah, but that h5py.File is really associated with a file. A Table reads the file, but after reading it is a different entity (maybe not if it's memory-mapped, but I'm not completely sure on that point).

And it's very easy to add it manually, so a simple custom wrapper would be enough to keep the filename around if one needs it. That also makes it clearer that it's the user's responsibility to keep the contents in sync.
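
As a rough illustration of such a wrapper, combined with the True/False "changed" flag suggested above, something along these lines would do. The `TrackedTable` class is hypothetical, and setting the flag is left entirely to the user.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TrackedTable:
    """Hypothetical wrapper pairing a table-like object with its input filename."""

    table: Any              # e.g. an astropy Table; any object works here
    filename: str           # name of the file the table was read from
    modified: bool = False  # set by the user once the contents change

    def mark_modified(self):
        # The wrapper cannot detect edits itself; the caller flags them.
        self.modified = True

    def provenance(self):
        suffix = " (modified since read)" if self.modified else ""
        return f"{self.filename}{suffix}"


tracked = TrackedTable(table={"a": [1, 2, 3]}, filename="catalogue.fits")
before = tracked.provenance()

tracked.table["a"].append(4)  # change the contents ...
tracked.mark_modified()       # ... and record that explicitly
after = tracked.provenance()

print(before)
print(after)
```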

pllim added the Feature Request, io.fits, and table labels on Aug 7, 2017
