Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round tripping non-float Quantities from QTable to fits converts all Quantities to floats #12494

Open
samaloney opened this issue Nov 18, 2021 · 6 comments · May be fixed by #12505
Open

Round tripping non-float Quantities from QTable to fits converts all Quantities to floats #12494

samaloney opened this issue Nov 18, 2021 · 6 comments · May be fixed by #12505

Comments

@samaloney
Copy link

samaloney commented Nov 18, 2021

Description

Trying to serialise a QTable to and from fits the non-floats are always converted to floats rather then keeping the original dtypes

Expected behavior

The created QTable to have the same dtype as the underling fits column.

Actual behavior

The created QTable has converted the uint and int to float64

Steps to Reproduce

import astropy.units as u
import numpy as np

from astropy.table import QTable
from astropy.io import fits

my_table = QTable()

my_table['uint'] = (list(range(10)) * u.m).astype(np.uint16)
my_table['int'] = (list(range(10)) * u.m).astype(np.int16)
my_table['float32'] = (list(range(10)) * u.m).astype(np.float32)

my_table.dtype
# dtype([('uint', '<u2'), ('int', '<i2'), ('float32', '<f4')])

my_table.write('test.fits')

table_fromfits = QTable.read('test.fits')
table_fromfits.dtype
#dtype([('uint', '<f8'), ('int', '<f8'), ('float32', '>f4')])

A simple but not elegant solution is to use the dtype info from the HDUL to cast the Quantity back to the desired dtype
This actually doesn't work for the uint case or cases with mixed quantities an non-quantities dtype([('uint', '<i2'), ('int', '<i2'), ('float32', '<f4')]) should be dtype([('uint', '<u2'), ('int', '<i2'), ('float32', '<f4')])

hdul = fits.open('test.fits')

hdul[1].data.columns

#ColDefs(
#    name = 'uint'; format = 'I'; unit = 'm'; bzero = 32768
#    name = 'int'; format = 'I'; unit = 'm'
#    name = 'float32'; format = 'E'; unit = 'm'
#)

fixed_table = QTable([u.Quantity(table_fromfits[c.name], dtype=c.dtype) for c in hdul[1].data.columns])

fixed_table.dtype
# dtype([('uint', '<i2'), ('int', '<i2'), ('float32', '<f4')])

System Details

Linux-5.4.0-33-generic-x86_64-with-glibc2.29
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0]
Numpy 1.20.2
pyerfa 1.7.3
astropy 4.3.1
Scipy 1.6.3
Matplotlib 3.4.1

@dhomeier
Copy link
Contributor

The second step of the workaround

fixed_table = QTable([u.Quantity(table_fromfits[c.name], dtype=c.dtype) for c in hdul[1].data.columns])

could be made a bit more convenient at least as
QTable(table_fromfits, dtype=table_fromfits.dtype)
by having QTable._convert_col_for_table apply the dtype from *kwargs individually to q_cls.

@dhomeier
Copy link
Contributor

dhomeier commented Nov 18, 2021

As for the I/O part, I had assumed that it was a design choice to construct all Quantity-type columns as float types, but am wondering now if this has in fact been overlooked when discarding the dtype info in _decode_mixins

# Use the `datatype` attribute info to update column attributes that are
# NOT already handled via standard FITS column keys (name, dtype, unit).
for col in info['datatype']:
for attr in ['description', 'meta']:

Edit: whatever the intended default behaviour, setting the attrs above will not make any difference (as the tbl[col['name']].info.dtype in question already/still are of integer type at that point); any datatype information would probably need to be passed on with the tbl.meta['__serialized_columns__'] dict to the _construct_mixin[s] utilities.

@dhomeier
Copy link
Contributor

@samaloney I have pushed a fix for correctly reading back the mixins in #12505 ; if you have chance to test if this addresses your problems, feedback would be great.

For a temporary workaround, as you noted that your use cases typically involve reading in large volumes of data, I strongly recommend to use getheader and getdata to read the tables in their original dtypes and then extract the units to directly construct
u.Quantity(tab[col], dtype=tab[col].dtype, name=col, unit=u.Unit(header.get('TUNITn'), format='fits')).
It's a bit cumbersome, but would avoid conversion losses and in particular the temporary creation of arrays 4x the original size or larger, in particular since the type conversion does not let you take advantage of memmap=True.

@samaloney
Copy link
Author

Sorry for the delay @dhomeier I've just loaded some of my fits files using your branch seems to work as expected the dtypes are respected.

@dhomeier
Copy link
Contributor

dhomeier commented Nov 25, 2021

Sorry for the delay @dhomeier I've just loaded some of my fits files using your branch seems to work as expected the dtypes are respected.

Thanks for the confirmation! Note that the currently favoured implementation will store the dtype information to its own __serialized_columns__ field and thus not recover that info from files written with the current Astropy version. So for existing files the header would still need to be fixed separately. Also we are still discussing in #12505 (comment) whether to set the original dtype automatically or require a new QTable.read() option.

@parejkoj
Copy link
Contributor

parejkoj commented Nov 8, 2022

Poking this one, as we've just discovered this behavior in LSST. At the very least, we should make the green note on this page more prominent.

Any kind of "changes everything to a float" behavior can be dangerous for bit fields, ids or large explicitly countable values. This is related to a problem we've had with pandas, where it changes dtypes to float in order to fill in missing data with NaN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants