You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When astropy.table reads or writes FITS tables, the TNULL parameter for each integer column is always reset to 999999.
When reading in a table, the reader respects the TNULL in the FITS header, such that the returned table (with masked columns) has the masks where the values in the column were TNULL, however the fill_value attribute of the column is set to 999999.
When writing a table with masked columns (with an arbitrary fill_value) the TNULL for the column is always set as 999999, regardless of the choice of fill_value.
This causes problems where a table read in, then written out may have its TNULL values changed, such that the input table is no longer equal to the output table. Not to mention, for an integer column, one may expect 999999 to be a valid (e.g. non-NULL) value, but with astropy.table this value can be inadvertently converted to NULL upon write.
I note that if a column with an arbitrary fill value has its filled() method called, the masked rows are filled with the correct value. Given that fill values seem to be respected in table operations within astropy.table, I suspect that in table I/O numpy's masked array default fill values are being used rather than those set for the column (e.g. column.fill_value), as numpy.ma's default fill value for ints is 999999. Also of note is that numpy.ma has the default float mask value to be 1E20. This seems to be the default fill_value for a masked column containing floats (e.g. see my example code).
My best guess is when reading a table/column in, it creates the masked array properly, but fails to propagate TNULL to MaskedColumn.fill_value. When writing the table, instead of using MaskedColumn.fill_value it uses np.ma.default_fill_value)
Expected behavior
For a table being read in, if an integer column's TNULL ise set to, say, -1, then that column's fill_value attribute should be equal to -1, not 999999.
For a table being written out, if a given column's fill_value attribute is set to, say, -1, then TNULL for that column should be -1 in the FITS header, not 999999
How to Reproduce
The below code illustrates the behaviour.
First a table with a float32 column and an int32 column is created with some masked values (using BinTableHDU which correctly fills in the NULL values). The null value for the integer column is set to -1. This table then read in by astropy.table, and we find the TNULL for the integer column has been changed to 999999. Upon writing this table out again, TNULL is set to 999999 in the output table.
Secondly we create a table using astropy.table with masked columns. Again, the integer column has its null/fill value set to -1. Using Table.filled(), we note that the masked integer values are correctly filled with -1. Upon writing this table out, the masked elements are filled with 999999, and the TNULL property for that column is set to 999999.
importnumpyasnpfromastropy.tableimportTable, Column, MaskedColumnfromastropy.ioimportfitsfromastropyimport__version__INT_NULL=-1print(f"Using Astropy {__version__}")
print("")
print(f"NULL for integers has been defined as {INT_NULL}")
rng=np.random.RandomState(seed=1)
# generate some integer and float dataint_data=np.asarray([iforiinrange(10)], dtype=np.int32)
float_data=np.asarray([float(i) foriinrange(10)], dtype=np.float32)
mask=rng.randint(2,size=10, dtype=bool)
masked_rows= [ ifori, maskedinenumerate(mask) ifmasked]
# First of all, write a (correct) table with NULL values in columns, and see how astropy.Table handles theseprint("")
print("Create table with BinTableHDU:")
# make table via astropy.io.fits floats=np.copy(float_data)
floats[mask] =np.NaNints=np.copy(int_data)
ints[mask] =-1c1=fits.Column(name="float_col", format="D", array=floats)
# This column's NULL will be defined in the TNULL2 card in the table's FITS headerc2=fits.Column(name="int_col", format="K", null=INT_NULL, array=ints)
hdu=fits.BinTableHDU.from_columns([c1,c2])
hdul=fits.HDUList([fits.PrimaryHDU(), hdu])
hdul.writeto("t_io_fits.fits", overwrite=True)
# open this table (with astropy.io.fits) and check what TNULL2 is (should be INT_NULL as defined above)withfits.open("t_io_fits.fits") ashdul:
header=hdul[1].headernull=header["TNULL2"]
print(f"BinTableHDU table has TNULL2 as {null}. Correct? {null==INT_NULL}")
# Now read this table in with astropy.tablet=Table.read("t_io_fits.fits")
print(f"Rows that should be masked are {masked_rows}")
print("Table read in by astropy.table:")
print(t)
null=t["int_col"].fill_valueprint(f"Table read in by astropy.table has TNULL2 as {null}. Correct? {null==INT_NULL}")
float_null=t["float_col"].fill_valueprint(f"Float column has default fill value of {float_null}")
# write the table out with astropy.table, reading in with astropy.io.fits and checking what null ist.write("t_table.fits", overwrite=True)
withfits.open("t_table.fits") ashdul:
header=hdul[1].headernull=header["TNULL2"]
print(f"Table written by astropy.table has TNULL2 as {null}. Correct? {null==INT_NULL}")
# Now try creating a table with astropy.tableprint("")
print("Create table with astropy.Table:")
c1=MaskedColumn(name="float_col", data=float_data, mask=mask, fill_value=np.NaN)
c2=MaskedColumn(name="int_col", data=int_data, mask=mask, fill_value=-1)
t2=Table(data= [c1,c2])
null=t2["int_col"].fill_valueprint(f"Table created with Table has TNULL2 as {null}. Correct? {null==INT_NULL}")
# Try the filled method of the tablet3=t2.filled()
masked_vals=t3["int_col"][mask][0]
print(f"Filled table has NULL values as {masked_vals}. Correct? {masked_vals==INT_NULL}")
# write the (unfilled) table outt2.write("t2.fits", overwrite=True)
withfits.open("t2.fits") ashdul:
header=hdul[1].headernull=header["TNULL2"]
print(f"Output Table has TNULL2 as {null}. Correct? {null==INT_NULL}")
Output of code snippet:
Using Astropy 5.1.1
NULL for integers has been defined as -1
Create table with BinTableHDU:
BinTableHDU table has TNULL2 as -1. Correct? True
Rows that should be masked are [0, 2, 5]
Table read in by astropy.table:
float_col int_col
--------- -------
-- --
1.0 1
-- --
3.0 3
4.0 4
-- --
6.0 6
7.0 7
8.0 8
9.0 9
Table read in by astropy.table has TNULL2 as 999999. Correct? False
Float column has default fill value of 1e+20
Table written by astropy.table has TNULL2 as 999999. Correct? False
Create table with astropy.Table:
Table created with Table has TNULL2 as -1. Correct? True
Filled table has NULL values as -1. Correct? True
Output Table has TNULL2 as 999999. Correct? False
Thank you for reporting and looking into this! Since you basically know where the bug is, would you be interested in submitting a patch to get credit for fixing it? Usually it would also come with a change log and a regression test.
Description
When
astropy.table
reads or writes FITS tables, the TNULL parameter for each integer column is always reset to 999999.When reading in a table, the reader respects the TNULL in the FITS header, such that the returned table (with masked columns) has the masks where the values in the column were TNULL, however the
fill_value
attribute of the column is set to 999999.When writing a table with masked columns (with an arbitrary
fill_value
) the TNULL for the column is always set as 999999, regardless of the choice offill_value
.This causes problems where a table read in, then written out may have its TNULL values changed, such that the input table is no longer equal to the output table. Not to mention, for an integer column, one may expect 999999 to be a valid (e.g. non-NULL) value, but with astropy.table this value can be inadvertently converted to NULL upon write.
I note that if a column with an arbitrary fill value has its
filled()
method called, the masked rows are filled with the correct value. Given that fill values seem to be respected in table operations withinastropy.table
, I suspect that in table I/O numpy's masked array default fill values are being used rather than those set for the column (e.g.column.fill_value
), asnumpy.ma
's default fill value for ints is 999999. Also of note is thatnumpy.ma
has the default float mask value to be 1E20. This seems to be the defaultfill_value
for a masked column containing floats (e.g. see my example code).My best guess is when reading a table/column in, it creates the masked array properly, but fails to propagate TNULL to
MaskedColumn.fill_value
. When writing the table, instead of usingMaskedColumn.fill_value
it usesnp.ma.default_fill_value
)Expected behavior
For a table being read in, if an integer column's TNULL ise set to, say, -1, then that column's
fill_value
attribute should be equal to -1, not 999999.For a table being written out, if a given column's
fill_value
attribute is set to, say, -1, then TNULL for that column should be -1 in the FITS header, not 999999How to Reproduce
The below code illustrates the behaviour.
First a table with a float32 column and an int32 column is created with some masked values (using
BinTableHDU
which correctly fills in the NULL values). The null value for the integer column is set to -1. This table then read in byastropy.table
, and we find the TNULL for the integer column has been changed to 999999. Upon writing this table out again, TNULL is set to 999999 in the output table.Secondly we create a table using
astropy.table
with masked columns. Again, the integer column has its null/fill value set to -1. UsingTable.filled()
, we note that the masked integer values are correctly filled with -1. Upon writing this table out, the masked elements are filled with 999999, and the TNULL property for that column is set to 999999.Output of code snippet:
Versions
Linux-5.19.0-40-generic-x86_64-with-glibc2.35
Python 3.10.6
astropy 5.1.1
Numpy 1.21.5
pyerfa 2.0.0.1
Scipy 1.8.0
The text was updated successfully, but these errors were encountered: