FITS BINTABLE HDUs needlessly strip trailing whitespace from string fields #11341

Open
embray opened this issue Feb 22, 2021 · 0 comments
Labels
API change PRs and issues that change an existing API, possibly requiring a deprecation period Bug Effort-medium io.fits Package-intermediate

embray commented Feb 22, 2021

Description

This is a follow-up to #11312.

Expected behavior

When saving string data in a FITS binary table column, the data should be preserved as-written (assuming the data fits in the column width).

Actual behavior

However, if a string contains trailing whitespace (particularly spaces, as FITS allows any printable ASCII characters in text columns), that trailing whitespace is removed. This goes so far as to modify the data of existing FITS files when they are opened in update mode (#11312).

Steps to Reproduce

The following demonstrates the problem simply:

>>> from astropy.io import fits
>>> import numpy as np
>>> data = np.array([b'abc', b'ab ', b'a  ', b'   '], dtype=[('a', 'S3')])
>>> data['a']
array([b'abc', b'ab ', b'a  ', b'   '], dtype='|S3')
>>> hdu = fits.BinTableHDU.from_columns(data)
>>> hdu.header
XTENSION= 'BINTABLE'           / binary table extension                         
BITPIX  =                    8 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                    3 / length of dimension 1                          
NAXIS2  =                    4 / length of dimension 2                          
PCOUNT  =                    0 / number of group parameters                     
GCOUNT  =                    1 / number of groups                               
TFIELDS =                    1 / number of table fields                         
TTYPE1  = 'a       '                                                            
TFORM1  = '3A      '                                                            
>>> hdu.writeto('a.fits')
>>> with fits.open('a.fits') as hdul:
...     print(hdul[1].data)
...     print(hdul[1].data.tobytes())
... 
[('abc',) ('ab',) ('a',) ('',)]
b'abcab\x00a\x00\x00 \x00\x00'

The last line demonstrates that this is not just an issue of how the string field is displayed when printing: the underlying bytes are modified as well. Weirdly, for the last row, a single leading space is kept but the following ones are replaced with zeros.
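One way to read the raw bytes above is to reinterpret them as a plain NumPy `S3` array, outside of astropy. This is only a sketch: the byte string is copied from the output above, and the variable names (`raw`, `original`, `changed`) are illustrative. NumPy treats trailing NUL bytes in fixed-width byte strings as padding, so comparing against the original input shows exactly which spaces were replaced.

```python
import numpy as np

# Raw table bytes copied from the tobytes() output above (4 rows x 3 bytes).
raw = b'abcab\x00a\x00\x00 \x00\x00'

# Reinterpret as fixed-width 3-byte strings. NumPy drops trailing NULs
# (but not spaces) when it returns each element.
rows = np.frombuffer(raw, dtype='S3').tolist()
print(rows)  # [b'abc', b'ab', b'a', b' ']

# The bytes that were originally written, for comparison.
original = b''.join([b'abc', b'ab ', b'a  ', b'   '])

# Byte offsets where a space was replaced by NUL.
changed = [i for i, (x, y) in enumerate(zip(original, raw)) if x != y]
print(changed)  # [5, 7, 8, 10, 11]
```

Offset 9, the first byte of the last row, is the one space that survives.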

Additional Background

The current behavior is as designed. Previously there was a belief that this might be required by the FITS standard, but actually we can find no evidence that that is the case in the latest draft (if it was in a previous draft I can't find that either). However, the existing functionality has been in place--probably just a side effect of implementation details of Numpy at the time--since early versions of PyFITS.
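For comparison, here is a small sketch of what current NumPy does on its own with the same strings. It says nothing about the NumPy versions in use during the early PyFITS days, and it does not claim to reproduce how `io.fits` is implemented; it only shows that a fixed-width byte-string array keeps trailing spaces unless something strips them explicitly.

```python
import numpy as np

# Same strings as in the example above, as a plain (non-structured) array.
data = np.array([b'abc', b'ab ', b'a  ', b'   '], dtype='S3')

# Round-trip through raw bytes without astropy: trailing spaces survive.
restored = np.frombuffer(data.tobytes(), dtype='S3')
print(restored.tolist())  # [b'abc', b'ab ', b'a  ', b'   ']

# An explicit right-strip yields the same values that were printed
# for the table read back from a.fits.
stripped = np.char.rstrip(restored).tolist()
print(stripped)  # [b'abc', b'ab', b'a', b'']
```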

Since the existing functionality has been in place for such a long time, care needs to be taken in removing it.

Proposed Solution

See #11312 (comment)

@embray embray added io.fits Bug Package-intermediate Effort-medium Priority-Medium API change PRs and issues that change an existing API, possibly requiring a deprecation period labels Feb 22, 2021