Loading and saving fits file multiply file size by 2 #16325

Closed
lelouedec opened this issue Apr 23, 2024 · 12 comments

Comments

@lelouedec

Description

We are processing a large number of FITS files to build a dataset and noticed that files which are loaded, modified, and saved again come out twice as large, even though the data are the same and the header is only slightly different.
To verify this behavior I tried the following:

from astropy.io import fits 
filea = fits.open("20230515_000831_s4h1A.fts")
filea.writeto("20230515_000831_s4h1A.fts",overwrite=True)

and

from astropy.io import fits 
filea = fits.open("20230515_000831_s4h1A.fts")
data = filea[0].data
header = filea[0].header
thudl = fits.PrimaryHDU(data,header)
thudl.writeto("20230515_000831_s4h1A.fts",overwrite=True)

In both cases, starting from a freshly downloaded raw file, the resulting file is 8.4 MB instead of the original 4.2 MB.

Expected behavior

Saved and loaded files should be the same size.

How to Reproduce

Use the code above with a file like this one:
https://stereo-ssc.nascom.nasa.gov/pub/ins_data/secchi/L0/a/img/hi_1/20240120/20240120_004831_s4h1A.fts

Versions

astropy==5.3

@lelouedec lelouedec added the Bug label Apr 23, 2024

@neutrinoceros
Contributor

Thank you for reporting this. Without knowing the details too well, is there any chance you are reading single-precision data that is then being re-exported as double precision?

@lelouedec
Author

I was wondering that as well, but in the case where I write the HDUList directly after reading it, I couldn't tell whether astropy was somehow converting the precision on read.

@cmarmo
Member

cmarmo commented Apr 23, 2024

Hi @lelouedec, I've downloaded the file you linked above (20240120_004831_s4h1A.fts) and executed your commands with the latest development version of astropy: I was unable to reproduce your issue.

Do you mind checking on your side with a recent version of astropy? Thanks!

@lelouedec
Author

lelouedec commented Apr 23, 2024

Hey, so I retried opening and then saving the HDUList, and it is saved with the same size. But if I do the following:

from astropy.io import fits 
filea = fits.open("20230515_000831_s4h1A.fts")
data = filea[0].data
header = filea[0].header
thudl = fits.ImageHDU(data,header)
thudl.writeto("20230515_000831_s4h1A.fts",overwrite=True)

it still becomes twice the size.

The weird thing is that filea.writeto saves it as 4 MB, while filea[0].filebytes() returns 8 MB:

>>> filea = fits.open("20230519_000831_s4h1A.fts")
>>> filea.writeto("test.fts",overwrite=True)
>>> filea[0].filebytes()
4216320
>>> filea[0].filebytes()
4216320
>>> filea.writeto("test.fts",overwrite=True)
>>> hdul = fits.ImageHDU(filea[0].data,filea[0].header)
>>> filea[0].filebytes()
8409600
>>> filea.writeto("test.fts",overwrite=True)

It must have something to do with references: the first test.fts is 4.2 MB, and the last one saved is 8 MB.
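
As a rough cross-check of the two filebytes() values above, the sketch below recomputes them from the FITS layout, in which the header and the data are each padded to whole 2880-byte blocks. The 1024 × 1024 image shape and the 7-block header are assumptions that do not appear in the thread; they were picked because they reproduce both numbers exactly, which would be consistent with the pixel width going from 4 to 8 bytes.

```python
FITS_BLOCK = 2880  # FITS headers and data units are padded to 2880-byte blocks


def padded(nbytes):
    """Round a byte count up to a whole number of FITS blocks."""
    return -(-nbytes // FITS_BLOCK) * FITS_BLOCK


# Assumptions (not stated in the thread): a 1024 x 1024 image and a
# 7-block header, chosen because they reproduce both reported sizes.
n_pixels = 1024 * 1024
header_bytes = 7 * FITS_BLOCK

size_int32 = header_bytes + padded(n_pixels * 4)    # 4 bytes per pixel
size_float64 = header_bytes + padded(n_pixels * 8)  # 8 bytes per pixel

print(size_int32, size_float64)  # 4216320 8409600
```

The ratio is not exactly 2 because the header blocks stay the same size while only the data unit doubles.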

@cmarmo
Member

cmarmo commented Apr 23, 2024

Your FITS file contains an image in the primary HDU and a BinTableHDU as the first extension.

When you read and save the file, nothing changes (at least with a recent version of astropy... :) )

When you extract the image extension, it is saved with

XTENSION= 'IMAGE   '           / Image extension                                
BITPIX  =                  -64 / array data type                                

as @neutrinoceros said.

I guess there is a way to tell astropy which type we want to write back? Sorry, I cannot find it right now...

@lelouedec
Author

Ah, I see! There was indeed some kind of conversion under the hood when taking the data and header and putting them back into an ImageHDU, or when calling fits.writeto with the data and header directly.

I am also trying to see whether this can be done directly with astropy. I have a few thousand files that I already saved with the wrong size, and it would be great to just reopen them and save them correctly ^^

@saimn
Contributor

saimn commented Apr 23, 2024

Your data is stored as int32 but with BSCALE=1.0 / BZERO=0.0 which causes a conversion to float64 when you access .data:

BITPIX  =                   32 /  32-bit twos complement binary integer         
BSCALE  =              1.00000 /                                                
BZERO   =              0.00000 /      

So when you access filea[0].data the data is rescaled to float64 which then causes PrimaryHDU(data,header) to be written as float.

To avoid the conversion:

  • force conversion to int32 (data.astype("int32"); BSCALE / BZERO will be removed in the output file)
  • or force conversion with BSCALE=1.0 / BZERO=0.0: thudl.scale("int32", bscale=1, bzero=0)
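
A minimal sketch of the first option, run on a small synthetic float64 array rather than the SECCHI file. It only illustrates that the BITPIX astropy records follows the dtype of the array handed to PrimaryHDU; the array contents and variable names are made up for illustration.

```python
import numpy as np
from astropy.io import fits

# Synthetic stand-in for the rescaled pixel array: float64, integer-valued.
scaled = np.arange(12, dtype=np.float64).reshape(3, 4)

# Handing the float64 array to PrimaryHDU as-is gives a 64-bit float image.
as_float = fits.PrimaryHDU(scaled)

# Casting to int32 first gives a 32-bit integer image, half the bytes per pixel.
as_int = fits.PrimaryHDU(scaled.astype(np.int32))

bitpix_float = as_float.header["BITPIX"]
bitpix_int = as_int.header["BITPIX"]
print(bitpix_float, bitpix_int)
```

The cast is only lossless when the float values are exact integers within the int32 range, which is worth checking before overwriting anything.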

@lelouedec
Author

Ah, I see! Thanks a lot for the help. I am going to write a small script that converts all the previously saved FITS files to 32-bit precision! Which should also be float32, right?

Thanks again


@dhomeier
Contributor

Which should also be float32, right?

Per the workflows above, it should become int32 again; float32 might result in a loss of precision.
In theory you might still see some round-off errors from converting first to float64 and then back to int32, so the best way to keep the original data intact might be, as suggested by @astrofrog:

hdul = fits.open('20240120_004831_s4h1A.fts', do_not_scale_image_data=True)
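
One way this option could be used when re-saving from the original files is sketched below. The helper name copy_without_rescaling and the temporary file names are hypothetical, and the self-check runs on a small synthetic int32 image rather than the SECCHI data, so it shows the mechanics only.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits


def copy_without_rescaling(src, dst):
    """Copy the primary HDU of `src` to `dst` without applying BSCALE/BZERO."""
    with fits.open(src, do_not_scale_image_data=True) as hdul:
        primary = hdul[0]
        # With scaling disabled, .data keeps the integer dtype stored on disk.
        fits.PrimaryHDU(primary.data, primary.header).writeto(dst, overwrite=True)


# Self-contained check on a synthetic int32 image (hypothetical file names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src.fits")
dst = os.path.join(workdir, "dst.fits")
fits.PrimaryHDU(np.arange(12, dtype=np.int32).reshape(3, 4)).writeto(src)

copy_without_rescaling(src, dst)

same_size = os.path.getsize(src) == os.path.getsize(dst)
out_bitpix = fits.getheader(dst)["BITPIX"]
print(same_size, out_bitpix)
```

Only the primary HDU is copied here, so the binary table extension mentioned earlier in the thread would have to be carried over separately. Files that were already written as float64 are not helped by this flag and would still need an explicit cast back to int32.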


@github-actions github-actions bot closed this as completed May 2, 2024