Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalogs headers from products of dev1 pipeline cannot be read by external libraries #438

Closed
simonegarrappa opened this issue May 6, 2024 · 12 comments
Assignees

Comments

@simonegarrappa
Copy link
Collaborator

After processing data with pipeline dev1 version (as of this message date) I cannot read the catalogs header with external libraries (e.g. astropy).

Example file:
/last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/005030v0/LAST.01.04.01_20240504.005310.921_clear_OP313_009_001_016_sci_proc_Cat_1.fits

Error:
I get an error when trying to read the hdu containing the header (2), I'm copying below full error log if it can be of any use. It looks like there is a float where there should be an int, somewhere.

File ~/Analysis/pyabscal/lastcatutils.py:27, in LastCatUtils.tables_from_lastcat(self, catfile)
25 hdu = pyfit.open(catfile)
26 last_cat = hdu[1].data
---> 27 info_cat = hdu[2]
29 return last_cat, info_cat

File ~/anaconda3/envs/LAST/lib/python3.9/site-packages/astropy/io/fits/hdu/hdulist.py:379, in HDUList.getitem(self, key)
375 # Originally this used recursion, but hypothetically an HDU with
376 # a very large number of HDUs could blow the stack, so use a loop
377 # instead
378 try:
--> 379 return self._try_while_unread_hdus(
380 super().getitem, self._positive_index_of(key)
381 )
382 except IndexError as e:
383 # Raise a more helpful IndexError if the file was not fully read.
384 if self._read_all:

File ~/anaconda3/envs/LAST/lib/python3.9/site-packages/astropy/io/fits/hdu/hdulist.py:1248, in HDUList._try_while_unread_hdus(self, func, *args, **kwargs)
1246 return func(*args, **kwargs)
1247 except Exception:
-> 1248 if self._read_next_hdu():
1249 continue
1250 else:

File ~/anaconda3/envs/LAST/lib/python3.9/site-packages/astropy/io/fits/hdu/hdulist.py:1290, in HDUList._read_next_hdu(self)
1287 offset = last._data_offset + last._data_size
1288 fileobj.seek(offset, os.SEEK_SET)
-> 1290 hdu = _BaseHDU.readfrom(fileobj, **kwargs)
1291 except EOFError:
1292 self._read_all = True

File ~/anaconda3/envs/LAST/lib/python3.9/site-packages/astropy/io/fits/hdu/base.py:366, in _BaseHDU.readfrom(cls, fileobj, checksum, ignore_missing_end, **kwargs)
360 hdu = cls._readfrom_internal(
361 fileobj, checksum=checksum, ignore_missing_end=ignore_missing_end, **kwargs
362 )
364 # If the checksum had to be checked the data may have already been read
365 # from the file, in which case we don't want to seek relative
--> 366 fileobj.seek(hdu._data_offset + hdu._data_size, os.SEEK_SET)
367 return hdu

File ~/anaconda3/envs/LAST/lib/python3.9/site-packages/astropy/io/fits/file.py:428, in _File.seek(self, offset, whence)
426 if not hasattr(self._file, "seek"):
427 return
--> 428 self._file.seek(offset, whence)
429 pos = self._file.tell()
430 if self.size and pos > self.size:

TypeError: 'float' object cannot be interpreted as an integer

@agioffe
Copy link
Collaborator

agioffe commented May 8, 2024

Can you read this header with some other tools, like HEASARC ftools/fv or ftools/fdump?

@agioffe
Copy link
Collaborator

agioffe commented May 20, 2024

The problematic file is already not there?
ocs@last04e> ls /last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/005030v0/LAST.01.04.01_20240504.005310.921_clear_OP313_009_001_016_sci_proc_Cat_1.fits
ls: cannot access '/last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/005030v0/LAST.01.04.01_20240504.005310.921_clear_OP313_009_001_016_sci_proc_Cat_1.fits': No such file or directory
ocs@last04e> ls /last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/005030v0/
ls: cannot access '/last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/005030v0/': No such file or directory
ocs@last04e> ls /last04e/data1/archive/LAST.01.04.01/2024/05/03/proc/
ocs@last04e>

@agioffe
Copy link
Collaborator

agioffe commented May 20, 2024

With fv or fdump this file is read fine without errors, while when I read it with matlab as
AH = AstroHeader('LAST.01.04.01_20240504.005310.921_clear_OP313_009_001_016_sci_proc_Cat_1.fits',2)
and ask for data types of AH.Data, I am indeed getting doubles where there should be integers:

AH.Data
96×3 cell array
{'XTENSION'} {'BINTABLE' } {'binary table extension'}
{'BITPIX' } {[ 8]} {'8-bit bytes' }
{'NAXIS' } {[ 2]} {'2-dimensional binary…'}
{'NAXIS1' } {[ 320]} {'width of table in by…'}
{'NAXIS2' } {[ 305]} {'number of rows in ta…'}
{'PCOUNT' } {[ 0]} {'size of special data…'}
{'GCOUNT' } {[ 1]} {'one data group (requ…'}
{'TFIELDS' } {[ 40]} {'number of fields in …'}
for i=2:8, class(AH.Data{i,2})
end

ans = 'double'
ans = 'double'
ans = 'double'
ans = 'double'
ans = 'double'
ans = 'double'
ans = 'double'

Matlab is not sensitive to it, but probably, python is, if astropy tells it that some of these keywords should be of a different type.

As far as I understand, a relatively recent novelty that we have introduced is that we use a MEX function to write FITS headers.
For example, for the catalogs we call it from FITS.writeTable1 as

io.fits.mex.mex_fits_table_write_image_header(Header.Header, FileName);

@chentishler , is it possible that the MEX function puts doubles instead of integers?

@agioffe
Copy link
Collaborator

agioffe commented May 22, 2024

It came out that io.fits.mex.mex_fits_table_write_image_header adds a decimal point (".") to all the double values in a header without a decimal point, so there are two ways to solve it:

  1. a local one, when we change the type of NAXIS* and BITPIX keywords to integer in the header just before it is passed to io.fits.mex.mex_fits_table_write_image_header (I have added some lines to FITS.writeTable1 to make sure it works fine, but I did not push it yet)
  2. a global one, when we should somehow check that all the values written to NAXIS* and BITPIX rows of an AstroHeader are of some integer type (uint16 and int8, respectively).

@EranOfek , what is you opinion?

@EranOfek
Copy link
Owner

EranOfek commented May 22, 2024 via email

@agioffe
Copy link
Collaborator

agioffe commented May 22, 2024

How do you see it? We could make a dictionary of FITS header keywords which should take only integer values and implement checks/conversions into FITS.readHeader1, but any later assignment (e.g., when we cut an image and thus change NAXIS1 and NAXIS2) could overwrite them to double again. I am just wondering if it is possible to do it in some general way or we would have to implement explicit checks at any header operation.

@agioffe
Copy link
Collaborator

agioffe commented Jun 9, 2024

Looks like due to the improvements introduced by @chentishler the headers are fine and do not crash astropy anymore.

@agioffe agioffe closed this as completed Jun 9, 2024
@simonegarrappa simonegarrappa reopened this Oct 8, 2024
@simonegarrappa
Copy link
Collaborator Author

Reopening this issue:

I have found an header with a -nan value for the key 'FWHM'. This value is not compatible to FITS standard and makes astropy crash with the following error

VerifyError: Unparsable card (FWHM), fix it first with .verify('fix').

Usually astropy runs by itself this .verify('fix') method but it is not able to fix the header entry in this case.

The catalog is

mnt/marvin/LAST.01.05.04/2024/08/02/proc/012938v0/LAST.01.05.04_20240803.013458.549_clear_4Cp2750_017_001_009_sci_proc_Cat_1.fits

The issue is very rare since I've found this error only after reading > 100k catalogs.

@agioffe
Copy link
Collaborator

agioffe commented Oct 8, 2024

OK, I'll check it out.

@EastEriq
Copy link
Collaborator

EastEriq commented Oct 8, 2024

For my part, I can say that there are many numeric keys in the raw image headers which could happen to get a NaN value in some circumstances. What would you like me to do? One option could be to leave them like that and rely on the sanitisation done by some reader; another may bet to omit writing altogether the keys which are Nan; a third option would be to use a sentinel unphysical value, like -99999, but I'd find it more prone to misinterpretation.

@agioffe
Copy link
Collaborator

agioffe commented Oct 9, 2024

I can see that in the relevant coadd the FWHM is non-NaN (~2.1), will try to reproduce this in the proc image.

@agioffe
Copy link
Collaborator

agioffe commented Oct 9, 2024

@simonegarrappa, thank you for this finding! After some investigation I found that the FWHM function we are employing was used in the pipleline with a non-optimal step parameter. In the case of a very good and sharp FWHM this led no NaN, and in other cases -- to some overestimation of the FWHM value. Now it is fixed and pushed to dev1: in imProc.psf.fwhm the explicit 'Step' parameter is now passed to imUtil.psf.curve_of_growth as Obj(Iobj).PSFData.fwhm('curveArgs',{'Step',0.5});

@agioffe agioffe closed this as completed Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants