Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FITS_rec._scale_back(): Incorrectly checks heapsize for TFORMn='Q' #14810

Merged
merged 1 commit into from May 16, 2023

Conversation

BillCleveland-USRA
Copy link
Contributor

Description

Package: astropy.io.fits
Module: fitsrec.py

If a table is created with a column using TFORMn="Q(...)" and the heapsize becomes greater than 2**31, a ValueError that recommends the user to use the "Q" format is incorrectly raised.

The problem originates in line 1133 of fitsrec.py within the "main" branch:

                if heapsize >= 2**31:
                    raise ValueError(
                        "The heapsize limit for 'P' format has been reached. "
                        "Please consider using the 'Q' format for your file."
                    )

Since _FormatQ is a subclass of _FormatP, the condition at line 1110 if isinstance(recformat, _FormatP): is True for both types. The above heapsize check is performed when 1110 condition is True.

The fix is to modify line 1133 of fitsrec.py to:

              if type(recformat) == _FormatP and heapsize >= 2**31:
                    raise ValueError(
                        "The heapsize limit for 'P' format has been reached. "
                        "Please consider using the 'Q' format for your file."
                    )

This causes the heapsize check to be only performed when recformat's type is _FormatP.

Fixes #14808

@github-actions
Copy link

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see "When to rebase and squash commits".
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@pllim pllim added this to the v5.0.7 milestone May 11, 2023
@pllim pllim added Bug 💤 backport-v5.0.x on-merge: backport to v5.0.x 💤 backport-v5.2.x on-merge: backport to v5.2.x 💤 backport-v5.3.x on-merge: backport to v5.3.x labels May 11, 2023
@pllim
Copy link
Member

pllim commented May 11, 2023

Thank you! Since this is a bug fix, it would also need a change log in https://github.com/astropy/astropy/tree/main/docs/changes/io.fits , following instructions laid out by https://github.com/astropy/astropy/blob/main/docs/changes/README.rst .

Usually for bug fix, we also recommend adding a regression test though with only enough data to reproduce the problem. Not sure if it is easy to do in your case or not, so I'll leave that to the FITS maintainers to decide.

@pllim pllim requested a review from astrofrog May 11, 2023 01:51
@cmarmo
Copy link
Member

cmarmo commented May 11, 2023

Hello!
If I may step in ... I plead guilty for introducing the bug in the code... thanks @BillCleveland-USRA for fixing it!
The related test is here:

@pytest.mark.skipif(sys.maxsize < 2**32, reason="requires 64-bit system")
@pytest.mark.skipif(sys.platform == "win32", reason="Cannot test on Windows")
@pytest.mark.hugemem
def test_heapsize_P_limit(self):
"""
Regression test for https://github.com/astropy/astropy/issues/10812
Check if the error is raised when the heap size is bigger than what can be
indexed with a 32 bit signed int.
"""
# a matrix with variable length array elements is created
nelem = 2**28
matrix = np.zeros(1, dtype=np.object_)
matrix[0] = np.arange(0.0, float(nelem + 1))
col = fits.Column(name="MATRIX", format=f"PD({nelem})", unit="", array=matrix)
t = fits.BinTableHDU.from_columns([col])
t.name = "MATRIX"
with pytest.raises(
ValueError, match="Please consider using the 'Q' format for your file."
):
t.writeto(self.temp("matrix.fits"))

I think it would be nice to check the fact that for Q format the error is not raised.

@pllim
Copy link
Member

pllim commented May 11, 2023

@cmarmo , just for the record, introduced in #13429 ? In that case, existing milestone still stands. Thanks for the clarification!

@BillCleveland-USRA , at some point, the commits will need to be squashed, but maybe after adding that test? Thanks!

@cmarmo
Copy link
Member

cmarmo commented May 11, 2023

just for the record, introduced in #13429 ?

Yes, indeed. 🙁

@BillCleveland-USRA
Copy link
Contributor Author

@cmarmo When I modified 5.2.2 and used it, I no longer experienced the error while processing large data files.

I'm not sure I have time to come up with a regression test for it, but I provided the following instructions on how to test this with our software:

Download and install the latest FTOOLS from the HEASARC
https://heasarc.gsfc.nasa.gov/lheasoft/download.html

IXPE python tools uses Astropy to read and process FITS files.

Download the data files
https://heasarc.gsfc.nasa.gov/FTP/ixpe/data/obs/02/02001099/event_l1/ixpe02001001_det1_evt1_v03.fits.gz
https://heasarc.gsfc.nasa.gov/FTP/ixpe/data/obs/02/02001099/hk/ixpe02001001_det1_att_v01.fits.gz

Run our IXPE utility:

ixpedet2j2000.py sc=y clobber=y \
  infile=ixpe02001001_det1_evt1_v03.fits.gz \
  attitude=ixpe02001001_det1_att_v01.fits.gz \
  outfile=det2j2000_out_1.fits

@cmarmo
Copy link
Member

cmarmo commented May 11, 2023

I'm not sure I have time to come up with a regression test for it,

You might want to reuse my test, modifying the type of the column

import numpy as np
from astropy.io import fits

# a matrix with variable length array elements is created
nelem = 2**28
matrix = np.zeros(1, dtype=np.object_)
matrix[0] = np.arange(0.0, float(nelem + 1))

col = fits.Column(name="MATRIX", format=f"QD({nelem})", unit="", array=matrix)

t = fits.BinTableHDU.from_columns([col])
t.name = "MATRIX"

t.writeto("matrix.fits", overwrite=True)

this code throws the error and it shouldn't.

@BillCleveland-USRA
Copy link
Contributor Author

Sure thing. I'll push an update later today.

Added condition type(recformat) == _FormatP to mitigate the issue cause by _FormatQ is a subclass of _FormatP.
Added regress test for Issue 14808.
Added "what's new" file for pull 14810.
@BillCleveland-USRA
Copy link
Contributor Author

I added the regression test, and confirmed its validity by:

  • Removing the fix, running the test, and confirming that it fails due to the ValueError being raised.
  • Reinserting the fix, rerunning the test, and confirming it now passes thanks to the ValueError not being raised.

I believe I've done all the required tasks:

  • Created an issue
  • Created a branch named for the issue
  • Modified the code to fix the issue
  • Added the regression test to verify the issue has been fixed
  • Created a pull request that fixes the issue
  • Added a "what's new" file for the pull request
  • Squashed all the commits into a single one.

Thanks for your patience while I learn the process. I'm ready to contribute more in the future.

Copy link
Contributor

@saimn saimn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @BillCleveland-USRA for the PR (and @cmarmo for the review :)).

@saimn saimn merged commit 5e73a25 into astropy:main May 16, 2023
25 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request May 16, 2023
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request May 16, 2023
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request May 16, 2023
saimn added a commit that referenced this pull request May 16, 2023
…810-on-v5.2.x

Backport PR #14810 on branch v5.2.x (FITS_rec._scale_back(): Incorrectly checks heapsize for TFORMn='Q')
saimn added a commit that referenced this pull request May 16, 2023
…810-on-v5.3.x

Backport PR #14810 on branch v5.3.x (FITS_rec._scale_back(): Incorrectly checks heapsize for TFORMn='Q')
saimn added a commit that referenced this pull request May 16, 2023
…810-on-v5.0.x

Backport PR #14810 on branch v5.0.x (FITS_rec._scale_back(): Incorrectly checks heapsize for TFORMn='Q')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug io.fits 💤 backport-v5.0.x on-merge: backport to v5.0.x 💤 backport-v5.2.x on-merge: backport to v5.2.x 💤 backport-v5.3.x on-merge: backport to v5.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fits_rec._scale_back() incorrectly warns about heapsize for 64-bit array (TFORM = "Q()")
4 participants