Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault with error report in FITS compression #15489

Merged
merged 3 commits into from Oct 19, 2023
Merged

Conversation

saimn
Copy link
Contributor

@saimn saimn commented Oct 17, 2023

ffpmsg requires the GIL.

Fix #15482

Description

This pull request is to address ...

Fixes #

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@saimn saimn added the io.fits label Oct 17, 2023
@saimn saimn added this to the v5.3.5 milestone Oct 17, 2023
@github-actions
Copy link

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@pllim pllim added the Bug label Oct 17, 2023
PyErr_SetString(CfitsioException, err_message);
PyGILState_Release(gstate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assuming that we are always releasing the GIL when calling out to cfitsio? If for some reason that were to change would it explode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void PyGILState_Release(PyGILState_STATE)
Part of the Stable ABI.
Release any resources previously acquired. After this call, Python’s state will be the same as it was prior to the corresponding PyGILState_Ensure() call (but generally this state will be unknown to the caller, hence the use of the GILState API).

So should be fine ?
https://docs.python.org/3/c-api/init.html#c.PyGILState_Release
But this will not work with subinterpreters (https://docs.python.org/3/c-api/init.html#non-python-created-threads).
The other option would be to stop releasing the GIL before calling cfitsio, not sure what the impact is ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not releasing the GIL would mean no longer being able to do multithreaded decompression of tiles which does work at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the fix here is fine for now, but we may have to reconsider when subinterpreters become a thing (planned for 3.13) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure.

Nope, I am out lol too much C 🤣

@Cadair
Copy link
Member

Cadair commented Oct 17, 2023

Don't suppose you have any bright ideas on how we could test this? I am pretty sure I ran screaming from the idea of testing the error stuff when I originally wrote this.

@saimn
Copy link
Contributor Author

saimn commented Oct 17, 2023

Not really, I guess we could create a file that would trigger the warnings/errors but not sure how.

@pllim pllim added the 💤 backport-v5.3.x on-merge: backport to v5.3.x label Oct 18, 2023
@pllim
Copy link
Member

pllim commented Oct 18, 2023

I cannot tell if more fixes are needed or not based on Simon's comment in that issue, so I'll let @saimn decide when to merge. Thanks, all!

@astrofrog
Copy link
Member

I think I have an idea for a regression test - please don't merge yet so I can double check.

@astrofrog
Copy link
Member

I have come up with a regression test that causes the segfault and have pushed it to this PR. I have also temporarily reverted @saimn's change so that we can double check that a segfault does indeed happen in the CI.

@astrofrog
Copy link
Member

Ok so confirming this causes a segfault:

Screenshot 2023-10-19 at 13 45 49

I'll remove the revert commit.

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, but @saimn please check the test I added!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magical!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧙

@saimn
Copy link
Contributor Author

saimn commented Oct 19, 2023

Excellent, thanks @astrofrog !
Let's merge this and we can address the remaining issues in #15482 later.

@saimn saimn merged commit d1f92fe into astropy:main Oct 19, 2023
25 of 26 checks passed
@saimn saimn deleted the fits-15482 branch October 19, 2023 16:59
@lumberbot-app

This comment was marked as resolved.

pllim added a commit that referenced this pull request Oct 19, 2023
Backport PR #15489 on branch v5.3.x (Fix segfault with error report in FITS compression)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug io.fits 💤 backport-v5.3.x on-merge: backport to v5.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Version 5.3 fails to read large CompImageHDUs
4 participants