[Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed #35237

Open
jleibs opened this issue Apr 19, 2023 · 4 comments

@jleibs

jleibs commented Apr 19, 2023

Describe the bug, including details regarding any error messages, version, and platform.

The problem is relatively straightforward: a thread that is still running during interpreter shutdown tries to register an atexit handler, which raises a RuntimeError.

This only happens if pandas is installed, which causes pyarrow's pandas shims to be used. It happens whether or not the application actually uses pandas.

The problem can be avoided by joining all threads before main exits, but Python does not generally require this, so it should be considered a bug.
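For reference, a minimal sketch of the join-based avoidance, assuming the thread's work is short enough to wait on. The `worker` function and the `results` list are hypothetical stand-ins for the pyarrow call in the repro below; they are not part of pyarrow.

```python
import threading

# Hypothetical container so the effect of the worker can be observed.
results = []


def worker() -> None:
    # Stand-in for the pyarrow call from the repro,
    # e.g. pyarrow.table({"a": [1, 2, 3]}).
    results.append(sum([1, 2, 3]))


def main() -> None:
    t = threading.Thread(target=worker)
    t.start()
    # Wait for the thread here, so any lazy imports it triggers
    # happen before interpreter shutdown begins.
    t.join()


if __name__ == "__main__":
    main()
```

This only sidesteps the error. It does not help applications that intentionally let non-daemon threads outlive `main()`.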

Context to reproduce:

requirements.txt

pandas==2.0.0
pyarrow==11.0.0

main.py

import threading
import pyarrow


def use_pyarrow() -> None:
    table = pyarrow.table({"a": [1, 2, 3]})


def main() -> None:
    t = threading.Thread(target=use_pyarrow, args=())
    t.start()

if __name__ == "__main__":
    main()

Run:

$ python main.py 
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread  # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
Exception ignored in: 'pyarrow.lib._PandasAPIShim._have_pandas_internal'
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread  # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown

Component(s)

Python

@westonpace
Member

As a workaround, does it work if you add:

from pyarrow.pandas_compat import _pandas_api

at the top level of your program (before main shuts down)?

This seems to be a transitive consequence of python/cpython#86813 (comment)
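To illustrate why an eager import on the main thread might help, here is a stdlib-only sketch that leaves pyarrow out entirely. It relies on what the traceback shows: importing `concurrent.futures.thread` calls `threading._register_atexit`, which refuses new registrations once shutdown has started. The child script, the `run_child` helper and the 0.5 s delay are assumptions made for illustration, and the behavior may differ on CPython versions other than the 3.10 in the report.

```python
import subprocess
import sys

# Hypothetical child script: a non-joined thread imports
# concurrent.futures.thread only after the main thread has returned.
CHILD = '''
import threading
import time
{eager_import}

def worker():
    time.sleep(0.5)  # give the main thread time to finish first
    try:
        import concurrent.futures.thread  # noqa: F401
    except RuntimeError as exc:
        print("late import failed:", exc)
    else:
        print("late import ok")

threading.Thread(target=worker).start()
'''


def run_child(eager: bool) -> str:
    """Run the child script and return what it printed to stdout."""
    eager_import = "import concurrent.futures.thread" if eager else ""
    code = CHILD.format(eager_import=eager_import)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    print("without eager import:", run_child(eager=False))
    print("with eager import:   ", run_child(eager=True))
```

With the eager import, the module is already in `sys.modules` when the worker reaches its own import, so nothing is registered during shutdown. The suggested `pyarrow.pandas_compat` import presumably has the same effect by pulling in `concurrent.futures.thread` early.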

@jorisvandenbossche changed the title from "RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed" to "[Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed" (Apr 22, 2023)
@jorisvandenbossche added this to the 13.0.0 milestone (Apr 22, 2023)
@raulcd modified the milestones: 13.0.0, 14.0.0 (Jul 7, 2023)
@jorisvandenbossche modified the milestones: 14.0.0, 15.0.0 (Oct 6, 2023)
@AlenkaF
Member

AlenkaF commented Dec 4, 2023

Just to note, I was able to reproduce the error with pandas==2.1.1 and a dev version of pyarrow (15.0.0.dev). The issue doesn't happen if I add the import at the top level of the program, as Weston suggested.

@jorisvandenbossche
Member

cc @pitrou

@pitrou
Member

pitrou commented Dec 13, 2023

This should be trivial to work around in PyArrow.

@raulcd modified the milestones: 15.0.0, 16.0.0 (Jan 8, 2024)
@raulcd modified the milestones: 16.0.0, 17.0.0 (Apr 8, 2024)
@raulcd modified the milestones: 17.0.0, 18.0.0 (Jun 28, 2024)
@AlenkaF removed this from the 18.0.0 milestone (Oct 2, 2024)