
Heap corruption on Python when torch is imported before juliacall, but not the reverse #215

Closed
MilesCranmer opened this issue Sep 5, 2022 · 6 comments

Comments

@MilesCranmer
Contributor

MilesCranmer commented Sep 5, 2022

Here is my system information:

  • Python 3.8.9
  • Julia 1.8.0
  • macOS 12
  • M1 chip (ARM64)
  • Python from homebrew (not conda)

I have not tested this on other systems.

Here is the trigger:

>>> import torch
>>> from juliacall import Main as jl

and the error:

Python(65251,0x104cf8580) malloc: Heap corruption detected, free list is damaged at 0x600001c17280
*** Incorrect guard value: 1903002876
Python(65251,0x104cf8580) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    65251 abort      ipython

However, I can run the following just fine:

>>> from juliacall import Main as jl
>>> import torch

Here are some related issues: JuliaPy/pyjulia#125 and pytorch/pytorch#78829. In particular, see the comment from @tttc3: pytorch/pytorch#78829 (comment).

@cjdoris
Collaborator

cjdoris commented Sep 6, 2022

Oof, fun error!

If the root cause is the same as in the comment you linked (which looks highly plausible) then I doubt it can be fixed from JuliaCall. It could/should be documented in a troubleshooting/FAQ section in the docs. We could maybe add a warning via an import hook, but that seems a bit much.
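
For illustration, here is a minimal sketch of what such an import hook could look like (this is not JuliaCall code, just the generic mechanism: a meta path finder with a hypothetical name that only warns when torch is requested and defers the actual loading to the normal import machinery):

import sys
import warnings
from importlib.abc import MetaPathFinder


class TorchImportWarner(MetaPathFinder):
    """Illustrative only: warn when torch is imported; never load anything itself."""

    def find_spec(self, fullname, path, target=None):
        if fullname == "torch":
            warnings.warn(
                "torch is being imported alongside juliacall; "
                "see pytorch/pytorch#78829 for possible symbol conflicts."
            )
        return None  # defer to the regular finders


sys.meta_path.insert(0, TorchImportWarner())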

@MilesCranmer
Contributor Author

Thanks - is there a way that solution 1 could be used here? (Is this similar to how JuliaCall works?) I.e.:

Use RTLD_DEEPBIND when loading _C.cpython-310-x86_64-linux-gnu.so. This ensures that pytorch will look for the symbol within libtorch_cpu.so before looking at the globally imported ones from libjulia-internals.so. However, I do not know if this would have some other unintended consequences?

Unfortunately, solutions 2 and 3 don't seem to help me (maybe because I'm on a Mac).
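
For context, RTLD_DEEPBIND is a glibc-specific dlopen flag, so it is not available on macOS. A minimal sketch of what loading a shared library with it looks like from Python (the library path below is just a placeholder; the actual fix would have to live in PyTorch's own extension loading, not in user code):

import ctypes
import os

# Linux/glibc only: resolve the library's own symbols before any globally
# exported ones (e.g. those brought in by libjulia).
lib = ctypes.CDLL(
    "/path/to/libtorch_cpu.so",  # placeholder path
    mode=os.RTLD_NOW | os.RTLD_DEEPBIND,
)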

@cjdoris
Collaborator

cjdoris commented Sep 6, 2022

I'm not sure what _C is; is it referring to the C bindings for the Torch library? I assume using DEEPBIND would require a change to PyTorch.

@MilesCranmer
Contributor Author

Ah, yes, I think you are right... Thanks!

@MilesCranmer
Contributor Author

MilesCranmer commented Sep 6, 2022

Just added a warning for this to PySR until it gets solved - feel free to do something similar in PythonCall.jl! I think this will prevent users from getting discouraged, as a random segfault when starting Julia would otherwise be a mystery - especially if torch is imported by another package rather than imported directly.

import sys
import warnings


def check_for_conflicting_libraries():  # pragma: no cover
    """Check whether there are conflicting modules, and display warnings."""
    # See https://github.com/pytorch/pytorch/issues/78829: importing
    # pytorch before running `pysr.fit` causes a segfault.
    torch_is_loaded = "torch" in sys.modules
    if torch_is_loaded:
        warnings.warn(
            "`torch` was loaded before the Julia instance started. "
            "This may cause a segfault when running `PySRRegressor.fit`. "
            "To avoid this, please run `pysr.julia_helpers.init_julia()` *before* "
            "importing `torch`. "
            "For updates, see https://github.com/pytorch/pytorch/issues/78829"
        )

This simple sys.modules check seems to be enough:

sys.modules contains every module imported anywhere in the current interpreter instance, so torch shows up even when it was imported by some other Python module rather than by the user directly.
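
For example, the check also catches indirect imports (torchvision here is just one example of a package that imports torch internally):

import sys

import torchvision  # imports torch under the hood

print("torch" in sys.modules)  # True, even though torch was never imported directly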

@cjdoris
Collaborator

cjdoris commented Sep 8, 2022

Good idea. I'm documenting it in a new Troubleshooting section in the docs.
