Segmentation error for torch==2.2.1 on MacOs #121101

Open
CloseChoice opened this issue Mar 3, 2024 · 3 comments
Labels
module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: intel Specific to x86 architecture module: macos Mac OS related issues module: openmp Related to OpenMP (omp) support in PyTorch needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments


CloseChoice commented Mar 3, 2024

🐛 Describe the bug

At shap, we have run into problems with our CI jobs on macOS (e.g. see here). I tracked this down to an issue with torch==2.2.1.

Here is code to reproduce the issue (this works on torch==2.2.0):

import time

import torch
from sklearn.datasets import fetch_california_housing


def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)

(execute with python -m pytest <filename>)
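Because a segmentation fault kills the interpreter, it can help to run a suspect snippet in a child process and check how that process exited. The following is a minimal sketch, not part of the original report: the helper name `run_isolated` is hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 120.0) -> str:
    """Run `code` in a fresh interpreter and describe how it exited.

    Hypothetical helper for illustration only. On POSIX, a negative
    return code means the child was terminated by that signal number.
    """
    proc = subprocess.run(
        # -X faulthandler asks CPython to dump a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"
```

Passing the body of the test above as `code` would show whether the crash also occurs outside pytest; that has not been checked here.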

Stacktrace:

bash-3.2$ python -m pytest tests/explainers/test_segfault_minimal_example2.py                                                                                                                               
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.11.8, pytest-8.1.0, pluggy-1.4.0
Matplotlib: 3.8.3
Freetype: 2.6.1
rootdir: /Users/runner/work/shap/shap
configfile: pyproject.toml
plugins: cov-4.1.0, mpl-0.17.0
collected 1 item                                                                                                                                                                                           

tests/explainers/test_segfault_minimal_example2.py Fatal Python error: Segmentation fault

Thread 0x00000001140ad600 (most recent call first):
  File "/Users/runner/work/shap/shap/tests/explainers/test_segfault_minimal_example2.py", line 8 in test_something
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1769 in runtestSegmentation fault: 11

Versions

PyTorch version: 2.2.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.7.3 (x86_64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: version 3.28.3
Libc version: N/A

Python version: 3.11.8 (v3.11.8:db85d51d3e, Feb  6 2024, 18:02:37) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-12.7.3-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.1
[pip3] torchvision==0.17.0
[conda] No relevant packages

cc @malfet @albanD @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@malfet malfet added module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: macos Mac OS related issues triage review labels Mar 5, 2024
malfet (Contributor) commented Mar 5, 2024

Is this reproducible if one uses Apple Silicon M1 runners? (Though Torch-2.2 is the last release to support Intel Macs, per #114602.)

At least I cannot reproduce it on M1; trying it in x86 Rosetta mode.
I cannot reproduce it in a Rosetta environment either:

arch -arch x86_64 "/Applications/Python 3.11//IDLE.app/Contents/MacOS/Python" -mpytest ~/test/bug-121101.py

Nor can I repro in GitHub CI: https://github.com/malfet/deleteme/actions/runs/8150940508/job/22278030319?pr=79

@malfet malfet added needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module module: intel Specific to x86 architecture and removed triage review labels Mar 5, 2024

connortann commented Mar 12, 2024

I can reproduce in GitHub CI (over in the shap repo) with a slightly different setup:

I'll see if I can identify the relevant difference between that job and your run above; perhaps it's related to having different dependencies installed.


connortann commented Mar 12, 2024

I've done a bit of debugging with GitHub CI, and I think I have found an environment that reproduces the minimal example above.

The test snippet above passes in an environment created with pip install pytest torch scikit-learn, but fails if the env also includes lightgbm.

The examples below ran on GitHub Actions with macos-latest, python=3.11.8, torch 2.2.1.

Passing run

Example passing run: https://github.com/shap/shap/actions/runs/8248044359/job/22557508223
Output of pip list:

Package           Version
----------------- -----------
certifi           2024.2.2
filelock          3.13.1
fsspec            2024.2.0
iniconfig         2.0.0
Jinja2            3.1.3
joblib            1.3.2
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.2.1
numpy             1.26.4
packaging         24.0
pip               24.0
pluggy            1.4.0
pytest            8.1.1
scikit-learn      1.4.1.post1
scipy             1.12.0
setuptools        65.5.0
sympy             1.12
threadpoolctl     3.3.0
torch             2.2.1
typing_extensions 4.10.0

Failing run

Example failing run: https://github.com/shap/shap/actions/runs/8248015803/job/22557423230
Output of pip list (identical apart from lightgbm):

Package           Version
----------------- -----------
certifi           2024.2.2
filelock          3.13.1
fsspec            2024.2.0
iniconfig         2.0.0
Jinja2            3.1.3
joblib            1.3.2
lightgbm          4.3.0
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.2.1
numpy             1.26.4
packaging         24.0
pip               24.0
pluggy            1.4.0
pytest            8.1.1
scikit-learn      1.4.1.post1
scipy             1.12.0
setuptools        65.5.0
sympy             1.12
threadpoolctl     3.3.0
torch             2.2.1
typing_extensions 4.10.0
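The comparison of the two `pip list` outputs above can be automated. This is a minimal sketch with hypothetical helper names (`parse_pip_list`, `diff_environments`); it assumes the plain two-column `Package Version` layout shown above and ignores any line with a different number of columns.

```python
def parse_pip_list(text: str) -> dict[str, str]:
    """Parse the two-column output of `pip list` into {package: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


def diff_environments(passing: str, failing: str) -> dict[str, tuple]:
    """Return {package: (version_in_passing, version_in_failing)} for each difference."""
    a, b = parse_pip_list(passing), parse_pip_list(failing)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

A package missing from one environment appears with `None` on that side, so for the two lists above the only entry would be `lightgbm`.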

@malfet malfet added the module: openmp Related to OpenMP (omp) support in PyTorch label May 28, 2024
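The thread does not confirm a root cause, but given the OpenMP label, one thing that could be checked is whether an environment contains more than one copy of an OpenMP runtime. The sketch below is an assumption-laden illustration: `find_openmp_runtimes` is a hypothetical helper, and it only matches file names of common runtimes (LLVM `libomp`, Intel `libiomp5`, GNU `libgomp`), so it says nothing about which libraries are actually loaded into a process.

```python
import re
from pathlib import Path

# File names of common OpenMP runtimes: LLVM libomp, Intel libiomp5, GNU libgomp.
_OPENMP_NAME = re.compile(r"^lib(omp|iomp5|gomp)\b.*\.(dylib|so)(\.\d+)*$")


def find_openmp_runtimes(root):
    """Return sorted paths under `root` whose file name looks like an OpenMP runtime."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and _OPENMP_NAME.match(p.name)
    )
```

It could be pointed at the directory returned by `sysconfig.get_paths()["platlib"]`. Finding two copies there would be a hint worth following up, not proof of the cause.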