
Bug on Linux tests and doc build with PyTorch 2.12 #816

@rflamary

Description

Describe the bug

Also see the temporary fix in #815, which sets the maximum torch version to 2.11. With that pin everything works again, but we obviously still need to either fix the bug or report it to the Triton/PyTorch developers.

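Help needed, this one is very challenging. All Linux tests and the documentation build fail with a cryptic segmentation fault. Looking at the traceback, the crash seems to come from Triton: it happens while triton/knobs.py (line 15) imports a module, and the innermost frame is create_module, which suggests a compiled extension module is being loaded at that point. Going back to PyTorch 2.11 seems to fix the error, so we will keep that pin until we can fix the problem or report it to the PyTorch/Triton developers. It is very hard to reproduce: all scripts and tests pass individually, but they fail when the full test folder is run and during the doc build.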
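Because every test passes on its own, one way to narrow this down could be to run the crashing test together with one other test module at a time, each in a fresh interpreter, and see which combination reproduces the segfault. The sketch below assumes pytest is run from the repository root and relies on pytest collecting command-line arguments in the order given; the helper names (status, run_pair, find_culprits) are hypothetical, and only the test ID comes from the CI log in this issue.

```python
import signal
import subprocess
import sys
from pathlib import Path

# Test ID copied from the CI log in this issue.
FAILING_TEST = "test/test_ot.py::test_free_support_barycenter_generic_costs_auto_ground_bary"


def status(returncode: int) -> str:
    """Describe a subprocess return code in words."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, -N means the child was killed by signal N (-11 is SIGSEGV on Linux).
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    # Positive codes are ordinary pytest exit codes (1 = test failures, 5 = nothing collected, ...).
    return f"exited with code {returncode}"


def run_pair(other: str, target: str = FAILING_TEST) -> str:
    """Run `other` before `target` in one fresh pytest process, so a crash cannot kill the caller."""
    cmd = [sys.executable, "-m", "pytest", "-q", other, target]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return status(proc.returncode)


def find_culprits(test_dir: str = "test") -> dict:
    """Pair every test module in `test_dir` with the failing test and record the outcome."""
    return {str(path): run_pair(str(path)) for path in sorted(Path(test_dir).glob("test_*.py"))}
```

Calling find_culprits() and looking for entries reported as "killed by SIGSEGV" would point at the module(s) whose earlier imports or state seem to trigger the crash; the same pairing could then be repeated inside that module with individual test IDs.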

Typical error below: a large number of torch tests pass, and then at some point something breaks:

test/test_ot.py::test_emd_emd2_types_devices[numpy] PASSED               [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[jax] PASSED                 [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[torch] PASSED               [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[tf] PASSED                  [ 40%]
test/test_ot.py::test_emd_emd2_devices_tf PASSED                         [ 40%]
test/test_ot.py::test_emd2_gradients PASSED                              [ 40%]
test/test_ot.py::test_emd_emd2 PASSED                                    [ 40%]
test/test_ot.py::test_omp_emd2 PASSED                                    [ 40%]
test/test_ot.py::test_emd_empty PASSED                                   [ 40%]
test/test_ot.py::test_emd2_multi PASSED                                  [ 40%]
test/test_ot.py::test_lp_barycenter PASSED                               [ 40%]
test/test_ot.py::test_free_support_barycenter PASSED                     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[numpy] PASSED     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[jax] PASSED       [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[torch] PASSED     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[tf] PASSED        [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter PASSED         [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[numpy] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[jax] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[torch] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[tf] PASSED [ 40%]
test/test_ot.py::test_free_support_barycenter_generic_costs PASSED       [ 40%]
Fatal Python error: Segmentation fault

Current thread 0x00007fcccc40fb80 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1233 in create_module
  File "<frozen importlib._bootstrap>", line 573 in module_from_spec
  File "<frozen importlib._bootstrap>", line 676 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/knobs.py", line 15 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1232 in _handle_fromlist
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 11 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/runtime/__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/__init__.py", line 8 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/utils/_triton.py", line 10 in has_triton_package
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 2716 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 44 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 54 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 62 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/aot_compile.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1232 in _handle_fromlist
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/__init__.py", line 13 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_compile.py", line 47 in inner
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/optim/optimizer.py", line 405 in __init__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/optim/sgd.py", line 65 in __init__
  File "/home/runner/work/POT/POT/ot/lp/_barycenter_solvers.py", line 687 in ground_bary
  File "/home/runner/work/POT/POT/ot/lp/_barycenter_solvers.py", line 737 in free_support_barycenter_generic_costs
  File "/home/runner/work/POT/POT/test/test_ot.py", line 568 in test_free_support_barycenter_generic_costs_auto_ground_bary
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/python.py", line 1720 in runtest
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 353 in from_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 372 in _main
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 318 in wrap_session
  ...

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cyutility, scipy._cyutility, scipy._lib._ccallback_c, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, charset_normalizer.md, charset_normalizer.cd, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._pcg64, numpy.random._generator, numpy.random._mt19937, numpy.random._philox, numpy.random._sfc64, numpy.random.mtrand, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._batched_linalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, cuda.bindings._bindings.cyruntime_ptds, cuda.bindings._bindings.cyruntime, cuda.bindings.cyruntime, cuda.bindings.runtime, jaxlib.cpu_feature_guard, google._upb._message, requests.packages.charset_normalizer.md, requests.packages.chardet.md, requests.packages.charset_normalizer.cd, requests.packages.chardet.cd, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._npystrings, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5o, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5l, h5py._selector, PIL._imaging, kiwisolver._cext, sklearn.__check_build._check_build, 
scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpacklib, scipy.sparse.linalg._propack, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation_cy, scipy.spatial.transform._rigid_transform_cy, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, scipy.stats._qmvnt_cy, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, sklearn._cyutility, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, ot.lp.emd_wrap, cvxopt.base, cvxopt.blas, cvxopt.lapack, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, 
sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._fast_dict, sklearn.cluster._hierarchical_fast, sklearn.cluster._k_means_common, sklearn.cluster._k_means_elkan, sklearn.cluster._k_means_lloyd, sklearn.cluster._k_means_minibatch, sklearn.cluster._dbscan_inner, sklearn.neighbors._partition_nodes, sklearn.neighbors._ball_tree, sklearn.neighbors._kd_tree, sklearn.utils.arrayfuncs, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.linear_model._cd_fast, _loss, sklearn._loss._loss, sklearn.linear_model._sag_fast, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, sklearn.cluster._hdbscan._tree, sklearn.cluster._hdbscan._linkage, sklearn.cluster._hdbscan._reachability, sklearn._isotonic, sklearn.tree._utils, sklearn.tree._tree, sklearn.tree._partitioner, sklearn.tree._splitter, sklearn.tree._criterion, sklearn.neighbors._quad_tree, sklearn.manifold._barnes_hut_tsne, sklearn.manifold._utils, ot.partial.partial_cython, _cvxcore, scs._scs_direct, cvxopt.glpk, markupsafe._speedups (total: 213)
/home/runner/work/_temp/56add7cf-2359-408c-a37d-a54f563bc838.sh: line 1: 15989 Segmentation fault      (core dumped) python -m pytest --durations=20 -v test/ ot/ --doctest-modules --color=yes --cov=./ --cov-report=xml
test/test_ot.py::test_free_support_barycenter_generic_costs_auto_ground_bary 
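The traceback shows the segfault inside a lazy import chain: constructing torch.optim.SGD in ground_bary goes through torch._compile, which imports torch._dynamo, which in turn imports triton. Since the same session has already exercised JAX and TensorFlow (see the [jax] and [tf] test parameters) and loaded the cuda.bindings extensions, one hypothesis worth testing, and only a hypothesis, is an import-order conflict between native libraries. The sketch below runs each candidate snippet in a fresh interpreter with -X faulthandler and returns its exit code and stderr; the CANDIDATES snippets are illustrative guesses, not a confirmed reproducer.

```python
import subprocess
import sys

# Illustrative candidates (assumptions): does the crash need another framework loaded first?
SGD = "import torch; p = torch.zeros(1, requires_grad=True); torch.optim.SGD([p], lr=0.1)"
CANDIDATES = {
    "triton alone": "import triton",
    "torch SGD alone": SGD,
    "jax, then torch SGD": "import jax; " + SGD,
    "tensorflow, then torch SGD": "import tensorflow; " + SGD,
    "jax + tensorflow, then torch SGD": "import jax, tensorflow; " + SGD,
}


def run_snippet(code: str) -> tuple:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). A negative returncode -N means the process was
    killed by signal N (e.g. -11 for SIGSEGV on Linux); on a crash, stderr holds
    the Python traceback dumped by faulthandler.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def check_all(candidates: dict) -> dict:
    """Map each candidate name to the (returncode, stderr) of its snippet."""
    return {name: run_snippet(code) for name, code in candidates.items()}
```

On a machine with torch 2.12 installed, running check_all(CANDIDATES) and checking which entries return -11 would show whether the crash needs another framework to be loaded first. If none of them crashes, the trigger is more likely something else the full test session does before ground_bary runs.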

To Reproduce

Steps to reproduce the behavior:

  1. ...

Screenshots

Code sample

Expected behavior

Environment (please complete the following information):

  • OS (e.g. MacOS, Windows, Linux):
  • Python version:
  • How was POT installed (source, pip, conda):
  • Build command you used (if compiling from source):
  • Only for GPU related bugs:
    • CUDA version:
    • GPU models and configuration:
    • Any other relevant information:

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)
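For this particular bug, the torch and triton versions matter as much as the ones above. A possible extension of the snippet, offered as a suggestion rather than part of the issue template, reads versions from installed package metadata with importlib.metadata, which avoids importing triton (the very import that crashes here). The distribution names in PACKAGES are assumptions about how each package is published on PyPI, and the helper names are hypothetical.

```python
import platform
import sys
from importlib import metadata

# Distribution names as published on PyPI (assumed; adjust if installed differently).
PACKAGES = ["numpy", "scipy", "POT", "torch", "triton", "jax", "jaxlib", "tensorflow"]


def installed_versions(packages=PACKAGES) -> dict:
    """Read versions from installed metadata without importing the packages themselves."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def report(packages=PACKAGES) -> str:
    """Build a plain-text environment summary, one item per line."""
    lines = [platform.platform(), "Python " + sys.version]
    lines += [f"{name} {version}" for name, version in installed_versions(packages).items()]
    return "\n".join(lines)
```

Running print(report()) in the CI job would give the full set of versions even in environments where importing torch or triton segfaults.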

Additional context
