
pyserini segfaults during search with ray #1867

Open
jasper-xian opened this issue Apr 22, 2024 · 2 comments

Comments

@jasper-xian
Member

I'm using pyserini + ray: calling LuceneSearcher.search("some text") after ray.init(), but outside of any Ray worker, segfaults after a non-deterministic number of searches.

Dependencies:

pyserini                  0.22.0
ray                       2.6.1

Code to reproduce:

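```python
from pyserini.search import LuceneSearcher
import ray
from tqdm import tqdm

if __name__ == '__main__':
    ray.init()  # starts a local Ray instance; no Ray tasks or actors are used
    searcher = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-fiqa.flat')

    for _ in tqdm(range(1000)):
        searcher.search('hi there', k=100)  # crashes after a non-deterministic number of calls
```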
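Because the crash point varies from run to run, it can help to run the repro several times in fresh processes and record how each run ends and how far the progress bar got. Below is a minimal, stdlib-only sketch; `run_repeatedly`, `describe_exit`, and `last_progress` are hypothetical helpers (not part of pyserini or ray), and the usage assumes the repro is saved as `tmp.py`, the script name shown in the traceback. On POSIX, a negative return code -N means the child was killed by signal N; the terminating signal may be SIGABRT rather than SIGSEGV when a crash handler aborts the process after the initial fault.

```python
import re
import signal
import subprocess
import sys
from typing import List, Optional, Tuple

# tqdm writes progress such as " 9%|####| 90/1000 [00:01<00:08, 101.61it/s]" to stderr.
PROGRESS_RE = re.compile(r"(\d+)/\d+ \[")


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable label.

    On POSIX, a negative return code -N means the child was killed by signal N.
    """
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def last_progress(stderr: str) -> Optional[int]:
    """Return the last tqdm iteration count seen in stderr, or None if absent."""
    matches = PROGRESS_RE.findall(stderr)
    return int(matches[-1]) if matches else None


def run_repeatedly(cmd: List[str], runs: int = 5) -> List[Tuple[str, Optional[int]]]:
    """Run `cmd` in fresh processes; record how each run ended and how far it got."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
        results.append((describe_exit(proc.returncode), last_progress(proc.stderr)))
    return results


if __name__ == "__main__":
    # Assumption: the repro from this issue is saved as tmp.py in the working directory.
    for label, progress in run_repeatedly([sys.executable, "tmp.py"]):
        print(f"{label:>10}  last iteration seen: {progress}")
```

Running the same harness against a copy of the script with the ray.init() line removed would show whether starting Ray is what triggers the crash.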

Error Log:

2024-04-22 12:35:29,543 INFO worker.py:1621 -- Started a local Ray instance.
  9%|████████████████▊                                                                                                                                                                          | 90/1000 [00:01<00:08, 101.61it/s]*** SIGSEGV received at time=1713803732 on cpu 37 ***
PC: @     0x7f9c7fa9b5ff  (unknown)  (unknown)
    @     0x7f9cf92aa520  (unknown)  (unknown)
    @     0x7f9c77dd5cc5        120  (unknown)
    @     0x7f9c77dd5fc0        104  (unknown)
    @     0x7f9c77dd60a2        120  (unknown)
    @     0x7f9c77dd5fc0        160  (unknown)
    @     0x7f9c77dcccc9        152  (unknown)
    @     0x7f9c8ffff1bc        256  JavaCalls::call_helper()
    @     0x7f9c9007dba9        368  jni_invoke_nonstatic()
    @     0x7f9c9007e5a4        208  jni_CallObjectMethodA
    @     0x7f9c90960c52  (unknown)  __pyx_f_5jnius_10JavaMethod_call_method
    @     0x56326f638520  (unknown)  (unknown)
[2024-04-22 12:35:32,104 E 1443709 1443709] logging.cc:361: *** SIGSEGV received at time=1713803732 on cpu 37 ***
[2024-04-22 12:35:32,108 E 1443709 1443709] logging.cc:361: PC: @     0x7f9c7fa9b5ff  (unknown)  (unknown)
[2024-04-22 12:35:32,110 E 1443709 1443709] logging.cc:361:     @     0x7f9cf92aa520  (unknown)  (unknown)
[2024-04-22 12:35:32,112 E 1443709 1443709] logging.cc:361:     @     0x7f9c77dd5cc5        120  (unknown)
[2024-04-22 12:35:32,117 E 1443709 1443709] logging.cc:361:     @     0x7f9c77dd5fc0        104  (unknown)
[2024-04-22 12:35:32,121 E 1443709 1443709] logging.cc:361:     @     0x7f9c77dd60a2        120  (unknown)
[2024-04-22 12:35:32,125 E 1443709 1443709] logging.cc:361:     @     0x7f9c77dd5fc0        160  (unknown)
[2024-04-22 12:35:32,129 E 1443709 1443709] logging.cc:361:     @     0x7f9c77dcccc9        152  (unknown)
[2024-04-22 12:35:32,129 E 1443709 1443709] logging.cc:361:     @     0x7f9c8ffff1bc        256  JavaCalls::call_helper()
[2024-04-22 12:35:32,129 E 1443709 1443709] logging.cc:361:     @     0x7f9c9007dba9        368  jni_invoke_nonstatic()
[2024-04-22 12:35:32,129 E 1443709 1443709] logging.cc:361:     @     0x7f9c9007e5a4        208  jni_CallObjectMethodA
[2024-04-22 12:35:32,129 E 1443709 1443709] logging.cc:361:     @     0x7f9c90960c52  (unknown)  __pyx_f_5jnius_10JavaMethod_call_method
[2024-04-22 12:35:32,133 E 1443709 1443709] logging.cc:361:     @     0x56326f638520  (unknown)  (unknown)
Fatal Python error: Segmentation fault

Stack (most recent call first):
  File "/store2/scratch/j5xian/.conda/envs/tmp/lib/python3.10/site-packages/pyserini/search/lucene/_searcher.py", line 152 in search
  File "/store2/scratch/j5xian/tmp/tmp.py", line 10 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, jnius.jnius, scipy._lib._ccallback_c, faiss._swigfaiss_avx512, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, regex._regex, sklearn.__check_build._check_build, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, 
scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, psutil._psutil_linux, psutil._psutil_posix, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, 
scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, msgpack._cmsgpack, google._upb._message, setproctitle, ray._raylet, grpc._cython.cygrpc (total: 184)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9cf92fe9fc (sent by kill), pid=1443709, tid=1443709
#
# JRE version: OpenJDK Runtime Environment (11.0.15) (build 11.0.15-internal+0-adhoc..src)
# Java VM: OpenJDK 64-Bit Server VM (11.0.15-internal+0-adhoc..src, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x969fc]  pthread_kill+0x12c
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /store2/scratch/j5xian/tmp/core.1443709)
#
# An error report file with more information is saved as:
# /store2/scratch/j5xian/tmp/hs_err_pid1443709.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
[failure_signal_handler.cc : 329] RAW: Signal 6 raised at PC=0x7f9cf92fe9fc while already in AbslFailureSignalHandler()
*** SIGABRT received at time=1713803732 on cpu 37 ***
PC: @     0x7f9cf92fe9fc  (unknown)  pthread_kill
    @     0x7f9cf92aa520  (unknown)  (unknown)
[2024-04-22 12:35:32,217 E 1443709 1443709] logging.cc:361: *** SIGABRT received at time=1713803732 on cpu 37 ***
[2024-04-22 12:35:32,217 E 1443709 1443709] logging.cc:361: PC: @     0x7f9cf92fe9fc  (unknown)  pthread_kill
[2024-04-22 12:35:32,217 E 1443709 1443709] logging.cc:361:     @     0x7f9cf92aa520  (unknown)  (unknown)
Fatal Python error: Aborted

Stack (most recent call first):
  File "/store2/scratch/j5xian/.conda/envs/tmp/lib/python3.10/site-packages/pyserini/search/lucene/_searcher.py", line 152 in search
  File "/store2/scratch/j5xian/tmp/tmp.py", line 10 in <module>

Aborted (core dumped)
@lintool
Member

lintool commented Apr 22, 2024

The trace seems to show JDK 11... but we're on JDK 21 now? Dunno if that makes a difference.

@jasper-xian
Member Author

Ah, this is with pyserini 0.22.0, which was on JDK 11, I think. I tried the same snippet with the latest pyserini and JDK 21 and ran into the same issue.
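Since the JDK version came up, here is a minimal sketch for recording the exact package and Java versions alongside a report like this one. The helper names (`package_version`, `java_executable`, `environment_report`) are hypothetical, not pyserini API: the sketch reads Python package versions with `importlib.metadata` and runs `java -version`, preferring `$JAVA_HOME/bin/java`. That binary is not guaranteed to be the JVM pyjnius actually loads, so the `# JRE version:` line in the crash output remains the authoritative record of the JVM that crashed.

```python
import os
import shutil
import subprocess
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_version(name: str) -> Optional[str]:
    """Installed version of a Python distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def java_executable() -> Optional[str]:
    """Prefer $JAVA_HOME/bin/java; fall back to whatever `java` is on PATH."""
    home = os.environ.get("JAVA_HOME")
    if home:
        candidate = os.path.join(home, "bin", "java")
        if os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("java")


def java_version_line() -> Optional[str]:
    """First line of `java -version` (Java prints it on stderr), or None."""
    exe = java_executable()
    if exe is None:
        return None
    proc = subprocess.run([exe, "-version"], capture_output=True, text=True, errors="replace")
    lines = (proc.stderr or proc.stdout).strip().splitlines()
    return lines[0] if lines else None


def environment_report(packages: Iterable[str] = ("pyserini", "ray", "pyjnius")) -> Dict[str, Optional[str]]:
    """Map each package (plus "java") to its version string, or None if unavailable."""
    report: Dict[str, Optional[str]] = {pkg: package_version(pkg) for pkg in packages}
    report["java"] = java_version_line()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:<10} {value}")
```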
