New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Sporadic crashes in arrow-dataset-scanner-test #20442
Comments
Antoine Pitrou / @pitrou: |
Percy Camilo Triveño Aucahuasi / @aucahuasi: I used docker buildx on a Macbook Pro M1. These were the cmake args: cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/someprefix -DCMAKE_INSTALL_LIBDIR=lib -DARROW_CUDA=OFF -DARROW_NO_DEPRECATED_API=OFF -DARROW_BUILD_STATIC=OFF -DARROW_INSTALL_NAME_RPATH=ON -DARROW_USE_LD_GOLD=ON -DARROW_USE_CCACHE=ON -DARROW_USE_ASAN=OFF -DARROW_USE_TSAN=OFF -DARROW_USE_UBSAN=OFF -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_TESTS=ON -DARROW_TEST_MEMCHECK=OFF -DARROW_ENABLE_TIMING_TESTS=ON -DARROW_BUILD_INTEGRATION=ON -DARROW_BUILD_BENCHMARKS=OFF -DARROW_BUILD_BENCHMARKS_REFERENCE=OFF -DARROW_BUILD_OPENMP_BENCHMARKS=OFF -DARROW_BUILD_DETAILED_BENCHMARKS=OFF -DARROW_TEST_LINKAGE=shared -DARROW_FUZZING=OFF -DARROW_LARGE_MEMORY_TESTS=OFF -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DPARQUET_BUILD_EXAMPLES=ON -DPARQUET_BUILD_EXECUTABLES=OFF -DARROW_ORC=OFF -DARROW_PYTHON=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=OFF -DARROW_JEMALLOC_USE_SHARED=ON -DARROW_JSON=ON -DARROW_S3=OFF -DARROW_HDFS=OFF -DARROW_GCS=OFF -DARROW_FLIGHT=OFF -DARROW_GANDIVA=OFF -DARROW_PLASMA=OFF -DARROW_SUBSTRAIT=OFF -DARROW_BUILD_UTILITIES=OFF -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_USE_CCACHE=ON -DCMAKE_UNITY_BUILD=OFF -DARROW_USE_PRECOMPILED_HEADERS=OFF -DBUILD_WARNING_LEVEL=PRODUCTION -DARROW_WITH_OPENTELEMETRY=OFF -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_THRIFT=OFF -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON -DGTest_SOURCE=BUNDLED .. Both cases printed at the end: [----------] Global test environment tear-down Tomorrow I'll try to explore this issue on a windows machine. |
Antoine Pitrou / @pitrou: First run bash inside the container: ARCH=i386 DEBIAN=11 archery docker run -e ARROW_S3=OFF -e ARROW_GANDIVA=OFF debian-cpp bash Then under bash: # set +e
# /arrow/ci/scripts/cpp_build.sh /arrow /build
# taskset -c 0,2 /build/cpp/debug/arrow-dataset-scanner-test |
Antoine Pitrou / @pitrou: |
Antoine Pitrou / @pitrou: |
Raúl Cumplido / @raulcd: |
That is more or less intentional. The goal of the test is to have every batch delivered by its own thread for maximum chaos. The test itself has 16 fragments, each with 128 batches so I think 2048 threads will be created during the lifetime of the test. |
Antoine Pitrou / @pitrou: |
Antoine Pitrou / @pitrou: |
Percy Camilo Triveño Aucahuasi / @aucahuasi: Yesterday I managed to reproduce this issue using Windows (msys2-mingw32) and I got this stacktrace: 1 ?? 0xfeeefeee
2 std::__uniq_ptr_impl<void, void ( *)(void *)>::reset unique_ptr.h 203 0x15b7e36
3 std::__uniq_ptr_impl<void, void ( *)(void *)>::operator= unique_ptr.h 183 0x15b7f1f
4 std::__uniq_ptr_data<void, void ( *)(void *), true, true>::operator= unique_ptr.h 235 0x15acebd
5 std::unique_ptr<void, void ( *)(void *)>::operator= unique_ptr.h 406 0x14d884d
6 arrow::Future<arrow::internal::Empty>::SetResult future.h 650 0x113578a
7 arrow::Future<arrow::internal::Empty>::DoMarkFinished future.h 655 0x1134713
8 arrow::Future<arrow::internal::Empty>::MarkFinished<arrow::internal::Empty, void> future.h 409 0x1134699
9 operator() gtest_util.cc 783 0x332fb6
10 std::__invoke_impl<void, arrow::SleepABitAsync()::<lambda()>>(std::__invoke_other, struct {...} &&) invoke.h 61 0x338f42
11 std::__invoke<arrow::SleepABitAsync()::<lambda()>>(struct {...} &&) invoke.h 96 0x338eca
12 std::thread::_Invoker<std::tuple<arrow::SleepABitAsync()::<lambda()>>>::_M_invoke<0>(std::_Index_tuple<0>) std_thread.h 252 0x338e5a
13 std::thread::_Invoker<std::tuple<arrow::SleepABitAsync()::<lambda()>>>::operator()(void) std_thread.h 259 0x338e14
14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::SleepABitAsync()::<lambda()>>>>::_M_run(void) std_thread.h 210 0x338de0
15 ?? 0x6b6bfee0
16 ?? 0x6b8d4e7d
17 msvcrt!_beginthreadex 0x75716cff
18 msvcrt!_endthreadex 0x75716dc1
19 KERNEL32!BaseThreadInitThunk 0x753dfa29
20 ntdll!RtlGetAppContainerNamedObjectPath 0x77307b5e
... <More> It seems there was a segmentation fault with the status that we were trying to use for marking the future as finished; sometimes I got this message as well: Running main() from C:/M/mingw-w64-gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BasicNewScannerTests/TestScannerBase
[ RUN ] BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b
C:/arrow/cpp/src/arrow/util/future.ccC:/arrow/cpp/src/arrow/util/future.cc:125: Check failed: !IsFutureFinished(state_) :125: Check failed: !IsFutureFinished(state_) Future already marked finished
FPress <RETURN> to close this window... Or this other one too: Running main() from C:/M/mingw-w64-gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BasicNewScannerTests/TestScannerBase
[ RUN ] BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b
C:/arrow/cpp/src/arrow/dataset/scanner_test.cc:538: Failure
Value of: _fut138.Wait(::arrow::kDefaultAssertFinishesWaitSeconds)
Actual: false
Expected: true
C:/arrow/cpp/src/arrow/dataset/scanner_test.cc:538: Failure
Failed
Future did not finish in a timely fashion |
Antoine Pitrou / @pitrou: |
Percy Camilo Triveño Aucahuasi / @aucahuasi: cmake -G "MSYS Makefiles" -DARROW_BUILD_TESTS=ON -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_DATASET=ON -DARROW_COMPUTE=ON -DCMAKE_BUILD_TYPE="Debug" -DBUILD_WARNING_LEVEL="PRODUCTION" -DCMAKE_CXX_STANDARD=17 .. |
Percy Camilo Triveño Aucahuasi / @aucahuasi: Running main() from C:/M/mingw-w64-gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BasicNewScannerTests/TestScannerBase
[ RUN ] BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b
[ OK ] BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b (3085 ms)
[----------] 1 test from BasicNewScannerTests/TestScannerBase (3096 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (3112 ms total)
[ PASSED ] 1 test.
Press <RETURN> to close this window... Also, when I run all the tests (without --gtest_filter="BasicNewScannerTests/TestScannerBase.ScanOrdered/3slow16f128b*" ) I cannot reproduce the issue either. So it seems that the stacktrace and outputs I sent in the previous comment were related to the original issue in this ticket. |
Antoine Pitrou / @pitrou: |
Seen e.g. on this PR:
https://github.com/apache/arrow/actions/runs/3181284056/jobs/5185859816#step:11:255
Reporter: Antoine Pitrou / @pitrou
Assignee: Antoine Pitrou / @pitrou
PRs and other links:
Note: This issue was originally created as ARROW-17927. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: