[C++] cpp-micro benchmarks are failing on mac arm machine #20477

Closed
asfimport opened this issue Oct 27, 2022 · 5 comments

Comments


Reporter: Elena Henderson / @ElenaHenderson
Assignee: Weston Pace / @westonpace

PRs and other links:

Note: This issue was originally created as ARROW-18183. Please see the migration documentation for further details.


Elena Henderson / @ElenaHenderson:
cpp-micro benchmarks started failing on test-mac-arm since this change: ec579df

First failed build: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1628
Steps to reproduce:

  1. Run benchmarks for commit 89c0214, which was made right before ec579df:
cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout 89c0214fa43f8d1bf2e19e3bae0fc3009df51e15
conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run

Note the benchmarks are NOT failing!

  2. Run benchmarks for ec579df, the commit where cpp-micro benchmarks started failing:
cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout ec579df631deaa8f6186208ed2a4ebec00581dfa
conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run

Note the benchmarks stop running:

...
CoalesceScalarStringBench/6                 6174977 ns      6174982 ns          111 bytes_per_second=664.007M/s items_per_second=169.81M/s length=1048.58k null%=99 num_args=2
ChooseBench64/1048576/0                     6787276 ns      6787272 ns          103 bytes_per_second=1.15105G/s items_per_second=154.492M/s
ChooseBench64/1048576/99                    6722603 ns      6722606 ns          104 bytes_per_second=1.16201G/s items_per_second=155.963M/s
/Users/voltrondata/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-10-27T18:58:59-04:00
Running /var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark
Run on (8 X 24.2407 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 15.79, 15.29, 10.64
---------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------
MinimalEndToEndBench/num_batches:1000/batch_size:10/scan_alg:0/iterations:10/real_time     23472042 ns      5977400 ns           10 bytes_per_second=2.03151M/s items_per_second=42.6039k/s
Traceback (most recent call last):
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/bin/archery", line 33, in <module>
    sys.exit(load_entry_point('archery', 'console_scripts', 'archery')())
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/voltrondata/arrow/dev/archery/archery/cli.py", line 515, in benchmark_run
    json_out = json.dumps(runner_base, cls=JsonEncoder)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 35, in default
    return BenchmarkRunnerCodec.encode(o)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in encode
    return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in <listcomp>
    return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 176, in suites
    suite = self.suite(suite_name, suite_bin)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 152, in suite
    results = suite_cmd.results(repetitions=self.repetitions)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/google.py", line 63, in results
    self.run(*argv, check=True)
  File "/Users/voltrondata/arrow/dev/archery/archery/utils/command.py", line 78, in run
    return subprocess.run(invocation, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark', '--benchmark_repetitions=1', '--benchmark_out=/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/tmpzwwjwb94', '--benchmark_out_format=json']' died with <Signals.SIGBUS: 10>.

 


Yibo Cai / @cyb70289:
Tested on M1: all "arrow-dataset-scanner-benchmark/scan_alg=1" tests fail with SIGBUS; the "scan_alg=0" tests are okay.
The backtrace shows the stack depth approaching 4000. It looks like there is a call loop among {future,async_util}.{h,cc}.
ASAN identifies a stack overflow; logs attached.
cc @westonpace

# all scan_alg:0 tests are okay, all scan_alg:1 tests cause sigbus
% debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*" 

/Users/linux/cyb/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'jemalloc', 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-11-01T17:02:15+08:00
Running debug/arrow-dataset-scanner-benchmark
Run on (8 X 24.2408 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 2.06, 2.81, 2.72
AddressSanitizer:DEADLYSIGNAL
=================================================================
==75674==ERROR: AddressSanitizer: stack-overflow on address 0x00016b9b3fc0 (pc 0x000106b4b3b4 bp 0x000106b4b3a0 sp 0x00016b9b3fa0 T1)
    #0 0x106b4b3b4 in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4)

SUMMARY: AddressSanitizer: stack-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4) in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4
Thread T1 created by T0 here:
    #0 0x106b2680c in wrap_pthread_create+0x50 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x3a80c)
    #1 0x113b7a408 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:375
    #2 0x113b7a128 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:309
    #3 0x113b67e94 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:301
    #4 0x113b66794 in arrow::internal::ThreadPool::LaunchWorkersUnlocked(int) thread_pool.cc:412
    #5 0x113b68444 in arrow::internal::ThreadPool::SpawnReal(arrow::internal::TaskHints, arrow::internal::FnOnce<void ()>, arrow::StopToken, arrow::internal::FnOnce<void (arrow::Status const&)>&&) thread_pool.cc:448
    #6 0x10488dfd8 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::internal::TaskHints, arrow::StopToken, arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:167
    #7 0x10488be74 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:193
    #8 0x10488ac0c in arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression) scan_node.cc:64
    #9 0x10488a010 in arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() scan_node.cc:318
    #10 0x113fc43e0 in arrow::compute::(anonymous namespace)::ExecPlanImpl::StartProducing() exec_plan.cc:183
    #11 0x113fc362c in arrow::compute::ExecPlan::StartProducing() exec_plan.cc:400
    #12 0x104462260 in arrow::dataset::MinimalEndToEndScan(unsigned long, unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::compute::ExecNodeOptions> > (unsigned long, unsigned long)>) scanner_benchmark.cc:159
    #13 0x104468ebc in arrow::dataset::MinimalEndToEndBench(benchmark::State&) scanner_benchmark.cc:272
    #14 0x1055dbc8c in benchmark::internal::BenchmarkInstance::Run(long long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const+0x44 (libbenchmark.1.7.0.dylib:arm64+0xbc8c)
    #15 0x1055ed708 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, long long, int, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*)+0x58 (libbenchmark.1.7.0.dylib:arm64+0x1d708)
    #16 0x1055ed2c8 in benchmark::internal::BenchmarkRunner::DoNIterations()+0x2c0 (libbenchmark.1.7.0.dylib:arm64+0x1d2c8)
    #17 0x1055edfec in benchmark::internal::BenchmarkRunner::DoOneRepetition()+0xb0 (libbenchmark.1.7.0.dylib:arm64+0x1dfec)
    #18 0x1055d4fb8 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)+0x9f0 (libbenchmark.1.7.0.dylib:arm64+0x4fb8)
    #19 0x1055d4564 in benchmark::RunSpecifiedBenchmarks()+0x3c (libbenchmark.1.7.0.dylib:arm64+0x4564)
    #20 0x10556becc in main+0x50 (libbenchmark_main.1.7.0.dylib:arm64+0x3ecc)
    #21 0x10562d0f0 in start+0x204 (dyld:arm64+0x50f0)
    #22 0xa25d7ffffffffffc  (<unknown module>)

==75674==ABORTING
zsh: abort      debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*"
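The call-loop diagnosis above can be sketched with a toy model (hypothetical names, not Arrow's actual Future/async_util code): if each step of an async loop attaches a continuation to a future that is already finished, the continuation runs synchronously, so every step nests one more stack frame — roughly 4000 steps matches the ~4000-deep backtrace. The usual fix is a trampoline that drains ready steps in a loop, keeping the depth constant.

```cpp
#include <algorithm>

// Models the failure mode: an "already-completed future" invokes its
// continuation immediately, i.e. a direct recursive call, so the stack
// depth grows by one frame per step. (In real future code the captured
// continuations defeat tail-call optimization.)
static int Recurse(int remaining, int depth, int max_depth) {
    max_depth = std::max(max_depth, depth);
    if (remaining == 0) return max_depth;
    return Recurse(remaining - 1, depth + 1, max_depth);
}

// Returns the maximum nesting depth reached after `steps` chained steps.
int MaxDepthRecursive(int steps) { return Recurse(steps, 1, 0); }

// Trampolined variant: ready steps are drained in a loop, so every step
// runs at the same shallow depth regardless of how many steps there are.
int MaxDepthTrampolined(int steps) {
    int max_depth = 0;
    for (int i = 0; i < steps; ++i) {
        max_depth = std::max(max_depth, 1);  // each step runs at depth 1
    }
    return max_depth;
}
```

With 4000 steps the recursive form reaches a depth of 4001 while the trampolined form stays at 1, which is why the real code only overflows once the chain of synchronously-ready futures gets long enough.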


Yibo Cai / @cyb70289:
I tried setting ARROW_DEFAULT_MEMORY_POOL to "jemalloc" and to "system"; the same error happens.


Weston Pace / @westonpace:
Thank you. I will look at this today.


Weston Pace / @westonpace:
Issue resolved by pull request #14562.

@asfimport asfimport added this to the 11.0.0 milestone Jan 11, 2023