ARROW-6910: [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS, add C++ / Python option to configure this #5701
Conversation
@ursabot benchmark
AMD64 Ubuntu 18.04 C++ Benchmark (#71877) builder failed. Revision: e0b2a70 Archery:
@fsaintjacques seems that the benchmark command got broken
Guess it could be a legitimate failure. I'll look later
Have you tried "background_thread:true" instead, as mentioned in jemalloc/jemalloc#1128? I fear that returning memory immediately to the OS (which setting the "decay" options to 0 may be doing) may reduce performance on some workloads.
I tried background_thread:true with 1 second decay instead of 10 seconds (which is recommended in that issue thread) and here's the output, with some 1 second sleep inserted
Does this seem acceptable? It seems to do a better job of not letting peak RSS get out of hand
(I don't know how much CPU the background thread takes up, hopefully not a lot)
Probably less than doing things eagerly (also it should avoid adding latency to the main program).
LGTM, just a couple details.
cpp/src/arrow/memory_pool.h
Outdated
/// \brief Set jemalloc memory page purging behavior for future-created arenas
/// to the indicated number of milliseconds. See dirty_decay_ms and
/// muzzy_decay_ms options in jemalloc for a description of what these do. The
/// default is configured to 0 which releases memory more aggressively to the
Update comment? It doesn't default to 0 anymore.
done
  } \
} while (0)
Status jemalloc_set_decay_ms(int ms) {
Can we add a test for this? The mallctl calls are not trivial.
done
    ("oversize_threshold:0,"
     "dirty_decay_ms:1000,"
     "muzzy_decay_ms:1000,"
     "background_thread:true");
It would be nice to add a comment explaining how we came to these settings.
done
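As an aside, the conf string discussed above uses jemalloc's standard `key:value` comma-separated option syntax. A minimal, illustrative sketch (not part of Arrow or jemalloc) of parsing such a string, which can make the chosen settings easier to inspect or log:

```python
def parse_malloc_conf(conf: str) -> dict:
    """Split a jemalloc-style "key:value,key:value" conf string
    (as passed via malloc_conf or MALLOC_CONF) into a dict.
    Illustrative helper only, not a jemalloc or Arrow API."""
    options = {}
    for pair in conf.split(","):
        key, _, value = pair.partition(":")
        options[key.strip()] = value.strip()
    return options

# The settings from the patch under review:
conf = ("oversize_threshold:0,"
        "dirty_decay_ms:1000,"
        "muzzy_decay_ms:1000,"
        "background_thread:true")
print(parse_malloc_conf(conf))
# → {'oversize_threshold': '0', 'dirty_decay_ms': '1000',
#    'muzzy_decay_ms': '1000', 'background_thread': 'true'}
```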
Yes, I think that's the main goal here.
Thanks for the review. I'll update the comments and add a test. It seems that this change wreaks havoc on the Plasma valgrind tests. I'm just going to disable valgrind in the Plasma Python tests and hope for the best.
Hmm, I wasn't aware that Valgrind was still enabled somewhere in our test suite. Are you sure that is the case?
https://github.com/apache/arrow/blob/master/.travis.yml#L97 I don't think it's worth the effort of figuring out how to fix this.
Just to confirm, it is a failed assert rather than an archery benchmark failure?
@wesm Changes look good from my side. It would be nice if you could do a mini-blogpost somewhere. These micro-optimizations are a really good argument for why someone should use Arrow's own implementation of the memory format.
…unction to set the values to something else
@ursabot benchmark
AMD64 Ubuntu 18.04 C++ Benchmark (#72178) builder failed. Revision: 9290470 Archery:
Now some OS X / C GLib build fails for another reason that I don't understand (can't find LLVM?). I'd be tempted to merge anyway, as it seems clear that this build is currently broken, including on master. @kou please speak up quickly if you oppose. (also cc @xhochy )
AppVeyor build: https://ci.appveyor.com/project/pitrou/arrow/builds/28288488
This reverts commit ab67abb.
I'll merge without the protobuf warnings fix, actually, since those warnings also fail on git master.
No objection. We can fix it later.
…se dirty pages more aggressively back to the OS, add C++ / Python option to configure this

The current default behavior causes applications dealing in large datasets to hold on to a large amount of physical operating system memory. While this may improve performance in some cases, it empirically seems to be causing problems for users.

There's some discussion of this issue in some other contexts here: jemalloc/jemalloc#1128

Here is a test script I used to check the RSS while reading a large Parquet file (~10GB in memory) in a loop (requires downloading the file http://public-parquet-test-data.s3.amazonaws.com/big.snappy.parquet): https://gist.github.com/wesm/c75ad3b6dcd37231aaacf56a80a5e401

This patch enables jemalloc background page reclamation and reduces the time decay from 10 seconds to 1 second so that memory is returned to the OS more aggressively.

Closes apache#5701 from wesm/ARROW-6910 and squashes the following commits:

8fc8aa8 <Antoine Pitrou> Revert "Try to fix protobuf-related clang warning"
ab67abb <Antoine Pitrou> Try to fix protobuf-related clang warning
9290470 <Wes McKinney> Review comments, disable PLASMA_VALGRIND in Travis
8c4d367 <Wes McKinney> Use background_thread:true and 1000ms decay
daa5416 <Wes McKinney> Set jemalloc dirty_decay_ms and muzzy_decay_ms to 0 by default, add function to set the values to something else

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
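The actual test script lives in the linked gist; as a rough, Unix-only sketch of the same kind of RSS check, one can watch peak resident set size across iterations of an allocation-heavy workload. The `workload` function here is a hypothetical placeholder standing in for something like `pq.read_table("big.snappy.parquet")`:

```python
# Unix-only sketch (uses the stdlib `resource` module). Not the script
# from the gist; `workload` is a hypothetical stand-in for a large
# Parquet read or similar allocation-heavy operation.
import resource
import sys

def peak_rss_bytes() -> int:
    """Return this process's peak RSS in bytes. ru_maxrss is reported
    in kilobytes on Linux and in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

def workload():
    # Placeholder: allocate and drop ~50 MB to exercise the allocator.
    return bytearray(50 * 1024 * 1024)

for i in range(3):
    data = workload()
    del data
    print(f"iteration {i}: peak RSS = {peak_rss_bytes() / 2**20:.1f} MiB")
```

With a default jemalloc configuration, peak RSS in a loop like this can stay elevated between iterations; the point of the patch is that dirty pages are returned to the OS within about a second instead.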