
ARROW-6910: [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS dirty_decay_ms and muzzy_decay_ms to 0 by default, add C++ / Python option to configure this #5701

Closed
wesm wants to merge 5 commits into apache:master from wesm/ARROW-6910

Conversation

wesm
Member

@wesm wesm commented Oct 19, 2019

The current default behavior causes applications dealing in large datasets to hold on to a large amount of physical operating system memory. While this may improve performance in some cases, it empirically seems to be causing problems for users.

There's some discussion of this issue in some other contexts here

jemalloc/jemalloc#1128

Here is a test script I used to check the RSS while reading a large Parquet file (~10GB in memory) in a loop (requires downloading the file http://public-parquet-test-data.s3.amazonaws.com/big.snappy.parquet)

https://gist.github.com/wesm/c75ad3b6dcd37231aaacf56a80a5e401

This patch enables jemalloc background page reclamation and reduces the time decay from 10 seconds to 1 second so that memory is returned to the OS more aggressively.
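As context for the decay options: for an application linked against a stock jemalloc, the same behavior can be requested at startup through jemalloc's documented `MALLOC_CONF` environment variable. Note this is a hedged illustration only; Arrow bundles its own prefixed jemalloc and compiles these defaults in, which is what this patch actually changes.

```python
import os

# Illustrative only: jemalloc reads MALLOC_CONF once, at allocator
# initialization, so in practice it must be exported before the process
# (or any jemalloc-backed library) starts allocating. These values
# mirror the dirty/muzzy decay defaults proposed in this patch.
os.environ["MALLOC_CONF"] = "dirty_decay_ms:0,muzzy_decay_ms:0"
print(os.environ["MALLOC_CONF"])
```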

@wesm wesm requested review from pitrou and xhochy October 19, 2019 17:06
@wesm
Member Author

wesm commented Oct 19, 2019

@ursabot benchmark

@ursabot

ursabot commented Oct 19, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#71877) builder failed.

Revision: e0b2a70

Archery: 'archery benchmark ...' step's stderr:

Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

Cloning into '/tmp/arrow-archery-68634yvr/master/arrow'...
done.
Note: checking out '2966c01cfacbd877de2db7e4eaa2a87dcfdfd1e2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2966c01cf Merge e0b2a70d65d8e5c942bd7c486f2d296f5d7c92bb into d9234628a4f510117d7d7c36e5c816e5bea1d9c4
Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

2019-10-19 17:18:46
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-ipc-read-write-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 22.26, 34.35, 26.15
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:20:42
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-utf8-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 4.94, 23.98, 23.32
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:21:39
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-hashing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 3.21, 20.30, 22.10
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:22:07
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-csv-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 2.73, 18.55, 21.46
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:22:44
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-compute-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 2.38, 16.71, 20.73
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:24:42
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-decimal-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.30, 11.60, 18.37
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:25:40
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-compute-compare-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.12, 9.82, 17.37
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:26:39
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-compute-aggregate-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.04, 8.21, 16.34
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:27:08
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-bit-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.02, 7.52, 15.85
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:28:11
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-builder-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.01, 6.24, 14.85
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:30:09
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-int-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.00, 4.56, 13.23
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:30:39
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-compute-sort-to-indices-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.00, 4.22, 12.84
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:32:05
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-trie-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.00, 3.42, 11.80
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:32:20
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-json-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.00, 3.30, 11.62
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:33:03
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-io-memory-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.00, 2.97, 11.12
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:34:31
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-type-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 3.51, 3.36, 10.54
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:35:07
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-io-file-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 2.36, 3.08, 10.14
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:35:51
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-compute-take-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 2.00, 2.89, 9.78
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-19 17:41:44
Running /tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-number-parsing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.02, 1.60, 6.99
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
arrow-number-parsing-benchmark: src/utils.h:248: void double_conversion::StringBuilder::AddSubstring(const char*, int): Assertion `!is_finalized() && position_ + n < buffer_.length()' failed.
Traceback (most recent call last):
  File "/usr/local/bin/archery", line 11, in <module>
    load_entry_point('archery', 'console_scripts', 'archery')()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/cli.py", line 461, in benchmark_diff
    for comparator in runner_comp.comparisons:
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/compare.py", line 114, in comparisons
    for suite_name, (suite_cont, suite_base) in suites:
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/compare.py", line 89, in pairwise_compare
    dict_contender = {e.name: e for e in contender}
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/compare.py", line 89, in <dictcomp>
    dict_contender = {e.name: e for e in contender}
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/runner.py", line 177, in suites
    suite = self.suite(suite_name, suite_bin)
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/runner.py", line 154, in suite
    results = suite_cmd.results()
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/benchmark/google.py", line 64, in results
    self.run(*argv, check=True)
  File "/buildbot/AMD64_Ubuntu_18_04_C___Benchmark/dev/archery/archery/utils/command.py", line 74, in run
    return subprocess.run(invocation, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/arrow-archery-68634yvr/WORKSPACE/build/release/arrow-number-parsing-benchmark', '--benchmark_repetitions=10', '--benchmark_out=/tmp/tmpfvisjoek', '--benchmark_out_format=json']' died with <Signals.SIGABRT: 6>.

@wesm
Member Author

wesm commented Oct 19, 2019

@fsaintjacques seems that the benchmark command got broken

@wesm
Member Author

wesm commented Oct 19, 2019

Guess it could be a legitimate failure. I'll look later

@pitrou
Member

pitrou commented Oct 19, 2019

Have you tried "background_thread:true" instead, as mentioned in jemalloc/jemalloc#1128 ? I fear that returning memory immediately to the OS (which setting the "decay" options to 0 may be doing) may reduce performance on some workloads.

@wesm
Member Author

wesm commented Oct 20, 2019

I tried background_thread:true with 1 second decay instead of 10 seconds (which is recommended in that issue thread), and here's the output, with 1-second sleeps inserted:

$ python test.py 
sleeping
RSS: 10880475136, change: 10797621248
sleeping
RSS: 13594267648, change: 2713792512
sleeping
RSS: 12921249792, change: -673017856
sleeping
RSS: 10334597120, change: -2586652672
sleeping
RSS: 10237149184, change: -97447936
sleeping
RSS: 10437951488, change: 200802304
sleeping
RSS: 9919725568, change: -518225920
sleeping
RSS: 9462968320, change: -456757248
sleeping
RSS: 10122711040, change: 659742720
sleeping
RSS: 9661595648, change: -461115392
Took 47.43621349334717 seconds
RSS: 7910539264, change: -1751056384
RSS: 5868593152, change: -2041946112
RSS: 4238458880, change: -1630134272
RSS: 3196747776, change: -1041711104
RSS: 2589286400, change: -607461376
RSS: 1776238592, change: -813047808
RSS: 820801536, change: -955437056
RSS: 377745408, change: -443056128
RSS: 164728832, change: -213016576
RSS: 164728832, change: 0
RSS: 149159936, change: -15568896
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0
RSS: 149159936, change: 0

Does this seem acceptable? It seems to do a better job of not letting peak RSS get out of hand.
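For reference, the polling loop above can be approximated with a stdlib-only sketch (the actual gist uses psutil and pyarrow, which are assumed here, not shown; `ru_maxrss` is a high-water mark rather than the instantaneous RSS psutil reports, so it only approximates the numbers above):

```python
import resource
import time

def peak_rss_bytes():
    """Peak resident set size of this process, in bytes.

    getrusage reports ru_maxrss in KiB on Linux (bytes on macOS), and it
    is a peak value, so unlike psutil's RSS it never decreases when
    jemalloc returns pages to the OS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024

prev = 0
for _ in range(3):
    time.sleep(0.01)  # the original script sleeps 1 second per iteration
    rss = peak_rss_bytes()
    print(f"RSS: {rss}, change: {rss - prev}")
    prev = rss
```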

@wesm
Member Author

wesm commented Oct 20, 2019

(I don't know how much CPU the background thread takes up, hopefully not a lot)

@pitrou
Member

pitrou commented Oct 20, 2019

(I don't know how much CPU the background thread takes up, hopefully not a lot)

Probably less than doing things eagerly (also it should avoid adding latency to the main program).

@pitrou pitrou left a comment (Member)

LGTM, just a couple details.

/// \brief Set jemalloc memory page purging behavior for future-created arenas
/// to the indicated number of milliseconds. See dirty_decay_ms and
/// muzzy_decay_ms options in jemalloc for a description of what these do. The
/// default is configured to 0 which releases memory more aggressively to the
pitrou (Member)

Update comment? It doesn't default to 0 anymore.

wesm (Member Author)

done

} \
} while (0)

Status jemalloc_set_decay_ms(int ms) {
pitrou (Member)

Can we add a test for this? The mallctl calls are not trivial.

wesm (Member Author)

done

("oversize_threshold:0,"
"dirty_decay_ms:1000,"
"muzzy_decay_ms:1000,"
"background_thread:true");
pitrou (Member)

It would be nice to add a comment explaining how we came to these settings.

wesm (Member Author)

done
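For readers skimming the thread, a hedged gloss of the four options in the string under review (descriptions paraphrase the jemalloc tuning documentation; the rationale ultimately committed lives in the source comment discussed above):

```
oversize_threshold:0    # disable special handling of very large
                        # allocations in a dedicated arena
dirty_decay_ms:1000     # purge unused dirty pages after ~1 second
muzzy_decay_ms:1000     # likewise for muzzy (MADV_FREE'd) pages
background_thread:true  # do the purging on a jemalloc background
                        # thread instead of inside allocation calls
```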

@pitrou
Member

pitrou commented Oct 20, 2019

Does this seem acceptable? It seems to do a better job of not letting peak RSS get out of hand

Yes, I think that's the main goal here.

@wesm
Member Author

wesm commented Oct 20, 2019

Thanks for the review. I'll update the comments and add a test.

It seems that this change wreaks havoc on the Plasma valgrind tests. I'm just going to disable valgrind in the Plasma Python tests and hope for the best.

@pitrou
Member

pitrou commented Oct 20, 2019

Hmm, I wasn't aware that Valgrind was still enabled somewhere in our test suite. Are you sure that is the case?

@wesm
Member Author

wesm commented Oct 20, 2019

https://github.com/apache/arrow/blob/master/.travis.yml#L97

I don't think it's worth the effort of figuring out how to fix this.

@fsaintjacques
Contributor

Just to confirm, it is a failed assert rather than an archery benchmark failure?

@xhochy
Member

xhochy commented Oct 21, 2019

@wesm Changes look good from my side. It would be nice if you could do a mini-blogpost somewhere. These micro-optimizations are a really good argument for using Arrow's own implementation around the memory format.

@wesm
Member Author

wesm commented Oct 21, 2019

@ursabot benchmark

@ursabot

ursabot commented Oct 21, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#72178) builder failed.

Revision: 9290470

Archery: 'archery benchmark ...' step's stderr:

Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

Cloning into '/tmp/arrow-archery-b4via207/master/arrow'...
done.
Note: checking out 'abb8a1e43a27f0a96cd1a8f82eb493fa72d4b37a'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at abb8a1e43 Merge 92904703475fa6221a3e0d0bd1e0b65e98ccffa1 into bbad94af5ec92a008806a37ed45e0beb1b236998
Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:168 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1721 (resolve_dependency)
  CMakeLists.txt:424 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

2019-10-21 19:07:58
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-compute-compare-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 19.62, 19.37, 11.05
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:09:00
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-hashing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 7.84, 16.02, 10.42
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:09:28
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-builder-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 5.50, 14.82, 10.16
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:11:33
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-utf8-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.53, 9.95, 8.96
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:12:31
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-ipc-read-write-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.21, 8.44, 8.50
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:14:27
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-bit-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.06, 6.08, 7.63
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:15:30
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-int-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.02, 5.08, 7.18
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:15:59
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-csv-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.07, 4.71, 6.99
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:16:35
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-trie-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.04, 4.30, 6.77
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-10-21 19:16:50
Running /tmp/arrow-archery-b4via207/WORKSPACE/build/release/arrow-compute-aggregate-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
Load Average: 1.03, 4.13, 6.67
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

@pitrou
Member

pitrou commented Oct 22, 2019

Now the OS X / C GLib build fails for another reason that I don't understand (can't find LLVM?):
https://travis-ci.org/apache/arrow/jobs/601244672

I'd be tempted to merge anyway, as it seems clear that this build is currently broken, including on master. @kou please voice up quickly if you oppose it.

(also cc @xhochy )

@pitrou
Member

pitrou commented Oct 22, 2019

@pitrou
Member

pitrou commented Oct 22, 2019

I'll merge without the protobuf warnings fix, actually, since those warnings also fail on git master.

@pitrou pitrou closed this in 1ae946c Oct 22, 2019
@kou
Member

kou commented Oct 22, 2019

@kou please voice up quickly if you oppose it.

No objection. We can fix it later.

kszucs pushed a commit to kszucs/arrow that referenced this pull request Oct 23, 2019
…se dirty pages more aggressively back to the OS dirty_decay_ms and muzzy_decay_ms to 0 by default, add C++ / Python option to configure this

The current default behavior causes applications dealing in large datasets to hold on to a large amount of physical operating system memory. While this may improve performance in some cases, it empirically seems to be causing problems for users.

There's some discussion of this issue in some other contexts here

jemalloc/jemalloc#1128

Here is a test script I used to check the RSS while reading a large Parquet file (~10GB in memory) in a loop (requires downloading the file http://public-parquet-test-data.s3.amazonaws.com/big.snappy.parquet)

https://gist.github.com/wesm/c75ad3b6dcd37231aaacf56a80a5e401

This patch enables jemalloc background page reclamation and reduces the time decay from 10 seconds to 1 second so that memory is returned to the OS more aggressively.

Closes apache#5701 from wesm/ARROW-6910 and squashes the following commits:

8fc8aa8 <Antoine Pitrou> Revert "Try to fix protobuf-related clang warning"
ab67abb <Antoine Pitrou> Try to fix protobuf-related clang warning
9290470 <Wes McKinney> Review comments, disable PLASMA_VALGRIND in Travis
8c4d367 <Wes McKinney> Use background_thread:true and 1000ms decay
daa5416 <Wes McKinney> Set jemalloc dirty_decay_ms and muzzy_decay_ms to 0 by default, add function to set the values to something else

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>

6 participants