
ARROW-6910: [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS; set dirty_decay_ms and muzzy_decay_ms to 0 by default, add C++ / Python option to configure this

The current default behavior causes applications dealing with large datasets to hold on to a large amount of physical operating system memory. While this may improve performance in some cases, empirically it seems to be causing problems for users.

This issue has also been discussed in other contexts, for example:

jemalloc/jemalloc#1128

Here is a test script I used to check the RSS (resident set size) while reading a large Parquet file (~10 GB in memory) in a loop. It requires downloading http://public-parquet-test-data.s3.amazonaws.com/big.snappy.parquet first.

https://gist.github.com/wesm/c75ad3b6dcd37231aaacf56a80a5e401
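The gist itself is not reproduced here. As a Linux-only sketch of the kind of measurement involved (an assumption, not necessarily how the gist does it), a C++ process can read its own RSS from the `VmRSS` line of `/proc/self/status`. The helper name `current_rss_kib` is hypothetical and not part of Arrow.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Arrow or the gist): returns this process's
// resident set size in KiB by parsing the "VmRSS:" line of /proc/self/status.
// Linux-only; returns -1 if the line is missing or cannot be parsed.
long current_rss_kib() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
      std::istringstream fields(line.substr(6));
      long kib = -1;
      // The number is followed by a "kB" unit suffix; stop at the number.
      if (!(fields >> kib)) return -1;
      return kib;
    }
  }
  return -1;
}
```

Sampling `current_rss_kib()` after each read iteration and watching whether the value drops back down is one way to tell whether freed memory is actually being returned to the OS.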

This patch enables jemalloc's background thread (background_thread:true), so that unused pages are purged in the background, and reduces the dirty / muzzy page decay time from 10 seconds to 1 second so that memory is returned to the OS more aggressively.

Closes #5701 from wesm/ARROW-6910 and squashes the following commits:

8fc8aa8 <Antoine Pitrou> Revert "Try to fix protobuf-related clang warning"
ab67abb <Antoine Pitrou> Try to fix protobuf-related clang warning
9290470 <Wes McKinney> Review comments, disable PLASMA_VALGRIND in Travis
8c4d367 <Wes McKinney> Use background_thread:true and 1000ms decay
daa5416 <Wes McKinney> Set jemalloc dirty_decay_ms and muzzy_decay_ms to 0 by default, add function to set the values to something else

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
2 people authored and kszucs committed Oct 23, 2019
1 parent 08472da commit c20eceb
Showing 7 changed files with 90 additions and 9 deletions.
10 changes: 4 additions & 6 deletions .travis.yml
@@ -94,13 +94,14 @@ matrix:
- $TRAVIS_BUILD_DIR/ci/travis_script_cpp.sh || travis_terminate 1
- $TRAVIS_BUILD_DIR/ci/travis_script_gandiva_java.sh || travis_terminate 1
- $TRAVIS_BUILD_DIR/ci/travis_upload_cpp_coverage.sh || travis_terminate 1
- name: "Python 3.6 unit tests w/ Valgrind, conda-forge toolchain, coverage"

# -------------------------------------------------------------------------
- name: "Python 3.6 unit tests, conda-forge toolchain, coverage"
compiler: gcc
language: cpp
os: linux
jdk: openjdk8
env:
# Valgrind is needed for the Plasma store tests
- ARROW_BUILD_WARNING_LEVEL=CHECKIN
- ARROW_TRAVIS_COVERAGE=1
- ARROW_TRAVIS_FLIGHT=1
@@ -109,7 +110,7 @@ matrix:
- ARROW_TRAVIS_PYTHON_JVM=1
- ARROW_TRAVIS_USE_SYSTEM_JAVA=1
- ARROW_TRAVIS_USE_TOOLCHAIN=1
- ARROW_TRAVIS_VALGRIND=1
- ARROW_TRAVIS_S3=1
# TODO(wesm): Run the benchmarks outside of Travis
# - ARROW_TRAVIS_PYTHON_BENCHMARKS=1
before_script:
@@ -120,9 +121,6 @@ matrix:
script:
- $TRAVIS_BUILD_DIR/ci/travis_script_java.sh || travis_terminate 1
- export ARROW_TRAVIS_PYTHON_GANDIVA=1
# Only run Plasma tests with valgrind in one of the Python builds because
# they are slow
- export PLASMA_VALGRIND=1
- $TRAVIS_BUILD_DIR/ci/travis_script_python.sh 3.6
- $TRAVIS_BUILD_DIR/ci/travis_upload_cpp_coverage.sh
- name: "[OS X] C++ w/ XCode 9.3"
45 changes: 43 additions & 2 deletions cpp/src/arrow/memory_pool.cc
@@ -45,11 +45,28 @@
// building jemalloc.
// See discussion in https://github.com/jemalloc/jemalloc/issues/1621

// ARROW-6910(wesm): we found that jemalloc's default behavior with respect to
// dirty / muzzy pages (see definitions of these in the jemalloc documentation)
// conflicted with user expectations, and would even cause memory use problems
// in some cases. By enabling the background_thread option and reducing the
// decay time from 10 seconds to 1 second, memory is released more
// aggressively (and in the background) to the OS. This can be configured
// further by using the arrow::jemalloc_set_decay_ms API.

#ifdef NDEBUG
const char* je_arrow_malloc_conf = "oversize_threshold:0";
const char* je_arrow_malloc_conf =
("oversize_threshold:0,"
"dirty_decay_ms:1000,"
"muzzy_decay_ms:1000,"
"background_thread:true");
#else
// In debug mode, add memory poisoning on alloc / free
const char* je_arrow_malloc_conf = "oversize_threshold:0,junk:true";
const char* je_arrow_malloc_conf =
("oversize_threshold:0,"
"junk:true,"
"dirty_decay_ms:1000,"
"muzzy_decay_ms:1000,"
"background_thread:true");
#endif
#endif
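The `je_arrow_malloc_conf` strings above use adjacent string-literal concatenation to build a single comma-separated list of `option:value` pairs, which is the syntax jemalloc accepts for its `malloc_conf` options. As a rough illustration, the release-build string can be split back into its individual options. `parse_malloc_conf` and `kReleaseConf` are hypothetical names, not something Arrow or jemalloc provides.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: splits a jemalloc-style option string
// ("key:value,key:value,...") into a map. Illustrative only; it does not
// validate option names or values the way jemalloc does.
std::map<std::string, std::string> parse_malloc_conf(const std::string& conf) {
  std::map<std::string, std::string> options;
  std::size_t start = 0;
  while (start < conf.size()) {
    std::size_t end = conf.find(',', start);
    if (end == std::string::npos) end = conf.size();
    const std::string pair = conf.substr(start, end - start);
    const std::size_t colon = pair.find(':');
    if (colon != std::string::npos) {
      options[pair.substr(0, colon)] = pair.substr(colon + 1);
    }
    start = end + 1;  // skip past the comma
  }
  return options;
}

// Same literal pieces as the release (NDEBUG) configuration in the diff;
// the compiler joins adjacent string literals into one string.
const char* kReleaseConf =
    "oversize_threshold:0,"
    "dirty_decay_ms:1000,"
    "muzzy_decay_ms:1000,"
    "background_thread:true";
```

The debug-build string differs only by the extra `junk:true` entry.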

@@ -389,6 +406,30 @@ MemoryPool* default_memory_pool() {
#endif
}

#define RETURN_IF_JEMALLOC_ERROR(ERR) \
do { \
if ((ERR) != 0) { \
return Status::UnknownError(std::strerror(ERR)); \
} \
} while (0)

Status jemalloc_set_decay_ms(int ms) {
#ifdef ARROW_JEMALLOC
ssize_t decay_time_ms = static_cast<ssize_t>(ms);

int err = mallctl("arenas.dirty_decay_ms", nullptr, nullptr, &decay_time_ms,
sizeof(decay_time_ms));
RETURN_IF_JEMALLOC_ERROR(err);
err = mallctl("arenas.muzzy_decay_ms", nullptr, nullptr, &decay_time_ms,
sizeof(decay_time_ms));
RETURN_IF_JEMALLOC_ERROR(err);

return Status::OK();
#else
return Status::Invalid("jemalloc support is not built");
#endif
}

///////////////////////////////////////////////////////////////////////
// LoggingMemoryPool implementation

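`jemalloc_set_decay_ms` writes one `ssize_t` value to two jemalloc controls through `mallctl`. It passes `nullptr` for the old-value arguments because it only sets values, and it turns a non-zero return code into an error `Status` via `std::strerror`. The sketch below exercises that calling pattern without linking jemalloc. `fake_mallctl`, `g_fake_ctl`, the two-field `Status` struct, `RETURN_IF_CTL_ERROR`, and `set_decay_ms` are hypothetical stand-ins, not Arrow or jemalloc APIs. The stub mimics mallctl's documented convention of returning errno-style codes such as `ENOENT` (unknown name) and `EINVAL` (wrong new-value size).

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <sys/types.h>  // ssize_t (POSIX)

// Minimal stand-in for arrow::Status (hypothetical, illustration only).
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical stand-in for jemalloc's
//   int mallctl(const char* name, void* oldp, size_t* oldlenp,
//               void* newp, size_t newlen);
// Only writes are modelled; values live in a map instead of the allocator.
std::map<std::string, ssize_t> g_fake_ctl = {
    {"arenas.dirty_decay_ms", 10000}, {"arenas.muzzy_decay_ms", 10000}};

int fake_mallctl(const char* name, void* oldp, std::size_t* oldlenp,
                 void* newp, std::size_t newlen) {
  (void)oldp;  // reading the previous value is not modelled here
  (void)oldlenp;
  auto it = g_fake_ctl.find(name);
  if (it == g_fake_ctl.end()) return ENOENT;  // unknown control name
  if (newp == nullptr || newlen != sizeof(ssize_t)) return EINVAL;
  it->second = *static_cast<const ssize_t*>(newp);
  return 0;
}

// Evaluates ERR exactly once and converts a non-zero errno-style code into
// an error Status, checking the macro's own argument.
#define RETURN_IF_CTL_ERROR(ERR)                    \
  do {                                              \
    const int ctl_err = (ERR);                      \
    if (ctl_err != 0) {                             \
      return Status{false, std::strerror(ctl_err)}; \
    }                                               \
  } while (0)

// Same shape as the diff's function: write one ssize_t value to both decay
// controls, stopping at the first failure.
Status set_decay_ms(int ms) {
  ssize_t decay_time_ms = static_cast<ssize_t>(ms);
  RETURN_IF_CTL_ERROR(fake_mallctl("arenas.dirty_decay_ms", nullptr, nullptr,
                                   &decay_time_ms, sizeof(decay_time_ms)));
  RETURN_IF_CTL_ERROR(fake_mallctl("arenas.muzzy_decay_ms", nullptr, nullptr,
                                   &decay_time_ms, sizeof(decay_time_ms)));
  return Status{true, ""};
}
```

Binding the argument to a local inside the macro means a call with side effects, such as the `fake_mallctl(...)` calls here, runs exactly once per check.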
11 changes: 11 additions & 0 deletions cpp/src/arrow/memory_pool.h
@@ -160,6 +160,17 @@ ARROW_EXPORT MemoryPool* system_memory_pool();
/// May return NotImplemented if jemalloc is not available.
ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);

/// \brief Set jemalloc memory page purging behavior for future-created arenas
/// to the indicated number of milliseconds. See dirty_decay_ms and
/// muzzy_decay_ms options in jemalloc for a description of what these do. The
/// default is configured to 1000 (1 second), which releases memory more
/// aggressively to the operating system than the jemalloc default of 10
/// seconds. If you set the value to 0, dirty / muzzy pages will be released
/// immediately rather than with a time decay, but this may reduce application
/// performance.
ARROW_EXPORT
Status jemalloc_set_decay_ms(int ms);

/// Return a process-wide memory pool based on mimalloc.
///
/// May return NotImplemented if mimalloc is not available.
10 changes: 10 additions & 0 deletions cpp/src/arrow/memory_pool_test.cc
@@ -143,4 +143,14 @@ TEST(ProxyMemoryPool, Logging) {
ASSERT_EQ(0, pool->bytes_allocated());
ASSERT_EQ(0, pp.bytes_allocated());
}

TEST(Jemalloc, SetDirtyPageDecayMillis) {
// ARROW-6910
#ifdef ARROW_JEMALLOC
ASSERT_OK(jemalloc_set_decay_ms(0));
#else
ASSERT_RAISES(Invalid, jemalloc_set_decay_ms(0));
#endif
}

} // namespace arrow
3 changes: 2 additions & 1 deletion python/pyarrow/__init__.py
@@ -108,7 +108,8 @@ def parse_git(root, **kwargs):
from pyarrow.lib import (MemoryPool, LoggingMemoryPool, ProxyMemoryPool,
total_allocated_bytes, set_memory_pool,
default_memory_pool, logging_memory_pool,
proxy_memory_pool, log_memory_allocations)
proxy_memory_pool, log_memory_allocations,
jemalloc_set_decay_ms)

# I/O
from pyarrow.lib import (HdfsFile, NativeFile, PythonFile,
2 changes: 2 additions & 0 deletions python/pyarrow/includes/libarrow.pxd
@@ -252,6 +252,8 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:

cdef CMemoryPool* c_default_memory_pool" arrow::default_memory_pool"()

CStatus c_jemalloc_set_decay_ms" arrow::jemalloc_set_decay_ms"(int ms)

cdef cppclass CListType" arrow::ListType"(CDataType):
CListType(const shared_ptr[CDataType]& value_type)
CListType(const shared_ptr[CField]& field)
18 changes: 18 additions & 0 deletions python/pyarrow/memory.pxi
@@ -151,3 +151,21 @@ def total_allocated_bytes():
"""
cdef CMemoryPool* pool = c_get_memory_pool()
return pool.bytes_allocated()


def jemalloc_set_decay_ms(decay_ms):
"""
Set arenas.dirty_decay_ms and arenas.muzzy_decay_ms to the indicated number
of milliseconds. A value of 0 results in dirty / muzzy memory pages being
released right away to the OS, while a higher value results in a time-based
decay (the default is 1000 ms). See the jemalloc docs for more information.
It's best to set this at the start of your application.

Parameters
----------
decay_ms : int
Number of milliseconds to set for jemalloc decay conf parameters. Note
that this change will only affect future memory arenas.
"""
check_status(c_jemalloc_set_decay_ms(decay_ms))
