This repository was archived by the owner on Oct 16, 2020. It is now read-only.

Transparent Huge Pages set to [always] is sub-optimal for many applications #2635

@markhpc


Issue Report

Transparent Huge Pages (THP) provide real benefits to certain applications by potentially reducing TLB misses and improving performance. For other applications, THP can bloat memory usage and cause performance regressions. The kernel documentation claims that [madvise] is the default behavior:

"madvise" will enter direct reclaim like "always" but only for regions
that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

https://www.kernel.org/doc/Documentation/vm/transhuge.txt

However, mm/Kconfig shows that the default is actually [always]:

https://github.com/torvalds/linux/blob/master/mm/Kconfig#L385-L407
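The bracketed value is how the kernel marks the active mode in /sys/kernel/mm/transparent_hugepage/enabled (for example `always [madvise] never`). Below is a minimal Python sketch for checking which mode a running system actually uses, assuming the standard sysfs path; `active_thp_mode` is a hypothetical helper name, not an existing tool.

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location of the system-wide THP mode on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) entry from the sysfs 'enabled' file.

    The kernel lists every mode and brackets the active one,
    e.g. "always [madvise] never" -> "madvise".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("THP sysfs knob not found (kernel built without THP?)")
```

As root, writing `madvise` to the same file switches the mode at runtime, which is one way to compare the two settings without rebuilding the kernel.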

By default, CoreOS enables transparent huge pages but does not specify whether THP should default to always or madvise, so the Kconfig default of always is chosen. Unfortunately, setting THP to [always] causes issues with a variety of software:

Splunk: https://docs.splunk.com/Documentation/Splunk/7.3.2/ReleaseNotes/SplunkandTHP
MongoDB: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
Couchbase: https://docs.couchbase.com/server/current/install/thp-disable.html
Oracle: https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp
NuoDB: http://doc.nuodb.com/4.0/Content/OpenShift-disable-THP.htm
Go runtime: golang/go#8832
jemalloc: https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/
Node.js: nodejs/node#11077
tcmalloc: gperftools/gperftools#1073

More recently, we have also seen memory usage bloat in Ceph (using tcmalloc) when THP is set to always, potentially resulting in OOM kills when running inside containers. There are various ways to work around this at the application level, including madvise(MADV_NOHUGEPAGE) or the PR_SET_THP_DISABLE prctl flag. Requiring these workarounds to disable THP for a given application is counter-intuitive for two reasons:

  1. It puts the onus on developers to explicitly stop the kernel from engaging in sub-optimal behavior.

  2. It's incredibly confusing to have a system-wide default that claims to "always" enable a setting that many applications may or may not silently disable through workarounds.
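For reference, here is roughly what the application-level workarounds (MADV_NOHUGEPAGE and the PR_SET_THP_DISABLE prctl flag) look like. This is a minimal Python sketch, assuming Linux, Python 3.8+ (for `mmap.madvise`), and a C library reachable through `ctypes`; the helper names are hypothetical and the prctl constants are copied from `<linux/prctl.h>`.

```python
import ctypes
import ctypes.util
import mmap

# Values from <linux/prctl.h>; PR_SET_THP_DISABLE exists since Linux 3.15.
PR_SET_THP_DISABLE = 41
PR_GET_THP_DISABLE = 42


def _libc() -> ctypes.CDLL:
    # use_errno=True lets us read errno after a failed call.
    return ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def disable_thp_for_process() -> None:
    """Opt the calling process out of THP, whatever the system-wide mode.

    Per prctl(2), the flag is inherited by fork() children and kept across execve().
    """
    if _libc().prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "prctl(PR_SET_THP_DISABLE) failed")


def thp_disabled_for_process() -> bool:
    """Return True if PR_SET_THP_DISABLE is in effect for the calling process."""
    return _libc().prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0) == 1


def map_without_thp(length: int) -> mmap.mmap:
    """Anonymous mapping that the kernel is asked not to back with huge pages."""
    buf = mmap.mmap(-1, length)
    buf.madvise(mmap.MADV_NOHUGEPAGE)  # Linux-only constant, Python >= 3.8
    return buf
```

Each program that wants to avoid THP under [always] has to carry something like this, which is the burden the two points above describe.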

Finally, when another prominent distribution (Ubuntu) faced a similar choice, its developers ran STREAM and malloc tests that showed improvements at various allocation sizes with THP disabled. This ultimately led them to switch to madvise with no apparent performance regressions:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1703742
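Under [madvise], applications that do benefit from THP opt in explicitly with madvise(MADV_HUGEPAGE), as the kernel documentation quoted above describes. A minimal Python sketch of that opt-in, assuming Linux, Python 3.8+, and 2 MiB huge pages (as on x86-64); `map_with_thp` and `huge_page_multiple` are hypothetical helpers.

```python
import mmap

# Assumption: 2 MiB PMD-sized huge pages, as on x86-64.
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def huge_page_multiple(length: int) -> int:
    """Round length up to a whole number of huge pages."""
    return -(-length // HUGE_PAGE_SIZE) * HUGE_PAGE_SIZE


def map_with_thp(length: int) -> mmap.mmap:
    """Anonymous mapping that opts in to THP when the system mode is [madvise].

    MADV_HUGEPAGE is only a hint: the kernel may still use base pages, e.g.
    if the mapping is not 2 MiB aligned or no huge page can be allocated.
    """
    buf = mmap.mmap(-1, huge_page_multiple(length))
    buf.madvise(mmap.MADV_HUGEPAGE)  # Linux-only constant, Python >= 3.8
    return buf
```

The design point is that only code that asks for huge pages gets them, so software that never calls madvise keeps ordinary 4 KiB pages.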

Bug

In coreos-overlay, THP support is enabled:
https://github.com/coreos/coreos-overlay/blob/master/sys-kernel/coreos-modules/files/amd64_defconfig-4.19#L216

However, making madvise the default also requires setting:

CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
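To check which default a given kernel config produces, a small Python sketch can scan a defconfig or .config. It assumes one `SYMBOL=y` entry per line and, following the mm/Kconfig default described above, treats THP enabled with neither choice symbol set as [always]; `thp_default` is a hypothetical helper.

```python
from typing import Optional

# The two options of the THP sysfs-default choice in mm/Kconfig.
CHOICES = {
    "CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS": "always",
    "CONFIG_TRANSPARENT_HUGEPAGE_MADVISE": "madvise",
}


def thp_default(config_text: str) -> Optional[str]:
    """Return the THP sysfs default a kernel config selects.

    Returns None if THP is not enabled at all. If neither choice symbol is
    set, Kconfig falls back to the choice default, which is "always".
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.endswith("=y"):
            enabled.add(line[: -len("=y")])
    if "CONFIG_TRANSPARENT_HUGEPAGE" not in enabled:
        return None
    for symbol, mode in CHOICES.items():
        if symbol in enabled:
            return mode
    return "always"
```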

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?

Expected Behavior

The current behavior is expected when THP is set to [always].

Actual Behavior

See:
https://docs.google.com/spreadsheets/d/1Xl3nWapi7ZKEmpnsSHHWO96iopEG0hK6GeDWhWKSfDo/edit?usp=sharing

Reproduction Steps

  1. Install a single-OSD Ceph cluster.
  2. Run a background write workload with hsbench or fio that is large enough to fill the ceph-osd caches.
  3. Compare the memory usage of the OSD process when THP is set to [always] vs. [madvise].
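Step 3 can be scripted. The following is a minimal Python sketch, assuming a Linux 4.14+ kernel (for /proc/<pid>/smaps_rollup) and permission to read the OSD's /proc entries (root or the same user). It reports the process's Rss and AnonHugePages in kB; sampling once with THP set to [always] and once with [madvise] gives the comparison. `parse_kb_fields` and `thp_usage` are hypothetical helpers, not part of Ceph or any existing tool.

```python
from pathlib import Path
from typing import Dict


def parse_kb_fields(text: str) -> Dict[str, int]:
    """Parse 'Key:   1234 kB' lines (as in smaps_rollup) into {key: kB}."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the address-range header line and anything not in kB.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def thp_usage(pid: int) -> Dict[str, int]:
    """Return total RSS and THP-backed anonymous memory of a process, in kB."""
    text = Path(f"/proc/{pid}/smaps_rollup").read_text()  # Linux >= 4.14
    fields = parse_kb_fields(text)
    return {"Rss": fields.get("Rss", 0), "AnonHugePages": fields.get("AnonHugePages", 0)}
```

For example, `thp_usage(int(subprocess.check_output(["pidof", "ceph-osd"]).split()[0]))` would sample the first ceph-osd process found.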

Other Information

https://unix.stackexchange.com/questions/495816/which-distributions-enable-transparent-huge-pages-for-all-applications
https://www.percona.com/blog/2019/03/06/settling-the-myth-of-transparent-hugepages-for-databases/
https://blog.nelhage.com/post/transparent-hugepages/
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
https://dl.acm.org/citation.cfm?id=3359640
