This repository has been archived by the owner. It is now read-only.

Transparent Huge Pages set to [always] is sub-optimal for many applications #2635

markhpc opened this issue Nov 13, 2019 · 0 comments



Issue Report

Transparent Huge Pages provides real benefit to certain applications by potentially reducing TLB misses and improving performance. For other applications, it can bloat memory usage and cause performance regressions. The kernel documentation claims that [madvise] is the default behavior:

"madvise" will enter direct reclaim like "always" but only for regions
that have used madvise(MADV_HUGEPAGE). This is the default behaviour.

However in mm/Kconfig it turns out the default behavior is actually to use [always]:
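For reference, the relevant choice block in mm/Kconfig (paraphrased; exact wording varies by kernel version) selects TRANSPARENT_HUGEPAGE_ALWAYS unless a distribution's config explicitly overrides it:

```
choice
	prompt "Transparent Hugepage Support sysfs defaults"
	depends on TRANSPARENT_HUGEPAGE
	default TRANSPARENT_HUGEPAGE_ALWAYS

config TRANSPARENT_HUGEPAGE_ALWAYS
	bool "always"

config TRANSPARENT_HUGEPAGE_MADVISE
	bool "madvise"

endchoice
```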

coreos enables transparent huge pages but doesn't specify whether the default mode should be always or madvise, so always is chosen. Unfortunately, setting THP to [always] causes issues with a variety of software:

Go runtime: golang/go#8832
node.js: nodejs/node#11077
tcmalloc: gperftools/gperftools#1073

More recently, we've also seen memory usage bloat in Ceph (which uses tcmalloc) when THP is set to [always], potentially resulting in OOM kills when running inside containers. There are ways to work around this at the application level, including madvise(MADV_NOHUGEPAGE) or the PR_SET_THP_DISABLE prctl flag. Requiring these workarounds to disable THP for a given application is counter-intuitive for several reasons:

  1. It puts the onus on developers to explicitly stop the kernel from engaging in sub-optimal behavior.

  2. It's incredibly confusing to have a system-wide default that claims to "always" enable a setting that many applications may or may not silently disable through workarounds.

Finally, when another prominent distribution was faced with a similar choice, they ran STREAM and malloc tests showing improvement at various allocation sizes when THP was disabled. Ultimately that led them to switch to madvise with no apparent performance regressions:


In coreos-overlay, THP is set:

But making madvise default also requires setting:
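A sketch of the change, assuming the standard upstream Kconfig symbols (the exact coreos-overlay config file is elided above):

```
# Current: [always] becomes the sysfs default
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y

# Proposed: make [madvise] the sysfs default instead
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
```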



What hardware/cloud provider/hypervisor is being used to run Container Linux?

Expected Behavior

The current behavior is expected when THP is set to [always].

Actual Behavior


Reproduction Steps

  1. Install a single-OSD Ceph cluster.
  2. Run a background write workload using hsbench or fio sufficient to fill the ceph-osd caches.
  3. Compare memory usage of the OSD process when THP is set to [always] vs [madvise].
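The THP mode referenced in step 3 can be inspected and toggled via sysfs (a sketch; the write requires root and does not persist across reboots):

```shell
# Show the current system-wide THP mode; the active value appears in
# brackets, e.g. "[always] madvise never".
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
    || echo "THP sysfs knob not available"

# To switch to madvise at runtime (as root):
#   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
```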

Other Information
