Transparent Huge Page support #5816
Comments
I wrote a really small library to test this feature without modifying proton:
It can be compiled with I was not sure where the heap was, so I just told the kernel to use huge pages everywhere. A quick test is not showing any use of huge pages via In any case, this naive approach did not work. I still think the proposal on reddit is a potentially good idea, but it needs further investigation. :/ |
It turns out that it is necessary to preallocate huge pages via Someone on reddit pointed out that there is already libhugetlbfs for this. However, some testing found that it did nothing (even after mounting hugetlbfs). I tried using
This seems to be a known issue in glibc 2.33 and newer: As mentioned there, glibc 2.35 has added support for huge pages as a workaround. Provided that
However, after looking at
I imagine that the non huge page allocations are coming from wine's memory allocator, whose allocations matter far more for games than glibc's memory allocation. I guess wine's allocator needs to be patched to leverage huge pages if available. :/ |
Someone in discord tried using glxgears to test this. I am not sure if that is the best test, but it is easy to run. I repeated his test after making some minor changes (such as turning off compositing in kwin) and I found there to be a very slight improvement:
Wine's memory allocator needs to be patched before any tests are worth doing on windows software running on Linux, but native software such as
|
These |
I doubt we're interested in this unless someone can demonstrate a real performance benefit. |
@Patola I do not think glxgears is a good benchmark, since it is tiny. It is unlikely to have many TLB misses (although we can measure that with Linux perf). You need to test something bigger, like a real game. I only ran it because I had seen someone post glxgears numbers on Discord that were taken in a bad way, and I did not want to have to explain afterwards why they were wrong. Posting my own numbers was meant to preempt that, even though I knew glxgears was not a great benchmark for evaluating this. To be honest, I was surprised to see anything that looked like an improvement in it at all. :/ That said, I would only expect a 1-3% improvement from fewer TLB misses. Maybe systems with integrated graphics would benefit more.
@aeikum What would constitute a real performance benefit? If we can consistently observe a 1% improvement somewhere that is relevant to games on Linux, would that be enough? I do not want to set unrealistic expectations. The theoretical benefits of huge pages are:
I would not expect this to give more than a 1-3% performance improvement. Finding it would require that we either find a native game that demonstrates a real improvement through the glibc huge page code or patch wine so that people could test games for a real improvement. |
I did some benchmarks in both Shadow of the Tomb Raider and Civilization VI. I will not post the Shadow of the Tomb Raider data since it showed identical performance. However, half of the allocations in Shadow of the Tomb Raider still used 4K pages, so perhaps glibc's method for using huge pages did not work for the allocations that matter. I tried Civilization VI's AI benchmark under the assumption that it would use glibc's allocator for its AI decisions. This showed a slight improvement:
That is an average improvement of 0.8544%. My test hardware was:
I guess huge pages have a smaller impact than I thought. Maybe someone with integrated graphics like the guy in the reddit thread would see a bigger improvement. :/ |
I rebooted my PC and did a last ditch effort to test huge page support on it. This time, I started Shadow of the Tomb Raider after a fresh boot and ran 3 benchmark runs back to back. I then set Shadow of the Tomb Raider to start with
Here are the results:
This is a 2.52% improvement in FPS. This appears to be from Linux forcing transparent huge page support on allocations that do not use glibc malloc because prior tests without Here are links to screenshots showing my graphical settings: I have a 4K display, but I had to change the game resolution to 1920x1080 while reducing the resolution modifier to its lowest to get the game to be 100% CPU bound. Additionally, while I have Nvidia graphics, it seems that the Intel and AMD graphics drivers have huge page support: https://lists.freedesktop.org/archives/dri-devel/2017-August/150732.html There would probably be a greater benefit on systems with integrated graphics. I suspect this could benefit the steam deck. We should not need to force the kernel to always use transparent huge pages to gain this performance increase. Instead, proper use of |
Also, a link to a screenshot of the best result without huge pages: And a link to a screenshot of the best result with huge pages: |
For what it's worth, on my system I definitely see an improvement with huge pages enabled. I tested CPU bound performance in The Witcher 3 (800x600 windowed and all settings on low) on my low-end Intel Pentium G4620 and AMD RX 560 and got around 7% higher performance with Transparent Huge Pages enabled: 102 FPS without THP vs 109 FPS with THP. My GPU is underloaded in both cases, but with THP it is slightly less underloaded. I tested several times to be sure, and with huge pages I always get slightly higher performance. Well, at least in The Witcher 3; I haven't tested other games yet. |
Nice! What did you do to enable it? |
@GenocideStomper I enabled it with:
|
From my own research: transparent hugepages are a different mechanism from hugepages. The latter have to be manually requested using one of the several different allocators ( From the documentation (https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html), the guaranteed way to get a transparent hugepage is to set the Quote:
You can monitor usage (again, all of this is in the kernel documentation) by grepping for So - my conclusion is, the kernel knows what it's doing without manual meddling. |
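The kind of monitoring mentioned above can be sketched in C. The exact field name did not survive in the comment, so this sketch assumes the system-wide AnonHugePages counter in /proc/meminfo, which the kernel THP documentation describes. The function name meminfo_kb is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the value (in kB) of `field` from a meminfo-style file,
 * or -1 if the file cannot be opened or the field is not present.
 * meminfo_kb is a hypothetical helper name, not an existing API.
 */
long meminfo_kb(const char *path, const char *field)
{
    assert(path != NULL && field != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    size_t n = strlen(field);
    long value = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Match "Field:" exactly, so "Mem" does not match "MemTotal:". */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }

    fclose(f);
    return value;
}
```

Calling meminfo_kb("/proc/meminfo", "AnonHugePages") before and during a game session would show whether anonymous memory is being backed by transparent huge pages at all. A value that stays at 0 suggests THP is not being applied.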
I came to a similar conclusion last week, but forgot to post it. There is no need to statically allocate huge pages to use huge pages. |
The thing with hugepages is that their usability and effectiveness can be affected by memory fragmentation once the system has been in use for a while. So the measurements should cover the various ways people actually play games (including long sessions). Originally hugepages were implemented for use with virtual machines (IIRC), and some software has shown worse results with them for some reason (though I don't know why Hadoop would suffer from it.. maybe it has been fixed?). So it might not be a universal win, and it would be beneficial to test more cases. |
I can confirm. Hugepages and transparent hugepages share a similar goal but are obtained differently. THP doesn't need pre-allocated hugepage pools or complex configuration. However, THP is limited to anonymous memory regions such as heap and stack space, and it can degrade performance while memory pressure is high, as the CPU spends more time finding a suitable memory region or defragmenting memory. With hugepages you gain more fine-grained control, and can probably cover other types of memory region, but they require more configuration. |
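The difference between the two mechanisms can be sketched in C. This is only an illustration under stated assumptions, not anything from Proton or Wine: it first asks for explicit huge pages with MAP_HUGETLB, which normally fails unless a hugepage pool has been reserved, and otherwise falls back to a regular anonymous mapping with a THP hint. The function name map_huge_or_fallback is made up, and the length is assumed to be a multiple of the huge page size (2 MiB on typical x86-64 systems).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Tries explicit huge pages first, then falls back to a normal anonymous
 * mapping that is merely hinted for transparent huge pages.
 * On success returns the mapping and sets *used_hugetlb to 1 or 0.
 * Returns NULL if even the fallback mapping fails.
 */
void *map_huge_or_fallback(size_t len, int *used_hugetlb)
{
    assert(used_hugetlb != NULL);

    /* Explicit huge pages: needs a pre-reserved pool to succeed. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_hugetlb = 1;
        return p;
    }

    /* Fallback: ordinary pages, no pool or special configuration needed. */
    *used_hugetlb = 0;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Only a hint; the kernel may ignore it, so the result is not checked. */
    (void)madvise(p, len, MADV_HUGEPAGE);
    return p;
}
```

On a system with no reserved pool, the first mmap is expected to fail and the fallback path runs, which matches the point above that THP needs no pre-allocation.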
Probably not too convincing of a sample, but may be relevant to share in this issue: https://www.youtube.com/watch?v=DSGKq5KSkPw&t=520s Steam Deck user talks about enabling THP and shows some graphs to compare before / after frame rates. They demonstrate a 10% improvement on the min frame rate (or rather While that is perhaps not too impressive, it might be a better way to look at the benefits beyond average / max FPS improvements. Opt-in via
Beyond that some distros default to |
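Which mode a given system is currently in can be read from /sys/kernel/mm/transparent_hugepage/enabled, the sysfs file described in the kernel documentation linked earlier in the thread. Its content is a single line such as "always [madvise] never", with the active mode in brackets. A minimal C sketch for extracting it, with the helper names thp_active_mode and thp_read_mode made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copies the bracketed token of a line like "always [madvise] never"
 * into `out`. Returns 0 on success, -1 if there is no bracketed token
 * or `out` is too small. Hypothetical helper, not a kernel or libc API.
 */
int thp_active_mode(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open, ']');
    if (close == NULL)
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n + 1 > out_len)
        return -1;

    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Reads the sysfs file and extracts the active mode; -1 if unavailable. */
int thp_read_mode(char *out, size_t out_len)
{
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;

    char line[128];
    int rc = -1;
    if (fgets(line, sizeof line, f) != NULL)
        rc = thp_active_mode(line, out, out_len);

    fclose(f);
    return rc;
}
```

A result of "madvise" means only regions that explicitly opt in get huge pages, while "always" means the kernel may apply them to any eligible anonymous memory.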
It's
Use Value to |
Feature Request
I confirm:
contain this feature already.
Description
Add an environment variable for turning on transparent huge page support on the heap via madvise(addr, size, MADV_HUGEPAGE).
Additional changes could be made to further improve transparent huge page support, such as:
MEM_LARGE_PAGES in VirtualAlloc in Wine.
Justification [optional]
This is Linux specific, so I doubt Wine would accept the patch.
This should be done because fewer TLB misses from the use of transparent huge pages should slightly improve CPU bound performance in games running under Proton. Implementing an option to turn it on for evaluation purposes would probably be best. That would allow the community to gather data to determine whether it should be on by default.
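A rough sketch of what such an opt-in could look like in C follows. It is not taken from Proton or Wine. The variable name EXAMPLE_ENABLE_THP and the function names are made up for illustration, and the 2 MiB huge page size is an assumption that holds on typical x86-64 systems.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical variable name; the issue does not propose a specific one. */
#define THP_ENV_VAR "EXAMPLE_ENABLE_THP"

/* Assumed huge page size (2 MiB on typical x86-64 systems). */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Returns 1 if the opt-in variable is set to "1", otherwise 0. */
int thp_opt_in(void)
{
    const char *v = getenv(THP_ENV_VAR);
    return v != NULL && strcmp(v, "1") == 0;
}

/*
 * Maps at least `size` bytes of anonymous memory, with the length rounded
 * up to a multiple of HUGE_PAGE_SIZE. If the user opted in, the region is
 * hinted with MADV_HUGEPAGE. A failing madvise is not treated as an error,
 * because the mapping is still usable with normal pages.
 * On success returns the mapping, stores its length in *mapped_size and
 * sets *advised to 1 only if the hint was accepted. Returns NULL on failure.
 */
void *alloc_region(size_t size, size_t *mapped_size, int *advised)
{
    assert(size > 0 && mapped_size != NULL && advised != NULL);

    size_t len = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    *advised = 0;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (thp_opt_in() && madvise(p, len, MADV_HUGEPAGE) == 0)
        *advised = 1;

    *mapped_size = len;
    return p;
}
```

Rounding the length up does not by itself guarantee that the start address is aligned to the huge page size, and the kernel can only use huge pages for aligned ranges inside the mapping. A real allocator would likely need to handle alignment explicitly. Keeping the hint behind an environment variable keeps the default behavior unchanged, which fits the goal of letting the community gather data first.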
Risks [optional]
Setting it system-wide is said to harm performance:
https://www.reddit.com/r/linux_gaming/comments/uhfjyt/comment/i75z26g/
It is possible that this could cause a performance regression.
References [optional]
There are claims on reddit that games can benefit from this:
https://www.reddit.com/r/linux_gaming/comments/uhfjyt/underrated_advice_for_improving_gaming/
The manual page states:
https://man7.org/linux/man-pages/man2/madvise.2.html
This describes a video game.
Microsoft documentation on Windows MEM_LARGE_PAGES:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
Kernel documentation on THP support:
https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html