Bootstrap: Implement VirtualLock on Windows #9186
Conversation
🎉 🎈 ☀️ 🌟
```java
@@ -59,7 +60,11 @@
    private void setup(boolean addShutdownHook, Tuple<Settings, Environment> tuple) throws Exception {
        if (tuple.v1().getAsBoolean("bootstrap.mlockall", false)) {
            Natives.tryMlockall();
            if (Platform.isWindows()) {
```
I would use org.apache.lucene.util.Constants.WINDOWS
Good call @kimchy, didn't know that existed.
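For reference, Lucene's `Constants.WINDOWS` boils down to a one-time check of the `os.name` system property. A rough sketch of the equivalent check (the field name mirrors Lucene's, but treat the exact semantics as an assumption):

```java
// Illustrative sketch of what org.apache.lucene.util.Constants.WINDOWS
// computes once at class-load time, instead of calling JNA's
// Platform.isWindows() on every check.
class PlatformCheck {
    /** True when running on Windows, derived from the "os.name" property. */
    static final boolean WINDOWS = System.getProperty("os.name").startsWith("Windows");

    public static void main(String[] args) {
        System.out.println("windows=" + WINDOWS);
    }
}
```

Using a cached constant also avoids pulling a JNA dependency into the platform check itself.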
I can observe the same behavior on Windows 2012 R2 / 64 bit. Good job :)
@tlrx great news, thanks for testing and for the review :). When did you notice the page faults? Initially during startup, or afterwards while ES had been running for a while? Did you use any tools like Testlimit, mentioned above?
@gmarz I tested it again with more RAM (8gb) allocated to the virtual machine and 4gb for ES. Page faults appear at startup time - up to 150 pf/sec - then decrease slowly to 0. I don't know how to use Testlimit, but if you want me to check page faults with this tool I'll be happy to have some example commands to run :)
@tlrx I think the high number of page faults initially is normal since we are accessing pages. The important thing is that they eventually decrease to 0 and stay there. Simplest way to use testlimit is to run it with the -r flag, which will reserve memory 1MB at a time:
Once you reach > 90% memory usage, send a bunch of index and query requests to ES. My results with mlockall disabled: [screenshot] mlockall enabled: [screenshot]
Just tested with mlockall disabled: [screenshot]

Everything looks OK to me 👍
Thanks @tlrx!
nice work @gmarz!
Passing on some feedback: For mlockall(MCL_FUTURE) the closest thing on Windows is SetProcessWorkingSetSizeEx with the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag. This seems like a more reasonable option to me. Note that the minimum working set size works as a sort of memory reservation, so for example on a 64 GB system you can’t have two processes asking for 32 GB each. The combined size of all working set minimums has to be smaller than total RAM, and it can’t be very close to that limit, otherwise unrelated reservations or non-pageable allocations can start failing. The exact threshold depends on what else is running on the system, but it’s probably a good idea to leave at least 5-10% of RAM available to other reservations/allocations.
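The reservation constraint described above can be sketched as a small model. The class, method, and headroom value below are hypothetical, purely for illustration of the arithmetic:

```java
// Hypothetical model of the constraint above: the combined working-set
// minimums of all processes must stay below total RAM, with some headroom
// (5-10% per the feedback; 10% assumed here) left for other reservations
// and non-pageable allocations.
class WorkingSetBudget {
    static final double HEADROOM = 0.10; // assumed headroom fraction

    /** True if adding `requestedBytes` keeps total reservations within budget. */
    static boolean fits(long totalRamBytes, long alreadyReservedBytes, long requestedBytes) {
        long budget = (long) (totalRamBytes * (1.0 - HEADROOM));
        return alreadyReservedBytes + requestedBytes <= budget;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // On a 64 GB box, two processes each asking for a 32 GB minimum cannot both fit:
        System.out.println(fits(64 * gb, 32 * gb, 32 * gb)); // false
        // A 32 GB minimum plus a 24 GB minimum leaves headroom, so it fits:
        System.out.println(fits(64 * gb, 32 * gb, 24 * gb)); // true
    }
}
```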
@henakamaMSFT thanks for the feedback! I have a few questions/comments. Any further clarification would be really appreciated.
The current implementation of mlockall in elasticsearch for *nix is mlockall(MCL_CURRENT). Since we recommend setting ES_HEAP_SIZE and initialize the JVM with a fixed amount of memory, I don't believe there's a need to lock future allocations - only the currently mapped pages that are initialized by the JVM. That said, where does SetProcessWorkingSetSize fit in terms of emulating mlockall(MCL_CURRENT)? Are we correct in increasing the working set size by ES_HEAP_SIZE before attempting to lock pages in the working set?
That makes sense. Since it's recommended that ES_HEAP_SIZE doesn't exceed 50% of the total RAM, I don't think this is an issue.
Are we saying that VirtualQuery+VirtualLock is not viable? If so, is there another way, or a workaround, to avoid materializing such pages?
If the total amount of memory you want to VirtualLock is X then you need to call SetProcessWorkingSetSize and increase the minimum working set size to X plus a small overhead. X + 1 MB should work.
You can measure how much extra IO and memory usage VirtualLock is causing in your case (compared to simply accessing most of the data and code you think you’re going to need), and decide whether it’s acceptable.
If you can somehow find all the allocations you care about, you can just VirtualLock those, instead of locking every committed region in the process. Alternatively, instead of VirtualLock, you can use SetProcessWorkingSetSizeEx with the hard minimum flag, as I mentioned previously. This way only pages that are actually accessed by your app will be materialized/read from disk. But the OS will guarantee that it will not trim your process as long as its working set size stays below the limit. The end result is similar to what you’re doing now, but with less overhead.
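The sizing rule quoted above (to VirtualLock X bytes, raise the minimum working set to X plus roughly 1 MB) can be sketched as follows. The names are illustrative, not an actual Win32 binding:

```java
// Sketch of the sizing rule from the feedback: the minimum working set
// passed to SetProcessWorkingSetSize(Ex) must cover the bytes to be locked
// plus a small cushion ("X + 1 MB should work").
class WorkingSetSizing {
    static final long OVERHEAD = 1024L * 1024; // 1 MB cushion, per the thread

    /** Minimum working set size needed before VirtualLock'ing this many bytes. */
    static long minimumWorkingSetFor(long bytesToLock) {
        return bytesToLock + OVERHEAD;
    }

    public static void main(String[] args) {
        long fourGb = 4L * 1024 * 1024 * 1024; // e.g. started with -Xmx4g -Xms4g
        System.out.println(minimumWorkingSetFor(fourGb)); // 4 GB + 1 MB in bytes
    }
}
```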
```java
public static final int MEM_COMMIT = 0x1000;

public static class MEMORY_BASIC_INFORMATION extends Structure {
```
does this class name need to be all upper and not camel cased?
if so please add some documentation :)
No, doesn't need to be. It was just a convention to emphasize that it represents a native structure, but it can and probably should be camel cased instead.
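A camel-cased, documented version might look like the plain-Java sketch below. The real class would extend JNA's `com.sun.jna.Structure`; the fields shown here are a subset of the Win32 layout, and `isCommitted` is a hypothetical helper:

```java
// Camel-cased model of the Win32 MEMORY_BASIC_INFORMATION structure, as
// discussed in the review. The actual JNA binding would extend
// com.sun.jna.Structure and declare the full field set/order.
class MemoryBasicInformation {
    /** Committed-pages state flag, from the Win32 memory constants. */
    static final int MEM_COMMIT = 0x1000;

    /** Base address of the queried region. */
    long baseAddress;
    /** Size of the region, in bytes. */
    long regionSize;
    /** Region state, e.g. MEM_COMMIT for committed pages. */
    int state;

    /** Hypothetical helper: true when the region is backed by committed pages. */
    boolean isCommitted() {
        return state == MEM_COMMIT;
    }

    public static void main(String[] args) {
        MemoryBasicInformation info = new MemoryBasicInformation();
        info.state = MEM_COMMIT;
        System.out.println(info.isCommitted()); // true
    }
}
```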
Closing in favor of #10887. Thank you @henakamaMSFT and team for the helpful feedback!
This PR implements `mlockall`-like functionality on Windows by leveraging the native VirtualLock function. As explained in #8480, unlike `mlockall` on *nix, `VirtualLock` requires a base memory address and the size of a region to lock. The only sane approach, to my knowledge, was to use VirtualQueryEx to iterate the address space of the JVM and lock each page individually.

To test this, I used a combination of a few tools:
Here's what the results look like in resource monitor when starting Elasticsearch with `-Xmx4g -Xms4g`:

with `bootstrap.mlockall=false`, the JVM is initialized with ~4GB of virtual memory (Commit), but only ~200MB is actual physical memory (Working Set).

with `bootstrap.mlockall=true`, the working set is now also ~4GB upon start up of elasticsearch.
Additionally, I've stressed my system using Testlimit and observed up to ~100 page faults/s with mlockall disabled, and 0 page faults/s with mlockall enabled.
These results indicate to me that this is working, but it would be great to get some additional eyes/testing on this.
Closes #8480
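The iterate-and-lock approach described above can be modeled in plain Java. This is an illustrative simulation, not the actual JNA code: `Region` and `lockCommittedRegions` are hypothetical stand-ins for `VirtualQueryEx` results and `VirtualLock` calls:

```java
import java.util.List;

// Model of the PR's approach: walk the address space region by region,
// as VirtualQueryEx would report it, and lock only the committed regions,
// since VirtualLock needs an explicit base address and size for each call.
class LockLoopModel {
    static final int MEM_COMMIT = 0x1000;  // committed pages
    static final int MEM_FREE = 0x10000;   // free address range

    /** Stand-in for one MEMORY_BASIC_INFORMATION result from VirtualQueryEx. */
    record Region(long base, long size, int state) {}

    /** Returns the total number of bytes that would be VirtualLock'ed. */
    static long lockCommittedRegions(List<Region> addressSpace) {
        long locked = 0;
        for (Region r : addressSpace) {
            if (r.state() == MEM_COMMIT) {
                // Real code would call kernel32.VirtualLock(r.base(), r.size()) here.
                locked += r.size();
            }
        }
        return locked;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region(0x10000L, 4096, MEM_COMMIT),
                new Region(0x11000L, 8192, MEM_FREE),
                new Region(0x13000L, 4096, MEM_COMMIT));
        System.out.println(lockCommittedRegions(regions)); // 8192
    }
}
```

Free and reserved-but-uncommitted ranges are skipped because locking them would fail (or needlessly materialize pages), which matches the concern raised earlier in the thread.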