
Bootstrap: Implement VirtualLock on Windows #9186

Closed · wants to merge 2 commits

Conversation

@gmarz (Contributor) commented Jan 7, 2015

This PR implements mlockall-like functionality on Windows by leveraging the native VirtualLock function.

As explained in #8480, unlike mlockall on *nix, VirtualLock requires a base memory address and the size of a region to lock. The only sane approach, to my knowledge, was to use VirtualQueryEx to iterate the address space of the JVM and lock each page individually.
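For reference, here is a minimal, hedged sketch of that approach using JNA: a hand-rolled kernel32 binding walks the address space with VirtualQuery and calls VirtualLock on every committed region. The class and method names here (WindowsMemoryLocker, Kernel32, MemoryBasicInformation) are illustrative rather than the PR's actual code, and the sketch assumes a 64-bit JVM (SIZE_T mapped to long) with JNA 5+ on the classpath:

// Minimal sketch (not the PR's actual code): walk the JVM's address space with
// VirtualQuery and lock every committed region with VirtualLock.
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.Structure;
import com.sun.jna.win32.StdCallLibrary;

import java.util.Arrays;
import java.util.List;

public class WindowsMemoryLocker {

    // Hand-rolled binding of the two kernel32 calls we need.
    public interface Kernel32 extends StdCallLibrary {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);

        // SIZE_T VirtualQuery(LPCVOID lpAddress, PMEMORY_BASIC_INFORMATION lpBuffer, SIZE_T dwLength);
        long VirtualQuery(Pointer address, MemoryBasicInformation info, long length);

        // BOOL VirtualLock(LPVOID lpAddress, SIZE_T dwSize);
        boolean VirtualLock(Pointer address, long size);
    }

    public static final int MEM_COMMIT = 0x1000;

    // Maps the native MEMORY_BASIC_INFORMATION structure (64-bit layout).
    public static class MemoryBasicInformation extends Structure {
        public Pointer baseAddress;
        public Pointer allocationBase;
        public int allocationProtect;
        public long regionSize;
        public int state;
        public int protect;
        public int type;

        @Override
        protected List<String> getFieldOrder() {
            return Arrays.asList("baseAddress", "allocationBase", "allocationProtect",
                    "regionSize", "state", "protect", "type");
        }
    }

    // Walks the whole user-mode address space and locks each committed region.
    public static void lockCommittedMemory() {
        long address = 0;
        MemoryBasicInformation info = new MemoryBasicInformation();
        while (Kernel32.INSTANCE.VirtualQuery(new Pointer(address), info, info.size()) != 0) {
            if (info.state == MEM_COMMIT) {
                // Best effort: guard pages and no-access regions may refuse to lock.
                Kernel32.INSTANCE.VirtualLock(new Pointer(address), info.regionSize);
            }
            address += info.regionSize; // advance past the region just examined
        }
    }
}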

To test this, I used a combination of a few tools: Windows Resource Monitor and Sysinternals Testlimit.

Here's what the results look like in Resource Monitor when starting Elasticsearch with -Xmx4g -Xms4g:

with bootstrap.mlockall = false

[screenshot: mlockall_disabled]

The JVM is initialized with ~4GB of virtual memory (Commit), but only ~200MB of actual physical memory (Working Set).

with bootstrap.mlockall = true

[screenshot: mlockall_enabled]

The working set is now also ~4GB upon startup of Elasticsearch.

Additionally, I've stressed my system using Testlimit and observed up to ~100 page faults/s with mlockall disabled, and 0 page faults/s with mlockall enabled.

These results indicate to me that this is working, but it would be great to get some additional eyes/testing on this.

Closes #8480

@Mpdreamz (Member) commented Jan 7, 2015

🎉 🎈 ☀️ 🌟

@@ -59,7 +60,11 @@

    private void setup(boolean addShutdownHook, Tuple<Settings, Environment> tuple) throws Exception {
        if (tuple.v1().getAsBoolean("bootstrap.mlockall", false)) {
            Natives.tryMlockall();
            if (Platform.isWindows()) {
@kimchy (Member) commented on the diff:

I would use org.apache.lucene.util.Constants.WINDOWS

@gmarz (Contributor, Author) replied:

Good call @kimchy, didn't know that existed.
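For illustration, the suggested change would look roughly like this (a sketch only; tryVirtualLock is a hypothetical name for the Windows locking entry point added by this PR, not the confirmed method name):

if (org.apache.lucene.util.Constants.WINDOWS) {
    Natives.tryVirtualLock();   // hypothetical name for the VirtualLock-based path
} else {
    Natives.tryMlockall();
}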

@tlrx (Member) commented Jan 8, 2015

I can observe the same behavior on Windows 2012 R2 / 64-bit: Working Set and Private are almost identical. I got a high number of hard faults/sec with mlockall, but I think the cause is that nearly 100% of the memory was allocated.

Good job :)

@gmarz (Contributor, Author) commented Jan 8, 2015

@tlrx great news, thanks for testing and for the review :).

When did you notice the page faults? Initially during startup, or afterwards while ES had been running for a while? Did you use any tools like the Testlimit mentioned above?

@tlrx (Member) commented Jan 9, 2015

@gmarz I tested it again with more RAM (8GB) allocated to the virtual machine and 4GB for ES. Page faults appear at startup time (up to 150 pf/sec), then decrease slowly to 0.

I don't know how to use Testlimit, but if you want me to check page faults with this tool I'll be happy to have some example commands to run :)

@gmarz (Contributor, Author) commented Jan 12, 2015

@tlrx I think the high number of page faults initially is normal, since we are accessing pages for the first time. The important thing is that they eventually decrease to 0 and stay there.

The simplest way to use Testlimit is to run it with the -r flag, which will reserve memory 1MB at a time:

.\testlimit64.exe -r

Once you reach > 90% memory usage, send a bunch of index and query requests to ES.

My results with -Xmx8g -Xms8g:

mlockall disabled:

[screenshot: page_faults_mlockall_disabled]

mlockall enabled:

[screenshot: page_faults_mlockall_enabled]

@tlrx (Member) commented Jan 14, 2015

Just tested with testlimit64.exe -r and -Xms4g -Xmx4g.

mlockall disabled:

  • hard faults increase up to 10, then decrease to 0 and stay constant
  • Commit ~4GB, Working ~600, PrivateKb ~600KB

mlockall enabled:

  • hard faults increase up to ~50, then decrease to 0 and stay constant
  • Commit ~4GB, Working ~4GB, PrivateKb ~4GB

Everything looks OK to me 👍

@gmarz (Contributor, Author) commented Jan 14, 2015

Thanks @tlrx !

@clintongormley commented

Nice work @gmarz!

@henakamaMSFT (Contributor) commented
Passing on some feedback:
Emulating mlockall(MCL_CURRENT) using VirtualQuery+VirtualLock will materialize/pull into memory even those pages that would otherwise not be accessed. If they call VirtualLock on every valid VA range in the process it will materialize a lot of unnecessary demand-zero pages (for private allocations) or cause unnecessary disk reads (for memory-mapped files/DLLs).

For mlockall(MCL_FUTURE) the closest thing on Windows is SetProcessWorkingSetSizeEx with the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag. This seems like a more reasonable option to me.

Note that the minimum working set size works as a sort of memory reservation, so for example on a 64 GB system you can’t have two processes asking for 32 GB each. The combined size of all working set minimums has to be smaller than total RAM, and it can’t be very close to that limit, otherwise unrelated reservations or non-pageable allocations can start failing. The exact threshold depends on what else is running on the system, but it’s probably a good idea to leave at least 5-10% of RAM available to other reservations/allocations.

@gmarz (Contributor, Author) commented Jan 29, 2015

@henakamaMSFT thanks for the feedback! I have a few questions/comments. Any further clarification would be really appreciated.

> For mlockall(MCL_FUTURE) the closest thing on Windows is SetProcessWorkingSetSizeEx with the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag. This seems like a more reasonable option to me.

The current implementation of mlockall in Elasticsearch for *nix is mlockall(MCL_CURRENT). Since we recommend setting ES_HEAP_SIZE and initializing the JVM with a fixed amount of memory, I don't believe there's a need to lock future allocations - only the currently mapped pages that are initialized by the JVM. That said, where does SetProcessWorkingSetSize fit in terms of emulating mlockall(MCL_CURRENT)? Are we correct in increasing the working set size by ES_HEAP_SIZE before attempting to lock pages in the working set?

> Note that the minimum working set size works as a sort of memory reservation, so for example on a 64 GB system you can’t have two processes asking for 32 GB each. The combined size of all working set minimums has to be smaller than total RAM, and it can’t be very close to that limit, otherwise unrelated reservations or non-pageable allocations can start failing. The exact threshold depends on what else is running on the system, but it’s probably a good idea to leave at least 5-10% of RAM available to other reservations/allocations.

That makes sense. Since it's recommended that ES_HEAP_SIZE doesn't exceed 50% of the total RAM, I don't think this is an issue.

> Emulating mlockall(MCL_CURRENT) using VirtualQuery+VirtualLock will materialize/pull into memory even those pages that would otherwise not be accessed. If they call VirtualLock on every valid VA range in the process it will materialize a lot of unnecessary demand-zero pages (for private allocations) or cause unnecessary disk reads (for memory-mapped files/DLLs).

Are we saying that VirtualQuery+VirtualLock is not viable? If so, is there another way, or a workaround to avoid materializing such pages?

@henakamaMSFT (Contributor) commented

> That said, where does SetProcessWorkingSetSize fit in terms of emulating mlockall(MCL_CURRENT)? Are we correct in increasing the working set size by ES_HEAP_SIZE before attempting to lock pages in the working set?

If the total amount of memory you want to VirtualLock is X then you need to call SetProcessWorkingSetSize and increase the minimum working set size to X plus a small overhead. X + 1 MB should work.
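As a rough illustration of that suggestion (a hedged sketch only, not code from this PR; the hand-rolled binding and names below are assumptions, and it presumes a 64-bit JVM with JNA 5+), the minimum working set would be raised to the amount to be locked plus ~1 MB before calling VirtualLock:

// Sketch only: raise the working-set minimum to X + 1 MB before VirtualLock.
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.win32.StdCallLibrary;

public class WorkingSetSizer {

    public interface Kernel32 extends StdCallLibrary {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);

        // HANDLE GetCurrentProcess(void);
        Pointer GetCurrentProcess();

        // BOOL SetProcessWorkingSetSize(HANDLE hProcess, SIZE_T dwMinimumWorkingSetSize, SIZE_T dwMaximumWorkingSetSize);
        boolean SetProcessWorkingSetSize(Pointer process, long minSize, long maxSize);
    }

    // Grow the working-set minimum to bytesToLock plus 1 MB of headroom,
    // so the subsequent VirtualLock calls have room to succeed.
    public static boolean reserveWorkingSet(long bytesToLock) {
        long min = bytesToLock + (1L << 20);   // X + 1 MB, as suggested above
        long max = min + (64L << 20);          // the maximum must be >= the minimum; headroom chosen arbitrarily
        return Kernel32.INSTANCE.SetProcessWorkingSetSize(
                Kernel32.INSTANCE.GetCurrentProcess(), min, max);
    }
}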

> Are we saying that VirtualQuery+VirtualLock is not viable?

You can measure how much extra IO and memory usage VirtualLock is causing in your case (compared to simply accessing most of the data and code you think you’re going to need), and decide whether it’s acceptable.

> If so, is there another way, or a workaround to avoid materializing such pages?

If you can somehow find all the allocations you care about, you can just VirtualLock those, instead of locking every committed region in the process.

Alternatively, instead of VirtualLock, you can use SetProcessWorkingSetSizeEx with the hard minimum flag, as I mentioned previously. This way only pages that are actually accessed by your app will be materialized/read from disk. But the OS will guarantee that it will not trim your process as long as its working set size stays below the limit. The end result is similar to what you’re doing now, but with less overhead.
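A hedged sketch of that alternative (illustrative names only, assuming a 64-bit JVM and JNA 5+; QUOTA_LIMITS_HARDWS_MIN_ENABLE is the documented 0x1 flag): set a hard working-set minimum roughly the size of the heap, and let normal access materialize the pages.

// Sketch only: hard working-set minimum instead of VirtualLock.
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.win32.StdCallLibrary;

public class HardWorkingSetMinimum {

    public interface Kernel32 extends StdCallLibrary {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);

        Pointer GetCurrentProcess();

        // BOOL SetProcessWorkingSetSizeEx(HANDLE hProcess, SIZE_T dwMinimumWorkingSetSize,
        //                                 SIZE_T dwMaximumWorkingSetSize, DWORD Flags);
        boolean SetProcessWorkingSetSizeEx(Pointer process, long minSize, long maxSize, int flags);
    }

    // With this flag set, the OS will not trim the working set below the minimum.
    public static final int QUOTA_LIMITS_HARDWS_MIN_ENABLE = 0x00000001;

    // Pin at least heapBytes (e.g. the configured ES_HEAP_SIZE) of working set;
    // only pages the JVM actually touches are materialized.
    public static boolean enableHardMinimum(long heapBytes) {
        long min = heapBytes + (1L << 20);     // small cushion on top of the heap size
        long max = min + (256L << 20);         // soft maximum; must be >= the minimum
        return Kernel32.INSTANCE.SetProcessWorkingSetSizeEx(
                Kernel32.INSTANCE.GetCurrentProcess(), min, max, QUOTA_LIMITS_HARDWS_MIN_ENABLE);
    }
}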

@s1monw added the v1.6.0 label and removed the v1.5.0 label on Mar 17, 2015

    public static final int MEM_COMMIT = 0x1000;

    public static class MEMORY_BASIC_INFORMATION extends Structure {
A reviewer (Contributor) commented:

does this class name need to be all uppercase and not camel cased?

A reviewer (Contributor) commented:

if so please add some documentation :)

@gmarz (Contributor, Author) replied:

No, it doesn't need to be. It was just a convention to emphasize that it represents a native structure, but it can and probably should be camel cased instead.

@gmarz (Contributor, Author) commented Apr 29, 2015

Closing in favor of #10887

Thank you @henakamaMSFT and team for the helpful feedback!
