Locked memory manager #8753
Add a pool for locked memory chunks, replacing
This is something I've been wanting to do for a long time. The current approach of preventing swapping of sensitive information by locking objects where they happen to be on the stack or heap in-place causes a lot of mlock/munlock system call churn, slowing down any handling of keys.
Not only that, but locked memory is a limited resource and using a lot of it bogs down the system by increasing overall swappiness, so the previous approach of locking every page that may contain any key information (but also other information) is wasteful.
Thus replace it with a consolidated pool of locked memory, so that chunks of "secure" memory can be allocated and freed without any system calls, and with as little memory overhead as possible (for example, administrative structures are not themselves in locked memory). The pool consists of one or more arenas, each of which divides a contiguous memory range into chunks. Arenas are allocated in increments of 256 KiB (configurable). If all current arenas are full, a new one is allocated. Arenas are allocated directly from the OS with the appropriate memory page allocation API. No arenas are ever freed unless the program exits.
Immediately after startup, loading a fairly large wallet.
Amount of memory locked
With this patch:
Concept ACK! I'm glad that you're working on this. I think it's the right approach.
Indeed. I've also been thinking about heartbleed-like attacks. Currently key data is scattered all around the heap and stack; with this approach it is consolidated in a few places which are separate from the normal heap where e.g. network buffers are allocated.
It would help even more if the secret data were separated from the normal heap by a 'moat' of unmapped pages, so that a large out-of-bounds read can't reach it.
I've done nothing special to accomplish this at the moment, though, apart from trying to use the OS page allocation directly. Which reminds me that on POSIX I should probably be using
Gah, that needs a silly cast to uint64_t (I guess this error comes up on 32-bit platforms?).
The higher-level comment was already written by @gmaxwell, no need to repeat it.
Is no memory locked at all? Or once we exceed the limit, does it stop locking anything further?
Hmm, the arena is 256 KiB minimum. Will try with a lower arena size.
Changed the arena size to 128 KiB and:
It allocates and locks memory per arena. If locking the first arena (of 256 KiB) fails, nothing will be locked. You could set the
Yes, on Ubuntu it's also unlimited by default. OpenBSD has 5 MiB. 64 KiB seems utterly useless.
Done, it should always get one arena of locked memory as long as the limit is larger than 0. If not, it will act as a NonLockedPoolManager; nothing else to do.
You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration. This would both reduce over-allocation and increase the chances that we get all that ulimit would allow. Not a strong opinion, just a design tweak. Guard pages sound like a good idea. They should be at least as large as any object that exists in the system. Causing the locked page pool to be at a random address would be helpful too.
The practical problem here is that having tons of one-page (or two-page, for that matter) arenas reduces performance, at least with the current implementation. I don't think allocating 256 kB (or whatever the ulimit is, if it is less) the first time this is used is such a bad compromise, given Bitcoin Core's frequent use of this memory. As said, my relatively large (not huge) wallet already requires 512 KiB (which is really ~300 KiB rounded up, but we're not programming for the C64 here).
Also there would be some residual memory wasted if the pool consists of 4k/8k blocks spread around. And mind the syscall overhead when requesting from OS in such small quantities.
Note that I'm all for changing the 256kB parameter to say, 128kB, if that is marginally better. As I've documented in the comments it's a compromise.
Hm, I guess, by specifying a random argument to mmap (modulo the virtual address size, rounded to a page) and then 'probing' the address space for where it can be put. I think this is a great idea for later, but I'm not aiming to do so in this pull (maybe we can borrow some code from ASLR?). Also, here you don't want the individual arenas too small, or it'd spray the address space to unusability, as well as burden the MMU tables.
Good idea, I did this for the class field at least (chKey to vchKey and chIV to vchIV), makes sense as a general suggestion.
Nov 2, 2016: added a commit to this pull request (1 check passed)
OS X, clang (one
somewhere before its usage?