Fix horrific performance found by gmaxwell. #740

Merged
merged 1 commit into from Jan 5, 2012

Contributor

TheBlueMatt commented Jan 3, 2012

I hereby certify that I have grepped for every use of CDataStream and found no case where secure_allocator is necessary.
Note that this fix speeds up blockchain download by, per gmaxwell's estimate, some "1000x".

Member

luke-jr commented Jan 3, 2012

would be much nicer if you just typedef std::vector vector_type; :P

Contributor

TheBlueMatt commented Jan 3, 2012

I prefer it the other way, but meh, I don't care; changed.

Member

gmaxwell commented Jan 3, 2012

well, ... 1000x is perhaps an exaggeration. More like 40-50x or so. A complete chain syncup here took 29 minutes on code with all the mlock removed. On the same system syncing to about height 37k took three and a half hours earlier today. I'm not patient enough to benchmark a complete chain syncup without the fix, though perhaps someone else will.

This problem was really tricky to find— oprofile cycle sampling was completely blind to the slowdown caused by mlock, I guess if I had hooked GLOBAL_TLB_FLUSHES I would have seen it but who would guess to do that? Valgrind's callgrind couldn't see it (unsurprisingly). I eventually found it with ltrace.

In general we shouldn't be calling mlock per-allocation for anything. Even on private key data it's really too slow to use more than a couple times during execution, though I guess it's not completely horrific if just used for that. Instead we should pre-allocate a mlocked arena and allocate from that for private keys. Perhaps boost has something to make this easier?
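The arena idea could look something like this minimal sketch: one mlock() call for a region allocated up front, then bump-allocate key buffers out of it instead of mlock()ing every tiny allocation. The `LockedArena` name and interface here are hypothetical, not from the codebase:

```cpp
#include <sys/mman.h>   // mlock, munlock (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: pre-allocate and lock one region, then hand out
// private-key buffers from it with a simple bump pointer.
class LockedArena {
    unsigned char* base;
    std::size_t capacity;
    std::size_t used;
public:
    explicit LockedArena(std::size_t bytes)
        : base(static_cast<unsigned char*>(std::malloc(bytes))),
          capacity(bytes), used(0) {
        // Lock once for the arena's lifetime. If this fails (e.g. the
        // process hit its locked-memory limit) we keep working, just
        // without the pages being pinned.
        mlock(base, capacity);
    }
    void* alloc(std::size_t bytes) {
        if (base == nullptr || used + bytes > capacity)
            return nullptr;                 // arena exhausted
        void* p = base + used;
        used += bytes;
        return p;
    }
    ~LockedArena() {
        if (base != nullptr) {
            std::memset(base, 0, capacity); // wipe key material
            munlock(base, capacity);
            std::free(base);
        }
    }
};
```

If secure_allocator drew from such an arena, a keypool refill would pay the mlock cost once rather than once per key.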

Independent of that, CDataStream's wide usage and allocation behavior make me cringe. It's creating a LOT of tiny indirect heap allocations all over the code. That can't be good for performance (though obviously not on the same scale as the mlock overuse).

Contributor

gavinandresen commented Jan 3, 2012

Error compiling, compiler version problem?

i686-apple-darwin10-llvm-g++-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.6)

serialize.h:872: error: ‘CDataStream::CDataStream(const std::vector<char, std::allocator<char> >&, int, int)’ cannot be overloaded
serialize.h:867: error: with ‘CDataStream::CDataStream(const std::vector<char, std::allocator<char> >&, int, int)’

Also, not zeroing freed memory any more makes me nervous, because memory containing leftover garbage is a good building block for remote code exploits. Maybe a free_after_delete_allocator... (does boost already define one?)

Owner

sipa commented Jan 3, 2012

Same error here, on Ubuntu. I'm also in favor of keeping the zeroing functionality, even for non-mlocked allocations.

Member

gmaxwell commented Jan 3, 2012

Boost apparently does have an allocator for which you can override how it gets memory: http://www.boost.org/doc/libs/1_47_0/libs/pool/doc/interfaces/user_allocator.html
If we want to be properly security paranoid, and we're using a specialized allocator, we could potentially add canary functionality just like the stack protection in addition to the zeroize. I'd have to dump the allocator usage data to see how much overhead that might have. (e.g. add a word after (and perhaps before) every allocation which is checked on free)

I think the mlock security improvement is pretty inconsequential; the zeroizing matters more, for making use-after-free exploits hard. While we're discussing this, we should note that if we use our own allocator we should add IFDEFed-out valgrind macros (see memcheck.h IIRC) so that we don't reduce valgrind's sensitivity.
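The canary idea sketched above could be done roughly like this: stash a known word just past each allocation and check it on free, the heap analogue of stack protection. The `canary_alloc`/`canary_free` names are illustrative, not a proposed API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: a word planted after every allocation, verified
// when the allocation is freed, to catch heap buffer overruns.
static const std::uint32_t CANARY = 0xdeadbeefu;

void* canary_alloc(std::size_t bytes) {
    unsigned char* p =
        static_cast<unsigned char*>(std::malloc(bytes + sizeof(CANARY)));
    if (p != nullptr)
        std::memcpy(p + bytes, &CANARY, sizeof(CANARY)); // plant canary
    return p;
}

// Returns false instead of aborting so the check is easy to demonstrate;
// a real allocator would presumably abort() on a clobbered canary.
bool canary_free(void* ptr, std::size_t bytes) {
    unsigned char* p = static_cast<unsigned char*>(ptr);
    std::uint32_t word;
    std::memcpy(&word, p + bytes, sizeof(word));          // read it back
    std::free(p);
    return word == CANARY;
}
```

The overhead is one extra word per allocation plus a compare on free, which is why dumping the allocator usage data first makes sense.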

Contributor

TheBlueMatt commented Jan 3, 2012

Fixed (I need to stop coding so late; that's probably also why I didn't catch this bug before I included it in my encryption pull...)
My 2 cents (well, ok, more like 1): use-after-free exploits are so rare it's almost not worth doing the zeroing on free. When you combine that with the fact that the item in question is a std::vector, not a class, I would argue it's even less of an issue. If someone wants to use the stuff in the vector to call a nasty function, they are 100x more likely to be able to while it's still allocated and in memory, not once it has been freed/reallocated.

Contributor

TheBlueMatt commented Jan 3, 2012

Oops, more to fix (I had to board and didn't have time to test it). I'll fix it when I'm in the air and have wifi.

Contributor

TheBlueMatt commented Jan 4, 2012

Actually fixed in that commit.

Owner

laanwj commented Jan 4, 2012

ACK, this is great :)

Contributor

gavinandresen commented Jan 4, 2012

I love the performance improvement, but I still don't like the elimination of zero-after-free. Security in depth is important.

Here's the danger:

1. Attacker finds a remotely-exploitable buffer overrun somewhere in the networking code that crashes the process.
2. They turn the crash into a full remote exploit by sending carefully constructed packets before the crash packet, to initialize used-but-then-freed memory to a known state.

Unlikely? Sure.

Is it ugly to define a zero_after_free_allocator for CDataStream? Sure. (simplest implementation: copy secure_allocator, remove the mlock/munlock calls).

But given that CDataStream is the primary interface between bitcoin and the network, I think being extra paranoid here is a very good idea.
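The "simplest implementation" described above could look roughly like this sketch: secure_allocator minus the mlock/munlock calls, so freed buffers are wiped but nothing is page-locked. Details here are illustrative, assuming it just wraps std::allocator:

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <vector>

// Sketch of a zero_after_free allocator: identical to the default
// allocator except that memory is scrubbed before being returned to
// the heap, denying an attacker known leftover contents.
template <typename T>
struct zero_after_free_allocator : public std::allocator<T> {
    template <typename U>
    struct rebind { typedef zero_after_free_allocator<U> other; };

    void deallocate(T* p, std::size_t n) {
        if (p != nullptr)
            std::memset(p, 0, sizeof(T) * n);  // wipe leftover garbage
        std::allocator<T>::deallocate(p, n);
    }
};

// CDataStream's buffer could then be declared along the lines of:
// typedef std::vector<char, zero_after_free_allocator<char> > vector_type;
```

The cost is one memset per free, which per gmaxwell's measurement below is on the order of 30 seconds over a full sync.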

Member

gmaxwell commented Jan 4, 2012

The performance difference from avoiding the zeroization doesn't appear to be huge: It saves about 30 seconds out of a 30 minute full sync (compared to not zeroizing).

Chart showing all mlocks gone vs. bluematt's patch: http://people.xiph.org/~greg/bitcoin-sync.png The bluematt patch gets a late start due to the time spent filling the keypool, but because it's a bit faster it eventually catches up.

The mlocks on the keying stuff are still problematic: The keypool refill takes 17 seconds with the mlocks in, <1 second with them out. Users with encrypted wallets will often do a mass refill when they unlock. To resolve that the frequency of mlocking should be reduced. The pool allocator I linked to above should make that easy to use (make secure_allocator use the pool, make the pool use malloc + mlock).

The frequent mlock usage may also be creating security weaknesses in the form of mlock ineffectiveness. If the process leaks mlocked pages then eventually it will hit the limit (e.g. I think most linux desktops have a 1MB/process limit or something like that), and past the limit pages will no longer be mlocked.
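The per-process ceiling referred to above can be queried via getrlimit(); a minimal sketch (the helper name is made up):

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_MEMLOCK (POSIX)
#include <cassert>

// Sketch: read the per-process locked-memory ceiling. Once a process has
// this many bytes of locked pages, further mlock() calls fail and the
// affected pages are simply left unpinned.
// Returns -1 if the limit is unlimited or could not be read.
long long memlock_limit_bytes() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return static_cast<long long>(rl.rlim_cur);
}
```

A leak of locked pages would silently exhaust this budget long before anything visibly failed, which is another argument for a single pre-locked arena.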

Contributor

TheBlueMatt commented Jan 4, 2012

I was thinking it would make more of a performance difference, but as long as it's very minor, I just made it zero_after_free.

Member

gmaxwell commented Jan 4, 2012

BlueMatt: I don't think you should have to provide an allocate(). Though maybe it's fine to leave it, in case we convert it into a canary allocator.

Owner

sipa commented Jan 4, 2012

ACK

Contributor

TheBlueMatt commented Jan 5, 2012

@gmaxwell heh, of course...maybe I just need to give up programming at this point...

Member

gmaxwell commented Jan 5, 2012

ACK

gavinandresen merged commit 7486c64 into bitcoin:master Jan 5, 2012
