Openssl abstract.Suite definitely broken #18
Comments
Thanks for finding and checking out this bug. Can you point out a torture test that reliably reproduces it (sooner or later)? For example, have you observed BenchmarkPointEncode alone reproduce the problem if run long enough? I tried:

go test -v -bench PointEncode -benchtime 60s

...but I wasn't immediately able to reproduce it, at least in that way.

One thing to try immediately when any likely allocation-related bug crops up using the OpenSSL library is to NULL out the BN_CTX parameters in the relevant OpenSSL library calls, to see if we might be misusing or corrupting a BN_CTX object in some way. Not using a BN_CTX will be slower but might be slightly more robust.

Ah, wait: are you using a single cipher.Suite object across multiple threads, i.e., shared by multiple virtual "nodes" in your intra-process test framework? That could definitely cause the problem, because BN_CTX is not safe for concurrent use by multiple threads (and isn't supposed to be), and as a result an openssl Suite object probably isn't either. Can you try creating one Suite object per virtual node (i.e., per thread) and see if that makes the problem go away?

If it does, this brings up the obvious question of whether Suite objects "should" be thread-safe. I can see arguments both ways.
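The "one Suite per virtual node" suggestion above can be sketched as follows. This is only an illustration under stated assumptions: `scratchSuite`, `newScratchSuite`, `sumTo`, and `runNodes` are hypothetical stand-ins, not part of this project or of OpenSSL. The reusable `scratch` slice plays the role that a BN_CTX plays for OpenSSL (a cache that is not safe to share), and each goroutine stands in for a thread or virtual node.

```go
package main

import (
	"fmt"
	"sync"
)

// scratchSuite is a hypothetical stand-in for a suite that owns a
// non-thread-safe scratch cache (analogous to a BN_CTX).
type scratchSuite struct {
	scratch []int // reused between calls; must not be shared across goroutines
}

func newScratchSuite() *scratchSuite {
	return &scratchSuite{scratch: make([]int, 0, 8)}
}

// sumTo computes 1+2+...+n, staging the terms in the reusable scratch slice.
func (s *scratchSuite) sumTo(n int) int {
	s.scratch = s.scratch[:0]
	for i := 1; i <= n; i++ {
		s.scratch = append(s.scratch, i)
	}
	total := 0
	for _, v := range s.scratch {
		total += v
	}
	return total
}

// runNodes starts one goroutine per virtual node. Each goroutine creates
// its own suite, so no scratch state is ever shared between them.
func runNodes(nodes, n int) []int {
	results := make([]int, nodes)
	var wg sync.WaitGroup
	for i := 0; i < nodes; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			suite := newScratchSuite() // private to this goroutine
			results[id] = suite.sumTo(n)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runNodes(4, 100)) // prints [5050 5050 5050 5050]
}
```

If the crashes disappear once every node owns its own suite, that points to shared scratch state as the cause. Moving `newScratchSuite()` outside the goroutines so that all nodes share one instance would reintroduce the kind of unsynchronized access described above.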
Dylan, have you had a chance to verify definitively whether or not this is a thread-safety issue, as I suggested above? It should be a 10-minute test at most, and would be nice to know for certain even if we decide to punt on the "right" solution for now (as I'm inclined to). Thanks.
I'm of the understanding that OpenSSL isn't thread-safe by default, but https://www.openssl.org/docs/crypto/CRYPTO_lock.html

On Wed, Jan 28, 2015 at 1:52 PM, Bryan Ford notifications@github.com
Point and Secret arithmetic operations aren't supposed to be thread-safe, in that you should be using a given Point or Secret object in one thread at a time anyway, as with most purely computational code. Adding thread-safety to Point/Secret arithmetic would be useless to just about everyone and would just slow things down. If this is the particular thread-safety issue I suspect it is, however, then the problem is very specific to the fact that OpenSSL's bignum library uses this allocation-pooling optimization that tries to reuse bignum objects to reduce allocations. Reasonable, except the bignum CTX object that's intended to be that cache of bignum objects is probably not designed to be thread-safe, intentionally for performance reasons: you're supposed to be using one per thread. So there are three obvious solutions:
But in any case, it's not worth investing much time in this, at least not now, because we're not sure whether we really want to use the OpenSSL-based cipher suite for anything but testing/benchmarking against. So perhaps it's OK (for now) if it's just not thread-safe at all.
wontfix
* not printing unknown nil type in error message
* testing unmarshaling with wrong type
* Verification of subgroup element for Schnorr sigs

  When using a non-prime-order group over an elliptic curve, a point can lie on the correct curve but not necessarily in the correct prime-order subgroup. This commit additionally ensures that the given point is in that subgroup, if the functionality allows for it. The scalar must also be checked to be in the right range, but group/mod is currently used, which automatically checks that property.

* Outsource the verification of a DKG packet

  DKG packets can now be verified _before_ they are passed to the internal protocol if the option is specified. This allows a packet to be preprocessed before it is passed to the application if it needs to be rebroadcast, which is the case for a broadcast protocol.
Essentially, if you do a lot of openssl operations (Add) on the elliptic curves, everything stops working. You start getting sig panics that trace back to the openssl function EC_POINT_point2oct, called from openssl/curve.go:219. This has been verified on both Mac OS and Linux. In addition to this error, double frees or frees of unallocated memory occasionally happen. The code has also been run with the -race flag, and no data races were detected. The bug is non-deterministic: it varies in how it fails, whether it fails at all, and whether it fails silently or loudly.
This package is unusable, but a good alternative seems to be the nist package which does not have this problem.