Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Openssl abstract.Suite definitely broken #18

Closed
dyv opened this issue Jan 19, 2015 · 5 comments
Closed

Openssl abstract.Suite definitely broken #18

dyv opened this issue Jan 19, 2015 · 5 comments
Labels

Comments

@dyv
Copy link

dyv commented Jan 19, 2015

Essentially if you do a lot of openssl operations (Add) on the elliptic curves everything stops working. You start getting sig panics that trace back to the openssl function EC_POINT_point2oct originating from the openssl/curve.go:219. This has been verified on both Mac OS and Linux. In addition to this error, occasionally double frees will happen or unallocated frees. This has been checked with the -race flag and no data races were detected either. This bug is non-deterministic in how it fails, if it fails, and if it fails silently or loudly.

coco.test(61675,0xb0104000) malloc: *** error for object 0x5001e30: double free
*** set a breakpoint in malloc_error_break to debug
15 announces
SIGABRT: abort
PC=0x7fff9565f282
signal arrived during cgo execution

goroutine 59 [syscall, locked to thread]:
coco.test(61675,0xb028d000) malloc: *** error for object 0x5223610: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
runtime.cgocall_errno(0x40019e0, 0xc2081bdb98, 0xc200000000)
    /Users/dylanvisher/gosource/go/src/runtime/cgocall.go:130 +0x104 fp=0xc2081bdb78 sp=0xc2081bdb50
github.com/dedis/crypto/openssl._Cfunc_EC_POINT_mul(0x5200960, 0x5222a40, 0x5222950, 0x0, 0x0, 0x52016b0, 0xc200000000)
    /Users/dylanvisher/go/src/github.com/dedis/crypto/openssl/:304 +0x3c fp=0xc2081bdb98 sp=0xc2081bdb78
github.com/dedis/crypto/openssl.(*point).Mul(0xc2080ee380, 0x0, 0x0, 0x4422f6064 announces
, 0xc2081881b0, 0x0, 0x0)
    /Users/dylanvisher/go/src/github.com/dedis/crypto/openssl/curve.go:197 +0xd2 fp=0xc2081bdc18 sp=0xc2081bdb98
github.com/dedis/prifi/coco.(*SigningNode).Commit(0xc20803bee0)
    /Users/dylanvisher/go/src/github.com/dedis/prifi/coco/collectiveSigning.go:68 +0x28c fp=0xc2081bddf0 sp=0xc2081bdc18
github.com/dedis/prifi/coco.(*SigningNode).Announce(0xc20803bee0, 0xc2080ee040)
    /Users/dylanvisher/go/src/github.com/dedis/prifi/coco/collectiveSigning.go:61 +0x41b fp=0xc2081bdf40 sp=0xc2081bddf0
github.com/dedis/prifi/coco.func·001()
    /Users/dylanvisher/go/src/github.com/dedis/prifi/coco/collectiveSigning.go:38 +0x1fc fp=0xc2081bdfe0 sp=0xc2081bdf40
runtime.goexit()
    /Users/dylanvisher/gosource/go/src/runtime/asm_amd64.s:2430 +0x1 fp=0xc2081bdfe8 sp=0xc2081bdfe0
created by github.com/dedis/prifi/coco.(*SigningNode).Listen
    /Users/dylanvisher/go/src/github.com/dedis/prifi/coco/collectiveSigning.go:44 +0xa1

This package is unusable, but a good alternative seems to be the nist package which does not have this problem.

@dyv dyv added the bug label Jan 19, 2015
@bford
Copy link
Contributor

bford commented Jan 22, 2015

Thanks for finding and checking out this bug. Can you point out a torture-test that reliably reproduces it (sooner or later)? For example, have you observed BenchmarkPointEncode alone to reproduce the problem if run long enough? I tried:

go test -v -bench PointEncode -benchtime 60s

...but I wasn't immediately able to reproduce it at least in that way.

One thing to try immediately when any likely allocation-related bug crops up using the OpenSSL library is try NULLing out the BN_CTX parameters in the relevant OpenSSL library calls, to see if we might be misusing or corrupting a BN_CTX object in some way. Not using a BN_CTX will be slower but might be slightly more robust.

Ah, wait - are you using a single cipher.Suite object across multiple threads, i.e., shared by multiple virtual "nodes" in your intra-process test framework? That could definitely cause the problem, because BN_CTX is not safe for concurrent use by multiple threads (and isn't supposed to be), and as a result an openssl Suite object probably isn't either. Can you try just creating a Suite object per virtual node (i.e., per thread) and see if that makes the problem go away?

If it does, this brings up the obvious question of whether Suite objects "should" be thread-safe. I can see arguments both ways.

@bford
Copy link
Contributor

bford commented Jan 28, 2015

Dylan, have you had a chance to verify definitively whether or not this is a thread-safety issue, as I suggested above? It should be a 10-minute test at most, and would be nice to know for certain even if we decide to punt on the "right" solution for now (as I'm inclined to). Thanks.

@davidiw
Copy link
Contributor

davidiw commented Jan 29, 2015

I'm of the understanding that OpenSSL isn't thread-safe by default, but
maybe in some situations it is. It may be necessary to implement the
following hooks:

https://www.openssl.org/docs/crypto/CRYPTO_lock.html

On Wed, Jan 28, 2015 at 1:52 PM, Bryan Ford notifications@github.com
wrote:

Dylan, have you had a chance to verify definitively whether or not this is
a thread-safety issue, as I suggested above? It should be a 10-minute test
at most, and would be nice to know for certain even if we decide to punt on
the "right" solution for now (as I'm inclined to). Thanks.


Reply to this email directly or view it on GitHub
#18 (comment).

@bford
Copy link
Contributor

bford commented Jan 29, 2015

Point and Secret arithmetic operations aren't supposed to be thread-safe, in that you should be using a given Point or Secret object in one thread at a time anyway, as with most purely computational code. Adding thread-safety to Point/Secret arithmetic would be useless to just about everyone and would just slow things down.

If this is the particular thread-safety issue I suspect it is, however, then the problem is very specific to the fact that OpenSSL's bignum library uses this allocation-pooling optimization that tries to reuse bignum objects to reduce allocations. Reasonable, except the bignum CTX object that's intended to be that cache of bignum objects is probably not designed to be thread-safe, intentionally for performance reasons: you're supposed to be using one per thread. So there are three obvious solutions:

  1. Just don't use CTX objects at all; set all those parameters to null, which is allowed, but will probably slow things down by preventing the caching optimizations.
  2. Have one big CTX object and make access to that thread-safe, e.g., by taking and releasing a master lock around any OpenSSL-based arithmetic operation that needs to use that ctx object. This would work but would obviously be bad for parallelism.
  3. Implement a thread-safe pool of CTX objects, where any thread that comes into the OpenSSL code grabs one temporarily (via thread-safe primitives), does the operation it needs to do, then dumps the CTX object back on the pool (again using thread-safe primitives). This would at least permit some parallelism, but contention on the CTX object lock might eventually be a problem - if we end up using the OpenSSL-based implementation of this stuff for anything important, which we might not.
  4. The "proper" approach would be to use per-thread state: i.e., have a per-thread CTX object that gets used automatically whenever we invoke an OpenSSL-based bignum/EC routine. But Go still doesn't have native support for per-thread state as far as I can tell, and I don't know if the pthread-based threadpsecific state functions will work in the context of a Go program (might be worth trying).

But in any case, it's not worth investing much time in this, at least not now, because we're not sure whether we really want to use the OpenSSL-based cipher suite for anything but testing/benchmarking against. So perhaps it's OK (for now) if it's just not thread-safe at all.

@ineiti
Copy link
Member

ineiti commented Jun 23, 2016

wontfix

@ineiti ineiti closed this as completed Jun 23, 2016
ineiti pushed a commit that referenced this issue Mar 19, 2018
* not printing unknown nil type in error message
* testing unmarshaling with wrong type
armfazh pushed a commit to armfazh/kyber that referenced this issue Apr 13, 2023
* Verification of subgroup element for schnorr sigs

When using a non prime order group over an elliptic curve, a point of
the curve can belong to the correct curve but not necessarily to the correct
prime order subgroup subgroup. That commits additionally ensures that this is
the case with the given point if the functionality allows for it.
The scalar must be checked as well that it is in the right range, but currently
group/mod is used which automatically checks that property.

* Outsource the verification of a DKG packet

DKG packets can now be verified _before_ they are passed to the internal
protocol if the option is specified. That allows to preprocess the packet before
passing them to the application if a given packet needs to be rebroadcasted,
which is the case for broadcast protocol.
armfazh pushed a commit to armfazh/kyber that referenced this issue Apr 24, 2024
* Verification of subgroup element for schnorr sigs

When using a non prime order group over an elliptic curve, a point of
the curve can belong to the correct curve but not necessarily to the correct
prime order subgroup subgroup. That commits additionally ensures that this is
the case with the given point if the functionality allows for it.
The scalar must be checked as well that it is in the right range, but currently
group/mod is used which automatically checks that property.

* Outsource the verification of a DKG packet

DKG packets can now be verified _before_ they are passed to the internal
protocol if the option is specified. That allows to preprocess the packet before
passing them to the application if a given packet needs to be rebroadcasted,
which is the case for broadcast protocol.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants