Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when called from react native module #189

Closed
izone-airstream opened this issue Feb 22, 2020 · 8 comments · Fixed by #191
Closed

Crash when called from react native module #189

izone-airstream opened this issue Feb 22, 2020 · 8 comments · Fixed by #191

Comments

@izone-airstream
Copy link

izone-airstream commented Feb 22, 2020

We use GRPC-Swift in our recent project, which imports swift-nio-ssl for channels.

The implementation of the app called a grpc TLS connection from react-native's native module, the app crashed right away at sdallocx, right after OPENSSL_free(orig_ptr), at CNIOBoringSSL -> crypto -> mem.c -> OPENSSL_realloc.

It looks like it was caused by boringSSL's context allocating failure. To verify that, we turn off the GRPC's TLS, the app stop crashing.

We also test the swift-grpc by calling directly from a native swift application, using TLS, which also works well without any crash.

We tried to find some similar issues at stripe-ios (stripe/stripe-ios#1293), which eventually was found due to Some of BoringSSL symbols in their code weren't prefixed, causing conflicts with other copies of BoringSSL and OpenSSL in end user apps. Their solution was to add prefix to wide range of their BoringSSLl's symbols.

So we are wondering that would it be the same reason causing our app's crashing?

Cheers

@Lukasa
Copy link
Contributor

Lukasa commented Feb 22, 2020

We do aggressively prefix our boringSSL symbols. That includes OPENSSL_malloc, OPENSSL_realloc, and OPENSSL_free, all of which are actually in the binary as CNIOBoringSSL_OEPNSSL_malloc etc.

Do you have the crash report? It would be helpful to see exactly where the crash is and what's triggering it.

@izone-airstream
Copy link
Author

I write a simple project only call swift-gPRC with TLS in RN, it still crash.
https://github.com/izone-airstream/RN-Swift-gRPC-TLS-Crash

A crash report is also attached under the repo. Thanks

@agnosticdev
Copy link
Contributor

The crash in thread 23 looks like it's trying to allocate outside the memory region.
Check out size_t sk_insert() in stack.c for more insight.

0   ???                           	0x0000000106d2f878 sdallocx + 0
1   org.reactjs.native.example.RNSwiftGRPC	0x00000001064ae128 CNIOBoringSSL_OPENSSL_realloc + 200 (mem.c:159)
2   org.reactjs.native.example.RNSwiftGRPC	0x00000001064c7919 CNIOBoringSSL_sk_insert + 233 (stack.c:184)
3   org.reactjs.native.example.RNSwiftGRPC	0x00000001064c7ff4 CNIOBoringSSL_sk_push + 36 (stack.c:321)
4   org.reactjs.native.example.RNSwiftGRPC	0x000000010650bd68 CNIOBoringSSL_sk_SSL_CIPHER_push + 40 (CNIOBoringSSL_ssl.h:1258)
5   org.reactjs.native.example.RNSwiftGRPC	0x00000001065372a9 bssl::CNIOBoringSSL::ssl_create_cipher_list(std::__1::unique_ptr<bssl::CNIOBoringSSL::SSLCipherPreferenceList, bssl::CNIOBoringSSL::internal::Deleter<bssl::CNIOBoringSSL::SSLCipherPreferenceList> >*, char const*, bool) + 2073 (ssl_cipher.cc:1254)

To start debugging this I would take a look at how this TLSConfiguration is being created/used

var config = ClientConnection.Configuration(
	target: .hostAndPort("homegraph.googleapis.com", 443),
	eventLoopGroup: group
	,tls: .init(configuration: TLSConfiguration.forClient()) //comment out this line to turn off the TLS, which stop the crashing.
)

compared to how it is used in NIOSSLHTTP1CLIENT

var cert: [NIOSSLCertificateSource] = []
var key: NIOSSLPrivateKeySource?
var trustRoot: NIOSSLTrustRoots = .default
let tlsConfiguration = TLSConfiguration.forClient(trustRoots: trustRoot,
                                                  certificateChain: cert,
                                                  privateKey: key,
                                                  renegotiationSupport: .once)
let sslContext = try! NIOSSLContext(configuration: tlsConfiguration)

This should get your started.

@Lukasa
Copy link
Contributor

Lukasa commented Feb 22, 2020

This is interesting. It seems like this is quite probably an interaction with the strange declaration of sdallocx. sdallocx on Linux or Apple platforms is intended to be weak-linked, with a fallback implementation that is defined in terms of free. This should only actually be used if sdallocx is statically linked with the binary. Weirdly, we're ending up jumping into the __DATA region of the binary, which is why we see the SIGBUS. That's very odd.

Do you happen to have any logs from the compile?

@izone-airstream
Copy link
Author

izone-airstream commented Feb 22, 2020

Yes, I add build log to the repo.
https://github.com/izone-airstream/RN-Swift-gRPC-TLS-Crash/raw/master/ios/build-log.txt

Follow your thread, I noticed that the Facebook library "Folly", which is used by RN, do use sadllocx here:
https://github.com/facebook/folly/blob/master/folly/memory/detail/MallocImpl.h

@Lukasa
Copy link
Contributor

Lukasa commented Feb 24, 2020

Follow your thread, I noticed that the Facebook library "Folly", which is used by RN, do use sadllocx here:

This helped push us down the right road. Specifically, Folly doesn't just use sdallocx: if you don't set FOLLY_HAVE_WEAK_SYMBOLS and statically link jemalloc with the binary then Folly will define sdallocx to the nullptr.

The problem is that BoringSSL wants to play a game with finding sdallocx where it will try to weak link it with a fallback, so that if you link an allocator that defines sdallocx then Boring will use it, otherwise it'll use the static copy in the file.

The result is that if you statically link Folly and BoringSSL into the same binary, Folly will helpfully define sdallocx as a bunch of 0s, and then the linker will happily use that as a relocation target for sdallocx. This then leads to nasty type confusion because the linker is jumping to the sdallocx location, but that location is actually a pointer to sdallocx, and specifically a nil pointer to sdallocx.

The end result is that BoringSSL and Folly are just not compatible with being statically linked together into the same binary. Unfortunate but true.

For us, to maximise our compatibility we'll compile out this "optimisation" from BoringSSL. That will technically cost us a small amount of performance when running with jemalloc, but it won't matter much because only BoringSSL was using the sized free anyway, and so the perf gain from sized free is pretty small.

@davidben, this is probably relevant for you to be aware of. I don't know if this is something you actually want to do anything about, but you should at least know about it for future cases. I don't know anyone involved with the Folly library, but if anyone does please point them at this issue for their own diagnostic purposes. It's not clear to me that either library is objectively doing the wrong thing, but they definitely don't play nice together.

@davidben
Copy link
Contributor

davidben commented Apr 1, 2020

@Lukasa We changed our weak symbol pattern in https://boringssl-review.googlesource.com/c/boringssl/+/40405. This might fix this, but for different reasons than we thought when we made the change. See discussion at https://boringssl-review.googlesource.com/c/boringssl/+/40405/9#message-0e7e7e16e7576e76dffe8a4be77fe3b1102f4112

Folly's handling of these symbols is invalid, however. This is ultimately a bug that must be fixed in Folly. If weak symbols aren't supported, they fill in sdallocx and friends with function pointers whose value is NULL. While this, syntactically, supports the same spelling of the check within Folly, they are not the same thing. That means that Folly is introducing type errors into libraries linked with it.

One option for them would be something like:

#if !FOLLY_HAVE_WEAK_SYMBOLS
inline namespace folly_weak_symbol_compat {
void* (*mallocx)(size_t, int) = nullptr;
...
}
#endif

with a similar inline namespace in the header files, and dropping the extern "C" around it all. This way the code which checks it can consistently write sdallocx != nullptr, but they never clobber the true sdallocx symbol, which belongs to the malloc implementation. Another would be to the compatibility nullptrs into their header as static symbols or macros or something.

@Lukasa
Copy link
Contributor

Lukasa commented Apr 1, 2020

Thanks for that clarification @davidben, it's good to know. I'll see if we can find a way to bucket-brigade this to someone who maintains Folly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants