-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Describe the bug
I have a test suite where each test calls Aws::InitAPI, then executes a test and finally calls Aws::ShutdownAPI. The second test then crashes in Aws::InitAPI. This problem started occurring when upgrading aws-sdk-cpp from version 1.8.186 to 1.9.352.
This looks similar to #2102, but that issue is closed and I don't quite understand what the reporter's solution mentioned in #2102 (comment) would be.
Expected Behavior
A second Aws::InitAPI does not crash after Aws::InitAPI + Aws::ShutdownAPI.
Current Behavior
A second Aws::InitAPI crashes after Aws::InitAPI + Aws::ShutdownAPI.
Reproduction Steps
Compile and run the following program, linking against aws-sdk-cpp 1.9.352:
#include <aws/core/Aws.h>
int main()
{
Aws::SDKOptions options;
Aws::InitAPI(options);
Aws::ShutdownAPI(options);
Aws::InitAPI(options); // line 10
Aws::ShutdownAPI(options);
}The program will crash at line 10.
GDB backtrace:
(gdb) bt
#0 0x00007ffff791166f in s2n_x509_trust_store_wipe (store=0x118) at crt/aws-crt-cpp/crt/s2n/tls/s2n_x509_validator.c:135
#1 0x00007ffff78e0988 in s2n_config_cleanup (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:125
#2 0x00007ffff78e1373 in s2n_config_free (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:360
#3 0x00007ffff78bfa84 in s_s2n_ctx_destroy (s2n_ctx=0x82ba20) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1292
#4 0x00007ffff78c0bd1 in s_tls_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68, mode=S2N_CLIENT) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1613
#5 0x00007ffff78c0c3c in aws_tls_client_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1625
#6 0x00007ffff783b758 in Aws::Crt::Io::TlsContext::TlsContext (this=0x7fffffffbcd0, options=..., mode=Aws::Crt::Io::TlsMode::CLIENT, allocator=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>) at crt/aws-crt-cpp/source/io/TlsOptions.cpp:423
#7 0x00007ffff773ebe2 in Aws::InitAPI (options=...) at aws-cpp-sdk-core/source/Aws.cpp:79
#8 0x0000000000400a2a in main () at reproduce.cpp:10
Valgrind output:
==176222== Memcheck, a memory error detector
==176222== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==176222== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==176222== Command: ./reproduce
==176222==
==176222== Invalid read of size 8
==176222== at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222== by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222== by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222== by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222== by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222== by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222== by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222== by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222== by 0x400A29: main (reproduce.cpp:10)
==176222== Address 0x118 is not stack'd, malloc'd or (recently) free'd
==176222==
==176222==
==176222== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==176222== Access not within mapped region at address 0x118
==176222== at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222== by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222== by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222== by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222== by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222== by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222== by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222== by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222== by 0x400A29: main (reproduce.cpp:10)
Possible Solution
From the backtrace, it seems incorrect for s_s2n_ctx_destroy (frame #3) to pass a null config to s2n_config_free, given that s2n_config_free doesn't perform a null check. But how come the config is null and the init code path is calling cleanup functions?
As far as I can tell, this is what happens:
- The first
Aws::InitAPIcallsAws::InitializeCrt→Aws::New<Aws::Crt::ApiHandle>→Aws::Crt::ApiHandle::ApiHandle→Aws::Crt::s_initApi→aws_mqtt_library_init→aws_io_library_init→aws_tls_init_static_state→s2n_disable_atexit, which setsatexit_cleanup = false. aws_tls_init_static_statethen callss2n_init, which setsinitialized = true.- The first
Aws::ShutdownAPIcallsAws::CleanupCrt→Aws::Delete<Aws::Crt::ApiHandle>→Aws::Crt::ApiHandle::~ApiHandle→aws_s3_library_clean_up→aws_http_library_clean_up→aws_io_library_clean_up→aws_tls_clean_up_static_state→s2n_cleanup.initializedis still true andatexit_cleanupis still false. - The second
Aws::InitAPIcallsAws::InitializeCrt→Aws::Crt::ApiHandle::ApiHandle→Aws::Crt::s_initApi→aws_mqtt_library_init→aws_io_library_init→aws_tls_init_static_state→s2n_disable_atexit, which fails sinceinitializedis true. aws_tls_init_static_statesetss_s2n_initialized_externally = trueand therefore doesn't calls2n_init.Aws::InitAPI→Aws::Crt::Io::TlsContext::TlsContext→aws_tls_client_ctx_new→s_tls_ctx_new→s2n_config_newfails sinces2n_inithasn't been called.s_s2n_ctx_destroyis called due to the above failure.s2n_config_freeis not prepared for freeing a half-initializeds2n_ctx.
I haven't verified this, but from code inspection I think that the regression might have been introduced in awslabs/aws-c-io@4e462a0 since before that change the return value of s2n_disable_atexit did not affect whether aws_tls_init_static_state calls s2n_init.
To me it looks like the problem is that Aws::ShutdownAPI doesn't fully clean up the s2n state, so the fix I came up with is to let s2n_cleanup_atexit_impl reset initialized and atexit_cleanup to their original values:
--- a/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
+++ b/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
@@ -88,6 +88,9 @@ static bool s2n_cleanup_atexit_impl(void)
/* the configs need to be wiped before resetting the memory callbacks */
s2n_wipe_static_configs();
+ initialized = false;
+ atexit_cleanup = true;
+
return s2n_result_is_ok(s2n_rand_cleanup_thread()) &&
s2n_result_is_ok(s2n_rand_cleanup()) &&
s2n_result_is_ok(s2n_locking_cleanup()) &&Additional Information/Context
No response
AWS CPP SDK version used
1.9.352
Compiler and Version used
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Operating System and version
CentOS 7.9.2009