Skip to content

InitAPI+ShutdownAPI+InitAPI crashes #2119

@jrosdahl

Description

@jrosdahl

Describe the bug

I have a test suite where each test calls Aws::InitAPI, then executes a test and finally calls Aws::ShutdownAPI. The second test then crashes in Aws::InitAPI. This problem started occurring when upgrading aws-sdk-cpp from version 1.8.186 to 1.9.352.

This looks similar to #2102, but that issue is closed and I don't quite understand what the reporter's solution mentioned in #2102 (comment) would be.

Expected Behavior

A second Aws::InitAPI does not crash after Aws::InitAPI + Aws::ShutdownAPI.

Current Behavior

A second Aws::InitAPI crashes after Aws::InitAPI + Aws::ShutdownAPI.

Reproduction Steps

Compile and run the following program, linking against aws-sdk-cpp 1.9.352:

#include <aws/core/Aws.h>

int main()
{
  Aws::SDKOptions options;

  Aws::InitAPI(options);
  Aws::ShutdownAPI(options);

  Aws::InitAPI(options); // line 10
  Aws::ShutdownAPI(options);
}

The program will crash at line 10.

GDB backtrace:

(gdb) bt
#0  0x00007ffff791166f in s2n_x509_trust_store_wipe (store=0x118) at crt/aws-crt-cpp/crt/s2n/tls/s2n_x509_validator.c:135
#1  0x00007ffff78e0988 in s2n_config_cleanup (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:125
#2  0x00007ffff78e1373 in s2n_config_free (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:360
#3  0x00007ffff78bfa84 in s_s2n_ctx_destroy (s2n_ctx=0x82ba20) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1292
#4  0x00007ffff78c0bd1 in s_tls_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68, mode=S2N_CLIENT) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1613
#5  0x00007ffff78c0c3c in aws_tls_client_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1625
#6  0x00007ffff783b758 in Aws::Crt::Io::TlsContext::TlsContext (this=0x7fffffffbcd0, options=..., mode=Aws::Crt::Io::TlsMode::CLIENT, allocator=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>) at crt/aws-crt-cpp/source/io/TlsOptions.cpp:423
#7  0x00007ffff773ebe2 in Aws::InitAPI (options=...) at aws-cpp-sdk-core/source/Aws.cpp:79
#8  0x0000000000400a2a in main () at reproduce.cpp:10

Valgrind output:

==176222== Memcheck, a memory error detector
==176222== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==176222== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==176222== Command: ./reproduce
==176222== 
==176222== Invalid read of size 8
==176222==    at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222==    by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222==    by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222==    by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222==    by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222==    by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222==    by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222==    by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222==    by 0x400A29: main (reproduce.cpp:10)
==176222==  Address 0x118 is not stack'd, malloc'd or (recently) free'd
==176222== 
==176222== 
==176222== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==176222==  Access not within mapped region at address 0x118
==176222==    at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222==    by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222==    by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222==    by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222==    by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222==    by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222==    by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222==    by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222==    by 0x400A29: main (reproduce.cpp:10)

Possible Solution

From the backtrace, it seems incorrect for s_s2n_ctx_destroy (frame #⁠3) to pass a null config to s2n_config_free, given that s2n_config_free doesn't perform a null check. But how come the config is null and the init code path is calling cleanup functions?

As far as I can tell, this is what happens:

  1. The first Aws::InitAPI calls Aws::InitializeCrtAws::New<Aws::Crt::ApiHandle>Aws::Crt::ApiHandle::ApiHandleAws::Crt::s_initApiaws_mqtt_library_initaws_io_library_initaws_tls_init_static_states2n_disable_atexit, which sets atexit_cleanup = false.
  2. aws_tls_init_static_state then calls s2n_init, which sets initialized = true.
  3. The first Aws::ShutdownAPI calls Aws::CleanupCrtAws::Delete<Aws::Crt::ApiHandle>Aws::Crt::ApiHandle::~ApiHandleaws_s3_library_clean_upaws_http_library_clean_upaws_io_library_clean_upaws_tls_clean_up_static_states2n_cleanup. initialized is still true and atexit_cleanup is still false.
  4. The second Aws::InitAPI calls Aws::InitializeCrtAws::Crt::ApiHandle::ApiHandleAws::Crt::s_initApiaws_mqtt_library_initaws_io_library_initaws_tls_init_static_states2n_disable_atexit, which fails since initialized is true.
  5. aws_tls_init_static_state sets s_s2n_initialized_externally = true and therefore doesn't call s2n_init.
  6. Aws::InitAPIAws::Crt::Io::TlsContext::TlsContextaws_tls_client_ctx_news_tls_ctx_news2n_config_new fails since s2n_init hasn't been called.
  7. s_s2n_ctx_destroy is called due to the above failure.
  8. s2n_config_free is not prepared for freeing a half-initialized s2n_ctx.

I haven't verified this, but from code inspection I think that the regression might have been introduced in awslabs/aws-c-io@4e462a0 since before that change the return value of s2n_disable_atexit did not affect whether aws_tls_init_static_state calls s2n_init.

To me it looks like the problem is that Aws::ShutdownAPI doesn't fully clean up the s2n state, so the fix I came up with is to let s2n_cleanup_atexit_impl reset initialized and atexit_cleanup to their original values:

--- a/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
+++ b/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
@@ -88,6 +88,9 @@ static bool s2n_cleanup_atexit_impl(void)
     /* the configs need to be wiped before resetting the memory callbacks */
     s2n_wipe_static_configs();
 
+    initialized = false;
+    atexit_cleanup = true;
+
     return s2n_result_is_ok(s2n_rand_cleanup_thread()) &&
            s2n_result_is_ok(s2n_rand_cleanup()) &&
            s2n_result_is_ok(s2n_locking_cleanup()) &&

Additional Information/Context

No response

AWS CPP SDK version used

1.9.352

Compiler and Version used

gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)

Operating System and version

CentOS 7.9.2009

Metadata

Metadata

Assignees

Labels

bugThis issue is a bug.p2This is a standard priority issueresponse-requestedWaiting on additional info and feedback. Will move to "closing-soon" in 10 days.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions