Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition getting boto3.client("rds-data") #43

Closed
bruceadams opened this issue Feb 8, 2022 · 5 comments
Closed

Race condition getting boto3.client("rds-data") #43

bruceadams opened this issue Feb 8, 2022 · 5 comments

Comments

@bruceadams
Copy link

bruceadams commented Feb 8, 2022

This bit of code is not thread safe.

        if rds_data_client is None:
            self._client = boto3.client("rds-data")

The problem here is that getting a boto3 client is not thread safe. Using a single boto3 client is thread safe.

I can hit this issue consistently with multi-threaded use of SQLAlchemy when SQLAlchemy tries to get two database connections concurrently.

Once I figured out what was happening, I realized that I can pass in rds_data_client to avoid this race.

The problem can (and probably should) be avoided within this driver by assuring that the rds-data client is only gotten once, globally, across multiple instances of AuroraDataAPIClient.

Here are the three (similar) relevant stack traces (with my code at the top of the stack snipped off):

    with engine().connect() as connection:
  File \"/var/task/sqlalchemy/engine/base.py\", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File \"/var/task/sqlalchemy/engine/base.py\", line 96, in __init__
    else engine.raw_connection()
  File \"/var/task/sqlalchemy/engine/base.py\", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File \"/var/task/sqlalchemy/engine/base.py\", line 3250, in _wrap_pool_connect
    return fn()
  File \"/var/task/sqlalchemy/pool/base.py\", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File \"/var/task/sqlalchemy/pool/base.py\", line 476, in checkout
    rec = pool._do_get()
  File \"/var/task/sqlalchemy/pool/impl.py\", line 146, in _do_get
    self._dec_overflow()
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/impl.py\", line 143, in _do_get
    return self._create_connection()
  File \"/var/task/sqlalchemy/pool/base.py\", line 256, in _create_connection
    return _ConnectionRecord(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 371, in __init__
    self.__connect()
  File \"/var/task/sqlalchemy/pool/base.py\", line 666, in __connect
    pool.logger.debug(\"Error on connect(): %s\", e)
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/base.py\", line 661, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File \"/var/task/sqlalchemy/engine/create.py\", line 590, in connect
    return dialect.connect(*cargs, **cparams)
  File \"/var/task/sqlalchemy/engine/default.py\", line 597, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File \"/var/task/aurora_data_api/__init__.py\", line 404, in connect
    return AuroraDataAPIClient(dbname=database, aurora_cluster_arn=aurora_cluster_arn,
  File \"/var/task/aurora_data_api/__init__.py\", line 45, in __init__
    self._client = boto3.client(\"rds-data\")
  File \"/var/task/boto3/__init__.py\", line 93, in client
    return _get_default_session().client(*args, **kwargs)
  File \"/var/task/boto3/session.py\", line 270, in client
    return self._session.create_client(
  File \"/var/task/botocore/session.py\", line 855, in create_client
    credentials = self.get_credentials()
  File \"/var/task/botocore/session.py\", line 457, in get_credentials
    self._credentials = self._components.get_component(
  File \"/var/task/botocore/session.py\", line 995, in get_component
    del self._deferred[name]
KeyError: 'credential_provider'
    with engine().connect() as connection:
  File \"/var/task/sqlalchemy/engine/base.py\", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File \"/var/task/sqlalchemy/engine/base.py\", line 96, in __init__
    else engine.raw_connection()
  File \"/var/task/sqlalchemy/engine/base.py\", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File \"/var/task/sqlalchemy/engine/base.py\", line 3250, in _wrap_pool_connect
    return fn()
  File \"/var/task/sqlalchemy/pool/base.py\", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File \"/var/task/sqlalchemy/pool/base.py\", line 476, in checkout
    rec = pool._do_get()
  File \"/var/task/sqlalchemy/pool/impl.py\", line 146, in _do_get
    self._dec_overflow()
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/impl.py\", line 143, in _do_get
    return self._create_connection()
  File \"/var/task/sqlalchemy/pool/base.py\", line 256, in _create_connection
    return _ConnectionRecord(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 371, in __init__
    self.__connect()
  File \"/var/task/sqlalchemy/pool/base.py\", line 666, in __connect
    pool.logger.debug(\"Error on connect(): %s\", e)
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/base.py\", line 661, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File \"/var/task/sqlalchemy/engine/create.py\", line 590, in connect
    return dialect.connect(*cargs, **cparams)
  File \"/var/task/sqlalchemy/engine/default.py\", line 597, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File \"/var/task/aurora_data_api/__init__.py\", line 404, in connect
    return AuroraDataAPIClient(dbname=database, aurora_cluster_arn=aurora_cluster_arn,
  File \"/var/task/aurora_data_api/__init__.py\", line 45, in __init__
    self._client = boto3.client(\"rds-data\")
  File \"/var/task/boto3/__init__.py\", line 93, in client
    return _get_default_session().client(*args, **kwargs)
  File \"/var/task/boto3/session.py\", line 270, in client
    return self._session.create_client(
  File \"/var/task/botocore/session.py\", line 859, in create_client
    defaults_mode = self._resolve_defaults_mode(config, config_store)
  File \"/var/task/botocore/session.py\", line 908, in _resolve_defaults_mode
    default_config_resolver = self._get_internal_component(
  File \"/var/task/botocore/session.py\", line 729, in _get_internal_component
    return self._internal_components.get_component(name)
  File \"/var/task/botocore/session.py\", line 995, in get_component
    del self._deferred[name]
KeyError: 'default_config_resolver'
    with engine().connect() as connection:
  File \"/var/task/sqlalchemy/engine/base.py\", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File \"/var/task/sqlalchemy/engine/base.py\", line 96, in __init__
    else engine.raw_connection()
  File \"/var/task/sqlalchemy/engine/base.py\", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File \"/var/task/sqlalchemy/engine/base.py\", line 3250, in _wrap_pool_connect
    return fn()
  File \"/var/task/sqlalchemy/pool/base.py\", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File \"/var/task/sqlalchemy/pool/base.py\", line 476, in checkout
    rec = pool._do_get()
  File \"/var/task/sqlalchemy/pool/impl.py\", line 146, in _do_get
    self._dec_overflow()
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/impl.py\", line 143, in _do_get
    return self._create_connection()
  File \"/var/task/sqlalchemy/pool/base.py\", line 256, in _create_connection
    return _ConnectionRecord(self)
  File \"/var/task/sqlalchemy/pool/base.py\", line 371, in __init__
    self.__connect()
  File \"/var/task/sqlalchemy/pool/base.py\", line 666, in __connect
    pool.logger.debug(\"Error on connect(): %s\", e)
  File \"/var/task/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
    compat.raise_(
  File \"/var/task/sqlalchemy/util/compat.py\", line 207, in raise_
    raise exception
  File \"/var/task/sqlalchemy/pool/base.py\", line 661, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File \"/var/task/sqlalchemy/engine/create.py\", line 590, in connect
    return dialect.connect(*cargs, **cparams)
  File \"/var/task/sqlalchemy/engine/default.py\", line 597, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File \"/var/task/aurora_data_api/__init__.py\", line 404, in connect
    return AuroraDataAPIClient(dbname=database, aurora_cluster_arn=aurora_cluster_arn,
  File \"/var/task/aurora_data_api/__init__.py\", line 45, in __init__
    self._client = boto3.client(\"rds-data\")
  File \"/var/task/boto3/__init__.py\", line 93, in client
    return _get_default_session().client(*args, **kwargs)
  File \"/var/task/boto3/session.py\", line 270, in client
    return self._session.create_client(
  File \"/var/task/botocore/session.py\", line 856, in create_client
    endpoint_resolver = self._get_internal_component('endpoint_resolver')
  File \"/var/task/botocore/session.py\", line 729, in _get_internal_component
    return self._internal_components.get_component(name)
  File \"/var/task/botocore/session.py\", line 995, in get_component
    del self._deferred[name]
KeyError: 'endpoint_resolver'
@kislyuk
Copy link
Contributor

kislyuk commented Feb 23, 2022

Thank you for identifying this issue. We'll fix it in your PR after some minor modifications.

It's unfortunate that boto3 advises that clients are thread-safe yet allows fatal races in its own client initialization code. Seems like a bug in boto3. Would you be interested in opening a bug in boto3?

@bruceadams
Copy link
Author

I'm willing to open a bug in boto3-land. I haven't done it nor really gone hunting for any existing bug.

I wanted my stuff to work, which I got to!

Getting my stuff to work involved me to digging deeply enough into aurora-data-api to be able to point clearly at the problem as it happens here and suggest a fix.

@bruceadams
Copy link
Author

Um, https://github.com/cloud-utils/aurora-data-api/blob/main/aurora_data_api/__init__.py#L41 does not fix the problem. That gets a new, separate, lock in each instance of AuroraDataAPIClient and the problem is a race between multiple instances of AuroraDataAPIClient.

@kislyuk
Copy link
Contributor

kislyuk commented Feb 25, 2022

That lock is a class variable, initialized once on load with the declaration of the class. While it's scoped to the class instead of the module, it is functionally no different. So if it doesn't fix the problem you encountered, then I believe neither would your proposed solution. Can you give it a try and see if you can reproduce the error?

@bruceadams
Copy link
Author

Ah. You are correct about the class variable. Sorry for my confusion.

For what it's worth, I would write it with explicit use of the class variable, like this:

        if rds_data_client is None:
            with AuroraDataAPIClient._client_init_lock:

In general (and not relevant here), using self to access a class variable runs the risk of the class variable being shadowed by an instance variable with the same name. I also find explicit access clearer when reading the code 🤷🏻.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants