@maxispeicher
Contributor

Issue #461:

Description of changes:
Introduce local caching of query metadata for Athena queries.
A new helper class, _LocalMetadataCacheManager, was added in awswrangler/athena/_utils.py. It maintains a dict that serves as the cache and a priority queue (built with heapq) that keeps the entries ordered by submission time.
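A minimal sketch of such a dict-plus-heap cache, under stated assumptions: the class name MetadataCache and its methods are hypothetical (this is not the actual _LocalMetadataCacheManager API), each item is assumed to be shaped like the QueryExecution dict returned by Athena's GetQueryExecution (with QueryExecutionId and Status.SubmissionDateTime), and the update rule follows the order described in step 5 below.

```python
import heapq
from typing import Any, Dict, List, Tuple


class MetadataCache:
    """Hypothetical sketch: a dict for ID lookup plus a min-heap ordered by submission time."""

    def __init__(self, max_local_cache_entries: int = 100) -> None:
        if max_local_cache_entries < 1:
            raise ValueError("max_local_cache_entries must be at least 1")
        self._max_size = max_local_cache_entries
        self._cache: Dict[str, Dict[str, Any]] = {}
        # Heap of (submission_time, query_execution_id); the oldest entry sits at index 0.
        self._pqueue: List[Tuple[Any, str]] = []

    def __contains__(self, query_execution_id: str) -> bool:
        return query_execution_id in self._cache

    def __len__(self) -> int:
        return len(self._cache)

    def update_cache(self, items: List[Dict[str, Any]]) -> None:
        # Ignore already-cached IDs and duplicate IDs, so the heap holds one entry per ID.
        unique = {i["QueryExecutionId"]: i for i in items if i["QueryExecutionId"] not in self._cache}
        new_items = list(unique.values())
        if self._pqueue:
            # Discard new items that are older than the oldest cached item.
            oldest_time = self._pqueue[0][0]
            new_items = [i for i in new_items if i["Status"]["SubmissionDateTime"] > oldest_time]
        # If more new items arrive than fit in the cache, keep only the newest ones.
        new_items.sort(key=lambda i: i["Status"]["SubmissionDateTime"])
        if len(new_items) > self._max_size:
            new_items = new_items[len(new_items) - self._max_size:]
        # Evict the oldest cached items until the new items fit.
        overflow = len(self._cache) + len(new_items) - self._max_size
        for _ in range(max(overflow, 0)):
            _, oldest_id = heapq.heappop(self._pqueue)
            del self._cache[oldest_id]
        # Add the new items to both structures.
        for item in new_items:
            query_execution_id = item["QueryExecutionId"]
            heapq.heappush(self._pqueue, (item["Status"]["SubmissionDateTime"], query_execution_id))
            self._cache[query_execution_id] = item
```

Keeping (submission time, ID) tuples in a min-heap means the oldest entry always sits at index 0 and can be evicted with heappop in O(log n), while the dict keeps membership checks by ID constant-time on average.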

The cache procedure follows these steps:

  1. read_sql_query is called
  2. Up to max_remote_cache_entries (config value, default: 50) query execution IDs are retrieved by calling list_query_executions.
  3. Each ID is checked against the cache.
  4. Query execution metadata is retrieved for every ID that is not yet in the cache.
  5. The cache is updated with these items: first, any new item that is older than the oldest item already in the cache is discarded. Then, if the cache would grow beyond max_local_cache_entries (config value, default: 100), the oldest cached items are evicted. Finally, the new items are added.
  6. In addition, when a query finishes, its metadata is also added to the cache.
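Steps 2 to 4 could be sketched with boto3's Athena client as follows. This is an illustration, not the PR's code: the function name fetch_uncached_metadata is hypothetical, and the client is passed in so that any object exposing list_query_executions and batch_get_query_execution works. The chunking into calls of at most 50 reflects Athena's limits on ListQueryExecutions MaxResults and on the number of IDs per BatchGetQueryExecution request.

```python
from typing import Any, Container, Dict, List


def fetch_uncached_metadata(
    client: Any,
    cached_ids: Container[str],
    max_remote_cache_entries: int = 50,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: return metadata for listed query executions not yet cached."""
    # Step 2: page through list_query_executions until enough IDs are collected.
    ids: List[str] = []
    kwargs: Dict[str, Any] = {}
    while len(ids) < max_remote_cache_entries:
        page_size = min(50, max_remote_cache_entries - len(ids))  # Athena allows at most 50
        response = client.list_query_executions(MaxResults=page_size, **kwargs)
        ids.extend(response["QueryExecutionIds"])
        if "NextToken" not in response:
            break
        kwargs["NextToken"] = response["NextToken"]

    # Step 3: keep only the IDs the local cache does not know yet.
    new_ids = [qid for qid in ids if qid not in cached_ids]

    # Step 4: fetch metadata in chunks of at most 50 IDs per request.
    # (UnprocessedQueryExecutionIds in the response is ignored in this sketch.)
    items: List[Dict[str, Any]] = []
    for start in range(0, len(new_ids), 50):
        response = client.batch_get_query_execution(QueryExecutionIds=new_ids[start:start + 50])
        items.extend(response["QueryExecutions"])
    return items
```

In real use, client would be boto3.client("athena"), and the returned items would then be handed to the cache update of step 5.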

When the cache is read, the dict is filtered to successful queries with statement type DDL or DML, and the cached entries are returned ordered by completion time.
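The read path could look like the sketch below. The function name is hypothetical, each entry is assumed to carry Athena's StatementType field and Status.State and Status.CompletionDateTime fields, and sorting newest-first is an assumption, since the description only says the entries are ordered by completion time.

```python
from typing import Any, Dict, Iterable, List


def sorted_successful_entries(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep succeeded DDL/DML queries, most recently completed first."""
    successful = [
        e
        for e in entries
        if e["Status"].get("State") == "SUCCEEDED" and e.get("StatementType") in ("DDL", "DML")
    ]
    # Newest completion first (assumed ordering direction).
    return sorted(successful, key=lambda e: e["Status"]["CompletionDateTime"], reverse=True)
```

Applied to the values of the cache dict, this yields only the candidates that are useful for reuse; failed queries and UTILITY statements are dropped.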

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


@igorborgest left a comment


Another great contribution, @maxispeicher.
Really elegant implementation, thanks!

@igorborgest added this to the 2.3.0 milestone Jan 5, 2021
@igorborgest self-assigned this Jan 7, 2021
@igorborgest merged commit 8cb0b79 into aws:master Jan 7, 2021
@maxispeicher deleted the athena-metadata-caching branch January 7, 2021 13:21

2 participants