Optimize Athena queries to address cumulative 1GB read limitation #41544
Comments
This PR aims to reduce the size of the time windows that the event handler passes to `SearchUnstructuredEvents`. StartTime is never updated, so it keeps its initial value. As a result, the event handler sends windows that cover more than 1GB of data, which causes failures when using the Athena backend. Syncs with gravitational/teleport-plugins#1068. Related to #41544. Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
…41730) * Event-handler: call `SearchUnstructuredEvents` with smaller windows This PR aims to reduce the windows sent from `SearchUnstructuredEvents`. StartTime is never updated so it's kept with its initial value which causes problems to the event handler sending windows that include more than 1Gb of data when using the Athena backend which causes failures. Syncs with gravitational/teleport-plugins#1068 Related to #41544 Signed-off-by: Tiago Silva <tiago.silva@goteleport.com> * fix lint * add configurable window size --------- Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
…dows (#41904) * Event-handler: call `SearchUnstructuredEvents` with smaller windows This PR aims to reduce the windows sent from `SearchUnstructuredEvents`. StartTime is never updated so it's kept with its initial value which causes problems to the event handler sending windows that include more than 1Gb of data when using the Athena backend which causes failures. Syncs with gravitational/teleport-plugins#1068 Related to #41544 Signed-off-by: Tiago Silva <tiago.silva@goteleport.com> * fix lint * add configurable window size --------- Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
Short term: we should update the limit set on the workgroup to something higher than 1GB.
Long term: we either need to better track how much data exists for a given time period and proactively limit the timeframe we query for, or we need to alter our data model to allow for better partitioning of the data.
From running a few test queries manually, the biggest factor in the amount of data scanned was the ORDER BY clause.
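As a rough illustration of the "track how much data exists and limit the timeframe" idea, the sketch below assumes a hypothetical per-hour size index (`bytesPerHour`) and picks how many leading hours fit under a scan budget. Neither the index nor the function `hoursWithinBudget` exists in Teleport; both are assumptions made for this example.

```go
package main

import "fmt"

// hoursWithinBudget returns how many leading hourly buckets can be queried
// before the cumulative byte count would exceed budget.
// bytesPerHour is a hypothetical index of stored bytes per hour.
func hoursWithinBudget(bytesPerHour []int64, budget int64) int {
	var total int64
	for i, b := range bytesPerHour {
		if total+b > budget {
			return i
		}
		total += b
	}
	return len(bytesPerHour)
}

func main() {
	const gib = int64(1) << 30
	// Sizes in bytes for four consecutive hours (300, 400, 500, 100 MiB).
	sizes := []int64{300 << 20, 400 << 20, 500 << 20, 100 << 20}
	fmt.Println("hours that fit under the budget:", hoursWithinBudget(sizes, gib))
}
```

A result of 0 means the first hour alone exceeds the budget, so a caller would need a finer-grained index or a different partitioning scheme for that period.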
Another area with room for improvement is the size of the Parquet files stored in S3. Auth currently buffers messages received from SQS for a very short while and then writes them to S3, to reduce the time it takes for audit events to become visible to users. However, that results in lots of very small files (<5MB) in S3. All of the documentation regarding best practices recommends storing files in S3 that are 128MB in size. The AWS Glue documentation specifically mentions the following
This is only valid if no search is specified. Once a search filter is applied, Athena scans more data if it doesn't find all the requested data. We can attempt to optimize by ensuring that events are written in order before being saved to Parquet, which would eliminate the need for the ORDER BY clause. Non-filtered queries would then be more efficient, because Athena would ingest only the necessary data rather than all data within the period. This approach would significantly reduce costs for default queries and would allow us to optimize other queries that filter by event type.
Description:
Issue:
Teleport currently relies on Athena queries to retrieve logs within specified date ranges. This approach has been effective, but it faces a critical limitation: Athena's cumulative 1GB read limit. When the files containing logs for a given date range total more than 1GB, the query fails silently, which can disrupt system functionality without proper error handling or notification to the client.
Problem Statement:
The cumulative 1GB read limit poses a significant challenge for Teleport, particularly when querying logs that span long date ranges or when log volume is high. When the limit is exceeded, the query terminates abruptly and the system fails to deliver the expected results to the client. Even with limits on the queries, the query still fails because Athena must read all files.
Proposed Solution:
To address this limitation and ensure the reliability of log retrieval operations, it's essential to optimize Athena queries not only for cost but also with a focus on not reaching the cumulative data read limit.
gravitational/teleport-plugins#1068 implements a mitigation by requesting smaller windows, but the problem should be handled by the Athena backend itself if possible.