CloudFront real-time access log processor #41119
Merged
This PR provides a CloudFormation stack that exports CloudFront real-time access logs to Parquet files in an S3 bucket, supporting efficient queries from Athena or Redshift Spectrum.
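As a rough illustration of the processing step, a Firehose transformation Lambda could turn each tab-delimited real-time log line into a JSON row before Firehose converts it to Parquet. This is a minimal sketch, not the code in this PR: the `FIELDS` list and the `parse_log_line` helper are hypothetical, and the real field order is whatever the CloudFront real-time log configuration selects. Only the record envelope (`recordId`, `result`, base64-encoded `data`) follows the standard Firehose transformation contract.

```python
import base64
import json

# Hypothetical field order for illustration only; the real order is defined
# by the CloudFront real-time log configuration.
FIELDS = ["timestamp", "c-ip", "sc-status", "cs-method", "cs-uri-stem"]


def parse_log_line(line):
    """Split one tab-delimited log line into a field-name -> value dict."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))


def handler(event, context):
    """Firehose transformation handler: decode, parse, re-encode each record."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        row = parse_log_line(line)
        # Newline-delimited JSON keeps rows separable downstream.
        payload = (json.dumps(row) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```

All values stay strings in this sketch; any type conversion would depend on the table schema used for Parquet conversion.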
Links
CloudFront:
Kinesis Firehose:
Jira:
INF-381 ('Set up Real-time logs for CloudFront distributions')
INF-423 ('Log Pegasus HTTP logs to AWS Athena')
Testing story
Manually tested, via the following steps:
SELECT * FROM cdo_access_logs.access_logs WHERE datehour > '2021/06/15' ORDER BY timestamp DESC LIMIT 10
CREATE EXTERNAL SCHEMA query, e.g., CREATE EXTERNAL SCHEMA cdo_access_logs FROM DATA CATALOG DATABASE 'cdo_access_logs_dev' IAM_ROLE '[role_arn]';
SELECT * FROM cdo_access_logs.access_logs WHERE datehour > '2021/06/15' LIMIT 10
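The test queries filter on the `datehour` partition column with a plain string comparison. The sketch below shows one way such a filter value could be generated, assuming (this is not confirmed by the PR) that partitions are named `YYYY/MM/DD/HH` in UTC. The helper names `datehour` and `recent_logs_query` are hypothetical.

```python
from datetime import datetime, timezone


def datehour(ts):
    """Format a timezone-aware datetime as an assumed YYYY/MM/DD/HH partition value."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")


def recent_logs_query(since, limit=10):
    """Build a query like the ones used in the manual test steps."""
    return (
        "SELECT * FROM cdo_access_logs.access_logs "
        f"WHERE datehour > '{datehour(since)}' "
        f"ORDER BY timestamp DESC LIMIT {int(limit)}"
    )


# Example: a lower bound of 07:00 UTC on 15 June 2021.
print(recent_logs_query(datetime(2021, 6, 15, 7, tzinfo=timezone.utc)))
```

Under that assumed format, a bound such as `'2021/06/15'` sorts before every hourly partition of that day (for example `'2021/06/15/00'`), so `datehour > '2021/06/15'` would match all of 15 June and everything after it.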
Deployment strategy
aws cloudformation package command.
Follow-up work
Privacy
This feature controls the creation of HTTP access logs which contain IP addresses, user-agents and other potentially-sensitive data (depending on the data encoded in paths, query parameters, etc).
Security
cloudfront, firehose, redshift, lambda) grant least privilege to created resources.
PR Checklist: