Solution does not scale for large existing AWS accounts. #4

Open
BigDataDaddy opened this issue Jan 23, 2021 · 3 comments
Comments

@BigDataDaddy

I have a large existing AWS account with a few years of CloudTrail log data already in my source S3 bucket. When I deploy this solution and manually run the first crawler, it starts but does not finish in any reasonable amount of time: I ran the CloudTrailRawCrawler crawler for 24 hours and it still had not completed the first crawl of the source CloudTrail bucket. I suspect this is due to a few years' worth of daily partitions and a very large number of small existing CloudTrail log files. Note that this source S3 bucket only contains the CloudTrail logs for one account, and 99% of the activity is in a single AWS region, so there isn't an unreasonable number of partitions to crawl.

Is there any way to speed up or parallelize the initial crawl?
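
To make the question concrete, here is a minimal sketch of one way the backlog might be split up. It is only an assumption on my part, not something this module does today. It relies on the standard CloudTrail key layout (`AWSLogs/<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/`) to enumerate one prefix per day and group them into batches, which could in principle be crawled separately or registered as partitions directly. The function names, bucket name, and account ID are hypothetical placeholders.

```python
from datetime import date, timedelta
from typing import Iterator, List


def daily_prefixes(bucket: str, account_id: str, region: str,
                   start: date, end: date) -> Iterator[str]:
    """Yield one S3 prefix per day (inclusive), using the CloudTrail key layout."""
    day = start
    while day <= end:
        yield (f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
               f"{region}/{day:%Y/%m/%d}/")
        day += timedelta(days=1)


def batches(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical bucket and account ID, for illustration only.
    prefixes = list(daily_prefixes("my-cloudtrail-bucket", "123456789012",
                                   "us-east-1", date(2018, 1, 1), date(2021, 1, 23)))
    for i, chunk in enumerate(batches(prefixes, 90)):
        print(f"batch {i}: {chunk[0]} .. {chunk[-1]} ({len(chunk)} days)")
```

Nothing here calls AWS; it only produces the list of daily prefixes, so whether smaller batches actually help the crawler finish is still the open question.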

@BigDataDaddy
Author

BTW, I chose this repo over several other CloudTrail partitioners for 2 reasons:

  1. Transforming the data from horrible JSON to Parquet is the absolute right thing to do for query speed, especially in Athena.
  2. I love the use of a Terraform module to deploy, compared to the cryptic AWS CloudFormation (CF) or Cloud Development Kit (CDK).

Thanks to Alex for that!!!

@BigDataDaddy
Author

Is anyone responding to issues for this repo?

@BigDataDaddy
Author

Still no response on how to scale this solution?
