The datadelivery project is an open source solution that provides a starter toolkit deployable in any AWS account, so users can begin their learning path on AWS analytics services such as Athena, Glue, EMR and Redshift. It supplies a Terraform module that can be called from any Terraform project to deploy all the infrastructure needed to take the first steps with analytics on AWS, along with public datasets to explore.
- Have you ever wanted to have a bunch of datasets to explore in AWS?
- Have you ever wanted to take public data and start building an ETL process?
- Have you ever wanted to go deep into the Data Mesh architecture with SoR, SoT and Spec layers?
🚛 Try datadelivery!
Note: The datadelivery project now has official documentation on Read the Docs! Check it out for usage guidance, technical details, practical examples and more!
- 🚀 A pocket-sized, disposable AWS environment
- 🪣 Automatic creation of S3 buckets using the SoR, SoT and Spec storage layers approach
- 🤖 Automatic data cataloging process using a scheduled Glue Crawler
- 🎲 Provides different dataset tables ready to be explored in any AWS analytics service
- 🔦 Destroy everything and recreate it all again with a single command
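As a rough illustration of the workflow behind these features, a Terraform project might call the module as sketched below. The local path `./datadelivery` is an assumption made for this example (a hypothetical local copy of the module). The project documentation gives the actual module source, and any input variables the module may require are omitted here. The region is likewise an assumed value.

```hcl
# Hypothetical root configuration, not taken from the project itself.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumed region; use the one that fits your account
}

module "datadelivery" {
  # Assumed local path; replace it with the module source given in the docs.
  source = "./datadelivery"
}
```

With a configuration like this, `terraform init` followed by `terraform apply` would create the environment, and `terraform destroy` would remove it again.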
When users call the datadelivery Terraform module, the following operations are performed:
- Five different buckets are created in the target AWS account
- The content of the `data/` folder in the source module is uploaded to the SoR bucket
- An IAM role is created with enough permissions to run a Glue Crawler
- A Glue Crawler is created with an S3 target pointing to the SoR bucket
- A cron expression is configured to trigger the Glue Crawler 2 minutes after the infrastructure deployment finishes
- All files from the SoR bucket (previously in the `data/` folder) are cataloged as new tables in the Data Catalog
- A preconfigured Athena workgroup is created so users can run queries
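The crawler-related steps above can be sketched in plain Terraform as follows. This is a minimal sketch under stated assumptions, not the module's actual code: every resource name, the bucket prefix and the database name are hypothetical, and the S3 read permissions the role would need on the bucket are left out for brevity. It shows one way to build a one-off Glue schedule that fires 2 minutes after the apply, using the built-in `timestamp`, `timeadd` and `formatdate` functions.

```hcl
locals {
  # timestamp() returns the current UTC time; Glue schedules are also in UTC.
  crawler_run_time = timeadd(timestamp(), "2m")

  # Glue cron fields: minutes, hours, day-of-month, month, day-of-week, year.
  crawler_cron = "cron(${formatdate("m h D M", local.crawler_run_time)} ? ${formatdate("YYYY", local.crawler_run_time)})"
}

resource "aws_s3_bucket" "sor" {
  bucket_prefix = "example-sor-" # hypothetical prefix
}

resource "aws_iam_role" "crawler" {
  name = "example-glue-crawler-role" # hypothetical name

  # Allow the Glue service to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "glue.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "glue_service" {
  role       = aws_iam_role.crawler.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}

resource "aws_glue_catalog_database" "sor" {
  name = "example_sor_db" # hypothetical name
}

resource "aws_glue_crawler" "sor" {
  name          = "example-sor-crawler" # hypothetical name
  database_name = aws_glue_catalog_database.sor.name
  role          = aws_iam_role.crawler.arn
  schedule      = local.crawler_cron

  s3_target {
    path = "s3://${aws_s3_bucket.sor.bucket}"
  }

  # timestamp() changes on every run, so ignore later schedule drift.
  lifecycle {
    ignore_changes = [schedule]
  }
}
```

The `ignore_changes` setting is a design choice for this sketch: without it, each new plan would compute a different timestamp and propose updating the crawler schedule.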
The datadelivery Terraform module isn't alone. There are other complementary open source solutions that can be combined with it to unlock the full power of learning analytics on AWS. Check them out if you think they could be useful for you!