
A Terraform module that provides an efficient way to activate pieces and services in an AWS account in order to enable users to explore preselected public datasets.


ThiagoPanini/datadelivery



datadelivery-logo




Overview

The datadelivery project is an open source solution that provides a starter toolkit to be deployed in any AWS account, enabling users to begin their learning path on AWS analytics services such as Athena, Glue, EMR, and Redshift. It does so by supplying a Terraform module that can be called from any Terraform project to deploy all the infrastructure needed to take the first steps with analytics on AWS, including public datasets ready to be explored.
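As a sketch, consuming the module from a Terraform project might look like the following. The Git source URL points at this repository, but the release tag in `?ref` and the region value are illustrative assumptions — check the official documentation for the module's actual interface and available releases:

```terraform
# main.tf — minimal sketch of calling the datadelivery module.
# The "?ref=v0.1.0" tag is hypothetical; pin to a real release
# from the repository's releases page.
provider "aws" {
  region = "us-east-1" # any region where the target account operates
}

module "datadelivery" {
  source = "git::https://github.com/ThiagoPanini/datadelivery?ref=v0.1.0"
}
```

After `terraform init` and `terraform apply`, the buckets, Glue Crawler, and Athena workgroup described in the sections below are provisioned in the target account.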

  • Have you ever wanted to have a bunch of datasets to explore in AWS?
  • Have you ever wanted to take public data and start building an ETL process?
  • Have you ever wanted to go deep into the Data Mesh architecture with SoR, SoT and Spec layers?

🚛 Try datadelivery!

Note: the datadelivery project now has official documentation on Read the Docs! Check it out for usage details, technical deep dives, practical examples, and more!


Features

  • 🚀 A pocket-sized, disposable AWS environment
  • 🪣 Automatic creation of S3 buckets following the SoR, SoT, and Spec storage layer approach
  • 🤖 Automatic data cataloging through a scheduled Glue Crawler
  • 🎲 A variety of dataset tables ready to be explored in any AWS analytics service
  • 🔦 Destroy everything and recreate it all with a single command

How Does it Work?

When users call the datadelivery Terraform module, the following operations are performed:

  1. Five different S3 buckets are created in the target AWS account
  2. The contents of the module's data/ folder are uploaded to the SoR bucket
  3. An IAM role is created with the permissions needed to run a Glue Crawler
  4. A Glue Crawler is created with an S3 target pointing to the SoR bucket
  5. A cron expression is configured to trigger the Glue Crawler 2 minutes after the infrastructure deployment finishes
  6. All files in the SoR bucket (previously in the data/ folder) are cataloged as new tables in the Glue Data Catalog
  7. A preconfigured Athena workgroup is created so users can run queries right away
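Steps 4 and 5 above can be sketched with the AWS provider's `aws_glue_crawler` resource. The resource names, database name, and bucket reference here are illustrative assumptions, not the module's actual identifiers, and the real schedule is computed at deployment time:

```terraform
# Sketch of a scheduled Glue Crawler with an S3 target (steps 4-5).
# "aws_s3_bucket.sor" and "aws_iam_role.glue_crawler" are assumed to
# be defined elsewhere in the configuration.
resource "aws_glue_crawler" "sor" {
  name          = "datadelivery-sor-crawler" # illustrative name
  role          = aws_iam_role.glue_crawler.arn
  database_name = "db_datadelivery_sor"      # illustrative database

  s3_target {
    path = "s3://${aws_s3_bucket.sor.bucket}/"
  }

  # Glue schedules use the cron(...) wrapper. The module computes an
  # expression firing ~2 minutes after deployment; a plain daily
  # schedule is shown here as a stand-in.
  schedule = "cron(0 12 * * ? *)"
}
```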

Combining Solutions

The datadelivery Terraform module doesn't stand alone. There are other complementary open source solutions that can be combined with it to unlock the full power of learning analytics on AWS. Check them out if you think they could be useful for you!

A diagram showing how it's possible to combine solutions like datadelivery, terraglue and sparksnake


Contacts


References

AWS Glue

Terraform

GitHub
