This repo contains my experimental projects on data engineering.
Updated Mar 6, 2023 - Python
Automate Apache Spark in Hadoop with Airflow in Cloud
Project files from my 2023 Data Engineering Nanodegree.
We build an ETL pipeline using Airflow that does the following: downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket, in a folder prepared for higher-level analytics.
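The three steps of that pipeline (download, transform, upload) can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's code: a dict stands in for the S3 bucket, a list filter stands in for the Spark/Spark SQL job, and every function name, key, and record field (`order_id`, `deadline`, `delivered`) is hypothetical. In a real deployment each function would typically become one Airflow task.

```python
from datetime import date


def download_orders(bucket, key):
    """Step 1: fetch raw order records (the dict is a stand-in for an S3 bucket)."""
    return list(bucket[key])


def find_late_orders(orders):
    """Step 2: stand-in for the Spark job. Drop incomplete rows and keep
    only orders delivered after their deadline."""
    late = []
    for order in orders:
        deadline = order.get("deadline")
        delivered = order.get("delivered")
        if deadline is None or delivered is None:
            continue  # clean-up: skip rows with missing dates
        if delivered > deadline:
            late.append(order)
    return late


def upload_orders(bucket, key, rows):
    """Step 3: write the cleaned dataset back under a separate key."""
    bucket[key] = rows


def run_pipeline(bucket, source_key, target_key):
    """Run the three steps in order, as a scheduler would chain its tasks."""
    orders = download_orders(bucket, source_key)
    late = find_late_orders(orders)
    upload_orders(bucket, target_key, late)
    return late


if __name__ == "__main__":
    # Hypothetical sample data; keys and fields are illustrative only.
    bucket = {
        "raw/orders": [
            {"order_id": 1, "deadline": date(2023, 3, 1), "delivered": date(2023, 3, 4)},
            {"order_id": 2, "deadline": date(2023, 3, 1), "delivered": date(2023, 2, 27)},
            {"order_id": 3, "deadline": date(2023, 3, 1), "delivered": None},
        ]
    }
    late = run_pipeline(bucket, "raw/orders", "analytics/late_orders")
    print([order["order_id"] for order in late])
```

Keeping each step as a separate function with explicit inputs and outputs is what makes the flow easy to map onto scheduler tasks and to test without cloud access.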
Keywords: Python, Airflow, AWS, S3, Redshift, ETL
Udacity project within the Data Engineer Nanodegree
Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application
Bioinformatics, Hospital de Amor de Barretos.
Promisified async PostgreSQL queries for R
Examples that I use to learn and show Apache Beam
Coding Challenge as part of Insight Data Engineering Program
Data Analysis Toolkit (DATK)
A Research Project Incorporating Data Modeling, Data Engineering and Data Analysis on Employees of a Corporation
A small project to practice extracting large data sets - specifically a chess dataset.
This repository contains my results of the Microsoft Professional Program for Data Science.