# data-engineering

Here are 3,178 public repositories matching this topic...

jijo-james / data-engineering-pet-projects

This repo contains my experimental Data Engineering projects.

python airflow sql etl data-engineering

Updated Mar 6, 2023
Python

leonidee / spark-hadoop-automation-in-cloud

Automate Apache Spark in Hadoop with Airflow in Cloud

airflow apache-spark hadoop data-engineering

Updated Jul 16, 2023
Python

LoveNui / EMR-AWS-APACHE-SPARK

aws airflow big-data spark data-engineering data-analysis

Updated Jul 15, 2023
Python

horony / udacity-nanodegree-data-engineering

Project files from my 2023 Data Engineering Nanodegree.

udacity spark python3 data-engineering udacity-nanodegree

Updated Feb 10, 2023
Jupyter Notebook

khushal2405 / ETL-pipeline-using-Airflow-and-AWS-EMR

An ETL pipeline built with Airflow that downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics.

python aws airflow scala spark apache-spark etl s3 s3-bucket aws-emr pyspark data-engineering

Updated Feb 25, 2023
Python

mukmookk / streamDAQ

Real-time NASDAQ data pipeline

python data-engineering webcrawling

Updated Aug 15, 2023
Python

leonardohss0 / etl-sql-s3-redshift

Keywords: Python, Airflow, AWS, S3, Redshift, ETL

airflow etl data-engineering

Updated Apr 29, 2023
Python

lucasbalponti / Apache-Airflow---Pipeline-de-dados

pipeline data-engineering dag apache-airflow vitrinedev

Updated Apr 26, 2023
Python

juliaobenauer / Data-Pipelines-with-Airflow

Udacity project within the Data Engineer Nanodegree

python airflow sql etl data-engineering

Updated Nov 26, 2022
Python

deliveroo / data-sink-client

Client for data-sink

data-engineering

Updated Aug 5, 2022
Ruby

2uinc / incubator-superset

Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application

data-engineering business-intelligence

Updated Apr 24, 2020
Python

mining-information-for-you / ha-bioinformatica

Bioinformatics at Hospital de Amor de Barretos.

data-science bioinformatics genomics data-engineering data-lake gene-annotation

Updated Nov 21, 2018
Jupyter Notebook

returnString / pgpromise

Promisified async PostgreSQL queries for R

data-science r postgresql data-engineering

Updated Jun 7, 2018
C++

marianajo / beam-examples

Examples that I use to learn and show Apache Beam

etl examples data-engineering apache-beam data-processing etl-pipeline data-engineering-workflows data-engineering-pipeline

Updated Oct 24, 2018
Python

rvsandeep / H1B-Analytics

Coding challenge completed as part of the Insight Data Engineering Program

statistics data-engineering data-science-challenges insight-data-engineering

Updated Oct 30, 2018
Python

ihnokim / datk

Data Analysis Toolkit (DATK)

data-science deep-learning signal-processing image-processing pandas data-engineering data-analysis

Updated Apr 14, 2021
CSS

hanyang2019 / Project_Employee_Database

A Research Project Incorporating Data Modeling, Data Engineering and Data Analysis on Employees of a Corporation

sqlalchemy sql postgresql python3 sql-query data-engineering data-analysis matplotlib jupyter-notebooks data-modeling pgadmin4

Updated Dec 29, 2019
Jupyter Notebook

PeterMorrison1 / PGNParser

A small project to practice extracting large data sets - specifically a chess dataset.

python chess data-engineering

Updated Jul 25, 2019
Python

bhaargav006 / dota-pipeline

A big data pipeline for Dota 2

big-data data-engineering database-modeling

Updated Dec 11, 2022
CSS

Alyxion / MPPDataScience

This repository contains my results from the Microsoft Professional Program for Data Science.

data-science cloud data-engineering ethics

Updated Jan 25, 2021
Jupyter Notebook
