Launchpad2DataEng offers a comprehensive introduction to the skills and concepts essential for a successful career in data engineering, all while fostering an open-source community where learners and experts alike can share insights, ask questions, and contribute to the growth of everyone involved.

thetechhustle/Launchpad2DataEng

Project-Based Data Engineering Learning

Welcome to our open-source project aimed at fostering practical data engineering skills! This project is inspired by the approach of breaking into data engineering with zero cost and a focus on hands-on projects. We will guide you through setting up a real-world data pipeline using modern tools and technologies like Python, BigQuery/Snowflake, and Astronomer. Whether you're a beginner looking to dive into data engineering or an experienced professional aiming to brush up on your skills, this project i...

Overview

This project outlines a step-by-step approach to building a data engineering pipeline, from sourcing data to implementing quality checks. We focus on practical, project-based learning to equip you with the skills needed to excel in the field of data engineering.

What You Will Build

  • A Python script to fetch data from a REST API.
  • A step that writes this data to a CSV file as a first storage target.
  • A Snowflake or BigQuery setup to manage your data in the cloud.
  • An automated pipeline using Astronomer to ingest data on a scheduled basis.
  • Data quality checks to ensure the integrity of your data.
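
As a rough illustration of the first two items, the sketch below fetches JSON from a REST endpoint and writes it to a CSV file using only the Python standard library. It is not the script shipped in this repository: the function names `fetch_records` and `write_csv` are hypothetical, and it assumes the API returns a JSON array of flat objects, which may not hold for your chosen data source.

```python
import csv
import json
import urllib.request


def fetch_records(url, timeout=10):
    """Fetch a JSON array of objects from a REST endpoint (assumed response shape)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if not isinstance(payload, list):
        raise ValueError("expected the API to return a JSON array of objects")
    return payload


def write_csv(records, path):
    """Write a list of dicts to a CSV file and return the number of rows written.

    Nothing is written when records is empty.
    """
    if not records:
        return 0
    # Union of keys across all records, in first-seen order, so records
    # with missing fields still fit under one header (gaps become "").
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)


# Example usage (placeholder URL, replace with your own data source):
# records = fetch_records("https://example.com/api/items")
# write_csv(records, "items.csv")
```

Keeping the fetch and the write in separate functions means the storage step can later be swapped for a warehouse load without touching the API call.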

Getting Started

Before you begin, make sure you have the following prerequisites:

  • Python installed on your machine.
  • An account with Snowflake or BigQuery (free tiers are available).
  • An account with Astronomer.

Installation & Setup

  1. Find a Data Source: Choose a data source you are interested in (e.g., stock market, Pokémon, sports data). Make sure it offers a REST API.
  2. Python Script for Data Fetching: Clone this repository and navigate to the script directory. Modify the script to point to your chosen data source.
  3. Snowflake/BigQuery Account: Follow the instructions on the Snowflake or BigQuery website to set up a free trial account. Then modify the script to load data into your Snowflake/BigQuery instance instead of a CSV file.
  4. Astronomer for Automation: Set up an Astronomer account and follow its instructions to run your data ingestion on a schedule.
  5. Data Quality Checks: Implement data quality checks using Great Expectations or your own custom checks.
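
For step 5, a custom check can be as small as a function that scans the fetched rows and reports problems. The sketch below is one possible shape, not this project's implementation and not the Great Expectations API: the function name `run_quality_checks` and the choice of checks (non-empty dataset, required fields present, unique key) are assumptions made for illustration.

```python
def run_quality_checks(rows, required_fields, unique_key):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not rows:
        problems.append("dataset is empty")
        return problems

    seen_keys = set()
    for index, row in enumerate(rows):
        # Required fields must be present and must not be None or an empty string.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing value for '{field}'")
        # The key column must not repeat across rows.
        key = row.get(unique_key)
        if key in seen_keys:
            problems.append(f"row {index}: duplicate {unique_key} {key!r}")
        seen_keys.add(key)
    return problems
```

A pipeline task could call this after each fetch and fail the run when the returned list is not empty, so bad data is caught before it is loaded.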

Contributing

We welcome contributions from the community! Whether it's adding new features, improving documentation, or reporting bugs, your contributions are greatly appreciated.

  • Fork the Repository: Start by forking this repository to your GitHub account.
  • Create a Pull Request: After making your changes, create a pull request against our repository. Please provide a clear description of your changes.
  • Code Review: Your pull request will be reviewed by our team. We may suggest some changes or improvements.

License

This project is open-source and available under the MIT License.

Acknowledgments
