An End-to-End Data Engineering Project
This project demonstrates a real-time data engineering pipeline built from scratch, covering everything from ingestion to storage with a modern, scalable tech stack. It fetches Bitcoin price updates from the CoinGecko API, then streams, processes, and stores the data using Airflow, Kafka, Spark, and Cassandra, all containerized with Docker for seamless orchestration and deployment.
Pipeline Flow:
- Airflow fetches Bitcoin data from the CoinGecko API and stores it in PostgreSQL.
- Data is streamed to Apache Kafka, coordinated by Zookeeper.
- Spark Streaming consumes and processes data in real-time.
- Transformed data is stored in a Cassandra database.
- Monitoring and schema evolution are handled via Kafka Control Center and Schema Registry.
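The first hop of the flow above can be sketched as a small fetch-and-parse step. This is a minimal illustration only: the payload shape matches CoinGecko's public `/simple/price` endpoint, but the function names (`parse_price`, `to_kafka_message`) and record fields are assumptions, not the project's actual code.

```python
import json
from datetime import datetime, timezone

# CoinGecko's /simple/price endpoint returns JSON shaped like:
#   {"bitcoin": {"usd": 67000.12}}
COINGECKO_URL = (
    "https://api.coingecko.com/api/v3/simple/price"
    "?ids=bitcoin&vs_currencies=usd"
)

def parse_price(payload: dict) -> dict:
    """Flatten a CoinGecko response into a record for downstream streaming."""
    return {
        "symbol": "BTC",
        "price_usd": float(payload["bitcoin"]["usd"]),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }

def to_kafka_message(record: dict) -> bytes:
    """Serialize the record as JSON bytes, ready for a Kafka producer."""
    return json.dumps(record).encode("utf-8")

# Example with a canned payload (no network call):
record = parse_price({"bitcoin": {"usd": 67000.12}})
message = to_kafka_message(record)
```

In the real pipeline this logic would live inside an Airflow task, with the serialized message handed to a Kafka producer rather than a local variable.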
| Layer | Tool |
|---|---|
| Orchestration | Apache Airflow |
| Messaging | Apache Kafka, Zookeeper |
| Processing | Apache Spark (Structured Streaming) |
| Storage | Cassandra, PostgreSQL |
| Monitoring | Kafka Control Center, Schema Registry |
| Infrastructure | Docker, Docker Compose |
| Programming | Python |
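To illustrate the storage layer, a Cassandra table for the transformed price records might look like the following. This is a hypothetical schema sketch; the keyspace, table, and column names used by the actual project are defined in its own setup scripts.

```sql
-- Hypothetical keyspace and table for transformed Bitcoin price records
CREATE KEYSPACE IF NOT EXISTS crypto_stream
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS crypto_stream.btc_prices (
  symbol      text,
  fetched_at  timestamp,
  price_usd   double,
  PRIMARY KEY (symbol, fetched_at)
) WITH CLUSTERING ORDER BY (fetched_at DESC);
```

Partitioning by `symbol` and clustering by `fetched_at` keeps each asset's price history together and makes recent-first time-range queries cheap.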
Clone and spin up the project in just a few steps:

- Clone the repository:

  ```bash
  git clone https://github.com/0xpradish/e2e-data-engineering.git
  ```

- Navigate to the project directory:

  ```bash
  cd e2e-data-engineering
  ```

- Run Docker Compose to spin up the services:

  ```bash
  docker compose up -d
  ```
