This project establishes a modern data pipeline for generating, streaming, processing, and visualizing mock flight event data. A Python producer uses Faker to create fake flight events, which are streamed through Kafka acting as the real-time message broker. Spark Structured Streaming consumes and transforms these events before persisting them in a PostgreSQL database. An interactive Streamlit dashboard then visualizes flight positions, statuses (On Time, Delayed, Cancelled), and other details in real time. All services are containerized and orchestrated with Docker Compose for a robust, reproducible environment.
- Python (Faker) → Generate fake flight data (Producer)
- Kafka → Message broker for real-time streaming
- Spark (Structured Streaming) → Process and transform flight events
- Postgres → Store flight data
- Streamlit → Interactive dashboard to visualize flight positions & statuses
- Docker Compose → Containerized setup for all services
```
[ Python Producer (Faker) ]
            ↓
   Kafka Topic: flights
            ↓
Spark Structured Streaming
            ↓
        PostgreSQL
            ↓
   Streamlit Dashboard
```
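The producer side of the diagram can be sketched as follows. This is a minimal illustration using only the standard library; the field names (`flight_id`, `airline`, `latitude`, etc.) are assumptions about the event schema, and the real producer would use Faker for richer values and `kafka-python` (or similar) to publish to the `flights` topic.

```python
import json
import random
import uuid
from datetime import datetime, timezone

STATUSES = ["On Time", "Delayed", "Cancelled"]

def make_flight_event() -> dict:
    """Build one mock flight event; the actual producer uses Faker for richer fields."""
    return {
        "flight_id": str(uuid.uuid4()),
        "airline": random.choice(["AA", "DL", "UA", "LH"]),  # illustrative carrier codes
        "latitude": round(random.uniform(-90.0, 90.0), 5),
        "longitude": round(random.uniform(-180.0, 180.0), 5),
        "status": random.choice(STATUSES),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # In the real producer this payload would be sent to the Kafka topic
    # `flights`, e.g. with kafka-python:
    #   KafkaProducer(...).send("flights", json.dumps(event).encode("utf-8"))
    print(json.dumps(make_flight_event()))
```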
- Kafka topic: `flights`
- Postgres DB: `flights_project`
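The parse-and-validate step the Spark job performs on each Kafka message can be sketched in plain Python. This mirrors what Structured Streaming does with `from_json` plus a filter on required fields; the field names here are assumptions about the event schema.

```python
import json
from datetime import datetime

def parse_flight_event(raw: bytes):
    """Deserialize one Kafka message; return None for malformed events.

    Sketches the transformation the Spark job applies before writing
    rows to the flights_project Postgres database.
    """
    try:
        event = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    required = {"flight_id", "latitude", "longitude", "status", "event_time"}
    if not required <= event.keys():
        return None
    # Normalize the timestamp so Postgres receives a consistent ISO-8601 format.
    event["event_time"] = datetime.fromisoformat(event["event_time"]).isoformat()
    return event
```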
- Streamlit UI shows flight map with status colors (On Time / Delayed / Cancelled).
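The status-to-color mapping for the map markers might look like the snippet below. The specific hex colors and the fallback for unknown statuses are hypothetical; only the three status categories come from the project description.

```python
# Hypothetical palette; the dashboard's actual colors may differ.
STATUS_COLORS = {
    "On Time": "#2ecc71",    # green
    "Delayed": "#f1c40f",    # amber
    "Cancelled": "#e74c3c",  # red
}

def status_color(status: str) -> str:
    """Return the map-marker color for a flight status (grey for unknown)."""
    return STATUS_COLORS.get(status, "#95a5a6")
```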
- All services are containerized and orchestrated via Docker Compose.
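A Docker Compose file for such a stack might be shaped roughly like this. Every detail below (service names, images, build paths, ports) is illustrative only; the project's actual `docker-compose.yml` will differ.

```yaml
# Minimal sketch, not the project's real compose file.
services:
  kafka:
    image: bitnami/kafka:latest     # assumed image; Kafka broker for the `flights` topic
    ports: ["9092:9092"]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: flights_project
    ports: ["5432:5432"]
  dashboard:
    build: ./dashboard              # hypothetical path to the Streamlit app
    ports: ["8501:8501"]
    depends_on: [postgres]
```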

