Smart City End-to-End Real-Time Data Streaming Pipeline

Project Overview

This project builds a comprehensive real-time data streaming pipeline for a Smart City initiative. It captures and processes live data from a vehicle traveling from London to Birmingham: vehicle telemetry, GPS coordinates, emergency incidents, weather conditions, and camera footage. The pipeline combines IoT devices, Apache Kafka, Apache Spark, Docker, and AWS services to handle data ingestion, processing, storage, and visualization end to end.

Architecture Overview

Architecture Diagram

Technologies Used

  • IoT Devices: For capturing real-time data.
  • Apache Zookeeper: For managing and coordinating Kafka brokers.
  • Apache Kafka: For real-time data ingestion into different topics.
  • Apache Spark: For real-time data processing and streaming.
  • Docker: For containerization and orchestration of Kafka and Spark.
  • Python: For data processing scripts.
  • AWS Cloud:
    • S3: For storing processed data as Parquet files.
    • Glue: For crawling and cataloging the processed data in S3.
    • Athena: For querying processed data.
    • IAM: For managing access and permissions.
    • Redshift: For data warehousing and analytics.
  • Amazon QuickSight: For data visualization and dashboarding.

Project Workflow

  1. Data Ingestion:

    • IoT devices capture real-time data.
    • Data is ingested into Kafka topics; Kafka and Zookeeper run in Docker containers configured via docker-compose.yml.
  2. Data Processing:

    • Apache Spark reads data from Kafka topics.
    • Spark processes the data and writes it to AWS S3 as Parquet files.
    • Spark Streaming is used for real-time processing, with checkpointing for fault tolerance and recovery (a minimal job sketch follows this list).
  3. Data Storage:

    • Processed data is stored in AWS S3.
    • AWS Glue crawlers scan the data in S3 and register its schema in the Glue Data Catalog.
  4. Data Querying:

    • AWS Athena queries the processed data in S3 through the Glue Data Catalog.
  5. Data Visualization:

    • Amazon QuickSight visualizes the queried data with interactive dashboards.
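
A minimal sketch of the Spark side of this workflow is shown below, assuming the job uses Spark Structured Streaming, a broker at localhost:9092, a topic named vehicle_data, and an illustrative schema and S3 bucket; the real topic names, fields, and paths come from the project's own configuration.

    # Sketch of a Structured Streaming job: read one Kafka topic, parse JSON,
    # and write Parquet to S3 with checkpointing. All names are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("SmartCityStreaming").getOrCreate()

    # Example schema for a vehicle_data payload (adjust to the real fields).
    vehicle_schema = StructType([
        StructField("id", StringType()),
        StructField("timestamp", TimestampType()),
        StructField("speed", DoubleType()),
        StructField("latitude", DoubleType()),
        StructField("longitude", DoubleType()),
    ])

    # Kafka delivers each payload as bytes in the `value` column.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "vehicle_data")
           .option("startingOffsets", "earliest")
           .load())

    vehicles = (raw.selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), vehicle_schema).alias("data"))
                .select("data.*"))

    # The checkpoint location is what lets the job recover its Kafka
    # offsets and resume cleanly after a failure or restart.
    query = (vehicles.writeStream
             .format("parquet")
             .option("path", "s3a://your-bucket/data/vehicle_data")
             .option("checkpointLocation", "s3a://your-bucket/checkpoints/vehicle_data")
             .outputMode("append")
             .start())

    query.awaitTermination()

Note that the Kafka source is not bundled with Spark; it is typically supplied at submit time with a flag such as --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark-version>, matched to your Spark build.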

Getting Started

Prerequisites

  • Docker and Docker Compose
  • AWS Account with appropriate IAM roles and permissions
  • Python 3.x
  • Apache Kafka and Apache Spark setup

Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/DivineSamOfficial/SmartCityProject.git
    cd SmartCityProject
  2. Configure Docker:

    • Ensure Docker and Docker Compose are installed and running.
    • Configure Kafka and Spark in docker-compose.yml.
    • Start the services:
      docker-compose up -d
  3. AWS Configuration:

    • Set up AWS IAM roles and permissions.
    • Configure AWS S3 buckets, Glue crawlers, and Athena.
    • Update the configuration files with your AWS credentials and resource details.
  4. Run Data Ingestion:

    • Start producing data to Kafka topics using IoT data simulators (a minimal simulator sketch follows these steps).
  5. Run Spark Streaming:

    • Submit the Spark job to process and stream data to S3:
      spark-submit --master local[2] your-spark-job.py
  6. Query Data with Athena:

    • Use AWS Athena to query the processed data stored in S3.
  7. Visualize Data with QuickSight:

    • Create an Amazon QuickSight analysis and build your dashboard with the processed data.
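
For step 4, a data simulator can be as simple as the sketch below. It uses the kafka-python client with an illustrative vehicle_data topic and payload; the repository's own simulators define the actual topics and fields.

    # Sketch of an IoT data simulator: produce one fake JSON vehicle
    # record per second to Kafka. Topic and fields are illustrative.
    import json
    import random
    import time
    import uuid
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    while True:
        record = {
            "id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "speed": round(random.uniform(20.0, 70.0), 1),       # fake speed
            "latitude": 51.5074 + random.uniform(-0.05, 0.05),   # near London
            "longitude": -0.1278 + random.uniform(-0.05, 0.05),
        }
        producer.send("vehicle_data", value=record)
        time.sleep(1)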

Dashboard

The dashboard created in Amazon QuickSight includes the following visualizations:

  • Total Number of Entries in Each Table: KPI widgets or bar charts.
  • Average Speed of Vehicles: Single KPI value (see the Athena sketch below for an example of the underlying query).
  • Number of Emergency Incidents: Single KPI value.
  • Time Series Analysis: Line charts for vehicle data, weather data, and emergency incidents over time.
  • Geographic Visualization: Maps for route visualization and vehicle locations.
  • Heat Maps: For the density of GPS data points.
  • Detailed Analysis: Tables and pie charts for detailed data insights.
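
To illustrate where a KPI such as the average vehicle speed comes from, the sketch below runs an Athena query with boto3. The database, table, region, and output bucket names are assumptions; use the names your Glue crawlers registered.

    # Sketch: compute the average-speed KPI via Athena. The database,
    # table, region, and output location below are assumptions.
    import time
    import boto3

    athena = boto3.client("athena", region_name="eu-west-2")

    start = athena.start_query_execution(
        QueryString="SELECT AVG(speed) AS avg_speed FROM vehicle_data",
        QueryExecutionContext={"Database": "smart_city_db"},
        ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-results/"},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query finishes, then read the single result row.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        print("Average speed:", rows[1]["Data"][0]["VarCharValue"])  # rows[0] is headers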

Conclusion

This project demonstrates the power of modern data engineering tools to handle complex, real-time data streams and deliver actionable insights for Smart City initiatives. The use of AWS services ensures scalability, reliability, and ease of data management, making it an excellent example of an end-to-end data streaming pipeline.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Thanks to the open-source community for providing the tools and libraries used in this project.
  • Special thanks to my team and mentors for their support and guidance.
