This project simulates a real-time data pipeline in which a fake SCADA system streams sensor readings every second. The architecture uses Kafka for real-time data streaming and PostgreSQL for persistent storage.
```
[Fake SCADA System] --> [Kafka Producer] --> [Kafka Topic] --> [Kafka Consumer] --> [PostgreSQL Database]

FAKE SCADA SYSTEM (Python script)
  |
  | generates fake sensor readings every second
  ↓
KAFKA PRODUCER (Python) ← part of the SCADA simulation
  |
  | publishes messages to the Kafka topic (e.g., "scada-data")
  ↓
KAFKA TOPIC (Apache Kafka)
  |
  | acts as the message broker
  ↓
KAFKA CONSUMER (Python)
  |
  | listens to the topic and extracts each message
  ↓
PostgreSQL Database
```
- Fake SCADA System: Simulates real sensors by generating random readings every second.
- Kafka Producer: Acts on behalf of the SCADA system and publishes the readings to a Kafka topic; a minimal sketch of this script follows the list.
- Kafka Broker: Handles message streaming between producers and consumers.
- Kafka Consumer: Subscribes to the topic and inserts the received data into a PostgreSQL table.
- PostgreSQL: Stores the time-series sensor data persistently for querying and analytics.
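The actual `producer.py` ships with the repo; the sketch below shows one plausible shape for it, assuming the `kafka-python` client (an assumption, since `requirements.txt` is not reproduced here) and the `scada-data` topic created later in the setup. The field names and value ranges mirror the `sensor_data` table and the sample output at the end of this README.

```python
import json
import random
import time
import uuid
from datetime import datetime

from kafka import KafkaProducer  # assumption: kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # One fake reading per second; keys mirror the sensor_data table columns.
    reading = {
        "timestamp": datetime.utcnow().isoformat(),
        "sensor_id": str(uuid.uuid4()),
        "temperature": round(random.uniform(20.0, 100.0), 2),
        "pressure": round(random.uniform(1.0, 10.0), 2),
        "flow_rate": round(random.uniform(0.5, 5.0), 2),
        "status": random.choice(["NORMAL", "WARNING"]),
    }
    producer.send("scada-data", reading)
    producer.flush()
    time.sleep(1)
```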
```bash
git clone https://github.com/yourusername/scada-kafka-dashboard.git
cd scada-kafka-dashboard
```
```bash
python3 -m venv scada_venv
source scada_venv/bin/activate
pip install -r requirements.txt
```
Create a `docker-compose.yml` file:
```yaml
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:latest
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```
Then run:
```bash
docker-compose up -d
```
```bash
docker exec -it <kafka_container_id_or_name> bash
kafka-topics --create --topic scada-data --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```
```bash
sudo -u postgres psql
```
```sql
CREATE USER scada_user WITH PASSWORD 'scada_pass';
CREATE DATABASE scada_db OWNER scada_user;
GRANT ALL PRIVILEGES ON DATABASE scada_db TO scada_user;
```
The table will be auto-created by the consumer script if not already present:
```sql
CREATE TABLE IF NOT EXISTS sensor_data (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP,
    sensor_id UUID,
    temperature FLOAT,
    pressure FLOAT,
    flow_rate FLOAT,
    status VARCHAR(50)
);
```
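The real `consumer.py` is in the repo; the following is a minimal sketch of the subscribe-and-insert loop, again assuming `kafka-python` plus `psycopg2`, with the credentials from the SQL above hard-coded for brevity (in practice they would likely come from the optional `.env` file).

```python
import json

import psycopg2  # assumption: psycopg2-binary driver
from kafka import KafkaConsumer  # assumption: kafka-python client

# Credentials match the CREATE USER / CREATE DATABASE statements above.
conn = psycopg2.connect(
    dbname="scada_db", user="scada_user", password="scada_pass", host="localhost"
)
cur = conn.cursor()

# Auto-create the table on startup, as described above.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        id SERIAL PRIMARY KEY,
        timestamp TIMESTAMP,
        sensor_id UUID,
        temperature FLOAT,
        pressure FLOAT,
        flow_rate FLOAT,
        status VARCHAR(50)
    )
""")
conn.commit()

consumer = KafkaConsumer(
    "scada-data",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Insert each reading as it arrives; a commit per message keeps the sketch simple.
for message in consumer:
    r = message.value
    cur.execute(
        "INSERT INTO sensor_data (timestamp, sensor_id, temperature, pressure, flow_rate, status) "
        "VALUES (%s, %s, %s, %s, %s, %s)",
        (r["timestamp"], r["sensor_id"], r["temperature"],
         r["pressure"], r["flow_rate"], r["status"]),
    )
    conn.commit()
```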
```bash
python producer.py
```
```bash
python consumer.py
```
```bash
psql -U postgres -d scada_db
```
Then:
```sql
SELECT COUNT(*) FROM sensor_data;
SELECT * FROM sensor_data ORDER BY id DESC LIMIT 5;
```
For a real SCADA system:
- Replace the fake `producer.py` script.
- Use a real Kafka producer that reads data from hardware or a SCADA interface (e.g., OPC UA, MQTT, REST, Modbus); see the sketch after this list.
- Keep the Kafka and PostgreSQL pipeline as-is.
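As one illustration of that swap, a bridge from an MQTT-speaking device to the existing Kafka topic could look like the sketch below. It uses the `paho-mqtt` library; the broker host and MQTT topic are hypothetical placeholders, and the project itself does not prescribe this particular interface.

```python
import json

import paho.mqtt.client as mqtt  # assumption: paho-mqtt library
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_message(client, userdata, msg):
    # Assumes the device publishes JSON payloads shaped like the fake readings;
    # real hardware would likely need a translation step here.
    reading = json.loads(msg.payload.decode("utf-8"))
    producer.send("scada-data", reading)

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x also requires a callback_api_version
client.on_message = on_message
client.connect("plc-gateway.local", 1883)  # hypothetical broker host
client.subscribe("plant/sensors/#")        # hypothetical MQTT topic
client.loop_forever()
```

Because the consumer and database are untouched, any producer that publishes the same JSON shape to `scada-data` slots in without further changes.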
```
├── producer.py          # Simulates SCADA sensor data and sends to Kafka
├── consumer.py          # Reads from Kafka and inserts into PostgreSQL
├── requirements.txt     # Python dependencies
├── docker-compose.yml   # Kafka + Zookeeper setup
├── .env                 # DB connection details (optional)
└── README.md
```
```sql
SELECT * FROM sensor_data ORDER BY id DESC LIMIT 5;
```
| id | timestamp | sensor_id | temperature | pressure | flow_rate | status |
|---|---|---|---|---|---|---|
| 3832 | 2025-04-04 19:15:53 | 2087ca10-d69e-443c-98fc-124c215f4973 | 38.17 | 8.84 | 2.62 | WARNING |
| 3831 | 2025-04-04 19:15:52 | b0e5bf6b-66a1-4976-997b-2bde6dc64d68 | 87.16 | 8.33 | 3.17 | NORMAL |
For questions or feedback, feel free to reach out or contribute!