
Kafka-Driven-Logistics-Data-Ingestion-into-MongoDB

The project orchestrates a pipeline that processes logistics data with Kafka messaging and stores it efficiently in MongoDB. It comprises Kafka producer and consumer scripts, Docker for scaling consumers, and an Avro schema for data serialization.

Features

  • Kafka Integration: Uses Kafka messaging to handle logistics data.
  • Scalable Consumers: Docker scaling manages the processing load with a consumer group of 3 instances.
  • Data Serialization: Implements the Avro schema defined in avro.json for data serialization.
  • File Included: Contains delivery_trip_truck_data.csv for data ingestion.
  • Kafka Topic Creation: A Kafka topic named 'truck_data' is created with 10 partitions (see the sketch after this list).
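
The topic setup can be pictured roughly as below. This is a minimal sketch, assuming the confluent-kafka Python client, a broker at localhost:9092, and a replication factor of 1; the repository may instead create the topic with the Kafka CLI or a managed service console.

```python
# Sketch: create the 'truck_data' topic with 10 partitions.
# Assumes the confluent-kafka client and a local broker (both assumptions).
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed broker address

futures = admin.create_topics(
    [NewTopic("truck_data", num_partitions=10, replication_factor=1)]
)
for topic, future in futures.items():
    try:
        future.result()  # raises if the broker rejected the request
        print(f"Created topic {topic}")
    except Exception as exc:
        print(f"Topic {topic} not created: {exc}")
```

Ten partitions give the consumer group room to spread work across its instances, since each partition is assigned to at most one consumer in the group.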

Technology Used

  • Python Scripts:
    • kafka_producer.py: Sends logistics data to Kafka.
    • kafka_consumer.py: Processes data from Kafka and stores it in MongoDB. It starts consuming from the earliest message available in the partition (see the consumer sketch after this list).
  • Docker: Used to scale consumer instances via the docker-compose.yml file.
  • Data Schema: Avro schema defined in avro.json for structured data handling.
  • File Contents: delivery_trip_truck_data.csv contains the logistics data for ingestion.
  • Kafka Configuration:
    • Kafka topic: 'truck_data', created with 10 partitions.
  • MongoDB: Database for storing the logistics data.
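
The consume-and-store loop can be sketched as follows. This is not the project's exact kafka_consumer.py: it assumes the confluent-kafka and pymongo clients, a local broker and MongoDB instance, illustrative names for the group ('truck-data-consumers'), database ('logistics'), and collection ('truck_data'), and JSON-encoded values as a stand-in for the Avro deserialization done against avro.json.

```python
# Sketch of the consume-and-store loop (assumptions noted above).
import json
from confluent_kafka import Consumer
from pymongo import MongoClient

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker
    "group.id": "truck-data-consumers",      # shared group: partitions are split across instances
    "auto.offset.reset": "earliest",         # start from the earliest available message
})
consumer.subscribe(["truck_data"])

collection = MongoClient("mongodb://localhost:27017")["logistics"]["truck_data"]

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        record = json.loads(msg.value())      # stand-in for Avro deserialization
        collection.insert_one(record)
finally:
    consumer.close()
```

Because every instance shares the same group.id, scaling to 3 consumers lets Kafka divide the 10 partitions among them automatically.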

How to Use

  • Setting Up Kafka: Ensure the 'truck_data' Kafka topic is created with 10 partitions.
  • Running the Pipeline:
    • Execute kafka_producer.py to send logistics data to Kafka (a producer sketch follows this list).
    • Use Docker to scale consumer instances (docker-compose up --scale co=3).
  • Avro Schema Integration: Refer to avro.json for the Avro schema used to serialize the 'truck_data' messages.
  • File Usage: delivery_trip_truck_data.csv contains the logistics data for ingestion.
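
The producing step can be pictured roughly as below. This is a minimal sketch, not the project's kafka_producer.py: it assumes the confluent-kafka client and a broker at localhost:9092, and sends rows JSON-encoded for brevity, whereas the actual producer serializes them with the Avro schema in avro.json.

```python
# Sketch: stream delivery_trip_truck_data.csv into the 'truck_data' topic.
# Assumes confluent-kafka and a local broker; JSON stands in for Avro here.
import csv
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

with open("delivery_trip_truck_data.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Without a message key, records are spread across the 10 partitions.
        producer.produce("truck_data", value=json.dumps(row).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks

producer.flush()  # wait for all outstanding messages to be delivered
```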

Screenshots: Kafka producer output, Kafka consumer output, MongoDB database view, and MongoDB collection view.
