KRISHNASAIRAJ/Avro-Encoded-Real-Time-Data-Processing

Avro-Encoded-Real-Time-Data-Processing

This project demonstrates a real-time data pipeline built on Confluent Kafka, MySQL, and Avro serialization for processing e-commerce updates. The pipeline streams incremental updates out of MySQL, transforms them, and stores the results for downstream analytics and business intelligence.

Features

  • Kafka Producer: Fetches incremental updates from a MySQL database and serializes data into Avro format.
  • Multi-partitioned Topics: The topic uses 10 partitions so that records can be distributed across the consumers.
  • Kafka Consumer Group: A Python consumer group of 5 consumers deserializes the Avro data, applies transformations, and writes the results to JSON files.
  • Data Transformation: Applies case conversions, price adjustments based on business rules, and JSON formatting.
  • Comprehensive Documentation: Includes setup guidelines, SQL queries for the incremental fetch, the Avro schema, and screenshots of an example run.
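
The transformation step described above can be sketched roughly as follows. This is a minimal illustration only, not the repository's actual code: the field names (`category`, `price`), the `PRICE_MULTIPLIERS` table, and the 10% discount rule are assumptions made for the example, since the concrete business rules are not stated here. The input is assumed to be a record that has already been deserialized from Avro into a plain `dict`.

```python
import json

# Hypothetical business rule: price multipliers keyed by upper-cased category.
# The real rules live in the repository's consumer code and may differ.
PRICE_MULTIPLIERS = {"ELECTRONICS": 0.90}


def transform(record: dict) -> dict:
    """Return a transformed copy of one deserialized product record."""
    out = dict(record)  # copy so the caller's record is left untouched
    out["category"] = record["category"].upper()  # case conversion
    multiplier = PRICE_MULTIPLIERS.get(out["category"], 1.0)
    out["price"] = round(record["price"] * multiplier, 2)  # price adjustment
    return out


def to_json_line(record: dict) -> str:
    """Serialize a record as one compact JSON line, suitable for appending to a file."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True)


# Hypothetical sample record, as it might look after Avro deserialization.
sample = {"id": 1, "name": "Laptop", "category": "electronics", "price": 1000.0}
line = to_json_line(transform(sample))
```

Writing one JSON object per line keeps each consumer's output file append-only, which is one simple way to let 5 consumers write independently without coordinating on a shared JSON array.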

Technologies Used:

  • Python 3.7+
  • Confluent Kafka Python Client
  • MySQL Database
  • Apache Avro (serialization format)

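The incremental fetch that the producer performs against MySQL can be illustrated with a watermark query. The sketch below is an assumption-laden stand-in, not the project's SQL: it uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a database server, and the `product` table, its columns, and the `last_updated` watermark column are hypothetical names chosen for the example.

```python
import sqlite3


def fetch_incremental(conn, last_seen):
    """Return rows changed after `last_seen` plus the new watermark.

    `last_seen` is an ISO-style timestamp string; such strings compare
    correctly as text, which this stand-in relies on.
    """
    cur = conn.execute(
        "SELECT id, name, category, price, last_updated FROM product "
        "WHERE last_updated > ? ORDER BY last_updated, id",
        (last_seen,),
    )
    rows = cur.fetchall()
    # Advance the watermark only when something new was read.
    new_watermark = rows[-1][4] if rows else last_seen
    return rows, new_watermark


# In-memory stand-in database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, "
    "category TEXT, price REAL, last_updated TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Laptop", "electronics", 1000.0, "2024-01-01 10:00:00"),
        (2, "Mug", "kitchen", 8.5, "2024-01-01 11:00:00"),
    ],
)

# First poll reads everything; the second poll finds nothing new.
rows, watermark = fetch_incremental(conn, "1900-01-01 00:00:00")
again, watermark_after = fetch_incremental(conn, watermark)
```

With a MySQL driver the same pattern applies, although the placeholder is typically `%s` rather than `?`. Each fetched row would then be serialized to Avro and produced to the topic before the watermark is persisted.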
Kafka UI

[Screenshot: Kafka UI]

Producer Output and Consumer Group (5) Output

[Screenshots: producer output, consumer group output]
