This project implements a real-time data pipeline that integrates Confluent Kafka, MySQL, and Avro serialization to process e-commerce updates as they occur. The pipeline streams, transforms, and stores incremental data updates for downstream analytics and business intelligence.
- Kafka Producer: Fetches incremental updates from a MySQL database and serializes data into Avro format.
- Multi-partitioned Topic: The topic is created with 10 partitions so records are distributed across consumers for parallel processing.
- Kafka Consumer Group: A Python-based consumer group of 5 consumers deserializes Avro data, applies transformations, and writes the results to JSON files.
- Data Transformation: Implements logic for case conversions, price adjustments based on business rules, and efficient JSON formatting.
- Comprehensive Documentation: Includes setup guidelines, the SQL queries used for incremental fetches, the Avro schema, and screenshots illustrating execution.
- Python 3.7+
- Confluent Kafka Python Client
- MySQL Database
- Apache Avro File Format
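The 10-partition topic can be created programmatically with the Confluent Kafka admin client. This is a minimal sketch; the topic name `product_updates`, the broker address, and the replication factor are assumptions, not values published by the project.

```python
def topic_spec(name: str, partitions: int = 10, replication: int = 1) -> dict:
    """Describe the topic before handing it to the admin client."""
    return {"name": name, "num_partitions": partitions,
            "replication_factor": replication}


def create_topic(bootstrap: str = "localhost:9092") -> None:
    # Deferred import: confluent-kafka is only needed when actually creating.
    from confluent_kafka.admin import AdminClient, NewTopic

    spec = topic_spec("product_updates")  # hypothetical topic name
    admin = AdminClient({"bootstrap.servers": bootstrap})
    futures = admin.create_topics([NewTopic(
        spec["name"],
        num_partitions=spec["num_partitions"],
        replication_factor=spec["replication_factor"],
    )])
    for topic, future in futures.items():
        future.result()  # raises if creation failed (e.g. topic already exists)


if __name__ == "__main__":
    create_topic()
```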
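The producer side can be sketched as follows: fetch rows changed since the last checkpoint from MySQL, then publish them Avro-serialized through Schema Registry. The schema, the `last_updated` column, the table name, and the connection settings are all illustrative assumptions; the project's actual schema and queries ship in the repo.

```python
# Hypothetical Avro schema for a product row; the real schema may differ.
PRODUCT_SCHEMA = """
{
  "type": "record",
  "name": "Product",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "category", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "last_updated", "type": "string"}
  ]
}
"""


def incremental_query(table: str) -> str:
    """Parameterized SQL for the incremental fetch.
    The `last_updated` checkpoint column is an assumption."""
    return (f"SELECT id, name, category, price, last_updated FROM {table} "
            "WHERE last_updated > %s ORDER BY last_updated")


def run_producer(last_checkpoint: str,
                 bootstrap: str = "localhost:9092",
                 registry: str = "http://localhost:8081") -> None:
    import mysql.connector
    from confluent_kafka import SerializingProducer
    from confluent_kafka.serialization import StringSerializer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer

    avro_ser = AvroSerializer(SchemaRegistryClient({"url": registry}),
                              PRODUCT_SCHEMA)
    producer = SerializingProducer({
        "bootstrap.servers": bootstrap,
        "key.serializer": StringSerializer("utf_8"),
        "value.serializer": avro_ser,
    })
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret", database="ecommerce")
    cur = conn.cursor(dictionary=True)
    cur.execute(incremental_query("products"), (last_checkpoint,))
    for row in cur:
        row["last_updated"] = str(row["last_updated"])
        producer.produce(topic="product_updates",
                         key=str(row["id"]), value=row)
    producer.flush()
```

In a real run, `last_checkpoint` would be persisted (e.g. to a file or table) after each successful flush so restarts resume from the last delivered row.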
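The consumer side might look like the sketch below: each of the 5 instances joins the same consumer group, deserializes Avro via Schema Registry, transforms each record, and appends it to a JSON-lines file. The specific business rules are not published, so the case conversions and the 5% surcharge on prices above 100 are illustrative stand-ins, as are the group id and topic name.

```python
import json


def transform(record: dict) -> dict:
    """Illustrative stand-ins for the project's transformation rules:
    upper-case the name, lower-case the category, and add a
    hypothetical 5% surcharge to prices above 100."""
    out = dict(record)
    out["name"] = out["name"].upper()
    out["category"] = out["category"].lower()
    if out["price"] > 100:
        out["price"] = round(out["price"] * 1.05, 2)
    return out


def run_consumer(bootstrap: str = "localhost:9092",
                 registry: str = "http://localhost:8081",
                 out_path: str = "consumed.json") -> None:
    from confluent_kafka import DeserializingConsumer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroDeserializer

    consumer = DeserializingConsumer({
        "bootstrap.servers": bootstrap,
        "group.id": "product-consumers",  # shared by all 5 instances
        "auto.offset.reset": "earliest",
        "value.deserializer": AvroDeserializer(
            SchemaRegistryClient({"url": registry})),
    })
    consumer.subscribe(["product_updates"])
    with open(out_path, "a") as out:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            out.write(json.dumps(transform(msg.value())) + "\n")
```

Because all 5 processes share one `group.id`, Kafka assigns each roughly 2 of the 10 partitions, giving the parallelism the partition count was chosen for.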
![KAFKAUI](https://private-user-images.githubusercontent.com/90061814/294138191-cd16e59c-9abe-4827-adb0-9a1db2e0d458.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODQ4NDYsIm5iZiI6MTcyMDU4NDU0NiwicGF0aCI6Ii85MDA2MTgxNC8yOTQxMzgxOTEtY2QxNmU1OWMtOWFiZS00ODI3LWFkYjAtOWExZGIyZTBkNDU4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDA0MDkwNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU4YzA2NjUzMzk0MTQ5NzU4ZDk1NWIxNmE1YWEwNTZlM2E3ODAyMjE5NDZhZTJlNDQxMjFiNTA0NWRlZWU3YzcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.iOfCmOnh-1h3e7-yrwObD_Jfhb8paxyzMJsR3qoKhn4)