LaPetiteSouris/KuronoSoshiki

Kafka Data Quality Control

This is the implementation of my small blog post: To stream or to not stream. That is a question.

Inspired by the article Building A Streaming Fraud Detection System With Kafka And Python.

The codebase for the Kafka/Zookeeper stack is taken from that article, as its author has done brilliant work putting the whole Kafka stack together in docker-compose. Many thanks for this tremendous effort, which made my work much easier to implement.

Install

This data quality control system is fully containerised. You will need Docker and Docker Compose to run it.

You simply need to create a Docker network called kafka-network to enable communication between the Kafka cluster and the apps:

$ docker network create kafka-network

All set!

Quickstart

  • Spin up the local single-node Kafka cluster (will run in the background):
$ docker-compose -f docker-compose.kafka.yml up -d
  • Check the cluster is up and running (wait for "started" to show up):
$ docker-compose -f docker-compose.kafka.yml logs -f broker | grep "started"
  • Start the stack, which includes gin, vodka and bourbon. In summary:
gin: Kafka producer. It takes the sample JSON file and simulates an incoming stream of events.

vodka: cleaning service. It cleans the data and republishes it to a new topic. It also calculates several data stats on the fly.

bourbon: data quality watchdog. It calculates data stats from the raw data.

NOTE: Only bourbon and vodka print their logs. gin produces messages to Kafka silently.

$ docker-compose up
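
To make the three roles more concrete, here is a minimal sketch in plain Python with no Kafka client involved. It is not the repository's code: the function names (simulate_stream, clean_event, missing_field_counts), the SAMPLE records and the cleaning rules (trim whitespace, treat empty strings as missing) are all assumptions made for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical records standing in for the repository's sample JSON file.
SAMPLE = [
    {"id": 1, "first_name": "Barthel", "email": "bkittel0@printfriendly.com", "country": "france"},
    {"id": 2, "first_name": " Ana ", "email": "", "country": "France"},
    {"id": 3, "first_name": None, "email": "x@example.com", "country": None},
]


def simulate_stream(records: Iterable[dict]) -> Iterator[bytes]:
    """gin-like step: emit each record as a JSON-encoded message."""
    for record in records:
        yield json.dumps(record).encode("utf-8")


def clean_event(raw: bytes) -> dict:
    """vodka-like step (assumed rules): trim strings, map empty strings to None."""
    event = json.loads(raw)
    cleaned = {}
    for key, value in event.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[key] = value
    return cleaned


def missing_field_counts(messages: Iterable[bytes]) -> Counter:
    """bourbon-like step (assumed metric): count missing values per field in raw data."""
    counts = Counter()
    for raw in messages:
        for key, value in json.loads(raw).items():
            if value is None or (isinstance(value, str) and not value.strip()):
                counts[key] += 1
    return counts


if __name__ == "__main__":
    raw_messages = list(simulate_stream(SAMPLE))
    for message in raw_messages:
        print(clean_event(message))
    print(dict(missing_field_counts(raw_messages)))
```

In the real stack each step would read from and write to a Kafka topic instead of passing Python lists around; that wiring is left out here on purpose.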

Usage

Show a stream of events in the topic T (optionally add --from-beginning):

$ docker-compose -f docker-compose.kafka.yml exec broker kafka-console-consumer --bootstrap-server localhost:9092 --topic T

Topics:

  • streaming.user_activity: raw generated user events
  • derivedstream.cleaned_user_events: user events cleaned by vodka
  • derivedstream.quality_user_activity: quality control data published by bourbon
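
As a convenience, the consumer command from the Usage section can be assembled for any of these topics. The helper below is only a sketch: consumer_command is a hypothetical name, and it simply rebuilds the command shown earlier with the topic and the optional --from-beginning flag filled in.

```python
import shlex

# Fixed part of the console consumer command shown in the Usage section.
BASE_COMMAND = [
    "docker-compose", "-f", "docker-compose.kafka.yml",
    "exec", "broker", "kafka-console-consumer",
    "--bootstrap-server", "localhost:9092",
]


def consumer_command(topic: str, from_beginning: bool = False) -> list:
    """Return the argument list that tails the given topic."""
    args = BASE_COMMAND + ["--topic", topic]
    if from_beginning:
        args.append("--from-beginning")
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command for the raw event topic.
    print(shlex.join(consumer_command("streaming.user_activity", from_beginning=True)))
```

The returned list can also be passed to subprocess.run if you prefer to launch the consumer from a script.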

Example message:

{"id":1,"first_name":"Barthel","last_name":"Kittel","email":"bkittel0@printfriendly.com","gender":"Male","ip_address":"130.187.82.195","date":"06/05/2018","country":"france"}
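
A quality check on a message of this shape could look like the sketch below. The checks themselves are assumptions, not the rules bourbon actually applies: check_event is a hypothetical name, the email test is deliberately naive, and the day/month/year date format is a guess, since the example date 06/05/2018 does not reveal which comes first.

```python
import ipaddress
import json
from datetime import datetime

EXAMPLE = (
    '{"id":1,"first_name":"Barthel","last_name":"Kittel",'
    '"email":"bkittel0@printfriendly.com","gender":"Male",'
    '"ip_address":"130.187.82.195","date":"06/05/2018","country":"france"}'
)

# Assumption: day/month/year. The source does not state the date format.
DATE_FORMAT = "%d/%m/%Y"


def _valid_ip(value) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except (TypeError, ValueError):
        return False


def _valid_date(value) -> bool:
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except (TypeError, ValueError):
        return False


def check_event(event: dict) -> dict:
    """Return a field -> bool map; True means the value passed its check."""
    email = event.get("email")
    return {
        "email": isinstance(email, str) and "@" in email,  # naive check
        "ip_address": _valid_ip(event.get("ip_address")),
        "date": _valid_date(event.get("date")),
    }


if __name__ == "__main__":
    print(check_event(json.loads(EXAMPLE)))
```

Returning one boolean per field keeps the result easy to aggregate into per-field pass rates over a stream.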

Teardown

To stop gin, vodka and bourbon:

$ docker-compose down

To stop the Kafka cluster (use down instead to also remove contents of the topics):

$ docker-compose -f docker-compose.kafka.yml stop

To remove the Docker network:

$ docker network rm kafka-network

About

Data Stream Quality Control with Apache Kafka
