This is the implementation of my short blog post: To stream or to not stream. That is a question.
Inspired by the article Building A Streaming Fraud Detection System With Kafka And Python.
The codebase for the Kafka/Zookeeper stack is taken from the same article, whose author did a brilliant job of putting the whole Kafka stack together in docker-compose. Many thanks for this tremendous effort, which made my work much easier to implement.
This system is fully containerised. You will need Docker and Docker Compose to run it.
You simply need to create a Docker network called kafka-network to enable communication between the Kafka cluster and the apps:
$ docker network create kafka-network
All set!
- Spin up the local single-node Kafka cluster (will run in the background):
$ docker-compose -f docker-compose.kafka.yml up -d
- Check the cluster is up and running (wait for "started" to show up):
$ docker-compose -f docker-compose.kafka.yml logs -f broker | grep "started"
- Start the stack, including gin, vodka and bourbon. In summary:
gin: Kafka producer. Takes the sample JSON file and simulates an incoming stream.
vodka: cleaning service. Cleans the data and re-publishes it to a new topic. It also calculates several data statistics on the fly.
bourbon: data quality watchdog. Calculates data statistics from the raw data.
NOTE: Only bourbon and vodka print their logs. gin produces messages to Kafka silently.
$ docker-compose up
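As a rough illustration of what gin and bourbon do, the sketch below replays sample records as a stream and counts missing fields. It runs in memory without Kafka. The names simulate_stream and quality_stats, the one-record-per-line input, and the choice of required fields are all assumptions made for this example; the actual services may work differently.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def simulate_stream(json_lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per non-empty line, as a producer might.

    Assumption: one JSON object per line; the real sample file may differ.
    """
    for line in json_lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def quality_stats(events: Iterable[dict], required: tuple) -> dict:
    """Count events and how often each required field is missing or empty."""
    total = 0
    missing = Counter()
    for event in events:
        total += 1
        for field in required:
            # A field counts as missing if it is absent, null, or "".
            if event.get(field) in (None, ""):
                missing[field] += 1
    return {"total": total, "missing": dict(missing)}


# Hypothetical sample records, not taken from the project's data file.
sample = [
    '{"id": 1, "email": "a@example.com", "country": "france"}',
    '{"id": 2, "email": "", "country": "spain"}',
    '{"id": 3, "country": null}',
]
stats = quality_stats(simulate_stream(sample), ("id", "email", "country"))
print(stats)
```

In the real stack the events would be read from and written to Kafka topics rather than a Python list.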
Show a stream of transactions in the topic T (optionally add --from-beginning):
$ docker-compose -f docker-compose.kafka.yml exec broker kafka-console-consumer --bootstrap-server localhost:9092 --topic T
Topics:
streaming.user_activity: raw generated user events
derivedstream.cleaned_user_events: user events cleaned by vodka
derivedstream.quality_user_activity: quality control, published by bourbon
Example message:
{"id":1,"first_name":"Barthel","last_name":"Kittel","email":"bkittel0@printfriendly.com","gender":"Male","ip_address":"130.187.82.195","date":"06/05/2018","country":"france"}
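To make the cleaning step more concrete, here is a minimal Python sketch that normalises the example message above. The rules (lower-casing the email, capitalising the country, converting the date to ISO 8601) and the day/month/year date format are assumptions for illustration only. The function name clean_event is hypothetical, and the logic in vodka may differ.

```python
import json
from datetime import datetime

# Assumption: dates are day/month/year; the README does not state the format.
ASSUMED_DATE_FORMAT = "%d/%m/%Y"


def clean_event(raw: str) -> dict:
    """Parse one raw JSON message and return a normalised copy."""
    event = json.loads(raw)
    cleaned = dict(event)
    # Hypothetical normalisation rules, not taken from the vodka service.
    cleaned["email"] = event["email"].strip().lower()
    cleaned["country"] = event["country"].strip().title()
    # Convert the date to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(event["date"], ASSUMED_DATE_FORMAT)
    cleaned["date"] = parsed.date().isoformat()
    return cleaned


raw_message = (
    '{"id":1,"first_name":"Barthel","last_name":"Kittel",'
    '"email":"bkittel0@printfriendly.com","gender":"Male",'
    '"ip_address":"130.187.82.195","date":"06/05/2018","country":"france"}'
)
print(json.dumps(clean_event(raw_message)))
```

Under these assumptions, the example message comes out with country "France" and date "2018-05-06"; all other fields are passed through unchanged.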
To stop the application stack (gin, vodka and bourbon):
$ docker-compose down
To stop the Kafka cluster (use down instead of stop to also remove the contents of the topics):
$ docker-compose -f docker-compose.kafka.yml stop
To remove the Docker network:
$ docker network rm kafka-network