Project structure:

- `cmd` - contains the main entrypoint for the producer and data generation script
- `data` - contains the generated data
- `model` - contains the data model structs used in Golang
- `processor` - contains the consumer code and integration with Spark

Tech stack:
- Docker
- Golang
- Python
- Apache Kafka
- Apache Spark
- ScyllaDB
Start the dependencies (Zookeeper, Kafka, and ScyllaDB):

```bash
docker-compose up -d
```
Create the Kafka topics:

```bash
chmod +x kafka_create_topics.sh
./kafka_create_topics.sh
```
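A topic-creation script like this typically wraps the stock `kafka-topics.sh` CLI; a minimal sketch is below. The container name, topic name, and partition settings here are assumptions, not the repository's actual values.

```bash
#!/bin/bash
# Hypothetical sketch of kafka_create_topics.sh -- the container name
# ("kafka"), topic name ("citizen"), and partition/replication settings
# are assumptions, not the repository's actual values.
docker exec kafka kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic citizen \
  --partitions 1 \
  --replication-factor 1
```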
Create the database keyspace and table (the schema is in `init.cql`):

```bash
docker exec -it scylla-node1 cqlsh
cqlsh> # copy and paste the schema from init.cql
```
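For a sense of what `init.cql` sets up, a minimal sketch follows. The keyspace and table names match the query used later in this README; the replication settings and columns are placeholders, not the actual schema.

```sql
-- Hypothetical sketch of init.cql: only the keyspace and table names are
-- taken from this README; the columns and replication are placeholders.
CREATE KEYSPACE IF NOT EXISTS integrated_citizen
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS integrated_citizen.citizen (
  id   uuid PRIMARY KEY, -- placeholder column
  name text              -- placeholder column
);
```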
Install the consumer dependencies:

```bash
pip install -r processor/requirements.txt
```
Start up the consumer:

```bash
python processor/main.py
```
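Under the hood, a PySpark consumer like `processor/main.py` reads the Kafka topic as a structured stream and writes each micro-batch into ScyllaDB through the Cassandra connector. Here is a minimal sketch of that flow; the topic name, message schema, hosts, and connector versions are assumptions, none of which are confirmed by this README.

```python
# A hedged sketch of a Kafka -> Spark -> ScyllaDB pipeline; topic name,
# schema, hosts, and connector versions below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType

spark = (
    SparkSession.builder
    .appName("citizen-processor")
    # connector packages must match your Spark/Scala version
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0,"
            "com.datastax.spark:spark-cassandra-connector_2.12:3.3.0")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# hypothetical message schema -- the real one lives in the Go structs under model/
schema = StructType().add("id", StringType()).add("name", StringType())

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "citizen")  # assumed topic name
    .load()
    # Kafka values arrive as bytes; decode and parse the JSON payload
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

def write_to_scylla(batch_df, batch_id):
    # persist each micro-batch into ScyllaDB via the Cassandra connector
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="integrated_citizen", table="citizen")
        .mode("append")
        .save())

stream.writeStream.foreachBatch(write_to_scylla).start().awaitTermination()
```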
Finally, produce the messages:

```bash
go run producer/main.go
```
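The producer itself is Go code, so the sketch below is only an illustration of the same message flow using kafka-python (the library covered by the Kafka articles referenced at the bottom); the topic name and payload fields are assumptions.

```python
# Illustrative only -- the real producer is the Go program above. The topic
# name ("citizen") and payload fields are assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # serialize each dict payload to JSON bytes
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("citizen", {"id": "123", "name": "Jane Doe"})
producer.flush()  # ensure buffered messages are actually delivered
```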
Check the database to verify that the data was inserted:

```bash
docker exec -it scylla-node1 cqlsh
cqlsh> select * from integrated_citizen.citizen;
```
To generate new data, run:

```bash
go run generate_datasource/main.go
```
References:

- https://github.com/aldy505/local-scylladb-cluster/blob/master/docker-compose.yml
- https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
- https://agam-kushwaha.medium.com/kafka-integration-with-apache-spark-d48e0691220f
- https://medium.com/geekculture/integrate-kafka-with-pyspark-f77a49491087
- https://levelup.gitconnected.com/using-kafka-with-docker-4520c2e6cfd
- https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1