Big Data Project

This is not necessary for the exam, but it seems like a fun thing to do :) (2501994771 - LA01)

Architecture

(architecture diagram)

Example

(example output)

Directory Structure

  • cmd - contains the main entry points for the producer and the data generation script
  • data - contains the generated data
  • model - contains the data model structs used in Go
  • processor - contains the consumer code and its integration with Spark

Tools

  1. Docker
  2. Golang
  3. Python
  4. Apache Kafka
  5. Apache Spark
  6. ScyllaDB

Setup

Start the dependencies (ZooKeeper, Kafka, and ScyllaDB)

docker-compose up -d

Create the Kafka topics

chmod +x kafka_create_topics.sh
./kafka_create_topics.sh
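
The topic names themselves live in kafka_create_topics.sh. If you would rather create them programmatically, a minimal sketch with kafka-python's KafkaAdminClient looks like this; the topic name "citizen" and the broker address are assumptions, not values taken from the script:

# Sketch: create a Kafka topic with kafka-python instead of the shell script.
# The topic name and broker address are assumptions; the real values are
# defined in kafka_create_topics.sh.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="citizen", num_partitions=1, replication_factor=1)
])
admin.close()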

Create the database keyspace and table (the schema is in init.cql)

docker exec -it scylla-node1 cqlsh

cqlsh> -- copy and paste the schema from init.cql
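
The same step can be scripted with the cassandra-driver package (ScyllaDB speaks the Cassandra protocol). The keyspace and table names below come from the verification query further down; the column list is only a placeholder, so substitute the real definitions from init.cql:

# Sketch: create the keyspace and table from Python with cassandra-driver.
# The column list is a placeholder; the real schema is in init.cql.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS integrated_citizen
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS integrated_citizen.citizen (
        id uuid PRIMARY KEY,
        name text  -- placeholder column
    )
""")

cluster.shutdown()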

Install the consumer dependencies

pip install -r processor/requirements.txt

Start up the consumer

python processor/main.py
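
The internals of processor/main.py are not shown here, but given the tools list and the Spark/Kafka references below, the usual shape of such a consumer is a Spark Structured Streaming job that reads from Kafka and writes each micro-batch into ScyllaDB. A minimal sketch, in which the topic name, message schema, and connection settings are all assumptions:

# Sketch of a Structured Streaming consumer: Kafka in, ScyllaDB out.
# Needs the spark-sql-kafka and spark-cassandra-connector packages on the
# Spark classpath (e.g. via --packages). Topic name and schema are assumed.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (SparkSession.builder
         .appName("citizen-consumer")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

# Placeholder schema; the real fields come from the Go model structs.
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "citizen")  # assumed topic name
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

def write_batch(df, epoch_id):
    # The spark-cassandra-connector works against ScyllaDB as well.
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="integrated_citizen", table="citizen")
       .mode("append")
       .save())

stream.writeStream.foreachBatch(write_batch).start().awaitTermination()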

Finally, produce the messages

go run producer/main.go
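
The producer itself is the Go program above; purely as an illustration of what it does (read the generated records from data/ and publish them to Kafka), here is the same idea in Python with kafka-python. The file path, topic name, and record shape are all assumptions:

# Illustration only: the actual producer is written in Go.
# File path, topic name, and record shape are assumptions.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with open("data/citizens.json") as f:  # assumed file under data/
    for record in json.load(f):
        producer.send("citizen", record)  # assumed topic name

producer.flush()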

Check the database to verify that the data was inserted

docker exec -it scylla-node1 cqlsh

cqlsh> select * from integrated_citizen.citizen;

To generate new data, run

go run generate_datasource/main.go
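
The generator is also a Go program; purely as an assumption about its behavior, the same idea sketched in Python with the faker package (the record shape and output file are hypothetical):

# Illustration only: the actual generator is written in Go.
# Record shape and output file are hypothetical.
import json
import uuid

from faker import Faker

fake = Faker()
records = [{"id": str(uuid.uuid4()), "name": fake.name()} for _ in range(100)]

with open("data/citizens.json", "w") as f:  # assumed output file
    json.dump(records, f, indent=2)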

References used

https://github.com/aldy505/local-scylladb-cluster/blob/master/docker-compose.yml

https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html

https://agam-kushwaha.medium.com/kafka-integration-with-apache-spark-d48e0691220f

https://medium.com/geekculture/integrate-kafka-with-pyspark-f77a49491087

https://levelup.gitconnected.com/using-kafka-with-docker-4520c2e6cfd

https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1
