This project is a protocol-based intrusion detection system (IDS) that runs in a cloud environment (GCP) and combines Apache Kafka with Apache Spark Streaming to detect common ways of exploiting web applications.
It is designed to detect brute-force attacks, DDoS attacks, SQL injection, and cross-site scripting (XSS).
Each detected intrusion is logged to the protocol-ids-output bucket by timestamp, and an SMS notification is sent through the Vonage SMS API to the phone numbers that have been configured.
The source code for each Spark job is written in Scala and can be modified in the corresponding Scala folder under Spark Jobs.
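As a rough illustration of the four attack classes named above, the following Python sketch flags SQL injection and XSS by pattern matching on the request path, and flags possible brute-force or DDoS sources by counting requests per IP. This is not the detection logic of the project's Scala jobs: the regular expressions, the function names `classify_request` and `flag_high_volume`, and the threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Hypothetical signatures for illustration only; real payload lists are far larger.
SQLI_PATTERN = re.compile(
    r"'\s*(?:or|and)\b|\bunion\s+select\b|\bor\s+1\s*=\s*1\b", re.IGNORECASE
)
XSS_PATTERN = re.compile(
    r"<\s*script\b|javascript:|<[^>]+\bon\w+\s*=", re.IGNORECASE
)


def classify_request(path):
    """Return the attack labels whose pattern matches the URL-decoded path."""
    decoded = unquote_plus(path)
    labels = []
    if SQLI_PATTERN.search(decoded):
        labels.append("sql_injection")
    if XSS_PATTERN.search(decoded):
        labels.append("xss")
    return labels


def flag_high_volume(ips, threshold):
    """Return the IPs that appear more than `threshold` times in one batch.

    A high request count from a single IP is used here as a crude stand-in
    for brute-force or DDoS behaviour within one streaming window.
    """
    counts = Counter(ips)
    return sorted(ip for ip, seen in counts.items() if seen > threshold)
```

In a streaming setting, `flag_high_volume` would be applied to the IPs seen in each batch or window, while `classify_request` would be applied to every request individually.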
L. Wirz, R. Tanthanathewin, A. Ketphet and S. Fugkeaw, "Design and Development of A Cloud-Based IDS using Apache Kafka and Spark Streaming," 2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE), Bangkok, Thailand, 2022, pp. 1-6, doi: 10.1109/JCSSE54890.2022.9836264.
Stack used:
- Kafka
- GCP Dataproc (includes Spark)
- Scala
- Python
Make sure to start both VMs:
- KafkaVM
- Dataproc Master Node
To open the first terminal, SSH into the Kafka VM under Compute Engine.
Terminal 1 (ZooKeeper)
cd /opt/kafka
sudo bin/zookeeper-server-start.sh config/zookeeper.properties
Terminal 2 (Kafka broker)
cd /opt/kafka
sudo bin/kafka-server-start.sh config/server.properties
Use the console producer to paste pseudo HTTP requests into the get topic.
Terminal 3
cd /opt/kafka
sudo bin/kafka-console-producer.sh --topic get --bootstrap-server localhost:9092
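Instead of pasting requests by hand, test lines can be generated. The sketch below is a minimal example that prints lines in the key:value layout of the project's sample request (ip, user-identifier, name, time-stamp, header, status). The function name `make_log_line`, the fixed `GET` method and `HTTP/1.0` protocol, and the demo passwords are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def make_log_line(ip, user_id, name, path, status=200, when=None):
    """Format one pseudo HTTP request in the project's key:value log layout."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return (
        f"ip:{ip}, user-identifier:{user_id}, name:{name}, "
        f'time-stamp:[{stamp}], header:"GET {path} HTTP/1.0", status:{status}'
    )


if __name__ == "__main__":
    # Simulate a few login attempts from one IP, one second apart.
    start = datetime.now(timezone.utc)
    for offset, password in enumerate(["123456", "password", "qwerty"]):
        print(
            make_log_line(
                "127.0.0.1",
                "UD11",
                "frank",
                f"/?id=frank&password={password}",
                when=start + timedelta(seconds=offset),
            )
        )
```

Because the console producer reads from standard input, the output can be piped into it, for example `python3 make_requests.py | sudo bin/kafka-console-producer.sh --topic get --bootstrap-server localhost:9092` (the script name `make_requests.py` is hypothetical).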
For debugging, use the console consumer to read everything published to the get topic.
Terminal 4
cd /opt/kafka
sudo bin/kafka-console-consumer.sh --topic get --from-beginning --bootstrap-server localhost:9092
To create a topic:
cd /opt/kafka
sudo bin/kafka-topics.sh --create --topic topic-name --bootstrap-server localhost:9092
To delete a topic:
cd /opt/kafka
sudo bin/kafka-topics.sh --delete --topic topic-name --bootstrap-server localhost:9092
To submit a Spark job to the Dataproc cluster (replace job-name.jar with the JAR of the job to run):
gcloud dataproc jobs submit spark --jar=gs://protocol-ids-spark-jobs/job-name.jar --cluster=cluster-protocol-ids --properties=^#^spark.jars.packages=org.apache.spark:spark-streaming-kafka-0-10_2.12:3.1.2,com.vonage:client:6.2.0 --region=asia-southeast2 -- gs://protocol-ids-output/output-logs/
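The submit command is long and easy to mistype, in particular the `^#^` prefix, which changes gcloud's list delimiter from `,` to `#` so that the commas inside `spark.jars.packages` are kept as part of a single property value. A minimal sketch that assembles the same command as an argument list is shown below. The helper name `build_submit_command` and its parameters are assumptions for illustration, not part of the project.

```python
def build_submit_command(job_jar, cluster, region, packages, output_uri):
    """Assemble the `gcloud dataproc jobs submit spark` call as an argument list."""
    # "^#^" makes "#" the delimiter between properties, so the commas that
    # separate the Maven coordinates stay inside the one property value.
    properties = "^#^spark.jars.packages=" + ",".join(packages)
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--jar={job_jar}",
        f"--cluster={cluster}",
        f"--properties={properties}",
        f"--region={region}",
        "--",  # everything after this is passed to the Spark job itself
        output_uri,
    ]


if __name__ == "__main__":
    command = build_submit_command(
        job_jar="gs://protocol-ids-spark-jobs/job-name.jar",
        cluster="cluster-protocol-ids",
        region="asia-southeast2",
        packages=[
            "org.apache.spark:spark-streaming-kafka-0-10_2.12:3.1.2",
            "com.vonage:client:6.2.0",
        ],
        output_uri="gs://protocol-ids-output/output-logs/",
    )
    print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` avoids shell quoting problems with the `^` and `#` characters.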
Example pseudo HTTP request to paste into the producer:
ip:127.0.0.1, user-identifier:UD11, name:frank, time-stamp:[10/Oct/2000:13:55:36 -0700], header:"GET /?id=message&password=message2 HTTP/1.0", status:200
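To make the layout of the sample line above concrete, the following Python sketch parses it into named fields. It is an illustration only and does not reproduce the parsing done in the project's Scala jobs. The regular expression, the function name `parse_log_line`, and the output keys are assumptions derived from the single sample line.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

SAMPLE = (
    "ip:127.0.0.1, user-identifier:UD11, name:frank, "
    "time-stamp:[10/Oct/2000:13:55:36 -0700], "
    'header:"GET /?id=message&password=message2 HTTP/1.0", status:200'
)

# Field names follow the sample line; the path may contain spaces or quotes
# (attack payloads often do), so it is matched up to the final " HTTP/x.y".
LOG_PATTERN = re.compile(
    r"ip:(?P<ip>[^,\s]+), "
    r"user-identifier:(?P<user_identifier>[^,\s]+), "
    r"name:(?P<name>[^,\s]+), "
    r"time-stamp:\[(?P<timestamp>[^\]]+)\], "
    r'header:"(?P<method>[A-Z]+) (?P<path>.+) (?P<protocol>HTTP/[\d.]+)", '
    r"status:(?P<status>\d{3})"
)


def parse_log_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    # Split the query string into parameters, e.g. id and password.
    record["query"] = parse_qs(urlsplit(record["path"]).query)
    return record
```

Calling `parse_log_line(SAMPLE)` yields the IP, user, timestamp, request path, and query parameters as separate values, which is the form a detection rule would typically work on.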
Wenyi Xu (2017) Visor: Real-time Log Monitor [Source Code and Log Pattern]. https://github.com/xuwenyihust/Visor
payloadbox (2021) SQL Injection Payload List [Dataset]. https://github.com/payloadbox/sql-injection-payload-list
payloadbox (2021) Cross Site Scripting (XSS) Vulnerability Payload List [Dataset]. https://github.com/payloadbox/xss-payload-list
iryndin (2018) 10K-Most-Popular-Passwords [Dataset]. https://github.com/iryndin/10K-Most-Popular-Passwords
ferasalnaem (2021) sqli-detection-using-ML [ML Testing Source Code]. https://github.com/ferasalnaem/sqli-detection-using-ML
Versions: Scala 2.12.14, Spark 3.1.2