### <center>QuakeMatch </center>
### <center>Luca Longo, Università degli Studi di Catania </center>
<img src="https://user-images.githubusercontent.com/50534107/256270970-f0de1c54-ba7f-46a0-ab32-c4ad00e6b26e.svg" alt="QuakeMatch" />

## Table of contents:
* [Introduction](#Introduction)
* [Quake Match Pipeline](#QuakeMatchPipeline)
* [Data Ingestion](#dataIngestion)
    * [Python Script](#pythonScript)
    * [Logstash](#logstash)
* [Streaming with Kafka](#kafka)
* [Processing with Apache Spark](#spark)
* [Data indexing with Elasticsearch](#elasticsearch)
* [Data visualization with Kibana](#kibana)
* [Requirements](#requirements)
* [Usage](#usage)

# Introduction <a class="anchor" id="Introduction"></a>

<p>
QuakeMatch is a tool for detecting matches between antipodal earthquakes: it uses seismic data to look for a correspondence between two or more events. It relies on Logstash, Kafka with ZooKeeper, Apache Spark, Elasticsearch and Kibana for data ingestion, filtering, processing and visualization of the results.
</p>

## Quake Match Pipeline<a class="anchor" id="QuakeMatchPipeline"></a>
<p>
    <img src="https://user-images.githubusercontent.com/50534107/256268865-21fcd7c9-1762-45da-8a8e-aed7e15b7468.png" alt="Pipeline" />
</p>


# Data Ingestion <a class="anchor" id="dataIngestion"></a>

## Python Script <a class="anchor" id="pythonScript"></a>
The following script generates a file containing all the URLs needed for the API requests made by Logstash:

In [1]:
import datetime

# Seismic Portal FDSN event endpoint; each request covers one day and returns at most 7000 events
base_url = "https://www.seismicportal.eu/fdsnws/event/1/query?limit=7000&start={}&end={}"

# Date range to cover (both ends inclusive)
start_date = datetime.datetime(1998, 7, 19)
end_date = datetime.datetime(2023, 7, 19)

one_day = datetime.timedelta(days=1)
one_second = datetime.timedelta(seconds=1)
time_format = "%Y-%m-%dT%H:%M:%S.0"

# Build one URL per day, from 00:00:00 to 23:59:59
links = []
current_date = start_date
while current_date <= end_date:
    start_time = current_date.strftime(time_format)
    end_time = (current_date + one_day - one_second).strftime(time_format)
    links.append(base_url.format(start_time, end_time))
    current_date += one_day

# Write one URL per line; Logstash reads this file
with open("./logstash/seismic_portal_links.txt", "w") as file:
    for link in links:
        file.write(link + "\n")

You can choose the date range by modifying the start_date and end_date values.
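
To make the generated format concrete, the sketch below wraps the same logic in a function that takes the date range as parameters and prints the URL produced for a single day. The name `build_links` is hypothetical and not part of the project; the base URL and time format are the ones used in the script above.

```python
from datetime import datetime, timedelta

BASE_URL = "https://www.seismicportal.eu/fdsnws/event/1/query?limit=7000&start={}&end={}"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.0"


def build_links(start_date, end_date):
    """Return one query URL per day between start_date and end_date (inclusive)."""
    links = []
    current = start_date
    while current <= end_date:
        # Each request spans a single day, from 00:00:00 to 23:59:59
        day_end = current + timedelta(days=1) - timedelta(seconds=1)
        links.append(
            BASE_URL.format(current.strftime(TIME_FORMAT), day_end.strftime(TIME_FORMAT))
        )
        current += timedelta(days=1)
    return links


# A one-day range yields exactly one URL
print(build_links(datetime(2023, 7, 19), datetime(2023, 7, 19))[0])
```

With the default range of the script (1998-07-19 to 2023-07-19), the file contains one line per day of that interval.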

## Logstash <a class="anchor" id="logstash"></a>

Logstash reads the file generated by the Python script, executes the API requests to Seismic Portal, filters the data and sends the resulting messages to Kafka.

# Streaming with Kafka <a class="anchor" id="kafka"></a>

Messages sent from Logstash to Kafka are published to the "earthquakes" topic. Each message contains five lists holding the timestamps, regions, latitudes, longitudes and magnitudes of all the events it covers.
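
As an illustration of this layout, the sketch below turns one such message into a list of per-event records. It assumes the message is JSON and that the five lists are stored under the keys `timestamps`, `regions`, `latitudes`, `longitudes` and `magnitudes`; these key names, the helper `unpack_events` and the sample values are invented for the example and may differ from what Logstash actually emits.

```python
import json

# Invented sample payload: only the idea of five parallel lists comes from the text,
# the key names and values are assumptions made for this example.
raw_message = json.dumps({
    "timestamps": ["2023-07-18T06:22:10Z", "2023-07-18T09:41:55Z"],
    "regions": ["SICILY, ITALY", "FIJI REGION"],
    "latitudes": [37.6, -17.9],
    "longitudes": [15.1, -178.4],
    "magnitudes": [3.2, 5.8],
})

LIST_KEYS = ("timestamps", "regions", "latitudes", "longitudes", "magnitudes")
EVENT_KEYS = ("timestamp", "region", "latitude", "longitude", "magnitude")


def unpack_events(message):
    """Convert five parallel lists into one dictionary per earthquake."""
    payload = json.loads(message)
    columns = [payload[key] for key in LIST_KEYS]
    # The i-th element of every list describes the same event, so lengths must agree
    if len({len(column) for column in columns}) != 1:
        raise ValueError("the five lists must have the same length")
    return [dict(zip(EVENT_KEYS, row)) for row in zip(*columns)]


events = unpack_events(raw_message)
print(events[1]["region"], events[1]["magnitude"])
```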

# Processing with Apache Spark <a class="anchor" id="spark"></a>

Spark will take care of the following tasks:
<ul>
    <li>Taking messages from the Kafka topic "earthquakes"</li>
    <li>Filtering by magnitude ≥ 5.5</li>
    <li>Antipode calculation</li>
    <li>Finding matches: one or more events whose latitude and longitude lie within ±30° of the calculated antipode and which occurred within the following three days</li>
    <li>Creating a CSV file with the obtained data</li>
    <li>Creating an index on Elasticsearch</li>
    <li>Data indexing</li>
</ul>
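
The filtering, antipode and matching steps above can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions, not the project's actual Spark job: the `Quake` class and the function names are hypothetical, the magnitude threshold is applied to both events of a pair, the three-day window is treated as inclusive, and the longitude comparison wraps around the ±180° meridian.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_MAGNITUDE = 5.5               # magnitude threshold from the list above
TOLERANCE_DEG = 30.0              # allowed deviation from the antipode, in degrees
TIME_WINDOW = timedelta(days=3)   # a match must follow the source event within this window


@dataclass(frozen=True)
class Quake:
    time: datetime
    latitude: float
    longitude: float
    magnitude: float


def antipode(latitude, longitude):
    """Return the diametrically opposite point, in degrees."""
    anti_longitude = longitude - 180.0 if longitude > 0 else longitude + 180.0
    return -latitude, anti_longitude


def longitude_gap(a, b):
    """Smallest angular distance between two longitudes (assumed wrap at ±180°)."""
    gap = abs(a - b) % 360.0
    return min(gap, 360.0 - gap)


def is_match(source, candidate):
    """True if candidate occurs near the antipode of source, within the time window."""
    anti_lat, anti_lon = antipode(source.latitude, source.longitude)
    delay = candidate.time - source.time
    return (
        timedelta(0) < delay <= TIME_WINDOW
        and abs(candidate.latitude - anti_lat) <= TOLERANCE_DEG
        and longitude_gap(candidate.longitude, anti_lon) <= TOLERANCE_DEG
    )


def find_matches(quakes):
    """Return (source, candidate) pairs among events at or above the magnitude threshold."""
    strong = [q for q in quakes if q.magnitude >= MIN_MAGNITUDE]
    return [(s, c) for s in strong for c in strong if is_match(s, c)]
```

For example, an event at latitude 40°, longitude 15° has its antipode at latitude -40°, longitude -165°; a later event at latitude -35°, longitude 170° is 5° away in latitude and 25° away in longitude (across the ±180° meridian), so it falls inside the ±30° range.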

# Data indexing with Elasticsearch <a class="anchor" id="elasticsearch"></a>

The data will be indexed in Elasticsearch under the index "earthquakes" and mapping "earthquake_mapping". Each item will have the following fields:
<ul>
    <li>unique id</li>
    <li>timestamp</li>
    <li>region</li>
    <li>latitude</li>
    <li>longitude</li>
    <li>latitude_antipode</li>
    <li>longitude_antipode</li>
    <li>matches</li>
</ul>
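
The shape of one indexed item can be sketched as a Python dictionary. Only the field list and the index name come from the text; the key `id`, the use of a UUID, the ISO 8601 timestamp and the representation of `matches` as a list are assumptions made for this example, and `build_document` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timezone

INDEX_NAME = "earthquakes"  # index name given in the text


def build_document(timestamp, region, latitude, longitude, matches):
    """Assemble one item with the fields listed above (layout partly assumed)."""
    # Antipode: opposite latitude, longitude shifted by 180 degrees
    anti_longitude = longitude - 180.0 if longitude > 0 else longitude + 180.0
    return {
        "id": str(uuid.uuid4()),             # assumed key name and id scheme
        "timestamp": timestamp.isoformat(),  # assumed serialization
        "region": region,
        "latitude": latitude,
        "longitude": longitude,
        "latitude_antipode": -latitude,
        "longitude_antipode": anti_longitude,
        "matches": matches,                  # assumed to be a list of matched events
    }


# Invented sample values, for illustration only
document = build_document(
    datetime(2023, 7, 18, 6, 22, 10, tzinfo=timezone.utc),
    "SICILY, ITALY",
    40.0,
    15.0,
    [],
)
print(sorted(document))
```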

# Data visualization with Kibana <a class="anchor" id="kibana"></a>

<img src="https://user-images.githubusercontent.com/50534107/256273468-eec68964-ff00-4630-9f45-3998e94b6037.png" alt="kibana" />

The Kibana dashboard shows five Lens visualizations: the first reports the number of matches that QuakeMatch detected, the second is a descriptive table of the matches, and the remaining three show the earthquakes that occurred in Italy, the distribution of earthquakes by magnitude, and the distribution of earthquakes across the European area.

# Requirements <a class="anchor" id="requirements"></a>

<ul>
    <li>Docker</li>
    <li>Python</li>
    <li>wget</li>
    <li>A machine with at least 16 GB of RAM</li>
</ul>

# Usage <a class="anchor" id="usage"></a>

<li>Install Docker on your system, then run the following command to create a Docker network:</li>

In [2]:
%%bash
. ~/.bashrc
docker network create kafka-network

d3da4d4f72bd50aa53c5827e7eff7bffa127e1ba376254cc650da027198a93a4


<li>Create a container with Kafka Zookeeper:</li>

In [3]:
%%bash
. ~/.bashrc
docker run -d \
  --name zookeeper \
  --network kafka-network \
  -p 2181:2181 \
  -e ZOOKEEPER_CLIENT_PORT=2181 \
  confluentinc/cp-zookeeper:latest

7d3c4f34e5e45df23744b665c41a4d849ab3c4ec8871044e14647f0d671b8c37
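
Kafka should only be started once ZooKeeper is accepting connections. One way to check this is to poll the client port (2181, as published above) until a TCP connection succeeds. The helper below is a minimal sketch: `wait_for_port` and its default timeout values are illustrative, and an open port only suggests that ZooKeeper is up, it does not guarantee that it is fully initialized.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: wait a little and retry
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 2181)` returns `True` as soon as the ZooKeeper container accepts connections on the published port, or `False` if that does not happen within one minute.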


<li>Wait for ZooKeeper to be fully started, then create a container with Kafka:</li>

In [4]:
%%bash
. ~/.bashrc
docker run -d \
  --name kafka \
  --network kafka-network \
  -p 9092:9092 \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka:latest

98bcbbe2498c184d3c7a413b4fcac6ee27b376e4ed1277b463a3843ff24688c1


<li>And a container with Kafka UI:</li>

In [5]:
%%bash
. ~/.bashrc
docker run -d \
  --name kafka-ui \
  --network kafka-network \
  -p 8080:8080 \
  -e KAFKA_CLUSTERS_0_NAME=local \
  -e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092 \
  -e KAFKA_CLUSTERS_0_ZOOKEEPER=zookeeper:2181 \
  -e KAFKA_CLUSTERS_0_ENABLESR=false \
  -e KAFKA_CLUSTERS_0_SASLMECHANISM= \
  -e KAFKA_CLUSTERS_0_SASLPLAIN_USERNAME= \
  -e KAFKA_CLUSTERS_0_SASLPLAIN_PASSWORD= \
  -e KAFKA_CLUSTERS_0_SASLPLAIN_PASSWORD_FILE= \
  -e KAFKA_CLUSTERS_0_TRUSTEDCERTS= \
  -e KAFKA_CLUSTERS_0_CLIENTCERT= \
  -e KAFKA_CLUSTERS_0_CLIENTKEY= \
  -e KAFKA_CLUSTERS_0_CLIENTKEYPASSWORD= \
  -e KAFKA_CLUSTERS_0_CONSUMERCONFIGS= \
  -e KAFKA_CLUSTERS_0_ADMINCONFIGS= \
  provectuslabs/kafka-ui:latest

2efa12b6727e5a25f0b34b71b56f22c5c859d142a1d8af8ee1a14e56f197b8ca


<li>Build the Elasticsearch Docker image:</li>

In [6]:
%%bash
. ~/.bashrc
docker build -t elastic-image ./elasticsearch

#1 [internal] load build definition from Dockerfile
#1 sha256:9951727a17f9e69414b97f80d66b4b9ec4d41e089e5c8f0c7f43d5b91c0e43a2
#1 transferring dockerfile: 218B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:1291cf5f600c667f3ade4c49609fdf8ee5e289e579fbae28077be7abb72a4d5e
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.elastic.co/elasticsearch/elasticsearch:7.17.0
#3 sha256:1dbbb84f972fa6e5ae3847ac9fa346a7bfdb27185aacf0e30b34e90ee821bb02
#3 DONE 1.6s

#5 [1/2] FROM docker.elastic.co/elasticsearch/elasticsearch:7.17.0@sha256:577b382dda5d05385aea8c7b60dad97e02ff41ca0da54f723151c2aed9ac8f54
#5 sha256:248baad951fdafcfd3b149db719c51cb71d55a7d2b66d874122e699a682cfd13
#5 DONE 0.0s

#4 [2/2] RUN sysctl -w vm.max_map_count=262144
#4 sha256:d2e7b07617b1fb5c3a397425da5323d5228d4474b3458426eed823b9c1ad013b
#4 CACHED

#6 exporting to image
#6 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#6 exporting layers done
#6 writin

<li>Then run Elasticsearch:</li>

In [7]:
%%bash
. ~/.bashrc
docker run -d --name elasticsearch --network kafka-network -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elastic-image

1d4dde6207608fcad1a02855d084de5b5d73fa028639929e92594d0562486637


<li>Create a Kibana container:</li>

In [8]:
%%bash
. ~/.bashrc
docker run -d --name kibana -p 5601:5601 --network kafka-network -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" docker.elastic.co/kibana/kibana:7.15.1

88e1763732cc40439a46fa1c20cd96f3b8e8708112222b075e0466c702596ea0


<li>Run the Python script:</li>

In [9]:
%%bash
. ~/.bashrc
python3 ./logstash/urls_dates.py

<li>Build the Logstash Docker image:</li>

In [10]:
%%bash
. ~/.bashrc
docker build -t logstash-image ./logstash

#1 [internal] load build definition from Dockerfile
#1 sha256:2bc9b4850f0190c69af6131e677837799905b5bdd2c9c71e4f502a5b40fea358
#1 transferring dockerfile: 520B 0.0s done
#1 DONE 0.1s

#2 [internal] load .dockerignore
#2 sha256:200dc0774b78b25e34bd7b537a7d4121eb068a37cd04c603ce49cfd70bb935a0
#2 transferring context:
#2 transferring context: 2B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.elastic.co/logstash/logstash:8.8.1@sha256:9b2e080605e208ef1165fd6cfd68a8b05c2031c8818b8520f82f73238dbb471c
#3 sha256:f9b51405f81dc3157b837175d3f8cd610533f41e65fe5a83aa380f95b96c6a5c
#3 DONE 1.4s

#4 [1/6] FROM docker.elastic.co/logstash/logstash:8.8.1@sha256:9b2e080605e208ef1165fd6cfd68a8b05c2031c8818b8520f82f73238dbb471c
#4 sha256:1001cfd24a82b78a0aded882f8f22066a01463643e26780d6e6428154760017a
#4 DONE 0.0s

#5 [internal] load build context
#5 sha256:82a405591fcdbb6a6f5abc96202926f06750d4cd62cec3224368fe7e26824e64
#5 transferring context: 1.05MB done
#5 DONE 0.0s

#6 [2/6] COPY logstash.co

<li>Then run Logstash:</li>

In [11]:
%%bash
. ~/.bashrc
docker run -d --name logstash-container --network kafka-network  logstash-image

18538312611fae3ca7509e117afc3db296f2da92e326a4f5ff40a749bf3447ad


<li>Run the following command to download elasticsearch-spark:</li>

In [12]:
%%bash
. ~/.bashrc
wget https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-20_2.12/7.15.1/elasticsearch-spark-20_2.12-7.15.1.jar -P ./spark

--2023-07-28 00:38:47--  https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-20_2.12/7.15.1/elasticsearch-spark-20_2.12-7.15.1.jar
Resolving repo1.maven.org (repo1.maven.org)... 199.232.192.209, 199.232.196.209, 2a04:4e42:4c::209, ...
Connecting to repo1.maven.org (repo1.maven.org)|199.232.192.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2101583 (2.0M) [application/java-archive]
Saving to: ‘./spark/elasticsearch-spark-20_2.12-7.15.1.jar’

     0K .......... .......... .......... .......... ..........  2% 2.96M 1s
    50K .......... .......... .......... .......... ..........  4% 5.90M 0s
   100K .......... .......... .......... .......... ..........  7% 4.97M 0s
   150K .......... .......... .......... .......... ..........  9% 17.2M 0s
   200K .......... .......... .......... .......... .......... 12% 73.1M 0s
   250K .......... .......... .......... .......... .......... 14% 5.20M 0s
   300K .......... .......... .......... ..........

<li>And spark-sql-kafka:</li>

In [13]:
%%bash
. ~/.bashrc
wget https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.4.1/spark-sql-kafka-0-10_2.12-3.4.1.jar -P ./spark

--2023-07-28 00:38:47--  https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.4.1/spark-sql-kafka-0-10_2.12-3.4.1.jar
Resolving repo1.maven.org (repo1.maven.org)... 199.232.192.209, 199.232.196.209, 2a04:4e42:4c::209, ...
Connecting to repo1.maven.org (repo1.maven.org)|199.232.192.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 427253 (417K) [application/java-archive]
Saving to: ‘./spark/spark-sql-kafka-0-10_2.12-3.4.1.jar’

     0K .......... .......... .......... .......... .......... 11% 2.92M 0s
    50K .......... .......... .......... .......... .......... 23% 3.93M 0s
   100K .......... .......... .......... .......... .......... 35% 4.50M 0s
   150K .......... .......... .......... .......... .......... 47% 30.7M 0s
   200K .......... .......... .......... .......... .......... 59% 8.93M 0s
   250K .......... .......... .......... .......... .......... 71%  307M 0s
   300K .......... .......... .......... .......... ..........

<li>Build the Apache Spark Docker image:</li>

In [14]:
%%bash
. ~/.bashrc
docker build -t spark-earthquakes ./spark

#1 [internal] load build definition from Dockerfile
#1 sha256:841f7323cfafd9bcfe25788479f6fada23565f9dd581ada3649a608ea09da417
#1 transferring dockerfile: 297B done
#1 DONE 0.1s

#2 [internal] load .dockerignore
#2 sha256:1b9edfe5fc3d400dd16b156260f9d9be7846c010f1f4632c849ca56525d6e4e7
#2 transferring context: 2B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.io/bitnami/spark:latest
#3 sha256:0dbae0baa930a0ea10c2e1420f888b307c97531e436e834af50ecdbfd42c80cf
#3 DONE 1.3s

#4 [1/6] FROM docker.io/bitnami/spark:latest@sha256:9467c6ec2cfd0cde0cb23ea81f44f85430cb0a8154d8a06982ec8895b1734b00
#4 sha256:77a30c37a56681f4cc8c11e90dd833687639c5598682c3f6884ea1d4a066bc22
#4 DONE 0.0s

#6 [internal] load build context
#6 sha256:9c850b7898abbb14c3b4abe5079f4de386f93eff755d0618db99490029b4090d
#6 transferring context: 5.28kB done
#6 DONE 0.0s

#8 [4/6] COPY requirements.txt .
#8 sha256:fd34911b55807c0364036a39cb923f67d24c7859b8538520b717bfd081aa57b8
#8 CACHED

#7 [3/6] COPY earthquake_analy

<li>Wait until all messages are available in Kafka's "earthquakes" topic, then run the Apache Spark container:</li>

In [15]:
%%bash
. ~/.bashrc
docker run -d --network kafka-network --name spark-earthquakes_analyzer spark-earthquakes

c70546e50bb806f6ffb49f04295b0ee10788a82c38927551b897c5a61ae98441


<li>In the browser, open "http://localhost:5601", go to "Kibana / Saved Objects", click "Import" and select "export.ndjson" from the kibana folder.</li>

<li>Open the left menu in Kibana, select Dashboard, then select the imported dashboard to see all the lenses.</li>