Code repository for Adaptive Distributed Streaming Similarity Joins published in DEBS 2023.
- A Kubernetes cluster (or an emulator, e.g. Minikube)
- A Flink distribution (can be the DELTA one) and a `FLINK_HOME` env var pointing to it: put `export FLINK_HOME="/path/to/flink/"` in your `.bashrc`
- Docker
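As a quick sanity check (a minimal sketch; it only assumes a standard Flink distribution layout), you can verify that the variable is picked up:

```bash
# Confirm FLINK_HOME is set and points at a Flink distribution
echo "$FLINK_HOME"          # should print /path/to/flink/
ls "$FLINK_HOME"/bin/flink  # the Flink CLI should exist here
```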
- Run `./kubernetes/start-everything.sh`. This usually takes a while, and Flink might come up before MinIO is ready, which will cause it to break. If the Flink deployment shows all red in the dashboard, just run the Flink deployment script (`./kubernetes/deploy-flink.sh`) separately later; once MinIO is up, you can restart Flink without issue (see the sketch after this list).
- Open a new terminal and run `minikube tunnel`.
- Open a new terminal and run `./kubernetes/port-forward-minio.sh`. You can now visit MinIO's dashboard at `localhost:9000` (credentials: `minio` / `minio123`).
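A minimal sketch of the recovery flow mentioned in the first step (pod names will differ per cluster):

```bash
# Wait until the MinIO pod reports Running, then redeploy Flink on its own
kubectl get pods                 # check that MinIO is up
./kubernetes/deploy-flink.sh     # restart the Flink deployment
```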
After this, you may either run `kubectl get svc` or go to the Services category on the Kubernetes dashboard to get the relevant external cluster IPs for the Flink dashboard (port 8081) and the Flask API. MinIO's dashboard can be found at `localhost:9000` after the port-forward.
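For example (a sketch; the service name below is a placeholder, take the real ones from the `kubectl get svc` output):

```bash
kubectl get svc   # note the EXTERNAL-IP column
# Extract a single external IP directly (service name is hypothetical)
kubectl get svc flink-jobmanager -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```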
To reset a deployment, it is usually enough to delete it and re-run its deployment script (this is the case for Flink and Flask, at least). If you need to reset the entirety of Minikube, just start from step 1 again.
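A sketch of that reset flow, with a hypothetical deployment name (substitute the one from your cluster and the matching script from the `kubernetes` folder):

```bash
# Delete the deployment, then re-run its deployment script
kubectl delete deployment flink-jobmanager   # hypothetical deployment name
./kubernetes/deploy-flink.sh
```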
- `<address>/setup`: sends the job and migration (currently placeholder) jars to Flink
- `<address>/start`: starts the distributed join job; it will throw an error, but the job runs anyway if you check the dashboard
- `<address>/stop`: stops the join job and creates a savepoint
- `<address>/migrate`: starts the migration job and waits for it to finish; once it is finished, it resumes the join job
- `<address>/jobs`: gets the jobs that are currently running, mostly for debugging
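Putting the routes together, a typical session could look like this (a sketch; `<address>` stands for the Flask API address obtained above):

```bash
curl http://<address>/setup     # upload the job and migration jars to Flink
curl http://<address>/start     # start the join job (an error is reported, but check the dashboard)
curl http://<address>/jobs      # list the currently running jobs
curl http://<address>/stop      # stop the join job and create a savepoint
curl http://<address>/migrate   # run the migration job, then resume the join job
```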
Currently the only relevant image is the one in the `coordinator` folder. For now, you can just rebuild it whenever you change the API (`./coordinator/src/app.py`), and the deployment should use the local version first. Beware: this works for Minikube only.
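A sketch of that rebuild, targeting Minikube's internal Docker daemon (the image tag is an assumption; use whatever tag the coordinator deployment references):

```bash
# Point the shell's Docker client at Minikube's internal daemon
eval $(minikube docker-env)
# Rebuild the coordinator image so the local version is picked up
docker build -t coordinator:latest ./coordinator
```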
IMPORTANT: Run ALL commands with `sudo`!
The `kubernetes` folder has a `redeploy-*.sh` script for every deployment, which takes the deployment down and restarts it. This should, in principle, completely reset that particular deployment.
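For example (the exact script name is an assumption, following the `redeploy-*.sh` pattern):

```bash
sudo ./kubernetes/redeploy-flink.sh   # take the Flink deployment down and bring it back up
```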
If nothing is running yet, use `start-everything.sh` to bring all deployments up.
After starting everything, you can check on the deployments with the provided `check-deployments.sh` script.
You can also inspect further with the following commands:
- `sudo k3s kubectl get deployments --all-namespaces`
- `sudo k3s kubectl get pods --all-namespaces`
- `sudo k3s kubectl get svc --all-namespaces`
et cetera.
To see which ports and addresses can be used to connect externally, use `sudo k3s kubectl get svc` and look at the `EXTERNAL-IP` column. Some services may show up to four addresses; in such cases you can use any of them.
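If you prefer to script the lookup, a sketch (the service name is hypothetical; adjust it to your cluster):

```bash
# Print the first external IP of a service
sudo k3s kubectl get svc coordinator -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```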
In order to change the images used for the coordinator and the stat monitor, check the deployment files:
- coordinator: `./kubernetes/deployments/coordinator.yaml`
- stat monitor: `./kubernetes/deployments/monitor.yaml`

In both of them, look for the `image:` property and replace its value with a reference to your own version of the image.
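A sketch of that edit done from the shell (the image reference is a placeholder; the pattern keeps the file's indentation intact):

```bash
# Swap the image reference in the coordinator deployment file
sed -i 's|image: .*|image: <your-dockerhub-user>/coordinator:latest|' \
  ./kubernetes/deployments/coordinator.yaml
```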
You should be able to build the images from the stat monitor and coordinator repositories and push them under your own username on Docker Hub, in case you need to make adjustments (e.g. to the algorithm used by the stat monitor).
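A sketch of the build-and-push flow (the build contexts and tags are assumptions; adjust them to the actual repositories):

```bash
# Build and push your own version of the coordinator image
docker build -t <your-dockerhub-user>/coordinator:latest ./coordinator
docker push <your-dockerhub-user>/coordinator:latest
# Repeat with the stat monitor's build context for the monitor image
```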
The API is the same as above, except now you need to make curl requests to the external IP of the coordinator service, e.g.:

`curl http://<COORDINATOR_IP>:5000/setup`

Remember to call `setup` first if the Flink cluster is fresh, so that it has the jars needed for the jobs. Then you may call the `start` route to start the join job.
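A sketch tying these steps together (the coordinator service name is an assumption; check `sudo k3s kubectl get svc` for the real one):

```bash
# Look up the coordinator's external IP, then set up and start the job
COORDINATOR_IP=$(sudo k3s kubectl get svc coordinator \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl "http://$COORDINATOR_IP:5000/setup"   # upload the jars first on a fresh cluster
curl "http://$COORDINATOR_IP:5000/start"   # then start the join job
```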
With access to a Kubernetes cluster, you can set up all the needed components by running the `start-everything.sh` script within `/online_partitioning_for_ssj/kubernetes`.
To run experiments, the `run_in_parts.sh` script within `/online_partitioning_for_ssj/kubernetes` can be used. You will need to provide the configuration details through an `experiments.txt` file.
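A minimal sketch of a run (this assumes `run_in_parts.sh` reads `experiments.txt` from the same directory; check the script itself for the exact format it expects):

```bash
cd /online_partitioning_for_ssj/kubernetes
# experiments.txt must contain your experiment configuration
./run_in_parts.sh
```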