# Spark

This project was inspired by https://github.com/testdrivenio/spark-kubernetes; however, I have built on that structure and made my own improvements.

## Docker

The container is available under `tools/docker/spark/Dockerfile`.

It contains:

- Python 3.9.1
- Spark 3.0.1

The command `minikube docker-env` returns a set of shell environment variable exports that point your local Docker client at the Docker daemon inside the Minikube instance, so the image you build is available to the cluster (Minikube must already be running for this to work):

```sh
eval $(minikube docker-env)
make build_spark_docker
```
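
To confirm the image landed in Minikube's Docker daemon, list the images from the same shell; the exact image name is set by the Makefile, so the `grep` pattern here is an assumption:

```sh
# Images built after eval $(minikube docker-env) live in Minikube's daemon,
# not your host's. Adjust the pattern to whatever the Makefile tags the image.
docker images | grep spark
```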

Start Minikube:

```sh
minikube start --memory 8192 --cpus 4 --vm=true
```

Enable the Minikube dashboard:

```sh
minikube dashboard
```

Spin up the cluster:

```sh
make deploy_spark_k8s_cluster
```

Browse to the Minikube dashboard and the deployment should be visible as it rolls out to the cluster.

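If you prefer the command line to the dashboard, the rollout can also be followed with kubectl; the deployment names below are inferred from the pod names in the example further down and may differ in your manifests:

```sh
# Block until each deployment has finished rolling out.
kubectl rollout status deployment/sparkmaster
kubectl rollout status deployment/spark-worker
```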

For some reason this step does not work when run from the script, so run it standalone for now:

```sh
kubectl apply -f ./deploy/k8s/spark/minikube-ingress.yaml
```
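
To see what the manifest created (the host rule should match the hostname used in the next step):

```sh
# Shows the ingress created above, including its host and backend.
kubectl get ingress
```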

Then map the Minikube IP to the ingress hostname in `/etc/hosts`:

```sh
echo "$(minikube ip) sparkkubernetes" | sudo tee -a /etc/hosts
```

The Spark web UI and the detailed Spark web UI can now be reached at:

http://sparkkubernetes


**NOTE!** The jobs view is only populated while a Spark job is running, so access it while running the example code below, or start an interactive PySpark shell if you need to verify it.
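
For example, an interactive shell can be started inside the master pod like this (substitute your own `sparkmaster` pod name; this assumes `pyspark` is on the container's PATH):

```sh
# Opens an interactive PySpark session; the jobs view stays populated
# for as long as the session's Spark context is alive.
kubectl exec -it sparkmaster-cccbbdfcd-qktwq -- pyspark
```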

## Example

Check that the cluster is up:

```sh
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
spark-worker-cb4fc9c8d-fhsxh   1/1     Running   0          53m
sparkmaster-cccbbdfcd-qktwq    1/1     Running   0          54m
```

Then submit the example job against the master pod (use the `sparkmaster` pod name from your own listing):

```sh
kubectl exec sparkmaster-cccbbdfcd-qktwq -it -- spark-submit example_spark.py --config deiteo.yaml --local False
```
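
Since the pod's hash suffix changes on every rollout, you may prefer to look the master pod up dynamically; a small sketch, assuming the master pod name starts with `sparkmaster`:

```sh
# Look up the current master pod instead of hard-coding the hash suffix.
MASTER_POD=$(kubectl get pods -o name | grep sparkmaster | head -n 1)
kubectl exec "${MASTER_POD#pod/}" -it -- spark-submit example_spark.py --config deiteo.yaml --local False
```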

While the job is running, it shows up at:

http://sparkkubernetes/jobs


To tear down, run:

```sh
make delete_spark_k8s_cluster
```
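
If you also want to undo the `/etc/hosts` change from earlier, remove the entry again (GNU sed syntax; the pattern matches the hostname added above):

```sh
# Delete the sparkkubernetes line appended to /etc/hosts earlier.
sudo sed -i '/sparkkubernetes/d' /etc/hosts
```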