This project was inspired by https://github.com/testdrivenio/spark-kubernetes, but I have built on that structure and made my own improvements.
The container is available under tools/docker/spark/Dockerfile.

It contains:

- Python 3.9.1
- Spark 3.0.1
The command `minikube docker-env` returns a set of bash environment variable exports that configure your local environment to re-use the Docker daemon inside the Minikube instance:

```shell
eval $(minikube docker-env)
make build_spark_docker
```
Start minikube:

```shell
minikube start --memory 8192 --cpus 4 --vm=true
```
Enable the minikube dashboard:

```shell
minikube dashboard
```
Spin up the cluster:

```shell
make deploy_spark_k8s_cluster
```
Browse to the minikube dashboard; the deployment should become visible as it is rolled out to the cluster.
For some reason the following does not work when run from the script, so run it standalone for now:

```shell
kubectl apply -f ./deploy/k8s/spark/minikube-ingress.yaml
```
Then run:

```shell
echo "$(minikube ip) sparkkubernetes" | sudo tee -a /etc/hosts
```
The Spark web UI and the detailed Spark web UI can now be reached via the `sparkkubernetes` host configured above.

NOTE: jobs are only visible while a Spark job is running. Access the UI while running the example code below, or run a `pyspark` shell interactively if you need to verify this.
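If you need something running so that the UI has activity to show, a minimal throwaway job such as the sketch below can be pasted into a `pyspark` shell or submitted to the cluster. This is my own illustration, not code from the repository; the app name `ui-demo` and the computation are assumptions, and it only requires that `pyspark` is importable.

```python
def run_demo_job():
    """Run a tiny Spark job so that it shows up in the web UI.

    Returns the computed sum, or None when pyspark is not installed.
    """
    try:
        # Assumption: pyspark is available in the environment.
        from pyspark.sql import SparkSession
    except ImportError:
        return None

    spark = (
        SparkSession.builder
        .appName("ui-demo")  # hypothetical app name, shown in the UI
        .getOrCreate()
    )
    # A trivial distributed computation: the sum of 0..999.
    total = spark.sparkContext.parallelize(range(1000)).sum()
    spark.stop()
    return total


if __name__ == "__main__":
    print(run_demo_job())
```

While the job is running, its stages and tasks appear in the Spark web UI.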
Example:

```shell
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
spark-worker-cb4fc9c8d-fhsxh   1/1     Running   0          53m
sparkmaster-cccbbdfcd-qktwq    1/1     Running   0          54m
```
Submit the example job against the master pod (substitute the master pod name reported by `kubectl get pods`):

```shell
kubectl exec sparkmaster-cccbbdfcd-qktwq -it -- spark-submit example_spark.py --config deiteo.yaml --local False
```
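The contents of `example_spark.py` are not shown here, but the `spark-submit` call passes `--config` and `--local` flags to it. As a hedged sketch, the argument handling might look like the following; the function names and the string-to-boolean convention are my own assumptions, not taken from the repository.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret 'True'/'False' style CLI strings as booleans."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def parse_args(argv=None):
    """Parse the flags used in the spark-submit example above."""
    parser = argparse.ArgumentParser(description="Example Spark job")
    parser.add_argument(
        "--config", required=True, help="path to a YAML config file"
    )
    parser.add_argument(
        "--local",
        type=str_to_bool,
        default=True,
        help="run against a local master instead of the cluster",
    )
    return parser.parse_args(argv)
```

With this sketch, `--local False` parses to the boolean `False`, which the script could use to decide whether to target the in-cluster Spark master.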
To tear down, run:

```shell
make delete_spark_k8s_cluster
```