Commit

Add support for Spark 3.3.0-hadoop3.3
GezimSejdiu committed Jul 1, 2022
1 parent 7050229 commit bc2a665
Showing 21 changed files with 33 additions and 32 deletions.
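Most of the 21 files change only in the image tag string, from `3.2.1-hadoop3.2` to `3.3.0-hadoop3.3`. A minimal sketch of how such a bump could be scripted is given here; the `bump_tag` helper is hypothetical and not part of this commit, and it assumes a `grep` that supports `-r` and `--exclude-dir` (GNU and BSD both do). It does not cover the other edits in this commit (`SPARK_VERSION`, `HADOOP_VERSION`, `spark.version`, the `spark-sql` dependency, `SBT_VERSION`, and the new README list entry), which still need to be made by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): rewrite every occurrence of
# the old image tag to the new one in all files under a directory.
OLD_TAG="3.2.1-hadoop3.2"
NEW_TAG="3.3.0-hadoop3.3"

bump_tag() {
    dir=$1
    # Escape the dots so that sed matches them literally.
    pattern=$(printf '%s' "$OLD_TAG" | sed 's/\./\\./g')
    # -F: fixed-string search; -l: print only the names of matching files.
    grep -rlF --exclude-dir=.git "$OLD_TAG" "$dir" | while IFS= read -r file; do
        # Go through a temp file instead of `sed -i` (GNU and BSD disagree on
        # its syntax), and copy the result back with `cat` so that the
        # original file keeps its permissions.
        sed "s/$pattern/$NEW_TAG/g" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    done
}
```

Running `bump_tag .` from the repository root would apply the replacement to the working tree; reviewing the result with `git diff` before committing is advisable.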
17 changes: 9 additions & 8 deletions README.md
@@ -10,6 +10,7 @@ Docker images to:
<details open>
<summary>Currently supported versions:</summary>

* Spark 3.3.0 for Hadoop 3.3 with OpenJDK 8 and Scala 2.12
* Spark 3.2.1 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 3.2.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 3.1.2 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
@@ -52,15 +53,15 @@ Add the following services to your `docker-compose.yml` to integrate a Spark mas
version: '3'
services:
spark-master:
image: bde2020/spark-master:3.2.1-hadoop3.2
image: bde2020/spark-master:3.3.0-hadoop3.3
container_name: spark-master
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
spark-worker-1:
image: bde2020/spark-worker:3.2.1-hadoop3.2
image: bde2020/spark-worker:3.3.0-hadoop3.3
container_name: spark-worker-1
depends_on:
- spark-master
@@ -69,7 +70,7 @@ services:
environment:
- "SPARK_MASTER=spark://spark-master:7077"
spark-worker-2:
image: bde2020/spark-worker:3.2.1-hadoop3.2
image: bde2020/spark-worker:3.3.0-hadoop3.3
container_name: spark-worker-2
depends_on:
- spark-master
@@ -78,7 +79,7 @@ services:
environment:
- "SPARK_MASTER=spark://spark-master:7077"
spark-history-server:
image: bde2020/spark-history-server:3.2.1-hadoop3.2
image: bde2020/spark-history-server:3.3.0-hadoop3.3
container_name: spark-history-server
depends_on:
- spark-master
@@ -93,12 +94,12 @@ Make sure to fill in the `INIT_DAEMON_STEP` as configured in your pipeline.
### Spark Master
To start a Spark master:

docker run --name spark-master -h spark-master -d bde2020/spark-master:3.2.1-hadoop3.2
docker run --name spark-master -h spark-master -d bde2020/spark-master:3.3.0-hadoop3.3

### Spark Worker
To start a Spark worker:

docker run --name spark-worker-1 --link spark-master:spark-master -d bde2020/spark-worker:3.2.1-hadoop3.2
docker run --name spark-worker-1 --link spark-master:spark-master -d bde2020/spark-worker:3.3.0-hadoop3.3

## Launch a Spark application
Building and running your Spark application on top of the Spark cluster is as simple as extending a template Docker image. Check the template's README for further documentation.
@@ -118,11 +119,11 @@ It will also setup a headless service so spark clients can be reachable from the

Then to use `spark-shell` issue

`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.1-hadoop3.2 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client`
`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.3.0-hadoop3.3 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client`

To use `spark-submit` issue for example

`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.1-hadoop3.2 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP`
`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.3.0-hadoop3.3 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP`

You can use your own image packed with Spark and your application but when deployed it must be reachable from the workers.
One way to achieve this is by creating a headless service for your pod and then use `--conf spark.driver.host=YOUR_HEADLESS_SERVICE` whenever you submit your application.
4 changes: 2 additions & 2 deletions base/Dockerfile
@@ -7,8 +7,8 @@ ENV INIT_DAEMON_BASE_URI http://identifier/init-daemon
ENV INIT_DAEMON_STEP spark_master_init

ENV BASE_URL=https://archive.apache.org/dist/spark/
ENV SPARK_VERSION=3.2.1
ENV HADOOP_VERSION=3.2
ENV SPARK_VERSION=3.3.0
ENV HADOOP_VERSION=3

COPY wait-for-step.sh /
COPY execute-step.sh /
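The image tag says `hadoop3.3`, yet `HADOOP_VERSION` goes from `3.2` to `3`. A plausible reason is the naming of the Spark binary archives: the 3.2.1 archive is named `bin-hadoop3.2`, while the 3.3.0 archive is named `bin-hadoop3`. The Dockerfile lines that consume these variables fall outside this hunk, so the sketch below is an assumption about how they combine, based on the layout of the Apache archive; `spark_archive_url` is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: the download URL is composed from BASE_URL, the Spark version,
# and the Hadoop profile. The actual RUN step is not shown in this diff.
BASE_URL=https://archive.apache.org/dist/spark/

spark_archive_url() {
    spark_version=$1
    hadoop_version=$2
    # BASE_URL already ends with a slash, so no separator is added after it.
    printf '%sspark-%s/spark-%s-bin-hadoop%s.tgz\n' \
        "$BASE_URL" "$spark_version" "$spark_version" "$hadoop_version"
}
```

Under that assumption, `spark_archive_url 3.3.0 3` yields the archive the new image would fetch, and `spark_archive_url 3.2.1 3.2` yields the one used before this commit.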
2 changes: 1 addition & 1 deletion build.sh
@@ -2,7 +2,7 @@

set -e

TAG=3.2.1-hadoop3.2
TAG=3.3.0-hadoop3.3

build() {
NAME=$1
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -1,15 +1,15 @@
version: '3'
services:
spark-master:
image: bde2020/spark-master:3.2.1-hadoop3.2
image: bde2020/spark-master:3.3.0-hadoop3.3
container_name: spark-master
ports:
- "8080:8080"
- "7077:7077"
environment:
- INIT_DAEMON_STEP=setup_spark
spark-worker-1:
image: bde2020/spark-worker:3.2.1-hadoop3.2
image: bde2020/spark-worker:3.3.0-hadoop3.3
container_name: spark-worker-1
depends_on:
- spark-master
2 changes: 1 addition & 1 deletion examples/maven/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-maven-template:3.2.1-hadoop3.2
FROM bde2020/spark-maven-template:3.3.0-hadoop3.3

LABEL MAINTAINER="Gezim Sejdiu <g.sejdiu@gmail.com>"

2 changes: 1 addition & 1 deletion examples/maven/README.md
@@ -17,5 +17,5 @@ To run the application, execute the following steps:
```
3. Run the Docker container:
```bash
docker run --rm --network dockerspark_default --name spark-maven-example bde2020/spark-maven-example:3.2.1-hadoop3.2
docker run --rm --network dockerspark_default --name spark-maven-example bde2020/spark-maven-example:3.3.0-hadoop3.3
```
2 changes: 1 addition & 1 deletion examples/maven/pom.xml
@@ -14,7 +14,7 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<scala.version>2.12.13</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<spark.version>3.2.1</spark.version>
<spark.version>3.3.0</spark.version>
</properties>

<dependencies>
2 changes: 1 addition & 1 deletion examples/python/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-python-template:3.2.1-hadoop3.2
FROM bde2020/spark-python-template:3.3.0-hadoop3.3

COPY wordcount.py /app/

2 changes: 1 addition & 1 deletion examples/python/README.md
@@ -17,5 +17,5 @@ To run the application, execute the following steps:
```
3. Run the Docker container:
```bash
docker run --rm --network dockerspark_default --name pyspark-example bde2020/spark-python-example:3.2.1-hadoop3.2
docker run --rm --network dockerspark_default --name pyspark-example bde2020/spark-python-example:3.3.0-hadoop3.3
```
2 changes: 1 addition & 1 deletion history-server/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-base:3.2.1-hadoop3.2
FROM bde2020/spark-base:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

4 changes: 2 additions & 2 deletions k8s-spark-cluster.yaml
@@ -46,7 +46,7 @@ spec:
spec:
containers:
- name: spark-master
image: bde2020/spark-master:3.2.1-hadoop3.2
image: bde2020/spark-master:3.3.0-hadoop3.3
imagePullPolicy: Always
ports:
- containerPort: 8080
@@ -70,7 +70,7 @@ spec:
spec:
containers:
- name: spark-worker
image: bde2020/spark-worker:3.2.1-hadoop3.2
image: bde2020/spark-worker:3.3.0-hadoop3.3
imagePullPolicy: Always
ports:
- containerPort: 8081
2 changes: 1 addition & 1 deletion master/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-base:3.2.1-hadoop3.2
FROM bde2020/spark-base:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

2 changes: 1 addition & 1 deletion submit/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-base:3.2.1-hadoop3.2
FROM bde2020/spark-base:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

2 changes: 1 addition & 1 deletion template/maven/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-submit:3.2.1-hadoop3.2
FROM bde2020/spark-submit:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

2 changes: 1 addition & 1 deletion template/maven/README.md
@@ -34,7 +34,7 @@ If you overwrite the template's `CMD` in your Dockerfile, make sure to execute t

#### Example Dockerfile
```
FROM bde2020/spark-maven-template:3.2.1-hadoop3.2
FROM bde2020/spark-maven-template:3.3.0-hadoop3.3
MAINTAINER Erika Pauwels <erika.pauwels@tenforce.com>
MAINTAINER Gezim Sejdiu <g.sejdiu@gmail.com>
2 changes: 1 addition & 1 deletion template/python/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-submit:3.2.1-hadoop3.2
FROM bde2020/spark-submit:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

2 changes: 1 addition & 1 deletion template/python/README.md
@@ -30,7 +30,7 @@ If you overwrite the template's `CMD` in your Dockerfile, make sure to execute t

#### Example Dockerfile
```
FROM bde2020/spark-python-template:3.2.1-hadoop3.2
FROM bde2020/spark-python-template:3.3.0-hadoop3.3
MAINTAINER You <you@example.org>
4 changes: 2 additions & 2 deletions template/sbt/Dockerfile
@@ -1,9 +1,9 @@
FROM bde2020/spark-submit:3.2.1-hadoop3.2
FROM bde2020/spark-submit:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"

ARG SBT_VERSION
ENV SBT_VERSION=${SBT_VERSION:-1.4.1}
ENV SBT_VERSION=${SBT_VERSION:-1.6.2}

RUN wget -O - https://github.com/sbt/sbt/releases/download/v${SBT_VERSION}/sbt-${SBT_VERSION}.tgz | gunzip | tar -x -C /usr/local

2 changes: 1 addition & 1 deletion template/sbt/README.md
@@ -62,7 +62,7 @@ the `/template.sh` script at the end.
#### Example Dockerfile

```
FROM bde2020/spark-sbt-template:3.2.1-hadoop3.2
FROM bde2020/spark-sbt-template:3.3.0-hadoop3.3
MAINTAINER Cecile Tonglet <cecile.tonglet@tenforce.com>
2 changes: 1 addition & 1 deletion template/sbt/build.sbt
@@ -1,4 +1,4 @@
scalaVersion := "2.12.14"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
"org.apache.spark" %% "spark-sql" % "3.3.0" % "provided"
)
2 changes: 1 addition & 1 deletion worker/Dockerfile
@@ -1,4 +1,4 @@
FROM bde2020/spark-base:3.2.1-hadoop3.2
FROM bde2020/spark-base:3.3.0-hadoop3.3

LABEL maintainer="Gezim Sejdiu <g.sejdiu@gmail.com>, Giannis Mouchakis <gmouchakis@gmail.com>"
