[ZEPPELIN-4154] Build docker image for each interpreter #3769

Closed
wants to merge 1 commit into from
32 changes: 32 additions & 0 deletions Dockerfile
@@ -0,0 +1,32 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM maven:3.5-jdk-8 as builder
ADD . /workspace/zeppelin
WORKDIR /workspace/zeppelin
# Allow npm and bower to run with root privileges
RUN echo "unsafe-perm=true" > ~/.npmrc && \
echo '{ "allow_root": true }' > ~/.bowerrc && \
mvn -B package -DskipTests -Pbuild-distr -Pspark-3.0 -Pinclude-hadoop -Phadoop3 -Pspark-scala-2.12 -Pweb-angular && \
# Example that doesn't compile all interpreters:
# mvn -B package -DskipTests -Pweb-angular -Pscala-2.11 -Pbuild-distr -pl '!groovy,!submarine,!livy,!hbase,!pig,!file,!flink,!ignite,!kylin,!lens' && \
mv /workspace/zeppelin/zeppelin-distribution/target/zeppelin-*/zeppelin-* /opt/zeppelin/ && \
# Removing build leftovers saves time, because Docker would otherwise commit them to a temporary layer
rm -rf ~/.m2 && \
rm -rf /workspace/zeppelin/*

FROM ubuntu:18.04
COPY --from=builder /opt/zeppelin /opt/zeppelin
38 changes: 37 additions & 1 deletion docs/setup/deployment/docker.md
@@ -31,7 +31,7 @@ This document contains instructions about making docker containers for Zeppelin.
### Installing Docker
You need to [install docker](https://docs.docker.com/engine/installation/) on your machine.

-### Running docker image
+### Running docker image for Zeppelin distribution

```bash
docker run -p 8080:8080 --rm --name zeppelin apache/zeppelin:<release-version>
@@ -59,3 +59,39 @@ cd scripts/docker/zeppelin/bin
docker build -t my-zeppelin:my-tag ./
```

### Build docker images for Zeppelin server & interpreters

Starting from 0.9, Zeppelin supports running in Kubernetes (k8s) or Docker, so we have added the ability to
build Docker images for the Zeppelin server and interpreters. Please note that the provided Dockerfiles may not fit every environment.

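First, you need to build the `zeppelin-distribution` Docker image from the `Dockerfile` in `$ZEPPELIN_HOME`. The other Dockerfiles refer to it as `zeppelin-distribution:latest`, so keep this tag.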
```bash
cd $ZEPPELIN_HOME
docker build -t zeppelin-distribution .
```

Build the Docker image for the Zeppelin server.
```bash
cd $ZEPPELIN_HOME/scripts/docker/zeppelin-server
docker build -t zeppelin-server .
```

Build the base Docker image for Zeppelin interpreters. The interpreter Dockerfiles build on top of it as `zeppelin-interpreter-base:latest`.
```bash
cd $ZEPPELIN_HOME/scripts/docker/zeppelin-interpreter
docker build -t zeppelin-interpreter-base -f Dockerfile_interpreter_base .
```

Build the image for interpreter `<interpreter_name>`. By default, the interpreter image is built from `scripts/docker/zeppelin-interpreter/Dockerfile`, but there are also some customized Dockerfiles under `scripts/docker/zeppelin-interpreter/<interpreter_name>`. For example, official Apache Zeppelin provides three customized images: python, r and spark. Note that the r image is built on top of `zeppelin-interpreter-python` and the spark image on top of `zeppelin-interpreter-r`, so build them in that order.
Member

In Kubernetes, would it be possible to use a customized interpreter image (scripts/docker/zeppelin-interpreter/<interpreter_name>) for particular interpreters and fall back to the default interpreter image (scripts/docker/zeppelin-interpreter/Dockerfile) for all other interpreters?

Contributor Author

Only the interpreter image build process should fall back to scripts/docker/zeppelin-interpreter/Dockerfile. For K8s, all interpreter images should be available in the Docker registry; fallback logic in the Zeppelin server should be avoided.
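A build-time fallback of this kind could look roughly like the following shell sketch. It assumes the directory layout from this PR: a customized `Dockerfile` under `scripts/docker/zeppelin-interpreter/<interpreter_name>` if one exists, otherwise the generic `Dockerfile` with `--build-arg interpreter_name=<name>`. The function name `build_interpreter_image` and the `DOCKER` variable are hypothetical; by default the script only prints the `docker build` commands, and `DOCKER=docker` runs them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build one interpreter image, preferring a customized
# Dockerfile and falling back to the generic one at build time (not in the server).
set -eu

ZEPPELIN_HOME="${ZEPPELIN_HOME:-$(pwd)}"
# Defaults to printing the commands; left unquoted below so "echo docker" splits into two words.
DOCKER="${DOCKER:-echo docker}"
INTERPRETER_DIR="$ZEPPELIN_HOME/scripts/docker/zeppelin-interpreter"

build_interpreter_image() {
  local name="$1"
  local custom_dir="$INTERPRETER_DIR/$name"
  if [ -f "$custom_dir/Dockerfile" ]; then
    # Customized image (e.g. python, r, spark); its directory is the build context
    $DOCKER build -t "zeppelin-interpreter-$name" -f "$custom_dir/Dockerfile" "$custom_dir"
  else
    # Generic image: copies interpreter/<name> out of the distribution image
    $DOCKER build -t "zeppelin-interpreter-$name" -f "$INTERPRETER_DIR/Dockerfile" \
      --build-arg "interpreter_name=$name" "$INTERPRETER_DIR"
  fi
}

# Build every interpreter named on the command line
for name in "$@"; do
  build_interpreter_image "$name"
done
```

For example, `ZEPPELIN_HOME=/path/to/zeppelin DOCKER=docker ./build-interpreters.sh md python` (script name hypothetical) would build the generic `md` image and the customized `python` image, assuming the base image already exists.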


```bash
# default interpreter by interpreter_name (e.g. md)
cd $ZEPPELIN_HOME/scripts/docker/zeppelin-interpreter
docker build -t zeppelin-interpreter-md -f Dockerfile --build-arg interpreter_name=md .
```

```bash
# python interpreter
cd $ZEPPELIN_HOME/scripts/docker/zeppelin-interpreter/python
docker build -t zeppelin-interpreter-python -f Dockerfile .
```
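Putting the steps together, here is a hedged sketch that prints (or, with `DOCKER=docker`, runs) all builds in dependency order. The order follows the `FROM` lines in this PR: every image needs `zeppelin-distribution:latest`, the interpreter images need `zeppelin-interpreter-base:latest`, `r` builds on `zeppelin-interpreter-python` and `spark` on `zeppelin-interpreter-r`. The function name `build_all` is hypothetical, and the server step assumes `scripts/docker/zeppelin-server` contains a `Dockerfile`, as the commands above imply.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the distribution, server and interpreter images in order.
set -eu

ZEPPELIN_HOME="${ZEPPELIN_HOME:-$(pwd)}"
# Defaults to printing the commands; left unquoted below so "echo docker" splits into two words.
DOCKER="${DOCKER:-echo docker}"
DOCKER_DIR="$ZEPPELIN_HOME/scripts/docker"
INTERP_DIR="$DOCKER_DIR/zeppelin-interpreter"

build_all() {
  # 1. Distribution image; all other Dockerfiles use zeppelin-distribution:latest
  $DOCKER build -t zeppelin-distribution "$ZEPPELIN_HOME"
  # 2. Server image
  $DOCKER build -t zeppelin-server "$DOCKER_DIR/zeppelin-server"
  # 3. Shared interpreter base image
  $DOCKER build -t zeppelin-interpreter-base -f "$INTERP_DIR/Dockerfile_interpreter_base" "$INTERP_DIR"
  # 4. Customized images; the order matters because r builds FROM python and spark FROM r
  for name in python r spark; do
    $DOCKER build -t "zeppelin-interpreter-$name" "$INTERP_DIR/$name"
  done
  # 5. Generic images for the interpreters passed as arguments (e.g. md)
  for name in "$@"; do
    $DOCKER build -t "zeppelin-interpreter-$name" -f "$INTERP_DIR/Dockerfile" \
      --build-arg "interpreter_name=$name" "$INTERP_DIR"
  done
}

build_all "$@"
```

Interpreters that use the generic Dockerfile are passed as arguments, e.g. `DOCKER=docker ./build-all.sh md` (script name hypothetical).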
4 changes: 2 additions & 2 deletions k8s/zeppelin-server.yaml
@@ -29,7 +29,7 @@ data:
# If you have your ingress controller configured to connect to `zeppelin-server` service and have a domain name for it (with a wildcard subdomain pointing to the same address), you can replace the serviceDomain field with your own domain.
SERVICE_DOMAIN: local.zeppelin-project.org:8080
ZEPPELIN_K8S_SPARK_CONTAINER_IMAGE: spark:2.4.5
-ZEPPELIN_K8S_CONTAINER_IMAGE: apache/zeppelin:0.9.0-SNAPSHOT
+ZEPPELIN_K8S_CONTAINER_IMAGE: apache/zeppelin-interpreter:0.9.0-SNAPSHOT
ZEPPELIN_HOME: /zeppelin
ZEPPELIN_SERVER_RPC_PORTRANGE: 12320:12320
# default value of 'master' property for spark interpreter.
@@ -115,7 +115,7 @@ spec:
path: nginx.conf
containers:
- name: zeppelin-server
-image: apache/zeppelin:0.9.0-SNAPSHOT
+image: apache/zeppelin-server:0.9.0-SNAPSHOT
Member

Currently, the official docker images are built from scripts/docker/zeppelin/bin/Dockerfile.
Apache Infra configured the Dockerfile location for automated builds on release. (see this comment)

Do you think we can release images based on the new Dockerfiles in this pull request and remove scripts/docker/zeppelin/bin/Dockerfile?

Since /k8s/zeppelin-server.yaml points to new docker image names, I think it makes sense to release new images as well.

Contributor Author

I think we can delete scripts/docker/zeppelin/bin/Dockerfile for the next release. First we should check whether Docker and k8s are able to use a flexible interpreter image. Currently, k8s at least is not able to use a flexible interpreter image.

It makes sense to push these new images. How should we handle builds against different versions?
At the moment I'm compiling Zeppelin with the newest versions of Hadoop and Spark.

mvn -B package -DskipTests -Pbuild-distr -Pspark-3.0 -Phadoop3 -Pspark-scala-2.12 -Pweb-angular

Member

At least the Spark interpreter has binary-level compatibility with different Spark (and Hadoop) versions. Once built, it works with different versions without being rebuilt.

Contributor

How will it all work outside of K8s? For example, when using Docker just to avoid installing anything on the machine, where one image is handier to work with.

Contributor Author

> one image is more handy to work with

You are right, one or at least a small set of images is more practical to work with. In fact, I currently have only three images (distribution image, server image, and one large interpreter image) in my K8s setup.

The title of ZEPPELIN-4154 and the first PR #3380 imply an image for each interpreter. This PR tries to accomplish that.

In my opinion we should at least provide separate images for the Zeppelin server and the Zeppelin interpreter. A distribution image is useful to build Zeppelin only once and copy the same version into the Zeppelin server and Zeppelin interpreter images.

My main goal with separate images is to reduce image size and start time in a container cluster.
My current Zeppelin image is only 410.95 MB, so the download and total startup time of a new instance are short.
My Zeppelin interpreter image is quite large (1.53 GB), so the download takes quite long.

Creating an image for each interpreter reduces the image size. All interpreter images should use the same base image so that they can reuse a base layer that may already be present on the node.

> How it all will work on non-K8S? Like, using Docker just for not installing anything to machine

Docker is also able to set up a local network; in most cases this is done via a bridge network. The Zeppelin server needs access to the Docker daemon's TCP interface to create/modify the network, or at least to learn via that interface when new containers are created.
In my opinion, a Docker cluster via Docker Swarm should not fall within the scope of this project.

command: ["sh", "-c", "$(ZEPPELIN_HOME)/bin/zeppelin.sh"]
lifecycle:
preStop:
23 changes: 23 additions & 0 deletions scripts/docker/zeppelin-interpreter/Dockerfile
@@ -0,0 +1,23 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


FROM zeppelin-distribution:latest AS zeppelin-distribution

FROM zeppelin-interpreter-base:latest
# Must declare it after FROM, because it would be reset if it is declared before FROM
ARG interpreter_name
# Copy interpreter
COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/${interpreter_name} ${Z_HOME}/interpreter/${interpreter_name}
40 changes: 40 additions & 0 deletions scripts/docker/zeppelin-interpreter/Dockerfile_interpreter_base
@@ -0,0 +1,40 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM zeppelin-distribution:latest AS zeppelin-distribution

FROM ubuntu:18.04

LABEL maintainer="Apache Software Foundation <dev@zeppelin.apache.org>"

ARG version="0.9.0-SNAPSHOT"

ENV VERSION="${version}" \
Z_HOME="/opt/zeppelin"

RUN set -ex && \
apt-get -y update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-8-jre-headless wget && \
# Cleanup
rm -rf /var/lib/apt/lists/* && \
apt-get autoclean && \
apt-get clean

COPY --from=zeppelin-distribution /opt/zeppelin/bin ${Z_HOME}/bin
COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/zeppelin-interpreter-shaded-${VERSION}.jar ${Z_HOME}/interpreter/zeppelin-interpreter-shaded-${VERSION}.jar
COPY log4j.properties ${Z_HOME}/conf/
COPY log4j_yarn_cluster.properties ${Z_HOME}/conf/

WORKDIR ${Z_HOME}
22 changes: 22 additions & 0 deletions scripts/docker/zeppelin-interpreter/log4j.properties
@@ -0,0 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

log4j.rootLogger = INFO, stdout

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
23 changes: 23 additions & 0 deletions scripts/docker/zeppelin-interpreter/log4j_yarn_cluster.properties
@@ -0,0 +1,23 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

log4j.rootLogger = INFO, stdout

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n

50 changes: 50 additions & 0 deletions scripts/docker/zeppelin-interpreter/python/Dockerfile
@@ -0,0 +1,50 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM zeppelin-distribution:latest AS zeppelin-distribution

FROM zeppelin-interpreter-base:latest

ARG miniconda_version="py37_4.8.2"
ARG miniconda_sha256="957d2f0f0701c3d1335e3b39f235d197837ad69a944fa6f5d8ad2c686b69df3b"

ENV MINICONDA_VERSION=${miniconda_version}

# Install additional_conda_packages
COPY python_conda_packages.txt /python_conda_packages.txt
# Some packages are not available via conda
COPY pip_packages.txt /pip_packages.txt
# Install Miniconda3
RUN set -ex && \
wget -nv https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \
echo "${miniconda_sha256} miniconda.sh" > anaconda.sha256 && \
sha256sum --strict -c anaconda.sha256 && \
bash miniconda.sh -b -p /opt/conda && \
export PATH=/opt/conda/bin:$PATH && \
conda config --set always_yes yes --set changeps1 no && \
conda info -a && \
conda config --add channels conda-forge && \
conda install -y --quiet --file /python_conda_packages.txt && \
pip install -q -r /pip_packages.txt && \
# Cleanup
rm -v miniconda.sh anaconda.sha256 && \
# Cleanup based on https://github.com/ContinuumIO/docker-images/commit/cac3352bf21a26fa0b97925b578fb24a0fe8c383
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
conda clean -ay

ENV PATH /opt/conda/bin:$PATH

COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/python ${Z_HOME}/interpreter/python
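The Miniconda installer above is checked by writing the expected digest to `anaconda.sha256` and running `sha256sum --strict -c`. Below is a self-contained sketch of the same check that runs without Docker. The helper name `verify_sha256` is hypothetical, and it relies on GNU coreutils `sha256sum` (as shipped in `ubuntu:18.04`). It writes the canonical two-space `<digest>  <file>` line format.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the Dockerfile's download check.
set -eu

# verify_sha256 FILE EXPECTED_HEX: returns 0 only if FILE's SHA-256 equals EXPECTED_HEX
verify_sha256() {
  local file="$1" expected="$2" sumfile
  sumfile="$(mktemp)"
  # "<digest>  <file>" (two spaces) is the line format sha256sum -c expects
  printf '%s  %s\n' "$expected" "$file" > "$sumfile"
  # --strict rejects malformed lines, --status suppresses the per-file output
  if sha256sum --strict --status -c "$sumfile"; then
    rm -f "$sumfile"
    return 0
  fi
  rm -f "$sumfile"
  return 1
}
```

Inside the Dockerfile this corresponds to something like `verify_sha256 miniconda.sh "${miniconda_sha256}"`. Because the `RUN` step chains commands with `&&`, a non-zero exit status aborts the image build.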
@@ -0,0 +1 @@
bkzep==0.6.1
@@ -0,0 +1,14 @@
pycodestyle
numpy
pandas
scipy
grpcio
hvplot
protobuf
pandasql
ipython
matplotlib
ipykernel
jupyter_client
bokeh
apache-beam
30 changes: 30 additions & 0 deletions scripts/docker/zeppelin-interpreter/r/Dockerfile
@@ -0,0 +1,30 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM zeppelin-distribution:latest AS zeppelin-distribution

FROM zeppelin-interpreter-python:latest

# Install additional_conda_packages
COPY r_conda_packages.txt /r_conda_packages.txt
# Install necessary r packages
RUN set -ex && \
conda install -y --quiet --file /r_conda_packages.txt && \
# Cleanup based on https://github.com/ContinuumIO/docker-images/commit/cac3352bf21a26fa0b97925b578fb24a0fe8c383
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
conda clean -ay

COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/r ${Z_HOME}/interpreter/r
6 changes: 6 additions & 0 deletions scripts/docker/zeppelin-interpreter/r/r_conda_packages.txt
@@ -0,0 +1,6 @@
r-evaluate
r-base64enc
r-knitr
r-ggplot2
r-shiny
r-googlevis
20 changes: 20 additions & 0 deletions scripts/docker/zeppelin-interpreter/spark/Dockerfile
@@ -0,0 +1,20 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM zeppelin-distribution:latest AS zeppelin-distribution

FROM zeppelin-interpreter-r:latest

COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/spark ${Z_HOME}/interpreter/spark
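Because the customized images build on each other, the required build order can be read from each Dockerfile's last `FROM` line, which defines the final build stage. Below is a small sketch that prints it. `final_base_image` is a hypothetical helper, and it assumes `FROM` lines without flags such as `--platform`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the base image of a Dockerfile's final build stage.
set -eu

final_base_image() {
  # Remember the image named on every FROM line; the last one wins
  awk 'toupper($1) == "FROM" { image = $2 } END { print image }' "$1"
}

# Print "<Dockerfile> <- <base image>" for every Dockerfile given as an argument
for dockerfile in "$@"; do
  printf '%s <- %s\n' "$dockerfile" "$(final_base_image "$dockerfile")"
done
```

Run against the Dockerfiles in this PR, it would report `zeppelin-interpreter-base:latest` for python, `zeppelin-interpreter-python:latest` for r and `zeppelin-interpreter-r:latest` for spark.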