refactor(dev/docker/hive): shrink hive Docker image size by 420MB #3268

Merged: 10 commits, merged on May 22, 2024
2 changes: 1 addition & 1 deletion catalogs/catalog-hadoop/build.gradle.kts
@@ -98,7 +98,7 @@ tasks.test {
dependsOn(tasks.jar)

doFirst {
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.11")
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.12")
}

val init = project.extra.get("initIntegrationTest") as (Test) -> Unit
2 changes: 1 addition & 1 deletion catalogs/catalog-hive/build.gradle.kts
@@ -164,7 +164,7 @@ tasks.test {
dependsOn(tasks.jar)

doFirst {
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.11")
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.12")
}

val init = project.extra.get("initIntegrationTest") as (Test) -> Unit
2 changes: 1 addition & 1 deletion catalogs/catalog-lakehouse-iceberg/build.gradle.kts
@@ -162,7 +162,7 @@ tasks.test {
dependsOn(tasks.jar)

doFirst {
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.11")
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.12")
}

val init = project.extra.get("initIntegrationTest") as (Test) -> Unit
2 changes: 1 addition & 1 deletion dev/docker/build-docker.sh
@@ -73,7 +73,7 @@ fi

if [[ "${component_type}" == "hive" ]]; then
. ${script_dir}/hive/hive-dependency.sh
build_args="--build-arg HADOOP_PACKAGE_NAME=${HADOOP_PACKAGE_NAME} --build-arg HIVE_PACKAGE_NAME=${HIVE_PACKAGE_NAME} --build-arg JDBC_DIVER_PACKAGE_NAME=${JDBC_DIVER_PACKAGE_NAME}"
build_args="--build-arg HADOOP_PACKAGE_NAME=${HADOOP_PACKAGE_NAME} --build-arg HIVE_PACKAGE_NAME=${HIVE_PACKAGE_NAME} --build-arg JDBC_DIVER_PACKAGE_NAME=${JDBC_DIVER_PACKAGE_NAME} --build-arg HADOOP_VERSION=${HADOOP_VERSION} --build-arg HIVE_VERSION=${HIVE_VERSION} --build-arg MYSQL_JDBC_DRIVER_VERSION=${MYSQL_JDBC_DRIVER_VERSION}"
elif [[ "${component_type}" == "kerberos-hive" ]]; then
. ${script_dir}/kerberos-hive/hive-dependency.sh
build_args="--build-arg HADOOP_PACKAGE_NAME=${HADOOP_PACKAGE_NAME} --build-arg HIVE_PACKAGE_NAME=${HIVE_PACKAGE_NAME} --build-arg JDBC_DIVER_PACKAGE_NAME=${JDBC_DIVER_PACKAGE_NAME}"
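The new `build_args` line passes three extra variables (`HADOOP_VERSION`, `HIVE_VERSION`, `MYSQL_JDBC_DRIVER_VERSION`) so that the Dockerfile's `ADD packages/...-${VERSION}.tar.gz` lines can resolve the archive names. A minimal sketch of the same idea using a bash array instead of one long string, which keeps each value a single argument and fails early if `hive-dependency.sh` left a variable empty. The function name `hive_build_args` is hypothetical; only the variable names come from build-docker.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in build-docker.sh): collect the Hive image build
# arguments into an array. The variable names mirror build-docker.sh; their
# values are assumed to be set by hive/hive-dependency.sh beforehand.
hive_build_args() {
  local -a args=()
  local name value
  for name in HADOOP_PACKAGE_NAME HIVE_PACKAGE_NAME JDBC_DIVER_PACKAGE_NAME \
              HADOOP_VERSION HIVE_VERSION MYSQL_JDBC_DRIVER_VERSION; do
    value="${!name}"  # indirect expansion: the value of the variable named $name
    if [[ -z "$value" ]]; then
      echo "hive_build_args: ${name} is not set" >&2
      return 1
    fi
    # The flag and its value stay separate array elements.
    args+=(--build-arg "${name}=${value}")
  done
  # Print one argument per line so callers can read them back with mapfile.
  printf '%s\n' "${args[@]}"
}
```

For example, `mapfile -t args < <(hive_build_args)` followed by `docker build "${args[@]}" ...` passes the arguments without relying on word splitting of an unquoted string.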
19 changes: 11 additions & 8 deletions dev/docker/hive/Dockerfile
@@ -6,9 +6,10 @@
FROM ubuntu:16.04
LABEL maintainer="support@datastrato.com"
Collaborator

I think we need to keep `LABEL maintainer="support@datastrato.com"` here.

Contributor Author

Since I'm no longer using a two-stage build, this review is outdated.


ARG HADOOP_PACKAGE_NAME
ARG HIVE_PACKAGE_NAME
ARG HADOOP_VERSION
ARG HIVE_VERSION
ARG JDBC_DIVER_PACKAGE_NAME
ARG MYSQL_JDBC_DRIVER_VERSION
ARG DEBIAN_FRONTEND=noninteractive

WORKDIR /
@@ -41,12 +42,10 @@ RUN apt-get update && apt-get upgrade -y && apt-get install --fix-missing -yq \
openjdk-8-jdk

#################################################################################
## setup ssh
# setup ssh
RUN mkdir /root/.ssh
RUN cat /dev/zero | ssh-keygen -q -N "" > /dev/null && cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys

COPY packages /tmp/packages
Contributor

You can use ADD instead. When a Dockerfile ADDs a local compressed tar archive, the Docker builder copies and extracts it, so the archive itself never ends up in an image layer, which reduces the image size.

Then it seems we don't need a two-stage build.

Contributor Author

@unknowntpo May 8, 2024

Thanks for your advice! I used ADD at first, but we need `--strip-components` to remove the outer directory, and ADD can't do that.

Contributor

Yes, `--strip-components` doesn't work with ADD, so I used a soft link in the Doris image.

Contributor Author

Okay, I'll try the soft-link approach, thanks 😀
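Docker's `ADD` extracts a local tar archive (gzip, bzip2, or xz compressed) into the destination directory as-is, so it cannot drop the top-level `hadoop-<version>/` directory the way `tar --strip-components 1` does. The soft-link approach works around that. The sketch below reproduces both layouts outside Docker with a throwaway archive (`demo-1.0` and all paths are hypothetical stand-ins for the Hadoop and Hive releases) and checks that they expose the same files.

```shell
#!/usr/bin/env bash
# Sketch: compare the old "--strip-components" extraction with the
# "extract as-is + symlink" layout used after switching to ADD.
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Build a tiny archive shaped like a release: everything under demo-1.0/.
mkdir -p "$work/src/demo-1.0/bin" "$work/src/demo-1.0/etc"
echo 'echo hi' > "$work/src/demo-1.0/bin/run.sh"
echo 'k=v' > "$work/src/demo-1.0/etc/conf"
tar -czf "$work/demo-1.0.tar.gz" -C "$work/src" demo-1.0

# Old approach: strip the top-level directory while extracting into "home".
mkdir -p "$work/home1"
tar -xz -C "$work/home1" --strip-components 1 -f "$work/demo-1.0.tar.gz"

# New approach: extract unchanged into "opt" (what ADD does), then symlink
# each top-level entry into "home".
mkdir -p "$work/opt" "$work/home2"
tar -xz -C "$work/opt" -f "$work/demo-1.0.tar.gz"
ln -s "$work/opt/demo-1.0/"* "$work/home2"

# diff follows the symlinks; with set -e a mismatch aborts the script.
diff -r "$work/home1" "$work/home2"
echo "layouts match"
```

One caveat of `ln -s dir/* target`: the glob skips hidden (dot) entries, so any top-level dotfiles in a release archive would not be linked.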


################################################################################
# set environment variables
ENV JAVA_HOME=/usr/local/jdk
@@ -91,7 +90,8 @@ RUN echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" >> /etc/environment
################################################################################
# install hadoop
RUN mkdir ${HADOOP_HOME}
RUN tar -xz -C ${HADOOP_HOME} --strip-components 1 -f /tmp/packages/${HADOOP_PACKAGE_NAME}
ADD packages/hadoop-${HADOOP_VERSION}.tar.gz /opt/
RUN ln -s /opt/hadoop-${HADOOP_VERSION}/* ${HADOOP_HOME}

# replace configuration templates
RUN rm -f ${HADOOP_CONF_DIR}/core-site.xml
@@ -111,7 +111,9 @@ RUN ${HADOOP_HOME}/bin/hdfs namenode -format -nonInteractive
################################################################################
# install hive
RUN mkdir ${HIVE_HOME}
RUN tar -xz -C ${HIVE_HOME} --strip-components 1 -f /tmp/packages/${HIVE_PACKAGE_NAME}
ADD packages/apache-hive-${HIVE_VERSION}-bin.tar.gz /opt/
RUN ln -s /opt/apache-hive-${HIVE_VERSION}-bin/* ${HIVE_HOME}

ADD hive-site.xml ${HIVE_HOME}/conf/hive-site.xml

################################################################################
@@ -127,7 +129,8 @@ RUN sed -i "s/.*bind-address.*/bind-address = 0.0.0.0/" /etc/mysql/mysql.conf.d/

################################################################################
# add mysql jdbc driver
RUN tar -xz -C ${HIVE_HOME}/lib --strip-components 1 -f /tmp/packages/${JDBC_DIVER_PACKAGE_NAME}
ADD packages/mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}.tar.gz /opt/
RUN ln -s /opt/mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}/* ${HIVE_HOME}/lib

################################################################################
# add users and groups
2 changes: 2 additions & 0 deletions dev/docker/hive/start.sh
@@ -34,5 +34,7 @@ ${HIVE_HOME}/bin/schematool -initSchema -dbType mysql
${HIVE_HOME}/bin/hive --service hiveserver2 > /dev/null 2>&1 &
${HIVE_HOME}/bin/hive --service metastore > /dev/null 2>&1 &

echo "Hive started successfully."

# persist the container
tail -f /dev/null
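`start.sh` launches HiveServer2 and the metastore in the background with `&` and prints `Hive started successfully.` immediately, before either service is necessarily listening. If a caller needs a stronger signal, one option is to poll the service ports first. A sketch under assumptions: `wait_for_port` is a hypothetical helper, and 9083 (metastore) and 10000 (HiveServer2) are Hive's default ports, which this image's `hive-site.xml` could override.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of start.sh: poll a TCP port until it accepts
# a connection or the timeout (in seconds) expires. Relies on bash's
# /dev/tcp redirection, so it needs bash rather than plain sh.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # Opening /dev/tcp/<host>/<port> succeeds only if something is listening.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

Usage, assuming the default ports: `wait_for_port localhost 9083 120 && wait_for_port localhost 10000 120 && echo "Hive started successfully."`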
3 changes: 3 additions & 0 deletions docs/docker-image-details.md
@@ -100,6 +100,9 @@ You can use this kind of image to test the catalog of Apache Hive.

Changelog

- gravitino-ci-hive:0.1.12
  - Shrink hive Docker image size by 420MB

- gravitino-ci-hive:0.1.11
  - Remove `yarn` from the startup script; Remove `yarn-site.xml` and `yarn-env.sh` files;
  - Change the value of `mapreduce.framework.name` from `yarn` to `local` in the `mapred-site.xml` file.
2 changes: 1 addition & 1 deletion integration-test/build.gradle.kts
@@ -160,7 +160,7 @@ tasks.test {

doFirst {
// Gravitino CI Docker image
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.11")
environment("GRAVITINO_CI_HIVE_DOCKER_IMAGE", "datastrato/gravitino-ci-hive:0.1.12")
environment("GRAVITINO_CI_TRINO_DOCKER_IMAGE", "datastrato/gravitino-ci-trino:0.1.5")
environment("GRAVITINO_CI_KAFKA_DOCKER_IMAGE", "apache/kafka:3.7.0")
environment("GRAVITINO_CI_DORIS_DOCKER_IMAGE", "datastrato/gravitino-ci-doris:0.1.3")
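This PR bumps `datastrato/gravitino-ci-hive` from `0.1.11` to `0.1.12` in four separate `build.gradle.kts` files. A small check like the one below (a hypothetical helper, not part of the repository) could catch a file that was missed: it collects every tag assigned to `GRAVITINO_CI_HIVE_DOCKER_IMAGE` under a directory and fails unless exactly one distinct value is found.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check: every build.gradle.kts under the given
# directory should pin the same GRAVITINO_CI_HIVE_DOCKER_IMAGE value.
check_hive_image_tags() {
  local root="${1:-.}"
  local tags
  # Grab 'GRAVITINO_CI_HIVE_DOCKER_IMAGE", "<image>"' and keep only <image>.
  tags="$(grep -rho --include='build.gradle.kts' \
            'GRAVITINO_CI_HIVE_DOCKER_IMAGE", "[^"]*"' "$root" \
          | sed -E 's/.*"([^"]*)"$/\1/' | sort -u)"
  if [[ -z "$tags" ]]; then
    echo "no GRAVITINO_CI_HIVE_DOCKER_IMAGE found under ${root}" >&2
    return 1
  fi
  if [[ "$(wc -l <<< "$tags")" -ne 1 ]]; then
    echo "inconsistent Hive CI image tags:" >&2
    echo "$tags" >&2
    return 1
  fi
  echo "$tags"
}
```

Run from the repository root as `check_hive_image_tags .`; on success it prints the single image reference.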