DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile #2491

Merged (4 commits), Mar 13, 2022
21 changes: 15 additions & 6 deletions Dockerfile
@@ -56,17 +56,26 @@ RUN VERSION=$(mvn -q -Dexec.executable=echo -Dexec.args='${project.version}' --n
# Set the BASE_IMAGE build arg when you invoke docker build.
FROM $BASE_IMAGE

-ENV DRILL_HOME=/opt/drill DRILL_USER=drilluser
+ENV DRILL_HOME=/opt/drill
+ENV DRILL_USER=drilluser
+ENV DRILL_USER_HOME=/var/lib/drill
+ENV DRILL_LOG_DIR=$DRILL_USER_HOME/log
+ENV DATA_VOL=/data

-RUN mkdir $DRILL_HOME
+RUN mkdir $DRILL_HOME $DATA_VOL

-COPY --from=build /opt/drill $DRILL_HOME
vvysotskyi (Member), Mar 9, 2022:

With these changes, the Docker image will be much larger, because the base layers are larger. Please move the COPY command as close to the end as possible.

Contributor Author:

@vvysotskyi I don't think Docker is meant to duplicate data across layers this way. As I understand it, each layer is stored as a delta from the previous layer (even though it may be reported with the cumulative size of the layers up to that point), so layer ordering should not affect the size of the final image. Nevertheless, I have moved everything that I could above the COPY in the Dockerfile. I do still worry about a size blowup: when I list images I see 1.47GB for the image built from this Dockerfile, while pulling apache/drill:1.20.0-openjdk-8 gives me an image smaller than 1GB.

apache/drill               snapshot-openjdk-8   57306e5337db   3 minutes ago    1.47GB
apache/drill               1.20.0-openjdk-8     7479402ba1b3   6 days ago       983MB

Contributor Author:

@vvysotskyi in the final commit I found and fixed the size blowup. The RUN chmod command was duplicating the entire Drill installation in a new layer just to set file attributes, which is a pretty lame copy-on-write (CoW) system on Docker's part. The RUN chmod is now done in the intermediate container.
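The fix described above can be sketched roughly as follows. This is a minimal illustration, not the Dockerfile from the final commit: the default value of BASE_IMAGE and the contents of the build stage are placeholders assumed for the example, while the stage name `build`, the paths and the user settings are taken from the diff.

```dockerfile
# Hypothetical default; the real Dockerfile expects BASE_IMAGE via --build-arg.
ARG BASE_IMAGE=openjdk:8

# Build stage. The step below is a placeholder standing in for the real
# Drill build, which is not shown in this diff.
FROM $BASE_IMAGE AS build
RUN mkdir -p /opt/drill && touch /opt/drill/placeholder
# Set file permissions here. The layers of this stage are not part of the
# final image, so files rewritten by chmod are not stored a second time there.
RUN chmod -R +r /opt/drill

# Final stage.
FROM $BASE_IMAGE
ENV DRILL_HOME=/opt/drill
ENV DRILL_USER=drilluser
RUN groupadd -g 999 $DRILL_USER \
 && useradd -r -u 999 -g $DRILL_USER $DRILL_USER
# COPY --chown sets ownership while the files are written, so the Drill
# installation appears in exactly one layer of the final image.
COPY --from=build --chown=$DRILL_USER /opt/drill $DRILL_HOME
USER $DRILL_USER
```

Comparing `docker history` output for an image built this way against one that runs `chmod -R` after the COPY should show whether the extra layer is gone.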


RUN groupadd -g 999 $DRILL_USER \
- && useradd -r -u 999 -g $DRILL_USER $DRILL_USER -m -d /var/lib/drill \
- && chown -R $DRILL_USER: $DRILL_HOME
+ && useradd -r -u 999 -g $DRILL_USER $DRILL_USER -m -d $DRILL_USER_HOME \
+ && chown $DRILL_USER: $DATA_VOL \
+ && chmod -R +r $DRILL_HOME
Member:

Instead of doing it here, you can use the --chmod flag when doing the COPY command.

Contributor Author:

@vvysotskyi that introduces a dependency on something called BuildKit. Here is the output from my attempt to use this flag:

COPY --from=build --chmod=0755 /opt/drill $DRILL_HOME
the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled

Do you still think it's worth it?
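For comparison, a minimal sketch of what the BuildKit variant might look like. The default BASE_IMAGE value and the placeholder build stage are assumptions made for the example; the COPY line is the one quoted above.

```dockerfile
# syntax=docker/dockerfile:1
# The line above must come first; it selects a BuildKit Dockerfile frontend.
# Build with BuildKit enabled, for example:
#   DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE=<image> .

# Hypothetical default; the real Dockerfile expects BASE_IMAGE via --build-arg.
ARG BASE_IMAGE=openjdk:8

# Placeholder build stage standing in for the real Drill build.
FROM $BASE_IMAGE AS build
RUN mkdir -p /opt/drill && touch /opt/drill/placeholder

FROM $BASE_IMAGE
ENV DRILL_HOME=/opt/drill
# --chmod sets the mode while the files are copied, so no separate
# RUN chmod layer is needed. It requires BuildKit.
COPY --from=build --chmod=0755 /opt/drill $DRILL_HOME
```

Note that `--chmod=0755` applies the same mode to every copied file and directory, which is broader than the `chmod -R +r` it would replace.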


-USER $DRILL_USER
+# A Docker volume where users may store persistent data, e.g. persistent Drill
+# config by specifying a Drill BOOT option of sys.store.provider.local.path: "/data".
+VOLUME $DATA_VOL
+
+COPY --from=build --chown=$DRILL_USER /opt/drill $DRILL_HOME
+USER $DRILL_USER

# Starts Drill in embedded mode and connects to Sqlline
ENTRYPOINT $DRILL_HOME/bin/drill-embedded
jnturton marked this conversation as resolved.