DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile #2491
Dockerfile
Outdated
-RUN mkdir $DRILL_HOME
+RUN mkdir $DRILL_HOME $DATA_VOL

 COPY --from=build /opt/drill $DRILL_HOME
With these changes the Docker image will be much larger, because the base layers are larger. Please move the COPY command as close to the end of the Dockerfile as possible.
@vvysotskyi I don't think Docker is meant to duplicate data across layers this way. I think that each layer is supposed to be stored as a delta from the previous layer (even though it may be reported as having the cumulative size of the layers up to that point), so the layer ordering should not affect the size of the final image. Nevertheless, I have moved everything that I could above the COPY in the Dockerfile. I do still worry about a size blowup, because when I list images I see 1.47GB for the image from this Dockerfile, while pulling apache/drill:1.20.0-openjdk-8 gives me an image smaller than 1GB.
apache/drill snapshot-openjdk-8 57306e5337db 3 minutes ago 1.47GB
apache/drill 1.20.0-openjdk-8 7479402ba1b3 6 days ago 983MB
@vvysotskyi in the final commit I found and fixed the size blowup. The RUN chmod command was responsible for duplicating the entire Drill installation just to set file attributes, pretty lame CoW system that Docker has there. Anyway, now the RUN chmod is done in the intermediate container.
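The blowup described here comes from image layers storing whole files: a RUN step that changes only a file's mode still writes a full copy of that file into the new layer. A minimal sketch of the fix, running chmod in the intermediate stage, might look like the following. The base images and the tarball name are assumptions for illustration and are not taken from the actual Drill Dockerfile.

```dockerfile
# A sketch only: base images and the tarball name are hypothetical.
FROM debian:bookworm-slim AS build
COPY apache-drill.tar.gz /tmp/
# chmod runs in the intermediate stage, so any files it rewrites are
# discarded along with that stage and never reach the final image.
RUN mkdir /opt/drill \
 && tar -xzf /tmp/apache-drill.tar.gz --strip-components=1 -C /opt/drill \
 && chmod -R +r /opt/drill

FROM openjdk:8
ENV DRILL_HOME=/opt/drill
# The files arrive with their final mode, so this COPY is the only
# layer in the final image that stores the Drill installation.
COPY --from=build /opt/drill $DRILL_HOME
```

Comparing the per-layer sizes reported by docker history for the two variants is one way to confirm that the installation is stored only once.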
Dockerfile
Outdated
-    && chown -R $DRILL_USER: $DRILL_HOME
+    && useradd -r -u 999 -g $DRILL_USER $DRILL_USER -m -d $DRILL_USER_HOME \
+    && chown $DRILL_USER: $DATA_VOL \
+    && chmod -R +r $DRILL_HOME
Instead of doing it here, you can use the --chmod flag when doing the COPY command.
@vvysotskyi it introduces a dependency on something called BuildKit. Here is the output from my attempt to use this flag:
COPY --from=build --chmod=0755 /opt/drill $DRILL_HOME
the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled
Do you still think it's worth it?
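For reference, a minimal sketch of what the BuildKit variant could look like. The stage contents are placeholders, and the build would need BuildKit enabled, for example with DOCKER_BUILDKIT=1 docker build .

```dockerfile
# syntax=docker/dockerfile:1
# A sketch only: the stage contents below are placeholders.
FROM debian:bookworm-slim AS build
RUN mkdir -p /opt/drill/bin && touch /opt/drill/bin/drillbit.sh

FROM debian:bookworm-slim
ENV DRILL_HOME=/opt/drill
# --chmod applies this one octal mode to every copied file and directory.
COPY --from=build --chmod=0755 /opt/drill $DRILL_HOME
```

Note that an octal mode such as 0755 marks every copied file as executable, which is not equivalent to chmod -R +r, so the two approaches differ in more than the BuildKit dependency.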
+1
…pache#2491) * Add a mountpoint and VOLUME for local storage to Dockerfile. * Address review comments concerning layer ordering. * Fix image size blowup by moving chmod to intermediate container. * Combine RUN commands in Dockerfile.
DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile
Description
Some users of embedded Drill in Docker want to use a Docker volume with Drill, particularly for Drill's persistent local storage so that Drill configuration can persist across container launches. Because Drill no longer runs as root in the Docker container as of 1.20, these users now need a mountpoint inside that container that has been chowned to drilluser.
Also remove unneeded permissions on the Drill installation directory that were held by drilluser.
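The mountpoint setup described above can be sketched roughly as follows. The base image, user name, and UID are assumptions that mirror the snippets quoted in the review, not the final Dockerfile.

```dockerfile
# A sketch only: base image, user name, and UID are assumptions.
FROM openjdk:8
ENV DRILL_USER=drilluser \
    DATA_VOL=/data
# Create the unprivileged user and a mountpoint that it owns.
RUN groupadd -g 999 $DRILL_USER \
 && useradd -r -u 999 -g $DRILL_USER $DRILL_USER \
 && mkdir $DATA_VOL \
 && chown $DRILL_USER: $DATA_VOL
# Declare the volume after the chown: build steps that modify a volume
# path after its VOLUME declaration may have their changes discarded.
VOLUME $DATA_VOL
USER $DRILL_USER
```

When an empty named volume is first mounted at /data (for example with docker run -v drill-data:/data, where drill-data is a hypothetical volume name), Docker initialises it from the image's directory, including its ownership, so the volume should be writable by drilluser.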
Documentation
Document the existence of the volume mountpoint at /data in the container.
Testing
Set sys.store.provider.local.path: "/data" and run test queries. Confirm that an option set with ALTER SYSTEM is persisted across container launches.