DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile #2491

Merged
jnturton merged 4 commits into apache:master on Mar 13, 2022

jnturton
Contributor

@jnturton jnturton commented Mar 9, 2022

DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile

Description

Some users of embedded Drill in Docker want to use a Docker volume with Drill, particularly for Drill's persistent local storage so that Drill configuration can persist across container launches. Because Drill no longer runs as root in the Docker container as of 1.20, these users now need a mountpoint inside that container that has been chowned to drilluser.

Also remove the unneeded permissions on the Drill installation directory that were held by drilluser.

Documentation

Document the existence of the volume mountpoint at /data in the container.

Testing

  1. Build an image using Drill master and the Dockerfile in this branch.
  2. Launch the image with default args (no volume mounted) and run test queries. Confirm that an option set with ALTER SYSTEM is lost across container launches.
  3. Launch the image with a host volume mounted at /data and sys.store.provider.local.path: "/data" and run test queries. Confirm that an option set with ALTER SYSTEM is persisted across container launches.
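
The three steps above could be scripted roughly as follows. This is only a sketch: the image tag is the one that appears in the `docker images` listing further down this thread, while the helper function names, the `DOCKER` override variable and the in-container path `/opt/drill/conf/drill-override.conf` are assumptions made for illustration. The override file passed to `run_persistent` is expected to set sys.store.provider.local.path to "/data".

```shell
#!/bin/sh
# Sketch of the manual test procedure. Set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-apache/drill:snapshot-openjdk-8}"

# Step 1: build an image from the Dockerfile in the current checkout.
build_image() {
    "$DOCKER" build -t "$IMAGE" .
}

# Step 2: no volume mounted, so an option set with ALTER SYSTEM should be
# gone once the container has been removed.
run_ephemeral() {
    "$DOCKER" run -it --rm "$IMAGE"
}

# Step 3: mount a host directory at /data, plus an override file (assumed
# path) that points sys.store.provider.local.path at "/data".
run_persistent() {
    host_dir="$1"
    override_conf="$2"
    "$DOCKER" run -it --rm \
        -v "$host_dir:/data" \
        -v "$override_conf:/opt/drill/conf/drill-override.conf:ro" \
        "$IMAGE"
}
```

Because Drill runs as a non-root user inside the container, the host directory given to `run_persistent` will typically need to be writable by that user on a Linux host.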

@jnturton jnturton requested a review from vvysotskyi March 9, 2022 11:02
Dockerfile Outdated
- RUN mkdir $DRILL_HOME
+ RUN mkdir $DRILL_HOME $DATA_VOL

  COPY --from=build /opt/drill $DRILL_HOME
Member

@vvysotskyi vvysotskyi Mar 9, 2022

With these changes, we will have a much larger size of the docker image, since base layers have a larger size. Please move the copy command as close to the end as possible.

Contributor Author

@vvysotskyi I don't think Docker is meant to duplicate data across layers this way. I think that each layer is supposed to be stored as a delta from the previous layer (even though it may be reported as having the cumulative size of the layers up to that point). So the layer ordering should not affect the size of the final image. Nevertheless, I have moved everything that I could above the COPY in the Dockerfile, and I do still worry about a size blowup because, when I list images, I see 1.47GB for the image from this Dockerfile, while pulling apache/drill:1.20.0-openjdk-8 gives me an image smaller than 1GB.

apache/drill               snapshot-openjdk-8   57306e5337db   3 minutes ago    1.47GB
apache/drill               1.20.0-openjdk-8     7479402ba1b3   6 days ago       983MB
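
One way to see where the extra size comes from is to list what each layer adds, using `docker history`. A minimal sketch follows; the function name and the `DOCKER` override variable are assumptions, and the tags in the usage comments are the ones from the listing above.

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"   # set to "echo" for a dry run

# Print the size contributed by each layer next to the instruction that
# created it, so an unexpectedly large layer stands out.
layer_sizes() {
    "$DOCKER" history --no-trunc --format '{{.Size}} {{.CreatedBy}}' "$1"
}

# Usage: compare the snapshot build against the released image.
# layer_sizes apache/drill:snapshot-openjdk-8
# layer_sizes apache/drill:1.20.0-openjdk-8
```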

Contributor Author

@vvysotskyi in the final commit I found and fixed the size blowup. The RUN chmod command was responsible for duplicating the entire Drill installation just to set file attributes, pretty lame CoW system that Docker has there. Anyway, now the RUN chmod is done in the intermediate container.
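
The fix described here can be illustrated with a stripped-down multi-stage Dockerfile. This is a sketch, not the Dockerfile from this PR: the `debian:stable-slim` base images, the placeholder file standing in for the real Drill build, the home directory path and the `Dockerfile.sketch` file name are all assumptions. The point it shows is that `chmod -R` runs in the build stage, so the final image receives the files once, through `COPY`, with their attributes already set.

```shell
#!/bin/sh
# Write an illustrative Dockerfile; the quoted 'EOF' keeps $VARS literal.
cat > Dockerfile.sketch <<'EOF'
# Build stage: stand-in for the real Drill build (assumption).
FROM debian:stable-slim AS build
RUN mkdir -p /opt/drill \
 && echo "placeholder for the Drill distribution" > /opt/drill/README \
 && chmod -R +r /opt/drill

# Final stage: only cheap metadata work happens before the COPY.
FROM debian:stable-slim
ENV DRILL_HOME=/opt/drill DRILL_USER=drilluser DATA_VOL=/data
RUN mkdir $DRILL_HOME $DATA_VOL \
 && groupadd -g 999 $DRILL_USER \
 && useradd -r -u 999 -g $DRILL_USER $DRILL_USER -m -d /var/lib/drill \
 && chown $DRILL_USER: $DATA_VOL
# Declared after the chown so the ownership change is part of the image.
VOLUME $DATA_VOL
# No RUN chmod or chown over $DRILL_HOME after this point.
COPY --from=build /opt/drill $DRILL_HOME
USER $DRILL_USER
EOF

# To try it (requires Docker):
# docker build -f Dockerfile.sketch -t drill-volume-sketch .
```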

Dockerfile Outdated
&& chown -R $DRILL_USER: $DRILL_HOME
&& useradd -r -u 999 -g $DRILL_USER $DRILL_USER -m -d $DRILL_USER_HOME \
&& chown $DRILL_USER: $DATA_VOL \
&& chmod -R +r $DRILL_HOME
Member

Instead of doing it here, you can use the --chmod flag when doing the COPY command.

Contributor Author

@vvysotskyi it introduces a dependency on something called BuildKit. Here is the output from my attempt to use this flag:

COPY --from=build --chmod=0755 /opt/drill $DRILL_HOME
the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled

Do you still think it's worth it?
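
For reference, BuildKit can be switched on for a single invocation through the `DOCKER_BUILDKIT=1` environment variable, which is what allows `COPY --chmod` to be accepted. A sketch, in which the function name and the `DOCKER` override variable are assumptions:

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"   # set to "echo" for a dry run

# Build the image in the current directory with BuildKit enabled for this
# one command only; the tag is passed as the first argument.
build_with_buildkit() {
    DOCKER_BUILDKIT=1 "$DOCKER" build -t "$1" .
}

# Usage:
# build_with_buildkit apache/drill:snapshot-openjdk-8
```

The trade-off raised in the question still applies: everyone who builds the image would then need a Docker version with BuildKit available.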

Member

@vvysotskyi vvysotskyi left a comment

+1

@jnturton jnturton merged commit 00d0cb1 into apache:master Mar 13, 2022
jnturton added a commit to jnturton/drill that referenced this pull request May 11, 2022
…pache#2491)

* Add a mountpoint and VOLUME for local storage to Dockerfile.

* Address review comments concerning layer ordering.

* Fix image size blowup by moving chmod to intermediate container.

* Combine RUN commands in Dockerfile.
jnturton added a commit that referenced this pull request May 12, 2022
* Prepare for the next bugfix iteration.

* SAS Reader fixes (#2472)

Co-authored-by: pseudomo <pseudomo@yandex.ru>

* Add jackson-bom (#2454)

* [DRILL-8150] log4j 2.17.2 in format-excel (#2476)

* DRILL-8151: Add support for more ElasticSearch and Cassandra data types (#2477)

* DRILL-8154: Upgrade to poi 5.2.1 (#2480)

* DRILL-8145: Fix flaky TestDrillbitResilience#memoryLeaksWhenCancelled test case (#2471)

* Set Brotli codec jar and test to occur only on Linux amd64.

* DRILL-8145: Fix flaky TestDrillbitResilience#memoryLeaksWhenCancelled test case

- changing timeout for TestDrillbitResilience tests
- timing tuning for memoryLeaksWhenCancelled
- update TestContainers version
- -DforkCount=1 for Travis Maven build
- directMemoryMb: 2500 -> 4500 leads to fewer occasional test failures

Co-authored-by: James Turton <james@somecomputer.xyz>

* [MINOR UPDATE] Add Stalebot Config (#2487)

* [MINOR UPDATE] Fix license for Stalebot Config (#2488)

* DRILL-8156: Declare and chown a /data VOLUME in the Drill Dockerfile (#2491)

* Add a mountpoint and VOLUME for local storage to Dockerfile.

* Address review comments concerning layer ordering.

* Fix image size blowup by moving chmod to intermediate container.

* Combine RUN commands in Dockerfile.

* DRILL-8168: Do not duplicate attempts to impersonate a user in the REST API (#2495)

* DRILL-8172: Use the specified memory usage for Travis CI (#2500)

* DRILL-8165: Upgrade liquibase because of CVE-2022-0839 (#2497)

* Create codeql-analysis.yml

* Update codeql-analysis.yml

Removed cpp from code analysis

* [MINOR UPDATE] Add license to CodeQL YAML (#2501)

* DRILL-8176: Upgrade Jackson Due to CVE-2020-36518 (#2504)

* DRILL-8164: Upgrade metadata-extractor because of CVE-2022-24613 (#2493)

* DRILL-8164: Upgrade metadata-extractor because of CVE-2022-24613

* Update the ProfileCopyright tag name

* Update the mov format name

* Add the QuickTime.Rotation tag

* Bump metadata-extractor to 2.17.0

* DRILL-8178: Bump AWS Libraries to Latest Version (#2506)

* DRILL-8175: Update Drill release script after 1.20 (#2503)

* Set DRILL_PID_DIR in Dockerfile to writable location for distributed mode.

Some users of the images built from this Dockerfile customise
them so that they launch Drill in distributed mode instead of
embedded mode.  This change saves them from having to set
DRILL_PID_DIR themselves in order to succeed.

* Update release script and instructions after the release of 1.20.

- Add support for specifying a build profile such as "hadoop-2".
- Update instructions for the Drill web site.
- Update instructions for uploading RCs (no more home.apache)
- Some fixes.

* DRILL-8176: minor issue in previous jackson bom (#2508)

* minor issue in previous jackson bom

* Update pom.xml

* DRILL-8187: Dialect factory returns ANSI SQL dialect for BigQuery (#2513)

* DRILL-8192: Cassandra queries fail when enabled Mongo plugin (#2518)

* DRILL-8013: Drill attempts to push "$SUM0" to JDBC storage plugin for AVG (#2521)

* DRILL-8194: Fix the function of REPLACE throws IndexOutOfBoundsException If text's length is more than previously applied (#2522)

* DRILL-8200: Update Hadoop libs to ≥ 3.2.3 for CVE-2022-26612 (#2525)

* Remove pointless Buffer casts.

Compiling Drill with JDK > 8 will still result in ByteBuffer <-> Buffer cast
exceptions at runtime when running on JDK 8 even though maven.target.version
is set to 8. Setting maven.compiler.release to 8 solves the Buffer casts
but raises a compilation error of package sun.security.jgss does not exist
for JDK 8. There were a few handwritten casts to avoid the Buffer casting
issue but many instances are not covered so the few reverted in this commit
achieve nothing.

* Update Hadoop to 3.2.3.

* [MINOR UPDATE] Update AWS Java SDK to 1.12.211

* DRILL-8219: Handle null catalog names returned by DB2 in storage-jdbc. (#2542)

Co-authored-by: pseudomo <yura_levchenko@mail.ru>
Co-authored-by: pseudomo <pseudomo@yandex.ru>
Co-authored-by: Rymar Maksym <rim.maxim+dev@gmail.com>
Co-authored-by: PJ Fanning <pjfanning@users.noreply.github.com>
Co-authored-by: Volodymyr Vysotskyi <vvovyk@gmail.com>
Co-authored-by: Vitalii Diravka <vitalii@apache.org>
Co-authored-by: Charles S. Givre <cgivre@apache.org>
Co-authored-by: luoc <luoc@apache.org>
Co-authored-by: xurenhe <xurenhe19910131@gmail.com>
jnturton added a commit to jnturton/drill that referenced this pull request Jul 11, 2022
…pache#2491)

* Add a mountpoint and VOLUME for local storage to Dockerfile.

* Address review comments concerning layer ordering.

* Fix image size blowup by moving chmod to intermediate container.

* Combine RUN commands in Dockerfile.