Refactor Docker image to allow faster rebuilds #23

Merged
brianloss merged 4 commits into apache:next-release from brianloss:split-layers
Sep 19, 2022

@brianloss
Member

Convert the existing Docker image into a multi-stage image with
independent layers for the base, Hadoop, Zookeeper, and Accumulo tarball
download/extraction (and native library build in the Accumulo case).
Because each install comes from a separate base image, we can modify
the file/version for any one of the packages and reuse the cached build
for the others, which greatly improves build times for a developer who
is iterating on Accumulo, for example. Also, because a separate builder
base is used, the larger JDK and make tools are installed to build the
Accumulo native libraries but are not included in the final
image.

* Update the docker build setup to separate the Accumulo installation
  from the Hadoop/Zookeeper installation. This makes repeated builds in
  which only the Accumulo layer changes much faster.
  - Move build arg declarations to just before they are used, since
    changing a build arg will invalidate all layers after it in the
    Dockerfile.
* Replace empty "_FILE" build args with a default of _NOT_SET and check
  for that default when deciding whether or not to download the tarball
  from Apache servers. A "COPY" command with an empty build arg ends up
  copying the entire build context into the image, which is not what we
  want.
Contributor

@keith-turner keith-turner left a comment

This looks good. I like how jdk-headless ends up in the final image, while jdk-devel is used only to build the intermediate image and is then discarded.

* Change glob patterns for extracted archive copy so that it does not
  depend on the "*_VERSION" variables. Now one can change the included
  file without having to update the corresponding VERSION variable.
* Fix a potential issue where if the "_FILE" arg is specified but names
  a non-existent file, the download script would silently try to
  download whatever version was specified in the "_VERSION" variable.
  This could lead to an unintended version getting included in the final
  image. Instead, the download script now fails if a "_FILE" build arg
  is set, but the corresponding file does not exist.
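
The "_FILE" handling described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's actual download script: the function name `resolve_tarball` is hypothetical, and the tarball name pattern is assumed from the README example in this thread.

```shell
#!/bin/sh
# Hypothetical helper, NOT the project's actual download script.
#   $1: value of the "_FILE" build arg (or the _NOT_SET default)
#   $2: value of the "_VERSION" build arg
# Prints the action a download script would take, and fails when a
# file was named but does not exist.
resolve_tarball() {
  file="$1"
  version="$2"
  if [ "$file" = "_NOT_SET" ]; then
    # No local tarball was requested: fall back to downloading by version.
    echo "download accumulo-${version}-bin.tar.gz"
  elif [ -f "$file" ]; then
    # A local tarball was supplied and exists: use it and ignore the version.
    echo "use $file"
  else
    # A file was named but is missing: fail rather than silently
    # downloading whatever the version variable points at.
    echo "error: $file does not exist" >&2
    return 1
  fi
}
```

For example, `resolve_tarball _NOT_SET 2.0.0-SNAPSHOT` prints `download accumulo-2.0.0-SNAPSHOT-bin.tar.gz`, while naming a missing file returns a non-zero status so the build stops.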
@keith-turner
Contributor

There is a line in the readme that can be changed now. Can omit the version in the following line from the readme.

docker build --build-arg ACCUMULO_VERSION=2.0.0-SNAPSHOT --build-arg ACCUMULO_FILE=accumulo-2.0.0-SNAPSHOT-bin.tar.gz -t accumulo .

@brianloss
Member Author

> There is a line in the readme that can be changed now. Can omit the version in the following line from the readme.
>
> docker build --build-arg ACCUMULO_VERSION=2.0.0-SNAPSHOT --build-arg ACCUMULO_FILE=accumulo-2.0.0-SNAPSHOT-bin.tar.gz -t accumulo .

Good catch. Will change...

@brianloss brianloss merged commit 7cf9ca9 into apache:next-release Sep 19, 2022
@brianloss brianloss deleted the split-layers branch September 19, 2022 22:39