Make rebuilding the docker image for a new Accumulo snapshot faster (ver 2) by keith-turner · Pull Request #22 · apache/accumulo-docker

keith-turner · 2022-09-19T16:26:29Z

See #20 for background. This PR is a replacement of that PR based on feedback from that PR.

This PR used a single docker file and broke the work into 2 run commands. With this structure in the docker file, when there is a new accumulo snapshot tarball it only rebuilds the accumulo and later layers. Takes 5 seconds. Two docker files as in #20 is a bit faster because it does not have to upload the hadoop and ZK tarballs for the accumulo only build. However that is only a few seconds and the extra complexity would not make that worthwhile.

Setting this as a draft again as I have not updated the README. @brianloss and @dlmarion how do these changes look?

dlmarion · 2022-09-19T17:40:43Z

download.sh

@@ -0,0 +1,25 @@
+set -eux;
+
+ACCUMULO_VERSION="2.0.1"


These versions are wrong for Accumulo 2.1.0-SNAPSHOT

Yeah--looks like in the original Dockerfile, these versions were 2.1.0, 3.3.3, and 3.8.0

dlmarion · 2022-09-19T17:48:56Z

Is the intention for the user to run download.sh, and then build the docker image?

brianloss · 2022-09-19T18:08:02Z

Is the intention for the user to run download.sh, and then build the docker image?

I was just having a side conversation with Keith, and that is the intent. I like that it makes the Dockerfile simpler/easier to read. However, it's also nice to be able to just build and change versions with a single command, which is good for new users or CI/CD. At a minimum, if it stays the way it is, the README file will need to be updated. At the same time Keith was doing that, I had taken a stab at this as well to see if it was going to work the way I thought. Here is what I have, for reference or if there's a preference to keep the download inside of the Dockerfile.

dlmarion · 2022-09-19T18:11:59Z

I was curious what would happen in this version of the PR if you changed the Hadoop or ZooKeeper versions in download.sh. Does that layer get rebuilt if download.sh is run again with different versions?

brianloss · 2022-09-19T18:14:56Z

That's a good catch. I think, since the downloaded files have no version info in them, the build args wouldn't change. Therefore, there'd really be nothing that changed and the cached layer would be used. That's my guess, but I suppose if docker is smart enough to say that the build context changed (since there are different files going in), then it's possible the cache would be invalidated. You'd have to test to see, but my guess is this will use the previously cached layer and not the new version of the downloaded file.

keith-turner · 2022-09-19T18:15:11Z

I was curious what would happen in this version of the PR if you changed the Hadoop or ZooKeeper versions in download.sh. Does that layer get rebuilt if download.sh is run again with different versions?

If the file is different it seems to rebuiild. I ran maven to rebuild accumulo and copied it to the docker dir a few times. Each time it would rebuild even though the file name was the same.

if there's a preference to keep the download inside of the Dockerfile.

I don't have strong feeling about moving the docker file in or out. The main thing I wanted to do was speed up the build for a new accumulo snapshot and thats done w/ the two run commands.

brianloss · 2022-09-19T18:16:17Z

If the file is different it seems to rebuiild. I ran maven to rebuild accumulo and copied it to the docker dir a few times. Each time it would rebuild even though the file name was the same.

Ah, good to know. Happy to be wrong about that. :)

keith-turner · 2022-09-19T18:18:21Z

Here is what I have, for reference or if there's a preference to keep the download inside of the Dockerfile.

I would be happy to abandon this PR in favor of those changes if you have preference for keeping DL in the docker build file.

brianloss · 2022-09-19T18:22:00Z

I would be happy to abandon this PR in favor of those changes if you have preference for keeping DL in the docker build file.

I don't have a strong opinion on it--there are tradeoffs either way.

dlmarion · 2022-09-19T18:22:22Z

There may be value in breaking out the Hadoop and ZK into separate layers too. I could see the following layers:

RockyLinux with dependencies and download script
Hadoop
ZooKeeper
Accumulo

brianloss · 2022-09-19T18:34:12Z

Yeah, you could experiment and see what the impact is on image size. The best practice is to minimize the number of layers, but there are definitely times when it makes sense to have more. It's likely that someone would want to iterate with multiple Accumulo builds in a cycle. I'm less sure about having separate layers for the other pieces, though I suppose it would speed up a CI/CD build that was building a matrix with different Accumulo, Hadoop, and Zookeeper versions. If there's a small space penalty, then it could be worth having the flexibility.

dlmarion · 2022-09-19T18:37:58Z

This is what I was thinking when I originally stated multi-stage builds. Like you said though, likely not needed. My branch is not complete.

keith-turner · 2022-09-19T19:50:52Z

I don't have a strong opinion on it--there are tradeoffs either way.

I have decided I like your version w/ download in docker because like you said its good for new users and it also satisfies what I was trying to achieve. I'll close this and you can submit a PR w/ your branch.

dlmarion · 2022-09-19T20:10:52Z

FWIW, I got my previous example working, then I modified it to be a multi-stage. The size of the image goes from 1.81GB to 1.6GB.

brianloss · 2022-09-19T20:24:18Z

FWIW, I got my previous example working, then I modified it to be a multi-stage. The size of the image goes from 1.81GB to 1.6GB.

@dlmarion I had originally tried to do each package in its own layer, but that wouldn't work with straight layers since invalidating a layer invalidates everything on top of it. However, multi-stage builds do work around that problem, so that's what I did. I also only installed the full JDK (which on Rocky Linux was pulling in X11 and other large deps) on the Accumulo build layer and then the java-11-openjdk-headless package (which only includes the JRE) in the main image. This did greatly reduce the size of the image. On my machine it went from 2.4 GB to 1.4 GB. This is all in pr #23.

keith-turner · 2022-09-19T21:07:02Z

On my machine it went from 2.4 GB to 1.4 GB.

For the image I built using this PR, it came in at 1.7GB. 1.4G is nice

keith-turner added 2 commits September 19, 2022 17:10

WIP Makes rebuilding for new Accumulo snapshot faster

4d5d9e5

WIP add command

1b3af78

keith-turner mentioned this pull request Sep 19, 2022

Make rebuilding the docker image for a new Accumulo snapshot faster. #20

Closed

dlmarion reviewed Sep 19, 2022

View reviewed changes

keith-turner closed this Sep 19, 2022

keith-turner mentioned this pull request Sep 19, 2022

Refactor Docker image to allow faster rebuilds #23

Merged

Conversation

keith-turner commented Sep 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dlmarion Sep 19, 2022

Choose a reason for hiding this comment

Uh oh!

brianloss Sep 19, 2022

Choose a reason for hiding this comment

Uh oh!

dlmarion commented Sep 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

dlmarion commented Sep 19, 2022

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

keith-turner commented Sep 19, 2022

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

keith-turner commented Sep 19, 2022

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

dlmarion commented Sep 19, 2022

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

dlmarion commented Sep 19, 2022

Uh oh!

keith-turner commented Sep 19, 2022

Uh oh!

dlmarion commented Sep 19, 2022

Uh oh!

brianloss commented Sep 19, 2022

Uh oh!

keith-turner commented Sep 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

keith-turner commented Sep 19, 2022 •

edited

Loading

dlmarion commented Sep 19, 2022 •

edited

Loading