Optimize docker images #1744

Closed
dimabarbul opened this issue Sep 10, 2023 · 1 comment · Fixed by #1745

@dimabarbul
Contributor

The following are suggestions for the Dockerfiles used to build Ditto images.

Suggestions

Use JRE instead of JDK

I'm not a Java developer, so I might be missing something basic here, but isn't a JRE enough to run Ditto? I tried building the images locally using eclipse-temurin:17-jre and everything worked fine. I was even able to attach a debugger (after adding some Java parameters, of course), so it should also be fine for the developer version (file dockerfile-snapshot).

The JRE image is smaller than the JDK image, so it should save traffic and disk space for everyone.

Use curl instead of wget

dockerfile-release installs and uses wget. curl is already available in the base image and provides the needed functionality.

Clean up after apt-get update

apt-get update downloads package index files which, if not removed, end up in the final image. According to what I found online, adding rm -rf /var/lib/apt/lists/* to the same RUN step should be enough.
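A minimal sketch of that cleanup, written as a shell snippet that only generates a Dockerfile. The file name Dockerfile.apt-cleanup is made up for illustration, and the base image and the tini package are taken from elsewhere in this issue rather than from the real dockerfile-release. The important part is that the rm runs in the same RUN step as apt-get update, otherwise the index files stay in an earlier layer.

```shell
# Write the sketch to a file (hypothetical name, not part of the Ditto repo).
cat > Dockerfile.apt-cleanup <<'DF'
FROM docker.io/eclipse-temurin:17-jre
# Update, install and clean up in ONE RUN step, so the package index
# downloaded by apt-get update never becomes part of a committed layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
DF

# To try it out (requires Docker):
#   docker build - < Dockerfile.apt-cleanup
```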

Reuse build cache

Split steps into reusable parts

Each RUN step produces a new layer. If the command text matches a cached layer (and both are based on the same parent layer), the cached layer is reused, which saves build time and (I guess) traffic when downloading images that share that layer.

Splitting RUN steps into static and dynamic parts will produce more layers, but the layers from static commands will be reused.

Move ARG and ENV right before where they are used

ARG and ENV implicitly invalidate the cache for the steps that follow them when their values change, even if those steps do not use them. For example, running the following command will build an image:

docker build - --build-arg VERSION=1 <<DF
FROM docker.io/eclipse-temurin:17-jre
ARG VERSION
RUN apt-get update
DF

Running the same command again with the same VERSION will reuse the build cache. But if you change the VERSION value, the cached layer will not be reused, even though the RUN command does not use the VERSION argument.
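A sketch of the reordered variant, under the assumption that the static step does not need VERSION at all: the apt-get step comes first and ARG VERSION is declared right before the first step that uses it. The file name Dockerfile.arg-order and the final echo step are placeholders for illustration, not taken from the Ditto dockerfiles.

```shell
# Generate the example Dockerfile (hypothetical file name).
cat > Dockerfile.arg-order <<'DF'
FROM docker.io/eclipse-temurin:17-jre
# Static step: placed before any ARG, so its cache entry does not depend on VERSION.
RUN apt-get update \
    && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
# Dynamic part: ARG is declared right before the step that uses it.
ARG VERSION
RUN echo "building version ${VERSION}"
DF

# To try it out (requires Docker), build twice with different values; the
# apt-get step should be reported as cached on the second build:
#   docker build --build-arg VERSION=1 - < Dockerfile.arg-order
#   docker build --build-arg VERSION=2 - < Dockerfile.arg-order
```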

Results

The nightly tag is the Ditto nightly image; 0-SNAPSHOT is the image I built locally with the updated dockerfile-release:

DOCKERFILE=dockerfile-release SERVICE_VERSION=0-RELEASE ./build-images.sh
REPOSITORY                    TAG          SIZE
eclipse/ditto-connectivity    0-SNAPSHOT   369MB
eclipse/ditto-connectivity    nightly      550MB
eclipse/ditto-gateway         0-SNAPSHOT   341MB
eclipse/ditto-gateway         nightly      522MB
eclipse/ditto-policies        0-SNAPSHOT   338MB
eclipse/ditto-policies        nightly      519MB
eclipse/ditto-things          0-SNAPSHOT   340MB
eclipse/ditto-things          nightly      521MB
eclipse/ditto-things-search   0-SNAPSHOT   340MB
eclipse/ditto-things-search   nightly      521MB
eclipse/ditto-ui              0-SNAPSHOT   43.6MB
eclipse/ditto-ui              nightly      43.6MB

Drawbacks

Cache

Sometimes it makes sense to build images without the cache, for instance to get a fresh version of tini.

If you want a clean build, i.e., one that does not use cached layers, you have several options:

  • add the --no-cache option to docker build
  • remove cached layers using docker builder prune -f
    Warning: this command does not remove layers that belong to existing images, so you first need to remove every image that uses the layer and then run the command.

It might be good to introduce something like a NO_DOCKER_CACHE variable for the build-images.sh script that passes the --no-cache option if set.
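One possible shape for this, as a rough sketch only: cache_flag is a hypothetical helper name, and how build-images.sh actually assembles its docker build command is not shown in this issue, so the call site below is an assumption.

```shell
# Hypothetical helper for build-images.sh: prints "--no-cache" when
# NO_DOCKER_CACHE is set to a non-empty value, and nothing otherwise.
cache_flag() {
  if [ -n "${NO_DOCKER_CACHE:-}" ]; then
    printf '%s' "--no-cache"
  fi
}

# Assumed call site; left unquoted on purpose so that an empty result
# expands to no argument at all:
#   docker build $(cache_flag) -f "$DOCKERFILE" .
```

With this in place, NO_DOCKER_CACHE=1 ./build-images.sh would request a clean build, while the default behaviour stays unchanged.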

In the pipeline script, it looks like buildx is installed right before the images are built, so a stale cache should not be an issue there. Moreover, the build will run static RUN steps (such as apt-get update && apt-get install tini) once and then reuse them for the following images, which seems better to me than building entirely without cache and running all static RUN steps for each image.

Links

Best practices for writing Dockerfiles
Optimizing builds with cache management
Layers

@thjaeckle
Member

Hi @dimabarbul
I don't really agree that swapping the JDK for a JRE is a good idea.

Yes, Ditto will also run with a JRE.
However, a JRE does not come with many helpful tools, e.g. for creating heap dumps or inspecting the JVM parameters at runtime by exec-ing into the container and invoking, for example, jcmd.

See also https://www.baeldung.com/running-jvm-diagnose

I would not want to limit these possibilities to just save some MB when pulling the first of the 6 Ditto images.

The other changes you suggest sound good 👍

@thjaeckle added this to the 3.4.0 milestone on Sep 11, 2023