Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-4057 Docker Image is twice as large as necessary #1910

Closed
wants to merge 1 commit into from
Closed

NIFI-4057 Docker Image is twice as large as necessary #1910

wants to merge 1 commit into from

Conversation

OneCricketeer
Copy link

@OneCricketeer OneCricketeer commented Jun 11, 2017

Resulting Docker image is now 1.29 GB instead of 2.58 GB

Detailed changes

  • Merging unnecessary layers
  • MAINTAINER is deprecated
  • Using JRE as base since JDK is not necessary
  • Set GID=1000 since openjdk image already defines 50
  • Add ability to specify Apache mirror site to reduce load on Apache Archive
  • Created templates directory since this is not included in the binary

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

Copy link

@taftster taftster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some thoughts for consideration.

RUN curl -fSL $NIFI_BINARY_URL -o $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \
&& echo "$(curl $NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \
RUN curl -fSL $MIRROR/$NIFI_BINARY_URL -o $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \
&& echo "$(curl https://archive.apache.org/dist/$NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this inner sha256 curl also use $MIRROR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mirrors don't include the sha

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ha! OK then!

RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1`
RUN useradd --shell /bin/bash -u $UID -g $GID -m nifi
RUN mkdir -p $NIFI_HOME
RUN groupadd nifi && useradd nifi --shell /bin/bash -u $UID -m -g nifi \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your groupadd command leaves off the "-g $GID" from the previous commit. Since $GID is being specified and passed in from DockerBuild.sh, it's probably best that we continue to create the group with the specified $GID. If it's a valid configuration option, it needs to be used as the group id here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding of the JIRA comments, the GID could be removed since useradd makes the group. And GID 50 already existed, anyway

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just saying, if we're going to remove $GID, we should remove it entirely. But if we're going to support it (i.e. by having an option in the Dockerbuild.sh), then we should support it entirely. Line 31 currently ignores $GID.

EXPOSE 8181

USER nifi
VOLUME conf/templates

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of files in conf/ that would likely be good candidates for VOLUME export (flow.xml.gz, nifi.properties, bootstrap.conf, etc.). Maybe this should just be: "VOLUME conf"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that be an extended discussion elsewhere? While I might agree, at least for nifi.properties, the primary goal of this PR was to cut down the image size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me. No problem separating that as a separate concern.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it from the latest commit. One can always mount the folder if needed.

@taftster
Copy link

taftster commented Jun 11, 2017

Changes to this Dockerfile should also be reciprocally made to ../dockermaven/Dockerfile

  * Merging unnecessary layers
  * MAINTAINER is deprecated
  * Using JRE as base since JDK is not necessary
  * Set GID=1000 since openjdk image already defines 50
  * Add ability to specify Apache mirror site to reduce load on Apache Archive
  * Created templates directory since this is not included in the binary
@OneCricketeer
Copy link
Author

@taftster I updated the Maven version as well, and I believe it built successfully after updating the pom.xml version.

Was there anything else needing to be addressed?

Copy link

@taftster taftster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks real good. I noticed two very small things (see my comments). Not sure if you care to want to fix them or not.

RUN mkdir -p $NIFI_HOME
RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1` \
&& useradd --shell /bin/bash -u $UID -g $GID -m nifi \
&& mkdir -p $NIFI_HOME/conf/templates \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 'mkdir' leftover from the previous pull request? Specifically, didn't we say we'd address the exposed VOLUME directories in another PR? It's not hurting anything, of course, but I don't think it was in the original Dockerfile either.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conf/templates directory is not part of the binary, so as a result, I don't think you can save templates as part of running the Docker container.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, wow. Yeah, that would be a problem. Good catch! Definitely need this then.

&& echo "$(curl https://archive.apache.org/dist/$NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \
&& tar -xvzf $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz -C $NIFI_BASE_DIR \
&& rm $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \
&& chown -R nifi:nifi $NIFI_HOME

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chown here shouldn't be necessary? Since NIFI_BASE_DIR has already been chowned to the NIFI user, which we're using for the curl statement. Is that right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be necessary, no. Doesn't seem to be hurting anything, though.
I think I did it because I didn't trust the previous chown -R to hold after the curl, but since using USER beforehand, that does make it redundant.

@apiri
Copy link
Member

apiri commented Jul 17, 2017

reviewing

RUN mkdir -p $NIFI_HOME
RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1` \
&& useradd --shell /bin/bash -u $UID -g $GID -m nifi \
&& mkdir -p $NIFI_HOME/conf/templates
Copy link
Member

@apiri apiri Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a \

Everything else looks pretty good and just verifying successful build. If so, I can adjust this on merge.

&& mkdir -p $NIFI_HOME/conf/templates
&& chown -R nifi:nifi $NIFI_BASE_DIR

USER nifi

ADD $NIFI_BINARY $NIFI_BASE_DIR
RUN chown -R nifi:nifi $NIFI_HOME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the chown was an issue after user. Moving USER below the chown seems to work appropriately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this causes the duplicate layer issue again. Bit of a different environment as we are not able to add & chmod the files in the same sequence given the nature of the ADD command. May just have to be a concession we make for the local environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to merge this in as I don't believe we have a better way to work around this for the local case and presumably there is not another way without getting overly complex.

@asfgit asfgit closed this in 3da8b94 Jul 17, 2017
@OneCricketeer
Copy link
Author

Is there anyway to re-publish the 1.2.0 and 1.3.0 images so that those aren't as large?

@apiri
Copy link
Member

apiri commented Jul 18, 2017

@Cricket007 we cannot given ASF guidelines unless we do another release. If there is a patch release that is created, we can certainly feed this in as part of that effort.

@OneCricketeer
Copy link
Author

@apiri I understand that you cannot push code changes for a release < 1.4.0, but this isn't changing source code, only pushing a different Dockerfile to DockerHub.

@apiri
Copy link
Member

apiri commented Jul 18, 2017

@Cricket007 The Dockerfiles are part of the entire NiFi codebase. There has been some discussion about breaking the Docker components into their own repository which would free us from this situation and allow us to release more frequently/independently. However, in the current state, another release would be needed as the tagged release triggers Docker Hub to build.

@OneCricketeer
Copy link
Author

I understand the release tagging and automated build hooks of DockerHub, but I don't see why whoever has permission to push to DockerHub cannot manually overwrite the current "bloated" images.

@apiri
Copy link
Member

apiri commented Jul 18, 2017

Those images are created automatically from release tags (e.g. rel/nifi-1.2.0) on the source code repository. Those tags are protected and immutable on the repository; once they are created they cannot be overwritten whatsoever. It is this standard that prevents us from replacing the images. As a project community, we have no permissions to directly interact with Apache Docker Hub account which is managed by the Infra team.

As mentioned, if a patch version of either those comes out, we can port these changes to that release branch as well. If we were to break docker components out into their own repo, we could overwrite those tags on Docker Hub. Unfortunately, until the source repo of the Dockerfile changes, those versions in Docker Hub are locked to the associated commits that generated them.

@jimzucker
Copy link

When can we get a 1.3.1 with this fix, happy to help if that is needed but the current image is terribly large.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants