New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-4057 Docker Image is twice as large as necessary #1910
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some thoughts for consideration.
nifi-docker/dockerhub/Dockerfile
Outdated
RUN curl -fSL $NIFI_BINARY_URL -o $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \ | ||
&& echo "$(curl $NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \ | ||
RUN curl -fSL $MIRROR/$NIFI_BINARY_URL -o $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \ | ||
&& echo "$(curl https://archive.apache.org/dist/$NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this inner sha256 curl also use $MIRROR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mirrors don't include the sha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ha! OK then!
nifi-docker/dockerhub/Dockerfile
Outdated
RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1` | ||
RUN useradd --shell /bin/bash -u $UID -g $GID -m nifi | ||
RUN mkdir -p $NIFI_HOME | ||
RUN groupadd nifi && useradd nifi --shell /bin/bash -u $UID -m -g nifi \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your groupadd command leaves off the "-g $GID" from the previous commit. Since $GID is being specified and passed in from DockerBuild.sh, it's probably best that we continue to create the group with the specified $GID. If it's a valid configuration option, it needs to be used as the group id here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding of the JIRA comments, the GID could be removed since useradd makes the group. And GID 50 already existed, anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just saying, if we're going to remove $GID, we should remove it entirely. But if we're going to support it (i.e. by having an option in the Dockerbuild.sh), then we should support it entirely. Line 31 currently ignores $GID.
nifi-docker/dockerhub/Dockerfile
Outdated
EXPOSE 8181 | ||
|
||
USER nifi | ||
VOLUME conf/templates | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are lots of files in conf/ that would likely be good candidates for VOLUME export (flow.xml.gz, nifi.properties, bootstrap.conf, etc.). Maybe this should just be: "VOLUME conf"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can that be an extended discussion elsewhere? While I might agree, at least for nifi.properties, the primary goal of this PR was to cut down the image size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me. No problem separating that as a separate concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed it from the latest commit. One can always mount the folder if needed.
Changes to this Dockerfile should also be reciprocally made to ../dockermaven/Dockerfile |
* Merging unnecessary layers * MAINTAINER is deprecated * Using JRE as base since JDK is not necessary * Set GID=1000 since openjdk image already defines 50 * Add ability to specify Apache mirror site to reduce load on Apache Archive * Created templates directory since this is not included in the binary
@taftster I updated the Maven version as well, and I believe it built successfully after updating the Was there anything else needing to be addressed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks real good. I noticed two very small things (see my comments). Not sure if you care to want to fix them or not.
RUN mkdir -p $NIFI_HOME | ||
RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1` \ | ||
&& useradd --shell /bin/bash -u $UID -g $GID -m nifi \ | ||
&& mkdir -p $NIFI_HOME/conf/templates \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this 'mkdir' leftover from the previous pull request? Specifically, didn't we say we'd address the exposed VOLUME directories in another PR? It's not hurting anything, of course, but I don't think it was in the original Dockerfile either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conf/templates
directory is not part of the binary, so as a result, I don't think you can save templates as part of running the Docker container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, wow. Yeah, that would be a problem. Good catch! Definitely need this then.
&& echo "$(curl https://archive.apache.org/dist/$NIFI_BINARY_URL.sha256) *$NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz" | sha256sum -c - \ | ||
&& tar -xvzf $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz -C $NIFI_BASE_DIR \ | ||
&& rm $NIFI_BASE_DIR/nifi-$NIFI_VERSION-bin.tar.gz \ | ||
&& chown -R nifi:nifi $NIFI_HOME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chown here shouldn't be necessary? Since NIFI_BASE_DIR has already been chowned to the NIFI user, which we're using for the curl statement. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be necessary, no. Doesn't seem to be hurting anything, though.
I think I did it because I didn't trust the previous chown -R
to hold after the curl
, but since using USER
beforehand, that does make it redundant.
reviewing |
RUN mkdir -p $NIFI_HOME | ||
RUN groupadd -g $GID nifi || groupmod -n nifi `getent group $GID | cut -d: -f1` \ | ||
&& useradd --shell /bin/bash -u $UID -g $GID -m nifi \ | ||
&& mkdir -p $NIFI_HOME/conf/templates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a \
Everything else looks pretty good and just verifying successful build. If so, I can adjust this on merge.
&& mkdir -p $NIFI_HOME/conf/templates | ||
&& chown -R nifi:nifi $NIFI_BASE_DIR | ||
|
||
USER nifi | ||
|
||
ADD $NIFI_BINARY $NIFI_BASE_DIR | ||
RUN chown -R nifi:nifi $NIFI_HOME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the chown was an issue after user. Moving USER below the chown seems to work appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this causes the duplicate layer issue again. Bit of a different environment as we are not able to add & chmod the files in the same sequence given the nature of the ADD command. May just have to be a concession we make for the local environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to merge this in as I don't believe we have a better way to work around this for the local case and presumably there is not another way without getting overly complex.
Is there anyway to re-publish the 1.2.0 and 1.3.0 images so that those aren't as large? |
@Cricket007 we cannot given ASF guidelines unless we do another release. If there is a patch release that is created, we can certainly feed this in as part of that effort. |
@apiri I understand that you cannot push code changes for a release < 1.4.0, but this isn't changing source code, only pushing a different Dockerfile to DockerHub. |
@Cricket007 The Dockerfiles are part of the entire NiFi codebase. There has been some discussion about breaking the Docker components into their own repository which would free us from this situation and allow us to release more frequently/independently. However, in the current state, another release would be needed as the tagged release triggers Docker Hub to build. |
I understand the release tagging and automated build hooks of DockerHub, but I don't see why whoever has permission to push to DockerHub cannot manually overwrite the current "bloated" images. |
Those images are created automatically from release tags (e.g. rel/nifi-1.2.0) on the source code repository. Those tags are protected and immutable on the repository; once they are created they cannot be overwritten whatsoever. It is this standard that prevents us from replacing the images. As a project community, we have no permissions to directly interact with Apache Docker Hub account which is managed by the Infra team. As mentioned, if a patch version of either those comes out, we can port these changes to that release branch as well. If we were to break docker components out into their own repo, we could overwrite those tags on Docker Hub. Unfortunately, until the source repo of the Dockerfile changes, those versions in Docker Hub are locked to the associated commits that generated them. |
When can we get a 1.3.1 with this fix, happy to help if that is needed but the current image is terribly large. |
Resulting Docker image is now 1.29 GB instead of 2.58 GB
Detailed changes
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.