HADOOP-17880. Build 2.10.x with docker #3349

Closed

@ZhendongBai ZhendongBai commented Aug 28, 2021

Description of PR

This PR adds support for building Hadoop 2.10.x with Docker.

How was this patch tested?

Tested on macOS x86_64.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id? HADOOP-17880
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@goiri
Member

goiri commented Aug 30, 2021

@GauthamBanasandra can you take a look at this PR?

@GauthamBanasandra
Member

@ZhendongBai could you please mention the HADOOP JIRA in the title of this PR?

@ZhendongBai
Author

ZhendongBai commented Aug 31, 2021

> @ZhendongBai could you please mention the HADOOP JIRA in the title of this PR?

@GauthamBanasandra thanks a lot for your review. I have just created a Hadoop JIRA issue with a description and mentioned it in the PR description. Please review again.

@@ -18,234 +17,80 @@
# Dockerfile for installing the necessary dependencies for building Hadoop.
# See BUILDING.txt.

-FROM ubuntu:xenial
+FROM centos:7
Member

We've been using Ubuntu as the default build environment. May I know why you would want to use Centos 7 instead of Ubuntu Xenial?

Author

@GauthamBanasandra on Ubuntu, the openjdk-7-jdk package has no installation candidate. The error is shown below:

.....
ERROR [ 8/27] RUN apt-get -q update     && apt-get -q install -y --no-install-recommends openjdk-7-jdk     && apt-get clean     && rm -rf /var/lib/apt/lists/*
#11 94.95 Get:20 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [12.7 kB]
#11 95.07 Fetched 19.4 MB in 1min 34s (205 kB/s)
#11 95.07 Reading package lists...
#11 95.99 Reading package lists...
#11 96.88 Building dependency tree...
#11 97.02 Reading state information...
#11 97.04 Package openjdk-7-jdk is not available, but is referred to by another package.
#11 97.04 This may mean that the package is missing, has been obsoleted, or
#11 97.04 is only available from another source
#11 97.04
#11 97.04 E: Package 'openjdk-7-jdk' has no installation candidate

Member

That's because Java 7 isn't available in the Ubuntu Xenial toolchain. Have you tried installing Java 7 by downloading its tarball? Maybe this'll help - https://linuxconfig.org/oracle-java-jdk-7-on-ubuntu-linux-source-or-rpm-installation

I would advise against changing the base image to Centos 7 if this is the only reason. Here are my suggestions -

  1. If you still want to use Centos 7, I would suggest that you create a separate Dockerfile for it (like Dockerfile_centos_7) and leave the original Dockerfile as it is.
  2. Or, just install Java 7 from the tarball in Dockerfile.
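
Option 2 could look roughly like the following shell step, which a Dockerfile `RUN` line might call. This is only a sketch: the function name `install_jdk_tarball`, the tarball path, and the install prefix are hypothetical, and it assumes the JDK archive was downloaded beforehand and contains a single top-level directory.

```shell
# Hypothetical helper: unpack a previously downloaded JDK tarball and
# print the directory that could be used as JAVA_HOME.
install_jdk_tarball() {
  tarball="$1"   # path to the JDK .tar.gz (assumed to exist already)
  dest="$2"      # install prefix, for example /opt/java (an assumption)

  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1

  # Assumes the archive has exactly one top-level directory.
  jdk_dir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  printf '%s\n' "$dest/$jdk_dir"
}
```

A caller might then set `JAVA_HOME=$(install_jdk_tarball /tmp/jdk7.tar.gz /opt/java)`, where both paths are placeholders rather than values from this PR.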

Author

@ZhendongBai commented Sep 10, 2021

@GauthamBanasandra OK, thanks a lot. Besides the missing JDK 7 package, pylint could not be installed successfully, and the Python dependencies have problems. So I have decided to give up fixing the Ubuntu issues and will create a separate Dockerfile for CentOS later.

@GauthamBanasandra
Member

>> @ZhendongBai could you please mention the HADOOP JIRA in the title of this PR?

> @GauthamBanasandra thanks a lot for your review. I have just created a Hadoop JIRA issue with a description and mentioned it in the PR description. Please review again.

@ZhendongBai you would need to mention the Hadoop JIRA in the title of the PR, as mentioned in the description.

@ZhendongBai changed the title from "build 2.10.x with docker only." to "HADOOP-17880. build 2.10.x with docker only." on Aug 31, 2021
@ZhendongBai
Author

>>> @ZhendongBai could you please mention the HADOOP JIRA in the title of this PR?

>> @GauthamBanasandra thanks a lot for your review. I have just created a Hadoop JIRA issue with a description and mentioned it in the PR description. Please review again.

> @ZhendongBai you would need to mention the Hadoop JIRA in the title of the PR, as mentioned in the description.

@GauthamBanasandra OK, I have added the JIRA ID at the start of the PR title.

@ZhendongBai
Author

@GauthamBanasandra I have updated the PR. It now adds a new Dockerfile named Dockerfile_centos7 and leaves the old Dockerfile unchanged, for these reasons:

  1. openjdk-7-jdk is not found in the ppa:openjdk-r/ppa repo.
  2. shellcheck is not found in the ppa:jonathonf/ghc-8.0.2 repo.
  3. pylint==1.9.2 cannot be installed successfully because of compatibility problems with pip, setuptools, and other unknown dependencies.

@GauthamBanasandra
Member

OK, sounds good @ZhendongBai. Since there's no CI set up for this branch, could you please build this branch locally, tee the output of the build command, and upload the build logs here?
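
One way to "tee" a build is to pipe the combined stdout and stderr of the command through `tee`, so the output stays visible on the terminal while a copy lands in a log file. A minimal sketch; the helper name `run_logged` and the log file name are hypothetical:

```shell
# Hypothetical helper: run a command, show its output, and keep a copy in a log file.
run_logged() {
  log_file="$1"
  shift
  # 2>&1 merges stderr into stdout so that tee captures both streams.
  # Note: without `set -o pipefail` (bash), the exit status is tee's, not the command's.
  "$@" 2>&1 | tee "$log_file"
}
```

For example, `run_logged build.log mvn package -Pdist,native -DskipTests -Dtar` would write everything Maven prints to `build.log`, ready to be uploaded.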

@ZhendongBai
Author

@GauthamBanasandra OK, I reran the docker build and mvn commands. After running ./start-build-env.sh in the project root directory, the image build log is here: docker_image_build.log. After running mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true in the Docker container, the project built successfully; the log is here: build.log.
Please take a look, and thanks a lot.

@GauthamBanasandra
Member

@ZhendongBai could you please use the following command to build and upload the build log?

mvn clean package -Dhttps.protocols=TLSv1.2 -DskipTests -Pnative,dist -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.zstd -Drequire.test.libhadoop -Pyarn-ui -Dtar > build.log 2>&1 &
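
The command above runs in the background with all output redirected to `build.log`, so nothing appears on the terminal. A small sketch of how the result might be checked afterwards, assuming Maven's usual `BUILD SUCCESS` summary line; the helper name `build_succeeded` is hypothetical:

```shell
# Hypothetical helper: succeed only if the Maven log reports a successful build.
build_succeeded() {
  grep -q 'BUILD SUCCESS' "$1"
}
```

While the build is running, `tail -f build.log` follows its progress; once the background job has finished (for instance after `wait` in the same shell), `build_succeeded build.log && echo "build passed"` reports the outcome.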

@@ -0,0 +1,96 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member

Could you please rename this file to Dockerfile_centos_7? Just so that we're consistent with the filename in trunk.

Author

@GauthamBanasandra I renamed Dockerfile_centos7 to Dockerfile_centos_7 to keep it consistent with the filename in trunk. I ran mvn clean package -Dhttps.protocols=TLSv1.2 -DskipTests -Pnative,dist -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.zstd -Drequire.test.libhadoop -Pyarn-ui -Dtar -Dmaven.javadoc.skip=true > build.log 2>&1, and the log is here: build.log. Because some Javadoc comments are invalid and the Javadoc check failed, I added -Dmaven.javadoc.skip=true to the build command. Please review again, thanks.

BUILDING.txt Outdated
as the proposed solution:
https://github.com/boot2docker/boot2docker/issues/64
An alternative solution to this problem is to install Linux native inside a virtual machine
and run your IDE and Docker etc inside that VM.
Member

You would need to restore these lines as Hadoop 2.10.x is now buildable using Dockerfile on Virtualbox (as of this commit).

Author

@GauthamBanasandra but once this PR is merged, at some point we will be able to build Hadoop 2.10.0 directly with just Docker. I don't know when BUILDING.txt should be changed or how to describe that.

Member

I think we should retain the ability to build using Virtualbox and your PR would just add the ability to build using Docker, instead of deprecating the ability to build using Virtualbox. With that said, you wouldn't need to modify the docs here.

Author

ok, I will revert the file.

@ZhendongBai
Author

@GauthamBanasandra please review again, thanks a lot.

Member

@GauthamBanasandra left a comment

Looks good to me, one final change that I would like to request from you @ZhendongBai - please change the title to something appropriate. Something like "HADOOP-17880. Build 2.10.x with docker", in both GitHub and JIRA. I'll approve this PR after you make this change.

@ZhendongBai changed the title from "HADOOP-17880. build 2.10.x with docker only." to "HADOOP-17880. Build 2.10.x with docker" on Sep 30, 2021
@ZhendongBai
Author

ZhendongBai commented Sep 30, 2021

> Looks good to me, one final change that I would like to request from you @ZhendongBai - please change the title to something appropriate. Something like "HADOOP-17880. Build 2.10.x with docker", in both GitHub and JIRA. I'll approve this PR after you make this change.

@GauthamBanasandra I have already changed the title of both the PR and the JIRA. This is my first contribution to Hadoop, so I'm sorry it has taken so much of your time, and thanks a lot.

@iwasakims
Member

@ZhendongBai You should submit the patch based on apache:branch-2.10 instead of apache:branch-2.10.0. branch-2.10.0 is an obsolete branch for the already released 2.10.0.
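
Moving a patch to a different base branch can be done by creating a fresh topic branch from the target branch and cherry-picking the existing commits onto it. A sketch under assumptions: the remote name, the topic branch name, and the helper `port_to_branch` are hypothetical, and the commits are assumed to apply cleanly.

```shell
# Hypothetical helper: recreate a change on top of another base branch.
# Usage: port_to_branch <remote> <base-branch> <new-topic-branch> <commit>...
port_to_branch() {
  remote="$1"
  base="$2"
  topic="$3"
  shift 3

  git fetch "$remote" "$base" &&                 # refresh the target base branch
    git checkout -b "$topic" "$remote/$base" &&  # start a new branch from it
    git cherry-pick "$@"                         # replay the existing commits
}
```

With a remote named `apache` pointing at the upstream repository (an assumption), `port_to_branch apache branch-2.10 my-topic <commit>` would produce a branch suitable for a new pull request against branch-2.10.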

RUN pkg-resolver/install-zstandard.sh centos:7
RUN pkg-resolver/install-yasm.sh centos:7
RUN pkg-resolver/install-protobuf.sh centos:7
RUN pkg-resolver/install-boost.sh centos:7
Member

Do we need Boost for branch-2.10? If we can omit it, the build time and footprint of the image can be reduced.

Member

I upgraded Boost in PR #2051, which was for Hadoop 3.x. Since that PR wasn't backported to Hadoop 2.x, Boost isn't needed here.

Member

@ZhendongBai if you make any change to the Dockerfile_centos_7, please do a local build and upload the build log as a comment so that we can verify. We need this step since we don't have a pre-commit CI for Hadoop 2.x branch.

Author

> @ZhendongBai if you make any change to the Dockerfile_centos_7, please do a local build and upload the build log as a comment so that we can verify. We need this step since we don't have a pre-commit CI for Hadoop 2.x branch.

@iwasakims @GauthamBanasandra thanks, I will try to remove Boost, do a local build, and upload the build log.

@ZhendongBai
Author

> @ZhendongBai You should submit the patch based on apache:branch-2.10 instead of apache:branch-2.10.0. branch-2.10.0 is an obsolete branch for the already released 2.10.0.

@iwasakims thanks, I will close this PR and open a new PR based on branch-2.10.

@ZhendongBai
Author

@iwasakims @GauthamBanasandra I have opened a new PR based on branch-2.10, #3535, and closed this PR. Please review again, thanks.
