Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8719 BEAM-8768 BEAM-8769 BEAM-8770 BEAM-8771] Update minor hadoop dependencies #13230

Merged
merged 1 commit into from
Nov 9, 2020

Conversation

pjotrekk
Copy link

@pjotrekk pjotrekk commented Oct 30, 2020

This PR:

  • updates Hadoop dependencies from 2.8.5 to 2.10.1
  • adds tests for Hadoop 2.8.5 compatibility in modules that have hadoop dependencies as 'provided'

For the tests I've taken the approach that is already used in KafkaIO.

I tried to add tests for Hadoop 2.7.7 but without success. I haven't verified if the problem comes from tests configuration or the IO itself but since this PR is to assure the compatibility with the previous version used in Beam (2.8.5) I'd like not to touch it. It could be worth to file a Jira to add tests for 2.7.7 version.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@pjotrekk pjotrekk changed the title [BEAM-8719 BEAM-8768 BEAM-8769 BEAM-8770 BEAM-8771] Update minor hadoop dependencies [WIP] [BEAM-8719 BEAM-8768 BEAM-8769 BEAM-8770 BEAM-8771] Update minor hadoop dependencies Oct 30, 2020
@pjotrekk pjotrekk marked this pull request as draft October 30, 2020 17:54
@pjotrekk pjotrekk force-pushed the hadoop-update branch 5 times, most recently from 240484f to 41300df Compare November 2, 2020 17:59
@pjotrekk
Copy link
Author

pjotrekk commented Nov 2, 2020

Run Java PostCommit

@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

Run SQL PostCommit

@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

Run Spark ValidatesRunner

@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

Run Java HadoopFormatIO Performance Test

@pjotrekk pjotrekk marked this pull request as ready for review November 3, 2020 10:18
@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

@iemejia @timrobertson100 @aromanenko-dev Is this approach of testing the previus version of Hadoop acceptable? Could I ask you all to take a look at this PR?

outputs.upToDateWhen { false }
include '**/*Test.class'
include '**/HadoopFormatIOElasticIT.class'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without looking in detail, this looks like it might be worth checking. Seems to set up for Cassandra, but runs ES tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, it makes no sense to run HadoopFormatIOElasticIT here. This test has a default configuration that can be run without setting up anything but it indeed is inconsistent with the whole idea of hadoopVersion285Test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cassandra is set for HadoopFormatIOCassandraTest

@timrobertson100
Copy link
Contributor

@iemejia @timrobertson100 @aromanenko-dev Is this approach of testing the previous version of Hadoop acceptable?

I think this makes sense to add. Thanks @piotr-szuberski

In our case, the Hadoop version is so tightly coupled with the Spark, Hive etc versions (managed Cloudera cluster) that it's probably insufficient to test things in isolation - it's a solid addition regardless.

@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

In our case, the Hadoop version is so tightly coupled with the Spark, Hive etc versions (managed Cloudera cluster) that it's probably insufficient to test things in isolation - it's a solid addition regardless.

Yeah, I guess. I added just the precommit test suites.
It would be worth to add some integration/performance tests - for HadoopFormatIOIT and HadoopFormatIOCassandraIT it should be doable with testcontainers. I don't know how about Hcatalog, Hive etc - I have no prior knowledge about it.

@pjotrekk
Copy link
Author

pjotrekk commented Nov 3, 2020

By the way, could we use the same approach to test KafkaIO against different versions? It's already done for 2.1.0 - does anything stop us from doing the same thing there?

@pjotrekk pjotrekk changed the title [WIP] [BEAM-8719 BEAM-8768 BEAM-8769 BEAM-8770 BEAM-8771] Update minor hadoop dependencies [BEAM-8719 BEAM-8768 BEAM-8769 BEAM-8770 BEAM-8771] Update minor hadoop dependencies Nov 3, 2020
@aromanenko-dev
Copy link
Contributor

@piotr-szuberski I think it's quite reasonable to test different gradle tasks to test with different Kafka client versions. As additional bonus, it would be great to run ITs against different versions of Kafka docker images as well.

@pjotrekk
Copy link
Author

pjotrekk commented Nov 6, 2020

@piotr-szuberski I think it's quite reasonable to test different gradle tasks to test with different Kafka client versions. As additional bonus, it would be great to run ITs against different versions of Kafka docker images as well.

Okay, I'll try to do it next week!

@pjotrekk
Copy link
Author

pjotrekk commented Nov 6, 2020

@timrobertson100 Should I change something in this PR? I think the IT tests could be done in a separate PR - WDYT?

@timrobertson100
Copy link
Contributor

Should I change something in this PR? I think the IT tests could be done in a separate PR - WDYT?

I agree. Other than squashing into a single commit I don't think you need to change anything here

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Run Java PostCommit

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Run SQL PostCommit

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Run Java HadoopFormatIO Performance Test

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Run Spark ValidatesRunner

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Run Java PreCommit

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

@timrobertson100 Ok, it's squashed now.

@timrobertson100
Copy link
Contributor

Thanks @piotr-szuberski

Waiting for green, then it can be merged

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

@timrobertson100 green :)

@timrobertson100 timrobertson100 merged commit 054aa7d into apache:master Nov 9, 2020
@timrobertson100
Copy link
Contributor

Thanks @piotr-szuberski !

@pjotrekk
Copy link
Author

pjotrekk commented Nov 9, 2020

Thanks @piotr-szuberski !

No problem! I'll soon publish IT tests on Java postcommit and tests with Hadoop 3.2.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants