Skip to content

Make some additions to IT suite to make Hadoop related testing more understandable#10667

Merged
capistrant merged 4 commits intoapache:masterfrom
capistrant:hadoop-integration-test-quality-of-life
Dec 28, 2020
Merged

Make some additions to IT suite to make Hadoop related testing more understandable#10667
capistrant merged 4 commits intoapache:masterfrom
capistrant:hadoop-integration-test-quality-of-life

Conversation

@capistrant
Copy link
Contributor

@capistrant capistrant commented Dec 10, 2020

Description

I have been working on some code the past week that involves running the Hadoop Integration Tests. I found the README revolving around the IT suite somewhat confusing when it comes to running the Hadoop Tasks. When to build the image, when to run the image, etc. all seemed a little bit more confusing than it needed to be. Because of that experience, I made some small tweaks that I think will help devs run integration tests using the local hadoop container.

I created a new argument to pass to maven commands that is dedicated to building the Hadoop Container. I added some context in a few different places that helps guide devs towards how to run the tests in different situations (run tests when containers already running? run tests when images built but containers not running? run tests when you need to build and run containers, etc.)

I hope this is found useful by others. Ready and willing to make modifications based on all of your comments in review!


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.

Key changed/added classes in this PR
  • integration-tests/pom.xml
  • integration-tests/script/docker_build_containers.sh
  • integration-tests/README.md

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@capistrant Have you run these tests? (Or do they run in Travis CI automatically, which would be even better?)

If these changes work properly, then they look good to me. The additional docs are helpful, too.

@capistrant
Copy link
Contributor Author

@capistrant Have you run these tests? (Or do they run in Travis CI automatically, which would be even better?)

If these changes work properly, then they look good to me. The additional docs are helpful, too.

None of the hadoop related integration test groups are a part of CI. Why that is, I am not totally sure.

I have flexed the new argument for mvn locally though and run the ITHadoopIndex test when using it. Seems to work. But thats really the only material change and since no hadoop stuff is involved with TravisCI at this time, CI shouldn't be impacted at all.

@gianm
Copy link
Contributor

gianm commented Dec 10, 2020

None of the hadoop related integration test groups are a part of CI. Why that is, I am not totally sure.

Could be for runtime, perhaps?

At any rate, I'm less concerned with breaking CI, and more concerned with having the tests not work. Certain parties (like you, and other folks) do run them offline and I wouldn't want to mess up anyone's workflow. But since you ran them, that sounds good to me.

@maytasm
Copy link
Contributor

maytasm commented Dec 10, 2020

@capistrant Have you run these tests? (Or do they run in Travis CI automatically, which would be even better?)
If these changes work properly, then they look good to me. The additional docs are helpful, too.

None of the hadoop related integration test groups are a part of CI. Why that is, I am not totally sure.

I have flexed the new argument for mvn locally though and run the ITHadoopIndex test when using it. Seems to work. But thats really the only material change and since no hadoop stuff is involved with TravisCI at this time, CI shouldn't be impacted at all.

Hadoop IT requires a lot of memory. Locally, I give Docker around 11GB of memory. I believe anything less than 8-9GB will cause the cluster to fail when you also run hadoop container.

@zhangyue19921010
Copy link
Contributor

Hi @capistrant, Recently I also encountered IT-related problems. And this PR made my job a lot easier! Thanks for this contribution.

@capistrant capistrant marked this pull request as draft December 11, 2020 21:17
@capistrant
Copy link
Contributor Author

Converted this to draft temporarily as I think I actually identified an unrelated issue in the ITHadoopIndexTest that is unrelated to my PR, but can be fixed here regardless. #10219 introduced a change in the output of SegmentMetadataQuery, and the expected output for one of the tests did not get updated to reflect (https://github.com/apache/druid/blob/master/integration-tests/src/test/resources/hadoop/batch_hadoop_queries.json#L3) .... running locally to verify. before pushing commit and un-drafting this PR. missed it cuz I had ignored a few tests to speed up my local build times 🤦

@capistrant capistrant marked this pull request as ready for review December 11, 2020 23:28
@capistrant
Copy link
Contributor Author

ok, two small changes since your review, @gianm .. now the tests actually run for real this time, ha.

The wait for task to complete only waited 5 min, but my dev machine has never been able to finish it in that time. I have a fairly powerful dev machine too, so I think that 10 min is just as good as we are gonna be able to do for running with hadoop docker.

I also had to update the expected query results for one of the tests because of this change. #10219

@capistrant capistrant merged commit 26b911a into apache:master Dec 28, 2020
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
…nderstandable (apache#10667)

* Make some additions to IT suite to make Hadoop related testing more understandable

* add start.hadoop.docker to mvn arg tips in doc

* fix issues preventing ITIndexHadoopTest from running in local mode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants