Make some additions to IT suite to make Hadoop related testing more understandable#10667
Conversation
gianm
left a comment
There was a problem hiding this comment.
@capistrant Have you run these tests? (Or do they run in Travis CI automatically, which would be even better?)
If these changes work properly, then they look good to me. The additional docs are helpful, too.
None of the hadoop related integration test groups are a part of CI. Why that is, I am not totally sure. I have flexed the new argument for mvn locally though and run the ITHadoopIndex test when using it. Seems to work. But thats really the only material change and since no hadoop stuff is involved with TravisCI at this time, CI shouldn't be impacted at all. |
Could be for runtime, perhaps? At any rate, I'm less concerned with breaking CI, and more concerned with having the tests not work. Certain parties (like you, and other folks) do run them offline and I wouldn't want to mess up anyone's workflow. But since you ran them, that sounds good to me. |
Hadoop IT requires a lot of memory. Locally, I give Docker around 11GB of memory. I believe anything less than 8-9GB will cause the cluster to fail when you also run hadoop container. |
|
Hi @capistrant, Recently I also encountered IT-related problems. And this PR made my job a lot easier! Thanks for this contribution. |
|
Converted this to draft temporarily as I think I actually identified an unrelated issue in the ITHadoopIndexTest that is unrelated to my PR, but can be fixed here regardless. #10219 introduced a change in the output of SegmentMetadataQuery, and the expected output for one of the tests did not get updated to reflect (https://github.com/apache/druid/blob/master/integration-tests/src/test/resources/hadoop/batch_hadoop_queries.json#L3) .... running locally to verify. before pushing commit and un-drafting this PR. missed it cuz I had ignored a few tests to speed up my local build times 🤦 |
|
ok, two small changes since your review, @gianm .. now the tests actually run for real this time, ha. The wait for task to complete only waited 5 min, but my dev machine has never been able to finish it in that time. I have a fairly powerful dev machine too, so I think that 10 min is just as good as we are gonna be able to do for running with hadoop docker. I also had to update the expected query results for one of the tests because of this change. #10219 |
…nderstandable (apache#10667) * Make some additions to IT suite to make Hadoop related testing more understandable * add start.hadoop.docker to mvn arg tips in doc * fix issues preventing ITIndexHadoopTest from running in local mode
Description
I have been working on some code the past week that involves running the Hadoop Integration Tests. I found the README revolving around the IT suite somewhat confusing when it comes to running the Hadoop Tasks. When to build the image, when to run the image, etc. all seemed a little bit more confusing than it needed to be. Because of that experience, I made some small tweaks that I think will help devs run integration tests using the local hadoop container.
I created a new argument to pass to maven commands that is dedicated to building the Hadoop Container. I added some context in a few different places that helps guide devs towards how to run the tests in different situations (run tests when containers already running? run tests when images built but containers not running? run tests when you need to build and run containers, etc.)
I hope this is found useful by others. Ready and willing to make modifications based on all of your comments in review!
This PR has:
Key changed/added classes in this PR