-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-7899: [Integration][Java] Fix Flight integration test client to verify each batch #6476
ARROW-7899: [Integration][Java] Fix Flight integration test client to verify each batch #6476
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
Along with #6368 , this almost fixes Null type testing in integration. There are currently 2 integration test failures Both of these seem to be due to an implementation difference, where the C++ server doesn't return any more RecordBatches at the end of a flight if the record counts are 0. The Java server does return the empty batches, so tests where Java is serving are able to pass. With C++ server and C++ client, this passes because all read batches are concatenated into a Table and only the final Table is verified - not sure if that is a problem or not since the actual data is the same. So to sum it up, it seems like the remaining issue is that C++ has an optimization to not serve empty batches at the end of a Flight stream and Java does not. Both cases end up with the same data, it's just a question of if the flight testing should verify each batch in the stream or the entirety of the stream at the end. What are your thoughts @wesm @jacques-n @lidavidm ? |
Also a problem is that the Java Flight integration client doesn't seem to have any roundtrip unit tests, that could be added as a followup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
If the JSON file specifies things down to the individual batches, then I vote that the client should validate things at that level too. So the C++ server should send individual batches even if they're empty, and presumably the integration client should validate individual batches.
From a quick look, C++ differs because it reads the JSON file in as a table and writes that, losing the individual batches. Similarly, the server stores a table, not a list of batches.
Ah, should we mark the integration tests as failed for now so that CI passes? |
Thanks for doing this! If you do skip any integration tests that are now failing (IMO that's a fine way to make incremental progress, but I don't object if others want to see them all fixed here), please make Jira(s) for them and please also record the Jiras on https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 (LMK if you need edit access) |
@lidavidm I don't think we need to skip the unit tests. For now, I can change the Java test client to not raise an error if the only remaining record batches are empty. That seems reasonable and we can move forward. I can send a message to the dev list and we can discuss the proper behavior for the server and testing. |
All tests passing now, merged to master. Thanks @lidavidm ! |
Added https://issues.apache.org/jira/browse/ARROW-7933 for additional tests on the IntegrationTestClient |
Previously the Java Flight integration test client would read a number of batches from the server, but only verify the last one is equal the last batch in the JSON file. This fixes the client to verify each batch is correct and that the number of batches sent is the same as in the JSON input file.