Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-7899: [Integration][Java] Fix Flight integration test client to verify each batch #6476

Closed

Conversation

BryanCutler
Copy link
Member

Previously the Java Flight integration test client would read a number of batches from the server, but only verify the last one is equal the last batch in the JSON file. This fixes the client to verify each batch is correct and that the number of batches sent is the same as in the JSON input file.

@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@BryanCutler
Copy link
Member Author

Along with #6368 , this almost fixes Null type testing in integration. There are currently 2 integration test failures primitive_zerolength and generated_null with C++ serving and Java client.

Both of these seem to be due to an implementation difference, where the C++ server doesn't return any more RecordBatches at the end of a flight if the record counts are 0. The Java server does return the empty batches, so tests where Java is serving are able to pass.

With C++ server and C++ client, this passes because all read batches are concatenated into a Table and only the final Table is verified - not sure if that is a problem or not since the actual data is the same.

So to sum it up, it seems like the remaining issue is that C++ has an optimization to not serve empty batches at the end of a Flight stream and Java does not. Both cases end up with the same data, it's just a question of if the flight testing should verify each batch in the stream or the entirety of the stream at the end. What are your thoughts @wesm @jacques-n @lidavidm ?

@BryanCutler BryanCutler changed the title [WIP] ARROW-7899: [Integration][Java] Fix Flight integration test client to verify each batch ARROW-7899: [Integration][Java] Fix Flight integration test client to verify each batch Feb 23, 2020
@BryanCutler
Copy link
Member Author

Also a problem is that the Java Flight integration client doesn't seem to have any roundtrip unit tests, that could be added as a followup.

@github-actions
Copy link

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

If the JSON file specifies things down to the individual batches, then I vote that the client should validate things at that level too. So the C++ server should send individual batches even if they're empty, and presumably the integration client should validate individual batches.

From a quick look, C++ differs because it reads the JSON file in as a table and writes that, losing the individual batches. Similarly, the server stores a table, not a list of batches.

@lidavidm
Copy link
Member

Ah, should we mark the integration tests as failed for now so that CI passes?

@nealrichardson
Copy link
Member

Thanks for doing this!

If you do skip any integration tests that are now failing (IMO that's a fine way to make incremental progress, but I don't object if others want to see them all fixed here), please make Jira(s) for them and please also record the Jiras on https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 (LMK if you need edit access)

@BryanCutler
Copy link
Member Author

@lidavidm I don't think we need to skip the unit tests. For now, I can change the Java test client to not raise an error if the only remaining record batches are empty. That seems reasonable and we can move forward. I can send a message to the dev list and we can discuss the proper behavior for the server and testing.

@BryanCutler
Copy link
Member Author

All tests passing now, merged to master. Thanks @lidavidm !

@BryanCutler BryanCutler deleted the integration-null-test-ARROW-7899 branch February 24, 2020 22:23
@BryanCutler
Copy link
Member Author

Added https://issues.apache.org/jira/browse/ARROW-7933 for additional tests on the IntegrationTestClient

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants