
[WIP][CALCITE-1806] Add Apache Spark JDBC test to Avatica server #28

Closed
wants to merge 1 commit

Conversation

@risdenk (Contributor) commented Mar 9, 2018

This branch is a work in progress showing that Apache Spark and Avatica don't seem to play nicely together: Spark JDBC against Avatica returns an empty result even though it determines the correct schema.
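
For context, the failing path drives Spark's JDBC data source at Avatica roughly as below (a minimal sketch; the URL, port, and table name are placeholders, not the exact test code):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .master("local[*]")
    .appName("avatica-spark-test")
    .getOrCreate();
Dataset<Row> df = spark.read()
    .format("jdbc")
    .option("driver", "org.apache.calcite.avatica.remote.Driver")
    .option("url", "jdbc:avatica:remote:url=http://localhost:8765")  // placeholder URL
    .option("dbtable", "test")
    .load();
long rows = df.count();  // schema resolves correctly, but the count comes back as 0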

@risdenk (Contributor, Author) commented Mar 9, 2018

I'm expecting the build to fail with 0 rows being returned when there is actually a row there.

@risdenk force-pushed the CALCITE-1806 branch 6 times, most recently from e0d68ab to 2d5c270 on March 9, 2018 at 02:17
@risdenk (Contributor, Author) commented Mar 9, 2018

Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 7.373 sec <<< FAILURE! - in org.apache.calcite.avatica.remote.SparkClientTest
testSpark[JSON](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 6.864 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)
testSpark[PROTOBUF](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 0.48 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)

@risdenk (Contributor, Author) commented Mar 10, 2018

@joshelser - Any ideas on where to look for this? I'm surprised that there are no errors but 0 rows returned.

@joshelser (Member):

That is strange, @risdenk. I'd start with turning on trace logging for the Avatica protocol (see CALCITE-1879 for details). I'd probably take your simple example with Spark here and compare it to the same thing without Spark.

One thing to double-check before you try that, though: are you using the Avatica shaded client jar? That is, org.apache.calcite.avatica:avatica as opposed to org.apache.calcite.avatica:avatica-core.

@risdenk (Contributor, Author) commented Mar 11, 2018

Thanks @joshelser, I'll take a look at the trace output.

I tried the shaded client jar with "real" Spark outside the testing framework. Inside the testing framework, I'm not 100% sure which jar is being picked up right now.

@risdenk (Contributor, Author) commented Mar 11, 2018

Trace output from SELECT 1 FROM test over standard JDBC (this matches what Spark does for .count()):

2018-03-11 09:28:44,690 [qtp1769193365-18 - /] TRACE - request: {"request":"prepareAndExecute","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"sql":"SELECT 1 from test","maxRowsInFirstFrame":-1,"maxRowCount":-1}
2018-03-11 09:28:44,692 [qtp1769193365-18 - /] TRACE - prepAndExec statement 74d1ae39-7f4b-49f8-9c84-3a9c13e686f5::0
2018-03-11 09:28:44,694 [qtp1769193365-18 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},"firstFrame":{"offset":0,"done":true,"rows":[[1]]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200

Trace output from Spark .count() on the test table:

2018-03-11 09:28:53,471 [qtp1769193365-19 - /] TRACE - request: {"request":"prepare","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","sql":"SELECT 1 FROM test ","maxRowCount":-1}
2018-03-11 09:28:53,473 [qtp1769193365-19 - /] TRACE - prepared statement 65d00d7e-85f3-4edc-b6cb-23d8c0af21e3::2
2018-03-11 09:28:53,474 [qtp1769193365-19 - /] TRACE - response: Response: {"response":"prepare","statement":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}, Status:200
...
2018-03-11 09:28:53,486 [qtp1769193365-17 - /] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}
2018-03-11 09:28:53,488 [qtp1769193365-17 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","statementId":2,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":"SELECT"},"firstFrame":{"offset":0,"done":true,"rows":[]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200

The big difference is that one is a single prepareAndExecute, while the other is a prepare followed by a separate execute. The results are also different for the first frame:

"firstFrame":{"offset":0,"done":true,"rows":[[1]]}     # as part of standard JDBC
"firstFrame":{"offset":0,"done":true,"rows":[]}         # as part of Spark .count()

Still digging in further. Next I'll try to reproduce the separate prepare and execute with standard JDBC.
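
A minimal sketch of that reproduction, assuming a local Avatica server (the URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = DriverManager.getConnection(
    "jdbc:avatica:remote:url=http://localhost:8765")) {  // placeholder URL
  // Prepare and execute as two separate calls, like Spark does,
  // instead of a single prepareAndExecute.
  PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
  try (ResultSet rs = ps.executeQuery()) {
    int count = 0;
    while (rs.next()) {
      count++;
    }
    System.out.println("rows: " + count);  // 1 expected; Spark's flow sees 0
  }
}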

@risdenk (Contributor, Author) commented Mar 11, 2018

So it looks like Spark is setting the fetch size to 0 on the prepared statement. This causes maxRowCount to be set to 0 in the request.

When standard JDBC makes the execute request, it sends "maxRowCount":100, which results in rows being returned (up to 100).
When Spark makes the execute request, it sends "maxRowCount":0, which results in 0 rows.

I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it. Using setMaxRows(n) seems to have no effect. It looks like the two (fetch size and max rows) are being conflated somewhere in Avatica.
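
In JDBC terms the reproduction is just the following (a sketch; connection setup is omitted):

PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
ps.setFetchSize(0);   // per the JDBC spec, 0 means "let the driver choose"
ps.setMaxRows(100);   // seems to have no effect here
ResultSet rs = ps.executeQuery();  // request goes out with maxRowCount:0 -> empty frame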

@risdenk (Contributor, Author) commented Mar 11, 2018

I'm going to push my current Spark test, which shows this failure without Spark even being invoked.

@risdenk (Contributor, Author) commented Mar 11, 2018

Test failure as expected:

Failed tests: 
  SparkClientTest.testSpark:119
  SparkClientTest.testSpark:119

This is because the call to ResultSet.next() fails when there are no rows but there should be rows.

The request is incorrectly sent with maxRowCount:0 because setFetchSize(0) was called.

2018-03-11 15:29:11,100 [qtp1411675719-124] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"c5fbb060-4de7-424f-9013-6d2465616063","id":5,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 from test","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}

@risdenk (Contributor, Author) commented Mar 11, 2018

I don't think the overall problem is specific to Spark; Spark just surfaced the issue.

I found a solution to the setFetchSize issue. However, I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsInFirstFrame was added but not all the variable names were changed? I don't think I understand the difference between fetchSize and maxRowsInFirstFrame, since they should be similar if not the same.

@joshelser - Any thoughts on the above? I can dig in further; it will just take longer to understand how it all fits together.

@risdenk (Contributor, Author) commented Mar 11, 2018

Looks like the test failure is related to JDK 7 and bumping the hsqldb version? This doesn't mean the workaround I put in for maxRowsInFirstFrame is correct. There seems to be some lacking test coverage around setFetchSize, setMaxRows, and maxRowsInFirstFrame.
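
A hedged sketch of the kind of coverage that seems to be missing (JUnit 4 style; the test name and the url fixture are hypothetical):

@Test
public void testFetchSizeZeroStillReturnsRows() throws Exception {
  try (Connection conn = DriverManager.getConnection(url)) {  // url: hypothetical fixture
    PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
    ps.setFetchSize(0);  // JDBC: 0 means driver default, not "zero rows"
    try (ResultSet rs = ps.executeQuery()) {
      assertTrue("expected at least one row", rs.next());
    }
  }
}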

@joshelser (Member):

I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it.

Oof. Getting fetchSize and maxResultSize mixed up is bad.

I found a solution to the setFetchSize issue. However, I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsInFirstFrame was added but not all the variable names were changed? I don't think I understand the difference between fetchSize and maxRowsInFirstFrame, since they should be similar if not the same.

maxRowsInFirstFrame and maxRowCount are irritating. This was trying to fix an old bug in a backwards-compat manner. IIRC, maxRowCount is supposed to be a global "I want no more than $maxRowCount rows from this query", whereas maxRowsInFirstFrame is a protocol-specific thing that limits the number of rows we'll pack into a single ResultSetResponse message from server->client.

I don't recall how fetchSize fits into this, but it seems to be getting used in Avatica somewhere, instead of just getting set onto the "real" Statement inside of the server. (Avatica could try to do something smart with the fetchSize, but an acceptable starting point would just be to pass the value along).
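
As a rough illustration of that distinction (standard JDBC methods; the mapping to the wire fields is a reading of the explanation above, and the guard is hypothetical, not actual Avatica code):

stmt.setMaxRows(1000);   // global cap -> maxRowCount: at most 1000 rows for the query
stmt.setFetchSize(100);  // transport hint -> roughly maxRowsInFirstFrame: rows per frame

// Hypothetical client-side guard: never forward a fetch size of 0 as a row limit.
int frameLimit = (fetchSize <= 0) ? DEFAULT_FRAME_SIZE : fetchSize;  // DEFAULT_FRAME_SIZE is made up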

Does that help? If you have a unit test you can share, that would help me get up to speed in debugging.

@risdenk (Contributor, Author) commented Mar 12, 2018

@joshelser - Thanks for the insight. I think that helps.

As far as a unit test goes, adding ps.setFetchSize(0) at line 563 of RemoteDriverTest and statement.setFetchSize(0) at line 569 of RemoteDriverTest will cause a few of the existing tests to fail.

@risdenk (Contributor, Author) commented Apr 20, 2018

Rebased on latest master to look at this again.

Signed-off-by: Kevin Risden <krisden@apache.org>
@zabetak (Contributor) commented Oct 23, 2022

This PR has been inactive for quite some time now, and it seems there is no interest from the authors in pushing it forward, so I am closing it.

@zabetak closed this on Oct 23, 2022