
[WIP][CALCITE-1806] Add Apache Spark JDBC test to Avatica server #28

Closed
wants to merge 1 commit

Conversation

@risdenk (Contributor) commented Mar 9, 2018

This branch is a work in progress showing that Apache Spark and Avatica don't seem to play nicely together: Spark JDBC against Avatica returns an empty result even though it determines the correct schema.
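
For context, the failing path drives Spark's JDBC data source at Avatica roughly as below (a minimal sketch; the URL, port, and table name are placeholders, not the exact test code):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .master("local[*]")
    .appName("avatica-spark-test")
    .getOrCreate();
Dataset<Row> df = spark.read()
    .format("jdbc")
    .option("driver", "org.apache.calcite.avatica.remote.Driver")
    .option("url", "jdbc:avatica:remote:url=http://localhost:8765")  // placeholder URL
    .option("dbtable", "test")
    .load();
long rows = df.count();  // schema resolves correctly, but the count comes back as 0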

@risdenk (Contributor, Author) commented Mar 9, 2018

I'm expecting the build to fail with 0 rows being returned when there is actually a row there.

@risdenk force-pushed the CALCITE-1806 branch 6 times, most recently from e0d68ab to 2d5c270 on March 9, 2018 at 02:17
@risdenk (Contributor, Author) commented Mar 9, 2018

Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 7.373 sec <<< FAILURE! - in org.apache.calcite.avatica.remote.SparkClientTest
testSpark[JSON](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 6.864 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)
testSpark[PROTOBUF](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 0.48 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)

@risdenk (Contributor, Author) commented Mar 10, 2018

@joshelser - Any ideas on where to look for this? I'm surprised that there are no errors but 0 rows returned.

@joshelser (Member):

That is strange, @risdenk. I'd start with turning on trace logging for the Avatica protocol (see CALCITE-1879 for details). I'd probably take your simple example with Spark here and compare it to the same thing without Spark.

One thing to double-check before you try that, though: are you using the Avatica shaded client jar? That is, org.apache.calcite.avatica:avatica as opposed to org.apache.calcite.avatica:avatica-core.

@risdenk (Contributor, Author) commented Mar 11, 2018

Thanks @joshelser, I'll take a look at the trace output.

I tried the shaded client jar with "real" Spark outside the testing framework. Inside the testing framework, I'm not 100% sure which jar is being picked up right now.

@risdenk (Contributor, Author) commented Mar 11, 2018

Trace output from SELECT 1 FROM test over standard JDBC (this matches what Spark does for .count()):

2018-03-11 09:28:44,690 [qtp1769193365-18 - /] TRACE - request: {"request":"prepareAndExecute","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"sql":"SELECT 1 from test","maxRowsInFirstFrame":-1,"maxRowCount":-1}
2018-03-11 09:28:44,692 [qtp1769193365-18 - /] TRACE - prepAndExec statement 74d1ae39-7f4b-49f8-9c84-3a9c13e686f5::0
2018-03-11 09:28:44,694 [qtp1769193365-18 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"74d1ae39-7f4b-49f8-9c84-3a9c13e686f5","statementId":0,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},"firstFrame":{"offset":0,"done":true,"rows":[[1]]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200

Trace output from Spark .count() on the test table:

2018-03-11 09:28:53,471 [qtp1769193365-19 - /] TRACE - request: {"request":"prepare","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","sql":"SELECT 1 FROM test ","maxRowCount":-1}
2018-03-11 09:28:53,473 [qtp1769193365-19 - /] TRACE - prepared statement 65d00d7e-85f3-4edc-b6cb-23d8c0af21e3::2
2018-03-11 09:28:53,474 [qtp1769193365-19 - /] TRACE - response: Response: {"response":"prepare","statement":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}, Status:200
...
2018-03-11 09:28:53,486 [qtp1769193365-17 - /] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","id":2,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}
2018-03-11 09:28:53,488 [qtp1769193365-17 - /] TRACE - response: Response: {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"},"results":[{"response":"resultSet","connectionId":"65d00d7e-85f3-4edc-b6cb-23d8c0af21e3","statementId":2,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 FROM test ","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":"SELECT"},"firstFrame":{"offset":0,"done":true,"rows":[]},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"localhost:54186"}}]}, Status:200

The big difference is that one is a single prepareAndExecute, while the other is a prepare followed by a separate execute. The results are also different for the first frame:

"firstFrame":{"offset":0,"done":true,"rows":[[1]]}     # as part of standard JDBC
"firstFrame":{"offset":0,"done":true,"rows":[]}         # as part of Spark .count()

Still digging in further. Next I'll try to reproduce the separate prepare and execute with standard JDBC.
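
A minimal sketch of that reproduction, assuming a local Avatica server (the URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = DriverManager.getConnection(
    "jdbc:avatica:remote:url=http://localhost:8765")) {  // placeholder URL
  // Prepare and execute as two separate calls, like Spark does,
  // instead of a single prepareAndExecute.
  PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
  try (ResultSet rs = ps.executeQuery()) {
    int count = 0;
    while (rs.next()) {
      count++;
    }
    System.out.println("rows: " + count);  // 1 expected; Spark's flow sees 0
  }
}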

@risdenk (Contributor, Author) commented Mar 11, 2018

So it looks like Spark is setting the fetch size to 0 on the prepared statement. This causes maxRowCount to be set to 0 in the request.

When standard JDBC makes the execute request, it sends "maxRowCount":100, which results in rows being returned (up to 100).
When Spark makes the execute request, it sends "maxRowCount":0, which results in 0 rows.

I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it. Using setMaxRows(n) seems to have no effect. It looks like the two (fetch size and max rows) are being conflated somewhere in Avatica.
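
In JDBC terms the reproduction is just the following (a sketch; connection setup is omitted):

PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
ps.setFetchSize(0);   // per the JDBC spec, 0 means "let the driver choose"
ps.setMaxRows(100);   // seems to have no effect here
ResultSet rs = ps.executeQuery();  // request goes out with maxRowCount:0 -> empty frame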

@risdenk (Contributor, Author) commented Mar 11, 2018

I'm going to push my current Spark test, which shows this failure without Spark even being invoked.

@risdenk (Contributor, Author) commented Mar 11, 2018

Test failure as expected:

Failed tests: 
  SparkClientTest.testSpark:119
  SparkClientTest.testSpark:119

This is because the call to ResultSet.next() fails when there are no rows but there should be rows.

The request is incorrectly sent with maxRowCount:0 because setFetchSize(0) was called.

2018-03-11 15:29:11,100 [qtp1411675719-124] TRACE - request: {"request":"execute","statementHandle":{"connectionId":"c5fbb060-4de7-424f-9013-6d2465616063","id":5,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":false,"currency":false,"nullable":0,"signed":true,"displaySize":11,"label":"C1","columnName":"C1","schemaName":"","precision":32,"scale":0,"tableName":"","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"NUMBER"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":"SELECT 1 from test","parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null}},"parameterValues":[],"maxRowCount":0}

@risdenk (Contributor, Author) commented Mar 11, 2018

I don't think the overall problem is specific to Spark; Spark just surfaced the issue.

I found a solution to the setFetchSize issue. However, I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsInFirstFrame was added but not all the variable names were changed? I don't think I understand the difference between fetchSize and maxRowsInFirstFrame, since they should be similar if not the same.

@joshelser - Any thoughts on the above? I can dig in further; it will just take longer to understand how it all fits together.

@risdenk (Contributor, Author) commented Mar 11, 2018

Looks like the test failure is related to JDK 7 and bumping the hsqldb version? This doesn't mean the workaround I put in for maxRowsInFirstFrame is correct. There seems to be some lacking test coverage around setFetchSize, setMaxRows, and maxRowsInFirstFrame.
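
A hedged sketch of the kind of coverage that seems to be missing (JUnit 4 style; the test name and the url fixture are hypothetical):

@Test
public void testFetchSizeZeroStillReturnsRows() throws Exception {
  try (Connection conn = DriverManager.getConnection(url)) {  // url: hypothetical fixture
    PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM test");
    ps.setFetchSize(0);  // JDBC: 0 means driver default, not "zero rows"
    try (ResultSet rs = ps.executeQuery()) {
      assertTrue("expected at least one row", rs.next());
    }
  }
}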

@joshelser (Member):

I can reproduce this by setFetchSize(0) on the PreparedStatement before executing it.

Oof. Getting fetchSize and maxResultSize mixed up is bad.

I found a solution to the setFetchSize issue. However, I'm not entirely sure it is correct. It looks like fetchSize, maxRowsInFirstFrame, and maxRowCount could be misused throughout Avatica. It looks like at some point maxRowsInFirstFrame was added but not all the variable names were changed? I don't think I understand the difference between fetchSize and maxRowsInFirstFrame, since they should be similar if not the same.

maxRowsInFirstFrame and maxRowCount are irritating. This was trying to fix an old bug in a backwards-compat manner. IIRC, maxRowCount is supposed to be a global "I want no more than $maxRowCount rows from this query", whereas maxRowsInFirstFrame is a protocol-specific thing that limits the number of rows we'll pack into a single ResultSetResponse message from server->client.

I don't recall how fetchSize fits into this, but it seems to be getting used in Avatica somewhere, instead of just getting set onto the "real" Statement inside of the server. (Avatica could try to do something smart with the fetchSize, but an acceptable starting point would just be to pass the value along).
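
As a rough illustration of that distinction (standard JDBC methods; the mapping to the wire fields is a reading of the explanation above, and the guard is hypothetical, not actual Avatica code):

stmt.setMaxRows(1000);   // global cap -> maxRowCount: at most 1000 rows for the query
stmt.setFetchSize(100);  // transport hint -> roughly maxRowsInFirstFrame: rows per frame

// Hypothetical client-side guard: never forward a fetch size of 0 as a row limit.
int frameLimit = (fetchSize <= 0) ? DEFAULT_FRAME_SIZE : fetchSize;  // DEFAULT_FRAME_SIZE is made up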

Does that help? If you have a unit test you can share, that would help me get up to speed in debugging.

@risdenk (Contributor, Author) commented Mar 12, 2018

@joshelser - Thanks for the insight. I think that helps.

As far as a unit test goes, adding ps.setFetchSize(0) at line 563 of RemoteDriverTest and statement.setFetchSize(0) at line 569 of RemoteDriverTest will cause a few of the existing tests to fail.

@risdenk (Contributor, Author) commented Apr 20, 2018

Rebased on latest master to look at this again.

Signed-off-by: Kevin Risden <krisden@apache.org>
@zabetak (Contributor) commented Oct 23, 2022

This PR has been inactive for quite some time now, and it seems there is no interest from the authors in pushing it forward, so I am closing it.

@zabetak closed this on Oct 23, 2022