Skip to content

[SPARK-28472][SQL][TEST] Add test for thriftserver protocol versions#25228

Closed
wangyum wants to merge 7 commits intoapache:masterfrom
wangyum:SPARK-28472
Closed

[SPARK-28472][SQL][TEST] Add test for thriftserver protocol versions#25228
wangyum wants to merge 7 commits intoapache:masterfrom
wangyum:SPARK-28472

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Jul 22, 2019

What changes were proposed in this pull request?

This pr adds a test(SparkThriftServerProtocolVersionsSuite) to test different versions of the thrift protocol because we use different logic to handle the RowSet:

public static RowSet create(TableSchema schema, TProtocolVersion version) {
if (version.getValue() >= HIVE_CLI_SERVICE_PROTOCOL_V6.getValue()) {
return new ColumnBasedSet(schema);
}
return new RowBasedSet(schema);
}
public static RowSet create(TRowSet results, TProtocolVersion version) {
if (version.getValue() >= HIVE_CLI_SERVICE_PROTOCOL_V6.getValue()) {
return new ColumnBasedSet(results);
}
return new RowBasedSet(results);
}

When adding this test cases, found three bugs:
SPARK-26969: Using ODBC not able to see the data in table when datatype is decimal
SPARK-28463: Thriftserver throws BigDecimal incompatible with HiveDecimal
SPARK-28474: Lower JDBC client version(Hive 0.12) cannot read binary type

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108009 has finished for PR 25228 at commit a8f0ad3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108012 has finished for PR 25228 at commit d1f4d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [SPARK-28472][SQL][TEST] Add test for thriftserver protocol versions [SPARK-28472][SQL][TEST][test-hadoop3.2] Add test for thriftserver protocol versions Jul 22, 2019
@wangyum
Copy link
Member Author

wangyum commented Jul 22, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108013 has finished for PR 25228 at commit d1f4d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108018 has finished for PR 25228 at commit 3fa6e53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Jul 22, 2019

cc @juliuszsompolski @gatorsmile

@SparkQA
Copy link

SparkQA commented Jul 23, 2019

Test build #108028 has finished for PR 25228 at commit 00c4c69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 23, 2019

Test build #108048 has finished for PR 25228 at commit 7d1cc80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private[thriftserver] def toJavaSQLType(s: String): Int = Type.getType(s).toJavaSQLType

private[thriftserver] val testedProtocalVersions = Seq(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocal -> Protocol
Do we really have to test against so many versions? how many are in use by versions of Hive supported by 3.0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options:

  1. HIVE_CLI_SERVICE_PROTOCOL_V3 -> HIVE_CLI_SERVICE_PROTOCOL_V8 because we support Hive 0.12.
  2. HIVE_CLI_SERVICE_PROTOCOL_V1 -> HIVE_CLI_SERVICE_PROTOCOL_V8. Like Hive, lower versions of the client can always connect to a higher version of ThriftServer. e.g. Hive 0.11 still can connect to Hive 3.1's ThriftServer without any warning:
[root@spark-3267648 hive-0.11.0-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/default
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-0.11.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Hive (version 3.1.1)
Driver: Hive (version 0.11.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.11.0 by Apache Hive
0: jdbc:hive2://localhost:10000/default> select 1;
+------+
| _c0  |
+------+
| 1    |
+------+
1 row selected (0.291 seconds)

I'd prefer option 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the downside is we spend extra time testing more protocols? would we ever care to fix support of such old protocols if they didn't work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about test HIVE_CLI_SERVICE_PROTOCOL_V3 -> HIVE_CLI_SERVICE_PROTOCOL_V8 because we support Hive 0.12?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust your judgment either way, but that seems reasonable to me.

@SparkQA
Copy link

SparkQA commented Jul 26, 2019

Test build #108231 has finished for PR 25228 at commit bde0129.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @juliuszsompolski @bogdanrdc

Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor question/nit

// We do not fully support interval type
ignore(s"$version get interval type") {
testWithProtocolVersion(version, "SELECT interval '1' year '2' day") { rs =>
assert(rs.next())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we still assert what the returned value is equal to here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will throw exception.
Client log:

0: jdbc:hive2://localhost:10000/default> SELECT interval '1' year '2' day;
Error: java.lang.IllegalArgumentException: Unrecognized type name: interval (state=,code=0)

Server log:

java.lang.RuntimeException: java.lang.IllegalArgumentException: Unrecognized type name: interval
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(AccessController.java:770)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy26.getResultSetMetadata(Unknown Source)
	at org.apache.hive.service.cli.CLIService.getResultSetMetadata(CLIService.java:436)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.GetResultSetMetadata(ThriftCLIService.java:607)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1533)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1518)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:819)
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: interval
	at org.apache.hive.service.cli.Type.getType(Type.java:169)
	at org.apache.hive.service.cli.TypeDescriptor.<init>(TypeDescriptor.java:53)
	at org.apache.hive.service.cli.ColumnDescriptor.<init>(ColumnDescriptor.java:53)
	at org.apache.hive.service.cli.TableSchema.<init>(TableSchema.java:52)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$.getTableSchema(SparkExecuteStatementOperation.scala:314)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema$lzycompute(SparkExecuteStatementOperation.scala:69)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema(SparkExecuteStatementOperation.scala:64)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getResultSetSchema(SparkExecuteStatementOperation.scala:158)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:209)
	at org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:773)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	... 18 more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliuszsompolski
Copy link
Contributor

Sorry, more comments post LGTM:

Regarding testing V3 to V10 vs V1 to V10:
Simba drivers seem to connect initially with V1, make some initial calls (set -v to get all configs, GetInfo calls to get CLI_SERVER_NAME, CLI_DBMS_VER), and then it disconnects and reconnects with a higher version (V8 for thriftserver v1.2.1).

Maybe we could test starting to V1 then, and add some tests for other requests than executing queries (GetInfo calls, some metadata requests...)?

@SparkQA
Copy link

SparkQA commented Jul 30, 2019

Test build #108359 has finished for PR 25228 at commit 1c9afae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski
Copy link
Contributor

LGTM. Thanks!

@juliuszsompolski
Copy link
Contributor

cc @gatorsmile

@gatorsmile gatorsmile changed the title [SPARK-28472][SQL][TEST][test-hadoop3.2] Add test for thriftserver protocol versions [SPARK-28472][SQL][TEST] Add test for thriftserver protocol versions Aug 3, 2019
@gatorsmile
Copy link
Member

retest this please

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending Jenkins.

@SparkQA
Copy link

SparkQA commented Aug 3, 2019

Test build #108593 has finished for PR 25228 at commit 1c9afae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108696 has finished for PR 25228 at commit b44804a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108697 has finished for PR 25228 at commit 1c9afae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile closed this in a59fdc4 Aug 7, 2019
@wangyum wangyum deleted the SPARK-28472 branch August 7, 2019 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants