
[LIVY-622][LIVY-623][LIVY-624][LIVY-625][Thrift] Support GetFunctions, GetSchemas, GetTables, GetColumns in Livy thrift server #194

Status: Closed (wants to merge 5 commits)

@yiheng (Contributor) commented Aug 8, 2019

What changes were proposed in this pull request?

In this patch, we add the implementations of GetSchemas, GetFunctions, GetTables, and GetColumns in Livy Thrift server.

https://issues.apache.org/jira/browse/LIVY-622
https://issues.apache.org/jira/browse/LIVY-623
https://issues.apache.org/jira/browse/LIVY-624
https://issues.apache.org/jira/browse/LIVY-625

How was this patch tested?

Added new unit tests and an integration test, and ran them together with the existing tests.

@@ -427,8 +427,8 @@ abstract class ThriftCLIService(val cliService: LivyCLIService, val serviceName:
  override def GetSchemas(req: TGetSchemasReq): TGetSchemasResp = {
    val resp = new TGetSchemasResp
    try {
-     val opHandle = cliService.getSchemas(
-       new SessionHandle(req.getSessionHandle), req.getCatalogName, req.getSchemaName)
+     val opHandle = cliService.getSchemas(createSessionHandle(req.getSessionHandle),
yiheng (Contributor, author) commented:

Create a session handle with the real protocol version of this session. The original code used version_v1 as the default, which does not satisfy the protocol-version check when generating the Thrift result set (see the linked code).
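The fix above can be sketched in miniature: instead of defaulting every handle to the oldest protocol version, look up the version that was actually negotiated when the session was opened. This is a minimal, hypothetical model; the names `ProtocolVersion`, `SessionHandles`, and the map-based registry are illustrative only and not Livy's actual classes.

```java
// Hypothetical sketch of the createSessionHandle fix: use the protocol
// version negotiated at open time rather than a hard-coded V1 default.
enum ProtocolVersion { V1, V8 }

class SessionHandles {
    // Illustrative registry of sessionId -> negotiated protocol version.
    private final java.util.Map<String, ProtocolVersion> negotiated =
        new java.util.HashMap<>();

    void register(String sessionId, ProtocolVersion v) {
        negotiated.put(sessionId, v);
    }

    // Mirrors the intent of createSessionHandle(...): return the stored
    // version, falling back to V1 only when the session is unknown.
    ProtocolVersion versionFor(String sessionId) {
        return negotiated.getOrDefault(sessionId, ProtocolVersion.V1);
    }
}
```

With the original `new SessionHandle(...)` construction, every handle behaved like the unknown-session fallback here, which is what broke result-set generation.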

@@ -44,6 +45,40 @@ abstract class MetadataOperation(sessionHandle: SessionHandle, opType: Operation
    if (orientation.equals(FetchOrientation.FETCH_FIRST)) {
      rowSet.setRowOffset(0)
    }
-   rowSet
+   rowSet.extractSubset(maxRows)
yiheng (Contributor, author) commented:

Fixes the issue where the metadata result set is infinite: a new rowSet containing only the subset data is generated, and the row offset of the original rowSet is advanced.
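The semantics behind the `extractSubset(maxRows)` call can be sketched with a simplified model: each fetch must return at most `maxRows` rows and advance the offset, otherwise a client fetching in a loop never sees an empty batch and never terminates. This `RowSet` class is a hypothetical simplification, not Livy's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the fetch fix. The real class lives in the Livy
// thrift server; only the offset-advancing behavior is modeled here.
class RowSet {
    private final List<Object[]> rows;
    private int rowOffset; // index of the first row not yet fetched

    RowSet(List<Object[]> rows) { this.rows = rows; }

    void setRowOffset(int offset) { this.rowOffset = offset; }

    // Returns a new RowSet with up to maxRows rows and moves this
    // rowSet's offset forward, mirroring extractSubset(maxRows).
    RowSet extractSubset(int maxRows) {
        int end = Math.min(rowOffset + maxRows, rows.size());
        List<Object[]> subset = new ArrayList<>(rows.subList(rowOffset, end));
        rowOffset = end;
        return new RowSet(subset);
    }

    int numRows() { return rows.size(); }
}
```

Returning `rowSet` directly (the old code) handed the full, never-shrinking result back on every fetch; with the subset extraction, repeated fetches eventually return an empty batch and the client stops.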

@yiheng (Contributor, author) commented Aug 8, 2019

Unlike the Spark thrift server, we use the Spark catalog to fetch the metadata instead of a Hive client, to avoid too strong a binding between Livy and Hive. @mgaido91

@yiheng yiheng closed this Aug 9, 2019
@yiheng yiheng reopened this Aug 9, 2019
@yiheng yiheng closed this Aug 9, 2019
@yiheng yiheng reopened this Aug 9, 2019
@mgaido91 (Contributor) left a comment:

Could you please also test your patch using SQuirreL or a similar tool, to verify that the metadata information is retrieved correctly? It would be great to include screenshots showing that it works.

Thanks for your contribution!

for (Row r : rows) {
schemas.add(new Object[]{
r.getString(0),
""
mgaido91 (Contributor) commented:

Why do we need this? If it is always empty, it makes little sense to return it, doesn't it?

yiheng (Contributor, author) commented:

This follows the Spark thrift server: the schema catalog field is not supported in Spark, so an empty string is returned here. This is the related Spark code: code1, code2.

Use a meaningful variable to hold the value (code).

}
}

public static Integer getColumnSize(org.apache.spark.sql.types.DataType type) {
mgaido91 (Contributor) commented:

Where did you take this and the following methods from?

yiheng (Contributor, author) commented:

They're from the Spark thrift server. Please find the related code here.
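For context, `getColumnSize` feeds the JDBC `COLUMN_SIZE` field of GetColumns results: a precision for numeric types, and null when the size is unbounded or unknown. The sketch below is a loose simplification keyed on type names; the real method ported from the Spark thrift server works on `org.apache.spark.sql.types.DataType`, and the specific sizes here are illustrative assumptions, not the exact values Spark returns.

```java
// Hedged sketch of getColumnSize: JDBC COLUMN_SIZE is the maximum number
// of decimal digits a numeric type can represent, and null for types
// without a fixed size (strings, binaries, complex types).
class ColumnSizes {
    static Integer getColumnSize(String typeName) {
        switch (typeName) {
            case "ByteType":    return 3;  // signed 8-bit: up to 3 digits
            case "ShortType":   return 5;  // signed 16-bit: up to 5 digits
            case "IntegerType": return 10; // signed 32-bit: up to 10 digits
            case "LongType":    return 19; // signed 64-bit: up to 19 digits
            case "FloatType":   return 7;  // assumed decimal precision
            case "DoubleType":  return 15; // assumed decimal precision
            default:            return null; // unbounded or unknown size
        }
    }
}
```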

@yiheng (Contributor, author) commented Aug 13, 2019

@mgaido91 I tested with beeline and squirrel-sql. Please note that there are some issues in the existing metadata operations (getCatalog/getTableTypes/getTypeInfo); I raised another patch, #197, to fix them.

After fixing these issues, here are the screenshots:

[three screenshots attached]

@yiheng (Contributor, author) commented Aug 13, 2019

I have addressed the review comments. @mgaido91 could you start another round of code review? Thanks!

@codecov-io commented Aug 14, 2019

Codecov Report

Merging #194 into master will increase coverage by 40.28%. The diff coverage is n/a.

@@              Coverage Diff              @@
##             master     #194       +/-   ##
=============================================
+ Coverage     28.33%   68.62%   +40.28%     
- Complexity      343      912      +569     
=============================================
  Files           100      100               
  Lines          5679     5679               
  Branches        855      855               
=============================================
+ Hits           1609     3897     +2288     
+ Misses         3739     1224     -2515     
- Partials        331      558      +227
Impacted Files Coverage Δ Complexity Δ
...main/scala/org/apache/livy/server/LivyServer.scala 35.96% <0%> (+0.98%) 11% <0%> (ø) ⬇️
...rver/src/main/scala/org/apache/livy/LivyConf.scala 95.87% <0%> (+1.03%) 21% <0%> (+3%) ⬆️
.../main/scala/org/apache/livy/server/WebServer.scala 53.33% <0%> (+1.66%) 10% <0%> (+1%) ⬆️
...la/org/apache/livy/server/batch/BatchSession.scala 86.17% <0%> (+2.12%) 14% <0%> (ø) ⬇️
...la/org/apache/livy/utils/SparkProcessBuilder.scala 54.44% <0%> (+2.22%) 11% <0%> (+1%) ⬆️
.../scala/org/apache/livy/sessions/SessionState.scala 61.11% <0%> (+2.77%) 2% <0%> (ø) ⬇️
...e/livy/server/interactive/InteractiveSession.scala 69.11% <0%> (+3.66%) 44% <0%> (+2%) ⬆️
...org/apache/livy/server/recovery/SessionStore.scala 80% <0%> (+5%) 10% <0%> (ø) ⬇️
...ain/scala/org/apache/livy/server/JsonServlet.scala 38.46% <0%> (+5.76%) 18% <0%> (+4%) ⬆️
.../apache/livy/server/batch/CreateBatchRequest.scala 68.75% <0%> (+6.25%) 19% <0%> (+1%) ⬆️
... and 70 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7f23e0...a25488c.

// The initialization needs to be lazy so that it does not block when the instance is created
protected lazy val rscClient = {
// This call is blocking: we are waiting for the session to be ready.
sessionManager.getLivySession(sessionHandle).client.get
A Contributor commented:

Shall we check that client is not null, e.g. require(client != null)?

yiheng (Contributor, author) commented:

From this code, it seems that if we cannot get a session, an error will be thrown in the Livy session manager.

@jerryshao (Contributor) left a comment:

LGTM, just some small issues.

@jerryshao (Contributor) commented:

LGTM, merging to master branch, thanks for the contribution.

@jerryshao jerryshao closed this in cae9d97 Aug 16, 2019
@mgaido91 (Contributor) left a comment:

I think there are still some critical open points in this PR even though it was merged. I'd suggest continuing the discussion and creating a follow-up that fixes the remaining issues.

GetFunctionsOperation.SCHEMA
}

private def convertFunctionName(name: String): String = {
mgaido91 (Contributor) commented:

Sorry, I still don't understand why we need this method. Could you explain it to me?

yiheng (Contributor, author) commented:

This is ported from here. The basic reason is that Spark uses a regex to filter the function names (see here and here), so we need to convert the SQL wildcard into a regex.
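The wildcard-to-regex conversion described above can be sketched as follows: SQL metadata patterns use `%` (any sequence of characters) and `_` (any single character), while Spark's function lookup filters by regex. This is a minimal sketch under that assumption; the class name and the quoting of other regex metacharacters are choices of this example, not necessarily what `convertFunctionName` in the patch does.

```java
import java.util.regex.Pattern;

// Illustrative conversion of a SQL LIKE-style metadata pattern into a
// Java regex: '%' -> ".*", '_' -> '.', everything else quoted literally.
class FunctionNamePattern {
    static String convertFunctionName(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            switch (c) {
                case '%': sb.append(".*"); break;
                case '_': sb.append('.');  break;
                default:
                    // escape characters that are special in a regex
                    sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }
}
```

For example, the pattern `to%` would then match function names such as `to_date` via `String.matches`.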

/**
* MetadataOperation is the base class for operations which do not perform any call on Spark side
*
* @param sessionHandle
mgaido91 (Contributor) commented:

What does this mean? There is no description at all, and the names of the parameters can already be read from the method signature...

yiheng (Contributor, author) commented:

Let me remove it.

val maxRows = maxRowsL.toInt
val results = rscClient.submit(new FetchCatalogResultJob(sessionId, jobId, maxRows)).get()

val rowSet = ThriftResultSet.apply(getResultSetSchema, protocolVersion)
mgaido91 (Contributor) commented:

Why not ThriftResultSet.apply(results)?

yiheng (Contributor, author) commented Aug 23, 2019:

results is a List<Object[]>.

import org.apache.livy.Job;
import org.apache.livy.JobContext;

public class CleanupCatalogResultJob implements Job<Boolean> {
mgaido91 (Contributor) commented:

I don't understand why we need this and the new state, instead of reusing the existing one for statements.

yiheng (Contributor, author) commented Aug 23, 2019:

Currently, I fetch the metadata objects from the Spark SessionCatalog API and construct the result set on the Livy server side. So the schema and types of the StatementState are useless in this case, and Iterator<Row> does not quite fit the data being sent.

Maybe we could change it to construct the ResultSet in SparkCatalogJob and return it directly to the client from the Livy server. That would reduce the number of SparkCatalogOperation classes. Is this what you mean?

mgaido91 (Contributor) commented:

> Maybe we can change to construct the ResultSet on SparkCatalogJob

This should definitely be done. We should always transfer a ResultSet over the wire, since it is a compressed representation of the data compared to plain arrays of objects.

yiheng (Contributor, author) commented:

OK, I will submit a PR to refactor the code.

try {
rscClient.submit(new GetTablesJob(
convertSchemaPattern(schemaName),
convertIdentifierPattern(tableName, datanucleusFormat = true),
mgaido91 (Contributor) commented:

I am not sure how you decided whether to set datanucleusFormat to true or false in the various calls; could you explain?

yiheng (Contributor, author) commented:

When passing the pattern to the Spark SessionCatalog API (e.g. to list tables), datanucleusFormat is set to true. convertPattern then replaces % with *; the SessionCatalog requires the * wildcard and converts it to .* internally (see here).

When filtering objects by the pattern in the Livy code (e.g. to list columns), datanucleusFormat is set to false, and % is replaced by .*, since we use a regex to filter the names (see here).
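The two modes of the conversion described above can be sketched side by side: the same SQL `%` wildcard becomes `*` when the pattern is handed to Spark's SessionCatalog, and `.*` when Livy filters names with a regex itself. This is a deliberately minimal sketch; escaping and `_` handling are omitted, and the class name is hypothetical.

```java
// Minimal sketch of the datanucleusFormat flag described above.
// true  -> catalog-style wildcard ('*') for Spark's SessionCatalog
// false -> regex wildcard (".*") for Livy-side name filtering
class MetadataPattern {
    static String convertPattern(String pattern, boolean datanucleusFormat) {
        // Only '%' is handled here; '_' and escaping are omitted for brevity.
        return pattern.replace("%", datanucleusFormat ? "*" : ".*");
    }
}
```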

mgaido91 added a commit that referenced this pull request Aug 30, 2019
…Set in catalog operations

## What changes were proposed in this pull request?

This is a followup of #194 which addresses all the remaining concerns. The main changes are:

 - reverting the introduction of a state specific for catalog operations;
 - usage of `ResultSet` to send over the wire the data for catalog operations too.

## How was this patch tested?

existing modified UTs

Author: Marco Gaido <mgaido@apache.org>

Closes #217 from mgaido91/LIVY-622_followup.