
[SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog#30473

Closed
huaxingao wants to merge 5 commits into apache:master from huaxingao:name_space

Conversation

@huaxingao
Contributor

What changes were proposed in this pull request?

Add namespace support to the JDBC v2 Table Catalog by making JDBCTableCatalog extend SupportsNamespaces.

Why are the changes needed?

Make the v2 JDBC implementation complete.

Does this PR introduce any user-facing change?

Yes. This PR adds the following methods to JDBCTableCatalog:

  • listNamespaces
  • listNamespaces(String[] namespace)
  • namespaceExists(String[] namespace)
  • loadNamespaceMetadata(String[] namespace)
  • createNamespace
  • alterNamespace
  • dropNamespace
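As a rough illustration of the user-facing effect (the catalog name and connection URL below are hypothetical placeholders, not taken from this PR), namespace commands against a registered JDBC catalog now route to the methods listed above:

```scala
// Hypothetical setup: the catalog name "h2" and the in-memory H2 URL are placeholders.
spark.conf.set("spark.sql.catalog.h2",
  "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.h2.url", "jdbc:h2:mem:testdb")

spark.sql("CREATE NAMESPACE h2.foo")  // handled by createNamespace
spark.sql("SHOW NAMESPACES IN h2")    // handled by listNamespaces
spark.sql("DROP NAMESPACE h2.foo")    // handled by dropNamespace
```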

How was this patch tested?

Add new Docker tests.

@github-actions github-actions bot added the SQL label Nov 23, 2020
@huaxingao
Contributor Author

I have a problem testing the new namespace APIs using H2: after creating a new namespace, somehow I can't see it in listNamespaces. Testing against Postgres works OK. That's why I didn't add a regular JDBC test; I added a Postgres Docker test instead.

Contributor Author


It seems to me the only property we can support for a namespace is the schema comment. I don't have a good way to retrieve the schema comment, so I will return an empty map for now.
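A minimal sketch of that behavior against plain java.sql (the PR's withConnection helper and Spark's NoSuchNamespaceException are replaced with stand-ins here, so this is illustrative only):

```scala
import java.sql.Connection
import java.util.Collections

// Sketch only: JDBC offers no portable way to read a schema's comment,
// so namespace metadata is reported as an empty map for now.
def loadNamespaceMetadata(conn: Connection, db: String): java.util.Map[String, String] = {
  val rs = conn.getMetaData.getSchemas(null, db)
  if (!rs.next()) {
    // the real code would throw Spark's NoSuchNamespaceException
    throw new IllegalArgumentException(s"Namespace $db does not exist")
  }
  Collections.emptyMap[String, String]()
}
```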

@SparkQA

SparkQA commented Nov 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36168/

Member

@MaxGekk MaxGekk left a comment


I have problem testing the new namespace APIs using H2

Does Derby have the same issue? If not, you could put your tests in a separate test suite.

@SparkQA

SparkQA commented Nov 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36168/

@SparkQA

SparkQA commented Nov 23, 2020

Test build #131567 has finished for PR 30473 at commit ec5f279.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCTableCatalog extends TableCatalog with SupportsNamespaces with Logging

@huaxingao
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131580 has finished for PR 30473 at commit ec5f279.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCTableCatalog extends TableCatalog with SupportsNamespaces with Logging

@huaxingao
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131620 has finished for PR 30473 at commit ec5f279.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCTableCatalog extends TableCatalog with SupportsNamespaces with Logging

@huaxingao
Contributor Author

retest this please

Contributor


shall we check that catalogs only have one element? otherwise it's weird to see we pick the first catalog only.

Contributor


this seems expensive. is there a better way?

Contributor Author


You mean it's expensive to fetch namespace info from the underlying databases, right? We can't save the previously fetched info and reuse it, because somebody else might have created or dropped namespaces after the last fetch. I guess we have to fetch again every time we need the info?

Contributor

@cloud-fan cloud-fan Dec 1, 2020


One way is to use conn.getMetaData.getSchemas(catalog, db) to avoid listing all databases.
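A sketch of the suggested lookup, assuming a plain java.sql.Connection in place of the PR's withConnection helper:

```scala
import java.sql.Connection

// Ask the driver only for schemas whose name matches `db`,
// instead of listing every schema and filtering on the client side.
def schemaExists(conn: Connection, db: String): Boolean = {
  val rs = conn.getMetaData.getSchemas(null, db)
  rs.next() // at least one row means a matching schema exists
}
```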

Contributor


shall we make sure it exists?

Contributor


CREATE NAMESPACE

@SparkQA

SparkQA commented Nov 30, 2020

Test build #131966 has finished for PR 30473 at commit ec5f279.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCTableCatalog extends TableCatalog with SupportsNamespaces with Logging

Contributor


can we avoid hardcoding it? SupportsNamespaces.PROP_COMMENT

override def namespaceExists(namespace: Array[String]): Boolean = namespace match {
  case Array(db) =>
    withConnection { conn =>
      val rs = conn.getMetaData.getSchemas(null, db)
      rs.next() // a matching schema exists iff the driver returned a row
    }
  case _ => false
}
Contributor


does it return schemas only from the current catalog?

Contributor Author


The DB2 JDBC driver implements this to return schemas from the current catalog (the current database the JDBC driver connects to). I'm not sure how other JDBC drivers implement this. I tested with the Postgres JDBC driver; it also returns schemas from the current catalog. In listTables (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala#L62), we also assume that getTables with null as the catalog value returns tables from the current catalog.

override def listNamespaces(namespace: Array[String]): Array[Array[String]] = namespace match {
  case Array() =>
    listNamespaces()
  case Array(db) if namespaceExists(namespace) =>
    Array()
  case _ =>
    throw new NoSuchNamespaceException(namespace)
}
Contributor


shall we return the input namespace here?

Contributor Author


I am following the implementation in V2SessionCatalog (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L211). According to the method definition ("List namespaces in a namespace"), the method returns the namespaces inside the input namespace, which is empty here.

Contributor


shall we fail to match the behavior of unknown table properties?

Contributor Author


Sure. Fixed this along with the others.

Contributor


ditto, shall we fail?

Contributor


ditto

Contributor


not exists

Contributor


and shall we fail?

Contributor Author


I think we should fail. Fixed.
I was following the implementation in V2SessionCatalog (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L273). Do I need to change this too?

Contributor


this seems pgsql specific.

Contributor Author


Fixed.

@SparkQA

SparkQA commented Dec 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36672/

@SparkQA

SparkQA commented Dec 2, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36672/

@SparkQA

SparkQA commented Dec 3, 2020

Test build #132073 has finished for PR 30473 at commit 8317cd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor


How about def builtinNamespaces: Seq[Seq[String]]? Then in the test:

assert(catalog.listNamespaces() === Array(Array("foo")) ++ builtinNamespaces)

Contributor

@cloud-fan cloud-fan left a comment


LGTM except one comment

@SparkQA

SparkQA commented Dec 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36746/

@SparkQA

SparkQA commented Dec 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36746/

@SparkQA

SparkQA commented Dec 3, 2020

Test build #132145 has finished for PR 30473 at commit 68838c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 15579ba Dec 4, 2020
@huaxingao
Contributor Author

Thank you!

@huaxingao huaxingao deleted the name_space branch December 4, 2020 07:37