[SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog #30473
huaxingao wants to merge 5 commits into apache:master
I have a problem testing the new namespace APIs using H2: after creating a new namespace, I somehow can't see it in listNamespaces. Testing against Postgres works OK. That's why I didn't add a regular JDBC test and added a Postgres Docker test instead.
It seems to me that the only property we can support for a namespace is the schema comment. I don't have a good way to retrieve the schema comment, so I will return an empty map for now.
Kubernetes integration test starting
MaxGekk left a comment
> I have problem testing the new namespace APIs using H2
Does Derby have the same issue? If not, you could put your tests in a separate test suite.
Kubernetes integration test status success
Test build #131567 has finished for PR 30473 at commit
retest this please
Test build #131580 has finished for PR 30473 at commit
retest this please
Test build #131620 has finished for PR 30473 at commit
retest this please
Shall we check that catalogs has only one element? Otherwise it's weird that we pick only the first catalog.
This seems expensive. Is there a better way?
You mean it's expensive to fetch namespaces info from the underlying databases, right? We can't save the previously fetched info and reuse it, because somebody else might have created or dropped namespaces after the last fetch. I guess we have to fetch again every time we need the info?
One way is to use conn.getMetaData.getSchemas(catalog, db) to avoid listing all databases.
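A minimal sketch of that idea, using only the standard java.sql API. DatabaseMetaData.getSchemas(catalog, schemaPattern) treats its second argument as a LIKE-style pattern, so the sketch still compares each returned name exactly. The names schemaExists, proxy, and fakeConnection are hypothetical; the proxy-based fake only stands in for a real driver so that the snippet runs without a database.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{Connection, DatabaseMetaData, ResultSet}

object NamespaceExistsSketch {

  /** True if the driver reports a schema named exactly `db`. */
  def schemaExists(conn: Connection, db: String): Boolean = {
    // A null catalog means "do not narrow the search by catalog".
    val rs = conn.getMetaData.getSchemas(null, db)
    try {
      var found = false
      // Column 1 of the getSchemas result is TABLE_SCHEM.
      while (!found && rs.next()) {
        found = rs.getString(1) == db
      }
      found
    } finally {
      rs.close()
    }
  }

  // Hypothetical test double: builds an instance of `cls` whose methods
  // are dispatched by method name only.
  private def proxy[T](cls: Class[T])(f: String => AnyRef): T =
    Proxy.newProxyInstance(
      getClass.getClassLoader,
      Array[Class[_]](cls),
      new InvocationHandler {
        override def invoke(p: AnyRef, m: Method, args: Array[AnyRef]): AnyRef =
          f(m.getName)
      }
    ).asInstanceOf[T]

  /** Hypothetical fake connection that reports a fixed list of schemas. */
  def fakeConnection(schemas: Seq[String]): Connection = {
    def resultSet(): ResultSet = {
      var idx = -1
      proxy(classOf[ResultSet]) {
        case "next" =>
          idx += 1
          Boolean.box(idx < schemas.length)
        case "getString" => schemas(idx)
        case _ => null // close() and anything else is a no-op
      }
    }
    val meta = proxy(classOf[DatabaseMetaData]) {
      case "getSchemas" => resultSet()
      case _ => null
    }
    proxy(classOf[Connection]) {
      case "getMetaData" => meta
      case _ => null
    }
  }
}
```

This avoids listing every namespace in Spark code, but it still costs one metadata round trip per call, so it narrows the query rather than caching anything.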
Shall we make sure it exists?
Test build #131966 has finished for PR 30473 at commit
Can we avoid hardcoding it? Use SupportsNamespaces.PROP_COMMENT.
override def namespaceExists(namespace: Array[String]): Boolean = namespace match {
  case Array(db) =>
    withConnection { conn =>
      val rs = conn.getMetaData.getSchemas(null, db)
Does it return schemas only from the current catalog?
The DB2 JDBC driver implements this to return schemas from the current catalog (the database the JDBC driver is connected to). I'm not sure how other JDBC drivers implement it. I tested with the Postgres JDBC driver, and it also returns schemas from the current catalog. In listTables (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala#L62), we also assume that getTables with null as the catalog value returns tables from the current catalog.
  case Array() =>
    listNamespaces()
  case Array(db) if namespaceExists(namespace) =>
    Array()
Shall we return the input namespace here?
I am following the implementation in V2SessionCatalog (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L211). According to the method's documentation ("List namespaces in a namespace"), the method returns the namespaces inside the input namespace, and a single-level JDBC namespace contains none.
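The semantics described above can be sketched without Spark as a pure function. This is an illustration under stated assumptions, not the PR's code: ListNamespacesSketch, NoSuchNamespace, and the existing parameter are hypothetical stand-ins, and raising an error for the unmatched case is an assumption about how a missing namespace would be reported.

```scala
object ListNamespacesSketch {

  // Hypothetical stand-in for Spark's NoSuchNamespaceException.
  final class NoSuchNamespace(namespace: Seq[String])
    extends RuntimeException(s"Namespace '${namespace.mkString(".")}' not found")

  /**
   * Lists the namespaces nested inside `namespace`.
   * `existing` plays the role of the schemas the database reports.
   */
  def listNamespaces(existing: Set[String], namespace: Seq[String]): Seq[Seq[String]] =
    namespace match {
      // Root: every schema is a single-level namespace.
      case Seq() => existing.toSeq.sorted.map(name => Seq(name))
      // An existing schema has no children, so the result is empty.
      case Seq(db) if existing.contains(db) => Seq.empty
      // Assumption: anything else is reported as a missing namespace.
      case _ => throw new NoSuchNamespace(namespace)
    }
}
```

Returning the input namespace in the second case would make it look like a child of itself, which is why the empty result matches the "namespaces in a namespace" contract.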
Shall we fail, to match the behavior for unknown table properties?
Sure. Fixed this along with the others.
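A rough sketch of the fail-fast behavior agreed on here, combined with the earlier suggestion to use a named constant instead of a hardcoded key. Everything in it is illustrative: PropComment mirrors the value of SupportsNamespaces.PROP_COMMENT, while extractComment, the exception type, and the message are assumptions rather than the PR's actual code.

```scala
object NamespacePropertiesSketch {

  // Mirrors the value of SupportsNamespaces.PROP_COMMENT in Spark.
  val PropComment = "comment"

  /** Returns the schema comment, failing fast on any other property. */
  def extractComment(properties: Map[String, String]): Option[String] = {
    properties.keys.find(_ != PropComment).foreach { key =>
      // Assumption: the exception type and message are illustrative only.
      throw new IllegalArgumentException(
        s"Namespace property '$key' is not supported")
    }
    properties.get(PropComment)
  }
}
```

Rejecting unknown keys up front means a caller learns immediately that a property was not applied, instead of having it dropped silently.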
I think we should fail. Fixed.
I was following the implementation in V2SessionCatalog (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L273). Do I need to change this too?
This seems pgsql specific.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #132073 has finished for PR 30473 at commit
How about def builtinNamespaces: Seq[Seq[String]]
Then in the test:
assert(catalog.listNamespaces() === Array(Array("foo")) ++ builtinNamespaces)
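One way to read this suggestion is that the shared test logic stays database-neutral and each database suite supplies its own pre-existing schemas. The sketch below is a plain-Scala illustration: NamespaceSuite, expectedNamespaces, and PostgresNamespaceSuite are hypothetical names, and the three Postgres schema names are an assumption about what a fresh database reports through JDBC.

```scala
object BuiltinNamespacesSketch {

  // Hypothetical base for per-database namespace tests.
  trait NamespaceSuite {
    /** Schemas that exist before the test creates anything. */
    def builtinNamespaces: Seq[Seq[String]]

    /** What listNamespaces() should report after creating `created`. */
    def expectedNamespaces(created: Seq[String]): Set[Seq[String]] =
      (created.map(name => Seq(name)) ++ builtinNamespaces).toSet
  }

  // Assumption: these are the schemas a fresh Postgres database reports.
  object PostgresNamespaceSuite extends NamespaceSuite {
    override val builtinNamespaces: Seq[Seq[String]] =
      Seq(Seq("information_schema"), Seq("pg_catalog"), Seq("public"))
  }
}
```

Comparing as a set sidesteps any dependence on the order in which a driver returns schemas; a suite that cares about order could compare sequences instead.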
cloud-fan left a comment
LGTM except one comment
Kubernetes integration test starting
Kubernetes integration test status success
Test build #132145 has finished for PR 30473 at commit
Thanks, merging to master!
Thank you!
What changes were proposed in this pull request?
Add namespace support to the JDBC v2 Table Catalog by making JDBCTableCatalog extend SupportsNamespaces.
Why are the changes needed?
To make the v2 JDBC implementation complete.
Does this PR introduce any user-facing change?
Yes. Add the following to JDBCTableCatalog
How was this patch tested?
Added new Docker tests.