Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #4218] Prefer to use TableCatalog to load tables from session catalog #4219

Closed

Conversation

nousot-cloud-guy
Copy link
Contributor

@nousot-cloud-guy nousot-cloud-guy commented Jan 31, 2023

Why are the changes needed?

To close #4218 .
This change ensures BI tools can list columns on Delta Lake tables in all schemas.

image

image

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

@@ -141,7 +141,7 @@ class CatalogShim_v2_4 extends SparkCatalogShim {
catalog.getTablesByName(identifiers).flatMap { t =>
val tableSchema =
if (t.provider.getOrElse("").equalsIgnoreCase("delta")) {
spark.table(t.identifier.table).schema
spark.table(f"${db}.${t.identifier.table}").schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use t.identifier.toString here?

@pan3793
Copy link
Member

pan3793 commented Feb 1, 2023

Thanks for investigating and making this PR, could you please verify the case when the db or table name contains dots?

@pan3793
Copy link
Member

pan3793 commented Feb 1, 2023

After second thought, I think we can totally remove the current workaround(revert #1476), and just invert the pattern match branch order in CatalogShim_v3_0#getColumnsByCatalog

- catalog match {
-   case builtin if builtin.name() == SESSION_CATALOG =>
-     super.getColumnsByCatalog(...)
-   case tc: TableCatalog =>
-     ...
- }

+ catalog match {
+   case tc: TableCatalog =>
+     ...
+   case builtin if builtin.name() == SESSION_CATALOG =>
+     super.getColumnsByCatalog(...)
+ }

Would you please try this way? @nousot-cloud-guy

@codecov-commenter
Copy link

codecov-commenter commented Feb 1, 2023

Codecov Report

Merging #4219 (5698432) into master (7c036fb) will increase coverage by 0.43%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #4219      +/-   ##
============================================
+ Coverage     53.07%   53.51%   +0.43%     
  Complexity       13       13              
============================================
  Files           548      560      +12     
  Lines         30010    30410     +400     
  Branches       4035     4082      +47     
============================================
+ Hits          15929    16274     +345     
- Misses        12599    12634      +35     
- Partials       1482     1502      +20     
Impacted Files Coverage Δ
...he/kyuubi/engine/spark/shim/CatalogShim_v2_4.scala 64.06% <0.00%> (-7.58%) ⬇️
...he/kyuubi/engine/spark/shim/CatalogShim_v3_0.scala 80.43% <0.00%> (-6.53%) ⬇️
...uthentication/LdapAuthenticationProviderImpl.scala 78.04% <0.00%> (-13.96%) ⬇️
...scala/org/apache/kyuubi/service/ServiceUtils.scala 88.88% <0.00%> (-11.12%) ⬇️
.../apache/kyuubi/server/api/v1/BatchesResource.scala 70.09% <0.00%> (-2.81%) ⬇️
...g/apache/kyuubi/operation/BatchJobSubmission.scala 75.27% <0.00%> (-1.70%) ⬇️
...apache/kyuubi/service/TBinaryFrontendService.scala 48.38% <0.00%> (-1.08%) ⬇️
...ommon/src/main/scala/org/apache/kyuubi/Utils.scala 73.00% <0.00%> (-0.94%) ⬇️
...mon/src/main/scala/org/apache/kyuubi/Logging.scala 42.50% <0.00%> (-0.93%) ⬇️
... and 31 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -141,7 +141,7 @@ class CatalogShim_v2_4 extends SparkCatalogShim {
catalog.getTablesByName(identifiers).flatMap { t =>
val tableSchema =
if (t.provider.getOrElse("").equalsIgnoreCase("delta")) {
spark.table(t.identifier.table).schema
spark.table(f"${db}.${t.identifier.table}").schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spark.table(f"${db}.${t.identifier.table}").schema
spark.table(t.identifier.toString).schema

@nousot-cloud-guy
Copy link
Contributor Author

That seems to be working. I've updated the PR with the reverts and new commit.

@pan3793 pan3793 closed this in 2b958c6 Feb 6, 2023
pan3793 pushed a commit that referenced this pull request Feb 6, 2023
### _Why are the changes needed?_

To close #4218 .
This change ensures BI tools can list columns on Delta Lake tables in all schemas.

<img width="312" alt="image" src="https://user-images.githubusercontent.com/89149767/215793967-722eb5f9-ffe4-4ffb-b7f9-1ded06c146d7.png">

<img width="725" alt="image" src="https://user-images.githubusercontent.com/89149767/215794036-871f005f-1494-487d-90aa-1f99891177c2.png">

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4219 from nousot-cloud-guy/feature/delta-db-schema.

Closes #4218

5698432 [Alex Wiss-Wolferding] Reversing match order in getColumnsByCatalog.
a6d973a [Alex Wiss-Wolferding] Revert "[KYUUBI #1458] Delta lake table columns won't show up in DBeaver."
20337dc [Alex Wiss-Wolferding] Revert "Using DB and table name when checking Delta table schema."
f7e4675 [Alex Wiss-Wolferding] Using DB and table name when checking Delta table schema.

Authored-by: Alex Wiss-Wolferding <alex@nousot.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 2b958c6)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793
Copy link
Member

pan3793 commented Feb 6, 2023

Merged to master/1.6, thanks @nousot-cloud-guy

@pan3793 pan3793 added this to the v1.6.2 milestone Feb 6, 2023
@pan3793 pan3793 changed the title [KYUUBI #4218] Using DB and table name when checking Delta table schema. [KYUUBI #4218] Prefer to use TableCatalog to load tables from session catalog Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Delta Lake columns only show in DBeaver/Power BI/etc. in the "default" database
4 participants