[SPARK-42823][SQL] spark-sql shell supports multipart namespaces for initialization #40457

Closed

yaooqinn wants to merge 2 commits into apache:master from yaooqinn:SPARK-42823

Conversation

@yaooqinn
Member

@yaooqinn yaooqinn commented Mar 16, 2023

What changes were proposed in this pull request?

Currently, the spark-sql shell can only be initialized with a single-part schema name, which is always resolved against the session catalog.

Case 1: specifying the catalog part for the v1 session catalog

```shell
bin/spark-sql --database spark_catalog.default

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'spark_catalog.default' not found
```

Case 2: setting the default catalog to a non-session catalog

```shell
bin/spark-sql \
  -c spark.sql.defaultCatalog=testcat \
  -c spark.sql.catalog.testcat=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog \
  -c spark.sql.catalog.testcat.url='jdbc:derby:memory:testcat;create=true' \
  -c spark.sql.catalog.testcat.driver=org.apache.derby.jdbc.AutoloadedDriver \
  -c spark.sql.catalogImplementation=in-memory \
  --database SYS

23/03/16 18:40:49 WARN ObjectStore: Failed to get database sys, returning NoSuchObjectException
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'sys' not found
```

In this PR, we switch to a USE statement to support multipart namespaces, which lets the database argument resolve to the correct catalog.
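The idea can be illustrated outside of Spark: rather than passing the argument to a single-part set-current-database API, the (possibly multipart) `--database` value is turned into a `USE` statement, whose resolution already understands catalogs. A minimal Python sketch of the quoting step (an illustration only, not Spark's actual implementation; naively splitting on `.` ignores dots inside quoted identifiers):

```python
def use_statement(namespace: str) -> str:
    """Build a USE statement from a possibly multipart --database argument.

    Each dot-separated part is backtick-quoted so names with special
    characters survive parsing; embedded backticks are escaped by doubling,
    mirroring Spark SQL's identifier-quoting convention.
    """
    parts = namespace.split(".")  # simplification: no dots inside quoted names
    quoted = ".".join("`" + part.replace("`", "``") + "`" for part in parts)
    return f"USE {quoted}"


print(use_statement("spark_catalog.default"))  # USE `spark_catalog`.`default`
print(use_statement("testcat.SYS"))            # USE `testcat`.`SYS`
```

Because the statement goes through the normal SQL path, a multipart name like `testcat.SYS` is resolved by the catalog framework instead of being treated as a single database name.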

Why are the changes needed?

To make the spark-sql shell better support the v2 catalog framework.

Does this PR introduce any user-facing change?

Yes, the `--database` option now supports multipart namespaces and works with v2 catalogs. The resolved namespace is also visible on the Spark Web UI.

How was this patch tested?

New unit tests.

@github-actions github-actions bot added the SQL label Mar 16, 2023
@yaooqinn
Member Author

cc @HyukjinKwon @cloud-fan @dongjoon-hyun, thanks

@dongjoon-hyun
Member

Thank you for pinging me, @yaooqinn .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-42823][SQL] spark-sql shell supports multipart namespaces for initialization Mar 17, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. It looks worthy of bringing to Apache Spark 3.4.0.
Merged to master/3.4.

dongjoon-hyun pushed a commit that referenced this pull request Mar 17, 2023
…r initialization

### What changes were proposed in this pull request?

Currently, the spark-sql shell can only be initialized with a single-part schema name, which is always resolved against the session catalog.

#### Case 1: specifying the catalog part for the v1 session catalog
```shell
bin/spark-sql --database spark_catalog.default

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'spark_catalog.default' not found
```

#### Case 2: setting the default catalog to a non-session catalog

```shell
bin/spark-sql \
  -c spark.sql.defaultCatalog=testcat \
  -c spark.sql.catalog.testcat=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog \
  -c spark.sql.catalog.testcat.url='jdbc:derby:memory:testcat;create=true' \
  -c spark.sql.catalog.testcat.driver=org.apache.derby.jdbc.AutoloadedDriver \
  -c spark.sql.catalogImplementation=in-memory \
  --database SYS

23/03/16 18:40:49 WARN ObjectStore: Failed to get database sys, returning NoSuchObjectException
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'sys' not found
```
In this PR, we switch to a USE statement to support multipart namespaces, which lets the database argument resolve to the correct catalog.

### Why are the changes needed?

To make the spark-sql shell better support the v2 catalog framework.

### Does this PR introduce _any_ user-facing change?

Yes, the `--database` option now supports multipart namespaces and works with v2 catalogs. The resolved namespace is also visible on the Spark Web UI.

### How was this patch tested?

New unit tests.

Closes #40457 from yaooqinn/SPARK-42823.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2000d5f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@yaooqinn yaooqinn deleted the SPARK-42823 branch March 17, 2023 03:31
@yaooqinn
Member Author

thank you @dongjoon-hyun

@dongjoon-hyun
Member

You're welcome!

@cloud-fan
Contributor

late LGTM

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…r initialization
