Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48851][SQL] Change the value of SCHEMA_NOT_FOUND from namespace to catalog.namespace #47276

Closed
wants to merge 13 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jul 10, 2024

What changes were proposed in this pull request?

The pr aims to change the value of SCHEMA_NOT_FOUND from namespace to catalog.namespace.

Why are the changes needed?

As discussing #47038 (comment), we should provide more friendly and clear prompt error message.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Update existed UT & Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -52,6 +52,13 @@ class NoSuchDatabaseException private(
messageParameters = Map("schemaName" -> quoteIdentifier(db)),
cause = None)
}

def this(db: Seq[String]) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def this(db: Seq[String]) = {
def this(namespace: Seq[String]) = {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use NoSuchNamespaceException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@@ -71,7 +71,9 @@ class SessionCatalog(
functionExpressionBuilder: FunctionExpressionBuilder,
cacheSize: Int = SQLConf.get.tableRelationCacheSize,
cacheTTL: Long = SQLConf.get.metadataCacheTTL,
defaultDatabase: String = SQLConf.get.defaultDatabase) extends SQLConfHelper with Logging {
defaultDatabase: String = SQLConf.get.defaultDatabase,
var catalogName: Option[String] = Some(CatalogManager.SESSION_CATALOG_NAME))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? It can only be SESSION_CATALOG_NAME

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Give a simple example, eg:
package org.apache.spark.sql.connector

import java.util

import scala.collection.mutable
import scala.jdk.CollectionConverters._

import org.apache.spark.sql.catalyst.SQLConfHelper
import org.apache.spark.sql.catalyst.catalog.{CatalogDatabase, SessionCatalog}
import org.apache.spark.sql.connector.catalog.{CatalogPlugin, NamespaceChange, SupportsNamespaces}
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class BaiduSessionCatalog extends SupportsNamespaces with SQLConfHelper with CatalogPlugin {

  var sparkSessionCatalog: SessionCatalog = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {}

  override def name(): String = "baidu"

  override def listNamespaces(): Array[Array[String]] = {
    Array.empty
  }

  override def listNamespaces(namespace: Array[String]): Array[Array[String]] = {
    Array.empty
  }

  override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] = {
    new util.HashMap[String, String]()
  }

  override def createNamespace(namespace: Array[String], metadata: util.Map[String, String]): Unit = {
    println("baidu catalog createNamespace ...")
  }

  override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit = {
    sparkSessionCatalog.catalogName = Some(name())
    namespace match {
      case Array(db) =>
        val metadata = sparkSessionCatalog.getDatabaseMetadata(db).toMetadata
      case _ =>
    }
  }

  override def dropNamespace(namespace: Array[String], cascade: Boolean): Boolean = {
    true
  }

  private implicit class CatalogDatabaseHelper(catalogDatabase: CatalogDatabase) {
    def toMetadata: util.Map[String, String] = {
      val metadata = mutable.HashMap[String, String]()

      catalogDatabase.properties.foreach {
        case (key, value) => metadata.put(key, value)
      }
      metadata.put(SupportsNamespaces.PROP_LOCATION, catalogDatabase.locationUri.toString)
      metadata.put(SupportsNamespaces.PROP_COMMENT, catalogDatabase.description)

      metadata.asJava
    }
  }
}
  • Test code as following:
package org.apache.spark.sql.connector

import org.scalatest.BeforeAndAfter

import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.connector.catalog.SupportsNamespaces
import org.apache.spark.sql.test.SharedSparkSession

class BaiduSessionCatalogSuite
  extends QueryTest
  with SharedSparkSession
  with BeforeAndAfter {

  protected val catalogClassName: String = classOf[BaiduSessionCatalog].getName
  private lazy val baiduCatalog = spark.sessionState.catalogManager.catalog("baidu")

  before {
    spark.conf.set("spark.sql.catalog.baidu", catalogClassName)
    baiduCatalog.asInstanceOf[BaiduSessionCatalog].sparkSessionCatalog = spark.sessionState.catalog
  }

  test("alter namespace") {
    baiduCatalog.asInstanceOf[SupportsNamespaces].alterNamespace(Array("ns"), null)
    println("alter namespace successfully.")
  }
}
  • If we use hardcode SESSION_CATALOG_NAME , the error message above will always be.
[SCHEMA_NOT_FOUND] The schema `spark_catalog`.`ns` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS. SQLSTATE: 42704
image
  • Otherwise, it will display
    image

  • Do we need to consider this scenario? @cloud-fan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hardcoded SESSION_CATALOG_NAME first and ignored the above scenario.
I will add it when needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very hacky to introduce custom catalogs with the SessionCatalog class... SessionCatalog should always point to the HMS.

@panbingkun panbingkun changed the title [WIP][SPARK-48851][SQL] Attach full name for SCHEMA_NOT_FOUND [WIP][SPARK-48851][SQL] Change the value of SCHEMA_NOT_FOUND from namespace to catalog.namespace Jul 10, 2024
@panbingkun panbingkun changed the title [WIP][SPARK-48851][SQL] Change the value of SCHEMA_NOT_FOUND from namespace to catalog.namespace [SPARK-48851][SQL] Change the value of SCHEMA_NOT_FOUND from namespace to catalog.namespace Jul 10, 2024
@panbingkun panbingkun marked this pull request as ready for review July 10, 2024 09:28
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM too (Pending CIs).

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b4e3c2a Jul 11, 2024
@panbingkun
Copy link
Contributor Author

thanks, merging to master!

Thank you for your review.
I will fix similar scenarios (when table not exist), by adding catalog name for the table to make it more consistent and reasonable.

private def requireTableExists(name: TableIdentifier): Unit = {
if (!tableExists(name)) {
throw new NoSuchTableException(db = name.database.get, table = name.table)
}
}
private def requireTableNotExists(name: TableIdentifier): Unit = {
if (tableExists(name)) {
throw new TableAlreadyExistsException(db = name.database.get, table = name.table)
}
}

biruktesf-db pushed a commit to biruktesf-db/spark that referenced this pull request Jul 11, 2024
…pace` to `catalog.namespace`

### What changes were proposed in this pull request?
The pr aims to change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace`.

### Why are the changes needed?
As discussing apache#47038 (comment), we should provide more friendly and clear prompt error message.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Update existed UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47276 from panbingkun/db_with_catalog.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@zhengruifeng
Copy link
Contributor

zhengruifeng commented Jul 22, 2024

This change is not only a user-facing change, it actually breaks the exception handling in external catalogs like:

try {

} catch {
  case e: NoSuchDatabaseException => // ignore or handle

}

I am going to restore this in #47433

cloud-fan pushed a commit that referenced this pull request Jul 22, 2024
…baseException` to restore the exception handling

### What changes were proposed in this pull request?
Make `NoSuchNamespaceException` extend `NoSuchNamespaceException`

### Why are the changes needed?
1, #47276 made many SQL commands throw `NoSuchNamespaceException` instead of `NoSuchDatabaseException`, it is more then an end-user facing change, it is a breaking change which break the exception handling in 3-rd libraries in the eco-system.

2, `NoSuchNamespaceException` and `NoSuchDatabaseException` actually share the same error class `SCHEMA_NOT_FOUND`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47433 from zhengruifeng/make_nons_nodb.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…pace` to `catalog.namespace`

### What changes were proposed in this pull request?
The pr aims to change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace`.

### Why are the changes needed?
As discussing apache#47038 (comment), we should provide more friendly and clear prompt error message.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Update existed UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47276 from panbingkun/db_with_catalog.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…baseException` to restore the exception handling

### What changes were proposed in this pull request?
Make `NoSuchNamespaceException` extend `NoSuchNamespaceException`

### Why are the changes needed?
1, apache#47276 made many SQL commands throw `NoSuchNamespaceException` instead of `NoSuchDatabaseException`, it is more then an end-user facing change, it is a breaking change which break the exception handling in 3-rd libraries in the eco-system.

2, `NoSuchNamespaceException` and `NoSuchDatabaseException` actually share the same error class `SCHEMA_NOT_FOUND`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47433 from zhengruifeng/make_nons_nodb.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…pace` to `catalog.namespace`

### What changes were proposed in this pull request?
The pr aims to change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace`.

### Why are the changes needed?
As discussing apache#47038 (comment), we should provide more friendly and clear prompt error message.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Update existed UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47276 from panbingkun/db_with_catalog.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…baseException` to restore the exception handling

### What changes were proposed in this pull request?
Make `NoSuchNamespaceException` extend `NoSuchNamespaceException`

### Why are the changes needed?
1, apache#47276 made many SQL commands throw `NoSuchNamespaceException` instead of `NoSuchDatabaseException`, it is more then an end-user facing change, it is a breaking change which break the exception handling in 3-rd libraries in the eco-system.

2, `NoSuchNamespaceException` and `NoSuchDatabaseException` actually share the same error class `SCHEMA_NOT_FOUND`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47433 from zhengruifeng/make_nons_nodb.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…pace` to `catalog.namespace`

### What changes were proposed in this pull request?
The pr aims to change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace`.

### Why are the changes needed?
As discussing apache#47038 (comment), we should provide more friendly and clear prompt error message.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Update existed UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47276 from panbingkun/db_with_catalog.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…baseException` to restore the exception handling

### What changes were proposed in this pull request?
Make `NoSuchNamespaceException` extend `NoSuchNamespaceException`

### Why are the changes needed?
1, apache#47276 made many SQL commands throw `NoSuchNamespaceException` instead of `NoSuchDatabaseException`, it is more then an end-user facing change, it is a breaking change which break the exception handling in 3-rd libraries in the eco-system.

2, `NoSuchNamespaceException` and `NoSuchDatabaseException` actually share the same error class `SCHEMA_NOT_FOUND`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47433 from zhengruifeng/make_nons_nodb.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants