Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14014][SQL] Integrate session catalog (attempt #2) #11938

Closed
wants to merge 5 commits into from

Conversation

andrewor14
Copy link
Contributor

What changes were proposed in this pull request?

This reopens #11836, which was merged but promptly reverted because it introduced flaky Hive tests.

How was this patch tested?

See CatalogTestCases, SessionCatalogSuite and HiveContextSuite.

Andrew Or added 3 commits March 24, 2016 10:08
## What changes were proposed in this pull request?

`SessionCatalog`, introduced in apache#11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.

As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext

## How was this patch tested?

This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.

Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes apache#11836 from andrewor14/use-session-catalog.
@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2679 has started for PR 11938 at commit 70530cf.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54060 has finished for PR 11938 at commit 70530cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2678 has finished for PR 11938 at commit 70530cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2680 has finished for PR 11938 at commit 70530cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2677 has finished for PR 11938 at commit 70530cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2681 has finished for PR 11938 at commit 70530cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2686 has started for PR 11938 at commit dbcdce6.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2687 has finished for PR 11938 at commit dbcdce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2684 has finished for PR 11938 at commit dbcdce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2685 has finished for PR 11938 at commit dbcdce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #2683 has finished for PR 11938 at commit dbcdce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #54095 has finished for PR 11938 at commit dbcdce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #2690 has started for PR 11938 at commit 94c5dbe.

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #2689 has finished for PR 11938 at commit dbcdce6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TestHiveSingleton extends SparkFunSuite with BeforeAndAfterAll
    • class HiveMetastoreCatalogSuite extends TestHiveSingleton

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #2691 has finished for PR 11938 at commit 94c5dbe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #54126 has finished for PR 11938 at commit 94c5dbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #2692 has finished for PR 11938 at commit 94c5dbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #2693 has finished for PR 11938 at commit 94c5dbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor Author

Test failure was not related. The latest commit passed tests 4 times in a row so I'm going to merge this.

@asfgit asfgit closed this in 20ddf5f Mar 25, 2016
@rxin
Copy link
Contributor

rxin commented Mar 25, 2016

@andrewor14 in the future we should make the pr description the same as the previous one, with your latest addendum. Otherwise it is very difficult to track once it is committed.

private[this] var currentDb = "default"
protected[this] var currentDb = {
val defaultName = "default"
val defaultDbDefinition = CatalogDatabase(defaultName, "default database", "", Map())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @andrewor14 I have a very late comment. For a default database, the locationUri is an empty string? Is that allowed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this error when trying to create such a database:

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Hive case the default database should already exist in the metastore, so this won't do anything, right? This is mainly for the in-memory case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The currentDb is also shared by HiveSessionCatalog. I think we could just override this variable in HiveSessionCatalog and directly return the defaultName = "default" without attempting to create a database:

https://github.com/apache/spark/pull/11938/files#diff-b3f9800839b9b9a1df9da9cbfc01adf8R53

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the HiveSessionCatalog, we can get the default path by:

  override def getDefaultPath: String = client.getConf(HiveConf.ConfVars.METASTOREWAREHOUSE.varname,
    HiveConf.ConfVars.METASTOREWAREHOUSE.defaultStrVal)

It might not be useful in this case. I am just trying to implement the CreateDatabase DDL.

@andrewor14 andrewor14 deleted the session-catalog-again branch March 25, 2016 08:21
@@ -71,7 +71,8 @@ case class DropTable(
}
hiveContext.invalidateTable(tableName)
hiveContext.runSqlHive(s"DROP TABLE $ifExistsClause$tableName")
hiveContext.sessionState.catalog.unregisterTable(TableIdentifier(tableName))
hiveContext.sessionState.catalog.dropTable(
TableIdentifier(tableName), ignoreIfNotExists = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewor14 @yhuai Sorry, I have a question about this.

When doing Create/Alter/Drop Database, the changes in my PR affect both SQLContext and HiveContext. Basically, we only call the related APIs of sessionState.catalog.

After reading this, I am wondering if we also need to call runSqlHive for "Create/Alter/Drop Database"? Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, SessionCatalog calls HiveCatalog internally so the state will be stored in the metastore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants