External hive support in SnappySession #1220
Conversation
The koloboke project has been dead and unmaintained for a couple of years now, so it has been replaced with Eclipse Collections, though the latter is a bit slower for some operations and also adds significant bulk (~10MB).
- Also added an implicit retry for catalog stale exceptions in queries (see the sketch below).
- Invalidate the entire cache of the connector for a create/drop/alter, since the version stored for other relations in RelationInfo will also certainly be stale.
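The retry could look roughly like this sketch; `CatalogStaleException`, `ConnectorCatalog` and `invalidateAll()` are hypothetical stand-ins for illustration, not the actual SnappyData classes:

```scala
// Hypothetical stand-ins, not the real SnappyData types.
class CatalogStaleException(msg: String) extends RuntimeException(msg)
trait ConnectorCatalog { def invalidateAll(): Unit }

object CatalogRetry {
  // On a stale-catalog failure, invalidate the connector's whole relation
  // cache (cached versions for other relations are likely stale too),
  // then retry the query once.
  def withCatalogRetry[T](catalog: ConnectorCatalog, retriesLeft: Int = 1)(body: => T): T = {
    try body
    catch {
      case _: CatalogStaleException if retriesLeft > 0 =>
        catalog.invalidateAll()
        withCatalogRetry(catalog, retriesLeft - 1)(body)
    }
  }
}
```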
Allow for the absence of baseTable in an external catalog table drop, since it can be a temporary table.
…ng HiveStrategies
Allow for "gemfire" data source to make a catalog entry during create table execution in its createRelation itself. It needs the creation to add new parameters to the options bag. Fixed dependent handling to avoid duplicates.
…eparate files on this branch
Instead of changing the sessionState/sharedState inside SnappySession, switch the existing active session to the SparkSession. This also fixes a failure in InsertIntoHiveTable which was due to the state inside SnappySession having been switched back when makeCopy of that plan is invoked.
Porting the hive test suites from Spark to use SnappySession with external hive enabled. Hive-compatible DDL support, as present in SparkSession, has also been added to SnappySession in this PR. An additional property has been added to use the hive provider as the default when no provider is specified in a CREATE TABLE (which otherwise creates a ROW table). The convention in CREATE TABLE is to use the external hive catalog for the hive provider and the in-built catalog otherwise, so all of the hive suites from Spark should work as is.
Instead, rename hiveCompatible to snappydata.sql.hive.compatibility and use a tri-state (default, spark, hive) to denote the level of compatibility to use. Specifically, the spark and hive levels will use "hive" as the default provider.
If spark.sql.sources.default is explicitly set, then use the same in the SQL parser, with the default as "row" like before.
Instead, honour Spark's "spark.sql.catalogImplementation" itself to make the configuration identical to Spark, with the difference that the "hive" implementation in SnappySession actually refers to the union of the builtin and external catalogs. Fixed a few precheckin failures.
…tion due to enableHiveSupport getting set
ExpressionSQLBuilderSuite -> SnappyExpressionSQLBuilderSuite
…wise not allowed by Spark
Other fixes and cleanups
If hive-specific extensions are present in a CREATE TABLE, then always assume the provider to be "hive" and pass it to the Spark parser.
Make the behaviour of "drop schema" and "drop database" identical, dropping from both the builtin and external catalogs, since "create schema" is identical to "create database".
Also cleaned up the current schema/database setup.
Also improved CommandLineToolsSuite to not print failed output to the screen.
Changes proposed in this pull request
This adds support for the two components of Spark's hive session:
1. the external hive (metastore) catalog, and
2. the extra rules and strategies for such hive managed tables.
"CREATE TABLE ... USING hive" is allowed that explicitly specifies the table to use hive provider.
There are two user-level properties:
- spark.sql.catalogImplementation: the external hive catalog is added to the builtin catalog when the value is set to "hive". Note that the builtin catalog is consulted first and then the external one, so in case of name clashes, the builtin one is given preference. For writes, all tables using "hive" as the provider will use the external hive metastore while the rest use the builtin catalog.
- snappydata.sql.hive.compatibility: when set to "spark" or "hive", the default behaviour of "create table ..." without any USING provider or Hive DDL extensions will change to create a hive table instead of a row table.
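A minimal sketch of enabling both properties, assuming the usual SnappySession constructor that takes a SparkContext:

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

// Build a Spark session with the hive catalog implementation enabled.
val spark = SparkSession.builder()
  .appName("external-hive-example")
  .config("spark.sql.catalogImplementation", "hive")
  .getOrCreate()

// Create a SnappySession on the same SparkContext and raise the
// compatibility level so "hive" becomes the default provider.
val session = new SnappySession(spark.sparkContext)
session.sql("SET snappydata.sql.hive.compatibility=hive")
```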
A lazily instantiated instance of a Hive-enabled SparkSession is kept inside SnappySessionState, which is referred to when "spark.sql.catalogImplementation" is "hive" for the session.
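A rough sketch of such a lazily created field; the surrounding class and builder details are assumptions for illustration, not the actual SnappySessionState code:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

class HiveEnabledState(sc: SparkContext) {
  // Created only on first use, i.e. only when the session actually
  // needs the external hive catalog.
  lazy val hiveSession: SparkSession = SparkSession.builder()
    .config(sc.getConf)
    .enableHiveSupport()
    .getOrCreate()
}
```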
For 1), the list/get/create methods in SnappySessionCatalog have been overridden to read/write to
the hive catalog after the snappy catalog if hive support is enabled on the session.
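An illustrative sketch of that lookup order; method names like `getTableIfExists` are assumptions for readability, not the real SnappySessionCatalog API:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTable

trait SimpleCatalog {
  def getTableIfExists(name: TableIdentifier): Option[CatalogTable]
}

class UnifiedLookup(builtin: SimpleCatalog, hive: SimpleCatalog,
    hiveEnabled: () => Boolean) {
  // Builtin catalog first, external hive catalog second, so the
  // builtin entry wins on a name clash.
  def lookupTable(name: TableIdentifier): Option[CatalogTable] =
    builtin.getTableIfExists(name).orElse {
      if (hiveEnabled()) hive.getTableIfExists(name) else None
    }
}
```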
For 2), wrapper Rule/Strategy classes have been added that wrap the extra rules/strategies from
hive session and run them only if the property has been enabled on SnappySession.
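Conceptually, the wrappers look like the following sketch (the class name is illustrative, and the real code covers strategies as well as rules):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Runs the wrapped hive-session rule only when hive support is enabled
// on the current SnappySession; otherwise the plan passes through.
case class HiveConditionalRule(wrapped: Rule[LogicalPlan],
    hiveEnabled: () => Boolean) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan =
    if (hiveEnabled()) wrapped(plan) else plan
}
```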
The code temporarily switches to the hive-enabled SparkSession when running hive rules/strategies, some of which expect the internal sharedState/sessionState to be those of hive.
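A sketch of that temporary switch using Spark's public active-session API (the helper itself is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object HiveSessionSwitch {
  // Makes `hiveSession` the active session for the duration of `body`,
  // restoring the previous active session (if any) afterwards.
  def withHiveSession[T](hiveSession: SparkSession)(body: => T): T = {
    val previous = SparkSession.getActiveSession
    SparkSession.setActiveSession(hiveSession)
    try body
    finally {
      previous match {
        case Some(s) => SparkSession.setActiveSession(s)
        case None => SparkSession.clearActiveSession()
      }
    }
  }
}
```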
Patch testing
precheckin and manual testing
ReleaseNotes.txt changes
Documentation for the new property and what it provides for users.
Other PRs
TIBCOSoftware/snappy-store#499
TIBCOSoftware/snappy-spark#164
https://github.com/SnappyDataInc/snappy-aqp/pull/178