[SPARK-14879] [SQL] Move CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect to sql/core#12645
yhuai wants to merge 4 commits into apache:master from yhuai:moveCreateDataSource
    matcher.matches()
  }

  def createDataSourceTable(
This method is mainly copied from HiveMetastoreCatalog. I removed val QualifiedTableName(dbName, tblName) = getQualifiedTableName(tableIdent) and made it call SessionCatalog's createTable method.
Test build #56815 has finished for PR 12645 at commit
Test build #56816 has finished for PR 12645 at commit
Test build #56818 has finished for PR 12645 at commit
  extends RunnableCommand {

  override def run(sqlContext: SQLContext): Seq[Row] = {
    // Since we are saving metadata to metastore, we need to check if metastore supports
metastore -> catalog; metastore is very hive specific
Most of the problems I pointed out also existed in the old code, so feel free to merge this one and submit a follow-up PR to address them.
OK. Thanks. Will send out a follow-up PR.
        c.tableIdent, c.provider, c.partitionColumns, c.mode, c.options, c.child)
      ExecutedCommandExec(cmd) :: Nil

    case c: CreateTableUsingAsSelect =>
If the source table is a Hive Table, we still need this rule. Right? Should I submit a PR?
This is not for Hive tables. I think we still have a special rule just for hive.
saveAsTable API in DataFrameWriter always generates CreateTableUsingAsSelect. The JIRA https://issues.apache.org/jira/browse/SPARK-16789# might be related to this issue.
Yeah, that's true. saveAsTable was only added for data source tables because we do not have a way to specify a Hive format in DataFrameWriter. I think this change should be part of the work of unifying these two kinds of tables.
We will refactor the write path and Hive in 2.1, so the problem can eventually be resolved in 2.1.
Should we try to fix it in 2.0.1? I am afraid we might see more JIRAs opened by 1.x users.
Yeah, if we change it to insertInto, it works.
df.write.insertInto("sample.sample")
Let me try to use saveAsTable in Spark 1.6.
scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
res2: org.apache.spark.sql.DataFrame = []
scala> val df = sql("select key, value as value from sample.sample")
df: org.apache.spark.sql.DataFrame = [key: int, value: string]
scala> df.write.mode("append").saveAsTable("sample.sample")
scala> sql("select * from sample.sample").show()
+---+-----+
|key|value|
+---+-----+
|  1|  abc|
|  1|  abc|
+---+-----+
This works in Spark 1.6 but not in Spark 2.0. The error message from Spark 2.0 is:
scala> df.write.mode("append").saveAsTable("sample.sample")
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample
is not supported.;
It sounds like we need to fix it in 2.0.1. Is my understanding right?
1.6 works because it internally uses insertInto. But if we change it back, it will break the semantics of saveAsTable (this method uses by-name resolution instead of the by-position resolution used by insertInto).
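To make the by-name versus by-position distinction concrete, here is a minimal sketch in plain Scala. It does not use Spark and is not how Spark implements either method; `ResolutionSketch`, `byPosition`, and `byName` are hypothetical names introduced only for illustration.

```scala
// Hypothetical model of the two resolution strategies; not Spark code.
object ResolutionSketch {
  // An incoming row: (column name, value) pairs in the DataFrame's column order.
  type IncomingRow = Seq[(String, Any)]

  // By-position: the i-th incoming value lands in the i-th table column.
  // Incoming column names are ignored.
  def byPosition(tableColumns: Seq[String], row: IncomingRow): Map[String, Any] =
    tableColumns.zip(row.map(_._2)).toMap

  // By-name: each table column takes the incoming value with the same name.
  // Throws NoSuchElementException if the row lacks a table column.
  def byName(tableColumns: Seq[String], row: IncomingRow): Map[String, Any] = {
    val values = row.toMap
    tableColumns.map(c => c -> values(c)).toMap
  }

  def main(args: Array[String]): Unit = {
    val tableColumns = Seq("key", "value")
    // The incoming columns are in the opposite order from the table's.
    val row: IncomingRow = Seq("value" -> "abc", "key" -> 1)

    // By-position puts "abc" into key and 1 into value.
    println(byPosition(tableColumns, row))
    // By-name keeps 1 in key and "abc" in value.
    println(byName(tableColumns, row))
  }
}
```

When the incoming columns are in the same order as the table's, as in the sample.sample example above, both strategies give the same result. They diverge only when the order differs, which is why switching saveAsTable back to insertInto internally would silently change behavior for some users.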
I think we should just document it as a known issue and ask users to use insertInto.
How about changing the error message? The current message is misleading. With a better one, users would at least know what to do next.
Where is the best place to document this?
What changes were proposed in this pull request?
CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect are not Hive-specific, so this PR moves them from sql/hive to sql/core. It also adds a Command suffix to these two classes.
How was this patch tested?
Existing tests.