
[SPARK-14879] [SQL] Move CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect to sql/core#12645

Closed

yhuai wants to merge 4 commits into apache:master from yhuai:moveCreateDataSource

Conversation

@yhuai
Contributor

@yhuai yhuai commented Apr 23, 2016

What changes were proposed in this pull request?

CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect are not Hive-specific, so this PR moves them from sql/hive to sql/core. It also adds a Command suffix to both classes.

How was this patch tested?

Existing tests.

matcher.matches()
}

def createDataSourceTable(
Contributor Author

@yhuai yhuai Apr 23, 2016

This method is mostly copied from HiveMetastoreCatalog. I removed val QualifiedTableName(dbName, tblName) = getQualifiedTableName(tableIdent) and made it call SessionCatalog's createTable method instead.

@SparkQA

SparkQA commented Apr 23, 2016

Test build #56815 has finished for PR 12645 at commit 95b61bc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 24, 2016

Test build #56816 has finished for PR 12645 at commit 2e18c61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 24, 2016

Test build #56818 has finished for PR 12645 at commit 9053334.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

extends RunnableCommand {

override def run(sqlContext: SQLContext): Seq[Row] = {
// Since we are saving metadata to metastore, we need to check if metastore supports
Contributor

metastore -> catalog; "metastore" is very Hive-specific

@rxin
Contributor

rxin commented Apr 24, 2016

Most of the problems I pointed out also existed in the old code, so feel free to merge this one and submit a follow-up pr to address them.

@yhuai
Contributor Author

yhuai commented Apr 24, 2016

OK. Thanks. Will send out a follow-up pr.

@asfgit asfgit closed this in 1672149 Apr 24, 2016
c.tableIdent, c.provider, c.partitionColumns, c.mode, c.options, c.child)
ExecutedCommandExec(cmd) :: Nil

case c: CreateTableUsingAsSelect =>
Member

If the source table is a Hive table, we still need this rule, right? Should I submit a PR?

Contributor Author

@yhuai yhuai Jul 29, 2016


This is not for Hive tables. I think we still have a special rule just for Hive.

Member

saveAsTable API in DataFrameWriter always generates CreateTableUsingAsSelect. The JIRA https://issues.apache.org/jira/browse/SPARK-16789# might be related to this issue.

Contributor Author

Yeah, it's true. saveAsTable was only added for data source tables because there is no way to specify a Hive format in DataFrameWriter. I think this change should be part of the work of unifying these two kinds of tables.

Member

We will refactor the write path and Hive support in 2.1, so the problem can eventually be resolved there.

Should we try to fix it in 2.0.1? I am afraid we might see more JIRAs opened by 1.x users.

Member

Yeah, if we change it to insertInto, it works.

    df.write.insertInto("sample.sample")

Let me try to use saveAsTable in Spark 1.6.

Member

scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
res2: org.apache.spark.sql.DataFrame = []

scala> val df = sql("select key, value as value from sample.sample")
df: org.apache.spark.sql.DataFrame = [key: int, value: string]

scala> df.write.mode("append").saveAsTable("sample.sample")

scala> sql("select * from sample.sample").show()
+---+-----+
|key|value|
+---+-----+
|  1|  abc|
|  1|  abc|
+---+-----+

This works in Spark 1.6 but fails in Spark 2.0. The error message from Spark 2.0 is:

scala> df.write.mode("append").saveAsTable("sample.sample")
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.;

It sounds like we need to fix it in 2.0.1. Is my understanding right?

Contributor Author

1.6 works because saveAsTable internally uses insertInto. But if we change it back, it will break the semantics of saveAsTable: that method uses by-name column resolution, while insertInto uses by-position resolution.
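The by-name vs. by-position distinction can be illustrated with a small self-contained sketch (hypothetical names and data, not Spark's actual resolution code):

```scala
// Hypothetical illustration of the two resolution strategies discussed
// above; this is not Spark's implementation.
object ResolutionDemo {
  // Target table schema, in declared column order.
  val tableColumns: Seq[String] = Seq("key", "value")

  // Incoming data whose columns arrive in a different physical order.
  val incomingOrder: Seq[String] = Seq("value", "key")
  val incomingRow: Map[String, String] = Map("value" -> "abc", "key" -> "1")

  // saveAsTable-style: match each target column to the incoming column
  // with the same name, so the query's column order does not matter.
  def byName: Seq[String] = tableColumns.map(incomingRow)

  // insertInto-style: pour incoming values into the table's columns in
  // physical order, ignoring names.
  def byPosition: Seq[String] = incomingOrder.map(incomingRow)
}
```

Here byName yields Seq("1", "abc") while byPosition yields Seq("abc", "1"): when the query's column order differs from the table's, swapping one strategy for the other silently changes which values land in which columns, which is why reverting saveAsTable to insertInto would be a semantic break.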

Contributor Author

I think we should just document it as a known issue and ask users to use insertInto.

Member

How about changing the error message? So far, the error message is misleading. Then, users at least can know what they should do next.

What is the best position for documenting this?
