
[SPARK-14879] [SQL] Move CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect to sql/core#12645

Closed

yhuai wants to merge 4 commits into apache:master from yhuai:moveCreateDataSource

Conversation

@yhuai
Contributor

@yhuai yhuai commented Apr 23, 2016

What changes were proposed in this pull request?

CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect are not Hive-specific, so this PR moves them from sql/hive to sql/core. It also adds a Command suffix to both classes.

How was this patch tested?

Existing tests.

matcher.matches()
}

def createDataSourceTable(
Contributor Author

@yhuai yhuai Apr 23, 2016

This method is mostly copied from HiveMetastoreCatalog. I removed val QualifiedTableName(dbName, tblName) = getQualifiedTableName(tableIdent) and made it call SessionCatalog's createTable method instead.

@SparkQA

SparkQA commented Apr 23, 2016

Test build #56815 has finished for PR 12645 at commit 95b61bc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 24, 2016

Test build #56816 has finished for PR 12645 at commit 2e18c61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 24, 2016

Test build #56818 has finished for PR 12645 at commit 9053334.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

extends RunnableCommand {

override def run(sqlContext: SQLContext): Seq[Row] = {
// Since we are saving metadata to metastore, we need to check if metastore supports
Contributor

metastore -> catalog; "metastore" is very Hive-specific

@rxin
Contributor

rxin commented Apr 24, 2016

Most of the problems I pointed out also existed in the old code, so feel free to merge this one and submit a follow-up pr to address them.

@yhuai
Contributor Author

yhuai commented Apr 24, 2016

OK. Thanks. Will send out a follow-up pr.

@asfgit asfgit closed this in 1672149 Apr 24, 2016
c.tableIdent, c.provider, c.partitionColumns, c.mode, c.options, c.child)
ExecutedCommandExec(cmd) :: Nil

case c: CreateTableUsingAsSelect =>
Member

If the source table is a Hive table, we still need this rule, right? Should I submit a PR?

Contributor Author

@yhuai yhuai Jul 29, 2016


This is not for Hive tables. I think we still have a special rule just for Hive.

Member

saveAsTable API in DataFrameWriter always generates CreateTableUsingAsSelect. The JIRA https://issues.apache.org/jira/browse/SPARK-16789# might be related to this issue.

Contributor Author

Yeah, it's true. saveAsTable was only added for data source tables because there is no way to specify a Hive format in DataFrameWriter. I think this change should be part of the work of unifying these two kinds of tables.

Member

We will refactor the write path and Hive support in 2.1, so the problem can eventually be resolved there.

Should we try to fix it in 2.0.1? I am afraid we might see more JIRAs opened by 1.x users.

Member

Yeah, if we change it to insertInto, it works.

    df.write.insertInto("sample.sample")

Let me try to use saveAsTable in Spark 1.6.

Member

scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
res2: org.apache.spark.sql.DataFrame = []

scala> val df = sql("select key, value as value from sample.sample")
df: org.apache.spark.sql.DataFrame = [key: int, value: string]

scala> df.write.mode("append").saveAsTable("sample.sample")

scala> sql("select * from sample.sample").show()
+---+-----+
|key|value|
+---+-----+
|  1|  abc|
|  1|  abc|
+---+-----+

This works in Spark 1.6 but fails in Spark 2.0. The error message from Spark 2.0 is:

scala> df.write.mode("append").saveAsTable("sample.sample")
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.;

It sounds like we need to fix it in 2.0.1. Is my understanding right?

Contributor Author

1.6 works because saveAsTable internally uses insertInto. But if we change it back, it will break the semantics of saveAsTable: that method uses by-name column resolution, while insertInto uses by-position resolution.
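The by-name vs. by-position distinction can be illustrated with a small self-contained sketch (hypothetical names and data, not Spark's actual resolution code):

```scala
// Hypothetical illustration of the two resolution strategies discussed
// above; this is not Spark's implementation.
object ResolutionDemo {
  // Target table schema, in declared column order.
  val tableColumns: Seq[String] = Seq("key", "value")

  // Incoming data whose columns arrive in a different physical order.
  val incomingOrder: Seq[String] = Seq("value", "key")
  val incomingRow: Map[String, String] = Map("value" -> "abc", "key" -> "1")

  // saveAsTable-style: match each target column to the incoming column
  // with the same name, so the query's column order does not matter.
  def byName: Seq[String] = tableColumns.map(incomingRow)

  // insertInto-style: pour incoming values into the table's columns in
  // physical order, ignoring names.
  def byPosition: Seq[String] = incomingOrder.map(incomingRow)
}
```

Here byName yields Seq("1", "abc") while byPosition yields Seq("abc", "1"): when the query's column order differs from the table's, swapping one strategy for the other silently changes which values land in which columns, which is why reverting saveAsTable to insertInto would be a semantic break.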

Contributor Author

I think we should just document it as a known issue and ask users to use insertInto.

Member

How about changing the error message? So far, the error message is misleading. Then, users at least can know what they should do next.

What is the best position for documenting this?
