
[SPARK-19148][SQL] do not expose the external table concept in Catalog #16528

Closed
wants to merge 3 commits

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

In #16296, we reached a consensus that we should hide the external/managed table concept from users and only expose a custom table path.

This PR renames Catalog.createExternalTable to createTable (the old versions are kept for backward compatibility), and only sets the table type to EXTERNAL if a path is specified in the options.

How was this patch tested?

new tests in CatalogSuite
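The new rule can be sketched as follows. This is a minimal Python illustration of the decision this PR introduces, not the actual Scala implementation: with no public API to set the table type, a user-supplied "path" option is what makes a table external.

```python
MANAGED, EXTERNAL = "MANAGED", "EXTERNAL"

def table_type(options):
    # No explicit table-type flag is exposed; a custom "path" option
    # is the only thing that makes the created table external.
    return EXTERNAL if "path" in options else MANAGED

print(table_type({"path": "/data/events"}))  # EXTERNAL
print(table_type({}))                        # MANAGED
```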

"""
}.mkString
})
foldFunctions = _.map { funCall =>
Contributor Author

minor code cleanup, not related to this PR.

Contributor

How about we revert this change?

case fs: HadoopFsRelation =>
  if (table.tableType == CatalogTableType.EXTERNAL && fs.location.rootPaths.isEmpty) {
    throw new AnalysisException(
      "Cannot create a file-based external data source table without path")
Contributor Author

We will never hit this branch after this PR: there is no public API to set the table type, and users can only set a custom table path, so we will never create an external table without a path.
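The unreachability argument can be checked mechanically. This is a hedged Python sketch (illustrative names, not Spark internals): since EXTERNAL is derived from the presence of a path, every EXTERNAL table carries a non-empty path, so the guarded error above cannot fire.

```python
def resolve(options):
    # Mirror of the new behavior: table type is derived from "path".
    table_type = "EXTERNAL" if "path" in options else "MANAGED"
    root_paths = [options["path"]] if "path" in options else []
    return table_type, root_paths

# The error condition (EXTERNAL with empty rootPaths) never occurs:
for opts in ({}, {"path": "/warehouse/t"}):
    t, paths = resolve(opts)
    assert not (t == "EXTERNAL" and not paths)
```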

@cloud-fan
Contributor Author

cc @yhuai @rxin @gatorsmile

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71123 has finished for PR 16528 at commit f06ed16.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71125 has finished for PR 16528 at commit b0c252a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71128 has finished for PR 16528 at commit b0c252a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please


/**
 * :: Experimental ::
 * Creates a table from the given path and returns the corresponding DataFrame.
Member

Maybe we can explain here that the data files will not be dropped when the table is dropped?

Contributor Author

We already documented this in the programming guide. If we want to explain the custom path here, there are a lot of similar places where we would need to add comments too, so I'd like to leave it as it was.

@gatorsmile
Member

createTableHeader ('(' colTypeList ')')? tableProvider
  (OPTIONS options=tablePropertyList)?
  (PARTITIONED BY partitionColumnNames=identifierList)?
  bucketSpec? locationSpec?
  (COMMENT comment=STRING)?
  (AS? query)?                                                   #createTable

  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame

If we compare the above two interfaces, the Catalog's createTable API can only create a non-partitioned table. How about adding a new createTable API that lets users specify the partitioning and bucketing info?
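One possible shape for such an extended API is sketched below. This is purely hypothetical (parameter names and the returned spec are illustrative, not the API this PR or any follow-up actually adds); it only shows how partitioning and bucketing info could ride alongside the existing four arguments.

```python
def create_table(table_name, source, schema, options,
                 partition_columns=None, bucket_columns=None, num_buckets=None):
    # Collect the full table definition, including the partitioning and
    # bucketing info that the current createTable API cannot express.
    return {
        "name": table_name,
        "source": source,
        "schema": schema,
        "options": dict(options),
        "partitionBy": list(partition_columns or []),
        "bucketBy": (list(bucket_columns or []), num_buckets),
    }

spec = create_table("events", "parquet", "a INT, b STRING",
                    {"path": "/tmp/events"}, partition_columns=["a"])
print(spec["partitionBy"])  # ['a']
```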

@gatorsmile
Member

We also need to update the Python interface. See the code

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71143 has finished for PR 16528 at commit b0c252a.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

adding a new createTable API for users to specify the partitioning info and bucketing info

I'd like to do it in follow-ups.

I'll update the Python API too.

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71175 has finished for PR 16528 at commit f1f75ed.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71191 has started for PR 16528 at commit 0d1baf1.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71198 has finished for PR 16528 at commit 0d1baf1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71214 has finished for PR 16528 at commit 8f4e86e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 12, 2017

Test build #71238 has finished for PR 16528 at commit 3b59221.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71292 has finished for PR 16528 at commit 2e1d378.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71316 has finished for PR 16528 at commit d5b3b4f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71325 has finished for PR 16528 at commit 55cf0c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

warnings.warn(
    "createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
    DeprecationWarning)
return self.createTable(tableName, path, source, schema, **options)
Member

**options -> options?

Contributor Author

It's Python syntax for unpacking keyword arguments, like what we do in Scala: func(options: _*)

Member

Yeah. Got it. I also manually tried it in pyspark. It works fine.
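The exchange above can be demonstrated with a toy example (illustrative function, not the PySpark one): `**options` unpacks a dict into keyword arguments, much like Scala's `func(options: _*)` spreads a sequence into varargs.

```python
def create_table(name, **options):
    # options arrives inside the function as a plain dict
    # of whatever keyword arguments the caller passed.
    return name, options

opts = {"path": "/tmp/t", "header": "true"}
# Unpacking the dict is equivalent to spelling the keywords out:
assert create_table("t", **opts) == create_table("t", path="/tmp/t", header="true")
```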

@gatorsmile
Member

LGTM

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Jan 16, 2017

Test build #71408 has finished for PR 16528 at commit 55cf0c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

cc @yhuai for final sign-off

@yhuai
Contributor

yhuai commented Jan 16, 2017

looks good to me. If possible, I'd like to get the code mentioned by https://github.com/apache/spark/pull/16528/files#r96314156 reverted.

@SparkQA

SparkQA commented Jan 17, 2017

Test build #71469 has finished for PR 16528 at commit 318dc04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@asfgit asfgit closed this in 18ee55d Jan 17, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16528 from cloud-fan/create-table.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 1, 2017
…t in Catalog

### What changes were proposed in this pull request?
After renaming `Catalog.createExternalTable` to `createTable` in PR apache#16528, we also need to deprecate the corresponding functions in `SQLContext`.

### How was this patch tested?
N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#17502 from gatorsmile/deprecateCreateExternalTable.