
[SPARK-6024][SQL] When a data source table has too many columns, its schema cannot be stored in metastore. #4795

Closed
wants to merge 6 commits

Conversation

@yhuai yhuai (Contributor) commented Feb 26, 2015

@SparkQA SparkQA commented Feb 26, 2015

Test build #28020 has started for PR 4795 at commit 12bacae.

  • This patch merges cleanly.

@SparkQA SparkQA commented Feb 26, 2015

Test build #28022 has started for PR 4795 at commit cc1d472.

  • This patch merges cleanly.

@SparkQA SparkQA commented Feb 26, 2015

Test build #28020 has finished for PR 4795 at commit 12bacae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28020/

tbl.setProperty("spark.sql.sources.schema.numOfParts", "1")
// We use spark.sql.sources.schema instead of using spark.sql.sources.schema.part.0
// because users may have already created data source tables in metastore.
tbl.setProperty("spark.sql.sources.schema", schemaJsonString)
Contributor

Why don't we just always use schema.part.0? It seems easier to consolidate the two code paths.
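For context: the approach under discussion stores the schema's JSON string across one or more table properties so that no single metastore value exceeds the length limit. A minimal sketch of that splitting idea (the helper name, standalone shape, and 4000-character threshold are illustrative, not this PR's exact code; property keys follow the final numParts naming discussed below):

// Sketch: split a possibly very long schema JSON string into fixed-size
// chunks so that no single metastore property value grows too large.
object SchemaProps {
  def toProperties(schemaJson: String, threshold: Int = 4000): Seq[(String, String)] = {
    // grouped(n) yields substrings of at most n characters each.
    val parts = schemaJson.grouped(threshold).toSeq
    val numParts = "spark.sql.sources.schema.numParts" -> parts.size.toString
    numParts +: parts.zipWithIndex.map { case (part, index) =>
      s"spark.sql.sources.schema.part.$index" -> part
    }
  }
}

Each resulting pair would then be written with tbl.setProperty, as in the snippet above.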

@SparkQA
Copy link

SparkQA commented Feb 26, 2015

Test build #28022 has finished for PR 4795 at commit cc1d472.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28022/

@SparkQA SparkQA commented Feb 26, 2015

Test build #28025 has started for PR 4795 at commit 143927a.

  • This patch merges cleanly.

@SparkQA SparkQA commented Feb 27, 2015

Test build #28025 has finished for PR 4795 at commit 143927a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28025/

@@ -69,13 +69,19 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
     val table = synchronized {
       client.getTable(in.database, in.name)
     }
-    val schemaString = table.getProperty("spark.sql.sources.schema")
+    val schemaString = Option(table.getProperty("spark.sql.sources.schema.numOfParts")) match {
Contributor

I think it is more conventional to use numParts instead of numOfParts. Also, you can remove the pattern matching by just applying a map:

Option(table.getProperty("spark.sql.sources.schema.numParts")).map { numParts =>
  ...
}
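Spelled out, the map-based read path could look like the following sketch (the standalone shape and the require-based check are illustrative, not the merged code):

// Sketch: reassemble the schema JSON from the numbered part
// properties, returning None when no schema was recorded.
def readSchemaJson(getProperty: String => String): Option[String] =
  Option(getProperty("spark.sql.sources.schema.numParts")).map { numParts =>
    (0 until numParts.toInt).map { index =>
      val part = getProperty(s"spark.sql.sources.schema.part.$index")
      // A missing part means the stored schema is corrupted.
      require(part != null, s"missing part $index of the schema")
      part
    }.mkString
  }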

@SparkQA SparkQA commented Feb 27, 2015

Test build #28031 has started for PR 4795 at commit 73e71b4.

  • This patch merges cleanly.

val part = table.getProperty(s"spark.sql.sources.schema.part.${index}")
if (part == null) {
  throw new AnalysisException(
    "Could not read schema from the metastore because it is corrupted.")
Contributor

Sorry for being picky, but it would be great to include the reason why it is corrupted (i.e., "missing part x").
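Concretely, the check could name the missing piece, along these lines (a sketch; RuntimeException stands in for Spark's AnalysisException to keep the snippet standalone, and the final commit's wording may differ):

// Sketch: say which part is missing so a corrupted schema is
// easier to diagnose from the error message alone.
def readPart(getProperty: String => String, index: Int, numParts: Int): String = {
  val part = getProperty(s"spark.sql.sources.schema.part.$index")
  if (part == null) {
    throw new RuntimeException(
      s"Could not read schema from the metastore because it is corrupted " +
        s"(missing part $index of the schema, $numParts parts are expected).")
  }
  part
}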

@SparkQA SparkQA commented Feb 27, 2015

Test build #28031 has finished for PR 4795 at commit 73e71b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28031/

@SparkQA SparkQA commented Feb 27, 2015

Test build #28043 has started for PR 4795 at commit 4882e6f.

  • This patch merges cleanly.

@rxin rxin (Contributor) commented Feb 27, 2015

lgtm

@SparkQA SparkQA commented Feb 27, 2015

Test build #28043 has finished for PR 4795 at commit 4882e6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28043/

@rxin rxin (Contributor) commented Feb 27, 2015

Merging in!

asfgit pushed a commit that referenced this pull request Feb 27, 2015
[SPARK-6024][SQL] When a data source table has too many columns, its schema cannot be stored in metastore.

JIRA: https://issues.apache.org/jira/browse/SPARK-6024

Author: Yin Huai <yhuai@databricks.com>

Closes #4795 from yhuai/wideSchema and squashes the following commits:

4882e6f [Yin Huai] Address comments.
73e71b4 [Yin Huai] Address comments.
143927a [Yin Huai] Simplify code.
cc1d472 [Yin Huai] Make the schema wider.
12bacae [Yin Huai] If the JSON string of a schema is too large, split it before storing it in metastore.
e9b4f70 [Yin Huai] Failed test.

(cherry picked from commit 5e5ad65)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 5e5ad65 Feb 27, 2015