[SPARK-33667][SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW PARTITIONS` #30615

MaxGekk · 2020-12-05T07:19:47Z

What changes were proposed in this pull request?

Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation ShowPartitionsCommand, and normalize it against the table's partition columns according to the case sensitivity flag spark.sql.caseSensitive.
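To illustrate the idea, here is a minimal standalone sketch of such a normalization. It is not the Spark implementation: the object `PartitionSpecSketch`, the `normalize` helper, the two resolver values, and the exception type are all hypothetical names chosen for this example, and the sketch assumes a partition spec can be modeled as a `Map[String, String]`.

```scala
object PartitionSpecSketch {
  // A resolver decides whether two identifiers refer to the same column.
  type Resolver = (String, String) => Boolean

  // Assumed stand-ins for the behavior selected by spark.sql.caseSensitive.
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical helper: rewrites every user-supplied key to the table's own
  // spelling of the matching partition column, and fails if nothing matches.
  def normalize(
      spec: Map[String, String],
      partitionColumns: Seq[String],
      resolver: Resolver): Map[String, String] = {
    spec.map { case (key, value) =>
      val column = partitionColumns.find(resolver(_, key)).getOrElse(
        throw new IllegalArgumentException(s"$key is not a partition column"))
      column -> value
    }
  }

  def main(args: Array[String]): Unit = {
    val columns = Seq("year", "month")
    val spec = Map("YEAR" -> "2015", "Month" -> "1")
    // With the case-insensitive resolver the keys become "year" and "month".
    println(normalize(spec, columns, caseInsensitive))
  }
}
```

Because the lookup itself fails for an unknown column, a separate "non-partitioning column" check before normalization would be redundant in this sketch.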

Why are the changes needed?

V1 SHOW PARTITIONS is in fact always case sensitive and doesn't respect the SQL config spark.sql.caseSensitive, which is false by default. For instance:

spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;

The SHOW PARTITIONS command must show the partition year = 2015, month = 1 specified by YEAR = 2015, Month = 1.

Does this PR introduce any user-facing change?

Yes. After the changes, the command above works as expected:

spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1

How was this patch tested?

By running the affected test suites:

v1/ShowPartitionsSuite
v2/ShowPartitionsSuite

MaxGekk · 2020-12-05T07:20:37Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

-    if (spec.isDefined) {
-      val badColumns = spec.get.keySet.filterNot(table.partitionColumnNames.contains)
-      if (badColumns.nonEmpty) {
-        val badCols = badColumns.mkString("[", ", ", "]")
-        throw new AnalysisException(
-          s"Non-partitioning column(s) $badCols are specified for SHOW PARTITIONS")
-      }
-    }


This duplicates the check inside of normalizePartitionSpec()
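For context, a small standalone sketch of why the removed check rejected `YEAR` and `Month`: it tests exact membership in the partition column names, so differently cased keys are reported as bad columns regardless of spark.sql.caseSensitive. The object `OldCheckSketch` and its `badColumns` helper are hypothetical names for this illustration; only the `filterNot(... .contains)` expression mirrors the removed lines.

```scala
object OldCheckSketch {
  // Same shape as the removed check: an exact, case-sensitive membership test.
  def badColumns(
      spec: Map[String, String],
      partitionColumnNames: Seq[String]): Set[String] =
    spec.keySet.filterNot(partitionColumnNames.contains)

  def main(args: Array[String]): Unit = {
    val columns = Seq("year", "month")
    // Keys that differ only in case are flagged as non-partitioning columns.
    println(badColumns(Map("YEAR" -> "2015", "Month" -> "1"), columns))
    // Keys spelled exactly like the partition columns pass the check.
    println(badColumns(Map("year" -> "2015", "month" -> "1"), columns))
  }
}
```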

MaxGekk · 2020-12-05T07:28:14Z

@dongjoon-hyun @HyukjinKwon @cloud-fan Could you review this bug fix, please?

SparkQA · 2020-12-05T08:05:15Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36865/

SparkQA · 2020-12-05T08:32:16Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36865/

SparkQA · 2020-12-05T11:39:27Z

Test build #132264 has finished for PR 30615 at commit cb0ea4b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2020-12-06T02:50:47Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala

@@ -149,4 +149,28 @@ trait ShowPartitionsSuiteBase extends QueryTest with SQLTestUtils {
      }
    }
  }
+
+  test("case sensitivity of partition spec") {


Shall we add a JIRA prefix @MaxGekk?

The test name already has a prefix: V1, V2 or Hive V1. If I add one more prefix, it won't look pretty, though...

SparkQA · 2020-12-06T07:13:24Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36886/

SparkQA · 2020-12-06T07:47:16Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36886/

dongjoon-hyun

+1, LGTM. Thanks, @MaxGekk .

SparkQA · 2020-12-06T10:40:31Z

Test build #132285 has finished for PR 30615 at commit 603823a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

… resolving partition spec in v1 `SHOW PARTITIONS`

### What changes were proposed in this pull request?
Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation `ShowPartitionsCommand`, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag **spark.sql.caseSensitive**.

### Why are the changes needed?
V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config **spark.sql.caseSensitive** which is false by default, for instance:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;
```
The `SHOW PARTITIONS` command must show the partition `year = 2015, month = 1` specified by `YEAR = 2015, Month = 1`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the command above works as expected:
```sql
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1
```

### How was this patch tested?
By running the affected test suites:
- `v1/ShowPartitionsSuite`
- `v2/ShowPartitionsSuite`

Closes #30615 from MaxGekk/show-partitions-case-sensitivity-test.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4829781)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2020-12-06T10:57:35Z

Merged to master/3.1. Could you make backports for branch-3.0 and branch-2.4, @MaxGekk?

… resolving partition spec in v1 `SHOW PARTITIONS`

Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation `ShowPartitionsCommand`, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag **spark.sql.caseSensitive**.

V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config **spark.sql.caseSensitive** which is false by default, for instance:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;
```
The `SHOW PARTITIONS` command must show the partition `year = 2015, month = 1` specified by `YEAR = 2015, Month = 1`.

Yes. After the changes, the command above works as expected:
```sql
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1
```

By running the affected test suites:
- `v1/ShowPartitionsSuite`
- `v2/ShowPartitionsSuite`

Closes apache#30615 from MaxGekk/show-partitions-case-sensitivity-test.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4829781)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

MaxGekk · 2020-12-06T16:03:21Z

Here are backports:

branch-3.0: [SPARK-33667][SQL][3.0] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW PARTITIONS #30626
branch-2.4: [SPARK-33667][SQL][2.4] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW PARTITIONS #30627

MaxGekk added 4 commits December 5, 2020 09:49

Add a test (814cddc)
Perform normalization (f769614)
Remove dead code (b7e2481)
Minor refactoring (cb0ea4b)

github-actions bot added the SQL label Dec 5, 2020

MaxGekk changed the title ~~[SPARK-33667][SQL] Respect case sensitivity in V1 SHOW PARTITIONS~~ [SPARK-33667][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW PARTITIONS Dec 5, 2020

HyukjinKwon reviewed Dec 6, 2020

HyukjinKwon approved these changes Dec 6, 2020

Add JIRA id to test's title (603823a)

dongjoon-hyun approved these changes Dec 6, 2020

dongjoon-hyun closed this in 4829781 Dec 6, 2020

MaxGekk deleted the show-partitions-case-sensitivity-test branch February 19, 2021 15:04

MaxGekk mentioned this pull request Feb 25, 2021

[SPARK-34543][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SET LOCATION #31651

Closed
