Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33667][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW PARTITIONS #30615

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Dec 5, 2020

What changes were proposed in this pull request?

Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation ShowPartitionsCommand, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag spark.sql.caseSensitive.

Why are the changes needed?

V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config spark.sql.caseSensitive which is false by default, for instance:

spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;

The SHOW PARTITIONS command must show the partition year = 2015, month = 1 specified by YEAR = 2015, Month = 1.

Does this PR introduce any user-facing change?

Yes. After the changes, the command above works as expected:

spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1

How was this patch tested?

By running the affected test suites:

  • v1/ShowPartitionsSuite
  • v2/ShowPartitionsSuite

@github-actions github-actions bot added the SQL label Dec 5, 2020
Comment on lines -1013 to -1020
if (spec.isDefined) {
val badColumns = spec.get.keySet.filterNot(table.partitionColumnNames.contains)
if (badColumns.nonEmpty) {
val badCols = badColumns.mkString("[", ", ", "]")
throw new AnalysisException(
s"Non-partitioning column(s) $badCols are specified for SHOW PARTITIONS")
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the check inside of normalizePartitionSpec()

@MaxGekk MaxGekk changed the title [SPARK-33667][SQL] Respect case sensitivity in V1 SHOW PARTITIONS [SPARK-33667][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW PARTITIONS Dec 5, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 5, 2020

@dongjoon-hyun @HyukjinKwon @cloud-fan Could you review this bug fix, please.

@SparkQA
Copy link

SparkQA commented Dec 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36865/

@SparkQA
Copy link

SparkQA commented Dec 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36865/

@SparkQA
Copy link

SparkQA commented Dec 5, 2020

Test build #132264 has finished for PR 30615 at commit cb0ea4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -149,4 +149,28 @@ trait ShowPartitionsSuiteBase extends QueryTest with SQLTestUtils {
}
}
}

test("case sensitivity of partition spec") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a JIRA prefix @MaxGekk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test already has prefixes V1, V2 or Hive V1. If I add one more prefix, this will look not beauty, though...

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36886/

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36886/

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thanks, @MaxGekk .

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Test build #132285 has finished for PR 30615 at commit 603823a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Dec 6, 2020
… resolving partition spec in v1 `SHOW PARTITIONS`

### What changes were proposed in this pull request?
Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation `ShowPartitionsCommand`, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag  **spark.sql.caseSensitive**.

### Why are the changes needed?
V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config **spark.sql.caseSensitive** which is false by default, for instance:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;
```
The `SHOW PARTITIONS` command must show the partition `year = 2015, month = 1` specified by `YEAR = 2015, Month = 1`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the command above works as expected:
```sql
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1
```

### How was this patch tested?
By running the affected test suites:
- `v1/ShowPartitionsSuite`
- `v2/ShowPartitionsSuite`

Closes #30615 from MaxGekk/show-partitions-case-sensitivity-test.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4829781)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to master/3.1. Could you make a backport for branch-3.0 and branch-2.4, @MaxGekk ?

MaxGekk added a commit to MaxGekk/spark that referenced this pull request Dec 6, 2020
… resolving partition spec in v1 `SHOW PARTITIONS`

Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation `ShowPartitionsCommand`, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag  **spark.sql.caseSensitive**.

V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config **spark.sql.caseSensitive** which is false by default, for instance:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;
```
The `SHOW PARTITIONS` command must show the partition `year = 2015, month = 1` specified by `YEAR = 2015, Month = 1`.

Yes. After the changes, the command above works as expected:
```sql
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1
```

By running the affected test suites:
- `v1/ShowPartitionsSuite`
- `v2/ShowPartitionsSuite`

Closes apache#30615 from MaxGekk/show-partitions-case-sensitivity-test.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4829781)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Dec 6, 2020
… resolving partition spec in v1 `SHOW PARTITIONS`

Preprocess the partition spec passed to the V1 SHOW PARTITIONS implementation `ShowPartitionsCommand`, and normalize the passed spec according to the partition columns w.r.t the case sensitivity flag  **spark.sql.caseSensitive**.

V1 SHOW PARTITIONS is case sensitive in fact, and doesn't respect the SQL config **spark.sql.caseSensitive** which is false by default, for instance:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS;
```
The `SHOW PARTITIONS` command must show the partition `year = 2015, month = 1` specified by `YEAR = 2015, Month = 1`.

Yes. After the changes, the command above works as expected:
```sql
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
year=2015/month=1
```

By running the affected test suites:
- `v1/ShowPartitionsSuite`
- `v2/ShowPartitionsSuite`

Closes apache#30615 from MaxGekk/show-partitions-case-sensitivity-test.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4829781)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 6, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants