[SPARK-33588][SQL][2.4] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED` #30551

MaxGekk · 2020-11-30T14:15:01Z

What changes were proposed in this pull request?

Perform partition spec normalization in ShowTablesCommand according to the table schema before getting partitions from the catalog. The normalization via PartitioningUtils.normalizePartitionSpec() rewrites the column names in the partition specification to match the table's real partition column names, honoring the configured case sensitivity.
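
To illustrate the idea of the normalization step, here is a minimal standalone sketch in Python. It is not Spark's implementation: the function name `normalize_partition_spec`, its parameters, and its error messages are hypothetical, and the real `PartitioningUtils.normalizePartitionSpec()` may differ in signature and behavior. The sketch only shows how user-supplied keys could be mapped onto the table's partition column names under a case-insensitive or case-sensitive comparison.

```python
from typing import Dict, Sequence


def normalize_partition_spec(
    spec: Dict[str, str],
    partition_columns: Sequence[str],
    case_sensitive: bool = False,
) -> Dict[str, str]:
    """Map user-supplied partition keys onto the table's real column names.

    Hypothetical stand-in for illustration only; not Spark's implementation.
    """

    def same_name(a: str, b: str) -> bool:
        # Plays the role of a case-sensitive or case-insensitive name resolver.
        return a == b if case_sensitive else a.lower() == b.lower()

    normalized: Dict[str, str] = {}
    for key, value in spec.items():
        matches = [col for col in partition_columns if same_name(col, key)]
        if not matches:
            raise ValueError(
                f"{key} is not a valid partition column; expected one of "
                f"{list(partition_columns)}"
            )
        column = matches[0]
        if column in normalized:
            # Two different spellings resolved to the same table column.
            raise ValueError(f"Duplicate partition column: {column}")
        normalized[column] = value
    return normalized


# The spec from the example in this PR, resolved case-insensitively.
spec = {"YEAR": "2015", "Month": "1"}
print(normalize_partition_spec(spec, ["year", "month"]))
# → {'year': '2015', 'month': '1'}
```

With `case_sensitive=True`, the same spec raises an error in this sketch because `YEAR` does not exactly match `year`; this corresponds to how the command should behave when `spark.sql.caseSensitive` is `true`.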

Why are the changes needed?

Even when spark.sql.caseSensitive is false, which is the default value, v1 SHOW TABLE EXTENDED treats partition column names as case sensitive:

spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > partitioned by (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';

Does this PR introduce any user-facing change?

Yes. After the changes, the SHOW TABLE EXTENDED command respects the SQL config. For the example above, it returns the correct result:

spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
default	tbl1	false	Partition Values: [year=2015, month=1]
Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
Created Time: Sat Nov 28 23:25:18 MSK 2020
Last Access: UNKNOWN
Partition Statistics: 623 bytes

How was this patch tested?

By running the modified test suite via:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DDLSuite"

… resolving partition spec in v1 `SHOW TABLE EXTENDED`

Perform partition spec normalization in `ShowTablesCommand` according to the table schema before getting partitions from the catalog. The normalization via `PartitioningUtils.normalizePartitionSpec()` adjusts the column names in partition specification, w.r.t. the real partition column names and case sensitivity.

Even when `spark.sql.caseSensitive` is `false` which is the default value, v1 `SHOW TABLE EXTENDED` is case sensitive:

```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > partitioned by (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';
```

Yes. After the changes, the `SHOW TABLE EXTENDED` command respects the SQL config. And for example above, it returns correct result:

```sql
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
default tbl1 false Partition Values: [year=2015, month=1]
Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
Created Time: Sat Nov 28 23:25:18 MSK 2020
Last Access: UNKNOWN
Partition Statistics: 623 bytes
```

By running the modified test suite `v1/ShowTablesSuite`

Closes apache#30529 from MaxGekk/show-table-case-sensitive-spec.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0054fc9)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit f09fcec)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

MaxGekk · 2020-11-30T14:17:10Z

@dongjoon-hyun Please, review this PR.

dongjoon-hyun

+1, LGTM. Thank you, @MaxGekk .
Merged to branch-2.4.

…while resolving partition spec in v1 `SHOW TABLE EXTENDED`

### What changes were proposed in this pull request?

Perform partition spec normalization in `ShowTablesCommand` according to the table schema before getting partitions from the catalog. The normalization via `PartitioningUtils.normalizePartitionSpec()` adjusts the column names in partition specification, w.r.t. the real partition column names and case sensitivity.

### Why are the changes needed?

Even when `spark.sql.caseSensitive` is `false` which is the default value, v1 `SHOW TABLE EXTENDED` is case sensitive:

```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > partitioned by (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';
```

### Does this PR introduce _any_ user-facing change?

Yes. After the changes, the `SHOW TABLE EXTENDED` command respects the SQL config. And for example above, it returns correct result:

```sql
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
default tbl1 false Partition Values: [year=2015, month=1]
Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
Created Time: Sat Nov 28 23:25:18 MSK 2020
Last Access: UNKNOWN
Partition Statistics: 623 bytes
```

### How was this patch tested?

By running the modified test suite via:

```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DDLSuite"
```

Closes #30551 from MaxGekk/show-table-case-sensitive-spec-2.4.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

SparkQA · 2020-11-30T17:28:42Z

Test build #131993 has finished for PR 30551 at commit c2ca5dc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MaxGekk mentioned this pull request Nov 30, 2020

[SPARK-33588][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SHOW TABLE EXTENDED #30529

Closed

dongjoon-hyun approved these changes Nov 30, 2020

dongjoon-hyun closed this Nov 30, 2020

MaxGekk deleted the show-table-case-sensitive-spec-2.4 branch December 11, 2020 20:29
