[SPARK-49543][SQL] Add SHOW COLLATIONS command #55099
viirya wants to merge 8 commits into apache:master
Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE%'). Output schema: NAME, LANGUAGE, COUNTRY, ACCENT_SENSITIVITY, CASE_SENSITIVITY, PAD_ATTRIBUTE, ICU_VERSION — matching the existing collations() TVF but without the constant CATALOG/SCHEMA columns. Implementation follows the ShowCatalogsCommand pattern as collations are engine-global and not tied to any catalog or namespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dongjoon-hyun left a comment:
+1, LGTM (Pending CIs).
…NS token Add COLLATIONS to SQL keyword golden files and hardcoded keyword lists in ThriftServer and SparkConnect JDBC tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…LATIONS Add COLLATIONS to reserved keyword list in keywords-enforced.sql.out and add COLLATIONS documentation entry in sql-ref-ansi-compliance.md. COLLATIONS is reserved in ANSI mode (ansiNonReserved) and non-reserved in non-ANSI mode; it is not part of SQL-2016 standard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
filterPattern uses * (not %) as the wildcard character, consistent with other SHOW commands like SHOW NAMESPACES and SHOW FUNCTIONS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dongjoon-hyun left a comment:
Although the test passed, the last test code change commit looks suspicious to me.
assert(utf8Row.getString(3) == "ACCENT_SENSITIVE")
assert(utf8Row.getString(4) == "CASE_SENSITIVE")

val likeResult = sql("SHOW COLLATIONS LIKE 'UNICODE*'").collect()
Could you double-check this, @viirya ? LIKE should use % instead of *.
* is used for another syntax like REGEXP.
Thanks for catching this!
For context: other existing SHOW commands like SHOW NAMESPACES and SHOW FUNCTIONS also use *. This SHOW COLLATIONS fix makes it consistent with SQL LIKE convention, unlike those commands.
ShowNamespacesSuiteBase.scala:88 — SHOW NAMESPACES LIKE '1'
ShowFunctionsParserSuite.scala:50 — SHOW FUNCTIONS LIKE 'funct*'
ShowTablesParserSuite.scala:36 — SHOW TABLES LIKE 'test'
ShowTablesParserSuite.scala:54 — SHOW TABLE EXTENDED LIKE 'test'
ShowTablesSuiteBase.scala:338 — SHOW TABLE EXTENDED LIKE '$viewName*'
I also took a look at our documentation for SHOW TABLES LIKE: https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html. It says the command takes a regex_pattern after LIKE:
Syntax
SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
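To make the wildcard semantics under discussion concrete, here is a minimal Python sketch of glob-style name filtering in which `*` matches any run of characters and `|` separates alternatives. It is a standalone model, not Spark's `filterPattern` code: the function name `filter_names`, the case-insensitive matching, the escaping of all other characters, and the sample collation names are assumptions made for illustration.

```python
import re


def filter_names(names, pattern):
    """Return the sorted names that match a glob-like pattern.

    Assumed semantics (a model, not Spark's code): '*' matches any run of
    characters, '|' separates alternatives, matching is case-insensitive,
    and every other character, including '%', is matched literally.
    """
    matched = set()
    for sub in pattern.strip().split("|"):
        # Escape everything, then turn the escaped '*' back into '.*'.
        regex = re.escape(sub.strip()).replace(r"\*", ".*")
        compiled = re.compile(regex, re.IGNORECASE)
        matched.update(n for n in names if compiled.fullmatch(n))
    return sorted(matched)


# Hypothetical sample of collation names.
names = ["UTF8_BINARY", "UTF8_LCASE", "UNICODE", "UNICODE_CI"]
unicode_glob = filter_names(names, "UNICODE*")     # '*' acts as the wildcard
unicode_percent = filter_names(names, "UNICODE%")  # '%' is only a literal here
```

Under these assumptions, `'UNICODE*'` selects the names that start with `UNICODE`, while `'UNICODE%'` selects nothing because no name contains a literal `%`.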
Oh. So, we support both % and *?
Thank you, @viirya . After reading the doc once more, I realized my misunderstanding. Sorry for making you confused.
BTW, can we have a new document for SHOW COLLATIONS like SHOW TABLES LIKE https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html ?
Yes, let me add a document for it. Thanks for the reminder.
There was a ticket to make it follow the standard SQL LIKE pattern, SPARK-45880 (#43751), but it did not land.
Convert SQL LIKE wildcard % to glob * before passing to filterPattern, so SHOW COLLATIONS LIKE 'UNICODE%' works correctly. Revert test to use % per SQL LIKE convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… pattern" This reverts commit c8a3010.
Add sql-ref-syntax-aux-show-collations.md following the same structure as other SHOW command docs (description, syntax, parameters, output schema, examples, related statements). Also add entry to the SQL syntax index in sql-ref-syntax.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parser.parsePlan("SHOW COLLATIONS"),
ShowCollationsCommand(None))
comparePlans(
parser.parsePlan("SHOW COLLATIONS LIKE 'UNICODE%'"),
If we don't support %, shall we avoid % in the test case, @viirya ?
Good catch! Let me remove it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I updated the PR description too.
-Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE%').
+Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').
Thank you @dongjoon-hyun
Merged to master for Apache Spark 4.2.0.
### What changes were proposed in this pull request?

Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').

Output schema: NAME, LANGUAGE, COUNTRY, ACCENT_SENSITIVITY, CASE_SENSITIVITY, PAD_ATTRIBUTE, ICU_VERSION (matching the existing collations() TVF but without the constant CATALOG/SCHEMA columns).

Implementation follows the ShowCatalogsCommand pattern as collations are engine-global and not tied to any catalog or namespace.

### Why are the changes needed?

SHOW COLLATIONS is a SQL command supported by MySQL and its derivatives (MariaDB, TiDB) for listing available collations. Spark currently only exposes this information via a table-valued function (SELECT * FROM collations()), which is inconsistent with how other catalog objects are queried (SHOW CATALOGS, SHOW TABLES, etc.) and unfamiliar to users coming from MySQL-compatible databases. This change adds a more intuitive SQL syntax consistent with Spark's existing SHOW command family.

### Does this PR introduce _any_ user-facing change?

Yes, this adds `SHOW COLLATIONS` command.

### How was this patch tested?

Unit tests

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Closes apache#55099 from viirya/SPARK-49543-show-collations.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you @dongjoon-hyun
What changes were proposed in this pull request?
Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').
Output schema: NAME, LANGUAGE, COUNTRY, ACCENT_SENSITIVITY, CASE_SENSITIVITY, PAD_ATTRIBUTE, ICU_VERSION — matching the existing collations() TVF but without the constant CATALOG/SCHEMA columns.
Implementation follows the ShowCatalogsCommand pattern as collations are engine-global and not tied to any catalog or namespace.
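As a rough illustration of the output schema described above, the following Python sketch models how a row of the collations() TVF would map onto the seven SHOW COLLATIONS columns once the two constant columns are dropped. The column names come from this description; the helper `to_show_row`, the ordering of the TVF columns, and every sample value except the two sensitivity strings (which appear in the PR's test) are assumptions for illustration only.

```python
# Column names as listed in the PR description.
SHOW_COLLATIONS_COLUMNS = [
    "NAME", "LANGUAGE", "COUNTRY", "ACCENT_SENSITIVITY",
    "CASE_SENSITIVITY", "PAD_ATTRIBUTE", "ICU_VERSION",
]
# Assumption: the TVF lists its two constant columns first.
TVF_COLUMNS = ["CATALOG", "SCHEMA"] + SHOW_COLLATIONS_COLUMNS


def to_show_row(tvf_row):
    """Project a collations() row (a dict) onto the SHOW COLLATIONS columns."""
    return tuple(tvf_row[column] for column in SHOW_COLLATIONS_COLUMNS)


# Hypothetical row; only the two sensitivity values come from the PR's test.
tvf_row = dict(zip(TVF_COLUMNS, [
    "SYSTEM", "BUILTIN", "UTF8_BINARY", None, None,
    "ACCENT_SENSITIVE", "CASE_SENSITIVE", "NO_PAD", None,
]))
utf8_row = to_show_row(tvf_row)
```

With this layout, positions 3 and 4 of the projected row hold the accent and case sensitivity, which lines up with the `getString(3)` and `getString(4)` assertions in the PR's test.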
Why are the changes needed?
SHOW COLLATIONS is a SQL command supported by MySQL and its derivatives (MariaDB, TiDB) for listing available collations. Spark currently only exposes this information via a table-valued function (SELECT * FROM collations()), which is inconsistent with how other catalog objects are queried (SHOW CATALOGS, SHOW TABLES, etc.) and unfamiliar to users coming from MySQL-compatible databases. This change adds a more intuitive SQL syntax consistent with Spark's existing SHOW command family.
Does this PR introduce any user-facing change?
Yes, this adds the SHOW COLLATIONS command.
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6