
[SPARK-49543][SQL] Add SHOW COLLATIONS command#55099

Closed
viirya wants to merge 8 commits into apache:master from viirya:SPARK-49543-show-collations

Conversation

Member

@viirya viirya commented Mar 30, 2026

What changes were proposed in this pull request?

Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').

Output schema: NAME, LANGUAGE, COUNTRY, ACCENT_SENSITIVITY, CASE_SENSITIVITY, PAD_ATTRIBUTE, ICU_VERSION — matching the existing collations() TVF but without the constant CATALOG/SCHEMA columns.

Implementation follows the ShowCatalogsCommand pattern as collations are engine-global and not tied to any catalog or namespace.

Why are the changes needed?

SHOW COLLATIONS is a SQL command supported by MySQL and its derivatives (MariaDB, TiDB) for listing available collations. Spark currently only exposes this information via a table-valued function (SELECT * FROM collations()), which is inconsistent with how other catalog objects are queried (SHOW CATALOGS, SHOW TABLES, etc.) and unfamiliar to users coming from MySQL-compatible databases. This change adds a more intuitive SQL syntax consistent with Spark's existing SHOW command family.

Does this PR introduce any user-facing change?

Yes, this adds SHOW COLLATIONS command.

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations.
Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE%').

Output schema: NAME, LANGUAGE, COUNTRY, ACCENT_SENSITIVITY, CASE_SENSITIVITY,
PAD_ATTRIBUTE, ICU_VERSION — matching the existing collations() TVF but without
the constant CATALOG/SCHEMA columns.

Implementation follows the ShowCatalogsCommand pattern as collations are
engine-global and not tied to any catalog or namespace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM (Pending CIs).

viirya and others added 3 commits March 30, 2026 17:29
…NS token

Add COLLATIONS to SQL keyword golden files and hardcoded keyword lists
in ThriftServer and SparkConnect JDBC tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…LATIONS

Add COLLATIONS to reserved keyword list in keywords-enforced.sql.out
and add COLLATIONS documentation entry in sql-ref-ansi-compliance.md.
COLLATIONS is reserved in ANSI mode (ansiNonReserved) and non-reserved
in non-ANSI mode; it is not part of SQL-2016 standard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
filterPattern uses * (not %) as the wildcard character, consistent
with other SHOW commands like SHOW NAMESPACES and SHOW FUNCTIONS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
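
For context, the glob-style matching these SHOW commands apply to a filter pattern can be approximated as follows. This is an illustrative Python sketch, not Spark's actual implementation; the `filter_pattern` helper name and the collation names are examples:

```python
import fnmatch

# Rough approximation of the glob-style matching used by Spark's SHOW
# commands (illustrative only, not the actual Spark code): '*' matches
# any character sequence and '|' separates alternative patterns.
def filter_pattern(names, pattern):
    alternatives = [p.strip() for p in pattern.split("|")]
    return [n for n in names
            if any(fnmatch.fnmatchcase(n, alt) for alt in alternatives)]

collations = ["UTF8_BINARY", "UTF8_LCASE", "UNICODE", "UNICODE_CI"]
print(filter_pattern(collations, "UNICODE*"))  # ['UNICODE', 'UNICODE_CI']
```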
Member

@dongjoon-hyun dongjoon-hyun left a comment


Although the test passed, the last test code change commit looks suspicious to me.

assert(utf8Row.getString(3) == "ACCENT_SENSITIVE")
assert(utf8Row.getString(4) == "CASE_SENSITIVE")

val likeResult = sql("SHOW COLLATIONS LIKE 'UNICODE*'").collect()
Member


Could you double-check this, @viirya? LIKE should use % instead of *.

* is used for other syntax, such as REGEXP.

Member Author

@viirya viirya Mar 31, 2026


Thanks for catching this!

For context: other existing SHOW commands such as SHOW NAMESPACES and SHOW FUNCTIONS also use *. This fix makes SHOW COLLATIONS consistent with the SQL LIKE convention, unlike those commands.

ShowNamespacesSuiteBase.scala:88 — SHOW NAMESPACES LIKE '1'
ShowFunctionsParserSuite.scala:50 — SHOW FUNCTIONS LIKE 'funct*'
ShowTablesParserSuite.scala:36 — SHOW TABLES LIKE 'test'
ShowTablesParserSuite.scala:54 — SHOW TABLE EXTENDED LIKE 'test'
ShowTablesSuiteBase.scala:338 — SHOW TABLE EXTENDED LIKE '$viewName*'

I also took a look at our documentation for SHOW TABLES LIKE: https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html. It says the command takes a regex_pattern after LIKE:

Syntax
SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ]

Member

@dongjoon-hyun dongjoon-hyun Mar 31, 2026


Oh. So, we support both % and *?

Member


Thank you, @viirya . After reading the doc once more, I realized my misunderstanding. Sorry for making you confused.

Member


BTW, can we have a new document for SHOW COLLATIONS like SHOW TABLES LIKE https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html ?

Member Author


Yes, let me add a document for it. Thanks for the reminder.

Member


There was a ticket to make it follow the standard SQL LIKE pattern, SPARK-45880 (#43751), but it was not landed.

viirya and others added 3 commits March 31, 2026 09:15
Convert SQL LIKE wildcard % to glob * before passing to filterPattern,
so SHOW COLLATIONS LIKE 'UNICODE%' works correctly. Revert test to
use % per SQL LIKE convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
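The conversion described in the commit above can be sketched as follows. The `like_to_glob` helper name is hypothetical, for illustration only, and this is not the actual Spark code:

```python
def like_to_glob(like_pattern: str) -> str:
    # Convert the SQL LIKE any-sequence wildcard '%' into the glob-style
    # '*' understood by the SHOW-command filter (illustrative sketch only;
    # not the actual Spark implementation).
    return like_pattern.replace("%", "*")

print(like_to_glob("UNICODE%"))  # UNICODE*
```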
Add sql-ref-syntax-aux-show-collations.md following the same
structure as other SHOW command docs (description, syntax,
parameters, output schema, examples, related statements).
Also add entry to the SQL syntax index in sql-ref-syntax.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
comparePlans(
  parser.parsePlan("SHOW COLLATIONS"),
  ShowCollationsCommand(None))
comparePlans(
  parser.parsePlan("SHOW COLLATIONS LIKE 'UNICODE%'"),
Member


If we don't support %, shall we avoid % in the test case, @viirya ?

Member Author


Good catch! Let me remove it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dongjoon-hyun
Member

I updated the PR description too.

-Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE%').
+Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').

@viirya
Member Author

viirya commented Mar 31, 2026

I updated the PR description too.

-Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE%').
+Add SHOW COLLATIONS SQL syntax to list all Spark built-in collations. Supports optional LIKE pattern filtering (e.g. SHOW COLLATIONS LIKE 'UNICODE*').

Thank you @dongjoon-hyun

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.2.0.

zhengruifeng pushed a commit to zhengruifeng/spark that referenced this pull request Apr 1, 2026

Closes apache#55099 from viirya/SPARK-49543-show-collations.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@viirya viirya deleted the SPARK-49543-show-collations branch April 1, 2026 04:56
@viirya
Member Author

viirya commented Apr 1, 2026

Thank you @dongjoon-hyun
