[SPARK-46830][SQL] Fix collation strength for parameter markers in EXECUTE IMMEDIATE #55219

Open
ilicmarkodb wants to merge 1 commit into apache:master from ilicmarkodb:param_mark_collation

@ilicmarkodb ilicmarkodb commented Apr 6, 2026

What changes were proposed in this pull request?

Fix parameter marker collation strength in EXECUTE IMMEDIATE and parameterized queries so that parameters get implicit collation strength instead of explicit.

  • In ParameterHandler.convertToSql, collated string parameters (including those nested in arrays, maps, and structs) are now serialized as CAST('value' AS STRING COLLATE X) instead of 'value' COLLATE X, which gives them implicit collation strength when the SQL text is re-parsed.
  • Null parameters skip this wrapping, because a CAST with a NullType child gets Default (not Implicit) strength regardless.
  • Add DataTypeUtils.hasNonDefaultStringCharOrVarcharType to recursively check if a type contains any explicitly collated STRING/CHAR/VARCHAR.
  • Add ElementAt collation context propagation in CollationTypeCoercion.findCollationContext to extract collation context from map value type or array element type.
  • Add ElementAt key coercion rule in CollationTypeCoercion to cast the lookup key to match a collated map's key type (analogous to existing GetMapValue rule).
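
To illustrate the first three bullets, here is a minimal, Spark-free Python sketch of the serialization idea. It is a model, not the code in this PR: the type classes, `has_non_default_string`, `quote`, and `to_sql` are hypothetical names, only STRING is modeled (the real check also covers CHAR/VARCHAR), the per-element wrapping of nested values is an assumption about how nesting is handled, and the escaping is simplified.

```python
from dataclasses import dataclass
from typing import Any, Tuple

DEFAULT_COLLATION = "UTF8_BINARY"


# Hypothetical stand-ins for Spark data types (illustration only).
@dataclass(frozen=True)
class StringType:
    collation: str = DEFAULT_COLLATION


@dataclass(frozen=True)
class ArrayType:
    element: Any


@dataclass(frozen=True)
class MapType:
    key: Any
    value: Any


@dataclass(frozen=True)
class StructType:
    fields: Tuple[Tuple[str, Any], ...]  # (field name, field type) pairs


def has_non_default_string(dt: Any) -> bool:
    """Recursively check whether a type contains a non-default collated string."""
    if isinstance(dt, StringType):
        return dt.collation != DEFAULT_COLLATION
    if isinstance(dt, ArrayType):
        return has_non_default_string(dt.element)
    if isinstance(dt, MapType):
        return has_non_default_string(dt.key) or has_non_default_string(dt.value)
    if isinstance(dt, StructType):
        return any(has_non_default_string(t) for _, t in dt.fields)
    return False


def quote(s: str) -> str:
    """Simplified literal quoting; real escaping rules are not modeled here."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_sql(value: Any, dt: Any) -> str:
    """Serialize a parameter value to SQL text, wrapping collated strings in CAST."""
    if value is None:
        # Null parameters bypass the wrapping entirely.
        return "NULL"
    if isinstance(dt, StringType):
        lit = quote(value)
        if dt.collation != DEFAULT_COLLATION:
            return f"CAST({lit} AS STRING COLLATE {dt.collation})"
        return lit
    if isinstance(dt, ArrayType):
        return "ARRAY(" + ", ".join(to_sql(v, dt.element) for v in value) + ")"
    if isinstance(dt, MapType):
        parts = []
        for k, v in value.items():
            parts += [to_sql(k, dt.key), to_sql(v, dt.value)]
        return "MAP(" + ", ".join(parts) + ")"
    if isinstance(dt, StructType):
        parts = []
        for (name, ftype), v in zip(dt.fields, value):
            parts += [quote(name), to_sql(v, ftype)]
        return "NAMED_STRUCT(" + ", ".join(parts) + ")"
    # Non-string leaf values are passed through as-is in this sketch.
    return str(value)


print(to_sql("abc", StringType("UTF8_LCASE")))  # CAST('abc' AS STRING COLLATE UTF8_LCASE)
print(to_sql("abc", StringType()))              # 'abc'
```

In this model, `has_non_default_string` is the kind of check that would decide whether a complex parameter needs any wrapping at all.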

Why are the changes needed?

Previously, parameters in EXECUTE IMMEDIATE and parameterized queries had explicit collation strength, which caused incorrect behavior. For example, a parameter with COLLATE UTF8_LCASE would win over a column's collation instead of producing an INDETERMINATE_COLLATION_IN_EXPRESSION error, which is the correct result when an implicit-strength collation meets a column with a different collation.
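
The difference between the old and new behavior can be sketched with a small Python model of strength-based resolution. This is an illustration under assumptions, not Spark's implementation: `Strength`, `Ctx`, `resolve`, and the exception classes are hypothetical names, and the ordering explicit > implicit > default, as well as the separate error for two conflicting explicit collations, is assumed here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Strength(IntEnum):
    # Assumed precedence: a higher value wins over a lower one.
    DEFAULT = 0
    IMPLICIT = 1
    EXPLICIT = 2


@dataclass(frozen=True)
class Ctx:
    collation: str
    strength: Strength


class IndeterminateCollation(Exception):
    """Stands in for INDETERMINATE_COLLATION_IN_EXPRESSION."""


class ExplicitMismatch(Exception):
    """Hypothetical error for two different explicit collations."""


def resolve(a: Ctx, b: Ctx) -> Ctx:
    """Pick the winning collation context for two operands of one expression."""
    if a.strength != b.strength:
        return a if a.strength > b.strength else b
    if a.collation == b.collation:
        return a
    if a.strength is Strength.EXPLICIT:
        raise ExplicitMismatch(f"{a.collation} vs {b.collation}")
    if a.strength is Strength.IMPLICIT:
        raise IndeterminateCollation(f"{a.collation} vs {b.collation}")
    # Two default-strength operands are assumed to share the default collation.
    return a


column = Ctx("UNICODE", Strength.IMPLICIT)
old_param = Ctx("UTF8_LCASE", Strength.EXPLICIT)  # behavior before this change
new_param = Ctx("UTF8_LCASE", Strength.IMPLICIT)  # behavior after this change

print(resolve(column, old_param).collation)  # UTF8_LCASE: the parameter silently wins
try:
    resolve(column, new_param)
except IndeterminateCollation as e:
    print("indeterminate collation:", e)
```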

Additionally, element_at() on maps with collated keys failed with DATATYPE_MISMATCH.MAP_FUNCTION_DIFF_TYPES, because CollationTypeCoercion lacked a coercion rule for ElementAt (unlike GetMapValue, which already had one).
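
As a rough picture of why the missing rule mattered, the following Python sketch models a strict key-type check and the cast that a coercion rule would insert. All names (`StringType`, `Expr`, `key_type_matches`, `coerce_key`) are hypothetical, and the actual rule in CollationTypeCoercion may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    collation: str = "UTF8_BINARY"


@dataclass(frozen=True)
class Expr:
    sql: str
    dtype: StringType


def key_type_matches(map_key_type: StringType, key: Expr) -> bool:
    """Strict check: the lookup key's type must equal the map's key type."""
    return map_key_type == key.dtype


def coerce_key(map_key_type: StringType, key: Expr) -> Expr:
    """Cast the lookup key to the map's key type when the collations differ."""
    if key.dtype == map_key_type:
        return key
    cast_sql = f"CAST({key.sql} AS STRING COLLATE {map_key_type.collation})"
    return Expr(cast_sql, map_key_type)


map_key_type = StringType("UTF8_LCASE")
lookup_key = Expr("'a'", StringType())  # plain literal with the default collation

print(key_type_matches(map_key_type, lookup_key))  # False: the check fails without coercion
coerced = coerce_key(map_key_type, lookup_key)
print(key_type_matches(map_key_type, coerced))     # True: the check passes after the cast
```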

Does this PR introduce any user-facing change?

Yes. Parameters in EXECUTE IMMEDIATE and parameterized queries now have implicit collation strength, matching the behavior of string literals. This means collation conflicts between parameters and columns with different collations will now correctly raise INDETERMINATE_COLLATION_IN_EXPRESSION instead of silently using the parameter's collation.

How was this patch tested?

Added 20 new tests in CollationSuite covering:

  • EXECUTE IMMEDIATE with literal parameters (with/without explicit COLLATE) vs query literals and columns
  • EXECUTE IMMEDIATE with variable parameters vs query literals and columns
  • EXECUTE IMMEDIATE with complex type parameters (ARRAY, MAP, STRUCT) including collation and strength
  • EXECUTE IMMEDIATE with null parameters (plain NULL, typed NULL, variable NULL)
  • Parameterized queries (spark.sql() API) vs columns and with collation strength
  • Two parameters with different collations producing indeterminate collation errors

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

@ilicmarkodb ilicmarkodb force-pushed the param_mark_collation branch 3 times, most recently from cbc6d6b to 0a5c9f6 Compare April 6, 2026 16:23
@HyukjinKwon HyukjinKwon marked this pull request as draft April 6, 2026 22:29
@ilicmarkodb ilicmarkodb force-pushed the param_mark_collation branch 2 times, most recently from 707459a to aeb9912 Compare April 8, 2026 21:48
@ilicmarkodb ilicmarkodb changed the title from "temp" to "[SPARK-XXXX][SQL] Fix collation strength for parameter markers in EXECUTE IMMEDIATE" Apr 8, 2026
@ilicmarkodb ilicmarkodb changed the title from "[SPARK-XXXX][SQL] Fix collation strength for parameter markers in EXECUTE IMMEDIATE" to "[SPARK-46830][SQL] Fix collation strength for parameter markers in EXECUTE IMMEDIATE" Apr 8, 2026
@ilicmarkodb ilicmarkodb marked this pull request as ready for review April 8, 2026 22:50
@ilicmarkodb ilicmarkodb force-pushed the param_mark_collation branch 8 times, most recently from ac6a447 to 649df55 Compare April 9, 2026 13:30
@ilicmarkodb ilicmarkodb force-pushed the param_mark_collation branch from 649df55 to 1434668 Compare April 9, 2026 17:22