Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-43841][SQL] Handle candidate attributes with no prefix in `Str…
…ingUtils#orderSuggestedIdentifiersBySimilarity` ### What changes were proposed in this pull request? In `StringUtils#orderSuggestedIdentifiersBySimilarity`, handle the case where the candidate attributes have a mix of empty and non-empty prefixes. ### Why are the changes needed? The following query throws a `StringIndexOutOfBoundsException`: ``` with v1 as ( select * from values (1, 2) as (c1, c2) ), v2 as ( select * from values (2, 3) as (c1, c2) ) select v1.c1, v1.c2, v2.c1, v2.c2, b from v1 full outer join v2 using (c1); ``` The query should fail anyway, since `b` refers to a non-existent column. But it should fail with a helpful error message, not with a `StringIndexOutOfBoundsException`. `StringUtils#orderSuggestedIdentifiersBySimilarity` assumes that a list of suggested attributes with a mix of prefixes will never have an attribute name with an empty prefix. But in this case it does (`c1` from the `coalesce` has no prefix, since it is not associated with any relation or subquery): ``` +- 'Project [c1#5, c2#6, c1#7, c2#8, 'b] +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2) +- Join FullOuter, (c1#5 = c1#7) :- SubqueryAlias v1 : +- CTERelationRef 0, true, [c1#5, c2#6] +- SubqueryAlias v2 +- CTERelationRef 1, true, [c1#7, c2#8] ``` Because of this, `orderSuggestedIdentifiersBySimilarity` returns a sorted list of suggestions like this: ``` ArrayBuffer(.c1, v1.c2, v2.c2) ``` `UnresolvedAttribute.parseAttributeName` chokes on an attribute name that starts with a '.'. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. Closes apache#41353 from bersprockets/unresolved_column_issue. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
- Loading branch information