[SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`

### What changes were proposed in this pull request?

In `StringUtils#orderSuggestedIdentifiersBySimilarity`, handle the case where the candidate attributes have a mix of empty and non-empty prefixes.

### Why are the changes needed?

The following query throws a `StringIndexOutOfBoundsException`:
```
with v1 as (
  select * from values (1, 2) as (c1, c2)
),
v2 as (
  select * from values (2, 3) as (c1, c2)
)
select v1.c1, v1.c2, v2.c1, v2.c2, b
from v1
full outer join v2
using (c1);
```
The query should fail anyway, since `b` refers to a non-existent column. But it should fail with a helpful error message, not with a `StringIndexOutOfBoundsException`.

`StringUtils#orderSuggestedIdentifiersBySimilarity` assumes that a list of suggested attributes with a mix of prefixes will never have an attribute name with an empty prefix. But in this case it does (`c1` from the `coalesce` has no prefix, since it is not associated with any relation or subquery):
```
+- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
   +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
      +- Join FullOuter, (c1#5 = c1#7)
         :- SubqueryAlias v1
         :  +- CTERelationRef 0, true, [c1#5, c2#6]
         +- SubqueryAlias v2
            +- CTERelationRef 1, true, [c1#7, c2#8]
```
Because of this, `orderSuggestedIdentifiersBySimilarity` returns a sorted list of suggestions like this:
```
ArrayBuffer(.c1, v1.c2, v2.c2)
```
`UnresolvedAttribute.parseAttributeName` cannot parse an attribute name that starts with a '.', such as `.c1`.
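The effect of the empty prefix can be illustrated with a minimal sketch. This is not the Spark source: the object `SuggestionFormatting`, the `Candidate` alias, and the helpers `formatBefore` and `formatAfter` are hypothetical names, and the sketch assumes each suggestion has already been split into a `(prefix, name)` pair and sorted. It only mirrors the final formatting step that this patch changes.

```scala
// Hypothetical stand-in for the final formatting step; not the Spark implementation.
object SuggestionFormatting {
  // Each candidate is modeled as (prefix, name). An empty prefix means the
  // attribute is not associated with any relation or subquery.
  type Candidate = (String, String)

  // Behavior before the fix: always join prefix and name with a dot,
  // which yields a leading '.' when the prefix is empty.
  def formatBefore(sorted: Seq[Candidate]): Seq[String] =
    sorted.map(x => s"${x._1}.${x._2}")

  // Behavior after the fix: emit only the name when the prefix is empty.
  def formatAfter(sorted: Seq[Candidate]): Seq[String] =
    sorted.map(x => if (x._1.isEmpty) x._2 else s"${x._1}.${x._2}")

  // The three candidates from the failing query, with c1 carrying no prefix.
  val candidates: Seq[Candidate] = Seq(("", "c1"), ("v1", "c2"), ("v2", "c2"))
}
```

With these assumptions, `formatBefore(candidates)` produces `.c1` as its first element, matching the list shown above, while `formatAfter(candidates)` produces `c1`, which is a valid single-part name.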

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes apache#41353 from bersprockets/unresolved_column_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
bersprockets authored and czxm committed Jun 12, 2023
1 parent 1a4b999 commit f648748
Showing 3 changed files with 35 additions and 1 deletion.
@@ -109,7 +109,7 @@ object StringUtils extends Logging {
         sorted.map(_._2)
       } else {
         // More than one relation
-        sorted.map(x => s"${x._1}.${x._2}")
+        sorted.map(x => if (x._1.isEmpty) s"${x._2}" else s"${x._1}.${x._2}")
       }
     }
   }
@@ -138,4 +138,11 @@ class StringUtilsSuite extends SparkFunSuite with SQLHelper {
     assert(quoteIfNeeded("_") === "_")
     assert(quoteIfNeeded("") === "``")
   }
+
+  test("SPARK-43841: mix of multipart and single-part identifiers") {
+    val baseString = "b"
+    val testStrings = Seq("c1", "v1.c2", "v2.c2") // mix of multipart and single-part
+    val expectedOutput = Seq("c1", "v1.c2", "v2.c2")
+    assert(orderSuggestedIdentifiersBySimilarity(baseString, testStrings) === expectedOutput)
+  }
 }
@@ -884,6 +884,33 @@ class QueryCompilationErrorsSuite
       )
     }
   }
+
+  test("SPARK-43841: Unresolved attribute in select of full outer join with USING") {
+    withTempView("v1", "v2") {
+      sql("create or replace temp view v1 as values (1, 2) as (c1, c2)")
+      sql("create or replace temp view v2 as values (2, 3) as (c1, c2)")
+
+      val query =
+        """select b
+          |from v1
+          |full outer join v2
+          |using (c1)
+          |""".stripMargin
+
+      checkError(
+        exception = intercept[AnalysisException] {
+          sql(query)
+        },
+        errorClass = "UNRESOLVED_COLUMN.WITH_SUGGESTION",
+        parameters = Map(
+          "proposal" -> "`c1`, `v1`.`c2`, `v2`.`c2`",
+          "objectName" -> "`b`"),
+        context = ExpectedContext(
+          fragment = "b",
+          start = 7, stop = 7)
+      )
+    }
+  }
 }

 class MyCastToString extends SparkUserDefinedFunction(
