Commit c456cfe

AgenticSpark authored and MaxGekk committed
[SPARK-21529][4.X][SQL] Improve the error message for unsupported Hive union type
### What changes were proposed in this pull request?

Backport of #56775 to `branch-4.x`, because the original change conflicts on this branch.

Detect unsupported Hive `uniontype<...>` types when converting Hive `FieldSchema` types to Spark SQL types, and raise a dedicated `UNSUPPORTED_HIVE_TYPE` error instead of the generic `CANNOT_RECOGNIZE_HIVE_TYPE` parser error.

This is a cherry-pick of the merged master commit c90cad6. The only conflict was in `error-conditions.json`: `branch-4.x` does not have the `UNSUPPORTED_HIVE_FUNCTION_TYPE` / `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` entries that exist on master, so the new `UNSUPPORTED_HIVE_TYPE` entry is placed directly between `UNSUPPORTED_GROUPING_EXPRESSION` and `UNSUPPORTED_INSERT`. The Scala changes apply unchanged.

### Why are the changes needed?

Spark SQL does not support Hive union types. Currently, the failure is reported through the parser path (`CANNOT_RECOGNIZE_HIVE_TYPE`), and the message does not make clear that the Hive union type is what is unsupported.

### Does this PR introduce _any_ user-facing change?

Yes. Reading a Hive table column whose type is or contains `uniontype<...>` (for example, a union nested inside a `struct`) now reports `UNSUPPORTED_HIVE_TYPE` with the offending Hive type and the column name.

### How was this patch tested?

Cherry-picked from the merged master commit c90cad6, which passed CI and review as #56775. The production Scala hunks apply unchanged on `branch-4.x`; only the placement of the `error-conditions.json` entry differed, and it was re-validated (valid JSON, alphabetical ordering, one structural token per line). CI on this branch runs `HiveClientImplSuite` and the `SparkThrowableSuite` "Error conditions are correctly formatted" golden check.

### Was this patch authored or co-authored using generative AI tooling?

Yes. GitHub Copilot assisted with preparing and validating this change.

Closes #56929 from AgenticSpark/agenticspark/SPARK-21529-branch-4.x.

Authored-by: AgenticSpark <jianglie2023@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent: 5f39cc2

4 files changed: 67 additions and 0 deletions.

common/utils/src/main/resources/error/error-conditions.json

Lines changed: 6 additions & 0 deletions
@@ -8641,6 +8641,12 @@
     ],
     "sqlState" : "42K0E"
   },
+  "UNSUPPORTED_HIVE_TYPE" : {
+    "message" : [
+      "Cannot read the Hive type <fieldType> of the column <fieldName> because Spark SQL does not support this data type."
+    ],
+    "sqlState" : "0A000"
+  },
   "UNSUPPORTED_INSERT" : {
     "message" : [
       "Can't insert into the target."
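The `<fieldType>` and `<fieldName>` placeholders in the new entry are filled from the `messageParameters` map that `unsupportedHiveTypeError` passes to the exception. As a rough illustration only, the following Scala sketch does naive `<name>` substitution with a hypothetical `MessageTemplateSketch.renderMessage` helper. It is not Spark's error-condition reader, which differs in detail; the parameter values are the formatted ones expected by `HiveClientImplSuite`.

```scala
// A rough, hypothetical model of filling an error-condition message template.
// This is NOT Spark's implementation; it only shows how placeholders map to parameters.
object MessageTemplateSketch {
  // Message text of the UNSUPPORTED_HIVE_TYPE entry, split across two literals for width.
  val template: String =
    "Cannot read the Hive type <fieldType> of the column <fieldName> " +
      "because Spark SQL does not support this data type."

  // Replace every <key> placeholder with its value. Naive: a value that itself
  // contains another placeholder could be substituted again.
  def renderMessage(template: String, params: Map[String, String]): String =
    params.foldLeft(template) { case (msg, (key, value)) =>
      msg.replace(s"<$key>", value)
    }

  def main(args: Array[String]): Unit = {
    // Parameter values as formatted in HiveClientImplSuite's expectations.
    val params = Map("fieldType" -> "\"UNIONTYPE<INT,STRING>\"", "fieldName" -> "`c`")
    println(renderMessage(template, params))
  }
}
```

Running `main` prints the template with `"UNIONTYPE<INT,STRING>"` and `` `c` `` in place of the two placeholders.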

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala

Lines changed: 7 additions & 0 deletions
@@ -1696,6 +1696,13 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
       cause = e)
   }
 
+  def unsupportedHiveTypeError(fieldType: String, fieldName: String): Throwable = {
+    new SparkUnsupportedOperationException(
+      errorClass = "UNSUPPORTED_HIVE_TYPE",
+      messageParameters = Map(
+        "fieldType" -> toSQLType(fieldType),
+        "fieldName" -> toSQLId(fieldName)))
+  }
   def getTablesByTypeUnsupportedByHiveVersionError(): SparkUnsupportedOperationException = {
     new SparkUnsupportedOperationException(
       errorClass = "GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION")
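The expected parameters in `HiveClientImplSuite` (`"UNIONTYPE<INT,STRING>"` and `` `c` ``) suggest how `toSQLType` and `toSQLId` format the values: the type is upper-cased and double-quoted, and the column name is backtick-quoted. The sketch below uses simplified, hypothetical stand-ins (`toSQLTypeSketch`, `toSQLIdSketch`) inferred from those expectations; the real helpers in `QueryErrorsBase` may handle more cases, such as multi-part names.

```scala
import java.util.Locale

// Simplified, hypothetical stand-ins for the formatting helpers used by
// unsupportedHiveTypeError. They only cover single-part column names.
object SqlFormattingSketch {
  // Upper-case the type string and wrap it in double quotes.
  def toSQLTypeSketch(text: String): String =
    "\"" + text.toUpperCase(Locale.ROOT) + "\""

  // Wrap a single identifier in backticks, doubling any embedded backticks.
  def toSQLIdSketch(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Same map shape that unsupportedHiveTypeError passes as messageParameters.
  def unsupportedHiveTypeParams(fieldType: String, fieldName: String): Map[String, String] =
    Map(
      "fieldType" -> toSQLTypeSketch(fieldType),
      "fieldName" -> toSQLIdSketch(fieldName))

  def main(args: Array[String]): Unit =
    println(unsupportedHiveTypeParams("struct<a:uniontype<int,string>>", "c"))
}
```

Upper-casing with `Locale.ROOT` mirrors the locale-independent lower-casing used in the `HiveClientImpl` check, so the output does not depend on the JVM's default locale.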

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala

Lines changed: 5 additions & 0 deletions
@@ -1139,6 +1139,11 @@ private[hive] object HiveClientImpl extends Logging {
       CatalystSqlParser.parseDataType(typeStr)
     } catch {
       case e: ParseException =>
+        // Hive's union type (uniontype<...>) is not supported by Spark SQL and makes the parser
+        // fail with a generic message. Detect it and report a clearer error (SPARK-21529).
+        if (hc.getType.toLowerCase(Locale.ROOT).contains("uniontype<")) {
+          throw QueryExecutionErrors.unsupportedHiveTypeError(hc.getType, hc.getName)
+        }
         throw QueryExecutionErrors.cannotRecognizeHiveTypeError(e, typeStr, hc.getName)
     }
   }
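The new `catch` branch chooses between two errors once parsing has already failed. The following Scala sketch models that decision in isolation; `UnionTypeDetectionSketch` and its case classes are hypothetical and stand in for the real exceptions. It keeps the same case-insensitive substring check for `"uniontype<"`, which is why a union nested inside a `struct` (or any other container) is also caught.

```scala
import java.util.Locale

// Hypothetical model of the error selection in the ParseException handler above.
object UnionTypeDetectionSketch {
  sealed trait HiveTypeError
  final case class UnsupportedHiveType(fieldType: String, fieldName: String)
    extends HiveTypeError
  final case class CannotRecognizeHiveType(fieldType: String, fieldName: String)
    extends HiveTypeError

  // Same check as the patch: case-insensitive substring search for "uniontype<".
  def mentionsUnionType(hiveType: String): Boolean =
    hiveType.toLowerCase(Locale.ROOT).contains("uniontype<")

  // Called only after parsing failed: pick the dedicated error for union types,
  // otherwise fall back to the generic "cannot recognize" error.
  def errorForParseFailure(hiveType: String, columnName: String): HiveTypeError =
    if (mentionsUnionType(hiveType)) UnsupportedHiveType(hiveType, columnName)
    else CannotRecognizeHiveType(hiveType, columnName)

  def main(args: Array[String]): Unit = {
    println(errorForParseFailure("struct<a:uniontype<int,string>>", "c"))
    println(errorForParseFailure("foo<bar>", "c"))
  }
}
```

Because the check runs only inside the `ParseException` handler, it cannot change the outcome for any type the parser already accepts; it only changes which error is reported for types that fail to parse.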
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.client
+
+import org.apache.hadoop.hive.metastore.api.FieldSchema
+
+import org.apache.spark.{SparkFunSuite, SparkUnsupportedOperationException}
+
+class HiveClientImplSuite extends SparkFunSuite {
+
+  test("SPARK-21529: a clear error is raised for an unsupported Hive union type") {
+    val column = new FieldSchema("c", "uniontype<int,string>", null)
+    checkError(
+      exception = intercept[SparkUnsupportedOperationException] {
+        HiveClientImpl.fromHiveColumn(column)
+      },
+      condition = "UNSUPPORTED_HIVE_TYPE",
+      parameters = Map(
+        "fieldType" -> "\"UNIONTYPE<INT,STRING>\"",
+        "fieldName" -> "`c`"))
+  }
+
+  test("SPARK-21529: a Hive union type nested in a struct is detected") {
+    val column = new FieldSchema("c", "struct<a:uniontype<int,string>>", null)
+    checkError(
+      exception = intercept[SparkUnsupportedOperationException] {
+        HiveClientImpl.fromHiveColumn(column)
+      },
+      condition = "UNSUPPORTED_HIVE_TYPE",
+      parameters = Map(
+        "fieldType" -> "\"STRUCT<A:UNIONTYPE<INT,STRING>>\"",
+        "fieldName" -> "`c`"))
+  }
+}
