Skip to content

Commit

Permalink
[SPARK-45891][SQL][FOLLOW-UP] Added length check to the is_variant_nu…
Browse files Browse the repository at this point in the history
…ll expression

### What changes were proposed in this pull request?

Added a check in the `is_variant_null` expression where the length of the value field is verified to be greater than zero. If the length is zero, a `MALFORMED_VARIANT` exception is thrown.

### Why are the changes needed?

Earlier, `is_variant_null` was simply checking if the first byte of a variant value was zero. However, if the value field is empty, the first byte logically doesn't exist and therefore, it could result in undefined behavior. Such a case should ideally never be seen but it could appear in the case of data corruption.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Additional unit test to check if the zero-length variant throws an exception.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46311 from harshmotw-db/is_variant_null_fix.

Authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
  • Loading branch information
harshmotw-db authored and dongjoon-hyun committed May 2, 2024
1 parent 04f3a93 commit 0fc7c4a
Show file tree
Hide file tree
Showing 3 changed files with 19 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,13 @@ object VariantExpressionEvalUtils {
false
} else {
val variantValue = input.getValue
// Variant NULL is denoted by basic_type == 0 and val_header == 0
variantValue(0) == 0
}
if (variantValue.isEmpty) {
throw QueryExecutionErrors.malformedVariant()
} else {
// Variant NULL is denoted by basic_type == 0 and val_header == 0
variantValue(0) == 0
}
}
}

/** Cast a Spark value from `dataType` into the variant type. */
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2727,6 +2727,11 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
messageParameters = Map("path" -> path, "functionName" -> toSQLId(functionName)))
}

def malformedVariant(): Throwable = new SparkRuntimeException(
"MALFORMED_VARIANT",
Map.empty
)

def invalidCharsetError(functionName: String, charset: String): RuntimeException = {
new SparkIllegalArgumentException(
errorClass = "INVALID_PARAMETER_VALUE.CHARSET",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -239,6 +239,13 @@ class VariantExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
check(expectedResult4, smallObject, smallMetadata)
}

test("is_variant_null invalid input") {
checkErrorInExpression[SparkRuntimeException](
IsVariantNull(Literal(new VariantVal(Array(), Array(1, 2, 3)))),
"MALFORMED_VARIANT"
)
}

private def parseJson(input: String): VariantVal =
VariantExpressionEvalUtils.parseJson(UTF8String.fromString(input))

Expand Down

0 comments on commit 0fc7c4a

Please sign in to comment.