@@ -1593,6 +1593,15 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val PARQUET_IGNORE_VARIANT_ANNOTATION =
buildConf("spark.sql.parquet.ignoreVariantAnnotation")
.internal()
.doc("When true, ignore the variant logical type annotation and treat the Parquet " +

Contributor:
Should we mark this conf as .internal()? I think the main use case is to simplify debugging issues with the raw variant bytes, but let me know if there's a reason for this conf that I'm missing. Assuming my understanding is right, maybe we can also mention the intended use case in the doc comment.

Contributor:
+1

"column in the same way as the underlying struct type")
.version("4.1.0")

Member (@dongjoon-hyun, Nov 20, 2025):
If this is a bug fix, this should be 4.0.2, @harshmotw-db and @cloud-fan.

Contributor (@cloud-fan, Nov 20, 2025):
Not sure if the parquet version we use in Spark 4.0 has the variant logical type. I'll leave it to @harshmotw-db

Member:
> Not sure if the parquet version we use in Spark 4.0 has the variant logical type. I'll leave it to @harshmotw-db

Thanks. We can continue our discussion if we are not sure. AFAIK, it means there is no regression in Apache Spark 4.1.0 from Apache Spark 4.0.0.

Member (@dongjoon-hyun, Nov 20, 2025):
For the record, if this is an improvement, it should be 4.2.0 according to the Apache Spark community policy, @harshmotw-db and @cloud-fan.

Contributor:
Given that Spark 4.1 has upgraded to a Parquet version that has the variant logical type, I think 4.1 should support reading Parquet files with native variant type fields?

Member (@dongjoon-hyun, Nov 20, 2025):
IIUC, we can say that it's still simply an unsupported feature, like we did for variant in Apache Spark 4.0.0. It's too late if this is an improvement, @cloud-fan.

Contributor Author:
This PR is practically a fix already. This PR added a temporary workaround for reading variant data mainly for testing purposes (see this line). Essentially, the existing code behaves as if ignoreVariantAnnotation = false. This PR just implements this more formally, so we actually make sure that the target type matches the actual Parquet type.

Member:
Why don't you revise the PR title so that it literally reads as a fix, @harshmotw-db?

Contributor Author (@harshmotw-db, Nov 20, 2025):
Also, the ParquetRowConverter fix is essential: currently, when VARIANT_ALLOW_READING_SHREDDED = false, the reader is broken when the Parquet schema is struct<metadata, value> instead of struct<value, metadata>.
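
A minimal sketch of the ordering issue described here, simplified from the converter change in this PR; it assumes parquetType is the variant GroupType with its two required binary fields in writer-dependent order:

// Build a name -> index map instead of assuming "value" comes first.
val fieldIndexByName = (0 until parquetType.getFieldCount())
  .map(i => parquetType.getFieldName(i) -> i).toMap
val valueIdx = fieldIndexByName("value")       // still correct for struct<metadata, value>
val metadataIdx = fieldIndexByName("metadata") // positional wiring would swap the binaries here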

Contributor Author:
Sure, in practice it is a fix. I need to head out for an hour, and I will change the PR title after that.

.booleanConf
.createWithDefault(false)

Member:
When should this be true?

Contributor Author:
It's mainly for debugging purposes, if we need to extract the raw variant bytes by specifying the schema as, say, struct<value: Binary, metadata: Binary>.

Member:
Well, for that purpose, let's remove this configuration. You can use logDebug instead.

Contributor Author (@harshmotw-db, Nov 20, 2025):
Correct me if I'm wrong, but I don't think logDebug would be helpful here if we want to extract variant columns into a custom schema in a Spark DataFrame. This config is a good tool to debug issues in a Parquet file.

Member:
May I ask why you think that way? You told me that "It's mainly for debugging purposes", right?

> Correct me if I'm wrong, but I don't think logDebug would be helpful here if we want to extract variant columns into a custom schema in a Spark DataFrame. This config is a good tool to debug issues in a Parquet file.

Contributor Author (@harshmotw-db, Nov 21, 2025):
I have added a new test, "variant logical type annotation - ignore variant annotation", to demonstrate this point.

So, if the ignoreVariantAnnotation config is enabled, you can read a Parquet file with an underlying variant column into a struct-of-binaries schema. For a variant column v, you could run
spark.read.format("parquet").schema("v struct<value: BINARY, metadata: BINARY>").load(...), and it would load the value and metadata columns into these fields even though the data is logically not a struct of two binaries but a variant. People could use this to debug the physical variant values.

If the config is disabled, which is the default, this read would give an error and you would need to read variant columns into a variant schema.
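
A hedged end-to-end sketch of this debugging workflow, assuming a Parquet file at a hypothetical path with a variant column v written with the variant logical type annotation (the conf key and schema string are the ones introduced in this PR):

spark.conf.set("spark.sql.parquet.ignoreVariantAnnotation", "true")
val raw = spark.read
  .format("parquet")
  .schema("v struct<value: BINARY, metadata: BINARY>")
  .load("/tmp/variant_debug")  // hypothetical path to the file under inspection
// Inspect the physical variant encoding (value/metadata binaries) directly.
raw.selectExpr("v.value", "v.metadata").show()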


val PARQUET_FIELD_ID_READ_ENABLED =
buildConf("spark.sql.parquet.fieldId.read.enabled")
.doc("Field ID is a native field of the Parquet schema spec. When enabled, Parquet readers " +
@@ -5585,7 +5594,7 @@ object SQLConf {
"When false, it only reads unshredded variant.")
.version("4.0.0")
.booleanConf
.createWithDefault(false)
.createWithDefault(true)

val PUSH_VARIANT_INTO_SCAN =
buildConf("spark.sql.variant.pushVariantIntoScan")
@@ -7802,6 +7811,8 @@ class SQLConf extends Serializable with Logging with SqlApiConf {

def parquetAnnotateVariantLogicalType: Boolean = getConf(PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE)

def parquetIgnoreVariantAnnotation: Boolean = getConf(SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION)

def ignoreMissingParquetFieldId: Boolean = getConf(SQLConf.IGNORE_MISSING_PARQUET_FIELD_ID)

def legacyParquetNanosAsLong: Boolean = getConf(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG)
@@ -876,7 +876,11 @@ private[parquet] class ParquetRowConverter(
}
}

/** Parquet converter for unshredded Variant */
/**
* Parquet converter for unshredded Variant. We use this converter when the
* `spark.sql.variant.allowReadingShredded` config is set to false. This option just exists to
* fall back to legacy logic which will eventually be removed.
*/
private final class ParquetUnshreddedVariantConverter(
parquetType: GroupType,
updater: ParentContainerUpdater)
@@ -890,29 +894,27 @@ private[parquet] class ParquetRowConverter(
// We may allow more than two children in the future, so consider this unsupported.
throw QueryCompilationErrors.invalidVariantWrongNumFieldsError()
}
val valueAndMetadata = Seq("value", "metadata").map { colName =>
val Seq(value, metadata) = Seq("value", "metadata").map { colName =>
val idx = (0 until parquetType.getFieldCount())
.find(parquetType.getFieldName(_) == colName)
if (idx.isEmpty) {
throw QueryCompilationErrors.invalidVariantMissingFieldError(colName)
}
val child = parquetType.getType(idx.get)
.find(parquetType.getFieldName(_) == colName)
.getOrElse(throw QueryCompilationErrors.invalidVariantMissingFieldError(colName))
val child = parquetType.getType(idx)
if (!child.isPrimitive || child.getRepetition != Type.Repetition.REQUIRED ||
child.asPrimitiveType().getPrimitiveTypeName != BINARY) {
child.asPrimitiveType().getPrimitiveTypeName != BINARY) {
throw QueryCompilationErrors.invalidVariantNullableOrNotBinaryFieldError(colName)
}
child
idx
}
Array(
// Converter for value
newConverter(valueAndMetadata(0), BinaryType, new ParentContainerUpdater {
val result = new Array[Converter with HasParentContainerUpdater](2)
result(value) =
newConverter(parquetType.getType(value), BinaryType, new ParentContainerUpdater {
override def set(value: Any): Unit = currentValue = value
}),

// Converter for metadata
newConverter(valueAndMetadata(1), BinaryType, new ParentContainerUpdater {
})
result(metadata) =
newConverter(parquetType.getType(metadata), BinaryType, new ParentContainerUpdater {
override def set(value: Any): Unit = currentMetadata = value
}))
})
result
}

override def getConverter(fieldIndex: Int): Converter = converters(fieldIndex)
@@ -58,15 +58,18 @@ class ParquetToSparkSchemaConverter(
caseSensitive: Boolean = SQLConf.CASE_SENSITIVE.defaultValue.get,
inferTimestampNTZ: Boolean = SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.defaultValue.get,
nanosAsLong: Boolean = SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.defaultValue.get,
useFieldId: Boolean = SQLConf.PARQUET_FIELD_ID_READ_ENABLED.defaultValue.get) {
useFieldId: Boolean = SQLConf.PARQUET_FIELD_ID_READ_ENABLED.defaultValue.get,
val ignoreVariantAnnotation: Boolean =
SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.defaultValue.get) {

def this(conf: SQLConf) = this(
assumeBinaryIsString = conf.isParquetBinaryAsString,
assumeInt96IsTimestamp = conf.isParquetINT96AsTimestamp,
caseSensitive = conf.caseSensitiveAnalysis,
inferTimestampNTZ = conf.parquetInferTimestampNTZEnabled,
nanosAsLong = conf.legacyParquetNanosAsLong,
useFieldId = conf.parquetFieldIdReadEnabled)
useFieldId = conf.parquetFieldIdReadEnabled,
ignoreVariantAnnotation = conf.parquetIgnoreVariantAnnotation)

def this(conf: Configuration) = this(
assumeBinaryIsString = conf.get(SQLConf.PARQUET_BINARY_AS_STRING.key).toBoolean,
@@ -75,7 +78,9 @@ class ParquetToSparkSchemaConverter(
inferTimestampNTZ = conf.get(SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.key).toBoolean,
nanosAsLong = conf.get(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.key).toBoolean,
useFieldId = conf.getBoolean(SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key,
SQLConf.PARQUET_FIELD_ID_READ_ENABLED.defaultValue.get))
SQLConf.PARQUET_FIELD_ID_READ_ENABLED.defaultValue.get),
ignoreVariantAnnotation = conf.getBoolean(SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.key,
SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.defaultValue.get))

/**
* Converts Parquet [[MessageType]] `parquetSchema` to a Spark SQL [[StructType]].
@@ -202,15 +207,17 @@ class ParquetToSparkSchemaConverter(
case primitiveColumn: PrimitiveColumnIO => convertPrimitiveField(primitiveColumn, targetType)
case groupColumn: GroupColumnIO if targetType.contains(VariantType) =>
if (SQLConf.get.getConf(SQLConf.VARIANT_ALLOW_READING_SHREDDED)) {
val col = convertGroupField(groupColumn)
// We need the underlying file type regardless of the config.
val col = convertGroupField(groupColumn, ignoreVariantAnnotation = true)
col.copy(sparkType = VariantType, variantFileType = Some(col))
} else {
convertVariantField(groupColumn)
}
case groupColumn: GroupColumnIO if targetType.exists(VariantMetadata.isVariantStruct) =>
val col = convertGroupField(groupColumn)
val col = convertGroupField(groupColumn, ignoreVariantAnnotation = true)
col.copy(sparkType = targetType.get, variantFileType = Some(col))
case groupColumn: GroupColumnIO => convertGroupField(groupColumn, targetType)
case groupColumn: GroupColumnIO =>
convertGroupField(groupColumn, ignoreVariantAnnotation, targetType)
}
}

@@ -349,6 +356,7 @@ class ParquetToSparkSchemaConverter(

private def convertGroupField(
groupColumn: GroupColumnIO,
ignoreVariantAnnotation: Boolean,
sparkReadType: Option[DataType] = None): ParquetColumn = {
val field = groupColumn.getType.asGroupType()

@@ -373,9 +381,21 @@ class ParquetToSparkSchemaConverter(

Option(field.getLogicalTypeAnnotation).fold(
convertInternal(groupColumn, sparkReadType.map(_.asInstanceOf[StructType]))) {
// Temporary workaround to read Shredded variant data
case v: VariantLogicalTypeAnnotation if v.getSpecVersion == 1 && sparkReadType.isEmpty =>
convertInternal(groupColumn, None)
case v: VariantLogicalTypeAnnotation if v.getSpecVersion == 1 =>
if (ignoreVariantAnnotation) {
convertInternal(groupColumn)

Member:
I don't understand the reason why we need to maintain this logic for pure debugging purpose.

} else {
ParquetSchemaConverter.checkConversionRequirement(
sparkReadType.forall(_.isInstanceOf[VariantType]),
s"Invalid Spark read type: expected $field to be variant type but found " +
s"${if (sparkReadType.isEmpty) { "None" } else {sparkReadType.get.sql} }")
if (SQLConf.get.getConf(SQLConf.VARIANT_ALLOW_READING_SHREDDED)) {
val col = convertInternal(groupColumn)
col.copy(sparkType = VariantType, variantFileType = Some(col))
} else {
convertVariantField(groupColumn)
}
}

// A Parquet list is represented as a 3-level structure:
//
@@ -646,7 +646,9 @@ case object SparkShreddingUtils {
def parquetTypeToSparkType(parquetType: ParquetType): DataType = {
val messageType = ParquetTypes.buildMessage().addField(parquetType).named("foo")
val column = new ColumnIOFactory().getColumnIO(messageType)
new ParquetToSparkSchemaConverter().convertField(column.getChild(0)).sparkType
// We need the underlying file type regardless of the ignoreVariantAnnotation config.
val converter = new ParquetToSparkSchemaConverter(ignoreVariantAnnotation = true)
converter.convertField(column.getChild(0)).sparkType
}

class SparkShreddedResult(schema: VariantSchema) extends VariantShreddingWriter.ShreddedResult {
@@ -28,7 +28,8 @@ import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.parquet.schema.{LogicalTypeAnnotation, PrimitiveType, Type}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName

import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.SparkException
import org.apache.spark.sql.{AnalysisException, QueryTest, Row}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
import org.apache.spark.sql.test.SharedSparkSession
@@ -160,64 +161,126 @@ class ParquetVariantShreddingSuite extends QueryTest with ParquetTest with Share
Seq(false, true).foreach { annotateVariantLogicalType =>
Seq(false, true).foreach { shredVariant =>
Seq(false, true).foreach { allowReadingShredded =>
withSQLConf(SQLConf.VARIANT_WRITE_SHREDDING_ENABLED.key -> shredVariant.toString,
SQLConf.VARIANT_INFER_SHREDDING_SCHEMA.key -> shredVariant.toString,
SQLConf.VARIANT_ALLOW_READING_SHREDDED.key ->
(allowReadingShredded || shredVariant).toString,
SQLConf.PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE.key ->
annotateVariantLogicalType.toString) {
def validateAnnotation(g: Type): Unit = {
if (annotateVariantLogicalType) {
assert(g.getLogicalTypeAnnotation == LogicalTypeAnnotation.variantType(1))
} else {
assert(g.getLogicalTypeAnnotation == null)
Seq(false, true).foreach { ignoreVariantAnnotation =>
withSQLConf(SQLConf.VARIANT_WRITE_SHREDDING_ENABLED.key -> shredVariant.toString,
SQLConf.VARIANT_INFER_SHREDDING_SCHEMA.key -> shredVariant.toString,
SQLConf.VARIANT_ALLOW_READING_SHREDDED.key ->
(allowReadingShredded || shredVariant).toString,
SQLConf.PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE.key ->
annotateVariantLogicalType.toString,
SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.key -> ignoreVariantAnnotation.toString) {
def validateAnnotation(g: Type): Unit = {
if (annotateVariantLogicalType) {
assert(g.getLogicalTypeAnnotation == LogicalTypeAnnotation.variantType(1))
} else {
assert(g.getLogicalTypeAnnotation == null)
}
}
withTempDir { dir =>
// write parquet file
val df = spark.sql(
"""
| select
| id * 2 i,
| to_variant_object(named_struct('id', id)) v,
| named_struct('i', (id * 2)::string,
| 'nv', to_variant_object(named_struct('id', 30 + id))) ns,
| array(to_variant_object(named_struct('id', 10 + id))) av,
| map('v2', to_variant_object(named_struct('id', 20 + id))) mv
| from range(0,3,1,1)""".stripMargin)
df.write.mode("overwrite").parquet(dir.getAbsolutePath)
val file = dir.listFiles().find(_.getName.endsWith(".parquet")).get
val parquetFilePath = file.getAbsolutePath
val inputFile = HadoopInputFile.fromPath(new Path(parquetFilePath),
new Configuration())
val reader = ParquetFileReader.open(inputFile)
val footer = reader.getFooter
val schema = footer.getFileMetaData.getSchema
val vGroup = schema.getType(schema.getFieldIndex("v"))
validateAnnotation(vGroup)
assert(vGroup.asGroupType().getFields.asScala.toSeq
.exists(_.getName == "typed_value") == shredVariant)
val nsGroup = schema.getType(schema.getFieldIndex("ns")).asGroupType()
val nvGroup = nsGroup.getType(nsGroup.getFieldIndex("nv"))
validateAnnotation(nvGroup)
val avGroup = schema.getType(schema.getFieldIndex("av")).asGroupType()
val avList = avGroup.getType(avGroup.getFieldIndex("list")).asGroupType()
val avElement = avList.getType(avList.getFieldIndex("element"))
validateAnnotation(avElement)
val mvGroup = schema.getType(schema.getFieldIndex("mv")).asGroupType()
val mvList = mvGroup.getType(mvGroup.getFieldIndex("key_value")).asGroupType()
val mvValue = mvList.getType(mvList.getFieldIndex("value"))
validateAnnotation(mvValue)
// verify result
val result = spark.read.format("parquet")
.schema("v variant, ns struct<nv variant>, av array<variant>, " +
"mv map<string, variant>")
.load(dir.getAbsolutePath)
.selectExpr("v:id::int i1", "ns.nv:id::int i2", "av[0]:id::int i3",
"mv['v2']:id::int i4")
checkAnswer(result, Array(Row(0, 30, 10, 20), Row(1, 31, 11, 21),
Row(2, 32, 12, 22)))
reader.close()
}
}
withTempDir { dir =>
// write parquet file
val df = spark.sql(
"""
| select
| id * 2 i,
| to_variant_object(named_struct('id', id)) v,
| named_struct('i', (id * 2)::string,
| 'nv', to_variant_object(named_struct('id', 30 + id))) ns,
| array(to_variant_object(named_struct('id', 10 + id))) av,
| map('v2', to_variant_object(named_struct('id', 20 + id))) mv
| from range(0,3,1,1)""".stripMargin)
df.write.mode("overwrite").parquet(dir.getAbsolutePath)
val file = dir.listFiles().find(_.getName.endsWith(".parquet")).get
val parquetFilePath = file.getAbsolutePath
val inputFile = HadoopInputFile.fromPath(new Path(parquetFilePath),
new Configuration())
val reader = ParquetFileReader.open(inputFile)
val footer = reader.getFooter
val schema = footer.getFileMetaData.getSchema
val vGroup = schema.getType(schema.getFieldIndex("v"))
validateAnnotation(vGroup)
assert(vGroup.asGroupType().getFields.asScala.toSeq
.exists(_.getName == "typed_value") == shredVariant)
val nsGroup = schema.getType(schema.getFieldIndex("ns")).asGroupType()
val nvGroup = nsGroup.getType(nsGroup.getFieldIndex("nv"))
validateAnnotation(nvGroup)
val avGroup = schema.getType(schema.getFieldIndex("av")).asGroupType()
val avList = avGroup.getType(avGroup.getFieldIndex("list")).asGroupType()
val avElement = avList.getType(avList.getFieldIndex("element"))
validateAnnotation(avElement)
val mvGroup = schema.getType(schema.getFieldIndex("mv")).asGroupType()
val mvList = mvGroup.getType(mvGroup.getFieldIndex("key_value")).asGroupType()
val mvValue = mvList.getType(mvList.getFieldIndex("value"))
validateAnnotation(mvValue)
// verify result
val result = spark.read.format("parquet")
.schema("v variant, ns struct<nv variant>, av array<variant>, " +
"mv map<string, variant>")
.load(dir.getAbsolutePath)
.selectExpr("v:id::int i1", "ns.nv:id::int i2", "av[0]:id::int i3",
"mv['v2']:id::int i4")
checkAnswer(result, Array(Row(0, 30, 10, 20), Row(1, 31, 11, 21), Row(2, 32, 12, 22)))
reader.close()
}
}
}
}
}

test("variant logical type annotation - ignore variant annotation") {
Seq(true, false).foreach { ignoreVariantAnnotation =>
withSQLConf(SQLConf.PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE.key -> "true",
SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.key -> ignoreVariantAnnotation.toString
) {
withTempDir { dir =>
// write parquet file
val df = spark.sql(
"""
| select
| id * 2 i,
| 1::variant v,
| named_struct('i', (id * 2)::string, 'nv', 1::variant) ns,
| array(1::variant) av,
| map('v2', 1::variant) mv
| from range(0,1,1,1)""".stripMargin)
df.write.mode("overwrite").parquet(dir.getAbsolutePath)
// verify result
val normal_result = spark.read.format("parquet")
.schema("v variant, ns struct<nv variant>, av array<variant>, " +
"mv map<string, variant>")
.load(dir.getAbsolutePath)
.selectExpr("v::int i1", "ns.nv::int i2", "av[0]::int i3",
"mv['v2']::int i4")
checkAnswer(normal_result, Array(Row(1, 1, 1, 1)))
val struct_result = spark.read.format("parquet")
.schema("v struct<value binary, metadata binary>, " +
"ns struct<nv struct<value binary, metadata binary>>, " +
"av array<struct<value binary, metadata binary>>, " +
"mv map<string, struct<value binary, metadata binary>>")
.load(dir.getAbsolutePath)
.selectExpr("v", "ns.nv", "av[0]", "mv['v2']")
if (ignoreVariantAnnotation) {
checkAnswer(
struct_result,
Seq(Row(
Row(Array[Byte](12, 1), Array[Byte](1, 0, 0)),
Row(Array[Byte](12, 1), Array[Byte](1, 0, 0)),
Row(Array[Byte](12, 1), Array[Byte](1, 0, 0)),
Row(Array[Byte](12, 1), Array[Byte](1, 0, 0))
))
)
} else {
val exception = intercept[SparkException]{
struct_result.collect()
}
checkError(
exception = exception.getCause.asInstanceOf[AnalysisException],
condition = "_LEGACY_ERROR_TEMP_3071",
parameters = Map("msg" -> "Invalid Spark read type[\\s\\S]*"),
matchPVals = true
)
}
}
}