@@ -820,7 +820,9 @@ class VariantSuite extends QueryTest with SharedSparkSession with ExpressionEval
// The initial size of the buffer backing a cached dataframe column is 128KB.
// See `ColumnBuilder`.
val numKeys = 128 * 1024
- val keyIterator = (0 until numKeys).iterator
+ // We start in the long range because the shredded writer writes int64 by default, which
+ // wouldn't match narrower binaries.
+ val keyIterator = (Int.MaxValue + 1L until Int.MaxValue + 1L + numKeys).iterator
dongjoon-hyun (Member):
This looks like a regression because it increases the memory requirement heavily, doesn't it, @harshmotw-db ?

harshmotw-db (Contributor):
@dongjoon-hyun it's still numKeys elements, it just changes the starting value.

dongjoon-hyun (Member):
Got it. Initially I was worried about the size because numKeys is 128k, but after computing the actual values I see that it doesn't effectively increase:

val numKeys = 128 * 1024
val keyIterator1 = (0 until numKeys).iterator
val keyIterator2 = (Int.MaxValue + 1L until Int.MaxValue + 1L + numKeys).iterator
val entries1 = Array.fill(numKeys)(s"""\"${keyIterator1.next()}\": \"test\"""")
val entries2 = Array.fill(numKeys)(s"""\"${keyIterator2.next()}\": \"test\"""")
val jsonStr1 = s"{${entries1.mkString(", ")}}"
val jsonStr2 = s"{${entries2.mkString(", ")}}"

scala> jsonStr1.length
val res1: Int = 2248186

scala> jsonStr2.length
val res2: Int = 2883584

Thanks.

val entries = Array.fill(numKeys)(s"""\"${keyIterator.next()}\": \"test\"""")
val jsonStr = s"{${entries.mkString(", ")}}"
val query = s"""select named_struct(
…
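
To make the point from the thread above concrete, here is a small standalone sketch (not part of the PR; names are illustrative): both ranges contain exactly numKeys elements, but the second starts past Int.MaxValue, so every key needs a 64-bit value, matching the int64 width the shredded writer produces by default.

object KeyRangeSketch extends App {
  // Both ranges have the same number of elements; only the starting point differs.
  val numKeys = 128 * 1024
  val intRange = 0 until numKeys                                       // keys fit in int32
  val longRange = Int.MaxValue + 1L until Int.MaxValue + 1L + numKeys  // keys need int64

  assert(intRange.size == numKeys)
  assert(longRange.size == numKeys)
  assert(longRange.head > Int.MaxValue.toLong)  // every key is already out of int32 range

  println(s"int-range keys: ${intRange.size}, long-range keys: ${longRange.size}")
}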
@@ -379,7 +379,8 @@ class ParquetVariantShreddingSuite extends QueryTest with ParquetTest with Share
"struct<value binary, typed_value int>>>"
withSQLConf(SQLConf.VARIANT_WRITE_SHREDDING_ENABLED.key -> true.toString,
SQLConf.VARIANT_ALLOW_READING_SHREDDED.key -> true.toString,
- SQLConf.VARIANT_FORCE_SHREDDING_SCHEMA_FOR_TEST.key -> schema) {
+ SQLConf.VARIANT_FORCE_SHREDDING_SCHEMA_FOR_TEST.key -> schema,
+ SQLConf.PARQUET_IGNORE_VARIANT_ANNOTATION.key -> true.toString) {
df.write.mode("overwrite").parquet(dir.getAbsolutePath)

// Verify that we can read the full variant.
…
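
As a side note for readers unfamiliar with the test helper used in this hunk, the sketch below is an approximation (an assumption, not Spark's actual SQLHelper code) of how withSQLConf scopes overrides such as the added PARQUET_IGNORE_VARIANT_ANNOTATION key to the enclosing block and restores the previous values afterwards.

import org.apache.spark.sql.internal.SQLConf

trait SQLConfSketch {
  // Approximation only: sets the given keys for the duration of `f`, then restores
  // whatever values were in effect before, so the override above applies to this
  // one test block and nothing else.
  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    val conf = SQLConf.get
    val previous = pairs.map { case (k, _) =>
      k -> (if (conf.contains(k)) Some(conf.getConfString(k)) else None)
    }
    pairs.foreach { case (k, v) => conf.setConfString(k, v) }
    try f finally {
      previous.foreach {
        case (k, Some(v)) => conf.setConfString(k, v)
        case (k, None) => conf.unsetConf(k)
      }
    }
  }
}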