[SPARK-28371][SQL] Make Parquet "StartsWith" filter null-safe
Parquet may call the filter with a null value to check whether nulls are
accepted. While Spark appears to avoid that code path with Parquet 1.10,
with Parquet 1.11 it causes Spark unit tests to fail.

Tested with Parquet 1.11 (and new unit test).

Closes #25140 from vanzin/SPARK-28371.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 7f9da2b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Marcelo Vanzin authored and dongjoon-hyun committed Jul 13, 2019
1 parent 1a6a67f commit 98aebf4
Showing 2 changed files with 9 additions and 1 deletion.
@@ -541,7 +541,7 @@ private[parquet] class ParquetFilters(
}

   override def keep(value: Binary): Boolean = {
-    UTF8String.fromBytes(value.getBytes).startsWith(
+    value != null && UTF8String.fromBytes(value.getBytes).startsWith(
       UTF8String.fromBytes(strToBinary.getBytes))
   }
 }
}
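The guard above returns `false` before the null value is ever dereferenced, which is what Parquet expects when it probes a user-defined predicate with `null` to ask whether nulls pass the filter. A minimal, self-contained sketch of the same pattern in plain Java (no Parquet dependency; `StartsWithPredicate` is a hypothetical stand-in for the anonymous `UserDefinedPredicate` in `ParquetFilters`):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Parquet's UserDefinedPredicate<Binary>.
// Parquet may invoke keep(null) to check whether nulls are accepted,
// so keep() must not dereference the value before a null check.
class StartsWithPredicate {
    private final byte[] prefix;

    StartsWithPredicate(String prefix) {
        this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
    }

    // Null-safe, mirroring the SPARK-28371 fix: null never matches.
    boolean keep(byte[] value) {
        if (value == null || value.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (value[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StartsWithPredicate p = new StartsWithPredicate("blah");
        System.out.println(p.keep(null));   // false, and no NullPointerException
        System.out.println(p.keep("blahblah".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(p.keep("foo".getBytes(StandardCharsets.UTF_8)));      // false
    }
}
```

Without the null check, the first call would throw a `NullPointerException` instead of simply rejecting the row.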
@@ -955,6 +955,14 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}

+    // SPARK-28371: make sure filter is null-safe.
+    withParquetDataFrame(Seq(Tuple1[String](null))) { implicit df =>
+      checkFilterPredicate(
+        '_1.startsWith("blah").asInstanceOf[Predicate],
+        classOf[UserDefinedByInstance[_, _]],
+        Seq.empty[Row])
+    }
+
import testImplicits._
// Test canDrop() has taken effect
testStringStartsWith(spark.range(1024).map(_.toString).toDF(), "value like 'a%'")
