Describe the bug
For a null operand against an empty IN list, Comet always returns false. Spark returns null when spark.sql.legacy.nullInEmptyListBehavior=true. Comet ignores the config and the null operand.
Steps to reproduce
Empty IN lists are not expressible in SQL, so build via the DataFrame API; ConvertToLocalRelation and OptimizeIn must be disabled or the IN is folded away. Add to a suite extending CometTestBase (config exists in Spark 4+):
test("null IN empty list under legacy behavior") {
assume(org.apache.comet.CometSparkSessionExtensions.isSpark40Plus)
withSQLConf(
"spark.sql.optimizer.excludedRules" ->
("org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation," +
"org.apache.spark.sql.catalyst.optimizer.OptimizeIn"),
"spark.sql.legacy.nullInEmptyListBehavior" -> "true") {
val data: Seq[(Integer, Integer)] =
Seq((Integer.valueOf(1), Integer.valueOf(1)), (null, Integer.valueOf(2)))
withParquetTable(data, "t") {
val df = sql("SELECT _1 AS a FROM t").select(col("a"), col("a").isin())
checkSparkAnswer(df)
}
}
}
== Results ==
!== Spark Answer - 2 == == Comet Answer - 2 ==
struct<a:int,(a IN ()):boolean> struct<a:int,(a IN ()):boolean>
[1,false] [1,false]
![null,null] [null,false]
Expected behavior
null IN () returns null under legacy behavior, matching Spark. (Under ANSI-on, the Spark 4 default, false is expected and Comet already matches, so only the legacy path is affected.)
Additional context
Found while enabling CometLocalTableScanExec by default (#4393), but reproduces over a Parquet scan. Upstream test: EmptyInSuite "IN with empty list".
Describe the bug
For a null operand against an empty IN list, Comet always returns
false. Spark returnsnullwhenspark.sql.legacy.nullInEmptyListBehavior=true. Comet ignores the config and the null operand.Steps to reproduce
Empty IN lists are not expressible in SQL, so build via the DataFrame API;
ConvertToLocalRelationandOptimizeInmust be disabled or the IN is folded away. Add to a suite extendingCometTestBase(config exists in Spark 4+):Expected behavior
null IN ()returnsnullunder legacy behavior, matching Spark. (Under ANSI-on, the Spark 4 default,falseis expected and Comet already matches, so only the legacy path is affected.)Additional context
Found while enabling
CometLocalTableScanExecby default (#4393), but reproduces over a Parquet scan. Upstream test:EmptyInSuite"IN with empty list".