feat: add native support for cardinality() expression #4376
…p inputs test: update cardinality test suite
SELECT cardinality(struct_arr) FROM test_cardinality

-- literal array and map arguments (spark_answer_only: CreateArray/CreateMap not yet natively supported)
query spark_answer_only
Actually, we should support literals for arrays 🤔
Please avoid tests that do not select from the Parquet table; Comet is not needed for such scalar queries.
Sorry about those mistakes; I've removed those tests now!
I found the following when looking into the build errors: CometIfSuite:
I don't think these are related to any changes I've made, but if I'm missing something, just let me know and I can dig into it further. Thank you for the review on this; I really appreciate it!
Apologies, it looks like the errors are related to CometMapExpressionSuite:
I'll dig into this more on my side. Thanks again for the help!
I dug into this a bit more, and the underlying cardinality support is coming through Size. Enabling MapType there does allow cardinality/size on existing map-typed inputs, but constructor cases like cardinality(map(...)) still fall back because CreateMap is not supported yet. Given that, I’m going to hold off on developing this for now and wait until CreateMap support is in place, so the feature boundary is cleaner and the map support story is more consistent.
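The fallback boundary described in this comment can be sketched with a toy Rust model. Everything here is hypothetical (`DataType`, `Expr`, and `runs_natively` are illustrative names, not Comet's serde API), and it assumes a plan node offloads only if the node itself and all of its children are supported. Under that assumption, `Size` over an existing map column offloads once map inputs are accepted, while `Size` over a `CreateMap` constructor still falls back.

```rust
// Toy model only: not Comet's real serde or support-level API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Array(Box<DataType>),
    Map(Box<DataType>, Box<DataType>),
}

enum Expr {
    /// Reference to an existing column of the given type (e.g. read from Parquet).
    Column(DataType),
    /// A `map(k1, v1, ...)` constructor, carrying its result type for simplicity.
    CreateMap(DataType),
    /// `size(expr)` / `cardinality(expr)`: both become a `Size` node.
    Size(Box<Expr>),
}

fn result_type(e: &Expr) -> DataType {
    match e {
        Expr::Column(t) | Expr::CreateMap(t) => t.clone(),
        Expr::Size(_) => DataType::Int,
    }
}

/// A node runs natively only if it is supported and every child runs natively.
fn runs_natively(e: &Expr) -> bool {
    match e {
        Expr::Column(_) => true,
        // Assumption for this sketch: no native CreateMap yet, so it always falls back.
        Expr::CreateMap(_) => false,
        Expr::Size(child) => {
            // After the change, both array and map inputs are accepted by Size.
            let input_ok = matches!(
                result_type(child),
                DataType::Array(_) | DataType::Map(_, _)
            );
            input_ok && runs_natively(child)
        }
    }
}

fn main() {
    let map_ty = DataType::Map(Box::new(DataType::Int), Box::new(DataType::Int));
    let on_column = Expr::Size(Box::new(Expr::Column(map_ty.clone())));
    let on_literal = Expr::Size(Box::new(Expr::CreateMap(map_ty)));
    println!("cardinality(map_col) native: {}", runs_natively(&on_column)); // true
    println!("cardinality(map(...)) native: {}", runs_natively(&on_literal)); // false
}
```

The all-children rule is why enabling `MapType` on `Size` alone cannot make `cardinality(map(...))` native: the constructor child still forces a fallback.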
Which issue does this PR close?
Closes #.
Rationale for this change
`cardinality` is a commonly used Spark SQL function (since 2.4.0) that was not yet supported natively by Comet, causing fallback to Spark execution. It returns the number of elements in an array or the number of key-value pairs in a map.

Per the Spark docs, `cardinality` returns `-1` for null input only when both `spark.sql.legacy.sizeOfNull=true` AND `spark.sql.ansi.enabled=false`. With default settings it always returns `NULL` for null input. In Spark's implementation, `cardinality` is a direct alias for the `Size` expression with `legacySizeOfNull = false` hardcoded at parse time, so the two-config gate never applies to `cardinality`.

What changes are included in this PR?
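As a reference point for the changes, the null-handling rules in the rationale can be modeled in a few lines of Rust. This is a minimal sketch of the semantics as this description states them (in particular, it assumes `cardinality` always uses `legacySizeOfNull = false`); the function names are hypothetical, and this is neither Spark's nor Comet's code. An input of `None` stands for a `NULL` array or map, and `Some(n)` for a value with `n` elements or entries.

```rust
/// Documented gate: `-1` for NULL only if the legacy config is on AND ANSI mode is off.
fn effective_legacy_size_of_null(legacy_size_of_null: bool, ansi_enabled: bool) -> bool {
    legacy_size_of_null && !ansi_enabled
}

/// `size(expr)` with an explicit legacy flag (the per-expression field, not a global).
fn size(input: Option<usize>, legacy_size_of_null: bool) -> Option<i32> {
    match input {
        Some(n) => Some(n as i32),
        None if legacy_size_of_null => Some(-1),
        None => None, // SQL NULL
    }
}

/// `cardinality(expr)` as described here: the flag is fixed to `false`,
/// so session configs never change its NULL result.
fn cardinality(input: Option<usize>) -> Option<i32> {
    size(input, false)
}

fn main() {
    let legacy = effective_legacy_size_of_null(true, false);
    println!("{:?}", size(None, legacy)); // Some(-1)
    println!("{:?}", cardinality(None)); // None
    println!("{:?}", cardinality(Some(3))); // Some(3)
}
```

Passing the flag as a parameter mirrors the `CometSize.convert` fix: the result depends on the expression's own `legacySizeOfNull`, not on whatever the global config happens to be.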
Adds native support for `cardinality(expr)` for both array and map inputs by extending the existing `CometSize` serde. There was no separate `Cardinality` class to wire: both `cardinality` and `size` produce a `Size` node in the logical plan. Two gaps in the Scala layer and one in the Rust layer prevented `cardinality` from offloading:

- `CometSize.getSupportLevel` was returning `Unsupported` for `MapType` inputs
- `CometSize.convert` was reading `SQLConf.get.legacySizeOfNull` (the global config) rather than `expr.legacySizeOfNull` (the instance field set at parse time), so `cardinality` could incorrectly return `-1` for `NULL` when the legacy config was enabled
- `SparkSizeFunc` in the Rust layer used `Signature::uniform` with exact type stubs, causing DataFusion's type coercion to reject real map columns with concrete inner field types

Changes:

- Updated `CometSize` in `arrays.scala` to accept `MapType` as `Compatible()`
- Changed `CometSize.convert` to use `expr.legacySizeOfNull` instead of the global config
- Changed the `SparkSizeFunc` signature to `Signature::any(1)` so real map and array types are accepted regardless of inner field types
- Changed `SparkSizeFunc` to append `null` instead of a hardcoded `-1`
- Removed `spark_answer_only` from the `size(arr), size(m)` column query in `size.sql`, since map inputs now run natively
- Updated the `Size` row in `expressions.md` to note that it also covers the `cardinality` SQL alias

The `implement-comet-expression` and `wire-datafusion-functions` skills were used to scaffold this implementation.

How are these changes tested?
New Comet SQL test at `spark/src/test/resources/sql-tests/expressions/array/cardinality.sql`. It covers:

- `NULL` array and map inputs via column reference (asserts `NULL` is returned, not `-1`)
- Nested arrays (`array<array<int>>`)
- Arrays of structs (`array<struct<a: int>>`)
- Literal array and map arguments (`spark_answer_only`: `CreateArray`/`CreateMap` are not yet natively supported by Comet, but correctness is verified against Spark)

The existing `size.sql` test now also covers the map-input path natively (`spark_answer_only` removed).
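The key assertion in these tests (a `NULL` input yields `NULL`, not `-1`) corresponds to the Rust-side change of having `SparkSizeFunc` append `null` for null rows. A simplified, std-only model of that per-row computation follows. `ListColumn` and `sizes` are hypothetical names and not the arrow-rs API; the sketch only assumes Arrow's offset-based layout, where row `i` spans `offsets[i]..offsets[i + 1]`. Arrow lays out a map column as a list of key/value entry structs, so the same arithmetic counts map entries.

```rust
/// Hypothetical stand-in for an Arrow list or map column:
/// row `i` covers `offsets[i]..offsets[i + 1]` of the child data,
/// and `validity[i] == false` marks a NULL row.
struct ListColumn {
    offsets: Vec<i32>,
    validity: Vec<bool>,
}

/// Per-row element counts; NULL rows produce `None` (SQL NULL) rather than `-1`.
fn sizes(col: &ListColumn) -> Vec<Option<i32>> {
    col.validity
        .iter()
        .enumerate()
        .map(|(i, &valid)| {
            if valid {
                Some(col.offsets[i + 1] - col.offsets[i])
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Rows: [1, 2, 3], NULL, [] (the NULL row has equal start and end offsets).
    let col = ListColumn {
        offsets: vec![0, 3, 3, 3],
        validity: vec![true, false, true],
    };
    println!("{:?}", sizes(&col)); // [Some(3), None, Some(0)]
}
```

Returning `Option<i32>` keeps the empty-array case (`Some(0)`) distinct from the NULL case (`None`), which is exactly the distinction the new SQL tests check.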