Describe the bug
bit_length and octet_length are wired as plain CometScalarFunction("bit_length") / CometScalarFunction("octet_length") in QueryPlanSerde.scala with no BinaryType guard, so both report Compatible(None) for BinaryType input. However, DataFusion's BitLengthFunc and OctetLengthFunc use Signature::coercible(... logical_string() ...) and reject Binary at execution time. The net effect: bit_length(<binary>) and octet_length(<binary>) plan successfully under Comet, then fail with a native execution error instead of falling back cleanly to Spark.
For contrast, length (also handled by Comet) explicitly guards BinaryType in CometLength.getSupportLevel and falls back to Spark.
Surfaced by the string-expressions audit in #4461.
Steps to reproduce
CREATE TABLE t(b binary) USING parquet;
INSERT INTO t VALUES (X'48656c6c6f');
SELECT bit_length(b) FROM t;
SELECT octet_length(b) FROM t;
Spark: returns 40 and 5.
Comet: native execution error from DataFusion's BitLengthFunc / OctetLengthFunc signature check.
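As a sanity check on the expected values, the literal X'48656c6c6f' can be decoded by hand. The following is a minimal std-only Rust sketch; decode_hex and lengths are hypothetical helper names used for illustration, not Comet, DataFusion, or Spark APIs.

```rust
/// Decode a hex string such as "48656c6c6f" into raw bytes.
/// Returns None for odd-length input or any non-hex character.
/// (Hypothetical helper for illustration only.)
pub fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Return (octet_length, bit_length) for a binary value,
/// using the IntegerType-style i32 result that Spark reports.
pub fn lengths(bytes: &[u8]) -> (i32, i32) {
    let octets = bytes.len() as i32;
    (octets, octets * 8)
}
```

The literal decodes to the five ASCII bytes of "Hello", so octet_length is 5 and bit_length is 5 * 8 = 40, matching the Spark output above.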
Expected behavior
Either guard BinaryType in dedicated CometBitLength / CometOctetLength serdes (mirroring CometLength), or wire to a Comet-side UDF that supports both Utf8 and Binary (the underlying arrow::compute::bit_length / length kernels do support Binary natively).
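To illustrate the second option, here is a rough std-only Rust sketch of the semantics a Comet-side UDF would need: the same byte count for Utf8 and Binary, with nulls propagated. The Value enum and both functions are hypothetical stand-ins and do not reflect actual Comet, DataFusion, or Arrow types; a real implementation would operate on Arrow arrays.

```rust
/// Hypothetical stand-in for a single column value.
/// Not a Comet, DataFusion, or Arrow type.
pub enum Value {
    Utf8(String),
    Binary(Vec<u8>),
    Null,
}

/// Number of bytes in the value; None models SQL NULL.
/// Both Utf8 and Binary are measured in bytes, not characters.
pub fn octet_length(v: &Value) -> Option<i32> {
    match v {
        Value::Utf8(s) => Some(s.len() as i32),
        Value::Binary(b) => Some(b.len() as i32),
        Value::Null => None,
    }
}

/// Number of bits in the value: eight per byte, NULL preserved.
pub fn bit_length(v: &Value) -> Option<i32> {
    octet_length(v).map(|n| n * 8)
}
```

Under this model, a Binary value holding the bytes of X'48656c6c6f' and a Utf8 value holding "Hello" both yield 5 and 40, which is the behavior Spark exhibits for the two input types.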
Additional context
- Wiring: QueryPlanSerde.scala lines 176 (bit_length) and 187 (octet_length).
- Existing guard pattern: CometLength in spark/src/main/scala/org/apache/comet/serde/strings.scala.
- Spark accepts (StringType|BinaryType) -> IntegerType for both expressions across 3.4.3, 3.5.8, and 4.0.1.