Describe the bug
CometStringRepeat delegates to DataFusion repeat. DataFusion's repeat throws on negative n, while Spark's UTF8String.repeat returns the empty string for n <= 0. Comet currently reports Compatible for this expression (with a getCompatibleNotes caveat), so a query that evaluates repeat(s, -1) fails with a runtime exception under Comet instead of returning the empty string that Spark would produce.
Surfaced by the string-expressions audit in #4461.
Steps to reproduce
SELECT repeat('abc', -1);
Spark: returns ''.
Comet: throws ArrowError("Invalid argument error: repeat requires a non-negative number of repetitions") at execution.
Expected behavior
Either match Spark by returning '', or promote CometStringRepeat to Incompatible(Some(...)) so the path falls back unless explicitly enabled via spark.comet.expression.StringRepeat.allowIncompatible=true.
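For the first option, the Spark-side rule is just a clamp on the count. The following is a minimal sketch of that rule in Rust, not the actual Comet or DataFusion code: `spark_repeat` is a hypothetical name, and it works on a plain `&str` rather than Arrow arrays, so a real fix would need to apply the same check per row inside the native kernel.

```rust
/// Hypothetical helper illustrating Spark's repeat semantics:
/// a count of zero or less yields the empty string instead of an error.
fn spark_repeat(s: &str, n: i32) -> String {
    if n <= 0 {
        // Mirrors the short-circuit described for UTF8String.repeat(n).
        String::new()
    } else {
        // n is strictly positive here, so the cast to usize is lossless.
        s.repeat(n as usize)
    }
}
```

With this rule, `spark_repeat("abc", -1)` and `spark_repeat("abc", 0)` both return an empty string, and positive counts behave as usual.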
Additional context
- Comet serde: spark/src/main/scala/org/apache/comet/serde/strings.scala (CometStringRepeat)
- Spark reference: UTF8String.repeat(n) short-circuits for n <= 0
- The current getCompatibleNotes text mentions the divergence but the support level is still Compatible, so the path is taken silently.