
[VL] "invalid regular expression" error on split function #11016

@wForget

Backend

VL (Velox)

Bug description

SQL to reproduce:

set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding;
select split('text', '(?<=\\}),(?=\\{)');

Error:

Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: invalid regular expression:invalid perl operator: (?<
Retriable: False
Context: Top-level Expression: split(text:VARCHAR, (?<=\}),(?=\{):VARCHAR, -1:INTEGER)
Function: operator()
File: /work/ep/build-velox/build/velox_ep/velox/functions/lib/Re2Functions.cpp
Line: 66
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_14VeloxUserErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox9functions6detail7ReCache13findOrCompileERKNS0_10StringViewE
# 4  _ZN8facebook5velox9functions8sparksql5SplitINS0_4exec10VectorExecEE5splitERNS4_11ArrayWriterINS0_7VarcharEEERKNS0_10StringViewESD_i
# 5  _ZNK8facebook5velox4exec21SimpleFunctionAdapterINS0_4core9UDFHolderINS0_9functions8sparksql5SplitINS1_10VectorExecEEES8_NS0_5ArrayINS0_7VarcharEEENS0_15ConstantCheckerIJSB_SB_iEEEJSB_SB_iEEEE31unpackSpecializeForAllEncodingsILi0EJEEEvRNSG_12ApplyContextERKSt6vectorISt10shared_ptrINS0_10BaseVectorEESaISN_EEDpRT0_
# 6  _ZNK8facebook5velox4exec21SimpleFunctionAdapterINS0_4core9UDFHolderINS0_9functions8sparksql5SplitINS1_10VectorExecEEES8_NS0_5ArrayINS0_7VarcharEEENS0_15ConstantCheckerIJSB_SB_iEEEJSB_SB_iEEEE5applyERKNS0_17SelectivityVectorERSt6vectorISt10shared_ptrINS0_10BaseVectorEESaISN_EERKSL_IKNS0_4TypeEERNS1_7EvalCtxERSN_
# 7  _ZN8facebook5velox4exec4Expr13applyFunctionERKNS0_17SelectivityVectorERNS1_7EvalCtxERSt10shared_ptrINS0_10BaseVectorEE
# 8  _ZN8facebook5velox4exec4Expr24applyFunctionWithPeelingERKNS0_17SelectivityVectorERNS1_7EvalCtxERSt10shared_ptrINS0_10BaseVectorEE
# 9  _ZN8facebook5velox4exec4Expr11evalAllImplERKNS0_17SelectivityVectorERNS1_7EvalCtxERSt10shared_ptrINS0_10BaseVectorEE
# 10 _ZN8facebook5velox4exec4Expr4evalERKNS0_17SelectivityVectorERNS1_7EvalCtxERSt10shared_ptrINS0_10BaseVectorEEPKNS1_7ExprSetE
# 11 _ZN8facebook5velox4exec7ExprSet4evalEiibRKNS0_17SelectivityVectorERNS1_7EvalCtxERSt6vectorISt10shared_ptrINS0_10BaseVectorEESaISB_EE
# 12 _ZN8facebook5velox4exec13FilterProject7projectERKNS0_17SelectivityVectorERNS1_7EvalCtxE
# 13 _ZN8facebook5velox4exec13FilterProject9getOutputEv
# 14 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE8_clEv
# 15 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 16 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
# 17 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 18 _ZN6gluten24WholeStageResultIterator4nextEv
# 19 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 20 0x00007fbe2d78d427
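
The `Reason` line in the error points at the lookbehind `(?<=`. RE2, which the failing frame in `Re2Functions.cpp` compiles the pattern with, does not support lookaround assertions, while vanilla Spark evaluates `split` with `java.util.regex`, which does. The sketch below uses Python's `re` module only as a stand-in for a backtracking engine, to show what the pattern is meant to do: split on commas that sit between `}` and `{`. The helper names `uses_lookaround` and `split_like_backtracking_engine` are hypothetical and are not part of Gluten, Velox, or Spark; the lookaround check is a rough heuristic, not a regex parser.

```python
import re

# The pattern as Velox received it (see the Context line of the error).
PATTERN = r"(?<=\}),(?=\{)"

# Hypothetical heuristic: matches the opening of a lookahead or lookbehind
# group, i.e. (?=  (?!  (?<=  (?<! . It does not account for escaped
# parentheses or character classes, so it can report false positives.
LOOKAROUND = re.compile(r"\(\?(?:=|!|<=|<!)")


def uses_lookaround(pattern: str) -> bool:
    """Return True if the pattern appears to contain a lookaround group."""
    return LOOKAROUND.search(pattern) is not None


def split_like_backtracking_engine(text: str, pattern: str = PATTERN) -> list:
    """Split text with an engine that supports lookaround (Python's re)."""
    return re.split(pattern, text)


if __name__ == "__main__":
    # The literal from the report contains no matching comma, so it is
    # returned as a single element.
    print(split_like_backtracking_engine("text"))
    # Only the comma between '}' and '{' is a delimiter.
    print(split_like_backtracking_engine("{a},{b},c"))
    # Flags the constructs that RE2 reports as "invalid perl operator".
    print(uses_lookaround(PATTERN))
```

A check along these lines could in principle be used to decide when a pattern cannot be handled by an RE2-based implementation, but whether and where Gluten should do that is an open question for this issue.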

Gluten version

main branch

Spark version

Spark-3.5.x

Spark configurations

No response

System information

No response

Relevant logs

Labels

bug, triage
