Backend
VL (Velox)
Bug description
Description
When a Project contains decimal arithmetic of the form <bigint_aggregate> / <integer-valued decimal literal>, ColumnarPartialProjectRule incorrectly falls the entire Project back to the JVM, losing native Velox acceleration.
The root cause is in CheckOverflowTransformer: when deciding whether to insert a cast, it uses original.child.dataType (Spark's declared type) instead of child.dataType (the Gluten transformer's actual output type). The resulting Substrait plan fails Velox SimpleFunction type validation.
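A toy model of the mismatch, for illustration only. It is not Gluten code: the needs_cast rule (cast when the source type differs from the target type) and all three type strings are hypothetical. It only shows how a decision made against the declared type can skip a cast that the actual type requires.

```python
def needs_cast(source_type: str, target_type: str) -> bool:
    # Assumed decision rule: insert a cast only when the types differ.
    return source_type != target_type


# Hypothetical types, chosen only to make the mismatch visible.
declared_type = "decimal(38, 6)"  # stands in for original.child.dataType
actual_type = "decimal(31, 6)"    # stands in for child.dataType
target_type = "decimal(38, 6)"    # type the overflow check should produce

# Deciding from the declared type: it already matches, so no cast is added,
# even though the value really has actual_type.
buggy_decision = needs_cast(declared_type, target_type)

# Deciding from the actual type: the mismatch is seen and a cast is added.
fixed_decision = needs_cast(actual_type, target_type)

print(buggy_decision, fixed_decision)
```

Under these assumptions the first decision omits the cast and the second inserts it, which matches the kind of type inconsistency described above.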
Reproducer
CREATE TABLE t1 (val BIGINT) USING parquet;
CREATE TABLE t2 (val BIGINT) USING parquet;
INSERT INTO t1 VALUES (200);
INSERT INTO t2 VALUES (100), (100), (100), (100), (100);
SELECT
a.val,
(a.val - COALESCE(SUM(b.val), 0) / 5.0)
/ (COALESCE(SUM(b.val), 0) / 5.0) AS growth_rate
FROM t1 a CROSS JOIN t2 b
GROUP BY a.val;
Spark UI shows Project (JVM) instead of ProjectExecTransformer.
Trigger conditions (all required)
- Aggregate over an integer column (e.g. SUM(bigint_col))
- Divided by an integer-valued decimal literal (e.g. 5.0, 3.0, 2.0)
- Literal's BigDecimal.isValidLong is true (triggers rescaleLiteral)
- Non-constant column reference + GROUP BY (prevents constant folding)
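To check whether a given literal meets the integer-valued condition, the sketch below approximates the BigDecimal.isValidLong test in Python. The helper name is_valid_long is hypothetical, and this is a model of the condition (no fractional part, fits in a signed 64-bit integer), not the Scala implementation.

```python
from decimal import Decimal

LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def is_valid_long(literal: str) -> bool:
    # Hypothetical helper: true when the decimal literal has no fractional
    # part and its value fits in a signed 64-bit integer.
    value = Decimal(literal)
    if value != value.to_integral_value():
        return False
    return LONG_MIN <= value <= LONG_MAX


for lit in ["5.0", "3.0", "2.0", "5.5"]:
    print(lit, is_valid_long(lit))
```

By this model, literals such as 5.0, 3.0, and 2.0 satisfy the condition, while 5.5 does not.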
Gluten version
main branch
Spark version
Spark-3.3.x
Spark configurations
No response
System information
No response
Relevant logs