
[VL] Project with decimal arithmetic falls back to JVM when aggregate result is divided by integer-valued decimal literal #12260

@Xtpacz

Description


Backend

VL (Velox)

Bug description

Description

When a Project contains decimal arithmetic of the form <bigint_aggregate> / <integer-valued decimal literal>, ColumnarPartialProjectRule incorrectly sends the entire Project back to the JVM, and the query loses native Velox acceleration.

The root cause is in CheckOverflowTransformer. When it decides whether to insert a cast, it uses original.child.dataType (Spark's declared type) instead of child.dataType (the Gluten transformer's actual output type). The resulting Substrait plan fails Velox SimpleFunction type validation.
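The cast decision described above can be modeled in a few lines. The sketch below is a hypothetical Python model, not Gluten's Scala code. `Expr`, `cast_if_needed`, `check_overflow_buggy`, and `check_overflow_fixed` are invented names. The decimal types are placeholders, not the types Spark or Gluten actually derive for the reproducer. The model assumes Spark's declared child type already equals the CheckOverflow target type. In that case a check based on the declared type skips the cast, even though the transformer's real output type is different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a transformed expression: a label plus its output type."""
    label: str
    data_type: DecimalType


def cast_if_needed(child: Expr, type_used_for_check: DecimalType, target: DecimalType) -> Expr:
    """Wrap `child` in a cast to `target` unless `type_used_for_check` already equals it."""
    if type_used_for_check == target:
        return child  # no cast inserted
    return Expr(f"cast({child.label} AS decimal({target.precision},{target.scale}))", target)


def check_overflow_buggy(declared_child_type: DecimalType, child: Expr, target: DecimalType) -> Expr:
    # Models the reported bug: the decision uses Spark's declared type
    # (original.child.dataType) rather than the transformer's actual output type.
    return cast_if_needed(child, declared_child_type, target)


def check_overflow_fixed(child: Expr, target: DecimalType) -> Expr:
    # Models the suggested fix: decide from child.dataType (the actual output type).
    return cast_if_needed(child, child.data_type, target)


# Placeholder types chosen only to show a mismatch (assumption, not real derived types).
declared = DecimalType(27, 6)
actual_child = Expr("divide(...)", DecimalType(26, 6))
target = declared  # assumed: declared child type equals the CheckOverflow target

buggy = check_overflow_buggy(declared, actual_child, target)
fixed = check_overflow_fixed(actual_child, target)

print(buggy.data_type == target)  # False: cast skipped, plan types disagree
print(fixed.data_type == target)  # True: cast inserted, plan types line up
```

In this model, the only difference between the two paths is which type feeds the equality check. That mirrors the one-identifier change (original.child.dataType vs child.dataType) the issue points to.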

Reproducer

CREATE TABLE t1 (val BIGINT) USING parquet;
CREATE TABLE t2 (val BIGINT) USING parquet;
INSERT INTO t1 VALUES (200);
INSERT INTO t2 VALUES (100), (100), (100), (100), (100);

SELECT
    a.val,
    (a.val - COALESCE(SUM(b.val), 0) / 5.0)
        / (COALESCE(SUM(b.val), 0) / 5.0) AS growth_rate
FROM t1 a CROSS JOIN t2 b
GROUP BY a.val;

Spark UI shows Project (JVM) instead of ProjectExecTransformer.

Trigger conditions (all required)

  1. Aggregate over an integer column (e.g. SUM(bigint_col))
  2. Divided by an integer-valued decimal literal (e.g. 5.0, 3.0, 2.0)
  3. Literal's BigDecimal.isValidLong is true (triggers rescaleLiteral)
  4. Non-constant column reference + GROUP BY (prevents constant folding)
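Condition 3 depends on Scala's `BigDecimal.isValidLong`, which returns true when the value has no fractional part and fits in a 64-bit `Long`. The sketch below is a Python approximation of that predicate using the standard `decimal` module. The helper name `is_valid_long` is hypothetical. The sketch only shows which literals from the list above would satisfy the check. It does not model the rest of Gluten's rescaling logic.

```python
from decimal import Decimal

# Range of a signed 64-bit integer (Scala/Java Long).
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def is_valid_long(value: Decimal) -> bool:
    """Approximates Scala's BigDecimal.isValidLong: whole number within the Long range."""
    if not value.is_finite():
        return False
    return value == value.to_integral_value() and LONG_MIN <= value <= LONG_MAX


for text in ["5.0", "3.0", "2.0", "5.5", "0.25", "1E+20"]:
    # 5.0, 3.0 and 2.0 qualify; 5.5 and 0.25 have fractional parts; 1E+20 exceeds Long range.
    print(text, is_valid_long(Decimal(text)))
```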

Gluten version

main branch

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs


    Labels

    bug, triage
