[Python][Gandiva] Offset is ignored in Gandiva projector #4420

@zeyuanxy

I adapted the test case at https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25 and ran into a problem when passing a sliced record batch, input_batch[1:], to the projector. The Gandiva projector appears to ignore the slice's offset.

import pyarrow as pa
import pyarrow.gandiva as gandiva

builder = gandiva.TreeExprBuilder()

field_a = pa.field('a', pa.int32())
field_b = pa.field('b', pa.int32())

schema = pa.schema([field_a, field_b])

field_result = pa.field('res', pa.int32())

node_a = builder.make_field(field_a)
node_b = builder.make_field(field_b)

condition = builder.make_function("greater_than", [node_a, node_b],
                                  pa.bool_())
if_node = builder.make_if(condition, node_a, node_b, pa.int32())

expr = builder.make_expression(if_node, field_result)

projector = gandiva.make_projector(
    schema, [expr], pa.default_memory_pool())

a = pa.array([10, 12, -20, 5], type=pa.int32())
b = pa.array([5, 15, 15, 17], type=pa.int32())
e = pa.array([10, 15, 15, 17], type=pa.int32())  # expected result for the full batch
input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])

r, = projector.evaluate(input_batch[1:])
print(r)

With the full record batch input_batch, the expected output is [10, 15, 15, 17]. With input_batch[1:], the expected output is therefore [15, 15, 17], but this script returns [10, 15, 15]. The projector seems to ignore the offset and always read from index 0.
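
To make the suspected mechanism concrete, here is a small pure-Python model. It is only a sketch, not Gandiva's actual code: SlicedColumn and pick_greater are hypothetical names, and the assumption is that a slice is a zero-copy view (shared buffer plus offset and length) and that the evaluator reads from buffer index 0 regardless of the offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlicedColumn:
    """Toy stand-in for a sliced array: shared buffer plus offset and length."""
    buffer: List[int]
    offset: int
    length: int

    def slice(self, start: int) -> "SlicedColumn":
        # Zero-copy slice: same buffer, shifted offset, shorter length.
        return SlicedColumn(self.buffer, self.offset + start, self.length - start)


def pick_greater(a: SlicedColumn, b: SlicedColumn, honor_offset: bool) -> List[int]:
    """Model of the if(a > b, a, b) expression from the script above."""
    # Assumption: the buggy path starts reading at index 0 instead of the offset.
    a_start = a.offset if honor_offset else 0
    b_start = b.offset if honor_offset else 0
    out = []
    for i in range(a.length):
        x = a.buffer[a_start + i]
        y = b.buffer[b_start + i]
        out.append(x if x > y else y)
    return out


# Same data as the reproduction, sliced with [1:].
a = SlicedColumn([10, 12, -20, 5], 0, 4).slice(1)
b = SlicedColumn([5, 15, 15, 17], 0, 4).slice(1)

buggy = pick_greater(a, b, honor_offset=False)
fixed = pick_greater(a, b, honor_offset=True)
print(buggy)  # [10, 15, 15]
print(fixed)  # [15, 15, 17]
```

The model that ignores the offset reproduces exactly the [10, 15, 15] I observed, which is consistent with the projector using the slice's length but not its offset.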
