
Projection pushdown produces incorrect results when column names are reused #2462

@jonmmease

Describe the bug
When a projection overwrites a column by aliasing a new expression to an existing column name, the projection pushdown optimizer pass can cause the wrong value to be produced for that column.

To Reproduce
Here is a failing test that will be included in the follow-up PR:

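use std::sync::Arc;

use datafusion::arrow::array::Int32Array;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::assert_batches_eq;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::test]
async fn select_with_alias_overwrite() -> Result<()> {
    // Single non-nullable Int32 column `a` holding [1, 10, 10, 100]
    let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);

    let batch = RecordBatch::try_new(
        Arc::new(schema.clone()),
        vec![Arc::new(Int32Array::from(vec![1, 10, 10, 100]))],
    )
    .unwrap();

    let ctx = SessionContext::new();
    let provider = MemTable::try_new(Arc::new(schema), vec![vec![batch]]).unwrap();
    ctx.register_table("t", Arc::new(provider)).unwrap();

    // Three stacked projections, each producing a column named `a`:
    // `a AS a`, then `(a = 10) AS a` (overwrites `a` with a boolean), then `a`
    let df = ctx
        .table("t")
        .unwrap()
        .select(vec![col("a").alias("a")])
        .unwrap()
        .select(vec![(col("a").eq(lit(10))).alias("a")])
        .unwrap()
        .select(vec![col("a")])
        .unwrap();

    let results = df.collect().await.unwrap();

    // The outer `a` must refer to the boolean produced by the middle projection
    #[rustfmt::skip]
    let expected = vec![
        "+-------+",
        "| a     |",
        "+-------+",
        "| false |",
        "| true  |",
        "| true  |",
        "| false |",
        "+-------+",
    ];
    assert_batches_eq!(expected, &results);

    Ok(())
}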

Currently, the values produced for column a are the original values [1, 10, 10, 100] rather than the expected values [false, true, true, false].
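To make the expected semantics concrete, here is a toy model of the three stacked projections from the test. It is a sketch under stated assumptions, not DataFusion code: `Expr`, `Value`, `Projection`, `eval_layered`, and `eval_name_shortcut` are all hypothetical names. `eval_layered` applies each projection in order, so every reference to `a` sees the output of the layer directly below it. `eval_name_shortcut` is a hypothetical faulty rewrite that resolves the outermost `a` by name directly against the scanned table, skipping the layer that redefined it. It reproduces the reported symptom, but it is only an illustration and not necessarily how the actual pushdown bug arises.

```rust
/// Hypothetical toy expression: a column reference, or `column = literal`.
#[derive(Clone, Debug)]
enum Expr {
    Col(&'static str),
    EqLit(&'static str, i32),
}

/// Hypothetical toy value type for a single cell.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i32),
    Bool(bool),
}

/// One projection layer: a list of (expression, output column name) pairs.
type Projection = Vec<(Expr, &'static str)>;

/// Evaluate an expression against one row of named values.
fn eval(expr: &Expr, row: &[(&'static str, Value)]) -> Value {
    // Resolve a column by name within the row produced by the layer below
    let lookup = |name: &str| {
        row.iter()
            .find(|(n, _)| *n == name)
            .map(|(_, v)| v.clone())
            .expect("unknown column")
    };
    match expr {
        Expr::Col(name) => lookup(*name),
        Expr::EqLit(name, lit) => match lookup(*name) {
            Value::Int(v) => Value::Bool(v == *lit),
            other => panic!("cannot compare {:?} with an integer", other),
        },
    }
}

/// Correct semantics: apply projection layers bottom-up, one at a time.
fn eval_layered(input: &[i32], layers: &[Projection]) -> Vec<Value> {
    input
        .iter()
        .map(|&x| {
            let mut row: Vec<(&'static str, Value)> = vec![("a", Value::Int(x))];
            for layer in layers {
                // Each layer's output replaces the row seen by the next layer
                row = layer.iter().map(|(e, name)| (*name, eval(e, &row))).collect();
            }
            row[0].1.clone()
        })
        .collect()
}

/// Hypothetical faulty rewrite: evaluate the outermost expression directly
/// against the scan, ignoring intermediate layers that reuse the name `a`.
fn eval_name_shortcut(input: &[i32], layers: &[Projection]) -> Vec<Value> {
    let (top_expr, _) = &layers.last().expect("at least one layer")[0];
    input
        .iter()
        .map(|&x| eval(top_expr, &[("a", Value::Int(x))]))
        .collect()
}
```

With the layers `[a AS a]`, `[(a = 10) AS a]`, `[a]` and input `[1, 10, 10, 100]`, `eval_layered` yields `[false, true, true, false]`, while `eval_name_shortcut` yields the original `[1, 10, 10, 100]`, mirroring the incorrect output above.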

PR to follow shortly

Labels: bug