Enabling expand_views_at_output config changes column names #18818

@nuno-faria

Describe the bug

When enabling the expand_views_at_output config to convert Utf8View to LargeUtf8, the names of the converted columns change: they become prefixed with the relation name. I think the cause is that a CAST is added to change the type, so Expr::qualified_name returns "table.column" as the name instead of just "column":

// when we have a CAST we end up at the last match arm
pub fn qualified_name(&self) -> (Option<TableReference>, String) {
    match self {
        Expr::Column(Column {
            relation,
            name,
            spans: _,
        }) => (relation.clone(), name.clone()),
        Expr::Alias(Alias { relation, name, .. }) => (relation.clone(), name.clone()),
        _ => (None, self.schema_name().to_string()),
    }
}

// which in turn calls
SchemaDisplay(self)

// which for cast simply calls SchemaDisplay(self) of the inner expression
Expr::Cast(Cast { expr, .. }) | Expr::TryCast(TryCast { expr, .. }) => {
    write!(f, "{}", SchemaDisplay(expr))
}

// which for Column calls
impl fmt::Display for Column {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.flat_name())
    }
}

// which includes the relation + name, unlike the original qualified_name for a regular Column

I think one approach would be to update qualified_name by adding a match arm for casts. I would be happy to fix this, if it is indeed a bug and not expected behavior.
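To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It does not use DataFusion: `Column` and `Expr` below are simplified, hypothetical stand-ins (the relation is a plain `String`, the cast carries no target type), and `qualified_name_fixed` is an illustrative name, not an existing API. Whether the real fix should recurse through casts in exactly this way is an assumption.

```rust
// Simplified stand-ins for DataFusion's `Column` and `Expr` (hypothetical,
// only the parts needed to model the naming behavior).
#[derive(Clone, Debug)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    // Models `flat_name`: "relation.name" when the column is qualified.
    fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{r}.{}", self.name),
            None => self.name.clone(),
        }
    }
}

#[derive(Clone, Debug)]
enum Expr {
    Column(Column),
    // The real Cast also carries a target data type; omitted here.
    Cast(Box<Expr>),
}

impl Expr {
    // Models the schema display chain: a cast prints its inner expression,
    // and a column prints its flat (qualified) name.
    fn schema_name(&self) -> String {
        match self {
            Expr::Column(c) => c.flat_name(),
            Expr::Cast(inner) => inner.schema_name(),
        }
    }

    // Behavior described in this issue: a cast falls into the catch-all arm,
    // so the relation ends up inside the name.
    fn qualified_name(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            _ => (None, self.schema_name()),
        }
    }

    // Proposed behavior: a cast defers to the expression it wraps.
    fn qualified_name_fixed(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            Expr::Cast(inner) => inner.qualified_name_fixed(),
        }
    }
}

fn main() {
    let v = Expr::Column(Column {
        relation: Some("t".to_string()),
        name: "v".to_string(),
    });
    let cast_v = Expr::Cast(Box::new(v.clone()));

    println!("{:?}", v.qualified_name()); // (Some("t"), "v")
    println!("{:?}", cast_v.qualified_name()); // (None, "t.v")
    println!("{:?}", cast_v.qualified_name_fixed()); // (Some("t"), "v")
}
```

In this toy model the cast column keeps the relation and the plain name separate, which matches what a bare column already does.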

To Reproduce

use datafusion::error::Result;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.sql("copy (select 1 as k, 'a' as v) to 't.parquet'")
        .await?
        .collect()
        .await?;
    ctx.register_parquet("t", "t.parquet", ParquetReadOptions::new())
        .await?;

    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());

    ctx.sql("set datafusion.optimizer.expand_views_at_output = true")
        .await?
        .collect()
        .await?;

    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());

    Ok(())
}

k remains the same but v changes:

+---+---+
| k | v |
+---+---+
| 1 | a |
+---+---+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name: "v", data_type: Utf8View }], metadata: {} }
+---+-----+
| k | t.v |
+---+-----+
| 1 | a   |
+---+-----+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name: "t.v", data_type: LargeUtf8 }], metadata: {} }

Expected behavior

Maintaining the original column names.

Additional context

Tested on main.

Labels: bug (Something isn't working)
