Open
Labels: bug
Description
Describe the bug
When the `expand_views_at_output` config option is enabled to convert `Utf8View` columns to `LargeUtf8`, the names of the converted columns change: they become prefixed with the relation name. I think the cause is that a CAST is added to change the type, so `Expr::qualified_name` returns `"table.column"` instead of just `"column"`:
```rust
// When the expression is a CAST, we end up in the last match arm
pub fn qualified_name(&self) -> (Option<TableReference>, String) {
    match self {
        Expr::Column(Column {
            relation,
            name,
            spans: _,
        }) => (relation.clone(), name.clone()),
        Expr::Alias(Alias { relation, name, .. }) => (relation.clone(), name.clone()),
        _ => (None, self.schema_name().to_string()),
    }
}

// which in turn calls
SchemaDisplay(self)

// which, for a cast, simply calls SchemaDisplay on the inner expression
Expr::Cast(Cast { expr, .. }) | Expr::TryCast(TryCast { expr, .. }) => {
    write!(f, "{}", SchemaDisplay(expr))
}

// which, for a Column, calls
impl fmt::Display for Column {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.flat_name())
    }
}

// which includes relation + name, unlike what qualified_name returns for a plain Column
```

I think one approach would be to update `qualified_name` by adding a match arm for casts. I would be happy to fix this if it is indeed a bug and not expected behavior.
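To make the call chain above concrete, here is a small self-contained model of it, together with the proposed extra match arm. The `Column`, `Expr` and `SchemaDisplay` types below are simplified, hypothetical stand-ins that only mimic the naming logic; they are not the real `datafusion_expr` types (relations are plain strings, the cast's target type is omitted, and `Literal` stands in for every other expression kind).

```rust
use std::fmt;

// Hypothetical, simplified stand-in for DataFusion's Column.
#[derive(Clone, Debug)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    // Like Column::flat_name: "relation.name" when a relation is present.
    fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }
}

// Hypothetical, simplified stand-in for DataFusion's Expr.
#[derive(Clone, Debug)]
enum Expr {
    Column(Column),
    // Target type omitted; only the wrapped expression matters for naming.
    Cast { expr: Box<Expr> },
    // Stands in for every other kind of expression.
    Literal(i64),
}

// Like SchemaDisplay: a cast is transparent, a column prints its flat name.
struct SchemaDisplay<'a>(&'a Expr);

impl fmt::Display for SchemaDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Expr::Column(c) => write!(f, "{}", c.flat_name()),
            Expr::Cast { expr } => write!(f, "{}", SchemaDisplay(expr)),
            Expr::Literal(v) => write!(f, "{}", v),
        }
    }
}

impl Expr {
    // Current behaviour: a Cast falls through to the last arm, so the
    // relation ends up inside the returned name.
    fn qualified_name_current(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            _ => (None, SchemaDisplay(self).to_string()),
        }
    }

    // Proposed change: look through a Cast and name it after its input.
    fn qualified_name_fixed(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            Expr::Cast { expr } => expr.qualified_name_fixed(),
            _ => (None, SchemaDisplay(self).to_string()),
        }
    }
}
```

In this model, a cast over column `v` of relation `t` yields `(None, "t.v")` from the current version and `(Some("t"), "v")` from the fixed one. In the real code the new arm would presumably cover `Expr::TryCast` as well, since `SchemaDisplay` treats both the same way. Because `qualified_name` is likely used outside this code path, looking through casts there could also change names elsewhere, which is worth checking before choosing this approach.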
To Reproduce
```rust
use datafusion::error::Result;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.sql("copy (select 1 as k, 'a' as v) to 't.parquet'")
        .await?
        .collect()
        .await?;
    ctx.register_parquet("t", "t.parquet", ParquetReadOptions::new())
        .await?;

    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());

    ctx.sql("set datafusion.optimizer.expand_views_at_output = true")
        .await?
        .collect()
        .await?;

    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());
    Ok(())
}
```

`k` remains the same, but `v` changes:
```text
+---+---+
| k | v |
+---+---+
| 1 | a |
+---+---+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name: "v", data_type: Utf8View }], metadata: {} }
+---+-----+
| k | t.v |
+---+-----+
| 1 | a   |
+---+-----+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name: "t.v", data_type: LargeUtf8 }], metadata: {} }
```
Expected behavior
The original column names should be preserved (here `v`, not `t.v`).
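Another way to preserve the names, sketched here purely as an assumption rather than something the issue proposes, is to leave `qualified_name` alone and have the view-type expansion wrap each cast in an alias that carries the column's original qualifier and name; the existing `Alias` arm of `qualified_name` then returns them unchanged. The types and the `expand_view_column` helper below are hypothetical stand-ins, not the DataFusion API.

```rust
// Hypothetical, simplified stand-in for DataFusion's Column.
#[derive(Clone, Debug)]
struct Column {
    relation: Option<String>,
    name: String,
}

// Hypothetical, simplified stand-in for DataFusion's Expr.
#[derive(Clone, Debug)]
enum Expr {
    Column(Column),
    // Target type omitted; only the wrapped expression matters for naming.
    Cast { expr: Box<Expr> },
    Alias { expr: Box<Expr>, relation: Option<String>, name: String },
}

impl Expr {
    // Like SchemaDisplay: casts are transparent, columns print their flat
    // name ("relation.name"), aliases print their own name.
    fn schema_name(&self) -> String {
        match self {
            Expr::Column(c) => match &c.relation {
                Some(r) => format!("{}.{}", r, c.name),
                None => c.name.clone(),
            },
            Expr::Cast { expr } => expr.schema_name(),
            Expr::Alias { name, .. } => name.clone(),
        }
    }

    // Same arms as the qualified_name quoted in the issue (no Cast arm).
    fn qualified_name(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            Expr::Alias { relation, name, .. } => (relation.clone(), name.clone()),
            _ => (None, self.schema_name()),
        }
    }
}

// Hypothetical helper standing in for the view-type expansion step:
// cast the column, then re-attach its original qualifier and name.
fn expand_view_column(col: Column) -> Expr {
    let cast = Expr::Cast { expr: Box::new(Expr::Column(col.clone())) };
    Expr::Alias { expr: Box::new(cast), relation: col.relation, name: col.name }
}
```

Compared with adding a `Cast` arm, this keeps the change local to the output expansion, so other users of `qualified_name` would see no difference.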
Additional context
Tested on main.