Implementing CatalogProvider breaks quoted CTE names #18932

@colinmarc

Describe the bug

We use a custom CatalogProvider and have found an edge case: referencing a CTE by a quoted name fails with the custom catalog provider, but works with MemoryCatalogProvider. In other words, a query like:

with foobar as (select count(*) from orders) select * from "foobar"

hits the catalog provider for foobar even though it is a CTE, and the catalog obviously has no record of it in advance. Either leaving foobar unquoted or registering the table directly with ctx.register_table("orders", ...) makes the query work.

To Reproduce

Here is some code that reproduces the problem:

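// Imports assumed for this reproduction; module paths may differ between DataFusion versions.
use std::any::Any;
use std::sync::Arc;

use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::util::pretty::pretty_format_batches;
use datafusion::catalog::{CatalogProvider, SchemaProvider};
use datafusion::datasource::empty::EmptyTable;
use datafusion::error::DataFusionError;
use datafusion::prelude::SessionContext;

#[derive(Debug)]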
struct MyCatalog;

#[derive(Debug)]
struct MySchema(String);

impl CatalogProvider for MyCatalog {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn schema_names(&self) -> Vec<String> {
        Vec::new()
    }

    fn schema(&self, name: &str) -> Option<Arc<dyn SchemaProvider>> {
        Some(Arc::new(MySchema(name.to_owned())))
    }
}

#[async_trait::async_trait]
impl SchemaProvider for MySchema {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn table_names(&self) -> Vec<String> {
        todo!()
    }

    async fn table(
        &self,
        name: &str,
    ) -> Result<Option<Arc<dyn datafusion::catalog::TableProvider>>, DataFusionError> {
        if name == "orders" {
            let schema = Arc::new(Schema::new(vec![
                Field::new("id", DataType::Int32, false),
                Field::new("name", DataType::Utf8, false),
            ]));
            return Ok(Some(Arc::new(EmptyTable::new(schema))));
        } else {
            panic!("Table not found: {name}")
        }
    }

    fn table_exist(&self, name: &str) -> bool {
        name == "orders"
    }
}

#[tokio::test]
async fn quoted_cte() -> anyhow::Result<()> {
    let ctx = SessionContext::new();
    ctx.register_catalog("datafusion", Arc::new(MyCatalog));

    // If I comment out the line above and use this instead, both queries work.
    //
    // let schema = Arc::new(Schema::new(vec![
    //     Field::new("id", DataType::Int32, false),
    //     Field::new("name", DataType::Utf8, false),
    // ]));
    // let table = Arc::new(EmptyTable::new(schema));
    // ctx.register_table("orders", table)?;
    
    // NO QUOTING: This always works either way.
    let res = ctx
        .sql("with foobar as (select * from orders) select * from foobar")
        .await?
        .collect()
        .await?;

    println!("{}", pretty_format_batches(&res)?);

    // QUOTING: This doesn't work with `MyCatalog`.
    let res = ctx
        .sql("with barbaz as (select * from orders) select * from \"barbaz\"")
        .await?
        .collect()
        .await?;

    println!("{}", pretty_format_batches(&res)?);

    Ok(())
}

Expected behavior

Both queries in the example should work with a custom catalog provider.
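
To make the expectation concrete, here is a minimal, standalone sketch of the lookup order I would expect. It is not DataFusion's planner code, and it makes no claim about where the actual bug is; `normalize_ident`, `resolve_table`, and `Resolved` are hypothetical names invented for illustration. The point is only that a quoted and an unquoted reference to the same CTE should normalize to the same name, and that CTEs should be checked before the catalog is consulted.

```rust
use std::collections::HashSet;

/// Hypothetical helper: normalize a SQL identifier.
/// Quoted identifiers have their quotes stripped and keep their case;
/// unquoted identifiers are folded to lowercase.
fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        // Inside a quoted identifier, a doubled quote stands for one quote.
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        raw.to_lowercase()
    }
}

/// Hypothetical result of resolving a table reference.
#[derive(Debug, PartialEq)]
enum Resolved {
    Cte(String),
    CatalogTable(String),
    NotFound,
}

/// Resolve a table reference: CTEs in scope win, and the catalog callback
/// is only invoked when no CTE matches the normalized name.
fn resolve_table<F>(raw: &str, ctes: &HashSet<String>, mut catalog_has: F) -> Resolved
where
    F: FnMut(&str) -> bool,
{
    let name = normalize_ident(raw);
    if ctes.contains(&name) {
        return Resolved::Cte(name);
    }
    if catalog_has(&name) {
        Resolved::CatalogTable(name)
    } else {
        Resolved::NotFound
    }
}
```

With a CTE named barbaz in scope, both `barbaz` and `"barbaz"` would resolve to `Resolved::Cte` here without the catalog callback ever being called, which is the behavior the second query in the reproduction relies on.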

Additional context

No response


Labels: bug
