Skip to content

Count distinct errors with "... is declared as non-nullable but contains null values" #4828

@jonmmease

Description

@jonmmease

Describe the bug
This is a MRE for a query that results in this error:

InvalidArgumentError("Column 'COUNT(DISTINCT tbl.col2)[count distinct]' is declared as non-nullable but contains null values")

The input table (tbl) looks like this:

+------+------+
| col1 | col2 |
+------+------+
| A    |      |
| B    | b    |
| C    | c    |
+------+------+

This is the query

SELECT count(DISTINCT "col2"), min(0) AS "zero"
FROM "tbl"
GROUP BY "col1"

The error goes away without the min(0) AS "zero" column expression

To Reproduce
Steps to reproduce the behavior:

#[cfg(test)]
mod tests {
    use std::sync::Arc;
    use datafusion::arrow::array::{ArrayRef, StringArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::arrow::util::pretty::pretty_format_batches;
    use datafusion::datasource::MemTable;
    use datafusion::prelude::SessionContext;

    #[tokio::test]
    async fn count_distinct_error() {
        let col1 = Arc::new(StringArray::from(
            vec!["A", "B", "C"]
        )) as ArrayRef;
        let col2 = Arc::new(StringArray::from(
            vec![None, Some("b"), Some("c")]
        )) as ArrayRef;

        let schema = Arc::new(Schema::new(vec![
            Field::new("col1", DataType::Utf8, true),
            Field::new("col2", DataType::Utf8, true)
        ])) as SchemaRef;

        let batch = RecordBatch::try_new(schema.clone(), vec![col1, col2]).unwrap();
        let mem_table = MemTable::try_new(schema, vec![vec![batch]]).unwrap();

        // Create context and register table
        let ctx = SessionContext::new();
        ctx.register_table("tbl", Arc::new(mem_table)).unwrap();

        // Print input table
        let res = ctx.sql("SELECT * from tbl").await.unwrap().collect().await.unwrap();
        let formatted = pretty_format_batches(res.as_slice()).unwrap();
        println!("{}", formatted);

        // Perform query
        let res = ctx.sql(r#"
SELECT count(DISTINCT "col2"), min(0) AS "zero"
FROM "tbl"
GROUP BY "col1"
    "#).await.unwrap().collect().await.unwrap();

        let formatted = pretty_format_batches(res.as_slice()).unwrap();
        println!("{}", formatted);
    }
}

Expected behavior
Expect the query to complete without errors

Additional context
This may be a duplicate of #4040, but the query setup there was more complex. If a maintainer thinks it's the same bug I'd be happy to move this MRE over to that issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions