COUNT DISTINCT does not support Boolean #202

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12339

If you try to run a COUNT (DISTINCT ..) query on a boolean column you get the following panic:

thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22

While there is unlikely to be a big use case for this, it would be nice to support it for completeness' sake. At the very least, we should return a proper error message rather than panic.

Reproducer:
{code}
echo "true" > /tmp/foo.csv
./target/debug/datafusion-cli

CREATE EXTERNAL TABLE t (a boolean) STORED AS CSV LOCATION '/tmp/foo.csv';

0 rows in set. Query took 0 seconds.

select count(distinct a) from t;

thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
ArrowError(ExternalError(Canceled))
{code}
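A minimal sketch of the "proper error rather than a panic" option, assuming the panic comes from a `match` on the data type that falls through to a panicking arm. Everything below (`SimpleType`, `SketchError`, `list_kind_for`) is a hypothetical, simplified stand-in for illustration and is not DataFusion's actual code or API:

```rust
use std::fmt;

// Hypothetical stand-in for a column data type (not Arrow's real DataType).
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Boolean,
    Int64,
    Utf8,
}

// Hypothetical error type (not DataFusion's real error enum).
#[derive(Debug, PartialEq)]
enum SketchError {
    NotImplemented(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::NotImplemented(msg) => write!(f, "Not implemented: {}", msg),
        }
    }
}

// Picks a list representation for the distinct values of a column.
// Unsupported types yield an `Err` that the caller can report to the user,
// instead of hitting a panicking catch-all arm.
fn list_kind_for(data_type: &SimpleType) -> Result<&'static str, SketchError> {
    match data_type {
        SimpleType::Int64 => Ok("int64 list"),
        SimpleType::Utf8 => Ok("utf8 list"),
        other => Err(SketchError::NotImplemented(format!(
            "COUNT(DISTINCT) is not supported for {:?}",
            other
        ))),
    }
}
```

With this shape, calling `list_kind_for(&SimpleType::Boolean)` returns an error value that can be surfaced as a normal query failure. Actually supporting Boolean would instead mean adding a dedicated arm for it.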
