COUNT DISTINCT does not support dictionary types #249

Closed
alamb opened this issue May 3, 2021 · 0 comments · Fixed by #256
Labels: datafusion (Changes in the datafusion crate), enhancement (New feature or request)

alamb commented May 3, 2021

As in #199

Describe the bug
Distinct aggregates such as `select count(distinct col) ...` return an error when `col` is a dictionary-encoded column.

To Reproduce

            let mut ctx = ExecutionContext::new();

            // input data looks like:
            // A, 1
            // B, 2
            // A, 2
            // A, 4
            // C, 1
            // A, 1

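            // Key type Int8Type is assumed from the Dictionary(Int8, Utf8) type in the
            // error message; the original test was generic over the key type K.
            let dict_array: DictionaryArray<Int8Type> =
                vec!["A", "B", "A", "A", "C", "A"].into_iter().collect();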
            let dict_array = Arc::new(dict_array);

            let val_array: Int64Array = vec![1, 2, 2, 4, 1, 1].into();
            let val_array = Arc::new(val_array);

            let schema = Arc::new(Schema::new(vec![
                Field::new("dict", dict_array.data_type().clone(), false),
                Field::new("val", val_array.data_type().clone(), false),
            ]));

            let batch = RecordBatch::try_new(schema.clone(), vec![dict_array, val_array])
                .unwrap();

            let provider = MemTable::try_new(schema.clone(), vec![vec![batch]]).unwrap();
            ctx.register_table("t", Arc::new(provider)).unwrap();


            // Now, use dict as an aggregate
            let results =
                plan_and_collect(&mut ctx, "SELECT val, count(distinct dict) FROM t GROUP BY val")
                    .await
                    .expect("ran plan correctly");

            let expected = vec![
                "+-----+----------------------+",
                "| val | COUNT(DISTINCT dict) |",
                "+-----+----------------------+",
                "| 1   | 2                    |",
                "| 2   | 2                    |",
                "| 4   | 1                    |",
                "+-----+----------------------+",
            ];
            assert_batches_sorted_eq!(expected, &results);

This fails with an error such as:

thread 'execution::context::tests::group_by_dictionary' panicked at 'ran plan correctly: ArrowError(ExternalError(NotImplemented("Can\'t create a scalar of array of type \"Dictionary(Int8, Utf8)\"")))', datafusion/src/execution/context.rs:1774:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Expected behavior
The test passes: the query returns the number of distinct dictionary values in each `val` group.
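
To make the expected counts concrete, here is a minimal sketch in plain Rust (standard library only) of what `count(distinct dict) ... GROUP BY val` should compute for the sample data. It is not DataFusion or Arrow code: `DictColumn`, `from_strs`, `value_at`, and `count_distinct_by_group` are hypothetical names invented for illustration, and the fix in #256 may work quite differently. The idea shown is only that distinctness is decided on the values the dictionary keys point at, not on the keys or the array type itself.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for a dictionary-encoded string column:
/// `keys[i]` is an index into `values`.
struct DictColumn {
    keys: Vec<i8>,
    values: Vec<String>,
}

impl DictColumn {
    /// Build a dictionary column, assigning keys in order of first appearance.
    fn from_strs(input: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(input.len());
        for &s in input {
            let idx = match values.iter().position(|v| v.as_str() == s) {
                Some(idx) => idx,
                None => {
                    values.push(s.to_string());
                    values.len() - 1
                }
            };
            keys.push(idx as i8);
        }
        DictColumn { keys, values }
    }

    /// Resolve row `i` to the string its key points at.
    fn value_at(&self, i: usize) -> &str {
        &self.values[self.keys[i] as usize]
    }
}

/// Count distinct resolved dictionary values per `val` group.
fn count_distinct_by_group(dict: &DictColumn, val: &[i64]) -> BTreeMap<i64, usize> {
    let mut groups: BTreeMap<i64, HashSet<&str>> = BTreeMap::new();
    for (i, v) in val.iter().enumerate() {
        groups.entry(*v).or_default().insert(dict.value_at(i));
    }
    groups.into_iter().map(|(k, set)| (k, set.len())).collect()
}

fn main() {
    // Same rows as the reproduction: (A,1) (B,2) (A,2) (A,4) (C,1) (A,1)
    let dict = DictColumn::from_strs(&["A", "B", "A", "A", "C", "A"]);
    let val = [1, 2, 2, 4, 1, 1];
    for (group, n) in count_distinct_by_group(&dict, &val) {
        println!("{} | {}", group, n);
    }
}
```

Group 1 contains A and C, group 2 contains B and A, and group 4 contains only A, which matches the counts in the expected table above.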


alamb added the bug, datafusion, and enhancement labels, then removed the bug label, on May 3, 2021
alamb closed this as completed in #256 on May 6, 2021
alamb pushed a commit that referenced this issue Jul 12, 2024
findepi pushed a commit to findepi/datafusion that referenced this issue Jul 16, 2024
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this issue Jul 17, 2024
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this issue Jul 18, 2024