Add count and count_distinct select aggregates #45

Merged: aisrael merged 1 commit into main from feat/aggregate-count on Apr 5, 2026

aisrael (Owner) commented on Apr 5, 2026

Summary

This branch extends REPL select() aggregates with count(:column), which counts non-null values, and count_distinct(:column), which counts distinct non-null values, alongside the existing sum, avg, min, and max forms. Both new aggregates work for global summaries (select-only aggregates) and for grouped pipelines after group_by().
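
To make the null handling concrete, here is a minimal std-only Rust sketch of the semantics described above. It is not the code in this PR (which goes through DataFusion); the function names `count_non_null`, `count_distinct`, and `grouped_counts` are hypothetical, and a column is modelled as a slice of `Option<T>` where `None` stands for null.

```rust
use std::collections::{BTreeMap, HashSet};
use std::hash::Hash;

/// Illustrates count(:column): the number of non-null values.
/// (Hypothetical helper, not part of this PR.)
fn count_non_null<T>(column: &[Option<T>]) -> usize {
    column.iter().filter(|v| v.is_some()).count()
}

/// Illustrates count_distinct(:column): the number of distinct non-null values.
/// (Hypothetical helper, not part of this PR.)
fn count_distinct<T: Eq + Hash>(column: &[Option<T>]) -> usize {
    // Flattening an iterator of &Option<T> skips None and yields &T.
    column.iter().flatten().collect::<HashSet<_>>().len()
}

/// Illustrates the grouped form: for each group key, returns
/// (count, count_distinct) of the value column.
fn grouped_counts<K: Ord + Clone, T: Eq + Hash>(
    rows: &[(K, Option<T>)],
) -> BTreeMap<K, (usize, usize)> {
    let mut seen: BTreeMap<K, (usize, HashSet<&T>)> = BTreeMap::new();
    for (key, value) in rows {
        // A group is created even when all of its values are null.
        let entry = seen
            .entry(key.clone())
            .or_insert_with(|| (0, HashSet::new()));
        if let Some(v) = value {
            entry.0 += 1;
            entry.1.insert(v);
        }
    }
    seen.into_iter()
        .map(|(key, (count, distinct))| (key, (count, distinct.len())))
        .collect()
}
```

In this sketch, a column `[Some(3), None, Some(3), Some(7)]` has a count of 3 and a distinct count of 2, and a group whose values are all null reports `(0, 0)` rather than disappearing.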

Implementation

  • Pipeline spec and DataFusion: New SelectItem::Count and SelectItem::CountDistinct variants wired through apply_select_spec_to_dataframe using DataFusion aggregate helpers.
  • REPL planning: Parsing, validation, and display updated so these names are treated like other single-column aggregates.
  • Docs: README and docs/REPL.md describe the new functions and distinguish read(...) |> count() (row count) from count(:col) inside select().
  • Tests: Gherkin scenarios in features/repl/aggregates.feature and Rust unit tests cover global and grouped usage.
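
As a rough illustration of the "treated like other single-column aggregates" point, the sketch below shows one way such variants could be parsed and displayed. Only the variant names `SelectItem::Count` and `SelectItem::CountDistinct` come from this PR; the `String` payload, the other variant names, the helper methods, and `parse_aggregate` are assumptions made for the example and may not match the real code.

```rust
use std::fmt;

/// Hypothetical shape of the select item type; the real enum may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SelectItem {
    Sum(String),
    Avg(String),
    Min(String),
    Max(String),
    Count(String),
    CountDistinct(String),
}

impl SelectItem {
    /// Function name as written in the REPL.
    fn name(&self) -> &'static str {
        match self {
            SelectItem::Sum(_) => "sum",
            SelectItem::Avg(_) => "avg",
            SelectItem::Min(_) => "min",
            SelectItem::Max(_) => "max",
            SelectItem::Count(_) => "count",
            SelectItem::CountDistinct(_) => "count_distinct",
        }
    }

    /// The single column the aggregate applies to.
    fn column(&self) -> &str {
        match self {
            SelectItem::Sum(c)
            | SelectItem::Avg(c)
            | SelectItem::Min(c)
            | SelectItem::Max(c)
            | SelectItem::Count(c)
            | SelectItem::CountDistinct(c) => c.as_str(),
        }
    }
}

impl fmt::Display for SelectItem {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}(:{})", self.name(), self.column())
    }
}

/// Parses text such as `count(:id)` (assumed helper, not the PR's parser).
/// Returns None for unknown names and for forms without a `:column`
/// argument, such as the row-count form `count()`.
fn parse_aggregate(text: &str) -> Option<SelectItem> {
    let (name, rest) = text.trim().split_once('(')?;
    let column = rest.strip_suffix(')')?.trim().strip_prefix(':')?;
    if column.is_empty() || !column.chars().all(|c| c.is_alphanumeric() || c == '_') {
        return None;
    }
    let column = column.to_string();
    match name.trim() {
        "sum" => Some(SelectItem::Sum(column)),
        "avg" => Some(SelectItem::Avg(column)),
        "min" => Some(SelectItem::Min(column)),
        "max" => Some(SelectItem::Max(column)),
        "count" => Some(SelectItem::Count(column)),
        "count_distinct" => Some(SelectItem::CountDistinct(column)),
        _ => None,
    }
}
```

Under these assumptions, `parse_aggregate("count_distinct(:city)")` yields `SelectItem::CountDistinct("city")`, which displays back as `count_distinct(:city)`, while `count()` without a column is rejected here because it denotes the separate row-count pipeline stage.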

Made with Cursor

aisrael merged commit cd88647 into main on Apr 5, 2026
4 checks passed