feature - implement RFC 023 approximate aggregates (#40) #53
Merged
Summary
Implements the portable RFC 023 approximate aggregate slice from current `main`:
- `approx_count_distinct(...)` and `approx_percentile(..., accuracy=10000)` as registry-backed aggregate helpers.
- `approx_percentile` keeps percentile and accuracy parameters.
- `approx_distinct`, `approx_percentile_cont`) without changing emitted InQL Substrait names.
Replaces closed stale PR #50 with a clean branch based on the merged RFC 022 mainline.
Type of change
`docs/rfcs/*`)
Area(s)
Key details
`approx_count_distinct(expr)` and `approx_percentile(expr, percentile, accuracy=10000)`. Approximate percentile output names include percentile and accuracy so distinct estimates over the same input column do not collide.
Testing / verification
- `make fmt INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
- `incan test tests/test_approximate_functions.incn`
- `incan test tests/test_function_registry.incn`
- `incan test tests/test_substrait_plan.incn`
- `incan test tests/test_prism.incn`
- `incan test tests/test_session_aggregates.incn`
- `make fmt-check INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
- `make test-style`
- `make registry-metadata INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
- `make build INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
- `make test INCAN=/Users/danny/Development/encero/incan/target/debug/incan` (186 passed)
- `make smoke-consumer INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
- `make pre-commit INCAN=/Users/danny/Development/encero/incan/target/debug/incan`
Manual verification notes:
- `smoke-consumer` needed a network-enabled rerun after sandbox DNS blocked crates.io lockfile generation.
- `.agents/state/` and not in the central Incan findings corpus.
Docs impact
If docs updated:
`docs/language/reference/functions/approximate.md`, `docs/language/reference/builders/aggregates.md`, `docs/rfcs/023_approximate_sketch_functions.md`, `docs/rfcs/025_typed_sketch_logical_values.md`, `docs/rfcs/README.md`, `docs/release_notes/v0_1.md`
Checklist
Closes #40.
Refs #51.