202307-notebook Count Distinct (inc HLL + Theta) #14523

Closed
petermarshallio wants to merge 23 commits into apache:master from petermarshallio:202307-approxCountDistinct

Contributor

@petermarshallio petermarshallio commented Jul 4, 2023

A notebook on counting unique values in a data set, using both exact and approximate techniques. The approximate techniques covered are HLL and Theta sketches.
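To make the distinction concrete, here is a minimal Python sketch that compares an exact distinct count with an approximate one. It is an illustration only and is not taken from the notebook: it uses a K-minimum-values (KMV) sketch, a simple relative of Theta sketches, rather than Druid's or Apache DataSketches' actual implementations. The names `K`, `hash01`, `kmv_sketch`, `merge`, and `estimate`, and the synthetic `user-N` data, are all assumptions made for this example.

```python
import hashlib

K = 1024  # sketch size (assumed); larger K lowers the error and uses more memory


def hash01(value: str) -> float:
    """Map a value to a pseudo-uniform float in the unit interval via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(values, k=K):
    """Return the k smallest distinct hash values, sorted ascending.

    For brevity this hashes every value and then truncates; a streaming
    version would keep only the k smallest hashes as rows arrive.
    """
    return sorted({hash01(v) for v in values})[:k]


def merge(a, b, k=K):
    """Union two sketches: the k smallest hashes of the combined set."""
    return sorted(set(a) | set(b))[:k]


def estimate(sketch, k=K):
    """Estimate the distinct count from a KMV sketch."""
    if len(sketch) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(sketch))
    # The k-th smallest hash shrinks as the number of distinct values grows.
    return (k - 1) / sketch[k - 1]


# Synthetic data: 200,000 rows containing 50,000 distinct user IDs.
events = [f"user-{i % 50_000}" for i in range(200_000)]

exact = len(set(events))               # exact: keeps every distinct value
approx = estimate(kmv_sketch(events))  # approximate: keeps only K hashes

# Sketches built over overlapping slices can be merged without double counting.
first = kmv_sketch(events[:30_000])         # user-0 .. user-29999
second = kmv_sketch(events[20_000:50_000])  # user-20000 .. user-49999
merged_estimate = estimate(merge(first, second))
```

The exact count has to hold every distinct value, while the sketch holds at most `K` hashes however large the input grows. That fixed size, together with the ability to merge sketches built separately, is the trade-off behind the approximate techniques, at the cost of a small estimation error.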


This PR has:

  • been self-reviewed.
  • been tested in a test Druid cluster.
  • been tested in the notebooks Docker environment.

Added a quote and an INSERT
Fixed issue with the creation of HLL using GROUP BY (with thanks to @hellmarbecker)

Added a note on how to do it with native.
Few updates reviewing Gian's original deck
Changed the router Url back
Added some more stuff on error rates
Added a query that uses merges
Added @hellmarbecker 's blog
Added the display tables function after ingestion and on initial connection to prove it's working.
Fixed wrong call to the wrong things... and moved some stuff into Learn More.
Contributor

@techdocsmith techdocsmith left a comment

This notebook is off to a great start! I need to re-review when I can do the steps. Flight data is not loading for me atm.

petermarshallio and others added 4 commits July 10, 2023 09:09
…proxCountDistinct.ipynb

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Next set of feedback updates from @techdocsmith
@petermarshallio
Contributor Author

@techdocsmith this has been updated with your comments.

Revised with the new standard intro
Stylistic changes and some re-pathing. Also aligned the intro bit to the other intro bits.
Aligned to template and feedback from @vtlim on the UNION ALL notebook
Removed the ingestion part and split into a different branch
Completed the portion on Theta sketches, and added some more links out.
Merged some content from the Thetasketch tutorial by @hellmarbecker
@petermarshallio petermarshallio mentioned this pull request Aug 2, 2023
3 tasks
@petermarshallio petermarshallio marked this pull request as draft August 30, 2023 06:35

2 participants