202307-notebook Count Distinct (inc HLL + Theta) #14523
Closed
petermarshallio wants to merge 23 commits into apache:master
Initial draft
Added a quote and an INSERT
Fixed issue with the creation of HLL using GROUP BY (with thanks to @hellmarbecker). Added a note on how to do it with native.
Few updates reviewing Gian's original deck
Changed the router URL back. Added some more stuff on error rates. Added a query that uses merges. Added @hellmarbecker's blog.
Updated prereqs - thanks to @techdocsmith / @writer-jill
Added the display tables function after ingestion and on initial connection to prove it's working.
Fixed wrong call to the wrong things... and moved some stuff into Learn More.
techdocsmith (Contributor) requested changes on Jul 7, 2023 and left a comment:
This notebook is off to a great start! I need to re-review when I can do the steps. Flight data is not loading for me atm.
examples/quickstart/jupyter-notebooks/notebooks/03-query/03-approxCountDistinct.ipynb
Outdated
…proxCountDistinct.ipynb Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Applying suggestions from @techdocsmith
…marshallio/druid into 202307-approxCountDistinct
Next set of feedback updates from @techdocsmith
Contributor, Author:
@techdocsmith this has been updated with your comments.
Revised with the new standard intro
Stylistic changes and some re-pathing. Also aligned the intro bit to the other intro bits.
Aligned to template and feedback from @vtlim on the UNION ALL notebook
Removed the ingestion part and split into a different branch
Completed the portion on Theta sketches, and added some more links out.
Merged some content from the Thetasketch tutorial by @hellmarbecker
Amendments suggested by @writer-jill - notably reduction of commentary
Adding SQL to ingest into a specific table.
Switch to standard table name. Text fixes.
A notebook on counting unique values in a data set, using both non-approximate and approximate techniques - both HLL and Theta.
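For readers unfamiliar with the difference between exact and approximate distinct counts, here is a minimal Python sketch of the idea. It is not taken from the notebook and is not how Druid or the DataSketches library implement HLL or Theta sketches. It uses a K-minimum-values (KMV) estimator, a simplified relative of Theta sketches, and the class name `KMVSketch`, the helper `_hash01`, and the sample data are all hypothetical, chosen only for illustration.

```python
import hashlib
import heapq


def _hash01(value) -> float:
    """Map a value to a stable pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Illustrative K-minimum-values sketch (hypothetical, not a Druid API)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap of retained hashes, stored negated
        self._members = set()  # the same hashes, for duplicate checks

    def _add_hash(self, h: float) -> None:
        if h in self._members:
            return  # duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, value) -> None:
        self._add_hash(_hash01(value))

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes seen
        kth_min = -self._heap[0]
        return (self.k - 1) / kth_min

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Return a new sketch estimating the union of both inputs."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged


# Exact count: keep every distinct value in memory.
values = [f"user-{i % 20000}" for i in range(60000)]
exact = len(set(values))

# Approximate count: keep at most k hashes, whatever the input size.
sketch = KMVSketch(k=1024)
for v in values:
    sketch.update(v)
approx = sketch.estimate()
```

The exact count needs memory proportional to the number of distinct values, while the sketch holds at most `k` hashes and trades that for a bounded relative error. Because two sketches can be merged without revisiting the raw rows, sketches built per segment or per group can be combined at query time.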
This PR has: