How does the HISTOGRAM function work? #3684
-
"Can we determine the number of bins ourselves?" For sure, thanks for this interesting issue and question. Histograms can certainly be organized on arbitrary scales, and specialized tooling for that is quite welcome! How to apply them to dates is an interesting issue, though. Googling "histogram()" "sql", I found nothing formal except for this: https://docs.influxdata.com/flux/v0.x/stdlib/universe/histogram/ It did not look convincingly generic, and it is possibly not what our lovely duck's HISTOGRAM function expects. I suppose that the HISTOGRAM keyword originates from a relatively recent player in the field of big data SQL or NoSQL data engines that I have been in touch with :-) Additional pointers on how to handle the beast with arbitrary binning would be welcome!
-
Thank you for responding, @FrancoisLepoutre. I guess I am aiming for some of the flexibility that the numpy histogram function has.
-
It looks like the answer is pretty simple, since the syntax is quite straightforward. I should have run a test instead of searching for an existing SQL functionality: it takes less time to open the console and run the test than to google for a generic answer.

DuckDB's HISTOGRAM takes an arbitrary SQL expression (I only tested plain ones, I did NOT test subqueries) and delivers a histogram as a MAP. Looks pretty simple and workable. The binning buckets are not under your control: each distinct numeric value yields a single MAP entry.

Of course, delivering complex binnings such as the ones numpy can produce would be a great plus. But you can always subcontract that functionality to numpy for specific needs and feed the result back into the engine in a snap. This fast bidirectional link between DuckDB and the most common data matrix engines à la numpy, pandas and, of course, Arrow is what sets it apart anyway! Some sort of high-end binning functionality for numerical information and timestamps would be a great plus, but the HISTOGRAM keyword is already quite useful as is, and the SQL grammar is already quite rich. Thanks to the DuckDB team!
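To make the behavior described above concrete, here is a minimal pure-Python sketch. It is not DuckDB's implementation: `histogram_distinct` only mimics the observed result (one entry per distinct value), and `histogram_fixed_width` is a hypothetical helper showing how values could be pre-bucketed before counting. Since HISTOGRAM accepts an arbitrary expression, the same idea could presumably be written in SQL as something like `histogram(floor(x / 10) * 10)`, but that is an assumption and was not tested here.

```python
from collections import Counter
import math


def histogram_distinct(values):
    """Mimic the observed behavior: one entry per distinct value, mapped to its count."""
    return dict(sorted(Counter(values).items()))


def histogram_fixed_width(values, width):
    """Hypothetical helper: replace each value by the lower edge of its
    fixed-width bucket, then count the bucket edges."""
    return histogram_distinct(math.floor(v / width) * width for v in values)


data = [1, 1, 2, 7, 12, 12, 12, 25]

per_value = histogram_distinct(data)          # no control over bins
per_bucket = histogram_fixed_width(data, 10)  # buckets [0, 10), [10, 20), [20, 30)

print(per_value)
print(per_bucket)
```

Choosing the bucket width (or any other mapping from value to bucket) before aggregating is what gives control over the bins; the aggregation itself stays a plain count per distinct key.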
-
Hi, y'all!
I am a new user and have been loving DuckDB. I have one question that the docs have not been able to answer: how exactly does the HISTOGRAM function work? Meaning, how does it choose the number of bins? Can we determine the number of bins ourselves?
Thanks!