SQLSpout group by hostname and get top N results #609

Closed
jnioche opened this Issue Sep 21, 2018 · 1 comment

jnioche commented Sep 21, 2018

Similar to what is done in Elasticsearch (ES) with aggregations: group the URLs by hostname and return the top N for each group. At the moment we rely on shard numbers only, which does not guarantee a good mix of URL sources.

See https://stackoverflow.com/questions/2129693/using-limit-within-group-by-to-get-n-results-per-group

@jnioche jnioche added the enhancement label Sep 21, 2018

@jnioche jnioche closed this in 27e6932 Sep 24, 2018

@jnioche jnioche added this to the 1.11 milestone Sep 24, 2018

jnioche commented Sep 24, 2018

This is implemented with window functions:

https://mariadb.com/kb/en/library/window-functions-overview/
http://www.mysqltutorial.org/mysql-window-functions/mysql-rank-function/

These are supported in recent versions of MariaDB (10.2 and later) and MySQL (8.0 and later).

The sharding mechanism can still be used. A new parameter mysql.max.urls.per.bucket has been introduced.
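
As a rough sketch of the approach, the query below ranks URLs within each host and keeps at most N per host. It is not the query from commit 27e6932: the table urls, its columns url, host and nextfetchdate, and the helper top_urls_per_host are all hypothetical, and SQLite (3.25 or later, through Python's sqlite3 module) stands in for MySQL/MariaDB so the example runs without a server. The max_per_bucket argument plays the role of mysql.max.urls.per.bucket. ROW_NUMBER() is used instead of RANK() so that ties cannot push a host over the cap.

```python
import sqlite3

# Hypothetical schema and data; the real SQLSpout table layout may differ.
ROWS = [
    ("http://a.com/1", "a.com", "2018-09-01"),
    ("http://a.com/2", "a.com", "2018-09-02"),
    ("http://a.com/3", "a.com", "2018-09-03"),
    ("http://b.com/1", "b.com", "2018-09-01"),
    ("http://c.com/1", "c.com", "2018-09-04"),
    ("http://c.com/2", "c.com", "2018-09-05"),
]

# Number the rows of each host by nextfetchdate, then keep the first N per host.
QUERY = """
SELECT url, host, nextfetchdate FROM (
    SELECT url, host, nextfetchdate,
           ROW_NUMBER() OVER (PARTITION BY host ORDER BY nextfetchdate) AS ranking
    FROM urls
) AS ranked
WHERE ranking <= ?
ORDER BY host, ranking
"""


def top_urls_per_host(conn, max_per_bucket):
    """Return at most max_per_bucket rows for every host."""
    return conn.execute(QUERY, (max_per_bucket,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, host TEXT, nextfetchdate TEXT)")
conn.executemany("INSERT INTO urls VALUES (?, ?, ?)", ROWS)

for row in top_urls_per_host(conn, 2):
    print(row)
```

With a cap of 2, a.com contributes its two earliest URLs, b.com its only URL and c.com both of its URLs, so no single host can fill the result set.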

@jnioche jnioche added the SQL label Oct 3, 2018

jnioche added a commit that referenced this issue Oct 3, 2018

SQLSpout query returns nextFetchDate and order by ranking to guarantee good host diversity when limiting the number of results, fixes #609
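
The effect described in this commit message can be illustrated with a small sketch. Ordering by the per-host ranking before applying a global LIMIT means the first URL of every host is returned before any host's second URL. This is an illustration under assumptions, not the committed query: the table urls, its columns, and the helper fetch_urls are hypothetical, and SQLite (3.25 or later) again stands in for MySQL/MariaDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, host TEXT, nextfetchdate TEXT)")
# Hypothetical data: a.com holds the three earliest fetch dates.
conn.executemany(
    "INSERT INTO urls VALUES (?, ?, ?)",
    [
        ("http://a.com/1", "a.com", "2018-09-01"),
        ("http://a.com/2", "a.com", "2018-09-02"),
        ("http://a.com/3", "a.com", "2018-09-03"),
        ("http://b.com/1", "b.com", "2018-09-04"),
        ("http://c.com/1", "c.com", "2018-09-05"),
    ],
)

# Rank within each host, then order by rank so LIMIT cycles through hosts.
DIVERSE = """
SELECT url, nextfetchdate FROM (
    SELECT url, host, nextfetchdate,
           ROW_NUMBER() OVER (PARTITION BY host ORDER BY nextfetchdate) AS ranking
    FROM urls
) AS ranked
ORDER BY ranking, nextfetchdate
LIMIT ?
"""

# Baseline for comparison: earliest fetch dates first, with no regard for host.
NAIVE = "SELECT url, nextfetchdate FROM urls ORDER BY nextfetchdate LIMIT ?"


def fetch_urls(query, limit):
    """Run the query with the given limit and return only the URLs."""
    return [row[0] for row in conn.execute(query, (limit,))]


print(fetch_urls(NAIVE, 3))
print(fetch_urls(DIVERSE, 3))
```

With a limit of 3, the baseline query returns only a.com URLs, while the ranked query returns one URL from each of the three hosts.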