Skip to content

Conversation

@ymer
Copy link

@ymer ymer commented Apr 10, 2016

Return a list of the most common elements in a counter and their counts, ordered from the most common to the least.

most_common(counter([1, 6, 2, 2, 5, 5, 5]))

4-element Array{Pair{Int64,Int64},1}:
5=>3
2=>2
6=>1
1=>1

merge(a, a2) # return a new accumulator/counter that combines the
# values/counts in both a and a2

most_common(a) # Return a list of the most common elements and their counts from the most common to the least.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixing the indentation here should get rid of the weirdness in the rich diff

@kmsquire
Copy link
Member

For parity with Julia Base, it might be nice to create a version of select which does this. See http://docs.julialang.org/en/release-0.4/stdlib/sort/#Base.select! (and http://docs.julialang.org/en/release-0.4/stdlib/sort/#Base.select). Or use select to define most_common, if that makes sense.

There are related functions, nsmallest and nlargest, which are implemented for heaps in this package as well.

@ymer
Copy link
Author

ymer commented Apr 11, 2016

Could the heap nlargest be extended to take a "by" keyword?

@kmsquire
Copy link
Member

Well, it could. This might not be a good idea if performance is an issue (see http://docs.julialang.org/en/release-0.4/manual/performance-tips/#declare-types-of-keyword-arguments, although there isn't much explanation as to why keywords can be slow).

I'm curious what your use case would be, since nlargest would already choose the largest values according to the value, and that's what you're proposing here.

Another idea, which I like a little better, would be to implement a separate HeapCounter type, which created a counter based on heaps instead of dictionaries.

@ymer
Copy link
Author

ymer commented Apr 11, 2016

I think that the users would often want all the values in order, rather than the n most common. And most_common could be somewhat more suitable for this than nlargest.

Here is another suggestion:

most_common(ct) = sort(collect(ct.map), by=p->p[2], rev=true)

most_common(ct, n) = select(collect(ct.map), 1:n, by=p->p[2], rev=true)

@oxinabox
Copy link
Member

It may also be worth special casing most_common(ct, 1) which I think is another common use-case.
select is really slow compared to other methods for finding the single most common.

Also good idea. I didn't even know this is missing. This is one of the main uses of the Counter type.

Copy link
Contributor

@rawls238 rawls238 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ymer any updates on this? It would be nice to merge this in once some of the comments are resolved

@kmsquire
Copy link
Member

Bump.

@kmsquire
Copy link
Member

This seems to have been satisfied by #375 (using nlargest and nsmallest).

@kmsquire kmsquire closed this Sep 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants