Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.algorithm.hashGroup #9966

Open
dlangBugzillaToGithub opened this issue Mar 30, 2013 · 4 comments
Open

std.algorithm.hashGroup #9966

dlangBugzillaToGithub opened this issue Mar 30, 2013 · 4 comments

Comments

@dlangBugzillaToGithub
Copy link

bearophile_hugs reported this on 2013-03-30T14:32:38Z

Transfered from https://issues.dlang.org/show_bug.cgi?id=9842

CC List

  • hsteoh

Description

A little Strems/LINQ challenge, "Grouping by a Criterium":

Group the elements of a collection of strings by their length.

A C# LINQ solution:

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
var groups = names.GroupBy(c => c.Length);


Using the groupBy written by Andrei (https://github.com/D-Programming-Language/phobos/pull/1186 ) a D solution is:

    auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names
                  .schwartzSort!q{ a.length }
                  .groupBy!q{ a.length == b.length };


But group/groupBy work by sorting. Hash-based O(n) hashGroup/hashGroupBy are also conceivable, potentially faster, and leading to simpler code, because you don't need to sort the items first:

    auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names.hashGroupBy!q{ a.length == b.length };


(Alternative names for hashGroup/hashGroupBy are possible.)
@dlangBugzillaToGithub
Copy link
Author

hsteoh commented on 2014-11-05T05:49:57Z

Isn't this just the same as data.hashSort.group? For hash sort to work, the key must map to an integral value and must be bounded by known min/max values.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2014-11-05T10:01:42Z

(In reply to hsteoh from comment #1)
> Isn't this just the same as data.hashSort.group? For hash sort to work, the
> key must map to an integral value and must be bounded by known min/max
> values.

The output and working is different. The hashGroupBy doesn't return a range of pairs as the function "group".

If you write:

auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
auto grps = names.hashGroupBy!(n => n.length);

Now grps is:

assert(grps == [3:["Sam"], 4:["Samu", "Ravi"], 5:["Ratna"], 6:["Samuel", "Barsha"]]);

The keys can be anything, not just integers in an interval.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2014-12-17T09:54:27Z

Another example usage. Here the "properDivs" function returns the proper divisors of the given number n. The function "classify" returns one of the three classes (Deficient number, Perfect number, Abundant number). Then a hashGroup is handy to create an associative array to count how many numbers there are of each class in the [1,10000] interval:


void main() {
    import std.stdio, std.algorithm, std.range;
 
    static immutable properDivs = (in uint n) pure nothrow @safe =>
        iota(1, (n + 1) / 2 + 1).filter!(x => n % x == 0 && n != x);
 
    enum Class { deficient, perfect, abundant }
 
    static Class classify(in uint n) pure nothrow @safe {
        immutable p = properDivs(n).sum;
        with (Class)
            return (p < n) ? deficient : ((p == n) ? perfect : abundant);
    }
 
    iota(1, 10000).map!classify.hashGroup.writeln;
}

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2015-01-06T16:13:14Z

Now I think a "hashGroup" suffices (hashGroup can also be used without template argument, it uses an x=>x identity function on default).

In Phobos we have std.array.assocArray that allows to build a built-in associative array from a range of key-value Phobos Tuples (it will become mostly legacy when we will have a good associative array in Phobos and built-in tuples in D, that is the opposite of the data structures we have today).

But there is another very common pattern of creation of associative arrays, where you want to "accumulate" inside the values.

The basic form of accumulation is counting:

//A
const a = [1, 2, 1, 3, 5, 1, 2];
uint[int] aa1;
foreach (x; a)
    aa1[x]++;
aa1.writeln; // [1:3, 5:1, 2:2, 3:1]

Another example is just emumeration in an array:

//B
const b = [-1, 2, 1, 3, 5, -1, -2];
int[][int] aa2;
foreach (x; b)
    aa2[x.abs] ~= x;
aa2.writeln; // [1:[-1, 1, -1], 5:[5], 2:[2, -2], 3:[3]]

Or even in a set (here implemented with another associative array of int-bools):

//C
bool[int][int] aa3;
foreach (x; b)
    aa3[x.abs][x] = true;
aa3.writeln; // [1:[1:true, -1:true], 5:[5:true], 2:[2:true, -2:true], 3:[3:true]]


There are ways to perform those operations with Phobos today, but the code is so bad looking that it's better to skip this.


So I suggest to add a std.array.hashGroup function to Phobos that as argument takes a range, and as template argument takes one or two functions.

The first function is a mapping key unary function.

The second function is optional and takes as argument the sub-range of grouped values.

The patterns above become:

//A
a.hashGroup!(id, count).writeln;

//B
a.hashGroup!abs.writeln;

//C
a.hashGroup!(abs, items => items.zip(true.repeat).assocArray)).writeln;


Where "id" is the identity function (not present in Phobos):

//A
a.hashGroup!(x => x, count).writeln;

Once Phobos has a Set data structure with a constructor helper function that accepts a range as argument you can also write just:

//C
a.hashGroup!(abs, set).writeln;

@LightBender LightBender removed the P4 label Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants