grouper lists, for n files, which lines appear in which of those files. It also groups this output, so you get a list of all lines that are in file one but not in file two, and so on.
As an example, say you have email addresses that appear on three lists, and you want to find out who is on the first and second lists but not on the third. This is your tool.
Easily scales to millions of lines, although there are some trivial optimization possibilities left.
This tool was previously described in the blog post "I'm a C++ dinosaur but I'm ok". Since C++ 2011 and 2014, I no longer feel like a dinosaur using C++ though!
Three files, 'een':

    aap
    noot
    mies
    wim

'twee':

    wim
    zus
    jet
    gys

and 'drie':

    aap
    NOOT
    mies
    wim
    zus
    jet
    gys
    $ grouper -i een twee drie
            een     twee    drie
    aap     1       0       1
    gys     0       1       1
    jet     0       1       1
    mies    1       0       1
    noot    1       0       1
    wim     1       1       1
    zus     0       1       1

    Per group output
    Group (size=3): 1 0 1
            aap mies noot
    Group (size=3): 0 1 1
            gys jet zus
    Group (size=1): 1 1 1
            wim
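The idea behind this output can be sketched in a few lines of C++. This is only an illustration of the technique, not grouper's actual source: the names Presence, toLower, tabulate and groupByPresence are made up for this example, and it works on in-memory lists instead of reading files. Each distinct line is mapped to a presence pattern (one flag per list), and the table is then inverted so that lines sharing a pattern end up in the same group.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Presence pattern: element i is true if the line occurs in list i.
// (Hypothetical type name, chosen for this sketch.)
using Presence = std::vector<bool>;

// Lowercase a copy of the line, to mimic case-insensitive matching (-i above).
std::string toLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Map every distinct line to the pattern of lists it appears in.
std::map<std::string, Presence>
tabulate(const std::vector<std::vector<std::string>>& lists, bool ignoreCase)
{
  std::map<std::string, Presence> table;
  for (std::size_t i = 0; i < lists.size(); ++i) {
    for (const auto& line : lists[i]) {
      // emplace leaves an existing entry untouched and returns an iterator to it
      auto it = table.emplace(ignoreCase ? toLower(line) : line,
                              Presence(lists.size(), false)).first;
      it->second[i] = true;
    }
  }
  return table;
}

// Invert the table: presence pattern -> lines that share it, in sorted order.
std::map<Presence, std::vector<std::string>>
groupByPresence(const std::map<std::string, Presence>& table)
{
  std::map<Presence, std::vector<std::string>> groups;
  for (const auto& entry : table)
    groups[entry.second].push_back(entry.first);
  return groups;
}
```

Feeding the three example lists to tabulate with ignoreCase set to true and passing the result to groupByPresence yields three groups, matching the "Per group output" above. Using std::map keeps both the lines and the groups sorted; a hash map would likely be faster for millions of lines, at the cost of that ordering.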