If the input isn't completely sorted, duplicates could sneak in. This combines the usage information for common extensions.
Ensure extensions are unique.
Input might not be as alphabetical as it says.
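A rough sketch of the idea, in case it helps reviewers: keying on the extension makes the merge independent of input order. This is written in Python for illustration only; the function name `merge_by_extension`, the `(extension, description)` pair format, and the sample descriptions are assumptions, not what the parse script actually uses.

```python
from collections import OrderedDict


def merge_by_extension(entries):
    """Combine usage descriptions for entries that share an extension.

    `entries` is an iterable of (extension, description) pairs in any order
    (hypothetical format, chosen for this sketch).
    """
    merged = OrderedDict()
    for ext, description in entries:
        key = ext.strip().upper()  # treat "zoo" and "ZOO" as the same extension
        descriptions = merged.setdefault(key, [])
        if description not in descriptions:  # skip exact repeats
            descriptions.append(description)
    return merged


# Made-up sample data: the two ZOO entries are not adjacent.
entries = [
    ("ZOO", "first sample description"),
    ("ASD", "placeholder description"),
    ("zoo", "second sample description"),
]
result = merge_by_extension(entries)
```

Because the lookup is by key rather than by comparing each entry with the previous one, unsorted input no longer produces duplicate extensions.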
I see one duplicate with ZOO. You could sort the input before the parse step using the Unix sort command.
A couple of other things I noticed:
1) Some of the snippets end in two periods, e.g.
A file with this extension may be a Zoo Tycoon saved game for Any game in the Zoo Tycoon (first) series..
2) Some of the snippets contain external links (which they shouldn't), e.g. ASD.
The input per extension is spread over multiple lines, so sorting directly won't work. I'll take a look at cleaning up links/excess punctuation.
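For the link and punctuation cleanup, something along these lines might work. It is a minimal sketch in Python that assumes the links show up as HTML anchor tags inside the snippet text; if they appear in another form the pattern would need adjusting. `clean_snippet` is a hypothetical helper name, not something already in the parse script.

```python
import re

# Assumption: external links appear as <a ...>text</a> inside the snippet.
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
# Exactly two trailing periods; a three-dot ellipsis is left alone.
DOUBLE_PERIOD_RE = re.compile(r"(?<!\.)\.\.$")


def clean_snippet(text):
    """Strip anchor tags (keeping their text) and fix a doubled final period."""
    text = LINK_RE.sub(r"\1", text)             # keep link text, drop the tag
    text = re.sub(r"\s{2,}", " ", text).strip()  # tidy leftover whitespace
    text = DOUBLE_PERIOD_RE.sub(".", text)       # ".." at the end becomes "."
    return text
```

Running this on the Zoo Tycoon snippet quoted above would leave a single period at the end and change nothing else.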
@whee I'm working on integrating this fathead and pushing it live but I need the parse script to use UTF-8 encoding for output.txt. I'd really appreciate it if you could update this for me.
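If the parse script is (or can be) written in Python 3, forcing UTF-8 on the output could look roughly like this. This is only a sketch under that assumption: `write_output` and the `lines` argument are hypothetical names, and the actual script may build its output differently.

```python
def write_output(lines, path="output.txt"):
    """Write each line to `path` as UTF-8, regardless of the system locale."""
    # An explicit encoding avoids falling back to the platform default;
    # newline="\n" keeps line endings identical across platforms.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for line in lines:
            out.write(line + "\n")
```

Calling `write_output(lines)` with the default path would write output.txt in UTF-8 in the current directory.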