All wordlists are frequency sorted and deduplicated within each collection.
I had to leave out a chunk of data because it was still in its original
email:hash format.
However, most of it would likely have been duplicates, so I considered this acceptable.
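As a rough illustration of "frequency sorted and deduped", here is a minimal Python sketch. It is not the tooling actually used for these wordlists; the function name `frequency_sort_dedupe` is hypothetical, and it assumes the whole list fits in memory, which multi-GB files generally would not without chunking or external sorting.

```python
from collections import Counter

def frequency_sort_dedupe(lines):
    """Return each distinct line once, most frequent first.

    Assumption: ties keep first-seen order and blank lines are dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    counts = Counter(word for word in stripped if word)
    # sorted() is stable and Counter keeps insertion order,
    # so equally frequent words stay in the order they first appeared.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return [word for word, _ in ranked]
```

For a file, the same function could be fed an open file handle, since iterating a file yields its lines.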
Total size: Size of the original file after frequency sorting and deduplication
Only commons: Size of the file after it was compared against the rest and only the common lines were kept
% common: How much of the original file remained after keeping only the common lines
% decrease: How much of the original file wasn't common (i.e. lines that occurred only once / were unique)
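The definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used to produce the figures: `only_commons` and `common_stats` are hypothetical names, "common" is assumed to mean "also present in at least one other collection", and percentages are assumed to be rounded to whole numbers. Because the sizes listed are themselves rounded, recomputed percentages can differ from the listed ones by a point.

```python
def only_commons(wordlist, other_wordlists):
    """Keep the lines of wordlist that also appear in at least one other list.

    Assumption: this is what "tested with the rest" means; order is preserved.
    """
    seen_elsewhere = set()
    for other in other_wordlists:
        seen_elsewhere.update(other)
    return [word for word in wordlist if word in seen_elsewhere]

def common_stats(total_size, commons_size):
    """Return (% common, % decrease) as whole percentages of the original size."""
    pct_common = round(100 * commons_size / total_size)
    return pct_common, 100 - pct_common
```

For example, `common_stats(8.75, 6.92)` yields `(79, 21)`, matching the second entry in the list of results.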
Total size: 3.98 GB
Only commons: 3.95 GB
% common: 99%
% decrease: 1%
Total size: 8.75 GB
Only commons: 6.92 GB
% common: 79%
% decrease: 21%
Total size: 1.40 GB
Only commons: 1.06 GB
% common: 76%
% decrease: 24%
Total size: 5.13 GB
Only commons: 4.95 GB
% common: 96%
% decrease: 4%
Total size: 1.94 GB
Only commons: 1.91 GB
% common: 98%
% decrease: 2%
Total size: 2.79 GB
Only commons: 2.61 GB
% common: 93%
% decrease: 7%