simhash-minhash-algorithm

A simple script implementing the Simhash algorithm for near-duplicate document detection. It reads the 'text' column of 'emails.csv' and uses the Simhash fingerprints of those documents to find near duplicates.
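To make the idea concrete, here is a minimal sketch of how a Simhash fingerprint and the comparison between two fingerprints might be computed. It is not the repository's actual code: the function names, the 64-bit width, the word-level tokenizer, and the use of MD5 as the per-token hash are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint width; an assumption, not taken from the script


def tokens(text):
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"\w+", text.lower())


def token_hash(token):
    """Stable 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Return a BITS-wide Simhash fingerprint of text as an int."""
    votes = [0] * BITS
    # Weight each distinct token by how often it occurs in the text.
    for token, weight in Counter(tokens(text)).items():
        h = token_hash(token)
        for i in range(BITS):
            # The token votes +weight if bit i of its hash is set, -weight otherwise.
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:  # bit i is set when the positive votes win
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In a script like this one, each value of the 'text' column (read, for example, with `csv.DictReader`) would be passed to `simhash`, and two documents would be treated as near duplicates when `hamming_distance` between their fingerprints falls below a small threshold. The threshold is a tunable choice; values of a few bits are typical for 64-bit fingerprints.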