Clustering of strings using fuzzy string matching and the KMeans algorithm.
python string_clustering.py json_file_name field_name no_of_clusters
- json_file_name: Name of the input JSON file
- field_name: Name of the JSON field whose strings are clustered
- no_of_clusters: Number of clusters into which the strings are to be grouped
* If the input file is in another directory, enter the full path, e.g. D:/FuzzyStringMatch/data/sample_data.json
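
The three positional arguments above could be read roughly as follows. This is only a sketch: the function names `parse_args` and `load_field` are hypothetical, and it assumes the input file holds a JSON array of objects, which may not match how string_clustering.py actually reads its input.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the three positional arguments described in the usage line."""
    parser = argparse.ArgumentParser(
        description="Cluster the strings found in one field of a JSON file."
    )
    parser.add_argument("json_file_name", help="path to the input JSON file")
    parser.add_argument("field_name", help="name of the JSON field to cluster")
    parser.add_argument("no_of_clusters", type=int, help="number of clusters")
    return parser.parse_args(argv)


def load_field(path, field_name):
    """Return the values of field_name as strings.

    Assumption: the file contains a JSON array of objects; records that
    lack the field (or hold null for it) are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [
        str(record[field_name])
        for record in records
        if isinstance(record, dict) and record.get(field_name) is not None
    ]
```

Calling `parse_args()` with no argument reads `sys.argv`, so the same function serves both the command line and direct calls such as `parse_args(["sample_data.json", "field04", "25"])`.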
CSV or TSV files can also be used; load them with the pandas read_csv function.
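
As a rough illustration of the CSV/TSV route, the column to cluster can be pulled out with `pandas.read_csv`. The helper name `load_column` is hypothetical and not part of the script; the sketch assumes the file has a header row containing the field name.

```python
import pandas as pd


def load_column(source, field_name, sep=","):
    """Read a delimited file and return one column as a list of strings.

    source may be a path or a file-like object; pass sep="\t" for TSV.
    Rows where the column is empty are dropped.
    """
    frame = pd.read_csv(source, sep=sep)
    return frame[field_name].dropna().astype(str).tolist()
```

For a tab-separated file the call would look like `load_column("sample_data.tsv", "field04", sep="\t")`, where the file name is only a placeholder.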
python string_clustering.py D:/FuzzyStringcomparision/data/sample_data.json field04 25
- This generates an output file named Report.txt in which the strings from the JSON field field04 are clustered into 25 clusters.
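
To give a feel for how fuzzy matching and KMeans can fit together, here is a minimal standard-library sketch. It is not the script's own code: it swaps in `difflib.SequenceMatcher` for whatever fuzzy matcher the script uses, represents each string by its row of similarity scores against all the strings, and runs a small hand-written k-means with a deterministic farthest-first initialisation instead of a library KMeans. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_rows(strings):
    """Describe each string by its fuzzy similarity to every string in the list."""
    return [[SequenceMatcher(None, a, b).ratio() for b in strings] for a in strings]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50):
    """Plain k-means returning one cluster label per point.

    Assumption: centroids start from the first point and then the point
    farthest from the centroids chosen so far, so results are repeatable.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_strings(strings, no_of_clusters):
    """Cluster strings by running k-means on their similarity rows."""
    return kmeans(similarity_rows(strings), no_of_clusters)
```

For example, `cluster_strings(["apple", "appel", "aple", "banana", "bananna", "banan"], 2)` places the three apple-like spellings in one cluster and the three banana-like spellings in the other. The similarity matrix grows quadratically with the number of strings, so this approach suits modest input sizes.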