Cluster Strings #6

Merged
MaartenGr merged 8 commits into master from feature-cluster on Dec 7, 2020


@MaartenGr MaartenGr commented Dec 7, 2020

Adds the ability to cluster a single list of strings (#5). First, match the list against itself:

from polyfuzz import PolyFuzz

# A single list of strings, used as both the "from" and the "to" list
one_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
model = PolyFuzz("TF-IDF")
model.match(one_list, one_list)

You can then cluster the strings that the original strings were mapped to, using single-linkage clustering:

model.group(link_min_similarity=0.75, group_all_strings=True)

The resulting clusters can be accessed with:

model.get_clusters()
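
To illustrate what single-linkage clustering with a minimum similarity does, here is a minimal, standard-library-only sketch. It is not PolyFuzz's implementation: the function name `single_linkage_clusters` is hypothetical, and `difflib.SequenceMatcher` is an assumed stand-in for the TF-IDF similarity used above. Two strings land in the same cluster whenever a chain of pairs, each at or above the threshold, connects them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def single_linkage_clusters(strings, min_similarity=0.75):
    """Hypothetical helper: group strings linked by chains of similar pairs."""
    # Union-find structure; every string starts as its own cluster root.
    parent = {s: s for s in strings}

    def find(s):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(parent, 2):
        # Assumed stand-in similarity, not the TF-IDF score from the PR.
        if SequenceMatcher(None, a, b).ratio() >= min_similarity:
            # One sufficiently similar pair is enough to merge two clusters.
            parent[find(a)] = find(b)

    clusters = {}
    for s in parent:
        clusters.setdefault(find(s), []).append(s)
    # Report only clusters that contain more than one string.
    return [sorted(members) for members in clusters.values() if len(members) > 1]


one_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
clusters = single_linkage_clusters(one_list)
print(clusters)  # [['appl', 'apple', 'apples']]
```

With this stand-in similarity, "apple", "apples", and "appl" end up in one cluster, while "recal", "house", and "similarity" stay unclustered. The scores and groupings produced by the TF-IDF model may differ.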

@MaartenGr

Removed Python 3.6 from the workflow for now. A newer NumPy version drops support for it, and the workflow tries to grab that newer version even though I explicitly specify an earlier one...

@MaartenGr MaartenGr merged commit 74945f8 into master Dec 7, 2020
@MaartenGr MaartenGr deleted the feature-cluster branch December 11, 2020 07:01