no clip labels question #19

Open
sneccc opened this issue Mar 9, 2023 · 3 comments

Comments

@sneccc

sneccc commented Mar 9, 2023

When I use no CLIP labels together with estimate_k = True, I get only 2-3 clusters. Is there a way to increase this number and force more clusters of images with similar features, without disabling estimate_k? If I disable estimate_k, I have to guess roughly how many clusters I need, and I end up with too many clusters.

@LumenPallidium
Owner

I think you would need to modify the get_best_kmeans function, which relies on the silhouette score. Some other options for clustering metrics can be found here:
https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation
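
As a rough illustration of the kind of change this would involve (a sketch only, not the actual get_best_kmeans code, whose internals are not shown in this thread), the search over k can be given a lower bound and a swappable metric. The names pick_k, k_min, k_max and score_fn are hypothetical; KMeans and silhouette_score are standard scikit-learn calls.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k(embeddings, k_min=4, k_max=12, score_fn=silhouette_score, seed=0):
    """Return (best_k, scores), searching only k_min <= k <= k_max.

    Hypothetical helper: score_fn must be "higher is better" and take
    (embeddings, labels), like sklearn's silhouette_score.
    """
    # Silhouette-style metrics need fewer clusters than samples.
    k_max = min(k_max, len(embeddings) - 1)
    if k_min < 2 or k_min > k_max:
        raise ValueError("need 2 <= k_min <= k_max < number of samples")

    scores = {}
    for k in range(k_min, k_max + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(embeddings)
        scores[k] = score_fn(embeddings, labels)

    # Pick the k with the highest score within the allowed range.
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Raising k_min forces more clusters while still letting the metric choose among the remaining candidates. Other scikit-learn metrics such as calinski_harabasz_score can be passed as score_fn directly; davies_bouldin_score is lower-is-better, so it would need to be negated first.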

@sneccc
Author

sneccc commented Jun 20, 2023

@LumenPallidium In hierarchical clustering, can't we use CLIP with tags? For example, we first compare the top-level tags, e.g. "art, realism, design"; each image picks one label and goes down the tree. If it picks art, it then compares, for example, "watercolor, pointillist, oil painting, graffiti", and so on. That way we can define a structure and tell CLIP to pick the best of the options at each step, and at the end everything is organized nicely. For example, an image of a lion could end up in the [realism -> wild photography -> lion] folder.

I was thinking we could define in a JSON file that each node has children, so instead of CLIP comparing all the tags at the same time, we compare level by level down the tree until it reaches a final leaf.
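
A minimal sketch of that level-by-level idea, assuming a nested JSON object where each key is a tag and each value holds its children (an empty object marks a leaf). The names classify_path, score_fn and toy_score_fn are hypothetical; score_fn stands in for whatever returns CLIP image-text similarities, and the toy scorer below does simple keyword matching just so the example runs without a model.

```python
import json

# Hypothetical tag tree: keys are labels, values are child nodes.
TREE_JSON = """
{
  "art": {"watercolor": {}, "oil painting": {}},
  "realism": {"wild photography": {"lion": {}, "zebra": {}}},
  "design": {}
}
"""


def classify_path(image, tree, score_fn):
    """Walk the tree from the root, picking the top-1 label at each level."""
    path = []
    node = tree
    while node:  # an empty dict means we reached a leaf
        labels = list(node)
        scores = score_fn(image, labels)  # one score per candidate label
        best_index = max(range(len(labels)), key=scores.__getitem__)
        best_label = labels[best_index]
        path.append(best_label)
        node = node[best_label]
    return path


def toy_score_fn(image, labels):
    """Stand-in for CLIP: 1.0 if the label is among the image's keywords."""
    return [1.0 if label in image else 0.0 for label in labels]


tree = json.loads(TREE_JSON)
lion_image = {"realism", "wild photography", "lion"}  # fake "image"
print(" -> ".join(classify_path(lion_image, tree, toy_score_fn)))
```

Only the siblings at the current node are compared at each step, so a wrong pick near the root cannot be corrected further down; that trade-off is inherent to this greedy traversal.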

I am not sure how accurate hierarchical clustering is if it doesn't use CLIP. I tried it, and in the 3D plot it looked off.
I don't know if it needs more time to train or any adjustments. It renames the files to just numbers, and I don't know what that means. From what I saw, it should plot a dendrogram, no?
[image]

@LumenPallidium
Owner

Hmm, that is an interesting idea, I can look into it. I think it might look something like "given the nth level of the hierarchical cluster, what is the top-1 class". I'm not sure it could be guaranteed to follow an exact hierarchy in "class-space" though, but I will think about it some more.

As for the second part, I actually did not intend to use hierarchical clustering with plotting (since the end result is always a unique label for each point). It does rename the files to numbers (there are args, like n_symbols, for the HierarchicalClusterer class that can change this if you have many files), in such a way that they are sorted in a gradient based on their categorical similarity (very nice imo 😊). Attached is an example pic:
[Image: Screenshot 2023-06-20 at 10 13 46 PM]

I'll document this some more, and perhaps add some warnings at runtime in case hierarchical clustering is used with the visualization, or k-means is used for reorganizing with rename enabled (k-means should only be used when rename = False).
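
To illustrate the numeric renaming described above, here is one way such a similarity-sorted ordering could be produced. This is a sketch under assumptions, not necessarily how HierarchicalClusterer does it: it uses SciPy's linkage and leaves_list to get the dendrogram leaf order and then assigns zero-padded numbers in that order. The function name similarity_ordered_names and its extension parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def similarity_ordered_names(embeddings, extension=".png"):
    """Return one numeric filename per row, numbered in dendrogram leaf order."""
    # Agglomerative clustering; Ward linkage is an arbitrary choice here.
    merge_tree = linkage(embeddings, method="ward")
    # Row indices in the order the leaves appear along the dendrogram,
    # so neighbouring positions tend to hold similar items.
    order = leaves_list(merge_tree)

    # Zero-pad so that alphabetical sorting matches numeric sorting.
    width = len(str(len(order) - 1))
    names = [None] * len(order)
    for position, row_index in enumerate(order):
        names[row_index] = f"{position:0{width}d}{extension}"
    return names
```

Every point ends up with its own unique label, which is why a cluster-colored plot is not very informative for this mode, while sorting the renamed files by name walks through the items in similarity order.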
