Create clusters of unassigned faces with variable "similarity score" #945

Open
g3n35i5 opened this issue Aug 14, 2023 · 2 comments
Labels
enhancement New feature or request
Projects


g3n35i5 commented Aug 14, 2023

Describe the feature you'd like to request

First of all, I would like to thank all the developers for this awesome app! I use it in combination with Memories and it works really well. I suggested in pulsejet/memories#721 that it would be a cool feature to cluster unassigned faces by a variable similarity score. The developer made a few suggestions and comments, but pointed out that it would make more sense to implement this feature directly in recognize.

Describe the solution you'd like

A UI slider (with a sane default value, let's say 70%) that sets the minimum similarity score for grouping persons. The list/grid of unassigned faces is then clustered so that faces whose similarity index is greater than the value selected with the slider end up in the same group.
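
To make the request concrete, here is a rough sketch of what such a slider could drive. It assumes each unassigned face is represented by an embedding vector and that "similarity" means cosine similarity; both are assumptions for illustration, and the names `cosine_similarity` and `group_faces` are hypothetical, not part of recognize or Memories.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_faces(embeddings, threshold=0.7):
    """Group face indices whose similarity exceeds `threshold`.

    Faces are linked transitively (single linkage) using a small
    union-find structure. `threshold` plays the role of the slider.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent pointers to the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2-D "embeddings"; real face descriptors have many more dimensions.
faces = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
print(group_faces(faces, threshold=0.7))    # [[0, 1], [2, 3]]
print(group_faces(faces, threshold=0.995))  # [[0], [1], [2], [3]]
```

Moving the threshold up splits groups apart, moving it down merges them, which is the behaviour the slider is meant to expose.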

Describe alternatives you've considered

N/A

@g3n35i5 g3n35i5 added the enhancement New feature or request label Aug 14, 2023
@github-actions github-actions bot added this to Backlog in Recognize Aug 14, 2023
@marcelklehr
Member

Hello @g3n35i5

Thanks for your kind words and for giving your input on the face clustering. We're always open to improving the clustering algorithm. We used to use an implementation of the DBSCAN algorithm, which basically takes the vector distance between two face description vectors to decide whether they should be in the same cluster or not. This did not lead to great results, however. We now use HDBSCAN, which is more involved than DBSCAN in that it also factors in the density of face detections in the vector space. There are some variables that can be tweaked, but they are non-trivial. So far I've opted to go for a good default experience rather than implementing knobs for people to tweak the system themselves. We have calibrated the current variables on a dataset of 20,000 pictures from IMDB and I really think the current settings yield the best results we are going to get with the current face descriptor model, i.e. if we want better results, I think we need to invest more effort than simply changing a variable.
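
For readers unfamiliar with the difference, the sketch below illustrates one way density enters the picture: HDBSCAN works on the mutual reachability distance rather than the raw distance between two points. This is a simplified illustration, not recognize's implementation; the function names, the sample points, and the neighbour count `k` are all made up for the example, and the core distance here excludes the point itself, which is one of several conventions.

```python
import math


def core_distance(points, i, k):
    """Distance from point i to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]


def mutual_reachability(points, i, j, k=2):
    """max(core distance of i, core distance of j, raw distance between i and j)."""
    return max(
        core_distance(points, i, k),
        core_distance(points, j, k),
        math.dist(points[i], points[j]),
    )


# Three points in a dense corner, plus two points in a sparse region.
points = [(0, 0), (0, 1), (1, 0), (3, 0), (8, 0)]

# Raw distance, which is what a fixed DBSCAN-style radius compares against.
print(math.dist(points[2], points[3]))       # 2.0
# Point 3 sits in a sparse area, so its core distance pushes the pair apart.
print(mutual_reachability(points, 2, 3))     # 3.0
```

With a fixed radius of, say, 2.5, points 2 and 3 would count as neighbours by raw distance, while their mutual reachability distance of 3.0 reflects that point 3 lies in a sparser part of the space. The variables mentioned above influence how such density estimates turn into clusters, which is why changing them is less straightforward than moving a single distance threshold.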

Also, feel free to chime into the discussion thread at #754 ♥️

Author

g3n35i5 commented Aug 16, 2023

Thanks @marcelklehr for your quick feedback! I can completely understand why under these circumstances further feature development from your side has a lower priority than improving the algorithm itself. Unfortunately, I can only provide limited input on the topic of clustering algorithms.

Projects
Recognize
Backlog
Development

No branches or pull requests

2 participants