
Maxsum returns wrong similarities #92

Closed
kunihik0 opened this issue Feb 23, 2022 · 4 comments

Comments

@kunihik0
Contributor

kunihik0 commented Feb 23, 2022

Hi, thank you for the great KeyBERT!
I think the way _maxsum.py looks up distances by index is wrong. The similarity between the document and each selected candidate is read using the candidate's position in the shortlist of candidates (words_idx) instead of its index in the full word list, so the wrong similarity is returned.
I think the following line needs to be changed:

return [(words_vals[idx], round(float(distances[0][idx]), 4)) for idx in candidate]
to
return [(words_vals[idx], round(float(distances[0][words_idx[idx]]), 4)) for idx in candidate]

https://github.com/MaartenGr/KeyBERT/blob/master/keybert/_maxsum.py#:~:text=return%20%5B(words_vals%5Bidx%5D%2C%20round(float(distances%5B0%5D%5Bidx%5D)%2C%204))%20for%20idx%20in%20candidate%5D
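To make the index mix-up concrete, here is a minimal pure-Python sketch of the max-sum selection. It is not KeyBERT's actual code: it assumes toy 2-D vectors instead of real embeddings, uses a hand-written cosine helper in place of scikit-learn's cosine_similarity, keeps distances as a flat list rather than a 1×N array (so distances[i] here plays the role of distances[0][i]), and adds a hypothetical buggy flag so the old and new lookups can be compared side by side. The names words_idx, words_vals, distances and candidate mirror the line quoted above.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_sum_sketch(
    doc_emb: Sequence[float],
    word_embs: Sequence[Sequence[float]],
    words: Sequence[str],
    top_n: int,
    nr_candidates: int,
    buggy: bool = False,  # hypothetical flag: reproduce the pre-fix lookup
) -> List[Tuple[str, float]]:
    # distances[i]: similarity of words[i] to the document (indexed like `words`).
    distances = [cosine(doc_emb, w) for w in word_embs]

    # words_idx: positions in `words` of the nr_candidates most document-similar words.
    words_idx = sorted(range(len(words)), key=lambda i: distances[i])[-nr_candidates:]
    words_vals = [words[i] for i in words_idx]

    # Pairwise word similarities, indexed by position *within words_idx*.
    cand = [[cosine(word_embs[i], word_embs[j]) for j in words_idx] for i in words_idx]

    # Choose the top_n candidates whose summed pairwise similarity is smallest.
    candidate = min(
        itertools.combinations(range(len(words_idx)), top_n),
        key=lambda combo: sum(cand[i][j] for i in combo for j in combo if i != j),
    )

    # Each idx in `candidate` is a position in words_idx, not in `words`.
    if buggy:
        # Pre-fix behaviour: treats idx as if it indexed `distances` directly.
        return [(words_vals[idx], round(distances[idx], 4)) for idx in candidate]
    # Fixed behaviour: map idx back to the original word index first.
    return [(words_vals[idx], round(distances[words_idx[idx]], 4)) for idx in candidate]


# Toy 2-D "embeddings"; the document points along the first axis.
words = ["apple", "banana", "car", "truck", "bicycle"]
doc = [1.0, 0.0]
embs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]]

print(max_sum_sketch(doc, embs, words, top_n=2, nr_candidates=3))
# [('bicycle', 0.7071), ('apple', 0.9939)]
print(max_sum_sketch(doc, embs, words, top_n=2, nr_candidates=3, buggy=True))
# [('bicycle', 0.9939), ('apple', 0.1104)]
```

In this toy run the shortlist is words_idx = [4, 1, 0] (bicycle, banana, apple). With the old lookup, apple sits at position 2 of the shortlist, so it is reported with distances[2], which is the similarity of car, and bicycle (position 0) gets apple's score. Mapping through words_idx first returns each keyword's own similarity.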

@MaartenGr
Owner

Sorry for the late response and thank you for tracking this down! I'll make sure it gets fixed in the next release.

@kunihik0
Contributor Author

kunihik0 commented Mar 2, 2022

Thank you for your reply.
I had never heard of forking before, but I have learned a bit about it, so I opened a pull request with the fix.
Please feel free to use it if it looks good after you have reviewed it.

@MaartenGr
Owner

Thank you for the pull request! It has been merged and will be included in the next PyPI release.

@kunihik0
Contributor Author

Thanks for checking and merging!
