KeyBERT with llm and embedding model: endless loop during extract_keywords #190
Strange, not what I would expect. Could you share the full code? That would make it easier for me to reproduce the issue and find out what is going wrong. Before I test things out, it might be related to the threshold.
First, thank you for your prompt reply. I include the full code below. I did a few experiments, and yes, the issue is related to the threshold. Here is the interesting part: I tested several thresholds, going down from 0.7, which works. Keywords are generated for a threshold of 0.57 (in about 2-3 seconds for the docs in the sample below), but not for a threshold of 0.56. A strange and interesting fact!
Thanks for sharing the code that currently works; it will definitely help others who run into the same issue. That is definitely interesting. It seems that this particular embedding model has a specific distribution of similarities when cosine similarity is applied.
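To illustrate the threshold sensitivity discussed above, here is a hypothetical, stdlib-only sketch (not KeyBERT's actual internals) of how grouping documents by a cosine-similarity threshold can flip behavior with a tiny change in the cutoff. The `group_by_threshold` helper and the toy vectors are my own illustration, chosen so their pairwise similarity lands near 0.565, right between the two thresholds tested in this thread.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def group_by_threshold(embeddings, threshold):
    """Greedily group vectors whose similarity to a group's first member
    meets the threshold (a simplified stand-in, not KeyLLM's real logic)."""
    groups = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine_similarity(emb, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Two unit vectors with cosine similarity exactly 0.565,
# sitting between the 0.56 and 0.57 thresholds from the thread.
vecs = [[1.0, 0.0], [0.565, math.sqrt(1 - 0.565 ** 2)]]
print(group_by_threshold(vecs, 0.57))  # -> [[0], [1]]  (kept separate)
print(group_by_threshold(vecs, 0.56))  # -> [[0, 1]]    (merged)
```

If an embedding model produces similarities clustered tightly in one region, a threshold just above or below that region can change how many documents get merged, which in turn changes how much work the LLM step has to do.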
Intriguing indeed! Thanks again for your help!
Cannot get any output if I use both an LLM and an embedding model in KeyBERT. It runs for more than 10 minutes on an input of 10-15 tokens (an arXiv paper title, for example). I have no issues with KeyBERT if I don't specify the embedding model. I use Google Colab Pro with an A100 or V100 GPU and high RAM.
Sample code, based on the Medium blog post and the GitHub code (imports, prompt, and docs samples omitted for clarity):