Weird Behaviour for Finetuning Embeddings #2646
Hello!

This all sounds quite reasonable. (I'm assuming you're using Sentence Transformers before v3.0 here.) What is the format of your InputExample exactly? Is it this one? InputExample(texts=["Represent this sentence for searching relevant passages: my query", "my positive", "my negative 1", "my negative 2", "my negative 3", "my negative 4", "my negative 5"]) Note in particular the prompt for the query, which the https://huggingface.co/BAAI/bge-large-en-v1.5 model recommends. Could that be the reason? Other than that, your setup seems like it should work well, and you should have enough training samples to see a meaningful improvement.

Lastly, MultipleNegativesRankingLoss uses the provided negatives as well as the "in-batch negatives", i.e. all positives and negatives from other queries in the same batch. If there is a lot of exact overlap across your training samples, then it's possible that many of these "in-batch negatives" are actually relevant to your query. In that case, you would be training with false negatives, which can hurt performance.
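For illustration, a minimal sketch of that input format, assuming the pre-v3.0 InputExample API and that only the query gets the bge prompt (the passages are left as-is):

```python
from sentence_transformers import InputExample

# BAAI/bge-* models recommend prefixing the query (not the passages) with this prompt.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

example = InputExample(texts=[
    QUERY_PROMPT + "my query",  # anchor/query, with the recommended prompt
    "my positive",              # positive passage, no prompt
    "my negative 1",
    "my negative 2",
    "my negative 3",
    "my negative 4",
    "my negative 5",
])
```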
In essence, GISTEmbedLoss is MultipleNegativesRankingLoss, but it uses a "guide model" to ignore some in-batch negatives if the guide model deems them likely to be false negatives, i.e. actually relevant to the query.
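A rough sketch of what swapping in GISTEmbedLoss could look like (the guide checkpoint here is only an illustrative choice, not a recommendation from this thread):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
# Any reasonably strong, already-trained embedding model can act as the guide;
# this particular checkpoint is just a placeholder.
guide = SentenceTransformer("all-MiniLM-L6-v2")

# Drop-in replacement for MultipleNegativesRankingLoss: in-batch candidates that the
# guide model scores as more similar to the anchor than the positive are not used as negatives.
train_loss = losses.GISTEmbedLoss(model=model, guide=guide)
```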
@tomaarsen thanks for the detailed explanation.
Update: I found this similar issue: #2358. Thanks in advance!
Hi @tomaarsen, any update on this? Thanks in advance.
Hello! I'm afraid not, I've been busy with the upcoming v3.0 release. You can try any of the example scripts to see if you can reproduce this somehow, or you can switch to another model. I don't have any other great ideas, but the prompt/input format is quite important, so that might be it.
Hi,
Background -
I am trying to fine-tune the BGE-Large model ('BAAI/bge-large-en-v1.5') on a custom domain-specific dataset.
Data format - triplets (anchor, positive sample, negative samples); each data point has 1 anchor, 1 positive, and 5 negative samples.
Loss used - MultipleNegativesRankingLoss (earlier tried TripletLoss too)
Warmup steps - 10% of training data
Training samples - 13k samples (each sample - 1 anchor, 1 positive, 5 negative samples)
Learning rate - 2e-6 (also tried 3e-5 and 2e-5)
Scheduler - tried linear warmup, cosine, and cosine with hard restarts. (A minimal sketch of this setup is shown below.)
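Roughly, the training setup looks like the sketch below, assuming the pre-v3.0 model.fit API (the `triplets` list is a placeholder for my actual data loading):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Placeholder; the real data has ~13k of these (anchor, positive, 5 negatives) tuples.
triplets = [
    ("my query", "my positive", "neg 1", "neg 2", "neg 3", "neg 4", "neg 5"),
]

train_examples = [InputExample(texts=list(t)) for t in triplets]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=int(0.1 * len(train_dataloader)),  # ~10% of the training steps
    optimizer_params={"lr": 2e-6},
    scheduler="warmupcosine",  # also tried "warmuplinear" and "warmupcosinewithhardrestarts"
)
```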
The problem: the expectation after fine-tuning is that retrieval should rank these positives at the top and push the negatives as low as possible. What I observe instead is that the positive pairs are pushed down in rank by the fine-tuned model; pairs that used to appear at the top (or within the top 10) are degraded so far that they fall outside the top 50.
To add to it, when I check after 1 epoch the results get a little better, but training beyond 1 epoch degrades them further. For example, out of roughly 7k positive pairs that were ranked 1 for their anchor before, only about 3k remain at rank 1 afterwards, and the rest drop below the top 50.
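For context, a minimal sketch of one way such a per-epoch rank check can be quantified with the built-in InformationRetrievalEvaluator (the ids and texts below are placeholders, not my actual data):

```python
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder held-out split: query id -> text, doc id -> text, query id -> relevant doc ids.
queries = {"q1": "my query"}
corpus = {"d1": "my positive", "d2": "my negative 1"}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dev")

# Passing this to model.fit(..., evaluator=evaluator, evaluation_steps=500) logs
# MRR / NDCG / recall@k during training, which makes the per-epoch degradation visible.
```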
I've been struggling to figure out why this happens and how to improve it.
Thanks in advance