Fine Tuning Models #350
@nreimers, can you please help us out here? Any help will be highly appreciated. Thanks a lot 😄
Hi @farhaanbukhsh

Do you want multi-label classification for single sentences? Then I can recommend using the transformers package and fine-tuning the model there. Sentence-Transformers is not needed for that.

Do you want to fine-tune sentence embeddings with a training set where you only have labels and single sentences, and then use the sentence embeddings, for example with cosine similarity, for tasks like clustering? This is a non-trivial case and sadly there is no single "do this" answer. It heavily depends on what your dataset looks like and how you want to use the sentence embeddings afterwards.

You can have a look here at how a dataset with (label, single_sentence) rows can be used with BatchHardTripletLoss: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py

You have sentence pairs and labels? Then you can maybe use the same setup as for NLI.

Best
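A minimal sketch of that BatchHardTripletLoss setup (model name and data are placeholders; the calls are from the standard sentence-transformers API, not copied from the linked script):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder data: single sentences with integer class ids.
# For valid triplets, every batch needs at least two sentences per label;
# the linked script guarantees this with a SentenceLabelDataset.
train_examples = [
    InputExample(texts=["The heater from amazon was damaged."], label=0),
    InputExample(texts=["The blender arrived broken."], label=0),
    InputExample(texts=["Great price for this kettle."], label=1),
    InputExample(texts=["Really cheap for what you get."], label=1),
]

model = SentenceTransformer("distilbert-base-nli-mean-tokens")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.BatchHardTripletLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```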
yes, exactly what we are trying to do.
Thanks a ton @nreimers for replying, means a lot ❤️ What I have is a simple CSV file where I have assigned labels to each sentence, for example: "The heater from amazon was damaged." Labels: ['defective product', 'electric appliance']. So for this, do you think https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py can be used?
Yes, for this BatchHardTripletLoss seems a good choice. For triplet loss, have a look at this good blog article: https://omoindrot.github.io/triplet-loss

With BatchHardTripletLoss, sentences (texts) that have the same label will become close in vector space, while sentences with a different label will be further away. At the end you will have clusters in your vector space, e.g. all sentences talking about 'defective product' will be in one region, while sentences with the label 'great price' will be somewhere else in vector space.

Best
Thanks loads @nreimers 💯
Hey @nreimers, we faced a problem: the script doesn't work out of the box. The data source https://cogcomp.seas.upenn.edu/Data/QA/QC/TREC_10.label has just one sentence per example, while for evaluator = TripletEvaluator.from_input_examples(val, name='dev') to work, it needs 3 sentences. I guess there is some kind of confusion here; could you point me to the right data source to be used?
Hi @farhaanbukhsh The triplets for dev & test have to be created before they are passed to TripletEvaluator. Best
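A hedged sketch of how such dev triplets could be sampled from (sentence, label) data (the helper name and random sampling strategy are illustrative, not from the repository):

```python
import random
from collections import defaultdict
from sentence_transformers import InputExample
from sentence_transformers.evaluation import TripletEvaluator

def build_triplets(labeled_sentences, n_triplets=1000, seed=42):
    """labeled_sentences: list of (sentence, label) tuples."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sentence, label in labeled_sentences:
        by_label[label].append(sentence)
    # Only labels with at least two sentences can supply anchor + positive
    pos_labels = [l for l, sents in by_label.items() if len(sents) >= 2]
    triplets = []
    for _ in range(n_triplets):
        pos_label = rng.choice(pos_labels)
        neg_label = rng.choice([l for l in by_label if l != pos_label])
        anchor, positive = rng.sample(by_label[pos_label], 2)
        negative = rng.choice(by_label[neg_label])
        triplets.append(InputExample(texts=[anchor, positive, negative]))
    return triplets

# val is assumed to be a list of (sentence, label) tuples for the dev split
evaluator = TripletEvaluator.from_input_examples(build_triplets(val), name="dev")
```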
Thanks a ton for all the work here ❤️
@nreimers do I need a GPU to run this? Getting this error:
coming from
Fixed that with the latest push, but you would need to install the framework from source. Or you can use one of the two other batch-hard losses; they work without a GPU in the latest release.
Got it :) thanks again
Hey @nreimers, I just have a few follow-up questions. I went through the TripletLoss blog that you pointed out, https://omoindrot.github.io/triplet-loss, and it did help a lot. The questions are:
I also went through https://towardsdatascience.com/image-similarity-using-triplet-loss-3744c0f67973, and there it is mentioned that we have to curate the triplet dataset and supply it to the tuning algorithm. So is human curation required here?
Hi @farhaanbukhsh

For training, the application uses what is called "Batch All / Hard / SemiHard Triplets" (also explained in the link). You generate a mini-batch with n sentences. It then checks, out of the n x n x n possible combinations, which are valid triplets such that the anchor and positive have the same label and the negative a different label. It then uses these valid triplets to compute the loss.

The quality (difficulty) of the triplets is quite important for getting good results. If the triplets are too easy, the model will not learn anything. With the Batch*Triplets approach you already create a large number of triplet combinations, hence designing difficult triplets by hand is not really needed. Human curation for training is not needed.

Best
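As an illustration of that validity check (a sketch following the linked blog post, not code from this repository), the valid-triplet mask over a batch's labels can be computed in PyTorch like this:

```python
import torch

def valid_triplet_mask(labels: torch.Tensor) -> torch.Tensor:
    """mask[a, p, n] is True iff a != p, label[a] == label[p], and label[a] != label[n]."""
    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)         # (n, n) label equality
    anchor_pos = same_label.unsqueeze(2)                            # anchor and positive share a label
    anchor_neg = (~same_label).unsqueeze(1)                         # anchor and negative differ
    idx = torch.arange(labels.size(0))
    distinct = (idx.unsqueeze(0) != idx.unsqueeze(1)).unsqueeze(2)  # anchor and positive are different items
    return anchor_pos & anchor_neg & distinct

batch_labels = torch.tensor([0, 0, 1, 2])
print(valid_triplet_mask(batch_labels).sum().item())  # 4 valid triplets in this batch
```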
Hi! Has it been moved somewhere else? I can't seem to open the link.
Hi @nreimers! Thank you for the advice written on this thread. It appears that the file https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_batch_hard_trec_continue_training.py is giving a 404 error.

@farhaanbukhsh We are working on a similar task as you - e.g. given a sentence "The tomatoes were rotten" and a set of possible labels ["fruit", "vegetable", "root"], I want to assign one label to the sentence. Did you end up using the triplet loss methodology successfully?

I was also looking at this thread. I have 2 questions:
Thank you in advance!
@thefirebanks The purpose of SentenceTransformer is to create a model that can generate meaningful embeddings for text. If your task is classification, then using sentence embeddings is in most cases the wrong approach. In that case, CrossEncoders work much better: https://www.sbert.net/examples/training/cross-encoder/README.html

Currently there is no implemented way to include that layer in the model, but you can save it with torch.save(), load it with torch.load(), and apply it on top of the sentence embeddings. As mentioned, though, if your task is classification, CrossEncoders achieve much better performance.
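A rough sketch of the CrossEncoder route (model name, data, and the pair-scoring framing are placeholders; see the linked README for the canonical examples):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Placeholder (sentence, candidate label) pairs with a 0/1 relevance label
train_examples = [
    InputExample(texts=["The tomatoes were rotten", "vegetable"], label=1),
    InputExample(texts=["The tomatoes were rotten", "root"], label=0),
]

model = CrossEncoder("distilroberta-base", num_labels=1)  # single relevance score per pair
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)

# At inference, score every (sentence, candidate label) pair and pick the best
scores = model.predict([["The tomatoes were rotten", "fruit"],
                        ["The tomatoes were rotten", "vegetable"],
                        ["The tomatoes were rotten", "root"]])
```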
I have read this thread, and your answers were helpful @nreimers. Thank you!
You can also pass a single text to the CrossEncoder class. It will work without changes.
Any idea how to predict the class of a single text and how to find the model accuracy? Thanks!
@noghte My guess would be to use the class as the second sentence in the pair, and then just look at the predictions (classes) to evaluate the accuracy. Something that we did before looking at CrossEncoders is the latent embedding approach in this blog: https://joeddav.github.io/blog/2020/05/29/ZSL.html. Our implementation for a policy project is here. We will try CrossEncoders soon! But we realized the following:
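The core of that latent-embedding idea, as a hedged sketch (model name and labels are placeholders, not our project's actual implementation):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("distilbert-base-nli-mean-tokens")
candidate_labels = ["fruit", "vegetable", "root"]

sentence_emb = model.encode("The tomatoes were rotten", convert_to_tensor=True)
label_embs = model.encode(candidate_labels, convert_to_tensor=True)

# Cosine similarity between the sentence and each candidate label embedding
scores = util.pytorch_cos_sim(sentence_emb, label_embs)[0]
predicted = candidate_labels[scores.argmax().item()]
```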
@noghte Just wanted to check in - how did you manage to do your classification task in the end with CrossEncoders? Specifically, the label prediction for a single sentence?
@thefirebanks Have a look here. Instead of passing two sentences, you just pass one sentence.
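A sketch of what that single-text variant might look like (assuming, as suggested above, that CrossEncoder accepts one-element texts lists; model name and data are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# One text per InputExample instead of a pair; label is the integer class id
train_examples = [
    InputExample(texts=["The heater from amazon was damaged."], label=0),
    InputExample(texts=["Great price for this kettle."], label=1),
]

model = CrossEncoder("distilroberta-base", num_labels=2)  # num_labels = number of classes
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1)

scores = model.predict([["A brand new sentence"]])  # one row of class scores per input
```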
@nreimers Thank you for your quick reply! At the top of the file you linked, it says:
When I try giving one sentence, I build the train dataloader in this way:
where the training samples are single-sentence InputExamples, and I did something similar for the dev set.
However, during training the accuracy remains at 0.0 even after 10 epochs. Am I missing something?
@thefirebanks I could not use CrossEncoders, so I just skipped fine-tuning for now. However, I am planning to try your latent embedding approach later: https://joeddav.github.io/blog/2020/05/29/ZSL.html
@thefirebanks As evaluator, you can use CESoftmaxAccuracyEvaluator.
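For example, reusing the model and dataloader from the single-text sketch above (dev_examples is assumed to hold single-text InputExamples with integer class labels):

```python
from sentence_transformers.cross_encoder.evaluation import CESoftmaxAccuracyEvaluator

# Accuracy of the argmax over the softmax class scores on the dev set
evaluator = CESoftmaxAccuracyEvaluator.from_input_examples(dev_examples, name="dev")
model.fit(train_dataloader=train_dataloader,
          evaluator=evaluator,
          epochs=10,
          evaluation_steps=500)
```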
Thank you, using CESoftmaxAccuracyEvaluator worked! @noghte I simply replicated the script that @nreimers linked to, and it worked - the only modification was that I needed to switch
Hi @nreimers I want to use sentence-transformers for my multi-label classification problem (say, classifying documents with multiple labels). My data frame has the following pattern:

"sentence 1. sentence2 ... sentenceN" ['label1', 'label2', 'label3', 'label4']

The first column is a text composed of many sentences, and the second column holds the multiple labels (there are many labels in total, e.g. label1, label2, ..., labelN) that each text is assigned to. What might be the right way to use sentence-transformers to fine-tune a model in my case? Thank you so much for your reply and your recommendation.
Hi @ing-david You could try to use the cross-encoder with an MSE loss and an output vector [0, 1, 1, 0, 0, 1] that represents your multi-labels. Not sure if this is possible with the CrossEncoder class, though.

Otherwise, you could embed labels in a vector space and sentences in the same vector space, and then use a bi-encoder so that sentences are close in vector space to their labels. But this is also not easy to do with sentence-transformers and would require that you take a deeper look at the respective classes and how to use them for your task.
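One hedged sketch of the bi-encoder idea, using MultipleNegativesRankingLoss so that each text is pulled toward its label texts (this substitutes a concrete loss for the general idea above and is untested for this use case; data and model name are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distilbert-base-nli-mean-tokens")

# One (document, label text) pair per assigned label; the label text acts as
# the "positive" side, and other examples in the batch serve as negatives
train_examples = [
    InputExample(texts=["sentence 1. sentence2 ... sentenceN", "label1"]),
    InputExample(texts=["sentence 1. sentence2 ... sentenceN", "label2"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```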
Hi @nreimers, Thanks so much for your work! I'm looking to use the embeddings for clustering in order to better understand a distribution that will undergo regression. I have continuous labels for ~3000 short passages (<300 words). It seems that the most straightforward way to fine-tune with a labeled set is to use paired passages with cosine similarity. Do you have an intuition on whether I might be able to bootstrap those pair labels by passing my data through the bi-encoder once, and then use my continuous labels to weight these pair labels, so as not to drive the distribution too far from the target? Thanks,
P.S. My motivation for clustering (unsupervised) is that the labels of my dataset are very noisy, and I'd like to shed some light on the noise.
Hi @BillManka What you need are labels for text pairs; just labels for single texts are not sufficient. Bootstrapping the labels for the text pairs is not really necessary.
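A minimal sketch of fine-tuning with such pair labels via CosineSimilarityLoss (texts and similarity scores are placeholders; how the scores are derived from the continuous labels is a heuristic left to you, not a recipe):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-large-nli-stsb-mean-tokens")

# Hypothetical pair labels in [0, 1], e.g. reflecting how close the two
# passages' continuous regression labels are
train_examples = [
    InputExample(texts=["passage A ...", "passage B ..."], label=0.9),
    InputExample(texts=["passage A ...", "passage C ..."], label=0.1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```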
Aha. Honestly, I think I was just looking for a quick-and-dirty hack to tune the embeddings to my data a bit more.
I.e., weight the cosine similarity label by some scaled version of the distance between the pair members in the supervised space. I have no theory at all to justify it; it would be completely exploratory.
We want to fine-tune 'bert-large-nli-stsb-mean-tokens' on a multi-label classification task, so that we can use the resulting model to get embeddings out.
We have a bunch of sentences classified into labels. The prime question that I want clarification on:
The closest we have come is simpletransformers.
But is there a better way to go ahead? Thanks in advance for the help!