
Fine Tuning Models #350

Open
farhaanbukhsh opened this issue Aug 11, 2020 · 34 comments

@farhaanbukhsh

farhaanbukhsh commented Aug 11, 2020

We want to fine-tune 'bert-large-nli-stsb-mean-tokens' on a multi-label classification task, so that we can use the resulting model to get embeddings out.
We have a bunch of sentences classified into labels. The prime questions I want clarification on:

  1. What are the steps to fine-tune the models provided by sentence-transformers?
  2. Is there a script or a repo which does this that I can cross-reference here?

The closest we have come to is simpletransformers

But is there a better way to go ahead? Thanks in advance for the help

@farhaanbukhsh
Author

@nreimers can you please help us out here? Any help will be highly appreciated, Thanks a lot 😄

@nreimers
Member

Hi @farhaanbukhsh
I am not sure what you want / what type of data you have?

Do you want multi-label classification for single sentences? Then I can recommend using the transformers package to fine-tune the model. Sentence-Transformers is not needed for that.

You want to fine-tune sentence embeddings with a training set where you only have labels and single sentences, and then use the sentence embeddings, for example with cosine similarity, for tasks like clustering? This is a non-trivial case and sadly there is no single "do this" answer. It heavily depends on what your dataset looks like and how you want to use the sentence embeddings afterwards. You can have a look here at how a dataset with (label, single_sentence) pairs can be used with BatchHardTripletLoss: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py
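To illustrate the idea behind BatchHardTripletLoss, here is a simplified numpy sketch of batch-hard mining; it is purely illustrative and not the library's actual implementation:

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Toy batch-hard triplet loss: for each anchor, take the hardest
    positive (farthest same-label example) and the hardest negative
    (closest different-label example) in the batch."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise distances
    same = labels[:, None] == labels[None, :]     # same-label mask
    n = len(labels)
    losses = []
    for i in range(n):
        pos = dist[i][same[i] & (np.arange(n) != i)]  # same label, not self
        neg = dist[i][~same[i]]                       # different label
        if len(pos) and len(neg):
            losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))
```

With well-separated label clusters the loss goes to zero; with all points collapsed onto each other it equals the margin.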

You have sentence pairs and labels? Then you can maybe use the same setup as for NLI:
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_nli.py

Best
Nils Reimers

@farhaanbukhsh
Author

You want to fine-tune sentence embeddings with a training set where you only have labels and single sentences, and then use the sentence embeddings for example with cosine similarity for tasks like clustering?

yes, exactly what we are trying to do.


Thanks a ton @nreimers for replying, means a lot ❤️

What I have is a simple csv file where for a given sentence I have assigned labels, for example:

The heater from amazon was damaged. Labels: ['defective product', 'electric appliance']

So for this, do you think https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py can be used?

@nreimers
Member

Yes, for this BatchHardTripletLoss seems a good choice.

For triplet loss, have a look at this good blog article:
https://omoindrot.github.io/triplet-loss
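The basic per-triplet loss described in that article fits in a couple of lines (a minimal sketch, with d_ap and d_an standing for the anchor-positive and anchor-negative distances):

```python
def triplet_loss(d_ap, d_an, margin=1.0):
    """Plain triplet loss: push the anchor-negative distance to be at
    least `margin` larger than the anchor-positive distance."""
    return max(0.0, d_ap - d_an + margin)
```

A triplet that already satisfies the margin contributes zero loss and teaches the model nothing, which is why mining hard triplets matters.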

With BatchHardTripletLoss, sentences (texts) that have the same label will become close in vector space, while sentences with a different label will be further away. At the end you will have clusters in your vector space: e.g. all sentences talking about 'defective product' will be in one region, while sentences with the label 'great price' will be somewhere else in vector space.
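As a toy illustration of using such clustered embeddings afterwards (the 2-d vectors below are made up purely for demonstration):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend fine-tuned embeddings: same-label sentences ended up close together.
embeddings = {
    "The heater from amazon was damaged.": np.array([0.9, 0.1]),  # defective product
    "The blender arrived broken.":         np.array([0.8, 0.2]),  # defective product
    "Great price for this kettle.":        np.array([0.1, 0.9]),  # great price
}

query = np.array([0.85, 0.15])  # embedding of a new "defective" complaint
nearest = max(embeddings, key=lambda s: cosine_sim(query, embeddings[s]))
```

The query lands nearest to the 'defective product' cluster, which is what the fine-tuning aims to achieve.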

Best
Nils Reimers

@farhaanbukhsh
Author


Thanks loads @nreimers 💯

@farhaanbukhsh
Author

Hey @nreimers, we faced a problem: the script

https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py

doesn't work out of the box. The data source, https://cogcomp.seas.upenn.edu/Data/QA/QC/TREC_10.label,

has just one sentence per example, while for

evaluator = TripletEvaluator.from_input_examples(val, name='dev')

to work, it needs 3 sentences. I guess there is some confusion between how TripletReader and InputExample are being used.

If you could point me to the right data source to be used for TripletLoss, it would be a lot of help. Thanks a lot in advance.

@nreimers
Member

Hi @farhaanbukhsh
I fixed that example, it should be working now.

The triplets for dev & test have to be created before they are passed to TripletEvaluator.
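A rough sketch of how such dev/test triplets could be built from (sentence, label) pairs; the function name and structure here are illustrative, not taken from the actual script:

```python
import random

def make_random_triplets(examples, num=1000, seed=42):
    """Build (anchor, positive, negative) sentence triplets from
    (sentence, label) pairs: anchor and positive share a label,
    negative has a different one."""
    rng = random.Random(seed)
    by_label = {}
    for sent, label in examples:
        by_label.setdefault(label, []).append(sent)
    # Only labels with at least two sentences can supply anchor+positive.
    pos_labels = [l for l, sents in by_label.items() if len(sents) >= 2]
    triplets = []
    for _ in range(num):
        pos_label = rng.choice(pos_labels)
        neg_label = rng.choice([l for l in by_label if l != pos_label])
        anchor, positive = rng.sample(by_label[pos_label], 2)
        negative = rng.choice(by_label[neg_label])
        triplets.append((anchor, positive, negative))
    return triplets
```

Each resulting triplet can then be wrapped in an InputExample for the evaluator.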

Best
Nils

@farhaanbukhsh
Author


Thanks a ton for all the work here ❤️

@farhaanbukhsh
Author

farhaanbukhsh commented Aug 12, 2020

@nreimers do I need a GPU to run BatchSemiHardTripletLoss.py, or can I run it without one?

Getting this error:

AssertionError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

coming from sentence_transformers/losses/BatchSemiHardTripletLoss.py, line 60, in batch_semi_hard_triplet_loss

@nreimers
Member

Fixed that with the latest push, but you would need to install the framework from source.

Or you can use one of the two other batch-hard losses; they work without a GPU in the latest release.

@farhaanbukhsh
Author

Got it :) thanks again

@farhaanbukhsh
Author

Hey @nreimers ,

I just have a few follow-up questions. I went through the triplet loss blog post that you pointed out, https://omoindrot.github.io/triplet-loss, and it helped a lot. My questions are:

  1. In the script https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py, the triplets are formed randomly. Is this done for demonstration purposes, or can we use this technique off the shelf?

  2. In case we can't, we need to form our own set of triplets, right? Also, is this dataset, https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/datasets/wikipedia-sections-triplets.zip, related to triplet loss, and can it be used?

I also went through https://towardsdatascience.com/image-similarity-using-triplet-loss-3744c0f67973, where it is mentioned that we have to curate the triplet dataset and supply it to the tuning algorithm. So is human curation required here?

@nreimers
Member

Hi @farhaanbukhsh
The random triplets are formed only for the dev & test sets; for the train set, this is not used.
Random triplets for the dev & test sets are rather simple to distinguish. Depending on your application, you might need harder triplets to fully evaluate the performance of your model.

For training, the application uses what are called "Batch All / Hard / SemiHard Triplets" (also explained in the link). You generate a mini-batch with n sentences. It then checks, out of the n x n x n possible combinations, which are valid triplets, i.e. the anchor and positive have the same label and the negative has a different label. It then uses these valid triplets to compute the loss.
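The validity check over the n x n x n combinations can be sketched as:

```python
from itertools import product

def valid_triplets(labels):
    """Return all (anchor, positive, negative) index triplets where anchor
    and positive are distinct and share a label, and negative differs."""
    n = len(labels)
    return [(a, p, neg) for a, p, neg in product(range(n), repeat=3)
            if a != p and labels[a] == labels[p] and labels[a] != labels[neg]]

# A batch of 4 sentences with labels [0, 0, 1, 1] yields 8 valid triplets.
triplets = valid_triplets([0, 0, 1, 1])
```

Even a modest batch size produces many triplets this way, which is why hand-curating them is unnecessary.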

The quality (difficulty) of the triplets is quite important to get good results. If the triplets are too easy, the model will not learn anything. With the Batch*TripletLoss approach you already create a large number of triplet combinations, hence designing difficult triplets by hand is not really needed.

Human curation for training is not needed.

Best
Nils Reimers

@ironllamagirl

Hi!
Just wanted to inquire about this file https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_batch_hard_trec_continue_training.py

Has it been moved somewhere else? I can't seem to open the link.

@nreimers
Member

https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_batch_hard_trec_continue_training.py

@thefirebanks

Hi @nreimers! Thank you for the advice written in this thread. It appears that the file https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_batch_hard_trec_continue_training.py is giving a 404 error.

@farhaanbukhsh We are working on a similar task as you: e.g. given the sentence "The tomatoes were rotten" and a set of possible labels ["fruit", "vegetable", "root"], I want to assign one label to the sentence. Did you end up using the triplet loss methodology successfully? I was also looking at this thread.

I have 2 questions:

  • We fine-tuned a model using the code from https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli.py, but we found that the resulting model only yields embeddings and does not necessarily classify text. Is there a way of including the linear layer used for classification during fine-tuning in the saved model? That way, when we load the model, we could do something like model.predict() instead of just model.encode().
  • If not, should we load the model embeddings and fine-tune using huggingface?

Thank you in advance!

@nreimers
Member

@thefirebanks
It was renamed: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_batch_hard_trec.py

The purpose of SentenceTransformer is to create a model that can generate meaningful embeddings for text. If your task is classification, then using sentence embeddings is in most cases the wrong approach. In that case, a CrossEncoder works much better: https://www.sbert.net/examples/training/cross-encoder/README.html

Currently there is no implemented way to include that layer in the model. But you can save it with torch.save() and load it with torch.load() and apply it on top of the sentence embeddings.
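A toy sketch of applying a saved linear head on top of sentence embeddings; numpy arrays stand in here for the torch tensors you would actually save and load with torch.save()/torch.load(), and the weights below are made up:

```python
import numpy as np

def classify(embedding, W, b):
    """Apply a linear classification head to a sentence embedding and
    return the index of the highest-scoring class."""
    logits = W @ embedding + b
    return int(np.argmax(logits))

# Made-up 2-class head over 2-dimensional embeddings.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.zeros(2)

# In practice, `embedding` would come from model.encode(sentence).
pred = classify(np.array([0.2, 0.9]), W, b)
```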

But as mentioned, if your task is classification, then CrossEncoders achieve much better performance.

@noghte

noghte commented Jan 25, 2021


I have read this thread, and your answers were helpful @nreimers. Thank you!
I also have a sentence classification task, in which each row of my dataset is a long text with a label. But the CrossEncoder example you provided requires sentence pairs. I will try to figure it out, but if you have an example, it would be great if you could share the link.

@nreimers
Member

You can also pass a single text to the CrossEncoder class. It will work without changes.

@noghte

noghte commented Feb 1, 2021

You can also pass a single text to the CrossEncoder class. It will work without changes

I could fine-tune a BERT model using CrossEncoder. However, for prediction, it seems to need sentence pairs:

You pass to model.predict a list of sentence pairs. Note, Cross-Encoder do not work on individual sentence, you have to pass sentence pairs.
source: https://www.sbert.net/examples/applications/cross-encoder/README.html#cross-encoders-usage

Any idea how to predict the class of a single text and how to find the model accuracy?

Thanks!

@thefirebanks

thefirebanks commented Feb 4, 2021

@noghte My guess would be to use the class as the second sentence in the pair, and then just look at the predictions (classes) to evaluate the accuracy. Something we did before looking at Cross-Encoders is the latent embedding approach in this blog:

https://joeddav.github.io/blog/2020/05/29/ZSL.html

Our implementation for a policy project is here.

We will try CrossEncoders soon! But we realized the following:

  • Fine-tuned sentence embeddings from S-BERT, given as inputs to more commonly used classifiers such as Random Forests or SVMs, can actually give really good performance on some texts. Hope this helps!

@thefirebanks

@noghte Just wanted to check in: how did you manage to do your classification task in the end with CrossEncoders? Specifically, the label prediction for a single sentence?

@nreimers
Member

@thefirebanks Have a look here:
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/cross-encoder/training_nli.py

Instead of passing two sentences, you just pass one sentence.

@thefirebanks

thefirebanks commented Apr 11, 2021

@nreimers Thank you for your quick reply!

At the top of the file you linked, it says:

It does NOT produce a sentence embedding and does NOT work for individual sentences.

When I try giving one sentence, I build the train dataloader in this way:

from torch.utils.data import DataLoader
from sentence_transformers import InputExample

train_samples = []
for sent, label in zip(X_train, y_train):
    label_id = label2int[label]
    train_samples.append(InputExample(texts=[sent], label=label_id))

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)

where sent = "The quick fox did something" and label = 1.

and did something similar for the dev_samples:

dev_samples = []
for sent, label in zip(X_dev, y_dev):
    label_id = label2int[label]
    dev_samples.append(InputExample(texts=[sent], label=label_id))

However, while training the accuracy remains at 0.0 even after 10 epochs.

Am I missing something?

@noghte

noghte commented Apr 11, 2021

@noghte Just wanted to check-in, how did you manage to do your classification task in the end with CrossEncoders? Specifically the label prediction for a single sentence?

@thefirebanks I could not use CrossEncoders; I just ignored fine-tuning for now. However, I am planning to try your latent embedding approach later: https://joeddav.github.io/blog/2020/05/29/ZSL.html
Please let me know if you succeed in using CrossEncoders for your task.

@nreimers
Member

@thefirebanks
Likely there is some issue with how you define the evaluation or with the data you pass to it.

As evaluator, you can use: CESoftmaxAccuracyEvaluator
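In essence, that evaluator reports argmax-over-logits accuracy, which can be sketched as follows (a simplified illustration, not the library's code):

```python
import numpy as np

def softmax_accuracy(logits, gold):
    """Accuracy of argmax class predictions against gold labels; softmax
    does not change the argmax, so it can be skipped for accuracy."""
    preds = np.argmax(logits, axis=1)
    return float(np.mean(preds == np.asarray(gold)))

# Toy logits for 3 examples over 3 classes; the last prediction is wrong.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.4, 0.2]])
acc = softmax_accuracy(logits, [0, 1, 2])
```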

@thefirebanks

Thank you, using CESoftmaxAccuracyEvaluator worked!

@noghte I simply replicated the script that @nreimers linked to, and it worked; the only modification was that I needed to replace CEBinaryAccuracyEvaluator with CESoftmaxAccuracyEvaluator.

@Sok-Vichea

Hi @nreimers I want to use sentence-transformers for my multi-label classification problem (you could say document classification with multiple labels). My data frame has the following pattern:

“sentence 1.sentence2…sentenceN” [‘label1’, ‘label2’, ‘label3’, ’label4’]
“sentence 1.sentence2…sentenceN” [‘label1’, ‘label2’, ‘label3’]
.......

The first column is the text, composed of many sentences, and the second column holds the multiple labels (there are many labels in total, e.g. label1, label2, ..., labelN) that each text is assigned. What might be the right way to use sentence-transformers to fine-tune a model for my case? Thank you so much for your reply and your recommendation.

@nreimers
Member

Hi @ing-david
hmm, that is not straightforward, as sentence-transformers was not designed for such a task.

You could try to use the cross-encoder with an MSE loss and an output vector like [0, 1, 1, 0, 0, 1] that represents your multi-labels. Not sure if this is possible with the cross-encoder class.
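Building such a multi-label target vector could be sketched like this (the label names are placeholders):

```python
def encode_multilabel(assigned, all_labels):
    """Encode a document's assigned labels as a fixed-length 0/1 target
    vector over the full label set, e.g. for an MSE-style objective."""
    return [1.0 if label in assigned else 0.0 for label in all_labels]

all_labels = ["label1", "label2", "label3", "label4", "label5", "label6"]
# Produces the [0, 1, 1, 0, 0, 1] pattern mentioned above.
target = encode_multilabel(["label2", "label3", "label6"], all_labels)
```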

Otherwise, you could embed both labels and sentences in a vector space and use a bi-encoder, so that sentences are close in vector space to their assigned labels. But this is also not easy to do with sentence-transformers and would require that you take a deeper look at the respective classes and how to use them for your task.

@BillManka

BillManka commented Jun 25, 2021

Hi @nreimers,

Thanks so much for your work! I'm looking to use the embeddings for clustering, in order to better understand a distribution that will undergo regression. I have continuous labels for ~3000 short passages (<300 words).

It seems that the most straightforward way to fine-tune with a labeled set is to use paired passages with cosine similarity. Do you have an intuition on whether I might be able to bootstrap those pair labels by passing my data through the bi-encoder once, and then use my continuous labels to weight these pair labels, so as not to drive the distribution too far away from the target?

Thanks,
Bill

@BillManka

P.S. My motivation for clustering (unsupervised) is that the labels of my dataset are very noisy, and I'd like to shed some light on the noise.

@nreimers
Member

Hi @BillManka
Not sure what you mean.

What you need are labels for text pairs; labels for single texts alone are not sufficient. Bootstrapping the labels for the text pairs is not really necessary.

@BillManka

Aha. Honestly, I think I was just looking for a quick-and-dirty hack to tune the embeddings to my data a bit more.
Thanks again!

@BillManka

I.e., weight the cosine-similarity label by some scaled version of the distance between the pair members in the supervised space. I have no theory at all to justify it; it would be completely exploratory.
