
Use model's own embedding to compute similarity smart tags. #603

Closed
Dref360 opened this issue Jul 23, 2023 · 2 comments
Labels
enhancement New feature or request

Comments


Dref360 commented Jul 23, 2023

Hello!

It would be useful to me if I could use the model's own embedding to compute similarity smart tags. For my particular use case, semantic embeddings are not useful.

Proposal

config.json

{
	"similarity" : {"faiss_encoder": "model" } // could be 'self-similar'? 
}
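To make the proposal concrete, here is a hedged sketch of what a `"faiss_encoder": "model"` option would compute, using hypothetical data and plain numpy in place of FAISS: the model's own per-example embeddings are L2-normalized and compared by cosine similarity to find each example's nearest neighbour.

```python
import numpy as np

# Hypothetical data standing in for the model's own embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 768))  # 5 examples, 768-dim (e.g. BERT base)

# Normalize so the inner product is cosine similarity
# (this is what a FAISS inner-product index would see).
normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normalized @ normalized.T  # (5, 5) pairwise cosine similarities

# Most similar *other* example for each row (mask out self-similarity).
np.fill_diagonal(similarities, -np.inf)
nearest = similarities.argmax(axis=1)
```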

One can get the embedding of their HuggingFace model with:

inputs = tokenizer(...)
model: BertForSequenceClassification = ...
embedding = model.base_model(**inputs).last_hidden_state[:, 0, :]  # take the first ([CLS]) token embedding

Another approach is to load the same model with the feature-extraction task, though that might be more involved:

pipe = pipeline('feature-extraction', model='your_model', truncation=True)

EDIT: One can also do

pipe2 = pipeline('feature-extraction', model=pipe.model.base_model, tokenizer=pipe.tokenizer)

where pipe is the initial sequence-classification pipeline.
@Dref360 Dref360 added the enhancement New feature or request label Jul 23, 2023

gabegma commented Aug 10, 2023

Hey @Dref360 - that's a great suggestion, we've had this request a few times. We'll aim to prioritize that in the coming weeks.


Dref360 commented Dec 27, 2023

I didn't realize that sentence-transformers can now load models directly from the Hub and convert them. The only caveat is that it uses mean pooling instead of the first token. Not a terrible issue AFAIK.

So if your model name is cardiffnlp/twitter-roberta-base-sentiment-latest, Sentence Transformers will extract the RoBERTa base when you do

model = SentenceTransformer('cardiffnlp/twitter-roberta-base-sentiment-latest')

It also supports authentication, so it works with private repos as well.
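The mean-pooling caveat can be made concrete with a toy example (plain numpy, not Sentence Transformers internals): mean pooling averages every token embedding, whereas the first-token strategy keeps only position 0.

```python
import numpy as np

# Toy token embeddings: 4 tokens, 3 dimensions.
token_embeddings = np.arange(12, dtype=float).reshape(4, 3)

first_token = token_embeddings[0]            # [CLS]-style pooling -> [0., 1., 2.]
mean_pooled = token_embeddings.mean(axis=0)  # mean pooling -> [4.5, 5.5, 6.5]
```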

Closing!

@Dref360 Dref360 closed this as completed Dec 27, 2023