What's the difference between USE and SBERT? #64

Closed
Cumberbatch08 opened this issue Nov 29, 2019 · 4 comments

Comments

@Cumberbatch08

First, many thanks for your paper and code.
I read the Universal Sentence Encoder (USE) paper: its architecture also looks like a Siamese network, and it also uses the SNLI dataset.
Yet your results perform much better, so I'm very interested in your work.

@nreimers
Member

Hi @Cumberbatch08
Sadly, the USE papers (at least the ones I know) are extremely high-level and don't really go into the details. So it is unclear which architecture exactly they used and how the training was done (exact datasets, exact loss function, etc.).

Differences:

  • USE and SBERT both use transformer networks. For USE, it is sadly not clear how many layers they use (most technical details are not provided). USE was trained from scratch (as far as I can tell from the paper), while SBERT uses the BERT / RoBERTa pre-trained weights and just fine-tunes them to produce sentence embeddings.

  • I think the main difference is in the pre-training. USE uses a wide variety of datasets (exact details not provided), specifically targeted at generating sentence embeddings. BERT was pre-trained on a book corpus and on Wikipedia to produce a language model (see the BERT paper). SBERT then fine-tunes BERT to produce sensible sentence embeddings.

  • USE is in TensorFlow, and tuning it for your use case is not straightforward (the source code is not available; you only get the compiled model from tensorflow-hub). SBERT is based on PyTorch, and the goal of this repository is that fine-tuning for your use case is as simple as possible (see the sketch after this list).
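For illustration, here is a minimal fine-tuning sketch with the sentence-transformers library. The model name, sentence pairs, and similarity labels are placeholders, not taken from this thread:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Start from a pre-trained SBERT model (the model name is just an example)
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Toy sentence pairs with similarity labels (placeholder data)
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'The girl is playing the guitar.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# CosineSimilarityLoss is one of several available losses; pick the one that fits your data
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune for one epoch on the toy data
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```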

@Cumberbatch08
Author

Haha, yes, I absolutely agree with what you said. USE doesn't publish many details, such as the number of layers, the datasets, the loss, etc.
I found some information about the architecture:
[image: USE architecture]
Just as you said, maybe the pre-training is what matters.

@Gurutva

Gurutva commented Jan 27, 2022

Which would work best for good semantic search results: USE (https://tfhub.dev/google/universal-sentence-encoder/4) or the SBERT models (https://huggingface.co/sentence-transformers)?

@nreimers
Member

@Gurutva SBERT works much better: https://arxiv.org/pdf/2104.08663v1.pdf
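For reference, a minimal semantic-search sketch with sentence-transformers; the model name, corpus, and query below are placeholders:

```python
from sentence_transformers import SentenceTransformer, util

# The model name is just an example of a pre-trained SBERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

corpus = [
    'A man is eating food.',
    'A cheetah is running behind its prey.',
    'The girl is carrying a baby.',
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = 'What is the cheetah chasing?'
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus sentences by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])
```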
