
# Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co.

This notebook is based on the GitHub repository of Sentence Transformers, which can be found [here](https://github.com/UKPLab/sentence-transformers).

To explore more of the functionality of Sentence Transformers, check out this [link](https://www.sbert.net/).



This framework provides an easy method to compute dense vector representations for **sentences**, **paragraphs**, and **images**. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various tasks. Text is embedded in a vector space such that similar texts are close together and can be found efficiently using cosine similarity.
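To make the cosine-similarity idea concrete, here is a minimal sketch in plain Python. The three vectors are invented 3-dimensional stand-ins for real embeddings (which have hundreds of dimensions), and `cosine_similarity` is a helper written for this illustration, not a function taken from the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy "embeddings", made up for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # close to 0: vectors are nearly orthogonal
```

Because the score depends only on the angle between the vectors, it is unaffected by their lengths, which is one reason it is a common choice for comparing embeddings.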

For the **full documentation**, see **[www.SBERT.net](https://www.sbert.net)**.


## Installation

We recommend **Python 3.6** or higher, **[PyTorch 1.6.0](https://pytorch.org/get-started/locally/)** or higher and **[transformers v4.6.0](https://github.com/huggingface/transformers)** or higher. The code does **not** work with Python 2.7.

In [1]:
!pip install -U sentence-transformers


Collecting sentence-transformers
  Downloading sentence-transformers-2.2.0.tar.gz (79 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sentence-transformers
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.0-py3-none-any.whl size=120748 sha256=b6af51f180da3e7d763cab2afcfc9071b72c1359b273aead2f4ec4b2c4d45fea
  Stored in directory: /root/.cache/pip/wheels/83/c0/df/b6873ab7aac3f2465aa9144b6b4c41c4391cfecc027c8b07e7
Successfully built sentence-transformers
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.2.0



**PyTorch with CUDA**

If you want to use a GPU / CUDA, you must install PyTorch with the matching CUDA version. See
[PyTorch - Get Started](https://pytorch.org/get-started/locally/) for further details on how to install PyTorch.
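A quick way to check whether the installed PyTorch build can see a GPU is sketched below. `pick_device` is a hypothetical helper written for this notebook. It falls back to `"cpu"` when PyTorch is missing or no CUDA device is reported.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU label.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print("Using device:", device)

# The result can then be handed to the model constructor, for example:
# SentenceTransformer('all-MiniLM-L6-v2', device=device)
```

If this prints `cpu` on a machine that has a GPU, the usual cause is a PyTorch build that does not match the installed CUDA version.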


## Getting Started

See [Quickstart](https://www.sbert.net/docs/quickstart.html) in the documentation.

[This example](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications/computing-embeddings/computing_embeddings.py) shows you how to use an already trained Sentence Transformer model to embed sentences for another task.

First download a pretrained model.

In [2]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

Then provide some sentences to the model.


In [3]:
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.', 
    'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)

And that's it already. `sentence_embeddings` is now a NumPy array with one embedding (one row) per input sentence.



In [4]:
for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

Sentence: This framework generates embeddings for each input sentence
Embedding: [-1.37173515e-02 -4.28515710e-02 -1.56286471e-02  1.40537592e-02
  3.95537801e-02  1.21796258e-01  2.94334032e-02 -3.17523926e-02
  3.54959443e-02 -7.93140307e-02  1.75878201e-02 -4.04369719e-02
  4.97259349e-02  2.54912544e-02 -7.18700439e-02  8.14968720e-02
  1.47072750e-03  4.79627401e-02 -4.50335816e-02 -9.92174894e-02
 -2.81769689e-02  6.45046309e-02  4.44670655e-02 -4.76217382e-02
 -3.52952220e-02  4.38671447e-02 -5.28566167e-02  4.33043751e-04
  1.01921499e-01  1.64072178e-02  3.26996446e-02 -3.45987007e-02
  1.21339457e-02  7.94870779e-02  4.58348868e-03  1.57778561e-02
 -9.68207512e-03  2.87626125e-02 -5.05806208e-02 -1.55793829e-02
 -2.87907124e-02 -9.62278526e-03  3.15556601e-02  2.27349363e-02
  8.71448964e-02 -3.85027491e-02 -8.84718671e-02 -8.75495654e-03
 -2.12342795e-02  2.08923593e-02 -9.02077630e-02 -5.25732674e-02
 -1.05638998e-02  2.88311075e-02 -1.61454938e-02  6.17835717e-03
 -1.23234


## Application Examples

You can use this framework for:

- [Computing Sentence Embeddings](https://www.sbert.net/examples/applications/computing-embeddings/README.html)
- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)
- [Clustering](https://www.sbert.net/examples/applications/clustering/README.html)
- [Paraphrase Mining](https://www.sbert.net/examples/applications/paraphrase-mining/README.html)
- [Translated Sentence Mining](https://www.sbert.net/examples/applications/parallel-sentence-mining/README.html)
- [Semantic Search](https://www.sbert.net/examples/applications/semantic-search/README.html)
- [Retrieve & Re-Rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [Text Summarization](https://www.sbert.net/examples/applications/text-summarization/README.html)
- [Multilingual Image Search, Clustering & Duplicate Detection](https://www.sbert.net/examples/applications/image-search/README.html)

and many more use-cases.

For all examples, see [examples/applications](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications).
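As a rough illustration of how an application such as semantic search builds on the embeddings, the sketch below ranks a small corpus against a query by cosine similarity. The vectors are hand-made 3-dimensional placeholders rather than real model output, and `normalize` and `top_k` are hypothetical helpers written for this example. In practice the vectors would come from `model.encode`, and the library's own utilities would likely be a better fit than this hand-rolled loop.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    q = normalize(query_vec)
    scores = []
    for idx, vec in enumerate(corpus_vecs):
        v = normalize(vec)
        scores.append((idx, sum(a * b for a, b in zip(q, v))))
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

corpus = [
    "A man is eating food.",
    "A cheetah chases its prey.",
    "Someone is having a meal.",
]
# Placeholder embeddings, invented for illustration only.
corpus_vecs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.2], [0.8, 0.2, 0.1]]
query_vec = [0.85, 0.15, 0.1]  # stands in for a query about eating

results = top_k(query_vec, corpus_vecs, k=2)
for idx, score in results:
    print(f"{score:.3f}  {corpus[idx]}")
```

With these toy vectors the two food-related sentences are returned and the unrelated one is left out, which is the behaviour semantic search aims for on real embeddings.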
