# [Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co.](https://github.com/UKPLab/sentence-transformers)

This framework provides an easy way to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks such as BERT, RoBERTa, and XLM-RoBERTa and achieve state-of-the-art performance on various tasks. Text is embedded in a vector space such that similar texts lie close together and can be found efficiently using cosine similarity.
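To make the notion of "close in vector space" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are hypothetical stand-ins for real embeddings, which have several hundred dimensions, and the helper function is written for illustration only; it is not part of the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy "embeddings" (assumption: chosen only for illustration)
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # points in the same direction as a
c = [0.0, 0.0, 3.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1: same direction
print(cosine_similarity(a, c))  # 0.0: no overlap
```

Because the score depends only on direction and not on vector length, two texts can be judged similar even when their embeddings differ in magnitude.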



The library provides a growing number of [state-of-the-art pretrained models](https://www.sbert.net/docs/pretrained_models.html) for more than 100 languages, fine-tuned for various use cases.

## Installation

```bash
pip install -U sentence-transformers
```

First download a pretrained model.

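```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import seaborn as sns

# Colormap used later to shade the similarity table
cm = sns.light_palette("blue", as_cmap=True)
# Downloads the model on first use, then loads it from the local cache
model = SentenceTransformer('all-MiniLM-L6-v2')
```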

Then provide some sentences to the model.

```python
sentences = [
    "The company HuggingFace is based in New York City",
    "Apples are especially bad for your health",
    "HuggingFace's headquarters are situated in Manhattan",
]

# Compute one embedding vector per input sentence
sentence_embeddings = model.encode(sentences)
```

That's all it takes. We now have a NumPy array containing one embedding per sentence.

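```python
# Pairwise cosine similarity between all sentence embeddings
similarities = cosine_similarity(sentence_embeddings, sentence_embeddings)

# Label the columns s_0..s_2 and shade each cell by its similarity score
pd.DataFrame(similarities, columns=[f's_{i}' for i in range(len(sentences))],
             index=sentences).style.background_gradient(cmap=cm)
```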

| | s_0 | s_1 | s_2 |
|---|---|---|---|
| The company HuggingFace is based in New York City | 1.0 | 0.027979 | 0.843274 |
| Apples are especially bad for your health | 0.027979 | 1.0 | 0.013139 |
| HuggingFace's headquarters are situated in Manhattan | 0.843274 | 0.013139 | 1.0 |
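A similarity matrix like this is typically used to look up the closest match for each sentence. The sketch below does that in plain Python, with the scores copied from the output above so that it runs without downloading a model. The `most_similar` helper is a hypothetical name introduced for this illustration, not a library function.

```python
# Sentences and similarity scores copied from the output above
sentences = [
    "The company HuggingFace is based in New York City",
    "Apples are especially bad for your health",
    "HuggingFace's headquarters are situated in Manhattan",
]
similarities = [
    [1.0, 0.027979, 0.843274],
    [0.027979, 1.0, 0.013139],
    [0.843274, 0.013139, 1.0],
]

def most_similar(index, matrix):
    """Return (other_index, score) of the best match for row `index`, excluding itself."""
    candidates = [(j, score) for j, score in enumerate(matrix[index]) if j != index]
    return max(candidates, key=lambda pair: pair[1])

for i, sentence in enumerate(sentences):
    j, score = most_similar(i, similarities)
    print(f"{sentence!r} -> {sentences[j]!r} ({score:.3f})")
```

The two HuggingFace sentences pick each other with a score of about 0.84, even though they share few words, while the sentence about apples has only weak matches.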


Moreover, this framework allows you to fine-tune your own sentence embedding methods, so that you get task-specific sentence embeddings. Several training options are available, so you can choose the one that best fits your task.

See [Training Overview](https://www.sbert.net/docs/training/overview.html) for an introduction to training your own embedding models. We provide [various examples](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training) of how to train models on various datasets.

Some highlights are:

- Support for various transformer networks, including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...
- Multilingual and multi-task learning
- Evaluation during training to find the optimal model
- [10+ loss functions](https://www.sbert.net/docs/package_reference/losses.html) that let you tune models specifically for semantic search, paraphrase mining, semantic similarity comparison, and clustering, including triplet loss and contrastive loss.
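To give a feel for one of the objectives named in the list, here is a minimal sketch of a triplet loss in plain Python. It is an illustration of the general idea, not the library's implementation. The Euclidean distance, the margin of 1.0, and the two-dimensional toy vectors are all assumptions made for this example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

# Hypothetical toy embeddings (assumption: chosen so the distances are 1 and 5)
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # distance 1 from the anchor
negative = [3.0, 4.0]  # distance 5 from the anchor

print(triplet_loss(anchor, positive, negative))  # 0.0: positive is already closer by more than the margin
print(triplet_loss(anchor, negative, positive))  # 5.0: roles swapped, so the triplet is penalized
```

The loss is zero once the positive example is closer to the anchor than the negative one by at least the margin, so training effort concentrates on triplets that are still ordered incorrectly.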