# Sentence Embeddings 📝

## Introduction
This notebook demonstrates how to use the `sentence-transformers` library to generate sentence embeddings for various NLP tasks such as similarity measurement, clustering, and more.

## Requirements
To run this notebook on your own machine, install the library first:
``` 
    !pip install sentence-transformers
```
The `sentence-transformers` library is particularly useful for tasks that require understanding or comparing the meaning of sentences or text. Common scenarios include:

- Sentence Embeddings: The library can generate vector representations (embeddings) of sentences or paragraphs. This is useful for tasks like semantic search, clustering, or finding sentence similarities.

  Example use case (semantic search): encode a query and a set of documents, then compare their embeddings to find the most relevant documents.

- Text Similarity: You can compare the semantic similarity between two sentences or texts. This is useful for tasks like:

  - Duplicate detection: Find similar or duplicate sentences or paragraphs in datasets.
  - Question answering: Match user queries with the most relevant answers from a set of candidate answers.

- Text Classification: You can use sentence embeddings as features for downstream machine learning models, including classification tasks like sentiment analysis or topic detection.

- Paraphrase Detection: The library can be used to determine whether two sentences are paraphrases, i.e., whether they express the same meaning in different words.

- Clustering: You can use embeddings to cluster similar sentences or documents. This is useful when grouping related content or discovering topics in large text corpora.
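
The semantic search use case boils down to "score every document against the query, then sort". The sketch below illustrates that idea with plain Python lists; the 3-dimensional vectors and the helper names `cosine` and `rank_documents` are hypothetical stand-ins for illustration, not part of the library, and real embeddings would come from a model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors standing in for real sentence embeddings.
query = [1.0, 0.0, 1.0]
documents = [
    [0.0, 1.0, 0.0],  # doc 0: points in a different direction than the query
    [1.0, 0.1, 0.9],  # doc 1: almost the same direction as the query
    [0.5, 0.5, 0.5],  # doc 2: partly aligned with the query
]

ranking = rank_documents(query, documents)
for index, score in ranking:
    print(f"doc {index}: {score:.4f}")
```

With real data, the same ranking step would run on the vectors produced by the embedding model instead of the toy lists.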

## Setup
Let's start by installing and importing the necessary libraries.

### Install and Import Libraries

In [1]:
# Install the sentence-transformers library
%pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.3.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.3.1-py3-none-any.whl (268 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.3.1
Note: you may need to restart the kernel to use updated packages.


In [None]:
# Suppress non-critical log messages
from transformers.utils import logging
logging.set_verbosity_error()

# Import necessary libraries
from sentence_transformers import SentenceTransformer, util
import torch

### Build the Sentence Embedding Pipeline

We'll use the SentenceTransformer class to transform textual data into embeddings. These embeddings can be used for machine learning models, similarity tasks, or information retrieval.

## Load Sentence Embedding Model

The model is chosen to suit the task, which here is measuring sentence similarity. `all-MiniLM-L6-v2` is a small general-purpose model that maps each sentence to a 384-dimensional vector.

In [4]:
# Load the pre-trained model for generating sentence embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")

More info on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).

### Example 1: Check Similarity Among Sentences with No Similarity

In [None]:
# Define three sentences that have no similarity
sentences1 = [
    'The cat sits outside',
    'A man is playing guitar',
    'The movies are awesome'
]

# Encode the sentences to generate embeddings
embeddings1 = model.encode(sentences1, convert_to_tensor=True)

# Display the embeddings
print(embeddings1)

## Explanation
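`model.encode` returns one embedding per sentence, so `embeddings1` is a tensor with one 384-dimensional row for each of the three sentences. The individual values may look small, but their size says nothing about how similar the sentences are. Similarity only becomes visible when two embeddings are compared, for example with cosine similarity.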
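
A caveat when reading raw embedding values: cosine similarity depends on the direction of two vectors, not on how large or small their entries are. The sketch below shows this with hypothetical toy vectors (not real model output); `cosine` is an illustrative helper, not a library function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical small-valued vectors standing in for embeddings.
a = [0.02, -0.01, 0.03]
b = [0.04, -0.02, 0.06]    # same direction as a, twice the length
c = [-0.03, 0.02, 0.01]    # similar magnitude to a, different direction

# Scaling a vector changes its values but not its direction.
a_scaled = [100 * x for x in a]

print(f"a vs b:        {cosine(a, b):.4f}")
print(f"a_scaled vs b: {cosine(a_scaled, b):.4f}")
print(f"a vs c:        {cosine(a, c):.4f}")
```

The first two scores agree even though `a_scaled` has much larger entries, while `a` and `c` have entries of similar size yet score far lower.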

### Example 2: Check Similarity Among Another Set of Sentences with No Similarity

In [None]:
# Define another set of three sentences that have no similarity
sentences2 = [
    'The dog plays in the garden',
    'A woman watches TV',
    'The new movie is so great'
]

# Encode the sentences to generate embeddings
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# Display the embeddings
print(embeddings2)

## Explanation
As in Example 1, `embeddings2` holds one 384-dimensional vector per sentence. The raw values alone do not show whether the sentences are related; the next section compares the two sets directly.

## Cosine Similarity
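Cosine similarity measures the angle between two embedding vectors. Scores range from -1 to 1: values close to 1 mean the two sentences have similar meaning, and values near 0 mean they are unrelated. Here we calculate it between the embeddings of the two sets of sentences.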

### Calculate Cosine Similarity

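* `util.cos_sim(embeddings1, embeddings2)` compares every sentence in the first set with every sentence in the second set. The result is a 3×3 matrix in which entry `[i][j]` is the cosine similarity between `sentences1[i]` and `sentences2[j]`.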
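
To make the layout of the result concrete, here is a minimal pure-Python sketch of a pairwise cosine similarity matrix. It only illustrates the computation under the assumption of small hypothetical 2-dimensional vectors; `cosine` and `pairwise_cosine` are illustrative names, and this is not how the library implements `util.cos_sim` internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_cosine(rows_a, rows_b):
    """Entry [i][j] is the cosine similarity of rows_a[i] and rows_b[j]."""
    return [[cosine(u, v) for v in rows_b] for u in rows_a]

# Hypothetical 2-dimensional vectors standing in for two sets of embeddings.
set_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
set_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

scores = pairwise_cosine(set_a, set_b)
for row in scores:
    print([round(s, 4) for s in row])
```

Three vectors in each set give a 3×3 result, the same layout as `cosine_scores` in the notebook.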

In [None]:
# Calculate the cosine similarity between the embeddings of the two sets of sentences
cosine_scores = util.cos_sim(embeddings1, embeddings2)

# Display the cosine similarity scores
print(cosine_scores)

## Display Cosine Similarity Scores

In [None]:
# Print the cosine similarity scores for each pair of sentences
for i in range(len(sentences1)):
    print(f"{sentences1[i]} \t\t {sentences2[i]} \t\t Score: {cosine_scores[i][i]:.4f}")

## Explanation
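Each printed score pairs a sentence from the first set with the sentence at the same position in the second set. The third pair ('The movies are awesome' and 'The new movie is so great') scores relatively high (0.6571): both sentences express a positive opinion about movies, so their embeddings are close even though the sentences share few words.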
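
The display loop reads only the entries `cosine_scores[i][i]`. To look beyond same-position pairs, one option is to pick the highest score in each row of the matrix. The sketch below assumes a plain list-of-lists score matrix; the numbers are hypothetical placeholders (not the notebook's output), and `best_matches` is an illustrative helper.

```python
def best_matches(score_matrix):
    """For each row, return (column_index, score) of the highest-scoring column."""
    matches = []
    for row in score_matrix:
        best_col = max(range(len(row)), key=lambda j: row[j])
        matches.append((best_col, row[best_col]))
    return matches

# Hypothetical score matrix: rows are sentences from the first set,
# columns are sentences from the second set.
example_scores = [
    [0.30, 0.05, 0.10],
    [0.02, 0.08, 0.12],
    [0.04, 0.01, 0.66],
]

for i, (j, score) in enumerate(best_matches(example_scores)):
    print(f"row {i} best matches column {j} (score {score:.2f})")
```

With the notebook's tensor, `cosine_scores.tolist()` would give a list of lists in this shape.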

### Example 3: Check Similarity Among Sentences with Similarity

In [None]:
# Define two sentences that have similarity
sentences3 = [
    'She loves reading books',
    'She likes to read stories'
]

# Encode the sentences to generate embeddings
embeddings3 = model.encode(sentences3, convert_to_tensor=True)

# Display the embeddings
print(embeddings3)

## Explanation
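The raw embedding values do not reveal similarity on their own. To confirm that these two sentences are close in meaning, compare their embeddings with cosine similarity, for example `util.cos_sim(embeddings3[0], embeddings3[1])`; a score close to 1 indicates that the sentences have similar meaning.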

### Conclusion
This notebook demonstrated how to use the sentence-transformers library to generate sentence embeddings and measure similarity between sentences. The embeddings can be used for various NLP tasks such as clustering, text classification, and more.

### Try it yourself! 
- Try this model with your own sentences!