# Lesson 4: Sentence Embeddings

- In the classroom, the libraries are already installed for you.
- If you would like to run this code on your own machine, you can install the following:
``` 
    !pip install sentence-transformers
```

- Here is some code that suppresses warning messages.

In [1]:
from transformers.utils import logging
logging.set_verbosity_error()

### Build the `sentence embedding` pipeline using the 🤗 Sentence Transformers library

In [2]:
from sentence_transformers import SentenceTransformer

In [3]:
model = SentenceTransformer("all-MiniLM-L6-v2")

More info on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).

In [4]:
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The movies are awesome']

In [5]:
embeddings1 = model.encode(sentences1, convert_to_tensor=True)

In [6]:
embeddings1

tensor([[ 0.1392,  0.0030,  0.0470,  ...,  0.0641, -0.0163,  0.0636],
        [ 0.0227, -0.0014, -0.0056,  ..., -0.0225,  0.0846, -0.0283],
        [-0.1043, -0.0628,  0.0093,  ...,  0.0020,  0.0653, -0.0150]])

In [7]:
sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

In [8]:
embeddings2 = model.encode(sentences2, 
                           convert_to_tensor=True)

In [9]:
print(embeddings2)

tensor([[ 0.0163, -0.0700,  0.0384,  ...,  0.0447,  0.0254, -0.0023],
        [ 0.0054, -0.0920,  0.0140,  ...,  0.0167, -0.0086, -0.0424],
        [-0.0842, -0.0592, -0.0010,  ..., -0.0157,  0.0764,  0.0389]])


- Calculate the cosine similarity between each sentence in `sentences1` and each sentence in `sentences2` as a measure of how similar they are to each other.
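
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand in plain Python on small made-up vectors. The names `cosine_similarity`, `a`, `b`, and `c` are illustrative and not part of the lesson; `util.cos_sim` applies the same formula to every pair of embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors, made up for illustration (real embeddings are much longer)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # points in the same direction as a
c = [-3.0, 0.0, 1.0]  # orthogonal to a (dot product is zero)

same_direction = cosine_similarity(a, b)
orthogonal = cosine_similarity(a, c)
print(round(same_direction, 4), round(orthogonal, 4))
```

Vectors that point the same way score close to 1, and unrelated (orthogonal) vectors score close to 0, regardless of their lengths.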

In [10]:
from sentence_transformers import util

In [11]:
cosine_scores = util.cos_sim(embeddings1,embeddings2)

In [12]:
print(cosine_scores)

tensor([[ 0.2838,  0.1310, -0.0029],
        [ 0.2277, -0.0327, -0.0136],
        [-0.0124, -0.0465,  0.6571]])


In [13]:
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i],
                                                 sentences2[i],
                                                 cosine_scores[i][i]))

The cat sits outside 		 The dog plays in the garden 		 Score: 0.2838
A man is playing guitar 		 A woman watches TV 		 Score: -0.0327
The movies are awesome 		 The new movie is so great 		 Score: 0.6571
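
The loop above only reads the diagonal of the score matrix, that is, the pairs that share an index. Because `util.cos_sim` returns a score for every combination, it is also possible to look for the best match in each row. The sketch below does this in plain Python using the scores printed above, copied in as a nested list so it runs without the model. The variable names `scores` and `best_matches` are illustrative.

```python
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The movies are awesome']
sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

# Scores copied from the printed tensor:
# scores[i][j] compares sentences1[i] with sentences2[j]
scores = [[ 0.2838,  0.1310, -0.0029],
          [ 0.2277, -0.0327, -0.0136],
          [-0.0124, -0.0465,  0.6571]]

best_matches = []
for i, row in enumerate(scores):
    # Index of the highest score in this row
    j = max(range(len(row)), key=lambda k: row[k])
    best_matches.append((sentences1[i], sentences2[j], row[j]))
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i],
                                                 sentences2[j],
                                                 row[j]))
```

With these scores, the closest match for 'A man is playing guitar' is 'The dog plays in the garden' (0.2277), not the sentence at the same index (-0.0327).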


### Try it yourself! 
- Try this model with your own sentences!

In [14]:
# Create sentences to compare
sentences3 = ['The bird sits on the tree',
              'A child is playing outside',
              'The movie is all sold out']

In [18]:
# Embed and convert to tensor 
embedding3 = model.encode(sentences3, convert_to_tensor=True)

In [19]:
# Show embedding3
print(embedding3)

tensor([[ 0.0628,  0.0382,  0.0174,  ...,  0.0241,  0.0233,  0.0656],
        [ 0.0293,  0.0223,  0.0333,  ...,  0.0441, -0.0017,  0.0919],
        [-0.0106, -0.0162, -0.0471,  ..., -0.1391, -0.0066,  0.0468]])


In [20]:
# Create a second set of sentences to compare for similarity
sentences4 = ['The bug sits on the tree',
              'A adult is playing outside',
              'The movie is not all sold out']

In [24]:
# Embed and convert to a tensor
embedding4 = model.encode(sentences4, 
                           convert_to_tensor=True)

In [25]:
# Print embedding4
print(embedding4)

tensor([[ 0.0351,  0.0230,  0.0077,  ..., -0.0471,  0.0093,  0.1057],
        [ 0.0688,  0.0571,  0.0834,  ..., -0.0493, -0.0082,  0.0452],
        [ 0.0202, -0.0185, -0.0398,  ..., -0.1582,  0.0010,  0.0627]])


### Compare `sentences3` and `sentences4` and see how the similarity scores differ

In [29]:
cosine_scores1 = util.cos_sim(embedding3,embedding4)

In [31]:
print(cosine_scores1)

tensor([[0.6246, 0.1619, 0.0392],
        [0.2365, 0.6757, 0.0361],
        [0.0736, 0.0444, 0.9561]])


In [32]:
# Create a for loop to compare the differences
for i in range(len(sentences3)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences3[i],
                                                 sentences4[i],
                                                 cosine_scores1[i][i]))

The bird sits on the tree 		 The bug sits on the tree 		 Score: 0.6246
A child is playing outside 		 A adult is playing outside 		 Score: 0.6757
The movie is all sold out 		 The movie is not all sold out 		 Score: 0.9561


### Insight 

The scores for the first two pairs look reasonable: the sentences in each pair are related, and the scores (0.6246 and 0.6757) reflect that. The last pair, however, scores 0.9561 even though the meanings are opposite: one sentence says the movie is all sold out, and the other says it is not. In this example, a high cosine score reflects similar wording and topic, not necessarily the same meaning.
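
One way to act on this observation is to flag pairs that score high but differ in a negation word, so that a person can review them. The sketch below is a rough heuristic and not part of the lesson: the helper `differs_in_negation`, the word list `NEGATIONS`, and the 0.9 threshold are all assumptions chosen for illustration. It uses the sentences and diagonal scores printed above.

```python
# Hypothetical word list and threshold, chosen only for illustration
NEGATIONS = {"not", "no", "never"}
THRESHOLD = 0.9

def differs_in_negation(a, b):
    """True if exactly one of the two sentences contains a negation word."""
    has_a = bool(NEGATIONS & set(a.lower().split()))
    has_b = bool(NEGATIONS & set(b.lower().split()))
    return has_a != has_b

# Sentence pairs with their cosine scores, copied from the printed output
pairs = [
    ('The bird sits on the tree', 'The bug sits on the tree', 0.6246),
    ('A child is playing outside', 'A adult is playing outside', 0.6757),
    ('The movie is all sold out', 'The movie is not all sold out', 0.9561),
]

# Keep only high-scoring pairs where one side is negated and the other is not
flagged = [(a, b, s) for a, b, s in pairs
           if s >= THRESHOLD and differs_in_negation(a, b)]

for a, b, s in flagged:
    print("Review: {} / {} (score {:.4f})".format(a, b, s))
```

A word-list check like this only catches explicit negation; it is meant as a prompt for manual review, not as a fix for the embedding model.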
