## Tutorial: Exploring Similarity Between Text Embeddings

We can use individual_nlp_explanation to explore a pre-trained embedding model’s similarity landscape.

We pass individual_nlp_explanation the two texts we want to compare, along with the name of our chosen model. The explainer function then returns two lists of term pairs, one term from each text, together with a score: the pairs the model sees as similar and the pairs it sees as dissimilar.
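For context, similarity between two embedding vectors is often measured with cosine similarity. The sketch below illustrates that measure on toy vectors; whether individual_nlp_explanation uses cosine similarity internally is an assumption, and the helper name `cosine_similarity` and the vectors `u`, `v`, `w` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values)
u = [1.0, 0.0, 1.0]
v = [1.0, 0.0, 1.0]  # same direction as u, so similarity is close to 1
w = [0.0, 1.0, 0.0]  # orthogonal to u, so similarity is close to 0

print(cosine_similarity(u, v))
print(cosine_similarity(u, w))
```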

Let’s take a simple example and generate its similarity increasers and similarity decreasers.

In [2]:
from decima2 import individual_nlp_explanation


In [13]:
text1 = "the cat sat on the mat and it purred"
text2 = "the dog lay upon the mat and it barked"
model_name = "bert-base-uncased"

In [16]:
similarity_increasers, similarity_decreasers = individual_nlp_explanation(text1, text2, model_name, output='text')


In [21]:
# Each entry is a (term from text1, term from text2, score) tuple
for pair in similarity_increasers:
    print("Pairs that increase similarity:", pair)
print("")
for pair in similarity_decreasers:
    print("Pairs that decrease similarity:", pair)

Pairs that increase similarity: ('sat on the mat and it', 'the mat', np.float32(0.16405481))
Pairs that increase similarity: ('on the mat and it', 'the dog', np.float32(0.16475308))
Pairs that increase similarity: ('the cat sat on the', 'dog lay upon the mat and it', np.float32(0.1664756))
Pairs that increase similarity: ('the cat', 'dog lay upon the mat and it', np.float32(0.16682768))
Pairs that increase similarity: ('mat and it', 'lay upon', np.float32(0.1682499))
Pairs that increase similarity: ('on the mat and it', 'lay upon', np.float32(0.16850364))
Pairs that increase similarity: ('sat on', 'dog lay upon the mat and it', np.float32(0.1688664))
Pairs that increase similarity: ('cat sat on the mat and it', 'lay upon', np.float32(0.1700396))
Pairs that increase similarity: ('sat on the mat and it', 'the dog', np.float32(0.17831814))
Pairs that increase similarity: ('sat on the mat and it', 'lay upon', np.float32(0.18724602))

Pairs that decrease similarity: ('on the mat and it purr

From these similarity increasers and decreasers we can see that the embedding model treats the terms "and it purred" and "and it barked" as dissimilar, while it sees "sat on the mat and it" and "lay upon" as the most similar pair.
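Rather than reading the printed list by eye, the strongest pair can be picked out programmatically. This is a minimal sketch that assumes each entry is a (term from text1, term from text2, score) tuple, as in the printed output; the helper `strongest_pair` is hypothetical and the sample entries are copied from that output.

```python
# A few entries copied from the printed similarity increasers
similarity_increasers = [
    ("sat on the mat and it", "the mat", 0.16405481),
    ("sat on the mat and it", "the dog", 0.17831814),
    ("sat on the mat and it", "lay upon", 0.18724602),
]


def strongest_pair(pairs):
    """Return the entry whose score has the largest magnitude."""
    return max(pairs, key=lambda pair: abs(pair[2]))


term1, term2, score = strongest_pair(similarity_increasers)
print(f"{term1!r} vs {term2!r}: {score:.3f}")
# prints: 'sat on the mat and it' vs 'lay upon': 0.187
```

Ranking by absolute value means the same helper also works on the similarity decreasers, whose scores are negative.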

Interestingly, the embedding model appears to have learned that "barked" and "purred" are different. Let's explore why the model thinks the other two terms are similar.

In [22]:
text1 = "sat on the mat and it"
text2 = "lay upon"
model_name = "bert-base-uncased"

In [23]:
similarity_increasers, similarity_decreasers = individual_nlp_explanation(text1, text2, model_name, output='text')


In [24]:
# Each entry is a (term from text1, term from text2, score) tuple
for pair in similarity_increasers:
    print("Pairs that increase similarity:", pair)
print("")
for pair in similarity_decreasers:
    print("Pairs that decrease similarity:", pair)

Pairs that increase similarity: ('and it', 'upon', np.float32(0.019421458))
Pairs that increase similarity: ('and it', 'lay', np.float32(0.020947754))
Pairs that increase similarity: ('the mat and', 'lay', np.float32(0.12305528))
Pairs that increase similarity: ('the mat and', 'upon', np.float32(0.12549114))

Pairs that decrease similarity: ('sat on the mat', 'upon', np.float32(-0.106793344))
Pairs that decrease similarity: ('sat on the', 'upon', np.float32(-0.10494733))
Pairs that decrease similarity: ('sat on the', 'lay', np.float32(-0.1005885))
Pairs that decrease similarity: ('sat on the mat', 'lay', np.float32(-0.0981608))
Pairs that decrease similarity: ('sat on', 'upon', np.float32(-0.08655596))
Pairs that decrease similarity: ('sat on', 'lay', np.float32(-0.07990849))
Pairs that decrease similarity: ('mat and it', 'upon', np.float32(-0.072120726))
Pairs that decrease similarity: ('mat and it', 'lay', np.float32(-0.069006324))
Pairs that decrease similarity: ('the mat and it', '

By decomposing this term we can see that the embedding model treats the pairs ('the mat and', 'upon') and ('the mat and', 'lay') as similar, but treats ('sat on the mat', 'upon') as dissimilar. Interestingly, one possible reading is that the model has learned that we may 'lay upon a mat' but would not 'sit upon the mat'.
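One way to summarise this decomposition is to average the scores for each term from the first text. The sketch below does this for a handful of scores copied from the output above; the helper `mean_score_by_term` is hypothetical, and averaging is just one possible way to aggregate the pairs, not something the library is known to do.

```python
from collections import defaultdict
from statistics import mean

# (term from text1, term from text2, score), copied from the printed output
pairs = [
    ("the mat and", "lay", 0.12305528),
    ("the mat and", "upon", 0.12549114),
    ("sat on the mat", "upon", -0.106793344),
    ("sat on the mat", "lay", -0.0981608),
    ("sat on", "upon", -0.08655596),
    ("sat on", "lay", -0.07990849),
]


def mean_score_by_term(pairs):
    """Average the scores of all pairs that share the same text1 term."""
    grouped = defaultdict(list)
    for term1, _term2, score in pairs:
        grouped[term1].append(float(score))
    return {term: mean(scores) for term, scores in grouped.items()}


averages = mean_score_by_term(pairs)
# Print from most dissimilar to most similar
for term, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{term!r}: {avg:+.3f}")
```

On these sample scores, 'the mat and' averages positive while 'sat on the mat' and 'sat on' average negative, which matches the reading above.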