## FAISS (Facebook AI Similarity Search)

In [1]:
sentences = [
    "The cat is sleeping on the couch.",
    "I enjoy reading books in my free time.",
    "The sunsets on the beach are breathtaking.",
    "She plays the piano beautifully.",
    "The mountains are covered in snow.",
    "The conference was informative and engaging.",
    "Cooking is one of my favorite hobbies.",
    "The city skyline at night is stunning.",
    "He always makes me laugh with his jokes.",
    "Learning new things is exciting.",
    "The garden is full of colorful flowers.",
    "She took a long walk in the forest.",
    "Technology is shaping the future.",
    "The movie kept me on the edge of my seat.",
    "Traveling to new places broadens the mind.",
    "The painting depicts a peaceful countryside.",
    "He's an expert in his field of study.",
    "The rain pattered softly on the window.",
    "She wrote a heartfelt letter to her friend.",
    "The recipe calls for fresh ingredients.",
    "The team celebrated their victory enthusiastically.",
    "The novel explores themes of love and loss.",
    "He solved the puzzle in record time.",
    "The river winds its way through the valley.",
    "She gave a passionate speech about equality.",
    "The architecture of the building is impressive.",
    "The children played games in the park.",
    "The aroma of freshly baked bread filled the air.",
    "He captured a stunning photograph of the sunrise.",
    "She designed her own unique fashion line.",
    "The history museum is full of artifacts.",
    "The thunderstorm rumbled in the distance.",
    "He's known for his brilliant scientific discoveries.",
    "The laughter of children is infectious.",
    "She performed a graceful dance on stage.",
    "The recipe has been passed down through generations.",
    "The waves crashed against the rocky shore.",
    "He demonstrated a clever magic trick.",
    "The company is committed to sustainability.",
    "She sang a soulful song with emotion.",
    "The bakery sells a variety of delicious pastries.",
    "He embarked on a journey of self-discovery.",
    "The starry night sky is a sight to behold.",
    "She volunteered at the local animal shelter.",
    "The forest is home to diverse wildlife.",
    "He created a masterpiece with his paintbrush.",
    "The book's plot twists kept me guessing.",
    "She enjoys experimenting with new recipes.",
    "The athlete broke a world record.",
    "The scent of flowers perfumed the garden.",
    "He crafted a intricate sculpture from marble.",
    "The scientific experiment yielded unexpected results.",
    "She wrote a compelling story that moved readers.",
    "The old castle is shrouded in mystery.",
    "He composed a hauntingly beautiful melody.",
    "The urban skyline is a blend of modernity and tradition.",
    "She captured the essence of nature in her artwork.",
    "The children eagerly anticipated the circus.",
    "He composed a heartfelt love letter.",
    "The historical novel is set during a turbulent time.",
    "She danced with grace and elegance.",
    "The recipe calls for a pinch of salt.",
    "He discovered a hidden treasure in the attic.",
    "The city comes alive with festivals in the summer.",
    "She's a talented musician with a unique style.",
    "The garden flourishes with vibrant colors.",
    "He's a master storyteller with a vivid imagination.",
    "The scent of pine fills the air in the forest.",
    "She delivered an inspiring commencement speech.",
    "The play's dialogue is witty and clever.",
    "He captured the beauty of the landscape in his painting.",
    "The aroma of coffee lured me into the cafe.",
    "She's a skilled photographer who captures emotion.",
    "The thunderstorm created a symphony of sound.",
    "He crafted a delicate piece of jewelry.",
    "The novel's characters are relatable and engaging.",
    "The aroma of spices wafted from the kitchen.",
    "She painted a breathtaking sunset scene.",
    "The snowfall transformed the city into a winter wonderland.",
    "He's an accomplished chef with a passion for cooking.",
    "The dancers moved in perfect harmony.",
    "The recipe requires precise measurements.",
    "She explored the ancient ruins with wonder.",
    "The symphony orchestra played a moving composition.",
    "He wrote a thought-provoking essay on social issues.",
    "The fragrance of flowers fills the garden.",
    "She's a dedicated teacher who inspires her students.",
    "The city's architecture reflects its rich history.",
    "He captured a candid moment with his camera.",
    "The characters in the story undergo personal growth.",
    "The aroma of freshly brewed tea is comforting.",
    "She's a talented actress with a versatile range.",
    "The raindrops tapped rhythmically on the roof.",
    "He carved an intricate design into the wood.",
    "The novel's plot is full of twists and turns.",
    "She's a gifted poet who conveys emotions beautifully.",
    "The city's skyline is illuminated at night.",
    "He painted a realistic portrait of his subject.",
    "The fragrance of the ocean breeze is invigorating.",
    "She's an advocate for environmental conservation.",
    "The dance performance was captivating and dynamic.",
    "He sculpted a lifelike figure from clay.",
    "The mystery novel keeps readers guessing.",
    "She's a skilled artisan who creates unique pottery.",
    "The scent of fresh grass fills the air after rain.",
    "He's a brilliant mathematician known for his theories.",
    "The play's dialogue captures the essence of human relationships."
]

In [2]:
from sentence_transformers import SentenceTransformer
# initialize sentence transformer model
model = SentenceTransformer('bert-base-nli-mean-tokens')
# create sentence embeddings
sentence_embeddings = model.encode(sentences)
sentence_embeddings.shape

(107, 768)
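The model name `bert-base-nli-mean-tokens` refers to mean pooling. BERT produces one vector per token, and the sentence embedding is the average of those token vectors, which gives one 768-dimensional row per sentence. The pure-Python sketch below shows only that averaging step on toy 4-dimensional vectors. `mean_pool` is a hypothetical helper, and it assumes padding tokens are excluded with an attention mask, as is usual for BERT-style batches. It is not the library's own implementation.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (hypothetical helper).

    Padding positions (mask 0) are ignored, so sentences of different
    lengths in the same batch still get a correct mean.
    """
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for j, x in enumerate(vec):
                totals[j] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask does not select any token")
    return [t / count for t in totals]


# Two real tokens and one padding token (mask 0) that must not affect the mean.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 2.0, 2.0, 2.0]
```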

### IndexFlatL2

In [3]:
from IPython.display import Image

In [4]:
# get the image
Image(url="l2.png", width=500, height=500)

In [5]:
import faiss

In [6]:
d = sentence_embeddings.shape[1]  # embedding dimension (768)
index_l2 = faiss.IndexFlatL2(d)  # exhaustive L2 index; vectors get sequential IDs 0, 1, 2, ... as they are added

In [7]:
index_l2.is_trained  # a flat index has no clustering step, so it needs no training and this returns True

True

In [8]:
index_l2.add(sentence_embeddings)

In [9]:
index_l2.ntotal

107

In [10]:
k = 4
xq = model.encode(["where are the best actors in India"])

In [11]:
%%time
D, I = index_l2.search(xq, k)  # search
print(D)
print(I)

[[266.2829  281.84766 289.68085 293.21533]]
[[ 91  32 105  69]]
CPU times: total: 0 ns
Wall time: 997 µs
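`IndexFlatL2` performs an exhaustive search: the query is compared with every stored vector. The values in `D` are squared Euclidean distances (FAISS does not take the square root), sorted in ascending order, and `I` holds the matching IDs. A minimal pure-Python sketch of the same computation on a toy 2-D database follows. `flat_l2_search` is a hypothetical name, and FAISS itself uses optimized BLAS/SIMD kernels rather than Python loops. The results are stored as `D_toy`/`I_toy` so they do not overwrite the notebook's `D` and `I`.

```python
from typing import List, Tuple


def flat_l2_search(database: List[List[float]], queries: List[List[float]],
                   k: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Exhaustive k-NN: compare every query with every stored vector."""
    all_d, all_i = [], []
    for q in queries:
        # Squared L2 distance to each database vector, paired with its position (its ID).
        scored = [(sum((a - b) ** 2 for a, b in zip(q, v)), idx)
                  for idx, v in enumerate(database)]
        scored.sort()  # ascending distance; ties broken by the smaller ID
        top = scored[:k]
        all_d.append([dist for dist, _ in top])
        all_i.append([idx for _, idx in top])
    return all_d, all_i


db = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
D_toy, I_toy = flat_l2_search(db, [[0.0, 0.0]], k=3)
print(D_toy)  # [[0.0, 0.25, 2.0]]
print(I_toy)  # [[0, 3, 1]]
```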


In [12]:
# map the returned IDs (91, 32, 105, 69) back to their sentences
[f"the similar sentences are : {sentences[i]}" for i in I[0]]

["the similar sentences are : She's a talented actress with a versatile range.",
 "the similar sentences are : He's known for his brilliant scientific discoveries.",
 "the similar sentences are : He's a brilliant mathematician known for his theories.",
 "the similar sentences are : The play's dialogue is witty and clever."]

### Partitioning The Index

In [13]:

Image(url="voronoi.png", width=500, height=500)

In [14]:
nlist = 50  # number of Voronoi cells (clusters)
quantizer = faiss.IndexFlatL2(d)  # coarse quantizer: still L2, used to find the cell centroids nearest to a vector
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)  # inverted-file index; each cell stores the full, uncompressed vectors

In [15]:
index_ivf.is_trained  # False: the cell centroids must first be learned by clustering


False

In [16]:
index_ivf.train(sentence_embeddings)  # k-means clustering of the embeddings into nlist cells

In [17]:
index_ivf.is_trained # training done

True
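`train()` on an IVF index runs k-means to learn the `nlist` cell centroids held by the coarse quantizer. The sketch below is plain Lloyd's k-means in pure Python on four 2-D points, meant only to show what is being learned. FAISS's own k-means differs in details such as initialisation, iteration count and the handling of empty clusters; `kmeans` and its parameters are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], nlist: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means returning nlist centroids (hypothetical helper)."""
    if len(data) < nlist:
        raise ValueError("need at least as many training points as centroids")
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]  # start from random data points
    for _ in range(iters):
        # Assignment step: put every point in the cluster of its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(nlist)]
        for v in data:
            best = min(range(nlist), key=lambda c: sq_dist(v, centroids[c]))
            groups[best].append(v)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(sorted(kmeans(data, nlist=2)))  # [[0.0, 0.5], [10.0, 10.5]]
```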

In [18]:
index_ivf.add(sentence_embeddings)
index_ivf.ntotal

107
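`add()` then assigns each vector to its nearest centroid and appends its ID to that cell's inverted list. This sketch keeps only the IDs, whereas `IndexIVFFlat` also stores a full copy of each vector in its list. The function name and the toy centroids are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_inverted_lists(centroids: List[Vector], data: List[Vector]) -> Dict[int, List[int]]:
    """Map each cell number to the IDs of the vectors whose nearest centroid is that cell."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for vec_id, v in enumerate(data):  # IDs are assigned sequentially, in insertion order
        cell = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[cell].append(vec_id)
    return lists


centroids = [[0.0, 0.5], [10.0, 10.5]]
data = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
print(build_inverted_lists(centroids, data))  # {0: [0, 2], 1: [1, 3]}
```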

In [19]:
%%time
D, I = index_ivf.search(xq, k)  # search; nprobe defaults to 1, so only the nearest cell is scanned
print(I)

[[ 69  66  37 106]]
CPU times: total: 0 ns
Wall time: 997 µs
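These IDs differ from the flat search because FAISS's `nprobe` defaults to 1: only the cell whose centroid is closest to the query is scanned, so true neighbours sitting in an adjacent cell are missed. The simplified 1-D sketch below (hypothetical `ivf_search`, toy data) shows the effect: with `nprobe=1` it misses ID 2, and with `nprobe=2` it recovers it.

```python
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_search(query: Vector, centroids: List[Vector], inv_lists: Dict[int, List[int]],
               data: List[Vector], k: int, nprobe: int = 1) -> List[int]:
    """Return the IDs of the k nearest vectors found in the nprobe closest cells."""
    # 1) Rank cells by the distance from the query to their centroid; keep the nprobe closest.
    cells = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    # 2) Scan only the vectors stored in those cells, exactly as a flat index would.
    candidates = sorted((sq_dist(query, data[i]), i) for c in cells for i in inv_lists[c])
    return [i for _, i in candidates[:k]]


centroids = [[0.0], [10.0]]
data = [[0.0], [4.0], [6.0], [10.0]]
inv_lists = {0: [0, 1], 1: [2, 3]}  # 4.0 is nearer centroid 0, 6.0 is nearer centroid 1

print(ivf_search([4.5], centroids, inv_lists, data, k=2, nprobe=1))  # [1, 0]
print(ivf_search([4.5], centroids, inv_lists, data, k=2, nprobe=2))  # [1, 2]
```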


In [20]:
[f"the similar sentences are : {sentences[i]}" for i in I[0]]

["the similar sentences are : The play's dialogue is witty and clever.",
 "the similar sentences are : He's a master storyteller with a vivid imagination.",
 'the similar sentences are : He demonstrated a clever magic trick.',
 "the similar sentences are : The play's dialogue captures the essence of human relationships."]

In [21]:
index_ivf.nprobe = 10  # scan the 10 cells nearest to the query instead of just 1

In [22]:
%%time
D, I = index_ivf.search(xq, k)  # search
print(I)

[[ 91  32 105  69]]
CPU times: total: 0 ns
Wall time: 897 µs
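One caveat when mapping IDs back with `sentences[i]`: if the probed cells together hold fewer than `k` vectors (more likely with a small `nprobe`), FAISS fills the missing result slots with the ID `-1`, and Python's `sentences[-1]` would silently return the last sentence. A small hypothetical helper that drops those placeholders:

```python
from typing import List, Sequence


def ids_to_texts(ids: Sequence[int], texts: Sequence[str]) -> List[str]:
    """Map FAISS result IDs to texts, skipping the -1 placeholder for missing hits."""
    # Without this check, texts[-1] would quietly return the last element.
    return [texts[i] for i in ids if i != -1]


texts = ["a", "b", "c"]
print(ids_to_texts([2, 0, -1], texts))  # ['c', 'a']
```

In the notebook this would be called as `ids_to_texts(I[0], sentences)`.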


In [23]:
[f"the similar sentences are : {sentences[i]}" for i in I[0]]

["the similar sentences are : She's a talented actress with a versatile range.",
 "the similar sentences are : He's known for his brilliant scientific discoveries.",
 "the similar sentences are : He's a brilliant mathematician known for his theories.",
 "the similar sentences are : The play's dialogue is witty and clever."]

### Product Quantization

In [24]:
Image(url="qunatisation_1.png", width=500, height=500)

In [25]:
Image(url="quantisation_2.jpeg", width=700, height=700)

In [26]:
m = 64  # number of sub-vectors (sub-quantizers) per vector; d must be divisible by m (768 / 64 = 12)
bits = 4  # bits per sub-quantizer code

# Each sub-quantizer has 2^bits centroids, learned with k-means on its slice of the training vectors.
# Here 2^4 = 16, which must not exceed the number of training embeddings (107).

quantizer = faiss.IndexFlatL2(d)  # coarse quantizer: the same flat L2 index used for IVF
index_IVFPQ = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits) 
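The choice of `m` and `bits` fixes how much each vector is compressed. The arithmetic below (hypothetical helper `pq_footprint`) uses this notebook's values: 768 dimensions split into 64 sub-vectors of 12 dimensions, each replaced by a 4-bit centroid ID. It counts only the PQ code; each inverted-list entry also stores an ID alongside it.

```python
from typing import Dict


def pq_footprint(d: int, m: int, nbits: int) -> Dict[str, int]:
    """Per-vector storage for raw float32 vs. a product-quantized code (hypothetical helper)."""
    if d % m != 0:
        raise ValueError("d must be divisible by m")
    return {
        "sub_dim": d // m,                      # dimensions per sub-vector
        "centroids_per_subq": 2 ** nbits,       # centroids in each sub-quantizer's codebook
        "float32_bytes": d * 4,                 # uncompressed vector, 4 bytes per float
        "pq_code_bytes": (m * nbits + 7) // 8,  # m codes of nbits each, rounded up to whole bytes
    }


fp = pq_footprint(d=768, m=64, nbits=4)
print(fp)  # {'sub_dim': 12, 'centroids_per_subq': 16, 'float32_bytes': 3072, 'pq_code_bytes': 32}
print(fp["float32_bytes"] // fp["pq_code_bytes"])  # 96, i.e. about 96x smaller per vector
```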

In [27]:
index_IVFPQ.is_trained 

False

In [28]:
index_IVFPQ.train(sentence_embeddings)   
index_IVFPQ.is_trained 

True

In [29]:
index_IVFPQ.add(sentence_embeddings)
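Conceptually, `add()` on a PQ index splits each vector into `m` sub-vectors and replaces each one with the ID of its nearest centroid in that sub-vector's codebook (the codebooks are learned by k-means during `train()`). The toy sketch uses hand-picked codebooks with `m = 2` and 2 centroids each, so the decoded vector is visibly only an approximation. `pq_encode` and `pq_decode` are hypothetical helpers, and the sketch omits one FAISS detail: `IndexIVFPQ` by default encodes the residual (the vector minus its coarse centroid) rather than the raw vector.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(v: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Replace each of the m sub-vectors of v with the ID of its nearest centroid."""
    m = len(codebooks)
    sub = len(v) // m  # assumes len(v) is divisible by m
    codes = []
    for j, book in enumerate(codebooks):
        piece = v[j * sub:(j + 1) * sub]
        codes.append(min(range(len(book)), key=lambda c: sq_dist(piece, book[c])))
    return codes


def pq_decode(codes: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Concatenate the chosen centroids to get an approximate reconstruction."""
    out: Vector = []
    for j, c in enumerate(codes):
        out.extend(codebooks[j][c])
    return out


# Toy setup: m = 2 sub-vectors of 2 dims, 2 centroids (1 bit) per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[5.0, 5.0], [-5.0, -5.0]],
]
v = [0.9, 1.2, -4.0, -6.0]
codes = pq_encode(v, codebooks)
print(codes)                        # [1, 1]
print(pq_decode(codes, codebooks))  # [1.0, 1.0, -5.0, -5.0]
```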

In [30]:
index_IVFPQ.nprobe = 10  # align to previous IndexIVFFlat nprobe value
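At search time the stored codes are not decompressed. In asymmetric distance computation (ADC), the query stays uncompressed: for each sub-vector, its distance to every codebook centroid is computed once into a lookup table, and each stored code's approximate squared distance is the sum of `m` table entries. Because those distances are approximate, `IndexIVFPQ` rankings can differ from the exact ones. The sketch is simplified: FAISS's IVFPQ works on residuals and can use additional precomputed tables, and `adc_distances` plus its toy codebooks are hypothetical.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adc_distances(query: Vector, codes_db: List[List[int]],
                  codebooks: List[List[Vector]]) -> List[float]:
    """Approximate squared L2 distances from an uncompressed query to PQ codes (hypothetical helper)."""
    m = len(codebooks)
    sub = len(query) // m
    # table[j][c]: squared distance between the query's j-th slice and centroid c of codebook j
    table = [
        [sq_dist(query[j * sub:(j + 1) * sub], centroid) for centroid in book]
        for j, book in enumerate(codebooks)
    ]
    # Each code's distance is m table lookups summed; no decoding is needed.
    return [sum(table[j][c] for j, c in enumerate(codes)) for codes in codes_db]


codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[5.0, 5.0], [-5.0, -5.0]],
]
codes_db = [[1, 1], [0, 0], [1, 0]]
print(adc_distances([1.0, 1.0, -5.0, -5.0], codes_db, codebooks))  # [0.0, 202.0, 200.0]
```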

In [31]:
%%time
D, I = index_IVFPQ.search(xq, k)
print(I)

[[69 66 37 91]]
CPU times: total: 0 ns
Wall time: 998 µs
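A common way to quantify the accuracy cost of IVF and PQ is recall@k: the fraction of the exact `IndexFlatL2` top-k IDs that an approximate index also returns. The hypothetical helper below applies it to the result IDs printed in this notebook, giving 0.25 for `IndexIVFFlat` with `nprobe = 1`, 1.0 with `nprobe = 10`, and 0.5 for `IndexIVFPQ`. With a single query and only 107 vectors these numbers are illustrative, not a benchmark.

```python
from typing import Sequence


def recall_at_k(exact_ids: Sequence[int], approx_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Result IDs copied from the outputs above (same query, k = 4).
flat = [91, 32, 105, 69]  # IndexFlatL2 (exact)
results = {
    "IVFFlat, nprobe=1": [69, 66, 37, 106],
    "IVFFlat, nprobe=10": [91, 32, 105, 69],
    "IVFPQ, nprobe=10": [69, 66, 37, 91],
}
for name, ids in results.items():
    print(f"{name}: recall@4 = {recall_at_k(flat, ids)}")
```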


In [32]:
[f"the similar sentences are : {sentences[i]}" for i in I[0]]

["the similar sentences are : The play's dialogue is witty and clever.",
 "the similar sentences are : He's a master storyteller with a vivid imagination.",
 'the similar sentences are : He demonstrated a clever magic trick.',
 "the similar sentences are : She's a talented actress with a versatile range."]