## FAISS

**FAISS** (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

To use `FAISS`, we also need to install the FAISS package: `pip install faiss-cpu`.

In [1]:
import os
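# The integrations live in langchain_community (see the repr of `db` below);
# the older `langchain.*` import paths are deprecated.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter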

file_path = 'levski.txt'
assert os.path.exists(file_path)

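# Load the raw text file as LangChain documents
loader = TextLoader(file_path=file_path)
raw_documents = loader.load()

# Split into chunks of roughly 200 characters with a 30-character overlap
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)
documents = text_splitter.split_documents(raw_documents)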

Created a chunk of size 387, which is longer than the specified 200
Created a chunk of size 625, which is longer than the specified 200
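
The two warnings above are expected. `CharacterTextSplitter` splits on a single separator (by default a blank line, `"\n\n"`) and then merges neighbouring pieces up to `chunk_size`. It does not cut inside a piece, so a paragraph longer than 200 characters is kept whole. Below is a minimal pure-Python sketch of that behaviour. It ignores `chunk_overlap`, and `split_on_separator` is a hypothetical helper written for illustration, not LangChain's implementation.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split on `separator`, then greedily merge pieces up to `chunk_size`."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate      # still fits: keep merging
            continue
        if current:
            chunks.append(current)   # flush what has been merged so far
        current = piece              # a single piece may exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs followed by one 50-character paragraph
sample = "short one\n\nshort two\n\n" + "x" * 50
chunks = split_on_separator(sample, chunk_size=25)
oversized = [len(c) for c in chunks if len(c) > 25]
print(oversized)
```

If strict size limits matter, `RecursiveCharacterTextSplitter` is worth trying, since it falls back to progressively finer separators.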


Create the **FAISS** database.

In [2]:
embeddings = OllamaEmbeddings(model='gemma:2b')
db = FAISS.from_documents(documents=documents, embedding=embeddings)
db


<langchain_community.vectorstores.faiss.FAISS at 0x10f38f620>

#### Querying the FAISS database

In [3]:
query = 'How many trophies did the team win?'
similar = db.similarity_search(query=query, k=2)

print(f'Similarities size {len(similar)}')
for s in similar:
    print(f'Similar doc: {s}')

Similarities size 2
Similar doc: page_content='The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association.' metadata={'source': 'levski.txt'}
Similar doc: page_content='Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the international stage, Levski reached the quarter-finals of the UEFA Cup twice and the quarter-finals of 

Next, an example of **retrieval**. What does a **Retriever** do?
We can also expose the vector store as a Retriever. This makes it easy to use in other LangChain methods, most of which work with Retrievers.

A Retriever is an interface: given a query, it fetches the matching documents from the vector store and returns them.
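
To make the "interface" idea concrete, here is a small sketch in plain Python. `ToyStore` and `ToyRetriever` are hypothetical stand-ins (they rank by shared words, not by embeddings) and are not LangChain classes. The point is only that the caller sees a single `invoke(query)` method, while search options are fixed when the retriever is created.

```python
from dataclasses import dataclass, field


class ToyStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class ToyRetriever:
    """Wraps a store behind one method; search options are set up front."""

    store: ToyStore
    search_kwargs: dict = field(default_factory=dict)

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


toy_store = ToyStore([
    "the club won many trophies",
    "the stadium is in Sofia",
    "the kit is blue",
])
toy_retriever = ToyRetriever(toy_store, search_kwargs={"k": 1})
top = toy_retriever.invoke("how many trophies")
print(top)
```

Because every retriever offers the same method, code that consumes one does not need to know which store sits behind it.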

In [4]:
# The number of results is configured on the retriever via search_kwargs
retriever = db.as_retriever(search_kwargs={'k': 2})
retrieved = retriever.invoke(query)
retrieved

[Document(id='2fb99872-921b-4009-b56b-32fc7fe55ab1', metadata={'source': 'levski.txt'}, page_content="The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association."),
 Document(id='d40e4e5f-a33d-4f07-9f4f-47606bec91a0', metadata={'source': 'levski.txt'}, page_content="Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the inter

#### Similarity search with score

There are some **FAISS**-specific methods. One of them is **similarity_search_with_score**, which returns not only the documents but also the distance score between the query and each document. The returned score is the L2 distance between the query embedding and the document embedding, so **a lower score is better**.
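
As a rough illustration of how such scores arise, the sketch below ranks a few toy 2-D vectors by squared Euclidean distance to a query vector. The names `squared_l2`, `search_with_score` and `toy_vectors` are hypothetical. Whether a given index reports the distance or its square is an assumption here; the ordering is the same either way.

```python
def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query_vec, doc_vecs, k):
    """Brute-force search: score every vector, return the k closest."""
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending: lower score is better
    return scored[:k]


toy_vectors = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
ranked = search_with_score([0.5, 0.0], toy_vectors, k=2)
print(ranked)
```

A score of 0 would mean the query and the document have identical embeddings.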

In [6]:
print(f'Query: {query}')

docs_and_score = db.similarity_search_with_score(query=query, k=3)
for i, (doc, score) in enumerate(docs_and_score):
    print(f'Similarity {i + 1}')
    print(f'Doc: {doc}')
    print(f'Score: {score}')

Query: How many trophies did the team win?
Similarity 1
Doc: page_content='The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association.' metadata={'source': 'levski.txt'}
Score: 2621.935546875
Similarity 2
Doc: page_content='Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the international stage, Levski reached the quarter-

We can also pass a precomputed **vector** (an embedding) to FAISS and query by vector instead of by text.

In [7]:
query_embeddings = embeddings.embed_query(query)
print(f'Query_embeddings type: {type(query_embeddings)} with length: {len(query_embeddings)}')
docs_and_score = db.similarity_search_with_score_by_vector(embedding=query_embeddings, k=3)
for i, (doc, score) in enumerate(docs_and_score):
    print(f'Similarity {i + 1}')
    print(f'Doc: {doc}')
    print(f'Score: {score}')

Query_embeddings type: <class 'list'> with length: 2048
Similarity 1
Doc: page_content='The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association.' metadata={'source': 'levski.txt'}
Score: 2621.935546875
Similarity 2
Doc: page_content='Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the international stage, Levski reached

How to `save` and `load` the **FAISS database**.
To save, we provide `folder_path` (where the files will be written) and, optionally, `index_name` (the base name of the saved files).
If `folder_path` does not exist, it is created. After saving, it contains two files:
- `<index_name>.faiss`
- `<index_name>.pkl`

In [None]:
db.save_local(folder_path='faiss_index')
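
Since `index_name` is not passed above, the default base name `index` is used. The sketch below is a small helper for checking that both files are present before loading. `expected_index_files` is a hypothetical helper, not part of LangChain.

```python
from pathlib import Path


def expected_index_files(folder_path, index_name="index"):
    """Return the two paths a saved index is expected to consist of."""
    folder = Path(folder_path)
    return [folder / f"{index_name}.faiss", folder / f"{index_name}.pkl"]


paths = expected_index_files("faiss_index")
missing = [p.name for p in paths if not p.exists()]
print("missing files:", missing)
```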

To load the DB we need to pass `allow_dangerous_deserialization=True`, because loading reads a pickle (`.pkl`) file and unpickling can execute arbitrary code. By setting this flag we state that we really trust the file.

In [10]:
import os
from langchain_community.vectorstores import FAISS

folder_path = 'faiss_index'
assert os.path.exists(folder_path)

faiss_db = FAISS.load_local(folder_path=folder_path, embeddings=embeddings,
                            allow_dangerous_deserialization=True)
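
The reason for the explicit flag can be shown with the standard library alone. In the sketch below, a hypothetical `Payload` class tells pickle to call `eval` when the data is loaded. The expression is harmless here, but a file from an untrusted source could run any code the same way.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form runs code when it is loaded."""

    def __reduce__(self):
        # Instructs pickle: "to rebuild this object, call eval('6 * 7')"
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # eval runs here, during loading
print(restored)
```

For that reason, only load index folders that you created yourself or that come from a source you trust.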

In [11]:
docs = faiss_db.similarity_search_with_score(query=query, k=3)
docs

[(Document(id='2fb99872-921b-4009-b56b-32fc7fe55ab1', metadata={'source': 'levski.txt'}, page_content="The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association."),
  np.float32(2621.9355)),
 (Document(id='d40e4e5f-a33d-4f07-9f4f-47606bec91a0', metadata={'source': 'levski.txt'}, page_content="Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league sys