# FAISS
- Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors.
- It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.
- It also contains supporting code for evaluation and parameter tuning.

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

In [3]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore text files as a

In [5]:
embeddings = OllamaEmbeddings(model="llama3.2")
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x7fb188d3e470>

In [6]:
query = "What is the old alternative name for computer file?"
docs = db.similarity_search(query)
docs

[Document(metadata={'source': 'speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore text files as a

In [7]:
docs[0].page_content

'A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'

# As a Retriever
- We can also convert the vector store into a retriever.
- This makes it easy to use in other LangChain components, most of which work with retrievers.
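
The idea of wrapping a vector store behind a single retriever interface can be sketched in plain Python. This is only an illustration, not LangChain's implementation: `ToyVectorStore` and `ToyRetriever` are hypothetical classes, and the "search" ranks by shared words rather than by embeddings.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store (word overlap instead of embeddings)."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        # Rank stored texts by how many words they share with the query.
        words = set(query.lower().split())
        ranked = sorted(self.texts, key=lambda t: -len(words & set(t.lower().split())))
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Uniform interface: a single invoke(query) method that returns documents."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore(["a flat file is a text file", "cats sleep a lot"])
toy_retriever = store.as_retriever(k=1)
result = toy_retriever.invoke("what is a flat file")
print(result)
```

Code that only needs "query in, documents out" can depend on `invoke` and ignore how the store searches. In LangChain itself, `as_retriever` accepts a `search_kwargs` dictionary (for example `{"k": 1}`) to control how many documents come back.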

In [9]:
retriever = db.as_retriever() # Converting vectorstore db into a retriever
docs = retriever.invoke(query)
docs[0].page_content

'A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'

# Similarity Search with Score
- There are some FAISS-specific methods. One of them is ``similarity_search_with_score``.
- It returns not only the documents but also the distance between the query and each document.
- The returned score is an L2 (Euclidean) distance.
- Therefore, a lower score means higher similarity.
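
To see why a lower score means a closer match, here is a small sketch in plain Python that computes L2 distances by hand. The two-dimensional vectors and the `search_with_score` helper are made up for illustration; this is an exhaustive scan, not how FAISS computes results internally.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(query_vec, stored, k=2):
    """Return the k (text, distance) pairs, smallest distance first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 2-dimensional "embeddings"
stored = [("far chunk", [4.0, 6.0]), ("close chunk", [1.0, 3.0])]
results = search_with_score([1.0, 2.0], stored)
print(results)  # closest (lowest score) first
```

The ranking would be the same if the distances were squared, because squaring preserves the order of non-negative numbers. Only the magnitude of the scores would change.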

In [10]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(metadata={'source': 'speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'),
  7585.451),
 (Document(metadata={'source': 'speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore 

In [11]:
embeddings_vector = embeddings.embed_query(query)
embeddings_vector

[0.18007433414459229,
 0.08610104024410248,
 -1.681621789932251,
 -1.592403531074524,
 0.9804269075393677,
 1.3029626607894897,
 1.622125506401062,
 0.25360023975372314,
 -0.5313998460769653,
 -0.29496023058891296,
 0.2586638629436493,
 -0.2890552282333374,
 -0.3935975432395935,
 0.5047309994697571,
 1.1270891427993774,
 0.20069348812103271,
 2.3066675662994385,
 -0.34819430112838745,
 0.6074210405349731,
 -1.078925371170044,
 0.11416806280612946,
 -0.7263192534446716,
 0.49775224924087524,
 -0.01644899882376194,
 2.7088913917541504,
 -1.3185999393463135,
 -0.7186715006828308,
 0.7268364429473877,
 -0.13296543061733246,
 -0.016448918730020523,
 -0.444353312253952,
 0.2696534991264343,
 -1.21358323097229,
 1.374776005744934,
 -1.5850096940994263,
 0.7148683667182922,
 1.0685573816299438,
 -1.11404287815094,
 -1.335949420928955,
 1.4385600090026855,
 -0.26614174246788025,
 -0.2702455222606659,
 -0.10809791833162308,
 -2.4245378971099854,
 -2.0349626541137695,
 -0.5696098804473877,
 2.961

In [12]:
# Search with a precomputed embedding; this returns documents only, without scores
docs_by_vector = db.similarity_search_by_vector(embeddings_vector)
docs_by_vector

[Document(metadata={'source': 'speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore text files as a

In [13]:
# Save the vector store to a local folder named "faiss_index"
db.save_local("faiss_index")


In [15]:
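# Load the saved index from the folder, using the same embedding model.
# allow_dangerous_deserialization=True is needed because loading unpickles data;
# only enable it for index files you created or trust.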
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
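
The reason loading is flagged as dangerous is that unpickling can run code chosen by whoever wrote the file. The sketch below shows the mechanism with a deliberately harmless, hypothetical class; it is not LangChain code.

```python
import pickle


class Harmless:
    """Hypothetical object that tells pickle which function to call on load."""

    def __reduce__(self):
        # On load, pickle calls sorted([3, 1, 2]) instead of rebuilding Harmless.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Harmless())
loaded = pickle.loads(payload)  # executes the callable named in the payload
print(loaded)
```

Here the callable is `sorted`, so nothing bad happens, but a malicious file could name any importable function. That is why an index should be loaded this way only when its source is trusted.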

In [16]:
docs = new_db.similarity_search(query)
docs

[Document(metadata={'source': 'speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore text files as a