# Faiss

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

- Imports: Loads the document loader, the FAISS vector store, the Ollama embedding class, and the text splitter used in the rest of the notebook.

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS

from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

- Text Processing: Loads and splits text documents into smaller chunks with overlap for efficient processing.

In [6]:
# Load text file and split into manageable chunks with overlap

loader=TextLoader("speech.txt")
documents=loader.load()

text_splitter=CharacterTextSplitter(chunk_size=200,chunk_overlap=30)
docs=text_splitter.split_documents(documents)


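- The `speech.txt` file is loaded.
- The loader returns its contents as a list of `Document` objects (`documents`).
- The text is divided into chunks.
- `CharacterTextSplitter` targets chunks of at most 200 characters (`chunk_size=200`) with a 30-character overlap (`chunk_overlap=30`). Because it splits on a separator (`"\n\n"` by default), a passage that contains no separator can exceed the target, as the long second chunk in the output below shows. `docs` is the list of resulting chunks.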
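
To make the roles of chunk size and overlap concrete, here is a minimal pure-Python sketch of a fixed-width sliding window. It is a simplification for illustration only: `split_with_overlap` is a hypothetical helper, and the real `CharacterTextSplitter` splits on separators and merges the pieces rather than cutting at fixed offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk starts with the last character of the previous one, which is the overlap that helps preserve context across chunk boundaries.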

In [7]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='The Evolution of Computer Technology'),
 Document(metadata={'source': 'speech.txt'}, page_content="The journey of computer technology from the 1940s to today represents one of humanity's \nmost remarkable achievements. Starting with ENIAC, the first general-purpose electronic\n computer that filled an entire room, computers have undergone dramatic miniaturization and \n performance improvements. The introduction of transistors in the 1950s replaced bulky vacuum tubes, \n leading to smaller, more reliable machines. The 1970s saw the emergence of personal computers, \n with companies like Apple and IBM making computing accessible to ordinary people. \n The 1990s brought the Internet revolution, connecting computers worldwide and transforming \n how we communicate and access information. Today's smartphones pack more computing power \n than the machines that guided the Apollo missions to the moon. This rapid evolution continues \n

- Vector Store Creation: Embeds the text chunks with an Ollama embedding model and builds a FAISS vector database from them for similarity search.

In [8]:
# Create embeddings and initialize FAISS vector store

embeddings=OllamaEmbeddings()
db=FAISS.from_documents(docs,embeddings)

db

<langchain_community.vectorstores.faiss.FAISS at 0x1e4e4288b90>

- **Embedding**: A numerical representation of text as a vector, so that texts with similar meaning lie close together in vector space.
- `OllamaEmbeddings`: An embedding class that uses a model served by Ollama to turn text into vectors for similarity measurements.
- **FAISS**: A library that indexes and searches high-dimensional vectors quickly, allowing efficient similarity searches between text segments through their vector representations.
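
As a rough illustration of "text as vectors", the sketch below counts words from a tiny fixed vocabulary and compares the resulting vectors by Euclidean distance. Everything here (`VOCAB`, `toy_embed`, `l2_distance`) is a hypothetical stand-in; a real embedding model produces dense vectors that capture meaning, not word counts.

```python
import math

# Hypothetical toy vocabulary; one vector dimension per term.
VOCAB = ["war", "peace", "computer", "internet"]


def toy_embed(text: str) -> list[float]:
    """Map text to a vector of per-term word counts (a stand-in for a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = toy_embed("war and peace")
speech = toy_embed("peace after the war")
tech = toy_embed("the computer and the internet")

print(l2_distance(query, speech))  # small distance: similar content
print(l2_distance(query, tech))    # larger distance: unrelated content
```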

- Query Execution: Performs a semantic similarity search in the vector database to find the chunks most relevant to a question about the desired outcome of the war.

In [9]:
# Search vector DB for documents matching the war outcome query and return first result

query="How does the speaker describe the desired outcome of the war?"
docs=db.similarity_search(query)
docs[0].page_content


'The Evolution of Computer Technology'

- **Similarity Search**: a search technique that finds documents or items most similar to a given query based on their vector representations (embeddings). Instead of exact keyword matching, it looks for semantic meaning and contextual similarity, making it more effective for natural language queries.

#### As a Retriever

- **Retriever**: A component in information retrieval systems that fetches relevant documents or passages from a knowledge base in response to a query. It acts as a bridge between user questions and stored information, using methods like vector similarity or keyword matching to find the most relevant content.

    - Any vector store can be converted into a retriever.
    - This makes it easy to use in chains, most of which work with retrievers.

`db.as_retriever()` wraps the FAISS database in a retriever, which makes querying more flexible:
- Provides a standard interface for searching the embeddings stored in the FAISS database
- Makes similarity-based search more modular
- Adapts to different querying scenarios (e.g., semantic search)

In [5]:
retriever=db.as_retriever()
docs=retriever.invoke(query)
docs[0].page_content

'It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'
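
The retriever idea can be sketched as a thin adapter: any search function is exposed through a single `invoke(query)` method, so downstream code does not need to know how the search works. `SimpleRetriever` and `keyword_search` are hypothetical names for illustration, not LangChain classes.

```python
from typing import Callable


class SimpleRetriever:
    """Hypothetical adapter that exposes any search function through invoke()."""

    def __init__(self, search_fn: Callable[[str, int], list[str]], k: int = 4):
        self.search_fn = search_fn
        self.k = k  # number of results to return per query

    def invoke(self, query: str) -> list[str]:
        return self.search_fn(query, self.k)


CORPUS = [
    "we shall fight for democracy and peace",
    "computers have undergone dramatic miniaturization",
]


def keyword_search(query: str, k: int) -> list[str]:
    """Rank documents by how many query words they share (stand-in for vector search)."""
    terms = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda doc: -len(terms & set(doc.lower().split())))
    return ranked[:k]


toy_retriever = SimpleRetriever(keyword_search, k=1)
print(toy_retriever.invoke("how is peace described"))
```

In LangChain itself, the number of returned documents can be set when creating the retriever, for example `db.as_retriever(search_kwargs={"k": 2})`.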

* **Similarity Search with Score**: Returns each matching document together with its score. In the FAISS vector store the score is an L2 distance, so lower scores indicate better matches. It extends regular similarity search with a numerical measure of how close each document is to the query.

In [6]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

[(Document(metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
  15343.574),
 (Document(metadata={'source': 'speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as bellige
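
The shape of this result, a list of `(document, score)` pairs ordered from closest to farthest, can be reproduced with an exhaustive search over a handful of vectors. This is a sketch under simplifying assumptions: `search_with_score` is a hypothetical helper, the vectors are made up, and a real index avoids recomputing every distance where it can.

```python
import heapq
import math


def search_with_score(query_vec, indexed, k=4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first.

    indexed is a list of (text, vector) pairs; the distance is Euclidean (L2).
    """
    scored = [(text, math.dist(query_vec, vec)) for text, vec in indexed]
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Made-up two-dimensional vectors for three chunks.
indexed = [
    ("chunk a", [0.0, 0.0]),
    ("chunk b", [3.0, 4.0]),
    ("chunk c", [1.0, 0.0]),
]

results = search_with_score([0.0, 0.0], indexed, k=2)
print(results)
```

Because the score is a distance rather than a similarity, the best match is the pair with the smallest number.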

- Vector Creation: Converts the input query text into a numerical embedding vector representation using the embeddings model.

In [7]:
# Generate embedding vector from query text for similarity comparison

embedding_vector=embeddings.embed_query(query)
embedding_vector

[1.8891308307647705,
 -3.12833571434021,
 1.950231909751892,
 1.4647973775863647,
 -0.9355076551437378,
 0.6083784699440002,
 1.5930413007736206,
 -0.5649856925010681,
 0.8756280541419983,
 -1.9883532524108887,
 1.2988661527633667,
 -1.9822001457214355,
 -0.6751458644866943,
 1.519883155822754,
 -0.529873788356781,
 -2.000640392303467,
 -0.38055598735809326,
 0.11811791360378265,
 0.7277924418449402,
 -1.8338812589645386,
 -0.7317088842391968,
 -1.049381971359253,
 2.229776382446289,
 -1.9945803880691528,
 0.7757887840270996,
 -0.96952223777771,
 0.3436594009399414,
 -1.646192193031311,
 0.1825711578130722,
 -0.8645098209381104,
 2.1028261184692383,
 -2.8758442401885986,
 -2.6772265434265137,
 3.7390706539154053,
 2.3217194080352783,
 -5.089715003967285,
 -1.0646493434906006,
 1.3658063411712646,
 -1.2167930603027344,
 -1.403181791305542,
 -0.4614812135696411,
 -2.01977276802063,
 2.159519910812378,
 0.3468962013721466,
 0.9295225143432617,
 0.584738552570343,
 -0.2518477439880371,
 0.

- Vector-Based Search: Performs similarity search using a pre-computed embedding vector to find matching documents in the database.

In [8]:
# Search vector DB using the embedding vector to find similar documents

docs_score=db.similarity_search_by_vector(embedding_vector)
docs_score

[Document(metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(metadata={'source': 'speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high 
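
Searching by text and searching by a pre-computed vector return the same documents because the text search is, conceptually, "embed the query, then search by vector". The sketch below mirrors the two method names used above on a hypothetical `ToyStore`; `length_embed` is a deliberately crude stand-in for a real embedding model.

```python
import math


def length_embed(text: str) -> list[float]:
    """Crude stand-in embedding: (character count, space count)."""
    return [float(len(text)), float(text.count(" "))]


class ToyStore:
    """Hypothetical in-memory store holding (text, vector) pairs."""

    def __init__(self, embed, texts):
        self.embed = embed
        self.items = [(text, embed(text)) for text in texts]

    def similarity_search_by_vector(self, vector, k=1):
        # Rank every stored item by Euclidean distance to the given vector.
        ranked = sorted(self.items, key=lambda item: math.dist(vector, item[1]))
        return [text for text, _ in ranked[:k]]

    def similarity_search(self, query, k=1):
        # Text search: embed the query, then delegate to the vector search.
        return self.similarity_search_by_vector(self.embed(query), k)


store = ToyStore(length_embed, ["hi", "hello world", "a much longer sentence here"])
by_text = store.similarity_search("hello there")
by_vector = store.similarity_search_by_vector(length_embed("hello there"))
print(by_text == by_vector)
```

Searching by vector is useful when the query embedding is already available, since it avoids embedding the same text twice.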

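- Save and Load: Saves the vector database to a local folder and loads it back when needed. Loading requires an embeddings object for embedding new queries, plus `allow_dangerous_deserialization=True` because the saved index includes a pickle file; enable it only for files you trust.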

In [9]:
# Save database to file and load it back with embeddings

db.save_local("faiss_index")

new_db=FAISS.load_local("faiss_index",embeddings,allow_dangerous_deserialization=True)
docs=new_db.similarity_search(query)

In [11]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(metadata={'source': 'speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high 
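
The save-and-load round trip can be illustrated with a toy index written to disk and read back. This sketch assumes a JSON file named `index.json` and hypothetical helpers `save_index` and `load_index`; it does not reproduce the on-disk format that `save_local` writes.

```python
import json
import tempfile
from pathlib import Path


def save_index(items, folder):
    """Write a list of (text, vector) pairs to <folder>/index.json."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    (path / "index.json").write_text(json.dumps(items), encoding="utf-8")


def load_index(folder):
    """Read the pairs back; JSON stores tuples as lists, so rebuild the tuples."""
    raw = json.loads((Path(folder) / "index.json").read_text(encoding="utf-8"))
    return [(text, vector) for text, vector in raw]


original = [("chunk one", [0.1, 0.2]), ("chunk two", [0.3, 0.4])]

with tempfile.TemporaryDirectory() as tmp:
    save_index(original, tmp)
    restored = load_index(tmp)

print(restored == original)
```

Persisting the vectors means the chunks do not have to be embedded again on the next run; only new queries need to go through the embedding model.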
