# Faiss
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter


# Data ingestion
loader=TextLoader("speech.txt")
documents=loader.load()

# Data Transformation
text_splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=30)
docs=text_splitter.split_documents(documents)
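
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of overlapping chunks using a plain sliding window. The helper `sliding_chunks` is hypothetical and is not how `CharacterTextSplitter` works internally: the real splitter first splits on a separator and then merges pieces, so its chunk boundaries will differ.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Hypothetical helper: cut text into windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is not fully contained in the previous one.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk starts with the last character of the previous one, which is the role the overlap plays: context at a chunk boundary is not lost entirely.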


In [None]:
docs

In [None]:
docs[0].page_content

In [None]:
# vector embedding
embeddings = OllamaEmbeddings(model="mxbai-embed-large")


# storing in FAISS vector db 
db=FAISS.from_documents(docs,embeddings)
db

In [None]:
### querying 

query="How does the speaker describe the desired outcome of the war?"

docs=db.similarity_search(query)
docs[0].page_content


## Retriever

- In LangChain, Retrievers are a standardized way to fetch the most relevant documents from a data store (like a VectorStore) based on a user query.

- A VectorStore such as FAISS, Chroma, or Pinecone stores documents as vectors (embeddings) and can run similarity search directly. LangChain also defines a Retriever interface, a more generic abstraction that RAG pipelines can use without knowing which data store sits behind it.

- By calling .as_retriever() on your vector store, you turn it into a Retriever-compatible object:

In [None]:
retriever=db.as_retriever()

docs=retriever.invoke(query)
docs[0].page_content
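
The idea behind the abstraction can be sketched with toy classes. Everything below (`ToyDocument`, `ToyVectorStore`, `ToyRetriever`, and the word-overlap scoring) is hypothetical and only mimics the shape of the interface; it is not LangChain's implementation and uses no embeddings.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    page_content: str


class ToyVectorStore:
    """Hypothetical store that ranks documents by words shared with the query."""

    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())

        def overlap(doc):
            return len(query_words & set(doc.page_content.lower().split()))

        # Highest overlap first; keep the top k.
        return sorted(self.docs, key=overlap, reverse=True)[:k]

    def as_retriever(self, k=4):
        # Simplified signature, assumed for this sketch only.
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes a single invoke(query) method, whatever store sits behind it."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    ToyDocument("the war must end in peace"),
    ToyDocument("taxes and tariffs"),
    ToyDocument("peace for the world"),
])
top = store.as_retriever(k=2).invoke("peace after the war")
print([d.page_content for d in top])
```

Code that consumes the retriever only calls `invoke`, so the store behind it can be swapped without touching the pipeline. With the real FAISS store, the number of results is set through `db.as_retriever(search_kwargs={"k": 2})`.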

## Similarity Search with score
There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each document. The returned score is the L2 (Euclidean) distance, so a lower score means a closer match.
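
As a small worked example of why lower is better: L2 distance is the Euclidean distance, whereas Manhattan distance is the L1 distance. The vectors and the helper names below are made up for illustration. As far as I know, Faiss's flat L2 index reports the squared L2 distance, which changes the numbers but not the ranking.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan (L1) distance, shown only for contrast."""
    return sum(abs(x - y) for x, y in zip(a, b))


query_vec = [0.0, 0.0]
candidates = {"far": [3.0, 4.0], "near": [1.0, 1.0]}

# Rank candidates by ascending L2 distance: the closest vector comes first.
ranked = sorted(candidates, key=lambda name: l2_distance(query_vec, candidates[name]))

print(l2_distance(query_vec, candidates["far"]))  # 5.0
print(l1_distance(query_vec, candidates["far"]))  # 7.0
print(ranked)                                     # ['near', 'far']
```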

In [None]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

In [None]:
# Searching by vectors
embedding_vector=embeddings.embed_query(query)
embedding_vector

In [None]:
# similarity_search_by_vector returns documents only (no scores)
docs_by_vector=db.similarity_search_by_vector(embedding_vector)
docs_by_vector

## Saving and Loading the Vector DB Locally

In [None]:

db.save_local("faiss_index")

In [None]:
# Loading the saved vector db 

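# allow_dangerous_deserialization=True is needed because loading unpickles data;
# only enable it for index files you created yourself.
new_db=FAISS.load_local("faiss_index",embeddings,allow_dangerous_deserialization=True)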

docs=new_db.similarity_search(query)

In [None]:
docs