#### Faiss
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

FAISS is a vector search library, not a database. You feed it vectors and it efficiently finds the most similar ones. Any metadata has to be managed manually.

ChromaDB is a vector database. It stores embeddings together with their associated metadata, and queries can be filtered with SQL-like conditions. That makes it the better fit for RAG pipelines where you need full control over the document store.
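
To make "manually manage metadata" concrete, here is a minimal pure-Python sketch of the bookkeeping a bare vector index leaves to you. The `TinyIndex` class, the sample vectors, and the `source` field are hypothetical stand-ins, not the Faiss API. The point is only that the index returns positions, while mapping them back to text and filtering on metadata happens in your own code.

```python
import math


class TinyIndex:
    """Hypothetical stand-in for a vector-only index: it stores vectors and nothing else."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        self.vectors.append(vector)

    def search(self, query, k):
        # Return (distance, position) pairs for the k closest stored vectors.
        scored = [(math.dist(query, v), i) for i, v in enumerate(self.vectors)]
        return sorted(scored)[:k]


# Metadata lives outside the index, aligned with it by position.
metadata = [
    {"text": "chunk about war aims", "source": "speech.txt"},
    {"text": "chunk about peace terms", "source": "speech.txt"},
    {"text": "chunk about trade policy", "source": "notes.txt"},
]

index = TinyIndex()
for vec in ([0.9, 0.1], [0.8, 0.3], [0.1, 0.9]):
    index.add(vec)

hits = index.search([1.0, 0.0], k=3)

# Manual metadata filter: keep only hits that came from one source file.
filtered = [metadata[i]["text"] for _, i in hits if metadata[i]["source"] == "speech.txt"]
print(filtered)
```

A vector database folds the `metadata` list and the filter step into the store itself, which is the convenience being contrasted here.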

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader=TextLoader("larger_speech.txt")
documents=loader.load()
text_splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=30)
docs=text_splitter.split_documents(documents)
docs

In [None]:
print(f"Number of chunks created: {len(docs)}")

In [None]:
embeddings=OllamaEmbeddings(model="all-minilm:22m")
db=FAISS.from_documents(docs,embeddings)
db

Once all the input chunks are stored as vectors in the FAISS index (db), we can give it a query, much like a prompt.

Suppose the index contains the text "india independence in 1947" and I ask "who got independence in 1947".

That question is treated as query="who got independence in 1947". The query is embedded into a vector with the same embedding model, and FAISS runs a semantic search by comparing it against the stored vectors. The query vector itself is not added to the index.
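
The flow can be sketched without any model at all. In the toy example below, `toy_embed` is a hypothetical word-count embedding used only as a stand-in for a real model such as the Ollama one in this notebook (a real model captures meaning, not just shared words), and the second stored sentence is invented for contrast. It shows that the query is embedded at search time and compared against the stored vectors.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    # Hypothetical stand-in for an embedding model: one word count per vocabulary word.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


stored_texts = [
    "india independence in 1947",
    "the treaty was signed in paris",
]
query = "who got independence in 1947"

vocabulary = sorted({w for t in stored_texts + [query] for w in t.lower().split()})

# Stored chunks are embedded once, when the index is built.
stored_vectors = [toy_embed(t, vocabulary) for t in stored_texts]

# The query is embedded at search time and only compared against the stored vectors.
query_vector = toy_embed(query, vocabulary)
scores = [cosine_similarity(query_vector, v) for v in stored_vectors]
best = stored_texts[scores.index(max(scores))]
print(best)
```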

In [None]:
query="How does the speaker describe the desired outcome of the war?"
result_of=db.similarity_search(query)
result_of[0].page_content

#### Similarity Search with score
There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance score between the query and each of them. The returned score is an L2 (Euclidean) distance, not a Manhattan (L1) distance. Therefore, a lower score is better.
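
For reference, L2 distance is the Euclidean distance, which is not the same as the Manhattan (L1) distance. The small sketch below computes both on made-up two-dimensional vectors (the helper names and the vectors are illustrative assumptions) and shows why a lower score means a closer match.

```python
import math


def l2_distance(a, b):
    # Euclidean (L2) distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    # Manhattan (L1) distance, shown only for contrast.
    return sum(abs(x - y) for x, y in zip(a, b))


query_vec = [0.0, 0.0]
close_vec = [3.0, 4.0]
far_vec = [6.0, 8.0]

print(l2_distance(query_vec, close_vec))  # 5.0
print(l1_distance(query_vec, close_vec))  # 7.0

# The closer vector has the smaller L2 score, so it ranks first.
ranked = sorted([far_vec, close_vec], key=lambda v: l2_distance(query_vec, v))
print(ranked[0])
```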


In [None]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

So far we have passed the query as text, but we can also pass the embedded vector of the query directly.

In [None]:
embedded_query=embeddings.embed_query(query)
embedded_query

In [None]:
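# similarity_search_with_score expects text; use the *_by_vector variant for an embedding
docs_and_score=db.similarity_search_with_score_by_vector(embedded_query)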
docs_and_score

What Exactly Gets Saved?

Vector index file (like index.faiss): contains all the embedded vectors (numerical arrays), stored in FAISS's binary format.

Metadata file (like index.pkl or docstore.pkl): stores the original text chunks and their metadata (e.g. titles, sources, IDs), and lets FAISS link the vectors back to meaningful documents.
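
The two-file idea can be sketched with the standard library alone. Everything in the block below is an assumption made for illustration: the file names, the toy binary layout, and the `docstore` / `index_to_docstore_id` structures are hypothetical and do not reproduce the actual on-disk format. It only shows how a vector file plus a pickled mapping lets a search hit be resolved back to its text.

```python
import pickle
import struct
import tempfile
from pathlib import Path

vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
docstore = {
    "doc-a": {"page_content": "first chunk", "metadata": {"source": "larger_speech.txt"}},
    "doc-b": {"page_content": "second chunk", "metadata": {"source": "larger_speech.txt"}},
}
# Maps a vector's position in the index to the ID of its document.
index_to_docstore_id = {0: "doc-a", 1: "doc-b"}

with tempfile.TemporaryDirectory() as folder:
    vector_path = Path(folder) / "toy_index.bin"
    meta_path = Path(folder) / "toy_index.pkl"

    # File 1: raw vectors packed as 32-bit floats (a toy layout, not Faiss's format).
    flat = [value for vec in vectors for value in vec]
    vector_path.write_bytes(struct.pack(f"{len(flat)}f", *flat))

    # File 2: the documents plus the position-to-ID mapping, pickled together.
    meta_path.write_bytes(pickle.dumps((docstore, index_to_docstore_id)))

    # Loading reverses both steps.
    raw = vector_path.read_bytes()
    loaded_flat = struct.unpack(f"{len(raw) // 4}f", raw)
    loaded_docstore, loaded_mapping = pickle.loads(meta_path.read_bytes())

# A search hit at position 1 is resolved to its original text through the mapping.
hit_position = 1
resolved = loaded_docstore[loaded_mapping[hit_position]]["page_content"]
print(resolved)
```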

In [None]:
### Saving And Loading
db.save_local("faiss_index")

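Loading the saved faiss_index. Because the metadata file is a pickle, load_local needs allow_dangerous_deserialization=True; only enable it for index files you created yourself or otherwise trust.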

In [None]:
new_db=FAISS.load_local("faiss_index",embeddings,allow_dangerous_deserialization=True)
docs=new_db.similarity_search(query)

In [None]:
docs

#### As a Retriever
We can also convert the vector store into a Retriever. This makes it easy to use in other LangChain methods, which largely work with retrievers.

k is the number of nearest neighbours to return.
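
To see what k changes, here is a small stand-alone sketch. `ToyRetriever` and its sample entries are hypothetical and only imitate the `invoke` call used in this notebook; the default of 4 is an assumption chosen to mirror the default k of LangChain's similarity search.

```python
import heapq
import math


class ToyRetriever:
    """Hypothetical retriever: wraps a list of (vector, text) pairs behind invoke()."""

    def __init__(self, entries, k=4):
        self.entries = entries
        self.k = k

    def invoke(self, query_vector):
        # Keep the k entries with the smallest L2 distance to the query, closest first.
        nearest = heapq.nsmallest(
            self.k, self.entries, key=lambda entry: math.dist(query_vector, entry[0])
        )
        return [text for _, text in nearest]


entries = [
    ([0.0, 1.0], "chunk A"),
    ([0.2, 0.9], "chunk B"),
    ([1.0, 0.0], "chunk C"),
    ([0.9, 0.2], "chunk D"),
    ([0.5, 0.5], "chunk E"),
]
query_vector = [1.0, 0.1]

top_one = ToyRetriever(entries, k=1).invoke(query_vector)
top_default = ToyRetriever(entries).invoke(query_vector)
print(top_one)
print(top_default)
```

With k=1 only the single closest chunk comes back; with the default, the four closest chunks are returned in order of increasing distance.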


In [None]:
retriever=db.as_retriever(search_kwargs={"k": 1})
docs=retriever.invoke(query)
docs[0].page_content

In [None]:
retriever=db.as_retriever()
docs=retriever.invoke(query)
for doc in docs:
    print("\n the output is \n")
    print(doc.page_content)