## Faiss

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
# FAISS also requires the faiss-cpu package (pip install faiss-cpu)
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader=TextLoader("../3.2-DataIngestion/speech.txt")
documents=loader.load()
text_splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=30)
docs=text_splitter.split_documents(documents)

In [3]:
# Build the vector store: the chunks are embedded first (step 3), then stored in the FAISS index (step 4)
embeddings=OllamaEmbeddings(model="gemma2:2b")
db=FAISS.from_documents(docs,embeddings)
db


<langchain_community.vectorstores.faiss.FAISS at 0x2baa0cf3f70>

In [4]:
## querying
query="what does the speaker describe the desired outcome of the war?"
docs=db.similarity_search(query)
docs


[Document(id='055aa60b-6893-40de-a6ad-be508bbd20db', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
 Document(id='6dda12a4-9c92-46c6-a94f-c891774b5ba9', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We ar

In [6]:
docs[0].page_content

'To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'

#### As a Retriever
We can also convert the vector store into a retriever. This makes it easy to use in other LangChain components, which largely work with retrievers.

A retriever acts as an interface between our queries and the vector store database. Once the vector store is converted into a retriever, any query passed to the retriever is run against the stored embeddings, and the matching documents are returned as the response.
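To make the interface idea concrete, here is a minimal pure-Python sketch. It is not LangChain's implementation: `toy_embed`, `ToyVectorStore`, and `ToyRetriever` are hypothetical names invented for illustration, and the letter-count "embedding" is only a stand-in for a real model. It shows a retriever as a thin wrapper that exposes a single `invoke` method and delegates the actual search to the store.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: counts of letters a-z."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [counts.get(chr(ord("a") + i), 0) for i in range(26)]


class ToyVectorStore:
    """Keeps (vector, text) pairs and ranks them by L2 distance to a query."""

    def __init__(self, texts):
        self.items = [(toy_embed(t), t) for t in texts]

    def similarity_search(self, query, k=2):
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: math.dist(q, item[0]))
        return [text for _, text in ranked[:k]]

    def as_retriever(self, k=2):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes one method, invoke(query), and delegates to the store."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore(["peace and liberty", "war and conflict", "zzz"])
retriever = store.as_retriever(k=1)
result = retriever.invoke("peace and liberty")
print(result)
```

Because callers only depend on `invoke`, the store behind the retriever could be swapped without changing the code that consumes the results, which is presumably why downstream components prefer the retriever interface.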

In [5]:
retriever=db.as_retriever()
docs=retriever.invoke(query)
# Later, when we work with LLMs, the chains take a retriever rather than the vector store itself, so the db is converted with as_retriever() first


In [9]:
docs[0].page_content

'To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'

## Similarity Search with Score

There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance score between the query and each of them. The returned score is an L2 distance, so a lower score is better.
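The "lower is better" rule can be illustrated with a small pure-Python sketch. The helper names `l2_distance` and `search_with_score` and the two-dimensional vectors are assumptions made up for this example, not part of FAISS or LangChain. The sketch scores each stored vector by its L2 distance to the query and returns the closest ones first.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(query_vec, stored, k=2):
    """Return the k (label, distance) pairs closest to query_vec, nearest first."""
    scored = [(label, l2_distance(query_vec, vec)) for label, vec in stored]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:k]


# Hypothetical 2-D "embeddings" for three chunks
stored = [
    ("chunk_a", [1.0, 0.0]),
    ("chunk_b", [4.0, 4.0]),
    ("chunk_c", [0.0, 2.0]),
]
results = search_with_score([0.0, 0.0], stored, k=2)
print(results)
```

As far as I know, Faiss's flat L2 index reports squared distances rather than their square roots. That would not change the ranking, but it may explain why raw scores look large for unnormalized embeddings.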

In [10]:
docs_and_scores=db.similarity_search_with_score(query)
docs_and_scores


[(Document(id='0021e576-d4c5-4d9d-b84c-b5274b4833bd', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
  np.float32(8667.888)),
 (Document(id='4b311447-5063-470d-9ffb-10f76db91512', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right 

In [6]:
# Can we search with a vector directly instead of a sentence? Embed the query first
embeddingg_vector=embeddings.embed_query(query)
embeddingg_vector


[0.9376143217086792,
 -0.12204120308160782,
 -0.8542547225952148,
 -1.3661437034606934,
 -0.937866747379303,
 0.21894784271717072,
 0.3100288212299347,
 0.8018231391906738,
 -0.18234743177890778,
 -0.31126582622528076,
 -1.6208244562149048,
 -0.6205931901931763,
 -0.5650297999382019,
 -0.19176585972309113,
 -3.580568790435791,
 -2.1618454456329346,
 0.14945949614048004,
 -0.5264394283294678,
 -2.7460548877716064,
 -0.22164607048034668,
 0.6758710741996765,
 -1.5165870189666748,
 -2.328587055206299,
 1.6552255153656006,
 -1.8229360580444336,
 -2.334533214569092,
 1.7546336650848389,
 -0.705711841583252,
 -1.842195987701416,
 2.865652322769165,
 -2.8981945514678955,
 -1.5128923654556274,
 0.725580096244812,
 0.9405959248542786,
 1.9952847957611084,
 1.7569785118103027,
 -0.7589486241340637,
 0.3781817853450775,
 2.1598408222198486,
 -3.5921390056610107,
 1.4165370464324951,
 0.9084769487380981,
 -1.3949626684188843,
 -3.43245005607605,
 -2.9455008506774902,
 -3.1341612339019775,
 -0.3969

In [7]:
docs_score=db.similarity_search_by_vector(embeddingg_vector)

In [8]:
docs_score

[Document(id='055aa60b-6893-40de-a6ad-be508bbd20db', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
 Document(id='6dda12a4-9c92-46c6-a94f-c891774b5ba9', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We ar

In [9]:
## Saving and loading
db.save_local('faiss_index')

In [10]:
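## Load the saved folder. allow_dangerous_deserialization=True is needed because part of the saved index is pickled; only enable it for files you created and trust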
new_db=FAISS.load_local("faiss_index",embeddings,allow_dangerous_deserialization=True)
docs=new_db.similarity_search(query)
docs

[Document(id='055aa60b-6893-40de-a6ad-be508bbd20db', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
 Document(id='6dda12a4-9c92-46c6-a94f-c891774b5ba9', metadata={'source': '../3.2-DataIngestion/speech.txt'}, page_content='…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We ar