### FAISS
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [8]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [9]:
loader = TextLoader("Data/speech.txt")
docs = loader.load()
docs

[Document(metadata={'source': 'Data/speech.txt'}, page_content='I have three visions for India. In 3000 years of our history people from all over the world have come and invaded us, captured our lands, conquered our minds. From Alexander onwards the Greeks, the Turks, the Moguls, the Portuguese, the British, the French, the Dutch, all of them came and looted us, took over what was ours. Yet we have not done this to any other nation. We have not conquered anyone. We have not grabbed their land, their culture and their history and tried to enforce our way of life on them. Why? Because we respect the freedom of others. That is why my FIRST VISION is that of FREEDOM. I believe that India got its first vision of this in 1857, when we started the war of Independence. It is this freedom that we must protect and nurture and build on. If we are not free, no one will respect us.\n\nWe have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally r

In [38]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(docs)
docs

[Document(metadata={'source': 'Data/speech.txt'}, page_content='We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'),
 Document(metadata={'source': 'Data/speech.txt'}, page_content='I have a THIRD VISION. India must stand up to the world. Because I believe that unless India stands up to the world, no one will respect us. Only strength respects strength. We must be strong not only as a military power but also as an economic power. Both must go hand-in-hand. My good fortune was to have worked with three great minds. Dr.Vikram Sarabhai, of the Dept. of Space, Professor Satish Dhawan, 
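
To make `chunk_size` concrete, here is a simplified, pure-Python sketch of separator-based chunking: split the text on blank lines, then greedily merge pieces until adding the next one would exceed the limit. This is not LangChain's implementation. The helper name `split_on_separator` is hypothetical, and overlap handling is left out to keep the idea visible.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified illustration only: a single piece longer than chunk_size is kept
    whole, and no overlap between consecutive chunks is produced.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits (or it is the first piece of a new chunk).
            current = candidate
        else:
            # The piece would overflow the chunk: close it and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_on_separator(sample, chunk_size=40)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), repr(chunk))
```

With a 40-character limit, the first two paragraphs fit in one chunk and the third starts a new one. A larger `chunk_size` yields fewer, longer chunks.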

In [40]:
embeddings = OllamaEmbeddings(model="llama3.2:1b")
db = FAISS.from_documents(docs, embedding=embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x1c4a136d090>

In [41]:
# Querying
query = "What is the percent growth rate of most areas in India?"

In [42]:
docs = db.similarity_search(query=query)
docs[0].page_content

'We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'

#### As a Retriever
We can also convert the vector store into a retriever.
This makes it easy to use with other LangChain components, most of which work with retrievers.

In [43]:
retriever = db.as_retriever()
retriever.invoke(query)[0].page_content

'We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'

#### Similarity Search with score
Some methods are specific to FAISS. One of them is `similarity_search_with_score`, which returns each document together with its distance from the query. The score is an L2 distance, so a lower score means a closer match.
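
The ranking rule described above can be sketched in plain Python. This is a toy illustration rather than FAISS itself: the helper names `l2_distance` and `search_with_score` and the 2-D vectors are assumptions made for the example. Some FAISS index types may report the squared distance instead, which does not change the ordering.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(query_vec, indexed, k=2):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D "embeddings"; real embeddings have far more dimensions.
indexed = [
    ("growth", [1.0, 0.0]),
    ("freedom", [0.0, 1.0]),
    ("strength", [-1.0, 0.0]),
]
results = search_with_score([0.9, 0.1], indexed, k=2)
for text, score in results:
    print(f"{text}: {score:.4f}")
```

The query vector lies closest to "growth", so that entry comes first with the smallest score, which mirrors how the top result in the FAISS output carries the lowest distance.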

In [44]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='ad3c9282-d13d-4747-a1ef-f55e008121e3', metadata={'source': 'Data/speech.txt'}, page_content='We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'),
  np.float32(0.66169584)),
 (Document(id='6750ec65-2699-473e-97ff-29e6586fd1f3', metadata={'source': 'Data/speech.txt'}, page_content='I have a THIRD VISION. India must stand up to the world. Because I believe that unless India stands up to the world, no one will respect us. Only strength respects strength. We must be strong not only as a military power but also as an economic power. Both must go hand-in-hand. My good fort

In [45]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.016236564,
 0.03122604,
 0.01090053,
 0.015202093,
 0.033741836,
 -0.019103503,
 -0.02089632,
 0.009899559,
 0.008100098,
 0.015645415,
 -0.023402244,
 0.0065451884,
 -0.020337947,
 0.028410321,
 0.013101794,
 -0.008336578,
 -0.023006724,
 0.011764014,
 0.054688957,
 -0.039071504,
 0.01667628,
 0.016812272,
 0.0066339877,
 -0.0002748688,
 0.014796923,
 0.016695423,
 -0.023428563,
 -0.0027333884,
 -0.013570116,
 0.04655851,
 0.0006177462,
 0.009508685,
 -0.01959083,
 0.029811198,
 -0.0048910305,
 0.02437052,
 0.04953435,
 -0.0058258665,
 -0.0055766534,
 -0.008496956,
 0.035867106,
 -0.0045961645,
 -0.01687747,
 0.02067729,
 -0.0023116756,
 -0.011956971,
 0.0114129735,
 0.025709487,
 0.014324798,
 -0.018988093,
 -0.025044847,
 -0.026350096,
 0.03325039,
 -0.012568434,
 -0.016764302,
 -0.020362927,
 -0.007680399,
 0.010487346,
 0.00024757654,
 0.010879031,
 0.007221952,
 0.0016676387,
 -0.03246599,
 -0.03172284,
 0.029983226,
 -0.0023829618,
 -0.0005446886,
 0.01834196,
 0.026896106,


In [None]:
len(embedding_vector)  # llama3.2:1b produces 2048-dimensional embeddings

2048

In [49]:
# similarity_search_by_vector returns documents only (no scores)
docs = db.similarity_search_by_vector(embedding_vector)
docs[0].page_content

'We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'

#### Saving and Loading from FAISS DB

In [50]:
db.save_local("faiss_index")

In [54]:
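# load_local is a classmethod, so call it on FAISS; the flag is required because part of the saved index is pickled
new_db = FAISS.load_local("faiss_index", embeddings=embeddings, allow_dangerous_deserialization=True)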
new_db

<langchain_community.vectorstores.faiss.FAISS at 0x1c4a136e0d0>

In [60]:
docs = new_db.similarity_search(query, k=1)
docs[0].page_content

'We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.'
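
The `allow_dangerous_deserialization=True` flag exists because the saved folder includes a pickle file alongside the FAISS index, and unpickling can execute arbitrary code. The sketch below shows the same save/load round trip with the standard library only. The dictionary layout and the file name `docstore.pkl` are hypothetical stand-ins, not LangChain's actual on-disk format.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a docstore: chunk IDs mapped to chunk text.
docstore = {
    "chunk-0": "We have 10 percent growth rate in most areas.",
    "chunk-1": "I have a THIRD VISION. India must stand up to the world.",
}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "docstore.pkl"  # hypothetical file name

    # Save: serialize the Python object to disk.
    with path.open("wb") as f:
        pickle.dump(docstore, f)

    # Load: unpickling can run arbitrary code, so only load files you created or trust.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == docstore)
```

The restored object matches the original, which is the property the notebook relies on when `new_db` answers the same query as `db`. The trade-off is that a pickle file from an untrusted source should never be loaded.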