**Facebook AI Similarity Search (FAISS)**

Faiss is a library for efficient similarity search and clustering of dense vectors. Its algorithms can search sets of vectors of any size, including sets that may not fit in RAM. \
It also contains supporting code for evaluation and parameter tuning. \
It addresses a limitation of traditional query engines, which are optimized for hash-based lookups, by providing more scalable similarity search functions.


**Efficient similarity search** \
With Faiss, developers can search multimedia documents in ways that are inefficient or impossible with standard database engines (SQL). It includes nearest-neighbor search implementations for million-to-billion-scale datasets that optimize the memory-speed-accuracy tradeoff. Faiss aims to offer state-of-the-art performance for all operating points.

Some of its most useful algorithms are implemented on the GPU. Faiss is written in C++, with an optional Python interface and GPU support via CUDA.
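To make "nearest-neighbor search" concrete, here is a minimal sketch of the exact, brute-force baseline in plain Python. It is not how Faiss is implemented; the function names (`squared_l2`, `brute_force_search`) and the toy vectors are assumptions made up for illustration. The sketch compares the query with every stored vector, which is the cost that Faiss's indexing and compression techniques aim to reduce on large datasets.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(vectors, query, k=2):
    """Return (index, distance) pairs for the k stored vectors closest to query."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings" (hypothetical data).
vectors = [
    [0.0, 0.0],
    [1.0, 1.0],
    [5.0, 5.0],
    [0.9, 1.2],
]
query = [1.0, 1.0]

nearest = brute_force_search(vectors, query, k=2)
print([index for index, _ in nearest])  # indices of the two closest vectors
```

Every query here scans all stored vectors, so the cost grows linearly with the dataset. That is fine for a few thousand vectors but becomes the bottleneck at the million-to-billion scale mentioned above.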

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [7]:
loader=TextLoader("speech.txt")
documents=loader.load()
text_splitter=CharacterTextSplitter(chunk_size=200,chunk_overlap=30)
docs=text_splitter.split_documents(documents)

In [8]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='My fellow dreamers and doers,'),
 Document(metadata={'source': 'speech.txt'}, page_content='Today marks not just another day, but a new beginning. Each of us carries within ourselves the power to create change, to inspire others, and to make a difference in this world.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Remember that success is not measured by the heights we reach, but by the obstacles we overcome. Every setback is a setup for a comeback.'),
 Document(metadata={'source': 'speech.txt'}, page_content="In this rapidly changing world, it's not the strongest or the smartest who survive, but those most adaptable to change. Embrace uncertainty as an opportunity for growth."),
 Document(metadata={'source': 'speech.txt'}, page_content='Your potential is limited only by your imagination and your willingness to work towards your goals. The future belongs to those who believe in the beauty of their dreams.'),
 Do

In [12]:
embeddings=OllamaEmbeddings(model="gemma:2b")
db=FAISS.from_documents(docs,embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x1d3ab2f6860>

**Query**

In [14]:
query="What is the success without compassion?" #query based on the speech.txt docs

res=db.similarity_search(query)

res[0].page_content


'Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'

---
**As a Retriever** \
We can also wrap the vector store in a retriever. This makes it easy to use with other LangChain components, most of which work with retrievers rather than with a specific vector store.
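The idea behind a retriever can be sketched without LangChain: it is a thin wrapper that exposes a single `invoke(query)` method and hides which store answers the query. The classes below (`ToyVectorStore`, `ToyRetriever`) are hypothetical stand-ins, and the word-overlap ranking is only a placeholder for real embedding similarity. In LangChain itself, `as_retriever` accepts options such as `search_kwargs={"k": 2}` rather than the bare `k` argument assumed here.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by word overlap."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,  # most overlapping words first
        )
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes one invoke(query) method, hiding store-specific details."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "Be kind to others along the way",
    "Take action today",
    "The time is now",
])
retriever = store.as_retriever(k=1)
result = retriever.invoke("be kind today")
print(result)
```

Code that only calls `invoke` keeps working if the underlying store is swapped for a different one, which is the reason most LangChain components take a retriever.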

In [16]:
retriever=db.as_retriever()
res=retriever.invoke(query)
res[0].page_content

'Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'

**Similarity Search with score** \
The FAISS vector store also has some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns each document together with its distance from the query. \
The score is an L2 distance, so a lower score means a closer match.
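Because lower means closer, the best hit is the pair with the minimum score, and a distance cutoff can filter out weak matches. The sketch below uses plain `(text, score)` tuples as stand-ins for the `(Document, score)` pairs the method returns. The scores are copied from this notebook's own run and the texts are abbreviated; the `max_distance` cutoff is a hypothetical value chosen only for illustration.

```python
# (text, score) pairs standing in for (Document, score) results.
docs_and_scores = [
    ("Take action today. ...", 1870.0876),
    ("Together, we can turn our dreams into reality. ...", 2389.4268),
    ("Your potential is limited only by your imagination ...", 2402.359),
]

# Lower distance means a closer match, so the best hit has the minimum score.
best_text, best_score = min(docs_and_scores, key=lambda pair: pair[1])

# Hypothetical cutoff: keep only hits whose distance is below the threshold.
max_distance = 2400.0
close_hits = [(text, score) for text, score in docs_and_scores if score < max_distance]

print(best_text, best_score)
print(len(close_hits))
```

A sensible cutoff depends on the embedding model and on whether its vectors are normalized, so it usually has to be tuned on real queries.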

In [17]:
docs_and_scores=db.similarity_search_with_score(query)
docs_and_scores

[(Document(id='31ffe5ef-4fa8-4253-88ec-3b563911e16a', metadata={'source': 'speech.txt'}, page_content='Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'),
  1870.0876),
 (Document(id='6315a1ac-8d05-4e62-9823-e78efc3b8c3f', metadata={'source': 'speech.txt'}, page_content='Together, we can turn our dreams into reality. The time is now. Thank you.'),
  2389.4268),
 (Document(id='6e6c53b4-871b-40ff-9943-a90aadd4d1cb', metadata={'source': 'speech.txt'}, page_content='Your potential is limited only by your imagination and your willingness to work towards your goals. The future belongs to those who believe in the beauty of their dreams.'),
  2402.359),
 (Document(id='fbe2cb3b-60e9-4c76-810d-3ca3c672cd23', metadata={'source': 'speech.txt'}, page_content="Don't be afraid to fail. In fact, fail forward. Learn from every mistake and let it fuel your determination to succeed.

In [18]:
docs_and_scores[0]

(Document(id='31ffe5ef-4fa8-4253-88ec-3b563911e16a', metadata={'source': 'speech.txt'}, page_content='Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'),
 1870.0876)

In [19]:
embedding_vector=embeddings.embed_query(query)
embedding_vector

[-0.23178254067897797,
 -2.6375861167907715,
 0.6160933375358582,
 1.0583949089050293,
 0.8558192253112793,
 0.8482413291931152,
 1.194616436958313,
 0.08611047267913818,
 0.5526854395866394,
 0.531623363494873,
 1.8470509052276611,
 2.573713541030884,
 0.3344305753707886,
 2.538249969482422,
 -0.9150782227516174,
 0.4860154986381531,
 3.7445554733276367,
 -0.6673808693885803,
 0.9533656239509583,
 0.29377421736717224,
 0.4033169150352478,
 -0.04038746654987335,
 0.14983925223350525,
 -0.3993648588657379,
 -1.988440990447998,
 0.9158721566200256,
 0.6829423904418945,
 -0.42072904109954834,
 0.14759418368339539,
 -0.520768404006958,
 0.17809851467609406,
 -0.7001262307167053,
 -1.4684470891952515,
 -1.3995611667633057,
 -0.3809143006801605,
 1.3422046899795532,
 0.7802356481552124,
 2.1357758045196533,
 0.7381717562675476,
 -2.1459922790527344,
 -0.526160717010498,
 1.245444893836975,
 0.0446544848382473,
 0.4708486497402191,
 -0.025200923904776573,
 0.7368001341819763,
 0.1911456882953

In [20]:
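# similarity_search_by_vector takes a precomputed embedding instead of a query string; it returns documents only, without scores
docs_by_vector=db.similarity_search_by_vector(embedding_vector)
docs_by_vector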

[Document(id='31ffe5ef-4fa8-4253-88ec-3b563911e16a', metadata={'source': 'speech.txt'}, page_content='Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'),
 Document(id='6315a1ac-8d05-4e62-9823-e78efc3b8c3f', metadata={'source': 'speech.txt'}, page_content='Together, we can turn our dreams into reality. The time is now. Thank you.'),
 Document(id='6e6c53b4-871b-40ff-9943-a90aadd4d1cb', metadata={'source': 'speech.txt'}, page_content='Your potential is limited only by your imagination and your willingness to work towards your goals. The future belongs to those who believe in the beauty of their dreams.'),
 Document(id='fbe2cb3b-60e9-4c76-810d-3ca3c672cd23', metadata={'source': 'speech.txt'}, page_content="Don't be afraid to fail. In fact, fail forward. Learn from every mistake and let it fuel your determination to succeed.")]

**Saving and Loading Vector Store**

In [21]:
db.save_local('faiss_index')

In [25]:
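# Load the saved index from disk. allow_dangerous_deserialization=True is needed because
# the stored documents are pickled; only enable it for index files you created or trust.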

db_new=FAISS.load_local('faiss_index',embeddings,allow_dangerous_deserialization=True)
res=db_new.similarity_search(query)
res[0].page_content

'Take action today. Small steps forward are better than grand plans that never begin.\n\nBe kind to others along the way, for success without compassion is an empty victory.'