#### Faiss

Facebook AI Similarity seach (faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader= TextLoader("speech.txt")
documents=loader.load()
text_splitter=CharacterTextSplitter(chunk_size=900,chunk_overlap=50)
docs=text_splitter.split_documents(documents)

In [4]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Guided by a clear set of values and built on a strong foundation of collective good, Corporate Social Responsibility is an integral part of the Group’s business objectives. The Group works on projects across the country to achieve this objective with a focus on critical areas such as Water, Livelihood and Education. These init

In [5]:
embeddings=OllamaEmbeddings(model="gemma:2b")
db=FAISS.from_documents(docs,embeddings)
db

  embeddings=OllamaEmbeddings(model="gemma:2b")


<langchain_community.vectorstores.faiss.FAISS at 0x26fae0c9be0>

In [7]:
# querying
query="when was DS established?"

dos=db.similarity_search(query)
docs  #this is entire text which is matching
# docs[0].page_content # most similar ans

[Document(metadata={'source': 'speech.txt'}, page_content='Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Guided by a clear set of values and built on a strong foundation of collective good, Corporate Social Responsibility is an integral part of the Group’s business objectives. The Group works on projects across the country to achieve this objective with a focus on critical areas such as Water, Livelihood and Education. These init

#### As a Retriever

We can also convert the vectorstore into a Retriever class. This allows us to easily use it in other Langchain methods, which largely work with retrievers

In [11]:
retriever=db.as_retriever()
docs=retriever.invoke(query)
docs[0].page_content

'Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'

#### Similarity Search with score

There are some FAISS specific methods. One og them is similarity_search_with_score, which allows you to return not only the documents but also the distance score of the query to them. The returned distance score is L2 distance. Therefore, a lower score is better.

In [12]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

[(Document(id='26409fc8-31e7-4bd2-91e2-5dd2a39ba768', metadata={'source': 'speech.txt'}, page_content='Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'),
  np.float32(2455.5225)),
 (Document(id='e64eed9b-067b-477b-b9b9-f1557dc3c9f2', metadata={'source': 'speech.txt'}, page_content='Guided by a clear set of values and built on a strong foundation of collective good, Corporate Social Responsibility is an integral part of the Group’s business objectives. The Group works on projects across the co

In [13]:
embedding_vector=embeddings.embed_query(query)
embedding_vector

[-0.559970498085022,
 -0.8984678983688354,
 0.572769284248352,
 2.104444742202759,
 1.2855085134506226,
 0.9743225574493408,
 1.7525044679641724,
 -0.38211220502853394,
 1.8528276681900024,
 0.8869463205337524,
 0.985924482345581,
 0.3194546401500702,
 -1.2061659097671509,
 1.7872241735458374,
 -0.04484488442540169,
 -1.1395517587661743,
 6.6700286865234375,
 -0.02262246608734131,
 -1.1473398208618164,
 -0.5306224226951599,
 0.7917165160179138,
 -1.2874032258987427,
 0.1970522403717041,
 -0.33299514651298523,
 -3.4729883670806885,
 1.0169637203216553,
 0.026469940319657326,
 -0.10522358864545822,
 -0.3304177224636078,
 -2.8201184272766113,
 1.8805166482925415,
 -0.9607244729995728,
 0.7333166003227234,
 0.7166591286659241,
 0.32744133472442627,
 0.2726287841796875,
 2.06732177734375,
 -0.7054339647293091,
 0.3202381432056427,
 0.024577122181653976,
 0.12448734790086746,
 0.928091824054718,
 0.15450213849544525,
 -0.5705435872077942,
 -1.007055401802063,
 -0.15383630990982056,
 1.085118

In [14]:
docs_score=db.similarity_search_by_vector(embedding_vector)
docs_score

[Document(id='26409fc8-31e7-4bd2-91e2-5dd2a39ba768', metadata={'source': 'speech.txt'}, page_content='Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'),
 Document(id='e64eed9b-067b-477b-b9b9-f1557dc3c9f2', metadata={'source': 'speech.txt'}, page_content='Guided by a clear set of values and built on a strong foundation of collective good, Corporate Social Responsibility is an integral part of the Group’s business objectives. The Group works on projects across the country to achieve this object

#### Saving and loading


In [15]:
db.save_local("faiss_index")

In [19]:
new_db=FAISS.load_local("faiss_index",embeddings,allow_dangerous_deserialization=True)
docs=new_db.similarity_search(query)
docs

[Document(id='26409fc8-31e7-4bd2-91e2-5dd2a39ba768', metadata={'source': 'speech.txt'}, page_content='Since\n1929\nThe DS Group (Dharampal Satyapal Group) is a Multi-Business Corporation and one of the leading FMCG (Fast Moving Consumer Goods) conglomerate with strong Indian and International presence. Founded in the year 1929, it is an inspiring and successful business story that blends a remarkable history and legacy with visionary growth. The Group continues to build successful brands across diverse business categories to delight the consumers, while the undeterred pursuit for ‘Quality & Innovation’ ensures consumer loyalty and satisfaction.'),
 Document(id='e64eed9b-067b-477b-b9b9-f1557dc3c9f2', metadata={'source': 'speech.txt'}, page_content='Guided by a clear set of values and built on a strong foundation of collective good, Corporate Social Responsibility is an integral part of the Group’s business objectives. The Group works on projects across the country to achieve this object