## Faiss

- Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [17]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS 
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter


# Raw string so the backslashes in the Windows path are not read as escape sequences
loader = TextLoader(r"F:\Complete Generative AI\LANGCHAIN\speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
docs = text_splitter.split_documents(documents)

Created a chunk of size 125, which is longer than the specified 100
Created a chunk of size 126, which is longer than the specified 100
Created a chunk of size 151, which is longer than the specified 100
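
The warnings above appear because `CharacterTextSplitter` splits only on its separator (a blank line, `"\n\n"`, by default) and does not break a piece any further if that piece alone is longer than `chunk_size`. The sketch below imitates this behaviour in plain Python. It is a simplified illustration, not LangChain's implementation: `split_on_separator` is a hypothetical helper, and overlap handling is left out.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Toy splitter: merge separator-delimited pieces up to chunk_size.

    A piece that is longer than chunk_size on its own is kept whole and
    reported, because it contains no separator to split on.
    """
    chunks, oversized = [], []
    current = ""
    for piece in text.split(separator):
        if len(piece) > chunk_size:
            oversized.append(len(piece))
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks, oversized


# Hypothetical input: one 120-character paragraph between two short ones.
text = "short one\n\n" + "x" * 120 + "\n\nshort two"
chunks, oversized = split_on_separator(text, chunk_size=100)
print(len(chunks), oversized)
```

If the oversized chunks are a problem, one option is a splitter that falls back to smaller separators, such as `RecursiveCharacterTextSplitter` from the same `langchain_text_splitters` package.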


In [18]:
docs

[Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Hello , My name is  Ashis Kumar Mishra .\n\nI am born and brought up in Odisha .'),
 Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Currently I am in my pre final year of my Btech from NIT Rourkela .'),
 Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='My habit includes coding , playing football and gaming .'),
 Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Flight Pattern is a one-act contemporary ballet performed in 30 minutes.'),
 Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='[1] The music inspired the structure of the choreography, with a long and slow crescendo that transitions to a single voice.'),
 Document(metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_co

In [19]:
embeddings = OllamaEmbeddings(model = "gemma:2b" )
db = FAISS.from_documents(docs , embeddings)
db 

<langchain_community.vectorstores.faiss.FAISS at 0x1884e5bb970>

In [20]:
### Take the db and start querying

query = "What is the name of the user ?"
docs = db.similarity_search(query)

In [21]:
docs

[Document(id='181ab7a1-ce52-461a-b870-95a99fc626fe', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='My habit includes coding , playing football and gaming .'),
 Document(id='628f4349-2eda-4fdc-9f1c-56f55fbefce2', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='[1] The music inspired the structure of the choreography, with a long and slow crescendo that transitions to a single voice.'),
 Document(id='970e71b2-1483-4f62-831c-3b9180cd933f', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Pite felt that an emotional connection with a single story would be more impactful to the audience than many dancers on stage.'),
 Document(id='892a9303-31cd-45b9-9175-2fc2d4d31045', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='The piece begins with 36 dancers arranged in three equal rows, standing in profile to the audience and staring at a li

In [25]:
# showing the first response
docs[0].page_content

'My habit includes coding , playing football and gaming .'

##### Another way to query: use a retriever

- We can also convert the vector store into a retriever. This lets us use it easily in other LangChain components, which largely work with retrievers.

- The retriever acts as an interface to the db: once the db has been converted with `as_retriever()`, any query passed to the retriever is answered from the db.
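
To make the interface idea concrete, here is a toy sketch in plain Python. `ToyStore` and `ToyRetriever` are hypothetical classes, not LangChain's. The point is only that a retriever exposes a single `invoke(query)` entry point and hides which search method the store uses underneath. Word-overlap scoring stands in for real embedding similarity.

```python
class ToyStore:
    """Hypothetical in-memory store that ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        # Score each text by how many query words it contains.
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self.texts
        ]
        # Highest overlap first; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Thin wrapper: callers only need to know about invoke()."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyStore([
    "my name is Ashis",
    "I like football",
    "the ballet lasts 30 minutes",
])
retriever = store.as_retriever(k=1)
results = retriever.invoke("what is my name")
print(results)
```

In LangChain itself the number of results is passed through `search_kwargs`, for example `db.as_retriever(search_kwargs={"k": 2})`.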

In [29]:
# Convert the db to a retriever
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'My habit includes coding , playing football and gaming .'

##### Similarity Search with Score 

- Returns not only the documents but also the distance between the query and each of them.
- The returned score is an L2 (Euclidean) distance, not a Manhattan (L1) distance. With the default flat L2 index, Faiss reports the squared value.
- A lower score is better.
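
As a small worked illustration of distance scores, the sketch below computes Euclidean (L2) and Manhattan (L1) distances on tiny hand-made vectors in plain Python and ranks the documents by L2. The vectors and names are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


query_vec = [0.0, 0.0]
doc_vecs = {"near": [3.0, 4.0], "far": [6.0, 8.0]}

# Score every document against the query, then sort ascending:
# the smallest distance is the best match.
scores = {name: l2_distance(query_vec, vec) for name, vec in doc_vecs.items()}
ranked = sorted(scores, key=scores.get)

print(scores)                                    # {'near': 5.0, 'far': 10.0}
print(ranked)                                    # ['near', 'far']
print(l1_distance(query_vec, doc_vecs["near"]))  # 7.0
```

Squaring the L2 distance (25.0 and 100.0 here) changes the numbers but not the ranking, which is why an index can skip the square root.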

In [30]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='181ab7a1-ce52-461a-b870-95a99fc626fe', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='My habit includes coding , playing football and gaming .'),
  3144.0464),
 (Document(id='628f4349-2eda-4fdc-9f1c-56f55fbefce2', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='[1] The music inspired the structure of the choreography, with a long and slow crescendo that transitions to a single voice.'),
  3154.206),
 (Document(id='970e71b2-1483-4f62-831c-3b9180cd933f', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Pite felt that an emotional connection with a single story would be more impactful to the audience than many dancers on stage.'),
  3289.5798),
 (Document(id='892a9303-31cd-45b9-9175-2fc2d4d31045', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='The piece begins with 36 dancers arranged in three equal rows, standing i

##### Saving to local disk

In [32]:
db.save_local(r"F:\Complete Generative AI\LANGCHAIN\_1_Langchain\_6_Vector_store\_6_Vector_storefaiss_db")

In [None]:
# How to load

# allow_dangerous_deserialization=True is required because loading reads a pickle file.
# Only enable it for index files you created yourself or otherwise trust.
new_db = FAISS.load_local(r"F:\Complete Generative AI\LANGCHAIN\_1_Langchain\_6_Vector_store\_6_Vector_storefaiss_db", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)
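
The `allow_dangerous_deserialization` flag exists because unpickling can run code chosen by whoever created the file. The harmless sketch below shows the mechanism using only the standard library: a class's `__reduce__` method tells pickle which callable to invoke at load time. `Payload` is a hypothetical class. Here it only calls `int("42")`, but a malicious file could name any callable.

```python
import pickle


class Payload:
    def __reduce__(self):
        # On load, pickle calls int("42") instead of rebuilding a Payload.
        return (int, ("42",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)

# The loaded object is whatever the callable returned, not a Payload.
print(type(loaded).__name__, loaded)
```

This is why the flag should only be set when the saved index comes from a source you control.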

In [35]:
docs

[Document(id='181ab7a1-ce52-461a-b870-95a99fc626fe', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='My habit includes coding , playing football and gaming .'),
 Document(id='628f4349-2eda-4fdc-9f1c-56f55fbefce2', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='[1] The music inspired the structure of the choreography, with a long and slow crescendo that transitions to a single voice.'),
 Document(id='970e71b2-1483-4f62-831c-3b9180cd933f', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='Pite felt that an emotional connection with a single story would be more impactful to the audience than many dancers on stage.'),
 Document(id='892a9303-31cd-45b9-9175-2fc2d4d31045', metadata={'source': 'F:\\Complete Generative AI\\LANGCHAIN\\speech.txt'}, page_content='The piece begins with 36 dancers arranged in three equal rows, standing in profile to the audience and staring at a li

In [36]:
docs[0].page_content

'My habit includes coding , playing football and gaming .'