FAISS (Facebook AI Similarity Search)
FAISS is a library developed by Facebook AI that enables efficient similarity search and clustering of dense vectors. It is optimized for searching large datasets and can handle datasets that do not fit in RAM.

Installation
To install the required libraries, use the following commands:

In [1]:
!pip install langchain-community langchain-core langchain -q
!pip install faiss-cpu -q
!pip install langchain-huggingface -q




In [7]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings  # langchain_community's copy of this class is deprecated
from langchain_text_splitters import CharacterTextSplitter

### Loading and Splitting Documents
FAISS indexes vectors rather than raw text, so the text must first be split into chunks and embedded. Below, we load a text document (speech.txt), split it into chunks, and prepare it for vector embedding.

In [8]:

loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(documents)
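
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking in plain Python. It is a simplification, not the library's algorithm: `CharacterTextSplitter` first splits on a separator (`"\n\n"` by default) and then merges the pieces up to `chunk_size`, whereas the hypothetical `split_with_overlap` helper below just slides a fixed-width window over the text. The small sizes are chosen only so the overlap is easy to see.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 30) -> list[str]:
    """Cut text into fixed-width windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk ends with the same three characters that open the next one, which is what keeps a sentence that straddles a boundary retrievable from either side.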


In [9]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='As mental institutions and sanatoriums began to crumble away, patients living with mental illness that relied on the system for decades were forced out into the world with only two options: sink or swim.\n\nOver the last few decades, the disappearance of long-term care facilities and psychiatric beds has escalated. This trend toward the deinstitutionalization of psychiatric patients began in the early 20th century and is the process of replacing long-stay psychiatric hospitals with less isolated community mental health services. As facilities closed one by one, patients were emptied out and left to their own devices.'),
 Document(metadata={'source': 'speech.txt'}, page_content="RELATED\nAbandoned Hospitals: Uncovering the Forgotten Past\nMost abandoned hospitals we see today were mental or ps . . .\nBorderline Personality Disorder\nThis course discusses best practices for assessing, dia . . .\nCare of the Patient with Intellect

### Creating FAISS Index
We embed the documents using HuggingFaceEmbeddings (the sentence-transformers/all-MiniLM-L6-v2 model) and store them in a FAISS vector database.

In [10]:
# Load the Hugging Face embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Build the FAISS database and store the documents in it
db = FAISS.from_documents(docs, embeddings)


### Querying the FAISS Index
To search for similar documents based on a query:

In [11]:
# Query the index
query = "How does the speaker describe the desired outcome of the war?"
docs = db.similarity_search(query)
docs[0].page_content


'After the deinstitutionalization process took effect, these changes made it harder for people living with SMI to find appropriate care and shelter, resulting in many becoming homeless or ending up committing crimes and ultimately facing incarceration.\n\nmental_illness_homelessPhoto by: Alex Proimos\n \n\nPrior to the 1950’s, it wasn’t uncommon for state hospitals to provide patients with work. Many state hospitals provided therapy, medication, medical treatment, work and vocational training. Often times, hospitals fulfilled their own needs by having patients help with maintenance or discovering a trade by working on a farm.\n\nAs patients began getting discharged out into the community, a serious downfall to the mental health system was when patients began “falling through the cracks” and became homeless or re-hospitalized, which has made the emergency room (ER) at many hospitals a problem.'

### As a Retriever
We can also convert the vector store into a retriever. This makes it easy to use with other LangChain components, most of which work with retrievers.
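
The idea behind a retriever is a uniform interface: a query string goes in, a list of documents comes out, and the caller does not need to know what search runs underneath. The sketch below illustrates that shape in plain Python. `ToyRetriever` and `keyword_search` are hypothetical names invented for this example and are not LangChain classes; the keyword overlap scoring stands in for a real vector search.

```python
from dataclasses import dataclass
from typing import Callable

CORPUS = [
    "state hospitals provided therapy",
    "patients found care in the community",
    "the index is saved to disk",
]


def keyword_search(query: str, k: int) -> list[str]:
    """Rank corpus entries by how many query words they contain (toy scoring)."""
    words = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda text: -len(words & set(text.lower().split())))
    return ranked[:k]


@dataclass
class ToyRetriever:
    """Wraps any search function behind a single invoke(query) method."""
    search: Callable[[str, int], list[str]]
    k: int = 4

    def invoke(self, query: str) -> list[str]:
        return self.search(query, self.k)


retriever_sketch = ToyRetriever(search=keyword_search, k=2)
results = retriever_sketch.invoke("community care")
print(results)
```

Because callers only depend on `invoke`, the search function can be swapped without touching the code that consumes the results.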

In [12]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'After the deinstitutionalization process took effect, these changes made it harder for people living with SMI to find appropriate care and shelter, resulting in many becoming homeless or ending up committing crimes and ultimately facing incarceration.\n\nmental_illness_homelessPhoto by: Alex Proimos\n \n\nPrior to the 1950’s, it wasn’t uncommon for state hospitals to provide patients with work. Many state hospitals provided therapy, medication, medical treatment, work and vocational training. Often times, hospitals fulfilled their own needs by having patients help with maintenance or discovering a trade by working on a farm.\n\nAs patients began getting discharged out into the community, a serious downfall to the mental health system was when patients began “falling through the cracks” and became homeless or re-hospitalized, which has made the emergency room (ER) at many hospitals a problem.'

#### Similarity Search with score
Some methods are specific to FAISS. One of them is similarity_search_with_score, which returns each document together with its distance to the query. The score is an L2 distance, so a lower score means a closer match.
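
As a rough illustration of why lower is better, the following sketch computes L2 distances by hand for a few made-up two-dimensional vectors and sorts by them. The vectors and names (`query_vec`, `doc_vecs`) are invented for the example; real embeddings from all-MiniLM-L6-v2 have many more dimensions. As far as we know, FAISS's flat L2 index reports the squared distance, which changes the numbers but not the ordering.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [1.0, 0.0]
doc_vecs = {
    "doc_a": [1.0, 0.1],  # almost identical to the query
    "doc_b": [0.0, 1.0],  # points in a different direction
    "doc_c": [0.8, 0.0],  # same direction, slightly shorter
}

# Pair every document with its distance, then sort ascending: closest first.
scored = sorted(
    ((name, l2_distance(query_vec, vec)) for name, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)
for name, score in scored:
    print(name, round(score, 3))
```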

In [13]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='c84710e6-710b-455a-ae19-c18951d9a69f', metadata={'source': 'speech.txt'}, page_content='After the deinstitutionalization process took effect, these changes made it harder for people living with SMI to find appropriate care and shelter, resulting in many becoming homeless or ending up committing crimes and ultimately facing incarceration.\n\nmental_illness_homelessPhoto by: Alex Proimos\n \n\nPrior to the 1950’s, it wasn’t uncommon for state hospitals to provide patients with work. Many state hospitals provided therapy, medication, medical treatment, work and vocational training. Often times, hospitals fulfilled their own needs by having patients help with maintenance or discovering a trade by working on a farm.\n\nAs patients began getting discharged out into the community, a serious downfall to the mental health system was when patients began “falling through the cracks” and became homeless or re-hospitalized, which has made the emergency room (ER) at many hospitals a p

### Query Embedding
To generate an embedding vector for a query and perform a similarity search:

In [14]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.049689602106809616,
 0.10655161738395691,
 0.0015431138454005122,
 -0.013194361701607704,
 -0.04438783973455429,
 0.058573197573423386,
 0.04666464403271675,
 -0.02453065849840641,
 0.004829442594200373,
 0.0203399695456028,
 -0.03410932049155235,
 0.038737885653972626,
 0.016618888825178146,
 -0.01883487030863762,
 -0.007143064402043819,
 0.006980112753808498,
 0.02085004933178425,
 -0.03520537540316582,
 0.004866411909461021,
 -0.007269794121384621,
 0.07890867441892624,
 0.05607708543539047,
 0.06984829902648926,
 0.01655665412545204,
 0.01339455135166645,
 -0.012157353572547436,
 -0.02082832157611847,
 0.052774861454963684,
 -0.02038348838686943,
 0.026147618889808655,
 0.045908551663160324,
 -0.017534712329506874,
 0.016683530062437057,
 0.0021405566949397326,
 0.006522982381284237,
 0.0013917487813159823,
 -0.05417175590991974,
 -0.016581455245614052,
 -0.011631522327661514,
 -0.020481323823332787,
 -0.0529186986386776,
 -0.039483968168497086,
 0.01654800772666931,
 0.0405348

In [None]:
docs_score = db.similarity_search_by_vector(embedding_vector)
docs_score

[Document(id='b1b7e688-310b-4b2e-923f-be90d25e0193', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(id='b74206ee-d469-4c2f-9e2e-4a91133e4854', metadata={'source': 'speech.txt'}, page_content='…\n

### Saving and Loading FAISS Index
To save the FAISS index locally:

In [16]:
# Save the index to a local folder
db.save_local("faiss_index")

To reload the FAISS index later:

In [17]:
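# The saved docstore is a pickle file, so loading must be explicitly allowed.
# Only load indexes that you created yourself or otherwise trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)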
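
The allow_dangerous_deserialization flag exists because, as far as we know, the saved folder contains a pickled docstore (index.pkl) next to the vector index (index.faiss), and unpickling a file can execute arbitrary code. The sketch below shows only the pickle round trip, using a temporary folder and a made-up `docstore` dictionary as a stand-in; it does not reproduce the library's actual file contents.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the docstore that maps ids to chunk text.
docstore = {"doc-1": "first chunk", "doc-2": "second chunk"}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "index.pkl"
    path.write_bytes(pickle.dumps(docstore))

    # pickle.loads trusts the bytes it is given, which is why files from an
    # unknown source should never be loaded this way.
    restored = pickle.loads(path.read_bytes())

print(restored == docstore)
```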

In [18]:
docs

[Document(id='c84710e6-710b-455a-ae19-c18951d9a69f', metadata={'source': 'speech.txt'}, page_content='After the deinstitutionalization process took effect, these changes made it harder for people living with SMI to find appropriate care and shelter, resulting in many becoming homeless or ending up committing crimes and ultimately facing incarceration.\n\nmental_illness_homelessPhoto by: Alex Proimos\n \n\nPrior to the 1950’s, it wasn’t uncommon for state hospitals to provide patients with work. Many state hospitals provided therapy, medication, medical treatment, work and vocational training. Often times, hospitals fulfilled their own needs by having patients help with maintenance or discovering a trade by working on a farm.\n\nAs patients began getting discharged out into the community, a serious downfall to the mental health system was when patients began “falling through the cracks” and became homeless or re-hospitalized, which has made the emergency room (ER) at many hospitals a pr
