### FAISS

###### Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. Its algorithms can search vector sets of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the speech as a single Document, then split it into chunks.
loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)
docs = text_splitter.split_documents(documents)

Created a chunk of size 354, which is longer than the specified 200
Created a chunk of size 846, which is longer than the specified 200
Created a chunk of size 769, which is longer than the specified 200
Created a chunk of size 987, which is longer than the specified 200
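
The warnings above appear because `CharacterTextSplitter` splits on a single separator (by default a blank line, `"\n\n"`) and does not break a piece any further when it is longer than `chunk_size`. The following is a simplified pure-Python stand-in for that behavior, not LangChain's implementation: `split_on_separator` and the sample text are hypothetical, and the sketch leaves out the merging of small pieces and the overlap handling.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split only at the separator and report pieces longer than chunk_size."""
    pieces = [p for p in text.split(separator) if p]
    oversized = [len(p) for p in pieces if len(p) > chunk_size]
    return pieces, oversized


# Hypothetical sample text: one short paragraph and one long paragraph.
sample = "Short intro." + "\n\n" + "x" * 350
pieces, oversized = split_on_separator(sample, chunk_size=200)

for size in oversized:
    # The long paragraph has no separator inside it, so it stays whole.
    print(f"Piece of size {size} exceeds chunk_size=200")
```

If chunks close to `chunk_size` are needed, `RecursiveCharacterTextSplitter` from `langchain_text_splitters` is one option, since it falls back through several separators instead of relying on one.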


In [6]:
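embeddings = OllamaEmbeddings(model="llama3")
# Note: this indexes the unsplit `documents`, so searches return the whole speech.
# Pass `docs` instead to index the chunks produced by the splitter.
db = FAISS.from_documents(documents, embeddings)
db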

<langchain_community.vectorstores.faiss.FAISS at 0x11d5f1d50>

In [7]:
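query = "What are the two aspects of the obesity spread?"
query_result = db.similarity_search(query)
# Note: `docs` still holds the split chunks, so this prints the first chunk;
# use query_result[0].page_content to inspect the top search hit.
docs[0].page_content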

'Ladies and Gentlemen,\nIt is a pleasure for me to be here tonight and address such a great audience. The issue I would like to bring up threatens the prosperity and welfare of the whole nation, however, the majority of the population tends to ignore it and pretend as if it is not a problem at all. Namely, I would like to talk about the risks of obesity.'

### As a Retriever
###### We can also convert the vector store into a Retriever class. This allows us to use it easily in other LangChain methods, which largely work with retrievers.

In [9]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'Ladies and Gentlemen,\nIt is a pleasure for me to be here tonight and address such a great audience. The issue I would like to bring up threatens the prosperity and welfare of the whole nation, however, the majority of the population tends to ignore it and pretend as if it is not a problem at all. Namely, I would like to talk about the risks of obesity.\n\nFirst of all, it would be reasonable to present the statistics that some of you might find shocking. To be more precise, in accordance with the data provided by the Office of Disease Prevention and Health Promotion, the number of people suffering from obesity has already reached the point of 35% of the whole US population. Just imagine, one-third of Americans is particularly limited in their opportunities to have a happy, comprehensive, and productive life. Let me remind you, that the right to health was considered by the United Nations to be an integral element of the human rights and the individual dignity. I believe, that it is a
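
Conceptually, a retriever is a thin wrapper that exposes a single `invoke(query)` method over the store's search. The sketch below illustrates that idea with hypothetical `ToyStore` and `ToyRetriever` classes that rank texts by shared words; it is not how LangChain implements retrievers, and no embeddings are involved.

```python
class ToyStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        words = set(query.lower().split())
        # Toy relevance: more shared words means a better match.
        ranked = sorted(
            self.texts,
            key=lambda t: -len(words & set(t.lower().split())),
        )
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Wraps a store behind a single invoke(query) method."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


toy_store = ToyStore([
    "obesity risks in the nation",
    "weather report for today",
    "health and obesity statistics",
])
toy_retriever = toy_store.as_retriever(k=2)
toy_results = toy_retriever.invoke("obesity statistics")
```

With the real vector store, the number of returned documents can be limited in a similar way with `db.as_retriever(search_kwargs={"k": 2})`.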

### Similarity Search with Score

###### There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance score between the query and each document. The returned score is an L2 distance, so a lower score means a closer match.
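
To make "lower is better" concrete, here is a minimal sketch that computes L2 distances by hand on hypothetical two-dimensional vectors and sorts them in ascending order. The vectors and names are made up for illustration. Depending on the index type, FAISS may report the squared distance rather than its square root; the ordering is the same either way.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy vectors standing in for document embeddings.
query_vec = [1.0, 0.0]
candidates = {"close": [1.0, 1.0], "far": [4.0, 4.0], "exact": [1.0, 0.0]}

# Sort ascending: the smallest distance is the best match.
scored = sorted(
    ((name, l2_distance(query_vec, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
)

for name, score in scored:
    print(name, score)
```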

In [10]:
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores

[(Document(id='f42c2534-38fc-4938-a1cf-dc0d7425bd2a', metadata={'source': 'speech.txt'}, page_content='Ladies and Gentlemen,\nIt is a pleasure for me to be here tonight and address such a great audience. The issue I would like to bring up threatens the prosperity and welfare of the whole nation, however, the majority of the population tends to ignore it and pretend as if it is not a problem at all. Namely, I would like to talk about the risks of obesity.\n\nFirst of all, it would be reasonable to present the statistics that some of you might find shocking. To be more precise, in accordance with the data provided by the Office of Disease Prevention and Health Promotion, the number of people suffering from obesity has already reached the point of 35% of the whole US population. Just imagine, one-third of Americans is particularly limited in their opportunities to have a happy, comprehensive, and productive life. Let me remind you, that the right to health was considered by the United Nat

In [11]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.012635677,
 -0.028165447,
 0.015488038,
 -0.0011604525,
 -0.005853158,
 -0.0037467843,
 0.016192531,
 0.006406507,
 -0.014164815,
 0.016458858,
 -0.003611605,
 0.008211664,
 0.0053469795,
 0.009958125,
 -0.006991574,
 0.005824633,
 -0.019397713,
 -0.003137371,
 -0.008254454,
 0.009901515,
 -0.0116163185,
 -0.017993677,
 0.009110108,
 -0.0017268407,
 -0.0029502735,
 -0.0011599376,
 0.0012439205,
 -0.018467141,
 0.0009940858,
 -0.009771563,
 -0.013158355,
 0.005537999,
 0.024733156,
 0.014035284,
 0.04947813,
 -0.0058453684,
 -0.021294897,
 0.0068944623,
 0.021976601,
 -0.005581493,
 0.019512232,
 0.0014675823,
 -0.013083652,
 -0.006524865,
 0.007007521,
 0.00095341075,
 0.009859578,
 0.022475502,
 0.007522487,
 -0.028692748,
 -0.0018522542,
 0.00088071823,
 0.0027736041,
 -0.010311599,
 -0.007939598,
 -0.006895187,
 -0.010744506,
 0.00514411,
 0.014864763,
 0.004213616,
 -0.023703286,
 -0.005336103,
 -0.009621102,
 0.01211455,
 0.016109375,
 0.012034841,
 -0.006563993,
 -0.012016785

In [12]:
query_result_vector = db.similarity_search_by_vector(embedding_vector)
query_result_vector

[Document(id='f42c2534-38fc-4938-a1cf-dc0d7425bd2a', metadata={'source': 'speech.txt'}, page_content='Ladies and Gentlemen,\nIt is a pleasure for me to be here tonight and address such a great audience. The issue I would like to bring up threatens the prosperity and welfare of the whole nation, however, the majority of the population tends to ignore it and pretend as if it is not a problem at all. Namely, I would like to talk about the risks of obesity.\n\nFirst of all, it would be reasonable to present the statistics that some of you might find shocking. To be more precise, in accordance with the data provided by the Office of Disease Prevention and Health Promotion, the number of people suffering from obesity has already reached the point of 35% of the whole US population. Just imagine, one-third of Americans is particularly limited in their opportunities to have a happy, comprehensive, and productive life. Let me remind you, that the right to health was considered by the United Nati
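
Searching by text and searching by vector return the same documents because a text search amounts to embedding the query and then searching by that vector. The sketch below shows this equivalence with a hypothetical `ToyIndex` and a made-up two-dimensional `toy_embed` function; neither is part of LangChain or FAISS.

```python
def toy_embed(text):
    """Hypothetical 2-d 'embedding': (word count, character count)."""
    return [float(len(text.split())), float(len(text))]


class ToyIndex:
    """Hypothetical in-memory index over toy embeddings."""

    def __init__(self, texts):
        self.texts = texts
        self.vectors = [toy_embed(t) for t in texts]

    def search_by_vector(self, vector, k=1):
        def sq_dist(v):
            return sum((a - b) ** 2 for a, b in zip(vector, v))

        order = sorted(range(len(self.texts)), key=lambda i: sq_dist(self.vectors[i]))
        return [self.texts[i] for i in order[:k]]

    def search(self, query, k=1):
        # Text search: embed the query, then search by that vector.
        return self.search_by_vector(toy_embed(query), k)


toy_index = ToyIndex(["a b c", "hello world", "one two three four five"])
by_text = toy_index.search("hi there")
by_vector = toy_index.search_by_vector(toy_embed("hi there"))
```

Searching by vector is mainly useful when the query embedding has already been computed or cached, since it avoids a second call to the embedding model.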

### Saving and Loading

In [13]:
db.save_local("faiss_index")

In [17]:
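# allow_dangerous_deserialization=True is needed because the saved index includes a pickle file;
# only enable it for index files you created or trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)
docs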

[Document(id='f42c2534-38fc-4938-a1cf-dc0d7425bd2a', metadata={'source': 'speech.txt'}, page_content='Ladies and Gentlemen,\nIt is a pleasure for me to be here tonight and address such a great audience. The issue I would like to bring up threatens the prosperity and welfare of the whole nation, however, the majority of the population tends to ignore it and pretend as if it is not a problem at all. Namely, I would like to talk about the risks of obesity.\n\nFirst of all, it would be reasonable to present the statistics that some of you might find shocking. To be more precise, in accordance with the data provided by the Office of Disease Prevention and Health Promotion, the number of people suffering from obesity has already reached the point of 35% of the whole US population. Just imagine, one-third of Americans is particularly limited in their opportunities to have a happy, comprehensive, and productive life. Let me remind you, that the right to health was considered by the United Nati
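
The save and load round trip can be pictured as writing the texts and their vectors to a folder and reading them back unchanged. The sketch below is an illustration only: `save_index`, `load_index`, and the JSON file layout are hypothetical and are not the format that FAISS or LangChain use on disk.

```python
import json
import os
import tempfile


def save_index(folder, texts, vectors):
    """Write texts and vectors to <folder>/index.json (hypothetical layout)."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "index.json"), "w", encoding="utf-8") as f:
        json.dump({"texts": texts, "vectors": vectors}, f)


def load_index(folder):
    """Read back what save_index wrote."""
    with open(os.path.join(folder, "index.json"), encoding="utf-8") as f:
        data = json.load(f)
    return data["texts"], data["vectors"]


toy_texts = ["first chunk", "second chunk"]
toy_vectors = [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "toy_index")
    save_index(folder, toy_texts, toy_vectors)
    loaded_texts, loaded_vectors = load_index(folder)
```

As in the notebook, the loaded index still needs the same embedding model to embed new queries, which is why `load_local` takes the `embeddings` object.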