# Vector Stores

## FAISS

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_ollama import OllamaEmbeddings

# Loading the document
docs = TextLoader('speech.txt').load()

# Splitting the document
splitted_docs = CharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

Created a chunk of size 668, which is longer than the specified 500
Created a chunk of size 982, which is longer than the specified 500
Created a chunk of size 789, which is longer than the specified 500
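
The warnings above are expected. `CharacterTextSplitter` splits on a single separator (`"\n\n"` by default) and then merges the pieces, so a paragraph that is longer than `chunk_size` on its own is emitted whole. The block below is a simplified sketch of that behavior, not the library's implementation: `split_on_separator` is a hypothetical helper, and it ignores `chunk_overlap`.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    Simplified illustration only: chunk_overlap is not modelled.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece is never cut, so it may exceed chunk_size.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Three "paragraphs" of 30, 120 and 40 characters, with a limit of 100.
text = "a" * 30 + "\n\n" + "b" * 120 + "\n\n" + "c" * 40
chunks = split_on_separator(text, chunk_size=100)
print([len(c) for c in chunks])
```

The 120-character paragraph comes out as one oversized chunk, which is the situation the warnings report. `RecursiveCharacterTextSplitter`, used in the Chroma section, avoids this by falling back to finer separators.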


In [14]:
# Embed the chunks and store them in a FAISS vector store
from langchain_community.vectorstores import FAISS

embedding = OllamaEmbeddings(model='llama3.2:1b')
db = FAISS.from_documents(splitted_docs, embedding)

### Similarity Search

In [11]:
# Similarity search based on query
query = 'How does the speaker describe the desired outcome of the war?'

similar_docs = db.similarity_search(query)
similar_docs[0].page_content

'It will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We are, let me say again, the sincere friends of the German people, and shall desire nothing so much as the early reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'

In [16]:
# Similarity search based on vector

# Embed the query text into a vector
embedded_query = embedding.embed_query(query)

# Retrieve the documents closest to that vector (no scores are returned)
similar_docs = db.similarity_search_by_vector(embedded_query)
similar_docs[0].page_content

'It will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We are, let me say again, the sincere friends of the German people, and shall desire nothing so much as the early reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'

### Similarity Search with score
Some methods are FAISS-specific. One of them is `similarity_search_with_score`, which returns each document together with the distance between it and the query. The score is an L2 distance, so a lower score means a closer match.
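
To make "lower is better" concrete, here is a minimal pure-Python sketch of ranking by L2 distance. The vectors, document ids, and the `search_with_score` helper are made up for illustration; this is a brute-force toy, not how FAISS implements its search.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, distance) pairs with the smallest distance."""
    scored = [(doc_id, l2_distance(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 2-dimensional "embeddings"
doc_vecs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.9, 0.1],
}
results = search_with_score([1.0, 0.0], doc_vecs)
for doc_id, distance in results:
    print(doc_id, round(distance, 4))
```

Results are sorted in ascending order of distance, so the first entry is the best match. Sorting by squared distance would give the same order, since squaring preserves the ranking of non-negative values.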

In [13]:
similar_docs_score = db.similarity_search_with_score(query)
similar_docs_score

[(Document(id='4f3d31cd-4892-4b81-bcaf-59a6797a3918', metadata={'source': 'speech.txt'}, page_content='It will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We are, let me say again, the sincere friends of the German people, and shall desire nothing so much as the early reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'),
  0.5486419),
 (Document(id='d8382138-d05d-4812-9cdf-66e7e8207ff3', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressin

### FAISS as a Retriever

We can also wrap the vector store in a retriever. This makes it easy to plug into other LangChain components, most of which work with retrievers rather than with vector stores directly.
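
Conceptually, a retriever is a thin wrapper that exposes a single "query in, documents out" call. The sketch below shows that idea with toy classes. `ToyVectorStore` and `ToyRetriever` are hypothetical names, the ranking uses word overlap instead of embeddings, and none of this is LangChain's actual implementation.

```python
class ToyVectorStore:
    """Hypothetical in-memory store that ranks texts by word overlap with the query."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, k=4):
        # Hand back an object with a uniform invoke() interface.
        return ToyRetriever(self, k)


class ToyRetriever:
    """Wraps a store so callers only need to know about invoke(query)."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore(["the war must end", "peace for all nations", "trade and tariffs"])
retriever = store.as_retriever(k=1)
hits = retriever.invoke("outcome of the war")
print(hits)
```

Because callers depend only on `invoke`, the underlying store can be swapped without changing the code that consumes the retriever.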

In [10]:
# Getting db as retriever
retriever = db.as_retriever()
similar_docs = retriever.invoke('How does the speaker describe the desired outcome of the war?')
similar_docs[0].page_content

'It will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We are, let me say again, the sincere friends of the German people, and shall desire nothing so much as the early reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'

### Saving and loading the DB

In [6]:
# Saving the db to a local folder
db.save_local('faiss_db')

In [18]:
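# Loading the saved db
# Part of the saved store is a pickle file, so loading requires
# allow_dangerous_deserialization=True. Only enable it for files you created or trust.
loaded_db = FAISS.load_local('faiss_db', embedding, allow_dangerous_deserialization=True)
loaded_db.similarity_search(query)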

[Document(id='4f3d31cd-4892-4b81-bcaf-59a6797a3918', metadata={'source': 'speech.txt'}, page_content='It will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because we act without animus, not in enmity toward a people or with the desire to bring any injury or disadvantage upon them, but only in armed opposition to an irresponsible government which has thrown aside all considerations of humanity and of right and is running amuck. We are, let me say again, the sincere friends of the German people, and shall desire nothing so much as the early reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'),
 Document(id='d8382138-d05d-4812-9cdf-66e7e8207ff3', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are

## Chroma

Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. It is licensed under Apache 2.0.

In [22]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

# Loading the document
docs = TextLoader('speech.txt').load()

# Splitting the document
splitted_docs = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embedding Function
embedding = OllamaEmbeddings(model='llama3.2:1b')

In [25]:
# Storing into Vector Store
db = Chroma.from_documents(splitted_docs, embedding, persist_directory='./chroma_db') # persist directory to save the db
db

<langchain_chroma.vectorstores.Chroma at 0x1a9f8427b50>

In [27]:
# Similarity Search
query = 'What does the speaker believe is the main reason the United States should enter the war?'

similar_docs = db.similarity_search(query)
print(similar_docs[0].page_content)

# Similarity Search with Score (the score is a distance, so lower means more similar)
similar_docs = db.similarity_search_with_score(query)
print(similar_docs)

always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.
[(Document(metadata={'source': 'speech.txt'}, page_content='always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'), 0.8044684073188696), (Document(metadata={'source': 'speech.txt'}, page_content='The world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, n

In [28]:
# As Retriever
retriever = db.as_retriever()

similar_docs = retriever.invoke(query)
similar_docs[0].page_content

'always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'