#### Chroma
Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

In [2]:
## build a sample vector DB
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter



In [4]:
loader = TextLoader("dataIngesion/speech.txt")
data = loader.load()
data

[Document(metadata={'source': 'dataIngesion/speech.txt'}, page_content='\nThe world must be made safe for democracy. \nIts peace must be planted upon the tested foundations of political liberty. \nWe have no selfish ends to serve. \nWe desire no conquest, no dominion. \nWe seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. \nWe are but one of the champions of the rights of mankind. \nWe shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them...\n...\nOcean and the North Sea in order to shut in Germany, \nand the principle on which Germany by her submarines seeks to destroy all shipping which enters the war zone which she has laid out around the British Isles.')]

In [5]:
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
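To see roughly what `RecursiveCharacterTextSplitter` does with `chunk_size=500`, here is a simplified pure-Python sketch of recursive character splitting: pack pieces split on the coarsest separator into chunks, and only fall back to a finer separator when a single piece is still too long. It is an illustration under simplifying assumptions, not LangChain's implementation; it drops the separators it splits on and ignores `chunk_overlap`. The name `recursive_split` is hypothetical; the separator order `"\n\n"`, `"\n"`, `" "`, `""` mirrors the splitter's default list.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Hypothetical sketch: split text into chunks of at most chunk_size
    characters, trying coarse separators first and finer ones as a fallback.
    Simplifications: separators are dropped and there is no overlap."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full; emit it
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: recurse with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("line one\nline two\nline three", 18))
# ['line one\nline two', 'line three']
```

With no blank lines in the sample text, the splitter falls through to `"\n"` and groups lines until the next one would overflow the limit.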

#### Create a vector store

In [7]:
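# Uses OllamaEmbeddings' default model; pass model="..." to choose another.
# A local Ollama server must be running for the embedding calls to succeed.
embeddings = OllamaEmbeddings()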
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)
vectordb

<langchain_chroma.vectorstores.Chroma at 0x15c365aabc0>
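`Chroma.from_documents` calls the embedding object's `embed_documents` method on the chunk texts and stores one vector per chunk (queries are embedded with `embed_query`). To make that step concrete without a running Ollama server, here is a toy stand-in. `toy_embed` is a hypothetical hashed bag-of-words function and is not what the Ollama model computes, but like a real embedding model it maps any text to a vector of fixed length.

```python
import hashlib
import math
import re


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: a hashed bag-of-words
    vector of fixed length `dim`, L2-normalised. Real models (such as the
    one behind OllamaEmbeddings) produce dense semantic vectors instead."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        # Stable hash (unlike built-in hash(), which is salted per process).
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


chunks = ["The world must be made safe for democracy.",
          "Ocean and the North Sea in order to shut in Germany"]
vectors = [toy_embed(c) for c in chunks]  # one fixed-length vector per chunk
print(len(vectors), len(vectors[0]))  # 2 16
```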

In [9]:
## query it
query = "Why is the world must be made safe for democracy?"
docs = vectordb.similarity_search(query=query)
docs[0].page_content

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


'Ocean and the North Sea in order to shut in Germany, \nand the principle on which Germany by her submarines seeks to destroy all shipping which enters the war zone which she has laid out around the British Isles.'
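`similarity_search` asks for `k=4` results by default, but this collection holds only two chunks, so Chroma lowers `n_results` to 2 and logs the message above; passing a smaller `k` (for example `vectordb.similarity_search(query, k=2)`) avoids it. The sketch below is a hypothetical brute-force nearest-neighbour search by squared L2 distance that reproduces that clipping. It is exact search over a Python list, whereas Chroma uses an approximate HNSW index.

```python
import warnings


def similarity_search(query_vec, vectors, k=4):
    """Hypothetical brute-force k-nearest-neighbour search by squared L2
    distance. Mirrors the clipping of k to the index size seen above."""
    if k > len(vectors):
        warnings.warn(f"Number of requested results {k} is greater than "
                      f"number of elements in index {len(vectors)}, "
                      f"updating n_results = {len(vectors)}")
        k = len(vectors)
    dists = [sum((q - v) ** 2 for q, v in zip(query_vec, vec)) for vec in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: dists[i])
    return [(i, dists[i]) for i in ranked[:k]]  # (index, distance), closest first


index = [[0.0, 1.0], [1.0, 0.0]]  # two stored vectors, like the two chunks here
print(similarity_search([0.9, 0.1], index))  # warns, then ranks index 1 before index 0
```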

In [10]:
## Save to disk
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings, persist_directory="./chroma_db")
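When no ids are given, `Chroma.from_documents` assigns random UUIDs, so re-running this cell against the same `persist_directory` stores a second copy of every chunk. One possible fix, sketched below under two assumptions (that you pass the result through the `ids=` argument, and that your `langchain_chroma` version writes with upsert-by-id semantics), is to derive ids deterministically from the content. `stable_ids` is a hypothetical helper, not part of LangChain or Chroma.

```python
import hashlib


def stable_ids(texts, source):
    """Hypothetical helper: one deterministic id per chunk, built from the
    source name, the chunk's position and a SHA-256 prefix of its text, so
    re-running ingestion yields the same ids instead of fresh random UUIDs."""
    ids = []
    for position, text in enumerate(texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        ids.append(f"{source}:{position}:{digest}")
    return ids


chunks = ["The world must be made safe for democracy.", "Ocean and the North Sea"]
print(stable_ids(chunks, "speech.txt") == stable_ids(chunks, "speech.txt"))  # True
```

In the notebook this would look like `ids = stable_ids([d.page_content for d in splits], "dataIngesion/speech.txt")` followed by `Chroma.from_documents(documents=splits, embedding=embeddings, ids=ids, persist_directory="./chroma_db")`.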

In [11]:
## Load from disk
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
docs = db2.similarity_search(query)
print(docs[0].page_content)

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


Ocean and the North Sea in order to shut in Germany, 
and the principle on which Germany by her submarines seeks to destroy all shipping which enters the war zone which she has laid out around the British Isles.


In [13]:
## similarity search with score
# Scores are distances (Chroma's default space is squared L2): lower means more similar.
docs = vectordb.similarity_search_with_score(query)
docs

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


[(Document(id='4c9f367d-6f10-4ad3-90b0-ca0529f7defe', metadata={'source': 'dataIngesion/speech.txt'}, page_content='Ocean and the North Sea in order to shut in Germany, \nand the principle on which Germany by her submarines seeks to destroy all shipping which enters the war zone which she has laid out around the British Isles.'),
  12750.168716253535),
 (Document(id='02b4502e-5670-4650-b3c6-0238d0192d30', metadata={'source': 'dataIngesion/speech.txt'}, page_content='The world must be made safe for democracy. \nIts peace must be planted upon the tested foundations of political liberty. \nWe have no selfish ends to serve. \nWe desire no conquest, no dominion. \nWe seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. \nWe are but one of the champions of the rights of mankind. \nWe shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them...\n...'),
  23656.133742340604)]
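Which distance Chroma reports depends on the collection's space: `l2` (the default), `ip` or `cosine`, chosen through the `hnsw:space` collection metadata (with LangChain, e.g. `collection_metadata={"hnsw:space": "cosine"}` passed to `Chroma.from_documents`). The sketch below writes the three definitions from Chroma's documentation out in plain Python so the scores above are easier to interpret; the function names are hypothetical.

```python
import math


def squared_l2(a, b):
    """'l2' space (Chroma's default): sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """'ip' space: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """'cosine' space: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query, doc = [3.0, 4.0], [4.0, 3.0]
print(squared_l2(query, doc))              # 2.0
print(inner_product_distance(query, doc))  # -23.0
print(cosine_distance(query, doc))         # about 0.04 (1 - 24/25)
```

All three are "smaller is closer", but only the cosine distance ignores vector length; the inner-product form goes negative here because the sample vectors are not normalised.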

In [15]:
## Retriever option
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


'Ocean and the North Sea in order to shut in Germany, \nand the principle on which Germany by her submarines seeks to destroy all shipping which enters the war zone which she has laid out around the British Isles.'
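`as_retriever()` wraps the vector store in a retriever whose `invoke(query)` runs the store's similarity search with any `search_kwargs` you configured (e.g. `vectordb.as_retriever(search_kwargs={"k": 2})`); with none, the default `k=4` applies, which is why the same clipping message appears. The hypothetical `ToyRetriever` below sketches that shape: a search function plus fixed options behind one `invoke` method. Its `keyword_search` ranks by word overlap purely as a stand-in for vector search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyRetriever:
    """Hypothetical sketch of a retriever: a search function plus fixed
    search options, exposed through a single invoke(query) method."""
    search: Callable[..., List[str]]
    search_kwargs: Dict[str, int] = field(default_factory=dict)

    def invoke(self, query: str) -> List[str]:
        return self.search(query, **self.search_kwargs)


def keyword_search(query, k=4):
    # Stand-in for vectordb.similarity_search: rank a tiny corpus by word overlap.
    corpus = ["The world must be made safe for democracy.",
              "Ocean and the North Sea in order to shut in Germany"]
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q & set(doc.lower().split())))
    return ranked[:k]


retriever = ToyRetriever(keyword_search, {"k": 1})
print(retriever.invoke("why must the world be safe for democracy"))
```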