# Chroma DB
- Chroma is an AI-native, open-source vector database focused on developer productivity and happiness.
- Chroma is licensed under Apache 2.0.
- More on the available vector stores: https://python.langchain.com/docs/integrations/vectorstores/

# Sample Vector Database

In [1]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [2]:
loader = TextLoader("/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt")
data = loader.load()
data

[Document(metadata={'source': '/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.\n\nIn operating systems such as CP/M, where the operating system does \nnot keep track of the file size in bytes, the end of a text file is \ndenoted by placing one or more special characters, \nknown as an end-of-file (EOF) marker, as padding after the last \nline in a text file. In modern operating systems such as DOS, \nMicrosoft Windows and Unix-like systems, \ntext files do not contain any special EOF character, \nbecause file systems on those operating systems \nkeep track of the file size in bytes.\n\nSome operating systems, such as Multics, Unix-like systems, \nCP/M, DOS, the classic Mac OS, and Windows, \nstore text fil
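`TextLoader` reads the whole file into a single `Document` whose `metadata["source"]` is the file path, as the output above shows. A minimal pure-Python sketch of that behaviour follows; the `Doc` class and `load_text` function are hypothetical stand-ins (not LangChain classes), and UTF-8 encoding is an assumption.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical minimal stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[Doc]:
    """Read the whole file into ONE document and record where it came from."""
    text = Path(path).read_text(encoding=encoding)
    return [Doc(page_content=text, metadata={"source": path})]


with tempfile.TemporaryDirectory() as tmp:
    sample_path = str(Path(tmp) / "speech.txt")  # hypothetical sample file
    Path(sample_path).write_text("A text file is a kind of computer file.\n", encoding="utf-8")
    docs = load_text(sample_path)

print(len(docs), docs[0].metadata["source"] == sample_path)
```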

In [3]:
# Split the document into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
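As a rough illustration of what `RecursiveCharacterTextSplitter` does: it tries separators in order (by default `"\n\n"`, `"\n"`, `" "`, then `""`) and only falls back to a finer separator when a piece is still longer than `chunk_size`. The hypothetical `recursive_split` below is a simplified sketch of that idea with no overlap, matching `chunk_overlap=0`; LangChain's real implementation also handles overlap, separator retention, and whitespace stripping, so its chunks can differ.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified sketch of recursive character splitting (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut the text into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits: keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full: emit it
        if len(piece) > chunk_size:
            # Piece alone is too long: split it again with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))  # ['aaa bbb', 'ccc ddd', 'eee']
```

Paragraph breaks are preferred, so `"aaa bbb"` stays whole; only the over-long second paragraph is broken further at spaces.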

In [4]:
embedding = OllamaEmbeddings(model="llama3.2")
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb

<langchain_chroma.vectorstores.Chroma at 0x7fadf867e4d0>
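Conceptually, `Chroma.from_documents` embeds every chunk and stores each vector next to its text (and metadata). The sketch below mimics that with a hypothetical `toy_embed` function (a 26-dimensional letter count, used only because it needs no model server) and a hypothetical `ToyVectorStore` class; neither is part of LangChain or Chroma, and Chroma additionally indexes the vectors with HNSW for fast lookup.

```python
import string
from dataclasses import dataclass, field
from typing import Callable


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for OllamaEmbeddings: a 26-dim count of each letter a-z."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in string.ascii_lowercase]


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory store: keeps each chunk's text next to its vector."""
    embed: Callable[[str], list[float]]
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    @classmethod
    def from_texts(cls, texts: list[str], embed: Callable[[str], list[float]]) -> "ToyVectorStore":
        store = cls(embed=embed)
        for text in texts:
            store.texts.append(text)           # keep the original chunk text
            store.vectors.append(embed(text))  # and its embedding, at the same index
        return store


store = ToyVectorStore.from_texts(["A text file", "EOF marker"], toy_embed)
print(len(store.texts), len(store.vectors[0]))
```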

In [5]:
query = "What is a text file?"
docs = vectordb.similarity_search(query)

In [6]:
docs[0].page_content

'A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.'
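`similarity_search` embeds the query and returns the `k` closest chunks (LangChain's Chroma wrapper uses `k=4` by default), which is why `docs[0]` is the best match. A brute-force version of that ranking, assuming the vectors are already computed and using squared L2 distance, might look like this; `top_k` is a hypothetical helper, and Chroma itself uses an approximate HNSW index rather than scanning every vector.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, vectors, texts, k=4):
    """Brute-force k-nearest-neighbour search; returns (text, distance), nearest first."""
    scored = ((squared_l2(query_vec, vec), i) for i, vec in enumerate(vectors))
    return [(texts[i], dist) for dist, i in heapq.nsmallest(k, scored)]


texts = ["origin chunk", "far chunk", "near chunk"]
vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
results = top_k([0.0, 0.0], vectors, texts, k=2)
print(results)  # [('origin chunk', 0.0), ('near chunk', 2.0)]
```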

In [7]:
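# Embed the chunks again and save them to ./chroma_db on the local disk
# (re-running this cell adds duplicate copies, because new IDs are generated each time)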
vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")


In [8]:
# Load the persisted vector database from disk
db2 = Chroma(persist_directory="./chroma_db/", embedding_function=embedding)
docs = db2.similarity_search(query)
print(docs[0].page_content)

A text file (sometimes spelled textfile; 
an old alternative name is flat file) is a kind of computer file 
that is structured as a sequence of lines of electronic text. 
A text file exists stored as data within a computer file system.
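With `persist_directory` set, Chroma writes the collection to that folder in its own on-disk format, and a later `Chroma(persist_directory=..., embedding_function=...)` call reopens it; the embedding function has to be passed again so new queries can be embedded. The JSON round trip below is only a hypothetical illustration of that save/reload cycle (`save_store` and `load_store` are made-up helpers) and does not reproduce Chroma's storage format.

```python
import json
import tempfile
from pathlib import Path


def save_store(directory, texts, vectors):
    """Hypothetical helper: write texts and their vectors side by side to disk."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    (path / "store.json").write_text(json.dumps({"texts": texts, "vectors": vectors}))


def load_store(directory):
    """Hypothetical helper: read the texts and vectors back from disk."""
    data = json.loads((Path(directory) / "store.json").read_text())
    return data["texts"], data["vectors"]


with tempfile.TemporaryDirectory() as tmp:
    save_store(tmp, ["A text file is a kind of computer file."], [[0.1, 0.2]])
    texts, vectors = load_store(tmp)  # a fresh "session" would start here

print(texts[0])
```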


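# Similarity Search with Score (Squared L2 Distance)

- Chroma's default distance metric is squared L2 (Euclidean) distance, so a lower score means a closer match.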

In [11]:
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(metadata={'source': '/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt'}, page_content='A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.'),
  5368.700921485091),
 (Document(metadata={'source': '/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt'}, page_content='"Text file" refers to a type of container, \nwhile plain text refers to a type of content.'),
  5750.455955366129),
 (Document(metadata={'source': '/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt'}, page_content='keep track of the file size in bytes.'),
  5960.163700314714),
 (Document(metadata={'source': '/Users/surajbhardwaj/Desktop/Langchain/L3_VectoreStore/L3_1_FAISS/speech.txt'}, page_content='Some operating systems, such as Multics, Unix-like sy
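Chroma's default `l2` space reports the squared Euclidean distance, i.e. the sum of squared coordinate differences without the square root. The small worked example below uses made-up 3-dimensional vectors to show the arithmetic, then checks that taking the square root does not change the ranking of the three scores printed above.

```python
import math

# Toy 3-dimensional vectors (hypothetical; real Ollama embeddings have far more dimensions).
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

squared = sum((x - y) ** 2 for x, y in zip(a, b))  # (-3)**2 + (-4)**2 + 0**2 = 25.0
euclidean = math.sqrt(squared)                     # 5.0

# sqrt is monotonic, so sorting by squared L2 or by plain L2 gives the same order.
scores = [5368.700921485091, 5750.455955366129, 5960.163700314714]  # from the output above
order_by_squared = sorted(range(len(scores)), key=lambda i: scores[i])
order_by_euclidean = sorted(range(len(scores)), key=lambda i: math.sqrt(scores[i]))

print(squared, euclidean)  # 25.0 5.0
```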

# Retriever

In [12]:
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

'A text file (sometimes spelled textfile; \nan old alternative name is flat file) is a kind of computer file \nthat is structured as a sequence of lines of electronic text. \nA text file exists stored as data within a computer file system.'
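`as_retriever()` wraps the vector store in a retriever whose `invoke(query)` returns a list of `Document`s, so it can be plugged into chains; the number of results can be changed with, for example, `vectordb.as_retriever(search_kwargs={"k": 2})`. The sketch below shows only that query-in, documents-out interface: `ToyRetriever` and `keyword_search` are hypothetical, and the word-overlap scorer is a swapped-in stand-in for vector similarity so the example runs without a model.

```python
from typing import Callable


class ToyRetriever:
    """Hypothetical stand-in for a vector-store retriever: query in, top-k texts out."""

    def __init__(self, search: Callable[[str, int], list[str]], k: int = 4):
        self.search = search
        self.k = k

    def invoke(self, query: str) -> list[str]:
        return self.search(query, self.k)


corpus = [
    "A text file is a kind of computer file",
    "Plain text is a type of content",
    "EOF marker",
]


def keyword_search(query: str, k: int) -> list[str]:
    # Stand-in scorer: rank chunks by the number of words shared with the query.
    q = set(query.lower().strip("?.!").split())
    ranked = sorted(corpus, key=lambda t: len(q & set(t.lower().split())), reverse=True)
    return ranked[:k]


retriever = ToyRetriever(keyword_search, k=2)
top = retriever.invoke("What is a text file?")
print(top[0])  # A text file is a kind of computer file
```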