## Chroma DB

Chroma is an AI-native, open-source vector database focused on developer productivity. It is licensed under Apache 2.0.
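A vector database stores each piece of text together with an embedding vector. It answers a query by returning the stored items whose vectors lie closest to the query's vector. The plain-Python sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity. `TinyVectorStore` and its methods are hypothetical and are not Chroma's API. Chroma also uses an approximate nearest-neighbour (HNSW) index, not the brute-force scan shown here.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


@dataclass
class TinyVectorStore:
    """Hypothetical brute-force vector store: (text, vector) pairs in a list."""
    items: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        # Score every stored vector exactly (a real vector DB uses an approximate index)
        scored = [(cosine_similarity(query_vector, v), t) for t, v in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
# Hand-made toy vectors standing in for real embeddings
store.add('Levski won 26 national championships', [0.9, 0.1, 0.0])
store.add('The home kit colour is all-blue', [0.1, 0.9, 0.1])
store.add('The stadium holds 17,688 spectators', [0.0, 0.3, 0.9])

print(store.search([0.8, 0.2, 0.0], k=1))  # ['Levski won 26 national championships']
```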

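To use Chroma from LangChain, install both packages: `pip install chromadb langchain-chroma`. The code below also uses `langchain-community` (document loaders) and `langchain-ollama` (embeddings). It also needs a running Ollama server with the `gemma:2b` model pulled (`ollama pull gemma:2b`).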

In this tutorial, we build a small vector database from a text document, query it, reload it from disk, and query it again through a retriever.

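```python
import os

from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_ollama import OllamaEmbeddings

# Load the source document (a plain-text file about the Levski football club)
file_path = 'levski.txt'
assert os.path.exists(file_path)

text_loader = TextLoader(file_path=file_path)
docs = text_loader.load()

# CharacterTextSplitter cuts only on its separator ("\n\n" by default), so a
# paragraph longer than chunk_size is kept whole and triggers a warning
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=10)
splitted_docs = text_splitter.split_documents(documents=docs)

# Embeddings are computed by the local Ollama server. gemma:2b is a chat model;
# a dedicated embedding model (e.g. nomic-embed-text) may retrieve better
embeddings = OllamaEmbeddings(model='gemma:2b')

# Setting persist_directory makes Chroma write the collection to disk
db_persist_dir = 'chroma'
database = Chroma.from_documents(documents=splitted_docs, embedding=embeddings,
                                 persist_directory=db_persist_dir)

assert os.path.exists(db_persist_dir)
database
```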

```
Created a chunk of size 387, which is longer than the specified 100
Created a chunk of size 625, which is longer than the specified 100

<langchain_chroma.vectorstores.Chroma at 0x1207ffed0>
```
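The two warnings come from how `CharacterTextSplitter` works. It first cuts the text on a single separator (`"\n\n"` by default, i.e. paragraph breaks). It then merges adjacent pieces into chunks of at most `chunk_size` characters. A piece that is already longer than `chunk_size` cannot be cut any further, so it becomes an oversized chunk. The sketch below reproduces only that merge step in plain Python. `split_on_separator` is a hypothetical helper that ignores `chunk_overlap`; it is not LangChain's code. If smaller chunks are required, LangChain's `RecursiveCharacterTextSplitter`, which falls back to finer separators, is one option.

```python
import warnings


def split_on_separator(text: str, separator: str = '\n\n', chunk_size: int = 100) -> list[str]:
    """Hypothetical simplified splitter: cut on `separator`, then greedily merge
    pieces while the merged chunk stays within `chunk_size` characters.
    Chunk overlap is deliberately ignored."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Characters this piece adds to the current chunk (plus a joining separator)
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    for chunk in chunks:
        if len(chunk) > chunk_size:
            # A single piece longer than chunk_size cannot be split on this separator
            warnings.warn(f'Created a chunk of size {len(chunk)}, '
                          f'which is longer than the specified {chunk_size}')
    return chunks


text = 'Short intro.\n\n' + 'A' * 150 + '\n\nTail one.\n\nTail two.'
print([len(c) for c in split_on_separator(text, chunk_size=100)])  # [12, 150, 20]
```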

### Query the Chroma DB

```python
query = 'How many trophies did the team win?'

# Return the k=3 stored chunks whose embeddings are closest to the query embedding
similar = database.similarity_search(query=query, k=3)
for i, s in enumerate(similar):
    print(f'Similarity {i}: {s}')
```

```
Similarity 0: page_content='The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association.' metadata={'source': 'levski.txt'}
Similarity 1: page_content='Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the international stage, Levski reached the quarter-finals of the UEFA Cup twice and the quarter-finals of the Cup Winners' C
```
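`similarity_search` returns results in order of increasing distance between the query vector and each stored vector. This assumes the collection keeps Chroma's default `l2` space, i.e. squared Euclidean distance. For unit-length vectors, squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by distance and ranking by cosine similarity agree. The plain-Python check below illustrates that identity on made-up random vectors; it makes no Chroma calls. Note that in the output above, the chunk that actually answers the question ranks only second. The distance metric and the embedding model are both worth checking when results look off.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


random.seed(0)
# Made-up 8-dimensional "embeddings", normalized to unit length
query_vec = normalize([random.gauss(0, 1) for _ in range(8)])
doc_vecs = [normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(5)]

by_distance = sorted(range(len(doc_vecs)), key=lambda i: squared_l2(query_vec, doc_vecs[i]))
by_cosine = sorted(range(len(doc_vecs)), key=lambda i: -cosine(query_vec, doc_vecs[i]))
print(by_distance == by_cosine)  # rankings agree for unit vectors

# The identity behind it: ||a - b||^2 = 2 - 2 cos(a, b) when |a| = |b| = 1
for d in doc_vecs:
    assert math.isclose(squared_l2(query_vec, d), 2 - 2 * cosine(query_vec, d), abs_tol=1e-9)
```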

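### Save and load

Because `persist_directory` was set when the database was created, the collection is already saved in `./chroma`. To reopen it, create a `Chroma` instance that points at the same directory. Pass the same embedding function so that new queries are embedded by the model that produced the stored vectors.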

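```python
import os

# Reopen the collection persisted in the previous step
db_folder = './chroma'
assert os.path.exists(db_folder)

# Use the same embedding function that built the collection
loaded_db = Chroma(embedding_function=embeddings, persist_directory=db_folder)
loaded_db
```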

```
<langchain_chroma.vectorstores.Chroma at 0x117b38050>
```
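Reloading only makes sense if the stored vectors and the new query vectors come from the same embedding model. Vectors from different models live in different spaces and cannot be compared meaningfully. The sketch below illustrates that save/load contract with a hypothetical JSON file that records the model name next to the vectors and refuses to load under a different model. `save_store`, `load_store` and the file layout are assumptions for illustration only; Chroma's own on-disk format is different.

```python
import json
import os
import tempfile


def save_store(path: str, model_name: str, items: list[tuple[str, list[float]]]) -> None:
    """Hypothetical persistence: write texts, vectors and the embedding model name."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump({'model': model_name,
                   'items': [{'text': t, 'vector': v} for t, v in items]}, f)


def load_store(path: str, model_name: str) -> list[tuple[str, list[float]]]:
    """Reload the items, refusing if they were embedded with a different model."""
    with open(path, encoding='utf-8') as f:
        data = json.load(f)
    if data['model'] != model_name:
        raise ValueError(f"store was built with {data['model']!r}, not {model_name!r}")
    return [(item['text'], item['vector']) for item in data['items']]


items = [('Levski won 26 national championships', [0.9, 0.1, 0.0])]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'store.json')
    save_store(path, 'gemma:2b', items)
    print(load_store(path, 'gemma:2b') == items)  # round-trips unchanged
    try:
        load_store(path, 'another-model')
    except ValueError as err:
        print(err)
```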

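### Retriever option

A vector store can also be exposed as a LangChain retriever with `as_retriever()`. Retrievers implement the standard `invoke` interface, so they can be plugged directly into chains.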

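```python
# Wrap the store as a retriever; search_kwargs sets how many chunks are returned
retriever = loaded_db.as_retriever(search_kwargs={'k': 3})
retrieved = retriever.invoke(input=query)

for i, r in enumerate(retrieved):
    print(f'Retrieved {i}: {r}')
```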

```
Retrieved 0: page_content='The team's home kit colour is all-blue. Levski's home ground is the Georgi Asparuhov Stadium in Sofia, which has a capacity of 17,688 spectators. The club's fiercest rival is CSKA Sofia, and matches between the two capital sides are commonly referred to as the Eternal derby of Bulgaria. Levski also contests the Oldest capital derby with Slavia Sofia, since 1915. The club is a regular member of the European Club Association and the European Multisport Club Association.' metadata={'source': 'levski.txt'}
Retrieved 1: page_content='Levski have won a total of 74 trophies, including 26 national championships, 26 national cups and 3 supercups, as well as 13 domestic doubles and one treble. They are the only Bulgarian football club to have never been relegated from the top division since the establishment of the league system in 1937.[1] On the international stage, Levski reached the quarter-finals of the UEFA Cup twice and the quarter-finals of the Cup Winners' Cup
```
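`as_retriever` also accepts `search_type='mmr'` (maximal marginal relevance). This mode trades some relevance for diversity, so near-duplicate chunks are less likely to fill the results. It is tuned through `search_kwargs` such as `k`, `fetch_k` and `lambda_mult`. The sketch below shows the greedy MMR selection rule in plain Python on made-up 2-dimensional vectors, assuming cosine similarity. Each step picks the candidate maximising `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max_sim(doc, already_selected)`. `mmr_select` is a hypothetical helper, not LangChain's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec: list[float], doc_vecs: list[list[float]],
               k: int = 2, lambda_mult: float = 0.5) -> list[int]:
    """Hypothetical greedy maximal marginal relevance: indices of selected docs."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def marginal_score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Penalise similarity to anything already selected (0 when nothing is selected)
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.4],    # most relevant
            [0.9, 0.42],   # near-duplicate of doc 0
            [0.85, -0.4]]  # slightly less relevant, but different

top_k = sorted(range(len(doc_vecs)), key=lambda i: -cosine(query_vec, doc_vecs[i]))[:2]
print(top_k)                                   # [0, 1]: plain similarity keeps the duplicate
print(mmr_select(query_vec, doc_vecs, k=2))    # [0, 2]: MMR prefers the diverse chunk
```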
