# Chroma

This notebook shows how to use functionality related to the Chroma vector database.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import MarkdownTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader



In [16]:
from langchain.document_loaders import TextLoader
loader = TextLoader('data\PoTa\Princes of the Apocalypse.md')
documents = loader.load()
text_splitter = MarkdownTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

In [3]:
db = Chroma.from_documents(docs, embeddings)

query = "Where is red larch?"
docs = db.similarity_search(query)

DEBUG:Chroma:Logger created


Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.


DEBUG:Chroma:Index not found
DEBUG:Chroma:Index saved to .chroma/index/index.bin
DEBUG:Chroma:Index saved to .chroma/index/index.bin
DEBUG:Chroma:time to pre process our knn query: 0.0
DEBUG:Chroma:time to run knn query: 0.00099945068359375


In [8]:
print(docs[].page_content)

IndexError: list index out of range

## Similarity search with score

In [9]:
docs = db.similarity_search_with_score(query)

DEBUG:Chroma:time to pre process our knn query: 0.0
DEBUG:Chroma:time to run knn query: 0.0


In [13]:
docs[3]

(Document(page_content='Trouble in Red Larch\n\nRed Larch is an excellent place for novice adventurers to begin their careers. Scenarios in this section provide 1st- through 3rd-level characters with the opportunity to learn about Red Larch while dealing with local troubles.\n\nTo unearth interesting rumors and find directions to potential adventure sites, the characters must explore Red Larch and interact with the citizens. Red Larch is described in chapter 2. See the "Trouble in Red Larch" entries for what the locals know.', lookup_str='', metadata={'source': 'data\\PoTa\\Princes of the Apocalypse.md'}, lookup_index=0),
 0.28003960847854614)

## Persistance

The below steps cover how to persist a ChromaDB instance

### Initialize PeristedChromaDB
Create embeddings for each chunk and insert into the Chroma vector database. The persist_directory argument tells ChromaDB where to store the database when it's persisted.



In [17]:
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'data/PoTa/db'

embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory=persist_directory)

Running Chroma using direct local API.
No existing DB found in data/PoTa/db, skipping load
No existing DB found in data/PoTa/db, skipping load


DEBUG:Chroma:Index not found
DEBUG:Chroma:Index saved to data/PoTa/db/index/index.bin
DEBUG:Chroma:Index saved to data/PoTa/db/index/index.bin


### Persist the Database
In a notebook, we should call persist() to ensure the embeddings are written to disk. This isn't necessary in a script - the database will be automatically persisted when the client object is destroyed.

In [18]:
vectordb.persist()
vectordb = None

Persisting DB to disk, putting it in the save folder data/PoTa/db
PersistentDuckDB del, about to run persist
Persisting DB to disk, putting it in the save folder data/PoTa/db


### Load the Database from disk, and create the chain
Be sure to pass the same persist_directory and embedding_function as you did when you instantiated the database. Initialize the chain we will use for question answering.

In [19]:
# Now we can load the persisted database from disk, and use it as normal. 
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)


Running Chroma using direct local API.
loaded in 1266 embeddings
loaded in 1 collections
collection with name langchain already exists, returning existing collection
