<h3> ChromaDB Crash Course</h3>

<p style="font-size:16px">ChromaDB is a vector database that can compute and store embeddings for text data automatically. When you run a query, it embeds the query text, compares it with the stored embeddings using a distance metric, and returns the closest matches.</p>
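
As a rough illustration of the ranking step described above, the sketch below scores a few hand-written toy vectors against a query vector and keeps the closest ones. The vectors, the `rank_by_distance` helper, and the use of squared Euclidean distance are assumptions made for this example; real embeddings come from a model and are searched through Chroma's own index.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query, vectors, n_results=2):
    """Return (id, distance) pairs for the n_results vectors closest to the query."""
    scored = [(doc_id, squared_l2(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:n_results]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_vectors = {
    "id1": [1.0, 0.0, 0.0],
    "id2": [0.0, 1.0, 0.0],
    "id3": [0.9, 0.1, 0.0],
}

ranked = rank_by_distance([1.0, 0.0, 0.0], toy_vectors)
print(ranked)
```

The query here is identical to the vector stored under `id1`, so that entry comes back first with a distance of zero, followed by the nearby `id3`.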

In [None]:
pip install chromadb

Create a DB


In [28]:
import chromadb 
chroma_client = chromadb.Client()

Create a collection

In [3]:
collection = chroma_client.create_collection(name="my_collection")

Add some text documents to the collection

In [13]:
collection.add(
    documents=[
        "This is a sample document ",
        "Welcome to this documente"
    ],
    ids=['id1', 'id2']
)

Query the collection

In [14]:
results = collection.query(
    query_texts=["This is a sample query about hawaii"],  # Chroma will embed this
    n_results=2,  # number of results to return
)

print(results)

{'ids': [['id1', 'id2']], 'embeddings': None, 'documents': [['This is a sample document ', 'Welcome to this documente']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[None, None]], 'distances': [[1.4456195831298828, 1.9398043155670166]]}


In [15]:
results = collection.query(
    query_texts=["This is a sample query about Orange"],  # Chroma will embed this
    n_results=2,  # number of results to return
)

print(results)

{'ids': [['id1', 'id2']], 'embeddings': None, 'documents': [['This is a sample document ', 'Welcome to this documente']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[None, None]], 'distances': [[1.4362506866455078, 1.9013142585754395]]}
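
Each field in the query output is a list of lists, with one inner list per query text, and a smaller distance means a closer match. The sketch below is one way to flatten that structure into rows; `flatten_results` is a hypothetical helper written for this example, not part of the Chroma API, and the dictionary is copied from the output above with the unused keys left out.

```python
def flatten_results(results):
    """Turn nested query results into one list of (id, document, distance) rows per query."""
    rows_per_query = []
    for ids, docs, dists in zip(
        results["ids"], results["documents"], results["distances"]
    ):
        rows_per_query.append(list(zip(ids, docs, dists)))
    return rows_per_query


# Copied from the query output above; keys that are not needed here are omitted.
results = {
    "ids": [["id1", "id2"]],
    "documents": [["This is a sample document ", "Welcome to this documente"]],
    "distances": [[1.4362506866455078, 1.9013142585754395]],
}

# Index 0 selects the rows for the first (and only) query text.
for doc_id, document, distance in flatten_results(results)[0]:
    print(f"{doc_id}: distance={distance:.4f} document={document!r}")
```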


Persist the data

In [16]:
client = chromadb.PersistentClient(path="./db/")

In [17]:
client.heartbeat()  # returns a nanosecond timestamp, confirming the client is still connected

1747307273492404200

In [None]:
client.reset()  # Fails with the default settings, because reset is disabled

AuthorizationError: Reset is disabled by config

In [30]:
from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings

In [31]:
DEFAULT_TENANT,DEFAULT_DATABASE

('default_tenant', 'default_database')

In [32]:
client = chromadb.PersistentClient(
    path="./db2/",
    settings=Settings(
        is_persistent=True,
        persist_directory="./db2/",  # keep in sync with `path` above
        allow_reset=True,  # required for client.reset() to work
        anonymized_telemetry=False,
    ),
    tenant=DEFAULT_TENANT,
    database=DEFAULT_DATABASE,
)

# path - a local path on the machine where Chroma is running. If the path does not exist, it is created. The path can be relative or absolute. If no path is specified, the default is ./chroma in the current working directory.
# settings - a Chroma Settings object.
# tenant - the tenant to use. The default is default_tenant.
# database - the database to use. The default is default_database.
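
To make the note about relative and absolute paths concrete, here is a small standard-library sketch that resolves a relative path against the current working directory and creates the directory if it is missing. It only mimics the behaviour described above and is not how Chroma implements it; `prepare_db_dir` is a hypothetical helper, and the temporary working directory is used so that nothing is left on disk.

```python
import os
import tempfile
from pathlib import Path


def prepare_db_dir(path):
    """Resolve `path` to an absolute directory, creating it if it is missing."""
    resolved = Path(path).resolve()
    resolved.mkdir(parents=True, exist_ok=True)
    return resolved


# Work inside a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as workdir:
    previous_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        db_dir = prepare_db_dir("./db2/")
        print(db_dir.is_absolute(), db_dir.name, db_dir.is_dir())
    finally:
        os.chdir(previous_cwd)
```

Note that a leading slash makes a path absolute, so `/db2/` points at the filesystem root, while `./db2/` is resolved relative to the working directory.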

In [None]:
client.reset()  # Now this will work

True

<b>Creating, Inspecting, and Deleting Collections</b>

<p>
Chroma uses collection names in the URL, so there are a few restrictions on naming them:

- The length of the name must be between 3 and 63 characters.
- The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
- The name must not contain two consecutive dots.
- The name must not be a valid IP address.</p>
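
The naming rules above can be checked locally before calling Chroma. The sketch below encodes the four rules exactly as listed here; `is_valid_collection_name` is a hypothetical helper, and Chroma's own validation may differ in its details, so treat it as a pre-check rather than a substitute.

```python
import ipaddress
import re

# Starts and ends with a lowercase letter or digit; dots, dashes, and
# underscores are allowed in between.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def is_valid_collection_name(name):
    """Check a name against the four rules listed above."""
    if not 3 <= len(name) <= 63:
        return False
    if not _NAME_PATTERN.fullmatch(name):
        return False
    if ".." in name:
        return False
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True  # not parseable as an IP address, so the name is acceptable
    return False


for candidate in ["my_collection", "ab", "_hidden", "my..collection", "192.168.0.1"]:
    print(candidate, is_valid_collection_name(candidate))
```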

<i>Chroma collections are created with a name and an optional embedding function. If you supply an embedding function, you must supply it every time you get the collection.</i>

In [34]:
from chromadb.utils import embedding_functions

In [37]:
emb_fun=embedding_functions.SentenceTransformerEmbeddingFunction()



In [38]:
model_name = "all-MiniLM-L6-v2"
emb_fun = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)

In [48]:
emb_fun(["Welcome"])

[array([-5.73623739e-02, -1.16481781e-02, -1.22139952e-03,  2.09228769e-02,
         2.38797851e-02, -3.77398245e-02, -2.04281565e-02, -2.06078868e-02,
        -4.46448885e-02, -2.95937173e-02,  3.68522406e-02,  5.80347553e-02,
        -6.66918010e-02,  1.99460499e-02, -6.76470697e-02,  6.63192570e-02,
         7.03084841e-02, -1.20073380e-02, -2.81319190e-02, -5.41426837e-02,
         4.89913998e-03, -4.27241176e-02,  5.71595831e-03,  3.22622769e-02,
        -4.49759774e-02, -1.69816334e-02,  3.40951197e-02,  6.09736741e-02,
         1.69017669e-02, -3.45815085e-02, -4.21386994e-02,  7.32292309e-02,
         4.14056964e-02,  9.79096349e-03,  2.84802783e-02, -2.45167073e-02,
         2.18786951e-02, -2.08951533e-02, -3.16071697e-02, -1.68134626e-02,
         1.00759100e-02, -2.11724937e-02, -4.47666273e-02,  1.13326788e-03,
        -5.24000973e-02,  1.01943985e-01, -4.18204255e-02, -4.08316217e-02,
         1.66139044e-02,  5.19749448e-02,  4.90134489e-03, -5.97857637e-03,
        -4.8

In [47]:
len(emb_fun(["Welcome"])[0])

384