<a href="https://colab.research.google.com/github/Junaidkhan1485786/chromaDatabaseintroduction/blob/main/chroma_database_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**ChromaDB** is an open-source vector database for storing embeddings (high-dimensional vectors) together with their documents and metadata, and for searching them by similarity. It is particularly useful in AI and machine learning applications that rely on embedding-based search. Its API is designed to keep the workflow short: create a client, create a collection, add records, and query for the nearest embeddings.

In [2]:
# Steps for using the Chroma database

**Steps for using the Chroma database**

1. Install Chroma
2. Create the client
3. Create the collection
4. Load data
5. Query data by nearest embedding
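
Step 5 relies on nearest-neighbour search over embeddings. As a rough illustration of the idea only, the sketch below ranks a few stored vectors by squared Euclidean distance to a query in plain Python. The names `squared_l2`, `nearest`, and `toy_vectors` are hypothetical and not part of the Chroma API, and a full scan like this is only a stand-in: a vector database would typically use an index rather than compare against every stored vector.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, vectors, n_results=2):
    """Return the n_results (id, distance) pairs closest to the query.

    `vectors` maps an id to its embedding; every entry is scanned.
    """
    scored = [(item_id, squared_l2(query, vec)) for item_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n_results]


# Toy data, made up for this illustration
toy_vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}

result = nearest([0.9, 1.2], toy_vectors)
for item_id, distance in result:
    print(item_id, round(distance, 2))
```

Here `b` is ranked first and `a` second, because the query lies closest to `[1.0, 1.0]`.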

In [3]:
%pip install chromadb


In [4]:
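import chromadb

# Client() creates an in-memory client; its data is lost when the process exits
client = chromadb.Client()

# Return the collection named "test", creating it if it does not exist yet
collection = client.get_or_create_collection("test")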

In [6]:
collection

Collection(id=55e51bbd-37f8-437a-b471-78675cec93e2, name=test)

In [7]:
# Adding data to the collection in ChromaDB

collection.add(
    embeddings=[
        [1.1, 2.3, 3.2],  # Vector representation for item 1
        [4.5, 6.9, 4.4],  # Vector representation for item 2
        [1.1, 2.3, 3.2],  # Vector representation for item 3 (same as item 1)
        [4.5, 6.9, 4.4],  # Vector representation for item 4 (same as item 2)
        [1.1, 2.3, 3.2],  # Vector representation for item 5 (same as item 1)
        [4.5, 6.9, 4.4],  # Vector representation for item 6 (same as item 2)
        [1.1, 2.3, 3.2],  # Vector representation for item 7 (same as item 1)
        [4.5, 6.9, 4.4],  # Vector representation for item 8 (same as item 2)
    ],
    metadatas=[
        {"uri": "img1.png", "style": "style1"},  # Metadata for item 1
        {"uri": "img2.png", "style": "style2"},  # Metadata for item 2
        {"uri": "img3.png", "style": "style1"},  # Metadata for item 3
        {"uri": "img4.png", "style": "style1"},  # Metadata for item 4
        {"uri": "img5.png", "style": "style1"},  # Metadata for item 5
        {"uri": "img6.png", "style": "style1"},  # Metadata for item 6
        {"uri": "img7.png", "style": "style1"},  # Metadata for item 7
        {"uri": "img8.png", "style": "style1"},  # Metadata for item 8
    ],
    documents=["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],  # Document text stored with each entry
    ids=["id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8"],  # Unique identifiers for each entry
)

# Querying the collection to find similar items

query_result = collection.query(
    query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],  # Embeddings to search for (two query vectors)
    n_results=2,  # Number of similar items to retrieve
)

# Print the result of the query
print(query_result)


{'ids': [['id7', 'id1'], ['id4', 'id2']], 'distances': [[0.0, 0.0], [11.959999084472656, 11.959999084472656]], 'metadatas': [[{'style': 'style1', 'uri': 'img7.png'}, {'style': 'style1', 'uri': 'img1.png'}], [{'style': 'style1', 'uri': 'img4.png'}, {'style': 'style2', 'uri': 'img2.png'}]], 'embeddings': None, 'documents': [['doc7', 'doc1'], ['doc4', 'doc2']], 'uris': None, 'data': None, 'included': ['metadatas', 'documents', 'distances']}
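
The `distances` values in this output are consistent with squared Euclidean (L2) distance between each query embedding and the stored embedding. The sketch below recomputes them by hand from the vectors in the cell above; the helper name `squared_l2` and the variable names are introduced here for illustration and are not part of the Chroma API.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Stored embeddings copied from the collection.add() call
stored_item_1 = [1.1, 2.3, 3.2]  # shared by id1, id3, id5, id7
stored_item_2 = [4.5, 6.9, 4.4]  # shared by id2, id4, id6, id8

# Query embeddings copied from the collection.query() call
first_query = [1.1, 2.3, 3.2]
second_query = [5.1, 4.3, 2.2]

d_first = squared_l2(first_query, stored_item_1)
d_second = squared_l2(second_query, stored_item_2)

# (5.1-4.5)^2 + (4.3-6.9)^2 + (2.2-4.4)^2 = 0.36 + 6.76 + 4.84
print(round(d_first, 2), round(d_second, 2))  # prints 0.0 11.96
```

Because several stored items share the same vector, they tie at the same distance, so which of the tied ids comes back (here `id7` and `id1`, then `id4` and `id2`) is not determined by distance alone. The trailing digits in `11.959999084472656` are presumably a result of single-precision arithmetic.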


In [9]:
# Example of querying the collection and printing the results in vertical format

query_result = collection.query(
    query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
    n_results=2,
)

# Extracting results for better readability
ids = query_result['ids']
distances = query_result['distances']
metadatas = query_result['metadatas']
documents = query_result['documents']

# Printing results in a vertical format
for i in range(len(ids)):
    print(f"Query {i + 1}:")
    print("  IDs:")
    for item_id in ids[i]:  # item_id avoids shadowing the built-in id()
        print(f"    {item_id}")
    print("  Distances:")
    for distance in distances[i]:
        print(f"    {distance:.2f}")
    print("  Metadata:")
    for meta in metadatas[i]:
        print(f"    Style: {meta['style']}, URI: {meta['uri']}")
    print("  Documents:")
    for doc in documents[i]:
        print(f"    {doc}")
    print()


Query 1:
  IDs:
    id7
    id1
  Distances:
    0.00
    0.00
  Metadata:
    Style: style1, URI: img7.png
    Style: style1, URI: img1.png
  Documents:
    doc7
    doc1

Query 2:
  IDs:
    id4
    id2
  Distances:
    11.96
    11.96
  Metadata:
    Style: style1, URI: img4.png
    Style: style2, URI: img2.png
  Documents:
    doc4
    doc2
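
The query result is a dictionary of parallel nested lists: the outer list has one entry per query embedding, and each inner list has one entry per hit. One possible alternative to index-based loops is to flatten this into one record per hit with `zip`. The helper `flatten_query_result` is hypothetical, and the `sample` dictionary is copied from the printed output above (keeping only the keys used) so that the snippet runs without a database.

```python
def flatten_query_result(result):
    """Turn a nested query result into a flat list of one dict per hit."""
    rows = []
    per_query = zip(
        result["ids"], result["distances"], result["metadatas"], result["documents"]
    )
    for query_number, (ids, distances, metadatas, documents) in enumerate(per_query, start=1):
        for item_id, distance, meta, doc in zip(ids, distances, metadatas, documents):
            rows.append(
                {
                    "query": query_number,
                    "id": item_id,
                    "distance": distance,
                    "document": doc,
                    **meta,  # metadata keys such as "style" and "uri"
                }
            )
    return rows


# Sample copied from the printed query output
sample = {
    "ids": [["id7", "id1"], ["id4", "id2"]],
    "distances": [[0.0, 0.0], [11.959999084472656, 11.959999084472656]],
    "metadatas": [
        [{"style": "style1", "uri": "img7.png"}, {"style": "style1", "uri": "img1.png"}],
        [{"style": "style1", "uri": "img4.png"}, {"style": "style2", "uri": "img2.png"}],
    ],
    "documents": [["doc7", "doc1"], ["doc4", "doc2"]],
}

rows = flatten_query_result(sample)
for row in rows:
    print(row)
```

A flat list like this is easier to sort, filter, or load into a table than the nested form.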



In [10]:
# The client and collection from the earlier cells are reused here.
# To start from scratch instead, uncomment the following lines:
# import chromadb
# client = chromadb.Client()
# collection = client.get_or_create_collection(name="my_collection")

# Add data to the collection. upsert() is used rather than add() because
# these ids were already added in an earlier cell; upsert() overwrites them.
collection.upsert(
    embeddings=[
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
    ],
    metadatas=[
        {"uri": "img1.png", "style": "style1"},
        {"uri": "img2.png", "style": "style2"},
        {"uri": "img3.png", "style": "style1"},
        {"uri": "img4.png", "style": "style1"},
        {"uri": "img5.png", "style": "style1"},
        {"uri": "img6.png", "style": "style1"},
        {"uri": "img7.png", "style": "style1"},
        {"uri": "img8.png", "style": "style1"},
    ],
    documents=["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],
    ids=["id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8"],
)

# Query the collection
query_result = collection.query(
    query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
    n_results=2,
)

# Function to print results in vertical format
def print_query_results(results):
    """Print ids, distances, metadata, and documents for each query."""
    ids = results['ids']
    distances = results['distances']
    metadatas = results['metadatas']
    documents = results['documents']

    for i in range(len(ids)):
        print(f"Query {i + 1}:")
        print("  IDs:")
        for item_id in ids[i]:  # item_id avoids shadowing the built-in id()
            print(f"    {item_id}")
        print("  Distances:")
        for distance in distances[i]:
            print(f"    {distance:.2f}")
        print("  Metadata:")
        for meta in metadatas[i]:
            print(f"    Style: {meta['style']}, URI: {meta['uri']}")
        print("  Documents:")
        for doc in documents[i]:
            print(f"    {doc}")
        print()

# Call the function to display results
print_query_results(query_result)




Query 1:
  IDs:
    id7
    id1
  Distances:
    0.00
    0.00
  Metadata:
    Style: style1, URI: img7.png
    Style: style1, URI: img1.png
  Documents:
    doc7
    doc1

Query 2:
  IDs:
    id4
    id2
  Distances:
    11.96
    11.96
  Metadata:
    Style: style1, URI: img4.png
    Style: style2, URI: img2.png
  Documents:
    doc4
    doc2

