# Test Chroma Database

[Official Website](https://docs.trychroma.com/)  
[Chroma Cookbook](https://cookbook.chromadb.dev/)

In [2]:
import chromadb
import os

DB_PATH = "../../../Database/Example/"

## Cloning Collection

In [2]:
client = chromadb.PersistentClient(
    path=os.path.join(DB_PATH, "test_cloning")
)  

In [3]:
collection = client.get_or_create_collection(
    name="test_collection",
    metadata={
        "hnsw:space": "cosine",
        "info": "This is the collection for testing cloning"
    }
)

In [5]:
collection.add(
    ids=[f"{i}" for i in range(10)],
    documents=[f"document_{i}" for i in range(10)],
    metadatas=[{"info": f"metadata_{i}"} for i in range(10)]
)

In [7]:
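# Create the collection to clone the data into

cloned_collection = client.get_or_create_collection(
    name="cloned_collection",
    metadata={
        "hnsw:space": "cosine",  # same distance metric as test_collection, so the copied embeddings are queried the same way
        "info": "This is the cloned collection from test_collection"
    }
)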

In [10]:
existing_docs = collection.count()
existing_docs, client.get_max_batch_size()   # (record count, max batch size reported by this client; 5461 on this setup)

(10, 5461)
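The batch size used for copying should stay at or below the limit reported by `get_max_batch_size()`. A minimal sketch of how the offsets and limits could be derived is shown here; `batch_windows` is a hypothetical helper name, not part of Chroma, and it works on plain integers only.

```python
def batch_windows(total, requested_batch_size, max_batch_size):
    """Yield (offset, limit) pairs that together cover `total` records.

    The effective batch size never exceeds `max_batch_size`.
    """
    if requested_batch_size < 1 or max_batch_size < 1:
        raise ValueError("batch sizes must be positive")
    batch_size = min(requested_batch_size, max_batch_size)
    for offset in range(0, total, batch_size):
        # The last window may be shorter than the batch size.
        yield offset, min(batch_size, total - offset)


# Values from the cell above: 10 records, a requested batch of 20,
# and a reported maximum of 5461.
print(list(batch_windows(10, 20, 5461)))  # [(0, 10)]
```

Each pair could then be passed as `offset` and `limit` to `collection.get(...)`, with `max_batch_size` taken from `client.get_max_batch_size()`.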

In [16]:
batch_size = 20

for i in range(0, existing_docs, batch_size):   # start, end, step
    batch = collection.get(include=["metadatas", "documents", "embeddings"], limit=batch_size, offset=i)
    cloned_collection.add(
        ids=batch["ids"],
        documents=batch["documents"],
        metadatas=batch["metadatas"],
        embeddings=batch["embeddings"]
    )

In [18]:
cloned_collection.count()

10

To use a different embedding function in the cloned collection, follow these steps:
- Create the cloned collection with the new embedding function.
- Get only `["metadatas", "documents"]` instead of `["metadatas", "documents", "embeddings"]` from the original collection.
- Add the `ids`, `documents`, and `metadatas` from the original collection to the cloned collection.
    - The embeddings are then calculated automatically by the new embedding function that was defined when the cloned collection was created.
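The steps above can be sketched as a small helper. This is only an illustration under stated assumptions: `reembed_into` is a hypothetical name, `embedding_function` stands for whichever Chroma-compatible embedding function you choose, and the helper relies only on `get_or_create_collection(name=..., embedding_function=...)`, `count()`, `get()`, and `add()` as used elsewhere in this notebook.

```python
def reembed_into(source, target_client, target_name, embedding_function,
                 batch_size=100, metadata=None):
    """Copy ids, documents, and metadatas from `source` into a new collection.

    Embeddings are deliberately NOT copied, so the target collection
    computes them with `embedding_function` when records are added.
    """
    # Step 1: create the target collection with the new embedding function.
    target = target_client.get_or_create_collection(
        name=target_name,
        metadata=metadata,
        embedding_function=embedding_function,
    )
    total = source.count()
    for offset in range(0, total, batch_size):
        # Step 2: fetch documents and metadatas only (no embeddings).
        batch = source.get(
            include=["metadatas", "documents"],
            limit=batch_size,
            offset=offset,
        )
        if not batch["ids"]:
            break
        # Step 3: add without embeddings; the target re-embeds the documents.
        target.add(
            ids=batch["ids"],
            documents=batch["documents"],
            metadatas=batch["metadatas"],
        )
    return target
```

For cloning within one database, `target_client` would be the same client that owns `source`; passing a client that points at another path would place the re-embedded collection in a different database instead.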

## Copying Collection

Copy a collection from one database to another database.

In [3]:
client_1 = chromadb.PersistentClient(
    path=os.path.join(DB_PATH, "test_copying", "1")
)  

client_2 = chromadb.PersistentClient(
    path=os.path.join(DB_PATH, "test_copying", "2")
)  

In [4]:
collection = client_1.get_or_create_collection(
    name="test_collection",
    metadata={
        "hnsw:space": "cosine",
        "info": "This is the collection for testing copying"
    }
)

In [5]:
collection.add(
    ids=[f"{i}" for i in range(10)],
    documents=[f"document_{i}" for i in range(10)],
    metadatas=[{"info": f"metadata_{i}"} for i in range(10)]
)

collection.count()

10

In [6]:
copied_collection = client_2.get_or_create_collection(
    "copied_collection",
    metadata=collection.metadata
)

In [8]:
existing_docs = collection.count()
existing_docs, client_1.get_max_batch_size()   # (record count, max batch size reported by this client; 5461 on this setup)

(10, 5461)

In [9]:
batch_size = 20

for i in range(0, existing_docs, batch_size):   # start, end, step
    batch = collection.get(
        include=["metadatas", "documents", "embeddings"], 
        limit=batch_size, 
        offset=i
    )
    copied_collection.add(
        ids=batch["ids"],
        documents=batch["documents"],
        metadatas=batch["metadatas"],
        embeddings=batch["embeddings"]
    )

In [10]:
copied_collection.count()

10
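Matching counts show that the number of records agrees, but not that the contents do. A rough sketch of a stricter check follows; `find_copy_mismatches` is a hypothetical helper, it compares documents and metadatas only (embeddings are left out to keep the comparison simple), and it assumes that `get()` without `ids` returns every record, which is fine for small test collections like this one.

```python
def find_copy_mismatches(source, target):
    """Return the sorted ids whose document or metadata differs.

    Ids present in only one of the two collections are reported as well.
    """
    def as_mapping(collection):
        # Fetch all records in one call and index them by id.
        data = collection.get(include=["documents", "metadatas"])
        return {
            record_id: (document, metadata)
            for record_id, document, metadata in zip(
                data["ids"], data["documents"], data["metadatas"]
            )
        }

    source_records = as_mapping(source)
    target_records = as_mapping(target)
    all_ids = set(source_records) | set(target_records)
    return sorted(
        record_id
        for record_id in all_ids
        if source_records.get(record_id) != target_records.get(record_id)
    )
```

Calling `find_copy_mismatches(collection, copied_collection)` after the copy loop above should return an empty list if every record arrived unchanged.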

To use a different embedding function in the copied collection, follow these steps:
- Create the copied collection with the new embedding function.
- Get only `["metadatas", "documents"]` instead of `["metadatas", "documents", "embeddings"]` from the original collection.
- Add the `ids`, `documents`, and `metadatas` from the original collection to the copied collection.
    - The embeddings are then calculated automatically by the new embedding function that was defined when the copied collection was created.