# Vector Databases CRUD with ChromaDB

In this notebook, we will learn how to perform **CRUD operations** in **ChromaDB**:

- **C → Create (Add documents)**  
- **R → Read (Query documents)**  
- **U → Update (Modify documents/metadata)**  
- **D → Delete (Remove documents)**  

We will do this **with** and **without metadata**.  
Metadata attaches extra fields (author, year, category) to each document so that results can be filtered on them.  


# Setup & Imports

We first import `chromadb` and create an in-memory client with `chromadb.Client()`; data stored this way is lost when the session ends.  
We also set up an **embedding function** (SentenceTransformer model) that converts text → vectors.


In [1]:
import chromadb
from chromadb.utils import embedding_functions

# Create client
client = chromadb.Client()

# Define embedding function (SentenceTransformer)
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create a Collection
We create a new collection named `"news_crud"`.  
Think of a collection like a table in SQL – it stores documents, embeddings, and metadata.


In [2]:
# Delete old collection if it exists (the error raised for a missing collection is ignored)
try:
    client.delete_collection("news_crud")
except Exception:
    pass

# Create new collection
collection = client.create_collection(
    name="news_crud",
    embedding_function=ef
)

print("Collection created")

Collection created


# Add Documents (Create)
We insert documents into the collection.  
Each document must have a **unique ID**.


In [3]:
collection.add(
    documents=[
        "Elon Musk founded SpaceX in 2002.",
        "Apple just released iPhone 16 Pro.",
        "Virat Kohli scored a century yesterday."
    ],
    ids=["doc1", "doc2", "doc3"]
)

print("Added 3 documents")

Added 3 documents
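
Hand-written IDs such as `"doc1"` are fine for a demo, but they are easy to reuse by accident. One possible approach, shown here as a sketch rather than anything ChromaDB requires, is to derive each ID from a hash of the document text so the same text always maps to the same ID. The helper name `make_id` and the `doc-` prefix are hypothetical.

```python
import hashlib

def make_id(text, prefix="doc"):
    """Build a deterministic ID from the document text (hypothetical helper)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:12]}"

documents = [
    "Elon Musk founded SpaceX in 2002.",
    "Apple just released iPhone 16 Pro.",
    "Virat Kohli scored a century yesterday.",
]

ids = [make_id(doc) for doc in documents]

# Different texts should produce different IDs
assert len(set(ids)) == len(ids)
print(ids)
```

The resulting list could be passed as `ids=ids` to `collection.add()`. Because the ID depends only on the text, two identical documents would collide, which may or may not be what you want.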


# Query Documents (Read)
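We query the collection with natural-language text.  
The query text is converted into an embedding → compared with the stored vectors → the `n_results` closest documents are returned, ordered by distance (smaller means more similar).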
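
The embed → compare → rank idea can be sketched without ChromaDB at all. The example below uses made-up 3-dimensional vectors (the real `all-MiniLM-L6-v2` model produces 384 values per text) and squared L2 distance, which is assumed here to match ChromaDB's default metric. The function names are hypothetical.

```python
# Toy "embeddings" written by hand; a real model would compute these from text
stored = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.1],
    "doc3": [0.0, 0.2, 0.9],
}

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_matches(query_vec, vectors, n_results=2):
    """Return the n_results closest (id, distance) pairs, nearest first."""
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]

query_vec = [0.8, 0.2, 0.0]  # pretend this is the embedding of the query text
matches = top_matches(query_vec, stored)
print(matches)
```

A vector database does the same ranking, but uses an index so it does not have to compare the query against every stored vector.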


In [4]:
results = collection.query(
    query_texts=["Who founded SpaceX?"],
    n_results=2
)

print("Query results:")
print(results)


Query results:
{'ids': [['doc1', 'doc2']], 'embeddings': None, 'documents': [['Elon Musk founded SpaceX in 2002.', 'Apple just released iPhone 16 Pro.']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[None, None]], 'distances': [[0.23058748245239258, 0.9184808731079102]]}
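
The result is a dictionary of parallel, nested lists: the outer list has one entry per query text, and the inner list has one entry per match. Note that `doc2` is returned only because `n_results=2` was requested; its distance (about 0.92) is much larger than `doc1`'s (about 0.23). The sketch below copies the relevant fields from the output above and flattens them into rows. The helper name and the 0.5 cutoff are hypothetical.

```python
# Fields copied from the query output above
results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Elon Musk founded SpaceX in 2002.", "Apple just released iPhone 16 Pro."]],
    "distances": [[0.23058748245239258, 0.9184808731079102]],
}

def rows_for_query(results, query_index=0):
    """Zip the parallel lists for one query into (id, document, distance) rows."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))

rows = rows_for_query(results)
for doc_id, doc, dist in rows:
    print(f"{doc_id}  distance={dist:.3f}  {doc}")

# Hypothetical cutoff: keep only matches closer than 0.5
MAX_DISTANCE = 0.5
close = [row for row in rows if row[2] < MAX_DISTANCE]
print(close)
```

A sensible cutoff depends on the embedding model and the data, so treat the threshold as something to tune rather than a fixed rule.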


# Update Documents
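To change an existing document, call `collection.update()` with its **ID**.  
Calling `add()` again with an ID that already exists does not overwrite the stored document (the duplicate is ignored or rejected, depending on the ChromaDB version); use `upsert()` if the ID may or may not exist yet.


In [5]:
collection.update(
    documents=["Apple unveiled iPhone 17 Ultra with AI features."],
    ids=["doc2"]   # existing ID → document and its embedding are replaced
)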

print("Updated doc2")


Updated doc2


# Delete Documents
We can delete documents using their IDs.


In [6]:
collection.delete(ids=["doc3"])
print("Deleted doc3 (Virat Kohli news)")


Deleted doc3 (Virat Kohli news)


# Peek Collection
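`peek()` returns the first few records in the collection (10 by default), including their embeddings, which is a quick way to check its current state.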


In [7]:
collection.peek()

{'ids': ['doc1', 'doc2'],
 'embeddings': array([[-6.02147216e-03,  2.66316328e-02,  1.03084885e-01,
         -3.98854241e-02, -3.02013680e-02, -5.04952632e-02,
          ... (remaining embedding values truncated)


# Metadata Example

## Create a Collection with Metadata
Now let’s use metadata (category, author, year).  
Metadata helps filter documents beyond semantic meaning.


In [8]:
# Delete old collection if it exists (the error raised for a missing collection is ignored)
try:
    client.delete_collection("news_metadata")
except Exception:
    pass

collection_meta = client.create_collection(
    name="news_metadata",
    embedding_function=ef
)

print("Metadata collection created")

Metadata collection created


## Add Documents with Metadata
We attach metadata along with each document.


In [9]:
collection_meta.add(
    documents=[
        "Elon Musk founded SpaceX in 2002.",
        "Apple just released iPhone 16 Pro.",
        "Virat Kohli scored a century yesterday."
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"category": "space", "author": "John", "year": 2002},
        {"category": "tech", "author": "Alice", "year": 2024},
        {"category": "sports", "author": "Raj", "year": 2025}
    ]
)

print("Added documents with metadata")

Added documents with metadata


## Query with Metadata Filter
The `where` argument restricts the search to documents whose metadata matches the filter.  
Example → search for `"Apple iPhone"` only among documents in the **tech** category.


In [10]:
results = collection_meta.query(
    query_texts=["Apple iPhone"],
    n_results=2,
    where={"category": "tech"}   # filter
)

print("Query results (filtered by category=tech):")
print(results)


Query results (filtered by category=tech):
{'ids': [['doc2']], 'embeddings': None, 'documents': [['Apple just released iPhone 16 Pro.']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[{'year': 2024, 'category': 'tech', 'author': 'Alice'}]], 'distances': [[0.39052993059158325]]}


## Update Metadata
`update()` can also change the metadata of an existing document, together with its text.


In [11]:
collection_meta.update(
    documents=["Apple unveiled iPhone 17 Ultra with AI features."],
    ids=["doc2"],
    metadatas=[{"category": "tech", "author": "Alice", "year": 2025}]
)

print("Updated doc2 with new metadata")


Updated doc2 with new metadata


## Delete by Metadata
Passing a `where` filter to `delete()` removes every document whose metadata matches.  
Example → delete all `"sports"` news.


In [12]:
collection_meta.delete(where={"category": "sports"})
print("Deleted all sports news")

Deleted all sports news


# Summary

- **Add (Create)** → `collection.add()`  
- **Read (Query)** → `collection.query()`  
- **Update** → `collection.update()` for existing IDs, or `collection.upsert()` to insert-or-update  
- **Delete** → `collection.delete(ids=[...])` or `collection.delete(where={...})`  
- **Metadata** → enables category/year filtering  

This extends the **Day94 notebook** by adding CRUD operations and metadata support.
