# 📝 CRUD Operations in ChromaDB

## Overview
This notebook covers **Create, Read, Update, Delete (CRUD)** operations in ChromaDB. These are essential operations for managing your vector database in real-world applications.

## What You'll Learn
- Creating collections with metadata
- Adding documents with custom metadata
- Retrieving documents by ID
- Updating existing documents and metadata
- Deleting records and collections

## Prerequisites
```bash
pip install chromadb
```

---

In [1]:
import chromadb
client = chromadb.Client()

## 1. Setup

Initialize the ChromaDB client. We'll use an in-memory client for this demo.

In [2]:
from datetime import datetime

collection = client.create_collection(
    name="reviews", 
    # embedding_function=emb_fn,  # optional custom embedding function (emb_fn is not defined here)
    metadata={
        "description": "Product reviews",
        "created": str(datetime.now())
    }  
)

### Creating a Collection with Metadata

Collections can have **metadata** - descriptive information about the collection itself:
- `description`: What the collection contains
- `created`: Timestamp for tracking
- Custom fields for your use case

In [3]:
# get_collection() returns an existing collection and raises an error if it does not exist
collection = client.get_collection("reviews")

### Retrieving an Existing Collection

Use `get_collection()` to access an existing collection by name.  
⚠️ **Note**: Raises an error if the collection doesn't exist.

---

## 2. CREATE - Adding Documents

When adding documents, you can include:
- `documents`: The text content
- `ids`: Unique identifiers for each document
- `metadatas`: Additional information for filtering (category, rating, date, etc.)

**💡 Tip**: Well-designed metadata enables powerful filtering in queries!
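
Because `add()` takes parallel lists, the entry at position *i* of `documents`, `ids`, and `metadatas` must describe the same record. Source data is often row-oriented instead, so a small conversion step can help. The helper below, `to_add_payload`, is a hypothetical name and not part of ChromaDB; it is a plain-Python sketch that assumes each record carries `id`, `text`, and `metadata` keys.

```python
from typing import Any


def to_add_payload(records: list[dict[str, Any]]) -> dict[str, list]:
    """Convert row-oriented records into the parallel lists that add() expects."""
    ids = [record["id"] for record in records]
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a single add() call")
    return {
        "ids": ids,
        "documents": [record["text"] for record in records],
        "metadatas": [record["metadata"] for record in records],
    }


records = [
    {
        "id": "r1",
        "text": "The delivery was fast and the product quality is excellent!",
        "metadata": {"product_category": "electronics", "rating": 5},
    },
    {
        "id": "r4",
        "text": "Great customer support. Resolved my issue in minutes.",
        "metadata": {"product_category": "services", "rating": 4},
    },
]

payload = to_add_payload(records)
print(payload["ids"])
# The result can then be passed on as: collection.add(**payload)
```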

In [4]:
# 1. Add Initial Customer Feedback
collection.add(
    documents=[
        "The delivery was fast and the product quality is excellent!",
        "I was not able to increase TV's brightness so I returned it back",
        "The shoes I ordered were too small. Sizing is inaccurate.",
        "Great customer support. Resolved my issue in minutes."
    ],
    ids=["r1", "r2", "r3", "r4"],
    metadatas=[
        {"product_category": "electronics", "rating": 5},
        {"product_category": "electronics", "rating": 2},
        {"product_category": "apparel", "rating": 3},
        {"product_category": "services", "rating": 4}
    ]
)

In [5]:
collection.count()

4

---

## 3. READ - Retrieving Documents

### Count Documents
Check how many documents are in the collection.

In [6]:
# peek() returns the first few records (10 by default), not necessarily the whole collection
collection.peek()

{'ids': ['r1', 'r2', 'r3', 'r4'],
 'embeddings': array([[-0.0339851 ,  0.0181215 ,  0.04055561, ..., -0.0774245 ,
          0.04827411,  0.08681256],
        [ 0.07655202, -0.03120006, -0.02235121, ...,  0.00834924,
          0.01038241, -0.00836819],
        [ 0.01316783,  0.04316622,  0.08494314, ..., -0.06392363,
         -0.11793304, -0.00748242],
        [-0.0141813 , -0.00990827,  0.02939942, ...,  0.02678237,
         -0.08970497,  0.03805254]], shape=(4, 384)),
 'documents': ['The delivery was fast and the product quality is excellent!',
  "I was not able to increase TV's brightness so I returned it back",
  'The shoes I ordered were too small. Sizing is inaccurate.',
  'Great customer support. Resolved my issue in minutes.'],
 'uris': None,
 'included': ['metadatas', 'documents', 'embeddings'],
 'data': None,
 'metadatas': [{'rating': 5, 'product_category': 'electronics'},
  {'rating': 2, 'product_category': 'electronics'},
  {'product_category': 'apparel', 'rating': 3},
  {'p

### Peek at Documents
`peek()` returns the first few records in the collection (10 by default) - useful for quick inspection.

In [7]:
collection.get(ids=['r1','r2'])

{'ids': ['r1', 'r2'],
 'embeddings': None,
 'documents': ['The delivery was fast and the product quality is excellent!',
  "I was not able to increase TV's brightness so I returned it back"],
 'uris': None,
 'included': ['metadatas', 'documents'],
 'data': None,
 'metadatas': [{'product_category': 'electronics', 'rating': 5},
  {'rating': 2, 'product_category': 'electronics'}]}

### Get Documents by ID
Retrieve specific documents using their unique IDs.

In [8]:
collection.query(
    query_texts = ["fast shipping"],
    n_results=2,
    where={"product_category": "electronics"}
)

{'ids': [['r1', 'r2']],
 'embeddings': None,
 'documents': [['The delivery was fast and the product quality is excellent!',
   "I was not able to increase TV's brightness so I returned it back"]],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'rating': 5, 'product_category': 'electronics'},
   {'rating': 2, 'product_category': 'electronics'}]],
 'distances': [[0.8920919895172119, 2.0018959045410156]]}

### Query with Filters
Combine semantic search with metadata filtering for precise results.
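
`query()` returns nested lists: one inner list per query text, which is why the output above has double brackets. The sketch below flattens the hits for a single query into rows that are easier to read or iterate over. `flatten_query_result` is a hypothetical helper, not a ChromaDB function, and the input is copied from the output shown above.

```python
def flatten_query_result(result: dict, query_index: int = 0) -> list[dict]:
    """Turn the nested lists for one query into a list of per-hit rows."""
    return [
        {
            "id": doc_id,
            "document": document,
            "metadata": metadata,
            "distance": distance,
        }
        for doc_id, document, metadata, distance in zip(
            result["ids"][query_index],
            result["documents"][query_index],
            result["metadatas"][query_index],
            result["distances"][query_index],
        )
    ]


# Result of the "fast shipping" query, copied from the notebook output.
result = {
    "ids": [["r1", "r2"]],
    "documents": [[
        "The delivery was fast and the product quality is excellent!",
        "I was not able to increase TV's brightness so I returned it back",
    ]],
    "metadatas": [[
        {"rating": 5, "product_category": "electronics"},
        {"rating": 2, "product_category": "electronics"},
    ]],
    "distances": [[0.8920919895172119, 2.0018959045410156]],
}

rows = flatten_query_result(result)
for row in rows:
    print(f'{row["id"]}  {row["distance"]:.3f}  {row["document"]}')
```

A smaller distance means a closer match, so `r1` (the review that mentions fast delivery) ranks ahead of `r2` here.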

---

## 4. UPDATE - Modifying Documents

Update existing documents by their ID. You can change:
- The document text (which re-generates the embedding)
- The metadata

**Use Case**: A customer updated their review after their issue was resolved.

In [10]:
collection.update(
    documents=["The shoes I ordered were replaced with a larger size. Fits perfectly now!"],
    ids=["r3"],
    metadatas=[
        {"product_category": "apparel", "rating": 5}  # Rating improved after resolution
    ]
)

In [11]:
# Query Again to Verify Update
query_results = collection.query(
    query_texts=["shoes size"],
    n_results=2,
    where={"product_category": "apparel"}
)
query_results

{'ids': [['r3']],
 'embeddings': None,
 'documents': [['The shoes I ordered were replaced with a larger size. Fits perfectly now!']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'rating': 5, 'product_category': 'apparel'}]],
 'distances': [[0.6437578201293945]]}

### Verify the Update
Query again to confirm the document was updated successfully.

---

## 5. DELETE - Removing Data

### Delete Specific Records
Remove documents by their IDs. Useful for:
- Removing outdated information
- GDPR compliance (right to be forgotten)
- Data cleanup

In [12]:
collection.delete(ids=["r1","r2","r3"])

### Delete Entire Collection
⚠️ **Warning**: This permanently removes the collection and all its data!

In [13]:
client.delete_collection(name='reviews')

---

## 📝 Summary

| Operation | Method | Description |
|-----------|--------|-------------|
| **Create** | `collection.add()` | Add new documents with metadata |
| **Read** | `collection.get()`, `peek()`, `query()` | Retrieve documents |
| **Update** | `collection.update()` | Modify existing documents |
| **Delete** | `collection.delete()` | Remove specific documents |
| **Delete Collection** | `client.delete_collection()` | Remove entire collection |