# Notebook: Vector Store Proof of Concept

This notebook demonstrates why a vector store like ChromaDB was chosen. Beyond simple key-value storage, it supports **semantic similarity search**: retrieving items by meaning rather than by exact keyword match.

**Goal:**
1.  Create a temporary, in-memory ChromaDB instance.
2.  Add several related and unrelated tasks.
3.  Perform a natural language query against the database.
4.  Verify that the most semantically relevant tasks are returned.

This capability is the foundation for future features like finding duplicate tasks or retrieving contextually related information.
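
To make "semantic similarity" concrete before touching ChromaDB, here is a minimal sketch of ranking items by cosine similarity. The three-dimensional vectors are hand-made, hypothetical values chosen for illustration. They are not output from any embedding model, and the distance metric ChromaDB actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings": the two report tasks point in a similar
# direction, while the dentist task points somewhere else entirely.
toy_embeddings = {
    "marketing report": [0.9, 0.1, 0.0],
    "financial report": [0.8, 0.3, 0.1],
    "dentist appointment": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.2, 0.05]  # stands in for an embedded report-related query

# Rank the items from most to least similar to the query vector.
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(query_vector, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

A vector store does the same kind of ranking, except that the vectors come from an embedding model and the nearest-neighbor lookup is indexed rather than a full sort.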

In [2]:
import sys
sys.path.append('..')

import chromadb
from src.models.task import Task

# Use an in-memory instance for this POC to avoid creating files
client = chromadb.Client()
collection = client.get_or_create_collection(name="poc_tasks")

print("✅ In-memory ChromaDB collection created.")

✅ In-memory ChromaDB collection created.


In [3]:
from datetime import datetime

# Create sample tasks, being explicit with optional fields to satisfy the linter.
tasks = [
    Task(title="Finalize the quarterly marketing report", category="Work", description=None, due_date=None),
    Task(title="Prepare slides for the project presentation", category="Work", description=None, due_date=None),
    Task(title="Schedule a dentist appointment", category="Personal", description=None, due_date=None),
    Task(title="Review the Q3 financial report", category="Work", description=None, due_date=None),
]

# Add tasks to the collection
for task in tasks:
    # Step 1: Convert Pydantic model to a dictionary, excluding None values
    metadata = task.model_dump(exclude_none=True)

    # Step 2: Convert any datetime objects into ISO 8601 strings
    for key, value in metadata.items():
        if isinstance(value, datetime):
            metadata[key] = value.isoformat()
    
    # Step 3: Add the fully compliant metadata to the collection
    collection.add(
        ids=[task.id],
        documents=[f"{task.title} {task.description or ''}".strip()],  # strip() avoids a trailing space when there is no description
        metadatas=[metadata]
    )

print(f"✅ Added {len(tasks)} tasks to the collection.")

✅ Added 4 tasks to the collection.
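
The metadata clean-up in the loop above could be factored into a small helper. This is a sketch only, assuming that Chroma metadata values must be simple scalars (strings, numbers, booleans), which is the reason the loop converts datetimes. The name `to_chroma_metadata` and the `priority` field are hypothetical and do not come from the project.

```python
from datetime import date, datetime

# Assumption: only these scalar types are accepted as metadata values.
SCALAR_TYPES = (str, int, float, bool)


def to_chroma_metadata(raw):
    """Return a copy of `raw` containing only scalar-friendly values."""
    clean = {}
    for key, value in raw.items():
        if value is None:
            continue  # drop missing fields, mirroring exclude_none=True
        if isinstance(value, date):  # datetime is a subclass of date
            clean[key] = value.isoformat()
        elif isinstance(value, SCALAR_TYPES):
            clean[key] = value
        else:
            clean[key] = str(value)  # fallback for UUIDs, enums, and similar
    return clean


example = to_chroma_metadata(
    {
        "title": "Review the Q3 financial report",
        "due_date": datetime(2024, 9, 30, 17, 0),
        "description": None,
        "priority": 2,  # hypothetical field, for illustration only
    }
)
print(example)
```

Unlike the in-place loop, the helper builds a new dictionary, so the original data is left untouched. If `Task` is a Pydantic v2 model, `task.model_dump(mode="json", exclude_none=True)` should also produce ISO 8601 strings for datetime fields, which may make a separate conversion step unnecessary.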


## Performing the Semantic Search
Now, let's ask a question in natural language and see which tasks the vector store considers most relevant.

In [4]:
# Ask a question in natural language. The query text is illustrative;
# any report-related phrasing should exercise the same behavior.
query_text = "What reports do I need to work on?"

# Chroma embeds the query text and returns the nearest stored documents.
results = collection.query(
    query_texts=[query_text],
    n_results=2,
)

# Results hold one list per query, so index [0] selects our single query.
print(f"Query: {query_text!r}")
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"  distance={distance:.4f}  {doc}")


### Conclusion

The query identified "Finalize the quarterly marketing report" and "Review the Q3 financial report" as the most relevant tasks, even though its wording does not match theirs exactly, and it did not return the unrelated dentist appointment task.

This shows that a vector store can retrieve tasks by meaning rather than by keyword, which makes it a solid foundation for more advanced agent capabilities such as duplicate detection and contextual retrieval.