## RAG with PostgreSQL as a Vector Database

In this example, we'll use the **Ollama** distribution as a base template. First, build a new distribution from the `ollama` template:

```bash
llama stack build --template ollama --image-type venv --image-name vllm-test
```

Then copy the generated run configuration into your project directory:

```bash
cp ~/.llama/distributions/ollama/ollama-run.yaml /path-to-current-project/ollama-postgresql.yaml
```

### Add PostgreSQL as a `vector_io` Provider

Extend the configuration file by adding a new `vector_io` provider block using PostgreSQL:

> ⚠️ **Warning**: Do **not** remove the existing SQLite configuration. Llama Stack uses it for internal metadata storage. Removing it will cause the system to fail. Keep both PostgreSQL and SQLite configurations.

```yaml
vector_io:
  - config:
      kvstore:
        type: postgres
        name: postgres_kv
        host: localhost
        port: 5432
        database: llamastack
        user: postgres
        password: postgres
    provider_id: pgvector
    provider_type: remote::pgvector
```
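To confirm that the edited file still contains every provider you expect, a quick check can help. The sketch below is a hypothetical helper (`summarize_providers` is not part of Llama Stack) that takes a list of provider entries in the shape shown above and reports each entry's ID, type, and key-value store backend. It assumes you have already parsed the YAML into Python objects, for example with PyYAML's `yaml.safe_load`, and that you pass in the `vector_io` list from your own file.

```python
from typing import Any, Dict, List, Tuple


def summarize_providers(entries: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (provider_id, provider_type, kvstore type) for each provider entry."""
    summary = []
    for entry in entries:
        # Tolerate a missing or empty `config` / `kvstore` section.
        config = entry.get("config") or {}
        kvstore = config.get("kvstore") or {}
        summary.append(
            (
                entry.get("provider_id", "<missing>"),
                entry.get("provider_type", "<missing>"),
                kvstore.get("type", "<none>"),
            )
        )
    return summary


# Example input mirroring the YAML block above (values are illustrative).
example = [
    {
        "provider_id": "pgvector",
        "provider_type": "remote::pgvector",
        "config": {"kvstore": {"type": "postgres", "host": "localhost", "port": 5432}},
    }
]

for provider_id, provider_type, kv_type in summarize_providers(example):
    print(f"{provider_id}: {provider_type} (kvstore: {kv_type})")
```

Running it against your real `vector_io` list should show the `pgvector` entry alongside any providers that were already in the template.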

### Add PostgreSQL Configuration to the Agent Provider

Also add PostgreSQL as the persistence layer for the agent provider:

```yaml
agents:
  - config:
      persistence_store:
        type: postgres
        name: postgres_kv
        host: localhost
        port: 5432
        database: llamastack
        user: postgres
        password: postgres
    provider_id: meta-reference
    provider_type: inline::meta-reference
```

## Running PostgreSQL with pgvector

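You’ll need a PostgreSQL instance with the **pgvector** extension enabled. You can use your own, or run a preconfigured container as shown below. The command mounts three init scripts (`01-init-users.sql`, `02-init-db.sql`, and `03-extensions.sql`) from the current directory, so they must exist before you start the container. Note that `POSTGRES_HOST_AUTH_METHOD=trust` disables password checks, which is suitable only for local development: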

```bash
podman run --name customvector \
  -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v $(pwd)/01-init-users.sql:/docker-entrypoint-initdb.d/01-init-users.sql:Z \
  -v $(pwd)/02-init-db.sql:/docker-entrypoint-initdb.d/02-init-db.sql:Z \
  -v $(pwd)/03-extensions.sql:/docker-entrypoint-initdb.d/03-extensions.sql:Z \
  -p 5432:5432 \
  -d pgvector/pgvector:pg17
```
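Before starting Llama Stack, it can be useful to confirm that the container is actually accepting connections. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, and the default host and port simply mirror the values used in this example. It only checks that the TCP port is reachable, not that the database or the pgvector extension is set up correctly.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("PostgreSQL reachable:", port_is_open())
```

If this prints `False`, check that the container is running (`podman ps`) and that port 5432 is not already in use by another service.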

## Register and Use the Vector Database in Your Code

You can now register the vector database using Llama Stack as a library client:

```python
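import uuid

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient('./ollama-postgresql.yaml')
client.initialize()

# Pick the first embedding model the stack exposes and generate a unique ID
embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
embedding_model = embed_lm.identifier
vector_db_id = f"v{uuid.uuid4().hex}"

# Register the vector database with the pgvector provider defined in the YAML
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    provider_id="pgvector",
    embedding_model=embedding_model,
)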
```

## What Happens in PostgreSQL

After running the notebook or script, two new tables will be created in your PostgreSQL database:

- **`metadata_store`**: Contains metadata about registered vector databases (e.g., model, dimension, provider ID).
- **`vector_store_xxxx`**: Stores document chunks and embeddings. The suffix `xxxx` corresponds to the vector DB ID you registered. These embeddings are used by Llama Stack to retrieve the most relevant chunks during the RAG process.
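To check that these tables exist, you can query PostgreSQL's `information_schema`. The sketch below is a hypothetical helper that accepts any DB-API connection using the `%s` parameter style (such as one from `psycopg2`, which is assumed to be installed separately). It also assumes the tables live in the `public` schema; adjust `schema` if your setup differs.

```python
from typing import List


def list_llama_stack_tables(conn, schema: str = "public") -> List[str]:
    """Return the tables in `schema` named like the ones described above."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT table_name FROM information_schema.tables WHERE table_schema = %s",
            (schema,),
        )
        names = [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
    # Filter in Python to avoid escaping LIKE wildcards in the SQL string.
    return sorted(
        n for n in names if n == "metadata_store" or n.startswith("vector_store_")
    )


# Usage (assumes psycopg2 and the connection settings from this example):
#   import psycopg2
#   conn = psycopg2.connect(host="localhost", port=5432, dbname="llamastack",
#                           user="postgres", password="postgres")
#   print(list_llama_stack_tables(conn))
#   conn.close()
```

An empty result after a successful run usually means the script connected to a different database or schema than the one you are inspecting.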


```python
import os
import uuid

from llama_stack_client import Agent, AgentEventLogger
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from llama_stack_client.types import Document

model = "meta-llama/Llama-3.2-3B-Instruct"
os.environ["INFERENCE_MODEL"] = model
client = LlamaStackAsLibraryClient('./ollama-postgresql.yaml')
client.initialize()

# Create a vector database instance
embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
embedding_model = embed_lm.identifier
vector_db_id = f"v{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    provider_id="pgvector",
    embedding_model=embedding_model,
)

# Create Documents
urls = [
    "memory_optimizations.rst",
    "chat.rst",
    "llama3.rst",
    "qat_finetune.rst",
    "lora_finetune.rst",
]
documents = [
    Document(
        document_id=f"num-{i}",
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
]

# Insert documents
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Get the model being served
llm = next(m for m in client.models.list() if m.model_type == "llm")
model = llm.identifier

# Create the RAG agent
rag_agent = Agent(
    client,
    model=model,
    instructions="You are a helpful assistant. Use the RAG tool to answer questions as needed.",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": [vector_db_id]},
        }
    ],
)

session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}")

turns = ["what is torchtune", "tell me about dora"]

for t in turns:
    print("user>", t)
    stream = rag_agent.create_turn(
        messages=[{"role": "user", "content": t}], session_id=session_id, stream=True
    )
    for event in AgentEventLogger().log(stream):
        event.print()
```