
## Updated full-stack template walkthrough

This lab now uses a Postgres-backed stack managed through Alembic migrations and Docker Compose.

- **Database + Alembic**: `backend/alembic/env.py` and `backend/alembic/versions/20250212_initial.py` create tables for echo retries, planner runs, course resources, and RAG document chunks. The new `20250529_agent_runs.py` migration adds the `agent_runs` table for persisting agent executions. Run migrations with `alembic upgrade head` (the backend container executes this automatically on start).
- **FastAPI wiring**: `backend/app/database.py` exposes a SQLAlchemy `SessionLocal` session factory, and routes receive a per-request session through the `get_db` dependency. Routers like `app/routers/echo.py`, `app/routers/planner.py`, `app/routers/resources.py`, and `app/routers/agent.py` read and write real rows instead of in-memory mocks.
- **Docker Compose + Nginx**: `docker-compose.yml` now launches Postgres, FastAPI (with migrations), the Vite dev server, and an Nginx reverse proxy (`nginx/default.conf`) that fronts both the API (`/api`) and frontend assets.
- **RAG chatbot + Agent**: `app/services/rag.py` indexes seeded `DocumentChunk` rows, while `app/services/chatbot.py` blends retrieval with Gemini when `GEMINI_API_KEY` is configured. The release readiness agent also uses RAG to retrieve relevant documentation.
- **AI-powered Agent**: The release readiness agent now integrates Gemini for generating strategic insights and AI-powered recommendations, FAISS for RAG retrieval, and persists all runs to the database for auditing.

### How to run the stack with Docker Compose

1. `cd ai-web`
2. `docker compose up --build`
3. Open http://localhost:8080 to reach Nginx. API requests are proxied to FastAPI at `/api`. Postgres data lives in the `db_data` volume.

### Creating and applying new migrations

1. Enter the backend container: `docker compose exec backend bash`
2. Generate a migration: `alembic revision -m "describe change" --autogenerate`
3. Apply migrations: `alembic upgrade head`

### Testing the new features

- **Echo retry + persistence**: Submit the echo form; the "Recent echo attempts" list should update from the `echo_attempts` table.
- **Resource Board**: Add a URL in the Resource Board. Refreshing the page keeps entries thanks to the `resources` table.
- **Planner + history**: Generate a plan in the Planner panel; the newest plan appears in the history list powered by the `plan_runs` table.
- **Release Readiness Agent**: Select a feature, run the agent, and see AI-powered recommendations with Gemini insights, RAG-retrieved context, and historical runs from the `agent_runs` table.

Refer to the updated source files when walking through the lab so students can trace how migrations, database sessions, and the React UI connect end to end.



## Why Nginx, Alembic, and FAISS matter for this stack

**Nginx reverse proxy**
- *What it is*: A lightweight web server that terminates HTTP, proxies requests, and serves static assets.
- *How we integrated it*: `docker-compose.yml` launches an `nginx` service with `nginx/default.conf` routing `/api/` to FastAPI (port 8000) and everything else to the Vite dev server (port 5173). This keeps the release readiness endpoint (`/api/ai/release-readiness`) and the React UI on a single origin behind http://localhost:8080.
- *Why we adopted it*: A unified front door eliminates CORS headaches and mirrors production topologies where a reverse proxy fronts both the SPA and API.
- *If we removed it*: Students would juggle multiple ports, cross-origin browser `fetch` calls would fail unless CORS were configured on the API, and any deployed version would need a different networking story than the local lab.

**Alembic migrations**
- *What it is*: A schema migration tool for SQLAlchemy projects so database changes are reproducible and versioned.
- *How we integrated it*: `backend/alembic/env.py` reads `DATABASE_URL`, and migrations create tables for echo retries, plan runs, resources, RAG document chunks, and now agent runs. The backend container runs `python -m alembic upgrade head` before starting Uvicorn, so every boot applies the latest schema and seeds data.
- *Why we adopted it*: Planner history, the resources board, RAG context for the chatbot, and agent execution history all live in Postgres. Alembic keeps the schema in sync across teammates so future agent runs can safely persist outputs or read existing context without surprises.
- *If we removed it*: Each developer would run ad hoc SQL by hand, schema changes would go untracked, and any endpoint touching `plan_runs`, `resources`, `document_chunks`, or `agent_runs` would fail at runtime on a database missing those tables.

**FAISS retrieval**
- *What it is*: A fast similarity search library for vector embeddings, used here for lightweight RAG.
- *How we integrated it*: `app/services/rag.py` normalizes document embeddings and loads them into a FAISS `IndexFlatL2`; both the chatbot service and the release readiness agent call `build_retriever()` to fetch the top context chunks before generating an answer or recommendations.
- *Why we adopted it*: It gives agents a deterministic context window without external APIs, so answers stay grounded in the course docs even offline.
- *If we removed it*: The chatbot and agent would fall back to generic responses with no retrieved evidence, reducing accuracy and making the lab less realistic.

### Agent foundation at a glance
- **Tools**: `app/services/agent_tools.py` defines deterministic product briefs, launch windows, support contacts, and SLO watch items so the agent has stable inputs.
- **RAG Integration**: The agent retrieves relevant documentation using FAISS before generating recommendations.
- **Gemini AI**: When `GEMINI_API_KEY` is configured, the agent uses Gemini to generate strategic insights and AI-powered recommendations.
- **Database Persistence**: All agent runs are saved to the `agent_runs` table for auditing and learning.
- **Orchestration**: `run_release_readiness_agent()` in `app/services/agent.py` sequences tool calls, retrieves RAG context, calls Gemini, builds a plan, persists the run, and returns structured recommendations with tool traces.
- **API surface**: `app/routers/agent.py` exposes `/ai/release-readiness`, `/ai/history`, and `/ai/features`, keeping the router thin and delegating all logic to the service layer.


## Deep Dive: FAISS, Embeddings, and RAG Integration

### Understanding Embeddings

**What are embeddings?**
Embeddings are numerical representations of text (or other data) that capture semantic meaning. Think of them as coordinates in a high-dimensional space where semantically similar texts are positioned close together.

**Why do we need embeddings?**
- **Semantic search**: Find documents by meaning, not just keyword matching
- **Similarity comparison**: Determine how similar two pieces of text are
- **Efficient retrieval**: Search through millions of documents quickly

**How embeddings work:**
1. **Text → Vector**: Convert text into a fixed-size numerical vector (e.g., 256 or 768 dimensions)
2. **Semantic proximity**: Similar meanings produce similar vectors
3. **Distance metrics**: Use cosine similarity or L2 distance to find similar vectors

**Example:**
```python
# Simplified conceptual example (vectors truncated for display)
embed("FastAPI tutorial")  # → [0.2, 0.8, ..., 0.5]  (256 numbers)
embed("FastAPI guide")     # → [0.3, 0.7, ..., 0.4]  very similar!
embed("Pizza recipe")      # → [0.9, 0.1, ..., 0.2]  very different
```
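To make "close together" concrete, cosine similarity can be computed directly. This is a minimal NumPy sketch that uses made-up 3-dimensional stand-ins for the truncated vectors above; real embeddings would have 256 or more dimensions.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 3-dimensional stand-ins for the embeddings above (illustrative values only)
tutorial = np.array([0.2, 0.8, 0.5])
guide = np.array([0.3, 0.7, 0.4])
pizza = np.array([0.9, 0.1, 0.2])

sim_related = cosine_similarity(tutorial, guide)
sim_unrelated = cosine_similarity(tutorial, pizza)
print(f"tutorial vs guide: {sim_related:.3f}")
print(f"tutorial vs pizza: {sim_unrelated:.3f}")
```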

### What is FAISS?

**FAISS (Facebook AI Similarity Search)** is a library developed by Meta for efficient similarity search and clustering of dense vectors.

**Key features:**
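- **Speed**: Exact search over small collections takes milliseconds, and approximate indexes scale to collections of billions of vectors
- **Scalability**: Works on CPU and GPU
- **Flexibility**: Multiple index types for different use cases
- **Memory efficiency**: Compressed index types (e.g., product quantization) shrink the memory footprint of large collections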

**Common FAISS index types:**
1. **IndexFlatL2**: Exact search using L2 (Euclidean) distance
   - Most accurate, good for small datasets
   - Our implementation uses this for simplicity
2. **IndexFlatIP**: Exact search using inner product, which equals cosine similarity when vectors are L2-normalized
   - Good when vectors are normalized
3. **IndexIVFFlat**: Approximate search with inverted file index
   - Faster for large datasets, slight accuracy tradeoff
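Both flat indexes perform the same brute-force scan and differ only in the score. The NumPy sketch below (not FAISS itself, and using random hypothetical vectors) mirrors what `IndexFlatL2` and `IndexFlatIP` compute, and shows why the choice does not change results for unit-length vectors like ours: when both vectors are normalized, ||a - b||^2 = 2 - 2(a · b), so the smallest squared L2 distances and the largest inner products select the same neighbors.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def normalize(rows: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)


# Hypothetical corpus of 100 unit-length 256-d vectors plus one query
docs = normalize(rng.normal(size=(100, 256)))
query = normalize(rng.normal(size=(1, 256)))[0]

# What IndexFlatL2 computes: squared Euclidean distance (lower = closer)
sq_l2 = np.sum((docs - query) ** 2, axis=1)
# What IndexFlatIP computes: inner product (higher = closer)
inner = docs @ query

k = 5
top_by_l2 = np.argsort(sq_l2)[:k]
top_by_ip = np.argsort(-inner)[:k]
print("L2 top-k:", top_by_l2)
print("IP top-k:", top_by_ip)
```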

### How RAG (Retrieval-Augmented Generation) Works

**RAG combines retrieval and generation:**

```
┌─────────────────────────────────────────────────────┐
│  User Query: "How do I deploy the agent?"           │
└───────────────────┬─────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────────────┐
│  Step 1: Embed Query                                │
│  query_vector = embed("How do I deploy the agent?") │
└───────────────────┬─────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────────────┐
│  Step 2: FAISS Search                               │
│  Find top 3 most similar document chunks            │
│  Results: [doc1 (score: 0.23), doc2 (0.45), ...]    │
└───────────────────┬─────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────────────┐
│  Step 3: Build Context                              │
│  Combine retrieved chunks into a context string     │
└───────────────────┬─────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────────────┐
│  Step 4: Generate with AI                           │
│  Pass context + query to Gemini for answer          │
│  Result: Grounded, accurate response                │
└─────────────────────────────────────────────────────┘
```

**Why RAG is powerful:**
- **Grounding**: AI responses are backed by actual documents
- **Up-to-date**: Add new documents without retraining
- **Transparency**: Show which documents were used
- **Reduced hallucination**: AI has factual context to work with
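Step 3 of the diagram (building the context) is plain string assembly. The sketch below is hypothetical: `Chunk` and `build_rag_prompt` are illustrative names, not code from this repo, and the sample chunk reuses text from this guide. Numbering each chunk and keeping its source lets the model cite what it used, which supports the transparency point above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk (content, source, distance score)."""
    content: str
    source: str
    score: float


def build_rag_prompt(question: str, chunks: list[Chunk]) -> str:
    """Combine retrieved chunks and the user question into a single grounded prompt."""
    if not chunks:
        context = "(no relevant documents were retrieved)"
    else:
        context = "\n\n".join(
            f"[{i}] (source: {c.source})\n{c.content}" for i, c in enumerate(chunks, start=1)
        )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt(
    "How do I deploy the agent?",
    [Chunk("Run `docker compose up --build` in the ai-web directory.", "docs/agent", 0.23)],
)
print(prompt)
```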

### Our RAG Implementation in `app/services/rag.py`

Let's break down how our implementation works:

#### 1. Creating Embeddings (Deterministic)

```python
import zlib

import numpy as np

EMBED_DIM = 256
# `_tokenize` (a helper in rag.py, not shown) splits text into word tokens


def embed_text(text: str) -> np.ndarray:
    """Create a deterministic hashed embedding without external dependencies."""
    vector = np.zeros(EMBED_DIM, dtype="float32")  # 256 dimensions
    for token in _tokenize(text):  # Split text into words
        # crc32 maps each word to a stable dimension (built-in hash() is salted per process)
        vector[zlib.crc32(token.encode("utf-8")) % EMBED_DIM] += 1.0
    norm = np.linalg.norm(vector)
    if norm:
        vector /= norm  # Normalize to unit length
    return vector
```

**What's happening:**
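- Uses a **bag-of-words** approach with hashing (the "hashing trick"), so similarity reflects shared words rather than meaning: synonyms such as "launch" and "release" do not match
- Each word maps to a dimension via a hash function
- Normalizes the vector so all vectors have length 1
- **Why deterministic?** Same text always produces the same embedding (no API calls, offline-friendly). This only holds if the hash is stable across processes: Python's built-in `hash()` for strings is randomized per process unless `PYTHONHASHSEED` is set, so embeddings stored at migration time would not match query embeddings computed later by the API. A stable hash such as `zlib.crc32` avoids this.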
- **Production alternative:** Use real embedding models like `text-embedding-004` from Gemini or OpenAI's embeddings
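The self-contained sketch below rebuilds the same hashing idea so you can check the lexical behavior yourself. The regex tokenizer is an assumption (the real `_tokenize` in `rag.py` may differ). "deploy the agent" compared with itself scores 1.0, while a paraphrase with different words only matches on the shared token "the" plus any accidental bucket collisions.

```python
import re
import zlib

import numpy as np

EMBED_DIM = 256


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric runs
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> np.ndarray:
    """Hashing-trick embedding: each token increments one of EMBED_DIM buckets."""
    vector = np.zeros(EMBED_DIM, dtype="float32")
    for token in tokenize(text):
        # crc32 gives the same bucket in every process, unlike the salted built-in hash()
        vector[zlib.crc32(token.encode("utf-8")) % EMBED_DIM] += 1.0
    norm = np.linalg.norm(vector)
    return vector / norm if norm else vector


def similarity(a: str, b: str) -> float:
    # Vectors are unit length, so the dot product is the cosine similarity
    return float(embed(a) @ embed(b))


for other in ["deploy the agent", "ship the bot to production"]:
    print(f"'deploy the agent' vs '{other}': {similarity('deploy the agent', other):.2f}")
```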

#### 2. Building the FAISS Index

```python
class Retriever:
    def __init__(self, chunks: Sequence[DocumentChunk]):
        self.chunks = list(chunks)
        if not self.chunks:
            self.index = None
            return

        # Create FAISS index for L2 (Euclidean) distance
        self.index = faiss.IndexFlatL2(EMBED_DIM)  # 256 dimensions
        
        # Stack all embeddings into a numpy array
        embeddings = np.stack([
            np.array(chunk.embedding, dtype="float32") 
            for chunk in self.chunks
        ])
        
        # Add vectors to the index (a flat index stores them as-is for brute-force search)
        self.index.add(embeddings)
```

**What's happening:**
- Creates a FAISS index that uses **L2 distance** (Euclidean distance)
- Loads all pre-computed embeddings from database
- Adds them to FAISS index for fast searching

#### 3. Searching for Relevant Context

```python
def search(self, query: str, k: int = 3) -> list[RetrievedContext]:
    if self.index is None or not self.chunks:
        return []

    # Embed the query
    query_vector = np.expand_dims(embed_text(query), axis=0)
    
    # Search FAISS index for k nearest neighbors
    distances, indices = self.index.search(query_vector, min(k, len(self.chunks)))
    
    # Build results with content, source, and score
    results: list[RetrievedContext] = []
    for score, idx in zip(distances[0], indices[0]):
        if idx == -1:
            continue
        chunk = self.chunks[idx]
        results.append(
            RetrievedContext(
                content=chunk.content,
                source=chunk.source,
                score=float(score)  # Squared L2 distance: lower means more similar
            )
        )
    return results
```

**What's happening:**
1. **Embed the query** using the same function as documents
2. **Search FAISS** for the k closest vectors
3. **Return results** with the actual content, source, and distance score
4. **Lower score = more similar** (FAISS `IndexFlatL2` reports squared L2 distance)
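Because `embed_text` returns unit-length vectors, a squared L2 distance d² from `IndexFlatL2` maps directly to cosine similarity: cos = 1 - d²/2. A small hypothetical helper (not part of `rag.py`) could use this to report a friendlier score or drop weak matches. The 0.2 threshold and the sample sources other than `docs/agent` are arbitrary illustrative values.

```python
def squared_l2_to_cosine(squared_distance: float) -> float:
    """For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d^2 / 2."""
    return 1.0 - squared_distance / 2.0


def filter_by_similarity(
    results: list[tuple[str, float]], min_cosine: float = 0.2
) -> list[tuple[str, float]]:
    """Keep (source, squared_distance) pairs whose cosine similarity meets the threshold."""
    return [
        (source, dist) for source, dist in results if squared_l2_to_cosine(dist) >= min_cosine
    ]


# Hypothetical search hits: (source, squared L2 distance)
hits = [("docs/agent", 0.23), ("docs/planner", 1.2), ("docs/misc", 1.9)]
print(filter_by_similarity(hits))
```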

#### 4. Integration in the Agent

The agent uses RAG in `app/services/agent.py`:

```python
# Build retriever from database chunks
retriever = build_retriever(db)

# Create search query combining feature name and context
search_query = f"{brief.name} {context.audience_role} release launch"

# Retrieve top 3 relevant chunks
rag_contexts = retriever.search(search_query, k=3)

# Pass to Gemini for AI-powered insights
gemini_insight, ai_recommendations = _generate_gemini_insight(
    brief, launch_window, slo_watch_items, rag_contexts, context
)
```

**The flow:**
1. **Load** all document chunks from PostgreSQL
2. **Build** FAISS index from embeddings
3. **Search** for relevant context based on feature + audience
4. **Pass** retrieved context to Gemini along with other data
5. **Generate** AI-powered recommendations grounded in documentation

### Why This Matters for Production

**Without RAG:**
```
User: "How do I deploy the agent?"
AI: "You can deploy using Docker or Kubernetes..."  # Generic answer, might be wrong
```

**With RAG:**
```
User: "How do I deploy the agent?"
System: [Retrieves actual deployment docs]
AI: "According to the deployment guide, run `docker compose up --build` 
     in the ai-web directory. The agent will be available at localhost:8080."
     # Accurate, grounded in actual docs
```

**Key benefits:**
- **Accuracy**: Answers come from verified documentation
- **Auditability**: Can show which docs were used
- **Maintainability**: Update docs without retraining AI
- **Cost-effective**: No need to fine-tune models

### Performance Considerations

**Current implementation:**
- **Index type**: IndexFlatL2 (exact search)
- **Embedding method**: Deterministic hashing
- **Dataset size**: Dozens of chunks (course docs)
- **Speed**: Milliseconds per search

**For production scale:**
- **Larger datasets** (millions of chunks): Use `IndexIVFFlat` or `IndexHNSW`
- **Better embeddings**: Switch to `text-embedding-004` from Gemini
- **Caching**: Cache frequent queries to avoid repeated searches
- **Reranking**: Use a second model to rerank retrieved results
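The caching idea can be sketched with `functools.lru_cache`. This is a hypothetical wrapper, not existing code: `fake_search` stands in for `Retriever.search`, and results are returned as tuples because cached values are shared between callers and should not be mutated. A real cache would also need `cache_clear()` whenever document chunks change.

```python
from functools import lru_cache

search_calls = 0  # Counts how often the underlying search actually runs


def fake_search(query: str, k: int) -> list[str]:
    """Stand-in for an expensive vector search (hypothetical)."""
    global search_calls
    search_calls += 1
    return [f"chunk for '{query}' #{i}" for i in range(k)]


@lru_cache(maxsize=256)
def cached_search(query: str, k: int = 3) -> tuple[str, ...]:
    # Identical (query, k) arguments are answered from the cache
    return tuple(fake_search(query, k))


cached_search("deploy the agent")
cached_search("deploy the agent")  # Served from the cache
print(search_calls, cached_search.cache_info())
```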


# Lab 05 · AI-Powered Release Readiness Agent

*This lab notebook provides guided steps. All commands are intended for local execution.*

## Objectives
- Build a production-grade release readiness agent that integrates with Gemini AI, FAISS RAG, and database persistence.
- Provide credible tool abstractions for product briefs, launch windows, and stakeholder contacts.
- Expose `/ai/release-readiness`, `/ai/history`, and `/ai/features` endpoints that return structured recommendations.
- Persist agent runs to the database for auditing and learning from past executions.

In this lab, you will build a **production-grade AI-powered agent**. Along the way, you will:

1. **Model an agent workflow**: Learn how agents orchestrate multiple tools, RAG retrieval, and AI calls to accomplish complex tasks
2. **Build tool abstractions**: Create reusable functions that agents can call to gather information
3. **Integrate AI capabilities**: Use Gemini to generate intelligent insights and recommendations
4. **Implement RAG**: Retrieve relevant documentation using FAISS to ground agent responses
5. **Persist agent runs**: Store execution history in PostgreSQL for auditing and learning
6. **Structure agent responses**: Return well-formatted JSON with AI insights, recommendations, and tool traces


## What you will learn
- Integrating Gemini AI for generating strategic insights and recommendations.
- Using FAISS-backed RAG to retrieve relevant documentation.
- Persisting agent runs to PostgreSQL for auditing and compliance.
- Coordinating multiple service helpers inside an agent workflow.
- Returning Pydantic models that the frontend can render without extra parsing.
- Logging tool invocations to aid observability and debugging.

By the end of this lab, you will understand:

### Core Concepts
- **Agent Architecture**: How agents break down complex tasks into smaller tool calls, RAG retrieval, and AI generation
- **Tool Design**: Creating focused functions that do one thing well
- **RAG Integration**: Retrieving relevant context to ground AI responses
- **AI Integration**: Using Gemini to generate intelligent insights
- **Database Persistence**: Storing agent runs for auditing and learning
- **Service Layer Pattern**: Separating business logic from API routing

### Technical Skills
- **Coordinating service helpers**: Calling multiple functions within an agent workflow
- **Pydantic models**: Defining schemas that validate data and generate documentation
- **Error handling**: Gracefully managing missing data or invalid inputs
- **Logging tool invocations**: Tracking which tools were called and what they returned
- **Priority-based recommendations**: Categorizing actions by urgency (high/medium/low)


## Prerequisites & install
Reuse the virtual environment created earlier in the course. The required dependencies are already in `requirements.txt`.

```bash
cd ai-web/backend
. .venv/bin/activate
pip install -r requirements.txt
```

### Environment Setup
This lab builds on the FastAPI foundation from previous labs.

### Required Dependencies
The following are needed (already in requirements.txt):
- `pydantic` - Data validation and type hints
- `google-generativeai` - Gemini API client
- `faiss-cpu` - FAISS vector similarity search
- `sqlalchemy` - Database ORM
- `alembic` - Database migrations

### Gemini API Key (Optional but Recommended)
To enable AI-powered insights, add your Gemini API key to `backend/.env`:
```
GEMINI_API_KEY=your-api-key-here
```
The agent works without it but will only return deterministic recommendations.

### Verification
Ensure your backend is running and accessible:
```bash
curl http://localhost:8000/health
```


## Step-by-step tasks
Build out the tools, service, and router layers needed to run the AI-powered release readiness agent.

### Architecture Overview

Before diving into code, let's understand the enhanced architecture:

```
┌───────────────────────────────────────────┐
│  FastAPI Router (routers/agent.py)        │  ← HTTP endpoints
│  - /ai/release-readiness                  │
│  - /ai/history                            │
│  - /ai/features                           │
└─────────────────────┬─────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│  Agent Service (services/agent.py)        │  ← Business logic
│  - Orchestrates tool calls                │
│  - Retrieves RAG context (FAISS)          │
│  - Calls Gemini for AI insights           │
│  - Persists runs to database              │
│  - Builds recommendations                 │
└─────────────────────┬─────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │ Tools     │ │ RAG       │ │ Gemini    │
  │ (agent_   │ │ (rag.py)  │ │ API       │
  │ tools.py) │ │           │ │           │
  └───────────┘ └───────────┘ └───────────┘
```

This separation of concerns makes the system:
- **Testable**: Each layer can be tested independently
- **Maintainable**: Changes to one layer don't ripple through the entire system
- **Reusable**: Tools can be used by multiple agents
- **Auditable**: All runs are persisted for review

Let's build each layer from the bottom up.


### Step 1: Release data tools
The tools in `app/services/agent_tools.py` provide deterministic product data so the agent can make realistic decisions.

#### What Are Tools?
Tools are **focused functions** that agents call to gather information or perform actions. Each tool should:
- Do **one thing well**
- Have **clear inputs and outputs**
- Be **stateless** (no hidden dependencies)
- Return **structured data** (Pydantic models, not raw dicts)
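A hypothetical tool that follows these rules: it reads from a fixed in-memory catalog (no hidden state), does one lookup, and returns a typed object. A frozen dataclass stands in for the Pydantic models used in `agent_tools.py`, and `fetch_feature_brief_sketch`, the catalog, and its field names are illustrative, not the real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureBrief:
    """Illustrative structured result (the real tools return Pydantic models)."""
    slug: str
    name: str
    audience: str


# Fixed, deterministic data: the same input always gives the same output
_CATALOG = {
    "curriculum-pathways": FeatureBrief("curriculum-pathways", "Curriculum Pathways", "Instructor"),
}


def fetch_feature_brief_sketch(feature_slug: str) -> FeatureBrief:
    """Look up one feature brief; fail loudly on unknown slugs instead of guessing."""
    try:
        return _CATALOG[feature_slug]
    except KeyError:
        raise ValueError(f"Unknown feature slug: {feature_slug!r}") from None


print(fetch_feature_brief_sketch("curriculum-pathways"))
```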

#### The Four Core Tools

1. **`fetch_feature_brief(feature_slug)`**: Returns product information
2. **`fetch_launch_window(feature_slug)`**: Returns deployment timing
3. **`fetch_support_contacts(audience_role)`**: Returns stakeholders to notify
4. **`list_slo_watch_items(feature_slug)`**: Returns reliability concerns

Review the existing implementation in `app/services/agent_tools.py`.


```python
# Review the existing tools
from app.services.agent_tools import (
    fetch_feature_brief,
    fetch_launch_window,
    fetch_support_contacts,
    list_slo_watch_items,
)

# Test each tool
print("Feature Brief:")
print(fetch_feature_brief("curriculum-pathways").model_dump())
print("\nLaunch Window:")
print(fetch_launch_window("curriculum-pathways").model_dump())
print("\nSupport Contacts:")
print([c.model_dump() for c in fetch_support_contacts("Instructor")])
print("\nSLO Watch Items:")
print(list_slo_watch_items("curriculum-pathways"))
```

### Step 2: AI-Powered Agent Service
The agent service in `app/services/agent.py` orchestrates tools, RAG, Gemini, and database persistence.

#### Key Components

**1. Tool Orchestration**
The agent calls tools to gather context:
```python
brief = fetch_feature_brief(context.feature_slug)
launch_window = fetch_launch_window(context.feature_slug)
contacts = fetch_support_contacts(context.audience_role)
slo_watch_items = list_slo_watch_items(context.feature_slug)
```

**2. RAG Retrieval (FAISS)**
The agent retrieves relevant documentation:
```python
search_query = f"{brief.name} {context.audience_role} release launch"
retriever = build_retriever(db)
rag_contexts = retriever.search(search_query, k=3)
```

**3. Gemini AI Insights**
When configured, Gemini generates strategic insights:
```python
gemini_insight, ai_recommendations = _generate_gemini_insight(
    brief, launch_window, slo_watch_items, rag_contexts, context
)
```

**4. Database Persistence**
All runs are saved for auditing:
```python
agent_run = AgentRun(
    feature_slug=context.feature_slug,
    audience_role=context.audience_role,
    audience_experience=context.audience_experience,
    summary=summary,
    gemini_insight=gemini_insight,
    recommended_actions=[...],
    tool_calls=[...],
    rag_contexts=[...],
    used_gemini=used_gemini,
)
db.add(agent_run)
db.commit()
```
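The tool traces in the output can be produced by wrapping each call. The sketch below is an assumption about the pattern, not the code in `agent.py`: `ToolTrace`, `call_tool`, and the stub tool are hypothetical, though the `tool` and `output_preview` fields mirror the `AgentToolCall` attributes used in Step 5.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolTrace:
    tool: str
    output_preview: str


def call_tool(traces: list[ToolTrace], fn: Callable[..., Any], *args: Any, preview_len: int = 80) -> Any:
    """Run one tool, append a trace with a truncated preview of its output, and return the output."""
    result = fn(*args)
    traces.append(ToolTrace(tool=fn.__name__, output_preview=repr(result)[:preview_len]))
    return result


def list_slo_watch_items_stub(feature_slug: str) -> list[str]:
    # Hypothetical stub standing in for the real tool
    return [f"{feature_slug}: p95 latency"]


traces: list[ToolTrace] = []
items = call_tool(traces, list_slo_watch_items_stub, "curriculum-pathways")
print(traces)
```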

#### Enhanced Output Structure

```python
class AgentRunResult(BaseModel):
    summary: str
    gemini_insight: str | None = None  # AI-generated insight
    recommended_actions: list[AgentRecommendation]  # With priority levels
    plan: Plan
    tool_calls: list[AgentToolCall]  # Includes RAG and Gemini calls
    rag_contexts: list[RAGContext] = []  # Retrieved documents
    used_gemini: bool = False  # Whether Gemini was used
```

#### Recommendation Priorities

Recommendations now have priority levels:
```python
class AgentRecommendation(BaseModel):
    title: str
    detail: str
    priority: Literal["high", "medium", "low"] = "medium"
```
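Priority levels are most useful when the UI lists urgent items first. A minimal sketch, assuming recommendations expose a `priority` attribute as `AgentRecommendation` does; the dataclass is a stand-in so the example runs without the app, and `sort_by_priority` and the sample titles are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

Priority = Literal["high", "medium", "low"]
_PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Recommendation:
    """Stand-in mirroring AgentRecommendation's fields."""
    title: str
    detail: str
    priority: Priority = "medium"


def sort_by_priority(recs: list[Recommendation]) -> list[Recommendation]:
    # sorted() is stable, so items with equal priority keep their original order
    return sorted(recs, key=lambda r: _PRIORITY_RANK[r.priority])


recs = [
    Recommendation("Review documentation coverage", "...", "medium"),
    Recommendation("Confirm launch communications", "...", "high"),
    Recommendation("Tidy changelog", "...", "low"),
]
print([r.title for r in sort_by_priority(recs)])
```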


```python
# Review the agent service structure
from app.services.agent import (
    AgentRunContext,
    AgentRunResult,
    AgentRecommendation,
    AgentToolCall,
    RAGContext,
)

# Show the enhanced models
print("AgentRunResult fields:")
for name, field in AgentRunResult.model_fields.items():
    print(f"  - {name}: {field.annotation}")

print("\nAgentRecommendation fields:")
for name, field in AgentRecommendation.model_fields.items():
    print(f"  - {name}: {field.annotation}")
```

### Step 3: API Router with Multiple Endpoints
The router in `app/routers/agent.py` exposes three endpoints.

#### Endpoint 1: `/ai/release-readiness` (POST)
Runs the agent and returns structured recommendations:
```python
@router.post("/release-readiness", response_model=AgentRunResult)
def release_readiness(
    payload: AgentRunContext,
    db: Session = Depends(get_db),
) -> AgentRunResult:
    return run_release_readiness_agent(payload, db=db)
```

#### Endpoint 2: `/ai/history` (GET)
Returns historical agent runs for auditing:
```python
@router.get("/history", response_model=AgentHistoryResponse)
def agent_history(
    feature_slug: str | None = Query(None),
    limit: int = Query(10, ge=1, le=50),
    db: Session = Depends(get_db),
) -> AgentHistoryResponse:
    runs = get_agent_history(db, feature_slug=feature_slug, limit=limit)
    # Assumes AgentHistoryResponse.runs accepts these rows (e.g., via from_attributes)
    return AgentHistoryResponse(runs=runs, total=len(runs))

#### Endpoint 3: `/ai/features` (GET)
Lists available features for the agent:
```python
@router.get("/features")
def list_available_features() -> dict[str, Any]:
    # Returns feature metadata for frontend dropdowns (body omitted here)
    ...
```


```python
# The router is already configured. Verify the endpoints exist:
from app.routers.agent import router

print("Agent router endpoints:")
for route in router.routes:
    if hasattr(route, 'methods'):
        print(f"  {list(route.methods)[0]:6} {route.path}")
```

### Step 4: Database Model for Agent Runs
The `AgentRun` model in `app/models.py` persists agent executions.

```python
from datetime import datetime
from typing import Any

from sqlalchemy import Boolean, DateTime, Integer, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column

# `Base` (declarative base) and `utcnow` (timestamp helper) are defined elsewhere in the app


class AgentRun(Base):
    __tablename__ = "agent_runs"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    feature_slug: Mapped[str] = mapped_column(String(120), index=True)
    audience_role: Mapped[str] = mapped_column(String(120))
    audience_experience: Mapped[str] = mapped_column(String(32))
    summary: Mapped[str] = mapped_column(Text)
    gemini_insight: Mapped[str | None] = mapped_column(Text, nullable=True)
    recommended_actions: Mapped[Any] = mapped_column(JSONB)
    tool_calls: Mapped[Any] = mapped_column(JSONB)
    rag_contexts: Mapped[Any] = mapped_column(JSONB, default=list)
    used_gemini: Mapped[bool] = mapped_column(Boolean, default=False)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=utcnow)
```

The migration `alembic/versions/20250529_agent_runs.py` creates this table and seeds feature-related document chunks for RAG.


```python
# Review the AgentRun model
from app.models import AgentRun

print("AgentRun table columns:")
for column in AgentRun.__table__.columns:
    print(f"  - {column.name}: {column.type}")
```

### Step 5: Execute the Agent
Call the agent to see the full output with AI insights, RAG contexts, and tool traces.

#### What to Look For

When you call `run_release_readiness_agent()`, inspect:

1. **Summary**: Clear and actionable overview
2. **Gemini Insight**: AI-generated strategic assessment (when API key configured)
3. **Recommended actions**: Priority-coded (high/medium/low)
4. **Tool calls**: Including RAG retrieval and Gemini insight generation
5. **RAG contexts**: Retrieved documents from FAISS
6. **Plan**: Nested planner output

#### Expected Output Structure

```json
{
  "summary": "Curriculum Pathways targets Instructor personas...",
  "gemini_insight": "Strategic launch assessment: The feature...",
  "recommended_actions": [
    {
      "title": "Confirm launch communications",
      "detail": "Share the feature brief with...",
      "priority": "high"
    },
    {
      "title": "[AI] Review documentation coverage",
      "detail": "Gemini recommends...",
      "priority": "medium"
    }
  ],
  "tool_calls": [
    {"tool": "fetch_feature_brief", ...},
    {"tool": "rag_retrieval", ...},
    {"tool": "gemini_insight_generation", ...}
  ],
  "rag_contexts": [
    {"content": "...", "source": "docs/agent", "score": 0.23}
  ],
  "used_gemini": true,
  "plan": {...}
}
```


```python
from datetime import date
from app.services.agent import AgentRunContext, run_release_readiness_agent
from app.database import SessionLocal

# Create a database session (in Docker, this connects to PostgreSQL)
db = SessionLocal()

try:
    context = AgentRunContext(
        feature_slug='curriculum-pathways',
        launch_date=date(2025, 3, 10),
        audience_role='Instructor',
        audience_experience='intermediate',
    )

    result = run_release_readiness_agent(context, db=db)

    # Display key results
    print("=== Summary ===")
    print(result.summary)

    print("\n=== Gemini Insight ===")
    print(result.gemini_insight or "(Gemini not configured)")

    print("\n=== Recommendations ===")
    for rec in result.recommended_actions:
        print(f"  [{rec.priority.upper()}] {rec.title}")

    print("\n=== Tool Calls ===")
    for tc in result.tool_calls:
        print(f"  - {tc.tool}: {tc.output_preview[:50]}...")

    print(f"\n=== Used Gemini: {result.used_gemini} ===")
finally:
    db.close()
```

## Validation / acceptance checks

### What to Verify

After implementing all steps, test that:

1. **The endpoint responds**: `curl` succeeds with 200 OK
2. **The response is structured**: JSON contains `summary`, `gemini_insight`, `recommended_actions`, `plan`, `tool_calls`, `rag_contexts`, `used_gemini`
3. **History endpoint works**: `GET /ai/history` returns past runs
4. **Features endpoint works**: `GET /ai/features` lists available features
5. **The documentation works**: FastAPI docs (`/docs`) show all endpoints
6. **Error handling works**: Invalid input (for example, a malformed `launch_date`) returns a 422 response with validation details
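Check 2 can be scripted. The helper below is a hypothetical sketch that validates a decoded JSON response against the field list above and the allowed priority values; it does not call the API, so feed it the parsed body from `curl` or an HTTP client. The sample payload is illustrative.

```python
REQUIRED_KEYS = {
    "summary", "gemini_insight", "recommended_actions",
    "plan", "tool_calls", "rag_contexts", "used_gemini",
}
VALID_PRIORITIES = {"high", "medium", "low"}


def check_agent_response(payload: dict) -> list[str]:
    """Return a list of problems found in a release-readiness response (empty = looks valid)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(payload))]
    for i, rec in enumerate(payload.get("recommended_actions", [])):
        if rec.get("priority") not in VALID_PRIORITIES:
            problems.append(f"recommended_actions[{i}] has invalid priority {rec.get('priority')!r}")
    return problems


good = {
    "summary": "Curriculum Pathways targets Instructor personas...",
    "gemini_insight": None,
    "recommended_actions": [{"title": "Confirm launch communications", "detail": "...", "priority": "high"}],
    "plan": {},
    "tool_calls": [],
    "rag_contexts": [],
    "used_gemini": False,
}
print(check_agent_response(good))
```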

### Manual Testing

```bash
# Run the agent (direct to Uvicorn; through Nginx, use http://localhost:8080/api/ai/release-readiness)
curl -X POST http://localhost:8000/ai/release-readiness \
  -H 'Content-Type: application/json' \
  -d '{"feature_slug":"curriculum-pathways","launch_date":"2025-03-10","audience_role":"Instructor","audience_experience":"intermediate"}'

# Get agent history
curl http://localhost:8000/ai/history

# Get available features
curl http://localhost:8000/ai/features
```

### Expected Success Criteria

- ✅ The response includes a summary, recommendations with priorities, and tool call traces
- ✅ When Gemini is configured, `gemini_insight` contains AI-generated text and `used_gemini` is true
- ✅ The `rag_contexts` array contains retrieved documents with scores
- ✅ The history endpoint returns previously executed agent runs
- ✅ The FastAPI interactive docs (`/docs`) display all three endpoints under the **ai** tag
- ✅ React development mode renders the structured agent output with AI insights and RAG contexts


## Looking Ahead: LangChain and LangSmith for Future Classes

Now that you understand our agent architecture with custom tools, FAISS RAG, and Gemini integration, let's explore how industry-standard frameworks like **LangChain** and **LangSmith** could enhance and streamline this workflow in future iterations of this course.

### What is LangChain?

**LangChain** is an open-source framework for building applications powered by Large Language Models (LLMs). It provides abstractions and tools for:

- **Chains**: Sequential operations that process inputs through multiple steps
- **Agents**: Autonomous systems that decide which tools to use
- **Tools**: Reusable functions that agents can call
- **Memory**: Persistent context across multiple interactions
- **Retrievers**: Built-in RAG and vector database integrations

**Official website**: https://www.langchain.com/  
**Documentation**: https://python.langchain.com/docs/

### Why LangChain Would Be Useful for Our Agent

#### 1. Built-in RAG Components

**What we did manually:**

```python
# Our custom implementation
retriever = build_retriever(db)
rag_contexts = retriever.search(search_query, k=3)
```

**With LangChain:**

```python
# Requires: pip install langchain-community langchain-google-genai faiss-cpu
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Standardized interface; `documents` is a list of LangChain Document objects
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
rag_contexts = retriever.invoke(search_query)
```

**Benefits:**

- **Pre-built integrations** with 50+ embedding models and vector stores
- **Standardized interfaces** that work across different providers
- **Advanced features** like MMR (Maximal Marginal Relevance) for diversity
- **Easy switching** between FAISS, Pinecone, Weaviate, Chroma, etc.

#### 2. Agent Frameworks

**What we did manually:**

```python
# Our custom agent orchestration
def run_release_readiness_agent(context, db):
    brief = fetch_feature_brief(context.feature_slug)
    launch_window = fetch_launch_window(context.feature_slug)
    rag_contexts = retriever.search(...)
    gemini_insight = _generate_gemini_insight(...)
    # ... manual orchestration
```

**With LangChain:**

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI

@tool
def fetch_feature_brief(feature_slug: str) -> dict:
    """Fetch feature brief for a given feature slug."""
    return {...}

@tool
def retrieve_docs(query: str) -> list[str]:
    """Retrieve relevant documentation using RAG."""
    docs = vectorstore.similarity_search(query, k=3)
    return [doc.page_content for doc in docs]

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
tools = [fetch_feature_brief, retrieve_docs]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a release readiness advisor."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # where the agent records tool calls
])

# The agent decides which tools to use and when
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "Analyze release readiness for curriculum-pathways"})
```

**Benefits:**

- **Autonomous decision-making**: The agent chooses which tools to call
- **Error handling and retries**: Built-in handling for tool and parsing errors, plus retry wrappers such as `.with_retry()`
- **Streaming**: Stream intermediate steps and responses in real time
- **Multiple agent types**: ReAct, OpenAI Functions, Structured Chat, etc.

#### 3. Prompt Templates and Chains

**What we did manually:**

```python
prompt = f"""You are a release readiness advisor...

Context:
{full_context}

User's Launch Date: {context.launch_date}
...
"""
response = model.generate_content(prompt)
```

**With LangChain:**

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a release readiness advisor helping teams prepare for launches."),
    ("human", "Based on this context: {context}\n\nProvide analysis for launch on {launch_date}"),
])

# Pipe the prompt into the model (LCEL); this replaces the deprecated LLMChain
chain = prompt | llm
result = chain.invoke({"context": full_context, "launch_date": context.launch_date})
```

**Benefits:**

- **Reusable templates**: Define prompts once, use everywhere
- **Validation**: Missing template variables raise an error before the model is called
- **Composition**: Chain multiple LLM calls together
- **Few-shot examples**: Easily add example inputs/outputs

#### 4. Memory Systems

**What we currently have:**

```python
# Static: each agent run is independent
agent_run = AgentRun(...)
db.add(agent_run)
db.commit()
```

**With LangChain:**

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Dynamic: the chain remembers previous interactions
# (legacy API; newer LangChain releases recommend RunnableWithMessageHistory)
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

# First call
chain.invoke({"input": "What's the launch date for curriculum-pathways?"})

# Second call remembers context
chain.invoke({"input": "What are the risks?"})  # Knows we're still talking about curriculum-pathways
```

**Benefits:**

- **Conversation tracking**: Multi-turn dialogues
- **Context window management**: Automatic summarization
- **Multiple memory types**: Buffer, summary, entity, vector store-backed

### What is LangSmith?

**LangSmith** is a platform for **debugging, testing, evaluating, and monitoring** LLM applications.
Think of it as the observability layer for AI agents.

**Official website**: https://www.langchain.com/langsmith  
**Documentation**: https://docs.smith.langchain.com/

### Why LangSmith Would Be Critical for Production

#### 1. Observability and Debugging

**What we currently have:**

```python
# Limited: just tool call traces in the response
tool_calls = [
    AgentToolCall(tool="fetch_feature_brief", ...),
    AgentToolCall(tool="rag_retrieval", ...),
]
```

**With LangSmith:**

- **Full trace visualization**: See every step of agent execution
- **Input/output inspection**: Examine prompts and responses
- **Latency breakdown**: Identify slow components
- **Token usage tracking**: Monitor costs per run
- **Error tracking**: Automatic error capture with context

**Example trace:**

```
Agent Run (2.3s, $0.02)
├─ fetch_feature_brief (0.1s)
│  ├─ Input: {"feature_slug": "curriculum-pathways"}
│  └─ Output: {"name": "Curriculum Pathways", ...}
├─ RAG retrieval (0.4s)
│  ├─ Embedding (0.1s, 5 tokens)
│  ├─ FAISS search (0.3s)
│  └─ Output: [3 documents]
├─ Gemini insight (1.8s, $0.02)
│  ├─ Prompt (234 tokens)
│  ├─ Response (156 tokens)
│  └─ Output: "Strategic launch assessment..."
└─ Result assembled (0.0s)
```

#### 2. Dataset Management and Testing

**What we currently lack:**

```python
# Manual testing, no test suite
# Have to run the agent manually to verify behavior
```

**With LangSmith:**

```python
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()  # reads the LangSmith API key from the environment

# Create a dataset for evaluation
dataset = client.create_dataset("release_readiness_tests")

# Add test cases
client.create_examples(
    dataset_id=dataset.id,
    inputs=[
        {"feature_slug": "curriculum-pathways", "launch_date": "2025-03-01"},
        {"feature_slug": "ai-code-review", "launch_date": "2025-04-15"},
    ],
    outputs=[
        {"expected_recommendations": ["Confirm launch communications", ...]},
        {"expected_recommendations": ["Validate operational readiness", ...]},
    ],
)

# Run evaluation: `agent_chain` is our agent, `custom_evaluator` scores each output
results = evaluate(
    agent_chain.invoke,
    data="release_readiness_tests",
    evaluators=[custom_evaluator],
)
```

**Benefits:**

- **Regression testing**: Ensure changes don't break existing behavior
- **A/B testing**: Compare different prompts or models
- **Ground truth comparison**: Measure accuracy against expected outputs

#### 3. Prompt Iteration and Optimization

**What we currently do:**

```python
# Manual: edit the prompt string, redeploy, test manually
prompt = f"You are a release readiness advisor..."
```

**With LangSmith:**

- **Prompt playground**: Test prompts interactively
- **Version control**: Track prompt changes over time
- **Comparison view**: See outputs side-by-side
- **Experiments**: Run prompt variants against a dataset to find the best performer

#### 4. Production Monitoring

**What we currently have:**

```python
# Basic: save runs to the database
agent_run = AgentRun(...)
db.add(agent_run)
db.commit()
```

**With LangSmith:**

- **Real-time dashboards**: Monitor agent performance live
- **Alerting**: Get notified of errors or anomalies
- **Cost tracking**: See token usage and costs per endpoint
- **User feedback**: Collect thumbs up/down from users
- **Automated analysis**: Identify patterns in failures

### Migration Path: From Our Implementation to LangChain

If we were to adopt LangChain in future classes, here's how we'd migrate:

#### Phase 1: RAG Layer (Easiest)

```python
# Replace app/services/rag.py with a LangChain vector store
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# `chunks` must be LangChain Document objects (e.g. built from DocumentChunk rows)
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
vectorstore = FAISS.from_documents(chunks, embeddings)
```

#### Phase 2: Tool Definitions

```python
# Convert app/services/agent_tools.py to LangChain tools
from langchain_core.tools import tool

@tool
def fetch_feature_brief(feature_slug: str) -> FeatureBrief:
    """Fetch feature brief for release planning."""
    return FeatureBrief(...)
```

#### Phase 3: Agent Logic

```python
# Replace app/services/agent.py with a LangChain agent
from langchain.agents import create_tool_calling_agent
from langchain_google_genai import ChatGoogleGenerativeAI

agent = create_tool_calling_agent(
    llm=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
    tools=[fetch_feature_brief, fetch_launch_window, retrieve_docs],
    prompt=prompt_template,  # must include an {agent_scratchpad} placeholder
)
```

#### Phase 4: Add LangSmith

```python
import os

# Prefer setting these in the backend container's environment over hardcoding the key
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"
# That's it! All runs are now traced automatically
```

### When to Use LangChain vs Custom Implementation

**Use custom implementation (like ours) when:**

- ✅ Learning fundamentals (educational context)
- ✅ Full control over every detail
- ✅ Minimal dependencies
- ✅ Simple, deterministic workflows
- ✅ Offline operation required

**Use LangChain when:**

- ✅ Building production systems
- ✅ Need to iterate quickly on prompts
- ✅ Want built-in observability
- ✅ Need to support multiple LLM providers
- ✅ Complex agent orchestration
- ✅ Team collaboration on prompts

### Hands-on Exercise for Future Classes

**Goal**: Convert one part of our agent to use LangChain

1. **Install LangChain**: `pip install langchain langchain-community langchain-google-genai faiss-cpu`
2. **Replace RAG**: Use `langchain_community.vectorstores.FAISS` instead of our custom `rag.py`
3. **Add tracing**: Enable LangSmith to visualize agent execution
4. **Compare**: Measure performance, code complexity, and developer experience

### Resources for Deeper Learning

**LangChain:**

- 📘 **Official Docs**: https://python.langchain.com/docs/
- 🎓 **Tutorials**: https://python.langchain.com/docs/tutorials/
- 📦 **Agent Templates**: https://python.langchain.com/docs/modules/agents/
- 💬 **Community**: Discord and GitHub discussions

**LangSmith:**

- 📘 **Docs**: https://docs.smith.langchain.com/
- 🎥 **Demo Videos**: https://www.langchain.com/langsmith
- 📊 **Evaluation Guide**: https://docs.smith.langchain.com/evaluation
- 🔍 **Tracing Guide**: https://docs.smith.langchain.com/tracing

### Key Takeaways

1. **Our implementation taught you the fundamentals** - You now understand:
   - How embeddings work
   - How FAISS indexes documents
   - How RAG retrieves context
   - How agents orchestrate tools
   - How to structure AI applications
2. **LangChain provides production-ready abstractions** - It handles:
   - Provider integrations (Gemini, OpenAI, Anthropic, etc.)
   - Prompt management and versioning
   - Agent frameworks and tooling
   - Error handling and retries
3. **LangSmith enables production monitoring** - Essential for:
   - Debugging complex agent behavior
   - Evaluating prompt changes
   - Tracking costs and latency
   - Collecting user feedback
4. **The choice depends on your context**:
   - **Learning/Teaching**: Custom implementation (better understanding)
   - **Production**: LangChain + LangSmith (faster iteration, better tooling)
   - **Hybrid**: Start custom, migrate to LangChain as complexity grows

By understanding both approaches, you're equipped to:

- Build AI agents from first principles
- Adopt industry-standard frameworks when appropriate
- Make informed architectural decisions
- Debug and optimize agent behavior
- Scale from prototype to production
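
MMR (Maximal Marginal Relevance), listed among LangChain's advanced retrieval features in the RAG comparison above, is small enough to implement by hand, which is a good way to see what the framework option actually does. The sketch below is framework-free Python under stated assumptions: `mmr_select`, `cosine`, and `unit` are hypothetical helpers, vectors are plain lists rather than FAISS output, and `lambda_mult` borrows the name of the relevance/diversity trade-off parameter that LangChain's vector stores expose. At each step it picks the candidate with the best balance of relevance to the query and dissimilarity to documents already chosen.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: list[float],
    candidates: list[list[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Pick k candidate indices by Maximal Marginal Relevance (hypothetical helper).

    Each step maximizes
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already_selected)
    so lambda_mult=1.0 is plain similarity ranking and lower values favor diversity.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        # Redundancy = similarity to the closest document already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def unit(degrees: float) -> list[float]:
    """2-D unit vector at an angle, so cosine similarity = cos(angle gap)."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]


query = unit(0)
docs = [unit(10), unit(12), unit(-40)]  # doc 1 is a near-duplicate of doc 0
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # → [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # → [0, 2]
```

With `lambda_mult=1.0` the result is plain similarity ranking, so the near-duplicate document 1 comes back; at 0.5 the redundancy penalty swaps it for document 2, which is less similar to the query but adds a different angle.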

## Homework / extensions

The current implementation already includes what were previously extension challenges:
- ✅ **Gemini AI integration** for generating strategic insights
- ✅ **FAISS RAG** for retrieving relevant documentation
- ✅ **Database persistence** for auditing agent runs
- ✅ **Priority levels** for recommendations
- ✅ **Frontend integration** with the AgentPanel component

### New Extension Ideas

1. **Add more sophisticated prompting**
   - Implement few-shot examples in Gemini prompts
   - Add chain-of-thought reasoning
   - *Teaching moment*: Prompt engineering for production systems

2. **Implement agent memory**
   - Use past runs to inform new recommendations
   - Track feature launch patterns over time
   - *Teaching moment*: Stateful agents and learning from history

3. **Add real-time streaming**
   - Stream Gemini responses to the frontend
   - Show progressive tool call results
   - *Teaching moment*: Server-sent events and async patterns

4. **Create an agent evaluation framework**
   - Define test cases with expected recommendations
   - Measure recommendation quality over time
   - *Teaching moment*: Evaluating AI systems

5. **Add multi-agent orchestration**
   - Create specialized sub-agents for different concerns
   - Implement agent-to-agent communication
   - *Teaching moment*: Multi-agent architectures
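
Extension idea 4 (an agent evaluation framework) can start very small before reaching for LangSmith datasets. A minimal sketch, assuming each agent run can be reduced to a list of recommendation strings: define test cases with expected recommendations and score each run by recall. The `EvalCase` type, the `recommendation_recall` scorer, its case-insensitive exact matching, and the stubbed `fake_agent` are all hypothetical; a real harness would call the release readiness agent and would likely need fuzzier matching (or an LLM judge) for free-text recommendations.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One test case: agent inputs plus recommendations we expect to see."""
    feature_slug: str
    launch_date: str
    expected_recommendations: list[str]


def recommendation_recall(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected recommendations found in the agent output.

    Matching is case-insensitive and exact after trimming whitespace.
    """
    if not expected:
        return 1.0
    found = {rec.strip().lower() for rec in actual}
    hits = sum(1 for rec in expected if rec.strip().lower() in found)
    return hits / len(expected)


def evaluate_agent(
    agent: Callable[[str, str], list[str]], cases: list[EvalCase]
) -> dict[str, float]:
    """Run the agent on every case and return recall keyed by feature slug."""
    return {
        case.feature_slug: recommendation_recall(
            case.expected_recommendations,
            agent(case.feature_slug, case.launch_date),
        )
        for case in cases
    }


# Hypothetical stand-in for the real agent so the harness runs offline.
def fake_agent(feature_slug: str, launch_date: str) -> list[str]:
    canned = {
        "curriculum-pathways": ["Confirm launch communications", "Schedule a dry run"],
        "ai-code-review": ["Add load tests"],
    }
    return canned.get(feature_slug, [])


cases = [
    EvalCase("curriculum-pathways", "2025-03-01", ["Confirm launch communications"]),
    EvalCase("ai-code-review", "2025-04-15", ["Validate operational readiness"]),
]
print(evaluate_agent(fake_agent, cases))
# → {'curriculum-pathways': 1.0, 'ai-code-review': 0.0}
```

Recording these scores on every change gives a crude regression signal ("measure recommendation quality over time"), and the same cases could later be uploaded as a LangSmith dataset.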

### Advanced Challenges

- **Implement semantic caching** to avoid repeated RAG queries
- **Add confidence scores** to recommendations
- **Create a feedback loop** where users rate recommendations
- **Implement A/B testing** for different prompting strategies
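
For the semantic-caching challenge, one common approach is to key the cache on query embeddings instead of exact strings, returning a cached result when a new query is similar enough to an earlier one. A minimal sketch under stated assumptions: `embed` is any function mapping text to a vector (in the app this would presumably be the same embedding model behind the FAISS index), the 0.9 threshold is an arbitrary starting point, and the linear scan stands in for a real vector lookup. `SemanticCache` and the toy `bag_of_words` embedder are hypothetical names.

```python
import math
from collections.abc import Callable
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Cache values keyed by query meaning rather than exact query text."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[list[float], Any]] = []

    def get(self, query: str) -> Any:
        """Return the value of the most similar cached query, or None on a miss."""
        vector = self.embed(query)
        best_value, best_score = None, self.threshold
        for cached_vector, value in self.entries:
            score = cosine(vector, cached_vector)
            if score >= best_score:
                best_value, best_score = value, score
        return best_value

    def put(self, query: str, value: Any) -> None:
        self.entries.append((self.embed(query), value))


# Hypothetical toy embedder: word counts over a tiny fixed vocabulary.
VOCAB = ["launch", "risks", "date", "curriculum", "pathways", "budget"]


def bag_of_words(text: str) -> list[float]:
    words = text.lower().replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]


cache = SemanticCache(bag_of_words, threshold=0.9)
cache.put("launch risks for curriculum pathways", ["context-a", "context-b"])
print(cache.get("curriculum pathways launch risks?"))  # → ['context-a', 'context-b']
print(cache.get("budget for curriculum pathways"))  # → None (similarity ≈ 0.58)
```

A production version would also need invalidation when `DocumentChunk` rows change, and with many entries the linear scan would be better replaced by a FAISS index over the cached query vectors.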
