# Day 1 - Lab 2: Generating a Product Requirements Document (PRD) (Solution)

**Objective:** Use the structured `day1_user_stories.json` artifact from the previous lab to generate a formal, comprehensive Product Requirements Document (PRD) in markdown format.

**Introduction:**
This solution notebook demonstrates how to synthesize detailed, low-level requirements (user stories) into a high-level planning document (the PRD). It also introduces the advanced concept of using code (Pydantic models) to define and validate the structure of documentation.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

**Explanation:**
We begin by setting up our environment and loading the key artifact from Lab 1: `day1_user_stories.json`. The `load_artifact` helper function reads the file content, and `json.loads` parses the JSON string into a Python list of dictionaries, making it ready for use in our prompts.
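
As a rough illustration of the load-and-parse step, the sketch below reads a JSON file and confirms that it parses to a list of dictionaries. It uses only the standard library rather than the course's `load_artifact` helper. The function name `load_story_list` and the sample `id` and `persona` keys are hypothetical and are not taken from the Lab 1 artifact.

```python
import json
import os
import tempfile


def load_story_list(path):
    """Return the JSON array stored at `path` as a list of dicts, or [] on any problem."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable, or malformed file: fall back to an empty list.
        return []
    # Keep the result only if it has the expected shape: a list of dictionaries.
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data
    return []


# Demo with a throwaway file; the keys below are made-up placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "stories.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump([{"id": 1, "persona": "New Hire"}], f)
    stories = load_story_list(demo_path)
    missing = load_story_list(os.path.join(tmp, "does_not_exist.json"))

print(len(stories), len(missing))
```

Returning an empty list on failure mirrors the notebook's own fallback, so later cells can skip generation with a simple truthiness check.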

In [1]:
import sys
import os
import json

# Add the project's root directory to the Python path so 'utils' can be imported.
# Assumes the notebook runs from a folder two levels below the project root
# (e.g. 'Solutions/Day_01_.../').
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import setup_llm_client, get_completion, save_artifact, load_artifact, clean_llm_output, recommended_models_table

# Initialize the LLM client. You can change the model here.
client, model_name, api_provider = setup_llm_client(model_name="gemini-2.5-pro")

# Load the artifact from Lab 1
user_stories_str = load_artifact("artifacts/day1_user_stories.json")
if user_stories_str:
    user_stories_data = json.loads(user_stories_str)
else:
    print("Warning: Could not load user stories. Lab may not function correctly.")
    user_stories_data = []

✅ LLM Client configured: Using 'google' with model 'gemini-2.5-pro'



In [2]:
recommended_models_table()

| Model | Provider | Vision | Image Gen | Audio Transcription | Context Window | Max Output Tokens |
|---|---|---|---|---|---|---|
| claude-opus-4-1-20250805 | anthropic | ✅ | ❌ | ❌ | 200,000 | 100,000 |
| claude-opus-4-20250514 | anthropic | ✅ | ❌ | ❌ | 200,000 | 100,000 |
| claude-sonnet-4-20250514 | anthropic | ✅ | ❌ | ❌ | 1,000,000 | 100,000 |
| codex-mini | openai | ✅ | ❌ | ❌ | 200,000 | 100,000 |
| dall-e-3 | openai | ❌ | ✅ | ❌ | - | - |
| deepseek-ai/DeepSeek-V3 | huggingface | ❌ | ❌ | ❌ | 128,000 | 100,000 |
| deepseek-ai/DeepSeek-V3-Small | huggingface | ❌ | ❌ | ❌ | 128,000 | 100,000 |
| deepseek-ai/DeepSeek-VL2 | huggingface | ✅ | ❌ | ❌ | 32,000 | 8,000 |
| deepseek-ai/DeepSeek-VL2-Small | huggingface | ✅ | ❌ | ❌ | 32,000 | 8,000 |
| deepseek-ai/DeepSeek-VL2-Tiny | huggingface | ✅ | ❌ | ❌ | 32,000 | 8,000 |
| deepseek-ai/Janus-Pro-7B | huggingface | ✅ | ❌ | ❌ | 8,192 | 2,048 |
| gemini-2.0-flash | google | ✅ | ❌ | ❌ | 1,048,576 | 8,192 |
| gemini-2.0-flash-lite | google | ✅ | ❌ | ❌ | 1,048,576 | 8,192 |
| gemini-2.0-flash-live-001 | google | ✅ | ❌ | ❌ | 1,048,576 | 8,192 |
| gemini-2.5-flash | google | ✅ | ❌ | ❌ | 1,048,576 | 65,536 |
| gemini-2.5-flash-image-preview | google | ✅ | ✅ | ❌ | 32,768 | 32,768 |
| gemini-2.5-flash-lite | google | ✅ | ❌ | ❌ | 1,048,576 | 65,536 |
| gemini-2.5-pro | google | ✅ | ❌ | ❌ | 1,048,576 | 65,536 |
| gemini-deep-think | google | ✅ | ❌ | ❌ | 1,000,000 | 100,000 |
| gemini-live-2.5-flash-preview | google | ✅ | ❌ | ❌ | 1,048,576 | 8,192 |
| gemini-veo-3 | google | ✅ | ❌ | ❌ | - | - |
| google-cloud/speech-to-text/latest_long | google | ❌ | ❌ | ✅ | - | - |
| google-cloud/speech-to-text/latest_short | google | ❌ | ❌ | ✅ | - | - |
| gpt-4.1 | openai | ✅ | ❌ | ❌ | 1,000,000 | 32,000 |
| gpt-4.1-mini | openai | ✅ | ❌ | ❌ | 1,000,000 | 32,000 |
| gpt-4.1-nano | openai | ✅ | ❌ | ❌ | 1,000,000 | 32,000 |
| gpt-4.5 | openai | ✅ | ❌ | ❌ | 128,000 | 16,384 |
| gpt-4o | openai | ✅ | ❌ | ❌ | 128,000 | 16,384 |
| gpt-4o-mini | openai | ✅ | ❌ | ❌ | 128,000 | 16,384 |
| gpt-5-2025-08-07 | openai | ✅ | ❌ | ❌ | 400,000 | 128,000 |
| gpt-5-mini-2025-08-07 | openai | ✅ | ❌ | ❌ | 400,000 | 128,000 |
| gpt-5-nano-2025-08-07 | openai | ✅ | ❌ | ❌ | 400,000 | 128,000 |
| gpt-image-1 | openai | ✅ | ✅ | ❌ | - | - |
| imagen-3.0-generate-002 | google | ❌ | ✅ | ❌ | - | - |
| imagen-4.0-generate-001 | google | ❌ | ✅ | ❌ | 480 | - |
| meta-llama/Llama-3.3-70B-Instruct | huggingface | ❌ | ❌ | ❌ | 4,096 | 1,024 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | huggingface | ✅ | ❌ | ❌ | 1,000,000 | 100,000 |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | huggingface | ✅ | ❌ | ❌ | 10,000,000 | 100,000 |
| mistralai/Mistral-7B-Instruct-v0.3 | huggingface | ❌ | ❌ | ❌ | 32,768 | 8,192 |
| o3 | openai | ✅ | ❌ | ❌ | 200,000 | 100,000 |
| o4-mini | openai | ✅ | ❌ | ❌ | 200,000 | 100,000 |
| tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 | huggingface | ❌ | ❌ | ❌ | 4,096 | 1,024 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 | huggingface | ❌ | ❌ | ❌ | 4,096 | 1,024 |
| whisper-1 | openai | ❌ | ❌ | ✅ | - | - |


## Step 2: The Challenges - Solutions

### Challenge 1 (Foundational): Generating a Simple PRD

**Explanation:**
This prompt serves as a baseline. We provide the LLM with the user stories and ask it to summarize them into a few key sections. This is a simple synthesis task that demonstrates the LLM's ability to extract and reorganize information.

In [3]:
simple_prd_prompt = f"""
You are a Product Manager writing a Product Requirements Document (PRD) for a new hire onboarding tool.

Use the following JSON data containing user stories as your primary source of information:
<user_stories>
{user_stories_str}
</user_stories>

Generate a PRD in markdown format with the following sections:
1. **Introduction:** A brief overview of the project's purpose.
2. **User Personas:** A summary of the key users involved.
3. **Features / User Stories:** A list of the user stories and their acceptance criteria.
"""

print("--- Generating Simple PRD ---")
if user_stories_data:
    simple_prd_output = get_completion(simple_prd_prompt, client, model_name, api_provider)
    print(simple_prd_output)
else:
    print("Skipping PRD generation because user stories are missing.")

--- Generating Simple PRD ---
Here is the Product Requirements Document (PRD) based on the provided user stories.

# Product Requirements Document: New Hire Onboarding Tool

*   **Version:** 1.0
*   **Status:** Draft
*   **Author:** Product Manager

---

## 1. Introduction

This document outlines the product requirements for a new hire onboarding tool. The purpose of this project is to create a centralized, engaging, and efficient onboarding experience for new employees, while providing HR specialists and managers with the necessary tools to track progress and ensure a smooth transition. The platform will serve as a single source of truth for all onboarding-related information, tasks, and training, replacing fragmented processes with a streamlined and personalized journey for every new hire.

## 2. User Personas

The primary users of this tool are divided into three key groups:

*   **New Hire:** An individual who has recently joined the company. Their goal is to quickly find the infor

### Challenge 2 (Intermediate): Generating a PRD from a Template

**Explanation:**
This is a more advanced and practical task. Providing a template gives us much greater control over the final output. The LLM's task shifts from creative writing to structured content generation. We instruct it to fill in every section, which forces it to infer logical content for sections like "Success Metrics" and "Out of Scope" based on the provided requirements. This is a powerful pattern for creating consistent documentation.
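
One way to check that the LLM really filled in every section is to compare the headings of the generated PRD against those of the template. The following is a minimal sketch of that idea using only the standard library. The helper names `extract_headings` and `missing_headings` and the miniature template and output strings are illustrative assumptions, not part of the course's `utils` module or the real `prd_template.md`.

```python
import re


def extract_headings(markdown_text):
    """Return the text of every ATX heading (lines starting with 1-6 '#' characters)."""
    headings = []
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            headings.append(match.group(1))
    return headings


def missing_headings(template_text, generated_text):
    """List template headings that do not appear among the generated document's headings."""
    generated = {h.lower() for h in extract_headings(generated_text)}
    return [h for h in extract_headings(template_text) if h.lower() not in generated]


# Made-up miniature template and output, for illustration only.
demo_template = "# PRD\n## 1. Summary\n## 2. Success Metrics\n## 3. Out of Scope\n"
demo_output = "# PRD\n## 1. Summary\nText.\n## 2. Success Metrics\n- Metric A\n"

print(missing_headings(demo_template, demo_output))
```

This comparison is exact apart from letter case, so a heading that contains a placeholder in the template (a product name, for instance) would be reported as missing once the LLM substitutes a real value. Lines beginning with `#` inside code samples would also be counted as headings. A production check would likely need looser matching.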

In [4]:
# Load the PRD template from the 'templates' directory.
prd_template_content = load_artifact("templates/prd_template.md")

if not prd_template_content:
    print("❌ ERROR: Could not load PRD template from templates/prd_template.md")
    print("   Please ensure the file exists and is readable.")
    print("   Current working directory:", os.getcwd())
    prd_template_content = ""  # Set to empty string to prevent undefined variable errors
else:
    print(f"✅ Successfully loaded PRD template ({len(prd_template_content)} characters)")

template_prd_prompt = f"""
You are a Senior Product Manager responsible for creating a detailed and formal Product Requirements Document (PRD).

Your task is to populate the provided PRD template using the information from the user stories JSON.

<prd_template>
{prd_template_content}
</prd_template>

<user_stories_json>
{user_stories_str}
</user_stories_json>

Fill out every section of the template. For sections like 'Success Metrics' or 'Out of Scope', you must infer reasonable content based on the user stories and the overall project goal of creating a new hire onboarding tool.
The final output should be the completed PRD in markdown format.
"""

print("--- Generating PRD from Template ---")
if user_stories_data and prd_template_content:
    prd_from_template_output = get_completion(template_prd_prompt, client, model_name, api_provider)
    print(prd_from_template_output)
else:
    print("Skipping PRD generation because user stories or template are missing.")
    if not user_stories_data:
        print("   - User stories data is missing")
    if not prd_template_content:
        print("   - PRD template is missing")
    prd_from_template_output = ""

✅ Successfully loaded PRD template (4217 characters)
--- Generating PRD from Template ---
# Product Requirements Document: WelcomeHub - New Hire Onboarding Portal

| Status | **Draft** |
| :--- | :--- |
| **Author** | Product Team |
| **Version** | 1.0 |
| **Last Updated** | 2023-10-27 |

## 1. Executive Summary & Vision
WelcomeHub is a centralized digital onboarding platform designed to streamline the new hire experience. We are building this to solve the current fragmented and inefficient onboarding process, which leads to new hire confusion and administrative overhead for HR and managers. The vision is to create a seamless, engaging, and consistent onboarding journey that accelerates new hire productivity and fosters a sense of belonging from day one.

## 2. The Problem
*A detailed look at the pain points this product will solve. This section justifies the project's existence.*

**2.1. Problem Statement:**
New hires currently face a disjointed and overwhelming onboarding experience,

In [5]:
# Diagnostic: Check template file existence and path resolution
print("=== TEMPLATE DIAGNOSTIC ===")
print("Current working directory:", os.getcwd())

# Check if templates directory exists
templates_dir = os.path.join(project_root, "templates")
print(f"Project root: {project_root}")
print(f"Templates directory: {templates_dir}")
print(f"Templates directory exists: {os.path.exists(templates_dir)}")

if os.path.exists(templates_dir):
    print("Contents of templates directory:")
    for item in os.listdir(templates_dir):
        full_path = os.path.join(templates_dir, item)
        print(f"  - {item} ({'file' if os.path.isfile(full_path) else 'directory'})")

# Check the specific template file
template_file = os.path.join(project_root, "templates", "prd_template.md")
print(f"\nPRD template file path: {template_file}")
print(f"PRD template file exists: {os.path.exists(template_file)}")

if os.path.exists(template_file):
    print(f"File size: {os.path.getsize(template_file)} bytes")
else:
    print("❌ Template file not found!")
    
print("==========================")

=== TEMPLATE DIAGNOSTIC ===
Current working directory: /Users/armando/Documents/Github/AG-AISOFTDEV/Solutions/Day_01_Planning_and_Requirements
Project root: /Users/armando/Documents/Github/AG-AISOFTDEV
Templates directory: /Users/armando/Documents/Github/AG-AISOFTDEV/templates
Templates directory exists: True
Contents of templates directory:
  - adr_template.md (file)
  - prd_template.md (file)

PRD template file path: /Users/armando/Documents/Github/AG-AISOFTDEV/templates/prd_template.md
PRD template file exists: True
File size: 4217 bytes


### Challenge 3 (Advanced): Programmatic Validation with Pydantic

**Explanation:**
This is the most advanced challenge. We are now using an LLM to write *code that validates documents*. Generating a Pydantic model turns our document's structure into a testable, code-based standard. This is a form of 'documentation-as-code' that allows for automated governance, ensuring all future PRDs conform to the same reliable format.

1.  **Prompting for Code:** We give the LLM the PRD template and ask it to generate a Pydantic model: a Python class whose typed fields mirror the template's sections. Pydantic is a data validation library, so creating an instance of the model from PRD content checks that each required field is present and has the expected type.
2.  **Saving the Model:** We save the generated Python code to a specific location (`app/validation_models/prd_model.py`). This isn't just a temporary script; it's a formal part of our application's codebase, intended to be used for future validation tasks.
3.  **Saving the PRD:** Finally, we save the markdown PRD generated in Challenge 2. This becomes the official `day1_prd.md` artifact for our project.
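
To make the validation idea concrete, here is a small sketch of how a Pydantic model can accept or reject PRD content. The `PRDSketch` class, its three fields, and the `validate_prd` helper are hypothetical stand-ins; the model the LLM actually generates will have different and more numerous fields. The sketch assumes the `pydantic` package is installed and avoids version-specific configuration.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class PRDSketch(BaseModel):
    """Hypothetical, much smaller stand-in for the generated PRD model."""
    title: str
    executive_summary: str
    user_stories: List[str]


def validate_prd(data):
    """Return (True, []) if `data` fits PRDSketch, else (False, [names of failing fields])."""
    try:
        PRDSketch(**data)
        return True, []
    except ValidationError as exc:
        # Each error entry carries a 'loc' tuple naming the offending field.
        return False, [str(err["loc"][0]) for err in exc.errors()]


# Illustrative inputs: one complete PRD dictionary and one missing two sections.
complete = {
    "title": "New Hire Onboarding Tool",
    "executive_summary": "A centralized onboarding platform.",
    "user_stories": ["As a new hire, I want a checklist, so that I know what to do."],
}
incomplete = {"title": "New Hire Onboarding Tool"}

print(validate_prd(complete))
print(validate_prd(incomplete))
```

Note that this validates structured data (a dictionary), not raw markdown. Applying it to a markdown PRD would first require parsing the document's sections into such a dictionary, which this sketch does not attempt.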

In [6]:
pydantic_model_prompt = f"""
You are a Python developer specializing in data validation with Pydantic.

Based on the following markdown PRD template, generate a single Pydantic model class named `ProductRequirementsDocument` that represents its structure.

<prd_template>
{prd_template_content}
</prd_template>

The model should have fields that correspond to the main sections of the template. Use appropriate Python types (e.g., str, List, Dict) from the `typing` library.
Ensure you include the necessary imports from `pydantic` and `typing`.
Only output the raw Python code for the model, without any explanation.
"""

print("--- Generating Pydantic Model for PRD ---")

if prd_template_content:
    pydantic_model_code = get_completion(pydantic_model_prompt, client, model_name, api_provider)
    
    if pydantic_model_code:
        # Use our standardized cleaning function
        cleaned_code = clean_llm_output(pydantic_model_code, language='python')
        
        print("\n--- Generated Pydantic Model ---")
        print(cleaned_code)

        # Save the generated Pydantic model code to a file.
        model_path = "app/validation_models/prd_model.py"
        save_artifact(cleaned_code, model_path)
    else:
        print("Warning: Pydantic model generation failed, get_completion returned None.")
else:
    print("Skipping Pydantic model generation because template is missing.")

# Finally, save the completed PRD from the intermediate challenge as our official artifact
if prd_from_template_output:
    save_artifact(prd_from_template_output, "artifacts/day1_prd.md")

--- Generating Pydantic Model for PRD ---

--- Generated Pydantic Model ---
import datetime
from typing import Dict, List, Literal

from pydantic import BaseModel, Field


class GoalMetric(BaseModel):
    """Represents a single goal with its corresponding KPI and target."""
    goal: str = Field(..., description="The high-level objective, e.g., 'Improve New Hire Efficiency'.")
    kpi: str = Field(..., alias="Key Performance Indicator (KPI)", description="The metric used to measure progress towards the goal.")
    target: str = Field(..., description="The specific, measurable target for the KPI, e.g., 'Decrease by 20% in Q1'.")

    class Config:
        allow_population_by_field_name = True


class UserStory(BaseModel):
    """Represents a single user story with its acceptance criteria."""
    title: str = Field(..., description="The user story in the format: 'As a [persona], I want to [action], so that [benefit]'.")
    acceptance_criteria: List[str] = Field(..., description="A list 

## Lab Conclusion

Excellent work! You have now taken the structured user stories from the first lab and synthesized them into a formal Product Requirements Document. You also created a Pydantic model to enforce the structure of this document, introducing automated governance into your workflow. The `day1_prd.md` artifact will be the primary input for Day 2, where we will begin designing our system's architecture and database.

> **Key Takeaway:** Using an LLM to populate a pre-defined template is a powerful pattern for creating consistent, high-quality documentation at scale. It combines the LLM's language skills with your required structure.