# Day 1 - Lab 1: AI-Powered Requirements & User Stories (Solution)

**Objective:** Use a Large Language Model (LLM) to decompose a vague problem statement into structured features, user personas, and Agile user stories, culminating in a machine-readable JSON artifact.

**Introduction:**
This notebook contains the complete solution for Lab 1. It demonstrates how to use an LLM to systematically break down a problem, generate structured requirements, and programmatically validate the output. Each step includes explanations of the code and the reasoning behind the prompts.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

**Purpose:** This initial block of code prepares our environment for the lab. It adds the project root to the system path to ensure our `utils.py` helper script can be imported, and then initializes the LLM API client.
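
The path setup assumes the notebook sits exactly two levels below the project root. As a hedged alternative, the sketch below searches upward for a marker file instead of counting directory levels. The names `find_project_root` and `add_to_sys_path` are hypothetical helpers for illustration and are not part of `utils.py`; using `utils.py` as the marker file is an assumption about this project's layout.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "utils.py") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} found at or above {start}")


def add_to_sys_path(root: Path) -> None:
    """Prepend `root` to sys.path once so modules in it can be imported."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
```

In a notebook this could be called as `add_to_sys_path(find_project_root(Path.cwd()))` before importing from `utils`.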

**Model Selection:**
Our `utils.py` script is configured to work with multiple AI providers. You can change the `model_name` parameter in the `setup_llm_client()` function to any of the models listed in the `RECOMMENDED_MODELS` dictionary in `utils.py`. For example, to use a Hugging Face model, you could change the line to: `client, model_name, api_provider = setup_llm_client(model_name="meta-llama/Llama-3.3-70B-Instruct")`

**Libraries Explained:**
- **`os`**, **`sys`**: Standard Python libraries for interacting with the file system and Python's path, ensuring our modules are discoverable.
- **`json`**: A standard library for working with JSON data. We use `json.loads` to parse the LLM's text output into a Python dictionary or list, and `json.dumps` to format Python objects into a pretty-printed JSON string for saving.
- **`utils`**: Our custom helper script. 
  - `setup_llm_client()`: Handles reading the `.env` file and initializing the API client.
  - `get_completion()`: Simplifies the process of sending a prompt to the LLM and receiving a text response.
  - `save_artifact()`: Ensures our project artifacts are stored consistently in the `artifacts` directory.
  - `clean_llm_output()`: Strips markdown code fences from LLM output so the remaining text can be parsed directly.
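
To make the fence-stripping and JSON round-trip concrete, here is a minimal sketch. The function `strip_code_fences` is a hypothetical stand-in written for illustration; it is not the implementation of `clean_llm_output()` in `utils.py`, which may behave differently.

```python
import json
import re

# Matches one surrounding fence with an optional language tag, e.g. ```json ... ```
_FENCE_RE = re.compile(r"^```[\w+-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return `text` without a single surrounding markdown code fence, if present."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1).strip() if match else stripped


# Illustrative LLM-style response wrapped in a json fence.
raw = '```json\n[{"id": 1, "persona": "New Hire"}]\n```'

stories = json.loads(strip_code_fences(raw))  # JSON text -> Python list of dicts
pretty = json.dumps(stories, indent=2)        # Python list -> indented JSON text
print(pretty)
```

Text that carries no fence passes through unchanged apart from whitespace trimming, so the function is safe to apply to every response.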

In [1]:
import sys
import os
import json

# Add the project's root directory to the Python path to ensure 'utils' can be imported.
# Assumes the notebook is in 'labs/Day_01_.../', two levels below the project root.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

# Fallback for different execution environments: if 'utils.py' is not found there,
# assume the notebook is being run from the project root itself.
if not os.path.isfile(os.path.join(project_root, 'utils.py')):
    project_root = os.path.abspath(os.getcwd())

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import setup_llm_client, get_completion, save_artifact, clean_llm_output

# Initialize the LLM client. You can change the model here.
# For example: setup_llm_client(model_name="gemini-2.5-flash")
client, model_name, api_provider = setup_llm_client(model_name="gpt-4o")

✅ LLM Client configured: Using 'openai' with model 'gpt-4o'


## Step 2: The Problem Statement

We define our starting point—a simple, high-level problem statement—as a Python variable. This makes it easy to reuse in multiple prompts.

In [2]:
problem_statement = "We need a tool to help our company's new hires get up to speed."

## Step 3: The Challenges

Here are the complete solutions for each challenge.

### Challenge 1 (Foundational): Brainstorming Features

**Explanation:**
This first challenge is about exploration. We use simple, direct prompts to get the LLM's initial thoughts on the problem. The goal is to generate a broad set of ideas (features and personas) that will serve as the raw material for the more structured tasks to follow. We expect the output to be human-readable markdown.

In [3]:
# This prompt is direct and open-ended, encouraging the LLM to be creative.
features_prompt = f"""
Based on the problem statement: '{problem_statement}', brainstorm a list of potential features for a new hire onboarding tool. 
Format the output as a simple markdown list.
"""

print("--- Brainstorming Features ---")
brainstormed_features = get_completion(features_prompt, client, model_name, api_provider)
print(brainstormed_features)

# This prompt asks for specific roles to ground the brainstorming in user-centric thinking.
personas_prompt = f"""
Based on the problem statement: '{problem_statement}', identify and describe three distinct user personas who would interact with this tool. 
For each persona, describe their role and main goal.
"""

print("\n--- Identifying User Personas ---")
user_personas = get_completion(personas_prompt, client, model_name, api_provider)
print(user_personas)

--- Brainstorming Features ---
- **Interactive Onboarding Checklist**
  - Step-by-step tasks for new hires to complete
  - Progress tracking and reminders

- **Welcome Message and Introduction Videos**
  - Personalized welcome message from the CEO or team leader
  - Video introductions to the company culture and values

- **Company Handbook and Policies Access**
  - Digital access to the company's handbook
  - Searchable database of company policies

- **Training Modules and Resources**
  - Interactive training courses and quizzes
  - Access to learning materials and tutorials

- **Team Directory and Org Chart**
  - Interactive organizational chart showing team structure
  - Contact information and bios for team members

- **Mentorship Program Integration**
  - Pairing new hires with mentors
  - Scheduled check-ins and mentoring resources

- **Task Management and Goal Setting**
  - Tools for setting initial work goals
  - Integration with task management software

- **Feedback and Surv

### Challenge 2 (Intermediate): Generating Formal User Stories

**Explanation:**
This challenge represents a significant increase in complexity and value. We are no longer asking for simple text; we are demanding a specific, structured data format (JSON). 

The prompt is carefully engineered:
1.  **Persona:** `You are a Senior Product Manager...` tells the LLM the role it should adopt.
2.  **Context:** We provide the previous outputs (`problem_statement`, `brainstormed_features`, `user_personas`) inside `<context>` tags to give the LLM all the necessary information.
3.  **Format:** The `OUTPUT REQUIREMENTS` section is extremely explicit. It tells the LLM to *only* output JSON, defines the exact keys for each object, and specifies the format for nested data (like the array of Gherkin strings). This strictness is key to getting reliable, machine-readable output.
4.  **Parsing:** The `try...except` block is a crucial step. It attempts to parse the LLM's string output into a Python list of dictionaries. If it succeeds, we know the LLM returned syntactically valid JSON; whether each story has the required keys is checked separately in Challenge 3. If it fails, we print the raw output to help debug the prompt.
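
A successful `json.loads` call only proves the text is valid JSON, not that it has the requested shape. The sketch below illustrates that distinction; `parse_story_array` is a hypothetical helper introduced here for illustration and is not part of the lab's `utils.py`.

```python
import json


def parse_story_array(text: str) -> list:
    """Parse `text` as JSON and confirm the top-level value is a list of objects."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError(f"Expected a JSON array, got {type(data).__name__}")
    if not all(isinstance(item, dict) for item in data):
        raise ValueError("Every element of the array must be a JSON object")
    return data


# Valid JSON, but an object rather than the requested array.
try:
    parse_story_array('{"stories": []}')
except ValueError as err:
    print(f"Rejected: {err}")
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a single `except ValueError` clause handles both malformed JSON and a wrong top-level shape.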

In [4]:
# The prompt is highly structured to guide the LLM toward valid, correctly shaped JSON output.
json_user_stories_prompt = f"""
You are a Senior Product Manager creating a product backlog for a new hire onboarding tool.

Based on the following context:
<context>
Problem Statement: {problem_statement}
Potential Features: {brainstormed_features}
User Personas: {user_personas}
</context>

Your task is to generate a list of 5 detailed user stories.

**OUTPUT REQUIREMENTS**:
- You MUST output a valid JSON array. Your response must begin with [ and end with ]. Do not include any text or markdown before or after the JSON array.
- Each object in the array must represent a single user story.
- Each object must have the following keys: 'id' (an integer), 'persona' (a string from the personas), 'user_story' (a string in the format 'As a [persona], I want [goal], so that [benefit].'), and 'acceptance_criteria' (an array of strings, with each string in Gherkin format 'Given/When/Then').
"""

print("--- Generating User Stories as JSON ---")
# We set a lower temperature to encourage the LLM to stick to the requested format.
json_output_str = get_completion(json_user_stories_prompt, client, model_name, api_provider, temperature=0.2)

print(f"Raw LLM response length: {len(json_output_str)} characters")
print("First 200 characters of response:")
print(repr(json_output_str[:200]))

# Attempt to parse the string output into a Python list.
try:
    # Use our new standardized cleaning function from utils.py
    cleaned_json_str = clean_llm_output(json_output_str, language='json')
    print(f"\nCleaned JSON length: {len(cleaned_json_str)} characters")
    
    user_stories_json = json.loads(cleaned_json_str)
    print("✅ Successfully parsed LLM output as JSON.")
    print(f"Number of user stories generated: {len(user_stories_json)}")
    
    # Pretty-print the first user story to verify its structure
    print("\n--- Sample User Story ---")
    print(json.dumps(user_stories_json[0], indent=2))
    
# KeyError covers the case where the LLM returned a JSON object instead of an array.
except (json.JSONDecodeError, TypeError, IndexError, KeyError) as e:
    print(f"❌ Error: Failed to parse LLM output as JSON. Error: {e}")
    print("\n--- DEBUGGING INFO ---")
    print("Raw LLM Output:")
    print("-" * 50)
    print(json_output_str)
    print("-" * 50)
    
    if 'cleaned_json_str' in locals():
        print("\nCleaned JSON:")
        print("-" * 50)
        print(cleaned_json_str)
        print("-" * 50)
    
    user_stories_json = [] # Assign an empty list to prevent errors in the next cell
    print("\n⚠️  Set user_stories_json to empty list to prevent downstream errors.")
    print("   Please check the API key configuration and re-run this cell.")

--- Generating User Stories as JSON ---
Raw LLM response length: 3161 characters
First 200 characters of response:
'```json\n[\n    {\n        "id": 1,\n        "persona": "New Hire - Entry-Level Employee",\n        "user_story": "As a New Hire - Entry-Level Employee, I want an interactive onboarding checklist, so that '

Cleaned JSON length: 3149 characters
✅ Successfully parsed LLM output as JSON.
Number of user stories generated: 5

--- Sample User Story ---
{
  "id": 1,
  "persona": "New Hire - Entry-Level Employee",
  "user_story": "As a New Hire - Entry-Level Employee, I want an interactive onboarding checklist, so that I can track my progress and ensure I complete all necessary tasks.",
  "acceptance_criteria": [
    "Given I am a new hire, When I log into the onboarding tool, Then I should see an interactive checklist with tasks to complete.",
    "Given I have completed a task, When I mark it as done, Then the checklist should update to reflect my progress.",
    "Given I have

### Challenge 3 (Advanced): Programmatic Validation and Artifact Creation

**Explanation:**
This is the final and most critical step. We treat the LLM's output as untrusted input and subject it to programmatic validation. This ensures that the artifact we create is reliable and can be consumed by other automated tools in later stages of the SDLC without causing errors. 

The `validate_and_save_stories` function acts as a gatekeeper. It checks that the data is a non-empty list, that every story contains all required keys, and that each story's acceptance criteria is a non-empty list. Only if all checks pass do we save the file using `save_artifact`, producing a `day1_user_stories.json` file that later labs can rely on.
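
The structural checks could be extended with a content check. The Challenge 2 prompt asks for acceptance criteria in 'Given/When/Then' form, so the sketch below flags criteria that do not follow that shape. The helper `non_gherkin_criteria` and its regular expression are assumptions made for illustration; they are not part of the lab's solution, and the pattern only checks that the three keywords appear in order within one string.

```python
import re

# Assumed convention: one string per criterion containing Given, When, Then in order.
_GHERKIN_RE = re.compile(r"^\s*Given\b.+\bWhen\b.+\bThen\b.+", re.IGNORECASE | re.DOTALL)


def non_gherkin_criteria(story: dict) -> list:
    """Return the acceptance criteria in `story` that lack the Given/When/Then shape.

    Structural problems (a missing key or a non-list value) are left to the main
    validator, so they yield an empty result here.
    """
    criteria = story.get("acceptance_criteria")
    if not isinstance(criteria, list):
        return []
    return [c for c in criteria if not (isinstance(c, str) and _GHERKIN_RE.match(c))]


story = {
    "id": 1,
    "acceptance_criteria": [
        "Given I have completed a task, When I mark it as done, "
        "Then the checklist should update to reflect my progress.",
        "The checklist works.",
    ],
}
print(non_gherkin_criteria(story))
```

A check like this works better as a warning than as a hard failure, since it is a heuristic over free text rather than a strict grammar.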

In [5]:
def validate_and_save_stories(stories_data):
    """Validates the structure of the user stories data and saves it if valid."""
    if not isinstance(stories_data, list) or not stories_data:
        print("Validation Failed: Data is not a non-empty list.")
        return False

    required_keys = ['id', 'persona', 'user_story', 'acceptance_criteria']
    all_stories_valid = True

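    # Loop through each story object in the list.
    for i, story in enumerate(stories_data):
        # Each story must be a dictionary; otherwise the key checks below could
        # raise a TypeError or silently do substring matching on a string.
        if not isinstance(story, dict):
            print(f"Validation Failed: Story at index {i} is not a dictionary (found {type(story).__name__}).")
            all_stories_valid = False
            continue

        # Check for the presence of all required keys.
        if not all(key in story for key in required_keys):
            print(f"Validation Failed: Story at index {i} is missing one or more required keys.")
            print(f"   Expected keys: {required_keys}")
            print(f"   Found keys: {list(story.keys())}")
            all_stories_valid = False
            continue # Don't bother with further checks for this invalid story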
        
        # Check that the acceptance criteria is a list with at least one item.
        ac = story.get('acceptance_criteria')
        if not isinstance(ac, list) or not ac:
            print(f"Validation Failed: Story at index {i} (ID: '{story.get('id')}') has invalid or empty acceptance criteria.")
            print("   Expected: list with at least one item")
            print(f"   Found: {type(ac)} with value {ac}")
            all_stories_valid = False

    # Only save the artifact if all stories in the list are valid.
    if all_stories_valid:
        print(f"\n✅ All {len(stories_data)} user stories passed validation.")
        artifact_path = "artifacts/day1_user_stories.json"
        
        # Use the helper function to save the file, creating the 'artifacts' directory if needed.
        # We use json.dumps with an indent to make the saved file human-readable.
        save_artifact(json.dumps(stories_data, indent=2), artifact_path)
        return True
    else:
        print("\n❌ Validation failed for one or more stories. Artifact not saved.")
        return False

# Note: The validation call itself runs in a later cell, after a diagnostic check of user_stories_json.

In [6]:
# Diagnostic: Check the current state of user_stories_json
print("=== DIAGNOSTIC INFO ===")
if 'user_stories_json' in locals():
    print(f"user_stories_json exists: {type(user_stories_json)}")
    print(f"Length: {len(user_stories_json) if hasattr(user_stories_json, '__len__') else 'N/A'}")
    if user_stories_json:
        print("Sample content:", user_stories_json[0] if len(user_stories_json) > 0 else "Empty list")
    else:
        print("user_stories_json is empty or falsy")
        print("This means JSON parsing likely failed in the previous cell.")
        print("Check the raw LLM output above for formatting issues.")
else:
    print("user_stories_json variable does not exist")
    print("This means the previous cell never executed successfully")

# Also check if we have the raw output
if 'json_output_str' in locals():
    print(f"\nRaw LLM output length: {len(json_output_str)} characters")
    print("First 200 characters of raw output:")
    print(repr(json_output_str[:200]))
else:
    print("json_output_str not available")
print("========================")

=== DIAGNOSTIC INFO ===
user_stories_json exists: <class 'list'>
Length: 5
Sample content: {'id': 1, 'persona': 'New Hire - Entry-Level Employee', 'user_story': 'As a New Hire - Entry-Level Employee, I want an interactive onboarding checklist, so that I can track my progress and ensure I complete all necessary tasks.', 'acceptance_criteria': ['Given I am a new hire, When I log into the onboarding tool, Then I should see an interactive checklist with tasks to complete.', 'Given I have completed a task, When I mark it as done, Then the checklist should update to reflect my progress.', 'Given I have pending tasks, When a deadline is approaching, Then I should receive a reminder notification.']}

Raw LLM output length: 3161 characters
First 200 characters of raw output:
'```json\n[\n    {\n        "id": 1,\n        "persona": "New Hire - Entry-Level Employee",\n        "user_story": "As a New Hire - Entry-Level Employee, I want an interactive onboarding checklist, so that '


In [7]:
# Run the validation function on the data we parsed from the LLM.
print("=== VALIDATION STEP ===")

if 'user_stories_json' not in locals():
    print("❌ ERROR: user_stories_json variable not found.")
    print("   Make sure to run the previous cell that generates user stories.")
elif not user_stories_json:
    print("❌ ERROR: user_stories_json is empty or None.")
    print("   This usually means JSON parsing failed in the previous step.")
    print("   Solutions:")
    print("   1. Check that your API keys are correctly configured")
    print("   2. Re-run the previous cell to generate user stories")
    print("   3. Examine the raw LLM output for formatting issues")
    
    # Try to re-parse if we have the raw output
    if 'json_output_str' in locals() and json_output_str.strip():
        print("\n🔄 Attempting to re-parse the JSON...")
        try:
            cleaned_json_str = clean_llm_output(json_output_str, language='json')
            user_stories_json = json.loads(cleaned_json_str)
            print("✅ Re-parsing successful! Proceeding with validation...")
            validate_and_save_stories(user_stories_json)
        except (json.JSONDecodeError, TypeError) as e:
            print(f"❌ Re-parsing failed: {e}")
            print("Raw output that failed to parse:")
            print("-" * 50)
            print(json_output_str)
            print("-" * 50)
else:
    print(f"✅ Found user_stories_json with {len(user_stories_json)} stories")
    validate_and_save_stories(user_stories_json)

=== VALIDATION STEP ===
✅ Found user_stories_json with 5 stories

✅ All 5 user stories passed validation.
✅ Successfully saved artifact to: artifacts/day1_user_stories.json


## Lab Conclusion

Congratulations! You have completed the first lab. You started with a vague, one-sentence problem and finished with a structured, validated, machine-readable requirements artifact. This is the critical first step in an AI-assisted software development lifecycle. The `day1_user_stories.json` file you created will be the direct input for our next lab, where we will generate a formal Product Requirements Document (PRD).

> **Key Takeaway:** The single most important skill demonstrated in this lab is turning unstructured ideas into structured, machine-readable data (JSON). This transformation is what enables automation and integration with other tools later in the SDLC.
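
As a small illustration of what machine-readable output enables, the sketch below groups stories by persona. The function `group_stories_by_persona` and the sample data are hypothetical and exist only for illustration; in practice the JSON text would be read from the saved `day1_user_stories.json` artifact.

```python
import json
from collections import defaultdict


def group_stories_by_persona(stories: list) -> dict:
    """Map each persona to the list of user story sentences written for it."""
    grouped = defaultdict(list)
    for story in stories:
        grouped[story["persona"]].append(story["user_story"])
    return dict(grouped)


# Illustrative data in the artifact's shape, not real lab output.
artifact_text = json.dumps([
    {"id": 1, "persona": "New Hire", "user_story": "As a New Hire, I want a checklist, so that I can track progress."},
    {"id": 2, "persona": "Manager", "user_story": "As a Manager, I want a dashboard, so that I can monitor onboarding."},
    {"id": 3, "persona": "New Hire", "user_story": "As a New Hire, I want a team directory, so that I can find colleagues."},
])

backlog = group_stories_by_persona(json.loads(artifact_text))
for persona, items in backlog.items():
    print(f"{persona}: {len(items)} stories")
```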