## Step 1: Setup

**Explanation:**
We load the `day1_prd.md` artifact from Day 1. This document is the single source of truth for our project's requirements and provides the essential context for the LLM to generate a relevant and accurate database schema.
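
The setup cell resolves the project root with a fixed number of `..` hops, so the result depends on where the notebook is launched; in the recorded run below it resolves to `c:\`. As a hedged alternative, the following sketch walks up from the working directory until it finds a marker entry. The helper name `find_project_root` and the default marker `utils` are assumptions made for illustration; they are not part of the course `utils` module.

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "utils") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    `marker` is an assumed file or directory name that identifies the project
    root; change it to something your repository actually contains.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # No ancestor contains the marker.


root = find_project_root(Path.cwd())
if root is None:
    print("Marker not found; leaving sys.path unchanged.")
elif str(root) not in sys.path:
    sys.path.insert(0, str(root))
```

Searching for a marker trades a little extra code for independence from the notebook's nesting depth.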

In [2]:
import sys
import os
import sqlite3

# Add the project's root directory (two levels up) to the Python path so 'utils' can be imported.
# os.path.abspath/os.path.join do not raise IndexError, so no try/except is needed here.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

print(f"Current working directory: {os.getcwd()}")
print(f"Project root directory: {project_root}")

from utils import setup_llm_client, get_completion, save_artifact, load_artifact, clean_llm_output, recommended_models_table, prompt_enhancer

# Initialize a separate LLM client per artifact so each model can be swapped independently.
# Both clients currently use gemini-2.5-pro:
# - schema_client generates the SQL schema
# - seed_client generates the seed data
schema_client, schema_model_name, schema_api_provider = setup_llm_client(model_name="gemini-2.5-pro")
seed_client, seed_model_name, seed_api_provider = setup_llm_client(model_name="gemini-2.5-pro")

# Load the PRD
prd_path = "artifacts/adr_001_database_choice.md"
prd_content = load_artifact(prd_path)
if not prd_content:
    print(f"Warning: Could not load {prd_path}. Lab may not function correctly.")

Current working directory: c:\aiswe\220372-AG-AISOFTDEV-Team-2-CodeVoyagers
Project root directory: c:\


2025-11-05 14:44:23,365 ag_aisoftdev.utils INFO LLM Client configured provider=google model=gemini-2.5-pro latency_ms=None artifacts_path=None
2025-11-05 14:44:24,536 ag_aisoftdev.utils INFO LLM Client configured provider=google model=gemini-2.5-pro latency_ms=None artifacts_path=None


### Step 2 - Generating the SQL Schema

**Explanation:**
This prompt instructs the LLM to act as a Database Administrator (DBA). By providing the full PRD as context, we enable the LLM to understand the entities and relationships required by the application. The prompt specifically asks for `CREATE TABLE` statements, guiding the LLM to produce a ready-to-use SQL script. We then clean up the response to remove markdown fences and save the pure SQL code.
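
To make the cleanup step concrete, here is a minimal sketch of what stripping markdown fences from a model response could look like. It is not the implementation of `clean_llm_output`, whose internals are not shown in this notebook; the function name `strip_sql_fences` is hypothetical.

```python
import re

FENCE = "`" * 3  # Three backticks, built programmatically to keep this example readable.

# Opening fence with an optional language tag, then the body, then a closing fence.
_FENCE_RE = re.compile(FENCE + r"[A-Za-z0-9_+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def strip_sql_fences(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if there is none."""
    match = _FENCE_RE.search(text)
    body = match.group(1) if match else text
    return body.strip()


response = "Here is the schema:\n" + FENCE + "sql\nCREATE TABLE users (id INTEGER PRIMARY KEY);\n" + FENCE
print(strip_sql_fences(response))
```

Only the first fenced block is kept, so any commentary the model adds before or after the SQL is discarded.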

In [None]:
schema_prompt = f"""
You are a Database Administrator. Based on the provided PRD, generate SQL `CREATE TABLE` statements to define the database schema for an onboarding tool.

**PRD Context:**
<prd>
{prd_content}
</prd>

Ensure the schema includes:
1. A `users` table with fields for `id`, `name`, `email`, and `role`.
2. An `onboarding_tasks` table with fields for `id`, `title`, `description`, `assigned_to` (foreign key to `users`), and `status`.
3. An `applicants` table that matches the fields used by index.html. This table should include:
    - `id` (primary key, integer, autoincrement)
    - `name` (text)
    - `title` (text)
    - `experience_years` (integer)
    - `education` (text)
    - `location` (text)
    - `skills` (text) -- store as a comma-separated list or JSON text
    - `summary` (text)

Make sure to include proper data types, primary keys, foreign keys, and sensible constraints (e.g., unique on emails where appropriate, not null where required). If any relationships are needed between `applicants` and other tables, include them and document via comments in the SQL.

Output only the raw SQL code.
"""

print("--- Generating SQL Schema ---")
cleaned_schema = ""  # Defined up front so downstream cells can always reference it.
if prd_content:
    # Try to enhance the prompt; fall back to the original if enhancement fails.
    try:
        enhanced_schema_prompt = prompt_enhancer(schema_prompt)
        print("Schema Enhanced prompt\n", enhanced_schema_prompt)
    except Exception as e:
        print(f"Prompt enhancement failed ({e}); falling back to original prompt.")
        enhanced_schema_prompt = schema_prompt

    try:
        generated_schema = get_completion(enhanced_schema_prompt, schema_client, schema_model_name, schema_api_provider)
    except Exception as e:
        print(f"Schema generation failed ({e}). Aborting schema creation.")
        generated_schema = ""

    if generated_schema:
        cleaned_schema = clean_llm_output(generated_schema, language='sql')
    if cleaned_schema:
        print(cleaned_schema)
        save_artifact(cleaned_schema, "artifacts/schema.sql", overwrite=True)
    else:
        print("No schema produced.")
else:
    print("Skipping schema generation because PRD is missing.")

--- Generating SQL Schema ---


2025-11-05 14:54:04,769 ag_aisoftdev.utils INFO LLM Client configured provider=openai model=o3 latency_ms=None artifacts_path=None


ProviderOperationError: [openai:o3] prompt enhancement error: [openai:o3] completion error: Connection error.

### Step 3 - Generating Realistic Seed Data

**Explanation:**
An empty database isn't very useful for development. This prompt asks the LLM to generate realistic seed data. By providing both the PRD (for thematic context) and the SQL schema (for structural correctness), we guide the LLM to create `INSERT` statements that are both thematically appropriate (e.g., onboarding-related task titles) and syntactically correct.
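
Syntactic correctness of LLM-generated SQL is worth checking rather than assuming. One hedged way to do that is to run the schema and the seed script against a throwaway in-memory SQLite database before saving anything. The helper name `check_sql_against_schema` and the sample tables below are illustrative assumptions, not part of the lab's `utils` module.

```python
import sqlite3
from typing import Optional


def check_sql_against_schema(schema_sql: str, seed_sql: str) -> Optional[str]:
    """Run the schema, then the seed SQL, in a throwaway in-memory database.

    Returns None when both scripts execute, otherwise the SQLite error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default.
        conn.executescript(schema_sql)
        conn.executescript(seed_sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Illustrative inputs only; the real schema comes from the previous step.
schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE);"
good_seed = "INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com');"
bad_seed = "INSERT INTO users (name, nickname) VALUES ('Ada', 'A');"

print(check_sql_against_schema(schema, good_seed))  # None: the seed matches the schema
print(check_sql_against_schema(schema, bad_seed))   # an error message naming the unknown column
```

A non-`None` result could be fed back to the model as part of a retry prompt, or simply used to skip saving a broken seed file.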

In [None]:
seed_data_prompt = f"""
You are a data specialist. Based on the provided PRD and SQL schema, generate 5-10 realistic SQL `INSERT` statements to populate the tables with sample data for an onboarding tool.

**PRD Context:**
<prd>
{prd_content}
</prd>

**SQL Schema:**
<schema>
{cleaned_schema}
</schema>

Generate at least 5 project managers and 3 employees.
Output only the raw SQL `INSERT` statements.
"""

print("--- Generating Seed Data ---")
if prd_content and cleaned_schema:
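    # Enhance the seed data prompt for better structure and fidelity;
    # fall back to the original prompt if enhancement fails.
    try:
        enhanced_seed_prompt = prompt_enhancer(seed_data_prompt)
        print("Seed Data Enhanced prompt\n", enhanced_seed_prompt)
    except Exception as e:
        print(f"Prompt enhancement failed ({e}); falling back to original prompt.")
        enhanced_seed_prompt = seed_data_prompt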

    # Use the seed-data specific client
    generated_seed_data = get_completion(enhanced_seed_prompt, seed_client, seed_model_name, seed_api_provider)

    # Clean up the generated seed data
    cleaned_seed_data = clean_llm_output(generated_seed_data, language='sql')
    print(cleaned_seed_data)

    # Save the cleaned seed data to a file
    save_artifact(cleaned_seed_data, "artifacts/seed_data.sql", overwrite=True)
else:
    print("Skipping seed data generation because PRD or schema is missing.")

### Step 4 - Creating and Seeding a Live Database

**Explanation:**
This Python function demonstrates a crucial engineering task: turning text-based artifacts into a live system component. The `create_database` function uses Python's built-in `sqlite3` library.
1.  It establishes a connection to a database file, which creates the file if it doesn't exist.
2.  It reads the `schema.sql` artifact and executes it. It's important to use `cursor.executescript()` here. While `cursor.execute()` is designed for a single SQL statement, `executescript()` is necessary for running a string that contains multiple SQL statements, which is exactly what our `schema.sql` and `seed_data.sql` files contain.
3.  It then reads and executes the `seed_data.sql` artifact to populate the newly created tables.
4.  `conn.commit()` saves all the changes to the database file.
5.  The `finally` block ensures that `conn.close()` is always called, which is a critical best practice to prevent resource leaks.
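
The difference between `execute()` and `executescript()` described in step 2 can be seen in a small, self-contained sketch. The `demo` table and its rows are made up for illustration. The example also uses `contextlib.closing` as an alternative to an explicit `finally` block; a `sqlite3` connection used directly as a context manager commits or rolls back but does not close itself, which is why `closing` is used here.

```python
import sqlite3
from contextlib import closing

script = """
CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
INSERT INTO demo (label) VALUES ('first');
INSERT INTO demo (label) VALUES ('second');
"""

with closing(sqlite3.connect(":memory:")) as conn:
    try:
        conn.execute(script)  # Contains several statements, so execute() rejects it.
        execute_rejected = False
    except (sqlite3.Warning, sqlite3.Error) as exc:
        # The exception class differs between Python versions, so both are caught.
        execute_rejected = True
        print(f"execute() rejected the script: {exc}")

    conn.executescript(script)  # Runs every statement in the string.
    conn.commit()
    row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    print(f"Rows after executescript(): {row_count}")  # Rows after executescript(): 2
```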

In [None]:
def create_database(db_path, schema_path, seed_path):
    """Creates and seeds a SQLite database from SQL files."""
    if not os.path.exists(schema_path):
        print(f"Error: Schema file not found at {schema_path}")
        return

    # Delete the old database file if it exists to start fresh
    if os.path.exists(db_path):
        os.remove(db_path)
        print(f"Removed existing database file at {db_path}")

    conn = None
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        print(f"Successfully connected to database at {db_path}")

        # Read and execute the schema file
        schema_sql = load_artifact(schema_path)
        if schema_sql:
            cursor.executescript(schema_sql)
            print("Tables created successfully.")

        # Read and execute the seed data file if it exists
        if os.path.exists(seed_path):
            seed_sql = load_artifact(seed_path)
            if seed_sql:
                cursor.executescript(seed_sql)
                print("Seed data inserted successfully.")

        conn.commit()
        print("Database changes committed.")

    except sqlite3.Error as e:
        print(f"Database error: {e}")
    finally:
        if conn:
            conn.close()
            print("Database connection closed.")

# Define file paths
db_file = os.path.join(os.getcwd(), "artifacts", "main_database.db")
schema_file = os.path.join(os.getcwd(), "artifacts", "schema.sql")
seed_file = os.path.join(os.getcwd(), "artifacts", "seed_data.sql")

# Execute the function
create_database(db_file, schema_file, seed_file)

### Step 5 - Verifying the Database
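
Because the schema is generated by an LLM, the table names requested in the prompt are not guaranteed to exist in the resulting database. A cautious variant of the check, sketched here under that assumption, first reads the table list from `sqlite_master` and only counts rows for tables that are present. The helper name `count_rows` is hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Dict, Iterable, Optional


def count_rows(db_path: str, expected_tables: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each expected table to its row count, or None if the table is absent."""
    with closing(sqlite3.connect(db_path)) as conn:
        existing = {
            row[0]
            for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        }
        counts: Dict[str, Optional[int]] = {}
        for table in expected_tables:
            if table in existing:
                # Interpolation is acceptable here: the name was just read back from sqlite_master.
                counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            else:
                counts[table] = None
        return counts


# Hypothetical usage with the table names requested in the schema prompt:
# print(count_rows(db_file, ["users", "applicants", "onboarding_tasks"]))
```

A `None` entry signals a missing table without raising, which makes it easier to tell a schema mismatch apart from an empty table.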


In [None]:
# Print row counts and a few sample rows from each table
def verify_database(db_path):
    """Print row counts and sample rows to confirm the database was seeded."""
    if not os.path.exists(db_path):
        print(f"Database file not found at {db_path}")
        return
    
    conn = None
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        
        # Check users table
        cursor.execute("SELECT COUNT(*) FROM users")
        user_count = cursor.fetchone()[0]
        print(f"Users table contains {user_count} records")
        
        # Check applicants table
        cursor.execute("SELECT COUNT(*) FROM applicants")
        applicant_count = cursor.fetchone()[0]
        print(f"Applicants table contains {applicant_count} records")
        
        # Check onboarding_tasks table
        cursor.execute("SELECT COUNT(*) FROM onboarding_tasks")
        task_count = cursor.fetchone()[0]
        print(f"Onboarding_tasks table contains {task_count} records")
        
        # Show sample data
        print("\n--- Sample Users ---")
        cursor.execute("SELECT id, name, email, role FROM users LIMIT 3")
        for row in cursor.fetchall():
            print(f"ID: {row[0]}, Name: {row[1]}, Email: {row[2]}, Role: {row[3]}")
            
        print("\n--- Sample Applicants ---")
        cursor.execute("SELECT id, name, title, experience_years FROM applicants LIMIT 3")
        for row in cursor.fetchall():
            print(f"ID: {row[0]}, Name: {row[1]}, Title: {row[2]}, Experience: {row[3]} years")
            
        print("\n--- Sample Tasks ---")
        cursor.execute("SELECT id, title, status, assigned_to FROM onboarding_tasks LIMIT 3")
        for row in cursor.fetchall():
            print(f"ID: {row[0]}, Title: {row[1]}, Status: {row[2]}, Assigned to: {row[3]}")
        
    except sqlite3.Error as e:
        print(f"Database error: {e}")
    finally:
        if conn:
            conn.close()

# Verify the database
verify_database(db_file)