# Test Assessment Generation

This notebook runs the BridgeAI assessment generator so you can try different job descriptions and see:

1. **Step 1 output** — Requirements extraction: inferred stack, level, confidence, summary, and key skills from the job description.
2. **Final assessment** — The generated title, description, and time limit as candidates will see it.

**Setup:**
1. Install Jupyter: `pip install jupyter` (if needed).
2. From the **repo root** (`bridge-assessements/`): run `jupyter notebook` or `jupyter lab`, then open `notebooks/test-assessment-generation.ipynb`.
3. Ensure `server/config.env` exists and contains an API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`). The generation cell loads this file through Node's `--env-file` flag, which requires Node 20.6+. To set the variables in your shell as well, run this from the repo root before launching Jupyter: `export $(grep -v '^#' server/config.env | xargs)`.
4. Edit the job description in the next cell and run the cells (Run All, or run cell by cell).

**Note:** The notebook runs the same TypeScript script the app uses (`server/src/scripts/test-assessment-generation.ts`) with `--steps` so you get both Step 1 and the final assessment. No server or database required.
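
The shell `export` one-liner in step 3 can also be done from Python inside the notebook. The sketch below is a minimal version under assumptions: it presumes `config.env` holds plain `KEY=VALUE` lines with `#` comments, and `parse_env_lines` and `load_env_file` are hypothetical helpers that do not exist in the repo.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path):
    """Hypothetical helper: copy variables from an env file into os.environ.

    Variables that are already set in the environment are left untouched.
    Returns the sorted list of keys found in the file.
    """
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return sorted(values)
```

Calling `load_env_file("server/config.env")` from the repo root before the generation cell would make the keys visible to the subprocess, since that cell passes a copy of `os.environ` to it.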

In [11]:
# Edit this job description and re-run the cell (and the one below) to see new output.
JOB_DESCRIPTION = """
Backend Engineer – Node.js

We're looking for a backend engineer to build and maintain APIs and services.

Requirements:
- 2+ years experience with Node.js and TypeScript
- Experience with REST APIs and relational databases (PostgreSQL)
- Familiarity with authentication (JWT, OAuth) and rate limiting
- Comfort with testing (unit and integration)
- Good communication and collaboration skills

Nice to have: Redis, message queues, Docker/Kubernetes.
"""

In [12]:
import subprocess
import tempfile
import os
import json

env = {**os.environ}

# Repo root: if cwd is notebooks/, go up one level; otherwise use cwd (e.g. when run from repo root)
REPO_ROOT = os.getcwd()
if os.path.basename(REPO_ROOT) == "notebooks":
    REPO_ROOT = os.path.dirname(REPO_ROOT)
SERVER_DIR = os.path.join(REPO_ROOT, "server")
SCRIPT_PATH = os.path.join(SERVER_DIR, "src", "scripts", "test-assessment-generation.ts")

if not os.path.exists(SCRIPT_PATH):
    raise FileNotFoundError(f"Script not found: {SCRIPT_PATH}. Repo root used: {REPO_ROOT}")

with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
    f.write(JOB_DESCRIPTION)
    temp_path = f.name

step1_result = None
assessment = None

try:
    print("Running assessment generator (Step 1 + Step 2)...")
    # Load server/config.env via Node so API keys are available (Node 20.6+)
    result = subprocess.run(
        ["node", "--env-file=config.env", "./node_modules/.bin/tsx", "src/scripts/test-assessment-generation.ts", temp_path, "--steps"],
        cwd=SERVER_DIR,
        capture_output=True,
        text=True,
        timeout=120,
        env=env,
    )
    if result.returncode != 0:
        print(result.stdout)
        if result.stderr:
            print("STDERR:", result.stderr)
        print("Exit code:", result.returncode)
    else:
        lines = [ln.strip() for ln in result.stdout.strip().split("\n") if ln.strip()]
        # Scan from the end: the result JSON is expected on one of the last lines.
        for line in reversed(lines):
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(data, dict):
                continue  # skip log lines that parse as a bare number or string
            if "step1" in data and "assessment" in data:
                step1_result = data["step1"]
                assessment = data["assessment"]
                print("OK — Step 1 (requirements extraction) and final assessment parsed.")
                break
            if "title" in data and "description" in data and "timeLimit" in data:
                assessment = data
                print("OK — title, description, timeLimit parsed (no Step 1 in output).")
                break
        if assessment is None:
            print("Could not parse script output as JSON. Raw stdout (last 500 chars):")
            print(result.stdout[-500:])
finally:
    os.unlink(temp_path)

Running assessment generator (Step 1 + Step 2)...
OK — Step 1 (requirements extraction) and final assessment parsed.
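
For reuse across several job descriptions, the parsing step in the cell above can be factored into a function. This is a sketch only: `extract_result` is a hypothetical name, and the two JSON shapes it accepts are the ones the cell above checks for, not a documented contract of the script.

```python
import json


def extract_result(stdout):
    """Return (step1, assessment) from the last matching JSON line.

    Returns (None, None) when no line of stdout has a recognised shape.
    """
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line:
            continue
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict):
            continue  # a bare number or string is not a result object
        if "step1" in data and "assessment" in data:
            return data["step1"], data["assessment"]
        if all(key in data for key in ("title", "description", "timeLimit")):
            return None, data
    return None, None
```

With this in place, the cell could call `step1_result, assessment = extract_result(result.stdout)` and report a parse failure when `assessment` is `None`.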


In [13]:
# Step 1 output: requirements extraction (inferred stack, level, summary)
from html import escape
from IPython.display import display, Markdown, HTML

if step1_result is None:
    print("No Step 1 data. Run the cell above first (with --steps).")
else:
    s = step1_result
    # Escape every model-provided value before interpolating it into HTML.
    key_skills = ", ".join(escape(str(k)) for k in (s.get("keySkills") or []))
    suggested_scope = escape(str(s.get("suggestedScope") or ""))
    display(HTML("""
    <div style="background:#F0FDF4; border:1px solid #86EFAC; padding:16px 20px; border-radius:12px; margin-bottom:16px;">
        <h3 style="margin:0 0 12px 0; font-size:1rem; color:#166534;">Step 1 — Requirements extraction</h3>
        <p style="margin:0 0 8px 0; color:#15803D;"><strong>Summary:</strong> {summary}</p>
        <p style="margin:0 0 8px 0; color:#15803D;"><strong>Stack:</strong> {stack} (confidence: {stack_conf})</p>
        <p style="margin:0 0 8px 0; color:#15803D;"><strong>Level:</strong> {level} (confidence: {level_conf})</p>
        {key_skills}
        {suggested_scope}
    </div>
    """.format(
        summary=escape(str(s.get("summary") or "")),
        stack=escape(str(s.get("stack", "—"))),
        stack_conf=escape(str(s.get("stackConfidence", "—"))),
        level=escape(str(s.get("level", "—"))),
        level_conf=escape(str(s.get("levelConfidence", "—"))),
        key_skills='<p style="margin:0 0 8px 0; color:#15803D;"><strong>Key skills:</strong> {}</p>'.format(key_skills) if key_skills else "",
        suggested_scope='<p style="margin:0; color:#15803D;"><strong>Suggested scope:</strong> {}</p>'.format(suggested_scope) if suggested_scope else "",
    )))

## Final assessment (candidate view)

Below is how the generated assessment will look to candidates: title, time limit, and full project instructions (description).

In [14]:
# Preview: how this assessment will look to candidates (matches CandidateAssessment.jsx)
from IPython.display import display, Markdown, HTML

if assessment is None:
    print("No assessment to preview. Run the cell above first.")
else:
    # Warn if generation failed or description is a fallback (not full project instructions)
    desc = assessment.get("description", "")
    if desc.startswith("Assessment generation failed") or desc.startswith("Assessment generation could not"):
        display(HTML("""
        <div style="background:#FEF2F2; border:1px solid #FCA5A5; padding:16px 20px; border-radius:12px; margin-bottom:16px;">
            <h3 style="margin:0 0 8px 0; font-size:1rem; color:#991B1B;">Generation failed</h3>
            <p style="margin:0; color:#B91C1C;">The script could not produce a full assessment. Check that <code>server/config.env</code> has a valid API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY) and run the cell above again. If running from repo root, ensure the script can load config (e.g. run from <code>server/</code> or use Node 20.6+ with <code>--env-file=config.env</code>).</p>
        </div>
        """))
    # Show review feedback if present (from ENABLE_ASSESSMENT_REVIEW step)
    if assessment.get("reviewFeedback"):
        # Escape HTML first, then turn newlines (real or literal "\n") into <br>
        fb = (
            assessment["reviewFeedback"]
            .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
            .replace("\\n", "\n").replace("\n", "<br>")
        )
        display(HTML(f"""
        <div style="background:#EFF6FF; border:1px solid #93C5FD; padding:16px 20px; border-radius:12px; margin-bottom:16px;">
            <h3 style="margin:0 0 8px 0; font-size:1rem; color:#1E3A8A;">Quality check (review feedback)</h3>
            <p style="margin:0; color:#1E40AF; line-height:1.5;">{fb}</p>
        </div>
        """))
    mins = assessment["timeLimit"]
    if mins >= 60:
        # Pluralize on the hour count, so 90 min reads "1 hour 30 min", not "1 hours 30 min"
        hours, rem = divmod(mins, 60)
        time_str = f"{hours} hour{'s' if hours != 1 else ''}" + (f" {rem} min" if rem else "")
    else:
        time_str = f"{mins} minutes"

    # Candidate-style header (blue) + card (white) with "Project Instructions"
    html_header = f"""
    <div style="background:#1E3A8A; color:white; padding:24px 32px; border-radius:12px 12px 0 0; text-align:center;">
        <h1 style="margin:0 0 4px 0; font-size:1.5rem;">{assessment['title']}</h1>
        <p style="margin:0; color:#93C5FD;">Technical Assessment</p>
        <p style="margin:8px 0 0 0; font-size:0.875rem; color:#BFDBFE;">{time_str} to complete</p>
    </div>
    <div style="border:1px solid #e5e7eb; border-top:none; padding:24px; border-radius:0 0 12px 12px; background:#fff; margin-bottom:16px;">
        <h2 style="font-size:1.125rem; margin:0 0 16px 0; color:#111;">Project Instructions</h2>
    </div>
    """
    display(HTML(html_header))
    # Description is Markdown (candidates see it rendered the same way via ReactMarkdown)
    display(Markdown(assessment["description"]))

## Scenario
You are tasked with building a backend service for a messaging application called **ChatConnect**. This service will handle user accounts and message sending between users. The primary focus will be on creating a RESTful API that allows users to register, log in, and send messages to each other.

## What you will build
You will develop a RESTful API that supports user registration, authentication, and sending messages between users. The API should be built using **Node.js** and **TypeScript** and should interact with a **PostgreSQL** database.

## Requirements (must-have)
- Implement a user registration endpoint that accepts a username and password, validates the input, and stores the user in the database.
- Create a login endpoint that authenticates users using **JWT** for session management.
- Develop an endpoint to send messages between users, which should include sender ID, receiver ID, and message content.
- Ensure that all endpoints are secured and rate-limited to prevent abuse.
- Write unit and integration tests for the key functionalities of the API.

## Acceptance Criteria (definition of done)
- [ ] User can register with a unique username and password, receiving a success response.
- [ ] User can log in with valid credentials and receive a JWT token.
- [ ] User cannot register with an already taken username, receiving an appropriate error message.
- [ ] User can send a message to another user and receive a success response.
- [ ] User cannot send a message to themselves, receiving an appropriate error message.
- [ ] All endpoints are protected and require a valid JWT token for access.
- [ ] Rate limiting is implemented on all endpoints to restrict excessive requests.
- [ ] Unit tests cover user registration and login functionalities.
- [ ] Integration tests validate the message sending process between users.
- [ ] A README file is provided with setup instructions and API documentation.

## Constraints
- The project must be implementable in a new, empty repository without any external API keys, cloud accounts, or paid services.
- Use an in-memory SQLite database for local development, with the option to switch to PostgreSQL for production.
- Do not implement any frontend components or real-time features such as WebSockets.

## Provided / Assumptions
- Candidate may create their own seed data or use in-memory data for testing.
- Authentication will be simplified using JWT and will not require third-party services.
- The candidate can assume familiarity with basic user authentication flows and REST API design.

## Deliverables
1. A complete Node.js application with TypeScript.
2. A PostgreSQL or SQLite database schema for users and messages.
3. A set of unit and integration tests.
4. A README file with instructions on how to run the application and use the API.

## Nice-to-haves (optional)
- Implement user profile management (e.g., updating user details).
- Add message retrieval functionality (fetching messages between users).
- Implement caching with Redis for message storage.
- Containerize the application using Docker.

---
## How to judge quality

Use the **checklist** below (subjective) and the **Quality snapshot** in the next cell (objective checks from the prompt rules).

**Checklist (subjective)**  
- **Fit to role:** Does the scenario and tech stack match the job description?  
- **Specific, not generic:** Is it a concrete task (e.g. “API for article CRUD”) rather than “build a full‑stack app”?  
- **Scopable in time:** Could a strong candidate finish in the given time limit?  
- **Clear requirements:** Are must-haves unambiguous? Is “definition of done” observable (not vague)?  
- **Fair:** Are constraints and “provided/assumptions” clear so candidates aren’t penalized for guessing?

In [15]:
# Quality snapshot (objective checks from prompt rules in server/src/prompts/index.ts)
import re

if assessment is None:
    print("No assessment. Run the generation cell first.")
else:
    desc = assessment.get("description", "")
    title = assessment.get("title", "")
    time_limit = assessment.get("timeLimit", 0)

    word_count = len(desc.split())
    sections = re.findall(r"^##\s+(.+)$", desc, re.MULTILINE)
    checklist_items = re.findall(r"^\s*-\s*\[\s*\]", desc, re.MULTILINE)

    time_ok = 30 <= time_limit <= 480
    words_ok = 300 <= word_count <= 650
    # Key phrases from prompt (flexible match)
    required_phrases = [
        "scenario",
        "what you will build",
        "requirements",
        "acceptance criteria",
        "constraints",
        "provided",
        "assumptions",
        "deliverables",
        "nice-to-have",
    ]
    section_text = " ".join(s.strip().lower() for s in sections)
    missing = [p for p in required_phrases if p not in section_text]
    criteria_ok = len(checklist_items) >= 10

    print("━━━ Quality snapshot ━━━\n")
    print(f"  Title length:     {len(title)} chars")
    print(f"  Time limit:       {time_limit} min  {'✓ in range [30–480]' if time_ok else '✗ outside 30–480'}")
    print(f"  Description:      {word_count} words  {'✓ in range [300–650]' if words_ok else '✗ outside 300–650'}")
    print(f"  Acceptance items: {len(checklist_items)}  {'✓ ≥ 10' if criteria_ok else '✗ need ≥ 10'}")
    print(f"  Sections found:  {', '.join(sections) if sections else '(none)'}")
    if missing:
        print(f"  Sections:         ✗ missing: {', '.join(missing)}")
    else:
        print("  Sections:         ✓ all expected section topics present")
    print()

━━━ Quality snapshot ━━━

  Title length:     49 chars
  Time limit:       180 min  ✓ in range [30–480]
  Description:      498 words  ✓ in range [300–650]
  Acceptance items: 10  ✓ ≥ 10
  Sections found:  Scenario, What you will build, Requirements (must-have), Acceptance Criteria (definition of done), Constraints, Provided / Assumptions, Deliverables, Nice-to-haves (optional)
  Sections:         ✓ all expected section topics present
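
To compare several job descriptions side by side, the same checks can be returned as data instead of printed. The sketch below copies the thresholds and regular expressions from the cell above; `quality_snapshot` and `REQUIRED_PHRASES` are hypothetical names introduced here for illustration.

```python
import re

# Section topics the snapshot cell looks for in the "## ..." headings.
REQUIRED_PHRASES = [
    "scenario",
    "what you will build",
    "requirements",
    "acceptance criteria",
    "constraints",
    "provided",
    "assumptions",
    "deliverables",
    "nice-to-have",
]


def quality_snapshot(description, time_limit):
    """Run the objective checks and return the results as a dict."""
    sections = re.findall(r"^##\s+(.+)$", description, re.MULTILINE)
    checklist_items = re.findall(r"^\s*-\s*\[\s*\]", description, re.MULTILINE)
    section_text = " ".join(s.strip().lower() for s in sections)
    word_count = len(description.split())
    return {
        "word_count": word_count,
        "time_ok": 30 <= time_limit <= 480,
        "words_ok": 300 <= word_count <= 650,
        "criteria_ok": len(checklist_items) >= 10,
        "missing_sections": [p for p in REQUIRED_PHRASES if p not in section_text],
    }
```

For example, `quality_snapshot(assessment["description"], assessment["timeLimit"])` gives one dict per run, which can be collected in a list and compared across job descriptions.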

