# Idea Generator & Refiner — **Annotated Notebook**

This notebook implements a minimal **Idea Generator & Refiner** using the OpenAI API.
Each code cell is heavily commented, with the reasoning spelled out per line and per cell.

### What you can do here
- **Generate** N ideas for any domain with structured fields (title, summary, category, scores, impact_score)
- **Refine** ideas toward a goal (feasibility / creativity / clarity)
- **Experiment with temperature** to balance creativity vs focus
- **(Optional) Auto-evaluate** ideas with a simple LLM judge

Run cells **top-to-bottom** the first time, then re-run/modify any cell as needed.

## Cell 1 — Install dependencies
**Why this cell matters:** Installs the two packages the notebook needs (`openai` and `python-dotenv`). Because of `--upgrade`, packages that are already installed are upgraded to the latest release rather than left untouched.

**Functions provided by this cell:** None — it's setup only.

In [1]:
# %pip is Jupyter's magic to install packages into the current kernel.
# We install only what we need: the official OpenAI Python SDK and dotenv for .env loading.
# No versions are pinned; --upgrade pulls the latest release of each.
%pip install --upgrade openai python-dotenv


Note: you may need to restart the kernel to use updated packages.


## Cell 2 — Environment & client setup
**Why this cell matters:** It wires up authentication and sets a default model so all subsequent cells can call the API.

**Functions provided by this cell:** None — it exposes `client` and `MODEL` to other cells.

In [None]:
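import os
# Optional reset: drop any OPENAI_API_KEY already set in this process,
# so the key loaded by the setup cell below takes effect (load_dotenv does not override existing variables by default).
os.environ.pop("OPENAI_API_KEY", None)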


In [7]:
import os, json, time, random  # os/env access; json for (de)serialization; time/random for simple retry/backoff
from typing import List, Dict, Any  # type hints for clarity and editor help
from dotenv import load_dotenv      # loads .env files into environment variables
from openai import OpenAI           # official OpenAI Python SDK client

load_dotenv()  # reads .env in the working directory so OPENAI_API_KEY is available

# Fallback: if no key was found in .env or the environment, prompt for it interactively (input is hidden).
import getpass
if not os.getenv('OPENAI_API_KEY'):
    os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter OPENAI_API_KEY (hidden): ')

MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')  # default model name; configurable via .env
client = OpenAI()  # constructs a client that reads OPENAI_API_KEY from environment

print('Model:', MODEL, '| API key present:', 'OPENAI_API_KEY' in os.environ)  # quick sanity check


Model: gpt-4o-mini | API key present: True


## Cell 3 — Taxonomy & minimal helpers
**Why this cell matters:** These utilities keep outputs consistent, cheap, and parseable.

**Functions provided:**
- `clamp_0_10(x)`: normalize model scores into 0..10 ints
- `impact(user_value, feasibility, novelty)`: compute a transparent 0..100 impact score
- `chat_json(messages, temperature, max_tokens)`: single, JSON-mode OpenAI callsite

We also define a fixed `CATEGORIES` list to keep labels consistent across runs.

In [8]:
CATEGORIES = ['Product','Feature','Process','Growth','Ops','Research','Content','Other']  # strict taxonomy to avoid drift

def clamp_0_10(x) -> int:
    """Coerce any numeric-ish input to an int in [0,10]. Keeps scores predictable."""
    try:
        return max(0, min(10, int(round(float(x)))))  # round then clamp to range
    except Exception:
        return 0  # fallback if the model returns non-numeric text

def impact(user_value: int, feasibility: int, novelty: int) -> float:
    """Simple, explicit formula (50/30/20). You control the weights; model doesn't."""
    return round((0.5*user_value + 0.3*feasibility + 0.2*novelty) * 10, 1)  # scale to 0..100

def chat_json(messages: List[Dict[str, str]], temperature=0.7, max_tokens=1200) -> Dict[str, Any]:
    """Single place to call the Chat Completions API and enforce JSON outputs."""
    resp = client.chat.completions.create(
        model=MODEL,                   # which model to use
        messages=messages,             # conversation payload (system+user)
        temperature=temperature,       # creativity vs focus knob
        max_tokens=max_tokens,         # hard cap to control cost
        response_format={'type': 'json_object'},  # ask model to return a JSON object
    )
    return json.loads(resp.choices[0].message.content)  # parse JSON text into a Python dict
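
`chat_json` above makes a single attempt, and the imports in Cell 2 mention `time` and `random` for retry/backoff without any cell shown here using them. A minimal sketch of exponential backoff with jitter follows. `with_retries`, its parameters, and the `flaky` stand-in are hypothetical names for illustration, not part of the notebook or the OpenAI SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on an exception wait base_delay * 2**i plus jitter, then retry.

    Hypothetical helper: it catches every Exception for brevity. In practice you
    would likely narrow this to transient errors (timeouts, rate limits).
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** i) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("attempts must be >= 1")

# Stand-in for an API call that fails twice, then succeeds.
calls = {"n": 0}

def flaky() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"ideas": []}

result = with_retries(flaky, attempts=5, base_delay=0.01)
print(result, "after", calls["n"], "calls")
```

In this notebook the same wrapper could be applied as `with_retries(lambda: chat_json(messages))`, assuming a failed call is safe to repeat.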


## Cell 4 — Prompt templates (Generator, Refiner, Judge)
**Why this cell matters:** Prompts are the contract the model must follow. We keep them explicit and JSON-oriented.

**Functions provided by this cell:** None — it defines string templates referenced by later cells.

In [9]:
GEN_SYS = 'You are an expert product strategist. Return ONLY valid JSON. No markdown.'  # sets role + output format
GEN_USR = """Domain: {domain}
Generate {n} distinct, high-quality ideas.

Each idea must include:
- title (short, specific)
- summary (2–4 sentences)
- category (one of: Product, Feature, Process, Growth, Ops, Research, Content, Other)
- user_value (0–10), feasibility (0–10), novelty (0–10)

Return JSON:
{{
  "ideas": [
    {{
      "title":"...", "summary":"...", "category":"Product",
      "user_value":7, "feasibility":6, "novelty":8
    }}
  ]
}}
"""  # tight schema example so the model knows the exact shape

REF_SYS = 'You are a concise product editor. Return ONLY valid JSON. No markdown.'  # refiner role + format
REF_USR = """Refinement goal: {goal} (one of: feasibility, creativity, clarity)
Base idea:
{idea_json}

Rules:
- Keep the core intent but improve for the stated goal.
- Propose 3–6 concrete changes, then output ONE refined idea object.

Return JSON:
{{
  "changes": ["..."],
  "refined": {{
    "title":"...", "summary":"...", "category":"Product",
    "user_value":0, "feasibility":0, "novelty":0
  }}
}}
"""  # refiner requires both a change list and a single refined object

JUDGE_SYS = 'You are a strict evaluator. Return ONLY valid JSON. No markdown.'  # evaluator role
SCORE_USR = """Score this idea for the domain "{domain}" using this rubric:
{rubric}

Idea:
{idea_json}

Return JSON:
{{ "score": 0.0, "rationale": "1–3 sentences" }}
"""  # produces a numeric score and a terse rationale for auditing
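
The templates above double their braces (`{{` and `}}`) because `str.format` treats single braces as placeholders. A small sketch with a hypothetical `DEMO_TEMPLATE` (not one of the notebook's prompts) shows the effect:

```python
import json

# Hypothetical mini-template: {domain} is a placeholder, {{ }} become literal braces.
DEMO_TEMPLATE = """Domain: {domain}
Return JSON:
{{ "score": 0.0 }}
"""

rendered = DEMO_TEMPLATE.format(domain="Meal Plan")
print(rendered)

# After formatting, the last non-empty line is plain JSON text that parses cleanly.
schema_line = rendered.strip().splitlines()[-1]
parsed = json.loads(schema_line)
print(parsed)
```

Forgetting to double a brace in a template typically surfaces as a `KeyError` or `ValueError` when `.format(...)` is called, so a quick render like this is a cheap check before calling the API.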


## Cell 5 — Core functions (Generate, Refine, Judge)
**Why this cell matters:** These are the actions you actually call. Each function is short and transparent.

**Functions provided:**
- `generate_ideas(domain, n, temperature)` → list of structured ideas
- `refine_ideas(ideas, goal, temperature)` → list of refined ideas
- `judge_score(ideas, domain, rubric, temperature)` → list of scores with rationales

In [10]:
from typing import Any  # already imported in Cell 2; the built-in list[dict] annotations below need Python 3.9+

def generate_ideas(domain: str, n: int = 10, temperature: float = 0.8) -> list[dict]:
    # Build the two-message conversation for generation
    messages = [
        {"role": "system", "content": GEN_SYS},  # constrain the assistant
        {"role": "user", "content": GEN_USR.format(domain=domain, n=n)},  # provide schema + instructions
    ]
    data = chat_json(messages, temperature=temperature)  # call API once
    out: list[dict] = []
    for it in data.get('ideas', []):  # iterate ideas returned by the model
        uv = clamp_0_10(it.get('user_value', 0))         # normalize numeric fields
        fe = clamp_0_10(it.get('feasibility', 0))
        nv = clamp_0_10(it.get('novelty', 0))
        cat = str(it.get('category', 'Other')).strip()   # sanitize category
        if cat not in CATEGORIES:
            cat = 'Other'
        out.append({                                   # assemble the structured record
            'domain': domain.strip(),
            'title': str(it.get('title','')).strip(),
            'summary': str(it.get('summary','')).strip(),
            'category': cat,
            'user_value': uv,
            'feasibility': fe,
            'novelty': nv,
            'impact_score': impact(uv, fe, nv),        # compute deterministic composite
            'temperature': float(temperature)          # log the knob used for analysis later
        })
    return out

def refine_ideas(ideas: list[dict], goal: str = 'feasibility', temperature: float = 0.3) -> list[dict]:
    refined = []
    for idea in ideas:
        messages = [
            {"role": "system", "content": REF_SYS},  # editing persona
            {"role": "user", "content": REF_USR.format(goal=goal, idea_json=json.dumps(idea, ensure_ascii=False))},
        ]
        data = chat_json(messages, temperature=temperature, max_tokens=800)  # ask for refined object
        r = data.get('refined', {})
        uv = clamp_0_10(r.get('user_value', idea.get('user_value', 0)))  # keep sane defaults
        fe = clamp_0_10(r.get('feasibility', idea.get('feasibility', 0)))
        nv = clamp_0_10(r.get('novelty', idea.get('novelty', 0)))
        cat = str(r.get('category', idea.get('category', 'Other'))).strip()
        if cat not in CATEGORIES:
            cat = 'Other'
        refined.append({
            'domain': idea.get('domain', ''),
            'title': str(r.get('title', idea.get('title',''))).strip(),
            'summary': str(r.get('summary', idea.get('summary',''))).strip(),
            'category': cat,
            'user_value': uv,
            'feasibility': fe,
            'novelty': nv,
            'impact_score': impact(uv, fe, nv),
            'temperature': float(temperature)
        })
    return refined

def judge_score(ideas: list[dict], domain: str, rubric: str, temperature: float = 0.2) -> list[dict]:
    """Optional auto-eval: returns judge_score + rationale for each idea."""
    out = []
    for idea in ideas:
        messages = [
            {"role": "system", "content": JUDGE_SYS},  # evaluator persona
            {"role": "user", "content": SCORE_USR.format(domain=domain, rubric=rubric, idea_json=json.dumps(idea, ensure_ascii=False))},
        ]
        data = chat_json(messages, temperature=temperature, max_tokens=400)
        out.append({
            'title': idea['title'],
            'judge_score': float(data.get('score', 0.0)),
            'judge_rationale': str(data.get('rationale', '')).strip(),
        })
    return out
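
`judge_score` above converts the model's `score` with a bare `float(...)`, which raises if the judge returns something like `"8/10"` or `null`. One possible guard is sketched below with a hypothetical `to_float` helper (an assumption, not part of the original notebook):

```python
import math
from typing import Any

def to_float(x: Any, default: float = 0.0) -> float:
    """Best-effort float conversion; returns default for non-numeric or non-finite input."""
    try:
        value = float(x)
    except (TypeError, ValueError):
        return default
    return value if math.isfinite(value) else default

# Simulated judge payloads (hypothetical examples of what a model might return).
for raw in [76, "58.5", None, "8/10", float("nan")]:
    print(repr(raw), "->", to_float(raw))
```

Falling back to a default keeps the loop running, at the cost of hiding malformed judge output. Logging the raw value alongside the default would make such cases easier to audit.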


## Cell 6 — Generate ideas (example)
**Why this cell matters:** This is the first end-to-end run. You can change the `domain`, `n`, and `temperature` to see different behavior.

**Function provided by this cell:** None — it produces a list named `ideas` in memory for later cells.

In [25]:
domain = 'Meal Plan'  # <-- change the domain to your target space
ideas = generate_ideas(domain, n=8, temperature=0.8)  # higher temp -> more variety (and noise)
len(ideas), ideas[0]  # quick peek: count and first idea


(8,
 {'domain': 'Meal Plan',
  'title': 'Personalized Meal Plans',
  'summary': 'Create meal plans tailored to individual dietary needs, preferences, and health goals. Users can input their restrictions, allergies, and goals to receive a custom weekly plan, maximizing nutritional value and satisfaction.',
  'category': 'Product',
  'user_value': 9,
  'feasibility': 7,
  'novelty': 8,
  'impact_score': 82.0,
  'temperature': 0.8})
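
The `impact_score` of 82.0 in the output above can be reproduced by hand from the 50/30/20 weights defined in Cell 3. The snippet below restates that formula inline (variable names are illustrative):

```python
# Scores from the sample idea in the output above.
user_value, feasibility, novelty = 9, 7, 8

# Weighted sum on the 0..10 scale: 0.5*9 + 0.3*7 + 0.2*8 = 4.5 + 2.1 + 1.6 = 8.2
weighted = 0.5 * user_value + 0.3 * feasibility + 0.2 * novelty

# Scale to 0..100 and round to one decimal, as impact() does.
impact_score = round(weighted * 10, 1)
print(impact_score)  # 82.0
```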

## Cell 7 — Save to JSON/CSV
**Why this cell matters:** Persisting artifacts lets you diff runs, share outputs, or import into Sheets/Coda.

**Functions provided by this cell:** `save_json`, `save_csv`.

In [20]:
import csv  # standard lib CSV for lightweight exports

def save_json(path: str, rows: list[dict]):
    with open(path, 'w', encoding='utf-8') as f:         # open the file for writing
        json.dump(rows, f, ensure_ascii=False, indent=2)  # pretty-print JSON for readability

def save_csv(path: str, rows: list[dict]):
    if not rows:
        return  # nothing to write
    # Use a union of keys so the header covers all fields (avoids DictWriter key mismatch).
    keys = sorted({k for r in rows for k in r.keys()})
    with open(path, 'w', newline='', encoding='utf-8') as f:
        w = csv.DictWriter(f, fieldnames=keys)  # create writer with full header
        w.writeheader()                         # write header row
        for r in rows:                          # write each record
            w.writerow(r)

save_json('ideas.json', ideas)
save_csv('ideas.csv', ideas)
"Saved ideas.json and ideas.csv"  # status string as cell output


'Saved ideas.json and ideas.csv'
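
The two formats do not round-trip identically: JSON preserves numbers, while `csv.DictReader` returns every field as a string. A self-contained sketch, using a temporary directory and a hypothetical one-row dataset, shows the difference:

```python
import csv
import json
import tempfile
from pathlib import Path

rows = [{"title": "Demo idea", "user_value": 9, "impact_score": 82.0}]  # hypothetical sample row

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "ideas.json"
    csv_path = Path(tmp) / "ideas.csv"

    # Write both formats (mirroring what save_json / save_csv do).
    json_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # Read them back.
    from_json = json.loads(json_path.read_text(encoding="utf-8"))
    with open(csv_path, newline="", encoding="utf-8") as f:
        from_csv = list(csv.DictReader(f))

print(type(from_json[0]["user_value"]).__name__)  # int
print(type(from_csv[0]["user_value"]).__name__)   # str
```

If the CSV is re-imported for analysis, numeric columns would need an explicit conversion step (for example `int(row["user_value"])`).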

## Cell 8 — Refine ideas (goal-directed)
**Why this cell matters:** Converts fuzzy ideas into more buildable/creative/clear variants.

**Function provided by this cell:** None — it creates `refined` for later analysis.

In [26]:
goal = 'creativity'  # choose: 'feasibility' | 'creativity' | 'clarity'
refined = refine_ideas(ideas, goal=goal, temperature=0.3)  # keep temp low for tighter edits

save_json('ideas_refined.json', refined)
save_csv('ideas_refined.csv', refined)
len(refined), refined[0]  # confirmation + first refined item


(8,
 {'domain': 'Meal Plan',
  'title': 'Dynamic Personalized Meal Plans',
  'summary': 'Leverage AI to create dynamic meal plans tailored to individual dietary needs, preferences, and health goals. Users can input their restrictions, allergies, and goals to receive a custom weekly plan, enhanced with seasonal recipes, community sharing, and gamified challenges for a fun and engaging experience.',
  'category': 'Product',
  'user_value': 10,
  'feasibility': 8,
  'novelty': 9,
  'impact_score': 92.0,
  'temperature': 0.3})

## Cell 9 — Temperature experiment
**Why this cell matters:** Lets you compare creativity vs. focus by logging diversity and averages.

**Function provided by this cell:** None — it outputs a small summary table.

In [22]:
temps = [0.3, 0.7, 1.0]  # pick a few values to compare
runs: dict[float, list[dict]] = {}

for t in temps:
    runs[t] = generate_ideas(domain, n=8, temperature=t)  # generate per temperature

def unique_titles(rows: list[dict]) -> int:
    return len({r['title'].strip().lower() for r in rows})  # crude diversity measure

summary_rows = []
for t in temps:
    rows = runs[t]
    avg_impact = round(sum(r['impact_score'] for r in rows) / len(rows), 1)
    avg_novelty = round(sum(r['novelty'] for r in rows) / len(rows), 1)
    summary_rows.append({
        'temperature': t,
        'n': len(rows),
        'unique_titles': unique_titles(rows),
        'avg_impact': avg_impact,
        'avg_novelty': avg_novelty,
    })
summary_rows  # inspect: higher temps are expected to raise diversity and novelty, though a small run like this one may show little difference


[{'temperature': 0.3,
  'n': 8,
  'unique_titles': 8,
  'avg_impact': 73.4,
  'avg_novelty': 7.1},
 {'temperature': 0.7,
  'n': 8,
  'unique_titles': 8,
  'avg_impact': 70.8,
  'avg_novelty': 7.2},
 {'temperature': 1.0,
  'n': 8,
  'unique_titles': 8,
  'avg_impact': 73.6,
  'avg_novelty': 7.2}]
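
In the run above `unique_titles` is 8 at every temperature, so exact-title matching cannot separate the runs. One alternative (an assumption, not something the notebook computes) is the mean pairwise Jaccard distance over title words. `title_diversity` and the sample titles below are hypothetical:

```python
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical (distance 0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def title_diversity(titles: list) -> float:
    """Mean pairwise Jaccard distance between the lowercase word sets of the titles."""
    token_sets = [set(t.lower().split()) for t in titles]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0  # fewer than two titles: no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

similar = ["Personalized Meal Plans", "Personalized Meal Kits", "Personalized Meal Prep"]
varied = ["Personalized Meal Plans", "Smart Grocery List", "Seasonal Recipe Swap"]
print(title_diversity(similar), title_diversity(varied))  # 0.5 1.0
```

Values near 1.0 mean titles share few words, and values near 0.0 mean they are near-duplicates. This only measures surface wording, so two differently worded titles for the same idea would still count as diverse.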

## Cell 10 — Optional auto‑evaluation (LLM‑as‑judge)
**Why this cell matters:** A second opinion helps triage. The judge is noisy, but useful for ranking.

**Function provided by this cell:** None — outputs `scores` and saves them as JSON.

In [27]:
rubric = 'User value (50), Feasibility (30), Novelty (20). Penalize vagueness.'  # transparent scoring rule
scores = judge_score(refined or ideas, domain=domain, rubric=rubric, temperature=0.2)  # low temp for consistency
save_json('ideas_scored.json', scores)
scores[:3]  # peek at first three


[{'title': 'Dynamic Personalized Meal Plans',
  'judge_score': 58.0,
  'judge_rationale': 'The idea provides significant user value by addressing personalized dietary needs and promoting engagement through community sharing and gamification. However, the feasibility score is lowered due to potential challenges in accurately leveraging AI for dynamic meal planning. The novelty is moderate, as personalized meal plans exist but the integration of community and gamification adds a unique twist.'},
 {'title': 'Smart Grocery List & Meal Prep Assistant',
  'judge_score': 76.0,
  'judge_rationale': 'The idea provides significant user value by addressing meal planning and grocery shopping challenges, scoring high in user value. Feasibility is moderate due to potential technical challenges in syncing with local inventories and AI integration. Novelty is also good, but similar solutions exist, slightly lowering the score.'},
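 {'title': 'Interactive Meal Prep Masterclass',
  'judge_score': 76.0,
  'judge_rationale': 'The idea offers high user value by providing personalized meal prep plans and community engagement, which can enhance adherence to meal plans. However, feasibility is somewhat limited due to the need for quality video production and community management. The novelty is moderate, as interactive meal prep content exists but may not be widely implemented in this format.'}]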

## Cell 11 — Pick top‑N by judge score (optional)
**Why this cell matters:** Quick shortlist for review.

**Function provided by this cell:** None — just a sorted preview.
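
Judge scores can tie, and Python's `sorted` is stable, so tied ideas keep their input order. If an order that does not depend on input order is preferred, a secondary key can be added. The rows below are hypothetical sample data:

```python
# Hypothetical scored rows with a tie at 76.0.
sample_scores = [
    {"title": "B idea", "judge_score": 76.0},
    {"title": "C idea", "judge_score": 58.0},
    {"title": "A idea", "judge_score": 76.0},
]

# Primary key: score descending (negated); secondary key: title ascending.
ranked = sorted(sample_scores, key=lambda r: (-r["judge_score"], r["title"]))
print([r["title"] for r in ranked])  # ['A idea', 'B idea', 'C idea']
```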

In [28]:
topN = 5  # how many to shortlist
sorted(scores, key=lambda x: x['judge_score'], reverse=True)[:topN]


[{'title': 'Smart Grocery List & Meal Prep Assistant',
  'judge_score': 76.0,
  'judge_rationale': 'The idea provides significant user value by addressing meal planning and grocery shopping challenges, scoring high in user value. Feasibility is moderate due to potential technical challenges in syncing with local inventories and AI integration. Novelty is also good, but similar solutions exist, slightly lowering the score.'},
 {'title': 'Interactive Meal Prep Masterclass',
  'judge_score': 76.0,
  'judge_rationale': 'The idea offers high user value by providing personalized meal prep plans and community engagement, which can enhance adherence to meal plans. However, feasibility is somewhat limited due to the need for quality video production and community management. The novelty is moderate, as interactive meal prep content exists but may not be widely implemented in this format.'},
 {'title': 'Personalized Allergy-Friendly Meal Planning',
  'judge_score': 76.0,
  'judge_rationale': 'Th