# 🧠 Prompt Engineering 101

This notebook explores foundational patterns in **prompt design**, including:
- System vs user role separation
- Few-shot examples for controlled behavior
- Temperature and sampling parameters
- Simple evaluation harness for consistency

It is designed as a teaching and experimentation scaffold for AI engineers studying prompt behavior.

In [None]:
# Imports and configuration
import json
import os
import random
import uuid
from datetime import datetime

OUTPUT_DIR = 'outputs'
os.makedirs(OUTPUT_DIR, exist_ok=True)

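def build_prompt(system, user, shots=None):
    """Bundle a system message, few-shot examples, and a user message into one record."""
    shots = shots or []  # treat a missing shots argument as an empty list
    return {
        'id': str(uuid.uuid4()),  # unique id so runs can be traced later
        'system': system.strip(),
        'shots': shots,
        'user': user.strip(),
        'created': datetime.utcnow().isoformat() + 'Z'  # UTC timestamp
    }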

example = build_prompt(
    system='You are a careful and precise software QA engineer.',
    user='List 3 strategies to test a login page.',
    shots=[{'q': 'How do you test form validation?', 'a': 'Use both valid and invalid inputs to confirm correct behavior.'}]
)
print(json.dumps(example, indent=2))
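
To make the system/user separation concrete, here is a minimal sketch of how a prompt object could be flattened into role-tagged chat messages. The helper name `to_messages` and the `role`/`content` message layout are assumptions for illustration; check the schema your provider actually expects. The sample dict only mirrors the keys that `build_prompt` returns so the cell runs on its own.

```python
# Hypothetical helper: converts a prompt object (same keys as build_prompt's
# return value) into a generic chat-style message list. The message schema
# is an assumption, not a specific provider's API.
def to_messages(prompt):
    """Flatten a prompt dict into role-tagged messages."""
    messages = [{'role': 'system', 'content': prompt['system']}]
    # Each few-shot pair becomes a user turn followed by an assistant turn.
    for shot in prompt.get('shots', []):
        messages.append({'role': 'user', 'content': shot['q']})
        messages.append({'role': 'assistant', 'content': shot['a']})
    # The real question always comes last.
    messages.append({'role': 'user', 'content': prompt['user']})
    return messages

# Stand-in prompt object so the sketch is self-contained.
sample_prompt = {
    'system': 'You are a careful and precise software QA engineer.',
    'shots': [{'q': 'How do you test form validation?',
               'a': 'Use both valid and invalid inputs to confirm correct behavior.'}],
    'user': 'List 3 strategies to test a login page.',
}

messages = to_messages(sample_prompt)
for msg in messages:
    print(f"{msg['role']:>9}: {msg['content']}")
```

Keeping the system text in its own message, rather than prepending it to the user text, lets you vary one while holding the other fixed.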

## 1️⃣ Few-Shot Prompting

Few-shot prompting gives the model worked examples that shape the tone, format, and reasoning structure of its response.
The cell below assembles a few-shot prompt object with `build_prompt`; scoring outputs for consistency is covered in the evaluation section.

In [None]:
few_shot_examples = [
    {'q': 'What is regression testing?', 'a': 'Re-running test cases after code changes to ensure existing functionality still works.'},
    {'q': 'Explain smoke testing.', 'a': 'Basic checks to verify key functions work before deeper testing.'}
]

prompt_obj = build_prompt(
    system='You are an experienced QA engineer providing concise, structured answers.',
    user='Define exploratory testing.',
    shots=few_shot_examples
)

print(json.dumps(prompt_obj, indent=2))
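
For models that take a single text prompt rather than chat messages, the same examples can be laid out as question/answer pairs. The sketch below assumes a simple `Q:`/`A:` layout and a hypothetical `render_few_shot` helper; neither comes from a particular library, and other delimiters may work just as well.

```python
def render_few_shot(system, shots, user):
    """Render a prompt as plain text with Q:/A: pairs (illustrative layout)."""
    lines = [system, '']
    for shot in shots:
        lines.append(f"Q: {shot['q']}")
        lines.append(f"A: {shot['a']}")
        lines.append('')
    # End with an open "A:" so the model continues in the demonstrated format.
    lines.append(f"Q: {user}")
    lines.append('A:')
    return '\n'.join(lines)

# Same examples as the cell above, repeated so the sketch runs on its own.
demo_shots = [
    {'q': 'What is regression testing?',
     'a': 'Re-running test cases after code changes to ensure existing functionality still works.'},
    {'q': 'Explain smoke testing.',
     'a': 'Basic checks to verify key functions work before deeper testing.'},
]

rendered = render_few_shot(
    system='You are an experienced QA engineer providing concise, structured answers.',
    shots=demo_shots,
    user='Define exploratory testing.',
)
print(rendered)
```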

## 2️⃣ Prompt Variants (Templates)

Different templates can emphasize tone, role, or domain knowledge. Below are examples of structured variations.

In [None]:
templates = [
    'You are an expert {role}. Provide clear, concise answers.',
    'Act as a {role} mentor guiding a new team member.',
    'As a senior {role}, respond in checklist format with reasoning.',
    'Write your response as if you were documenting best practices for {role}s.'
]

roles = ['Data Scientist', 'QA Engineer', 'AI Ethics Specialist', 'Prompt Engineer']
generated_templates = [t.format(role=r) for r in roles for t in templates]

for t in generated_templates[:4]:
    print('-', t)
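
Once variants are scored, it helps to know which role and which template produced each string. One possible approach, sketched below with shortened stand-in lists, keeps that metadata next to the rendered text. The field names `role`, `template_id`, and `system` are illustrative choices, not part of the notebook's existing schema.

```python
from itertools import product

# Shortened stand-ins for the templates and roles lists above.
demo_templates = [
    'You are an expert {role}. Provide clear, concise answers.',
    'Act as a {role} mentor guiding a new team member.',
]
demo_roles = ['Data Scientist', 'QA Engineer']

# Keep the role and template index next to the rendered text so that later
# scores can be grouped by either dimension.
variants = [
    {'role': role, 'template_id': idx, 'system': tmpl.format(role=role)}
    for role, (idx, tmpl) in product(demo_roles, enumerate(demo_templates))
]

for v in variants:
    print(v['template_id'], v['role'], '->', v['system'])
```

The iteration order matches the list comprehension above: all templates for the first role, then all templates for the next.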

## 3️⃣ Evaluation Harness (Mock)

Since we can't call APIs directly here, this section simulates how you might **score prompt outputs** using clarity, structure, and tone-consistency metrics. In a real workflow, you'd parse model responses and evaluate them automatically.

In [None]:
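def evaluate_prompt(prompt_text):
    """Return a mock (mean_score, metrics) pair.

    prompt_text is ignored; the values are random placeholders.
    """
    metrics = {
        'clarity': random.uniform(0.7, 1.0),
        'structure': random.uniform(0.6, 1.0),
        'tone_consistency': random.uniform(0.5, 1.0)
    }
    score = sum(metrics.values()) / len(metrics)  # unweighted mean of the three metrics
    return round(score, 3), metrics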

scores = []
for tmpl in generated_templates[:8]:
    s, m = evaluate_prompt(tmpl)
    scores.append({'template': tmpl, 'score': s, 'metrics': m})

print(json.dumps(scores[:3], indent=2))
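
The random scores above stand in for a real metric. As one possible replacement, consistency can be estimated by sampling the same prompt several times and measuring how much the responses overlap. The sketch below uses mean pairwise Jaccard similarity over lowercase word sets; this is a deliberately crude proxy chosen for illustration, and the function names and hand-written responses are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Token-set overlap between two strings, from 0.0 to 1.0."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty responses are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

def consistency_score(responses):
    """Mean pairwise Jaccard similarity across repeated responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # a single response is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hand-written stand-ins for three model outputs to the same prompt.
fake_responses = [
    'Exploratory testing is unscripted testing guided by tester insight.',
    'Exploratory testing is unscripted testing driven by tester curiosity.',
    'It means testing without predefined scripts.',
]
print(round(consistency_score(fake_responses), 3))
```

Word overlap ignores paraphrase, so two answers with the same meaning can still score low. Treat the number as a rough signal, not a verdict.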

## 4️⃣ Parameter Sweeps

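This sweep logs an evaluation score for each combination of temperature and max-token setting. The scores are simulated here, so they demonstrate the logging pattern rather than any real effect of the parameters.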

In [None]:
temperatures = [0.2, 0.5, 0.7, 1.0]
max_tokens = [100, 250, 500]

results = []
for t in temperatures:
    for m in max_tokens:
        avg_score = round(random.uniform(0.6, 1.0), 3)
        results.append({'temperature': t, 'max_tokens': m, 'avg_eval_score': avg_score})

print(json.dumps(results[:5], indent=2))
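
Two small refinements can make a sweep like this easier to analyze: seeding the random generator so repeated runs log identical numbers, and aggregating scores per temperature. The sketch below is one way to do both under the same simulated-score assumption; the seed value and variable names are arbitrary.

```python
import random
from collections import defaultdict
from itertools import product
from statistics import mean

rng = random.Random(42)  # fixed seed so repeated runs log identical numbers

# Every (temperature, max_tokens) combination, in the same order as the nested loops.
grid = list(product([0.2, 0.5, 0.7, 1.0], [100, 250, 500]))
sweep = [
    {'temperature': t, 'max_tokens': m,
     'avg_eval_score': round(rng.uniform(0.6, 1.0), 3)}  # placeholder score
    for t, m in grid
]

# Group scores by temperature to get one summary number per setting.
by_temperature = defaultdict(list)
for row in sweep:
    by_temperature[row['temperature']].append(row['avg_eval_score'])

for t, values in sorted(by_temperature.items()):
    print(f'temperature={t}: mean score {mean(values):.3f} over {len(values)} runs')
```

With real model calls, the placeholder score would be replaced by an actual metric and the grouping code could stay unchanged.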

## 5️⃣ Export Results & Summary

We log all results with timestamps for further analysis or dashboard integration.

In [None]:
summary = {
    'run_id': str(uuid.uuid4()),
    'timestamp': datetime.utcnow().isoformat() + 'Z',
    'total_templates': len(generated_templates),
    'evaluated': len(scores),
    'avg_score': round(sum(s['score'] for s in scores)/len(scores), 3)
}

summary_path = os.path.join(OUTPUT_DIR, 'prompt_tuning_summary.json')
with open(summary_path, 'w', encoding='utf-8') as f:
    json.dump({'summary': summary, 'results': scores, 'params': results}, f, indent=2)

print('Saved summary to →', summary_path)
print(json.dumps(summary, indent=2))
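
For later analysis, the exported file can be read back and queried. The sketch below writes a small stand-in payload with the same top-level keys (`summary`, `results`, `params`) to a temporary directory, reloads it, and picks the highest-scoring entries. The values are made up purely so the cell runs on its own.

```python
import json
import os
import tempfile

# Stand-in payload with the same top-level keys as the exported file.
# All numbers here are invented for illustration.
payload = {
    'summary': {'evaluated': 2, 'avg_score': 0.8},
    'results': [{'template': 'A', 'score': 0.75}, {'template': 'B', 'score': 0.85}],
    'params': [
        {'temperature': 0.2, 'max_tokens': 100, 'avg_eval_score': 0.91},
        {'temperature': 0.7, 'max_tokens': 250, 'avg_eval_score': 0.64},
    ],
}

# Round-trip through a temporary file to mimic the export/import cycle.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'prompt_tuning_summary.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(payload, f, indent=2)
    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

best_template = max(loaded['results'], key=lambda r: r['score'])
best_params = max(loaded['params'], key=lambda p: p['avg_eval_score'])
print('Best template:', best_template['template'])
print('Best params:', best_params['temperature'], best_params['max_tokens'])
```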

## 🧩 Next Steps
- Integrate this with an actual LLM API (e.g., OpenAI, Anthropic, or a local model)
- Add human-in-the-loop evaluation for qualitative feedback
- Introduce versioning for prompt templates and output samples
- Correlate prompt structure and temperature with evaluation scores to find the best-performing combinations
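
As a starting point for the versioning item, one lightweight option is to derive a version id from the template text itself, so any edit produces a new id and identical text always maps to the same one. The `template_version` helper and the 12-character truncation below are illustrative assumptions, not an established convention.

```python
import hashlib

def template_version(template_text):
    """Short content hash: identical text always maps to the same version id."""
    digest = hashlib.sha256(template_text.encode('utf-8')).hexdigest()
    return digest[:12]  # truncated for readability; keep the full digest if collisions matter

registry = {}
for text in [
    'You are an expert {role}. Provide clear, concise answers.',
    'You are an expert {role}. Provide clear, concise answers.',     # exact duplicate
    'You are an expert {role}. Provide clear and concise answers.',  # edited wording
]:
    registry[template_version(text)] = text

print(len(registry), 'distinct template versions')
```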