# 🎛️ Week 5-6 · Notebook 05 · Prompt Engineering for Manufacturing

Design prompts that turn large language models into reliable copilots for maintenance, quality, and operations teams. This notebook combines personas, structured outputs, and evaluation to produce consistent responses.

## 🎯 Learning Objectives
- Choose the right prompt pattern (zero-shot, few-shot, chain-of-thought) for manufacturing tasks.
- Layer roles, constraints, and structured outputs to reduce hallucinations.
- Build reusable prompt templates for maintenance tickets, safety briefings, and supplier outreach.
- Establish an evaluation loop that scores prompt variants for accuracy, tone, and compliance.

## 🏭 Common Prompt Scenarios
| Use Case | Stakeholder | Example Input | Desired Output |
| --- | --- | --- | --- |
| Maintenance triage | Reliability engineer | "Robot arm stalled during sealant dispense" | Root cause summary + next actions |
| Safety alert rewrite | EHS officer | Raw incident log | 3-bullet safety briefing |
| Supplier clarification | Procurement | Mixed-language email thread | Formal bilingual response |
| Executive summary | Plant manager | Shift report paragraphs | 150-word highlights with KPIs |

## 🧱 Prompt Building Blocks
1. **Persona & role** – "You are a senior controls engineer…"
2. **Task definition** – specify format, depth, and tone.
3. **Context payload** – include machine IDs, timestamps, KPI targets.
4. **Constraints & guardrails** – safety-first, cite data, respond bilingually.
5. **Output schema** – bullet list, JSON, table, or Markdown summary.
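
The five blocks above can be assembled programmatically so every prompt follows the same layout. The following is a minimal sketch, not a prescribed implementation: the `PromptSpec` class, the `build_prompt` function, and the sample machine ID, timestamp, and KPI values are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the five prompt building blocks."""
    persona: str
    task: str
    context: dict[str, str]
    constraints: list[str] = field(default_factory=list)
    output_schema: str = "Markdown summary"


def build_prompt(spec: PromptSpec) -> str:
    """Render the blocks in a fixed order: persona, task, context, constraints, schema."""
    context_lines = [f"- {key}: {value}" for key, value in spec.context.items()]
    constraint_lines = [f"- {rule}" for rule in spec.constraints]
    sections = [
        spec.persona,
        f"Task: {spec.task}",
        "Context:\n" + "\n".join(context_lines),
        "Constraints:\n" + "\n".join(constraint_lines),
        f"Output format: {spec.output_schema}",
    ]
    return "\n\n".join(sections)


# Illustrative values only; substitute real machine IDs and KPI targets.
spec = PromptSpec(
    persona="You are a senior controls engineer.",
    task="Summarize the likely root cause in a neutral, concise tone.",
    context={
        "machine_id": "ROBOT-07",
        "timestamp": "2025-01-15T08:42",
        "kpi_target": "OEE >= 85%",
    },
    constraints=["Put safety first.", "Cite the data you used."],
    output_schema="JSON with keys root_cause, next_actions",
)
prompt = build_prompt(spec)
print(prompt)
```

Keeping the blocks as separate fields makes it easier to vary one block (for example, the constraints) while holding the others fixed during evaluation.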

## 🧪 Prompt Pattern Matrix
| Pattern | When to use | Manufacturing Example | Tips |
| --- | --- | --- | --- |
| Zero-shot | Quick triage with limited data | Classify ticket severity | Add explicit labels to reduce drift |
| One-shot | Mixed language or niche jargon | Supplier email translation | Include bilingual exemplar |
| Few-shot | High accuracy routing | Maintenance tickets labelled by experts | Curate edge-case examples |
| Chain-of-thought | Root cause analysis | Failure analysis from logs | Request "Reasoning" + "Answer" sections |
| ReAct (tool use) | When external tools are needed | Retrieve SOP snippet before responding | Provide a placeholder for search results |
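
To make the difference between zero-shot and few-shot concrete, the sketch below builds both variants of a severity-classification prompt from one function. The label set, the example tickets, and the `build_classification_prompt` helper are hypothetical; in practice the examples would come from expert-labelled tickets.

```python
ALLOWED_LABELS = ["low", "medium", "high"]  # hypothetical severity labels

# Hypothetical expert-labelled tickets; replace with curated edge cases.
EXAMPLES = [
    ("Coolant level slightly below target on CNC-3", "low"),
    ("Conveyor belt tracking drifting, intermittent jams", "medium"),
    ("Light curtain bypassed on press line 2", "high"),
]


def build_classification_prompt(ticket: str, examples=()) -> str:
    """Zero-shot when `examples` is empty, one-shot or few-shot otherwise."""
    lines = [
        "Classify the maintenance ticket severity.",
        f"Answer with exactly one of: {', '.join(ALLOWED_LABELS)}.",
        "",
    ]
    # Each exemplar uses the same Ticket/Severity layout as the final query.
    for text, label in examples:
        lines += [f"Ticket: {text}", f"Severity: {label}", ""]
    lines += [f"Ticket: {ticket}", "Severity:"]
    return "\n".join(lines)


ticket = "Robot arm stalled during sealant dispense"
zero_shot = build_classification_prompt(ticket)
few_shot = build_classification_prompt(ticket, EXAMPLES)
print(few_shot)
```

Ending the prompt with `Severity:` nudges the model to complete with a single label, which keeps downstream parsing simple.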

```python
from transformers import pipeline

# Use an instruction-tuned model; swap with in-house deployment in production
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    max_new_tokens=160,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.2,  # low temperature keeps answers close to deterministic
    return_full_text=False,  # return only the completion, not the echoed prompt
)

incident = {
    "context": "Robot arm stalled during sealant dispense causing 45 seconds downtime.",
    "telemetry": "Torque exceeded 140 Nm at time of stall; axis-2 temperature +18C",
}

# Zero-shot: a bare instruction with no role, telemetry, or output contract
prompt_zero = f"Summarize the root cause of this incident: {incident['context']}"

# Structured: persona, context payload, guardrail, and a JSON output schema
prompt_structured = f"""
You are a reliability engineer. Respond in JSON with keys: root_cause, immediate_actions, follow_up.
Context: {incident['context']}
Telemetry: {incident['telemetry']}
Safety constraints: reference lockout-tagout if needed.
""".strip()

# Run both prompts against the same incident and compare the outputs
for label, prompt in {"zero_shot": prompt_zero, "structured": prompt_structured}.items():
    print(f"\n--- {label} ---")
    output = generator(prompt)[0]["generated_text"]
    print(output)
```
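
The structured prompt above asks for JSON with three keys, but models often wrap the JSON in extra text or drop a key. A minimal sketch of a validation step follows; the `parse_structured_reply` helper and the sample reply are hypothetical, and the extraction heuristic (take everything between the first `{` and the last `}`) is a simplifying assumption that will not handle every reply.

```python
import json

REQUIRED_KEYS = {"root_cause", "immediate_actions", "follow_up"}


def parse_structured_reply(raw: str) -> dict:
    """Extract a JSON object from a model reply and verify the required keys."""
    # Heuristic: take the text between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start : end + 1])  # JSONDecodeError is a ValueError
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Hypothetical model reply, used only to exercise the parser.
sample_reply = (
    "Here is the analysis:\n"
    '{"root_cause": "Axis-2 over-torque", '
    '"immediate_actions": ["Apply lockout-tagout"], '
    '"follow_up": ["Inspect dispenser nozzle"]}'
)
parsed = parse_structured_reply(sample_reply)
print(parsed["root_cause"])
```

Raising on a missing key, rather than filling in a default, lets the calling code retry or escalate instead of passing an incomplete answer downstream.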


## ✅ Prompt Quality Checklist
- **Role clarity:** state expertise level, safety obligations, language requirements.
- **Context sufficiency:** include machine IDs, time ranges, KPIs, and relevant SOP IDs.
- **Output contract:** define machine-checkable elements such as JSON keys or bullet count to simplify automation.
- **Guardrails:** ban speculation, cite data source, escalate safety-critical findings.
- **Evaluation hook:** ask the model to self-rate confidence or list assumptions.

## 🗂️ Template Library
```text
You are a manufacturing reliability assistant. Respond in 3 bullet points.
Context: {context}
Task: {task}
Constraints: {constraints}
```

```text
You are an EHS officer. Draft a safety briefing in English and Spanish.
Incident: {incident}
Audience: Shift supervisors
Include: root cause, PPE reminder, next inspection date
```

```text
You are a supplier liaison. Reply in polite email format.
Thread summary: {summary}
Clarify: {questions}
Tone: Professional, appreciative
```
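
Templates like these can be filled with Python's built-in string formatting. The sketch below, under the assumption that templates are stored in a plain dictionary, checks that every placeholder has a value before rendering; the `TEMPLATES` registry and the `required_fields` and `render` helpers are hypothetical names.

```python
from string import Formatter

# Hypothetical registry holding the first template from the library above.
TEMPLATES = {
    "reliability": (
        "You are a manufacturing reliability assistant. Respond in 3 bullet points.\n"
        "Context: {context}\n"
        "Task: {task}\n"
        "Constraints: {constraints}"
    ),
}


def required_fields(template: str) -> set[str]:
    """Return the placeholder names that appear in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template_name: str, **values: str) -> str:
    """Fill a template, refusing to render if any placeholder has no value."""
    template = TEMPLATES[template_name]
    missing = required_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing template fields: {sorted(missing)}")
    return template.format(**values)


prompt = render(
    "reliability",
    context="Robot arm stalled during sealant dispense.",
    task="List the most likely causes.",
    constraints="Reference lockout-tagout if needed.",
)
print(prompt)
```

Checking the fields up front produces one error listing every missing value, which is easier to act on in batch runs than a failure on the first unfilled placeholder.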

## 📊 Prompt Evaluation Framework
| Criterion | Metric | Tooling |
| --- | --- | --- |
| Accuracy | Exact match / semantic similarity | Azure/OpenAI evals, internal scoring scripts |
| Safety | Policy violations, hallucinated steps | Red-teaming checklists |
| Tone | Sentiment / formality | Heuristic checks, embedding similarity |
| Latency | Response time (ms); tokens generated vs. budget | Prompt length analyzer |
| Cost | Tokens × price per token | Billing dashboard |

Track metrics in an experiment log (see homework) and promote only prompts that pass thresholds.
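
Two of the metrics in the table are simple enough to compute with an internal scoring script. The following sketch shows one possible version of exact-match accuracy and a token-based cost estimate. The function names, the case-insensitive matching rule, and the per-1,000-token prices in the usage lines are assumptions for illustration, not real pricing.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to the reference, ignoring case and outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Tokens x price, with separate rates for input and output tokens."""
    return (
        prompt_tokens * input_price_per_1k
        + completion_tokens * output_price_per_1k
    ) / 1000


# Hypothetical severity predictions versus expert labels.
accuracy = exact_match_accuracy(["High", "low ", "medium"], ["high", "low", "high"])

# Hypothetical prices per 1,000 tokens; check your provider's price list.
cost = estimate_cost(1200, 160, input_price_per_1k=0.5, output_price_per_1k=1.5)
print(f"accuracy={accuracy:.2f}, cost={cost:.2f}")
```

Exact match suits classification-style outputs; free-text answers such as root cause summaries usually need the semantic similarity route listed in the table instead.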

## ⚠️ Prompt Anti-Patterns
- Stuffing raw logs without structure (causes truncation & confusion).
- Asking for "any other thoughts" when compliance matters.
- Mixing multiple tasks (classification + translation) in one request.
- Omitting negative examples for sensitive classifications.
- Forgetting to reset context between batch runs (carry-over risk).
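
As one way to avoid the first anti-pattern, raw logs can be condensed into a short, structured context payload before they reach the prompt. This is a sketch under assumptions: the log line format (`timestamp LEVEL message`), the sample log, and the `condense_log` helper are all hypothetical.

```python
import re

# Assumed log format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def condense_log(raw_log: str, max_events: int = 5) -> str:
    """Keep only WARN and ERROR lines, in order, capped at the last `max_events`."""
    events = []
    for line in raw_log.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match and match["level"] in {"WARN", "ERROR"}:
            events.append(f"- {match['ts']} [{match['level']}] {match['msg']}")
    return "\n".join(events[-max_events:])


# Hypothetical raw log for illustration.
raw_log = """2025-01-15T08:41:58 INFO Dispense cycle started
2025-01-15T08:42:03 WARN Axis-2 temperature rising
2025-01-15T08:42:07 ERROR Torque exceeded 140 Nm
2025-01-15T08:42:08 INFO Cycle aborted"""

condensed = condense_log(raw_log)
print(condensed)
```

Filtering by severity and capping the event count keeps the payload within the context budget and puts the most relevant lines in front of the model.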

```python
import pandas as pd

# Example experiment log: one row per prompt variant
experiments = pd.DataFrame([
    {
        "prompt_name": "maintenance_structured_v1",
        "pattern": "structured",
        "accuracy": 0.82,
        "safety_flags": 0,
        "avg_latency_ms": 780,
    },
    {
        "prompt_name": "maintenance_structured_v2",
        "pattern": "structured",
        "accuracy": 0.89,
        "safety_flags": 0,
        "avg_latency_ms": 840,
    },
    {
        "prompt_name": "maintenance_zero_shot",
        "pattern": "zero-shot",
        "accuracy": 0.64,
        "safety_flags": 1,
        "avg_latency_ms": 620,
    },
])

experiments
```
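
The experiment table above can drive the promotion decision directly. The sketch below applies a threshold gate and picks a winner using plain Python records with the same values, so it runs without pandas. The threshold values and the `select_winner` helper are hypothetical; real thresholds should be agreed with stakeholders.

```python
# Same records as the experiment table, as plain dictionaries.
runs = [
    {"prompt_name": "maintenance_structured_v1", "accuracy": 0.82,
     "safety_flags": 0, "avg_latency_ms": 780},
    {"prompt_name": "maintenance_structured_v2", "accuracy": 0.89,
     "safety_flags": 0, "avg_latency_ms": 840},
    {"prompt_name": "maintenance_zero_shot", "accuracy": 0.64,
     "safety_flags": 1, "avg_latency_ms": 620},
]

# Hypothetical thresholds for illustration.
MIN_ACCURACY = 0.85
MAX_SAFETY_FLAGS = 0


def select_winner(runs, min_accuracy, max_safety_flags):
    """Return the eligible run with the best accuracy, or None if none qualify."""
    eligible = [
        run for run in runs
        if run["accuracy"] >= min_accuracy
        and run["safety_flags"] <= max_safety_flags
    ]
    if not eligible:
        return None
    # Prefer higher accuracy; break ties with lower latency.
    return max(eligible, key=lambda run: (run["accuracy"], -run["avg_latency_ms"]))


winner = select_winner(runs, MIN_ACCURACY, MAX_SAFETY_FLAGS)
print(winner["prompt_name"] if winner else "no prompt passed the thresholds")
```

Returning `None` when nothing qualifies makes the "do not promote" outcome explicit instead of silently falling back to the least bad variant.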

## 🧪 Lab Assignment
1. Select two manufacturing workflows (e.g., maintenance triage, supplier email).
2. Draft at least three prompt variants per workflow (zero-shot, few-shot, structured JSON).
3. Run each variant against 20 historical examples and log metrics in the experiment tracker.
4. Present the winning prompt with evidence, risks, and a fallback plan to stakeholders.
5. Archive prompts and evaluation scores in the shared prompt registry.
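
For step 3, a per-example log can be written with the standard library and then aggregated per prompt variant. The following is a minimal sketch: the column names, the sample rows, and the `write_log` and `summarize` helpers are hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io

FIELDS = ["workflow", "prompt_name", "example_id", "correct", "safety_flag", "latency_ms"]


def write_log(rows: list[dict], stream) -> None:
    """Write one CSV row per (prompt variant, example) evaluation."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def summarize(rows: list[dict]) -> dict[str, dict]:
    """Aggregate accuracy, safety flag count, and mean latency per prompt variant."""
    totals: dict[str, dict] = {}
    for row in rows:
        t = totals.setdefault(
            row["prompt_name"], {"n": 0, "correct": 0, "flags": 0, "latency": 0}
        )
        t["n"] += 1
        t["correct"] += int(row["correct"])
        t["flags"] += int(row["safety_flag"])
        t["latency"] += row["latency_ms"]
    return {
        name: {
            "accuracy": t["correct"] / t["n"],
            "safety_flags": t["flags"],
            "avg_latency_ms": t["latency"] / t["n"],
        }
        for name, t in totals.items()
    }


# Hypothetical results for two variants on two examples.
rows = [
    {"workflow": "maintenance_triage", "prompt_name": "zero_shot", "example_id": 1,
     "correct": False, "safety_flag": True, "latency_ms": 600},
    {"workflow": "maintenance_triage", "prompt_name": "zero_shot", "example_id": 2,
     "correct": True, "safety_flag": False, "latency_ms": 640},
    {"workflow": "maintenance_triage", "prompt_name": "structured_json", "example_id": 1,
     "correct": True, "safety_flag": False, "latency_ms": 800},
    {"workflow": "maintenance_triage", "prompt_name": "structured_json", "example_id": 2,
     "correct": True, "safety_flag": False, "latency_ms": 760},
]

buffer = io.StringIO()  # stand-in for a file opened with newline=""
write_log(rows, buffer)
summary = summarize(rows)
print(buffer.getvalue())
print(summary)
```

Logging at the example level, rather than only the aggregate, makes it possible to revisit the specific tickets where a variant failed.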

## 🚀 Deployment Tips
- Version prompts alongside model releases; annotate with changelog.
- Implement automated linting to catch missing constraints or schema drift.
- Pair prompts with guardrail policies (content filters, refusal handling).
- Monitor live metrics (accuracy, deflection rate, escalation volume) after rollout.
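
Automated linting can start as a handful of pattern checks run before a prompt is committed. The sketch below is one possible approach, not an established tool: the `REQUIRED_MARKERS` patterns and the `lint_prompt` helper are hypothetical and tuned to the prompt style used in this notebook.

```python
import re

# Hypothetical markers; adjust the patterns to your own prompt conventions.
REQUIRED_MARKERS = {
    "persona": re.compile(r"^You are ", re.IGNORECASE | re.MULTILINE),
    "constraints": re.compile(r"^(Safety )?constraints:", re.IGNORECASE | re.MULTILINE),
    "output_schema": re.compile(r"\b(JSON|bullet|table|Markdown)\b", re.IGNORECASE),
}


def lint_prompt(prompt: str, expected_keys=()) -> list[str]:
    """Return a list of problems; an empty list means the prompt passed."""
    problems = [
        f"missing {name}"
        for name, pattern in REQUIRED_MARKERS.items()
        if not pattern.search(prompt)
    ]
    # Schema drift check: every expected output key must be named in the prompt.
    problems += [
        f"schema key not mentioned: {key}"
        for key in expected_keys
        if key not in prompt
    ]
    return problems


structured = (
    "You are a reliability engineer. Respond in JSON with keys: "
    "root_cause, immediate_actions, follow_up.\n"
    "Context: Robot arm stalled during sealant dispense.\n"
    "Safety constraints: reference lockout-tagout if needed."
)
zero_shot = "Summarize the root cause of this incident: Robot arm stalled."

print(lint_prompt(structured, ["root_cause", "immediate_actions", "follow_up"]))
print(lint_prompt(zero_shot))
```

A check like this only confirms that the expected sections are present; it says nothing about whether the constraints are adequate, so it complements rather than replaces stakeholder review.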

## ✅ Checklist
- [ ] Prompt templates captured with personas and constraints
- [ ] Metrics logged for accuracy, tone, safety, latency
- [ ] Winning prompts approved by EHS/IT stakeholders
- [ ] Rollback prompt defined and tested
- [ ] Prompts stored in version-controlled registry

## 📚 References
- Prompt Engineering Guide (PromptingGuide.ai)
- Manufacturing Prompt Patterns Playbook (2025)
- OpenAI Guidance on Structured Outputs (2024)
- Week 06 Homework rubric (see `HOMEWORK.md`)