# 🎛️ Week 5-6 · Notebook 05 · Prompt Engineering for Manufacturing

**Module:** LLMs, Prompt Engineering & RAG  
**Project:** Build the Knowledge Core for the Manufacturing Copilot

---

Prompt Engineering is the art and science of designing effective inputs to guide a Large Language Model (LLM) toward a desired output. For our Manufacturing Copilot, this means crafting prompts that produce reliable, safe, and factually grounded responses for maintenance engineers, operators, and plant managers.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
1. ✅ **Master Prompt Patterns:** Implement Zero-Shot, Few-Shot, and Chain-of-Thought prompts.
2. ✅ **Enforce Structure:** Use roles, constraints, and output schemas (like JSON) to control LLM behavior.
3. ✅ **Build a Prompt Library:** Create reusable prompt templates for common manufacturing tasks.
4. ✅ **Evaluate Prompts Systematically:** Design an evaluation loop to score and compare prompt variations.

## ⚙️ Setup: Choosing a Runnable Model

While massive models like GPT-4 are powerful, smaller, instruction-tuned models are often sufficient and much faster for specific tasks. We'll use `google/flan-t5-base`, a versatile model that can run on a CPU or a modest GPU.

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
import torch

device = 0 if torch.cuda.is_available() else -1
model_name = "google/flan-t5-base"

# Using AutoModelForSeq2SeqLM for T5-style models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

generator = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=200,
    do_sample=False,  # Greedy decoding for deterministic output; temperature only takes effect when do_sample=True
)

print(f"Pipeline created for {model_name} on device {device}")

## 🧪 Prompt Pattern 1: Zero-Shot Prompting

**When to use:** For simple, direct tasks where the model is expected to understand the instruction without any examples.

**Scenario:** A maintenance ticket comes in. We want the LLM to quickly summarize it.

incident_report = "During the night shift, the primary coolant pump for CNC-12 failed, causing a line stoppage. Telemetry shows a pressure drop from 60 PSI to 5 PSI over 30 seconds before the automated shutdown. The backup pump was engaged manually and production resumed after a 15-minute delay."

# A simple, direct prompt
zero_shot_prompt = f"""Summarize the following incident report in one sentence:

{incident_report}"""

summary = generator(zero_shot_prompt)
print("--- Zero-Shot Summary ---")
print(summary[0]['generated_text'])

## 🧪 Prompt Pattern 2: Few-Shot Prompting

**When to use:** When you need the model to follow a specific format or handle domain-specific jargon. You provide a few examples (`shots`) to guide it.

**Scenario:** We need to extract structured data (Asset, Component, Failure Mode) from maintenance logs. A zero-shot prompt might fail, but with examples in the prompt, the model can infer the expected pattern from context.

few_shot_prompt = f"""Extract the Asset, Component, and Failure Mode from the log.

Log: "The vision system on Line-5 is failing to detect part #A55-Z. Seems like a camera calibration issue."
Asset: Line-5
Component: Vision System Camera
Failure Mode: Miscalibration
===
Log: "Press-2 is showing hydraulic pressure fluctuations. The main pump seal is likely worn out."
Asset: Press-2
Component: Hydraulic Pump Seal
Failure Mode: Wear and Tear
===
Log: "{incident_report}"
Asset:"""
# Note: We end the prompt here to guide the model to start generating the answer

structured_data = generator(few_shot_prompt)
print("--- Few-Shot Structured Data Extraction ---")
print("Asset: " + structured_data[0]['generated_text'])
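
Writing the examples by hand gets tedious as the set of shots grows. The sketch below assembles the same prompt layout from a list of examples and parses the completion back into a dictionary. The helper names `build_few_shot_prompt` and `parse_fields` are hypothetical, and the sample completion is hand-written to exercise the parser; it is not real model output.

```python
import re
from typing import Dict, List

FIELDS = ["Asset", "Component", "Failure Mode"]

def build_few_shot_prompt(examples: List[Dict[str, str]], new_log: str) -> str:
    """Assemble a few-shot prompt in the same layout as the hand-written one."""
    parts = ["Extract the Asset, Component, and Failure Mode from the log.\n"]
    for ex in examples:
        parts.append(f'Log: "{ex["Log"]}"')
        for field in FIELDS:
            parts.append(f"{field}: {ex[field]}")
        parts.append("===")
    parts.append(f'Log: "{new_log}"')
    parts.append("Asset:")  # end here so the model continues with the asset value
    return "\n".join(parts)

def parse_fields(completion: str) -> Dict[str, str]:
    """Parse 'CNC-12 Component: ... Failure Mode: ...' into a dict.

    The prompt ends with 'Asset:', so the completion is assumed to start
    with the asset value; the label is added back before parsing.
    """
    text = "Asset: " + completion.strip()
    pattern = r"(Asset|Component|Failure Mode):\s*(.*?)(?=\s*(?:Component|Failure Mode):|$)"
    return {key: value.strip() for key, value in re.findall(pattern, text, flags=re.DOTALL)}

examples = [
    {"Log": "The vision system on Line-5 is failing to detect part #A55-Z. Seems like a camera calibration issue.",
     "Asset": "Line-5", "Component": "Vision System Camera", "Failure Mode": "Miscalibration"},
    {"Log": "Press-2 is showing hydraulic pressure fluctuations. The main pump seal is likely worn out.",
     "Asset": "Press-2", "Component": "Hydraulic Pump Seal", "Failure Mode": "Wear and Tear"},
]
prompt = build_few_shot_prompt(examples, "Primary coolant pump for CNC-12 failed.")

# Hypothetical completion, written by hand to exercise the parser
sample_completion = "CNC-12\nComponent: Coolant Pump\nFailure Mode: Pressure Loss"
record = parse_fields(sample_completion)
print(record)
```

If the model omits a field, the returned dictionary simply lacks that key, so callers should check for missing keys before using the record.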

## 🧪 Prompt Pattern 3: Chain-of-Thought (CoT) & Roles

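**When to use:** For complex reasoning tasks. Asking the model to "think step-by-step" encourages it to break the problem into intermediate steps, which often improves accuracy, especially on larger models; a small model such as `flan-t5-base` may still return only a short answer. Combining this with a **role** helps steer the tone and domain focus of the response.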

**Scenario:** A complex failure occurred. We need a root cause analysis and a recommended action. We'll assign the model the role of a "senior reliability engineer".

cot_prompt = f"""You are a senior reliability engineer. Analyze the following incident report by thinking step-by-step. First, identify the sequence of events. Second, state the primary and secondary symptoms. Third, propose the most likely root cause. Finally, recommend an immediate action.

Report: "{incident_report}"

Analysis:"""

analysis = generator(cot_prompt)
print("--- Chain-of-Thought Analysis ---")
print(analysis[0]['generated_text'])
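
The role and the reasoning steps are the two parts of this prompt most likely to change between tasks. A minimal sketch, assuming a hypothetical helper named `build_cot_prompt`, keeps them as parameters and renders the steps as a numbered list rather than the "First... Second..." wording used above:

```python
from typing import List

def build_cot_prompt(role: str, steps: List[str], report: str) -> str:
    """Build a role + step-by-step prompt from a role name and ordered steps."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"You are a {role}. Analyze the following incident report by thinking step-by-step.\n"
        f"{numbered}\n\n"
        f'Report: "{report}"\n\n'
        "Analysis:"
    )

rca_steps = [
    "Identify the sequence of events.",
    "State the primary and secondary symptoms.",
    "Propose the most likely root cause.",
    "Recommend an immediate action.",
]
rca_prompt = build_cot_prompt(
    "senior reliability engineer",
    rca_steps,
    "The primary coolant pump for CNC-12 failed, causing a line stoppage.",
)
print(rca_prompt)
```

Swapping the role or reordering the steps then becomes a data change rather than a rewrite of the prompt string.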

## 🗂️ Building a Reusable Prompt Template Library

Hardcoding prompts inline scatters them across the codebase and makes them hard to version and test. A better approach is to create a library of templates, which promotes consistency and makes maintenance easier.

PROMPT_TEMPLATES = {
    "summarize_incident": "Summarize this incident report for a shift supervisor in 3 bullet points:\n\n{report}",
    "extract_entities_json": "You are a data extraction bot. From the following text, extract the asset ID, component, and a brief description of the failure. Respond ONLY with a valid JSON object with the keys 'asset_id', 'component', and 'failure_description'.\n\nText: \"{report}\"\n\nJSON:",
    "draft_safety_alert": "You are an EHS (Environment, Health, and Safety) officer. Based on the incident below, draft a 2-sentence safety alert to be posted on the factory floor. The tone should be urgent and clear.\n\nIncident: \"{report}\"\n\nAlert:"
}

# Example of using a template
json_prompt = PROMPT_TEMPLATES['extract_entities_json'].format(report=incident_report)
json_output = generator(json_prompt)

print("--- Prompt Template for JSON Output ---")
print(json_output[0]['generated_text'])
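
Asking for JSON does not guarantee that the model returns it, and small models in particular may ignore a format instruction. A minimal sketch of a validation step follows; the helper name `parse_json_response` is hypothetical and the sample strings are hand-written, not real model output.

```python
import json
from typing import Dict, Optional

REQUIRED_KEYS = {"asset_id", "component", "failure_description"}

def parse_json_response(raw: str) -> Optional[Dict[str, str]]:
    """Return the parsed object if `raw` contains a JSON object with all required keys, else None."""
    # Tolerate extra text around the object by slicing from the first '{' to the last '}'
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data

# Hand-written sample responses
good = 'Here you go: {"asset_id": "CNC-12", "component": "coolant pump", "failure_description": "pressure drop"}'
bad = "asset_id: CNC-12, component: coolant pump"

print(parse_json_response(good))
print(parse_json_response(bad))
```

Returning `None` instead of raising lets the caller decide whether to retry with a stricter prompt, fall back to a few-shot variant, or route the ticket to a human.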

## 📊 Evaluating Prompts

How do you know which prompt is better? You test them. A systematic evaluation framework is crucial.

| Criterion | How to Measure | Example |
| --- | --- | --- |
| **Accuracy** | Compare model output to a "golden dataset" of correct answers. | Does the `extract_entities_json` prompt correctly identify the asset 95% of the time? |
| **Safety/Compliance** | Use a checklist or another LLM to flag policy violations. | Does the `draft_safety_alert` prompt ever suggest an unsafe action? |
| **Tone & Style** | Semantic similarity to a desired style guide or manual check. | Is the safety alert's tone appropriately urgent? |
| **Latency & Cost** | Measure response time and token count. | Does the Chain-of-Thought prompt take too long for a real-time interface? |

You should maintain a log of these experiments to track which prompts perform best.

import pandas as pd

# Example of an experiment tracking log
prompt_experiments = pd.DataFrame([
    {
        "prompt_name": "summarize_v1_zero_shot",
        "accuracy": 0.75, # Fraction of summaries deemed 'good' by a human reviewer
        "safety_flags": 0,
        "avg_latency_ms": 450,
    },
    {
        "prompt_name": "summarize_v2_with_role",
        "accuracy": 0.88,
        "safety_flags": 0,
        "avg_latency_ms": 510,
    },
    {
        "prompt_name": "extract_json_v1_few_shot",
        "accuracy": 0.96, # Fraction of fields correctly extracted
        "safety_flags": 0,
        "avg_latency_ms": 820,
    },
])

print("--- Prompt Evaluation Log ---")
prompt_experiments
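
The numbers in a log like this have to come from somewhere. One way to produce them is a small loop that runs each template over a golden dataset and records accuracy and latency. The sketch below is one possible shape, not a definitive harness: `evaluate_prompt` is a hypothetical helper, exact-match scoring is an assumption that suits short extraction answers but not free-form summaries, and `fake_generate` is a stub standing in for a call such as `generator(prompt)[0]['generated_text']`.

```python
import time
from typing import Callable, Dict, List

def evaluate_prompt(
    template: str,
    golden: List[Dict[str, str]],
    generate: Callable[[str], str],
) -> Dict[str, float]:
    """Score one template: exact-match accuracy and mean latency per call."""
    if not golden:
        raise ValueError("golden dataset must not be empty")
    correct = 0
    total_ms = 0.0
    for case in golden:
        prompt = template.format(report=case["report"])
        start = time.perf_counter()
        output = generate(prompt)
        total_ms += (time.perf_counter() - start) * 1000
        # Case-insensitive exact match against the expected answer
        if output.strip().lower() == case["expected"].strip().lower():
            correct += 1
    n = len(golden)
    return {"accuracy": correct / n, "avg_latency_ms": total_ms / n}

def fake_generate(prompt: str) -> str:
    """Stub model: recognizes one asset only, so the example runs offline."""
    return "CNC-12" if "CNC-12" in prompt else "unknown"

# Tiny hand-made golden set for illustration
golden = [
    {"report": "Coolant pump on CNC-12 failed.", "expected": "CNC-12"},
    {"report": "Press-2 shows pressure fluctuations.", "expected": "Press-2"},
]
scores = evaluate_prompt("Which asset is affected?\n\n{report}", golden, fake_generate)
print(scores)
```

Each returned dictionary can be appended as a row, together with the prompt name, to a DataFrame like `prompt_experiments`.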

## ✅ Next Steps

You now have a powerful toolkit for guiding LLMs. The key is to be systematic:

1.  **Start Simple:** Always begin with a zero-shot prompt.
2.  **Add Complexity:** If it fails, add examples (few-shot), roles, and chain-of-thought reasoning.
3.  **Enforce Structure:** Use templates and request specific output formats like JSON.
4.  **Test Everything:** Never assume a prompt is good. Evaluate it against your criteria.

In the next notebooks, we will combine these prompting techniques with **Retrieval-Augmented Generation (RAG)** to build a system that can answer questions based on your private documents.