# 🎯 Week 5-6 · Notebook 06 · Few-Shot Learning for Actionable Insights

**Module:** LLMs, Prompt Engineering & RAG  
**Project:** Build the Knowledge Core for the Manufacturing Copilot

---

Fine-tuning a model is expensive and time-consuming. **Few-Shot Learning** offers a powerful alternative: guiding a pre-trained LLM with a handful of high-quality examples directly in the prompt. This notebook demonstrates how to use this technique to build a system that suggests corrective actions for machine operators, a core feature of our Manufacturing Copilot.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
1. ✅ **Understand Few-Shot vs. Zero-Shot:** Know when to provide examples to improve performance.
2. ✅ **Curate High-Quality Exemplars:** Create a dataset of examples that reflect real-world manufacturing problems and solutions.
3. ✅ **Automate Example Selection:** Use semantic similarity to find the most relevant examples for a new query.
4. ✅ **Build and Evaluate a Few-Shot System:** Combine prompts, exemplars, and an LLM to generate useful, actionable recommendations.

## ⚙️ Setup: Model and Data

We'll continue using `google/flan-t5-base` for its balance of performance and resource efficiency. We will also create a small, curated dataset of incident tickets and the corresponding corrective actions taken by experienced engineers.

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
import pandas as pd
import torch

device = 0 if torch.cuda.is_available() else -1
model_name = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

generator = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=100,   # Actions should be concise
    do_sample=False,  # Greedy decoding for repeatable output; temperature only applies when sampling
)

# Our curated dataset of high-quality examples (exemplars)
exemplars_df = pd.DataFrame([
    {"ticket": "Vibration spike on compressor #7 after bearing replacement.", "action": "Inspect alignment and re-torque fasteners immediately."},
    {"ticket": "Hydraulic leak detected on the main clamp cylinder of Press-3.", "action": "Isolate the machine using LOTO protocol and replace the primary seals."},
    {"ticket": "Camera on SMT line is misreading part numbers due to glare.", "action": "Adjust the angle of the overhead light and recalibrate vision system thresholds."},
    {"ticket": "The packing line robot #2 is flagging repeated overcurrent alarms on its wrist axis.", "action": "Check the robot's payload to ensure it is within spec and recalibrate torque limits."},
    {"ticket": "AGV-5 is slowing down near charging station B, reporting intermittent lidar faults.", "action": "Clean the lidar sensor lens and check for firmware updates."}
])

print("Exemplar dataset loaded:")
exemplars_df

## 🛠️ Building a Few-Shot Prompt

The core idea is to construct a prompt that includes both the instructions and the examples. The LLM picks up the desired input-output pattern from the examples at inference time (in-context learning); its weights are not updated.

A good few-shot prompt has three parts:
1.  **Instruction:** A clear command telling the model what to do.
2.  **Exemplars:** A few `input -> output` pairs.
3.  **Query:** The new input for which we want a response.

def build_few_shot_prompt(examples, query):
    """Constructs a prompt with instructions and examples."""
    
    # 1. The instruction
    instruction = "You are a senior reliability engineer. Based on the incident ticket, recommend a single, concise corrective action."
    
    # 2. The exemplars
    exemplar_texts = []
    for _, row in examples.iterrows():
        exemplar_texts.append(f"Ticket: {row['ticket']}\nAction: {row['action']}")
        
    # 3. The new query
    query_text = f"Ticket: {query}\nAction:"
    
    # Combine all parts
    return "\n===\n".join([instruction] + exemplar_texts + [query_text])

# Let's test it with a new incident
new_ticket = "The coolant pump for CNC-12 is showing a gradual pressure drop over the last 3 hours."
# We'll use the first 3 exemplars for our prompt
few_shot_prompt = build_few_shot_prompt(exemplars_df.head(3), new_ticket)
print("--- Generated Few-Shot Prompt ---")
print(few_shot_prompt)

print("\n--- LLM Response (Few-Shot) ---")
response = generator(few_shot_prompt)
print(response[0]['generated_text'])
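
Learning objective 1 contrasts few-shot with zero-shot prompting. As a rough baseline, the same ticket can be sent with the instruction alone and no exemplars. The sketch below is one way to build that prompt. `build_zero_shot_prompt` and `INSTRUCTION` are hypothetical names introduced here, not part of the notebook so far; they reuse the instruction text and `===` separator from `build_few_shot_prompt`.

```python
# Hypothetical helper: same instruction and separator as build_few_shot_prompt,
# but with no exemplars between the instruction and the query.
INSTRUCTION = (
    "You are a senior reliability engineer. Based on the incident ticket, "
    "recommend a single, concise corrective action."
)

def build_zero_shot_prompt(query):
    """Constructs a prompt with the instruction and the query only."""
    query_text = f"Ticket: {query}\nAction:"
    return "\n===\n".join([INSTRUCTION, query_text])

zero_shot_prompt = build_zero_shot_prompt(
    "The coolant pump for CNC-12 is showing a gradual pressure drop over the last 3 hours."
)
print(zero_shot_prompt)

# In the notebook, the baseline response would come from the same pipeline:
# print(generator(zero_shot_prompt)[0]['generated_text'])
```

Comparing this output with the few-shot response for the same ticket gives a quick, informal sense of how much the exemplars help.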

## 🤖 Automating Exemplar Selection with Semantic Search

Manually picking examples doesn't scale. For a real system, we need to automatically find the most *relevant* examples for any given ticket. We can do this using **semantic search**.

The process is:
1.  **Embed the exemplars:** Convert all our exemplar tickets into numerical vectors (embeddings) using a sentence transformer model.
2.  **Embed the query:** Convert the new ticket into an embedding with the same model.
3.  **Compare and select:** Rank the exemplars by cosine similarity to the new ticket's embedding and keep the most similar ones.
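
Closeness between embeddings is commonly measured with cosine similarity: the dot product of two vectors divided by the product of their lengths. The sketch below works through it in plain Python on made-up 3-dimensional vectors (real sentence embeddings have hundreds of dimensions). The vectors and their labels are illustrative assumptions, not model output.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" for illustration only.
toy_corpus = {
    "hydraulic leak": [0.9, 0.1, 0.0],
    "camera glare": [0.0, 0.2, 0.9],
    "bearing vibration": [0.6, 0.7, 0.1],
}
toy_query = [0.8, 0.3, 0.0]  # stands in for the new ticket

# Rank corpus entries from most to least similar to the query.
ranked = sorted(
    toy_corpus,
    key=lambda name: cosine_similarity(toy_query, toy_corpus[name]),
    reverse=True,
)
print(ranked)
```

Because the score depends only on the angle between vectors, a long ticket and a short ticket about the same fault can still rank as close matches.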

# We need to install the sentence-transformers library
# !pip install -q sentence-transformers

from sentence_transformers import SentenceTransformer, util

# Use a lightweight but effective model for embedding
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Embed our exemplar tickets
corpus_embeddings = embedder.encode(exemplars_df['ticket'].tolist(), convert_to_tensor=True)

# 2. Embed our new ticket
query_embedding = embedder.encode(new_ticket, convert_to_tensor=True)

# 3. Calculate cosine similarity to find the closest matches
similarities = util.cos_sim(query_embedding, corpus_embeddings).squeeze()

# Add similarities to our dataframe and sort
exemplars_df['similarity'] = similarities.cpu().numpy()
retrieval_df = exemplars_df.sort_values("similarity", ascending=False)

print(f"--- Most Relevant Exemplars for query: '{new_ticket}' ---")
retrieval_df

Now we can build a prompt using the *best* examples found by semantic search.

# Select the top 3 most similar exemplars
top_k = 3
best_exemplars = retrieval_df.head(top_k)

# Build a new prompt with these dynamically selected examples
dynamic_prompt = build_few_shot_prompt(best_exemplars, new_ticket)

print("--- Dynamically Generated Few-Shot Prompt ---")
print(dynamic_prompt)

print("\n--- LLM Response (Dynamic Few-Shot) ---")
dynamic_response = generator(dynamic_prompt)
print(dynamic_response[0]['generated_text'])
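
Learning objective 4 mentions evaluation, and one cheap check is leave-one-out: hold out each exemplar in turn, generate an action for its ticket from the remaining exemplars, and compare the result with the engineer's recorded action. The sketch below assumes token-overlap F1 as a crude similarity score and takes the generation step as a callable. `token_f1`, `leave_one_out_eval`, and the `echo_first_action` stub are hypothetical names introduced for illustration; in the notebook, the callable would wrap exemplar selection, `build_few_shot_prompt`, and `generator`.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def leave_one_out_eval(exemplars, generate_fn):
    """Scores generate_fn on each exemplar, using only the other exemplars as context.

    exemplars: list of {"ticket": ..., "action": ...} dicts.
    generate_fn(pool, ticket) -> predicted action string.
    """
    scores = []
    for i, held_out in enumerate(exemplars):
        pool = exemplars[:i] + exemplars[i + 1:]
        prediction = generate_fn(pool, held_out["ticket"])
        scores.append(token_f1(prediction, held_out["action"]))
    return sum(scores) / len(scores)

# Stub generator for demonstration only: returns the first pooled action.
def echo_first_action(pool, ticket):
    return pool[0]["action"]

# Abbreviated versions of two exemplars, used only to exercise the functions.
demo_exemplars = [
    {"ticket": "Vibration spike on compressor #7.", "action": "Inspect alignment and re-torque fasteners."},
    {"ticket": "Lidar faults on AGV-5.", "action": "Clean the lidar sensor lens."},
]
print(f"Mean token F1: {leave_one_out_eval(demo_exemplars, echo_first_action):.2f}")
```

The full exemplar set can be passed in as `exemplars_df[['ticket', 'action']].to_dict('records')`. Token overlap penalizes valid paraphrases, so the score is only a rough signal; with five exemplars, reading the generated actions directly is still worthwhile.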

## ✅ Next Steps

This notebook demonstrated few-shot learning with dynamic exemplar selection. By placing relevant, high-quality examples in the prompt, we can guide a general-purpose LLM through a narrow, domain-specific task without any fine-tuning.

Key takeaways:
- **Quality over Quantity:** A few good examples are better than many poor ones.
- **Relevance is Key:** Semantic search is crucial for finding the right examples for a given query.
- **Systematic Approach:** Combining a prompt template, an exemplar database, and a retrieval mechanism creates a robust and scalable system.

In the next notebook, we will explore the fundamentals of **Retrieval-Augmented Generation (RAG)**, a technique that takes this idea a step further by retrieving information from a large knowledge base (like technical manuals or past incident reports) to answer questions.