# 🧠 Week 5-6 · Notebook 07 · Retrieval-Augmented Generation (RAG) Foundations

Understand when to pair LLMs with retrievers, how to architect end-to-end pipelines, and which manufacturing scenarios benefit most from RAG.

## 🎯 Learning Objectives
- Map RAG components and how they integrate with existing manufacturing systems.
- Compare RAG vs. fine-tuning vs. prompt-only approaches across latency, accuracy, and governance.
- Draft a pilot plan including data sources, privacy controls, and KPIs.
- Recognize common failure modes (stale data, hallucinations) and mitigation strategies.

## 🤝 When to use RAG?
| Scenario | Fine-tune? | RAG? | Rationale |
| --- | --- | --- | --- |
| Rapidly changing SOPs | ❌ | ✅ | Retrieval keeps procedures current without retraining. |
| Structured tabular inputs | ✅ | ⚠️ | Consider fine-tuning a seq2seq model; use RAG for hybrid text/table workflows. |
| Supplier compliance QA | ⚠️ | ✅ | Blend policies, contracts, and multilingual docs. |
| Safety incident classification | ✅ | ⚠️ | Fine-tune small classifier; optionally add retrieval for references. |
| Knowledge gaps across shifts | ❌ | ✅ | LLM + vector DB surfaces tribal knowledge on demand. |

## 🏗️ Reference Architecture
1. **Ingest** — parse PDFs, DOCX, ticketing systems, MES data. Normalize metadata (machine, shift, language).
2. **Chunk** — split content into 200-400 token segments with overlap; retain source IDs.
3. **Embed** — encode chunks using domain-tuned models (see Notebook 09).
4. **Index** — store vectors + metadata in FAISS/Chroma/PGVector.
5. **Retrieve** — fetch the top-k chunks by cosine similarity, then optionally re-rank them.
6. **Generate** — compose the prompt with instructions, retrieved context, and safety reminders.
7. **Feedback** — log responses, user votes, and retrieval stats for continuous improvement.
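
The cells that follow cover embedding, retrieval, and generation, so the chunking step is sketched here. This is a minimal illustration, not a production splitter: it counts whitespace-delimited words as a rough stand-in for model tokens, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names chosen for this example.

```python
def chunk_text(text: str, source_id: str, chunk_size: int = 300, overlap: int = 50) -> list[dict]:
    """Split text into overlapping windows of whitespace-delimited words.

    Assumption: words approximate tokens; swap in the embedding model's
    tokenizer when exact token counts matter.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "source_id": source_id,        # retained so answers can cite the source
            "chunk_index": len(chunks),
            "text": " ".join(window),
        })
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


# Synthetic 700-word document, used only to show the window sizes.
sample = " ".join(f"w{i}" for i in range(700))
chunks = chunk_text(sample, source_id="sop_press_2025")
print([len(c["text"].split()) for c in chunks])
```

Each chunk repeats the last `overlap` words of the previous one, so a step that straddles a boundary still appears whole in at least one chunk.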

```python
import pandas as pd
from sentence_transformers import SentenceTransformer, util

# Toy corpus: one short passage per document, keyed by doc_id for citations.
corpus = pd.DataFrame([
    {
        "doc_id": "sop_press_2025",
        "text": "Hydraulic press maintenance: inspect seals, verify pressure gauges, top off fluid, confirm lockout-tagout steps.",
    },
    {
        "doc_id": "conveyor_guide",
        "text": "Conveyor troubleshooting: check belt tension, motor load, and sensor alignment before restarting.",
    },
    {
        "doc_id": "robot_pm",
        "text": "Robot arm preventive maintenance: grease joints, recalibrate torque sensors, validate vision alignment.",
    },
])

# Compact general-purpose embedding model; weights are downloaded on first use.
retriever_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus_embeddings = retriever_model.encode(corpus["text"].tolist(), convert_to_tensor=True)
```

```python
question = "Press 12 shows inconsistent hydraulic pressure. What should the technician verify first?"
query_embedding = retriever_model.encode(question, convert_to_tensor=True)

# Cosine-similarity search over the corpus; keep the two best matches.
top_k = 2
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]

# Look up the matching rows and attach each hit's similarity score.
retrieved = corpus.iloc[[hit["corpus_id"] for hit in hits]].assign(
    score=[round(hit["score"], 3) for hit in hits]
)
retrieved
```

```python
# Format each retrieved passage as a bullet tagged with its doc_id so the model can cite it.
context_block = "\n".join(
    f"- ({row.doc_id}) {row.text}" for row in retrieved.itertuples()
)

prompt = f"""
SYSTEM: You are a maintenance advisor. Cite sources by doc_id.
CONTEXT:
{context_block}

QUESTION: {question}
RESPONSE:
""".strip()

print(prompt)
```

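```python
from transformers import pipeline

# A 7B model is large; swap in a smaller instruction-tuned model if memory is tight.
rag_generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    max_new_tokens=180,
    do_sample=True,          # temperature is ignored unless sampling is enabled
    temperature=0.2,
    return_full_text=False,  # return only the completion, not the echoed prompt
)

rag_response = rag_generator(prompt)[0]["generated_text"]
print(rag_response)
```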

## Prompt Composition
````text
SYSTEM: You are a maintenance advisor. Use only the provided context.
CONTEXT:
{retrieved_passages}

QUESTION: {question}
RESPONSE:
````
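
One way to fill this template in code is sketched below. The helper `build_prompt`, its `max_context_words` budget, and the passage dictionaries are assumptions made for illustration; the budget counts whitespace-delimited words rather than model tokens.

```python
PROMPT_TEMPLATE = """SYSTEM: You are a maintenance advisor. Use only the provided context.
CONTEXT:
{retrieved_passages}

QUESTION: {question}
RESPONSE:"""


def build_prompt(question: str, passages: list[dict], max_context_words: int = 600) -> str:
    """Fill the template, adding passages in rank order until the word budget is spent.

    The top-ranked passage is always included, even if it alone exceeds the budget.
    """
    lines, used = [], 0
    for passage in passages:
        n_words = len(passage["text"].split())
        if lines and used + n_words > max_context_words:
            break
        lines.append(f"- ({passage['doc_id']}) {passage['text']}")
        used += n_words
    return PROMPT_TEMPLATE.format(
        retrieved_passages="\n".join(lines), question=question
    )


passages = [
    {"doc_id": "sop_press_2025", "text": "Inspect seals and verify pressure gauges."},
    {"doc_id": "conveyor_guide", "text": "Check belt tension and motor load."},
]
print(build_prompt("What should the technician verify first?", passages))
```

Capping the context keeps the prompt within the model's window and makes latency more predictable; passages dropped by the budget are the lowest-ranked ones.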

## 🏭 Manufacturing Use Cases
| Persona | Problem | RAG Value |
| --- | --- | --- |
| Reliability engineer | Search historical tickets + SOPs | Retrieve tribal knowledge with citations |
| Shift supervisor | Answer "what changed" during handover | Blend shift logs + alerts |
| EHS specialist | Check compliance requirements | Surface OSHA clauses and internal policies |
| Procurement | Compare supplier clauses | Query contracts and standards side-by-side |
| Technician | Troubleshoot on the floor | Access quick steps from manuals via mobile |

## ⚠️ Risks & Mitigations
| Risk | Impact | Mitigation |
| --- | --- | --- |
| Hallucinated citations | Loss of user trust | Enforce answer templates; show confidence scores |
| Stale documents | Out-of-date procedures | Nightly ingestion jobs, freshness metadata |
| Latency spikes | User abandonment | Cache frequent questions; use approximate nearest-neighbor (ANN) indexes |
| Access control gaps | Data leakage | Row-level security, signed URLs |
| Missing context | Partial answers | Retrieval fallback (keywords) + user feedback loop |
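
The hallucinated-citation risk can be checked mechanically once answers follow a fixed citation format. The sketch below assumes citations appear as `(doc_id)`, mirroring how passages are labeled in the context block; `check_citations` is a hypothetical helper, and the sample answer is made up for illustration.

```python
import re


def check_citations(answer: str, retrieved_ids: set[str]) -> dict:
    """Compare doc_ids cited as "(doc_id)" in the answer with the retrieved set.

    Limitation: any single parenthesized word, such as an acronym, is treated
    as a citation, so a stricter marker may be needed in practice.
    """
    cited = set(re.findall(r"\(([A-Za-z0-9_\-]+)\)", answer))
    return {
        "cited": sorted(cited),
        "unsupported": sorted(cited - retrieved_ids),  # cited but never retrieved
        "has_citation": bool(cited & retrieved_ids),
    }


answer = (
    "Verify the pressure gauges and inspect the seals (sop_press_2025). "
    "Then replace the pump (pump_manual)."
)
report = check_citations(answer, retrieved_ids={"sop_press_2025", "conveyor_guide"})
print(report)
```

An answer with a non-empty `unsupported` list can be blocked or flagged for review before it reaches the technician.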

## 📊 Success Metrics
- **Answer accuracy:** SME review score ≥ 4/5 across 50 queries.
- **Citation coverage:** ≥ 95% of answers include source references.
- **Latency:** < 3 seconds for top-5 retrieval + generation.
- **Adoption:** 60% of reliability engineers active weekly.
- **Safety compliance:** Zero unsupported safety recommendations.
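
These metrics can be computed from a simple evaluation log. The sketch below assumes one record per question with hypothetical fields `sme_score`, `has_citation`, and `latency_s`; the four records are invented purely to exercise the function, and reporting latency at the 95th percentile (nearest-rank) is a choice made here, not something the targets above specify.

```python
import math
from statistics import mean


def summarize_eval(records: list[dict]) -> dict:
    """Aggregate SME scores, citation coverage, and p95 latency from an eval log."""
    latencies = sorted(r["latency_s"] for r in records)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "mean_sme_score": round(mean(r["sme_score"] for r in records), 2),
        "citation_coverage": sum(r["has_citation"] for r in records) / len(records),
        "p95_latency_s": latencies[p95_index],
    }


# Illustrative records only; real values come from the evaluation harness.
records = [
    {"sme_score": 5, "has_citation": True, "latency_s": 1.2},
    {"sme_score": 4, "has_citation": True, "latency_s": 2.8},
    {"sme_score": 3, "has_citation": False, "latency_s": 3.4},
    {"sme_score": 4, "has_citation": True, "latency_s": 1.9},
]
summary = summarize_eval(records)
print(summary)
```

Comparing each field against its target (for example `citation_coverage >= 0.95`) turns the list above into an automated pass/fail check.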

## 🛠️ Pilot Blueprint
1. **Scope** — pick one line (e.g., press shop) and 3 document types.
2. **Data** — ingest last 12 months of SOPs, tickets, audit findings.
3. **Infrastructure** — configure vector DB (Chroma or PGVector) + access policies.
4. **LLM** — start with instruction-tuned 7B model; define prompt template.
5. **Evaluation** — run 100 curated questions, collect SME thumbs-up/down.
6. **Rollout** — integrate into Teams/Slack bot with analytics dashboards.

## 🔐 Governance & Compliance Checklist
- Classify documents (public/internal/confidential) before ingestion.
- Apply row-level security or encrypt embeddings for sensitive data.
- Log user queries and retrieved chunks for audits.
- Align with IT/OT policies for network segregation.
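
A minimal sketch of the first three checklist items, assuming each chunk carries a `classification` label assigned at ingestion. The level ordering, `filter_by_clearance`, `audit_record`, and the sample chunks are all hypothetical; a real deployment would enforce access in the vector store or database rather than in application code alone.

```python
import json
from datetime import datetime, timezone

# Assumed ordering of the classification labels, lowest to highest sensitivity.
CLEARANCE_LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def filter_by_clearance(chunks: list[dict], user_clearance: str) -> list[dict]:
    """Drop chunks classified above the user's clearance before prompting."""
    limit = CLEARANCE_LEVELS[user_clearance]
    return [c for c in chunks if CLEARANCE_LEVELS[c["classification"]] <= limit]


def audit_record(user_id: str, question: str, chunks: list[dict]) -> str:
    """Serialize one query and the chunks it was shown as a JSON log line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "chunk_ids": [c["chunk_id"] for c in chunks],
    })


chunks = [
    {"chunk_id": "sop_press_2025#0", "classification": "internal",
     "text": "Inspect seals and verify pressure gauges."},
    {"chunk_id": "supplier_contract#3", "classification": "confidential",
     "text": "Penalty clauses for late delivery."},
]
allowed = filter_by_clearance(chunks, user_clearance="internal")
print(audit_record("tech_042", "How do I verify press seals?", allowed))
```

Filtering before prompt assembly means restricted text never reaches the model, and logging the chunk IDs lets auditors reconstruct what each answer was based on.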

## ✅ Implementation Checklist
- [ ] Document inventory categorized and access-controlled
- [ ] Vector store provisioned with monitoring hooks
- [ ] Prompt template reviewed by SMEs
- [ ] Evaluation harness with accuracy/citation metrics
- [ ] Governance plan signed off by IT, EHS, Quality

## 📚 References
- *Retrieval-Augmented Generation for Industry*, Siemens 2024
- Chroma & FAISS documentation
- Notebook 09: Vector Embeddings (upcoming)
- Manufacturing data governance standard (2025)

## 🧪 Lab Assignment
1. Select 50 SOP snippets + 50 maintenance tickets and ingest into a lightweight vector store.
2. Define a retrieval prompt template with citation requirements.
3. Evaluate the system on 30 curated questions, logging accuracy, latency, and citation coverage.
4. Identify two failure cases and propose mitigation steps.
5. Summarize pilot recommendations (timeline, resources, success KPIs).