# ⚙️ Week 5-6 · Notebook 01 · Introduction to Large Language Models

**Module:** LLMs, Prompt Engineering & RAG  
**Estimated time:** 4.5 hours  
**Prerequisites:** Transformers (Week 3-4) + Attention + HuggingFace basics

---

Manufacturing leaders need copilots that speak the language of maintenance logs, quality checklists, and supplier bulletins. This notebook lays the foundation for working with Large Language Models (LLMs) in industrial settings.

## 🎯 Learning Objectives
By the end of this notebook you will be able to:
1. Explain how modern LLMs are pre-trained, aligned, and deployed.
2. Compare open-source and proprietary model families for plant-floor tasks.
3. Run baseline inference on domain text using HuggingFace pipelines.
4. Build a decision matrix to choose the right model for latency, privacy, and safety constraints.
5. Design evaluation loops that incorporate manufacturing-specific metrics.

## 🧭 Roadmap
1. LLM evolution timeline and terminology
2. Anatomy of foundation model training
3. Industrial deployment considerations
4. Hands-on inference walkthroughs
5. Case study: Downtime incident assistant
6. Evaluation, safety, and governance checklists

## 🕰️ Evolution of Language Models
| Era | Representative Models | Breakthrough | Manufacturing Impact |
| --- | --- | --- | --- |
| 2013-2017 | word2vec, GloVe, ELMo | Dense word embeddings (static at first, contextual with ELMo) | Keyword search in maintenance manuals |
| 2018-2020 | GPT, BERT, T5 | Transformer encoder/decoder scale | Automated report summaries |
| 2021-2023 | GPT-3, PaLM, LLaMA | Scaling (up to 100B+ parameters) and instruction tuning | Conversational plant copilots |
| 2024+ | Mixtral, Claude, Llama-3 | Safety-aligned, multi-modal | Real-time troubleshooting across modalities |

### Key Definitions
- **Foundation model:** a large model trained on a broad corpus and adaptable to many downstream tasks.
- **Instruction tuning:** supervised fine-tuning on prompt/response pairs so the model follows instructions.
- **RLHF (Reinforcement Learning from Human Feedback):** fine-tuning against a reward signal learned from human preference judgments, so responses become more helpful and safer.
- **Alignment:** ensuring outputs respect policies (quality, safety, compliance).

## 🏭 Manufacturing Perspective
- **Maintenance analytics:** interpret vibration logs, create structured work orders.
- **Quality control:** summarize defect tickets, recommend countermeasures.
- **Supply chain:** draft vendor communications or translate manuals.
- **Safety:** generate checklists compliant with OSHA/ISO standards.

## ⚙️ Anatomy of an LLM Training Pipeline
1. **Data curation:** mixture of public text + domain corpora (SOPs, logs).
2. **Tokenization:** SentencePiece/BPE, with dedicated tokens for units like `°C`, `Nm`.
3. **Pre-training:** self-supervised objectives (next-token prediction, span corruption).
4. **Supervised fine-tuning:** align to domain tasks (incident classification).
5. **RLHF / DPO (Direct Preference Optimization):** incorporate human feedback, safety rules, risk controls.
6. **Evaluation:** perplexity, domain accuracy, hallucination tests.
7. **Deployment:** on-prem GPU, managed endpoints, or edge devices.
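The perplexity metric named in the evaluation step can be made concrete with a few lines of arithmetic. The sketch below assumes you already have per-token natural-log probabilities from a model. The `perplexity` helper and the sample values are illustrative and do not come from any library.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical log-probabilities for a 4-token maintenance phrase.
confident = [math.log(0.5)] * 4   # model assigns p=0.5 to every token
uncertain = [math.log(0.1)] * 4   # model assigns p=0.1 to every token

print(round(perplexity(confident), 2))  # 2.0
print(round(perplexity(uncertain), 2))  # 10.0
```

Lower is better: a perplexity of 2 means the model is, on average, as uncertain as a fair choice between two tokens.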

### Data Stack Considerations
- Include multilingual logs (e.g., supplier emails in German/Japanese).
- Govern PII and trade secrets with masking / redaction.
- Track dataset drift across shifts, product variants, seasons.
- Maintain data cards describing lineage and quality.
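The masking and redaction bullet can be sketched with regular expressions. This is a minimal illustration, not a vetted PII filter: the patterns, the `PN-` part-number format, and the sample log line are all assumptions made for the example.

```python
import re

# Illustrative patterns only; a real deployment needs a reviewed, plant-specific list.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),   # loose; may also hit long numeric IDs
    "PART_NO": re.compile(r"\bPN-\d{4,}\b"),         # hypothetical internal part-number format
}

def redact(text):
    """Replace each match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

log_line = "Contact j.mueller@supplier.example about PN-88213, tel +49 30 1234567."
print(redact(log_line))
# Contact [EMAIL] about [PART_NO], tel [PHONE].
```

Keeping the placeholder labels in the text preserves sentence structure, so downstream models still see that an email address or part number was mentioned.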

In [None]:
from transformers import pipeline

downtime_report = (
    "Press-42 tripped due to hydraulic accumulator pressure drop. Operators rerouted flow to backup line. "
    "Recommend inspection of seals and replenish fluid before restart."
)
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
summary = summarizer(downtime_report, max_length=60, min_length=25, do_sample=False)[0]['summary_text']
summary

### Discussion
- What context is missing from the summary?
- Which stakeholders (maintenance planner, plant manager) benefit from condensed reports?
- When would you prefer raw logs over summaries?
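One way to make the first discussion question measurable is to check which identifiers and key terms from the report survive in the summary. The sketch below is an assumed, minimal approach: the `key_terms` list is hand-picked for this example, and `example_summary` is written by hand rather than taken from real model output.

```python
def retained_terms(summary, key_terms):
    """Split key terms into those present in and missing from the summary (case-insensitive)."""
    lowered = summary.lower()
    kept = [t for t in key_terms if t.lower() in lowered]
    missing = [t for t in key_terms if t.lower() not in lowered]
    return kept, missing

key_terms = ["Press-42", "hydraulic accumulator", "seals", "backup line"]
# Hypothetical summary text, written by hand for illustration.
example_summary = (
    "Press-42 tripped after a hydraulic accumulator pressure drop. "
    "Inspect seals before restart."
)

kept, missing = retained_terms(example_summary, key_terms)
print(f"retention: {len(kept)}/{len(key_terms)}; missing: {missing}")
# retention: 3/4; missing: ['backup line']
```

A dropped term such as the backup-line rerouting is exactly the kind of context a maintenance planner may need, which is one argument for keeping raw logs available alongside summaries.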

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('tiiuae/falcon-7b-instruct')
text = 'Perform a predictive maintenance check on furnace line A before cycle 540.'
tokens = tokenizer.tokenize(text)
tokens[:12], len(tokens)

## 🧮 Parameter Counts & Hardware Sizing
| Model | Parameters | VRAM for weights (fp16) | Typical Use Case |
| --- | --- | --- | --- |
| `distilbert-base-uncased` | 66M | ~0.13 GB | Edge classification |
| `tiiuae/falcon-7b-instruct` | 7B | ~14 GB | On-prem copilots |
| `mistralai/Mixtral-8x7B-v0.1` | 46.7B (MoE) | ~93 GB | Research assistants |
| `meta-llama/Meta-Llama-3-70B-Instruct` | 70B | ~140 GB | High-accuracy digital workers |

> **Tip:** Quantization (4-bit, 8-bit) reduces VRAM footprint but may impact accuracy.
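The sizing rule behind these figures is simple arithmetic: parameter count times bytes per parameter. The sketch below covers weights only and ignores activations, the KV cache, and framework overhead, so treat the results as lower bounds. The `weight_memory_gb` helper is illustrative, and the parameter counts are the nominal ones from the table.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params, dtype="fp16"):
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for name, params in [("falcon-7b-instruct", 7e9), ("Meta-Llama-3-70B-Instruct", 70e9)]:
    sizes = {d: round(weight_memory_gb(params, d), 1) for d in ("fp16", "int8", "int4")}
    print(name, sizes)
# falcon-7b-instruct {'fp16': 14.0, 'int8': 7.0, 'int4': 3.5}
# Meta-Llama-3-70B-Instruct {'fp16': 140.0, 'int8': 70.0, 'int4': 35.0}
```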

## 📝 Model Selection Framework
1. **Business objective:** Explain downtime vs. automate SOP authoring.
2. **Latency:** Real-time under 500 ms vs. offline batch.
3. **Context window:** Do you need 8k or 200k tokens for long shift logs?
4. **Privacy:** Can data leave the plant? If not, prefer open-source on-prem.
5. **Cost:** GPU availability, inference pricing, licensing terms.
6. **Safety:** Guardrails for high-risk recommendations (e.g., lock-out/tag-out).
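Several of these criteria act as hard constraints that remove candidates before any scoring happens. The sketch below shows one assumed way to encode that filter. The `Candidate` fields, the model names, and every number are placeholders rather than benchmarks.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    on_prem: bool          # can run inside the plant network
    context_tokens: int    # maximum context window
    p95_latency_ms: int    # measured or estimated 95th-percentile latency

def shortlist(candidates, need_on_prem, min_context, max_latency_ms):
    """Keep only candidates that satisfy every hard constraint."""
    return [
        c.name for c in candidates
        if (c.on_prem or not need_on_prem)
        and c.context_tokens >= min_context
        and c.p95_latency_ms <= max_latency_ms
    ]

# All figures below are placeholders, not benchmarks.
pool = [
    Candidate("small-edge-model", True, 512, 40),
    Candidate("on-prem-7b", True, 8192, 450),
    Candidate("hosted-api-model", False, 200_000, 900),
]
print(shortlist(pool, need_on_prem=True, min_context=8000, max_latency_ms=500))
# ['on-prem-7b']
```

Filtering first keeps the later comparison honest: a model that cannot meet the privacy requirement should not win on accuracy alone.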

In [None]:
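from huggingface_hub import model_info

# The Longformer checkpoint lives under the 'allenai' organization on the Hub.
candidates = ['distilbert-base-uncased', 'allenai/longformer-base-4096', 'tiiuae/falcon-7b-instruct']
decision_table = []
for name in candidates:
    info = model_info(name)
    # Card metadata is optional; the attribute is `card_data` in recent
    # huggingface_hub releases and `cardData` in older ones.
    card = getattr(info, 'card_data', None) or getattr(info, 'cardData', None)
    meta = card.to_dict() if hasattr(card, 'to_dict') else (card or {})
    decision_table.append({
        'model': name,
        # Few model cards declare these keys, so 'n/a' is the common result.
        'params': meta.get('params', 'n/a'),
        'context_length': meta.get('context_length', 'n/a'),
        'pipeline_tag': info.pipeline_tag
    })
decision_table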

### Exercise: Model Matrix
- Add columns for latency (edge/cloud), licensing (Apache, Llama 2), and support level.
- Rate each model 1-5 against your plant's needs.
- Present the matrix to stakeholders to justify model selection.
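The 1-5 ratings can be combined into a single ranking with a weighted sum. The sketch below is one simple way to do it, under stated assumptions: the criteria, the weights, and the ratings are all placeholders to be replaced with your team's own assessments.

```python
# Assumed priorities; the weights should sum to 1.
weights = {"latency": 0.3, "privacy": 0.4, "accuracy": 0.3}

# Placeholder 1-5 ratings; replace with your team's assessments.
ratings = {
    "distilbert-base-uncased":   {"latency": 5, "privacy": 5, "accuracy": 2},
    "tiiuae/falcon-7b-instruct": {"latency": 2, "privacy": 5, "accuracy": 4},
}

def weighted_score(model_ratings, weights):
    """Weighted sum of the per-criterion ratings."""
    return sum(weights[k] * model_ratings[k] for k in weights)

ranking = sorted(ratings, key=lambda m: weighted_score(ratings[m], weights), reverse=True)
for model in ranking:
    print(f"{model}: {weighted_score(ratings[model], weights):.2f}")
# distilbert-base-uncased: 4.10
# tiiuae/falcon-7b-instruct: 3.80
```

Showing stakeholders the weights alongside the scores makes the trade-off explicit: shifting weight from latency to accuracy would reverse this ranking.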

## 🔄 Case Study · Downtime Incident Assistant
**Scenario:** create a helper that categorizes incidents and suggests first actions.

### Step 1 · Zero-shot classification
Use a general-purpose pretrained model (here the NLI model `facebook/bart-large-mnli`) to label ticket categories before training bespoke models.

In [None]:
from transformers import pipeline

incident = 'Vision system flagged misaligned solder joints on PCB lot 2025-A34 during night shift.'
labels = ['safety', 'quality', 'maintenance', 'supply-chain']
classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
result = classifier(incident, candidate_labels=labels, multi_label=False)
result

### Step 2 · Suggest First Actions
Now, let's use a text generation model to suggest a course of action.

In [None]:
# Note: Running this cell requires significant GPU memory (~15 GB in half precision) and may be slow.
# It is provided as a demonstration of how to use a large instruction-tuned model.
try:
    assistant = pipeline(
        'text-generation',
        model='tiiuae/falcon-7b-instruct',
        torch_dtype='auto',  # use the dtype recorded in the checkpoint config rather than defaulting to fp32
        trust_remote_code=True,
        device_map="auto",
    )
    # Build the prompt without the stray indentation a triple-quoted block would add.
    prompt = (
        f"Given the incident report: '{incident}'\n"
        "What is the recommended first action for a technician? Be concise."
    )
    # return_full_text=False prints only the completion, not the echoed prompt.
    generation = assistant(prompt, max_new_tokens=50, do_sample=True, temperature=0.7, return_full_text=False)
    print(generation[0]['generated_text'])
except Exception as e:
    print(f"Could not run the text-generation pipeline, likely due to resource constraints. Error: {e}")
    print("Skipping this step. This is expected on most consumer hardware.")

### Step 3 · Evaluate Response
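How can we validate the LLM's suggestion? One option is extractive question answering against a known-good Standard Operating Procedure (SOP): the QA model returns the span of the SOP that answers the question, and that span can be compared with the LLM's suggestion.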

In [None]:
sop_context = """
Standard Operating Procedure for Quality Alerts (QA-SOP-004):
1. Upon receiving a quality alert, the first action is to quarantine the affected batch to prevent further use.
2. Notify the shift supervisor and the Quality Assurance department immediately.
3. Document the incident in the Quality Management System (QMS) with all relevant details.
4. An assigned engineer will then conduct a root cause analysis.
"""

qa_pipeline = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')
question = "What is the first action for a quality alert?"
answer = qa_pipeline(question=question, context=sop_context)
answer
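Once the SOP answer is available, the comparison with the LLM's suggestion can be automated in a crude way by measuring word overlap. The sketch below is an assumption-laden baseline, not a robust metric: the stopword list is tiny, and the `reference`, `good`, and `bad` strings are hand-written stand-ins for model outputs.

```python
import re

STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "for"}

def content_words(text):
    """Lowercased word set with a small, illustrative stopword list removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def coverage(suggestion, reference):
    """Fraction of the reference's content words that appear in the suggestion."""
    ref = content_words(reference)
    return len(ref & content_words(suggestion)) / len(ref) if ref else 0.0

# Both strings are hand-written stand-ins for model outputs.
reference = "quarantine the affected batch"
good = "Quarantine the affected batch and notify the shift supervisor."
bad = "Recalibrate the vision system camera."

print(coverage(good, reference), coverage(bad, reference))
# 1.0 0.0
```

Word overlap misses paraphrases ("isolate the lot" would score 0), so a low score is a prompt for human review rather than proof that the suggestion is wrong.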

## 🛡️ Safety, Governance, and Responsible AI
- **Guardrails:** Implement input/output filters to prevent harmful or off-topic responses.
- **Hallucination Mitigation:** Use RAG (covered later in this Week 5-6 module) to ground responses in factual documents.
- **Bias Audits:** Test for biases related to shifts, roles, or demographics.
- **Human-in-the-Loop:** For high-stakes decisions (e.g., machine shutdown), require human approval.
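The guardrail and human-in-the-loop bullets can be combined into a small routing function. The sketch below is a minimal illustration under stated assumptions: the `HIGH_RISK_TERMS` list, the 0.8 threshold, and the route labels are hypothetical and would need review by your safety team.

```python
HIGH_RISK_TERMS = ("shutdown", "lock-out", "tag-out", "bypass", "override")  # illustrative list

def route(recommendation, confidence, threshold=0.8):
    """Return 'auto' only for confident, low-risk text; otherwise require a human."""
    text = recommendation.lower()
    if any(term in text for term in HIGH_RISK_TERMS):
        return "needs_human_approval"
    if confidence < threshold:
        return "needs_human_review"
    return "auto"

print(route("Log the defect in the QMS.", 0.93))                  # auto
print(route("Bypass the interlock and restart Press-42.", 0.97))  # needs_human_approval
print(route("Replace the seal kit.", 0.55))                       # needs_human_review
```

The risk check runs before the confidence check on purpose: a high-risk action needs sign-off no matter how confident the model is.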

---

## 📚 Further Reading
- "Attention Is All You Need" (Vaswani et al., 2017)
- "Building Safe LLM Systems" (Anthropic, 2024)
- Llama 3, Mixtral, and Phi-3 Technical Reports