## 0) Setup & Configuration
We will use `transformers` for modern NLP, `nltk` for educational comparisons, and `scikit-learn` for classic ML.

In [None]:
# !pip install nltk transformers scikit-learn torch --quiet
import os
import json
import re
import random
from pathlib import Path
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from collections import defaultdict

# NLP libraries
import nltk
from nltk.tokenize import word_tokenize
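# Newer NLTK releases load 'punkt_tab' instead of 'punkt'; fetch whichever is missing.
for _resource in ("punkt", "punkt_tab"):
    try:
        nltk.data.find(f"tokenizers/{_resource}")
    except LookupError:
        nltk.download(_resource, quiet=True)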

from transformers import (
    pipeline,
    AutoTokenizer,
    AutoModelForTokenClassification,
)

# Classic ML
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# ----- Notebook configuration toggles -----
METHOD_INTENT = "zero_shot"     # options: "zero_shot" | "logreg"
METHOD_POLICY = "bandit"        # options: "rules" | "bandit" | "llm_planner"

# Paths for simulated logging (learning from data)
LOG_DIR = Path("logs")
LOG_DIR.mkdir(exist_ok=True)
LOG_FILE = LOG_DIR / "conversations.jsonl"

print("Setup complete. Configuration:")
print(f"Intent Method: {METHOD_INTENT}")
print(f"Policy Method: {METHOD_POLICY}")

## 1) Tokenization — What & Why
**What:** Split text into tokens (words/subwords).
**Why:** Models operate on tokens; modern transformers use subword tokenization for robust handling of rare words.

*   **NLTK** is fine for learning (word-level tokens).
*   **Transformer models** each ship with their own tokenizer (WordPiece, BPE, or SentencePiece). **Always use the model’s own tokenizer when preparing input for that model**, because its vocabulary and token IDs are what the model was trained on.
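
To make the subword idea concrete, here is a minimal sketch of greedy longest-match splitting in the style of WordPiece. The vocabulary `toy_vocab` and the function `wordpiece_tokenize` are invented for illustration; a real model's vocabulary has tens of thousands of entries, and its tokenizer also handles casing and punctuation.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a '##' prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary (an assumption for this sketch only).
toy_vocab = {"over", "##heat", "##ing", "laptop", "up", "##date"}

for w in ["overheating", "update", "fan"]:
    print(w, "->", wordpiece_tokenize(w, toy_vocab))
```

A rare word like "overheating" is still representable because it decomposes into known pieces, whereas a purely word-level vocabulary would have to treat it as unknown.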

In [None]:
# Sample query
query = "My laptop is overheating after the latest update."

# NLTK word tokenization (educational)
tokens_nltk = word_tokenize(query)
print("NLTK tokens:", tokens_nltk)

# Model tokenizer (example: BERT base uncased)
# Note: This downloads the tokenizer vocabulary
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens_model = tok.tokenize(query)
print("Model tokens:", tokens_model)

## 2) Named Entity Recognition (NER) — What & Why
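**What:** Extract named entities such as organizations, people, locations, and product names.
**Why:** Helps tailor solutions and route KB lookups (e.g., recognizing “Microsoft” as an organization). Note that `dslim/bert-base-NER`, used below, tags only persons, organizations, locations, and miscellaneous names; common nouns such as “laptop” or “update” and dates such as “1975” are not returned, so the sample query may yield an empty list.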
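
As a sketch of how entity output could feed routing, the helper below groups entities by type and drops low-confidence ones. The name `group_entities`, the 0.8 cutoff, and the sample list are assumptions for illustration; the sample is hand-written in the shape the pipeline returns with `aggregation_strategy="simple"`, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.8) -> Dict[str, List[str]]:
    """Group entity words by entity type, keeping only confident predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written sample (illustrative only, not produced by a model).
sample_entities = [
    {"entity_group": "ORG", "score": 0.99, "word": "Microsoft", "start": 3, "end": 12},
    {"entity_group": "MISC", "score": 0.55, "word": "Windows", "start": 20, "end": 27},
]

print(group_entities(sample_entities))
```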

In [None]:
# Using a pre-trained NER pipeline
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
entities = ner(query)
print("NER:", entities)

## 3) POS Tagging — What & Why
**What:** Assign parts of speech (noun, verb, adjective).
**Why:** Useful for shallow parsing, understanding structure (e.g., the subject “laptop”, the predicate “is overheating”).
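
The subject/predicate example can be sketched with a very shallow rule over tagged tokens. The function `subject_and_predicate` and the hand-written tags (in Universal POS style) are assumptions for illustration; a real tagger's labels may differ, and this rule only covers simple declarative sentences.

```python
from typing import List, Optional, Tuple


def subject_and_predicate(tagged: List[Tuple[str, str]]) -> Tuple[Optional[str], List[str]]:
    """Return the last noun before the first verb group, plus that verb group."""
    subject: Optional[str] = None
    predicate: List[str] = []
    for word, tag in tagged:
        if tag in {"AUX", "VERB"}:
            predicate.append(word)
        elif predicate:
            break  # the verb group has ended
        elif tag in {"NOUN", "PROPN"}:
            subject = word  # remember the most recent noun before the verbs
    return subject, predicate


# Hand-written tags for part of the sample query (illustrative only).
tagged_query = [
    ("My", "PRON"), ("laptop", "NOUN"), ("is", "AUX"),
    ("overheating", "VERB"), ("after", "ADP"), ("the", "DET"),
    ("latest", "ADJ"), ("update", "NOUN"),
]

print(subject_and_predicate(tagged_query))
```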

In [None]:
# Hugging Face POS model (token classification)
pos_model_name = "vblagoje/bert-english-uncased-finetuned-pos"
pos_tokenizer = AutoTokenizer.from_pretrained(pos_model_name)
pos_model = AutoModelForTokenClassification.from_pretrained(pos_model_name)

pos_pipeline = pipeline("token-classification", model=pos_model, tokenizer=pos_tokenizer, aggregation_strategy="simple")
pos_tags = pos_pipeline(query)
print("POS tags:", pos_tags)

## 4) Sentiment Analysis — What & Why
**What:** Classify the tone (positive/negative).
**Why:** Lets the assistant respond with empathetic language and escalate to a human sooner when the user is frustrated.
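
One way sentiment could drive tone and escalation is sketched below. The function names, the 0.9 score cutoff, and the turn limits are assumptions chosen for illustration; the input dict follows the `{"label", "score"}` shape returned by the sentiment pipeline.

```python
def opening_line(sentiment: dict) -> str:
    """Pick a greeting that matches the user's tone."""
    if sentiment["label"] == "NEGATIVE":
        return "Sorry about the trouble. Let's get this fixed."
    return "Happy to help."


def should_escalate(sentiment: dict, turn: int,
                    neg_threshold: float = 0.9, max_turns: int = 4) -> bool:
    """Escalate early for a clearly frustrated user, or once the turn budget is spent."""
    strongly_negative = (
        sentiment["label"] == "NEGATIVE" and sentiment["score"] >= neg_threshold
    )
    return (strongly_negative and turn >= 2) or turn > max_turns


example = {"label": "NEGATIVE", "score": 0.97}  # hand-written, not model output
print(opening_line(example), "| escalate on turn 2:", should_escalate(example, turn=2))
```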

In [None]:
sentiment = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
sent = sentiment(query)[0]
print("Sentiment:", sent)

## 5) Intent Detection — Two Options

### A) Zero-shot Intent (Transformers)
**Why:** No labeled data needed; works well to start.
**How:** Compare user text against candidate intents and pick the highest score.
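
The "pick the highest score" step can be made slightly more cautious by also requiring a margin over the runner-up. The sketch below assumes a result dict with `labels` and `scores` sorted from best to worst, which is the shape the zero-shot pipeline returns; `pick_intent`, `min_margin`, and the sample values are hypothetical.

```python
from typing import Optional


def pick_intent(result: dict, min_score: float = 0.5, min_margin: float = 0.1) -> Optional[str]:
    """Return the top label only if it is confident and clearly ahead of the runner-up."""
    labels, scores = result["labels"], result["scores"]
    if not labels:
        return None
    top = scores[0]
    runner_up = scores[1] if len(scores) > 1 else 0.0
    if top >= min_score and (top - runner_up) >= min_margin:
        return labels[0]
    return None  # ambiguous: better to ask a clarifying question


# Hand-written results (illustrative only, not model output).
clear = {"labels": ["overheating", "update_issue"], "scores": [0.81, 0.12]}
ambiguous = {"labels": ["overheating", "update_issue"], "scores": [0.52, 0.46]}

print(pick_intent(clear), pick_intent(ambiguous))
```

Returning `None` for an ambiguous result hands control to whatever clarification step the policy provides, instead of committing to a coin-flip intent.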

### B) Classic ML (TF‑IDF + Logistic Regression)
**Why:** Teaches supervised learning foundations; requires a small set of labeled examples.
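
To show what TF-IDF weighting does before a classifier ever sees the text, here is a pure-Python sketch on three made-up sentences. It uses the smoothed IDF form `ln((1 + n) / (1 + df)) + 1`, which to my understanding matches scikit-learn's default, and it skips the L2 normalization that `TfidfVectorizer` applies afterwards; treat the numbers as illustrative.

```python
import math
from collections import Counter
from typing import Dict, List

# Made-up mini corpus (an assumption for this sketch).
docs = [
    "my laptop is overheating",
    "the system is very slow",
    "battery is draining quickly",
]
tokenized = [d.lower().split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))


def tfidf(doc_tokens: List[str]) -> Dict[str, float]:
    """Term frequency times smoothed inverse document frequency (no normalization)."""
    tf = Counter(doc_tokens)
    return {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}


weights = tfidf(tokenized[0])
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:12s} {w:.3f}")
```

The word "is" appears in every document and gets the minimum weight, while "overheating" appears once and is weighted higher; that contrast is what lets a linear classifier separate intents.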

In [None]:
# --- Option A: Zero-shot ---
candidate_intents = ["overheating", "slow_performance", "battery_issue", "network_issue", "update_issue"]
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

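z = zero_shot(query, candidate_intents, multi_label=False)
# Labels come back sorted by score; accept the top one only if it clears 0.5.
intent_label = z["labels"][0] if z["scores"][0] >= 0.5 else None
print("Zero-shot intent:", intent_label, "score:", round(z["scores"][0], 3))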

# --- Option B: Classic ML (TF-IDF + LogReg) ---
# Synthetic labeled training data (for demo)
train_texts = [
    "My laptop is overheating after an update",
    "The system is very slow today",
    "Battery drains quickly while idle",
    "Wi-Fi disconnects frequently",
    "Update caused errors and rollback failed",
]
train_labels = ["overheating", "slow_performance", "battery_issue", "network_issue", "update_issue"]

vectorizer = TfidfVectorizer(ngram_range=(1,2), min_df=1)
X = vectorizer.fit_transform(train_texts)
clf = LogisticRegression(max_iter=500)
clf.fit(X, train_labels)

Xq = vectorizer.transform([query])
intent_ml = clf.predict(Xq)[0]
intent_proba = clf.predict_proba(Xq).max()

print("LogReg intent:", intent_ml, "confidence:", round(intent_proba, 3))

# --- Selection ---
if METHOD_INTENT == "zero_shot":
    intent = intent_label
elif METHOD_INTENT == "logreg":
    intent = intent_ml
else:
    intent = None

print(f"Selected intent ({METHOD_INTENT}):", intent)

## 6) Slot Filling (Context Capture)
Define minimal slots required per intent. These drive the decision policy.

In [None]:
REQUIRED_SLOTS = {
    "overheating": ["is_after_update", "fan_noise", "when_started", "ambient_temp"],
    "slow_performance": ["cpu_usage", "recent_changes", "when_started"],
    "battery_issue": ["when_started", "charge_cycles", "power_settings"],
    "network_issue": ["is_wifi", "other_devices_ok", "when_started"],
    "update_issue": ["which_update", "rollback_possible", "error_codes"],
}

SLOT_QUESTIONS = {
    "is_after_update": "Did this start after a recent OS or driver update? (yes/no)",
    "fan_noise": "Is the fan noticeably louder than usual? (yes/no)",
    "when_started": "When did the issue start? (e.g., yesterday, after an update)",
    "ambient_temp": "Is the room temperature unusually hot? (yes/no)",
    "cpu_usage": "Have you noticed high CPU usage in Task Manager? (yes/no)",
    "recent_changes": "Any recent software installs or configuration changes? (list or 'no')",
    "charge_cycles": "Approximately how many battery charge cycles?",
    "power_settings": "Are you using performance or power-saver plan?",
    "is_wifi": "Is the issue on Wi‑Fi, Ethernet, or both?",
    "other_devices_ok": "Do other devices on the same network work fine? (yes/no)",
    "which_update": "Which update was installed? (version/build/date if known)",
    "rollback_possible": "Can you try rolling back the update? (yes/no)",
    "error_codes": "Any error codes or Event Viewer logs you can share?",
}

# Simple heuristic pre-fill using NER and regex
slots = {}
if intent == "overheating" and re.search(r"\bupdate\b", query, flags=re.I):
    slots["is_after_update"] = "yes"

# Note: dslim/bert-base-NER tags only PER/ORG/LOC/MISC, so this DATE branch
# fires only if you swap in an NER model that emits DATE entities.
date_words = [e["word"] for e in entities if e.get("entity_group") == "DATE"]
if date_words:
    slots["when_started"] = date_words[0]

print("Initial slots:", slots)

## 7) Decision-Making Policy — Three Options

### A) Rules (Baseline)
Simple if-then logic to ask for missing slots in order.

### B) Bandit (Thompson Sampling)
**Why:** Learn which follow-up question works best over time (no manual ordering).
**Reward:** Fast resolution, fewer turns, positive feedback.
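
A small simulation illustrates why Thompson Sampling converges on the more useful question. The two "true" success rates are invented for the demo (in practice they are unknown), and a success stands in for a reward above some threshold.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical probabilities that asking each question leads to a good outcome.
true_rates = {"fan_noise": 0.8, "ambient_temp": 0.2}

# Beta(1, 1) prior per arm: a counts successes, b counts failures.
params = {arm: {"a": 1, "b": 1} for arm in true_rates}
pulls = {arm: 0 for arm in true_rates}

for _ in range(500):
    # Sample a plausible success rate for each arm from its posterior...
    samples = {arm: rng.betavariate(p["a"], p["b"]) for arm, p in params.items()}
    # ...and play the arm whose sample is highest.
    arm = max(samples, key=samples.get)
    success = rng.random() < true_rates[arm]
    params[arm]["a" if success else "b"] += 1
    pulls[arm] += 1

print("pulls:", pulls)
print("posterior means:",
      {arm: round(p["a"] / (p["a"] + p["b"]), 2) for arm, p in params.items()})
```

Early on both arms get tried because their posteriors are wide; as evidence accumulates, the weaker arm's samples rarely win, so exploration tapers off without a hand-tuned schedule.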

### C) LLM Planner (Guardrailed)
**Why:** Dynamic planning without hard-coded flow; great for exploration.
**How:** Provide the dialogue state to an LLM; it returns a structured action (e.g., `ASK:fan_noise`, `SOLVE`, or `ESCALATE`) that the code validates before acting on it.
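
The guardrail side of an LLM planner can be sketched as a strict parser that never trusts the model's reply. This assumes the planner answers with a plain string such as `ASK:fan_noise`, `SOLVE`, or `ESCALATE`; the function `parse_action` and the fall-back-to-escalate rule are illustrative choices, not a fixed design.

```python
from typing import List, Optional, Tuple


def parse_action(raw: str, missing_slots: List[str]) -> Tuple[str, Optional[str]]:
    """Validate a planner reply; anything unexpected becomes ESCALATE."""
    verb, _, arg = raw.strip().partition(":")
    verb = verb.strip().upper()
    arg = arg.strip()
    if verb == "ASK" and arg in missing_slots:
        return "ASK", arg          # only slots that are still missing may be asked
    if verb == "SOLVE" and not arg:
        return "SOLVE", None
    return "ESCALATE", None        # unknown verb, unknown slot, or malformed reply


missing = ["fan_noise", "ambient_temp"]
for reply in ["ASK:fan_noise", " solve ", "ASK:credit_card", "ignore previous rules"]:
    print(repr(reply), "->", parse_action(reply, missing))
```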

In [None]:
# --- A) Rules ---
def next_question_rules(intent: Optional[str], slots: Dict[str, str]) -> Optional[str]:
    if not intent:
        return "Is it overheating, slow performance, battery, network, or an update problem?"
    required = REQUIRED_SLOTS.get(intent, [])
    missing = [s for s in required if s not in slots or not slots[s]]
    return SLOT_QUESTIONS[missing[0]] if missing else None

# --- B) Bandit (Thompson Sampling) ---
bandit_params = defaultdict(lambda: {"a": 1, "b": 1})  # Beta prior per intent:slot

def thompson_select(intent: str, slots: Dict[str, str]) -> Optional[str]:
    candidates = REQUIRED_SLOTS.get(intent, [])
    remaining = [s for s in candidates if s not in slots or not slots[s]]
    if not remaining:
        return None
    
    sampled = []
    for s in remaining:
        key = f"{intent}:{s}"
        a, b = bandit_params[key]["a"], bandit_params[key]["b"]
        sampled.append((random.betavariate(a, b), s))
    
    sampled.sort(reverse=True)
    chosen_slot = sampled[0][1]
    return SLOT_QUESTIONS[chosen_slot]

def update_bandit(intent: str, slot_name: str, reward: float):
    key = f"{intent}:{slot_name}"
    if reward >= 0.5:
        bandit_params[key]["a"] += 1
    else:
        bandit_params[key]["b"] += 1

# --- C) LLM Planner (Stub) ---
def llm_plan_next(intent: Optional[str], slots: Dict[str, str], sentiment_label: str, turn: int) -> str:
    """
    Pseudocode stub. In a real system, call your LLM with a system prompt:
    - Tools: ask(slot), solve(), escalate()
    - Guardrails: ask at most one follow-up; escalate if sentiment very negative and confidence low.
    - State: intent, missing slots, sentiment, turn count.
    Return a string action like: 'ASK:fan_noise' or 'SOLVE' or 'ESCALATE'.
    """
    if not intent:
        return "ASK:intent_clarification"
    
    required = REQUIRED_SLOTS.get(intent, [])
    missing = [s for s in required if s not in slots or not slots[s]]
    
    if missing:
        # Simple heuristic for the stub: just ask the first missing one
        return f"ASK:{missing[0]}"
    else:
        return "SOLVE"

# --- Selector ---
def select_next_step(intent: Optional[str], slots: Dict[str, str], method_policy: str, turn: int, sentiment_label: str) -> Optional[str]:
    if method_policy == "rules":
        return next_question_rules(intent, slots)
    elif method_policy == "bandit":
        return thompson_select(intent, slots)
    elif method_policy == "llm_planner":
        action = llm_plan_next(intent, slots, sentiment_label, turn)
        if action.startswith("ASK:"):
            slot = action.split(":", 1)[1]
            return SLOT_QUESTIONS.get(slot, "Could you share more details?")
        elif action == "SOLVE":
            return None
        else:
            return "Escalating to human support."
    else:
        return "Method policy not recognized."

## 8) Solution Templates (Tailored Steps)
Once enough slots are filled (or the planner decides), we provide a solution.

In [None]:
def solve(intent: Optional[str], slots: Dict[str, str]) -> str:
    if intent == "overheating":
        steps = [
            "Ensure vents are unobstructed; use the laptop on a hard surface.",
            "Clean dust from fans/vents; consider compressed air.",
            "Open Task Manager → sort by CPU/GPU; close heavy background apps.",
            "Update OEM BIOS/firmware and thermal management drivers.",
        ]
        if slots.get("is_after_update") == "yes":
            steps.insert(2, "If started after an update: roll back that update or reinstall thermal drivers.")
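        # Number the steps so the output matches the other templates.
        numbered = [f"{i}) {step}" for i, step in enumerate(steps, 1)]
        return "Troubleshooting steps for overheating:\n" + "\n".join(numbered)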
    
    elif intent == "slow_performance":
        return (
            "Troubleshooting steps for slow performance:\n"
            "1) Check CPU/RAM/Disk in Task Manager; identify top processes.\n"
            "2) Scan for malware; ensure Defender signatures are current.\n"
            "3) Disable startup apps; check indexing and background updates.\n"
            "4) Verify drivers and updates; roll back recent changes if needed.\n"
            "5) Check thermal throttling; clean fans and ensure good ventilation."
        )
    
    elif intent == "update_issue":
        return (
            "Troubleshooting steps for update issues:\n"
            "1) Identify update build/version; check known issues.\n"
            "2) Roll back or uninstall problematic update.\n"
            "3) Reinstall OEM drivers (chipset/thermal/graphics).\n"
            "4) Check Event Viewer for error codes; share any logs.\n"
            "5) Pause updates temporarily; retry after cleanup."
        )
    
    else:
        # Reached for intents without a template (e.g., battery or network) or when intent is None.
        return "No solution template is available for this issue yet. Please contact support."

## 9) Telemetry (Learning from Data) — Where & How
We simulate logging conversation turns and outcomes into JSONL.

**Privacy note:** Log intent/slots/outcome, not raw user text, to minimize PII.
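
One way to enforce the privacy note is an explicit allowlist of fields, so raw text cannot reach the log by accident. The field names mirror the event logged later in this notebook; `to_log_record`, `LOGGED_FIELDS`, and the sample state are assumptions for illustration.

```python
import json
from typing import Dict

# Only these keys are ever written; everything else is dropped.
LOGGED_FIELDS = ("intent", "slots_filled", "policy", "turns", "resolved", "feedback_sentiment")


def to_log_record(state: Dict) -> str:
    """Serialize an allowlisted subset of the dialogue state as one JSONL line."""
    record = {k: state[k] for k in LOGGED_FIELDS if k in state}
    return json.dumps(record)


# Hand-written dialogue state, including fields that must not be logged.
state = {
    "user_text": "My laptop is overheating, call me at 555-0100",
    "intent": "overheating",
    "slots_filled": ["is_after_update", "fan_noise"],
    "policy": "bandit",
    "turns": 3,
    "resolved": True,
    "feedback_sentiment": "POSITIVE",
}

line = to_log_record(state)
print(line)
```

Logging slot names but not slot values is a deliberate choice here, since free-text answers can carry personal details too.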

In [None]:
def log_event(event: Dict):
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def reward_from_outcome(resolved: bool, turns: int, feedback_sentiment: Optional[str]) -> float:
    """
    Simple reward shaping:
    +1.0 if resolved (0.0 otherwise)
    -0.05 for each turn after the first
    +0.2 if feedback is positive
    """
    r = 1.0 if resolved else 0.0
    r -= 0.05 * max(turns - 1, 0)
    if feedback_sentiment == "POSITIVE":
        r += 0.2
    return r

## 10) Putting It All Together (Single-turn Demo + Follow-ups)

In [None]:
# Dialogue state
turn = 0
sent_label = sent["label"]  # initial sentiment on query

print(f"User Query: {query}")
print(f"Detected Intent: {intent}")

# Ask follow-ups until slots are complete (or planner decides to solve)
while True:
    turn += 1
    question = select_next_step(intent, slots, METHOD_POLICY, turn, sent_label)
    
    if question is None:
        # Solve
        solution = solve(intent, slots)
        print("\n=== Solution ===\n", solution)
        break
    else:
        print(f"\nQ{turn}: {question}")
        # Simulate the user's answer (in a real app: capture input).
        # Map the question back to its slot first; keyword checks such as
        # `"update" in question` match several questions and fill the wrong slot.
        slot_key = next((k for k, v in SLOT_QUESTIONS.items() if v == question), None)
        if slot_key is None:
            # Clarification or escalation prompt: no slot to fill, so stop instead of looping forever.
            print("(No slot matches this prompt; ending the demo loop.)")
            break
        demo_answers = {"when_started": "yesterday", "ambient_temp": "no", "error_codes": "none"}
        slots[slot_key] = demo_answers.get(slot_key, "yes")
        print(f"(User answers... filling slot '{slot_key}')")

# ----- Collect feedback -----
user_feedback = "Thanks, this resolved my issue quickly."
feedback = sentiment(user_feedback)[0]
print("\nFeedback sentiment:", feedback)

# ----- Log event & update bandit -----
resolved = True
event = {
    "intent": intent,
    "slots_filled": [k for k in slots.keys()],
    "policy": METHOD_POLICY,
    "turns": turn,
    "resolved": resolved,
    "feedback_sentiment": feedback["label"]
}
log_event(event)

# Reward learning (only meaningful for bandits)
if METHOD_POLICY == "bandit":
    # Reward the *last* asked slot as a toy proxy
    last_slot = event["slots_filled"][-1] if event["slots_filled"] else None
    if last_slot:
        r = reward_from_outcome(resolved, turn, feedback["label"])
        update_bandit(intent, last_slot, r)
        print(f"Bandit updated for {intent}:{last_slot} with reward {round(r,2)}")

## 11) Theory Recap (Cheat Sheet)

*   **Tokenization:**
    *   NLTK: word-level (for learning).
    *   Transformers: subword; always use the model’s tokenizer before feeding models.
*   **NER & POS:** Context-aware via transformers; generally more robust than rule-based taggers on real-world text, within the label set the model was trained on.
*   **Sentiment:** Guides empathy and escalation logic.
*   **Intent:**
    *   Zero-shot (fast start, no labels).
    *   LogReg (learn from labeled examples; teaches ML fundamentals).
*   **Policy:**
    *   Rules (baseline, predictable).
    *   Bandits (Thompson Sampling): Learn best next question per context; update with rewards from telemetry.
    *   LLM Planner: Delegate decision-making to an LLM with guardrails; great for iterative exploration.
*   **Learning from data:**
    *   Store compact conversation outcomes (intent, slots, turns, success, feedback sentiment).
    *   Update bandit/RL policies from historical logs; fine-tune models later if needed.
    *   Prioritize privacy & security by minimizing raw text in logs.
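
Updating a bandit from historical logs can be sketched as a replay over JSONL lines. The reward shaping and the "credit the last filled slot" rule mirror the toy choices used earlier in this notebook; `replay_logs` and the two sample events are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def replay_logs(lines: Iterable[str], threshold: float = 0.5) -> Dict[str, Dict[str, int]]:
    """Rebuild Beta(a, b) counts per 'intent:slot' from logged conversation outcomes."""
    params = defaultdict(lambda: {"a": 1, "b": 1})
    for line in lines:
        event = json.loads(line)
        if not event.get("slots_filled"):
            continue  # nothing was asked, so there is nothing to credit
        reward = 1.0 if event["resolved"] else 0.0
        reward -= 0.05 * max(event["turns"] - 1, 0)
        if event.get("feedback_sentiment") == "POSITIVE":
            reward += 0.2
        key = f'{event["intent"]}:{event["slots_filled"][-1]}'
        params[key]["a" if reward >= threshold else "b"] += 1
    return dict(params)


# Hand-written log lines (illustrative only).
sample_lines = [
    json.dumps({"intent": "overheating", "slots_filled": ["is_after_update", "fan_noise"],
                "turns": 3, "resolved": True, "feedback_sentiment": "POSITIVE"}),
    json.dumps({"intent": "overheating", "slots_filled": ["fan_noise"],
                "turns": 6, "resolved": False, "feedback_sentiment": "NEGATIVE"}),
]

print(replay_logs(sample_lines))
```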

## 12) Extensions (Next Steps)

*   **RAG (Retrieval-Augmented Generation):** Use intent + slots to query a KB and summarize with citations.
*   **Offline RL:** Train a multi-turn policy (states, actions, rewards) from logs.
*   **Evaluation:** Track resolution rate, average turns, CSAT proxy (feedback sentiment).
*   **Guardrails:** Limit follow-ups; escalate under low confidence + negative sentiment.
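
The evaluation metrics listed above can be computed directly from logged events. This is a minimal sketch; `summarize` and the sample events are hypothetical, and the CSAT proxy is simply the share of conversations with positive feedback sentiment.

```python
from typing import Dict, List


def summarize(events: List[dict]) -> Dict[str, float]:
    """Compute resolution rate, average turns, and a CSAT proxy from logged events."""
    n = len(events)
    if n == 0:
        return {"resolution_rate": 0.0, "avg_turns": 0.0, "csat_proxy": 0.0}
    return {
        "resolution_rate": sum(1 for e in events if e["resolved"]) / n,
        "avg_turns": sum(e["turns"] for e in events) / n,
        "csat_proxy": sum(1 for e in events if e.get("feedback_sentiment") == "POSITIVE") / n,
    }


# Hand-written events (illustrative only).
sample_events = [
    {"resolved": True, "turns": 2, "feedback_sentiment": "POSITIVE"},
    {"resolved": True, "turns": 4, "feedback_sentiment": "NEGATIVE"},
    {"resolved": False, "turns": 3, "feedback_sentiment": "NEGATIVE"},
    {"resolved": True, "turns": 3, "feedback_sentiment": "POSITIVE"},
]

print(summarize(sample_events))
```

Comparing these numbers per `policy` value would be one way to check whether the bandit actually beats the rules baseline.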