# RL + RAG Cyber Chatbot — Project Notebook

This notebook documents the project, explains each step, provides runnable demo cells, and includes test inputs with their expected outputs.

## 1) Setup & purpose
Purpose: a small RL environment in which an agent chooses how to respond to a user so as to raise the user's 'awareness' score. A small RAG index supplies contextual snippets that are inserted into the responses. This is a demo-level project suitable for a course assignment.
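
The interaction loop described above can be sketched in plain Python. This is a hypothetical stand-in, not the contents of `simulator.py`: the class name `ToyAwarenessEnv`, the per-action effects, the reward (gain in awareness), and the step limit are all assumptions. Only the `reset`/`step` return shapes follow the Gymnasium convention, and the 32-element observation with awareness in slot 0 mirrors the state vector used later in this notebook.

```python
class ToyAwarenessEnv:
    """Hypothetical stand-in for an awareness environment (not simulator.py)."""

    OBS_SIZE = 32  # awareness is stored in slot 0; remaining slots stay zero

    def __init__(self, effects, start_awareness=0.2, max_steps=10):
        self.effects = dict(effects)  # assumed mapping: action id -> awareness change
        self.start_awareness = start_awareness
        self.max_steps = max_steps
        self.awareness = start_awareness
        self.steps = 0

    def _observation(self):
        obs = [0.0] * self.OBS_SIZE
        obs[0] = self.awareness
        return obs

    def reset(self):
        """Gymnasium-style reset: returns (observation, info)."""
        self.awareness = self.start_awareness
        self.steps = 0
        return self._observation(), {}

    def step(self, action):
        """Gymnasium-style step: (observation, reward, terminated, truncated, info)."""
        previous = self.awareness
        self.awareness = max(0.0, min(1.0, previous + self.effects[action]))
        self.steps += 1
        reward = self.awareness - previous  # assumed reward: gain in awareness
        terminated = self.awareness >= 1.0
        truncated = self.steps >= self.max_steps
        return self._observation(), reward, terminated, truncated, {}


# Hypothetical two-action setup: action 0 helps a little, action 1 backfires.
env = ToyAwarenessEnv(effects={0: 0.02, 1: -0.05})
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0)
```

A trained policy would replace the hard-coded `env.step(0)` call with an action chosen from `obs`.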

Files included:
- `simulator.py` : Gymnasium env simulating user awareness
- `rag_index.py` : RAG retrieval using sentence-transformers + faiss-cpu
- `backend_stub.py`: policy loading and response templates
- `train_ppo.py` : training script (uses Stable-Baselines3 + Gymnasium)

Requirements: See `requirements.txt`. Use a conda env (Python 3.9/3.10 recommended).
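
To illustrate what the retrieval step does, here is a dependency-free sketch that ranks documents by bag-of-words cosine similarity. It swaps in plain word counts for the sentence-transformers embeddings and FAISS index that `rag_index.py` is described as using, so it is not the project's implementation. The class name `ToyRetriever` is hypothetical; the `build_index` and `retrieve(query, k)` method names only mirror the calls made in this notebook.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetriever:
    """Hypothetical stand-in for SimpleRAG using word counts, not embeddings."""

    def build_index(self, docs):
        self.docs = list(docs)
        self.vectors = [Counter(tokenize(d)) for d in self.docs]

    def retrieve(self, query, k=1):
        """Return the k documents most similar to the query."""
        q = Counter(tokenize(query))
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]


toy = ToyRetriever()
toy.build_index([
    "Phishing emails often use urgency and ask to click links.",
    "Use 2FA and strong unique passwords for each account.",
])
top = toy.retrieve("strong unique passwords", k=1)
```

Word-count matching only finds literal token overlap; an embedding-based index can also match paraphrases, which is the reason to prefer it in the real module.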

In [None]:
# Setup cell: import local modules and build a small RAG corpus.
import os, numpy as np
from rag_index import SimpleRAG
from backend_stub import ACTION_TEMPLATES, load_policy, policy_infer
from simulator import CyberEnv

# If you didn't train, it's OK -- the notebook will fall back to a rule-based behavior.
model = None
if os.path.exists('ppo_cyber.zip'):
    try:
        model = load_policy('ppo_cyber')
        print('Loaded PPO policy from ppo_cyber.zip')
    except Exception as e:
        print('Could not load PPO policy:', e)
else:
    print('ppo_cyber.zip not found — the demo will use a simple fallback policy.')

docs = [
    "Phishing emails often use urgency and ask to click links.",
    "Check the sender domain closely; hovering reveals the URL.",
    "Use 2FA and strong unique passwords for each account.",
    "Never share your MFA codes or OTPs with anyone claiming to be support."
]
rag = SimpleRAG()
rag.build_index(docs)
print('Built RAG with %d docs.' % len(docs))


## 2) chat_local function

The `chat_local` function takes a user message and a state vector (`state_vec`) whose first element is `awareness`, a value in [0, 1].
It returns `(action, response, ctx, new_awareness)`, where `ctx` is the retrieved RAG snippet and `new_awareness` is the updated awareness value.
Below is the code used by the demo (already provided in `project_demo.ipynb`).
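
The two mechanical pieces of `chat_local`, the per-action awareness update and the template substitution, can be written as small pure functions. This is a sketch of an equivalent table-driven form, not the demo's code: the helper names and the example template string are hypothetical (the real `ACTION_TEMPLATES` live in `backend_stub.py`), while the deltas are the ones used in the demo's `if`/`elif` chain.

```python
# Per-action awareness deltas, taken from the demo's if/elif chain.
AWARENESS_DELTA = {0: 0.02, 1: 0.03, 2: 0.01, 3: 0.03, 4: 0.01, 5: -0.05}


def update_awareness(awareness, action):
    """Apply the delta for `action` and clamp the result to [0, 1]."""
    delta = AWARENESS_DELTA.get(int(action), 0.0)  # unknown action: no change
    return max(0.0, min(1.0, awareness + delta))


def fill_template(template, ctx):
    """Insert the retrieved snippet only when the template has a {rag} slot."""
    return template.format(rag=ctx) if "{rag}" in template else template


# Hypothetical template text, used only for illustration.
reply = fill_template("Tip: {rag}", "Never share your MFA codes.")
capped = update_awareness(0.99, 1)   # clamps at the upper bound
floored = update_awareness(0.02, 5)  # clamps at the lower bound
```

Keeping the deltas in one table makes it easy to see at a glance that only action 5 lowers awareness.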

In [None]:
def chat_local(message, state_vec):
    # Ensure string input
    if message is None:
        message = ''
    message = str(message).strip()
    # Retrieve 1 contextual snippet from RAG
    if message and hasattr(rag, 'retrieve'):
        ctx = rag.retrieve(message, k=1)[0]
    elif hasattr(rag, 'docs'):
        # empty message: fall back to the first indexed document
        ctx = rag.docs[0]
    else:
        ctx = 'No context.'
    aw = float(state_vec[0])

    # Use RL policy if present; otherwise fallback to simple rule
    if model is not None:
        # stable-baselines3 models expect 1D or batched observations; policy_infer handles reshape
        action = policy_infer(model, state_vec)
    else:
        # fallback: action 0 below 0.45 awareness, quiz (action 2) otherwise
        action = 0 if aw < 0.45 else 2

    template = ACTION_TEMPLATES.get(int(action), 'OK.')
    response = template.format(rag=ctx) if '{rag}' in template else template

    # small deterministic awareness update for display
    new_aw = aw
    if action == 0:
        new_aw = min(1.0, aw + 0.02)
    elif action == 1:
        new_aw = min(1.0, aw + 0.03)
    elif action == 2:
        # quiz is risky: assume only a small gain here (the interactive cell scores the actual answer)
        new_aw = min(1.0, aw + 0.01)
    elif action == 3:
        new_aw = min(1.0, aw + 0.03)
    elif action == 4:
        new_aw = min(1.0, aw + 0.01)
    elif action == 5:
        new_aw = max(0.0, aw - 0.05)
    return int(action), response, ctx, float(new_aw)

# Example inputs: (message, starting awareness). Results are printed for inspection, not asserted.
tests = [
    ("I got an urgent email asking to reset my password", 0.2),
    ("My coworker asked me to click this link", 0.6),
    ("How to make a strong password?", 0.8)
]

for msg,aw in tests:
    state = np.zeros(32, dtype=np.float32); state[0]=aw
    a,resp,ctx,newaw = chat_local(msg, state)
    print('INPUT:', msg)
    print('START_AW:', aw, 'ACTION:', a, 'NEW_AW:', round(newaw,2))
    print('RESPONSE:', resp)
    print('-'*60)


## 3) Interactive demo
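Run the cell and follow the prompts. Enter an initial `awareness` between 0.0 and 1.0. The bot inserts the retrieved RAG snippet into any template that contains a `{rag}` placeholder.
When the policy chooses the quiz action (action 2), the cell asks a question drawn from a two-item quiz bank and scores your answer.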
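
The quiz scoring can be isolated as a pure function, which makes it testable without `input()`. A minimal sketch: the function name `grade_quiz` is hypothetical, while the gains (+0.08 for a correct answer, +0.01 otherwise) and the cap at 1.0 are the values used in the demo cell.

```python
def grade_quiz(awareness, answer, correct_answer):
    """Return (is_correct, new_awareness) for one quiz attempt."""
    is_correct = answer.strip().upper() == correct_answer
    gain = 0.08 if is_correct else 0.01  # an incorrect answer still teaches a little
    return is_correct, min(1.0, awareness + gain)


right, aw_right = grade_quiz(0.5, " b ", "B")   # whitespace and case are normalized
wrong, aw_wrong = grade_quiz(0.5, "A", "B")
_, aw_capped = grade_quiz(0.97, "B", "B")       # capped at 1.0
```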

In [None]:
# Interactive REPL demo (text-mode)
print('Type EXIT to stop the demo.')
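try:
    aw = float(input('Initial awareness [0.0-1.0] (e.g. 0.3): ') or '0.3')
except ValueError:
    # non-numeric input: use the default
    aw = 0.3
aw = min(1.0, max(0.0, aw))  # clamp out-of-range input to [0, 1]
state = np.zeros(32, dtype=np.float32); state[0]=aw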

QUIZZES = [
    {'q':'Which is safer — clicking an emailed link or visiting your account directly?','choices':['A) Click emailed link','B) Visit account directly'],'answer':'B'},
    {'q':'If someone asks for your OTP, you should:','choices':['A) Give it','B) Refuse and report','C) Ask why'],'answer':'B'}
]

import random
while True:
    msg = input('\nYou: ').strip()
    if msg.lower() in ('exit','quit','stop'):
        break
    if not msg:
        print('Please type something or "exit" to quit.'); continue
    action, resp, ctx, new_aw = chat_local(msg, state)
    if action == 2:
        # Show a real quiz from small bank
        quiz = random.choice(QUIZZES)
        print('Bot (QUIZ):', quiz['q'])
        for c in quiz['choices']:
            print('  ', c)
        ans = input('Your answer (A/B/C): ').strip().upper()
        correct = (ans == quiz['answer'])
        if correct:
            print('Result: Correct ✅')
            state[0] = min(1.0, state[0] + 0.08)
        else:
            print('Result: Incorrect ❌ (Correct:', quiz['answer'] + ')')
            state[0] = min(1.0, state[0] + 0.01)
        print('Bot (feedback):', 'Good job!' if correct else 'Here is the right answer.')
    else:
        print('Bot:', resp)
        state[0] = new_aw
    print('(Awareness now: %.2f)' % state[0])
print('Demo ended.')

## 4) Evaluation cells and expected outputs
Below are sample test cases and the behavior expected from a working demo. The expected actions apply to the rule-based fallback policy, which is used when no trained model is available (no external model required); a trained PPO policy may choose differently.
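
For the fallback policy, the expected action and displayed awareness can be worked out by hand from the 0.45 threshold and the gains in `chat_local`. The sketch below reproduces that arithmetic; the helper names `fallback_action` and `expected_row` are hypothetical, and the numbers hold only when no PPO model is loaded.

```python
def fallback_action(awareness, threshold=0.45):
    """Rule used without a PPO model: action 0 below the threshold, quiz (2) otherwise."""
    return 0 if awareness < threshold else 2


# Display-only awareness gains that chat_local applies to these two actions.
GAIN = {0: 0.02, 2: 0.01}


def expected_row(start_awareness):
    """Return (action, new_awareness rounded to 2 places) for the fallback policy."""
    action = fallback_action(start_awareness)
    new_awareness = round(min(1.0, start_awareness + GAIN[action]), 2)
    return action, new_awareness


for aw in (0.2, 0.6, 0.8):
    print(aw, expected_row(aw))
# 0.2 (0, 0.22)
# 0.6 (2, 0.61)
# 0.8 (2, 0.81)
```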

In [None]:
# Test cases (shortened versions of the messages above) with approximate expected outputs
cases = [
    ('urgent reset password', 0.2),
    ('click link from coworker', 0.6),
    ('how to create password', 0.8)
]
print('Expected behavior (fallback policy): awareness < 0.45 -> short tip (action 0); otherwise -> quiz (action 2).')
for text,aw in cases:
    s = np.zeros(32, dtype=np.float32); s[0]=aw
    a,resp,ctx,newaw = chat_local(text,s)
    print('MSG:', text)
    print('start_aw:',aw,'action:',a,'new_aw:',round(newaw,2))
    print('response sample:', resp[:120])
    print('-'*40)
