### Environment Setup

In [1]:
import torch

if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
    print("VRAM usage:", torch.cuda.memory_allocated() / 1e9, "GB")
else:
    print("Using CPU only")


Using GPU: Tesla T4
VRAM usage: 0.0 GB


In [2]:
!pip install bitsandbytes accelerate datasets peft transformers trl \
            faiss-cpu sentence-transformers



# Preprocessing

## Clean & Chunk & Embed Blog Content

In [3]:
from sentence_transformers import SentenceTransformer
import json
import re

with open("withsandra_blogs.json", "r", encoding="utf-8") as f:
    blogs = json.load(f)

def clean_content(text):
    text = re.sub(r'^.*Sandra.*\d{4}.*\n?', '', text, flags=re.IGNORECASE)
    text = re.sub(r'(?i)P\.?S\.?:?.*', '', text)
    text = re.sub(r'^-?Sandra.*', '', text, flags=re.IGNORECASE)
    text = re.sub(r'👉.*', '', text)
    return re.sub(r'\s+', ' ', text).strip()

def remove_emojis(text):
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"
        "\U0001F300-\U0001F5FF"
        "\U0001F680-\U0001F6FF"
        "\U0001F1E0-\U0001F1FF"
        "\U00002702-\U000027B0"
        "\U000024C2-\U0001F251"
        "]+", flags=re.UNICODE
    )
    return emoji_pattern.sub(r'', text)

def clean_title(text):
    text = clean_content(text)
    return remove_emojis(text)

def chunk_text(text, chunk_size=200):
    words = text.split()
    return [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

chunks = []
metadata = []

for blog in blogs:
    raw_title = blog["title"]
    title_cleaned = clean_title(raw_title)

    raw_content = blog["content"]
    cleaned = clean_content(raw_content)
    cleaned = remove_emojis(cleaned)

    blog_chunks = chunk_text(cleaned)
    for chunk in blog_chunks:
        chunks.append(chunk)
        metadata.append({
            "title": title_cleaned,
            "url":   blog["url"]
        })


Created 126 text chunks from 59 blogs.
Example chunk with cleaned title in metadata:
Not too long ago, cybersecurity felt like a niche field—something only banks, governments, and big tech companies worried about. It was important, sure, but for most businesses? Not exactly top of mind. Then, everything changed. Tech exploded. AI, cloud computing, automation, and blockchain reshaped industries practically overnight. Cybercrime leveled up. Hackers got smarter, attacks got more devastating, and suddenly even hospitals and supply chains were targets. Security became non-negotiable. Every company—big or small—realized they needed cybersecurity professionals. ASAP. However, there’s one platform that’s well ahead of the curve to keeping up with these changes in the cyber threat landscape. Today’s newsletter is in partnership with Palo Alto Networks is reimagining Prisma Cloud, their Cloud Native Application Protection Platform, onto Cortex, their Security Operations Platform for streamlined,

## Embeddings in a Vector Database

In [4]:
from sentence_transformers import SentenceTransformer
import json
import re

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = embedder.encode(chunks, batch_size=32, show_progress_bar=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Batches:   0%|          | 0/4 [00:00<?, ?it/s]

(126, 384)

In [5]:
import faiss
import numpy as np
import json

embedding_dim = len(embeddings[0])
embedding_matrix = np.vstack(embeddings).astype('float32')

index = faiss.IndexFlatL2(embedding_dim)
index.add(embedding_matrix)

print(f"FAISS index created. Total vectors: {index.ntotal}")

faiss.write_index(index, "blog_faiss.index")

with open("blog_metadatas.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, ensure_ascii=False, indent=2)

FAISS index created. Total vectors: 126


# Inference

## Load Model

In [6]:
from huggingface_hub import login

login(token="HF_TOKEN")

In [7]:
import json
import faiss
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer

# 3) Load your FAISS index + metadata
index = faiss.read_index("blog_faiss.index")
with open("blog_metadatas.json", "r", encoding="utf-8") as f:
    metadatas = json.load(f)

# with open("chunks.json", "r", encoding="utf-8") as f:
#     chunks = json.load(f)

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "zy2582/llama3-cybersec-lora"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"  # "cuda" or "auto"
)

tokenizer.add_special_tokens({"pad_token": "<PAD>"})  # Add <PAD> token
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<|system|>", "<|user|>", "<|assistant|>"]
})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id    # Set pad token ID in config

model.eval()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128261, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((409

## RAG & Chat

In [11]:
# -----------------------------  CONFIG  ----------------------------- #

SYSTEM_PROMPT = (
"You are Sandra, the friendly cybersecurity career mentor from the WithSandra blog. Write in an upbeat, practical, punchy tone, using first-person singular. Keep the answer under 75 words. Avoid fluff, jargon, and filler. Use only information from the CONTEXT section; if missing, say so. Be factual, concise, and professional. Do not fabricate examples or elaborate beyond the provided context. Return exactly two blocks: (1) Answer – one short paragraph (≤75 words), start exactly with 'Answer:'; (2) Recommended Reads – 2–3 recommended blog posts, each as: • Title – URL, no extra commentary. This is a chat not mail.")
MAX_HISTORY_PAIRS = 3
TOP_K_DOCS       = 3
MAX_NEW_TOKENS   = 220
# -------------------------------------------------------------------- #

import re

VALID_URLS   = {m["url"] for m in metadatas}
URL_TO_TITLE = {m["url"]: m["title"] for m in metadatas}

URL_RE   = re.compile(r"https?://\S+")
PUNCT_RE = re.compile(r"[()<>\[\]]")


def _post_filter_recommendations(text: str) -> str:
    cleaned, seen = [], set()
    for line in text.splitlines():
        if not line.lstrip().startswith("•"):
            cleaned.append(line)
            continue

        url_match = URL_RE.search(line)
        if not url_match:
            continue

        url = PUNCT_RE.sub("", url_match.group()).rstrip(".,)")
        if url in VALID_URLS and url not in seen:
            seen.add(url)
            canon_title = URL_TO_TITLE[url]
            cleaned.append(f"• {canon_title} – {url}")

    return "\n".join(cleaned)


def retrieve(query: str, top_k: int = TOP_K_DOCS):
    q_emb = embedder.encode([query], convert_to_numpy=True).astype("float32")
    distances, indices = index.search(q_emb, top_k)

    res = []
    for dist, idx in zip(distances[0], indices[0]):
        meta = metadatas[idx]
        res.append({
            "title": meta["title"],
            "url":   meta["url"],
            "score": float(dist)
        })
    return res


def build_prompt(user_input: str, top_k_docs: int = TOP_K_DOCS) -> str:
    docs = retrieve(user_input, top_k=top_k_docs)
    context_lines = [f"[{d['title']}]({d['url']})" for d in docs]

    parts = []
    parts.append(f"<|system|>\n{SYSTEM_PROMPT.strip()}")
    parts.append(f"\n\nUse these blog titles as context:\n{context_lines}\n")
    parts.append(f"<|user|>\n{user_input}\n<|assistant|>\n")

    return "\n".join(parts)


def chat(user_input: str, max_new_tokens: int = MAX_NEW_TOKENS, debug: bool = False):
    prompt = build_prompt(user_input)
    if debug:
        print("=== RAW PROMPT ===")
        print(prompt)
        print("=== RAW PROMPT ENDS ===")

    tok_inp = tokenizer(prompt, return_tensors="pt",
                        padding=True, truncation=True).to(model.device)

    gen_out = model.generate(
        **tok_inp,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1
    )

    text = tokenizer.decode(gen_out[0], skip_special_tokens=True)
    if debug:
        print("=== RAW TEXT ===")
        print(text)
        print("=== RAW TEXT ENDS === \n")

    if user_input and user_input in text:
        text = text.split(user_input)[-1].strip()

    text = _post_filter_recommendations(text)

    if debug:
        print("=== USER INPUT ===")
        print(user_input)
        print("=== USER INPUT ENDS ===")
    return text


# Chatbot Interface

### Style 3

In [12]:
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import warnings
import re

warnings.filterwarnings('ignore', category=FutureWarning, module='huggingface_hub')

chat_history = []  # Format: [(user_msg1, assistant_msg1), ...]

chat_styles = """
<style>
.chatbot-main-container {
    background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
    padding: 20px;
    border-radius: 15px;
    box-shadow: 0 4px 15px rgba(0, 0, 0, 0.1);
}

.output-widget-container {
    background-color: #ffffff;
    border: 1px solid #d1d9e6;
    border-radius: 10px;
    scrollbar-width: thin;
    scrollbar-color: #adb5bd #ffffff;
}
.output-widget-container::-webkit-scrollbar { width: 8px; }
.output-widget-container::-webkit-scrollbar-track { background: #f1f1f1; border-radius: 10px; }
.output-widget-container::-webkit-scrollbar-thumb { background: #adb5bd; border-radius: 10px; }
.output-widget-container::-webkit-scrollbar-thumb:hover { background: #555; }

.message-bubble {
    padding: 10px 15px; margin-bottom: 12px; border-radius: 15px;
    max-width: 85%; overflow-wrap: break-word; line-height: 1.4;
    color: #333; box-shadow: 0 2px 5px rgba(0, 0, 0, 0.05);
}
.user-msg {
    background-color: #dcf8c6; float: right; clear: both;
    border-bottom-right-radius: 5px; margin-left: auto;
}
.sandra-msg {
    background-color: #eef5ff; float: left; clear: both;
    border: 1px solid #d1d9e6; border-bottom-left-radius: 5px;
    margin-right: auto; color: #333;
}
.initial-greeting { background-color: #e8f0fe; }
.msg-label {
    font-weight: bold; margin-bottom: 4px; display: block;
    font-size: 0.9em; color: #555;
}
.output-widget-container a {
    color: #0056b3; text-decoration: none; word-break: break-all;
}
.output-widget-container a:hover { text-decoration: underline; }
.thinking-msg {
    color: #888; font-style: italic; background-color: #f0f0f0;
    float: left; clear: both; border: 1px solid #ddd;
    border-bottom-left-radius: 5px; margin-right: auto;
}
.clearfix::after { content: ""; clear: both; display: table; }
</style>
"""
display(HTML(chat_styles))

def linkify(text):
    url_pattern = re.compile(
        r'((?:https?://|www\.)[^\s<>"\')]+(?:\([^\s<>"]*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))'
    )
    def add_scheme(match):
        url = match.group(1)
        if '@' in url and not url.startswith(('http', 'www')):
             return url
        href = url
        if url.startswith('www.'):
            href = 'http://' + url
        return f'<a href="{href}" target="_blank" rel="noopener noreferrer">{url}</a>'
    processed_text = url_pattern.sub(add_scheme, text)
    return processed_text

initial_greeting_html = f"""
<div class='message-bubble sandra-msg initial-greeting'>
    <span class='msg-label'>Sandra:</span>Hi! I'm Sandra, your cybersecurity career mentor. How can I help you today?
</div>
<div class='clearfix'></div>
"""

title = widgets.HTML("<h2 style='color: #4a5568;'><img src='https://cdn-icons-png.flaticon.com/512/16967/16967032.png' style='width:30px; vertical-align: middle; margin-right: 8px;'>Sandra - Cybersecurity Mentor</h2>")

output_area = widgets.Output(
    layout=widgets.Layout(
        height='450px',
        overflow_y='auto',
        padding='15px',
        width='100%',
    )
)
output_area.add_class("output-widget-container")

input_box = widgets.Text(value='', placeholder='Ask Sandra about cybersecurity careers...', disabled=False, layout=widgets.Layout(flex='1 1 auto', margin='0 5px 0 0'))

submit_button = widgets.Button(description='Send', button_style='info', tooltip='Send message', icon='paper-plane', layout=widgets.Layout(flex='0 0 auto'))

def render_chat(history, current_user_input=None, thinking=False, current_assistant_msg=None):
    chat_html = initial_greeting_html

    for user_msg, assistant_msg in history:
        chat_html += f"<div class='message-bubble user-msg'><span class='msg-label'>You:</span>{user_msg}</div><div class='clearfix'></div>"

        processed_msg = assistant_msg.strip()
        if processed_msg.lower().startswith("answer"):
            prefix_len = len("Answer")
            if len(processed_msg) > prefix_len and processed_msg[prefix_len] in (' ', '\n', ':'):
                processed_msg = processed_msg[prefix_len:].lstrip(' :')

        rec_reads_marker = "Recommended Reads"
        rec_reads_html = "<br><br><b>Recommended Reads</b>"
        try:
            marker_index = processed_msg.lower().index(rec_reads_marker.lower())
            before_marker = processed_msg[:marker_index]
            after_marker = processed_msg[marker_index + len(rec_reads_marker):]
            if after_marker.strip():
                processed_msg = before_marker.rstrip() + rec_reads_html + after_marker
            else:
                processed_msg = before_marker.rstrip()
        except ValueError:
            pass

        processed_msg = processed_msg.replace('\n', '<br>')
        processed_msg = linkify(processed_msg)
        chat_html += f"<div class='message-bubble sandra-msg'><span class='msg-label'>Sandra:</span>{processed_msg}</div><div class='clearfix'></div>"

    if current_user_input:
         chat_html += f"<div class='message-bubble user-msg'><span class='msg-label'>You:</span>{current_user_input}</div><div class='clearfix'></div>"

    if thinking:
        chat_html += f"<div class='message-bubble thinking-msg'><span class='msg-label'>Sandra:</span><i>Thinking...</i></div><div class='clearfix'></div>"
    elif current_assistant_msg:
        processed_msg = current_assistant_msg.strip()
        if processed_msg.lower().startswith("answer"):
             prefix_len = len("Answer")
             if len(processed_msg) > prefix_len and processed_msg[prefix_len] in (' ', '\n', ':'):
                 processed_msg = processed_msg[prefix_len:].lstrip(' :')

        rec_reads_marker = "Recommended Reads"
        rec_reads_html = "<br><br><b>Recommended Reads</b>"
        try:
            marker_index = processed_msg.lower().index(rec_reads_marker.lower())
            before_marker = processed_msg[:marker_index]
            after_marker = processed_msg[marker_index + len(rec_reads_marker):]
            if after_marker.strip():
                processed_msg = before_marker.rstrip() + rec_reads_html + after_marker
            else:
                processed_msg = before_marker.rstrip()
        except ValueError:
            pass

        processed_msg = processed_msg.replace('\n', '<br>')
        processed_msg = linkify(processed_msg)
        chat_html += f"<div class='message-bubble sandra-msg'><span class='msg-label'>Sandra:</span>{processed_msg}</div><div class='clearfix'></div>"

    return HTML(chat_html)

def on_button_clicked(b):
    user_input = input_box.value
    if not user_input.strip():
        return

    current_input = user_input
    input_box.value = ''
    submit_button.disabled = True
    submit_button.description = 'Processing...'
    submit_button.icon = 'spinner'

    with output_area:
        clear_output(wait=True)
        display(render_chat(chat_history, current_user_input=current_input, thinking=True))

    raw_response_text = ""
    try:
        raw_response_text = chat(current_input, debug=True)
    except NameError as e:
         raw_response_text = f"Error: The 'chat' function is not defined. Make sure your core logic is loaded. ({e})"
         print(raw_response_text)
    except Exception as e:
        raw_response_text = f"Sorry, an error occurred: {e}"
        print(f"Error during chat call: {e}")

    chat_history.append((current_input, raw_response_text))

    with output_area:
        clear_output(wait=True)
        display(render_chat(chat_history, thinking=False))

    submit_button.disabled = False
    submit_button.description = 'Send'
    submit_button.icon = 'paper-plane'

submit_button.on_click(on_button_clicked)

input_row = widgets.HBox([input_box, submit_button], layout=widgets.Layout(width='100%', padding='10px 0 0 0'))
ui_content = widgets.VBox([title, output_area, input_row], layout=widgets.Layout(width='100%'))
ui_container = widgets.VBox(
    [ui_content],
    layout=widgets.Layout(width='70%', margin='20px auto')
)
ui_container.add_class("chatbot-main-container")

display(ui_container)

with output_area:
    clear_output()
    display(render_chat(chat_history))


VBox(children=(VBox(children=(HTML(value="<h2 style='color: #4a5568;'><img src='https://cdn-icons-png.flaticon…

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


=== RAW PROMPT ===
<|system|>
You are Sandra, the friendly cybersecurity career mentor from the WithSandra blog. Write in an upbeat, practical, punchy tone, using first-person singular. Keep the answer under 75 words. Avoid fluff, jargon, and filler. Use only information from the CONTEXT section; if missing, say so. Be factual, concise, and professional. Do not fabricate examples or elaborate beyond the provided context. Return exactly two blocks: (1) Answer – one short paragraph (≤75 words), start exactly with 'Answer:'; (2) Recommended Reads – 2–3 recommended blog posts, each as: • Title – URL, no extra commentary. This is a chat not mail.


Use these blog titles as context:
['[Overwhelmed with Learning Cybersecurity in 2025? Follow This Simple Path.](https://www.withsandra.dev/p/overwhelmed-with-learning-cybersecurity-in-2025-follow-this-simple-path)', '[5 Ste](https://www.withsandra.dev/p/5-steps-to-pass-any-cybersecurity-certification)', '[The Rise of AI-Driven Cyberattacks: What 



=== RAW TEXT ===

You are Sandra, the friendly cybersecurity career mentor from the WithSandra blog. Write in an upbeat, practical, punchy tone, using first-person singular. Keep the answer under 75 words. Avoid fluff, jargon, and filler. Use only information from the CONTEXT section; if missing, say so. Be factual, concise, and professional. Do not fabricate examples or elaborate beyond the provided context. Return exactly two blocks: (1) Answer – one short paragraph (≤75 words), start exactly with 'Answer:'; (2) Recommended Reads – 2–3 recommended blog posts, each as: • Title – URL, no extra commentary. This is a chat not mail.


Use these blog titles as context:
['[Overwhelmed with Learning Cybersecurity in 2025? Follow This Simple Path.](https://www.withsandra.dev/p/overwhelmed-with-learning-cybersecurity-in-2025-follow-this-simple-path)', '[5 Ste](https://www.withsandra.dev/p/5-steps-to-pass-any-cybersecurity-certification)', '[The Rise of AI-Driven Cyberattacks: What You Need to 