# Plain Baseline Model

This notebook runs the **google/gemma-2-2b-it** model on its own, with no retrieval-augmented generation (RAG) and no vector database.

It serves as a control: it shows how the model behaves when asked about specific, recent information (such as the Python 3.13.1 release notes) that may be missing from its training data.

In [1]:

%pip -q install -U transformers accelerate sentencepiece bitsandbytes


In [2]:

from huggingface_hub import login

# Placeholder: replace with your own Hugging Face access token (never commit a real token).
login(token="<access token>")

In [3]:
import torch
from typing import List, Dict, Optional
from transformers import AutoTokenizer, AutoModelForCausalLM

try:
    from transformers import BitsAndBytesConfig
    _bnb_available = True
except Exception:
    _bnb_available = False

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_ID = "google/gemma-2-2b-it"


GEN_CFG = {
    "max_new_tokens": 800,
    "temperature": 0.3,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}

print("Device:", DEVICE)
print("Model:", MODEL_ID)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)

if DEVICE == "cuda" and _bnb_available:
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        # This branch only runs on CUDA, so check bf16 support rather than CUDA availability.
        bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
    )
else:
    dtype = torch.float32 if DEVICE == "cpu" else torch.float16
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model.to(DEVICE)

model.eval()
print("Model loaded.")

Device: cuda
Model: google/gemma-2-2b-it


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!

Model loaded.
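
As a rough sanity check on why the 4-bit path is used on GPU, the weight footprint can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. The parameter count is an assumption inferred from the two checkpoint shards reported during download (4.99 GB and 241 MB) at an assumed 2 bytes per weight, and `weight_footprint_gb` is a hypothetical helper. The estimate ignores layers that stay unquantized, quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: about 5.23 GB of 16-bit shards implies about 2.6e9 parameters.
ASSUMED_PARAMS = (4.99e9 + 0.241e9) / 2

for label, bits in [("float32", 32), ("bfloat16", 16), ("nf4", 4)]:
    print(f"{label:>8}: {weight_footprint_gb(ASSUMED_PARAMS, bits):.2f} GB")
```

The 4-bit figure is a lower bound on the weights alone; it is meant only to show the order of magnitude of the saving.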


In [4]:
PYTHON_ASSISTANT_SYSTEM_PROMPT = """
Role:

You are a Python Programming Assistant designed to provide accurate, precise, and verifiable technical answers related to the Python programming language and its ecosystem.

Objective:

Your objective is to answer Python-related questions correctly and clearly, prioritizing factual correctness, technical depth, and conceptual clarity over verbosity.

You must behave as a reliable reference assistant, not as a conversational chatbot.

Context:

You operate without access to external tools, browsing, or retrieval systems.

Your knowledge is limited to information learned during training.

Some questions may involve recent Python versions, release notes, PEPs, or security fixes that may not exist in your training data.

In such cases, it is critical to avoid speculation or hallucination. So state the limitation explicitly and tell us if you do not have the data required to answer that question

 """

def _format_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> Dict[str, torch.Tensor]:
    if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
        effective_messages = messages
        if messages and messages[0].get("role") == "system":
            system_text = messages[0]["content"]
            effective_messages = messages[1:]
            if effective_messages and effective_messages[0].get("role") == "user":
                effective_messages = effective_messages.copy()
                effective_messages[0] = {
                    "role": "user",
                    "content": f"{system_text}\n\n{effective_messages[0]['content']}"
                }
            else:
                effective_messages = [{"role": "user", "content": system_text}]
        prompt_text = tokenizer.apply_chat_template(
            effective_messages,
            tokenize=False,
            add_generation_prompt=add_generation_prompt
        )
    else:
        sys_msg = ""
        if messages and messages[0].get("role") == "system":
            sys_msg = f"System: {messages[0]['content']}\n"
            user_msgs = messages[1:]
        else:
            user_msgs = messages
        convo = "\n".join([f"{m['role'].capitalize()}: {m['content']}" for m in user_msgs])
        prompt_text = (sys_msg + convo + ("\nAssistant:" if add_generation_prompt else ""))

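    # A chat template normally emits the BOS token itself, so do not add it a second time.
    uses_template = getattr(tokenizer, "chat_template", None) is not None
    inputs = tokenizer(prompt_text, return_tensors="pt", add_special_tokens=not uses_template)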
    return {k: v.to(DEVICE) for k, v in inputs.items()}

@torch.inference_mode()
def generate_from_messages(
    messages: List[Dict[str, str]],
    max_new_tokens: int = GEN_CFG["max_new_tokens"],
    temperature: float = GEN_CFG["temperature"],
    top_p: float = GEN_CFG["top_p"],
    repetition_penalty: float = GEN_CFG["repetition_penalty"],
) -> str:
    inputs = _format_chat(messages, add_generation_prompt=True)
    input_len = inputs["input_ids"].shape[-1]
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    gen_ids = outputs[0][input_len:]
    text = tokenizer.decode(gen_ids, skip_special_tokens=True)
    return text.strip()

def ask(question: str, system_prompt: Optional[str] = PYTHON_ASSISTANT_SYSTEM_PROMPT, **gen_kwargs) -> str:
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return generate_from_messages(messages, **gen_kwargs)
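
The system-prompt handling inside `_format_chat` can be looked at in isolation. Gemma's chat template is understood to reject a `system` role, which is presumably why the helper merges the system text into the first user turn. The sketch below mirrors that branch logic as a pure function so it can be checked without loading the model. `fold_system_into_user` and the `demo` messages are hypothetical names introduced for illustration only.

```python
from typing import Dict, List

Message = Dict[str, str]


def fold_system_into_user(messages: List[Message]) -> List[Message]:
    """Return a new list in which a leading system turn is merged into the first user turn."""
    if not messages or messages[0].get("role") != "system":
        return list(messages)  # nothing to fold
    system_text = messages[0]["content"]
    rest = list(messages[1:])  # copy, so the caller's list is untouched
    if rest and rest[0].get("role") == "user":
        rest[0] = {
            "role": "user",
            "content": f"{system_text}\n\n{rest[0]['content']}",
        }
        return rest
    # No user turn follows: send the system text as the only user message.
    return [{"role": "user", "content": system_text}]


demo = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is a list comprehension?"},
]
folded = fold_system_into_user(demo)
print(folded[0]["content"])
```

Keeping this step separate makes it easy to confirm that the model receives the instructions and the question as one user message, which matters when comparing this baseline with a run that injects retrieved context into the same turn.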

## Baseline Demonstration
Here we ask the model a difficult question about recent security fixes without giving it any supporting context.

In [5]:

question = "Which CVEs were fixed in Python 3.12.x during mid‑2024, and which modules were impacted?"

print("\nQ:", question)

answer = ask(question)

print("\nBaseline Model Answer (No Context):\n")
print(answer)


Q: Which CVEs were fixed in Python 3.12.x during mid‑2024, and which modules were impacted?

Baseline Model Answer (No Context):

I cannot provide specific details about CVE fixes for Python 3.12.x during mid-2024. 

Here's why:

* **Limited Training Data:** My training data has a cutoff point, so I don't have real-time information on vulnerability fixes after that date. 
* **Dynamic Nature of Security:**  CVE (Common Vulnerabilities and Exposures) databases are constantly updated with new vulnerabilities discovered and patched. This makes it difficult to track specific fixes for every version across all releases.


**How to find this information:**

To get the most up-to-date information on CVE fixes for Python 3.12.x, I recommend checking these resources:

* **Python Security Blog:** [https://blog.python.org/](https://blog.python.org/)
* **National Vulnerability Database (NVD):** [https://nvd.nist.gov/](https://nvd.nist.gov/)
* **Security Bulletins from Python Project:** [https://ww