# 🎓 Master Thesis: Prompt Engineering with Gemma-2B
This notebook demonstrates prompt engineering with the instruction-tuned Gemma-2B model (`google/gemma-2b-it`) on a dataset of beginner-level Python Q&A pairs.

We'll test zero-shot and few-shot prompting strategies and store the results for evaluation.

In [1]:
# ✅ Install Required Libraries
!pip install -q transformers accelerate datasets huggingface_hub


In [None]:
# ✅ Login to HuggingFace (insert your token below)
from huggingface_hub import login
login('access token here')  # Replace with your actual token inside quotes
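A token pasted into a cell is easy to leak when the notebook is shared. The sketch below reads it from the environment instead. It assumes the token is stored in a variable named `HF_TOKEN`; `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os

def resolve_hf_token(env_var: str = "HF_TOKEN"):
    """Hypothetical helper: return the token from the environment, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None

token = resolve_hf_token()
if token is None:
    print("No token found; downloading gated models may fail.")
# Otherwise the value can be passed to login(token) from the cell above.
```

Keeping the lookup in one function makes it simple to swap in another secret store later without touching the login cell.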

In [3]:
# ✅ Import Libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json
from pathlib import Path

In [4]:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "google/gemma-2b-it"  # keep the instruct variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)

# Deterministic (greedy) decoding config, so repeated runs give the same output
GEN_KW = dict(
    max_new_tokens=256,
    do_sample=False,  # <- crucial: disables sampling (temperature/top_p ignored)
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)





In [5]:
GEN_KW = dict(
    max_new_tokens=512,           # give enough room to finish answers
    do_sample=False,              # deterministic
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

def build_prompt(user_question: str, context: str = "") -> str:
    rules = (
        "You are a careful assistant. Follow STRICTLY:\n"
        "1) Only answer using the provided context (if any).\n"
        "2) If the answer is not in the context or you are uncertain, reply EXACTLY with: Sorry I do not have that information\n"
        "3) Do not add any explanation, punctuation, or extra words when refusing.\n"
        "4) When you do know the answer, explain it in detail with at least 3 sentences and examples if possible.\n"
        "5) Do not rephrase the refusal.\n"
    )

    if context.strip():
        content = rules + f"\nUse ONLY this context to answer:\n---\n{context}\n---\n\nQuestion: {user_question}"
    else:
        content = rules + f"\nQuestion: {user_question}"

    messages = [{"role": "user", "content": content}]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
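`build_prompt` above covers the zero-shot case. For the few-shot strategy, one possible approach is to put worked examples in front of the real question as alternating user/assistant turns. This is a minimal sketch: `build_few_shot_messages` and the two example pairs are illustrative assumptions, not taken from the thesis dataset.

```python
def build_few_shot_messages(examples, question):
    """Return chat messages: one user/assistant pair per example, then the real question."""
    messages = []
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": f"Question: {example_question}"})
        messages.append({"role": "assistant", "content": example_answer})
    # The final user turn is the question the model should actually answer
    messages.append({"role": "user", "content": f"Question: {question}"})
    return messages

# Illustrative examples only (assumption)
examples = [
    ("What does len([1, 2, 3]) return?", "It returns 3, the number of items in the list."),
    ("Which keyword defines a function in Python?", "The def keyword defines a function."),
]
messages = build_few_shot_messages(examples, "How do I open a file for reading?")
print(len(messages))  # 5: two example pairs plus the final question
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in the same way `build_prompt` does.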

In [6]:
def ask_model(question_text: str, context: str = "") -> str:
    prompt = build_prompt(question_text, context)
    # The chat template already inserts <bos>, so don't let the tokenizer add a second one
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    outputs = model.generate(**inputs, **GEN_KW)

    # Decode only the newly generated tokens; this drops the echoed prompt
    # and the chat template's "model" role marker
    prompt_len = inputs["input_ids"].shape[1]
    answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()

    # If the reply starts with the refusal phrase, return its exact canonical form
    if answer.lower().startswith("sorry i do not have that information"):
        return "Sorry I do not have that information"

    return answer


In [25]:
question = "What is the new --no-gitignore flag in Python 3.13 venv and what does it do?"
answer = ask_model(question)
print(answer)


Sorry, I do not have access to real-time information, therefore I cannot answer this question.
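The reply above is a refusal, but not the exact phrase the prompt asks for, so the prefix check in `ask_model` does not catch it. A rough sketch of a more tolerant check for use during evaluation follows. The labels and the list of opening phrases are assumptions chosen for illustration.

```python
import re

REFUSAL = "Sorry I do not have that information"

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def classify_reply(reply: str) -> str:
    """Return 'exact_refusal', 'soft_refusal', or 'answer' (hypothetical labels)."""
    norm = normalize(reply)
    if norm == normalize(REFUSAL):
        return "exact_refusal"
    # Assumption: replies that open with these phrases are refusals in other wording
    if norm.startswith(("sorry", "i cannot", "i do not have")):
        return "soft_refusal"
    return "answer"

reply = "Sorry, I do not have access to real-time information, therefore I cannot answer this question."
print(classify_reply(reply))  # soft_refusal
```

Separating exact and soft refusals makes it possible to report how often the model follows the refusal instruction to the letter.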


# **Part 2: Retrieval-Augmented Generation (RAG) Implementation**



In [13]:
# Step 1 : Load all records
import json

path = "/content/python_release_kb.jsonl"
kb_records = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        kb_records.append(json.loads(line))

print("Total records:", len(kb_records))
print("Keys in first record:", list(kb_records[0].keys()))
print("First record title:", kb_records[0]["title"])


Total records: 8
Keys in first record: ['id', 'title', 'kind', 'version', 'released', 'urls', 'content', 'answer_card', 'timestamp']
First record title: What's Actually New in Python 3.13
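The later cells read `title`, `content`, and `answer_card` from every record, so a record missing one of them would raise a `KeyError` deep inside retrieval. A small check right after loading can surface that early. `find_invalid_records` is a hypothetical helper and the sample records are invented.

```python
REQUIRED_KEYS = {"title", "content", "answer_card"}

def find_invalid_records(records):
    """Return (index, missing_keys) for every record lacking a required key."""
    problems = []
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append((index, sorted(missing)))
    return problems

sample = [
    {"title": "t", "content": "c", "answer_card": {}},
    {"title": "t"},  # hypothetical broken record
]
print(find_invalid_records(sample))  # [(1, ['answer_card', 'content'])]
```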


In [14]:
# Step 2: Simple keyword retrieval (TF-IDF + cosine similarity via scikit-learn)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Prepare the corpus for retrieval (content + title for each record)
documents = [rec["title"] + " " + rec["content"] for rec in kb_records]

# Fit TF-IDF (tiny, runs on CPU fast)
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, top_k=3):
    """Return top_k most relevant KB records for the query."""
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, doc_vectors).flatten()
    top_idx = sims.argsort()[::-1][:top_k]
    return [(sims[i], kb_records[i]) for i in top_idx]

# Quick test
query = "What changes did PEP 701 bring to f-strings in Python 3.12?"
results = retrieve(query, top_k=2)
for score, rec in results:
    print(f"[score={score:.2f}] {rec['title']}")


[score=0.51] PEP 701: f-strings formalized (Python 3.12)
[score=0.12] PEP 688: Python-level buffer protocol (3.12)
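`retrieve` always returns `top_k` records, even when the similarity is low, as with the second hit above. One option is to drop weak matches before they reach the prompt. In this sketch the `0.2` cutoff is an arbitrary assumption that would need tuning on the real knowledge base.

```python
def filter_by_score(results, min_score=0.2):
    """Keep (score, record) pairs whose similarity is at least min_score."""
    return [(score, rec) for score, rec in results if score >= min_score]

# Scores and titles copied from the output above
results = [
    (0.51, {"title": "PEP 701: f-strings formalized (Python 3.12)"}),
    (0.12, {"title": "PEP 688: Python-level buffer protocol (3.12)"}),
]
for score, rec in filter_by_score(results):
    print(f"[score={score:.2f}] {rec['title']}")
# [score=0.51] PEP 701: f-strings formalized (Python 3.12)
```

If the filtered list is empty, the caller can treat the question as unanswerable from the knowledge base instead of passing unrelated context to the model.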


In [16]:
# Step 3: Retrieval + Model Answer
from IPython.display import display, Markdown

def build_context(query, top_k=1):
    """Retrieve top-k records and build a context string for the model."""
    results = retrieve(query, top_k=top_k)
    context_blocks = []
    for score, rec in results:
        block = f"### {rec['title']}\n{rec['content']}\nSources: " + ", ".join([s['url'] for s in rec['answer_card']['sources']])
        context_blocks.append(block)
    return "\n\n".join(context_blocks)

def rag_answer(query, top_k=1):
    """Generate an answer using retrieval + Gemma2B."""
    context = build_context(query, top_k=top_k)

    prompt = f"""
You are a Python release assistant.
Use ONLY the context below to answer the user’s question.
If the premise in the question is wrong (e.g., wrong version), politely correct it.
Always mention version/PEP/module, include one short example if possible, and show sources at the end.

User Question:
{query}

Context:
{context}

Answer:
"""

    # Send the assembled prompt through the Part 1 wrapper
    answer = ask_model(prompt)
    display(Markdown(answer))
    return answer

# Quick demo with one of the previously failed questions
rag_answer("What changes did PEP 701 bring to f-strings in Python 3.12?")


Sure, here's the answer to the user's question:

PEP 701 introduced support for f-strings in Python 3.12. This feature allows for more flexible and efficient string formatting.

**Example:**

```python
name = "John"
age = 35

message = f"Hello, {name}! You are {age} years old."

print(message)
```

**Output:**

```
Hello, John! You are 35 years old.
```

**Sources:**

* PEP 701: f-strings formalized (Python 3.12): This PEP introduced support for f-strings, which allow for more flexible and efficient string formatting.
* Python 3.12 Release Notes: This page provides more details about the changes introduced in Python 3.12, including the introduction of f-strings.


In [23]:
def answer_with_card_or_rag(query, top_k=1):
    """Render a curated answer card if its pattern matches the query; otherwise fall back to RAG."""
    query_lower = query.lower()
    for rec in kb_records:
        card = rec["answer_card"]
        pattern = card.get("question_pattern", "").lower()
        # Skip empty patterns: "" is a substring of every query
        if pattern and pattern in query_lower:
            answer = (
                f"**Answer:** {card['one_sentence']}\n\n"
                f"**Example:**\n"
                f"```python\n{card['example']}\n```\n\n"
                f"**Why it matters:** {card['why_it_matters']}\n\n"
                f"**Sources:**\n" + "\n".join(
                    [f"- {s['title']}: {s['url']}" for s in card["sources"]]
                )
            )
            display(Markdown(answer))
            return  # card already rendered; don't echo raw text in the notebook

    # Fallback if no card pattern matches
    return rag_answer(query, top_k=top_k)
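The card lookup above depends on a plain substring match, so it only fires when the query contains the stored pattern word for word. The toy example below shows that behaviour on a single invented record. `find_card` and the pattern text are assumptions for illustration, not the real contents of `python_release_kb.jsonl`.

```python
def find_card(query, records):
    """Return the first answer card whose question_pattern is a substring of the query."""
    query_lower = query.lower()
    for rec in records:
        card = rec.get("answer_card", {})
        pattern = card.get("question_pattern", "").lower()
        if pattern and pattern in query_lower:
            return card
    return None

# Hypothetical record; the pattern is invented for this sketch
records = [{"answer_card": {"question_pattern": "toml parsing",
                            "one_sentence": "tomllib was added in Python 3.11."}}]

hit = find_card("Which new module for TOML parsing was added in Python 3.13?", records)
miss = find_card("Which module parses TOML files?", records)
print(hit is not None, miss is None)  # True True
```

The rephrased question misses the card and would fall through to `rag_answer`, which is why the fallback path still matters.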


In [28]:
# Test with one of the previously failed questions
answer_with_card_or_rag("Which new module for TOML parsing was added in Python 3.13?")

**Answer:** None—TOML parsing arrived earlier: the stdlib module tomllib was added in Python 3.11 (not 3.13).

**Example:**
```python
import tomllib; data = tomllib.loads('x = 1')
```

**Why it matters:** Prevents version confusion when building features that depend on TOML parsing.

**Sources:**
- tomllib — Parse TOML files (Added in 3.11): https://docs.python.org/3/library/tomllib.html
- What's New In Python 3.11: https://docs.python.org/3/whatsnew/3.11.html
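To store results for later evaluation, as the introduction sets out, one simple option is a JSON Lines file with one answer per line. This is a sketch under assumptions: the field names (`question`, `strategy`, `answer`) and the file name are hypothetical, and a temporary directory stands in for the real output folder.

```python
import json
import tempfile
from pathlib import Path

def save_results(rows, path):
    """Write one JSON object per line; return the number of rows written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)

def load_results(path):
    """Read the JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Illustrative row; field names are assumptions
rows = [{
    "question": "Which new module for TOML parsing was added in Python 3.13?",
    "strategy": "answer_card",
    "answer": "tomllib was added in Python 3.11 (not 3.13).",
}]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "results" / "answers.jsonl"
    written = save_results(rows, out_path)
    print(written, load_results(out_path) == rows)  # 1 True
```

Recording the strategy next to each answer keeps zero-shot, few-shot, and retrieval runs comparable in a single file.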