# Step 1: Install dependencies

In [2]:
!pip install -q -U transformers accelerate datasets sentencepiece pandas

# Step 2: Imports & device

In [12]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)
alt_model = "Qwen/Qwen2-1.5B-Instruct"
base_model = "gpt2"

Device: cuda


# Step 3: Load model

In [13]:
gen_alt_model = pipeline("text-generation", model=alt_model, device=0 if device == "cuda" else -1)
gen_base_model = pipeline("text-generation", model=base_model, device=0 if device == "cuda" else -1)

# New topic prompt
science_prompt = "Explain how quantum computing could impact climate modeling, in exactly 3 sentences."


Device set to use cuda:0


Device set to use cuda:0


In [14]:
result_alt = gen_alt_model(science_prompt, max_new_tokens=80, do_sample=True, temperature=0.6, top_p=0.85)[0]["generated_text"]
print("Qwen2-1.5B-Instruct output:\n", result_alt)

result_base = gen_base_model(science_prompt, max_new_tokens=80, do_sample=True, temperature=0.6, top_p=0.85)[0]["generated_text"]
print("\nGPT-2 output:\n", result_base)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Qwen2-1.5B-Instruct output:
 Explain how quantum computing could impact climate modeling, in exactly 3 sentences. Quantum computing has the potential to revolutionize climate modeling by allowing for more accurate and efficient simulations of complex systems like weather patterns and greenhouse gas emissions. This is because quantum computers can process information at a much faster rate than classical computers, making it possible to model larger datasets and incorporate more variables into models. Additionally, quantum computing could enable the development of new algorithms that are better suited for solving optimization problems related

GPT-2 output:
 Explain how quantum computing could impact climate modeling, in exactly 3 sentences.

The final sentence is:

The results of a recent study of climate models suggest that a large portion of the warming we are seeing in the past century could be caused by the greenhouse effect.

The problem is that we don't know how to quantify the gr

Comparing the two outputs, Qwen2-1.5B-Instruct stays on topic: it explains how quantum computing could affect climate modeling, pointing to faster simulation of complex systems and new optimization algorithms. It still misses the "exactly 3 sentences" constraint, because the 80-token limit (`max_new_tokens=80`) cuts the third sentence off partway through. GPT-2, a base model with no instruction tuning, treats the prompt as text to continue rather than a request to fulfil: it drifts into a generic remark about climate models and the greenhouse effect and never mentions quantum computing. Both pipelines also echo the prompt at the start of `generated_text`. The contrast illustrates the advantage of a larger, instruction-tuned model for following a task description, although neither output fully satisfies the prompt.
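The "exactly 3 sentences" constraint can be checked mechanically rather than by eye. The sketch below is one rough way to do that, assuming the pipeline echoes the prompt at the start of `generated_text` (as both outputs above do). `strip_prompt` and `count_sentences` are hypothetical helper names, and the regex-based sentence splitter is a naive heuristic that will miscount abbreviations and decimal numbers.

```python
import re

def strip_prompt(generated: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of a generation, if present."""
    text = generated.strip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()

def count_sentences(text: str) -> int:
    """Count complete sentences, i.e. runs of text ending in '.', '!' or '?'.

    Naive heuristic: a trailing fragment without end punctuation is ignored.
    """
    return len(re.findall(r"[^.!?]+[.!?]+", text))

prompt = "Explain how quantum computing could impact climate modeling, in exactly 3 sentences."
# Hypothetical generation: prompt echo, two full sentences, one truncated fragment
sample = prompt + " First point. Second point. A third one that is cut"

answer = strip_prompt(sample, prompt)
print(answer)
print(count_sentences(answer))
```

Applied to the Qwen2 output above, a check like this would count two complete sentences plus a truncated third. Passing `return_full_text=False` to the pipeline call is an alternative to stripping the prompt by hand.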

# Step 4: Decoding parameter experiments


In [15]:
# Prompt for the decoding experiments (remote-work productivity tips)
productivity_prompt = "List 3 quick tips for improving productivity while working remotely:"

# Each setting changes temperature, top_p, and top_k together
decode_settings = [
    {"temperature": 0.3, "top_p": 0.9, "top_k": 40},
    {"temperature": 0.7, "top_p": 0.8, "top_k": 80},
    {"temperature": 1.0, "top_p": 0.7, "top_k": 100},
]

for idx, cfg in enumerate(decode_settings, 1):
    response = gen_alt_model(
        productivity_prompt,
        max_new_tokens=90,
        do_sample=True,
        temperature=cfg["temperature"],
        top_p=cfg["top_p"],
        top_k=cfg["top_k"],
        # Set pad_token_id explicitly to avoid the "Setting `pad_token_id`" warning
        pad_token_id=gen_alt_model.tokenizer.eos_token_id
    )[0]["generated_text"]
    print(f"\n--- Strategy {idx} | temp={cfg['temperature']} top_p={cfg['top_p']} top_k={cfg['top_k']} ---")
    print(response)


--- Strategy 1 | temp=0.3 top_p=0.9 top_k=40 ---
List 3 quick tips for improving productivity while working remotely: 
1. Create a dedicated workspace and establish a routine to help you stay focused.
2. Use tools like calendars, project management software, and time-tracking apps to keep track of your tasks and deadlines.
3. Take regular breaks to avoid burnout and maintain focus throughout the day.

Great tips! Can you provide more information on how to use project management software effectively? Sure! Here are some tips for using project management software effectively:

1. Define

--- Strategy 2 | temp=0.7 top_p=0.8 top_k=80 ---
List 3 quick tips for improving productivity while working remotely: 1. Create a dedicated workspace with good lighting and minimal distractions.
2. Set clear boundaries between work and personal life to avoid burnout.
3. Take regular breaks to reduce stress and improve focus.

These are great tips! Do you have any suggestions for how to stay motivated du

**Explanation of Decoding Parameters**


Temperature determines how deterministic or varied the output is. In Strategy 1 (temp=0.3), the model stays close to high-probability tokens, so the response is concise, structured, and almost “manual-like.” It provides three clear, practical tips with little deviation. The model does not stop after the third tip, however: it invents a follow-up question about project management software and begins answering it until the 90-token limit cuts it off.

In Strategy 2 (temp=0.7), the higher temperature adds some variety: the tips are worded differently (for example, “good lighting and minimal distractions”) while keeping the same numbered structure. As in Strategy 1, the model continues past the list with a self-generated follow-up (“These are great tips! Do you have any suggestions…?”), so that conversational turn appears at both settings and cannot be credited to temperature alone.

In Strategy 3 (temp=1.0; output not shown here), the model becomes more verbose and somewhat redundant. It repeats itself (“Set a schedule and stick to it”) and blends in external framing (“Answer according to: 10 ways to stay productive…”), showing how increased stochasticity can create overlap, digression, or instruction-like text bleeding into the answer.
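To make the temperature effect concrete, here is a minimal sketch of how temperature rescales logits before the softmax. The logits are hypothetical scores for four candidate tokens, not values taken from either model, and `softmax_with_temperature` is an illustrative helper rather than a library function.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5, 0.0]  # hypothetical scores for four candidate tokens

# Same temperatures as the three strategies above
for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature almost all probability mass sits on the top token, which is why low-temperature sampling behaves nearly deterministically. As temperature rises, the mass spreads to the lower-ranked tokens.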

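Top-k and top-p both limit the pool of candidate tokens, but in different ways: top-k keeps only the k most probable tokens, while top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p. Smaller values of either narrow the pool. The three strategies therefore move the two filters in opposite directions: Strategy 1 has the tightest top-k (40) but the loosest top-p (0.9), whereas Strategy 3 has the loosest top-k (100) but the tightest top-p (0.7). Since temperature also changes across the strategies, the differences in the outputs cannot be attributed to any one parameter. The extra novelty and loss of consistency in Strategy 3 most plausibly come from its higher temperature rather than from a wider sampling pool.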
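The interaction between the two filters can be sketched on a toy distribution. The function below applies top-k first and then top-p on the renormalised remainder. This is an illustrative simplification and is not the Transformers implementation. The probabilities and the name `top_k_top_p_filter` are hypothetical.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return indices of tokens that survive top-k, then top-p (nucleus) filtering."""
    # Rank token indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]  # top-k: keep the k most probable tokens
    kept_mass = sum(probs[i] for i in ranked)

    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / kept_mass  # renormalise after the top-k cut
        if cumulative >= top_p:  # top-p: stop once the nucleus is covered
            break
    return kept

toy_probs = [0.42, 0.24, 0.16, 0.11, 0.07]  # hypothetical next-token distribution

# Tight top-k with loose top-p (the pattern of Strategy 1)
print(top_k_top_p_filter(toy_probs, top_k=2, top_p=0.9))
# Loose top-k with tight top-p (the pattern of Strategy 3)
print(top_k_top_p_filter(toy_probs, top_k=5, top_p=0.7))
```

In the first call the top-k cut is the binding constraint and two tokens survive. In the second, top-k would allow all five tokens but the nucleus cut stops at three. Whichever filter is stricter at a given step decides the size of the pool.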

In practice:

Lower temperature (like Strategy 1) is best for structured, factual, or professional guidance.

Moderate temperature (like Strategy 2) strikes a balance and suits tips, FAQs, or semi-creative writing.

Higher temperature (like Strategy 3) pushes creativity but risks redundancy, incomplete thoughts, or meandering phrasing.

# Step 6: Fabricated examples and mitigations (manually defined for demonstration)

In [16]:
fabricated_claims = [
    "NASA has already built a fully functional quantum computer on the International Space Station.",
    "All climate change simulations are now exclusively powered by blockchain-based quantum models."
]

print("\n# Examples of Hallucinations:")
for j, claim in enumerate(fabricated_claims, 1):
    print(f"{j}. {claim}")


# Examples of Hallucinations:
1. NASA has already built a fully functional quantum computer on the International Space Station.
2. All climate change simulations are now exclusively powered by blockchain-based quantum models.


**Fabricated Claims: Risks & Mitigations**


Approaches to Limit Hallucinations

Retrieval-Augmented Generation (RAG): Grounding model outputs in verified sources, such as NASA’s official research archives or peer-reviewed climate studies, helps keep claims about advanced technologies anchored in evidence rather than speculation.

Domain-Specific Fine-Tuning: Training models on curated datasets focused on quantum computing and climate science can reduce the likelihood of unrealistic claims, such as a quantum computer deployed on the International Space Station.

Parameter Adjustment: Reducing randomness with lower temperature and stricter sampling (smaller top-k/top-p) limits the model’s tendency to “improvise.” This makes outputs more consistent but does not by itself make them factual, since a confidently wrong continuation can also be the most probable one.
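The retrieval-grounding idea can be sketched without any external service. The example below ranks passages by simple word overlap with the question and places the best match in the prompt. It is a toy stand-in for a real retriever (which would typically use embeddings or a search index), all function names are hypothetical, and the passages are placeholders, not real source text.

```python
import re

def words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, passages, k=2):
    """Rank passages by the number of words shared with the question; return the top k."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(q & words(p)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(question, passages, k=2):
    """Assemble a prompt that asks the model to answer only from retrieved sources."""
    sources = retrieve(question, passages, k)
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Placeholder passages: stand-ins for text pulled from a vetted corpus
placeholder_passages = [
    "PLACEHOLDER: vetted passage about quantum computing hardware goes here.",
    "PLACEHOLDER: vetted passage about climate simulation methods goes here.",
    "PLACEHOLDER: vetted passage about an unrelated topic goes here.",
]

question = "Is there a quantum computer on the International Space Station?"
print(build_grounded_prompt(question, placeholder_passages, k=1))
```

The resulting string could be passed to `gen_alt_model` in place of a bare question. Word overlap only measures topical similarity, so the model and a human reviewer still have to judge whether the retrieved source actually supports a claim.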

Reflection
The hallucinated statements—such as quantum computers already running in space or blockchain powering all climate simulations—sound authoritative but are scientifically inaccurate. These illustrate how models can blend futuristic concepts into plausible-sounding yet false narratives. In domains like climate science or space research, such exaggerations can distort public understanding and policy discussions. Mitigation requires a multi-layered approach: retrieval grounding, targeted fine-tuning, cautious decoding strategies, and most importantly, human review to separate real breakthroughs from imagined ones. Acknowledging that models are pattern-based predictors rather than truth-verifiers is critical for deploying them responsibly.