<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 3: Running Open-Source LLMs

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

---

</div>

---

### What You'll Learn

In this notebook, you will:

1. **Load and run an open-source LLM** (GPT-2) using the Transformers library
2. **Understand generation parameters** — temperature, top_p, sampling

**Duration:** ~30 minutes

**Note:** This notebook works on CPU but runs faster on GPU. On Colab, go to **Runtime > Change runtime type > T4 GPU**.

---

## 1. Environment Setup

```python
# Install required packages
!pip install -q transformers accelerate torch
```

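```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```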

---

## 2. Loading a Small Model (GPT-2)

In notebook 01 we used the `pipeline` API. Here we'll load the model and tokenizer directly, which gives us more control.

The `gpt2` checkpoint is the smallest GPT-2 model, with about 124M parameters, so it is small enough to run on a CPU.
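As a sanity check on the 124M figure, the count can be worked out by hand from GPT-2 small's published configuration: 12 transformer blocks, hidden size 768, a 50,257-token vocabulary and 1,024 positions. This is a sketch that assumes the standard GPT-2 layout (fused query/key/value projection, an MLP four times wider than the hidden size, two LayerNorms per block plus a final one) and an output layer that shares its weights with the token embedding, so those weights are counted only once.

```python
def gpt2_param_count(n_layer=12, d_model=768, vocab=50257, n_ctx=1024):
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    token_emb = vocab * d_model                      # one vector per vocabulary token
    pos_emb = n_ctx * d_model                        # one vector per position
    layer_norm = 2 * d_model                         # scale + shift
    attn_qkv = d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection + bias
    attn_out = d_model * d_model + d_model           # attention output projection + bias
    mlp_in = d_model * 4 * d_model + 4 * d_model     # expand to 4x the hidden size
    mlp_out = 4 * d_model * d_model + d_model        # project back down
    per_block = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    # Embeddings + all blocks + the final LayerNorm (output layer reuses token_emb)
    return token_emb + pos_emb + n_layer * per_block + layer_norm

total = gpt2_param_count()
print(f"{total:,} parameters")  # 124,439,808 parameters
```

Running `sum(p.numel() for p in model.parameters())` on the model loaded below should report the same total, since PyTorch's `parameters()` yields each shared (tied) tensor only once.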

```python
# Load GPT-2 model and tokenizer
model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move to GPU if available
model = model.to(device)
```

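```python
# Generate text
prompt = "India is a country known for"

# Tokenize the prompt into PyTorch tensors on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse end-of-text
)

# outputs[0] holds the prompt tokens followed by the generated tokens
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```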
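`model.generate` builds its output one token at a time: it scores every vocabulary token as a candidate next token, picks one, appends it, and repeats until `max_new_tokens` tokens have been added or an end-of-sequence token is chosen. The sketch below mimics that loop with greedy selection. The bigram score table and `toy_next_token_scores` are hypothetical stand-ins for GPT-2's forward pass, used only to make the loop visible; like `generate`, the function returns the prompt followed by the new tokens.

```python
# Hypothetical toy "model": scores for the next word, given only the last word.
BIGRAM_SCORES = {
    "India": {"is": 2.0, "has": 1.0},
    "is": {"known": 1.5, "a": 1.2},
    "known": {"for": 3.0},
    "for": {"its": 2.5, "spices": 1.0},
    "its": {"culture": 2.0, "food": 1.8},
}
EOS = "<eos>"

def toy_next_token_scores(tokens):
    """Stand-in for a model forward pass: scores for each candidate next token."""
    return BIGRAM_SCORES.get(tokens[-1], {EOS: 1.0})

def greedy_generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(tokens)
        next_token = max(scores, key=scores.get)  # greedy: highest score wins
        if next_token == EOS:
            break                                 # stop early at end-of-sequence
        tokens.append(next_token)
    return tokens                                 # prompt + generated tokens

out = greedy_generate(["India", "is"], max_new_tokens=5)
print(" ".join(out))      # India is known for its culture
print(" ".join(out[2:]))  # known for its culture
```

In the notebook, the equivalent of `out[2:]` is `outputs[0][inputs["input_ids"].shape[1]:]`, which drops the prompt tokens before decoding so only the newly generated text is printed.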

---

## 3. Understanding Generation Parameters

When generating text, several parameters control the output:

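| Parameter | Description | Effect |
|-----------|-------------|--------|
| `max_new_tokens` | Maximum number of new tokens to generate (the prompt is not counted) | Controls output length |
| `temperature` | Divides the next-token scores before sampling; must be greater than 0 (this notebook uses 0.3 to 1.2) | Lower = more focused; higher = more random/creative |
| `top_p` | Nucleus sampling threshold between 0 and 1 (e.g. 0.9) | Samples only from the smallest set of tokens whose combined probability reaches `top_p` |
| `do_sample` | Enables sampling | `False` = greedy decoding (always picks the most likely token); `True` = random sampling. `temperature` and `top_p` only take effect when `True` |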
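Under the hood, temperature `T` rescales the model's raw next-token scores (logits) `z_i` before softmax turns them into probabilities: `p_i = exp(z_i / T) / sum_j exp(z_j / T)`. Values of `T` below 1 sharpen the distribution toward the top token; values above 1 flatten it. The sketch below applies the standard formula to three made-up logits (illustrative numbers, not GPT-2 output) at the temperatures used in the next cell; Transformers' own implementation differs in detail. It also shows why `temperature` must be positive: in Transformers, greedy decoding is selected with `do_sample=False`, not with `temperature=0`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding instead")
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in [0.3, 0.7, 1.2]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

# Greedy decoding ignores temperature: it always takes the highest-scoring token.
greedy_choice = max(range(len(logits)), key=lambda i: logits[i])
print("greedy picks token", greedy_choice)  # greedy picks token 0
```

Printing the probabilities shows the top token's share shrinking as `T` grows, which is the same effect you should see in the generated text below.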

Let's see how **temperature** affects the output.

```python
# Compare different temperatures
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

for temp in [0.3, 0.7, 1.2]:
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        temperature=temp,
        pad_token_id=tokenizer.eos_token_id
    )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Temperature {temp}:")
    print(f"  {text}\n")
```

**What you should notice** (sampling is random, so the exact text changes on every run):
- **Low temperature (0.3):** more focused, often repetitive
- **Medium temperature (0.7):** a good balance of coherence and variety
- **High temperature (1.2):** more varied and creative, but less coherent
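The parameter table also lists `top_p`, which the cells here do not use. With nucleus sampling, candidate tokens are sorted by probability and only the smallest set whose cumulative probability reaches `top_p` is kept; the rest are discarded and the survivors are renormalized before sampling. The sketch below is a simplified version with made-up probabilities; with the real model you would instead pass, for example, `top_p=0.9` together with `do_sample=True` to `model.generate`.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability >= top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # nucleus reached; drop the remaining low-probability tail
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.50, 0.30, 0.15, 0.05]  # made-up next-token probabilities
print(top_p_filter(probs, 0.9))   # tokens 0, 1, 2 survive; token 3 is cut
print(top_p_filter(probs, 0.5))   # only token 0 survives
```

Unlike temperature, which reshapes the whole distribution, `top_p` cuts off the unlikely tail, so the two are often combined.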

---

## 4. Exercise

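Experiment with generation parameters! Change `temperature` and `max_new_tokens` below and run the cell several times to see how the output changes. Keep in mind that GPT-2 is a base language model, not an instruction-tuned chat model: it continues the prompt text rather than following it as an instruction, so expect a continuation rather than a well-formed poem.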

```python
# Exercise: Change the parameters and run this cell multiple times
# Try: temperature=0.2, temperature=1.5, max_new_tokens=50 vs 200

prompt = "Write a short poem about coding:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,   # Try changing this
    do_sample=True,
    temperature=0.7,      # Try changing this
    pad_token_id=tokenizer.eos_token_id
)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```
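Because `do_sample=True` draws tokens at random, every run of the exercise cell gives different text, which makes it hard to tell whether a change in output came from your parameter change or from chance. Fixing the random seed before each call helps; Transformers provides `transformers.set_seed(42)`, which seeds Python, NumPy and PyTorch. The pure-Python sketch below shows the idea with the standard `random` module and made-up token probabilities: the same seed reproduces the same sequence of draws.

```python
import random

tokens = ["code", "bugs", "coffee", "logic"]
weights = [0.4, 0.3, 0.2, 0.1]  # made-up next-token probabilities

def sample_tokens(seed, n=5):
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [rng.choices(tokens, weights=weights)[0] for _ in range(n)]

run_a = sample_tokens(seed=42)
run_b = sample_tokens(seed=42)
run_c = sample_tokens(seed=7)
print(run_a)
print(run_a == run_b)  # True: same seed, same draws
print(run_c)           # a different seed usually gives different draws
```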

---

## Key Takeaways

1. **`AutoModelForCausalLM` + `AutoTokenizer`** let you load causal (decoder-only) text generation models from the Hugging Face Hub directly, giving you full control over tokenization and generation

2. **Temperature controls randomness** when `do_sample=True`: low values (0.2–0.4) tend to produce focused, predictable text; high values (1.0+) produce more varied, creative text

---

## Additional Resources

- [HuggingFace Model Hub](https://huggingface.co/models)
- [Text Generation Strategies](https://huggingface.co/docs/transformers/generation_strategies)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) — *Transform Graduates into Industry-Ready Professionals*

---