# 🌱 Gemma 2B Instruction Agent

**Goal:** You will learn how to prepare a small dataset for training, run inference, and save the model and tokenizer using Google's open `gemma-2-2b-it` model.

---


[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/Gemma2B_Instruction_Agent.ipynb)


# Dependencies

In [None]:
!pip install transformers accelerate datasets bitsandbytes -q

# Tools & Model Setup

In [5]:
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from datasets import load_dataset
import torch

# Gemma is a gated model: replace the placeholder with your Hugging Face access token.
login("Enter your token here")

model_id = "google/gemma-2-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",              # Places the model on a GPU when one is available
    torch_dtype=torch.float16       # Half precision: about half the memory of float32
)
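
As a rough sanity check on the `torch_dtype` choice, the sketch below estimates how much memory the weights alone occupy at different precisions. The parameter count (about 2.6 billion) is an approximate figure assumed for illustration, and `estimate_weight_memory_gb` is a hypothetical helper, not part of `transformers`. Activations and the generation cache add to these numbers.

```python
# Bytes used per parameter for some common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_gb(num_params, dtype):
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# ASSUMPTION: roughly 2.6e9 parameters; this is an approximation for illustration.
APPROX_PARAMS = 2.6e9

for dtype in ("float32", "float16", "int8"):
    size_gb = estimate_weight_memory_gb(APPROX_PARAMS, dtype)
    print(f"{dtype}: ~{size_gb:.1f} GB")
```

Under that assumption, half precision needs about half the memory of `float32`, which is often what decides whether the model fits on a single small GPU.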



# Prompt Configuration

In [8]:
prompt = "You are a helpful assistant.\nUser: What is the capital of France?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)


You are a helpful assistant.
User: What is the capital of France?
Assistant: The capital of France is **Paris**. 
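
The printed response above repeats the prompt because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of keeping only the continuation is shown below on plain Python lists, so it runs without a model. `continuation_only` is a hypothetical helper, and the token IDs are made up for illustration.

```python
def continuation_only(output_ids, prompt_length):
    """Drop the first `prompt_length` tokens, which echo the prompt."""
    return output_ids[prompt_length:]


# Made-up token IDs: four prompt tokens followed by three generated tokens.
prompt_ids = [2, 101, 102, 103]
output_ids = prompt_ids + [201, 202, 203]

print(continuation_only(output_ids, len(prompt_ids)))  # [201, 202, 203]

# With the objects from the cell above, the same slice would look like:
#   prompt_length = inputs["input_ids"].shape[-1]
#   response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
```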



# Use a small sample dataset

In [18]:
from datasets import Dataset

sample_data = {
    'text': [
        'The sun is a star at the center of our solar system.',
        'Photosynthesis is the process by which green plants make food.',
        'Water freezes at 0 degrees Celsius.',
        'The Earth revolves around the sun in 365 days.'
    ]
}

dataset = Dataset.from_dict(sample_data)

def tokenize_function(example):
    return tokenizer(example['text'], padding='max_length', truncation=True, max_length=64)

tokenized_dataset = dataset.map(tokenize_function)
tokenized_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])
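
The tokenized dataset above carries `input_ids` and `attention_mask` but no `labels`, which a causal language modeling loss needs. One common convention, sketched here on plain lists, is to copy `input_ids` into `labels` and replace padded positions with `-100`, the index that PyTorch's cross-entropy loss ignores by default. `build_labels` is a hypothetical helper, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids, masking out positions where attention_mask is 0 (padding)."""
    return [
        token if keep == 1 else ignore_index
        for token, keep in zip(input_ids, attention_mask)
    ]


# Made-up example with two padding positions; the mask decides what is ignored,
# so the same helper works for left or right padding.
example = {"input_ids": [0, 0, 2, 651, 4389], "attention_mask": [0, 0, 1, 1, 1]}
example["labels"] = build_labels(example["input_ids"], example["attention_mask"])
print(example["labels"])  # [-100, -100, 2, 651, 4389]
```

A helper like this could be applied inside the function passed to `dataset.map`, with `'labels'` then added to the columns given to `set_format`.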

# Inference

In [15]:
input_text = "Explain photosynthesis to a child."
chat = tokenizer.apply_chat_template(
    [{"role": "user", "content": input_text}],
    tokenize=False,
    add_generation_prompt=True
)
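# The rendered chat string already starts with the BOS token, so do not add special tokens again.
inputs = tokenizer(chat, return_tensors="pt", add_special_tokens=False).to(model.device)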
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))

user
Explain photosynthesis to a child.
model
Imagine plants are like tiny chefs, and they cook their own food! 

They use sunlight, air, and water to make yummy food called sugar.  

Here's how it works:

1. **Sunlight:** Plants have special green stuff called chlorophyll that acts like a solar panel, soaking up the sun's energy.
2. **Air:** Plants take in air through tiny holes in their leaves called stomata.  The air has a gas called carbon dioxide.
3
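
For intuition about what `apply_chat_template` produces, the sketch below approximates Gemma's turn format in plain Python. It is an approximation under stated assumptions, not the template shipped with the tokenizer: `render_gemma_chat` is a hypothetical helper, and the authoritative template lives in the tokenizer's own configuration. The `user` and `model` labels it emits are the ones visible in the decoded output above.

```python
def render_gemma_chat(messages, add_generation_prompt=True):
    """Approximate Gemma's chat layout for a list of {"role", "content"} dicts."""
    parts = ["<bos>"]
    for message in messages:
        # ASSUMPTION: Gemma labels the assistant's turns "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_chat([{"role": "user", "content": "Explain photosynthesis to a child."}]))
```

The answer above stops at `3` because generation is capped at `max_new_tokens=100`; raising that value allows a longer reply.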


# Save Model

In [16]:
model.save_pretrained("gemma-finetuned-demo")
tokenizer.save_pretrained("gemma-finetuned-demo")

('gemma-finetuned-demo/tokenizer_config.json',
 'gemma-finetuned-demo/special_tokens_map.json',
 'gemma-finetuned-demo/chat_template.jinja',
 'gemma-finetuned-demo/tokenizer.model',
 'gemma-finetuned-demo/added_tokens.json',
 'gemma-finetuned-demo/tokenizer.json')
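
To use the saved files later, the directory can be passed to `from_pretrained` in place of the Hub model ID. The sketch below wraps that in a hypothetical helper, `load_saved_model`, which first checks that the directory exists. The directory name matches the one used above; the `device_map="auto"` setting is carried over from the loading cell as an assumption about how you want to reload.

```python
from pathlib import Path


def load_saved_model(directory="gemma-finetuned-demo"):
    """Reload a model and tokenizer previously written by save_pretrained."""
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(f"No saved model directory at {path}")

    # Imported lazily so the directory check runs even before transformers is loaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(path))
    model = AutoModelForCausalLM.from_pretrained(str(path), device_map="auto")
    return model, tokenizer


# Usage (requires the directory created by the cell above):
#   model, tokenizer = load_saved_model("gemma-finetuned-demo")
```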

# Output
`Photosynthesis is how plants eat sunlight! 🌞 They use air, water, and sunlight to make food and grow.`