# GRPO Training for Italian Exercise Generator

Train V5 model using Group Relative Policy Optimization (GRPO) with comprehensive reward function.

**Hardware**: A100 GPU (Colab Pro)
**Expected time**: ~2-4 hours for 2000 samples

## Setup

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Navigate to project
%cd /content/drive/MyDrive/Colab\ Notebooks/italian_teacher

Mounted at /content/drive
/content/drive/MyDrive/Colab Notebooks/italian_teacher


In [2]:
# Install dependencies
!pip install -q transformers trl accelerate peft datasets spacy sentence-transformers
!python -m spacy download it_core_news_sm

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/564.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m564.6/564.6 kB[0m [31m36.3 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting it-core-news-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/it_core_news_sm-3.8.0/it_core_news_sm-3.8.0-py3-none-any.whl (13.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.0/13.0 MB[0m [31m34.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: it-core-news-sm
Successfully installed it-core-news-sm-3.8.0
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('it_core_news_sm')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.

In [3]:
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: NVIDIA L4


## Load Reward Function

In [4]:
from src.rl.reward_function import ExerciseRewardFunction

# Initialize reward function (loads vocabulary + models)
print("Loading reward function...")
reward_fn = ExerciseRewardFunction()
print("✅ Reward function ready")

Loading reward function...
Loading spaCy model: it_core_news_sm...
✅ spaCy model loaded
Initializing scorers...
Pre-loading CEFR vocabulary (16,887 words)...
✅ Loaded 16887 Italian words from vocabulary list
✅ Loaded vocabulary for all CEFR levels
Loading sentence transformer for topic similarity...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/122 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/723 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.11G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/402 [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

✅ Sentence transformer loaded
✅ Reward function initialized with all scorers
✅ Reward function ready


## Load Training Requests

In [5]:
import os

# Load pre-generated training requests
if os.path.exists("src/rl/training_requests.json"):
    print("Loading existing training requests...")
    with open("src/rl/training_requests.json", "r") as f:
        training_requests = json.load(f)
else:
    # If not exists, generate them
    from src.rl.generate_training_requests import generate_training_requests
    print("Generating new training requests...")
    training_requests = generate_training_requests(
        num_requests=2000,
        output_path="src/rl/training_requests.json"
    )

print(f"✅ Loaded {len(training_requests)} training requests")

Loading existing training requests...
✅ Loaded 2000 training requests


## Format Dataset

In [6]:
def format_prompt_with_chat_template(request: dict, tokenizer) -> str:
    """
    Format request using chat template + EXACT API prompt format.

    Combines:
    1. Llama3 chat template (required for V4)
    2. Detailed API prompt (ensures proper JSON format)
    """
    topic = request.get('topic', 'general Italian')
    grammar = request.get('grammar_focus', 'general practice')

    # Create numbered placeholders to guide the model
    exercise_numbers = ", ".join([f"#{i+1}" for i in range(request['num_exercises'])])

    topic_instruction = f"about '{topic}'"
    grammar_instruction = f"focusing on {grammar}"
    focus_text = f"{topic_instruction} {grammar_instruction}".strip()

    # Build grammar-specific instruction
    grammar_rule = ""
    if "past" in grammar.lower() or "passato" in grammar.lower():
        grammar_rule = "\n⚠️ MANDATORY: Use ONLY past tense (passato prossimo like 'ho fatto', 'sono andato' OR imperfetto like 'facevo', 'andavo'). NO present tense!"
    elif "present" in grammar.lower() or "presente" in grammar.lower():
        grammar_rule = "\n⚠️ MANDATORY: Use ONLY present tense (presente indicativo like 'faccio', 'vado'). NO past or future!"
    elif "future" in grammar.lower() or "futuro" in grammar.lower():
        grammar_rule = "\n⚠️ MANDATORY: Use ONLY future tense (futuro semplice like 'farò', 'andrò'). NO present or past!"

    # EXACT API PROMPT FORMAT - goes in user message
    user_message = f"""Create exactly {request['num_exercises']} Italian language exercises ({exercise_numbers}) in JSON format {focus_text}.

REQUIREMENTS:
Level: {request['level']}
Topic: {topic}
Grammar: {grammar}{grammar_rule}
Exercise types: {', '.join(request['exercise_types'])}

CRITICAL RULES:
1. TOPIC: Every exercise MUST be about "{topic}" - stay on topic throughout
2. REALISM: Use factual, natural scenarios appropriate for the topic
3. GRAMMAR: EVERY SINGLE exercise MUST test "{grammar}" at {request['level']} level
4. MULTIPLE CHOICE: Provide 4 DIFFERENT grammatical forms as options
5. CONSISTENCY: Do not mix different topics or introduce unrelated subjects

OUTPUT FORMAT - JSON array with exercises testing {grammar}:
[
  {{"type": "fill_in_blank", "question": "[Italian sentence about {topic} with ___ blank for {grammar}]", "correct_answer": "[conjugated form in {grammar}]", "options": null, "explanation": "[grammar rule explanation]"}},
  {{"type": "translation", "question": "Translate: [English sentence about {topic} in {grammar}]", "correct_answer": "[Italian translation using {grammar}]", "options": null, "explanation": "[grammar note]"}},
  {{"type": "multiple_choice", "question": "[Italian sentence about {topic} with blank]", "correct_answer": "[correct form in {grammar}]", "options": ["[alt1]", "[alt2]", "[alt3]", "[alt4]"], "explanation": "[why this form is correct]"}}
]

NOW GENERATE {request['num_exercises']} EXERCISES ABOUT "{topic}" TESTING "{grammar}" (remember: {grammar} ONLY!):
["""

    # Apply chat template with system + user messages
    messages = [
        {"role": "system", "content": "You are an expert Italian language teacher. Generate high-quality exercises based on the assignment specification. Output exercises in JSON format."},
        {"role": "user", "content": user_message}
    ]

    # Use tokenizer's chat template to format properly
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return formatted_prompt




In [7]:
import random
from datasets import Dataset

# Load tokenizer first to apply chat template
from transformers import AutoTokenizer
MODEL_PATH = "/content/drive/MyDrive/Colab Notebooks/italian_teacher/models/italian_exercise_generator_v4_merged"
temp_tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# Create prompts WITH CHAT TEMPLATE + DETAILED INSTRUCTIONS
prompts = [format_prompt_with_chat_template(req, temp_tokenizer) for req in training_requests]

# ⚠️ PILOT RUN: Use random subset for testing
PILOT_SIZE = 400
if len(prompts) > PILOT_SIZE:
    random_indices = random.sample(range(len(prompts)), PILOT_SIZE)
    prompts = [prompts[i] for i in random_indices]
    training_requests_subset = [training_requests[i] for i in random_indices]
else:
    training_requests_subset = training_requests

# Create dataset
train_dataset = Dataset.from_dict({
    "prompt": prompts,
    "request": training_requests_subset,
})

print(f"✅ Created dataset with {len(train_dataset)} examples (PILOT RUN)")
print(f"   Using: Chat template + detailed API instructions")
print(f"\nExample prompt (first 600 chars):\n{train_dataset[0]['prompt']}...")


✅ Created dataset with 400 examples (PILOT RUN)
   Using: Chat template + detailed API instructions

Example prompt (first 600 chars):
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert Italian language teacher. Generate high-quality exercises based on the assignment specification. Output exercises in JSON format.<|eot_id|><|start_header_id|>user<|end_header_id|>

Create exactly 4 Italian language exercises (#1, #2, #3, #4) in JSON format about 'premiazione' focusing on prepositions.

REQUIREMENTS:
Level: C2
Topic: premiazione
Grammar: prepositions
Exercise types: fill_in_blank, translation, multiple_choice

CRITICAL RULES:
1. TOPIC: Every exercise MUST be about "premiazione" - stay on topic throughout
2. REALISM: Use factual, natural scenarios appropriate for the topic
3. GRAMMAR: EVERY SINGLE exercise MUST test "prepositions" at C2 level
4. MULTIPLE CHOICE: Provide 4 DIFFERENT grammatical forms as options
5. CONSISTENCY: Do not mix different topics or intr

# Load YOUR V4 model - NOT a base model!
# V4 was already trained on exercise generation, so it knows the format
MODEL_NAME = "./models/italian_exercise_generator_v4_merged"

print(f"Loading YOUR V4 model: {MODEL_NAME}")
print("⚠️ IMPORTANT: Using V4 model that already knows exercise format!")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Add padding token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model with quantization for A100
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

print("✅ V4 Model loaded - this model already generates valid exercises!")

In [8]:
# Use LLaMAntino-3 (Italian-optimized model) or your V4 model
MODEL_NAME = "./models/italian_v5_final"
# Alternative: "meta-llama/Llama-3.2-3B-Instruct" or your V4 model path

print(f"Loading model: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Add padding token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model with quantization for A100
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

print("✅ Model loaded")

Loading model: ./models/italian_v5_final


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

✅ Model loaded


## Define Reward Function (TRL Format)

In [7]:
def italian_exercise_reward(prompts=None, completions=None, completion_ids=None, **kwargs):
    """
    Reward function for Italian exercise generation.

    TRL calls this function with keyword arguments:
    - prompts: List of prompt strings
    - completions: List of generated completion strings
    - completion_ids: Token IDs of completions
    - **kwargs: All dataset columns (includes 'request')

    Returns:
        List of float rewards (0.0 to 1.0)
    """
    # Extract 'request' from kwargs (comes from dataset column)
    requests = kwargs.get('request', [])

    rewards = []

    for completion, req in zip(completions, requests):
        try:
            # Parse generated JSON
            completion_text = completion.strip()
            exercises = json.loads(completion_text)


            if not isinstance(exercises, list):
                exercises = [exercises]

            # Score each exercise with comprehensive reward function
            scores = []
            for exercise in exercises:
                score, _ = reward_fn.score(exercise, req)
                scores.append(score / 100.0)  # Normalize to 0-1

            # Average score across all exercises in the completion
            reward = sum(scores) / len(scores) if scores else 0.0

        except json.JSONDecodeError:
            # Invalid JSON - very low reward
            reward = 0.1
            print("------ invalid json -----")
            print(completion_text)
            print("-------------------------")
        except Exception as e:
            # Other errors - low reward
            reward = 0.1
        print("🎯 REWARDS:", rewards)
        rewards.append(reward)

    return rewards

print("✅ Reward function defined (TRL-compatible format)")

✅ Reward function defined (TRL-compatible format)


# GRPO configuration - OPTIMIZED FOR PILOT RUN
grpo_config = GRPOConfig(
    output_dir="./models/italian_v5_grpo_pilot",
    num_train_epochs=1,  # Reduced from 3 to 1 for pilot
    per_device_train_batch_size=1,  # Reduced from 2 to 1 to save memory
    gradient_accumulation_steps=4,  # Reduced from 8 to 4 (effective batch = 4)
    learning_rate=5e-6,
    warmup_steps=20,  # Reduced from 50
    logging_steps=5,  # More frequent logging for pilot
    save_steps=100,  # Save more frequently for pilot
    save_total_limit=2,
    bf16=True,
    remove_unused_columns=False,
    
    # GRPO-specific parameters - OPTIMIZED
    num_generations=4,  # Keep 4 for GRPO algorithm
    max_completion_length=320,  # Reduced from 512 (exercises don't need 512 tokens)
    temperature=0.7,
    
    # Generation optimization
    generation_batch_size=4,  # Process generations one at a time to save memory
    
    # CRITICAL: Add generation kwargs with stop tokens (like your API uses!)
    generation_kwargs={
        "do_sample": True,
        "top_p": 0.9,
        "eos_token_id": [128009],  # Stop token
        "pad_token_id": 128009,
    }
)

print("✅ GRPO config created (OPTIMIZED FOR PILOT)")
print(f"   Added stop tokens to prevent verbose output!")
print(f"   Estimated time: ~30-45 minutes for 200 samples")
print(f"   Memory usage should be ~35-40GB (vs 62GB)")
print(f"\n   For full training:")
print(f"   - Change PILOT_SIZE to 2000 in dataset cell")
print(f"   - Change num_train_epochs to 2-3")
print(f"   - Consider increasing batch sizes if training succeeds")

In [10]:
# GRPO configuration - OPTIMIZED FOR PILOT RUN
grpo_config = GRPOConfig(
    output_dir="./models/italian_v6_grpo_pilot",
    num_train_epochs=1,  # 2-3 recommended for full dataset
    per_device_train_batch_size=1,  # Increase if memory allows
    gradient_accumulation_steps=4,  # Keeps effective batch size manageable
    learning_rate=1e-6,  # Lower than pilot to avoid forgetting
    warmup_steps=50,  # Slightly higher for longer training
    logging_steps=10,  # Less frequent logging to reduce overhead
    save_steps=100,  # Save checkpoints periodically
    save_total_limit=3,
    bf16=True,
    remove_unused_columns=False,

    # GRPO-specific parameters
    num_generations=4,
    max_completion_length=400,  # Slightly higher than pilot if needed
    temperature=0.7,

    # Generation optimization
    generation_batch_size=4,

    # Stop tokens and sampling
    generation_kwargs={
        "do_sample": True,
        "top_p": 0.9,
        "eos_token_id": [128009],
        "pad_token_id": 128009,
    }
)

print("✅ GRPO config created (OPTIMIZED FOR PILOT)")
print(f"   Estimated time: ~30-45 minutes for 200 samples")
print(f"   Memory usage should be ~35-40GB (vs 62GB)")
print(f"\n   For full training:")
print(f"   - Change PILOT_SIZE to 2000 in dataset cell")
print(f"   - Change num_train_epochs to 2-3")
print(f"   - Consider increasing batch sizes if training succeeds")

✅ GRPO config created (OPTIMIZED FOR PILOT)
   Estimated time: ~30-45 minutes for 200 samples
   Memory usage should be ~35-40GB (vs 62GB)

   For full training:
   - Change PILOT_SIZE to 2000 in dataset cell
   - Change num_train_epochs to 2-3
   - Consider increasing batch sizes if training succeeds


## Initialize GRPO Trainer

In [11]:
trainer = GRPOTrainer(
    model=model,
    args=grpo_config,  # Changed from 'config' to 'args'
    reward_funcs=italian_exercise_reward,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # Changed from 'tokenizer' to 'processing_class'
)

The model is already on multiple devices. Skipping the move to device specified in `args`.


## Start Training

In [12]:
print("🚀 Starting GRPO training...\n")
trainer.train()
print("\n✅ Training complete!")


output_dir = "./models/italian_v6_grpo_pilot"
trainer.save_model(output_dir)         # Saves model + adapter weights (if LoRA used)
tokenizer.save_pretrained(output_dir)  # Saves tokenizer

🚀 Starting GRPO training...



[34m[1mwandb[0m: Currently logged in as: [33mari-katzir[0m ([33mariel-katzir[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: Detected [huggingface_hub.inference, openai] in use.
[34m[1mwandb[0m: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
[34m[1mwandb[0m: For more information, check out the docs at: https://weave-docs.wandb.ai/
`generation_config` default values have been modified to match model-specific defaults: {'max_length': 8192}. If this is not desired, please set these values explicitly.


🎯 REWARDS: []
🎯 REWARDS: [0.9333333333333332]
🎯 REWARDS: [0.9333333333333332, 0.8433333333333334]
🎯 REWARDS: [0.9333333333333332, 0.8433333333333334, 0.9666666666666667]


Step,Training Loss
10,0.0105
20,0.0277
30,0.0071
40,0.0098
50,0.0007
60,-0.017
70,0.0092
80,0.0023
90,-0.0056
100,-0.0193


🎯 REWARDS: []
🎯 REWARDS: [0.9]
🎯 REWARDS: [0.9, 0.81]
🎯 REWARDS: [0.9, 0.81, 0.82]
🎯 REWARDS: []
🎯 REWARDS: [0.81625]
🎯 REWARDS: [0.81625, 0.8512500000000001]
🎯 REWARDS: [0.81625, 0.8512500000000001, 0.88625]
🎯 REWARDS: []
🎯 REWARDS: [0.965]
🎯 REWARDS: [0.965, 0.98]
🎯 REWARDS: [0.965, 0.98, 0.905]
🎯 REWARDS: []
🎯 REWARDS: [0.9385]
🎯 REWARDS: [0.9385, 0.9019999999999999]
🎯 REWARDS: [0.9385, 0.9019999999999999, 0.9359999999999999]
🎯 REWARDS: []
🎯 REWARDS: [0.7875]
🎯 REWARDS: [0.7875, 0.92]
🎯 REWARDS: [0.7875, 0.92, 0.94]
🎯 REWARDS: []
🎯 REWARDS: [0.9524999999999999]
🎯 REWARDS: [0.9524999999999999, 0.96]
🎯 REWARDS: [0.9524999999999999, 0.96, 0.9299999999999999]
🎯 REWARDS: []
🎯 REWARDS: [0.984]
🎯 REWARDS: [0.984, 0.9189999999999999]
🎯 REWARDS: [0.984, 0.9189999999999999, 0.93]
🎯 REWARDS: []
🎯 REWARDS: [0.92]
🎯 REWARDS: [0.92, 0.96]
🎯 REWARDS: [0.92, 0.96, 0.99]
🎯 REWARDS: []
🎯 REWARDS: [0.9666666666666667]
🎯 REWARDS: [0.9666666666666667, 0.9333333333333332]
🎯 REWARDS: [0.9666666666666667, 

('./models/italian_v6_grpo_pilot/tokenizer_config.json',
 './models/italian_v6_grpo_pilot/special_tokens_map.json',
 './models/italian_v6_grpo_pilot/chat_template.jinja',
 './models/italian_v6_grpo_pilot/tokenizer.json')

In [11]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./models/italian_v6_grpo_pilot/checkpoint-400")
tokenizer = AutoTokenizer.from_pretrained("./models/italian_v6_grpo_pilot/checkpoint-400")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [12]:
import re
import json

# ========== 1️⃣ TEST REQUEST ==========
test_request = {
    "level": "A1",
    "grammar_focus": "past_tense",
    "topic": "buttons",
    "num_exercises": 2,
    "exercise_types": ["translation", "fill_in_blank"]
}

# ========== 2️⃣ FORMAT PROMPT ==========
test_prompt = format_prompt_with_chat_template(test_request, tokenizer)
print(f"PROMPT SENT TO MODEL:\n{test_prompt}\n")

inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)

# ========== 3️⃣ GENERATE ==========
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    eos_token_id=128009,  # IMPORTANT: Same as training
    pad_token_id=128009
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
print("🔍 FULL RAW MODEL OUTPUT:")
print(generated_text)
print("--------------------------------------------------")

# ========== 4️⃣ CLEAN & EXTRACT JSON ONLY ==========
# Use REGEX to extract ONLY the FIRST valid JSON array
matches = re.findall(r"\[\s*{.*?}\s*\]", generated_text, re.DOTALL)
if not matches:
    raise ValueError("❌ No valid JSON array found in output!")
json_text = matches[-1]  # Use the LAST block (real output)

if not matches:
    raise ValueError("❌ No valid JSON array found in output!")

print("✅ CLEANED JSON BLOCK EXTRACTED:")
print(json_text)
print("--------------------------------------------------")

# ========== 5️⃣ PARSE JSON SAFELY ==========
try:
    exercises = json.loads(json_text)
except json.JSONDecodeError as e:
    raise ValueError(f"❌ JSON parsing error: {e}")

if not isinstance(exercises, list):
    exercises = [exercises]

print(f"🎯 SUCCESS: Parsed {len(exercises)} exercises!")
print("--------------------------------------------------")

# ========== 6️⃣ SCORE OUTPUT ==========
for i, ex in enumerate(exercises):
    score, breakdown = reward_fn.score(ex, test_request)
    print(f"\n{'='*60}")
    print(f"📝 Exercise {i+1} Score: {score}/100")
    print(f"{'='*60}")
    print(f"Type: {ex.get('type')}")
    print(f"Question: {ex.get('question')}")
    print(f"Answer: {ex.get('correct_answer')}")
    if ex.get('options'):
        print(f"Options: {ex.get('options')}")
    print(f"\nDetails: {breakdown}")


PROMPT SENT TO MODEL:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert Italian language teacher. Generate high-quality exercises based on the assignment specification. Output exercises in JSON format.<|eot_id|><|start_header_id|>user<|end_header_id|>

Create exactly 2 Italian language exercises (#1, #2) in JSON format about 'buttons' focusing on past_tense.

REQUIREMENTS:
Level: A1
Topic: buttons
Grammar: past_tense
⚠️ MANDATORY: Use ONLY past tense (passato prossimo like 'ho fatto', 'sono andato' OR imperfetto like 'facevo', 'andavo'). NO present tense!
Exercise types: translation, fill_in_blank

CRITICAL RULES:
1. TOPIC: Every exercise MUST be about "buttons" - stay on topic throughout
2. REALISM: Use factual, natural scenarios appropriate for the topic
3. GRAMMAR: EVERY SINGLE exercise MUST test "past_tense" at A1 level
4. MULTIPLE CHOICE: Provide 4 DIFFERENT grammatical forms as options
5. CONSISTENCY: Do not mix different topics or introduce unrelated

## Test Generation