# Complete Training Pipeline: Question & Answer Generation Models

This notebook trains **2 specialized models** for logical reasoning:

1. **Question Generation Model** - Generates high-quality logical reasoning questions
2. **Answer Generation Model** - Solves logical reasoning questions with step-by-step reasoning

---

## 📊 Dataset Stats
- **Total Questions**: 638 (98.9% quality)
- **Blood Relations**: 401 (62.9%)
- **Seating Arrangement**: 237 (37.1%)
- **Split**: 574 train / 64 validation

---

## ⏱️ Expected Time
- Data Preparation: ~2 minutes
- Model 1 Training: ~15-30 minutes
- Model 2 Training: ~15-30 minutes
- **Total**: ~35-65 minutes

---
# Part 1: Setup & Data Preparation

In [None]:
# Install required packages (run once)
# !pip install torch transformers datasets trl peft unsloth

In [None]:
import os
import json
import torch
import random
from pathlib import Path
from datasets import Dataset
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, standardize_sharegpt, train_on_responses_only
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq

print("All packages imported successfully")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Configuration
CURATED_DIR = Path("/Users/777bhavyagoyal/Developer/UNSLOTHxAMDxHACk/MAIN_CURATED_JSON")
OUTPUT_DIR = Path("/Users/777bhavyagoyal/Developer/UNSLOTHxAMDxHACk/training_data")
MODELS_DIR = Path("/Users/777bhavyagoyal/Developer/UNSLOTHxAMDxHACk/models")

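# parents=True so a missing parent directory does not raise FileNotFoundError
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
MODELS_DIR.mkdir(parents=True, exist_ok=True)

print("✅ Directories created")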
print(f"Curated data: {CURATED_DIR}")
print(f"Training data: {OUTPUT_DIR}")
print(f"Models output: {MODELS_DIR}")
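
The paths above are absolute and tied to one machine. A minimal sketch of deriving the same three directories from a single root is shown here; `resolve_dirs` and the `PIPELINE_ROOT` environment variable are hypothetical names introduced for illustration, not part of the notebook.

```python
import os
from pathlib import Path

def resolve_dirs(base=None):
    """Build the three pipeline directories under one root.

    PIPELINE_ROOT is a hypothetical environment variable, not something
    the notebook defines. Falls back to the current working directory.
    """
    root = Path(base or os.environ.get("PIPELINE_ROOT") or Path.cwd())
    return {
        "curated": root / "MAIN_CURATED_JSON",
        "training": root / "training_data",
        "models": root / "models",
    }

# Example root, chosen only for illustration
dirs = resolve_dirs("/tmp/logic_pipeline")
for name, path in dirs.items():
    print(f"{name}: {path}")
```

With something like this, `CURATED_DIR`, `OUTPUT_DIR`, and `MODELS_DIR` could be assigned from the returned dictionary so the notebook runs unchanged on another machine.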

## 📊 Step 1: Load and Analyze Curated Data

In [None]:
# Load all curated questions
all_questions = []
curated_files = sorted(CURATED_DIR.glob("*.json"))

for file_path in curated_files:
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            questions = json.load(f)
            # Filter valid questions
            for q in questions:
                if len(q.get('choices', [])) == 4 and q.get('answer', '') in ['A', 'B', 'C', 'D']:
                    all_questions.append(q)
    except Exception as e:
        print(f"Error loading {file_path.name}: {e}")

print(f"\n📊 Loaded {len(all_questions)} valid questions from {len(curated_files)} files")

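# Show a sample (skipped if nothing was loaded, to avoid an IndexError)
if all_questions:
    print("\n📝 Sample question:")
    print(json.dumps(all_questions[0], indent=2))
else:
    print(f"\n⚠️ No valid questions found in {CURATED_DIR}")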
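
The step title mentions analysis, but the cell above only loads and filters. A minimal sketch of tallying questions by topic and difficulty follows, assuming each record carries the `topic` and `difficulty` keys that later cells read with `q.get(...)`. `summarize` and `sample_questions` are hypothetical names; in the notebook the function would be called on `all_questions`.

```python
from collections import Counter

def summarize(questions):
    """Count questions per topic and per difficulty.

    Uses the same fallback values as the later data-preparation cells.
    """
    topics = Counter(q.get("topic", "blood_relations") for q in questions)
    difficulties = Counter(q.get("difficulty", "medium") for q in questions)
    return topics, difficulties

# Hypothetical stand-in for all_questions
sample_questions = [
    {"topic": "blood_relations", "difficulty": "easy"},
    {"topic": "blood_relations", "difficulty": "hard"},
    {"topic": "seating_arrangement", "difficulty": "medium"},
    {"topic": "seating_arrangement"},  # missing difficulty falls back to "medium"
]

topics, difficulties = summarize(sample_questions)
total = len(sample_questions)
for name, count in topics.most_common():
    print(f"{name}: {count} ({count / total:.1%})")
for name, count in difficulties.most_common():
    print(f"{name}: {count}")
```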

---
# Part 2: Prepare Question Generation Data

**Model 1 Task**: Generate logical reasoning questions from scratch

**Input**: `"Generate a medium difficulty blood relations question with 4 multiple choice options, correct answer, explanation, and step-by-step reasoning."`

**Output**: Full question JSON with choices, answer, reasoning, explanation

In [None]:
# System prompt for Question Generation
QUESTION_GEN_SYSTEM_PROMPT = """You are an expert at creating high-quality logical reasoning questions for competitive exams and aptitude tests.

You specialize in two types of questions:

**1. Blood Relations**
- Family relationship puzzles involving complex kinship chains
- Include relationships like: father, mother, son, daughter, uncle, aunt, cousin, grandfather, grandmother, brother-in-law, sister-in-law
- Questions should be self-contained with all necessary information stated explicitly
- Example: "A is the father of B. C is the mother of B. D is C's sister. E is D's husband. How is E related to B?"

**2. Seating Arrangement**
- Spatial reasoning puzzles with people sitting in various configurations
- Types: linear rows, circular arrangements, parallel rows, square tables
- Include direction (facing north/south/center), positions (left/right, immediate/second/third), and constraints
- Questions should provide clear spatial relationships and ask about deducible positions

**Quality Requirements:**
✓ Self-contained: Include all facts needed to solve the question in the question itself
✓ Clear and unambiguous: No vague or confusing statements
✓ Exactly 4 choices: Always provide options A, B, C, D
✓ Unique correct answer: Only one option should be definitively correct
✓ Step-by-step reasoning: Provide 5 clear logical steps showing how to reach the answer
✓ Concise explanation: Brief summary of why the answer is correct
✓ Appropriate difficulty: Match the requested difficulty level (easy/medium/hard)

**Output Format:**
Return a valid JSON object with these exact fields:
{
  "topic": "blood_relations" or "seating_arrangement",
  "question": "<the question text with all context>",
  "choices": ["A) <option>", "B) <option>", "C) <option>", "D) <option>"],
  "answer": "A" or "B" or "C" or "D",
  "explanation": "<brief explanation of the answer>",
  "reasoning": "Step 1: ... Step 2: ... Step 3: ... Step 4: ... Step 5: ...",
  "difficulty": "easy" or "medium" or "hard"
}

Generate questions that would challenge students preparing for competitive exams while being solvable through logical reasoning."""

print("✅ Question Generation System Prompt Created")
print(f"\nPrompt length: {len(QUESTION_GEN_SYSTEM_PROMPT)} characters")

In [None]:
# Create Question Generation examples
question_gen_examples = []

for q in all_questions:
    topic = q.get('topic', 'blood_relations')
    difficulty = q.get('difficulty', 'medium')
    
    conversation = [
        {
            "role": "system",
            "content": QUESTION_GEN_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": f"Generate a {difficulty} difficulty {topic.replace('_', ' ')} question with 4 multiple choice options, correct answer, explanation, and step-by-step reasoning."
        },
        {
            "role": "assistant",
            "content": json.dumps(q, indent=2, ensure_ascii=False)
        }
    ]
    
    question_gen_examples.append({"conversations": conversation})

print(f"✅ Created {len(question_gen_examples)} question generation examples")

# Shuffle and split
random.seed(42)
random.shuffle(question_gen_examples)
split_idx = int(len(question_gen_examples) * 0.9)

qgen_train = question_gen_examples[:split_idx]
qgen_val = question_gen_examples[split_idx:]

print(f"\nSplit:")
print(f"  Train: {len(qgen_train)}")
print(f"  Val: {len(qgen_val)}")

# Save
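# encoding="utf-8" is needed because ensure_ascii=False writes non-ASCII characters as-is
with open(OUTPUT_DIR / "question_gen_train.json", 'w', encoding="utf-8") as f:
    json.dump(qgen_train, f, indent=2, ensure_ascii=False)

with open(OUTPUT_DIR / "question_gen_val.json", 'w', encoding="utf-8") as f:
    json.dump(qgen_val, f, indent=2, ensure_ascii=False)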

print(f"\n✅ Saved question generation data to {OUTPUT_DIR}")
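
Because the assistant turn is the raw question JSON, any record that deviates from the schema in the system prompt becomes a training target as-is. A minimal sketch of a schema check is given below; `validate_question` and the two sample records are hypothetical, and the field list is taken from the "Output Format" section of the system prompt.

```python
REQUIRED_FIELDS = (
    "topic", "question", "choices", "answer",
    "explanation", "reasoning", "difficulty",
)

def validate_question(q):
    """Return a list of problems found in one record (empty list means it passes)."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in q]
    if q.get("topic") not in ("blood_relations", "seating_arrangement"):
        problems.append("unknown topic")
    choices = q.get("choices", [])
    if not isinstance(choices, list) or len(choices) != 4:
        problems.append("choices must be a list of 4")
    if q.get("answer") not in ("A", "B", "C", "D"):
        problems.append("answer must be A, B, C, or D")
    if q.get("difficulty") not in ("easy", "medium", "hard"):
        problems.append("unknown difficulty")
    return problems

# Hypothetical records for illustration only
good = {
    "topic": "blood_relations",
    "question": "A is the father of B. C is the mother of B. D is C's sister. "
                "E is D's husband. How is E related to B?",
    "choices": ["A) Uncle", "B) Father", "C) Brother", "D) Cousin"],
    "answer": "A",
    "explanation": "E is married to B's maternal aunt.",
    "reasoning": "Step 1: C is B's mother. Step 2: D is C's sister, so B's aunt. "
                 "Step 3: E is D's husband, so B's uncle.",
    "difficulty": "easy",
}
bad = dict(good, answer="E", choices=["A) Uncle", "B) Father"])

print(validate_question(good))
print(validate_question(bad))
```

The same function could also be run on model outputs at inference time, after `json.loads`, to reject malformed generations.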

---
# Part 3: Prepare Answer Generation Data

**Model 2 Task**: Solve logical reasoning questions with step-by-step reasoning

**Input**: Question text + choices

**Output**: Detailed reasoning + explanation + answer

In [None]:
# System prompt for Answer Generation
ANSWER_GEN_SYSTEM_PROMPT = """You are an expert at solving logical reasoning questions using systematic step-by-step analysis.

You excel at two types of logical reasoning:

**1. Blood Relations**
- Carefully track each family relationship mentioned
- Build a mental family tree to visualize connections
- Identify the relationship chain from person A to person B
- Consider both direct relationships and relationships through marriage
- Common relationships: parent, child, sibling, uncle/aunt, cousin, in-law, grandparent

**2. Seating Arrangement**
- Note the arrangement type (linear row, circle, parallel rows, etc.)
- Track explicit position information ("sits at position 3", "at extreme end", etc.)
- Track relative positions ("left of", "right of", "opposite to", "between", etc.)
- Consider facing direction when specified
- Use process of elimination to deduce unknown positions
- Verify final arrangement satisfies all given constraints

**Solution Approach:**

1. **Read Carefully**: Identify all given facts and relationships
2. **Organize Information**: Create a mental model (family tree or seating diagram)
3. **Apply Logic**: Use deduction to find new relationships or positions
4. **Verify**: Check that your answer satisfies all given constraints
5. **Explain**: Provide clear step-by-step reasoning

**Output Format:**

**Reasoning:**
Step 1: [First logical step]
Step 2: [Second logical step]
Step 3: [Third logical step]
Step 4: [Fourth logical step]
Step 5: [Conclusion]

**Explanation:**
[Brief summary of why this answer is correct]

**Answer:** [A/B/C/D]

Always provide clear, logical reasoning that anyone can follow to understand how you arrived at the correct answer."""

print("✅ Answer Generation System Prompt Created")
print(f"\nPrompt length: {len(ANSWER_GEN_SYSTEM_PROMPT)} characters")

In [None]:
# Helper functions for answer generation format
def format_question_with_choices(question, choices):
    """Format question with choices for display"""
    formatted = f"{question}\n\nChoices:\n"
    for choice in choices:
        formatted += f"{choice}\n"
    return formatted.strip()

def format_answer(answer, reasoning, explanation):
    """Format the answer with reasoning and explanation"""
    response = f"**Reasoning:**\n{reasoning}\n\n"
    response += f"**Explanation:**\n{explanation}\n\n"
    response += f"**Answer:** {answer}"
    return response

# Create Answer Generation examples
answer_gen_examples = []

for q in all_questions:
    conversation = [
        {
            "role": "system",
            "content": ANSWER_GEN_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": format_question_with_choices(
                q.get('question', ''),
                q.get('choices', [])
            )
        },
        {
            "role": "assistant",
            "content": format_answer(
                q.get('answer', ''),
                q.get('reasoning', ''),
                q.get('explanation', '')
            )
        }
    ]
    
    answer_gen_examples.append({"conversations": conversation})

print(f"✅ Created {len(answer_gen_examples)} answer generation examples")

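# Shuffle and split
# Same seed and list length as in Part 2, so the permutation matches
# and both models hold out the same questions for validation
random.seed(42)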
random.shuffle(answer_gen_examples)
split_idx = int(len(answer_gen_examples) * 0.9)

agen_train = answer_gen_examples[:split_idx]
agen_val = answer_gen_examples[split_idx:]

print(f"\nSplit:")
print(f"  Train: {len(agen_train)}")
print(f"  Val: {len(agen_val)}")

# Save
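# encoding="utf-8" is needed because ensure_ascii=False writes non-ASCII characters as-is
with open(OUTPUT_DIR / "answer_gen_train.json", 'w', encoding="utf-8") as f:
    json.dump(agen_train, f, indent=2, ensure_ascii=False)

with open(OUTPUT_DIR / "answer_gen_val.json", 'w', encoding="utf-8") as f:
    json.dump(agen_val, f, indent=2, ensure_ascii=False)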

print(f"\n✅ Saved answer generation data to {OUTPUT_DIR}")

# Show sample
print("\n📝 Sample Answer Generation Example:")
print(json.dumps(agen_train[0]['conversations'], indent=2)[:500] + "...")
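
Since `format_answer` always ends the response with a `**Answer:** X` line, the predicted letter can be recovered from a model response with a small parser. The sketch below is one possible approach; `extract_answer` and `sample_response` are hypothetical names, and the regular expression assumes the model reproduces the training format exactly.

```python
import re

# Matches the final line produced by format_answer, e.g. "**Answer:** B"
ANSWER_RE = re.compile(r"\*\*Answer:\*\*\s*([ABCD])\b")

def extract_answer(response):
    """Return the last answer letter found in a response, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None

# Hypothetical response in the training format
sample_response = (
    "**Reasoning:**\n"
    "Step 1: D is the sister of C, who is B's mother.\n"
    "Step 2: So D is B's maternal aunt, and her husband E is B's uncle.\n\n"
    "**Explanation:**\n"
    "E is married to B's maternal aunt.\n\n"
    "**Answer:** A"
)

print(extract_answer(sample_response))
print(extract_answer("The model forgot the format."))
```

Comparing the extracted letter with each question's `answer` field gives a simple accuracy measure on the validation split.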

---
# Part 4: Train Question Generation Model

**Model**: Llama-3.2-3B-Instruct  
**Task**: Generate logical reasoning questions  
**Training Time**: ~15-30 minutes

In [None]:
# Configuration for Question Generation Model
QGEN_CONFIG = {
    "model_name": "unsloth/Llama-3.2-3B-Instruct",
    "max_seq_length": 2048,  # Longer for full question JSON
    "lora_r": 32,
    "lora_alpha": 32,
    "batch_size": 4,
    "gradient_accumulation": 4,
    "learning_rate": 2e-4,
    "num_epochs": 3,
    "warmup_steps": 10,
}

print("📋 Question Generation Model Configuration:")
for key, value in QGEN_CONFIG.items():
    print(f"  {key}: {value}")
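
The batch size and gradient accumulation settings together determine how many optimizer updates the run performs. The sketch below estimates this for the 574 training examples listed in the dataset stats; `optimizer_steps` is a hypothetical helper, a single GPU is assumed, and the exact count reported by the trainer may differ slightly depending on how the installed `transformers` version rounds partial accumulation windows.

```python
import math

def optimizer_steps(num_examples, batch_size, grad_accum, epochs, num_devices=1):
    """Estimate the effective batch size and total optimizer updates."""
    effective_batch = batch_size * grad_accum * num_devices
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    steps_per_epoch = max(1, math.ceil(batches_per_epoch / grad_accum))
    return effective_batch, steps_per_epoch * epochs

effective, total_steps = optimizer_steps(
    num_examples=574,  # train split size from the dataset stats
    batch_size=4,
    grad_accum=4,
    epochs=3,
)
print(f"effective batch size: {effective}")
print(f"estimated optimizer steps: {total_steps}")
```

Under these assumptions, the 10 warmup steps cover roughly the first 9% of training.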

In [None]:
# Load model and tokenizer
print("\n📥 Loading Question Generation base model...")

qgen_model, qgen_tokenizer = FastLanguageModel.from_pretrained(
    model_name=QGEN_CONFIG["model_name"],
    max_seq_length=QGEN_CONFIG["max_seq_length"],
    dtype=torch.bfloat16,
    load_in_4bit=False,
    device_map="auto",
    trust_remote_code=True,
)

print(f"✅ Loaded {QGEN_CONFIG['model_name']}")

# Add LoRA adapters
print("\n🔧 Adding LoRA adapters...")

qgen_model = FastLanguageModel.get_peft_model(
    qgen_model,
    r=QGEN_CONFIG["lora_r"],
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=QGEN_CONFIG["lora_alpha"],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print(f"✅ LoRA adapters added (r={QGEN_CONFIG['lora_r']})")
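
To get a feel for how many parameters the adapters add, recall that LoRA attaches two low-rank matrices to each targeted linear layer, with `r * (in_features + out_features)` weights in total. The sketch below applies that formula to the seven target modules. The layer dimensions are assumptions about Llama-3.2-3B and should be checked against the model's `config.json`; `lora_params` and `total_lora_params` are hypothetical helpers.

```python
def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) to one linear layer."""
    return r * (in_features + out_features)

# Assumed Llama-3.2-3B dimensions (verify against the model's config.json)
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 3072, 8192, 1024, 28

# (in_features, out_features) for each target module in one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def total_lora_params(r):
    """Sum adapter parameters over all target modules and layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in SHAPES.values())
    return per_layer * LAYERS

print(f"r=32 adapter parameters: {total_lora_params(32):,}")
print(f"LoRA scaling (lora_alpha / r): {32 / 32}")
```

With `lora_alpha` equal to `r`, as configured above, the adapter output is applied with a scaling factor of 1.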

In [None]:
# Prepare datasets
print("\n📊 Loading question generation datasets...")

qgen_train_dataset = Dataset.from_list(qgen_train)
qgen_val_dataset = Dataset.from_list(qgen_val)

print(f"Train: {len(qgen_train_dataset)} examples")
print(f"Val: {len(qgen_val_dataset)} examples")

# Set chat template ("llama-3.1" is the Unsloth template name also used for Llama 3.2 models)
qgen_tokenizer = get_chat_template(qgen_tokenizer, chat_template="llama-3.1")

if qgen_tokenizer.pad_token is None:
    qgen_tokenizer.pad_token = qgen_tokenizer.eos_token
    qgen_tokenizer.pad_token_id = qgen_tokenizer.eos_token_id

# Formatting function
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    for convo in convos:
        if isinstance(convo, list):
            text = qgen_tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
            texts.append(text)
    return {"text": texts}

print("\n🔧 Formatting datasets...")

qgen_train_dataset = standardize_sharegpt(qgen_train_dataset)
qgen_train_dataset = qgen_train_dataset.map(formatting_prompts_func, batched=True, remove_columns=qgen_train_dataset.column_names)
qgen_train_dataset = qgen_train_dataset.filter(lambda x: len(x["text"].strip()) > 0)

qgen_val_dataset = standardize_sharegpt(qgen_val_dataset)
qgen_val_dataset = qgen_val_dataset.map(formatting_prompts_func, batched=True, remove_columns=qgen_val_dataset.column_names)
qgen_val_dataset = qgen_val_dataset.filter(lambda x: len(x["text"].strip()) > 0)

print(f"✅ Formatted {len(qgen_train_dataset)} train + {len(qgen_val_dataset)} val examples")

In [None]:
# Setup trainer
print("\n🚀 Setting up Question Generation trainer...")

qgen_output_dir = MODELS_DIR / "question_gen_model"
qgen_output_dir.mkdir(exist_ok=True)

qgen_trainer = SFTTrainer(
    model=qgen_model,
    tokenizer=qgen_tokenizer,
    train_dataset=qgen_train_dataset,
    eval_dataset=qgen_val_dataset,
    dataset_text_field="text",
    max_seq_length=QGEN_CONFIG["max_seq_length"],
    data_collator=DataCollatorForSeq2Seq(tokenizer=qgen_tokenizer, padding=True),
    packing=False,
    args=SFTConfig(
        per_device_train_batch_size=QGEN_CONFIG["batch_size"],
        per_device_eval_batch_size=QGEN_CONFIG["batch_size"],
        gradient_accumulation_steps=QGEN_CONFIG["gradient_accumulation"],
        warmup_steps=QGEN_CONFIG["warmup_steps"],
        num_train_epochs=QGEN_CONFIG["num_epochs"],
        learning_rate=QGEN_CONFIG["learning_rate"],
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir=str(qgen_output_dir / "checkpoints"),
        report_to="none",
        bf16=True,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
)

# Train only on responses
qgen_trainer = train_on_responses_only(
    qgen_trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)

print("✅ Trainer configured")
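
Training only on responses means the loss is computed on assistant tokens alone, while system and user tokens are given the label `-100` so that cross-entropy ignores them. The sketch below illustrates that idea on plain lists of token ids. It is not Unsloth's implementation of `train_on_responses_only`; `mask_non_response` and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default

def mask_non_response(input_ids, response_marker, instruction_marker):
    """Copy input_ids into labels, keeping only tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    in_response = False
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + len(response_marker)] == response_marker:
            in_response = True
            i += len(response_marker)  # the marker itself stays masked
            continue
        if input_ids[i:i + len(instruction_marker)] == instruction_marker:
            in_response = False
            i += len(instruction_marker)  # the marker itself stays masked
            continue
        if in_response:
            labels[i] = input_ids[i]
        i += 1
    return labels

# Toy ids: 1 stands for the user header, 2 for the assistant header
ids = [1, 10, 11, 2, 20, 21, 22]
labels = mask_non_response(ids, response_marker=[2], instruction_marker=[1])
print(labels)
```

Anything before the first marker, such as the system prompt, is masked as well because masking is the default state.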

In [None]:
# Train Question Generation Model
print("\n" + "="*80)
print("🏋️  TRAINING QUESTION GENERATION MODEL")
print("="*80)
print("\nThis will take ~15-30 minutes. Go grab a coffee! ☕\n")

FastLanguageModel.for_training(qgen_model)
qgen_stats = qgen_trainer.train()

print("\n" + "="*80)
print("✅ QUESTION GENERATION TRAINING COMPLETE")
print("="*80)

In [None]:
# Save Question Generation Model
print("\n💾 Saving Question Generation model...")

qgen_lora_path = qgen_output_dir / "lora"
qgen_merged_path = qgen_output_dir / "merged"

qgen_lora_path.mkdir(exist_ok=True)
qgen_merged_path.mkdir(exist_ok=True)

# Save LoRA adapters
qgen_model.save_pretrained(str(qgen_lora_path))
qgen_tokenizer.save_pretrained(str(qgen_lora_path))
print(f"✅ LoRA adapters saved to: {qgen_lora_path}")

# Save merged model
qgen_model.save_pretrained_merged(str(qgen_merged_path), qgen_tokenizer, save_method="merged_16bit")
print(f"✅ Merged model saved to: {qgen_merged_path}")

print("\n🎉 Question Generation Model Ready!")

---
# Part 5: Train Answer Generation Model

**Model**: Llama-3.2-3B-Instruct  
**Task**: Solve logical reasoning questions  
**Training Time**: ~15-30 minutes

In [None]:
# Free up memory from Question Gen model
import gc
del qgen_model, qgen_tokenizer, qgen_trainer
gc.collect()
torch.cuda.empty_cache()
print("✅ Memory cleared")

In [None]:
# Configuration for Answer Generation Model
AGEN_CONFIG = {
    "model_name": "unsloth/Llama-3.2-3B-Instruct",
    "max_seq_length": 1536,  # Moderate for Q&A
    "lora_r": 32,
    "lora_alpha": 32,
    "batch_size": 8,
    "gradient_accumulation": 2,
    "learning_rate": 2e-4,
    "num_epochs": 3,
    "warmup_steps": 10,
}

print("📋 Answer Generation Model Configuration:")
for key, value in AGEN_CONFIG.items():
    print(f"  {key}: {value}")

In [None]:
# Load model and tokenizer
print("\n📥 Loading Answer Generation base model...")

agen_model, agen_tokenizer = FastLanguageModel.from_pretrained(
    model_name=AGEN_CONFIG["model_name"],
    max_seq_length=AGEN_CONFIG["max_seq_length"],
    dtype=torch.bfloat16,
    load_in_4bit=False,
    device_map="auto",
    trust_remote_code=True,
)

print(f"✅ Loaded {AGEN_CONFIG['model_name']}")

# Add LoRA adapters
print("\n🔧 Adding LoRA adapters...")

agen_model = FastLanguageModel.get_peft_model(
    agen_model,
    r=AGEN_CONFIG["lora_r"],
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=AGEN_CONFIG["lora_alpha"],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print(f"✅ LoRA adapters added (r={AGEN_CONFIG['lora_r']})")

In [None]:
# Prepare datasets
print("\n📊 Loading answer generation datasets...")

agen_train_dataset = Dataset.from_list(agen_train)
agen_val_dataset = Dataset.from_list(agen_val)

print(f"Train: {len(agen_train_dataset)} examples")
print(f"Val: {len(agen_val_dataset)} examples")

# Set chat template ("llama-3.1" is the Unsloth template name also used for Llama 3.2 models)
agen_tokenizer = get_chat_template(agen_tokenizer, chat_template="llama-3.1")

if agen_tokenizer.pad_token is None:
    agen_tokenizer.pad_token = agen_tokenizer.eos_token
    agen_tokenizer.pad_token_id = agen_tokenizer.eos_token_id

# Formatting function
def formatting_prompts_func_agen(examples):
    convos = examples["conversations"]
    texts = []
    for convo in convos:
        if isinstance(convo, list):
            text = agen_tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
            texts.append(text)
    return {"text": texts}

print("\n🔧 Formatting datasets...")

agen_train_dataset = standardize_sharegpt(agen_train_dataset)
agen_train_dataset = agen_train_dataset.map(formatting_prompts_func_agen, batched=True, remove_columns=agen_train_dataset.column_names)
agen_train_dataset = agen_train_dataset.filter(lambda x: len(x["text"].strip()) > 0)

agen_val_dataset = standardize_sharegpt(agen_val_dataset)
agen_val_dataset = agen_val_dataset.map(formatting_prompts_func_agen, batched=True, remove_columns=agen_val_dataset.column_names)
agen_val_dataset = agen_val_dataset.filter(lambda x: len(x["text"].strip()) > 0)

print(f"✅ Formatted {len(agen_train_dataset)} train + {len(agen_val_dataset)} val examples")

In [None]:
# Setup trainer
print("\n🚀 Setting up Answer Generation trainer...")

agen_output_dir = MODELS_DIR / "answer_gen_model"
agen_output_dir.mkdir(exist_ok=True)

agen_trainer = SFTTrainer(
    model=agen_model,
    tokenizer=agen_tokenizer,
    train_dataset=agen_train_dataset,
    eval_dataset=agen_val_dataset,
    dataset_text_field="text",
    max_seq_length=AGEN_CONFIG["max_seq_length"],
    data_collator=DataCollatorForSeq2Seq(tokenizer=agen_tokenizer, padding=True),
    packing=False,
    args=SFTConfig(
        per_device_train_batch_size=AGEN_CONFIG["batch_size"],
        per_device_eval_batch_size=AGEN_CONFIG["batch_size"],
        gradient_accumulation_steps=AGEN_CONFIG["gradient_accumulation"],
        warmup_steps=AGEN_CONFIG["warmup_steps"],
        num_train_epochs=AGEN_CONFIG["num_epochs"],
        learning_rate=AGEN_CONFIG["learning_rate"],
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir=str(agen_output_dir / "checkpoints"),
        report_to="none",
        bf16=True,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
)

# Train only on responses
agen_trainer = train_on_responses_only(
    agen_trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)

print("✅ Trainer configured")
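
The trainer combines `warmup_steps=10` with `lr_scheduler_type="cosine"`, so the learning rate rises linearly to its peak and then decays along a half cosine. The sketch below reproduces that shape in plain Python. `lr_at_step` is a hypothetical helper, and the total of 108 steps is an estimate (574 examples, batch size 8, accumulation 2, 3 epochs, one GPU) rather than a value read from the trainer.

```python
import math

def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Estimated: ceil(574 / 8) = 72 batches per epoch, / 2 accumulation = 36 updates, x 3 epochs
TOTAL_STEPS = 108

for step in (0, 5, 10, 59, 108):
    print(step, f"{lr_at_step(step, 2e-4, 10, TOTAL_STEPS):.2e}")
```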

In [None]:
# Train Answer Generation Model
print("\n" + "="*80)
print("🏋️  TRAINING ANSWER GENERATION MODEL")
print("="*80)
print("\nThis will take ~15-30 minutes. Another coffee? ☕\n")

FastLanguageModel.for_training(agen_model)
agen_stats = agen_trainer.train()

print("\n" + "="*80)
print("✅ ANSWER GENERATION TRAINING COMPLETE")
print("="*80)

In [None]:
# Save Answer Generation Model
print("\n💾 Saving Answer Generation model...")

agen_lora_path = agen_output_dir / "lora"
agen_merged_path = agen_output_dir / "merged"

agen_lora_path.mkdir(exist_ok=True)
agen_merged_path.mkdir(exist_ok=True)

# Save LoRA adapters
agen_model.save_pretrained(str(agen_lora_path))
agen_tokenizer.save_pretrained(str(agen_lora_path))
print(f"✅ LoRA adapters saved to: {agen_lora_path}")

# Save merged model
agen_model.save_pretrained_merged(str(agen_merged_path), agen_tokenizer, save_method="merged_16bit")
print(f"✅ Merged model saved to: {agen_merged_path}")

print("\n🎉 Answer Generation Model Ready!")

---
# 🎉 Training Complete!

## Your Models

### Question Generation Model
- **LoRA**: `models/question_gen_model/lora/`
- **Merged**: `models/question_gen_model/merged/` ⭐

### Answer Generation Model
- **LoRA**: `models/answer_gen_model/lora/`
- **Merged**: `models/answer_gen_model/merged/` ⭐

## Next Steps

1. ✅ Test your models (see inference notebook)
2. ✅ Deploy for your hackathon project
3. ✅ Generate new questions and solve them!

## System Prompts

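The system prompts used for training are defined above as `QUESTION_GEN_SYSTEM_PROMPT` (Part 2) and `ANSWER_GEN_SYSTEM_PROMPT` (Part 3). Use the same prompts at inference time, since every training example was built with them.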
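
As a minimal sketch of reusing the prompts, the helpers below assemble an inference request in the same system/user layout as the training conversations. `build_messages` and `question_request` are hypothetical names, and the placeholder string stands in for the real system prompt from Part 2.

```python
def build_messages(system_prompt, user_content):
    """Assemble a chat in the same system/user layout as the training examples."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]

def question_request(topic, difficulty):
    """Mirror the user-turn wording from the question generation training cell."""
    return (
        f"Generate a {difficulty} difficulty {topic.replace('_', ' ')} question "
        "with 4 multiple choice options, correct answer, explanation, "
        "and step-by-step reasoning."
    )

# Placeholder text; at inference time pass QUESTION_GEN_SYSTEM_PROMPT from Part 2
messages = build_messages(
    "<system prompt from Part 2>",
    question_request("seating_arrangement", "hard"),
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

The resulting list can be passed to the tokenizer's `apply_chat_template` with `add_generation_prompt=True`, so that the rendered prompt ends with the assistant header and the model continues from there.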

---

**Congratulations!** 🎊 You now have two specialized models for logical reasoning!