# Instruction Tuning for AG News Classification

## Overview

This notebook demonstrates instruction tuning techniques following methodologies from:
- Wei et al. (2022): "Finetuned Language Models Are Zero-Shot Learners"
- Chung et al. (2022): "Scaling Instruction-Finetuned Language Models"
- Wang et al. (2022): "Self-Instruct: Aligning Language Models with Self-Generated Instructions"

### Tutorial Objectives
1. Create instruction-following datasets
2. Design effective instruction formats
3. Fine-tune models with instructions
4. Implement multi-task instruction learning
5. Evaluate instruction-following capabilities
6. Deploy instruction-tuned models

Author: Võ Hải Dũng  
Email: vohaidung.work@gmail.com  
Date: 2025

## 1. Environment Setup

In [None]:
# Standard library imports
import sys
import json
import random
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Any, Union
from dataclasses import dataclass
import warnings

# Data and ML imports
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    T5ForConditionalGeneration,
    Trainer,
    TrainingArguments
)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm

# Project imports
PROJECT_ROOT = Path("../..").resolve()
sys.path.insert(0, str(PROJECT_ROOT))

from src.data.datasets.ag_news import AGNewsDataset, AGNewsConfig
from src.data.datasets.prompted_dataset import PromptedDataset
from src.models.prompt_based.instruction_model import InstructionTunedModel
from src.training.trainers.instruction_trainer import InstructionTrainer
from src.training.strategies.distillation.gpt4_distill import GPT4DistillationStrategy
from src.utils.reproducibility import set_seed
from src.utils.logging_config import setup_logging
from configs.config_loader import ConfigLoader
from configs.constants import (
    AG_NEWS_CLASSES,
    AG_NEWS_NUM_CLASSES,
    DATA_DIR,
    MODEL_DIR,
    OUTPUT_DIR
)

# Setup
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
set_seed(42)
logger = setup_logging('instruction_tuning_tutorial')

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

## 2. Load Configuration

In [None]:
# Load instruction tuning configuration
config_loader = ConfigLoader()

# Load instruction tuning config
instruction_config = config_loader.load_config('training/advanced/instruction_tuning.yaml')

# Tutorial configuration
tutorial_config = {
    'max_samples': 1000,
    'batch_size': 4,
    'num_epochs': 3,
    'learning_rate': 5e-5,
    'max_source_length': 512,
    'max_target_length': 64,
    'model_name': 't5-base',  # Using T5 for instruction tuning
    'instruction_templates': 10,
    'task_diversity': 5,
    'use_explanations': True
}

print("Instruction Tuning Configuration:")
print("="*50)
for key, value in tutorial_config.items():
    print(f"  {key}: {value}")

## 3. Data Loading and Preparation

In [None]:
# Load AG News dataset
data_config = AGNewsConfig(
    data_dir=DATA_DIR / "processed",
    max_samples=tutorial_config['max_samples'],
    use_cache=True
)

print("Loading datasets...")
train_dataset = AGNewsDataset(data_config, split="train")
val_dataset = AGNewsDataset(data_config, split="validation")

print(f"\nDataset loaded:")
print(f"  Train samples: {len(train_dataset)}")
print(f"  Validation samples: {len(val_dataset)}")

# Sample data
sample_idx = 0
sample_text = train_dataset.texts[sample_idx]
sample_label = train_dataset.labels[sample_idx]

print(f"\nSample data:")
print(f"  Label: {sample_label} ({AG_NEWS_CLASSES[sample_label]})")
print(f"  Text: {sample_text[:200]}...")
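Because `max_samples` caps the training split at 1,000 articles, it is worth checking that the four classes stay roughly balanced before building instructions. Below is a minimal stdlib sketch. It assumes labels are integer indices into a list of class names such as `AG_NEWS_CLASSES`. The `classes` list is a hypothetical stand-in for that constant, and `label_distribution` is an illustrative helper, not project code.

```python
from collections import Counter
from typing import Dict, Sequence


def label_distribution(labels: Sequence[int], class_names: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of examples per class name (classes with no examples get 0.0)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        name: counts.get(idx, 0) / total if total else 0.0
        for idx, name in enumerate(class_names)
    }


# Hypothetical stand-in for AG_NEWS_CLASSES; use the project constant in the notebook
classes = ["World", "Sports", "Business", "Science/Technology"]
print(label_distribution([0, 1, 1, 2, 3, 3, 3, 0], classes))
```

In the notebook, the equivalent call would be `label_distribution(train_dataset.labels, AG_NEWS_CLASSES)`.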

## 4. Instruction Dataset Creation

In [None]:
@dataclass
class InstructionExample:
    """
    Data structure for instruction-tuning examples.
    
    Following instruction format from:
        Longpre et al. (2023): "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning"
    """
    instruction: str
    input_text: str
    output: str
    explanation: Optional[str] = None
    task_type: str = "classification"


class InstructionDatasetCreator:
    """
    Create diverse instruction-tuning datasets.
    
    Following dataset creation strategies from:
        Wang et al. (2022): "Super-NaturalInstructions: Generalization via Declarative Instructions"
    """
    
    def __init__(self, base_dataset: AGNewsDataset):
        self.base_dataset = base_dataset
        self.instruction_templates = self._create_instruction_templates()
        self.task_variations = self._create_task_variations()
    
    def _create_instruction_templates(self) -> List[str]:
        """Create diverse instruction templates."""
        return [
            "Classify the following news article into one of four categories: {categories}.",
            "Determine which category this news article belongs to. Options: {categories}.",
            "Read the article and identify its news category from: {categories}.",
            "What type of news is this? Choose from: {categories}.",
            "Categorize this news article. Available categories are: {categories}.",
            "Given a news article, classify it as one of: {categories}.",
            "Identify the topic of this news article from: {categories}.",
            "Label this news article with the appropriate category: {categories}.",
            "Assign this article to one of these sections: {categories}.",
            "Based on its content, which of these categories fits this article: {categories}?"
        ]
    
    def _create_task_variations(self) -> List[Dict[str, Any]]:
        """Create task variations for multi-task learning."""
        return [
            {
                'name': 'standard_classification',
                'format': 'category_only',
                'requires_explanation': False
            },
            {
                'name': 'explained_classification',
                'format': 'category_with_reason',
                'requires_explanation': True
            },
            {
                'name': 'binary_verification',
                'format': 'yes_no',
                'requires_explanation': False
            },
            {
                'name': 'topic_extraction',
                'format': 'keywords',
                'requires_explanation': False
            },
            {
                'name': 'confidence_classification',
                'format': 'category_with_confidence',
                'requires_explanation': False
            }
        ]
    
    def create_instruction_example(
        self,
        idx: int,
        task_variation: Optional[str] = None
    ) -> InstructionExample:
        """Create a single instruction example."""
        text = self.base_dataset.texts[idx]
        label = self.base_dataset.labels[idx]
        category = AG_NEWS_CLASSES[label]
        
        # Select random instruction template
        instruction_template = random.choice(self.instruction_templates)
        # List exactly the class names the model is trained to output
        instruction = instruction_template.format(
            categories=", ".join(AG_NEWS_CLASSES)
        )
        
        # Select task variation
        if task_variation is None:
            task = random.choice(self.task_variations)
        else:
            task = next(t for t in self.task_variations if t['name'] == task_variation)
        
        # Format output based on task
        if task['format'] == 'category_only':
            output = category
            explanation = None
        elif task['format'] == 'category_with_reason':
            output = category
            explanation = self._generate_explanation(text, category)
        elif task['format'] == 'yes_no':
            target_category = random.choice(AG_NEWS_CLASSES)
            instruction = f"Is this a {target_category} article? Answer yes or no."
            output = "yes" if category == target_category else "no"
            explanation = None
        elif task['format'] == 'keywords':
            instruction = "Extract the main topic keywords from this news article."
            output = self._extract_keywords(text, category)
            explanation = None
        else:  # category_with_confidence
            # Placeholder: confidence is not estimated, so every target reads "high"
            output = f"{category} (confidence: high)"
            explanation = None
        
        return InstructionExample(
            instruction=instruction,
            input_text=text[:500],  # Truncate for efficiency
            output=output,
            explanation=explanation,
            task_type=task['name']
        )
    
    def _generate_explanation(self, text: str, category: str) -> str:
        """Generate explanation for classification."""
        explanations = {
            'World': "This article discusses international events, politics, or global affairs.",
            'Sports': "This article covers athletic competitions, sports teams, or athletes.",
            'Business': "This article focuses on companies, markets, economy, or finance.",
            'Science/Technology': "This article reports on scientific research, technology, or innovations."
        }
        return explanations.get(category, "Based on the content and context.")
    
    def _extract_keywords(self, text: str, category: str) -> str:
        """Extract keywords based on category."""
        # Simplified keyword extraction
        category_keywords = {
            'World': "politics, international, government, country",
            'Sports': "game, team, player, championship",
            'Business': "company, market, revenue, investment",
            'Science/Technology': "research, technology, innovation, discovery"
        }
        return category_keywords.get(category, "news, article, report")


# Create instruction dataset
instruction_creator = InstructionDatasetCreator(train_dataset)

# Generate examples
print("Creating Instruction Examples:")
print("="*50)

instruction_examples = []
for i in range(5):  # Create 5 examples for demonstration
    example = instruction_creator.create_instruction_example(i)
    instruction_examples.append(example)
    
    if i == 0:  # Display first example
        print(f"\nExample 1:")
        print(f"  Instruction: {example.instruction}")
        print(f"  Input: {example.input_text[:100]}...")
        print(f"  Output: {example.output}")
        if example.explanation:
            print(f"  Explanation: {example.explanation}")
        print(f"  Task type: {example.task_type}")

# Task distribution
task_types = [ex.task_type for ex in instruction_examples]
print(f"\nTask type distribution:")
for task_type in set(task_types):
    count = task_types.count(task_type)
    print(f"  {task_type}: {count}")
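In `create_instruction_example`, the `yes_no` variation draws `target_category` uniformly from all four classes. On average, then, only one in four binary-verification targets is "yes". If that imbalance matters, one option is to pick the true category with a fixed probability. Below is a minimal sketch. `make_binary_verification` and `p_positive` are hypothetical names that are not part of the project code.

```python
import random
from typing import Sequence, Tuple


def make_binary_verification(
    true_category: str,
    classes: Sequence[str],
    rng: random.Random,
    p_positive: float = 0.5,
) -> Tuple[str, str]:
    """Return an (instruction, answer) pair whose answer is "yes" with probability p_positive."""
    if rng.random() < p_positive:
        target = true_category
    else:
        # Pick uniformly among the other classes so a "no" answer is always correct
        target = rng.choice([c for c in classes if c != true_category])
    instruction = f"Is this a {target} article? Answer yes or no."
    answer = "yes" if target == true_category else "no"
    return instruction, answer


rng = random.Random(42)
classes = ["World", "Sports", "Business", "Science/Technology"]
answers = [make_binary_verification("Sports", classes, rng)[1] for _ in range(1000)]
print(answers.count("yes") / len(answers))  # close to 0.5 for p_positive=0.5
```

Passing an explicit `random.Random` instance keeps the generated dataset reproducible without touching the global seed set by `set_seed`.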

## 5. Instruction-Tuning Dataset Class

In [None]:
class InstructionTuningDataset(Dataset):
    """
    PyTorch dataset for instruction tuning.
    
    Following dataset design from:
        Iyer et al. (2022): "OPT-IML: Scaling Language Model Instruction Meta Learning"
    """
    
    def __init__(
        self,
        examples: List[InstructionExample],
        tokenizer,
        max_source_length: int = 512,
        max_target_length: int = 64
    ):
        self.examples = examples
        self.tokenizer = tokenizer
        self.max_source_length = max_source_length
        self.max_target_length = max_target_length
    
    def __len__(self):
        return len(self.examples)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        example = self.examples[idx]
        
        # Format input: the instruction followed by the article text
        source_text = f"{example.instruction}\n\nText: {example.input_text}"
        
        # Format target: an explanation (when present) belongs in the target, not the
        # source, because it names the category and would leak the label into the input.
        # It is included 50% of the time so the model also learns short answers.
        if example.explanation and random.random() < 0.5:
            target_text = f"{example.output}\nExplanation: {example.explanation}"
        else:
            target_text = example.output
        
        # Tokenize input
        source_encoding = self.tokenizer(
            source_text,
            max_length=self.max_source_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        
        # Tokenize target
        target_encoding = self.tokenizer(
            target_text,
            max_length=self.max_target_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        
        # Prepare labels (set padding tokens to -100 for loss calculation)
        labels = target_encoding['input_ids'].squeeze()
        labels[labels == self.tokenizer.pad_token_id] = -100
        
        return {
            'input_ids': source_encoding['input_ids'].squeeze(),
            'attention_mask': source_encoding['attention_mask'].squeeze(),
            'labels': labels,
            'task_type': example.task_type
        }


# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(tutorial_config['model_name'])

# Create full instruction dataset
print("\nCreating full instruction dataset...")
all_instruction_examples = []
for i in tqdm(range(min(len(train_dataset), tutorial_config['max_samples']))):
    example = instruction_creator.create_instruction_example(i)
    all_instruction_examples.append(example)

# Split into train and validation
split_idx = int(0.9 * len(all_instruction_examples))
train_examples = all_instruction_examples[:split_idx]
val_examples = all_instruction_examples[split_idx:]

# Create datasets
train_instruction_dataset = InstructionTuningDataset(
    train_examples,
    tokenizer,
    max_source_length=tutorial_config['max_source_length'],
    max_target_length=tutorial_config['max_target_length']
)

val_instruction_dataset = InstructionTuningDataset(
    val_examples,
    tokenizer,
    max_source_length=tutorial_config['max_source_length'],
    max_target_length=tutorial_config['max_target_length']
)

print(f"\nInstruction datasets created:")
print(f"  Train examples: {len(train_instruction_dataset)}")
print(f"  Validation examples: {len(val_instruction_dataset)}")

# Inspect a single tokenized example
sample_example = train_instruction_dataset[0]
print(f"\nSample example shapes:")
for key, value in sample_example.items():
    if isinstance(value, torch.Tensor):
        print(f"  {key}: {value.shape}")
    else:
        print(f"  {key}: {value}")
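`InstructionTuningDataset` replaces padding token IDs in the labels with `-100` because PyTorch's `CrossEntropyLoss`, which Hugging Face seq2seq models use to compute `outputs.loss`, skips targets equal to its default `ignore_index` of `-100`. The stdlib sketch below mimics that behavior to show that padded positions do not change the loss. `mask_padding` and `masked_nll` are hypothetical helpers, and the token IDs and log-probabilities are made-up numbers.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # same default as torch.nn.CrossEntropyLoss(ignore_index=-100)


def mask_padding(token_ids: Sequence[int], pad_token_id: int) -> List[int]:
    """Replace padding IDs with IGNORE_INDEX, as done for the labels in __getitem__."""
    return [IGNORE_INDEX if t == pad_token_id else t for t in token_ids]


def masked_nll(target_log_probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX.

    target_log_probs[i] is the model's log-probability of the label at position i.
    """
    kept = [-lp for lp, y in zip(target_log_probs, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept)


pad_id = 0  # T5's pad token ID
labels = mask_padding([1150, 1, 0, 0], pad_id)  # made-up target IDs followed by two pads
log_probs = [math.log(0.9), math.log(0.8), math.log(0.01), math.log(0.01)]
print(labels)                                   # [1150, 1, -100, -100]
print(round(masked_nll(log_probs, labels), 4))  # 0.1643: the two padded positions are ignored
```

Without the mask, the two confidently wrong predictions at padded positions (probability 0.01) would dominate the average.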

## 6. Model Initialization and Configuration

In [None]:
# Initialize model for instruction tuning
print("Initializing model for instruction tuning...")

# Load pre-trained T5 model
model = T5ForConditionalGeneration.from_pretrained(tutorial_config['model_name'])
model = model.to(device)

# Model information
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nModel Information:")
print("="*50)
print(f"Model: {tutorial_config['model_name']}")
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Model size: {total_params * 4 / 1024**2:.1f} MB (fp32)")


class InstructionTuningConfig:
    """
    Configuration for instruction tuning.
    
    Following configuration patterns from:
        Sanh et al. (2022): "Multitask Prompted Training"
    """
    
    def __init__(self):
        self.learning_rate = tutorial_config['learning_rate']
        self.num_epochs = tutorial_config['num_epochs']
        self.batch_size = tutorial_config['batch_size']
        self.gradient_accumulation_steps = 4
        self.warmup_ratio = 0.1
        self.weight_decay = 0.01
        self.max_grad_norm = 1.0
        self.label_smoothing = 0.1
        self.evaluation_strategy = "steps"
        self.eval_steps = 50
        self.save_steps = 100
        self.logging_steps = 10
        self.save_total_limit = 2
        self.load_best_model_at_end = True
        self.metric_for_best_model = "eval_loss"
        self.greater_is_better = False


# Create configuration
it_config = InstructionTuningConfig()

print("\nTraining Configuration:")
print(f"  Learning rate: {it_config.learning_rate}")
print(f"  Batch size: {it_config.batch_size}")
print(f"  Gradient accumulation: {it_config.gradient_accumulation_steps}")
print(f"  Effective batch size: {it_config.batch_size * it_config.gradient_accumulation_steps}")
print(f"  Number of epochs: {it_config.num_epochs}")
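The `Model size` line printed earlier counts only the fp32 weights. Full fine-tuning with AdamW also keeps a gradient and two moment estimates per parameter, so a common rule of thumb is about 16 bytes per parameter before activations. Below is a back-of-the-envelope sketch under that assumption. Mixed precision, activation memory, and framework overhead are ignored, and `full_finetune_memory_gib` is an illustrative name.

```python
from typing import Dict


def full_finetune_memory_gib(num_params: int, bytes_per_value: int = 4) -> Dict[str, float]:
    """Rough fp32 memory (GiB) for weights, gradients, and AdamW's two moment buffers.

    Activations and CUDA/framework overhead are NOT included.
    """
    gib = 1024 ** 3
    weights = num_params * bytes_per_value
    grads = num_params * bytes_per_value
    adam_states = 2 * num_params * bytes_per_value  # exp_avg and exp_avg_sq
    return {
        "weights": weights / gib,
        "gradients": grads / gib,
        "optimizer": adam_states / gib,
        "total": (weights + grads + adam_states) / gib,
    }


# In the notebook, pass total_params from the cell above
estimate = full_finetune_memory_gib(250_000_000)  # illustrative parameter count
print({k: round(v, 2) for k, v in estimate.items()})
```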

## 7. Training Setup

In [None]:
# Create data loaders
train_dataloader = DataLoader(
    train_instruction_dataset,
    batch_size=it_config.batch_size,
    shuffle=True,
    num_workers=2,
    pin_memory=torch.cuda.is_available()
)

val_dataloader = DataLoader(
    val_instruction_dataset,
    batch_size=it_config.batch_size,
    shuffle=False,
    num_workers=2,
    pin_memory=torch.cuda.is_available()
)

# Setup optimizer
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(
    model.parameters(),
    lr=it_config.learning_rate,
    weight_decay=it_config.weight_decay
)

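# Setup scheduler: it advances once per optimizer update, i.e. once every
# `gradient_accumulation_steps` batches, so count updates rather than batches
import math
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / it_config.gradient_accumulation_steps)
num_training_steps = num_update_steps_per_epoch * it_config.num_epochs
num_warmup_steps = int(num_training_steps * it_config.warmup_ratio)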

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps
)

print("Training Setup:")
print("="*50)
print(f"Number of training steps: {num_training_steps}")
print(f"Number of warmup steps: {num_warmup_steps}")
print(f"Number of training batches: {len(train_dataloader)}")
print(f"Number of validation batches: {len(val_dataloader)}")
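With `gradient_accumulation_steps > 1`, the optimizer and scheduler advance once per accumulated update rather than once per batch. The schedule length should therefore be counted in updates. The sketch below works through that arithmetic. It also reproduces the multiplier that `get_linear_schedule_with_warmup` is understood to apply: a linear ramp to the peak learning rate, then linear decay to zero. The 900-example figure assumes the 90/10 split of 1,000 instruction examples from Section 5. The function names are illustrative.

```python
import math
from typing import Tuple


def count_update_steps(
    num_examples: int, batch_size: int, grad_accum: int, num_epochs: int, warmup_ratio: float
) -> Tuple[int, int, int]:
    """Return (batches per epoch, total optimizer updates, warm-up updates)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)  # DataLoader keeps the last partial batch
    # Assumes the training loop also steps on the final, partial accumulation window of each epoch
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_updates = updates_per_epoch * num_epochs
    warmup_updates = int(total_updates * warmup_ratio)
    return batches_per_epoch, total_updates, warmup_updates


def linear_warmup_factor(step: int, warmup: int, total: int) -> float:
    """LR multiplier: ramps 0 -> 1 over `warmup` steps, then decays linearly to 0 at `total`."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


batches, total, warmup = count_update_steps(900, 4, 4, 3, 0.1)
print(batches, total, warmup)  # 225 171 17
print([round(linear_warmup_factor(s, warmup, total), 3) for s in (0, 17, 94, 171)])  # [0.0, 1.0, 0.5, 0.0]
```

Counting batches instead (225 × 3 = 675 steps) would stretch warm-up over 67 scheduler steps. Because the scheduler only steps once per update, the learning rate would then decay only part of the way toward zero by the end of training.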

## 8. Training Loop

In [None]:
def train_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    scheduler: Any,
    device: torch.device,
    epoch: int
) -> Dict[str, float]:
    """
    Train model for one epoch.
    
    Following training practices from:
        Raffel et al. (2020): "Exploring the Limits of Transfer Learning"
    """
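    model.train()
    total_loss = 0
    total_steps = 0
    accum_steps = it_config.gradient_accumulation_steps
    optimizer.zero_grad()
    
    progress_bar = tqdm(dataloader, desc=f"Epoch {epoch + 1}")
    
    for batch_idx, batch in enumerate(progress_bar):
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )
        
        loss = outputs.loss
        
        # Backward pass: gradients add up across batches, so scale the loss
        # to make the accumulated gradient an average over the micro-batches
        (loss / accum_steps).backward()
        
        # Update every `accum_steps` batches and on the last batch of the epoch,
        # so leftover gradients never spill into the next epoch
        if (batch_idx + 1) % accum_steps == 0 or (batch_idx + 1) == len(dataloader):
            # Clip the fully accumulated gradient just before the update
            torch.nn.utils.clip_grad_norm_(model.parameters(), it_config.max_grad_norm)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()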
        
        # Update metrics
        total_loss += loss.item()
        total_steps += 1
        
        # Update progress bar
        progress_bar.set_postfix({
            'loss': f'{loss.item():.4f}',
            'avg_loss': f'{total_loss / total_steps:.4f}'
        })
    
    return {
        'loss': total_loss / total_steps
    }


def evaluate(
    model: nn.Module,
    dataloader: DataLoader,
    device: torch.device
) -> Dict[str, float]:
    """
    Evaluate model on validation set.
    """
    model.eval()
    total_loss = 0
    total_steps = 0
    
    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Evaluating"):
            # Move batch to device
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            
            # Forward pass
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            
            total_loss += outputs.loss.item()
            total_steps += 1
    
    return {
        'loss': total_loss / total_steps
    }


# Training history
history = {
    'train_loss': [],
    'val_loss': [],
    'learning_rate': []
}

print("\nStarting Instruction Tuning:")
print("="*50)

# Training loop
best_val_loss = float('inf')
best_state_dict = None
for epoch in range(it_config.num_epochs):
    # Train
    train_metrics = train_epoch(
        model, train_dataloader, optimizer, scheduler, device, epoch
    )
    
    # Evaluate
    val_metrics = evaluate(model, val_dataloader, device)
    
    # Store metrics
    history['train_loss'].append(train_metrics['loss'])
    history['val_loss'].append(val_metrics['loss'])
    history['learning_rate'].append(scheduler.get_last_lr()[0])
    
    # Print metrics
    print(f"\nEpoch {epoch + 1}/{it_config.num_epochs}:")
    print(f"  Train Loss: {train_metrics['loss']:.4f}")
    print(f"  Val Loss: {val_metrics['loss']:.4f}")
    print(f"  Learning Rate: {scheduler.get_last_lr()[0]:.2e}")
    
    # Keep an in-memory (CPU) copy of the weights with the lowest validation loss
    if val_metrics['loss'] < best_val_loss:
        best_val_loss = val_metrics['loss']
        best_state_dict = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
        print(f"  New best model! Val Loss: {best_val_loss:.4f}")

# Restore the best weights so the inference and evaluation cells use them
if best_state_dict is not None:
    model.load_state_dict(best_state_dict)

print("\nTraining completed!")
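Gradient accumulation only matches a larger batch if each micro-batch loss is divided by the number of accumulation steps before `backward()`. Gradients are summed across backward calls, so without this scaling the accumulated gradient is too large by that factor. The stdlib sketch below checks this on a one-parameter least-squares model. This model is a toy stand-in for the T5 loss. When micro-batches have equal size, the accumulated gradient of the scaled micro-batch losses equals the gradient of the mean loss over the full batch.

```python
from typing import List, Sequence, Tuple


def mse_grad(w: float, batch: Sequence[Tuple[float, float]]) -> float:
    """d/dw of mean((w * x - y)^2) over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: Sequence[Tuple[float, float]], accum_steps: int) -> float:
    """Sum of gradients of (micro-batch loss / accum_steps), mimicking repeated backward() calls."""
    if len(batch) % accum_steps != 0:
        raise ValueError("this sketch assumes equal-sized micro-batches")
    micro_size = len(batch) // accum_steps
    grad = 0.0
    for i in range(accum_steps):
        micro = batch[i * micro_size:(i + 1) * micro_size]
        grad += mse_grad(w, micro) / accum_steps  # gradients add up across backward passes
    return grad


data: List[Tuple[float, float]] = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 7.0)]
print(mse_grad(0.5, data), accumulated_grad(0.5, data, accum_steps=2))  # -20.75 -20.75
```

Dropping the `/ accum_steps` term would double the accumulated gradient here (-41.5). Gradient clipping would partly mask this, but the effective learning rate would still change.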

## 9. Model Inference and Generation

In [None]:
def generate_response(
    model: nn.Module,
    tokenizer,
    instruction: str,
    input_text: str,
    device: torch.device,
    max_length: int = 64
) -> str:
    """
    Generate response for instruction-tuned model.
    
    Decoding uses deterministic beam search (num_beams=4); see
        Holtzman et al. (2020): "The Curious Case of Neural Text Degeneration"
    for a discussion of sampling-based alternatives.
    """
    model.eval()
    
    # Format input
    prompt = f"{instruction}\n\nText: {input_text}"
    
    # Tokenize
    inputs = tokenizer(
        prompt,
        max_length=512,
        truncation=True,
        return_tensors='pt'
    ).to(device)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=max_length,
            num_beams=4,
            do_sample=False,  # deterministic beam search; `temperature` would be ignored here
            early_stopping=True
        )
    
    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    return response


# Test generation with different instructions
test_text = val_dataset.texts[0][:300]
test_label = val_dataset.labels[0]

test_instructions = [
    "Classify this news article into World, Sports, Business, or Science/Technology.",
    "What category does this news article belong to?",
    "Is this a Sports article? Answer yes or no.",
    "Extract the main topic keywords from this article.",
    "Summarize the category and confidence level."
]

print("Model Generation Examples:")
print("="*50)
print(f"\nTest article (true label: {AG_NEWS_CLASSES[test_label]}):")
print(f"{test_text}\n")

for instruction in test_instructions[:3]:  # Test first 3 instructions
    response = generate_response(
        model, tokenizer, instruction, test_text, device
    )
    
    print(f"\nInstruction: {instruction}")
    print(f"Response: {response}")
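Free-form generation can return text that is not exactly one of the four labels. The response may contain extra words, use different casing, or have an explanation appended. Before computing accuracy, it can help to map each response onto a known category. Below is a minimal sketch. `parse_category` is a hypothetical helper, and matching the earliest-mentioned class name is only one heuristic among several.

```python
import re
from typing import Optional, Sequence


def parse_category(response: str, classes: Sequence[str]) -> Optional[str]:
    """Return the class whose name appears earliest in the response (case-insensitive), else None."""
    text = response.lower()
    best: Optional[str] = None
    best_pos = len(text) + 1
    for name in classes:
        # Word boundaries stop e.g. "world" from matching inside "worldwide"
        match = re.search(r"\b" + re.escape(name.lower()) + r"\b", text)
        if match and match.start() < best_pos:
            best, best_pos = name, match.start()
    return best


classes = ["World", "Sports", "Business", "Science/Technology"]
print(parse_category("Sports. Explanation: this article covers athletes.", classes))  # Sports
print(parse_category("Worldwide markets rallied", classes))  # None
```

Returning `None` for unparseable output lets an evaluation count such responses as failures to follow the instruction, rather than silently matching a class name buried in the text.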

## 10. Evaluation and Analysis

In [None]:
def evaluate_instruction_following(
    model: nn.Module,
    tokenizer,
    dataset: AGNewsDataset,
    instruction_creator: InstructionDatasetCreator,
    device: torch.device,
    num_samples: int = 100
) -> pd.DataFrame:
    """
    Evaluate instruction-following capabilities.
    
    Following evaluation protocols from:
        Ouyang et al. (2022): "Training language models to follow instructions"
    """
    results = []
    
    for i in tqdm(range(min(num_samples, len(dataset))), desc="Evaluating"):
        text = dataset.texts[i]
        true_label = dataset.labels[i]
        true_category = AG_NEWS_CLASSES[true_label]
        
        # Test different task variations
        for task_variation in ['standard_classification', 'binary_verification']:
            example = instruction_creator.create_instruction_example(i, task_variation)
            
            # Generate response
            response = generate_response(
                model, tokenizer, example.instruction, text[:300], device
            )
            
            # Evaluate correctness
            if task_variation == 'standard_classification':
                correct = true_category.lower() in response.lower()
            else:  # binary_verification
                # Exact match: a substring test would count "no" inside words such as "not" or "none"
                expected = example.output
                correct = response.strip().lower().rstrip('.') == expected
            
            results.append({
                'sample_id': i,
                'task_type': task_variation,
                'true_label': true_category,
                'response': response[:50],
                'correct': correct
            })
    
    return pd.DataFrame(results)


# Run evaluation
print("Evaluating Instruction-Following Performance:")
print("="*50)

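# Build evaluation prompts from the validation set so the expected yes/no
# answers come from the same labels as the articles being evaluated
val_instruction_creator = InstructionDatasetCreator(val_dataset)

eval_results = evaluate_instruction_following(
    model, tokenizer, val_dataset, val_instruction_creator, device, num_samples=20
)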

# Calculate metrics
task_performance = eval_results.groupby('task_type')['correct'].agg(['mean', 'count'])
task_performance.columns = ['Accuracy', 'Samples']

print("\nPerformance by Task Type:")
print(task_performance)

# Overall accuracy
overall_accuracy = eval_results['correct'].mean()
print(f"\nOverall Accuracy: {overall_accuracy:.3f}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Training history
axes[0].plot(history['train_loss'], label='Train Loss', marker='o')
axes[0].plot(history['val_loss'], label='Val Loss', marker='s')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training History')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Task performance
task_performance['Accuracy'].plot(kind='bar', ax=axes[1])
axes[1].set_xlabel('Task Type')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Performance by Task Type')
axes[1].set_ylim([0, 1])
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
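`evaluate_instruction_following` runs on only 20 validation articles. Each per-task accuracy therefore rests on 20 correct/incorrect judgments and can swing widely between runs. A Wilson score interval is one standard way to report that uncertainty. Below is a stdlib implementation with z = 1.96, which gives an approximate 95% interval. The function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (approx. 95% when z = 1.96)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(10, 20)
print(f"{low:.3f} - {high:.3f}")  # 0.299 - 0.701
```

In the notebook, each task's `successes` would come from `eval_results.groupby('task_type')['correct'].sum()`, with `n` from the matching `Samples` count in `task_performance`. An observed accuracy of 0.5 on 20 samples is compatible with true accuracies from roughly 0.3 to 0.7.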

## 11. Conclusions and Next Steps

### Instruction Tuning Summary

This tutorial demonstrated fundamental instruction tuning concepts:

1. **Instruction Dataset Creation**: Built diverse instruction-following examples
2. **Task Variations**: Implemented multiple task formats for robustness
3. **Model Fine-tuning**: Trained T5 model with instruction-response pairs
4. **Generation Strategies**: Decoded responses with deterministic beam search
5. **Evaluation Protocol**: Assessed instruction-following capabilities
6. **Multi-task Learning**: Trained on various instruction types
7. **Performance Analysis**: Evaluated task-specific performance

### Key Takeaways

1. **Instruction Diversity**: Varied instruction phrasings help the model generalize to wordings it has not seen
2. **Task Mixing**: Training on several task formats makes the model more robust to how a request is posed
3. **Explanation Integration**: Training the model to produce explanations makes its predictions easier to interpret
4. **Model Size**: Larger models generally follow instructions better (Chung et al., 2022)
5. **Data Quality**: High-quality instructions are crucial for performance

### Next Steps

1. **Advanced Techniques**:
   - Implement reinforcement learning from human feedback (RLHF)
   - Try constitutional AI approaches
   - Explore chain-of-thought instruction tuning

2. **Data Enhancement**:
   - Generate synthetic instructions with GPT-4
   - Implement data augmentation strategies
   - Create cross-lingual instructions

3. **Model Optimization**:
   - Apply LoRA for efficient tuning
   - Implement quantization-aware training
   - Try adapter-based approaches

4. **Production Deployment**:
   - Build instruction routing system
   - Implement response caching
   - Monitor instruction-following metrics

### References

For deeper understanding, consult:
- API usage: `notebooks/tutorials/07_api_usage.ipynb`
- Service integration: `notebooks/tutorials/08_service_integration.ipynb`
- Advanced training: `docs/user_guide/advanced_techniques.md`
- Production deployment: `docs/user_guide/deployment.md`