# Fine-tuning Qwen2.5 with LoRA for LexiLingo
## Multi-Task Learning Pipeline on Mac Intel

This pipeline fine-tunes Qwen2.5-1.5B-Instruct with 4 LoRA adapters:
1. **Fluency Scoring** - Rates fluency (0.0-1.0)
2. **Vocabulary Classification** - Classifies vocabulary level (A2/B1/B2)
3. **Grammar Correction** - Corrects grammar errors and explains them
4. **Dialogue Generation** - Generates tutor responses

**System requirements:**
- **Laptop** with 16GB+ RAM (32GB recommended)
- Python 3.10+
- ~5GB disk space
- **Note:** Training on a local CPU is slower than training on a GPU

## 1. Setup Environment

In [1]:
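# Install required packages
# Quote each version specifier so the shell does not treat ">=" as an output redirect
!pip install -q "transformers>=4.36.0" \
    "peft>=0.7.0" \
    "datasets>=2.16.0" \
    "accelerate>=0.25.0" \
    "bitsandbytes>=0.41.0" \
    "trl>=0.7.0" \
    scipy \
    scikit-learn \
    sentencepiece \
    protobuf \
    wandb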

In [2]:
import torch
import json
import os
from pathlib import Path
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType
)
from trl import SFTTrainer
import numpy as np
from sklearn.metrics import mean_absolute_error, accuracy_score

# Check device - Mac Intel uses CPU
device = torch.device("cpu")
print(f" Running on CPU (Mac Intel)")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print("\n💡 Tip: Training on CPU will be slower. Consider using 4-bit quantization to save memory.")



 Running on CPU (Mac Intel)
PyTorch version: 2.9.0+cpu
CUDA available: False
MPS available: False

💡 Tip: Training on CPU will be slower. Consider using 4-bit quantization to save memory.


## 2. Configuration

In [4]:
# Model configuration
MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
MAX_SEQ_LENGTH = 512

# LoRA configuration for each task
LORA_CONFIGS = {
    "fluency": {
        "task_type": TaskType.CAUSAL_LM,
        "r": 32,
        "lora_alpha": 64,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", 
                          "gate_proj", "up_proj", "down_proj"],
        "inference_mode": False
    },
    "vocabulary": {
        "task_type": TaskType.CAUSAL_LM,
        "r": 32,
        "lora_alpha": 64,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": ["q_proj", "v_proj", "o_proj"],
        "inference_mode": False
    },
    "grammar": {
        "task_type": TaskType.CAUSAL_LM,
        "r": 32,
        "lora_alpha": 64,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "inference_mode": False
    },
    "dialogue": {
        "task_type": TaskType.CAUSAL_LM,
        "r": 32,
        "lora_alpha": 64,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", 
                          "gate_proj", "up_proj", "down_proj"],
        "inference_mode": False
    }
}

# Training configuration (optimized for Mac Intel i9 CPU)
TRAINING_CONFIG = {
    "output_dir": "./outputs",
    "num_train_epochs": 5,
    "per_device_train_batch_size": 2,  # Reduced for CPU
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 16,  # Increased to compensate for smaller batch
    "learning_rate": 3e-4,
    "weight_decay": 0.01,
    "warmup_ratio": 0.03,
    "lr_scheduler_type": "cosine",
    "logging_steps": 5,  # More frequent logging
    "save_steps": 100,
    "eval_steps": 100,
    "save_total_limit": 2,
    "fp16": False,  # fp16 mixed precision targets GPUs; keep FP32 on CPU
    "bf16": False,  # No fast bf16 path on this Intel CPU; keep FP32
    "gradient_checkpointing": True,
    "optim": "adamw_torch",
    "report_to": "none",  # Change to "wandb" if needed
    "dataloader_num_workers": 4,  # Use multiple CPU cores
}

# Create output directories
Path(TRAINING_CONFIG["output_dir"]).mkdir(parents=True, exist_ok=True)
Path("./data").mkdir(parents=True, exist_ok=True)
Path("./adapters").mkdir(parents=True, exist_ok=True)

print("✓ Configuration set for Mac Intel i9 (CPU)")
print(f"  Batch size: {TRAINING_CONFIG['per_device_train_batch_size']} (effective: {TRAINING_CONFIG['per_device_train_batch_size'] * TRAINING_CONFIG['gradient_accumulation_steps']})")
print(f"  Precision: FP32 (full precision)")
print(f"  CPU workers: {TRAINING_CONFIG['dataloader_num_workers']}")

✓ Configuration set for Mac Intel i9 (CPU)
  Batch size: 2 (effective: 32)
  Precision: FP32 (full precision)
  CPU workers: 4
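
The effective batch size printed above is the per-device batch size multiplied by the gradient accumulation steps. A rough sketch of how that translates into optimizer updates and warmup steps is shown here. `training_schedule` is a hypothetical helper, not part of `transformers`, and the Trainer's own rounding may differ slightly between versions.

```python
import math

def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, warmup_ratio):
    """Rough estimate of optimizer updates (assumption: one device, no dropped batches)."""
    effective_batch = per_device_batch_size * grad_accum_steps
    micro_batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
    # Assume at least one optimizer update per epoch, even for tiny datasets.
    updates_per_epoch = max(micro_batches_per_epoch // grad_accum_steps, 1)
    total_updates = updates_per_epoch * num_epochs
    warmup_updates = math.ceil(total_updates * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_updates": warmup_updates,
    }

# The 5-example toy datasets in this notebook vs. a hypothetical 2,000-example dataset
toy = training_schedule(5, 2, 16, num_epochs=5, warmup_ratio=0.03)
full = training_schedule(2000, 2, 16, num_epochs=5, warmup_ratio=0.03)
print(toy)
print(full)
```

With only 5 examples per task, each epoch yields a single optimizer update, so `save_steps=100` is never reached and the cosine schedule has very few steps to work with.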


## 3. Load Base Model & Tokenizer

In [5]:
# Quantization config for memory efficiency on CPU
# 4-bit quantization helps reduce RAM usage significantly
print("Loading model with 4-bit quantization for CPU...")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float32,  # Use float32 for CPU
    bnb_4bit_use_double_quant=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    padding_side="left"  # Important for causal LM
)
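# Keep the tokenizer's own pad token when it has one. Reusing EOS as padding makes
# DataCollatorForLanguageModeling mask every EOS label, so the model never learns to stop.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token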

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",  # Auto handles CPU placement
    trust_remote_code=True,
    low_cpu_mem_usage=True,  # Enable for better memory efficiency
)

# Disable the KV cache for training (it conflicts with gradient checkpointing)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

print(f"✓ Model loaded: {MODEL_NAME}")
print(f"  Parameters: {base_model.num_parameters() / 1e9:.2f}B")
print(f"  Quantization: 4-bit (saves ~75% memory)")
print(f"  Device: CPU")
print("\n⏱️  Expected training time per adapter: 2-3 hours (with small dataset)")
print("💡 Consider using smaller sample for testing, then scale up with full data")

Loading model with 4-bit quantization for CPU...


✓ Model loaded: Qwen/Qwen2.5-1.5B-Instruct
  Parameters: 1.54B
  Quantization: 4-bit (saves ~75% memory)
  Device: CPU

⏱️  Expected training time per adapter: 2-3 hours (with small dataset)
💡 Consider using smaller sample for testing, then scale up with full data
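
The "saves ~75% memory" figure compares 4-bit weights against 16-bit weights. A back-of-the-envelope sketch of the weight footprint at different precisions is given here. `weight_memory_gb` is a hypothetical helper, and the numbers cover weights only. Embeddings typically stay in higher precision, and activations, gradients and optimizer state come on top.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 1.54e9  # taken from the "Parameters: 1.54B" line printed above

for label, bits in [("fp32", 32), ("bf16/fp16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")

# Relative saving of 4-bit storage compared with 16-bit and 32-bit storage
saving_vs_16bit = 1 - 4 / 16
saving_vs_32bit = 1 - 4 / 32
print(f"Saving vs 16-bit: {saving_vs_16bit:.0%}, vs 32-bit: {saving_vs_32bit:.1%}")
```

For 1.54B parameters this works out to roughly 6.2 GB in FP32, 3.1 GB in 16-bit and 0.8 GB in 4-bit.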


## 4. Prepare Training Data

### 4.1 Fluency Scoring Dataset

In [6]:
# Sample fluency scoring data
fluency_data = [
    {
        "text": "I like learning English",
        "score": 0.90,
        "reasoning": "Clear subject-verb agreement, natural word order, appropriate vocabulary"
    },
    {
        "text": "Yesterday I go to school",
        "score": 0.65,
        "reasoning": "Incorrect past tense usage, should be 'went'"
    },
    {
        "text": "She don't like coffee",
        "score": 0.55,
        "reasoning": "Subject-verb disagreement, should be 'doesn't'"
    },
    {
        "text": "The weather is beautiful today",
        "score": 0.95,
        "reasoning": "Perfect grammar, natural expression, clear meaning"
    },
    {
        "text": "Me and my friend goes to park",
        "score": 0.45,
        "reasoning": "Multiple errors: pronoun case, subject-verb agreement, missing article"
    }
]

def format_fluency_prompt(example):
    """Format data for fluency scoring task"""
    prompt = f"""Rate the fluency of this English sentence on a scale of 0.0 to 1.0:
Sentence: {example['text']}

Provide:
1. Fluency score (0.0-1.0)
2. Brief reasoning

Format: Score: X.XX | Reason: ..."""
    
    response = f"Score: {example['score']:.2f} | Reason: {example['reasoning']}"
    
    # Qwen chat template
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response}
    ]
    
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

# Create dataset
fluency_dataset = Dataset.from_list(fluency_data)
fluency_dataset = fluency_dataset.map(format_fluency_prompt)

print(f"Fluency dataset size: {len(fluency_dataset)}")
print("\nExample:")
print(fluency_dataset[0]["text"][:500])

Fluency dataset size: 5

Example:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Rate the fluency of this English sentence on a scale of 0.0 to 1.0:
Sentence: I like learning English

Provide:
1. Fluency score (0.0-1.0)
2. Brief reasoning

Format: Score: X.XX | Reason: ...<|im_end|>
<|im_start|>assistant
Score: 0.90 | Reason: Clear subject-verb agreement, natural word order, appropriate vocabulary<|im_end|>
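
Because the adapter is trained to answer in the `Score: X.XX | Reason: ...` format, downstream code needs to turn that string back into a number. A minimal parsing sketch follows. `parse_fluency_output` is a hypothetical helper, and it assumes the model sticks to the trained format.

```python
import re

# Matches "Score: 0.90 | Reason: some text"
FLUENCY_PATTERN = re.compile(
    r"Score:\s*([01](?:\.\d+)?)\s*\|\s*Reason:\s*(.+)", re.DOTALL
)

def parse_fluency_output(text):
    """Return (score, reason), or None if the text does not follow the format."""
    match = FLUENCY_PATTERN.search(text)
    if match is None:
        return None
    score = float(match.group(1))
    if not 0.0 <= score <= 1.0:
        return None  # reject out-of-range scores such as 1.50
    return score, match.group(2).strip()

print(parse_fluency_output("Score: 0.90 | Reason: Clear subject-verb agreement"))
print(parse_fluency_output("The sentence looks fine to me."))
```

Returning `None` for malformed output makes it easy to count format failures separately from scoring errors during evaluation.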



### 4.2 Vocabulary Classification Dataset

In [12]:
# Sample vocabulary classification data
vocabulary_data = [
    {
        "text": "I like to eat apples",
        "level": "A2",
        "key_words": "like (A2), eat (A2), apples (A2)"
    },
    {
        "text": "We should discuss the opportunity",
        "level": "B1",
        "key_words": "discuss (B1), opportunity (B1)"
    },
    {
        "text": "His argument was quite eloquent",
        "level": "B2",
        "key_words": "argument (B2), eloquent (B2)"
    },
    {
        "text": "The weather is nice today",
        "level": "A2",
        "key_words": "weather (A2), nice (A2)"
    },
    {
        "text": "She demonstrated remarkable perseverance",
        "level": "B2",
        "key_words": "demonstrated (B2), remarkable (B2), perseverance (B2)"
    }
]

def format_vocabulary_prompt(example):
    """Format data for vocabulary classification task"""
    prompt = f"""Classify the vocabulary level of this English sentence according to CEFR:
Sentence: {example['text']}

Provide:
1. CEFR Level (A2, B1, or B2)
2. Key vocabulary words with their levels

Format: Level: XX | Key words: ..."""
    
    response = f"Level: {example['level']} | Key words: {example['key_words']}"
    
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response}
    ]
    
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

vocabulary_dataset = Dataset.from_list(vocabulary_data)
vocabulary_dataset = vocabulary_dataset.map(format_vocabulary_prompt)

print(f"Vocabulary dataset size: {len(vocabulary_dataset)}")

Vocabulary dataset size: 5
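
The vocabulary adapter answers with `Level: XX | Key words: word (level), ...`. A sketch of how that output could be turned into a label plus a word-to-level mapping is shown here. `parse_vocabulary_output` and `VALID_LEVELS` are hypothetical names, and the parser assumes the model follows the trained format.

```python
import re

VALID_LEVELS = {"A2", "B1", "B2"}  # the three classes used in this notebook

def parse_vocabulary_output(text):
    """Return (level, {word: level}), or None if the format or level is invalid."""
    head = re.search(r"Level:\s*([A-C][12])\s*\|\s*Key words:\s*(.*)", text, re.DOTALL)
    if head is None or head.group(1) not in VALID_LEVELS:
        return None
    # Each key word looks like "discuss (B1)"
    key_words = dict(re.findall(r"(\w[\w'-]*)\s*\(([A-C][12])\)", head.group(2)))
    return head.group(1), key_words

print(parse_vocabulary_output("Level: B1 | Key words: discuss (B1), opportunity (B1)"))
```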


### 4.3 Grammar Correction Dataset

In [13]:
# Sample grammar correction data
grammar_data = [
    {
        "incorrect": "She don't like coffee",
        "correct": "She doesn't like coffee",
        "explanation": "Subject-verb agreement: 'she' (3rd person singular) requires 'doesn't' not 'don't'"
    },
    {
        "incorrect": "I goes to school yesterday",
        "correct": "I went to school yesterday",
        "explanation": "Incorrect tense: 'yesterday' requires past tense 'went', not present 'goes'"
    },
    {
        "incorrect": "He have a car",
        "correct": "He has a car",
        "explanation": "Subject-verb agreement: 'he' (3rd person singular) requires 'has' not 'have'"
    },
    {
        "incorrect": "They was playing",
        "correct": "They were playing",
        "explanation": "Subject-verb agreement: 'they' (plural) requires 'were' not 'was'"
    },
    {
        "incorrect": "I am go to school",
        "correct": "I am going to school",
        "explanation": "Continuous tense requires present participle 'going', not base form 'go'"
    }
]

def format_grammar_prompt(example):
    """Format data for grammar correction task"""
    prompt = f"""Correct the grammar errors in this English sentence:
Incorrect: {example['incorrect']}

Provide:
1. Corrected sentence
2. Brief explanation of the error

Format: Correct: ... | Explanation: ..."""
    
    response = f"Correct: {example['correct']} | Explanation: {example['explanation']}"
    
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response}
    ]
    
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

grammar_dataset = Dataset.from_list(grammar_data)
grammar_dataset = grammar_dataset.map(format_grammar_prompt)

print(f"Grammar dataset size: {len(grammar_dataset)}")

Grammar dataset size: 5
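
When collecting more error-correction pairs, it can help to check which words actually change between the incorrect and corrected sentence. The sketch below uses the standard library's `difflib` for a word-level diff. `word_edits` is a hypothetical helper and not part of the training pipeline.

```python
import difflib

def word_edits(incorrect, correct):
    """List (operation, old_words, new_words) for each changed span of words."""
    old, new = incorrect.split(), correct.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, " ".join(old[i1:i2]), " ".join(new[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / insert / delete spans
    ]

print(word_edits("She don't like coffee", "She doesn't like coffee"))
print(word_edits("I am go to school", "I am going to school"))
```

A pair that yields an empty list has no correction at all and probably should not be in the training set.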


### 4.4 Dialogue Generation Dataset

In [14]:
# Sample dialogue generation data
dialogue_data = [
    {
        "user_input": "I like learning English",
        "fluency_score": 0.90,
        "level": "A2",
        "errors": "None",
        "response": "That's wonderful! Your sentence is very clear and natural. You're using the present tense correctly. Would you like to tell me more about why you enjoy learning English?"
    },
    {
        "user_input": "Yesterday I go to school",
        "fluency_score": 0.65,
        "level": "A2",
        "errors": "Incorrect past tense: 'go' should be 'went'",
        "response": "Good try! You're using 'yesterday' correctly to indicate past time. However, with past time expressions, we need the past tense. Try saying: 'Yesterday I went to school.' Can you repeat that?"
    },
    {
        "user_input": "She don't like coffee",
        "fluency_score": 0.55,
        "level": "A2",
        "errors": "Subject-verb agreement: 'don't' should be 'doesn't'",
        "response": "Almost there! When we use 'she', 'he', or 'it', we need 'doesn't' instead of 'don't'. The correct sentence is: 'She doesn't like coffee.' This is because 'she' is third person singular. Let's practice this pattern."
    },
    {
        "user_input": "We should discuss the opportunity",
        "fluency_score": 0.92,
        "level": "B1",
        "errors": "None",
        "response": "Excellent! Your sentence structure is perfect, and you're using B1 level vocabulary very naturally. The modal verb 'should' is used correctly to make a suggestion. What opportunity are you thinking about?"
    },
    {
        "user_input": "The weather is beautiful today",
        "fluency_score": 0.95,
        "level": "A2",
        "errors": "None",
        "response": "Perfect! Your sentence is grammatically correct and very natural. You're using the present tense 'is' correctly with 'weather', and your word choice is appropriate. Great job!"
    }
]

def format_dialogue_prompt(example):
    """Format data for dialogue generation task"""
    prompt = f"""You are an encouraging English tutor. Generate a supportive response to the student's input.

Student Input: {example['user_input']}
Fluency Score: {example['fluency_score']}
Vocabulary Level: {example['level']}
Errors Detected: {example['errors']}

Generate an encouraging tutor response that:
1. Acknowledges what the student did well
2. Gently corrects errors if any
3. Provides a learning point
4. Encourages further practice"""
    
    response = example['response']
    
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response}
    ]
    
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

dialogue_dataset = Dataset.from_list(dialogue_data)
dialogue_dataset = dialogue_dataset.map(format_dialogue_prompt)

print(f"Dialogue dataset size: {len(dialogue_dataset)}")

Dialogue dataset size: 5
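
Dialogue records combine fields from the other three tasks, so a malformed record is easy to miss once the dataset grows. A small validation sketch is given here. `validate_dialogue_record` is a hypothetical helper, and the allowed levels are assumed to match the A2/B1/B2 classes used above.

```python
REQUIRED_FIELDS = ("user_input", "fluency_score", "level", "errors", "response")
ALLOWED_LEVELS = {"A2", "B1", "B2"}

def validate_dialogue_record(record):
    """Return a list of problems found in one dialogue training record."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if not 0.0 <= record["fluency_score"] <= 1.0:
        problems.append("fluency_score must be between 0.0 and 1.0")
    if record["level"] not in ALLOWED_LEVELS:
        problems.append(f"unknown level: {record['level']}")
    if not record["response"].strip():
        problems.append("response is empty")
    return problems

good = {
    "user_input": "I like learning English",
    "fluency_score": 0.90,
    "level": "A2",
    "errors": "None",
    "response": "That's wonderful!",
}
bad = dict(good, fluency_score=1.4, level="C1")
print(validate_dialogue_record(good))
print(validate_dialogue_record(bad))
```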


## 5. Fine-tune Function

In [18]:
def finetune_with_lora(task_name, dataset, lora_config):
    """
    Fine-tune base model with LoRA adapter for specific task
    
    Args:
        task_name: Name of the task (fluency, vocabulary, grammar, dialogue)
        dataset: Training dataset
        lora_config: LoRA configuration dict
    
    Returns:
        Trained model with LoRA adapter
    """
    print(f"\n{'='*60}")
    print(f"Training {task_name.upper()} adapter")
    print(f"{'='*60}\n")
    
    # Create LoRA config
    peft_config = LoraConfig(**lora_config)
    
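    # Prepare model for training.
    # Load a fresh copy of the base model for each task: get_peft_model() injects
    # LoRA layers into the model in place, so reusing the shared `base_model`
    # would stack each new adapter on top of the previous task's adapter.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    model.config.use_cache = False
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)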
    
    # Print trainable parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Trainable params: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
    print(f"Total params: {total_params:,}")
    
    # Tokenize dataset
    def tokenize_function(examples):
        # Tokenize the text
        result = tokenizer(
            examples["text"],
            truncation=True,
            max_length=MAX_SEQ_LENGTH,
            padding="max_length",
        )
        # For causal LM, labels mirror input_ids. DataCollatorForLanguageModeling
        # rebuilds them per batch and sets padding positions to -100.
        result["labels"] = result["input_ids"].copy()
        return result
    
    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=dataset.column_names,
    )
    
    # Training arguments
    training_args = TrainingArguments(
        output_dir=f"{TRAINING_CONFIG['output_dir']}/{task_name}",
        num_train_epochs=TRAINING_CONFIG['num_train_epochs'],
        per_device_train_batch_size=TRAINING_CONFIG['per_device_train_batch_size'],
        gradient_accumulation_steps=TRAINING_CONFIG['gradient_accumulation_steps'],
        learning_rate=TRAINING_CONFIG['learning_rate'],
        weight_decay=TRAINING_CONFIG['weight_decay'],
        warmup_ratio=TRAINING_CONFIG['warmup_ratio'],
        lr_scheduler_type=TRAINING_CONFIG['lr_scheduler_type'],
        logging_steps=TRAINING_CONFIG['logging_steps'],
        save_steps=TRAINING_CONFIG['save_steps'],
        save_total_limit=TRAINING_CONFIG['save_total_limit'],
        fp16=TRAINING_CONFIG['fp16'],
        bf16=TRAINING_CONFIG['bf16'],
        gradient_checkpointing=TRAINING_CONFIG['gradient_checkpointing'],
        optim=TRAINING_CONFIG['optim'],
        report_to=TRAINING_CONFIG['report_to'],
        logging_first_step=True,
        push_to_hub=False,
        dataloader_num_workers=TRAINING_CONFIG['dataloader_num_workers'],
        dataloader_pin_memory=False,
    )
    
    # Data collator
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,  # Causal LM, not masked LM
    )
    
    # Create standard Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=data_collator,
    )
    
    # Train
    print(f"\nStarting training for {task_name}...")
    print(" Training on CPU - this will take longer than GPU/MPS")
    print("💡 Monitor CPU usage with Activity Monitor to ensure efficient utilization\n")
    
    trainer.train()
    
    # Save adapter
    adapter_path = f"./adapters/{task_name}_lora_adapter"
    model.save_pretrained(adapter_path)
    tokenizer.save_pretrained(adapter_path)
    print(f"\n✓ LoRA adapter saved to: {adapter_path}")
    
    return model
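
As a sanity check on the trainable-parameter count that `finetune_with_lora` prints, the number of LoRA weights can be estimated by hand: each wrapped linear layer adds `r * (in_features + out_features)` parameters. The sketch below assumes the Qwen2.5-1.5B layer shapes listed in the comments, which are not stated in this notebook. `lora_param_count` is a hypothetical helper.

```python
# Assumed Qwen2.5-1.5B shapes (not stated in this notebook): hidden size 1536,
# 28 decoder layers, key/value projection width 256, MLP intermediate size 8960.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 1536, 256, 8960, 28

MODULE_SHAPES = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

def lora_param_count(target_modules, r):
    """LoRA adds A (r x in) and B (out x r) to every targeted layer."""
    per_layer = sum(r * sum(MODULE_SHAPES[name]) for name in target_modules)
    return per_layer * NUM_LAYERS

all_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
print(f"fluency/dialogue: {lora_param_count(all_modules, r=32):,}")
print(f"grammar:          {lora_param_count(['q_proj', 'k_proj', 'v_proj', 'o_proj'], r=32):,}")
print(f"vocabulary:       {lora_param_count(['q_proj', 'v_proj', 'o_proj'], r=32):,}")
```

Under these assumed shapes, the seven-module configuration gives 36,929,536 parameters, the same figure reported in the fluency training output.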

## 6. Train All Adapters

In [None]:
# Train each task adapter
# Note: In production, use larger datasets (1500-3000 samples per task)

# 6.1 Train Fluency Scoring Adapter
fluency_model = finetune_with_lora(
    task_name="fluency",
    dataset=fluency_dataset,
    lora_config=LORA_CONFIGS["fluency"]
)


Training FLUENCY adapter





Trainable params: 36,929,536 (3.99%)
Total params: 925,545,984



Starting training for fluency...
 Training on CPU - this will take longer than GPU/MPS
💡 Monitor CPU usage with Activity Monitor to ensure efficient utilization





KeyboardInterrupt: 

In [None]:
# 6.2 Train Vocabulary Classification Adapter
vocabulary_model = finetune_with_lora(
    task_name="vocabulary",
    dataset=vocabulary_dataset,
    lora_config=LORA_CONFIGS["vocabulary"]
)

In [None]:
# 6.3 Train Grammar Correction Adapter
grammar_model = finetune_with_lora(
    task_name="grammar",
    dataset=grammar_dataset,
    lora_config=LORA_CONFIGS["grammar"]
)

In [None]:
# 6.4 Train Dialogue Generation Adapter
dialogue_model = finetune_with_lora(
    task_name="dialogue",
    dataset=dialogue_dataset,
    lora_config=LORA_CONFIGS["dialogue"]
)

## 7. Test Inference

In [None]:
from peft import PeftModel

def test_adapter(task_name, test_prompt):
    """
    Test a trained LoRA adapter
    
    Args:
        task_name: Name of the task adapter to test
        test_prompt: Test prompt text
    """
    print(f"\n{'='*60}")
    print(f"Testing {task_name.upper()} adapter")
    print(f"{'='*60}\n")
    
    # Load base model + adapter (with quantization for CPU)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float32,
        bnb_4bit_use_double_quant=True,
    )
    
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
        low_cpu_mem_usage=True
    )
    
    model = PeftModel.from_pretrained(
        model,
        f"./adapters/{task_name}_lora_adapter"
    )
    
    # Prepare input
    messages = [{"role": "user", "content": test_prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    inputs = tokenizer(text, return_tensors="pt")
    
    # Generate (will be slower on CPU)
    print("⏳ Generating response on CPU...")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    
    print(f"Prompt: {test_prompt}")
    print(f"\nResponse: {response}")
    print("\n" + "="*60)
    
    return response
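
`test_adapter` samples with `temperature=0.7` for every task. For scoring, classification and correction, sampling can return a different answer on each run, so greedy decoding may be a better fit there. The sketch below picks `generate()` keyword arguments per task. `generation_kwargs` and the task grouping are assumptions made for illustration, not part of the original pipeline.

```python
# Tasks where a reproducible answer matters more than variety (assumed grouping)
DETERMINISTIC_TASKS = {"fluency", "vocabulary", "grammar"}

def generation_kwargs(task_name, max_new_tokens=256):
    """Return keyword arguments for model.generate() suited to the task."""
    if task_name in DETERMINISTIC_TASKS:
        # Greedy decoding: the same prompt always yields the same output
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    # Open-ended tutor replies keep the sampling settings used in test_adapter
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }

print(generation_kwargs("fluency"))
print(generation_kwargs("dialogue"))
```

Inside `test_adapter`, the call would then read `model.generate(**inputs, **generation_kwargs(task_name), pad_token_id=tokenizer.eos_token_id)`.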

In [None]:
# Test Fluency Scoring
test_adapter(
    "fluency",
    "Rate the fluency of this sentence: She plays piano every day"
)

In [None]:
# Test Vocabulary Classification
test_adapter(
    "vocabulary",
    "Classify the vocabulary level: The presentation was incredibly sophisticated"
)

In [None]:
# Test Grammar Correction
test_adapter(
    "grammar",
    "Correct the grammar: He don't want to go there"
)

In [None]:
# Test Dialogue Generation
test_adapter(
    "dialogue",
    """Generate a tutor response:
Student Input: I likes playing basketball
Fluency Score: 0.60
Level: A2
Errors: Subject-verb agreement"""
)

## 8. Export for Production

### 8.1 Merge Adapters (Optional)

In [None]:
# Merge LoRA weights into base model (optional, for deployment)
def merge_and_save_adapter(task_name):
    """
    Merge LoRA adapter weights into base model and save
    This creates a standalone model without needing PEFT library
    """
    print(f"Merging {task_name} adapter...")
    
    # Load base + adapter
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True
    )
    
    model = PeftModel.from_pretrained(
        model,
        f"./adapters/{task_name}_lora_adapter"
    )
    
    # Merge and unload
    merged_model = model.merge_and_unload()
    
    # Save merged model
    output_path = f"./merged_models/{task_name}_merged"
    Path(output_path).mkdir(parents=True, exist_ok=True)
    
    merged_model.save_pretrained(output_path)
    tokenizer.save_pretrained(output_path)
    
    print(f"✓ Merged model saved to: {output_path}")

# Example: Merge fluency adapter
# merge_and_save_adapter("fluency")
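
Conceptually, merging folds the low-rank update into the frozen weight: `W_merged = W + (lora_alpha / r) * B @ A`. The toy sketch below shows this on plain nested lists and checks that the merged weight produces the same output as the base weight plus the adapter path. It illustrates the arithmetic only and is not how PEFT implements `merge_and_unload()` internally. All names and values are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (out=2, in=2)
A = [[1.0, 2.0]]              # LoRA A matrix (r=1, in=2)
B = [[0.5], [0.0]]            # LoRA B matrix (out=2, r=1)
x = [[3.0], [4.0]]            # input column vector

merged = merge_lora(W, A, B, alpha=2, r=1)

# Adapter path kept separate: W x + (alpha / r) * B (A x)
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged_out = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(merged)
print(matmul(merged, x), unmerged_out)
```

With the settings in this notebook (`lora_alpha=64`, `r=32`), the scale factor is also 2.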

### 8.2 Quantize for Mobile (Production Mode)

In [None]:
# Quantize to 4-bit for mobile deployment
# Note: This requires the optimum-quanto package
# !pip install optimum-quanto

# from optimum.quanto import quantize, qint4

# def quantize_for_mobile(task_name):
#     """Quantize merged model to 4-bit for mobile"""
#     model_path = f"./merged_models/{task_name}_merged"
#     model = AutoModelForCausalLM.from_pretrained(model_path)
#     
#     # Quantize
#     quantize(model, weights=qint4)
#     
#     # Save
#     output_path = f"./quantized_models/{task_name}_q4"
#     Path(output_path).mkdir(parents=True, exist_ok=True)
#     model.save_pretrained(output_path)
#     
#     print(f"✓ Quantized model saved to: {output_path}")

## 9. Summary & Next Steps

In [None]:
print("""
╔══════════════════════════════════════════════════════════════╗
║          FINE-TUNING PIPELINE COMPLETED                      ║
║              (Mac Intel i9 - CPU Mode)                       ║
╚══════════════════════════════════════════════════════════════╝

✓ Base Model: Qwen2.5-1.5B-Instruct (4-bit quantized)
✓ 4 LoRA Adapters Trained:
  1. Fluency Scoring (r=32, alpha=64)
  2. Vocabulary Classification (r=32, alpha=64)
  3. Grammar Correction (r=32, alpha=64)
  4. Dialogue Generation (r=32, alpha=64)

Saved Adapters:
  ./adapters/fluency_lora_adapter/
  ./adapters/vocabulary_lora_adapter/
  ./adapters/grammar_lora_adapter/
  ./adapters/dialogue_lora_adapter/

 MAC INTEL i9 CONSIDERATIONS:

CPU Training Performance:
  • 2-3x slower than M1/M2 with MPS
  • Batch size: 2 (vs 8 on GPU)
  • Gradient accumulation: 16 steps
  • Expected time: 2-3 hours per adapter (small dataset)

Optimizations Applied:
  ✓ 4-bit quantization (75% RAM saving)
  ✓ Gradient checkpointing enabled
  ✓ Multi-core dataloader (4 workers)
  ✓ Low CPU memory usage mode
  ✓ FP32 precision (CPU compatible)

NEXT STEPS:

1. Collect Real Data:
   - Fluency: 1,500-3,000 annotated sentences
   - Vocabulary: 2,500 CEFR-labeled sentences
   - Grammar: 2,000 error-correction pairs
   - Dialogue: 1,500 tutor-student conversations

2. Re-train with Full Dataset:
   - Expected training time: 8-12 hours per adapter on CPU
   - Consider using cloud GPU for faster training
   - Or train overnight with smaller learning rate

3. Evaluate Performance:
   - Fluency: MAE < 0.12, Pearson > 0.90
   - Vocabulary: Accuracy > 90%
   - Grammar: F0.5 > 68
   - Dialogue: Quality > 96%

4. Create Production Models:
   - Knowledge distillation to Qwen2.5-0.5B
   - LoRA rank: 16 (instead of 32)
   - Expected: 88-91% accuracy, 2x faster

5. Deploy:
   - Mobile: Quantized 0.5B model (~350MB)
   - Server: Full 1.5B model (~1GB)
   - Switch adapters in < 1ms

Memory Usage (Mac Intel):
  Development: 1GB storage, 4-6GB RAM (with 4-bit)
  Production: 350MB storage, 2GB RAM

💡 Tips for Mac Intel:
  - Close unnecessary apps to free RAM
  - Use Activity Monitor to check CPU utilization
  - Consider training with fewer epochs first (2-3)
  - Use wandb: set report_to="wandb" for monitoring
  - Train one adapter at a time to avoid memory issues
  - Consider using Google Colab GPU for faster iteration

🔧 Alternative: Use Cloud GPU
  - Google Colab: Free T4 GPU (~15-20 min per adapter)
  - Kaggle: Free GPU/TPU (~20-30 min per adapter)
  - AWS/Azure: Paid GPU instances (~10-15 min per adapter)

""")