# 🚀 ApplyVortex Qwen 2.5-7B Fine-Tuning

This notebook fine-tunes **Qwen 2.5-7B-Instruct** on the ApplyVortex dataset for:
- **Resume Parsing** (messy text → structured JSON)
- **Job Scoring** (profile + JD → match score with reasoning)
- **Resume Tailoring** (profile + JD → formatted resume)

### Requirements
- **Runtime**: GPU (T4 free tier works, A100 recommended)
- **Dataset**: `applyvortex_qwen_train.jsonl` (upload when prompted)

---

## 1️⃣ Install Dependencies

In [None]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

## 2️⃣ Upload Dataset

Upload your `applyvortex_qwen_train.jsonl` file when the file picker appears.

In [None]:
from google.colab import files
import os

# Check if dataset already exists
DATASET_PATH = "applyvortex_qwen_train.jsonl"

if not os.path.exists(DATASET_PATH):
    print("📂 Please upload your dataset file...")
    uploaded = files.upload()
    if DATASET_PATH not in uploaded:
        # Rename first uploaded file
        for name in uploaded.keys():
            os.rename(name, DATASET_PATH)
            break
    print(f"✅ Dataset uploaded: {DATASET_PATH}")
else:
    print(f"✅ Dataset already exists: {DATASET_PATH}")

# Quick validation
import json
with open(DATASET_PATH, 'r') as f:
    sample = json.loads(f.readline())
    print(f"\n📊 Sample keys: {sample.keys()}")
    print(f"📝 First message role: {sample['messages'][0]['role']}")
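
The quick check above inspects only the first record. A fuller pass over the file can catch malformed lines before training starts. The sketch below is a minimal standard-library validator; the function name `validate_chat_jsonl`, the allowed role set, and the rule that each conversation should end with an assistant turn are assumptions for illustration, not part of the ApplyVortex pipeline.

```python
import json

# Assumed role vocabulary; adjust if the dataset uses other role names.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples for a chat-format JSONL file."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, "missing or empty 'messages' list"))
                continue
            for i, msg in enumerate(messages):
                if not isinstance(msg, dict):
                    problems.append((lineno, f"message {i} is not an object"))
                elif msg.get("role") not in ALLOWED_ROLES:
                    problems.append((lineno, f"message {i} has unexpected role {msg.get('role')!r}"))
                elif not isinstance(msg.get("content"), str):
                    problems.append((lineno, f"message {i} has non-string content"))
            # Assumption: supervised examples should end with the assistant's answer.
            if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
                problems.append((lineno, "last message is not from the assistant"))
    return problems
```

Calling `validate_chat_jsonl(DATASET_PATH)` and printing the first few entries of the result is usually enough to spot systematic formatting issues.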

## 3️⃣ Load Base Model

In [None]:
from unsloth import FastLanguageModel
import torch

# Configuration
MODEL_NAME = "unsloth/Qwen2.5-7B-Instruct"
MAX_SEQ_LENGTH = 8192  # Extended context for long CVs and JDs
DTYPE = None  # Auto-detect (Float16 vs Bfloat16)
LOAD_IN_4BIT = True  # 4-bit quantization for memory efficiency

print(f"🔧 Loading {MODEL_NAME}...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=DTYPE,
    load_in_4bit=LOAD_IN_4BIT,
)
print("✅ Model loaded successfully!")
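
To see why 4-bit loading matters on a small GPU, a back-of-the-envelope estimate of the weight memory at different precisions helps. The sketch below assumes a total of about 7.61 billion parameters (the figure on the Qwen2.5-7B model card) and counts the weights only. It ignores quantization constants, layers that stay in 16-bit, activations, LoRA weights, and optimizer state, so real usage will be higher.

```python
# Assumed total parameter count for Qwen2.5-7B (model card figure).
PARAMS = 7.61e9


def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights need roughly 14 GiB at 16-bit and roughly 3.5 GiB at 4-bit, which is what leaves room for long-context activations on a 16 GB card.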

## 4️⃣ Configure LoRA Adapters

In [None]:
# LoRA Configuration
# r=64: Higher rank for deep behavioral adaptation (complex reasoning tasks)
# Target all key modules for full-network plasticity

model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # Higher rank for nuanced skill-requirement correlations
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha=16,  # LoRA scaling is lora_alpha / r = 0.25 when use_rslora=False
    lora_dropout=0,  # Optimized for Unsloth
    bias="none",
    use_gradient_checkpointing="unsloth",  # For very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

print("✅ LoRA adapters configured!")
# print_trainable_parameters() prints its own summary and returns None,
# so call it directly instead of embedding it in an f-string.
model.print_trainable_parameters()
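
The trainable-parameter count reported above can be sanity-checked by hand: each adapted linear layer gains two low-rank matrices, adding `r * (in_features + out_features)` parameters. The sketch below assumes the published Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128). These values come from the model config, not from this notebook, so treat them as assumptions.

```python
# Assumed Qwen2.5-7B dimensions (from the published model config).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

# (in_features, out_features) for each LoRA target module in one layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank):
    """Each adapted layer adds an A matrix (rank x in) and a B matrix (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in MODULE_SHAPES.values())
    return per_layer * NUM_LAYERS


print(f"r=64 trainable LoRA parameters: {lora_param_count(64):,}")
```

Because the count is linear in the rank, dropping to `r=16` would cut the adapter size to a quarter, which is one lever to pull if memory is tight.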

## 5️⃣ Prepare Dataset

In [None]:
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template

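# Configure ChatML template (Qwen's native format).
# The dataset stores turns under "role"/"content" keys (see the validation in step 2),
# so the mapping uses those names rather than ShareGPT-style "from"/"value".
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
    mapping={"role": "role", "content": "content", "user": "user", "assistant": "assistant"},
)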

def formatting_prompts_func(examples):
    """Applies ChatML template to the batch."""
    convos = examples["messages"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}

print("📂 Loading dataset...")
dataset = load_dataset("json", data_files=DATASET_PATH, split="train")

# Validate structure
if "messages" not in dataset.column_names:
    raise KeyError("Dataset missing 'messages' column!")

dataset = dataset.map(formatting_prompts_func, batched=True)
print(f"✅ Dataset ready! {len(dataset)} samples loaded.")

# Preview a sample
print("\n--- Sample Preview ---")
print(dataset[0]["text"][:500] + "...")
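
For readers unfamiliar with ChatML, the text produced by the template is simply each turn wrapped in `<|im_start|>` and `<|im_end|>` markers. The sketch below is a hand-written approximation of that layout for illustration only. The tokenizer's own template remains authoritative and may differ in details, such as inserting a default system message. The function name `render_chatml` is hypothetical.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages in the ChatML layout (illustrative approximation)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # At inference time the prompt ends with an open assistant turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "Return ONLY valid JSON."},
    {"role": "user", "content": "John Smith, Python developer"},
]
print(render_chatml(example, add_generation_prompt=True))
```

Comparing this output with the sample preview printed above is a quick way to confirm that the template was applied as expected.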

## 6️⃣ Configure Training

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    dataset_num_proc=2,
    packing=False,  # Disabled for long documents
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Small batch for 8k context
        gradient_accumulation_steps=4,   # Effective batch size = 8
        warmup_ratio=0.05,
        num_train_epochs=2,              # Full coverage of 5k dataset
        learning_rate=5e-5,              # Conservative to preserve reasoning
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        optim="adamw_8bit",              # Memory efficient optimizer
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="outputs",
        report_to="none",                # Disable W&B for simplicity
    ),
)

print("✅ Trainer configured!")
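
The batch, epoch, and warmup settings above translate into a concrete number of optimizer steps. The sketch below works through that arithmetic and the shape of a warmup-plus-cosine schedule. It assumes a single GPU and exactly 5,000 samples (the "5k" in the comment above is approximate), and the rounding may differ slightly from what the trainer does internally.

```python
import math


def training_plan(num_samples, per_device_batch, grad_accum, epochs, warmup_ratio):
    """Return (effective_batch, total_steps, warmup_steps) for a single-GPU run."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


effective, total, warmup = training_plan(5000, 2, 4, 2, 0.05)
print(f"effective batch={effective}, total steps={total}, warmup steps={warmup}")
for step in (0, warmup, total // 2, total):
    print(f"step {step:>4}: lr={cosine_lr(step, total, warmup, 5e-5):.2e}")
```

With `logging_steps=10`, a run of this size would emit a little over a hundred loss values, which is enough to judge whether the curve is still falling at the end of the second epoch.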

## 7️⃣ Start Training 🏋️

In [None]:
# GPU Stats before training
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

print(f"🖥️ GPU: {gpu_stats.name}")
print(f"📊 Max Memory: {max_memory} GB")
print(f"📊 Reserved: {start_gpu_memory} GB")
print("\n" + "="*50)
print("🚀 TRAINING STARTED...")
print("="*50 + "\n")

trainer_stats = trainer.train()

print("\n" + "="*50)
print("✅ TRAINING COMPLETE!")
print("="*50)

## 8️⃣ Training Stats

In [None]:
# Final memory stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)

print("\n📊 Training Statistics")
print(f"{'='*40}")
print(f"Peak Memory Used: {used_memory} GB ({used_percentage}%)")
print(f"LoRA Memory Used: {used_memory_for_lora} GB ({lora_percentage}%)")
print(f"Training Time: {trainer_stats.metrics['train_runtime']:.2f} seconds")
print(f"Samples/Second: {trainer_stats.metrics['train_samples_per_second']:.2f}")

## 9️⃣ Save Model to Google Drive

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Save paths
LOCAL_OUTPUT_DIR = "ApplyVortex-Qwen2.5-7B-Adapter"
DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/ApplyVortex-Models/ApplyVortex-Qwen2.5-7B-Adapter"

print(f"💾 Saving model locally to {LOCAL_OUTPUT_DIR}...")
model.save_pretrained(LOCAL_OUTPUT_DIR)
tokenizer.save_pretrained(LOCAL_OUTPUT_DIR)

print(f"☁️ Copying to Google Drive: {DRIVE_OUTPUT_DIR}...")
!mkdir -p "{DRIVE_OUTPUT_DIR}"
!cp -r {LOCAL_OUTPUT_DIR}/* "{DRIVE_OUTPUT_DIR}/"

print("✅ Model saved to Google Drive!")

## 🔟 Test the Fine-Tuned Model

In [None]:
# Enable inference mode
FastLanguageModel.for_inference(model)

# Test Resume Parsing
test_resume = """John Smith
john.smith@email.com | +1-555-0123
San Francisco, CA

Senior Software Engineer with 5+ years of Python and AWS experience.

EXPERIENCE
TechCorp - Senior Software Engineer
2021-01 - Present
- Built microservices using Python and FastAPI
- Reduced latency by 40% through Redis caching

EDUCATION
Stanford University - MS Computer Science

SKILLS
Python, AWS, Docker, Kubernetes, PostgreSQL
"""

messages = [
    {"role": "system", "content": "You are a specialized Resume Parsing Engine. Extract candidate data into strict ApplyVortex JSON schema. Return ONLY valid JSON."},
    {"role": "user", "content": test_resume}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

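outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=2048,
    use_cache=True,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.1,
)

# Decode only the newly generated tokens so the prompt is not echoed back.
# Splitting the full text on "assistant" would break if the reply contains that word.
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print("🤖 Model Response:")
print(response)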
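
Since the system prompt asks for JSON only, it is worth checking that the reply actually parses before passing it downstream. The sketch below is one minimal way to do that with the standard library. The helper name `parse_model_json` and the field names in the example reply are hypothetical, since the ApplyVortex schema is not shown in this notebook.

```python
import json


def parse_model_json(text):
    """Return the JSON object found in a model reply, or None if it does not parse."""
    # Tolerate stray text or Markdown fences around the object by slicing
    # from the first opening brace to the last closing brace.
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Hypothetical reply; real field names depend on the ApplyVortex schema.
reply = 'Here is the result: {"full_name": "John Smith", "skills": ["Python", "AWS"]}'
result = parse_model_json(reply)
print("valid JSON" if result is not None else "could not parse reply")
```

Running `parse_model_json(response)` over a handful of held-out resumes gives a rough parse-success rate, which is a cheap first signal of whether fine-tuning worked.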

## 1️⃣1️⃣ Export to GGUF (Optional - for Ollama/llama.cpp)

In [None]:
# Uncomment to export to GGUF format for local inference with Ollama

# # Unsloth treats the first argument as a save directory, not a file name
# GGUF_OUTPUT_DIR = "ApplyVortex-Qwen2.5-7B-GGUF"
#
# model.save_pretrained_gguf(
#     GGUF_OUTPUT_DIR,
#     tokenizer,
#     quantization_method="q4_k_m"  # Good balance of quality and size
# )
#
# # Copy to Drive (requires the Drive mount from step 9)
# !cp -r {GGUF_OUTPUT_DIR} "/content/drive/MyDrive/ApplyVortex-Models/"
# print("✅ GGUF exported to Google Drive!")

---

## ✅ All Done!

Your fine-tuned model is saved to:
- **Google Drive**: `/MyDrive/ApplyVortex-Models/ApplyVortex-Qwen2.5-7B-Adapter/`

### Next Steps:
1. Download the adapter from Google Drive
2. Load it in your ApplyVortex agent using `peft` or convert to GGUF for Ollama
3. Update your agent's `config.py` to point to the new model
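
For step 2, loading the adapter with `peft` might look like the sketch below. The function name `load_finetuned` and its arguments are hypothetical, and the default base model is assumed to be the same checkpoint used for training above. Loading the base model in 16-bit needs considerably more memory than the 4-bit setup used in this notebook.

```python
def load_finetuned(adapter_dir, base_model="unsloth/Qwen2.5-7B-Instruct"):
    """Attach the saved LoRA adapter to the base model for inference."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer was saved alongside the adapter in step 9.
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype="auto",
        device_map="auto",  # requires the accelerate package
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return model, tokenizer
```

A call such as `load_finetuned("ApplyVortex-Qwen2.5-7B-Adapter")` would then return a model and tokenizer that can be used with the same `apply_chat_template` and `generate` calls shown in the test section.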

---
*Generated by ApplyVortex Training Pipeline*