# 03 - Stage 3: LLM Judge with LoRA Fine-Tuning

## Overview
This notebook implements Stage 3 - the final scoring layer with explainability:
- **Stage 3: LLM Judge with Structured Outputs**
- Load Llama-3.2-1B or Mistral-7B with 4-bit quantization
- Fine-tune with LoRA for resume-JD matching
- Generate structured JSON outputs with scores + explanations
- Provide human-readable justifications for hiring decisions

**‚ö†Ô∏è GPU REQUIRED**: This notebook needs at least T4 GPU (15GB VRAM)

**Key Features**:
- üß† Explainable AI: Not just scores, but reasoning
- üíæ Memory efficient: 4-bit quantization + LoRA (no full model fine-tuning)
- üìã Structured outputs: Force JSON format for parsing
- üéØ Domain adaptation: Fine-tune on resume screening task

**Estimated Time**: 2-4 hours (depending on GPU)

## 1. Environment Setup & GPU Check

In [1]:
# Check runtime environment (Google Colab only)
import sys
import os

IN_COLAB = 'google.colab' in sys.modules

print(f"Running in Google Colab: {IN_COLAB}")
if not IN_COLAB:
    print("‚ö†Ô∏è WARNING: This notebook is designed for Google Colab")
    print("‚ö†Ô∏è This notebook requires significant GPU resources")
print(f"Python version: {sys.version}")

# Check GPU availability (Required for LLM training)
import torch
print(f"\nGPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

    if torch.cuda.get_device_properties(0).total_memory / 1024**3 < 15:
        print("\n‚ö†Ô∏è WARNING: Less than 15GB GPU memory detected")
        print("   Consider using Colab Pro or reducing batch size")
else:
    print("\n‚ùå ERROR: GPU is required for this notebook!")
    print("   In Colab: Runtime ‚Üí Change runtime type ‚Üí GPU")
    raise RuntimeError("GPU required for LLM fine-tuning")

# Check if using high RAM runtime
import psutil
ram_gb = psutil.virtual_memory().total / (1024**3)
print(f"\nSystem RAM: {ram_gb:.1f} GB")
if ram_gb < 12:
    print("‚ö†Ô∏è WARNING: Consider using High-RAM runtime in Colab")

Running in Google Colab: True
Python version: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]

GPU available: True
Device: Tesla T4
Memory: 14.74 GB

   Consider using Colab Pro or reducing batch size

System RAM: 12.7 GB


## 2. Install Required Packages

In [2]:
%%capture
# Install all required packages for LLM fine-tuning
print("Installing packages (this may take 2-3 minutes)...")

# Fix pyarrow binary incompatibility first
!pip install -q --force-reinstall pyarrow

!pip install -q -U transformers
!pip install -q -U peft
!pip install -q -U trl
!pip install -q -U datasets
!pip install -q -U accelerate
!pip install -q -U bitsandbytes
!pip install -q -U scipy

# Utility packages
!pip install -q pandas numpy scikit-learn tqdm

print("‚úÖ All packages installed!")

In [3]:
# Load configuration from previous notebooks (Google Drive)
from pathlib import Path

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    BASE_PATH = Path('/content/drive/MyDrive/resume_screening_project')
    print(f"‚úÖ Using Google Drive: {BASE_PATH}")
else:
    print("‚ö†Ô∏è Not running in Colab - using local fallback")
    BASE_PATH = Path('./resume_screening_project')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ Using Google Drive: /content/drive/MyDrive/resume_screening_project


In [22]:
# Import libraries
import torch
import transformers
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from trl import SFTTrainer, SFTConfig  # Use SFTConfig instead of TrainingArguments
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    PeftModel
)
from datasets import Dataset

import pandas as pd
import numpy as np
import json
import pickle
from pathlib import Path
from typing import List, Dict
import warnings
from tqdm.auto import tqdm

warnings.filterwarnings('ignore')

print(f"‚úÖ torch version: {torch.__version__}")
print(f"‚úÖ transformers version: {transformers.__version__}")
print(f"‚úÖ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"‚úÖ GPU: {torch.cuda.get_device_name(0)}")
    print(f"‚úÖ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

‚úÖ torch version: 2.9.0+cu126
‚úÖ transformers version: 5.0.0
‚úÖ CUDA available: True
‚úÖ GPU: Tesla T4
‚úÖ VRAM: 14.7 GB


## 3. Load Configuration and Stage 2 Results

In [5]:
# Setup paths
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    BASE_PATH = Path('/content/drive/MyDrive/resume_screening_project')
    print(f"‚úÖ Using Google Drive: {BASE_PATH}")
else:
    print("‚ö†Ô∏è Not running in Colab - using local fallback")
    BASE_PATH = Path('./resume_screening_project')

MODELS_PATH = BASE_PATH / 'models'
STAGE2_PATH = MODELS_PATH / 'stage2_reranker'
STAGE3_PATH = MODELS_PATH / 'stage3_llm_judge'
OUTPUTS_PATH = BASE_PATH / 'outputs'

STAGE3_PATH.mkdir(parents=True, exist_ok=True)

print(f"üìÅ Working Directory: {BASE_PATH}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ Using Google Drive: /content/drive/MyDrive/resume_screening_project
üìÅ Working Directory: /content/drive/MyDrive/resume_screening_project


In [6]:
# Load Stage 2 results
cache_path = STAGE2_PATH / 'reranking_cache.pkl'

if cache_path.exists():
    print("Loading Stage 2 re-ranking results...")
    with open(cache_path, 'rb') as f:
        stage2_cache = pickle.load(f)

    job_descriptions = stage2_cache['job_descriptions']
    stage2_results = stage2_cache['reranked_results']

    print(f"‚úÖ Loaded Stage 2 results")
    print(f"   - Job descriptions: {len(job_descriptions)}")
    print(f"   - Top candidates per JD: {len(stage2_results[0])}")
else:
    print("‚ö†Ô∏è Stage 2 cache not found. Creating sample data...")
    job_descriptions = ["ML Engineer with Python experience"]
    stage2_results = [[{
        'resume_text': f'Candidate {i} with ML skills',
        'stage2_score': 0.8 - i*0.01
    } for i in range(10)]]

Loading Stage 2 re-ranking results...
‚úÖ Loaded Stage 2 results
   - Job descriptions: 3
   - Top candidates per JD: 100


## 4. Prepare Training Data

Format: Chat template with system, user, and assistant messages

In [7]:
# FIX #2: LLM Hallucination Prevention
print("=" * 60)
print("FIX #2: PREVENTING LLM HALLUCINATIONS")
print("=" * 60)
print("\nüîç Problem: LLM might claim 'candidate has 5 years AWS experience'")
print("   when resume only mentions AWS once in a skills list.")
print("\n‚úÖ Solution: Extract verifiable facts FIRST, then validate LLM output")

import re
from typing import Dict, List, Set

def extract_resume_facts(resume_text: str) -> Dict[str, any]:
    """
    Extract verifiable facts from resume using regex and NLP.
    This creates a 'ground truth' to validate LLM claims.
    """
    facts = {
        'skills': set(),
        'years_experience': {},
        'education': [],
        'certifications': [],
        'technologies': set(),
    }

    text_lower = resume_text.lower()

    # Extract skills (common tech keywords)
    tech_keywords = [
        'python', 'java', 'javascript', 'typescript', 'c\\+\\+', 'c#', 'ruby', 'go', 'rust',
        'aws', 'azure', 'gcp', 'docker', 'kubernetes', 'terraform',
        'sql', 'postgresql', 'mongodb', 'redis', 'elasticsearch',
        'react', 'angular', 'vue', 'django', 'flask', 'spring',
        'machine learning', 'deep learning', 'nlp', 'computer vision',
        'agile', 'scrum', 'ci/cd', 'git', 'jenkins'
    ]

    for keyword in tech_keywords:
        if re.search(r'\b' + keyword + r'\b', text_lower):
            facts['skills'].add(keyword.replace('\\+\\+', '++').replace('\\#', '#'))

    # Extract years of experience (e.g., "5 years Python", "3+ years AWS")
    exp_patterns = [
        r'(\d+)\+?\s*years?\s+(?:of\s+)?(?:experience\s+)?(?:with\s+)?(\w+)',
        r'(\w+)\s*[:-]\s*(\d+)\+?\s*years?',
    ]

    for pattern in exp_patterns:
        matches = re.findall(pattern, text_lower)
        for match in matches:
            if match[0].isdigit():
                years, tech = int(match[0]), match[1]
            else:
                tech, years = match[0], int(match[1])
            facts['years_experience'][tech] = max(
                facts['years_experience'].get(tech, 0),
                years
            )

    # Extract education (degrees)
    education_patterns = [
        r'(bachelor|master|phd|doctorate)(?:\'s|\s+of\s+\w+)?\s+(?:degree\s+)?(?:in\s+)?(\w+(?:\s+\w+){0,3})',
        r'(b\.?s\.?|m\.?s\.?|ph\.?d\.?)\s+(?:in\s+)?(\w+(?:\s+\w+){0,2})',
    ]

    for pattern in education_patterns:
        matches = re.findall(pattern, text_lower)
        for degree, field in matches:
            facts['education'].append(f"{degree.upper()} in {field.title()}")

    # Extract certifications
    cert_keywords = ['certified', 'certification', 'certificate']
    for keyword in cert_keywords:
        if keyword in text_lower:
            # Extract surrounding context
            pattern = r'([\w\s-]+\s+' + keyword + r'[\w\s-]+)'
            matches = re.findall(pattern, text_lower)
            facts['certifications'].extend(matches[:5])  # Limit to 5

    return facts

def verify_llm_claims(llm_output: str, resume_facts: Dict[str, any]) -> Dict[str, any]:
    """
    Verify LLM's claims against extracted resume facts.
    Flags hallucinations and unsupported claims.
    """
    llm_lower = llm_output.lower()
    issues = []
    verified_claims = []

    # Check if LLM claims specific years of experience
    exp_claims = re.findall(r'(\d+)\s*years?\s+(?:of\s+)?(\w+)\s+experience', llm_lower)
    for years_str, tech in exp_claims:
        years_claimed = int(years_str)
        tech_clean = tech.strip()

        actual_years = resume_facts['years_experience'].get(tech_clean, 0)

        if actual_years == 0 and tech_clean not in resume_facts['skills']:
            issues.append({
                'type': 'HALLUCINATION',
                'claim': f"{years_claimed} years of {tech} experience",
                'evidence': f"{tech} not found in resume",
                'severity': 'HIGH'
            })
        elif actual_years > 0 and years_claimed != actual_years:
            issues.append({
                'type': 'EXAGGERATION',
                'claim': f"{years_claimed} years of {tech}",
                'evidence': f"Resume states {actual_years} years",
                'severity': 'MEDIUM'
            })
        else:
            verified_claims.append(f"{years_claimed} years {tech}")

    # Check if LLM claims skills not in resume
    for skill in resume_facts['skills']:
        if skill in llm_lower:
            verified_claims.append(f"Has {skill} skill")

    # Calculate trust score
    total_claims = len(verified_claims) + len(issues)
    trust_score = len(verified_claims) / total_claims if total_claims > 0 else 1.0

    return {
        'trust_score': trust_score,
        'verified_claims': verified_claims,
        'hallucinations': issues,
        'is_trustworthy': trust_score > 0.7 and len([i for i in issues if i['severity'] == 'HIGH']) == 0
    }

# Prepare training dataset with fact-checking
def create_training_examples_v2(jd_list, candidates_list, max_examples=500):
    """
    Create training examples with FACT-GROUNDED explanations (no hallucinations).
    """
    examples = []

    for jd, candidates in zip(jd_list, candidates_list):
        top_candidates = candidates[:5]
        bottom_candidates = candidates[-5:]

        for cand_list, is_positive in [(top_candidates, True), (bottom_candidates, False)]:
            for cand in cand_list:
                # EXTRACT FACTS FIRST
                resume_facts = extract_resume_facts(cand['resume_text'])

                score = int(cand.get('stage2_score', 0.5) * 100)

                # Build EVIDENCE-BASED explanation
                skills_found = list(resume_facts['skills'])[:5]
                experience_claims = [f"{years}+ years in {tech}"
                                   for tech, years in resume_facts['years_experience'].items()]

                if is_positive:
                    explanation = f"Strong match based on verified qualifications. "
                    if skills_found:
                        explanation += f"Resume demonstrates: {', '.join(skills_found)}. "
                    if experience_claims:
                        explanation += f"Experience: {'; '.join(experience_claims[:3])}. "
                    explanation += f"Overall score: {score}/100"
                else:
                    explanation = f"Limited match. "
                    if skills_found:
                        explanation += f"Found skills: {', '.join(skills_found[:3])}, but missing key requirements. "
                    else:
                        explanation += "Critical skills not evidenced in resume. "
                    explanation += f"Score: {score}/100"

                # Format as chat with FACT-GROUNDED responses
                conversation = [
                    {
                        "role": "system",
                        "content": "You are an expert technical recruiter. ONLY make claims that are directly supported by the resume text. Never invent or exaggerate qualifications. Provide match analysis with extracted facts."
                    },
                    {
                        "role": "user",
                        "content": f"Job Description:\n{jd[:500]}\n\nResume:\n{cand['resume_text'][:500]}\n\nProvide factual match analysis."
                    },
                    {
                        "role": "assistant",
                        "content": json.dumps({
                            "match_score": score,
                            "explanation": explanation,
                            "verified_skills": list(skills_found),
                            "experience_evidence": experience_claims,
                            "education": resume_facts['education'],
                            "recommendation": "Recommended" if is_positive else "Not recommended"
                        }, indent=2)
                    }
                ]

                examples.append({"messages": conversation})

                if len(examples) >= max_examples:
                    return examples

    return examples

print("\nüìä Testing Fact Extraction & Verification:")

sample_resume = """
Senior Software Engineer with 5 years Python experience and 3 years AWS.
Master's degree in Computer Science. Certified Kubernetes Administrator.
Built ML models using TensorFlow and PyTorch. Experience with Docker, CI/CD.
"""

sample_llm_output = """
Excellent candidate with 8 years of Python experience and 5 years of AWS expertise.
Has extensive experience with blockchain and Rust programming.
"""

facts = extract_resume_facts(sample_resume)
print(f"\n‚úÖ Extracted Facts:")
print(f"   Skills: {facts['skills']}")
print(f"   Experience: {facts['years_experience']}")
print(f"   Education: {facts['education']}")

verification = verify_llm_claims(sample_llm_output, facts)
print(f"\nüîç LLM Verification:")
print(f"   Trust Score: {verification['trust_score']:.2%}")
print(f"   Verified Claims: {verification['verified_claims']}")
print(f"   Hallucinations: {verification['hallucinations']}")
print(f"   Is Trustworthy: {verification['is_trustworthy']}")

print("\nüéØ Creating fact-grounded training data...")
training_examples = create_training_examples_v2(job_descriptions, stage2_results, max_examples=500)

print(f"‚úÖ Created {len(training_examples)} HALLUCINATION-FREE training examples")
print(f"\nüí° All training examples now use verified facts from resumes!")


FIX #2: PREVENTING LLM HALLUCINATIONS

üîç Problem: LLM might claim 'candidate has 5 years AWS experience'
   when resume only mentions AWS once in a skills list.

‚úÖ Solution: Extract verifiable facts FIRST, then validate LLM output

üìä Testing Fact Extraction & Verification:

‚úÖ Extracted Facts:
   Skills: {'kubernetes', 'docker', 'ci/cd', 'python', 'aws'}
   Experience: {'python': 5, 'aws': 3}
   Education: ['MASTER in Computer Science']

üîç LLM Verification:
   Trust Score: 66.67%
   Verified Claims: ['Has python skill', 'Has aws skill']
   Hallucinations: [{'type': 'EXAGGERATION', 'claim': '8 years of python', 'evidence': 'Resume states 5 years', 'severity': 'MEDIUM'}]
   Is Trustworthy: False

üéØ Creating fact-grounded training data...
‚úÖ Created 30 HALLUCINATION-FREE training examples

üí° All training examples now use verified facts from resumes!


In [8]:
# Convert to HuggingFace Dataset
dataset = Dataset.from_list(training_examples)

# Train/eval split
dataset = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = dataset['train']
eval_dataset = dataset['test']

print(f"Dataset splits:")
print(f"  Train: {len(train_dataset)} examples")
print(f"  Eval:  {len(eval_dataset)} examples")

Dataset splits:
  Train: 27 examples
  Eval:  3 examples


## 5. Load Base Model with 4-bit Quantization

Using Llama-3.2-1B for speed (or Mistral-7B for quality)

In [9]:
# Model selection (choose based on VRAM)
# MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"  # Requires HF access token
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Open alternative
# MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # Better quality, needs 16GB+ VRAM

print(f"Loading model: {MODEL_NAME}")
print("This may take a few minutes...\n")

Loading model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
This may take a few minutes...



In [10]:
# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print(f"‚úÖ Model loaded")
print(f"   Model size: {model.get_memory_footprint() / 1024**3:.2f} GB")
print(f"   Vocab size: {len(tokenizer)}")

config.json:   0%|          | 0.00/608 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.20G [00:00<?, ?B/s]

Loading weights:   0%|          | 0/201 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]



tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

‚úÖ Model loaded
   Model size: 0.70 GB
   Vocab size: 32000


## 6. Setup LoRA Configuration

In [12]:
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# LoRA config - OPTIMIZED FOR T4 GPU (Reduced rank for memory efficiency)
print("‚öôÔ∏è Configuring LoRA for T4 GPU (reduced rank=8 for memory efficiency)\n")

lora_config = LoraConfig(
    r=8,  # LoRA rank (reduced from 16 for T4 GPU)
    lora_alpha=16,  # Scaling factor (reduced proportionally)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Attention layers
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print(f"‚úÖ LoRA applied")
print(f"   Trainable params: {trainable_params:,} ({trainable_params/total_params*100:.2f}%)")
print(f"   Total params: {total_params:,}")
print(f"\nüí° Only {trainable_params/total_params*100:.2f}% of model is being fine-tuned!")

‚öôÔ∏è Configuring LoRA for T4 GPU (reduced rank=8 for memory efficiency)

‚úÖ LoRA applied
   Trainable params: 2,252,800 (0.36%)
   Total params: 617,859,072

üí° Only 0.36% of model is being fine-tuned!


## 7. Training Configuration

In [27]:
# Training arguments - OPTIMIZED FOR TESLA T4 (14.74 GB VRAM)
print("‚öôÔ∏è Configuring training for Tesla T4 GPU...")
print("   VRAM: 14.74 GB - Using conservative settings to avoid OOM\n")

# Use SFTConfig instead of TrainingArguments for compatibility with TRL
training_args = SFTConfig(
    output_dir=str(STAGE3_PATH / 'checkpoints'),
    num_train_epochs=3,
    per_device_train_batch_size=1,  # Reduced to 1 for T4 GPU
    per_device_eval_batch_size=1,   # Reduced to 1 for T4 GPU
    gradient_accumulation_steps=8,  # Increased to maintain effective batch size = 8
    learning_rate=2e-4,
    warmup_steps=50,  # Reduced proportionally
    logging_steps=5,
    save_steps=50,
    eval_steps=50,
    eval_strategy="steps",  # Changed from evaluation_strategy for newer transformers
    save_total_limit=2,  # Reduced to save disk space
    bf16=True,  # Use BFloat16 to match quantization config
    optim="paged_adamw_8bit",  # Memory efficient optimizer
    load_best_model_at_end=True,
    report_to="none",  # Change to "wandb" if using W&B
    gradient_checkpointing=True,  # Enable gradient checkpointing for memory savings
    max_grad_norm=0.3,  # Gradient clipping for stability
)

print("Training configuration:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Total steps: ~{len(train_dataset) // (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps) * training_args.num_train_epochs}")
print(f"  Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")

‚öôÔ∏è Configuring training for Tesla T4 GPU...
   VRAM: 14.74 GB - Using conservative settings to avoid OOM

Training configuration:
  Epochs: 3
  Batch size: 1
  Total steps: ~9
  Gradient accumulation: 8
  Learning rate: 0.0002
  Effective batch size: 8


## 8. Initialize SFTTrainer

In [28]:
# Format messages for training
def formatting_prompts_func(example):
    """Convert messages to text format for training."""
    output_texts = []
    for messages in example["messages"]:
        # Use tokenizer to format chat template
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=False
        )
        output_texts.append(text)
    return output_texts

# Initialize trainer (compatible with newer TRL API)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=formatting_prompts_func,
)

print("‚úÖ Trainer initialized")

Applying formatting function to train dataset:   0%|          | 0/27 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/27 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/27 [00:00<?, ? examples/s]

Applying formatting function to eval dataset:   0%|          | 0/3 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/3 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/3 [00:00<?, ? examples/s]

‚úÖ Trainer initialized


## 9. Fine-Tune Model

‚è∞ This will take 2-4 hours depending on GPU

In [30]:
# Clear CUDA cache
if torch.cuda.is_available():
    torch.cuda.empty_cache()

print("Starting fine-tuning...")
print("This will take 2-4 hours. Monitor GPU memory in another terminal with: watch -n 1 nvidia-smi")
print("\n" + "="*80)

# Train!
trainer.train()

print("\n" + "="*80)
print("‚úÖ Fine-tuning complete!")

Starting fine-tuning...
This will take 2-4 hours. Monitor GPU memory in another terminal with: watch -n 1 nvidia-smi



Step,Training Loss,Validation Loss



‚úÖ Fine-tuning complete!


## 10. Save LoRA Adapters

In [31]:
# Save LoRA adapters (only ~50MB!)
lora_path = STAGE3_PATH / 'lora_adapters'
model.save_pretrained(lora_path)
tokenizer.save_pretrained(lora_path)

print(f"‚úÖ LoRA adapters saved to: {lora_path}")

# Check size
import os
size_mb = sum(f.stat().st_size for f in lora_path.rglob('*') if f.is_file()) / 1024**2
print(f"   Size: {size_mb:.1f} MB (vs {model.get_memory_footprint() / 1024**2:.0f} MB for full model)")

‚úÖ LoRA adapters saved to: /content/drive/MyDrive/resume_screening_project/models/stage3_llm_judge/lora_adapters
   Size: 7.8 MB (vs 967 MB for full model)


## 11. Test Inference

In [32]:
# Test inference function
def generate_match_analysis(jd: str, resume: str, model, tokenizer, max_new_tokens=512):
    """
    Generate structured match analysis.
    """
    messages = [
        {"role": "system", "content": "You are an expert technical recruiter. Analyze the job description and resume, then provide a match score (0-100) with detailed justification in JSON format."},
        {"role": "user", "content": f"Job Description:\n{jd[:500]}\n\nResume:\n{resume[:500]}\n\nProvide match analysis in JSON format with keys: match_score, explanation, key_strengths, gaps, recommendation."}
    ]

    # Format
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract JSON (simple approach)
    try:
        json_start = response.find('{')
        json_end = response.rfind('}') + 1
        if json_start != -1 and json_end > json_start:
            json_str = response[json_start:json_end]
            return json.loads(json_str)
    except:
        pass

    return {"raw_response": response}

print("‚úÖ Inference function defined")

‚úÖ Inference function defined


In [33]:
# Test on sample
test_jd = job_descriptions[0]
test_resume = stage2_results[0][0]['resume_text']

print("Testing inference...\n")
print("JD:", test_jd[:200], "...\n")
print("Resume:", test_resume[:200], "...\n")
print("="*80)

result = generate_match_analysis(test_jd, test_resume, model, tokenizer)

print("\nLLM Output:")
print(json.dumps(result, indent=2))

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Testing inference...

JD: 
    Senior Machine Learning Engineer
    
    We are seeking an experienced ML engineer with strong Python skills,
    deep learning expertise (PyTorch/TensorFlow), and production deployment experien ...

Resume: NAME waters fall philadelphia pa 1 PHONE experience franecki schroeder NAME pa data science consultant 102019 present extensive experience statistical modelling techniques experience connecting tablea ...


LLM Output:
{
  "raw_response": "<|system|>\nYou are an expert technical recruiter. Analyze the job description and resume, then provide a match score (0-100) with detailed justification in JSON format. \n<|user|>\nJob Description:\n\n    Senior Machine Learning Engineer\n    \n    We are seeking an experienced ML engineer with strong Python skills,\n    deep learning expertise (PyTorch/TensorFlow), and production deployment experience.\n    Must have 5+ years experience building and deploying ML models at scale.\n    Experience with transformers,

## 12. Batch Scoring

In [34]:
# Score top-10 candidates for each JD
print("Generating LLM explanations for top candidates...\n")

llm_results = []

for jd, candidates in tqdm(zip(job_descriptions, stage2_results), total=len(job_descriptions)):
    jd_results = []

    for cand in candidates[:10]:  # Top 10 only
        analysis = generate_match_analysis(jd, cand['resume_text'], model, tokenizer)

        jd_results.append({
            **cand,
            'llm_analysis': analysis
        })

    llm_results.append(jd_results)

print(f"\n‚úÖ Generated LLM explanations for {len(llm_results)} JDs")

Generating LLM explanations for top candidates...



  0%|          | 0/3 [00:00<?, ?it/s]


‚úÖ Generated LLM explanations for 3 JDs


In [35]:
# Save results
stage3_cache = {
    'job_descriptions': job_descriptions,
    'llm_results': llm_results,
    'model_name': MODEL_NAME,
    'timestamp': pd.Timestamp.now().isoformat(),
}

cache_path = STAGE3_PATH / 'llm_results_cache.pkl'
with open(cache_path, 'wb') as f:
    pickle.dump(stage3_cache, f)

print(f"üíæ Stage 3 results saved to: {cache_path}")

üíæ Stage 3 results saved to: /content/drive/MyDrive/resume_screening_project/models/stage3_llm_judge/llm_results_cache.pkl


## 13. Summary

In [36]:
print("="*80)
print(" " * 20 + "STAGE 3: LLM JUDGE FINE-TUNING COMPLETE")
print("="*80)

print("\nüìä Summary:")
print(f"   - Base model: {MODEL_NAME}")
print(f"   - Training examples: {len(train_dataset)}")
print(f"   - LoRA rank: {lora_config.r}")
print(f"   - Trainable params: {trainable_params/total_params*100:.2f}%")

print("\nüíæ Saved Artifacts:")
print(f"   - LoRA adapters: {lora_path}")
print(f"   - LLM results: {cache_path.name}")
print(f"   - Checkpoints: {STAGE3_PATH / 'checkpoints'}")

print("\nüéØ Key Features:")
print("   ‚úì Explainable AI: Scores with detailed reasoning")
print("   ‚úì Structured outputs: JSON format for parsing")
print("   ‚úì Memory efficient: 4-bit quantization + LoRA")
print("   ‚úì Domain adapted: Fine-tuned on resume screening")

print("\n‚úÖ Ready for Stage 4: Full Pipeline Integration")
print("   üëâ Open: 04_full_pipeline_integration.ipynb")
print("="*80)

                    STAGE 3: LLM JUDGE FINE-TUNING COMPLETE

üìä Summary:
   - Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
   - Training examples: 27
   - LoRA rank: 8
   - Trainable params: 0.36%

üíæ Saved Artifacts:
   - LoRA adapters: /content/drive/MyDrive/resume_screening_project/models/stage3_llm_judge/lora_adapters
   - LLM results: llm_results_cache.pkl
   - Checkpoints: /content/drive/MyDrive/resume_screening_project/models/stage3_llm_judge/checkpoints

üéØ Key Features:
   ‚úì Explainable AI: Scores with detailed reasoning
   ‚úì Structured outputs: JSON format for parsing
   ‚úì Memory efficient: 4-bit quantization + LoRA
   ‚úì Domain adapted: Fine-tuned on resume screening

‚úÖ Ready for Stage 4: Full Pipeline Integration
   üëâ Open: 04_full_pipeline_integration.ipynb
