# **AI Recruiter Assistant 🤖**

A conversational chatbot to pre-screen job offers from recruiters using advanced AI techniques.

## Project Overview
- **Goal**: Automate initial screening of job offers from recruiters
- **Technology**: RAG-first approach with open-source LLM and advanced prompt engineering
- **Interface**: Gradio web application for real-time conversations
- **Timeline**: 1 week structured development
- **Methodology**: Following systematic Generative AI project lifecycle

## 🎯 **AI Project Methodology**

This project follows a **structured 4-stage Generative AI lifecycle** for systematic development and evaluation:

### **Stage 1: Define the Scope** ✅ COMPLETED
- **Problem Identification**: Manual screening of recruiter messages is time-consuming and inconsistent
- **Desired Outcome**: Automated system that analyzes job offers and responds appropriately based on profile match
- **Data Requirements**: CV, job expectations, LinkedIn conversation history for context retrieval
- **Feasibility**: Generative AI is ideal for this conversational task with contextual decision-making

### **Stage 2: Select Models** ✅ COMPLETED
- **Research Models**: Compare 4 open-source models from Hugging Face
- **Benchmark Performance**: Test speed, memory usage, and response quality
- **Model Selection**: Choose optimal model based on performance metrics
- **Cache Management**: Efficient model storage and loading from Google Drive

### **Stage 3: Adapt & Align Model** ✅ COMPLETED
- **RAG Pipeline Implementation**: ✅ Build core retrieval system with FAISS vector database
- **Context-Aware Prompt Engineering**: ✅ Design system prompts for effective context utilization
- **Implementing Guardrails**: ✅ Add safety and formatting checks for reliable output
- **Performance Evaluation**: ✅ Manual and qualitative assessment of retrieval and response quality
- **Prompt Engineering Optimization**: ✅ Refine and improve prompt effectiveness
- **Fine-Tuning (Postponed)**: Deferred to future iterations, focusing on RAG optimization first

### **Stage 4: Application Integration & Deployment** 🔄 CURRENT
- **Gradio Interface Implementation**: ✅ Deploy web application for real-time testing
- **Hugging Face Spaces Deployment**: 📋 Step-by-step deployment to production
- **End-to-end Testing**: 📋 Comprehensive system validation
- **Final Summary & Review**: 📋 Complete project documentation and next steps

## 🏗️ **System Architecture** (RAG-First Approach)

```mermaid
graph LR
    A[📨 Recruiter Message] --> B[🔍 Intent Detection]
    B --> C[📊 RAG Analysis]
    C --> D[🎯 Match Scoring]
    D --> E[🧠 State Management]
    E --> F[💬 Response Generation]
    
    G[📄 CV + Job Expectations] --> H[🔍 Vector Embeddings]
    H --> I[💾 FAISS Database]
    I --> C
    
    J[🧠 Base LLM] --> F
    K[🎯 Prompt Engineering] --> J
    L[🛡️ Guardrails] --> F
```

---

**🚀 Currently working on Stage 4: Gradio Interface Implementation - Production deployment phase**


# **Stage 1: Define the Scope**

## ***PHASE 1*** - Configuration

In [1]:
# Install required packages
!pip install transformers>=4.36.0 torch>=2.0.0 peft>=0.7.0 bitsandbytes>=0.41.0 accelerate>=0.24.0
!pip install langchain>=0.1.0 langchain-community>=0.0.10 faiss-cpu>=1.7.4 sentence-transformers>=2.2.0
!pip install gradio>=4.0.0 pandas>=2.0.0 numpy>=1.24.0 tqdm>=4.65.0 datasets>=2.14.0

# Core imports
import os, json, torch, pandas as pd, numpy as np, time, psutil
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
from enum import Enum

# ML/AI imports
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline, GenerationConfig
from peft import LoraConfig, get_peft_model, TaskType

# RAG imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document

print("✅ All dependencies loaded!")
print(f"🔥 CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

✅ All dependencies loaded!
🔥 CUDA available: True
🎮 GPU: NVIDIA A100-SXM4-40GB
💾 GPU Memory: 42.5 GB


## ***PHASE 2*** - Data Requirements

In [2]:
from google.colab import drive
drive.mount('/content/drive')

# Set up project paths
project_path = "/content/drive/MyDrive/Colab Notebooks/KEEPCODING/PROJECT/AI_Recruiter_Assistant"
cache_path = f"{project_path}/huggingface_cache"

# Create directories
os.makedirs(cache_path, exist_ok=True)
os.makedirs(f"{project_path}/data", exist_ok=True)

print(f"✅ Google Drive mounted")
print(f"Main paths")
print(f"📁 Project path: {project_path}")
print(f"🗂️ Cache path: {cache_path}")

# Load data files
def load_documents():
    try:
        with open(f'{project_path}/RAG/cv.md', 'r', encoding='utf-8') as f:
            cv_content = f.read()
        with open(f'{project_path}/RAG/job_expectations.md', 'r', encoding='utf-8') as f:
            expectations_content = f.read()

        print(f"Loaded documents")
        print(f"✅ CV loaded: {len(cv_content)} characters")
        print(f"✅ Job expectations loaded: {len(expectations_content)} characters")
        return cv_content, expectations_content
    except FileNotFoundError as e:
        print(f"❌ Error loading documents: {e}")
        return None, None

cv_content, expectations_content = load_documents()


Mounted at /content/drive
✅ Google Drive mounted
Main paths
📁 Project path: /content/drive/MyDrive/Colab Notebooks/KEEPCODING/PROJECT/AI_Recruiter_Assistant
🗂️ Cache path: /content/drive/MyDrive/Colab Notebooks/KEEPCODING/PROJECT/AI_Recruiter_Assistant/huggingface_cache
Loaded documents
✅ CV loaded: 7032 characters
✅ Job expectations loaded: 326 characters


# **Stage 2:** Select Models

## ***PHASE 1*** - check cached models

In [3]:
def check_cached_models(cache_path):
    """Check for cached models"""
    cached_models = []
    if not os.path.exists(cache_path):
        return cached_models

    try:
        items = os.listdir(cache_path)
        for item in items:
            item_path = os.path.join(cache_path, item)
            if os.path.isdir(item_path) and not item.startswith('.'):
                try:
                    contents = os.listdir(item_path)
                    # Simple check: does it contain ANY folder starting with "models--"?
                    has_models_folder = any(f.startswith('models--') for f in contents if os.path.isdir(os.path.join(item_path, f)))
                    if has_models_folder:
                        cached_models.append(item)
                except Exception:
                    continue
    except Exception:
        pass

    return cached_models

def create_model_cache_dir(model_name: str, cache_path: str) -> str:
    """Create clean cache directory for a model"""
    model_folder = model_name.replace('/', ' ')
    model_cache_dir = os.path.join(cache_path, model_folder)
    os.makedirs(model_cache_dir, exist_ok=True)
    return model_cache_dir

def display_cache_status(candidate_models):
    """Display cached models status"""
    print("\n🔍 CACHE STATUS:")
    print("="*80 )
    cached_models = check_cached_models(cache_path)

    if cached_models:
        print(f"✅ Found {len(cached_models)} cached models:")
        for model in cached_models:
            model_name = model.replace(' ', '/')
            print(f"\t⚡ {model_name}")
    else:
        print("📭 No cached models found")

    # Show download vs cache status for candidate models
    print(f"\n⬇️Download vs 💻Cache status:")
    for model_name in candidate_models:
        model_folder_space = model_name.replace('/', ' ')
        if model_folder_space in cached_models:
            print(f"\t⚡ {model_name} - 💻 Will be load from cache")
        else:
            print(f"\t📥 {model_name} - ⬇️ Will be download")

    return cached_models

print("✅ Cache detection functions ready")


✅ Cache detection functions ready


## **PHASE 2** - detect environment configuration, download required models, apply quantizacion, create benchmark process

In [4]:
class CacheAwareModelBenchmark:

    def __init__(self):
        self.candidate_models = [
            "mistralai/Mistral-7B-Instruct-v0.3",
            "meta-llama/Meta-Llama-3-8B-Instruct",
            "microsoft/Phi-3-mini-4k-instruct",
            "google/gemma-3-4b-it"
        ]

        self.model_specs = {
            "mistralai/Mistral-7B-Instruct-v0.3": {"size": "7B", "context_length": "32K"},
            "meta-llama/Meta-Llama-3-8B-Instruct": {"size": "8B", "context_length": "8K"},
            "microsoft/Phi-3-mini-4k-instruct": {"size": "3.8B", "context_length": "4K"},
            "google/gemma-3-4b-it": {"size": "4B", "context_length": "8K", "features": "multimodal"}
        }

        self.results = []

        # Benchmark iteration tracking
        from datetime import datetime
        self.benchmark_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.results_folder = f"{project_path}/benchmark_iterations"
        os.makedirs(self.results_folder, exist_ok=True)

        # Capture environment configuration
        self.environment_config = self.detect_environment_config()

        # System prompt for all interactions
        self.system_prompt = """
        You are my highly intelligent personal assistant.
        Your mission is to engage in a continuous role-playing conversation where you will act as me.
        From this moment forward, the user will be playing the role of various recruiters contacting me.
        You must analyze the messages and generate appropriate responses as if you were me talking with a recruiter.
        Do not break character. Do not mention that you are an AI or an assistant in your replies to the recruiters.
        Analyze this job offer and tell me if it matches my profile:
        ***I am a Data and AI Engineer with extensive experience in building robust ELT pipelines and developing cutting-edge Generative AI solutions.
        My key skills include designing systems with Retrieval-Augmented Generation (RAG) and AI Agents using frameworks like Semantic Kernel and LangChain.
        I am proficient in data engineering with PySpark, SQL, and Apache Airflow,
        and highly skilled in cloud platforms, particularly Microsoft Azure (Data Factory, Databricks, AI Services) and GCP (BigQuery, Composer).
        I also possess strong backend development experience using Python and FastAPI to build and deploy services.***
        """

        # 3 different recruiter scenarios to test
        self.test_prompts = [
            {
            "name": "Perfect Match",
            "message": """
            Hi! I hope you are doing well.
            I came across your profile and I am impressed by your background in AI and data engineering.
            I have an exciting opportunity for a Senior Data Engineer position at a fast-growing fintech company.
            The role involves working with Python, cloud technologies, and building ML pipelines.
            The salary range is €60,000-65,000 and it is 100% remote.
            Would you be interested in learning more?
            """
            },
            {
            "name": "Generic Message",
            "message": """
            Hello, are you currently open to new opportunities?
            """
            },
            {
            "name": "Wrong Match",
            "message": """
            We have a Java developer position available at our company.
            5 years experience required, on-site work in London, competitive salary.
            Interested?
            """
            }
        ]

    def detect_environment_config(self):
        """Detect comprehensive environment configuration for Google Colab"""
        import platform
        import sys
        import subprocess
        import pkg_resources
        import re

        config = {
            "platform": {
                "system": platform.system(),
                "release": platform.release(),
                "machine": platform.machine(),
                "processor": platform.processor(),
                "python_version": sys.version.split()[0],
                "python_implementation": platform.python_implementation()
            },
            "hardware": {},
            "software": {},
            "colab_specific": {}
        }

        # GPU Information
        if torch.cuda.is_available():
            gpu_count = torch.cuda.device_count()
            config["hardware"]["gpu"] = {
                "available": True,
                "count": gpu_count,
                "devices": []
            }

            for i in range(gpu_count):
                gpu_props = torch.cuda.get_device_properties(i)
                gpu_info = {
                    "index": i,
                    "name": gpu_props.name,
                    "memory_total_gb": round(gpu_props.total_memory / 1e9, 2),
                    "memory_total_mb": gpu_props.total_memory // (1024 * 1024),
                    "compute_capability": f"{gpu_props.major}.{gpu_props.minor}",
                    "multiprocessor_count": gpu_props.multi_processor_count
                }
                config["hardware"]["gpu"]["devices"].append(gpu_info)

            # Current GPU memory usage
            try:
                config["hardware"]["gpu"]["current_memory_allocated_gb"] = round(torch.cuda.memory_allocated() / 1e9, 2)
                config["hardware"]["gpu"]["current_memory_reserved_gb"] = round(torch.cuda.memory_reserved() / 1e9, 2)
            except:
                pass
        else:
            config["hardware"]["gpu"] = {"available": False}

        # CPU Information
        try:
            cpu_count = os.cpu_count()
            config["hardware"]["cpu"] = {
                "count": cpu_count,
                "architecture": platform.architecture()[0]
            }

            # Try to get more detailed CPU info
            try:
                with open('/proc/cpuinfo', 'r') as f:
                    cpuinfo = f.read()
                    if 'model name' in cpuinfo:
                        cpu_model = cpuinfo.split("model name")[1].split(':')[1].split('\n')[0].strip()
                        config["hardware"]["cpu"]["model"] = cpu_model
            except:
                pass
        except:
            config["hardware"]["cpu"] = {"count": "unknown"}

        # Memory Information
        try:
            import psutil
            memory = psutil.virtual_memory()
            config["hardware"]["memory"] = {
                "total_gb": round(memory.total / 1e9, 2),
                "available_gb": round(memory.available / 1e9, 2),
                "used_gb": round(memory.used / 1e9, 2),
                "percentage_used": memory.percent
            }
        except ImportError:
            # Fallback without psutil
            try:
                with open('/proc/meminfo', 'r') as f:
                    meminfo = f.read()
                    total_match = re.search(r'MemTotal:\s+(\d+)', meminfo)
                    if total_match:
                        total_kb = int(total_match.group(1))
                        config["hardware"]["memory"] = {
                            "total_gb": round(total_kb / 1e6, 2),
                            "source": "proc_meminfo"
                        }
            except:
                config["hardware"]["memory"] = {"total_gb": "unknown"}

        # CUDA Information
        if torch.cuda.is_available():
            config["software"]["cuda"] = {
                "version": torch.version.cuda,
                "cudnn_version": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None,
                "cudnn_available": torch.backends.cudnn.is_available()
            }

        # Key Library Versions
        key_libraries = ['torch', 'transformers', 'accelerate', 'bitsandbytes', 'peft', 'numpy', 'pandas']
        config["software"]["libraries"] = {}

        for lib in key_libraries:
            try:
                version = pkg_resources.get_distribution(lib).version
                config["software"]["libraries"][lib] = version
            except:
                config["software"]["libraries"][lib] = "not_found"

        # Google Colab Specific Detection
        try:
            # Check if running in Colab
            import google.colab
            config["colab_specific"]["environment"] = "google_colab"
            config["colab_specific"]["is_colab"] = True

            # Try to get Colab VM info
            try:
                result = subprocess.run(["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
                                      capture_output=True, text=True, timeout=10)
                if result.returncode == 0:
                    gpu_info = result.stdout.strip().split(", ")
                    if len(gpu_info) >= 2:
                        config["colab_specific"]["nvidia_smi"] = {
                            "gpu_name": gpu_info[0],
                            "memory_total_mb": gpu_info[1]
                        }
            except:
                pass

            # Detect Colab GPU type from GPU name
            if config["hardware"]["gpu"]["available"]:
                gpu_name = config["hardware"]["gpu"]["devices"][0]["name"].lower()
                if "tesla t4" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "T4"
                elif "tesla k80" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "K80"
                elif "tesla v100" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "V100"
                elif "tesla p4" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "P4"
                elif "tesla p100" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "P100"
                elif "a100" in gpu_name:
                    config["colab_specific"]["colab_gpu_type"] = "A100"
                else:
                    config["colab_specific"]["colab_gpu_type"] = "Unknown"

        except ImportError:
            config["colab_specific"]["is_colab"] = False
            config["colab_specific"]["environment"] = "local_or_other"

        # Disk Space (for cache management)
        try:
            import shutil
            cache_disk_usage = shutil.disk_usage(cache_path)
            config["hardware"]["disk"] = {
                "cache_path_total_gb": round(cache_disk_usage.total / 1e9, 2),
                "cache_path_free_gb": round(cache_disk_usage.free / 1e9, 2),
                "cache_path_used_gb": round((cache_disk_usage.total - cache_disk_usage.free) / 1e9, 2)
            }
        except:
            config["hardware"]["disk"] = {"status": "unable_to_detect"}

        return config

    def display_environment_config(self):

        print("\n🖥️ ENVIRONMENT CONFIGURATION:")
        print("=" * 60)

        """Display current environment configuration"""
        config = self.environment_config
        # GPU Information
        if config["hardware"]["gpu"]["available"]:
            gpu = config['hardware']['gpu']['devices'][0]
            print(f"🎮 GPU: {gpu['name']}")
            print(f"💾 GPU Memory: {gpu['memory_total_gb']} GB ({gpu['memory_total_mb']} MB)")
            print(f"🔧 Compute Capability: {gpu['compute_capability']}")

            if config["colab_specific"]["is_colab"]:
                colab_gpu = config["colab_specific"].get("colab_gpu_type", "Unknown")
                print(f"☁️ Colab GPU Type: {colab_gpu}")
        else:
            print("❌ GPU: Not available")

        # CPU & Memory
        print(f"🧠 CPU: {config['hardware']['cpu']['count']} cores")
        if 'model' in config['hardware']['cpu']:
            print(f"🔍 CPU Model: {config['hardware']['cpu']['model']}")

        if 'memory' in config['hardware']:
            mem = config['hardware']['memory']
            print(f"💿 RAM: {mem['total_gb']} GB total")
            if 'available_gb' in mem:
                print(f"📊 RAM Usage: {mem['used_gb']}/{mem['total_gb']} GB ({mem['percentage_used']:.1f}%)")

        # Software
        print(f"🐍 Python: {config['platform']['python_version']}")
        print(f"🔥 PyTorch: {config['software']['libraries']['torch']}")
        print(f"🤗 Transformers: {config['software']['libraries']['transformers']}")

        if config['hardware']['gpu']['available']:
            print(f"⚡ CUDA: {config['software']['cuda']['version']}")

        # Environment
        env_type = "Google Colab" if config["colab_specific"]["is_colab"] else "Local/Other"
        print(f"🌐 Environment: {env_type}")

    def load_model_with_quantization(self, model_name: str):
        """Load model with cache detection, corruption handling, and optimized configurations"""
        cached_models = check_cached_models(cache_path)
        model_folder_space = model_name.replace('/', ' ')

        is_cached = model_folder_space in cached_models
        cache_corrupted = False  # Flag to track if we need to retry

        if is_cached:
            print(f"⚡ Loading {model_name} from cache...")
            model_cache_dir = os.path.join(cache_path, model_folder_space)
        else:
            print(f"📥 Downloading {model_name} ...")
            model_cache_dir = create_model_cache_dir(model_name, cache_path)

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=False
        )

        def _attempt_load():
            """Helper function to attempt model loading"""
            tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=model_cache_dir)
            if tokenizer.pad_token is None:
                tokenizer.pad_token = tokenizer.eos_token

            # Model-specific configurations to avoid warnings
            model_kwargs = {
                "quantization_config": bnb_config,
                "device_map": "auto",
                "trust_remote_code": True,
                "cache_dir": model_cache_dir
            }

            # Fix Phi-3 flash-attention warnings
            if "phi-3" in model_name.lower():
                model_kwargs["attn_implementation"] = "eager"
                print(f"   🔧 Using eager attention for Phi-3 (avoiding flash-attention warnings)")

            model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
            return model, tokenizer

        try:
            model, tokenizer = _attempt_load()
            print(f"\t✅ {model_name} loaded successfully!")
            return model, tokenizer

        except Exception as e:
            error_msg = str(e).lower()

            # Check for cache corruption errors
            if any(corruption_keyword in error_msg for corruption_keyword in
                   ['headertoosmall', 'header too small', 'corrupt', 'safetensors',
                    'invalid', 'decode', 'deserializing']):

                if is_cached and not cache_corrupted:
                    print(f"\t🔧 Cache corruption detected: {str(e)}")
                    print(f"\t🗑️ Clearing corrupted cache and retrying download...")

                    # Remove corrupted cache directory
                    import shutil
                    try:
                        if os.path.exists(model_cache_dir):
                            shutil.rmtree(model_cache_dir)
                            print(f"\t✅ Corrupted cache removed: {model_cache_dir}")
                    except Exception as remove_error:
                        print(f"\t⚠️ Could not remove cache: {str(remove_error)}")

                    # Create new cache directory and retry download
                    model_cache_dir = create_model_cache_dir(model_name, cache_path)
                    print(f"\t📥 Re-downloading {model_name}...")
                    cache_corrupted = True  # Mark as corrupted to avoid infinite retry

                    try:
                        model, tokenizer = _attempt_load()
                        print(f"\t✅ {model_name} loaded successfully after cache cleanup!")
                        return model, tokenizer
                    except Exception as retry_error:
                        print(f"\t❌ Failed even after cache cleanup: {str(retry_error)}")
                        raise retry_error
                else:
                    print(f"\t❌ Cache corruption persists or already retried: {str(e)}")
                    raise
            else:
                print(f"\t❌ Error loading {model_name}: {str(e)}")
                raise

    def test_single_prompt(self, model, tokenizer, model_name: str, test_prompt: dict):
        """Test a single prompt and return response"""
        # Combine system prompt with user message
        full_prompt = f"{self.system_prompt}\\n\\nRecruiter: {test_prompt['message']}\\n\\nResponse:"

        try:
            inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=512)

            device = next(model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                inference_start = time.time()

                if "phi-3" in model_name.lower():
                    # Fix for Phi-3 DynamicCache issue
                    outputs = model.generate(
                        **inputs,  # This already contains attention_mask
                        max_new_tokens=100,
                        do_sample=True,
                        temperature=0.7,
                        pad_token_id=tokenizer.eos_token_id,
                        use_cache=False  # Disable problematic cache
                    )
                else:
                    outputs = model.generate(
                        **inputs,  # This already contains attention_mask
                        max_new_tokens=100,
                        do_sample=True,
                        temperature=0.7,
                        pad_token_id=tokenizer.eos_token_id
                    )
                inference_time = time.time() - inference_start

            # Decode response and clean up
            response = tokenizer.decode(outputs[0], skip_special_tokens=True)
            # Extract only the response part (remove the prompt)
            response = response.replace(full_prompt, "").strip()

            return {
                "prompt_name": test_prompt["name"],
                "inference_time": inference_time,
                "response": response, # [:300] + "..." if len(response) > 300 else response,
                "status": "success"
            }

        except Exception as e:
            return {
                "prompt_name": test_prompt["name"],
                "status": "failed",
                "error": str(e)
            }

    def benchmark_model(self, model_name: str):
        """Test model with multiple recruiter scenarios"""
        print(f"\n🧪 Testing {model_name}...")

        start_time = time.time()

        try:
            model, tokenizer = self.load_model_with_quantization(model_name)
            load_time = time.time() - start_time

            # Test all prompts
            prompt_results = []
            total_inference_time = 0

            for i, test_prompt in enumerate(self.test_prompts):
                print(f"\tTesting scenario {i+1}/3: {test_prompt['name']}")
                prompt_result = self.test_single_prompt(model, tokenizer, model_name, test_prompt)
                prompt_results.append(prompt_result)
                if prompt_result["status"] == "success":
                    total_inference_time += prompt_result["inference_time"]

            # Calculate memory usage
            if torch.cuda.is_available():
                memory_used = torch.cuda.max_memory_allocated() / 1e9
                torch.cuda.reset_peak_memory_stats()
            else:
                memory_used = 0

            device = next(model.parameters()).device
            avg_inference_time = total_inference_time / len([r for r in prompt_results if r["status"] == "success"])

            result = {
                "model_name": model_name,
                "status": "success",
                "load_time": load_time,
                "avg_inference_time": avg_inference_time,
                "total_inference_time": total_inference_time,
                "memory_gb": memory_used,
                "device": str(device),
                "prompt_results": prompt_results,
                "specifications": self.model_specs.get(model_name, {})
            }

            successful_prompts = len([r for r in prompt_results if r["status"] == "success"])
            print(f"\t✅ Load: {load_time:.1f}s | Avg Inference: {avg_inference_time:.2f}s | Memory: {memory_used:.1f}GB")
            print(f"\t🎯 Successful prompts: {successful_prompts}/3 | Device: {device}")

            # Clean up
            del model, tokenizer
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

            return result

        except Exception as e:
            print(f"\t⚠️ Model failed to load: {str(e)}")
            return {
                "model_name": model_name,
                "status": "failed",
                "error": str(e),
                "specifications": self.model_specs.get(model_name, {})
            }

    def benchmark_all_models(self):

        """Benchmark all models with multiple scenarios"""
        print("🚀 STARTING MULTI-SCENARIO ANALYSIS")
        print("🎯 Testing 3 recruiter scenarios per model")

        for model_name in self.candidate_models:
            print("-" * 60)
            result = self.benchmark_model(model_name)
            self.results.append(result)

        print("\n✅ Multi-scenario benchmark complete!")

    def save_benchmark_iteration(self):
        """Save current benchmark iteration with timestamp"""
        from datetime import datetime

        # Create detailed iteration data
        iteration_data = {
            "metadata": {
                "timestamp": self.benchmark_timestamp,
                "datetime_readable": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                "total_models": len(self.candidate_models),
                "successful_models": len([r for r in self.results if r.get("status") == "success"]),
                "failed_models": len([r for r in self.results if r.get("status") == "failed"]),
                "test_scenarios": len(self.test_prompts)
            },
            "environment_config": self.environment_config,
            "test_scenarios": self.test_prompts,
            "model_specifications": self.model_specs,
            "detailed_results": self.results,
            "summary": self.generate_benchmark_summary()
        }

        # Save detailed results
        iteration_file = f"{self.results_folder}/benchmark_{self.benchmark_timestamp}.json"
        with open(iteration_file, "w", encoding="utf-8") as f:
            json.dump(iteration_data, f, indent=2, default=str, ensure_ascii=False)

        print(f"\n💾 Benchmark iteration saved: {iteration_file}")

        # Update comparison history
        self.update_comparison_history(iteration_data)

        return iteration_file

    def generate_benchmark_summary(self):
        """Generate concise summary of benchmark results"""
        successful_results = [r for r in self.results if r.get('status') == 'success']

        if not successful_results:
            return {"status": "no_successful_models"}

        # Performance rankings
        by_speed = sorted(successful_results, key=lambda x: x["avg_inference_time"])
        by_memory = sorted(successful_results, key=lambda x: x["memory_gb"])
        by_load_time = sorted(successful_results, key=lambda x: x["load_time"])

        summary = {
            "rankings": {
                "fastest_inference": {
                    "model": by_speed[0]["model_name"],
                    "time_seconds": by_speed[0]["avg_inference_time"]
                },
                "least_memory": {
                    "model": by_memory[0]["model_name"],
                    "memory_gb": by_memory[0]["memory_gb"]
                },
                "fastest_loading": {
                    "model": by_load_time[0]["model_name"],
                    "load_time_seconds": by_load_time[0]["load_time"]
                }
            },
            "overall_stats": {
                "avg_inference_time": sum(r["avg_inference_time"] for r in successful_results) / len(successful_results),
                "avg_memory_usage": sum(r["memory_gb"] for r in successful_results) / len(successful_results),
                "avg_load_time": sum(r["load_time"] for r in successful_results) / len(successful_results),
                "total_successful_prompts": sum(len([p for p in r["prompt_results"] if p["status"] == "success"]) for r in successful_results)
            },
            "model_performance_scores": []
        }

        # Calculate performance scores (lower is better)
        for result in successful_results:
            score = (
                result["avg_inference_time"] * 0.4 +  # 40% weight on inference speed
                result["memory_gb"] * 0.3 +           # 30% weight on memory efficiency
                result["load_time"] / 100 * 0.3       # 30% weight on load time (scaled)
            )

            summary["model_performance_scores"].append({
                "model": result["model_name"],
                "performance_score": round(score, 2),
                "successful_prompts": len([p for p in result["prompt_results"] if p["status"] == "success"])
            })

        # Sort by performance score
        summary["model_performance_scores"].sort(key=lambda x: x["performance_score"])


        return summary

    def update_comparison_history(self, current_iteration):
        """Update master comparison file with historical data"""
        comparison_file = f"{self.results_folder}/benchmark_comparison_history.json"

        # Load existing history
        if os.path.exists(comparison_file):
            with open(comparison_file, 'r', encoding='utf-8') as f:
                history = json.load(f)
        else:
            history = {
                "iterations": [],
                "models_tracked": list(set(self.candidate_models)),
                "created": current_iteration["metadata"]["datetime_readable"]
            }

        # Add current iteration summary
        history["iterations"].append({
            "timestamp": current_iteration["metadata"]["timestamp"],
            "datetime": current_iteration["metadata"]["datetime_readable"],
            "summary": current_iteration["summary"],
            "metadata": current_iteration["metadata"]
        })

        # Keep only last 10 iterations
        history["iterations"] = history["iterations"][-10:]
        history["last_updated"] = current_iteration["metadata"]["datetime_readable"]

        # Save updated history
        with open(comparison_file, "w", encoding="utf-8") as f:
            json.dump(history, f, indent=2, default=str, ensure_ascii=False)

        print(f"📈 Comparison history updated: {comparison_file}")

    def display_detailed_results(self):
        """Display comprehensive benchmark results and save iteration"""
        print("\n\n📊 DETAILED BENCHMARK RESULTS")
        print("=" * 60)

        successful_results = [r for r in self.results if r.get('status') == 'success']
        failed_results = [r for r in self.results if r.get('status') == 'failed']

        if successful_results:
            print(f"\n✅ {len(successful_results)} models tested successfully:")

            # Model comparison table
            print(f"\n\n📈 MODEL PERFORMANCE COMPARISON:")
            print("-" * 80)
            print(f"{'Model':<30} {'Size':<8} {'Load(s)':<8} {'Avg Inf(s)':<10} {'Memory(GB)':<10} {'Features':<12}")
            print("-" * 80)

            for result in successful_results:
                specs = result['specifications']
                features = specs.get('features', 'text-only')
                model_short = result['model_name'].split('/')[-1][:28]

                print(f"{model_short:<30} {specs['size']:<8} {result['load_time']:<8.1f} {result['avg_inference_time']:<10.2f} {result['memory_gb']:<10.1f} {features:<12}")

            # Find best models
            fastest_model = min(successful_results, key=lambda x: x['avg_inference_time'])
            least_memory = min(successful_results, key=lambda x: x['memory_gb'])

            print("-" * 80)
            print(f"🏆 FASTEST: {fastest_model['model_name'].split('/')[-1]} ({fastest_model['avg_inference_time']:.2f}s)")
            print(f"💾 LEAST MEMORY: {least_memory['model_name'].split('/')[-1]} ({least_memory['memory_gb']:.1f}GB)")

            # Performance scoring
            summary = self.generate_benchmark_summary()
            print(f"\n\n🎯 OVERALL PERFORMANCE RANKING:")
            print("-" * 80)
            print("\nSCORE = 40% weight on inference speed + 30% weight on memory efficiency + 30% weight on load time (scaled)\n")
            for i, model_score in enumerate(summary['model_performance_scores'], 1):
                model_short = model_score['model'].split('/')[-1]
                print(f"{i}. {model_short:<35} Score: {model_score['performance_score']:<6} ({model_score['successful_prompts']}/3 prompts)")

            # Detailed responses per scenario
            print("=" * 60)
            print(f"\n\n🎭 RESPONSE QUALITY BY SCENARIO:")

            for i, scenario in enumerate(self.test_prompts):
                print(f"\n\n📝 SCENARIO {i+1}: {scenario['name']}")
                print(f"Recruiter: {scenario['message'][:100]}...")
                print("-" * 60)

                for result in successful_results:
                    if 'prompt_results' in result:
                        prompt_result = result['prompt_results'][i]
                        model_short = result['model_name'].split('/')[-1]

                        if prompt_result['status'] == 'success':
                            print(f"\n🤖 {model_short}:")
                            print(f"\t⚡ Time: {prompt_result['inference_time']:.2f}s")
                            print(f"\t💬 Response:\n<<START>>\n {prompt_result['response']}\n<<END>>")
                        else:
                            print(f"\n❌ {model_short}: {prompt_result.get('error', 'Failed')}")

        if failed_results:
            print(f"\n❌ {len(failed_results)} models failed:")
            for result in failed_results:
                print(f"\t• {result['model_name']}: {result['error']}")

        # Save this iteration
        self.save_benchmark_iteration()

benchmark = CacheAwareModelBenchmark()
print("✅ Multi-scenario benchmark ready!")
print("🎯 3 recruiter scenarios per model")
print(f"📊 Test scenarios: {[p['name'] for p in benchmark.test_prompts]}")


✅ Multi-scenario benchmark ready!
🎯 3 recruiter scenarios per model
📊 Test scenarios: ['Perfect Match', 'Generic Message', 'Wrong Match']


  import pkg_resources


In [5]:
# 🔧 ROBUST FILE SAVING PATCH
# Aplicar patch a las funciones de guardado existentes para manejo robusto de errores

def robust_save_json(file_path, data, description="file"):
    """Helper function for robust JSON file saving with error handling"""
    import shutil
    from datetime import datetime
    import traceback

    try:
        # Ensure directory exists
        os.makedirs(os.path.dirname(file_path), exist_ok=True)

        # Check disk space
        disk_usage = shutil.disk_usage(os.path.dirname(file_path))
        free_gb = disk_usage.free / 1e9

        if free_gb < 0.1:  # Less than 100MB
            print(f"⚠️ Warning: Low disk space ({free_gb:.1f} GB)")

        # Write file with verification
        with open(file_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, default=str, ensure_ascii=False)

        # Verify file was created and has content
        if os.path.exists(file_path):
            file_size = os.path.getsize(file_path)
            print(f"✅ {description} saved: {file_path} ({file_size} bytes)")
            return True
        else:
            raise FileNotFoundError(f"File was not created: {file_path}")

    except Exception as error:
        print(f"❌ Error saving {description}: {str(error)}")

        # Try alternative filename with timestamp
        base_name, ext = os.path.splitext(file_path)
        alt_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        alt_file = f"{base_name}_recovery_{alt_timestamp}{ext}"

        try:
            with open(alt_file, 'w', encoding='utf-8') as f:
                json.dump(data, f, indent=2, default=str, ensure_ascii=False)
            print(f"✅ {description} saved to recovery file: {alt_file}")
            return True
        except Exception as alt_error:
            print(f"❌ Recovery save also failed: {str(alt_error)}")
            print(f"📍 Full traceback: {traceback.format_exc()}")
            return False

# Patch the existing save methods with robust error handling
original_save_iteration = benchmark.save_benchmark_iteration
original_update_history = benchmark.update_comparison_history

def patched_save_benchmark_iteration(self):
    """Enhanced save_benchmark_iteration with robust error handling"""
    from datetime import datetime

    try:
        # Create detailed iteration data (same as original)
        iteration_data = {
            "metadata": {
                "timestamp": self.benchmark_timestamp,
                "datetime_readable": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                "total_models": len(self.candidate_models),
                "successful_models": len([r for r in self.results if r.get("status") == "success"]),
                "failed_models": len([r for r in self.results if r.get("status") == "failed"]),
                "test_scenarios": len(self.test_prompts)
            },
            "environment_config": self.environment_config,
            "test_scenarios": self.test_prompts,
            "model_specifications": self.model_specs,
            "detailed_results": self.results,
            "summary": self.generate_benchmark_summary()
        }

        # Use robust save function
        iteration_file = f"{self.results_folder}/benchmark_{self.benchmark_timestamp}.json"
        success = robust_save_json(iteration_file, iteration_data, "Benchmark iteration")

        if success:
            # Update comparison history
            self.update_comparison_history(iteration_data)
            return iteration_file
        else:
            print(f"❌ Failed to save benchmark iteration")
            return None

    except Exception as e:
        print(f"❌ Critical error in patched_save_benchmark_iteration: {str(e)}")
        return None

def patched_update_comparison_history(self, current_iteration):
    """Enhanced update_comparison_history with robust error handling"""
    comparison_file = f"{self.results_folder}/benchmark_comparison_history.json"

    try:
        # Load existing history with error handling
        if os.path.exists(comparison_file):
            try:
                with open(comparison_file, 'r', encoding='utf-8') as f:
                    history = json.load(f)
                print(f"📚 Loaded existing history with {len(history.get('iterations', []))} iterations")
            except Exception as load_error:
                print(f"⚠️ Error loading history: {str(load_error)}. Creating new history.")
                history = {
                    "iterations": [],
                    "models_tracked": list(set(self.candidate_models)),
                    "created": current_iteration["metadata"]["datetime_readable"]
                }
        else:
            print(f"📝 Creating new history file")
            history = {
                "iterations": [],
                "models_tracked": list(set(self.candidate_models)),
                "created": current_iteration["metadata"]["datetime_readable"]
            }

        # Add current iteration summary
        history["iterations"].append({
            "timestamp": current_iteration["metadata"]["timestamp"],
            "datetime": current_iteration["metadata"]["datetime_readable"],
            "summary": current_iteration["summary"],
            "metadata": current_iteration["metadata"]
        })

        # Keep only last 10 iterations
        history["iterations"] = history["iterations"][-10:]
        history["last_updated"] = current_iteration["metadata"]["datetime_readable"]

        # Use robust save function
        robust_save_json(comparison_file, history, "Comparison history")

    except Exception as e:
        print(f"❌ Critical error in patched_update_comparison_history: {str(e)}")

# Apply patches to the benchmark instance
benchmark.save_benchmark_iteration = patched_save_benchmark_iteration.__get__(benchmark, CacheAwareModelBenchmark)
benchmark.update_comparison_history = patched_update_comparison_history.__get__(benchmark, CacheAwareModelBenchmark)

print("🔧 ROBUST FILE SAVING PATCH APPLIED!")
print("✅ Enhanced error handling for benchmark file operations")
print("💾 Recovery mechanisms activated for failed saves")
print("📊 Disk space monitoring enabled")


🔧 ROBUST FILE SAVING PATCH APPLIED!
✅ Enhanced error handling for benchmark file operations
💾 Recovery mechanisms activated for failed saves
📊 Disk space monitoring enabled


## ***PHASE 3*** - fix issues related with current models

In [6]:
# 🔧 COMPREHENSIVE GEMMA DIVISION BY ZERO FIX
import warnings
import numpy as np

# Global numpy error state configuration
old_err_state = np.seterr(divide='ignore', invalid='ignore', over='ignore', under='ignore')

# Comprehensive warning suppression for numerical issues
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=UserWarning, message=".*torch.utils.checkpoint.*")
warnings.filterwarnings("ignore", category=FutureWarning)

# Monkey patch the test_single_prompt method with comprehensive Gemma fixes
original_test_single_prompt = benchmark.test_single_prompt

def gemma_safe_test_single_prompt(self, model, tokenizer, model_name: str, test_prompt: dict):
    """Comprehensive Gemma-safe version with multiple fallback strategies"""
    full_prompt = f"{self.system_prompt}\n\nRecruiter: {test_prompt['message']}\n\nResponse:"

    try:
        # Comprehensive protection for all models with special handling for Gemma
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            with np.errstate(all='ignore'):  # Ignore all numpy warnings
                return self._safe_generate_response(model, tokenizer, model_name, test_prompt, full_prompt)

    except Exception as e:
        # Enhanced error handling with type detection
        error_msg = str(e).lower()
        if any(keyword in error_msg for keyword in ['division', 'divide', 'zero', 'nan', 'inf']):
            print(f"     🔧 Numerical instability detected, applying fallback...")
            try:
                return self._fallback_generation(model, tokenizer, model_name, test_prompt, full_prompt)
            except Exception as fallback_error:
                return {
                    "prompt_name": test_prompt["name"],
                    "status": "failed",
                    "error": f"Primary: {str(e)}, Fallback: {str(fallback_error)}"
                }
        else:
            return {
                "prompt_name": test_prompt["name"],
                "status": "failed",
                "error": str(e)
            }

def safe_generate_response(self, model, tokenizer, model_name: str, test_prompt: dict, full_prompt: str):
    """Enhanced response generation with model-specific optimizations"""

    # Enhanced tokenization with explicit parameters
    inputs = tokenizer(
        full_prompt,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        padding=True,
        add_special_tokens=True
    )

    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        inference_start = time.time()

        # Enhanced model-specific generation parameters
        if "phi-3" in model_name.lower():
            generation_config = {
                **inputs,
                'max_new_tokens': 100,
                'do_sample': True,
                'temperature': 0.7,
                'pad_token_id': tokenizer.eos_token_id,
                'use_cache': False
            }
        elif "gemma" in model_name.lower():
            # Optimized Gemma configuration (cleaned invalid parameters)
            generation_config = {
                'input_ids': inputs['input_ids'],
                'attention_mask': inputs.get('attention_mask'),
                'max_new_tokens': 80,  # Reduced to avoid memory issues
                'do_sample': False,    # Use greedy decoding for stability
                'pad_token_id': tokenizer.pad_token_id or tokenizer.eos_token_id,
                'eos_token_id': tokenizer.eos_token_id,
                'repetition_penalty': 1.1,
                'use_cache': True     # Enable cache for Gemma
                # Removed: top_p, early_stopping (invalid for Gemma)
            }
        else:
            generation_config = {
                **inputs,
                'max_new_tokens': 100,
                'do_sample': True,
                'temperature': 0.7,
                'pad_token_id': tokenizer.eos_token_id
            }

        # Generate with comprehensive error handling
        outputs = model.generate(**generation_config)
        inference_time = time.time() - inference_start

    # Enhanced response decoding
    try:
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Clean up response by removing prompt
        if full_prompt in response:
            response = response.replace(full_prompt, "").strip()
        elif response.startswith(full_prompt[:50]):  # Partial match fallback
            response = response[len(full_prompt):].strip()
    except Exception as decode_error:
        response = f"[Decoding error: {str(decode_error)}]"

    return {
        "prompt_name": test_prompt["name"],
        "inference_time": inference_time,
        "response": response, # [:300] + "..." if len(response) > 300 else response,
        "status": "success"
    }

def fallback_generation(self, model, tokenizer, model_name: str, test_prompt: dict, full_prompt: str):
    """Ultra-conservative fallback generation for problematic models"""
    print(f"\t\t🆘 Using ultra-conservative fallback generation...")

    # Minimal tokenization
    inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=256)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        inference_start = time.time()

        # Ultra-conservative generation
        try:
            outputs = model.generate(
                inputs['input_ids'],
                max_new_tokens=50,     # Very limited output
                do_sample=False,       # Greedy only
                pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
                use_cache=False,       # Disable cache
                output_scores=False,   # Disable score computation
                return_dict_in_generate=False
            )
            inference_time = time.time() - inference_start

            response = tokenizer.decode(outputs[0], skip_special_tokens=True)
            response = response.replace(full_prompt, "").strip() or "[Model generated empty response]"

            return {
                "prompt_name": test_prompt["name"],
                "inference_time": inference_time,
                "response": f"[FALLBACK] {response}",
                "status": "success"
            }

        except Exception as e:
            return {
                "prompt_name": test_prompt["name"],
                "status": "failed",
                "error": f"Fallback failed: {str(e)}"
            }

# Apply comprehensive patches
benchmark.test_single_prompt = gemma_safe_test_single_prompt.__get__(benchmark, CacheAwareModelBenchmark)
benchmark._safe_generate_response = safe_generate_response.__get__(benchmark, CacheAwareModelBenchmark)
benchmark._fallback_generation = fallback_generation.__get__(benchmark, CacheAwareModelBenchmark)

## ***PHASE 4*** - Execute benchmarking

In [7]:
print("="*100 )
print("="*41 + " START OF PROCESS " + "="*41 )
print("="*100 )

# Display environment configuration
benchmark.display_environment_config()

# Check cache status
cached_models = display_cache_status(benchmark.candidate_models)

# 🎯 SYSTEM PROMPT
print(f"\n\n⌨️ SYSTEM PROMPT:")
print("=" * 60)
print(benchmark.system_prompt)


# 🎯 TEST SCENARIOS INFO
print(f"\n\n🎯 BENCHMARK SCENARIOS:")
print("=" * 60)

for i, prompt in enumerate(benchmark.test_prompts):
    print(f"  {i+1}. {prompt['name']}: {prompt['message']}")


######### 🚀 RUN COMPREHENSIVE BENCHMARK  #########
run_benchmark = True # False # ✅ Set to True to test

if run_benchmark:
    print("\n🔧 STARTING BENCHMARK")
    print(f"📁 Results will be saved to: {benchmark.results_folder}")
    print(f"🕒 Benchmark timestamp: {benchmark.benchmark_timestamp}")

    benchmark.benchmark_all_models()
    benchmark.display_detailed_results()

    print(f"\n✅ BENCHMARK COMPLETE!")
    print(f"📊 Check {benchmark.results_folder}/ for detailed results and comparison history")
else:
    print("\n⏸️ Set 'run_benchmark = True' above to start the OPTIMIZED benchmark")

print("🎯 Multiple test scenarios: READY")
print("📊 Detailed comparison: READY")
print("🆘 Automatic fallback system: ACTIVE")
print("💾 Your cached models will save significant time!\n")


print("="*100 )
print("="*41 + " END OF PROCESS " + "="*41 )
print("="*100 )


🖥️ ENVIRONMENT CONFIGURATION:
🎮 GPU: NVIDIA A100-SXM4-40GB
💾 GPU Memory: 42.47 GB (40506 MB)
🔧 Compute Capability: 8.0
☁️ Colab GPU Type: A100
🧠 CPU: 12 cores
🔍 CPU Model: Intel(R) Xeon(R) CPU @ 2.20GHz
💿 RAM: 89.63 GB total
📊 RAM Usage: 3.98/89.63 GB (5.5%)
🐍 Python: 3.11.13
🔥 PyTorch: 2.6.0+cu124
🤗 Transformers: 4.53.3
⚡ CUDA: 12.4
🌐 Environment: Google Colab

🔍 CACHE STATUS:
✅ Found 4 cached models:
	⚡ microsoft/Phi-3-mini-4k-instruct
	⚡ google/gemma-3-4b-it
	⚡ meta-llama/Meta-Llama-3-8B-Instruct
	⚡ mistralai/Mistral-7B-Instruct-v0.3

⬇️Download vs 💻Cache status:
	⚡ mistralai/Mistral-7B-Instruct-v0.3 - 💻 Will be load from cache
	⚡ meta-llama/Meta-Llama-3-8B-Instruct - 💻 Will be load from cache
	⚡ microsoft/Phi-3-mini-4k-instruct - 💻 Will be load from cache
	⚡ google/gemma-3-4b-it - 💻 Will be load from cache


⌨️ SYSTEM PROMPT:

        You are my highly intelligent personal assistant.
        Your mission is to engage in a continuous role-playing conversation where you will act as 

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.55G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

	✅ mistralai/Mistral-7B-Instruct-v0.3 loaded successfully!
	Testing scenario 1/3: Perfect Match
	Testing scenario 2/3: Generic Message
	Testing scenario 3/3: Wrong Match
	✅ Load: 273.3s | Avg Inference: 5.78s | Memory: 7.2GB
	🎯 Successful prompts: 3/3 | Device: cuda:0
------------------------------------------------------------

🧪 Testing meta-llama/Meta-Llama-3-8B-Instruct...
⚡ Loading meta-llama/Meta-Llama-3-8B-Instruct from cache...


Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

	✅ meta-llama/Meta-Llama-3-8B-Instruct loaded successfully!
	Testing scenario 1/3: Perfect Match
	Testing scenario 2/3: Generic Message
	Testing scenario 3/3: Wrong Match
	✅ Load: 277.8s | Avg Inference: 4.60s | Memory: 11.6GB
	🎯 Successful prompts: 3/3 | Device: cuda:0
------------------------------------------------------------

🧪 Testing microsoft/Phi-3-mini-4k-instruct...
⚡ Loading microsoft/Phi-3-mini-4k-instruct from cache...




Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



	✅ microsoft/Phi-3-mini-4k-instruct loaded successfully!
	Testing scenario 1/3: Perfect Match
	Testing scenario 2/3: Generic Message
	Testing scenario 3/3: Wrong Match
	✅ Load: 185.2s | Avg Inference: 5.67s | Memory: 11.1GB
	🎯 Successful prompts: 3/3 | Device: cuda:0
------------------------------------------------------------

🧪 Testing google/gemma-3-4b-it...
⚡ Loading google/gemma-3-4b-it from cache...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


	✅ google/gemma-3-4b-it loaded successfully!
	Testing scenario 1/3: Perfect Match


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


	Testing scenario 2/3: Generic Message


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


	Testing scenario 3/3: Wrong Match
	✅ Load: 201.5s | Avg Inference: 8.00s | Memory: 14.0GB
	🎯 Successful prompts: 3/3 | Device: cuda:0

✅ Multi-scenario benchmark complete!


📊 DETAILED BENCHMARK RESULTS

✅ 4 models tested successfully:


📈 MODEL PERFORMANCE COMPARISON:
--------------------------------------------------------------------------------
Model                          Size     Load(s)  Avg Inf(s) Memory(GB) Features    
--------------------------------------------------------------------------------
Mistral-7B-Instruct-v0.3       7B       273.3    5.78       7.2        text-only   
Meta-Llama-3-8B-Instruct       8B       277.8    4.60       11.6       text-only   
Phi-3-mini-4k-instruct         3.8B     185.2    5.67       11.1       text-only   
gemma-3-4b-it                  4B       201.5    8.00       14.0       multimodal  
--------------------------------------------------------------------------------
🏆 FASTEST: Meta-Llama-3-8B-Instruct (4.60s)
💾 LEAST MEMORY: Mistra

## ***PHASE  5*** - Analyzing results

In [8]:
# 📊 BENCHMARK RESULTS EXPLORER
# Run this cell AFTER BECHMARKING to explore saved iterations

# List all iterations
results_folder = f"{project_path}/benchmark_iterations"
iteration_files = [f for f in os.listdir(results_folder) if f.startswith('benchmark_') and f.endswith('.json')]
iteration_files.sort(reverse=True)  # Most recent first


def explore_benchmark_history():
    """Explore saved benchmark iterations and history"""


    if not os.path.exists(results_folder):
        print("❌ No benchmark iterations found. Run the benchmark first!")
        return

    print(f"📁 BENCHMARK ITERATIONS FOUND: {len(iteration_files)}")
    print("-" * 60)

    if iteration_files:
        for i, filename in enumerate(iteration_files[:5], 1):  # Show last 5
            timestamp = filename.replace('benchmark_', '').replace('.json', '')
            # Parse timestamp
            from datetime import datetime
            try:
                dt = datetime.strptime(timestamp, '%Y%m%d_%H%M%S')
                readable_date = dt.strftime('%Y-%m-%d %H:%M:%S')
                print(f"{i}. {filename} ({readable_date})")
            except:
                print(f"{i}. {filename}")

    # Show comparison history if exists
    history_file = f"{results_folder}/benchmark_comparison_history.json"
    if os.path.exists(history_file):
        print(f"\n📈 COMPARISON HISTORY:")
        with open(history_file, 'r', encoding='utf-8') as f:
            history = json.load(f)

        print(f"Total iterations tracked: {len(history['iterations'])}")
        print(f"Models tracked: {', '.join([m.split('/')[-1] for m in history['models_tracked']])}")
        print(f"Last updated: {history['last_updated']}")

        if history['iterations']:
            latest = history['iterations'][-1]
            if 'summary' in latest and 'model_performance_scores' in latest['summary']:
                print(f"\n🏆 LATEST PERFORMANCE RANKING:")
                for i, score in enumerate(latest['summary']['model_performance_scores'][:3], 1):
                    model_name = score['model'].split('/')[-1]
                    print(f"  {i}. {model_name} (Score: {score['performance_score']})")

    return results_folder, iteration_files

def load_specific_iteration(timestamp_or_index=None):
    """Load a specific benchmark iteration for detailed analysis"""
    results_folder = f"{project_path}/benchmark_iterations"
    iteration_files = [f for f in os.listdir(results_folder) if f.startswith('benchmark_2') and f.endswith('.json')]
    iteration_files.sort(reverse=True)

    if not iteration_files:
        print("❌ No iterations found!")
        return None

    # Determine which file to load
    if timestamp_or_index is None:
        # Load most recent
        target_file = iteration_files[0]
        print(f"📄 Loading most recent iteration: {target_file}")
    elif isinstance(timestamp_or_index, int):
        # Load by index (1-based)
        if 1 <= timestamp_or_index <= len(iteration_files):
            target_file = iteration_files[timestamp_or_index - 1]
            print(f"📄 Loading iteration #{timestamp_or_index}: {target_file}")
        else:
            print(f"❌ Invalid index. Available: 1-{len(iteration_files)}")
            return None
    else:
        # Load by timestamp
        target_file = f"benchmark_{timestamp_or_index}.json"
        if target_file not in iteration_files:
            print(f"❌ Timestamp {timestamp_or_index} not found!")
            return None
        print(f"📄 Loading iteration: {target_file}")

    # Load the iteration data
    with open(f"{results_folder}/{target_file}", 'r', encoding='utf-8') as f:
        iteration_data = json.load(f)


    # Display environment config if available
    if 'environment_config' in iteration_data:
        env = iteration_data['environment_config']
        if env['hardware']['gpu']['available']:
            gpu = env['hardware']['gpu']['devices'][0]
            gpu_type = env["colab_specific"].get("colab_gpu_type", 'Unknown') if env["colab_specific"]["is_colab"] else 'Local'
            print(f"🎮 Environment: {gpu['name']} ({gpu['memory_total_gb']} GB) - {gpu_type}")
        else:
            print(f"🖥️ Environment: CPU-only")

        env_type = "Google Colab" if env["colab_specific"]["is_colab"] else "Local/Other"
        print(f"🌐 Platform: {env_type}")

    return iteration_data

def compare_environments_performance():
    """Compare performance across different environment configurations"""
    results_folder = f"{project_path}/benchmark_iterations"
    history_file = f"{results_folder}/benchmark_comparison_history.json"

    if not os.path.exists(history_file):
        print("❌ No comparison history found. Run benchmarks first!")
        return

    with open(history_file, 'r', encoding='utf-8') as f:
        history = json.load(f)

    if len(history['iterations']) < 2:
        print("❌ Need at least 2 benchmark iterations to compare environments")
        return

    print("🔄 ENVIRONMENT PERFORMANCE COMPARISON")
    print("=" * 60)

    # Group iterations by environment
    env_groups = {}

    for iteration in history['iterations']:
        # Load full iteration data to get environment config
        iteration_file = f"{results_folder}/benchmark_{iteration['timestamp']}.json"
        if os.path.exists(iteration_file):
            with open(iteration_file, 'r', encoding='utf-8') as f:
                full_data = json.load(f)

            if 'environment_config' in full_data:
                env = full_data['environment_config']

                # Create environment signature
                if env['hardware']['gpu']['available']:
                    gpu_name = env['hardware']['gpu']['devices'][0]['name']
                    gpu_memory = env['hardware']['gpu']['devices'][0]['memory_total_gb']
                    colab_type = env["colab_specific"].get("colab_gpu_type", 'Unknown')
                    env_signature = f"{colab_type} ({gpu_name}) - {gpu_memory}GB"
                else:
                    env_signature = "CPU-only"

                if env_signature not in env_groups:
                    env_groups[env_signature] = []

                env_groups[env_signature].append({
                    'timestamp': iteration['timestamp'],
                    'datetime': iteration['datetime'],
                    'summary': iteration['summary']
                })

    # Display comparison
    for env_sig, iterations in env_groups.items():
        print(f"\n🎮 {env_sig}:")
        print(f"\t📊 Iterations: {len(iterations)}")

        if iterations and 'model_performance_scores' in iterations[-1]['summary']:
            latest_scores = iterations[-1]['summary']['model_performance_scores']
            print(f"\t🏆 Best Model: {latest_scores[0]['model'].split('/')[-1]} (Score: {latest_scores[0]['performance_score']})")

            # Calculate average performance across iterations for this environment
            if len(iterations) > 1:
                avg_inference_times = []
                for iter_data in iterations:
                    if 'overall_stats' in iter_data['summary']:
                        avg_inference_times.append(iter_data['summary']['overall_stats']['avg_inference_time'])

                if avg_inference_times:
                    avg_time = sum(avg_inference_times) / len(avg_inference_times)
                    print(f"   ⚡ Avg Inference Time: {avg_time:.2f}s (across {len(avg_inference_times)} runs)")

    # Show environment impact on specific models
    print(f"\n📈 MODEL PERFORMANCE BY ENVIRONMENT:")
    print("-" * 70)

    model_env_performance = {}
    for env_sig, iterations in env_groups.items():
        for iteration in iterations:
            if "model_performance_scores" in iteration["summary"]:
                for model_score in iteration["summary"]["model_performance_scores"]:
                    model_name = model_score["model"].split("/")[-1]
                    if model_name not in model_env_performance:
                        model_env_performance[model_name] = {}

                    if env_sig not in model_env_performance[model_name]:
                        model_env_performance[model_name][env_sig] = []

                    model_env_performance[model_name][env_sig].append(model_score["performance_score"])

    # Display model performance across environments
    for model_name, env_scores in model_env_performance.items():
        print(f"\n🤖 {model_name}:")
        for env_sig, scores in env_scores.items():
            avg_score = sum(scores) / len(scores)
            runs = len(scores)
            print(f"   {env_sig}: {avg_score:.2f} avg score ({runs} run{'s' if runs != 1 else ''})")

# Quick exploration
print("🔍 BENCHMARK EXPLORER READY!")

🔍 BENCHMARK EXPLORER READY!


In [9]:
want_to_explore_benchmark_history = True

if want_to_explore_benchmark_history:
    # Auto-explore if results exist
    results_folder = f"{project_path}/benchmark_iterations"
    if os.path.exists(results_folder):
        explore_benchmark_history()
    else:
        print("❌ No benchmark iterations found. Run the benchmark first!")


📁 BENCHMARK ITERATIONS FOUND: 6
------------------------------------------------------------
1. benchmark_comparison_history.json
2. benchmark_20250728_155841.json (2025-07-28 15:58:41)
3. benchmark_20250725_181432.json (2025-07-25 18:14:32)
4. benchmark_20250725_165345.json (2025-07-25 16:53:45)
5. benchmark_20250725_082023.json (2025-07-25 08:20:23)

📈 COMPARISON HISTORY:
Total iterations tracked: 2
Models tracked: gemma-3-4b-it, Phi-3-mini-4k-instruct, Mistral-7B-Instruct-v0.3, Meta-Llama-3-8B-Instruct
Last updated: 2025-07-28 16:15:32

🏆 LATEST PERFORMANCE RANKING:
  1. Mistral-7B-Instruct-v0.3 (Score: 5.31)
  2. Meta-Llama-3-8B-Instruct (Score: 6.14)
  3. Phi-3-mini-4k-instruct (Score: 6.14)


# **Stage 3: Adapt & Align Model**

In [10]:
# Selected LLM model
selected_text_generator_model = "mistralai/Mistral-7B-Instruct-v0.3"
# selected_text_generator_model = "meta-llama/Meta-Llama-3-8B-Instruct"

# Selected RAG embedding model
embedding_model_name="sentence-transformers/all-MiniLM-L6-v2"
# embedding_model_name="sentence-transformers/all-mpnet-base-v2"

# Selected guardrail model
input_guardrail_model_name = "microsoft/Phi-3-mini-4k-instruct"

# Selected output guardrail model
# output_guardrail_model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
output_guardrail_model_name = "microsoft/Phi-3-mini-4k-instruct"

## ***PHASE 1*** - 🔧 RAG KNOWLEDGE BASE SETUP
Creating vectorized knowledge base from CV and job expectations

In [11]:
class RAGKnowledgeBase:

    def __init__(self, project_path: str):
        self.project_path = project_path
        self.embeddings = None
        self.vectorstore = None
        self.documents = []

        # Text splitter configuration for optimal chunking
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50,
            length_function=len,
            separators=["\n\n", "\n", ". ", "##"] #, " ", ""]
            # separators=["##"]
        )

    def setup_embeddings(self):
        """Initialize sentence transformer embeddings optimized for recruitment context"""
        print("🤖 Initializing embeddings model...")

        # Using a higher-quality model optimized for semantic similarity and retrieval
        self.embeddings = HuggingFaceEmbeddings(
            model_name=embedding_model_name,
            model_kwargs={"device": "cuda" if torch.cuda.is_available() else "cpu"},
            encode_kwargs={"normalize_embeddings": True}
        )

        print("✅ Embeddings model loaded successfully")
        return self.embeddings

    def load_and_process_documents(self, cv_content: str, expectations_content: str):
        """Load and process CV and job expectations into document chunks"""
        print("📄 Processing documents for RAG...")

        # Create documents with metadata
        documents = [
            Document(
                page_content=cv_content,
                metadata={"source": "cv", "type": "professional_profile"}
            ),
            Document(
                page_content=expectations_content,
                metadata={"source": "job_expectations", "type": "requirements"}
            )
        ]

        # Split documents into chunks
        self.documents = self.text_splitter.split_documents(documents)

        print(f"✅ Created {len(self.documents)} document chunks")
        for doc in self.documents:
            print(f"   📋 {doc.metadata['source']}: {len(doc.page_content)} chars")

        return self.documents

    def create_vectorstore(self):
        """Create FAISS vectorstore from processed documents"""
        if not self.documents:
            raise ValueError("No documents loaded. Call load_and_process_documents first.")

        if not self.embeddings:
            self.setup_embeddings()

        print("🔍 Creating FAISS vectorstore...")

        # Create vectorstore
        self.vectorstore = FAISS.from_documents(
            documents=self.documents,
            embedding=self.embeddings
        )

        print("✅ Vectorstore created successfully")
        return self.vectorstore

    def search_relevant_context(self, query: str, k: int = 3):
        """Search for relevant context given a query"""
        if not self.vectorstore:
            raise ValueError("Vectorstore not created. Call create_vectorstore first.")

        # Perform similarity search
        relevant_docs = self.vectorstore.similarity_search(query, k=k)

        return relevant_docs

    def get_context_string(self, query: str, k: int = 3):
        """Get formatted context string for prompt injection with job_expectations always included"""
        relevant_docs = self.search_relevant_context(query, k)

        context_parts = []

        # ALWAYS include job_expectations first (critical but small)
        job_expectations_included = False
        for doc in relevant_docs:
            source = doc.metadata.get("source", "unknown")
            if source == "job_expectations":
                content = doc.page_content.strip()
                context_parts.append(f"[JOB_EXPECTATIONS - ALWAYS CONSIDER]: {content}")
                job_expectations_included = True
                break

        # If job_expectations wasn't in the retrieved docs, find and add it
        if not job_expectations_included:
            for doc in self.documents:
                if doc.metadata.get("source") == "job_expectations":
                    content = doc.page_content.strip()
                    context_parts.append(f"[JOB_EXPECTATIONS - ALWAYS CONSIDER]: {content}")
                    break

        # Add other relevant documents
        for doc in relevant_docs:
            source = doc.metadata.get("source", "unknown")
            if source != "job_expectations":  # Skip if already added
                content = doc.page_content.strip()
                context_parts.append(f"[{source.upper()}]: {content}")

        return "\n\n".join(context_parts)

# Initialize RAG knowledge base
rag_kb = RAGKnowledgeBase(project_path)

print("🧠 RAG Knowledge Base class initialized!")
print("📚 Ready to process CV and job expectations")


🧠 RAG Knowledge Base class initialized!
📚 Ready to process CV and job expectations


## ***PHASE 2*** -  🚀 SETUP RAG KNOWLEDGE BASE
Load and process documents for retrieval

In [12]:
# Setup embeddings and process documents
print("🔧 Setting up RAG Knowledge Base...")
rag_kb.setup_embeddings()

# Load and process documents
if cv_content and expectations_content:
    rag_kb.load_and_process_documents(cv_content, expectations_content)
    rag_kb.create_vectorstore()

    print("\n🧪 Testing retrieval system...")

    # Test retrieval with sample queries
    test_queries = [
        "What are my technical skills in data engineering?",
        "What technologies do I work with?",
        "What type of job am I looking for?"
    ]

    for query in test_queries:
        print(f"\n❓ Query: {query}")
        context = rag_kb.get_context_string(query, k=2)
        # print(f"📄 Retrieved context (first 200 chars): {context[:200]}...")
        print(f"📄 Retrieved context: \n{context}...")

    print("\n✅ RAG Knowledge Base ready for AI Assistant!")

else:
    print("❌ CV or expectations content missing. Check data loading.")


🔧 Setting up RAG Knowledge Base...
🤖 Initializing embeddings model...


  self.embeddings = HuggingFaceEmbeddings(


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

✅ Embeddings model loaded successfully
📄 Processing documents for RAG...
✅ Created 24 document chunks
   📋 cv: 193 chars
   📋 cv: 18 chars
   📋 cv: 479 chars
   📋 cv: 239 chars
   📋 cv: 231 chars
   📋 cv: 412 chars
   📋 cv: 217 chars
   📋 cv: 371 chars
   📋 cv: 190 chars
   📋 cv: 451 chars
   📋 cv: 433 chars
   📋 cv: 340 chars
   📋 cv: 216 chars
   📋 cv: 401 chars
   📋 cv: 391 chars
   📋 cv: 348 chars
   📋 cv: 287 chars
   📋 cv: 487 chars
   📋 cv: 40 chars
   📋 cv: 378 chars
   📋 cv: 457 chars
   📋 cv: 156 chars
   📋 cv: 255 chars
   📋 job_expectations: 326 chars
🔍 Creating FAISS vectorstore...
✅ Vectorstore created successfully

🧪 Testing retrieval system...

❓ Query: What are my technical skills in data engineering?
📄 Retrieved context: 
[JOB_EXPECTATIONS - ALWAYS CONSIDER]: ## Salary expectations: Between €60,000 and €65,000 gross per year, with an engineering profile. If the position is leadership, it's best to discuss salary in detail.
## Working arrangements: Preferably 100% remo

## ***PHASE 3*** - Input Guardrail System

Implementing an intelligent guardrail that performs **Intent Detection** to classify recruiter messages as either:
- **Generic messages** → State: "pending_details" → Request more information
- **Concrete job offers** → Pass to RAG system for analysis


In [13]:
from enum import Enum
from dataclasses import dataclass
from typing import Dict, Any, Tuple

class MessageType(Enum):
    GENERIC = "generic"
    CONCRETE_OFFER = "concrete_offer"

class GenericSubType(Enum):
    BASIC_INTRODUCTION = "basic_introduction"
    OPPORTUNITY_INQUIRY = "opportunity_inquiry"

class ConversationState(Enum):
    PENDING_DETAILS = "pending_details"
    ANALYZING = "analyzing"
    PASSED = "passed"
    STAND_BY = "stand_by"
    FINISHED = "finished"

@dataclass
class GuardrailResult:
    message_type: MessageType
    confidence: float
    state: ConversationState
    language: str = "English"
    generic_subtype: GenericSubType = None
    response: str = None
    should_continue_to_rag: bool = False

In [14]:
# 🛡️ INPUT GUARDRAIL WITH NATURAL RESPONSE GENERATION

class InputGuardrailNatural:
    """
    Updated InputGuardrail that generates natural responses using the model
    instead of using fixed templates
    """

    def __init__(self, cache_path: str):
        self.cache_path = cache_path
        self.model_name = input_guardrail_model_name
        self.model = None
        self.tokenizer = None

        # Load prompts from external files
        import sys
        import os
        import importlib
        sys.path.append(f"{project_path}/prompts/app")

        # Reload prompt_loader module to get latest changes
        if 'prompt_loader' in sys.modules:
            importlib.reload(sys.modules['prompt_loader'])

        from prompt_loader import PromptLoader

        prompt_loader = PromptLoader(f"{project_path}/prompts/app")
        input_prompts = prompt_loader.load_input_guardrail_prompts()

        self.classification_prompt = input_prompts["classification_prompt"]
        self.generic_response_template = input_prompts["generic_response_template"]
        self.basic_intro_prompt = input_prompts["basic_intro_prompt"]
        self.opportunity_inquiry_prompt = input_prompts["opportunity_inquiry_prompt"]

    def load_model(self):
        """Load model for fast classification"""
        print(f"\n🛡️ Loading guardrail model: {self.model_name}")

        # Check cache
        cached_models = check_cached_models(self.cache_path)
        model_folder_space = self.model_name.replace('/', ' ')

        if model_folder_space in cached_models:
            print("⚡ Loading from cache...")
            model_cache_dir = os.path.join(self.cache_path, model_folder_space)
        else:
            print("📥 Downloading model...")
            model_cache_dir = create_model_cache_dir(self.model_name, self.cache_path)

        # Quantization for efficiency
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=False
        )

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(
                self.model_name,
                cache_dir=model_cache_dir
            )
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token

            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                quantization_config=bnb_config,
                device_map="auto",
                trust_remote_code=True,
                cache_dir=model_cache_dir,
                attn_implementation="eager"  # Phi-3 optimization
            )

            print("✅ Guardrail model loaded successfully!")
            return self.model, self.tokenizer

        except Exception as e:
            print(f"❌ Error loading guardrail model: {str(e)}")
            raise

    def classify_message(self, recruiter_message: str) -> Tuple[MessageType, float, str, GenericSubType]:
        """Classify recruiter message and detect language"""

        if not self.model or not self.tokenizer:
            raise ValueError("Guardrail model not loaded. Call load_model() first.")

        # Format classification prompt
        prompt = self.classification_prompt.format(message=recruiter_message.strip())

        try:
            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=1024
            )

            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=80,
                    do_sample=False,
                    temperature=0.1,
                    pad_token_id=self.tokenizer.eos_token_id,
                    use_cache=False
                )

            # Decode response
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            classification_response = full_response.replace(prompt, "").strip()

            # Parse classification result with new fields
            message_type, confidence, language, generic_subtype = self._parse_classification(classification_response)

            return message_type, confidence, language, generic_subtype

        except Exception as e:
            print(f"⚠️ Classification error: {str(e)}")
            # Fallback: basic keyword detection
            return self._fallback_classification(recruiter_message)

    def _parse_classification(self, response: str) -> Tuple[MessageType, float, str, GenericSubType]:
        """Parse model classification response including language and sub-classification"""
        import re
        response_lower = response.lower()

        # Extract language
        language = "English"  # Default
        language_match = re.search(r'language:\s*([^\n\r]+)', response_lower)
        if language_match:
            language = language_match.group(1).strip().title()

        # Extract classification
        if "concrete_offer" in response_lower or "concrete offer" in response_lower:
            message_type = MessageType.CONCRETE_OFFER
            generic_subtype = None
        elif "generic" in response_lower:
            message_type = MessageType.GENERIC

            # Extract sub-classification for generic messages
            generic_subtype = GenericSubType.OPPORTUNITY_INQUIRY  # Default
            if "basic_introduction" in response_lower or "introduction" in response_lower:
                generic_subtype = GenericSubType.BASIC_INTRODUCTION
            elif "opportunity_inquiry" in response_lower or "opportunity" in response_lower:
                generic_subtype = GenericSubType.OPPORTUNITY_INQUIRY
        else:
            # Fallback based on response content
            if any(keyword in response_lower for keyword in ["specific", "detailed", "role", "position", "salary"]):
                message_type = MessageType.CONCRETE_OFFER
                generic_subtype = None
            else:
                message_type = MessageType.GENERIC
                generic_subtype = GenericSubType.OPPORTUNITY_INQUIRY

        # Extract confidence
        confidence = 0.8  # Default confidence
        try:
            confidence_match = re.search(r'confidence:\s*([0-9.]+)', response_lower)
            if confidence_match:
                confidence = float(confidence_match.group(1))
                confidence = max(0.0, min(1.0, confidence))  # Clamp to [0,1]
        except:
            pass

        return message_type, confidence, language, generic_subtype

    def _fallback_classification(self, message: str) -> Tuple[MessageType, float, str, GenericSubType]:
        """Fallback classification using keyword detection"""
        message_lower = message.lower()

        # Simple language detection (basic heuristics)
        language = "English"  # Default
        if any(word in message_lower for word in ["hola", "gracias", "trabajo", "oportunidad"]):
            language = "Spanish"
        elif any(word in message_lower for word in ["bonjour", "merci", "travail", "opportunité"]):
            language = "French"
        elif any(word in message_lower for word in ["hallo", "danke", "arbeit", "gelegenheit"]):
            language = "German"

        # Keywords indicating concrete offers
        concrete_keywords = [
            "position", "role", "job", "salary", "€", "$", "£", "experience",
            "requirements", "responsibilities", "company", "team", "technologies",
            "remote", "on-site", "hybrid", "years", "senior", "junior", "developer",
            "engineer", "analyst", "manager", "python", "java", "javascript",
            "cloud","aws", "azure", "gcp", "react", "angular", "node", "sql", "nosql",
            "data", "ai"
        ]

        # Keywords indicating generic messages
        generic_keywords = [
            "open to", "opportunities", "connect", "network", "interested in",
            "catch up", "chat", "discuss", "explore", "available", "looking for", "job"
        ]

        # Keywords for basic introductions
        intro_keywords = [
            "my name is", "i am", "hello", "hi", "how are you", "nice to meet",
            "greetings", "good morning", "good afternoon"
        ]

        concrete_score = sum(1 for keyword in concrete_keywords if keyword in message_lower)
        generic_score = sum(1 for keyword in generic_keywords if keyword in message_lower)
        intro_score = sum(1 for keyword in intro_keywords if keyword in message_lower)

        if concrete_score > generic_score and concrete_score >= 2:
            return MessageType.CONCRETE_OFFER, 0.7, language, None
        else:
            # Determine sub-type for generic messages
            if intro_score > 0 or len(message.strip().split()) < 8:  # Very short messages likely introductions
                subtype = GenericSubType.BASIC_INTRODUCTION
            else:
                subtype = GenericSubType.OPPORTUNITY_INQUIRY
            return MessageType.GENERIC, 0.6, language, subtype

    def generate_natural_response(self, message: str, language: str, subtype: GenericSubType) -> str:
        """Generate natural response using the model instead of templates"""
        if not self.model or not self.tokenizer:
            raise ValueError("Guardrail model not loaded. Call load_model() first.")

        # Choose appropriate prompt based on subtype
        if subtype == GenericSubType.BASIC_INTRODUCTION:
            prompt = self.basic_intro_prompt.format(message=message, language=language)
        else:  # OPPORTUNITY_INQUIRY
            prompt = self.opportunity_inquiry_prompt.format(
                message=message,
                language=language,
                generic_template=self.generic_response_template
            )

        try:
            inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=150,  # Enough for natural responses
                    do_sample=True,
                    temperature=0.7,  # More creative for natural responses
                    top_p=0.9,
                    pad_token_id=self.tokenizer.eos_token_id,
                    use_cache=False
                )

            # Decode and clean response
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            response = full_response.replace(prompt, "").strip()

            return response

        except Exception as e:
            print(f"⚠️ Error generating natural response: {str(e)}")
            # Fallback to basic response
            if subtype == GenericSubType.BASIC_INTRODUCTION:
                return "Hello! Thanks for reaching out. Best regards."
            else:
                return self.generic_response_template

    def process_message(self, recruiter_message: str) -> GuardrailResult:
        """Process recruiter message through the guardrail system"""

        print("🛡️ Processing message through guardrail...")

        # Step 1: Classify message type with language and sub-classification
        message_type, confidence, language, generic_subtype = self.classify_message(recruiter_message)

        print(f"   📊 Classification: {message_type.value} (confidence: {confidence:.2f})")
        print(f"   🌍 Language: {language}")
        if generic_subtype:
            print(f"   🔍 Sub-type: {generic_subtype.value}")

        # Step 2: Handle based on classification
        if message_type == MessageType.GENERIC:
            # Generate natural response using the model
            print(f"   💬 Generating natural response for {generic_subtype.value} in {language}")
            response = self.generate_natural_response(recruiter_message, language, generic_subtype)

            return GuardrailResult(
                message_type=message_type,
                confidence=confidence,
                language=language,
                generic_subtype=generic_subtype,
                state=ConversationState.PENDING_DETAILS,
                response=response,
                should_continue_to_rag=False
            )

        else:  # CONCRETE_OFFER
            # Concrete offer: pass to RAG system
            return GuardrailResult(
                message_type=message_type,
                confidence=confidence,
                language=language,
                generic_subtype=None,
                state=ConversationState.ANALYZING,
                response=None,
                should_continue_to_rag=True
            )

# Initialize Updated Input Guardrail
input_guardrail = InputGuardrailNatural(cache_path=cache_path)

print("🛡️ Updated Input Guardrail system initialized!")
print("📊 Message classification: GENERIC vs CONCRETE_OFFER")
print("🎯 Natural response generation: BASIC_INTRODUCTION (brief) vs OPPORTUNITY_INQUIRY (detailed)")
print("🌍 Language detection and natural responses")
print("⚡ Using Phi-3-mini for classification and response generation")


🛡️ Updated Input Guardrail system initialized!
📊 Message classification: GENERIC vs CONCRETE_OFFER
🎯 Natural response generation: BASIC_INTRODUCTION (brief) vs OPPORTUNITY_INQUIRY (detailed)
🌍 Language detection and natural responses
⚡ Using Phi-3-mini for classification and response generation


## ***PHASE 4*** - 🛡️ OUTPUT GUARDRAIL SYSTEM
Validates and improves response naturalness

In [15]:
from typing import List, Dict, Tuple
import re

class OutputGuardrail:
    """
    Output guardrail that validates response naturalness and ensures first-person perspective.
    """

    def __init__(self, cache_path: str):
        self.cache_path = cache_path
        self.model_name = output_guardrail_model_name #"meta-llama/Meta-Llama-3-8B-Instruct"
        # Alternative models for consideration:
        # self.model_name = "google/gemma-3-4b-it"  # Alternative option 1
        # self.model_name = "microsoft/Phi-3-mini-4k-instruct"  # Alternative option 2 (but already used in input guardrail)

        self.model = None
        self.tokenizer = None

        # Load prompts from external files
        import sys
        import os
        sys.path.append(f"{project_path}/prompts/app")
        from prompt_loader import PromptLoader

        prompt_loader = PromptLoader(f"{project_path}/prompts/app")
        output_prompts = prompt_loader.load_output_guardrail_prompts()

        self.validation_prompt = output_prompts["validation_prompt"]
        self.correction_prompt = output_prompts["correction_prompt"]

    def load_model(self):
        """Load Meta-Llama-3-8B-Instruct model for validation and correction"""
        print(f"\n🛡️ Loading output guardrail model: {self.model_name}")

        # Check cache
        cached_models = check_cached_models(self.cache_path)
        model_folder_space = self.model_name.replace('/', ' ')

        if model_folder_space in cached_models:
            print("⚡ Loading from cache...")
            model_cache_dir = os.path.join(self.cache_path, model_folder_space)
        else:
            print("📥 Downloading model...")
            model_cache_dir = create_model_cache_dir(self.model_name, self.cache_path)

        # Quantization for efficiency
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=False
        )

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(
                self.model_name,
                cache_dir=model_cache_dir
            )
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token

            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                quantization_config=bnb_config,
                device_map="auto",
                trust_remote_code=True,
                cache_dir=model_cache_dir
            )

            print("✅ Output guardrail model loaded successfully!")
            return self.model, self.tokenizer

        except Exception as e:
            print(f"❌ Error loading output guardrail model: {str(e)}")
            raise

    def validate_response(self, response: str) -> Tuple[bool, List[str]]:
        """Validate response for naturalness and first-person perspective"""

        if not self.model or not self.tokenizer:
            # Fallback to rule-based validation if model not loaded
            return self._rule_based_validation(response)

        prompt = self.validation_prompt.format(response=response.strip())

        try:
            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=1024
            )

            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=100,
                    do_sample=False,
                    temperature=0.1,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            # Decode and parse validation result
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            validation_response = full_response.replace(prompt, "").strip()

            return self._parse_validation_result(validation_response)

        except Exception as e:
            print(f"⚠️ Validation error: {str(e)}. Using fallback validation.")
            return self._rule_based_validation(response)

    def _rule_based_validation(self, response: str) -> Tuple[bool, List[str]]:
        """Fallback rule-based validation"""
        issues = []

        # Check for third person references
        third_person_patterns = [
            r'\bthe candidate\b',
            r'\bcandidate\'s\b',
            r'\bhis/her\b',
            r'\bhis or her\b',
            r'\bhis\b(?!\s+name)',  # "his" but not "his name"
            r'\bher\b(?!\s+email)', # "her" but not "her email"
        ]

        for pattern in third_person_patterns:
            if re.search(pattern, response, re.IGNORECASE):
                issues.append(f"Uses third person reference: '{pattern}'")

        # Check for placeholders
        placeholder_patterns = [
            r'\[recruiter name\]',
            r'\[name\]',
            r'\[.*?\]',  # Any text in square brackets
        ]

        for pattern in placeholder_patterns:
            if re.search(pattern, response, re.IGNORECASE):
                issues.append(f"Contains placeholder: '{pattern}'")

        return len(issues) == 0, issues

    def _parse_validation_result(self, response: str) -> Tuple[bool, List[str]]:
        """Parse model validation response"""
        response_lower = response.lower()

        # Extract validation status
        is_valid = "validation: pass" in response_lower or "pass" in response_lower

        # Extract issues
        issues = []
        if not is_valid:
            # Try to extract issues section
            issues_match = re.search(r'issues:\s*(.+?)(?=\n|$)', response, re.IGNORECASE | re.DOTALL)
            if issues_match:
                issues_text = issues_match.group(1).strip()
                if issues_text.lower() != "none":
                    # Split by common delimiters
                    issues = [issue.strip() for issue in re.split(r'[,;-]|\n', issues_text) if issue.strip()]

            # Fallback: use rule-based validation
            if not issues:
                _, issues = self._rule_based_validation(response)

        return is_valid, issues

    def correct_response(self, original_response: str, issues: List[str], recruiter_message: str) -> str:
        """Generate corrected response using the model"""

        if not self.model or not self.tokenizer:
            return self._rule_based_correction(original_response)

        issues_text = "; ".join(issues) if issues else "General naturalness improvements needed"

        prompt = self.correction_prompt.format(
            original_response=original_response,
            issues=issues_text,
            recruiter_message=recruiter_message
        )

        try:
            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=1024
            )

            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=600,
                    do_sample=True,
                    temperature=0.5,
                    top_p=0.9,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            # Decode corrected response
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            corrected_response = full_response.replace(prompt, "").strip()

            return corrected_response

        except Exception as e:
            print(f"⚠️ Correction error: {str(e)}. Using fallback correction.")
            return self._rule_based_correction(original_response)

    def _rule_based_correction(self, response: str) -> str:
        """Fallback rule-based correction"""
        corrected = response.strip()

        # Remove unwanted prefixes that shouldn't appear in final responses
        unwanted_prefixes = [
            r'^RESPONSE:\s*',
            r'^Response:\s*',
            r'^Cristopher:\s*',
            r'^[A-Za-z]+:\s*',  # Any "Name:" pattern at start
        ]

        for prefix in unwanted_prefixes:
            corrected = re.sub(prefix, '', corrected, flags=re.IGNORECASE | re.MULTILINE)

        # Fix third person references
        corrections = [
            (r'\bthe candidate\'s\b', 'my'),
            (r'\bthe candidate\b', 'I'),
            (r'\bcandidate\'s\b', 'my'),
            (r'\bhis/her\b', 'my'),
            (r'\bhis or her\b', 'my'),
        ]

        for pattern, replacement in corrections:
            corrected = re.sub(pattern, replacement, corrected, flags=re.IGNORECASE)

        # Remove ALL types of placeholders more aggressively
        placeholder_patterns = [
            r'\[recruiter name\]',
            r'\[name\]',
            r'\[recruiter\]',
            r'\[[^\]]*\]',  # Any text in brackets
            r'Dear \[.*?\],?',  # "Dear [...]," patterns
            r'Hello \[.*?\],?',  # "Hello [...]," patterns
        ]

        for pattern in placeholder_patterns:
            corrected = re.sub(pattern, '', corrected, flags=re.IGNORECASE)

        # Clean up multiple spaces but preserve line breaks
        corrected = re.sub(r' +', ' ', corrected)

        # Clean up multiple newlines but preserve single line breaks
        corrected = re.sub(r'\n\s*\n\s*\n+', '\n\n', corrected)

        # Cut everything after final signature to remove unwanted instructions/metadata
        # Multiple patterns to catch different signature formats and instruction leaks
        cutoff_patterns = [
            r'(.*?Best regards,\s*Cristopher).*',  # Standard signature
            r'(.*?Cristopher)(?:\s*\n\s*INSTRUCTIONS.*)',  # Instructions leak
            r'(.*?Cristopher)(?:\s*\n\s*IF MATCH SCORE.*)',  # Prompt leak
            r'(.*?Cristopher)(?:\s*\n\s*Generate a natural.*)',  # Generation instruction leak
            r'(.*?,\s*Cristopher).*',  # Any comma + Cristopher format
        ]

        for pattern in cutoff_patterns:
            match = re.search(pattern, corrected, re.DOTALL | re.IGNORECASE)
            if match:
                corrected = match.group(1).strip()
                break

        # Remove leading/trailing whitespace from each line while preserving structure
        lines = corrected.split('\n')
        cleaned_lines = [line.strip() for line in lines if line.strip()]
        corrected = '\n'.join(cleaned_lines)

        return corrected.strip()

    def validate_and_improve_response(self, original_response: str, recruiter_message: str, max_iterations: int = 5) -> str:
        """Main method: validate and iteratively improve response naturalness"""

        print(f"   🔍 Validating response naturalness...")

        current_response = original_response
        iteration = 0
        validation_errors = 0

        while iteration < max_iterations:
            iteration += 1
            print(f"      🔄 Iteration {iteration}/{max_iterations}")

            # Validate current response
            is_valid, issues = self.validate_response(current_response)

            if is_valid:
                print(f"      ✅ Response passed validation on iteration {iteration}")
                return current_response

            print(f"      ⚠️ Issues found: {'; '.join(issues)}")

            # If validation keeps failing due to model errors, use rule-based correction
            if any("DynamicCache" in str(issue) or "get_max_length" in str(issue) for issue in issues):
                validation_errors += 1
                if validation_errors >= 2:
                    print(f"      🔧 Multiple validation errors detected. Using rule-based correction...")
                    return self._rule_based_correction(current_response)

            # Correct the response
            current_response = self.correct_response(current_response, issues, recruiter_message)

        # If we reach here, we've exhausted max_iterations
        print(f"      🚨 Max iterations ({max_iterations}) reached. Using rule-based correction...")

        # Use rule-based correction as final fallback
        final_response = self._rule_based_correction(original_response)

        return final_response

# Initialize Output Guardrail
output_guardrail = OutputGuardrail(cache_path=cache_path)

print("🛡️ Output Guardrail system initialized!")
print("📝 Response validation: First person usage, no placeholders, natural tone")
print("🔄 Iterative improvement: Up to k=5 iterations")
print("🎯 Fallback generation: Guardrail creates corrected version if needed")
print(f"⚡ Using {output_guardrail_model_name} for validation and correction")
print("💡 Alternative models available: google/gemma-3-4b-it, microsoft/Phi-3-mini-4k-instruct")


🛡️ Output Guardrail system initialized!
📝 Response validation: First person usage, no placeholders, natural tone
🔄 Iterative improvement: Up to k=5 iterations
🎯 Fallback generation: Guardrail creates corrected version if needed
⚡ Using microsoft/Phi-3-mini-4k-instruct for validation and correction
💡 Alternative models available: google/gemma-3-4b-it, microsoft/Phi-3-mini-4k-instruct


## ***PHASE 5*** - Define 'AI Assistant' object with Guardrail Integration

Integrating the input guardrail with the RAG system and implementing the complete business logic including match scoring and state management.


In [16]:
class AIRecruiterAssistantNatural:
    """
    AI Assistant that generates natural responses for concrete offers
    instead of using fixed templates
    """

    def __init__(self, model_name: str, rag_knowledge_base: RAGKnowledgeBase,
                 input_guardrail: InputGuardrailNatural, cache_path: str):
        self.model_name = model_name
        self.rag_kb = rag_knowledge_base
        self.input_guardrail = input_guardrail
        self.cache_path = cache_path
        self.model = None
        self.tokenizer = None

        # Load prompts for natural response generation
        import sys
        import os
        import importlib
        sys.path.append(f"{project_path}/prompts/app")

        # Reload prompt_loader module to get latest changes
        if 'prompt_loader' in sys.modules:
            importlib.reload(sys.modules['prompt_loader'])

        from prompt_loader import PromptLoader

        prompt_loader = PromptLoader(f"{project_path}/prompts/app")
        main_prompts = prompt_loader.load_main_generator_prompts()

        self.match_scoring_prompt = main_prompts["match_scoring_prompt"]
        self.natural_response_prompt = main_prompts["natural_response_prompt"]

        # Initialize output guardrail
        self.output_guardrail = OutputGuardrail(cache_path=cache_path)

    def load_models(self):
        """Load both guardrail and main model"""
        print("🔧 Loading models...")

        # Load input guardrail model first
        self.input_guardrail.load_model()

        # Load output guardrail model
        self.output_guardrail.load_model()

        # Load main model using benchmark's logic
        print(f"📥 Loading main model: {self.model_name}")
        benchmark = CacheAwareModelBenchmark()
        self.model, self.tokenizer = benchmark.load_model_with_quantization(self.model_name)

        print("✅ All models loaded successfully!")

    def calculate_match_score(self, recruiter_message: str) -> Dict[str, Any]:
        """Calculate match score between job offer and profile using RAG"""
        if not self.model or not self.tokenizer:
            raise ValueError("Main model not loaded. Call load_models() first.")

        print("📊 Calculating match score...")

        try:
            # Get relevant context from RAG
            context = self.rag_kb.get_context_string(recruiter_message, k=3)

            # Format scoring prompt
            scoring_prompt = self.match_scoring_prompt.format(
                context=context,
                job_offer=recruiter_message
            )

            inputs = self.tokenizer(scoring_prompt, return_tensors="pt", truncation=True, max_length=2048)
            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=600,
                    do_sample=True,
                    temperature=0.3,
                    top_p=0.9,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            # Decode and parse scoring response
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            scoring_response = full_response.replace(scoring_prompt, "").strip()

            # Parse the scoring response
            match_data = self._parse_match_score(scoring_response)
            match_data["context_used"] = context

            return match_data

        except Exception as e:
            print(f"⚠️ Error calculating match score: {str(e)}")
            # Fallback scoring based on keywords
            return self._fallback_match_scoring(recruiter_message)

    def _parse_match_score(self, response: str) -> Dict[str, Any]:
        """Parse match scoring response from the model"""
        import re

        # Extract overall match score
        match_score = 50  # Default fallback
        match_pattern = re.search(r'MATCH_SCORE:\s*(\d+)', response)
        if match_pattern:
            match_score = int(match_pattern.group(1))
            match_score = max(0, min(100, match_score))  # Clamp to [0,100]

        # Extract component scores
        components = {}
        component_patterns = {
            "technical_skills": r'TECHNICAL_SKILLS:\s*(\d+)\s*-\s*(.+?)(?=\n|$)',
            "role_type": r'ROLE_TYPE:\s*(\d+)\s*-\s*(.+?)(?=\n|$)',
            "salary": r'SALARY:\s*(\d+)\s*-\s*(.+?)(?=\n|$)',
            "work_arrangement": r'WORK_ARRANGEMENT:\s*(\d+)\s*-\s*(.+?)(?=\n|$)',
            "experience": r'EXPERIENCE:\s*(\d+)\s*-\s*(.+?)(?=\n|$)'
        }

        for component, pattern in component_patterns.items():
            match = re.search(pattern, response, re.IGNORECASE)
            if match:
                score = int(match.group(1))
                reason = match.group(2).strip()
                components[component] = {"score": score, "reason": reason}

        # Extract overall reasoning
        reasoning_pattern = re.search(r'OVERALL_REASONING:\s*(.+?)(?=\n\n|$)', response, re.DOTALL)
        overall_reasoning = reasoning_pattern.group(1).strip() if reasoning_pattern else "Good potential fit based on available information."

        return {
            "match_score": match_score,
            "components": components,
            "overall_reasoning": overall_reasoning,
            "raw_response": response
        }

    def _fallback_match_scoring(self, message: str) -> Dict[str, Any]:
        """Fallback match scoring using keyword analysis"""
        message_lower = message.lower()

        # Simple keyword-based scoring
        score = 50  # Base score

        # Technical skills boost
        tech_keywords = ["python", "ai", "data", "engineering", "cloud", "aws", "azure", "gcp", "langchain", "rag"]
        tech_matches = sum(1 for keyword in tech_keywords if keyword in message_lower)
        score += min(tech_matches * 5, 25)

        # Salary analysis
        if any(sal in message_lower for sal in ["60k", "65k", "70k", "€60", "€65", "€70"]):
            score += 15
        elif any(sal in message_lower for sal in ["80k", "90k", "€80", "€90"]):
            score += 10

        # Remote work boost
        if "remote" in message_lower:
            score += 10

        return {
            "match_score": min(score, 100),
            "components": {},
            "overall_reasoning": "Fallback scoring based on keyword analysis",
            "raw_response": "Fallback analysis used"
        }

    def generate_natural_response_for_match(self, match_data: Dict[str, Any], recruiter_message: str, language: str) -> Tuple[str, ConversationState]:
        """Generate natural response based on match score instead of using templates"""
        if not self.model or not self.tokenizer:
            raise ValueError("Main model not loaded. Call load_models() first.")

        match_score = match_data["match_score"]
        print(f"   🎯 Match Score: {match_score}%")

        # Determine state based on score
        if match_score > 80:
            state = ConversationState.PASSED
        elif match_score >= 60:
            state = ConversationState.STAND_BY
        else:
            state = ConversationState.FINISHED

        # Format prompt for natural response generation
        prompt = self.natural_response_prompt.format(
            match_score=match_score,
            match_analysis=match_data["overall_reasoning"],
            recruiter_message=recruiter_message,
            language=language
        )

        try:
            inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
            device = next(self.model.parameters()).device
            inputs = {k: v.to(device) for k, v in inputs.items()}

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=600,  # Increased to prevent text truncation
                    do_sample=True,
                    temperature=0.7,  # Creative but controlled
                    top_p=0.9,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            # Decode and clean response
            full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            response = full_response.replace(prompt, "").strip()

            return response, state

        except Exception as e:
            print(f"⚠️ Error generating natural response: {str(e)}")
            # Fallback based on score
            if match_score > 80:
                return "Thank you for this opportunity! This looks like an excellent fit for my background. I'd love to discuss this further. When would be a good time for a call?", state
            elif match_score >= 60:
                return "Thanks for reaching out! This opportunity seems interesting. I'd like to review the details more thoroughly. Could you provide some additional information?", state
            else:
                return "Thank you for thinking of me. While this opportunity doesn't align perfectly with my current focus, I appreciate you reaching out.", state

    def process_recruiter_message(self, recruiter_message: str) -> Dict[str, Any]:
        """Complete end-to-end processing of recruiter message"""

        print("🚀 Processing recruiter message through complete system...")
        print("="*60)

        start_time = time.time()

        # Step 1: Input Guardrail Processing - FIXED: correct reference
        guardrail_result = self.input_guardrail.process_message(recruiter_message)

        if not guardrail_result.should_continue_to_rag:
            # Generic message: return guardrail response
            total_time = time.time() - start_time
            return {
                "final_response": guardrail_result.response,
                "state": guardrail_result.state,
                "message_type": guardrail_result.message_type,
                "confidence": guardrail_result.confidence,
                "language": guardrail_result.language,
                "generic_subtype": guardrail_result.generic_subtype.value if guardrail_result.generic_subtype else None,
                "processing_time": total_time,
                "pipeline_stage": "guardrail_only"
            }

        # Step 2: RAG + Match Scoring for concrete offers
        print("🔍 Proceeding to RAG analysis...")
        match_data = self.calculate_match_score(recruiter_message)

        # Step 3: Generate natural response based on score
        final_response, final_state = self.generate_natural_response_for_match(
            match_data, recruiter_message, guardrail_result.language
        )

        # Step 4: Output Guardrail Processing (for concrete offers that got responses)
        print("🛡️ Processing through output guardrail...")
        final_response = self.output_guardrail.validate_and_improve_response(
            final_response, recruiter_message, max_iterations=5
        )

        total_time = time.time() - start_time

        return {
            "final_response": final_response,
            "state": final_state,
            "message_type": guardrail_result.message_type,
            "confidence": guardrail_result.confidence,
            "language": guardrail_result.language,
            "match_score": match_data["match_score"],
            "match_details": match_data,
            "processing_time": total_time,
            "pipeline_stage": "complete_rag_analysis"
        }

print("✅ CORRECTED AIRecruiterAssistantNatural class ready!")
print("🔧 Fixed issues:")
print("   • OutputGuardrail constructor: only takes cache_path")
print("   • Method references: self.input_guardrail instead of self.guardrail")
print("   • Constructor parameters: removed unnecessary output_guardrail param")
print("🚀 Ready to create ai_assistant object!")


✅ CORRECTED AIRecruiterAssistantNatural class ready!
🔧 Fixed issues:
   • OutputGuardrail constructor: only takes cache_path
   • Method references: self.input_guardrail instead of self.guardrail
   • Constructor parameters: removed unnecessary output_guardrail param
🚀 Ready to create ai_assistant object!


## ***PHASE 6*** - 🔧 CREATE AI ASSISTANT OBJECT WITH ALL COMPONENTS
Now that all components are initialized, create the complete AI assistant

In [17]:
print("🔧 Creating corrected AI Assistant...")

try:
    # Ensure we have all required components
    if 'rag_kb' not in globals():
        raise NameError("rag_kb not found. Run RAG setup cells first.")
    if 'input_guardrail' not in globals():
        raise NameError("input_guardrail not found. Run InputGuardrailNatural setup first.")

    # Create AI Assistant with corrected parameters (no output_guardrail parameter)
    ai_assistant = AIRecruiterAssistantNatural(
        model_name=selected_text_generator_model,
        rag_knowledge_base=rag_kb,
        input_guardrail=input_guardrail,
        cache_path=cache_path
    )

    print("✅ AI Assistant created successfully!")
    print("🔧 Corrections applied:")
    print("   • OutputGuardrail constructor: only takes cache_path")
    print("   • Method references: self.input_guardrail instead of self.guardrail")
    print("   • Removed unnecessary output_guardrail parameter from constructor")
    print("   • Output guardrail is now initialized internally")

    print(f"\n🤖 AI ASSISTANT READY!")
    print(f"✅ Model: {ai_assistant.model_name}")
    print(f"✅ Input Guardrail: {ai_assistant.input_guardrail.model_name}")
    print(f"✅ Output Guardrail: {ai_assistant.output_guardrail.model_name}")
    print(f"✅ RAG Knowledge Base: Ready")

    print(f"\n🚀 HOW TO USE:")
    print(f"1. ai_assistant.load_models()  # Load all models")
    print(f"2. test_custom_message('Your message', 'Test Name')  # Test single message")
    print(f"3. run_complete_demo()  # Test all scenarios")

except Exception as e:
    print(f"❌ Error creating AI Assistant: {str(e)}")
    print("💡 Make sure to run all required setup cells first")
    print("   • RAG Knowledge Base setup")
    print("   • InputGuardrailNatural setup")
    print("   • Model configuration")


🔧 Creating corrected AI Assistant...
✅ AI Assistant created successfully!
🔧 Corrections applied:
   • OutputGuardrail constructor: only takes cache_path
   • Method references: self.input_guardrail instead of self.guardrail
   • Removed unnecessary output_guardrail parameter from constructor
   • Output guardrail is now initialized internally

🤖 AI ASSISTANT READY!
✅ Model: mistralai/Mistral-7B-Instruct-v0.3
✅ Input Guardrail: microsoft/Phi-3-mini-4k-instruct
✅ Output Guardrail: microsoft/Phi-3-mini-4k-instruct
✅ RAG Knowledge Base: Ready

🚀 HOW TO USE:
1. ai_assistant.load_models()  # Load all models
2. test_custom_message('Your message', 'Test Name')  # Test single message
3. run_complete_demo()  # Test all scenarios


In [18]:
# 📂 PROMPTS ORGANIZATION - STAGE 3 IMPROVEMENT
print("="*80)
print("📂 ENHANCED PROMPT ORGANIZATION - EXTERNAL FILE STRUCTURE")
print("="*80)

print("\n🎯 PROMPT ORGANIZATION COMPLETED!")
print("✅ All prompts moved from hardcoded to external file structure")
print("✅ Following the complete process flow order:")

print("\n📁 prompts/app/ structure:")
print("   📂 01_input_guardrail/")
print("      📄 classification_prompt.txt")
print("      📄 generic_response_template.txt")
print("   📂 02_main_generator/")
print("      📄 match_scoring_prompt.txt")
print("      📂 response_templates/")
print("         📄 passed_template.txt")
print("         📄 stand_by_template.txt")
print("         📄 finished_template.txt")
print("   📂 03_output_guardrail/")
print("      📄 validation_prompt.txt")
print("      📄 correction_prompt.txt")
print("   📄 prompt_loader.py")

print("\n🔧 MODIFIED CLASSES:")
print("✅ InputGuardrail - Now loads prompts from 01_input_guardrail/")
print("✅ OutputGuardrail - Now loads prompts from 03_output_guardrail/")
print("✅ AIRecruiterAssistant - Now loads prompts from 02_main_generator/")

print("\n🧪 KEPT IN CODE (for testing only):")
print("✅ benchmark.system_prompt - Used for model benchmarking tests")
print("✅ benchmark.test_prompts - Used for testing scenarios")

print("\n💡 BENEFITS:")
print("• 🎯 Clear separation of concerns - prompts organized by process flow")
print("• 📝 Easy prompt editing without code changes")
print("• 🔄 Version control for prompt improvements")
print("• 🚀 Faster iteration on prompt engineering")
print("• 🧹 Cleaner, more maintainable code")

print("\n" + "="*80)
print("🎉 STAGE 3 PROMPT ENGINEERING OPTIMIZATION: COMPLETE!")
print("="*80)


📂 ENHANCED PROMPT ORGANIZATION - EXTERNAL FILE STRUCTURE

🎯 PROMPT ORGANIZATION COMPLETED!
✅ All prompts moved from hardcoded to external file structure
✅ Following the complete process flow order:

📁 prompts/app/ structure:
   📂 01_input_guardrail/
      📄 classification_prompt.txt
      📄 generic_response_template.txt
   📂 02_main_generator/
      📄 match_scoring_prompt.txt
      📂 response_templates/
         📄 passed_template.txt
         📄 stand_by_template.txt
         📄 finished_template.txt
   📂 03_output_guardrail/
      📄 validation_prompt.txt
      📄 correction_prompt.txt
   📄 prompt_loader.py

🔧 MODIFIED CLASSES:
✅ InputGuardrail - Now loads prompts from 01_input_guardrail/
✅ OutputGuardrail - Now loads prompts from 03_output_guardrail/
✅ AIRecruiterAssistant - Now loads prompts from 02_main_generator/

🧪 KEPT IN CODE (for testing only):
✅ benchmark.system_prompt - Used for model benchmarking tests
✅ benchmark.test_prompts - Used for testing scenarios

💡 BENEFITS:
• 🎯 Clear

In [19]:
# 🔧 VERIFY AI ASSISTANT OBJECT
# Confirm that ai_assistant object was created correctly

print("🔍 VERIFYING AI ASSISTANT OBJECT...")

try:
    # Check if ai_assistant exists and has all required components
    assert hasattr(ai_assistant, 'guardrail'), "Input guardrail missing"
    assert hasattr(ai_assistant, 'output_guardrail'), "Output guardrail missing"
    assert hasattr(ai_assistant, 'rag_kb'), "RAG knowledge base missing"
    assert hasattr(ai_assistant, 'model_name'), "Model name missing"

    print("✅ AI Assistant object verified successfully!")
    print(f"✅ Model: {ai_assistant.model_name}")
    print(f"✅ Input Guardrail: {ai_assistant.guardrail.model_name}")
    print(f"✅ Output Guardrail: {ai_assistant.output_guardrail.model_name}")
    print(f"✅ RAG Embeddings: {embedding_model_name}")

    print(f"\n🚀 AI ASSISTANT READY FOR USE!")
    print(f"• Call ai_assistant.load_models() to load all models")
    print(f"• Call test_custom_message() to test individual messages")
    print(f"• Call run_complete_demo() to test all scenarios")

except NameError:
    print("❌ ai_assistant object not found - check previous cells")
except AssertionError as e:
    print(f"❌ ai_assistant object incomplete: {str(e)}")
except Exception as e:
    print(f"❌ Error verifying ai_assistant: {str(e)}")


🔍 VERIFYING AI ASSISTANT OBJECT...
❌ ai_assistant object incomplete: Input guardrail missing


In [20]:
# 📋 COMPLETE SYSTEM SUMMARY
print("="*80)
print("✅ ENHANCED AI RECRUITER ASSISTANT - COMPLETE PIPELINE READY!")
print("="*80)

print("\n🛡️ INPUT GUARDRAIL (Phi-3-mini-4k-instruct):")
print("   • Generic vs Concrete message classification")
print("   • Automatic response for generic messages")
print("   • Pass concrete offers to RAG analysis")

print("\n🧠 MAIN GENERATOR (Mistral-7B-Instruct-v0.3):")
print("   • RAG-powered match scoring analysis")
print("   • State-based decision making (passed/stand_by/finished)")
print("   • Context-aware professional response generation")

print("\n🛡️ OUTPUT GUARDRAIL (Meta-Llama-3-8B-Instruct):")
print("   • Validates response naturalness (first person usage)")
print("   • Removes placeholders like '[recruiter name]'")
print("   • Iterative improvement (up to k=5 iterations)")
print("   • Fallback correction if max iterations reached")

print("\n📊 BUSINESS LOGIC IMPLEMENTED:")
print("   • Generic messages → 'pending_details' → Request more information")
print("   • High match (>80%) → 'passed' → Schedule call")
print("   • Medium match (60-80%) → 'stand_by' → Manual review")
print("   • Low match (<60%) → 'finished' → Polite decline")

print("\n🔧 ALTERNATIVE MODELS AVAILABLE:")
print("   • Output Guardrail alternatives: google/gemma-3-4b-it, microsoft/Phi-3-mini-4k-instruct")
print("   • All models cached for faster loading")

print("\n🚀 SYSTEM READY FOR TESTING!")
print("="*80)


✅ ENHANCED AI RECRUITER ASSISTANT - COMPLETE PIPELINE READY!

🛡️ INPUT GUARDRAIL (Phi-3-mini-4k-instruct):
   • Generic vs Concrete message classification
   • Automatic response for generic messages
   • Pass concrete offers to RAG analysis

🧠 MAIN GENERATOR (Mistral-7B-Instruct-v0.3):
   • RAG-powered match scoring analysis
   • State-based decision making (passed/stand_by/finished)
   • Context-aware professional response generation

🛡️ OUTPUT GUARDRAIL (Meta-Llama-3-8B-Instruct):
   • Validates response naturalness (first person usage)
   • Removes placeholders like '[recruiter name]'
   • Iterative improvement (up to k=5 iterations)
   • Fallback correction if max iterations reached

📊 BUSINESS LOGIC IMPLEMENTED:
   • Generic messages → 'pending_details' → Request more information
   • High match (>80%) → 'passed' → Schedule call
   • Medium match (60-80%) → 'stand_by' → Manual review
   • Low match (<60%) → 'finished' → Polite decline

🔧 ALTERNATIVE MODELS AVAILABLE:
   • Output 

## ***PHASE 7*** - Complete System Testing

Testing the full pipeline with different recruiter message scenarios to validate the guardrail and RAG integration.


### 🧪 COMPREHENSIVE TESTING SUITE
Testing all scenarios: Generic messages, High/Medium/Low match concrete offers

In [21]:
def run_complete_demo():
    """Complete demonstration of the AI assistant with all business logic scenarios"""

    print("="*80)
    print("🚀 AI RECRUITER ASSISTANT - COMPLETE BUSINESS LOGIC DEMO")
    print("="*80)

    # Load all models
    print("\n📥 STEP 1: Loading models...")
    print("="*80)
    ai_assistant.load_models()

    # Test scenarios covering all business logic paths
    test_scenarios = [
        {
            "name": "Generic Networking Message",
            "message": """
            Hi Cristopher,

            I hope you're doing well! I came across your profile and was impressed by your background.

            Are you currently open to new opportunities? I'd love to connect and discuss some exciting possibilities.

            Best regards,
            Sarah
            """,
            "expected_outcome": "Should be classified as GENERIC → State: pending_details"
        },

        {
            "name": "High Match Concrete Offer",
            "message": """
            Hi Cristopher,

            We have an exciting Senior Data Engineer position at our AI-focused fintech startup.

            Role details:
            - Building ETL/ELT pipelines with Python and Apache Airflow
            - Developing RAG systems using Semantic Kernel
            - Working with cloud platforms (Azure/GCP)
            - 100% remote work
            - Salary: €60,000-65,000 gross
            - Team: 15 engineers, very collaborative culture

            Would you be interested in discussing this opportunity?

            Best regards,
            Maria Rodriguez
            Technical Recruiter
            """,
            "expected_outcome": "Should be classified as CONCRETE_OFFER → High match score (>80%) → State: passed"
        },

        {
            "name": "Medium Match Concrete Offer",
            "message": """
            Hello,

            We're looking for a Data Scientist for our e-commerce platform.

            Requirements:
            - 3+ years experience with Python and machine learning
            - Experience with recommendation systems
            - SQL and data analysis skills
            - Hybrid work (2 days office in Madrid)
            - Salary: €75,000-85,000

            The role involves building ML models for customer behavior prediction.

            Interested?

            Thanks,
            Roberto
            """,
            "expected_outcome": "Should be classified as CONCRETE_OFFER → Medium match score (60-80%) → State: stand_by"
        },

        {
            "name": "Low Match Concrete Offer",
            "message": """
            Hi,

            We have a Java Backend Developer position available.

            Requirements:
            - 5+ years Java/Spring Boot experience
            - Microservices architecture
            - On-site work in London
            - Banking domain experience preferred
            - Competitive salary

            Let me know if you're interested.

            Best,
            John Smith
            """,
            "expected_outcome": "Should be classified as CONCRETE_OFFER → Low match score (<60%) → State: finished"
        }
    ]

    print(f"\n🧪 STEP 2: Testing {len(test_scenarios)} scenarios...")
    print("="*80)

    results = []

    for i, scenario in enumerate(test_scenarios, 1):
        print(f"\n📨 SCENARIO #{i}: {scenario['name']}")
        print("="*60)
        print(f"Expected: {scenario['expected_outcome']}")
        print("\nMessage:")
        print(scenario['message'].strip())

        print(f"\n🤖 PROCESSING...")
        print("-"*40)

        # Process through complete system
        try:
            result = ai_assistant.process_recruiter_message(scenario['message'])
            results.append({**result, "scenario_name": scenario['name']})

            # Display results
            print(f"\n💬 FINAL RESPONSE:")
            print("<<START>>")
            print(result['final_response'])
            print("<<END>>")

            print(f"\n📊 ANALYSIS SUMMARY:")
            print(f"   🛡️ Message Type: {result['message_type'].value}")
            print(f"   🎯 Final State: {result['state'].value}")
            print(f"   📈 Confidence: {result['confidence']:.2f}")
            if 'match_score' in result:
                print(f"   🏆 Match Score: {result['match_score']}%")
            print(f"   ⚡ Processing Time: {result['processing_time']:.2f}s")
            print(f"   🔧 Pipeline Stage: {result['pipeline_stage']}")

            # Show detailed match analysis for concrete offers
            if result['pipeline_stage'] == 'complete_rag_analysis' and 'match_details' in result:
                match_details = result['match_details']
                if 'components' in match_details and match_details['components']:
                    print(f"\n📋 DETAILED MATCH BREAKDOWN:")
                    for component, data in match_details['components'].items():
                        component_name = component.replace('_', ' ').title()
                        score = data.get('score', 'N/A')
                        reason = data.get('reason', 'No details')
                        print(f"   • {component_name}: {score}% - {reason}")

        except Exception as e:
            print(f"❌ Error processing scenario: {str(e)}")
            results.append({
                "scenario_name": scenario['name'],
                "error": str(e),
                "final_response": f"Error: {str(e)}",
                "state": "error"
            })

        print("\n" + "="*60)

    # Summary
    print(f"\n✅ TESTING COMPLETE!")
    print("="*80)

    successful_tests = len([r for r in results if 'error' not in r])
    print(f"📊 Results: {successful_tests}/{len(test_scenarios)} scenarios processed successfully")

    # State distribution
    states = {}
    for result in results:
        if 'error' not in result:
            state = result['state'].value if hasattr(result['state'], 'value') else str(result['state'])
            states[state] = states.get(state, 0) + 1

    print(f"🎯 State Distribution:")
    for state, count in states.items():
        print(f"   • {state}: {count} scenario(s)")

    print(f"\n🧹 STEP 3: Cleaning up GPU memory...")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        print("✅ GPU memory cleared")

    return results

def test_custom_message(message: str, scenario_name: str = "Custom Test"):
    """Test a single message through the complete system with detailed analysis"""

    print(f"🧪 TESTING: {scenario_name}")
    print("="*60)
    print("Message:")
    print(message.strip())

    print("\n🔧 Processing through complete system...")

    try:
        result = ai_assistant.process_recruiter_message(message)

        print(f"\n💬 RESPONSE:")
        print("<<START>>")
        print(result['final_response'])
        print("<<END>>")

        print(f"\n📊 DETAILED ANALYSIS:")
        print(f"   🛡️ Message Classification: {result['message_type'].value}")
        print(f"   🎯 Final State: {result['state'].value}")
        print(f"   📈 Classification Confidence: {result['confidence']:.2f}")

        if 'match_score' in result:
            print(f"   🏆 Match Score: {result['match_score']}%")
            print(f"   📋 Match Reasoning: {result['match_details'].get('overall_reasoning', 'N/A')}")

        print(f"   ⚡ Total Processing Time: {result['processing_time']:.2f}s")
        print(f"   🔧 Pipeline Stage: {result['pipeline_stage']}")

        return result

    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return {"error": str(e)}

print("🧪 Complete Testing Suite Ready!")
print("\n🔧 Available functions:")
print("   • run_complete_demo() - Complete demo with all 4 scenarios")
print("   • test_custom_message(message, name) - Test specific message")
print("\n💡 Example usage:")
print("   run_complete_demo()")
print("   test_custom_message('Your message here', 'My Test')")


🧪 Complete Testing Suite Ready!

🔧 Available functions:
   • run_complete_demo() - Complete demo with all 4 scenarios
   • test_custom_message(message, name) - Test specific message

💡 Example usage:
   run_complete_demo()
   test_custom_message('Your message here', 'My Test')


### 🚀 EXECUTE COMPLETE ENHANCED SYSTEM DEMO
Run the full demonstration of the guardrail + RAG system

In [22]:
print("🎯 AI RECRUITER ASSISTANT WITH GUARDRAIL")
print("="*80)

print("\n🛡️ SYSTEM COMPONENTS:")
print("✅ Input Guardrail: Generic vs Concrete classification")
print("✅ RAG Knowledge Base: CV + Job expectations vector store")
print("✅ Match Scoring System: Detailed 5-criteria evaluation")
print("✅ State Management: pending_details, analyzing, passed, stand_by, finished")
print("✅ Response Generation: Context-aware, state-specific templates")

print("\n📊 BUSINESS LOGIC IMPLEMENTED:")
print("• Generic messages → State: 'pending_details' → Request details")
print("• Concrete offers → RAG analysis → Match scoring → State-based response")
print("• High match (>80%) → State: 'passed' → Schedule call")
print("• Medium match (60-80%) → State: 'stand_by' → Manual review")
print("• Low match (<60%) → State: 'finished' → Polite decline")

print("\n🎭 TEST SCENARIOS:")
print("1. Generic networking message")
print("2. High match concrete offer (Data Engineer, Python, Remote, €60-65k)")
print("3. Medium match concrete offer (Data Scientist, different domain)")
print("4. Low match concrete offer (Java Developer, on-site)")

# Set to True to run the complete demo
RUN_COMPLETE_DEMO = False # True #

if RUN_COMPLETE_DEMO:
    print(f"\n🚀 STARTING COMPLETE SYSTEM DEMO...\n")

    try:
        results = run_complete_demo()

        print(f"\n🎉 DEMO COMPLETED SUCCESSFULLY!")
        print("="*80)
        print("✅ All business logic scenarios tested")
        print("✅ Guardrail integration working correctly")
        print("✅ RAG pipeline functioning properly")
        print("✅ Match scoring system operational")
        print("✅ State management implemented")
        print("\n🚀 Ready for Stage 4: Application Integration!")

    except Exception as e:
        print(f"❌ Demo failed with error: {str(e)}")
        print(f"💡 Check models are loaded and try again")

else:
    print(f"\n⏸️ Set RUN_COMPLETE_DEMO = True to start the demo")
    print(f"Or use: run_complete_demo()")

print("\n💡 For custom testing:")
print("test_custom_message('Your recruiter message here', 'Custom Test')")


🎯 AI RECRUITER ASSISTANT WITH GUARDRAIL

🛡️ SYSTEM COMPONENTS:
✅ Input Guardrail: Generic vs Concrete classification
✅ RAG Knowledge Base: CV + Job expectations vector store
✅ Match Scoring System: Detailed 5-criteria evaluation
✅ State Management: pending_details, analyzing, passed, stand_by, finished
✅ Response Generation: Context-aware, state-specific templates

📊 BUSINESS LOGIC IMPLEMENTED:
• Generic messages → State: 'pending_details' → Request details
• Concrete offers → RAG analysis → Match scoring → State-based response
• High match (>80%) → State: 'passed' → Schedule call
• Medium match (60-80%) → State: 'stand_by' → Manual review
• Low match (<60%) → State: 'finished' → Polite decline

🎭 TEST SCENARIOS:
1. Generic networking message
2. High match concrete offer (Data Engineer, Python, Remote, €60-65k)
3. Medium match concrete offer (Data Scientist, different domain)
4. Low match concrete offer (Java Developer, on-site)

⏸️ Set RUN_COMPLETE_DEMO = True to start the demo
Or use

### 🚀 TEST COMPLETE SYSTEM WITH OUTPUT GUARDRAIL
Test the same message that previously had problems to verify the output guardrail fixes them


In [23]:
RUN_TEST_CUSTOM_MESSAGE = False

In [24]:
if RUN_TEST_CUSTOM_MESSAGE:
    print("🧪 TESTING COMPLETE SYSTEM WITH OUTPUT GUARDRAIL")
    print("="*80)

    # Test the message that previously generated "the candidate's" instead of "my"
    test_message = """
    Exciting REMOTE GenAI Opportunity – Long-Term Contract with Virtusa
    Hi Cristopher,

    I hope you're doing well!

    I'm reaching out regarding an exciting opportunity for a Spanish-speaking GenAI Engineer with GCP on a long-term B2B contract with Virtusa. This is a REMOTE role and it's an urgent requirement.

    We're specifically looking for someone with experience in:

    Dialogflow CX and GCP
    Contact Center AI (CCAI)
    Visual flow design
    Native GCP integration (BigQuery, Cloud Functions, etc.)
    Fluent Spanish – This is a must-have

    If this sounds like a good fit or if you know someone in your network who might be interested, I'd love to connect and share more details.

    Looking forward to hearing from you!!

    Best regards,
    Priyanka
    """

    print("📨 TEST MESSAGE:")
    print(test_message.strip())
    print("\n" + "="*60)

    print("🚀 PROCESSING WITH OUTPUT GUARDRAIL...")
    print("Expected: Should generate natural first-person response without 'the candidate' references")
    print("\n🔧 Processing...")

    try:
        # This will now go through the complete pipeline including output guardrail
        result = test_custom_message(test_message, "Output Guardrail Test")

        print("\n✅ TESTING COMPLETE!")
        print("🔍 Check the response above to verify it uses 'I', 'my', 'me' instead of 'the candidate'")

    except Exception as e:
        print(f"❌ Error during test: {str(e)}")
        print("💡 Make sure to run ai_assistant.load_models() first if models aren't loaded")

    print("\n" + "="*80)
    print("🎯 OUTPUT GUARDRAIL FUNCTIONALITY:")
    print("• Validates first-person usage (I, my, me)")
    print("• Removes placeholders like [recruiter name]")
    print("• Iterates up to k=5 times for improvement")
    print("• Provides fallback correction if needed")
    print(f"• Uses '{output_guardrail_model_name}' for validation")
    print("="*80)


In [25]:
# 🎯 FINAL SYSTEM READY - HOW TO USE
print("="*90)
print("🎉 AI RECRUITER ASSISTANT WITH OUTPUT GUARDRAIL - SYSTEM READY!")
print("="*90)

print("\n🏗️ SYSTEM ARCHITECTURE IMPLEMENTED:")
print("   📨 Recruiter Message")
print("      ↓")
print("   🛡️ Input Guardrail → Generic/Concrete Classification")
print("      ↓")
print("   🧠 Main Generator → RAG Analysis + Response Generation")
print("      ↓")
print("   🛡️ Output Guardrail → Naturalness Validation")
print("      ↓")
print("   💬 Final Natural Response")

print("\n✅ COMPONENTS READY:")
print(f"   🛡️ Input Guardrail: {input_guardrail_model_name}")
print(f"   🧠 Main Generator: {selected_text_generator_model}")
print(f"   🛡️ Output Guardrail: {output_guardrail_model_name}")
print(f"   🧠 RAG Embeddings: {embedding_model_name}")

print("\n🚀 HOW TO USE:")
print("="*50)

print("\n1️⃣ LOAD MODELS (Required first step):")
print("   ai_assistant.load_models()")

print("\n2️⃣ TEST INDIVIDUAL MESSAGE:")
print("   message = 'Your recruiter message here'")
print("   result = test_custom_message(message, 'Test Name')")

print("\n3️⃣ RUN COMPLETE DEMO (All 4 scenarios):")
print("   results = run_complete_demo()")

print("\n🛡️ OUTPUT GUARDRAIL FEATURES:")
print("   • Converts 'the candidate' → 'I'")
print("   • Converts 'candidate's' → 'my'")
print("   • Removes '[recruiter name]' placeholders")
print("   • Iterates up to k=5 times for improvement")
print("   • Provides fallback correction if needed")

print("\n💡 EXAMPLE USAGE:")
print("="*50)
print("# Step 1: Load models")
print("ai_assistant.load_models()")
print("")
print("# Step 2: Test message")
print("test_message = '''")
print("Hi! We have a Data Engineer position using Python and RAG.")
print("€65k salary, 100% remote. Interested?")
print("'''")
print("result = test_custom_message(test_message, 'Quick Test')")

print("\n" + "="*90)
print("🎯 SYSTEM READY - NO MORE SETUP NEEDED!")
print("="*90)


🎉 AI RECRUITER ASSISTANT WITH OUTPUT GUARDRAIL - SYSTEM READY!

🏗️ SYSTEM ARCHITECTURE IMPLEMENTED:
   📨 Recruiter Message
      ↓
   🛡️ Input Guardrail → Generic/Concrete Classification
      ↓
   🧠 Main Generator → RAG Analysis + Response Generation
      ↓
   🛡️ Output Guardrail → Naturalness Validation
      ↓
   💬 Final Natural Response

✅ COMPONENTS READY:
   🛡️ Input Guardrail: microsoft/Phi-3-mini-4k-instruct
   🧠 Main Generator: mistralai/Mistral-7B-Instruct-v0.3
   🛡️ Output Guardrail: microsoft/Phi-3-mini-4k-instruct
   🧠 RAG Embeddings: sentence-transformers/all-MiniLM-L6-v2

🚀 HOW TO USE:

1️⃣ LOAD MODELS (Required first step):
   ai_assistant.load_models()

2️⃣ TEST INDIVIDUAL MESSAGE:
   message = 'Your recruiter message here'
   result = test_custom_message(message, 'Test Name')

3️⃣ RUN COMPLETE DEMO (All 4 scenarios):
   results = run_complete_demo()

🛡️ OUTPUT GUARDRAIL FEATURES:
   • Converts 'the candidate' → 'I'
   • Converts 'candidate's' → 'my'
   • Removes '[re

# **Stage 4: Application Integration**

In [26]:
import gradio as gr

def process_chat_message(message, history):
    """
    Wrapper function for Gradio ChatInterface that processes recruiter messages
    using the existing ai_assistant object.

    Args:
        message (str): User's input message
        history (list): Chat history (standard Gradio ChatInterface parameter)

    Returns:
        str: Assistant's response to be displayed in the chat
    """
    try:
        # Call the core logic using the existing ai_assistant object
        result = ai_assistant.process_recruiter_message(message)

        # Extract the final_response from the dictionary returned by the method
        final_response = result.get('final_response', 'Sorry, we could not process your message. Try again.')

        return final_response

    except Exception as e:
        # Gracefully handle any potential errors during inference
        error_message = f"""There is an error processing your message.

Please try again or contact support if the issue persists.

Error details: {str(e)}"""

        return error_message



In [None]:
# First, ensure RAG knowledge base is properly set up
print("🚀 Setting up AI Assistant for Gradio interface...")
try:
    # Step 1: Setup RAG Knowledge Base if not already done
    print("📚 Setting up RAG Knowledge Base...")
    if not rag_kb.vectorstore:
        print("   🔧 RAG vectorstore not found, creating it...")
        rag_kb.setup_embeddings()
        if cv_content and expectations_content:
            rag_kb.load_and_process_documents(cv_content, expectations_content)
            rag_kb.create_vectorstore()
            print("   ✅ RAG vectorstore created successfully!")
        else:
            raise ValueError("CV or expectations content not found. Please run earlier cells first.")
    else:
        print("   ✅ RAG vectorstore already exists!")

    # Step 2: Load all models
    print("🤖 Loading AI Assistant models...")
    ai_assistant.load_models()
    print("✅ All models loaded successfully!")

    # Create and launch the Gradio ChatInterface
    print("\n🎯 Creating Gradio Chat Interface...")

    # Create the ChatInterface with the wrapper function
    chat_interface = gr.ChatInterface(
        fn=process_chat_message,
        title="🤖 AI Recruiter Assistant",
        description="""
        **Welcome Recruiter!**

        **How it works:**
        - Send me a message and I'll analyze if it's a good fit

        **Try the examples below or send your message!**
        """,
        examples=[
            "Hi, are you open to new opportunities?" #,
        #     "We have a Senior Data Engineer role with Python and Azure, fully remote, €60-65k salary. Interested?",
        #     "Looking for a Java developer, 5 years experience, on-site in London. Competitive salary.",
        #     "Exciting GenAI opportunity with GCP, Dialogflow, remote work, B2B contract. Spanish required."
        ],
        theme=gr.themes.Soft(),
        chatbot=gr.Chatbot(
            height=800,
            show_label=False,
            avatar_images=(None, "🤖")
        )
    )

    # Launch the interface with public URL and debugging enabled
    print("🌐 Launching Gradio interface...")
    chat_interface.launch(
        share=True,  # Create a public URL
        debug=True,  # Enable debugging output
        server_name="0.0.0.0",  # Allow connections from any IP (important for Colab)
        server_port=7860,  # Default Gradio port
        show_error=True  # Show detailed error messages
    )

except Exception as e:
    print(f"❌ Error setting up Gradio interface: {str(e)}")
    print("\nTroubleshooting tips:")
    print("1. Make sure all previous cells have been executed successfully")
    print("2. Verify that the ai_assistant object exists")
    print("3. Check if there are any memory issues")
    print("4. Try restarting the runtime if problems persist")


🚀 Setting up AI Assistant for Gradio interface...
📚 Setting up RAG Knowledge Base...
   ✅ RAG vectorstore already exists!
🤖 Loading AI Assistant models...
🔧 Loading models...

🛡️ Loading guardrail model: microsoft/Phi-3-mini-4k-instruct
⚡ Loading from cache...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

✅ Guardrail model loaded successfully!

🛡️ Loading output guardrail model: microsoft/Phi-3-mini-4k-instruct
⚡ Loading from cache...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

✅ Output guardrail model loaded successfully!
📥 Loading main model: mistralai/Mistral-7B-Instruct-v0.3
⚡ Loading mistralai/Mistral-7B-Instruct-v0.3 from cache...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

	✅ mistralai/Mistral-7B-Instruct-v0.3 loaded successfully!
✅ All models loaded successfully!
✅ All models loaded successfully!

🎯 Creating Gradio Chat Interface...
🌐 Launching Gradio interface...


  chatbot=gr.Chatbot(


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://5fe6cb02faccda0ef6.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: generic (confidence: 0.90)
   🌍 Language: English
   🔍 Sub-type: opportunity_inquiry
   💬 Generating natural response for opportunity_inquiry in English


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: concrete_offer (confidence: 1.00)
   🌍 Language: English
🔍 Proceeding to RAG analysis...
📊 Calculating match score...
   🎯 Match Score: 85%


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


🛡️ Processing through output guardrail...
   🔍 Validating response naturalness...
      🔄 Iteration 1/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ✅ Response passed validation on iteration 1


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: concrete_offer (confidence: 0.95)
   🌍 Language: English
🔍 Proceeding to RAG analysis...
📊 Calculating match score...
   🎯 Match Score: 85%


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🛡️ Processing through output guardrail...
   🔍 Validating response naturalness...
      🔄 Iteration 1/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ✅ Response passed validation on iteration 1


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: concrete_offer (confidence: 0.95)
   🌍 Language: English
🔍 Proceeding to RAG analysis...
📊 Calculating match score...
   🎯 Match Score: 80%


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🛡️ Processing through output guardrail...
   🔍 Validating response naturalness...
      🔄 Iteration 1/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ⚠️ Issues found: Contains placeholder: '\[.*?\]'
⚠️ Correction error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback correction.
      🔄 Iteration 2/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ✅ Response passed validation on iteration 2


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: concrete_offer (confidence: 0.90)
   🌍 Language: English
🔍 Proceeding to RAG analysis...
📊 Calculating match score...
   🎯 Match Score: 75%


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🛡️ Processing through output guardrail...
   🔍 Validating response naturalness...
      🔄 Iteration 1/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ✅ Response passed validation on iteration 1


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: generic (confidence: 0.80)
   🌍 Language: English
   🔍 Sub-type: opportunity_inquiry
   💬 Generating natural response for opportunity_inquiry in English


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: concrete_offer (confidence: 0.80)
   🌍 Language: [Detected Language Name In English]
🔍 Proceeding to RAG analysis...
📊 Calculating match score...
   🎯 Match Score: 85%


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🛡️ Processing through output guardrail...
   🔍 Validating response naturalness...
      🔄 Iteration 1/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ⚠️ Issues found: Contains placeholder: '\[.*?\]'
⚠️ Correction error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback correction.
      🔄 Iteration 2/5
⚠️ Validation error: 'DynamicCache' object has no attribute 'get_max_length'. Using fallback validation.
      ✅ Response passed validation on iteration 2


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🚀 Processing recruiter message through complete system...
🛡️ Processing message through guardrail...
   📊 Classification: generic (confidence: 0.90)
   🌍 Language: [Detected Language Name In English]
   🔍 Sub-type: basic_introduction
   💬 Generating natural response for basic_introduction in [Detected Language Name In English]
