# Resume-to-Job Description Matching: Model Performance Comparison

This notebook compares different embedding models for resume-job description matching. It includes:
1. Model configuration and loading
2. Dataset preparation
3. Similarity calculations
4. Comparative performance analysis

## 1. Setup Environment

First, we'll import all required libraries and configure API keys.

In [2]:
%pip install voyageai

Collecting voyageai
  Downloading voyageai-0.3.5-py3-none-any.whl.metadata (2.9 kB)
Collecting aiolimiter (from voyageai)
  Downloading aiolimiter-1.2.1-py3-none-any.whl.metadata (4.5 kB)
Downloading voyageai-0.3.5-py3-none-any.whl (28 kB)
Downloading aiolimiter-1.2.1-py3-none-any.whl (6.7 kB)
Installing collected packages: aiolimiter, voyageai
Successfully installed aiolimiter-1.2.1 voyageai-0.3.5


In [3]:
# Import required libraries
from transformers import AutoModel, AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import torch.nn.functional as F
import openai
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
import voyageai
import json
from sklearn.metrics import classification_report, accuracy_score, precision_recall_fscore_support

print("All libraries imported successfully!")

All libraries imported successfully!


In [5]:
# Configure API keys
from google.colab import userdata

# Get API keys from environment variables
openai.api_key = userdata.get('OPENAI_API_KEY')
voyage_api_key = userdata.get('VOYAGE_API_KEY')

# Set up Voyage API client if key exists
if voyage_api_key:
    voyageai.api_key = voyage_api_key
    print("‚úÖ Voyage API key configured.")
else:
    print("‚ö†Ô∏è  Voyage API key not found.")

# Configure Hugging Face token for remote LLaMA inference
hf_token = userdata.get('HF_TOKEN')
if hf_token:
    print("‚úÖ Hugging Face token configured.")
else:
    print("‚ö†Ô∏è  Hugging Face token not found.")

‚úÖ Voyage API key configured.
‚úÖ Hugging Face token configured.


## 2. Data Preparation

Load the dataset and prepare a balanced sample for testing. We'll also set up model configuration flags.

In [6]:
# Define model configuration
model_config = {
    'bge': {'enabled': True},
    'openai': {'enabled': True},
    'bge_m3': {'enabled': True},
    'careerbert': {'enabled': True},
    'confit': {'enabled': True},
    'voyage': {'enabled': True},
    'llama_remote': {'enabled': True}
}

print("Model configuration initialized.")

Model configuration initialized.


In [7]:
# Load the dataset
from google.colab import drive
drive.mount('/content/drive')
DATASET_PATH='/content/drive/MyDrive/AI-ML Self Learning/next_horizon/resume_job_recommendation/model-shashu2325-resume-job-matcher-lora'

# Load and sample the dataset
print("Loading dataset...")
df = pd.read_csv(f'{DATASET_PATH}/dataset.csv')

# Display dataset info
print(f"\nDataset Info:")
print(f"Total records: {len(df)}")
print(f"Columns: {list(df.columns)}")

# Create balanced sample
select_df = df[df['Decision'] == 'select']
reject_df = df[df['Decision'] == 'reject']

total_samples = 100
num_select = min(len(select_df), total_samples // 2)
num_reject = min(len(reject_df), total_samples - num_select)

# Adjust if needed
if num_select + num_reject < total_samples:
    if len(select_df) > len(reject_df):
        num_select = min(len(select_df), total_samples - num_reject)
    else:
        num_reject = min(len(reject_df), total_samples - num_select)

# Sample and combine
sampled_select_df = select_df.sample(n=num_select, random_state=42)
sampled_reject_df = reject_df.sample(n=num_reject, random_state=42)
sampled_df = pd.concat([sampled_select_df, sampled_reject_df])
sampled_df = sampled_df.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"\n‚úÖ Final sample size: {len(sampled_df)} records")
print(f"Sample distribution: {sampled_df['Decision'].value_counts().to_dict()}")

Mounted at /content/drive
Loading dataset...

Dataset Info:
Total records: 10174
Columns: ['Role', 'Resume', 'Decision', 'Reason_for_decision', 'Job_Description']

‚úÖ Final sample size: 100 records
Sample distribution: {'reject': 50, 'select': 50}


## 3. Model-Specific Operations

For each model, we'll define a class that handles:
1. Model loading
2. Similarity calculation
3. Prediction generation

Models will only be loaded if enabled in the configuration.

In [8]:
# Utility function for cosine similarity
def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors"""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na = np.linalg.norm(a)
    nb = np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a.dot(b) / (na * nb))

# BGE Model
if model_config['bge']['enabled']:
    print("Loading BGE model...")
    try:
        base_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
        model = PeftModel.from_pretrained(base_model, "shashu2325/resume-job-matcher-lora")
        tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

        def calculate_bge_similarity(resume_text, job_text):
            resume_inputs = tokenizer(resume_text, return_tensors="pt", max_length=512, padding="max_length", truncation=True)
            job_inputs = tokenizer(job_text, return_tensors="pt", max_length=512, padding="max_length", truncation=True)

            with torch.no_grad():
                resume_outputs = model(**resume_inputs)
                job_outputs = model(**job_inputs)

                resume_emb = resume_outputs.last_hidden_state.mean(dim=1)
                job_emb = job_outputs.last_hidden_state.mean(dim=1)

                resume_emb = F.normalize(resume_emb, p=2, dim=1)
                job_emb = F.normalize(job_emb, p=2, dim=1)

                similarity = torch.sum(resume_emb * job_emb, dim=1)
                match_score = torch.sigmoid(similarity).item()

            return match_score

        print("‚úÖ BGE model loaded and similarity function defined.")
    except Exception as e:
        print(f"‚ùå Error loading BGE model: {e}")
        model_config['bge']['enabled'] = False

Loading BGE model...


config.json:   0%|          | 0.00/779 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.34G [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/735 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/4.74M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

‚úÖ BGE model loaded and similarity function defined.


In [9]:
# OpenAI Model
if model_config['openai']['enabled'] and openai.api_key:
    print("Setting up OpenAI model...")
    try:
        def get_openai_embedding(text, model="text-embedding-3-small"):
            text = text.replace("\n", " ")
            return openai.embeddings.create(input=[text], model=model).data[0].embedding

        def calculate_openai_similarity(resume_text, job_text):
            resume_embedding = get_openai_embedding(resume_text)
            job_embedding = get_openai_embedding(job_text)
            return cosine_similarity(resume_embedding, job_embedding)

        print("‚úÖ OpenAI embedding function defined.")
    except Exception as e:
        print(f"‚ùå Error setting up OpenAI: {e}")
        model_config['openai']['enabled'] = False
else:
    print("‚ö†Ô∏è OpenAI model disabled (no API key)")
    model_config['openai']['enabled'] = False

Setting up OpenAI model...
‚úÖ OpenAI embedding function defined.


In [10]:
# Other Transformer Models (BGE-M3, CareerBERT, ConFit)
for model_name, model_details in [
    ('bge_m3', ('BAAI/bge-m3', 'BGE-M3')),
    ('careerbert', ('lwolfrum2/careerbert-g', 'CareerBERT')),
    ('confit', ('sentence-transformers/all-mpnet-base-v2', 'ConFit'))
]:
    if model_config[model_name]['enabled']:
        print(f"Loading {model_details[1]}...")
        try:
            model = SentenceTransformer(model_details[0])

            def make_calculate_similarity(loaded_model):
                def calculate_similarity(resume_text, job_text):
                    resume_embedding = loaded_model.encode(resume_text)
                    job_embedding = loaded_model.encode(job_text)
                    return cosine_similarity(resume_embedding, job_embedding)
                return calculate_similarity

            if model_name == 'bge_m3':
                calculate_bge_m3_similarity = make_calculate_similarity(model)
            elif model_name == 'careerbert':
                calculate_careerbert_similarity = make_calculate_similarity(model)
            else:  # confit
                calculate_confit_similarity = make_calculate_similarity(model)

            print(f"‚úÖ {model_details[1]} model loaded and similarity function defined.")
        except Exception as e:
            print(f"‚ùå Error loading {model_details[1]}: {e}")
            model_config[model_name]['enabled'] = False

Loading BGE-M3...


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/123 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/54.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/687 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/2.27G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.27G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/444 [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.1M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/964 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]

‚úÖ BGE-M3 model loaded and similarity function defined.
Loading CareerBERT...


modules.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/212 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/56.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/799 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/732 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/305 [00:00<?, ?B/s]

‚úÖ CareerBERT model loaded and similarity function defined.
Loading ConFit...


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

‚úÖ ConFit model loaded and similarity function defined.


In [11]:
# Voyage Model
if model_config['voyage']['enabled'] and voyage_api_key:
    print("Setting up Voyage model...")
    try:
        def calculate_voyage_similarity(resume_text, job_text, model="voyage-3-large"):
            vo = voyageai.Client()
            resume_embedding = vo.embed([resume_text], model=model).embeddings[0]
            job_embedding = vo.embed([job_text], model=model).embeddings[0]
            return cosine_similarity(resume_embedding, job_embedding)

        print("‚úÖ Voyage embedding function defined.")
    except Exception as e:
        print(f"‚ùå Error setting up Voyage: {e}")
        model_config['voyage']['enabled'] = False
else:
    print("‚ö†Ô∏è Voyage model disabled (no API key)")
    model_config['voyage']['enabled'] = False

Setting up Voyage model...
‚úÖ Voyage embedding function defined.


## 4. Generate Predictions and Classifications

Process the dataset through all enabled models and create a comparative table with role indices.

In [12]:
# Initialize score dictionary for each model
scores = {
    'bge': [],
    'openai': [],
    'bge_m3': [],
    'careerbert': [],
    'confit': [],
    'voyage': [],
    'llama_remote': []
}

# Process each resume-job pair
print("üöÄ Calculating similarity scores for all enabled models...")
for index, row in sampled_df.iterrows():
    resume_text = str(row['Resume']) if pd.notna(row['Resume']) else ""
    job_text = str(row['Job_Description']) if pd.notna(row['Job_Description']) else ""

    print(f"\nüìã Processing pair {index+1}/{len(sampled_df)}")

    # Calculate scores for each enabled model
    if model_config['bge']['enabled']:
        try:
            scores['bge'].append(calculate_bge_similarity(resume_text, job_text))
        except Exception as e:
            scores['bge'].append(None)
            print(f"‚ùå BGE error: {e}")
    else:
        scores['bge'].append(None)

    if model_config['openai']['enabled']:
        try:
            scores['openai'].append(calculate_openai_similarity(resume_text, job_text))
        except Exception as e:
            scores['openai'].append(None)
            print(f"‚ùå OpenAI error: {e}")
    else:
        scores['openai'].append(None)

    if model_config['bge_m3']['enabled']:
        try:
            scores['bge_m3'].append(calculate_bge_m3_similarity(resume_text, job_text))
        except Exception as e:
            scores['bge_m3'].append(None)
            print(f"‚ùå BGE-M3 error: {e}")
    else:
        scores['bge_m3'].append(None)

    if model_config['careerbert']['enabled']:
        try:
            scores['careerbert'].append(calculate_careerbert_similarity(resume_text, job_text))
        except Exception as e:
            scores['careerbert'].append(None)
            print(f"‚ùå CareerBERT error: {e}")
    else:
        scores['careerbert'].append(None)

    if model_config['confit']['enabled']:
        try:
            scores['confit'].append(calculate_confit_similarity(resume_text, job_text))
        except Exception as e:
            scores['confit'].append(None)
            print(f"‚ùå ConFit error: {e}")
    else:
        scores['confit'].append(None)

    if model_config['voyage']['enabled']:
        try:
            scores['voyage'].append(calculate_voyage_similarity(resume_text, job_text))
        except Exception as e:
            scores['voyage'].append(None)
            print(f"‚ùå Voyage error: {e}")
    else:
        scores['voyage'].append(None)

    if model_config['llama_remote']['enabled']:
        try:
            scores['llama_remote'].append(calculate_llama_remote_similarity(resume_text, job_text))
        except Exception as e:
            scores['llama_remote'].append(None)
            print(f"‚ùå LLaMA Remote error: {e}")
    else:
        scores['llama_remote'].append(None)

print("\n‚úÖ Score calculation complete!")

üöÄ Calculating similarity scores for all enabled models...

üìã Processing pair 1/100
‚ùå BGE error: SentenceTransformer.forward() missing 1 required positional argument: 'input'
‚ùå LLaMA Remote error: name 'calculate_llama_remote_similarity' is not defined

üìã Processing pair 2/100
‚ùå BGE error: SentenceTransformer.forward() missing 1 required positional argument: 'input'
‚ùå LLaMA Remote error: name 'calculate_llama_remote_similarity' is not defined

üìã Processing pair 3/100
‚ùå BGE error: SentenceTransformer.forward() missing 1 required positional argument: 'input'
‚ùå LLaMA Remote error: name 'calculate_llama_remote_similarity' is not defined

üìã Processing pair 4/100
‚ùå BGE error: SentenceTransformer.forward() missing 1 required positional argument: 'input'
‚ùå LLaMA Remote error: name 'calculate_llama_remote_similarity' is not defined

üìã Processing pair 5/100
‚ùå BGE error: SentenceTransformer.forward() missing 1 required positional argument: 'input'
‚ùå LLaMA Remo

In [13]:
# Add scores to DataFrame and generate predictions
threshold = 0.5
print(f"Using classification threshold: {threshold}")

# Function to make predictions
def make_prediction(score, threshold=0.5):
    if score is None or pd.isna(score):
        return 'unknown'
    return 'select' if score > threshold else 'reject'

# Add scores and predictions to DataFrame
for model in scores.keys():
    if model_config[model]['enabled']:
        sampled_df[f'{model}_similarity'] = scores[model]
        sampled_df[f'{model}_prediction'] = [make_prediction(s, threshold) for s in scores[model]]

# Create display columns including Role and Reason_for_decision
display_columns = ['Role', 'Decision', 'Reason_for_decision']
for model in scores.keys():
    if model_config[model]['enabled']:
        display_columns.extend([f'{model}_similarity', f'{model}_prediction'])

# Create comparative table with original index
comparison_table = sampled_df[display_columns]

print("\nüìä Comparative Table of Similarity Scores and Predictions:")
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', lambda x: '%.4f' % x)
print(comparison_table.to_string())

Using classification threshold: 0.5

üìä Comparative Table of Similarity Scores and Predictions:
                          Role Decision                                                                                                                                                                                Reason_for_decision bge_similarity bge_prediction  openai_similarity openai_prediction  bge_m3_similarity bge_m3_prediction  careerbert_similarity careerbert_prediction  confit_similarity confit_prediction  voyage_similarity voyage_prediction llama_remote_similarity llama_remote_prediction
0            Robotics Engineer   reject                                                                                                                                                    Lacked leadership skills for a senior position.           None        unknown             0.4729            reject             0.6034            select                 0.6868                select             

In [14]:
# Calculate performance metrics for each enabled model
print("\nüìà Model Performance Metrics:")
print("=" * 80)

for model in scores.keys():
    if model_config[model]['enabled']:
        print(f"\nüîç {model.upper()} Model Performance:")
        print("-" * 50)

        # Filter out unknown predictions
        valid_mask = sampled_df[f'{model}_prediction'] != 'unknown'
        valid_df = sampled_df[valid_mask]

        if len(valid_df) > 0:
            accuracy = accuracy_score(valid_df['Decision'], valid_df[f'{model}_prediction'])
            precision, recall, f1, _ = precision_recall_fscore_support(
                valid_df['Decision'],
                valid_df[f'{model}_prediction'],
                average='weighted'
            )

            print(f"Accuracy: {accuracy:.4f}")
            print(f"Precision: {precision:.4f}")
            print(f"Recall: {recall:.4f}")
            print(f"F1-Score: {f1:.4f}")
            print(f"Valid predictions: {len(valid_df)}/{len(sampled_df)}")

            print("\nDetailed Classification Report:")
            print(classification_report(valid_df['Decision'], valid_df[f'{model}_prediction']))
        else:
            print("No valid predictions available")


üìà Model Performance Metrics:

üîç BGE Model Performance:
--------------------------------------------------
No valid predictions available

üîç OPENAI Model Performance:
--------------------------------------------------
Accuracy: 0.4800
Precision: 0.4740
Recall: 0.4800
F1-Score: 0.4482
Valid predictions: 100/100

Detailed Classification Report:
              precision    recall  f1-score   support

      reject       0.49      0.72      0.58        50
      select       0.46      0.24      0.32        50

    accuracy                           0.48       100
   macro avg       0.47      0.48      0.45       100
weighted avg       0.47      0.48      0.45       100


üîç BGE_M3 Model Performance:
--------------------------------------------------
Accuracy: 0.4700
Precision: 0.3848
Recall: 0.4700
F1-Score: 0.3498
Valid predictions: 100/100

Detailed Classification Report:
              precision    recall  f1-score   support

      reject       0.29      0.04      0.07        50
