**CV-Job Matching with JobBERT**

**Purpose**:
This notebook matches CVs against job descriptions using the JobBERT model (`TechWolf/JobBERT-v3`, loaded through `sentence-transformers`). It has two modes:

**Single CV–Job Mode**: Computes the similarity and suitability score for a single CV and a single job description.

**Batch Matching Mode**: Computes a similarity matrix between multiple CVs and multiple job descriptions to find top matches.

**Inputs**:

**my_cv.txt**: Text file containing one or more CVs

**job_description.txt**: Text file containing one or more job descriptions
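
The notebook does not say how several CVs or job descriptions are separated inside one file. As a minimal sketch, assuming each document is separated by a line containing only `---` (both that delimiter and the helper name `split_documents` are hypothetical, not part of the notebook), the file contents could be split like this:

```python
from typing import List


def split_documents(raw_text: str, delimiter: str = "---") -> List[str]:
    """Split raw file contents into individual documents.

    Assumption: documents are separated by a line that contains only
    `delimiter`. Empty chunks (e.g. after a trailing delimiter) are dropped.
    """
    documents: List[str] = []
    current: List[str] = []
    for line in raw_text.splitlines():
        if line.strip() == delimiter:
            # Delimiter line: close the current document and start a new one
            documents.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    # Close the last document (there may be no trailing delimiter)
    documents.append("\n".join(current).strip())
    return [doc for doc in documents if doc]


# Hypothetical file contents with two CVs
sample = "First CV text\nmore lines\n---\nSecond CV text\n---\n"
cvs = split_documents(sample)
print(len(cvs))
```

The resulting list can be passed directly to a batch encoder; a file that contains a single document and no delimiter yields a one-element list.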

**Outputs**:

Similarity Score and Suitability Percentage (single mode)

Similarity Matrix and Top Matches (batch mode)

**Usage**:

Run cells sequentially.

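For single CV-job matching, run either the file-based cell (it reads `resume.txt` and `Job.txt` from `/kaggle/input/resumejob/`; adjust the paths to your own files) or the interactive cell (it prompts for both texts).

For batch matching, encode all CVs and all job descriptions with `encode`, then pass the two embedding arrays to `cos_sim` to get a CVs × jobs similarity matrix; the highest value in each row is that CV's top match.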

In [1]:
pip install -U sentence-transformers


Collecting sentence-transformers
  Downloading sentence_transformers-5.1.0-py3-none-any.whl.metadata (16 kB)

In [2]:
import torch
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v3")

def encode_batch(jobbert_model, texts):
    features = jobbert_model.tokenize(texts)
    features = batch_to_device(features, jobbert_model.device)
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu().numpy()

def encode(jobbert_model, texts, batch_size: int = 8):
    # Sort texts by length and keep track of original indices
    sorted_indices = np.argsort([len(text) for text in texts])
    sorted_texts = [texts[i] for i in sorted_indices]
    
    embeddings = []
    
    # Encode in batches
    for i in tqdm(range(0, len(sorted_texts), batch_size)):
        batch = sorted_texts[i:i+batch_size]
        embeddings.append(encode_batch(jobbert_model, batch))
    
    # Concatenate embeddings and reorder to original indices
    sorted_embeddings = np.concatenate(embeddings)
    original_order = np.argsort(sorted_indices)
    return sorted_embeddings[original_order]

# Example usage
job_titles = [
    'Software Engineer',
    '高级软件开发人员',  # senior software developer
    'Produktmanager',  # product manager
    'Científica de datos'  # data scientist
]

# Get embeddings
embeddings = encode(model, job_titles)

# Calculate cosine similarity matrix
similarities = cos_sim(embeddings, embeddings)
print(similarities)



tensor([[1.0000, 0.8087, 0.4673, 0.5669],
        [0.8087, 1.0000, 0.4428, 0.4968],
        [0.4673, 0.4428, 1.0000, 0.4292],
        [0.5669, 0.4968, 0.4292, 1.0000]])
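
To turn a similarity matrix like the one above into "top matches", each row can be ranked by score. The following is a dependency-free sketch, not the notebook's own code: the helper name `top_matches` and its `exclude_self` flag are assumptions introduced for illustration, and the values are copied from the printed tensor (a tensor can be converted to nested lists with `.tolist()`).

```python
from typing import List, Sequence, Tuple


def top_matches(
    sim_matrix: Sequence[Sequence[float]],
    column_labels: Sequence[str],
    k: int = 1,
    exclude_self: bool = False,
) -> List[List[Tuple[str, float]]]:
    """For every row, return the k highest-scoring columns as (label, score).

    `exclude_self` skips the diagonal, which only makes sense when rows and
    columns are the same items (as in the job-title example).
    """
    results = []
    for i, row in enumerate(sim_matrix):
        candidates = [
            (column_labels[j], score)
            for j, score in enumerate(row)
            if not (exclude_self and i == j)
        ]
        # Highest similarity first
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        results.append(candidates[:k])
    return results


# Values copied from the similarity matrix printed above
similarities = [
    [1.0000, 0.8087, 0.4673, 0.5669],
    [0.8087, 1.0000, 0.4428, 0.4968],
    [0.4673, 0.4428, 1.0000, 0.4292],
    [0.5669, 0.4968, 0.4292, 1.0000],
]
job_titles = ["Software Engineer", "高级软件开发人员", "Produktmanager", "Científica de datos"]

best = top_matches(similarities, job_titles, k=1, exclude_self=True)
for title, matches in zip(job_titles, best):
    print(title, "->", matches)
```

For CV-to-job matching the matrix is rectangular (rows are CVs, columns are job descriptions), so `exclude_self` stays `False` and `column_labels` holds the job identifiers.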


In [1]:
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# -----------------------------
# Load the JobBERT model
# -----------------------------
model = SentenceTransformer("TechWolf/JobBERT-v3")

# -----------------------------
# Helper function to encode text
# -----------------------------
def encode_text(jobbert_model, text):
    features = jobbert_model.tokenize([text])
    features = batch_to_device(features, jobbert_model.device)
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu()

# -----------------------------
# Compute similarity
# -----------------------------
def compute_similarity(cv_text, job_text):
    cv_emb = encode_text(model, cv_text)
    job_emb = encode_text(model, job_text)
    similarity = cos_sim(cv_emb, job_emb)[0][0].item()  # scalar
    return similarity

# -----------------------------
# Read CV and Job Description files
# -----------------------------
with open("/kaggle/input/resumejob/resume.txt", "r", encoding="utf-8") as f:
    my_cv = f.read()

with open("/kaggle/input/resumejob/Job.txt", "r", encoding="utf-8") as f:
    job_description = f.read()

# -----------------------------
# Calculate similarity & suitability
# -----------------------------
similarity_score = compute_similarity(my_cv, job_description)
percentage_score = similarity_score * 100

# -----------------------------
# Print results
# -----------------------------
print("\n===============================")
print(f"Similarity Score: {similarity_score:.4f}")
print(f"Suitability: {percentage_score:.2f}%")
print("===============================\n")




Similarity Score: 0.7717
Suitability: 77.17%
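
The suitability percentage is simply the cosine similarity multiplied by 100. As a worked illustration in plain Python, the sketch below computes cosine similarity for two small vectors and converts it to a percentage. Cosine similarity can be negative, so this sketch clamps the value to the range 0 to 1 first; that clamping is an assumption added here, not something the notebook does, and both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Undefined for zero vectors; treat as "no similarity" in this sketch
        return 0.0
    return dot / (norm_a * norm_b)


def suitability_percentage(similarity: float) -> float:
    """Map a cosine similarity to 0..100, clamping negatives to 0 (assumption)."""
    return max(0.0, min(1.0, similarity)) * 100


identical = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(f"{suitability_percentage(identical):.2f}%")
print(f"{suitability_percentage(orthogonal):.2f}%")
print(f"{suitability_percentage(opposite):.2f}%")
```

A high percentage means the two embeddings point in a similar direction; it is a relative score, not a calibrated probability that the candidate fits the job.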



In [3]:
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v3")

# -----------------------------
# Helper function to encode text
# -----------------------------
def encode_text(jobbert_model, text):
    features = jobbert_model.tokenize([text])
    features = batch_to_device(features, jobbert_model.device)
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu()

# -----------------------------
# Compute similarity
# -----------------------------
def compute_similarity(cv_text, job_text):
    cv_emb = encode_text(model, cv_text)
    job_emb = encode_text(model, job_text)
    similarity = cos_sim(cv_emb, job_emb)[0][0].item()
    return similarity

# -----------------------------
# Interactive input
# -----------------------------
print("Enter your CV (one paragraph):")
# Example input: AI Engineer with 5 years experience in Python, TensorFlow, PyTorch, NLP, and computer vision. Experienced in building ML pipelines and deploying models to cloud platforms like AWS and Azure.
my_cv = input()

print("\nEnter the Job Description (one paragraph):")
# Example input: Looking for a Senior AI Engineer skilled in Python, TensorFlow, PyTorch, NLP, computer vision, and cloud deployment (AWS/Azure). Responsible for building ML pipelines and deploying AI models.
job_description = input()

similarity_score = compute_similarity(my_cv, job_description)
percentage_score = similarity_score * 100

print("\n----------------------------")
print(f"Similarity Score: {similarity_score:.4f}")
print(f"Suitability: {percentage_score:.2f}%")
print("----------------------------")


Enter your CV (one paragraph):


 AI Engineer with 5 years experience in Python, TensorFlow, PyTorch, NLP, and computer vision. Experienced in building ML pipelines and deploying models to cloud platforms like AWS and Azure.



Enter the Job Description (one paragraph):


 Looking for a Senior AI Engineer skilled in Python, TensorFlow, PyTorch, NLP, computer vision, and cloud deployment (AWS/Azure). Responsible for building ML pipelines and deploying AI models.



----------------------------
Similarity Score: 0.9421
Suitability: 94.21%
----------------------------
