# CV–Job Matching Notebook (Sentence-BERT)

**Purpose**  
This notebook demonstrates matching a CV against a Job Description using **Sentence-BERT models**.  
It combines **global similarity scoring** with **detailed line-by-line semantic search**.

---

**Models Used**  
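- `multi-qa-mpnet-base-dot-v1` (used in the first global-similarity cell)  
- `all-MiniLM-L6-v2` (used for line-by-line matching and in the combined cell)  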

**Approaches**  
1. **Global Similarity (Whole Text)**  
   - Computes cosine similarity and dot-product scores between the entire CV and the entire Job Description.  
   - Converts scores to percentages to indicate overall suitability: cosine is rescaled from [-1, 1] to [0, 100], and the dot product is passed through a sigmoid.

2. **Detailed Line-by-Line Matching**  
   - Splits the CV and the Job Description into non-empty lines (one unit per line; no sentence tokenizer is used).  
   - Finds the best-matching JD line for each CV line by cosine similarity.  
   - Computes an overall suitability score as the average of the per-line best-match scores.
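
The line-level approach can be sketched without a model, as a minimal illustration only. The 2-D vectors below are hypothetical stand-ins for real sentence embeddings, and `cosine` and `best_matches` are illustrative helpers, not part of `sentence-transformers`:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_matches(cv_vecs, jd_vecs):
    """For each CV vector, return (index of best JD vector, cosine score)."""
    matches = []
    for cv in cv_vecs:
        scores = [cosine(cv, jd) for jd in jd_vecs]
        best_idx = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best_idx, scores[best_idx]))
    return matches

# Hypothetical toy "embeddings": two CV lines, two JD lines.
cv_vecs = [[1.0, 0.0], [1.0, 3.0]]
jd_vecs = [[0.0, 1.0], [2.0, 0.0]]

matches = best_matches(cv_vecs, jd_vecs)

# Overall suitability: plain average of the per-line best scores.
avg_score = sum(score for _, score in matches) / len(matches)

print(matches)
print(f"Average best-match score: {avg_score:.4f}")
```

Because the average weights every CV line equally, lines with no real counterpart in the JD lower the overall score.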

---

**Inputs**  
- `cv.txt` : A text file containing a CV (multi-line supported).  
- `job_desc.txt` : A text file containing a Job Description (multi-line supported).  
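
As a small illustration of how multi-line input is handled, the sketch below reuses the `split_text` helper defined in the code cells of this notebook. The sample string is made up for the example:

```python
def split_text(text):
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.split("\n") if line.strip()]

# Hypothetical sample input with blank lines and stray spaces.
sample_cv = "SARAH JOHNSON\n\n  Senior Software Engineer  \n\n"

lines = split_text(sample_cv)
print(lines)  # ['SARAH JOHNSON', 'Senior Software Engineer']
```

Blank lines are dropped and each remaining line becomes one unit for matching, so a paragraph written on a single line is treated as one unit.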

---

**Outputs**  
- **Global similarity scores**: dot-product, cosine similarity, percentages.  
- **Line-by-line best matches**: each CV line with its best-matching JD line and score.  
- **Overall suitability**: average score based on line-level matches.

---

**Usage**  
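1. Place your CV and Job Description text files at the paths the code cells read from (`/kaggle/input/dataset/cv.txt` and `/kaggle/input/resume-job-matcher/job_desc.txt`), or edit those paths.  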
2. Run the notebook cells sequentially.  
3. Review both **global similarity results** and **detailed line matches**.


In [1]:
%pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-5.1.0-py3-none-any.whl.metadata (16 kB)

In [9]:
from sentence_transformers import SentenceTransformer, util
import math

# Load model
model = SentenceTransformer('sentence-transformers/multi-qa-mpnet-base-dot-v1')

# Read texts (context managers close the files once they are read)
with open("/kaggle/input/dataset/cv.txt", encoding="utf-8") as f:
    cv_text = f.read()
with open("/kaggle/input/resume-job-matcher/job_desc.txt", encoding="utf-8") as f:
    jd_text = f.read()

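# Encode as tensors (text beyond the model's max_seq_length is truncated,
# so a long CV or JD may be only partly represented)
cv_emb = model.encode(cv_text, convert_to_tensor=True)
jd_emb = model.encode(jd_text, convert_to_tensor=True)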

# Dot & Cosine (both return torch.Tensor)
dot = util.dot_score(cv_emb, jd_emb)         # tensor shape (1,1)
cos = util.cos_sim(cv_emb, jd_emb)           # tensor shape (1,1)

# Convert to python floats (safe even if on GPU)
dot_score = float(dot.cpu().item())
cos_score = float(cos.cpu().item())

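# Percent conversions
# Cosine lies in [-1, 1]; rescale it linearly to [0, 100].
percent_cos = (cos_score + 1) / 2 * 100
# Sigmoid of the raw dot score. The dot score is not bounded, so a large
# value (such as the 28.38 printed below) saturates at about 100%.
percent_sigmoid = 100 * (1 / (1 + math.exp(-dot_score)))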

# Print
print(f"Dot score: {dot_score:.4f}")
print(f"Cosine score: {cos_score:.4f}")
print(f"Cosine-based %: {percent_cos:.2f}%")
print(f"Sigmoid-based %: {percent_sigmoid:.2f}%")


Dot score: 28.3793
Cosine score: 0.7108
Cosine-based %: 85.54%
Sigmoid-based %: 100.00%
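
The 100.00% sigmoid figure above follows from the size of the dot score rather than from a perfect match. The sketch below applies the same two conversions used in the cell to a few values; the function names `cosine_to_percent` and `sigmoid_to_percent` are introduced here for illustration only:

```python
import math

def cosine_to_percent(cos_score):
    """Rescale a cosine score from [-1, 1] to [0, 100]."""
    return (cos_score + 1) / 2 * 100

def sigmoid_to_percent(dot_score):
    """Pass a raw dot score through a sigmoid and scale to [0, 100]."""
    return 100 / (1 + math.exp(-dot_score))

# A dot score of 0 maps to 50%; the curve flattens quickly after that.
for dot_score in (0.0, 1.0, 5.0, 28.3793):
    print(f"dot={dot_score:8.4f} -> {sigmoid_to_percent(dot_score):.2f}%")

print(f"cos=0.7108 -> {cosine_to_percent(0.7108):.2f}%")
```

With dot scores in the tens, nearly any CV and JD pair would map to roughly 100%, so the cosine-based percentage is likely the more informative of the two for this model.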


In [10]:
from sentence_transformers import SentenceTransformer, util

# -----------------------------
# Load model
# -----------------------------
model = SentenceTransformer("all-MiniLM-L6-v2")

# -----------------------------
# Helper: Split text into lines/sentences
# -----------------------------
def split_text(text):
    return [line.strip() for line in text.split("\n") if line.strip()]

# -----------------------------
# Read CV & JD text files
# -----------------------------
with open("/kaggle/input/dataset/cv.txt", "r", encoding="utf-8") as f:
    cv_text = f.read()

with open("/kaggle/input/resume-job-matcher/job_desc.txt", "r", encoding="utf-8") as f:
    jd_text = f.read()

cv_sents = split_text(cv_text)
jd_sents = split_text(jd_text)

# -----------------------------
# Encode embeddings
# -----------------------------
cv_emb = model.encode(cv_sents, convert_to_tensor=True)
jd_emb = model.encode(jd_sents, convert_to_tensor=True)

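# -----------------------------
# Detailed matching
# -----------------------------
# Each CV line is a query and the JD lines are the corpus;
# semantic_search scores pairs with cosine similarity by default.
results = util.semantic_search(cv_emb, jd_emb, top_k=1)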

print("\n===== Detailed Matching (CV ↔ JD) =====\n")
for i, res in enumerate(results):
    best_idx = res[0]['corpus_id']
    score = res[0]['score']
    print(f"CV: {cv_sents[i]}")
    print(f"Best JD match: {jd_sents[best_idx]} (score: {score:.2f})\n")

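# -----------------------------
# Overall average match
# -----------------------------
# Every CV line counts equally, so low-scoring header and contact lines
# (see the output below) pull the average down.
avg_score = sum(res[0]['score'] for res in results) / len(results)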
print("===== Overall Suitability =====")
print(f"Average similarity score: {avg_score:.4f}")
print(f"Overall match: {avg_score*100:.2f}%")



===== Detailed Matching (CV ↔ JD) =====

CV: SARAH JOHNSON
Best JD match: • ML/AI: TensorFlow, PyTorch, Hugging Face Transformers, Sentence Transformers (score: 0.25)

CV: Senior Software Engineer
Best JD match: Senior Backend Engineer - AI/ML Focus (score: 0.61)

CV: Email: sarah.johnson@email.com | Phone: (555) 123-4567
Best JD match: • ML/AI: TensorFlow, PyTorch, Hugging Face Transformers, Sentence Transformers (score: 0.17)

CV: LinkedIn: linkedin.com/in/sarahjohnson | GitHub: github.com/sarahjohnson
Best JD match: Please submit your resume along with a cover letter highlighting your experience with machine learning and backend development. Include links to relevant projects or GitHub repositories that demonstrate your expertise in semantic search, embedding models, or similar AI/ML applications. (score: 0.28)

CV: PROFESSIONAL SUMMARY
Best JD match: COMPANY OVERVIEW (score: 0.49)

CV: Experienced Senior Software Engineer with 6+ years of expertise in backend development, speciali

In [11]:
from sentence_transformers import SentenceTransformer, util
import math

# -----------------------------
# Load model
# -----------------------------
model = SentenceTransformer("all-MiniLM-L6-v2")

# -----------------------------
# Helper: Split text into lines/sentences
# -----------------------------
def split_text(text):
    return [line.strip() for line in text.split("\n") if line.strip()]

# -----------------------------
# Read CV & JD text files
# -----------------------------
with open("/kaggle/input/dataset/cv.txt", "r", encoding="utf-8") as f:
    cv_text = f.read()

with open("/kaggle/input/resume-job-matcher/job_desc.txt", "r", encoding="utf-8") as f:
    jd_text = f.read()

cv_sents = split_text(cv_text)
jd_sents = split_text(jd_text)

# -----------------------------
# Encode embeddings
# -----------------------------
cv_emb = model.encode(cv_sents, convert_to_tensor=True)
jd_emb = model.encode(jd_sents, convert_to_tensor=True)

# -----------------------------
# (1) Global similarity (whole text vs. whole JD)
# -----------------------------
cv_whole_emb = model.encode(cv_text, convert_to_tensor=True)
jd_whole_emb = model.encode(jd_text, convert_to_tensor=True)

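# all-MiniLM-L6-v2 returns unit-length embeddings, so the dot product
# equals the cosine similarity here (the output below shows identical values).
dot = util.dot_score(cv_whole_emb, jd_whole_emb)
cos = util.cos_sim(cv_whole_emb, jd_whole_emb)

dot_score = float(dot.cpu().item())
cos_score = float(cos.cpu().item())

percent_cos = (cos_score + 1) / 2 * 100
# With the dot score confined to [-1, 1], the sigmoid stays between
# about 26.9% and 73.1%, so this percentage can never approach 100%.
percent_sigmoid = 100 * (1 / (1 + math.exp(-dot_score)))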

print("===== Global Similarity (CV ↔ JD) =====")
print(f"Dot score: {dot_score:.4f}")
print(f"Cosine score: {cos_score:.4f}")
print(f"Cosine-based %: {percent_cos:.2f}%")
print(f"Sigmoid-based %: {percent_sigmoid:.2f}%")

# -----------------------------
# (2) Detailed line-by-line matching
# -----------------------------
results = util.semantic_search(cv_emb, jd_emb, top_k=1)

print("\n===== Detailed Matching (CV ↔ JD) =====\n")
for i, res in enumerate(results):
    best_idx = res[0]['corpus_id']
    score = res[0]['score']
    print(f"CV: {cv_sents[i]}")
    print(f"Best JD match: {jd_sents[best_idx]} (score: {score:.2f})\n")

# -----------------------------
# (3) Overall average match (from line matches)
# -----------------------------
avg_score = sum([res[0]['score'] for res in results]) / len(results)
print("===== Overall Suitability (Line-based) =====")
print(f"Average similarity score: {avg_score:.4f}")
print(f"Overall match: {avg_score*100:.2f}%")


===== Global Similarity (CV ↔ JD) =====
Dot score: 0.8038
Cosine score: 0.8038
Cosine-based %: 90.19%
Sigmoid-based %: 69.08%

===== Detailed Matching (CV ↔ JD) =====

CV: SARAH JOHNSON
Best JD match: • ML/AI: TensorFlow, PyTorch, Hugging Face Transformers, Sentence Transformers (score: 0.25)

CV: Senior Software Engineer
Best JD match: Senior Backend Engineer - AI/ML Focus (score: 0.61)

CV: Email: sarah.johnson@email.com | Phone: (555) 123-4567
Best JD match: • ML/AI: TensorFlow, PyTorch, Hugging Face Transformers, Sentence Transformers (score: 0.17)

CV: LinkedIn: linkedin.com/in/sarahjohnson | GitHub: github.com/sarahjohnson
Best JD match: Please submit your resume along with a cover letter highlighting your experience with machine learning and backend development. Include links to relevant projects or GitHub repositories that demonstrate your expertise in semantic search, embedding models, or similar AI/ML applications. (score: 0.28)

CV: PROFESSIONAL SUMMARY
Best JD match: COMPAN