# CV–Job Matching with BGE + LoRA Fine-Tuning  

**Purpose:**  
This notebook matches CVs/resumes with job descriptions using a fine-tuned version of [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) (a high-quality sentence embedding model).  
It loads the base BGE model and applies a LoRA adapter (`shashu2325/resume-job-matcher-lora`) to specialize the embeddings for resume–job relevance.  

**Modes:**  

- **Single CV–Job Mode**: Computes similarity and a match score between one CV/resume and one job description.  
- **Batch Matching Mode** *(optional extension)*: Encodes multiple CVs and multiple job descriptions, then builds a similarity matrix to find top matches.  

**Inputs:**  
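- `resume.txt`: text file containing one or more CVs/resumes (the single-mode cell below reads the entire file as one document).  
- `job_desc.txt`: text file containing one or more job descriptions (likewise read as one document in single mode).  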

**Outputs:**  
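- **Match Score** (single mode): cosine similarity of the two L2-normalized embeddings, passed through a sigmoid to give a “suitability” score. Because cosine similarity lies in [-1, 1], the score falls between roughly 0.27 and 0.73 rather than spanning the full 0–1 range.  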
- **Similarity Matrix** and **Top Matches** (batch mode).  

**How It Works:**  
1. Load the base model and LoRA adapter.  
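2. Tokenize the input texts, padding or truncating each to 512 tokens (anything beyond 512 tokens is dropped).  
3. Pass the tokens through the model to get contextual embeddings.  
4. Apply mean pooling over the token embeddings (the cell below averages over all 512 positions, padding included).  
5. L2-normalize both embeddings and compute their dot product, which equals the cosine similarity.  
6. Convert the similarity to a “match score” with a sigmoid.  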
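
Steps 4 and 5 can be illustrated on a toy tensor. The sketch below is not the notebook's pooling: it shows a masked variant that uses the tokenizer's attention mask to skip padded positions, next to the plain mean over all positions. The helper name `masked_mean_pool` and the toy values are assumptions made for illustration, and whether the LoRA adapter expects masked or unmasked pooling is not stated here, so switching variants may change the scores.

```python
import torch
import torch.nn.functional as F


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real tokens only (hypothetical helper)."""
    # (batch, seq) -> (batch, seq, 1) so the mask broadcasts over the hidden size
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against division by zero
    return summed / counts


# Toy stand-in for a model output: 1 text, 4 positions, hidden size 3.
# The last two positions play the role of padding.
hidden = torch.tensor([[[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [9.0, 9.0, 9.0],
                        [9.0, 9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0, 0]])

pooled = masked_mean_pool(hidden, mask)   # averages the two real tokens only
unmasked = hidden.mean(dim=1)             # plain mean over all four positions
emb = F.normalize(pooled, p=2, dim=1)     # step 5: scale to unit length
```

With real inputs, `hidden` would be `outputs.last_hidden_state` and `mask` would be `inputs["attention_mask"]` from the tokenizer.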

**Usage:**  
Run the cells sequentially.  
- For **single CV–job** matching: provide the file paths or paste text into the variables.  
- For **batch matching**: loop through multiple CVs and job descriptions, encode each, and compute the similarity matrix.  
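
A minimal sketch of the batch step, assuming the CVs and job descriptions have already been encoded into two embedding matrices (one row per text). The function name `top_matches` and the toy 2-dimensional embeddings are hypothetical; real embeddings would come from the pooling step used in the single-mode cell.

```python
import torch
import torch.nn.functional as F


def top_matches(cv_emb: torch.Tensor, job_emb: torch.Tensor, k: int = 1):
    """Return the cosine-similarity matrix and the top-k jobs per CV."""
    cv_emb = F.normalize(cv_emb, p=2, dim=1)
    job_emb = F.normalize(job_emb, p=2, dim=1)
    sim = cv_emb @ job_emb.T                 # shape: (n_cvs, n_jobs)
    k = min(k, sim.shape[1])                 # cannot ask for more jobs than exist
    scores, job_idx = sim.topk(k, dim=1)     # best-scoring jobs for each CV
    return sim, scores, job_idx


# Toy embeddings: 3 CVs and 2 jobs in a 2-dimensional space
cv_emb = torch.tensor([[1.0, 0.0], [0.0, 2.0], [1.0, 3.0]])
job_emb = torch.tensor([[0.0, 1.0], [3.0, 0.0]])

sim_matrix, best_scores, best_jobs = top_matches(cv_emb, job_emb, k=1)
for cv_id, (job_id, score) in enumerate(zip(best_jobs[:, 0], best_scores[:, 0])):
    print(f"CV {cv_id} -> job {job_id.item()} (cosine {score.item():.3f})")
```

A single matrix product replaces the nested loop over CV–job pairs; applying `torch.sigmoid` to `sim_matrix` would give match scores on the same scale as single mode.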


In [2]:
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch
import torch.nn.functional as F

# Load the base model, attach the LoRA adapter, and load the tokenizer
base_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
model = PeftModel.from_pretrained(base_model, "shashu2325/resume-job-matcher-lora")
model.eval()  # inference mode: disables dropout
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

# Input texts: read from files, or assign strings directly, e.g.
# resume_text = "Software engineer with Python experience"
# job_text = "Looking for Python developer"
with open("/kaggle/input/resjob/resume.txt", encoding="utf-8") as f:
    resume_text = f.read()
with open("/kaggle/input/resjob/job_desc.txt", encoding="utf-8") as f:
    job_text = f.read()

# Tokenize: pad or truncate each text to exactly 512 tokens
resume_inputs = tokenizer(resume_text, return_tensors="pt", max_length=512, padding="max_length", truncation=True)
job_inputs = tokenizer(job_text, return_tensors="pt", max_length=512, padding="max_length", truncation=True)

# Forward pass and pooling (no gradients needed for inference)
with torch.no_grad():
    resume_outputs = model(**resume_inputs)
    job_outputs = model(**job_inputs)
    
    # Mean pooling over all 512 positions (padding positions are included,
    # because the attention mask is not applied to the average)
    resume_emb = resume_outputs.last_hidden_state.mean(dim=1)
    job_emb = job_outputs.last_hidden_state.mean(dim=1)
    
    # Normalize and calculate similarity
    resume_emb = F.normalize(resume_emb, p=2, dim=1)
    job_emb = F.normalize(job_emb, p=2, dim=1)
    
    similarity = torch.sum(resume_emb * job_emb, dim=1)
    match_score = torch.sigmoid(similarity).item()

print(f"Match score: {match_score:.4f}")


Match score: 0.6401
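
To read the printed score, it can help to undo the sigmoid and recover the underlying cosine similarity. The sketch below uses only the standard library; the helper names `sigmoid` and `score_to_cosine` are assumptions introduced here, not part of the notebook.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_to_cosine(score: float) -> float:
    """Invert the sigmoid (logit) to recover the cosine similarity."""
    return math.log(score / (1.0 - score))


# Cosine similarity lies in [-1, 1], so the match score is bounded by:
lowest = sigmoid(-1.0)    # about 0.269
highest = sigmoid(1.0)    # about 0.731

# The score printed above corresponds to a cosine similarity of about 0.576
cosine = score_to_cosine(0.6401)
print(f"range: [{lowest:.3f}, {highest:.3f}], cosine: {cosine:.3f}")
```

A score of 0.6401 therefore sits above the midpoint of the attainable range (0.5 corresponds to a cosine similarity of 0), while its absolute value stays well below 1 by construction.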
