<a href="https://colab.research.google.com/github/Roaa27/Resume-Screening-Using-NLP/blob/main/res_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step 1: Install and Import Libraries  
In this step, I installed the required libraries (`sentence-transformers`, `scikit-learn`, and `pandas`) and imported the necessary modules for data processing, embeddings, and similarity calculation.

In [None]:
!pip install sentence-transformers scikit-learn pandas

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

print("✅ Libraries installed and imported successfully!")




Step 2: Upload the Datasets  
Here, I uploaded two datasets:  
- UpdatedResumeDataSet.csv → contains candidate resumes.  
- job_descriptions_sample.csv → contains job descriptions (a smaller sampled version for faster processing).  
After uploading, I loaded both datasets into Pandas DataFrames and displayed their first few rows.

In [None]:
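import io
from google.colab import files
import pandas as pd

def upload_csv(prompt):
    # files.upload() returns a dict mapping file names to their bytes.
    # Colab renames repeated uploads (e.g. "UpdatedResumeDataSet (3).csv"),
    # so read the returned bytes instead of a fixed path that may be stale.
    print(prompt)
    uploaded = files.upload()
    return pd.read_csv(io.BytesIO(next(iter(uploaded.values()))))

resumes = upload_csv("⬆️ Upload UpdatedResumeDataSet.csv")
jobs = upload_csv("⬆️ Upload job_descriptions_sample.csv")

print("\n✅ Data loaded successfully!")
print(resumes.head())
print(jobs.head())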

⬆️ Upload UpdatedResumeDataSet.csv


Saving UpdatedResumeDataSet.csv to UpdatedResumeDataSet (3).csv
⬆️ Upload job_descriptions_sample.csv


Saving job_descriptions_sample.csv to job_descriptions_sample.csv

✅ Data loaded successfully!
       Category                                             Resume
0  Data Science  Skills * Programming Languages: Python (pandas...
1  Data Science  Education Details \r\nMay 2013 to May 2017 B.E...
2  Data Science  Areas of Interest Deep Learning, Control Syste...
3  Data Science  Skills â¢ R â¢ Python â¢ SAP HANA â¢ Table...
4  Data Science  Education Details \r\n MCA   YMCAUST,  Faridab...
         Job Id     Experience Qualifications Salary Range    location  \
0  1.089840e+15  5 to 15 Years         M.Tech    $59K-$99K     Douglas   
1  3.984540e+14  2 to 12 Years            BCA   $56K-$116K    Ashgabat   
2  4.816400e+14  0 to 12 Years            PhD   $61K-$104K       Macao   
3  6.881930e+14  4 to 11 Years            PhD    $65K-$91K  Porto-Novo   
4  1.170580e+14  1 to 12 Years            MBA    $64K-$87K    Santiago   

            Country  latitude  longitude  Work Type  Compa

Step 3: Inspect Dataset Columns  
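I printed the column names from both datasets to confirm which fields to use:  
- From resumes → Resume (embedded)  
- From jobs → Job Description (embedded) and Job Title (used only to label the results)  
Checking the names first avoids a `KeyError` from a misspelled or differently cased column; note that `location` and `skills` are lowercase while the other job columns are capitalized.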

In [None]:
print(resumes.columns)
print(jobs.columns)

Index(['Category', 'Resume'], dtype='object')
Index(['Job Id', 'Experience', 'Qualifications', 'Salary Range', 'location',
       'Country', 'latitude', 'longitude', 'Work Type', 'Company Size',
       'Job Posting Date', 'Preference', 'Contact Person', 'Contact',
       'Job Title', 'Role', 'Job Portal', 'Job Description', 'Benefits',
       'skills', 'Responsibilities', 'Company', 'Company Profile'],
      dtype='object')
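
The check above can also be automated. The following is a minimal sketch, assuming a small helper named `missing_columns` (hypothetical, not part of the notebook) and a toy DataFrame standing in for the jobs dataset:

```python
import pandas as pd

def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy frame standing in for the jobs dataset (made-up row).
jobs_demo = pd.DataFrame({
    "Job Title": ["Web Developer"],
    "Job Description": ["Build and maintain websites"],
})

print(missing_columns(jobs_demo, ["Job Title", "Job Description"]))  # []
print(missing_columns(jobs_demo, ["Job Title", "job description"]))  # ['job description']
```

Running such a check right after loading would surface a casing mismatch before the encoding step, which is the slowest part of the notebook.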


Step 4: Load the Embedding Model  
I loaded the pre-trained all-MiniLM-L6-v2 model from SentenceTransformers.  
This model converts text into numerical embeddings that capture semantic meaning, which I use to compare resumes and job descriptions.

In [None]:
model = SentenceTransformer("all-MiniLM-L6-v2")
print("✅ Embedding model loaded!")

✅ Embedding model loaded!


Step 5: Encode Texts into Embeddings  
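I converted all resumes and job descriptions into embeddings using the model.  
Each resume and job description is now represented as a 384-dimensional vector, which is what makes the similarity calculation possible. Note that all-MiniLM-L6-v2 truncates input longer than 256 word pieces by default, so only the beginning of a long resume contributes to its embedding.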

In [None]:
resume_embeddings = model.encode(resumes["Resume"].tolist(), convert_to_tensor=True)
job_embeddings = model.encode(jobs["Job Description"].tolist(), convert_to_tensor=True)
print("✅ Texts converted to embeddings!")

✅ Texts converted to embeddings!
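
The resume previews in Step 2 contain stray `\r\n` sequences and the characters `â¢`, which look like mis-encoded bullets. Normalizing the text before encoding may give cleaner embeddings. This is only a sketch: `clean_text` is a hypothetical helper, the replacement rules are assumptions, and the exact bytes in the real file may differ from the literal used here.

```python
import re
import pandas as pd

def clean_text(text: object) -> str:
    """Normalize one cell before embedding (illustrative rules only)."""
    if pd.isna(text):
        return ""                              # missing cell -> empty string
    text = str(text).replace("â¢", " ")        # assumed to be a mis-encoded bullet
    return re.sub(r"\s+", " ", text).strip()   # collapse \r\n, tabs, repeated spaces

# Toy values modelled on the previews printed in Step 2.
demo = pd.Series(["Skills â¢ R â¢ Python", "Education Details \r\nMay 2013", None])
cleaned = demo.map(clean_text).tolist()
print(cleaned)  # ['Skills R Python', 'Education Details May 2013', '']
```

In the notebook this could be applied as `resumes["Resume"].map(clean_text).tolist()` before calling `model.encode`.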


Step 6: Compute Similarity Scores  
I calculated the cosine similarity between each resume embedding and each job description embedding.  
This produced a similarity matrix of shape (number of resumes, number of jobs) where:  
- Rows = resumes  
- Columns = jobs  
- Values = cosine similarity scores (higher means a closer semantic match)

In [None]:
similarity_matrix = cosine_similarity(resume_embeddings.cpu(), job_embeddings.cpu())
print("✅ Similarity scores computed!")

✅ Similarity scores computed!
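
To make the matrix layout concrete, here is a small NumPy sketch of the same calculation on toy 2-dimensional vectors. The function `cosine_similarity_matrix` and the vectors are illustrative assumptions; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)  # scale rows to length 1
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T                                # dot products of unit vectors

resume_vecs = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])  # 3 toy "resumes"
job_vecs = np.array([[2.0, 0.0], [0.0, 1.0]])                  # 2 toy "jobs"

sim = cosine_similarity_matrix(resume_vecs, job_vecs)
print(sim.shape)         # (3, 2): one row per resume, one column per job
print(np.round(sim, 3))
```

The first resume points in the same direction as the first job, so their score is 1 even though the vectors differ in length; cosine similarity compares direction only.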


Step 7: Rank Resumes for Each Job  
For every job description, I ranked all resumes by their similarity score.  
I then selected the top 3 resumes that best matched each job.

In [None]:
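results = []
# iterrows() yields index labels; they double as column positions here only
# because jobs keeps the default 0..n-1 index assigned by read_csv.
for job_idx, job in jobs.iterrows():
    sims = similarity_matrix[:, job_idx]      # scores of every resume for this job
    top_indices = sims.argsort()[::-1][:3]    # 3 highest scores, best first

    for rank, resume_idx in enumerate(top_indices, start=1):
        results.append({
            "job_id": job_idx,
            "job_title": job["Job Title"],
            "resume_id": resume_idx,
            # sims is float32, so a rounded score may still print as e.g. 37.040001
            "match_score": round(sims[resume_idx]*100, 2),
            "resume_excerpt": resumes.iloc[resume_idx]["Resume"][:150] + "..."
        })

results_df = pd.DataFrame(results)
print("✅ Ranking completed!")
print(results_df.head(10))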


✅ Ranking completed!
   job_id                     job_title  resume_id  match_score  \
0       0  Digital Marketing Specialist        259    26.840000   
1       0  Digital Marketing Specialist        264    26.840000   
2       0  Digital Marketing Specialist        249    26.840000   
3       1                 Web Developer        335    47.250000   
4       1                 Web Developer        391    47.250000   
5       1                 Web Developer        321    47.250000   
6       2            Operations Manager        544    37.040001   
7       2            Operations Manager        532    37.040001   
8       2            Operations Manager        520    37.040001   
9       3              Network Engineer        664    36.889999   

                                      resume_excerpt  
0  Skill Sets: â¢ Multi-tasking â¢ Collaborativ...  
1  Skill Sets: â¢ Multi-tasking â¢ Collaborativ...  
2  Skill Sets: â¢ Multi-tasking â¢ Collaborativ...  
3  TECHNICAL SKILLS S
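
The ranking step can be seen in isolation on a toy similarity matrix. This is a sketch under assumptions: `top_k_rows` is a hypothetical helper, and it uses a stable sort on the negated scores so that tied resumes come back in index order, which differs slightly from the `argsort()[::-1]` used in the cell above.

```python
import numpy as np

def top_k_rows(similarity: np.ndarray, col: int, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, score) for the k highest scores in one column."""
    scores = similarity[:, col]
    order = np.argsort(-scores, kind="stable")[:k]  # descending; ties keep index order
    return [(int(i), float(scores[i])) for i in order]

# 4 toy resumes (rows) scored against 2 toy jobs (columns).
sim = np.array([
    [0.10, 0.80],
    [0.50, 0.20],
    [0.50, 0.60],
    [0.30, 0.90],
])

print(top_k_rows(sim, col=0, k=3))  # [(1, 0.5), (2, 0.5), (3, 0.3)]
print(top_k_rows(sim, col=1, k=2))  # [(3, 0.9), (0, 0.8)]
```

Converting each score with `float()` also sidesteps the float32 display artifacts visible in the `match_score` column above.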

Step 8: Display Results  
I organized the results into a Pandas DataFrame showing:  
- Job ID and title  
- Resume ID  
- Match score (%)  
- Resume excerpt  

Finally, I printed the top matches for the first two jobs to demonstrate how the system ranks resumes for different roles.

In [None]:
for job_idx in jobs.index[:2]:
    print("\n🔹 Job:", jobs.loc[job_idx, "Job Title"])
    job_matches = results_df[results_df["job_id"] == job_idx].sort_values(by="match_score", ascending=False)
    print(job_matches[["resume_id", "match_score", "resume_excerpt"]].to_string(index=False))


🔹 Job: Digital Marketing Specialist
 resume_id  match_score                                                                                                                                            resume_excerpt
       259        26.84 Skill Sets: â¢ Multi-tasking â¢ Collaborative â¢ Optimistic Thinking â¢ Effective teamleader/team trainer â¢ Visualizing the work which is to be ...
       264        26.84 Skill Sets: â¢ Multi-tasking â¢ Collaborative â¢ Optimistic Thinking â¢ Effective teamleader/team trainer â¢ Visualizing the work which is to be ...
       249        26.84 Skill Sets: â¢ Multi-tasking â¢ Collaborative â¢ Optimistic Thinking â¢ Effective teamleader/team trainer â¢ Visualizing the work which is to be ...

🔹 Job: Web Developer
 resume_id  match_score                                                                                                                                            resume_excerpt
       335        47.25 TECHNICAL SKILLS Skills: Ja
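
The three matches for the first job share the same score and the same excerpt, which suggests the resume dataset contains duplicated entries. One possible remedy, sketched here on made-up rows and not something the notebook does, is to drop exact duplicates before encoding:

```python
import pandas as pd

# Toy stand-in for the resumes dataset with a repeated resume text.
resumes_demo = pd.DataFrame({
    "Category": ["HR", "HR", "Testing", "HR"],
    "Resume": [
        "Skill Sets: Multi-tasking",
        "Skill Sets: Multi-tasking",
        "Manual testing",
        "Skill Sets: Multi-tasking",
    ],
})

# Keep the first copy of each resume text and renumber the rows from 0.
unique_resumes = resumes_demo.drop_duplicates(subset="Resume").reset_index(drop=True)
print(len(resumes_demo), "->", len(unique_resumes))  # 4 -> 2
```

With duplicates removed, the top 3 for each job would be three distinct resumes. Resetting the index keeps row positions aligned with the rows of the similarity matrix.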