Machine Learning Task 3 (2026)
Resume / Candidate Screening System

About the Task:
Hiring teams receive hundreds of resumes for a single job role.
Manually reviewing resumes is slow, inconsistent, and error-prone.

In this project, I built a machine learning-based resume screening system
that automatically analyzes resumes, extracts relevant skills, compares
them with a job description, and ranks candidates based on role fit.

This system helps recruiters shortlist candidates faster and identify
missing or weak skills efficiently.


In [3]:
import pandas as pd
import re


In [4]:
# pandas is already imported as pd in the previous cell.
df = pd.read_csv("Resume.csv")
df.head()


Unnamed: 0,ID,Resume_str,Resume_html,Category
0,16852973,HR ADMINISTRATOR/MARKETING ASSOCIATE\...,"<div class=""fontsize fontface vmargins hmargin...",HR
1,22323967,"HR SPECIALIST, US HR OPERATIONS ...","<div class=""fontsize fontface vmargins hmargin...",HR
2,33176873,HR DIRECTOR Summary Over 2...,"<div class=""fontsize fontface vmargins hmargin...",HR
3,27018550,HR SPECIALIST Summary Dedica...,"<div class=""fontsize fontface vmargins hmargin...",HR
4,17812897,HR MANAGER Skill Highlights ...,"<div class=""fontsize fontface vmargins hmargin...",HR


In [5]:
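def clean_text(text):
    """Lowercase the text and keep only letters and single spaces."""
    text = str(text).lower()
    # Replace digits, punctuation and newlines with spaces.
    text = re.sub(r'[^a-zA-Z ]', ' ', text)
    # Collapse runs of whitespace into one space.
    text = re.sub(r'\s+', ' ', text)
    return text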

df['clean_resume'] = df['Resume_str'].apply(clean_text)
df.head()



Unnamed: 0,ID,Resume_str,Resume_html,Category,clean_resume
0,16852973,HR ADMINISTRATOR/MARKETING ASSOCIATE\...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr administrator marketing associate hr admin...
1,22323967,"HR SPECIALIST, US HR OPERATIONS ...","<div class=""fontsize fontface vmargins hmargin...",HR,hr specialist us hr operations summary versat...
2,33176873,HR DIRECTOR Summary Over 2...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr director summary over years experience in ...
3,27018550,HR SPECIALIST Summary Dedica...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr specialist summary dedicated driven and dy...
4,17812897,HR MANAGER Skill Highlights ...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr manager skill highlights hr skills hr depa...
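To make the effect of the cleaning step concrete, here is a small sketch that applies the same `clean_text` steps to a few strings. The sample strings are illustrative assumptions, not rows from `Resume.csv`. Note that hyphenated names such as "scikit-learn" are split into two words, symbol-heavy skills such as "C++" collapse to a single letter, and a trailing space can remain because the function does not strip the result.

```python
import re

def clean_text(text):
    # Same steps as the notebook cell above.
    text = str(text).lower()
    text = re.sub(r'[^a-zA-Z ]', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    return text

# Hypothetical inputs chosen to show edge cases.
samples = ["HR SPECIALIST, US HR OPERATIONS", "scikit-learn", "C++ & SQL (2015)"]
cleaned = [clean_text(s) for s in samples]

for raw, out in zip(samples, cleaned):
    print(repr(raw), "->", repr(out))
```

If skills like "C++" or "C#" matter for a role, the character whitelist would need to be widened before this cleaning is used for skill matching.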


In [6]:
job_description = """
Looking for a data scientist with strong skills in python,
machine learning, data analysis, statistics, sql, pandas,
scikit-learn and data visualization.
"""


In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words='english')

# Clean the job description the same way as the resumes so both share one vocabulary.
documents = df['clean_resume'].tolist() + [clean_text(job_description)]

tfidf_matrix = vectorizer.fit_transform(documents)
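As a rough illustration of what the vectorizer computes, the sketch below reimplements TF-IDF weighting in plain Python using the formula scikit-learn applies by default: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. It is a simplification for intuition only: it splits on whitespace, skips stop-word removal, and the function name `tfidf_vectors` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF rows) for a list of strings."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows

# Toy corpus (hypothetical), standing in for cleaned resumes plus a job description.
docs = ["python sql python", "sql excel", "python statistics"]
vocab, rows = tfidf_vectors(docs)
```

In the second toy document, "excel" receives a higher weight than "sql" because it appears in fewer documents, which is the property that lets rare, role-specific terms dominate the match score.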


In [8]:
from sklearn.metrics.pairwise import cosine_similarity

resume_vectors = tfidf_matrix[:-1]
job_vector = tfidf_matrix[-1]

# cosine_similarity returns an (n_resumes, 1) array; flatten it to one score per resume.
similarity_scores = cosine_similarity(resume_vectors, job_vector).ravel()

df['Match_Score'] = similarity_scores
df.head()


Unnamed: 0,ID,Resume_str,Resume_html,Category,clean_resume,Match_Score
0,16852973,HR ADMINISTRATOR/MARKETING ASSOCIATE\...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr administrator marketing associate hr admin...,0.042243
1,22323967,"HR SPECIALIST, US HR OPERATIONS ...","<div class=""fontsize fontface vmargins hmargin...",HR,hr specialist us hr operations summary versat...,0.000584
2,33176873,HR DIRECTOR Summary Over 2...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr director summary over years experience in ...,0.000882
3,27018550,HR SPECIALIST Summary Dedica...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr specialist summary dedicated driven and dy...,0.002873
4,17812897,HR MANAGER Skill Highlights ...,"<div class=""fontsize fontface vmargins hmargin...",HR,hr manager skill highlights hr skills hr depa...,0.005107
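The Match_Score column is the cosine of the angle between each resume vector and the job description vector. A minimal pure-Python sketch of that formula follows; the function name `cosine` and the three-term vectors are hypothetical and only serve to show that the score depends on shared terms, not on document length.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term weights for the vocabulary (python, sql, excel).
job = [1.0, 1.0, 0.0]
resume_a = [2.0, 2.0, 0.0]   # same terms as the job, just a longer document
resume_b = [0.0, 0.0, 3.0]   # no terms in common with the job

score_a = cosine(job, resume_a)
score_b = cosine(job, resume_b)
```

This also helps explain why the HR resumes above score close to zero: they share almost no weighted terms with a data scientist job description.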


In [9]:
ranked_candidates = df.sort_values(by='Match_Score', ascending=False)

ranked_candidates[['Category', 'Match_Score']].head(10)


Unnamed: 0,Category,Match_Score
1218,CONSULTANT,0.281664
926,AGRICULTURE,0.259313
1339,AUTOMOBILE,0.258267
1762,ENGINEERING,0.248843
1303,DIGITAL-MEDIA,0.154916
1040,SALES,0.148583
1091,SALES,0.148164
1142,CONSULTANT,0.146432
331,INFORMATION-TECHNOLOGY,0.143145
243,INFORMATION-TECHNOLOGY,0.116512
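In the table above the scores fall from about 0.25 to about 0.15 after the fourth row, so a recruiter may want a score floor in addition to a top-N cut. The sketch below shows one way to combine both, using the ten rows from the table. The `shortlist` helper and the 0.2 threshold are assumptions for illustration, not part of the original notebook.

```python
import heapq

# (row index, Category, Match_Score) copied from the top-10 table above.
ranked = [
    (1218, "CONSULTANT", 0.281664),
    (926, "AGRICULTURE", 0.259313),
    (1339, "AUTOMOBILE", 0.258267),
    (1762, "ENGINEERING", 0.248843),
    (1303, "DIGITAL-MEDIA", 0.154916),
    (1040, "SALES", 0.148583),
    (1091, "SALES", 0.148164),
    (1142, "CONSULTANT", 0.146432),
    (331, "INFORMATION-TECHNOLOGY", 0.143145),
    (243, "INFORMATION-TECHNOLOGY", 0.116512),
]

def shortlist(rows, top_n=5, min_score=0.0):
    """Keep rows at or above min_score, then return the top_n by score (highest first)."""
    eligible = [r for r in rows if r[2] >= min_score]
    return heapq.nlargest(top_n, eligible, key=lambda r: r[2])

# Hypothetical threshold; in practice it would be tuned per role.
top = shortlist(ranked, top_n=5, min_score=0.2)
```

With these assumed settings only four candidates pass, even though five were requested, because the floor is applied before the top-N cut.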


In [10]:
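# Naive skill set: every whitespace-separated token of the raw job description.
# Punctuation stays attached ("python,") and filler words ("looking") count as skills.
job_skills = set(job_description.lower().split())

def missing_skills(resume):
    # Tokens from the job description that never appear in the cleaned resume.
    # "scikit-learn" is always reported, because cleaning replaces hyphens with spaces.
    resume_words = set(resume.split())
    return list(job_skills - resume_words)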

df['Missing_Skills'] = df['clean_resume'].apply(missing_skills)

df[['Category', 'Match_Score', 'Missing_Skills']].head()


Unnamed: 0,Category,Match_Score,Missing_Skills
0,HR,0.042243,"[python,, scikit-learn, statistics,, sql,, loo..."
1,HR,0.000584,"[python,, scikit-learn, statistics,, sql,, dat..."
2,HR,0.000882,"[python,, scikit-learn, statistics,, sql,, dat..."
3,HR,0.002873,"[python,, scikit-learn, statistics,, sql,, dat..."
4,HR,0.005107,"[python,, scikit-learn, statistics,, sql,, loo..."
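The Missing_Skills output above contains tokens such as "python," and "sql," with trailing commas, plus filler words from the job description, because the skill set is built by splitting the raw text on whitespace. One possible refinement, sketched below, is to match an explicit list of skill phrases against the cleaned resume. The `JOB_SKILLS` list is taken from the job description in this notebook; the phrase-matching approach, the `.strip()` in `clean_text`, and the sample resume string are assumptions for illustration.

```python
import re

def clean_text(text):
    # Same cleaning as the notebook, plus strip() so padding below is exact.
    text = str(text).lower()
    text = re.sub(r'[^a-z ]', ' ', text)
    return re.sub(r'\s+', ' ', text).strip()

# Skill phrases named in the job description.
JOB_SKILLS = [
    "python", "machine learning", "data analysis", "statistics",
    "sql", "pandas", "scikit-learn", "data visualization",
]

def missing_skills(resume, skills=JOB_SKILLS):
    """Return the skills whose cleaned phrase does not occur as whole words in the resume."""
    padded = f" {clean_text(resume)} "
    # Cleaning the skill too turns "scikit-learn" into "scikit learn", matching the resume text.
    return [s for s in skills if f" {clean_text(s)} " not in padded]

# Hypothetical resume snippet.
sample = "Experienced in Python, SQL and scikit-learn; built machine learning models."
result = missing_skills(sample)
```

Padding both sides with spaces keeps "sql" from matching inside a longer word, and multi-word skills such as "machine learning" are checked as phrases rather than as separate tokens.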


Business Impact:
This resume screening system scores each resume against the job description
using TF-IDF cosine similarity and ranks candidates by that score. It can reduce
manual screening time, flags skills from the job description that a resume does
not mention, and applies the same criteria to every candidate.
