# Resume Screening AI

An NLP-based system that scores how closely a resume matches a job
description using TF-IDF vectors and cosine similarity.  
This project approximates the keyword-matching step of an Applicant Tracking System (ATS).

## 1. Install & Import Libraries

In [1]:
!pip install nltk PyPDF2 scikit-learn



In [2]:
import PyPDF2
import nltk
import re
import numpy as np

from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## 2. Upload Resume PDF

In [3]:
from google.colab import files

uploaded = files.upload()

Saving Resume.pdf to Resume (1).pdf


## 3. Extract Resume Text

In [4]:
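def extract_text_from_pdf(pdf_path):
    """Return the concatenated text of every page in the PDF."""
    reader = PyPDF2.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # Fall back to "" in case a page yields no extractable text
        text += page.extract_text() or ""
    return text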

pdf_name = list(uploaded.keys())[0]
resume_text = extract_text_from_pdf(pdf_name)

print(resume_text[:500])  # preview

Gunjan Saroliya
B.Tech – Information and Communication Technology
/envel⌢pe202201225@dau.ac.in /linkedinlinkedin.com/in/gunjan-saroliya /githubgithub.com/Gunjan5403
Education
Dhirubhai Ambani University 2022 - Present
CPI: 6.51 Gandhinagar, Gujarat
Alpha High school (GHSEB) 2021 - 2022
Percentage : 84.46 Junagadh, Gujarat
Noble primary school (GSEB) 2019 - 2020
Percentage : 85.17 Junagadh, Gujarat
Experience
Aed-it Business Solution Pvt. Ltd. May 2025 – July 2025
Software Engineer Rajkot, Gujara


## 4. Text Preprocessing

In [5]:
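# Download the NLTK stop-word list. Note: clean_text below does not apply it,
# so stop words such as "and" or "of" remain in the cleaned text.
nltk.download('stopwords')

def clean_text(text):
    """Lowercase the text and replace unsupported characters with spaces."""
    text = text.lower()
    # Keep letters, digits, and '+', '#', '.' so names like c++, c#, node.js survive
    text = re.sub(r'[^a-zA-Z0-9+#.]', ' ', text)
    # Collapse runs of whitespace into single spaces
    words = text.split()
    return " ".join(words)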


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [6]:
resume_clean = clean_text(resume_text)
resume_clean[:500]

'gunjan saroliya b.tech information and communication technology envel pe202201225 dau.ac.in linkedinlinkedin.com in gunjan saroliya githubgithub.com gunjan5403 education dhirubhai ambani university 2022 present cpi 6.51 gandhinagar gujarat alpha high school ghseb 2021 2022 percentage 84.46 junagadh gujarat noble primary school gseb 2019 2020 percentage 85.17 junagadh gujarat experience aed it business solution pvt. ltd. may 2025 july 2025 software engineer rajkot gujarat developed and deployed t'
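
The cleaned output above still contains stop words ("and", "in") and tokens with trailing periods ("pvt.", "ltd."). A minimal sketch of a stricter tokenizer is shown below. The function name `clean_tokens` and the small `DEMO_STOPWORDS` set are assumptions made for illustration and are not part of the notebook; in practice the set could be replaced with `stopwords.words("english")` from NLTK.

```python
import re

# Small illustrative stop-word list (an assumption for this sketch); in the
# notebook, nltk's stopwords.words("english") could be used instead.
DEMO_STOPWORDS = {"a", "and", "in", "is", "of", "with", "the", "such", "as"}

def clean_tokens(text, stop_words=DEMO_STOPWORDS):
    """Lowercase, tokenize, strip sentence punctuation, and drop stop words."""
    text = text.lower()
    # Keep letters, digits, and the symbols used in tech names (c++, c#, node.js)
    text = re.sub(r"[^a-z0-9+#.]", " ", text)
    tokens = []
    for token in text.split():
        # Remove leading/trailing periods so "javascript." becomes "javascript"
        token = token.strip(".")
        if token and token not in stop_words:
            tokens.append(token)
    return tokens

demo_tokens = clean_tokens("Proficient in HTML, CSS, and JavaScript. Node.js is a plus.")
print(demo_tokens)
```

Stripping only leading and trailing periods keeps names such as "node.js" intact while letting "javascript." match "javascript" in later comparisons.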

## 5. Job Description Input

In [7]:
job_description = """
Frontend Engineer proficient in HTML, CSS,
and JavaScript. Experience with modern
frontend frameworks such as React, Nodejs, Angular.

Responsibilities include building responsive user interfaces,
integrating REST APIs, optimizing application performance,
and collaborating with backend developers and designers.

Knowledge of Git, basic UI/UX principles, and browser compatibility
is required. Experience with Tailwind CSS, TypeScript
is a plus.
"""

In [8]:
job_clean = clean_text(job_description)
job_clean[:500]

'frontend engineer proficient in html css and javascript. experience with modern frontend frameworks such as react nodejs angular. responsibilities include building responsive user interfaces integrating rest apis optimizing application performance and collaborating with backend developers and designers. knowledge of git basic ui ux principles and browser compatibility is required. experience with tailwind css typescript is a plus.'

## 6. Resume–Job Description Matching

In [9]:
print("CLEANED RESUME:\n", resume_clean[:800])
print("\nCLEANED JOB DESCRIPTION:\n", job_clean[:800])

CLEANED RESUME:
 gunjan saroliya b.tech information and communication technology envel pe202201225 dau.ac.in linkedinlinkedin.com in gunjan saroliya githubgithub.com gunjan5403 education dhirubhai ambani university 2022 present cpi 6.51 gandhinagar gujarat alpha high school ghseb 2021 2022 percentage 84.46 junagadh gujarat noble primary school gseb 2019 2020 percentage 85.17 junagadh gujarat experience aed it business solution pvt. ltd. may 2025 july 2025 software engineer rajkot gujarat developed and deployed the company s o cial website frontend using react.js node.js html css and javascript improving accessibility for 500+ users. built reusable and modular ui components reducing future development e ort by 30 . optimized performance through code splitting and asset compression decreasing page load time 

CLEANED JOB DESCRIPTION:
 frontend engineer proficient in html css and javascript. experience with modern frontend frameworks such as react nodejs angular. responsibilities include 

In [10]:
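def calculate_match(resume, job):
    """Return the TF-IDF cosine similarity of the two texts as a percentage."""
    documents = [resume, job]
    # Unigrams and bigrams, capped at the 500 most frequent terms
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 2),
        max_features=500
    )
    tfidf_matrix = vectorizer.fit_transform(documents)
    # Row 0 is the resume, row 1 is the job description
    similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
    return round(similarity[0][0] * 100, 2)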
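
To make the scoring step concrete, here is a rough pure-Python sketch of TF-IDF weighting followed by cosine similarity. It assumes raw term counts, the smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2 normalization, which to my understanding mirror scikit-learn's defaults. It is simplified to whitespace-split unigrams, so it will not reproduce the notebook's score (which uses bigrams, `max_features`, and scikit-learn's own tokenizer). The names `tfidf_vectors` and `cosine` are hypothetical, and the two input strings are made-up examples.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per non-empty document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = raw count * smoothed idf, idf = ln((1 + n) / (1 + df)) + 1
        weights = {
            t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v[t] for t, w in u.items() if t in v)

# Made-up inputs: two shared terms (html, css) and two unique terms each
resume_vec, job_vec = tfidf_vectors(
    ["html css react git", "html css angular typescript"]
)
score = round(cosine(resume_vec, job_vec) * 100, 2)
print(score)
```

Terms that appear in only one of the two documents receive a higher idf weight, so unshared vocabulary enlarges each vector's norm and lowers the similarity even when several terms overlap.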

In [11]:
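# Fixed list of frontend keywords appended to the resume before scoring.
# Note: these terms also occur in the job description, so appending them
# raises the match score beyond what the resume text alone would produce.
frontend_skills = """
html css javascript react api git frontend ui ux responsive
"""

resume_final = resume_clean + " " + frontend_skills

match_score = calculate_match(resume_final, job_clean)
match_score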

np.float64(22.38)

## 7. Skill Gap Analysis

In [12]:
resume_words = set(resume_clean.split())
job_words = set(job_clean.split())

common_skills = resume_words.intersection(job_words)
missing_skills = job_words - resume_words

print("COMMON SKILLS:\n", list(common_skills)[:20])
print("\nMISSING SKILLS:\n", list(missing_skills)[:20])

COMMON SKILLS:
 ['engineer', 'a', 'git', 'css', 'frameworks', 'of', 'experience', 'and', 'ux', 'react', 'ui', 'html', 'user', 'responsive', 'frontend', 'with', 'in', 'apis', 'knowledge', 'performance']

MISSING SKILLS:
 ['include', 'principles', 'plus.', 'optimizing', 'tailwind', 'building', 'typescript', 'javascript.', 'required.', 'such', 'nodejs', 'proficient', 'rest', 'compatibility', 'responsibilities', 'collaborating', 'backend', 'basic', 'application', 'integrating']
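
The lists above are raw token differences, so they include stop words ('a', 'of') and punctuation-suffixed tokens ('javascript.', 'plus.'), and 'nodejs' is reported as missing although the resume mentions "node.js". One possible refinement, sketched below, compares only tokens found in a curated skill vocabulary after normalizing aliases. `SKILL_VOCAB`, `ALIASES`, `extract_skills`, and `skill_gap` are all hypothetical names and values chosen for illustration, not part of the notebook.

```python
# Hypothetical skill vocabulary and alias map (assumptions for this sketch)
SKILL_VOCAB = {
    "html", "css", "javascript", "react", "nodejs",
    "angular", "git", "typescript", "tailwind",
}
ALIASES = {"node.js": "nodejs", "react.js": "react"}

def extract_skills(text):
    """Return the set of known skills mentioned in the text."""
    found = set()
    for raw in text.lower().split():
        # Drop sentence punctuation, then map spelling variants to one form
        token = raw.strip(".,")
        token = ALIASES.get(token, token)
        if token in SKILL_VOCAB:
            found.add(token)
    return found

def skill_gap(resume_text, job_text):
    """Return (common, missing) skills as sorted lists for stable output."""
    resume_skills = extract_skills(resume_text)
    job_skills = extract_skills(job_text)
    return sorted(resume_skills & job_skills), sorted(job_skills - resume_skills)

common, missing = skill_gap(
    "frontend using react.js node.js html css and javascript",
    "html css and javascript. react nodejs angular. tailwind css typescript is a plus.",
)
print("COMMON:", common)
print("MISSING:", missing)
```

Sorting the results also avoids the arbitrary ordering that comes from slicing a Python set, which makes the output reproducible between runs.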


## 8. Results

In [13]:
print(f"Final Resume Match Score: {match_score}%")

Final Resume Match Score: 22.38%


### Conclusion

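The system scores the resume at 22.38% against the job description. The
score is the cosine similarity of TF-IDF vectors built from unigrams and
bigrams, so it measures weighted term overlap between the two texts rather
than a semantic judgment of fit. A fixed list of frontend keywords is
appended to the resume before scoring, which raises the result.

Lower scores indicate that fewer of the job description's terms appear in
the resume, which is similar in spirit to keyword-based ATS screening. The
skill-gap lists are raw token differences, so they still contain stop words
and punctuation-suffixed tokens such as 'javascript.' and 'plus.'.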