# Smart resume screener

This notebook uses several data science libraries to build a tool that takes a resume and a job description and scores how well the resume fits the job.

I am going to take this approach:
1. Define the problem
2. Data collection and preparation
3. Feature Extraction (Encoding Text)
4. Similarity + Scoring
5. Visualization & Insights
6. UI / App
7. Wrap-Up & GitHub


## 1. Define the problem:
In a statement:
> Given a candidate's resume, can we tell how well it matches a job description?

## 2. Data:
- The job description data comes from the Job Description Dataset on Kaggle, at the link below:
  https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset/data
- The resume data comes from the Resume Dataset on Kaggle, at the link below:
  https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset/data


## 3. Feature Extraction (Encoding Text):
- Convert each resume and job description to a vector <b>(using TF-IDF or BERT)</b>
- Store the vectors in memory or in a file
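
To make the TF-IDF option concrete, here is a minimal pure-Python sketch of the encoding step. The function names (`tokenize`, `tfidf_vectors`) and the toy texts are illustrative assumptions, not code from this notebook; in practice scikit-learn's `TfidfVectorizer` would be the usual choice, and this version only shows the mechanics.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector (list of floats) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Toy texts (assumed for illustration) standing in for one job description and two resumes.
texts = [
    "Frontend web developer with HTML, CSS and JavaScript",
    "HR manager experienced in recruiting and payroll",
    "Web developer skilled in JavaScript and CSS",
]
vocabulary, vectors = tfidf_vectors(texts)
```

Each vector has one entry per vocabulary term, so terms that appear in few documents (such as "hr") get a higher weight than terms that appear everywhere (such as "and").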

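## 4. Similarity + Scoring
- Use <b>cosine similarity</b> to compare vectors.
- The output is a match score between 0 and 1 for TF-IDF vectors, whose entries are non-negative; BERT embeddings can also produce negative scores.
- Matching modes:
    - 1 job → many resumes
    - (Optionally) many jobs → many resumes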
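
The scoring step for the "1 job → many resumes" case can be sketched as follows. The helper names (`cosine_similarity`, `rank_resumes`) and the three-term vectors are hypothetical, chosen only to illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_vector, resume_vectors):
    """Return (resume_index, score) pairs sorted from best to worst match."""
    scores = [(i, cosine_similarity(job_vector, vec)) for i, vec in enumerate(resume_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-term vectors: one job and three resumes.
job_vector = [1.0, 2.0, 0.0]
resume_vectors = [
    [0.0, 0.0, 3.0],  # shares no terms with the job
    [2.0, 4.0, 0.0],  # same direction as the job vector
    [1.0, 0.0, 1.0],  # partial overlap
]
ranking = rank_resumes(job_vector, resume_vectors)
```

Because cosine similarity ignores vector length, the second resume scores highest even though its raw term weights are twice those of the job vector. The many-to-many case would repeat the same ranking once per job.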

## 5. Visualization & Insights
- Sort resumes by similarity score
- Use matplotlib or seaborn to:
    - Show the top matching resumes
    - Highlight matched vs. missing skills
    - Show a simple chart of the top 5 candidates
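
One possible shape for these two outputs is sketched below. `skill_gap` and `plot_top_candidates` are hypothetical helpers, and the skill check is a plain substring match, which is a simplification (for example, "Java" would also match "JavaScript").

```python
def skill_gap(required_skills, resume_text):
    """Split required skills into those found in the resume text and those missing."""
    text = resume_text.lower()
    matched = [skill for skill in required_skills if skill.lower() in text]
    missing = [skill for skill in required_skills if skill.lower() not in text]
    return matched, missing


def plot_top_candidates(scores, k=5):
    """Draw a horizontal bar chart of the k highest-scoring candidates.

    `scores` maps a candidate label to a similarity score.
    """
    import matplotlib.pyplot as plt  # imported here so skill_gap works without matplotlib

    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    labels = [label for label, _ in top]
    values = [value for _, value in top]

    fig, ax = plt.subplots()
    ax.barh(labels, values)
    ax.invert_yaxis()  # best match at the top
    ax.set_xlabel("Cosine similarity")
    ax.set_title(f"Top {len(top)} candidates")
    return fig


# Assumed example: four required skills checked against a short resume text.
matched, missing = skill_gap(
    ["HTML", "CSS", "JavaScript", "React"],
    "Web developer skilled in JavaScript and CSS",
)
```

Calling `plot_top_candidates` with a dictionary of labels and scores would return a matplotlib figure that a notebook cell can display.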

## 6. UI / App:
- Use <b>Streamlit</b> to create a simple interface:
    - Upload a resume
    - Upload a job description
    - Show the match score and highlights.
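
A rough sketch of such an interface is shown below. It assumes plain-text uploads (PDF parsing is left out), and `overlap_score` is a placeholder based on token overlap that stands in for the TF-IDF and cosine similarity score; both function names are hypothetical.

```python
import re


def overlap_score(resume_text, job_text):
    """Placeholder score: Jaccard overlap of the two token sets, between 0 and 1."""
    resume_tokens = set(re.findall(r"[a-z0-9]+", resume_text.lower()))
    job_tokens = set(re.findall(r"[a-z0-9]+", job_text.lower()))
    if not resume_tokens or not job_tokens:
        return 0.0
    return len(resume_tokens & job_tokens) / len(resume_tokens | job_tokens)


def main():
    import streamlit as st  # imported lazily so overlap_score can be used without Streamlit

    st.title("Smart resume screener")
    resume_file = st.file_uploader("Upload a resume", type=["txt"])
    job_file = st.file_uploader("Upload a job description", type=["txt"])

    if resume_file is not None and job_file is not None:
        # Uploaded files behave like binary file objects, so decode them to text.
        resume_text = resume_file.read().decode("utf-8", errors="ignore")
        job_text = job_file.read().decode("utf-8", errors="ignore")
        st.metric("Match score", f"{overlap_score(resume_text, job_text):.2f}")
```

Saved to a script (say `app.py`, a name assumed here) with a final call to `main()`, this would be started with `streamlit run app.py`.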

## Data features
- For the job description dataset:
    - Job Id: A unique identifier for each job posting.
    - Experience: The required or preferred years of experience for the job.
    - Qualifications: The educational qualifications needed for the job.
    - Salary Range: The range of salaries or compensation offered for the position.
    - Location: The city or area where the job is located.
    - Country: The country where the job is located.
    - Latitude: The latitude coordinate of the job location.
    - Longitude: The longitude coordinate of the job location.
    - Work Type: The type of employment (e.g., full-time, part-time, contract).
    - Company Size: The approximate size or scale of the hiring company.
    - Job Posting Date: The date when the job posting was made public.
    - Preference: Special preferences or requirements for applicants (e.g., Only Male or Only Female, or Both)
    - Contact Person: The name of the contact person or recruiter for the job.
    - Contact: Contact information for job inquiries.
    - Job Title: The job title or position being advertised.
    - Role: The role or category of the job (e.g., software developer, marketing manager).
    - Job Portal: The platform or website where the job was posted.
    - Job Description: A detailed description of the job responsibilities and requirements.
    - Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    - Skills: The skills or qualifications required for the job.
    - Responsibilities: Specific responsibilities and duties associated with the job.
    - Company: The name of the hiring company.
    - Company Profile: A brief overview of the company's background and mission.

- For the resume dataset:
    - ID: Unique identifier and file name for the respective PDF.
    - Resume_str: Contains the resume text only, in string format.
    - Resume_html: Contains the resume data in HTML format, as present during web scraping.
    - Category: Category of the job the resume was used to apply for.


## Import the essential libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data collection and preparation:

## Import datasets:

In [4]:
df_resumes = pd.read_csv("Resume.csv")

In [5]:
df_resumes.head()

Unnamed: 0,ID,Resume_str,Resume_html,Category
0,16852973,HR ADMINISTRATOR/MARKETING ASSOCIATE\...,"<div class=""fontsize fontface vmargins hmargin...",HR
1,22323967,"HR SPECIALIST, US HR OPERATIONS ...","<div class=""fontsize fontface vmargins hmargin...",HR
2,33176873,HR DIRECTOR Summary Over 2...,"<div class=""fontsize fontface vmargins hmargin...",HR
3,27018550,HR SPECIALIST Summary Dedica...,"<div class=""fontsize fontface vmargins hmargin...",HR
4,17812897,HR MANAGER Skill Highlights ...,"<div class=""fontsize fontface vmargins hmargin...",HR


In [6]:
df_job_descripation = pd.read_csv("job_descriptions.csv")

In [7]:
df_job_descripation.head()

Unnamed: 0,Job Id,Experience,Qualifications,Salary Range,location,Country,latitude,longitude,Work Type,Company Size,...,Contact,Job Title,Role,Job Portal,Job Description,Benefits,skills,Responsibilities,Company,Company Profile
0,1089843540111562,5 to 15 Years,M.Tech,$59K-$99K,Douglas,Isle of Man,54.2361,-4.5481,Intern,26801,...,001-381-930-7517x737,Digital Marketing Specialist,Social Media Manager,Snagajob,Social Media Managers oversee an organizations...,"{'Flexible Spending Accounts (FSAs), Relocatio...","Social media platforms (e.g., Facebook, Twitte...","Manage and grow social media accounts, create ...",Icahn Enterprises,"{""Sector"":""Diversified"",""Industry"":""Diversifie..."
1,398454096642776,2 to 12 Years,BCA,$56K-$116K,Ashgabat,Turkmenistan,38.9697,59.5563,Intern,100340,...,461-509-4216,Web Developer,Frontend Web Developer,Idealist,Frontend Web Developers design and implement u...,"{'Health Insurance, Retirement Plans, Paid Tim...","HTML, CSS, JavaScript Frontend frameworks (e.g...","Design and code user interfaces for websites, ...",PNC Financial Services Group,"{""Sector"":""Financial Services"",""Industry"":""Com..."
2,481640072963533,0 to 12 Years,PhD,$61K-$104K,Macao,"Macao SAR, China",22.1987,113.5439,Temporary,84525,...,9687619505,Operations Manager,Quality Control Manager,Jobs2Careers,Quality Control Managers establish and enforce...,"{'Legal Assistance, Bonuses and Incentive Prog...",Quality control processes and methodologies St...,Establish and enforce quality control standard...,United Services Automobile Assn.,"{""Sector"":""Insurance"",""Industry"":""Insurance: P..."
3,688192671473044,4 to 11 Years,PhD,$65K-$91K,Porto-Novo,Benin,9.3077,2.3158,Full-Time,129896,...,+1-820-643-5431x47576,Network Engineer,Wireless Network Engineer,FlexJobs,"Wireless Network Engineers design, implement, ...","{'Transportation Benefits, Professional Develo...",Wireless network design and architecture Wi-Fi...,"Design, configure, and optimize wireless netwo...",Hess,"{""Sector"":""Energy"",""Industry"":""Mining, Crude-O..."
4,117057806156508,1 to 12 Years,MBA,$64K-$87K,Santiago,Chile,-35.6751,-71.5429,Intern,53944,...,343.975.4702x9340,Event Manager,Conference Manager,Jobs2Careers,A Conference Manager coordinates and manages c...,"{'Flexible Spending Accounts (FSAs), Relocatio...",Event planning Conference logistics Budget man...,Specialize in conference and convention planni...,Cairn Energy,"{""Sector"":""Energy"",""Industry"":""Energy - Oil & ..."
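
Since a job posting spreads its text over several columns, one option is to join the relevant ones into a single string before encoding. The sketch below uses column names from the preview above; the choice of columns, the `build_job_text` helper, and the example row are assumptions for illustration.

```python
TEXT_COLUMNS = ["Job Title", "Role", "Job Description", "skills", "Responsibilities"]


def build_job_text(row, columns=TEXT_COLUMNS):
    """Join the chosen text fields of one job posting into a single string.

    `row` can be a dict or a pandas Series; non-string values (such as NaN) are skipped.
    """
    parts = [row.get(column) for column in columns]
    return " ".join(part.strip() for part in parts if isinstance(part, str) and part.strip())


# A shortened, hypothetical row that reuses the column names of the job dataset.
example_row = {
    "Job Title": "Web Developer",
    "Role": "Frontend Web Developer",
    "skills": "HTML, CSS, JavaScript",
    "Company Size": 100340,
}
job_text = build_job_text(example_row)
```

On the full DataFrame, something like `df_job_descripation.apply(build_job_text, axis=1)` would produce one combined text per posting.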


In [10]:
df_resumes.describe()

Unnamed: 0,ID
count,2484.0
mean,31826160.0
std,21457350.0
min,3547447.0
25%,17544300.0
50%,25210310.0
75%,36114440.0
max,99806120.0


In [19]:
# Count missing values in each column
df_resumes.isna().sum()

ID             0
Resume_str     0
Resume_html    0
Category       0
dtype: int64
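
With no missing values in the resume data, the remaining preparation is mostly text normalization. The sketch below assumes the raw `Resume_str` text needs lowercasing and whitespace cleanup before encoding; `clean_text` is a hypothetical helper and the sample string is made up.

```python
import re


def clean_text(text):
    """Lowercase, replace punctuation (except +, # and .) with spaces, collapse whitespace."""
    text = text.lower()
    # Keep +, # and . so that tokens such as "c++", "c#" and ".net" survive.
    text = re.sub(r"[^a-z0-9+#.\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_text("  HR MANAGER       Skill Highlights:  Recruiting & Payroll ")
```

Applied to the dataset, this could look like `df_resumes["Resume_str"].apply(clean_text)`.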

In [23]:
df_job_descripation["Company Size"].describe()

count    1.615940e+06
mean     7.370467e+04
std      3.529886e+04
min      1.264600e+04
25%      4.311400e+04
50%      7.363300e+04
75%      1.043000e+05
max      1.348340e+05
Name: Company Size, dtype: float64