Retrieving top 'n' resumes for a given job description

In [64]:
import pandas as pd
import numpy as np

import joblib

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

from nltk.corpus import stopwords

from preprocess import preprocess
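`preprocess` comes from a local `preprocess` module that is not included in this notebook. The sketch below is a hypothetical stand-in. It assumes only what the calls in this notebook imply: the function accepts an iterable of strings (such as a pandas Series) and returns one cleaned string per input, which can be indexed with `[0]`. The real module's cleaning steps are unknown, so the regular expressions here are purely illustrative.

```python
import re
from typing import Iterable, List


def preprocess(texts: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for the notebook's local preprocess module."""
    cleaned = []
    for text in texts:
        text = text.encode('ascii', errors='ignore').decode()  # drop non-ASCII debris such as 'â¢'
        text = re.sub(r'http\S+', ' ', text)                   # drop URLs
        text = re.sub(r'[^A-Za-z0-9+#]', ' ', text)            # keep letters, digits and the '+'/'#' of C++/C#
        text = re.sub(r'\s+', ' ', text).strip().lower()       # collapse whitespace and lowercase
        cleaned.append(text)
    return cleaned


print(preprocess(['Skills â¢ Python â¢ C++ https://example.com']))
# ['skills python c++']
```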

In [65]:
# Loading the fitted vectorizer and the trained classifier
vectorizer = joblib.load('vectorizer.joblib')
linear_SVC = joblib.load('model.joblib')
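The notebook only loads `vectorizer.joblib` and `model.joblib`; the code that produced them is not shown. As a hedged sketch, assuming the vectorizer is a scikit-learn `TfidfVectorizer` and the classifier a `LinearSVC` (suggested by the name `linear_SVC`), the two artifacts could be saved and reloaded as follows. The training texts and labels are made up, and files go to a temporary folder rather than the notebook's working directory.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up training data; the real model would be trained on the labelled resume dataset
texts = [
    'python pandas machine learning statistics',
    'deep learning tensorflow data analysis python',
    'selenium test cases manual testing jira',
    'regression testing test plan bug tracking',
]
labels = ['Data Science', 'Data Science', 'Testing', 'Testing']

vectorizer = TfidfVectorizer()
model = LinearSVC().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as out_dir:
    # Persisting both fitted objects, then reloading them the way the notebook does
    joblib.dump(vectorizer, os.path.join(out_dir, 'vectorizer.joblib'))
    joblib.dump(model, os.path.join(out_dir, 'model.joblib'))
    vectorizer = joblib.load(os.path.join(out_dir, 'vectorizer.joblib'))
    linear_SVC = joblib.load(os.path.join(out_dir, 'model.joblib'))

# Classifying a new (already preprocessed) text with the reloaded objects
prediction = linear_SVC.predict(vectorizer.transform(['python machine learning']))[0]
print(prediction)
```

The vectorizer and the classifier must be saved as a pair. The classifier's weights refer to the vectorizer's vocabulary, so a job description has to be transformed by the same fitted vectorizer before `predict`.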

In [66]:
# Job description
import os

jd_path = os.path.join('JobDescriptions', 'job-description-DataScientist.txt')   # Portable path, no backslash escapes
with open(jd_path, 'r', encoding='UTF8', errors='ignore') as jdf:
    job_desc = pd.DataFrame([jdf.read()], columns=['Content'])

preprocessed_job_desc = preprocess(job_desc.loc[[0], 'Content'])    # Preprocessing job description
job_desc_vec = vectorizer.transform(preprocessed_job_desc)          # Vectorizing with the loaded vectorizer

job_category = linear_SVC.predict(job_desc_vec)[0]                  # Predicting the category
print(f'Job description is for {job_category} category')

Job description is for Data Science category


In [67]:
resume_dataset = pd.read_csv('Labelled_Resume_Dataset.csv')
resume_dataset

,Category,Resume
0,Data Science,Skills * Programming Languages: Python (pandas...
1,Data Science,Education Details \nMay 2013 to May 2017 B.E ...
2,Data Science,"Areas of Interest Deep Learning, Control Syste..."
3,Data Science,Skills • R • Python • SAP HANA • Table...
4,Data Science,"Education Details \n MCA YMCAUST, Faridabad..."
...,...,...
957,Testing,Computer Skills: • Proficient in MS office (...
958,Testing,â Willingness to accept the challenges. â ...
959,Testing,"PERSONAL SKILLS • Quick learner, • Eagerne..."
960,Testing,COMPUTER SKILLS & SOFTWARE KNOWLEDGE MS-Power ...


If the resume dataset is unlabelled, classify.ipynb can be used to label it first; the steps below then work the same way.
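classify.ipynb itself is not shown here. A minimal sketch of the idea, assuming a fitted text classifier: predict a category for every row of a resume-only DataFrame and store it in a `Category` column. The toy training data and the `make_pipeline` setup are illustrative assumptions. With this notebook's artifacts, the resumes would instead go through `preprocess` and `vectorizer.transform` before `linear_SVC.predict`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up labelled examples standing in for the real training data
train = pd.DataFrame({
    'Resume': ['python pandas machine learning', 'deep learning statistics python',
               'manual testing selenium jira', 'test plan regression testing'],
    'Category': ['Data Science', 'Data Science', 'Testing', 'Testing'],
})
classifier = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(train['Resume'], train['Category'])

# An unlabelled dataset has only a Resume column; the classifier fills in Category
unlabelled = pd.DataFrame({'Resume': ['machine learning with python', 'selenium regression testing']})
labelled = unlabelled.assign(Category=classifier.predict(unlabelled['Resume']))
print(labelled)
```

Once the `Category` column exists, the filtering and ranking steps that follow apply unchanged.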


In [68]:
# Retrieving all resumes in the predicted category; reset_index moves the original row labels into an 'index' column
predicted_category_resume = resume_dataset.loc[resume_dataset['Category'] == job_category, ['Resume']].reset_index()
predicted_category_resume

,index,Resume
0,0,Skills * Programming Languages: Python (pandas...
1,1,Education Details \nMay 2013 to May 2017 B.E ...
2,2,"Areas of Interest Deep Learning, Control Syste..."
3,3,Skills • R • Python • SAP HANA • Table...
4,4,"Education Details \n MCA YMCAUST, Faridabad..."
5,5,"SKILLS C Basics, IOT, Python, MATLAB, Data Sci..."
6,6,Skills • Python • Tableau • Data Visuali...
7,7,Education Details \n B.Tech Rayat and Bahra ...
8,8,Personal Skills • Ability to quickly grasp t...
9,9,Expertise â Data and Quantitative Analysis â...
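Because of `reset_index()`, the rows of `predicted_category_resume` are numbered 0 to k-1. These positions line up with the similarity array computed later, while the `index` column keeps each resume's row label in `resume_dataset`. A small illustration on made-up data:

```python
import pandas as pd

# Made-up stand-in for the labelled dataset
resume_dataset = pd.DataFrame({
    'Category': ['Data Science', 'Testing', 'Data Science'],
    'Resume': ['resume a', 'resume b', 'resume c'],
})
job_category = 'Data Science'

subset = resume_dataset.loc[resume_dataset['Category'] == job_category, ['Resume']].reset_index()
# Positions 0 and 1 are the new row numbers; subset['index'] holds the original labels [0, 2]
print(subset)
```

To map a ranked resume back to the full dataset, use its `index` value rather than its position in `subset`.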


In [69]:
preprocessed_resumes = list(preprocess(predicted_category_resume['Resume']))  # Preprocessing retrieved resumes
preprocessed_resumes.append(list(preprocessed_job_desc)[0])                   # Appending the preprocessed job description as the last document

# Fitting a fresh TF-IDF vectorizer on this category's resumes plus the job description
vectorizer = TfidfVectorizer(stop_words=stopwords.words('english'))   # Requires nltk.download('stopwords') once
vectorized_resumes = vectorizer.fit_transform(preprocessed_resumes)    # One row per resume; the job description is the last row
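The new vectorizer is fitted only on the retrieved resumes plus the job description. As a result, IDF weights reflect how common a term is within this category rather than across the classifier's training data, and the job description's own terms are guaranteed to be in the vocabulary. The toy sketch below swaps NLTK's stop-word list for scikit-learn's built-in `stop_words='english'` so that it runs without downloading NLTK data. The documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already-preprocessed documents; the job description is appended last
resumes = ['python pandas machine learning', 'tableau data visualisation', 'selenium manual testing']
job_description = 'machine learning engineer python'
corpus = resumes + [job_description]

tfidf = TfidfVectorizer(stop_words='english')
matrix = tfidf.fit_transform(corpus)   # sparse matrix, one row per document, job description in the last row

print(matrix.shape[0])                  # 4 documents
print('engineer' in tfidf.vocabulary_)  # True: terms that only the job description uses still get a column
```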

In [70]:
cosine_sim = cosine_similarity(vectorized_resumes[-1], vectorized_resumes).flatten()    # Similarity of the job description (last row) to every row
cosine_sim = np.delete(cosine_sim, -1)  # Dropping the job description's similarity to itself
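Cosine similarity is cos(a, b) = a·b / (‖a‖ ‖b‖). `TfidfVectorizer` L2-normalises each row by default (`norm='l2'`), so for these vectors the measure reduces to a plain dot product. The job description's similarity to itself is 1, which is why the last entry is dropped. A check on made-up documents:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up corpus: three resumes followed by the job description
corpus = ['python pandas machine learning', 'tableau data visualisation',
          'selenium manual testing', 'machine learning engineer python']
matrix = TfidfVectorizer().fit_transform(corpus)

sims = cosine_similarity(matrix[-1], matrix).flatten()   # job description against every row
resume_sims = sims[:-1]                                   # same result as np.delete(sims, -1)

# Rows are already unit length, so a dot product gives the same similarities
manual = (matrix[:-1] @ matrix[-1].T).toarray().ravel()
print(np.isclose(sims[-1], 1.0), np.allclose(resume_sims, manual))   # True True
```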

In [74]:
import os

data = []
num_of_resumes = 10     # Top n relevant resumes

# Pairing each resume's cosine similarity (as a percentage) with its position and text
for idx, row in predicted_category_resume.iterrows():
    data.append((cosine_sim[idx] * 100, idx, row['Resume']))

data.sort(reverse=True)     # Sorting by cosine similarity, highest first
os.makedirs('Resumes', exist_ok=True)   # Making sure the output folder exists
for rank, record in enumerate(data[:num_of_resumes]):    # Keeping the n most relevant resumes
    print(record[2], end='\n\n\t\t\t\t\t\t\t++++++++\n\n')
    with open(os.path.join('Resumes', f'{rank}_resume.txt'), 'w', encoding='UTF8') as f:
        f.write(record[2])

rience - 6 months
RETAIL MARKETING- Exprience - 6 months
SCM- Exprience - 6 months
SQL- Exprience - Less than 1 year months
Deep Learning- Exprience - Less than 1 year months
Machine learning- Exprience - Less than 1 year months
Python- Exprience - Less than 1 year months
R- Exprience - Less than 1 year monthsCompany Details 
company - Deloitte USI
description - The project involved analysing historic deals and coming with insights to optimize future deals.
Role: Was given raw data, carried out end to end analysis and presented insights to client.
Key Responsibilities:
â¢ Extract data from client systems across geographies.
â¢ Understand and build reports in tableau. Infer meaningful insights to optimize prices and find out process blockades.
Technical Environment: R, Tableau.

Industry: Cross Industry
Service Area: Cross Industry - Products
Project Name: Handwriting recognition
Consultant: 3 months.
The project involved taking handwritten images and converting them to digital text i
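Sorting (similarity, index, text) tuples works, but the top n positions can also be taken directly from the similarity array. The sketch below uses `np.argsort` with a stable sort; the helper name `top_n_indices` is hypothetical. Ties are broken toward the lower position here, whereas the reverse tuple sort in the cell above breaks them toward the higher position.

```python
import numpy as np


def top_n_indices(similarities, n):
    """Hypothetical helper: positions of the n largest similarities, highest first."""
    order = np.argsort(-np.asarray(similarities), kind='stable')  # descending; equal values keep their order
    return order[:n]                                               # returns all positions if n exceeds the length


cosine_sim = np.array([0.12, 0.55, 0.31, 0.55, 0.08])  # made-up similarities
print(top_n_indices(cosine_sim, 3).tolist())            # [1, 3, 2]
```

With the notebook's variables, `predicted_category_resume.iloc[top_n_indices(cosine_sim, num_of_resumes)]` would select the matching rows, including their original `index` labels.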