<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Capstone - Resumes and Job Ads Recommender

# Problem Statement

HR practitioners and hiring managers often spend too much time sifting through resumes to shortlist suitable candidates to contact for an interview.
Job seekers, in turn, spend a lot of time looking through numerous job advertisements, many of which are not relevant to them.
Wouldn't it be nice if a pre-selection step could save time for both sides?

We will use Natural Language Processing (NLP) and a recommender system, with cosine similarity as the measure for identifying similar job seekers and job advertisements.
Success will be evaluated by inspecting the top recommendations and their features to see how closely the jobs match the job seekers, and vice versa.

# Executive Summary

We scraped the website jobspider.com for resumes. Since no API key was available, we used BeautifulSoup and regex to extract the desired information. Because we kept running into connection timeouts despite introducing a bot agent, we limited the job categories to Accounting and Information Technology for this capstone.
For the job ads, we used an existing Kaggle dataset that was originally built for predicting fake job postings, since its features have 80% similarity to those of the resumes dataset.

While cleaning the resumes dataset, we also decided which features to keep and which to drop. Because job title, objective, experience and skills are free text that holds meaningful words for our analysis, we combined them into a new feature, split the text into words, reduced the words to their root form and removed the stop words. The same steps were performed on the job ads dataset.
We used `TfidfVectorizer` to calculate the weight of the words and, not surprisingly, words that appear in resumes are often found in job ads as well.

For feature selection, we picked `location`, `education level`, `employment type`, `function` and `job level` as our initial criteria. We then put these features through the cosine similarity metric, which measures the cosine of the angle between the feature vectors projected in a multi-dimensional space, to determine their similarity.
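
As a rough illustration of the metric, the sketch below computes cosine similarity by hand with NumPy. The three vectors and the helper name `cosine_sim` are hypothetical and only stand in for encoded resume and job ad features; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical encoded profiles: [function, emp_type, job_level, edu_level, region]
resume = [1, 3, 4, 7, 3]
job_close = [1, 3, 4, 6, 3]   # differs from the resume in one feature only
job_far = [5, 4, 1, 2, 1]     # differs in every feature

print(cosine_sim(resume, job_close))
print(cosine_sim(resume, job_far))
```

A score closer to 1 means the two vectors point in nearly the same direction, so the closer job ad should score higher than the distant one.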

Since our combined dataset contains both job ads and resumes, we then slice the matrix of similarity scores so that resumes are on the row index and job ads are on the columns.

With that, we can filter by resume to return the job ads with the top similarity scores and, by transposing the dataset, we can also get recommended resumes for each job ad based on the same similarity scores.
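
The two directions of lookup can be sketched on a toy matrix. The labels and scores below are hypothetical; only the layout (resumes as rows, job ads as columns) follows the description above.

```python
import pandas as pd

# Hypothetical similarity scores: rows are resumes, columns are job ads
res_jobs = pd.DataFrame(
    [[0.99, 0.40, 0.75],
     [0.20, 0.95, 0.60]],
    index=["R_1_auditor", "R_2_developer"],
    columns=["J_10_accountant", "J_11_web developer", "J_12_it auditor"],
)

# Resume -> job ads: rank the columns of one resume row
top_jobs = res_jobs.loc["R_1_auditor"].sort_values(ascending=False)

# Job ad -> resumes: transpose so job ads become rows, then rank the same way
jobs_res = res_jobs.T
top_resumes = jobs_res.loc["J_11_web developer"].sort_values(ascending=False)

print(top_jobs.index.tolist())
print(top_resumes.index.tolist())
```

Because the scores are symmetric, transposing reuses the same numbers and no second similarity computation is needed.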


### Contents:
- [Pre-processing](#Pre-processing)
- [Data Dictionary](#Data-Dictionary)
- [Recommender System](#Recommender-System)
- [Conclusion](#Conclusion)
- [Limitation and Future Works](#Limitation-and-Future-Works)

In [1]:
import requests
import time
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
import seaborn as sns
import re
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances
from IPython.display import HTML, display

%matplotlib inline



# Pre-processing

In [2]:
# Import resumes dataset
resumes = pd.read_csv('./datasets/resumes.csv')
resumes.head()

Unnamed: 0,date_posted,job_title,industry,state,state_name,resume_href,id,emp_type,availability,desired_wage,...,job_level,will_travel,edu_level,will_reloc,objective,exp,edu,skills,add_info,combine_text
0,2020-02-22,auditor,Accounting,TX,Texas,/job/view-resume-82470.html,82470,contract,02/22/2020,,...,mid_senior,Yes,masters,Yes,Strategist provides clients with more than a d...,"CPA Firm Consultant, Auditgoals.com 2019 - 202...","Master of Business Administration, FinanceBach...",,,auditor strategist provides client decade hand...
1,2020-02-11,senior accountant,Accounting,OH,Ohio,/job/view-resume-82450.html,82450,full-time,2/1/2020,75000.0,...,mid_senior,No,high_school,No,,Accomplished finance and accounting profession...,Three years at University Wisconsin Superior,BudgetsAdvanced ExcelFinancial StatementsStaff...,Software:Microsoft Dynamics GPPrism HRUltiproD...,senior accountant accomplished finance account...
2,2020-01-31,senior/mid-level financial services manager,Accounting,TX,Texas,/job/view-resume-82423.html,82423,full-time,,,...,mid_senior,Yes,bachelors,Yes,Growth-focused and astute executive and influe...,"WELLS FARGO ▪ DALLAS, TX (1994–2019)Senior Vic...",EDUCATIONApplied Business Administration Manag...,Banking Administration | Risk Mitigation |Staf...,,senior mid level financial service manager gro...
3,2020-01-16,brokerage operations,Accounting,NY,New York,/job/view-resume-82397.html,82397,full-time,Immediate,90000.0,...,mid_senior,No,certificate,No,"Analytical, performance-focused, and forward-t...","SOCIÉTÉ GÉNÉRALE – JERSEY CITY, NJVice Preside...",SCS Business and Technical Institute -New York...,Trade Posting and Payment EscalationInvestment...,"EARLIER CAREERMAN FINANCIAL, INC. – NEW YORK, ...",brokerage operation analytical performance foc...
4,2019-12-16,full charge bookkeeper,Accounting,CO,Colorado,/job/view-resume-82338.html,82338,full-time,,35000.0,...,associate,No,bachelors,No,"Dynamic, goal-oriented, and analytical profess...","Echo River Expeditions, Cañon City, COReservat...",Coursework in Accounting and ComputerBessemer ...,"Microsoft Office Applications (Excel, Word, Po...",,full charge bookkeeper dynamic goal oriented a...


In [3]:
# Combine id and job_title into a single label to use as our index
resumes['index'] = 'R' + '_' + resumes['id'].astype(str) + '_' + resumes['job_title']

In [4]:
# Select features that we want to keep
res_df = resumes[['index', 'industry', 'state_name', 'emp_type', 'job_level', 'edu_level']]

# Set the combined label as the index
res_df = res_df.set_index('index')

res_df.head()

index,industry,state_name,emp_type,job_level,edu_level
R_82470_auditor,Accounting,Texas,contract,mid_senior,masters
R_82450_senior accountant,Accounting,Ohio,full-time,mid_senior,high_school
R_82423_senior/mid-level financial services manager,Accounting,Texas,full-time,mid_senior,bachelors
R_82397_ brokerage operations,Accounting,New York,full-time,mid_senior,certificate
R_82338_full charge bookkeeper,Accounting,Colorado,full-time,associate,bachelors


Previously, we simply assigned a number to each state name in alphabetical order, which produced recommendations for locations that could be far from the job seeker's or job ad's preferred location. Hence, we now include one more feature named `region`.

In [5]:
# List the states and their regions, with reference to
# https://www.worldatlas.com/articles/the-regions-of-the-united-states.html

west = ['Alaska', 'Arizona', 'California', 'Colorado', 'Hawaii', 'Idaho', 'Montana', 'Nevada','New Mexico', 'Oregon',
       'Utah', 'Washington', 'Wyoming']

midwest = ['Illinois', 'Indiana', 'Iowa', 'Kansas', 'Michigan', 'Missouri', 'Minnesota', 'Nebraska', 'North Dakota', 
           'Ohio', 'South Dakota','Wisconsin']

south = ['Alabama', 'Arkansas', 'Delaware', 'Florida', 'Georgia', 'Kentucky', 'Louisiana', 'Maryland', 'Mississippi',
        'Oklahoma', 'North Carolina', 'South Carolina', 'Tennessee', 'Texas', 'Virginia', 'West Virginia', 'WashingtonDC']

northeast = ['Connecticut', 'Maine', 'New Hampshire', 'Massachusetts', 'New Jersey', 'New York', 'Pennsylvania',
            'Rhode Island', 'Vermont']

print('Number of states in the west region: ', len(west))
print('Number of states in the midwest region: ', len(midwest))
print('Number of states in the south region: ', len(south))
print('Number of states in the northeast region: ', len(northeast))

Number of states in the west region:  13
Number of states in the midwest region:  12
Number of states in the south region:  17
Number of states in the northeast region:  9


In [6]:
# Create a new feature 'region' indicating the region of each state
# (any state not found in west, midwest or south falls through to 'northeast')
region = []

for s in res_df['state_name']:
    if s in west:
        region.append('west')
    elif s in midwest:
        region.append('midwest')
    elif s in south:
        region.append('south')
    else:
        region.append('northeast')
        
res_df['region'] = region
res_df.head()

index,industry,state_name,emp_type,job_level,edu_level,region
R_82470_auditor,Accounting,Texas,contract,mid_senior,masters,south
R_82450_senior accountant,Accounting,Ohio,full-time,mid_senior,high_school,midwest
R_82423_senior/mid-level financial services manager,Accounting,Texas,full-time,mid_senior,bachelors,south
R_82397_ brokerage operations,Accounting,New York,full-time,mid_senior,certificate,northeast
R_82338_full charge bookkeeper,Accounting,Colorado,full-time,associate,bachelors,west
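
An alternative to the `if`/`elif` loop is a state-to-region lookup applied with `Series.map`. This is only a sketch: the region lists are abbreviated here and the `regions` and `state_to_region` names are hypothetical. One difference worth noting is that `.map` leaves an unrecognised state as `NaN` rather than defaulting it to `northeast`.

```python
import pandas as pd

# Abbreviated, hypothetical region lists; the full lists are defined in the cell above
regions = {
    "west": ["California", "Colorado"],
    "midwest": ["Ohio", "Illinois"],
    "south": ["Texas", "Georgia"],
    "northeast": ["New York", "New Jersey"],
}

# Invert to a state -> region lookup
state_to_region = {
    state: region for region, states in regions.items() for state in states
}

states = pd.Series(["Texas", "Ohio", "New York", "Colorado", "Puerto Rico"])

# Unknown states become NaN, which makes unexpected values easy to spot
region = states.map(state_to_region)
print(region.tolist())
```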


In [7]:
# Import job ads dataset
jobs = pd.read_csv('./datasets/job_ads.csv')
print(jobs.shape)
jobs.head()

(792, 19)


Unnamed: 0,job_id,title,state,state_name,department,salary_range,company_profile,description,requirement,benefits,telecommuting,has_company_logo,has_questions,employment_type,required_experience,required_education,industry,function,combine_text
0,9,ASP.net Developer Job opportunity at United St...,NJ,New Jersey,,100000-120000,,DeveloperJob Location :United States-New Jerse...,#URL_86fd830a95a64e2b30ceed829e63fd384c289e4f...,Benefits - FullBonus Eligible - YesInterview T...,0,0,0,full-time,mid_senior,bachelors,Information Technology and Services,Information Technology,asp net developer job opportunity united state...
1,10,"Applications Developer, Digital",CT,Connecticut,,,"Novitex Enterprise Solutions, formerly Pitney ...","The Applications Developer, Digital will devel...",s:4 â 5 yearsâ experience in developing an...,,0,1,0,full-time,associate,bachelors,Management Consulting,Information Technology,application developer digital application deve...
2,38,Technical Project Manager,NY,New York,,,,GBI is a growing company developing several cu...,Must have excellent oral and written communica...,"Experience with CRM, such as SugarCRM.Past emp...",0,0,0,full-time,associate,bachelors,Financial Services,Information Technology,technical project manager gbi growing company ...
3,49,Ruby Automation Engineer & Ruby on Rails Engin...,CA,California,IT,,,"# 1Ruby Automation Engineer Menlo Park, CA # ...",Position # 1Ruby Automation EngineerLocation: ...,,0,0,1,contract,mid_senior,degree,,Information Technology,ruby automation engineer ruby rail engineer 2 ...
4,54,Mid-Senior .NET or Xamarin Developer,GA,Georgia,,75-115,,DataFinch Technologies is the leader in electr...,We are looking for candidates who are generall...,"Health, Vision, Dental, FSA, HSA, 401(k)Privat...",0,0,0,full-time,mid_senior,bachelors,Computer Software,Information Technology,mid senior net xamarin developer datafinch tec...


In [8]:
# Combine job_id and title into a single label to use as our index
jobs['index'] = 'J' + '_' + jobs['job_id'].astype(str) + '_' + jobs['title']

In [9]:
# Select features that we want to keep
jobs_df = jobs[['index', 'function', 'state_name', 'employment_type', 'required_experience', 'required_education']]

# Set the combined label as the index
jobs_df = jobs_df.set_index('index')
print(jobs_df.shape)
jobs_df.head()

(792, 5)


index,function,state_name,employment_type,required_experience,required_education
"J_9_ASP.net Developer Job opportunity at United States,New Jersey",Information Technology,New Jersey,full-time,mid_senior,bachelors
"J_10_Applications Developer, Digital",Information Technology,Connecticut,full-time,associate,bachelors
J_38_Technical Project Manager,Information Technology,New York,full-time,associate,bachelors
J_49_Ruby Automation Engineer & Ruby on Rails Engineer - 2 roles,Information Technology,California,contract,mid_senior,degree
J_54_Mid-Senior .NET or Xamarin Developer,Information Technology,Georgia,full-time,mid_senior,bachelors


In [10]:
# Check out the columns for each dataset
print(res_df.columns)
print('---------------------------------------------------------------------------------------')
print(jobs_df.columns)

Index(['industry', 'state_name', 'emp_type', 'job_level', 'edu_level',
       'region'],
      dtype='object')
---------------------------------------------------------------------------------------
Index(['function', 'state_name', 'employment_type', 'required_experience',
       'required_education'],
      dtype='object')


In [11]:
# Rename industry in res_df to function
res_df.rename(columns={'industry':'function'}, inplace=True)

# Rename employment_type, required_experience and required_education in jobs_df to the same as res_df
jobs_df.rename(columns={'employment_type':'emp_type', 'required_experience': 'job_level',
                       'required_education':'edu_level'}, inplace=True)

In [12]:
# Check the revised columns name
print(res_df.columns)
print('---------------------------------------------------------------------------------------')
print(jobs_df.columns)

Index(['function', 'state_name', 'emp_type', 'job_level', 'edu_level',
       'region'],
      dtype='object')
---------------------------------------------------------------------------------------
Index(['function', 'state_name', 'emp_type', 'job_level', 'edu_level'], dtype='object')


In [13]:
# Create a new feature 'region' indicating the region of each state
# (any state not found in west, midwest or south falls through to 'northeast')
region = []

for s in jobs_df['state_name']:
    if s in west:
        region.append('west')
    elif s in midwest:
        region.append('midwest')
    elif s in south:
        region.append('south')
    else:
        region.append('northeast')
        
jobs_df['region'] = region
jobs_df.head()

index,function,state_name,emp_type,job_level,edu_level,region
"J_9_ASP.net Developer Job opportunity at United States,New Jersey",Information Technology,New Jersey,full-time,mid_senior,bachelors,northeast
"J_10_Applications Developer, Digital",Information Technology,Connecticut,full-time,associate,bachelors,northeast
J_38_Technical Project Manager,Information Technology,New York,full-time,associate,bachelors,northeast
J_49_Ruby Automation Engineer & Ruby on Rails Engineer - 2 roles,Information Technology,California,contract,mid_senior,degree,west
J_54_Mid-Senior .NET or Xamarin Developer,Information Technology,Georgia,full-time,mid_senior,bachelors,south


In [14]:
# Stack both datasets together (resumes first, then job ads)
combine = pd.concat([res_df, jobs_df], sort=False)

In [15]:
# Check to ensure we append correctly
print('res_df shape: ', res_df.shape)
print('job_df shape: ', jobs_df.shape)
print('combine shape: ', combine.shape)
combine.head()

res_df shape:  (1286, 6)
job_df shape:  (792, 6)
combine shape:  (2078, 6)


index,function,state_name,emp_type,job_level,edu_level,region
R_82470_auditor,Accounting,Texas,contract,mid_senior,masters,south
R_82450_senior accountant,Accounting,Ohio,full-time,mid_senior,high_school,midwest
R_82423_senior/mid-level financial services manager,Accounting,Texas,full-time,mid_senior,bachelors,south
R_82397_ brokerage operations,Accounting,New York,full-time,mid_senior,certificate,northeast
R_82338_full charge bookkeeper,Accounting,Colorado,full-time,associate,bachelors,west


Let's encode all categorical features as numbers, treating each one as either nominal or ordinal.

In [16]:
# Encode function: Accounting = 1, Information Technology = 5
combine['function'] = combine['function'].replace(to_replace=['Accounting', 'Information Technology'], value=[1, 5])

In [17]:
# Change state_name to be represented by a number
# Check the number of state
print('There are {} unique states in state_name feature'. format(len(set(combine['state_name']))))
print('------------------------------------------------- ')
print(sorted(set(combine['state_name'])))

There are 49 unique states in state_name feature
------------------------------------------------- 
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Virginia', 'Washington', 'WashingtonDC', 'West Virginia', 'Wisconsin', 'Wyoming']


In [18]:
# Use pd.factorize to assign a number to each state and create a nominal variable
combine['state_name'] = pd.factorize(combine['state_name'], sort=True)[0]

In [19]:
# Change emp_type to be represented by a number
# Check the number of emp_type
print('There are {} unique employment type in emp_type feature'. format(len(set(combine['emp_type']))))
print('-------------------------------------------------------- ')
print(set(combine['emp_type']))

There are 5 unique employment type in emp_type feature
-------------------------------------------------------- 
{'full-time', 'part-time', 'internship', 'other', 'contract'}


In [20]:
# Encode emp_type as a nominal code
combine['emp_type'] = combine['emp_type'].replace(to_replace=['internship', 'part-time', 'contract', 'full-time', 'other'],
                                                  value=[1, 2, 3, 4, 5])

In [21]:
# Change job_level to be represented by a number
# Check the number of job_level
print('There are {} unique job level in job_level feature'. format(len(set(combine['job_level']))))
print('--------------------------------------------------- ')
print(set(combine['job_level']))

There are 5 unique job level in job_level feature
--------------------------------------------------- 
{'executive', 'mid_senior', 'internship', 'entry_level', 'associate'}


In [22]:
# Encode job_level as ordinal, where 1 is internship, 2 is entry_level, ..., 5 is executive
combine['job_level'] = combine['job_level'].replace(to_replace=['internship', 'entry_level', 'associate', 'mid_senior', 'executive'],
                                                    value=[1, 2, 3, 4, 5])

In [23]:
# Change edu_level to be represented by a number
# Check the number of edu_level
print('There are {} unique education level in edu_level feature'. format(len(set(combine['edu_level']))))
print('--------------------------------------------------------')
print(set(combine['edu_level']))

There are 7 unique education level in edu_level feature
--------------------------------------------------------
{'college', 'masters', 'bachelors', 'certificate', 'high_school', 'unspecified', 'degree'}


In [24]:
# Encode edu_level as ordinal, where 1 is unspecified, 2 is high_school, 3 is college, ..., 7 is masters
combine['edu_level'] = combine['edu_level'].replace(to_replace=['unspecified', 'high_school', 'college', 'certificate',
                                                                'degree', 'bachelors', 'masters'],
                                                    value=[1, 2, 3, 4, 5, 6, 7])

In [25]:
# Encode region as a numeric code, where 1 is west, 2 is midwest, 3 is south and 4 is northeast
combine['region'] = combine['region'].replace(to_replace=['west', 'midwest', 'south', 'northeast'],
                                              value=[1, 2, 3, 4])

In [26]:
# Check out the final output for the changes
combine.head()

index,function,state_name,emp_type,job_level,edu_level,region
R_82470_auditor,1,41,3,4,7,3
R_82450_senior accountant,1,33,4,4,2,2
R_82423_senior/mid-level financial services manager,1,41,4,4,6,3
R_82397_ brokerage operations,1,31,4,4,4,4
R_82338_full charge bookkeeper,1,5,4,3,6,1
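
The encoding cells above could also be written with explicit dictionaries and `Series.map`, which keeps each code next to its label. The sketch below assumes a two-row stand-in for `combine`; the dictionary names and the `toy` frame are hypothetical, while the codes are the ones used in the cells above.

```python
import pandas as pd

# Encodings copied from the cells above
emp_type_codes = {"internship": 1, "part-time": 2, "contract": 3, "full-time": 4, "other": 5}
job_level_codes = {"internship": 1, "entry_level": 2, "associate": 3, "mid_senior": 4, "executive": 5}

# Two-row stand-in for `combine`
toy = pd.DataFrame(
    {"emp_type": ["contract", "full-time"], "job_level": ["mid_senior", "associate"]},
    index=["R_82470_auditor", "R_82338_full charge bookkeeper"],
)

encoded = toy.assign(
    emp_type=toy["emp_type"].map(emp_type_codes),
    job_level=toy["job_level"].map(job_level_codes),
)

# Any label missing from a mapping becomes NaN, so check before computing similarity
assert not encoded.isna().any().any()
print(encoded)
```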


## Data Dictionary

In [27]:
%%html
<style>
table {float:left}
</style>

| Selected feature                  	| Encoding                                                                                               	|
|:-----------------------------------	|:--------------------------------------------------------------------------------------------------------	|
| function                          	| Accounting = 1 and Information Technology = 5                                                          	|
| state_name                        	| Number assigned to each state in alphabetical order, starting from 0 (via `pd.factorize`)             	|
| emp_type                          	| internship = 1, part-time = 2, contract = 3, full-time = 4, other = 5                                  	|
| job_level                         	| internship = 1, entry_level = 2, associate = 3, mid_senior = 4, executive = 5                          	|
| edu_level                         	| unspecified = 1, high_school = 2, college = 3, certificate = 4, degree = 5, bachelors = 6, masters = 7 	|
| region                            	| states by region where west = 1, midwest = 2, south = 3, northeast = 4                                 	|

In [28]:
# Mean-centre each row: subtract the row's own mean from every feature value
def mean_centre(df):
    return df.sub(df.mean(axis=1), axis=0)

combine_mc = mean_centre(combine)

In [29]:
# Compute the pairwise cosine similarity between all rows (resumes and job ads)
sim_matrix = cosine_similarity(combine_mc)
combine_sim = pd.DataFrame(sim_matrix, columns=combine.index, index=combine.index)
combine_sim.head()

index,R_82470_auditor,R_82450_senior accountant,R_82423_senior/mid-level financial services manager,R_82397_ brokerage operations,R_82338_full charge bookkeeper,R_82293_remote bookkeeper or ranch management,R_82266_financial analyst,R_81922_financial analyst,R_81888_full or part time bookkeeping accounting,R_81854_accounting manager,...,J_10240_Payroll Clerk,J_10248_Accounting Clerk($20/hr),J_10257_Data Center Migration App Lead for FULL-TIME Opportunity.,J_10282_ iSeries Team Lead,J_10330_Accounts Payable Clerk,J_10403_Payroll Clerk,J_10412_Sr. Scm Web Development Technical Lead,J_10414_User Support Technician,J_10422_Sr. SQL Server DBA,J_10433_Payroll Accountant
R_82470_auditor,1.0,0.989553,0.999157,0.99523,0.492707,0.454494,0.998301,0.31575,0.997344,0.273098,...,0.987488,0.986817,0.994187,0.962191,0.978802,0.986817,0.096804,0.976433,0.209639,0.998922
R_82450_senior accountant,0.989553,1.0,0.994157,0.996556,0.424512,0.514949,0.989081,0.242434,0.992535,0.209369,...,0.996667,0.994473,0.989929,0.982086,0.995816,0.994473,0.024834,0.988677,0.297163,0.992427
R_82423_senior/mid-level financial services manager,0.999157,0.994157,1.0,0.99776,0.481347,0.472953,0.998337,0.30313,0.998959,0.260544,...,0.992249,0.99156,0.994686,0.967635,0.986278,0.99156,0.074363,0.981234,0.22574,0.999494
R_82397_ brokerage operations,0.99523,0.996556,0.99776,1.0,0.444331,0.480659,0.996805,0.263734,0.996529,0.224398,...,0.997551,0.996361,0.990797,0.96893,0.991366,0.996361,0.010247,0.98367,0.221507,0.998288
R_82338_full charge bookkeeper,0.492707,0.424512,0.481347,0.444331,1.0,0.55737,0.502186,0.981155,0.494903,0.951972,...,0.393272,0.393184,0.424671,0.300868,0.403761,0.393184,0.524672,0.324691,-0.231455,0.480128
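
To make the two steps above concrete, the sketch below repeats them with plain NumPy on three encoded rows taken from the table in the pre-processing section (auditor, senior accountant, full charge bookkeeper). The variable names are hypothetical. It also illustrates a known property: cosine similarity of row-mean-centred vectors is the same quantity as the Pearson correlation between the rows.

```python
import numpy as np

# Encoded rows: [function, state_name, emp_type, job_level, edu_level, region]
X = np.array([
    [1.0, 41.0, 3.0, 4.0, 7.0, 3.0],   # R_82470_auditor
    [1.0, 33.0, 4.0, 4.0, 2.0, 2.0],   # R_82450_senior accountant
    [1.0, 5.0, 4.0, 3.0, 6.0, 1.0],    # R_82338_full charge bookkeeper
])

# Subtract each row's own mean, as mean_centre does
Xc = X - X.mean(axis=1, keepdims=True)

# Cosine similarity between all pairs of centred rows
unit = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 6))
```

Since `state_name` spans a much wider numeric range than the other features, it contributes most to each centred vector, which is worth keeping in mind when reading the scores.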


In [30]:
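# Loop over the row labels to keep only the resume rows (labels starting with 'R')
# Note: iloc[:i] excludes row i itself, so the slice stops one row before the last resume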
for i in range(len(combine_sim.index)):
    if combine_sim.index[i][0] == 'R':
        index_res = combine_sim.iloc[:i]

# Check the shape and head of the filtered dataframe
print(index_res.shape)
index_res.head()

(1285, 2078)


index,R_82470_auditor,R_82450_senior accountant,R_82423_senior/mid-level financial services manager,R_82397_ brokerage operations,R_82338_full charge bookkeeper,R_82293_remote bookkeeper or ranch management,R_82266_financial analyst,R_81922_financial analyst,R_81888_full or part time bookkeeping accounting,R_81854_accounting manager,...,J_10240_Payroll Clerk,J_10248_Accounting Clerk($20/hr),J_10257_Data Center Migration App Lead for FULL-TIME Opportunity.,J_10282_ iSeries Team Lead,J_10330_Accounts Payable Clerk,J_10403_Payroll Clerk,J_10412_Sr. Scm Web Development Technical Lead,J_10414_User Support Technician,J_10422_Sr. SQL Server DBA,J_10433_Payroll Accountant
R_82470_auditor,1.0,0.989553,0.999157,0.99523,0.492707,0.454494,0.998301,0.31575,0.997344,0.273098,...,0.987488,0.986817,0.994187,0.962191,0.978802,0.986817,0.096804,0.976433,0.209639,0.998922
R_82450_senior accountant,0.989553,1.0,0.994157,0.996556,0.424512,0.514949,0.989081,0.242434,0.992535,0.209369,...,0.996667,0.994473,0.989929,0.982086,0.995816,0.994473,0.024834,0.988677,0.297163,0.992427
R_82423_senior/mid-level financial services manager,0.999157,0.994157,1.0,0.99776,0.481347,0.472953,0.998337,0.30313,0.998959,0.260544,...,0.992249,0.99156,0.994686,0.967635,0.986278,0.99156,0.074363,0.981234,0.22574,0.999494
R_82397_ brokerage operations,0.99523,0.996556,0.99776,1.0,0.444331,0.480659,0.996805,0.263734,0.996529,0.224398,...,0.997551,0.996361,0.990797,0.96893,0.991366,0.996361,0.010247,0.98367,0.221507,0.998288
R_82338_full charge bookkeeper,0.492707,0.424512,0.481347,0.444331,1.0,0.55737,0.502186,0.981155,0.494903,0.951972,...,0.393272,0.393184,0.424671,0.300868,0.403761,0.393184,0.524672,0.324691,-0.231455,0.480128
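
A boolean mask on the labels is one way to select the resume rows without a positional slice, so that every row whose label starts with `R_` is kept. The toy matrix and its labels below are hypothetical.

```python
import pandas as pd

# Hypothetical square similarity matrix over two resumes and two job ads
labels = ["R_1_auditor", "R_2_developer", "J_10_accountant", "J_11_web developer"]
sim = pd.DataFrame(
    [[1.0, 0.3, 0.9, 0.2],
     [0.3, 1.0, 0.1, 0.8],
     [0.9, 0.1, 1.0, 0.4],
     [0.2, 0.8, 0.4, 1.0]],
    index=labels,
    columns=labels,
)

is_resume = sim.index.str.startswith("R_")
is_job = sim.columns.str.startswith("J_")

# Rows: every resume; columns: every job ad
res_jobs = sim.loc[is_resume, is_job]
print(res_jobs.shape)
```

The same two masks give the job-by-resume view with `sim.loc[is_job, is_resume]`.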


In [31]:
# Further filter the columns of index_res to keep only the job ads
res_jobs = index_res.filter(regex='^J_')

# Check the shape and head of the res_jobs dataframe
print(res_jobs.shape)
res_jobs.head()

(1285, 792)


index,"J_9_ASP.net Developer Job opportunity at United States,New Jersey","J_10_Applications Developer, Digital",J_38_Technical Project Manager,J_49_Ruby Automation Engineer & Ruby on Rails Engineer - 2 roles,J_54_Mid-Senior .NET or Xamarin Developer,J_103_Web Developer,"J_128_Manager, Network Engineering",J_147_Service Desk (Help Desk/ Desktop Support) Tier 1/2 - Healthcare IT,J_149_SharePoint Developer and Administrator,J_165_Senior Programme Analyst,...,J_10240_Payroll Clerk,J_10248_Accounting Clerk($20/hr),J_10257_Data Center Migration App Lead for FULL-TIME Opportunity.,J_10282_ iSeries Team Lead,J_10330_Accounts Payable Clerk,J_10403_Payroll Clerk,J_10412_Sr. Scm Web Development Technical Lead,J_10414_User Support Technician,J_10422_Sr. SQL Server DBA,J_10433_Payroll Accountant
R_82470_auditor,0.994142,0.575739,0.992046,0.135198,0.900818,-0.289073,0.63209,0.988488,0.993644,0.972873,...,0.987488,0.986817,0.994187,0.962191,0.978802,0.986817,0.096804,0.976433,0.209639,0.998922
R_82450_senior accountant,0.987669,0.481228,0.984821,0.088784,0.858192,-0.193178,0.532939,0.991483,0.989469,0.988268,...,0.996667,0.994473,0.989929,0.982086,0.995816,0.994473,0.024834,0.988677,0.297163,0.992427
R_82423_senior/mid-level financial services manager,0.994062,0.555691,0.992179,0.118238,0.890917,-0.265878,0.6072,0.991001,0.993829,0.977547,...,0.992249,0.99156,0.994686,0.967635,0.986278,0.99156,0.074363,0.981234,0.22574,0.999494
R_82397_ brokerage operations,0.989066,0.510094,0.987108,0.058617,0.858976,-0.270211,0.556507,0.98881,0.989411,0.980051,...,0.997551,0.996361,0.990797,0.96893,0.991366,0.996361,0.010247,0.98367,0.221507,0.998288
R_82338_full charge bookkeeper,0.432772,0.533002,0.429062,0.428746,0.619225,-0.37552,0.623706,0.394779,0.423313,0.31096,...,0.393272,0.393184,0.424671,0.300868,0.403761,0.393184,0.524672,0.324691,-0.231455,0.480128


In [32]:
# Save a copy of the resume-to-job similarity matrix (index=False leaves the resume labels out of the file)
res_jobs.to_csv('./datasets/res_jobs.csv', index=False)

## Recommender System

In [33]:
# Check the job recommendations for a job seeker in the Accounting function
_, seeker_id, seeker_title = res_jobs.index[0].split('_', 2)
print('Job seeker id {} is looking for {} job'.format(seeker_id, seeker_title))
print('----------------------------------------------')
print('Top 10 job recommendations:')
res_jobs.iloc[0,:].sort_values(ascending=False)[0:10]

Job seeker id 82470 is looking for auditor job
----------------------------------------------
Top 10 job recommendations:


index
J_915_IT- Auditor                                            0.999812
J_6736_Senior Accountant                                     0.999472
J_4871_CPA Accounting Manager / Medical Billing              0.999157
J_701_Accountant / Book Keeper                               0.999058
J_4873_Controller - High Growth Specialty Finance Company    0.999058
J_10433_Payroll Accountant                                   0.998922
J_4193_Corporate Controller                                  0.998922
J_268_Sr. Accountant                                         0.998922
J_2128_Accounts Payable Supervisor                           0.998844
J_6081_Corporate Controller                                  0.998634
Name: R_82470_auditor, dtype: float64

In [34]:
# Use np.where to find the row position in jobs_df of each job we want to display
print(np.where(jobs_df.index == "J_915_IT- Auditor"))
print(np.where(jobs_df.index == "J_6736_Senior Accountant"))
print(np.where(jobs_df.index == "J_4871_CPA Accounting Manager / Medical Billing"))
print(np.where(jobs_df.index == "J_701_Accountant / Book Keeper"))
print(np.where(jobs_df.index == "J_4873_Controller - High Growth Specialty Finance Company"))
print(np.where(jobs_df.index == "J_2128_Accounts Payable Supervisor"))
print(np.where(jobs_df.index == "J_687_Accountant"))
print(np.where(jobs_df.index == "J_3057_Accounting Manager"))
print(np.where(jobs_df.index == "J_6081_Corporate Controller "))
print(np.where(jobs_df.index == "J_10433_Payroll Accountant"))

(array([68], dtype=int64),)
(array([509], dtype=int64),)
(array([373], dtype=int64),)
(array([58], dtype=int64),)
(array([374], dtype=int64),)
(array([144], dtype=int64),)
(array([55], dtype=int64),)
(array([213], dtype=int64),)
(array([462], dtype=int64),)
(array([791], dtype=int64),)


In [35]:
# Collect the selected jobs into a dataframe
reco_acct = jobs_df.iloc[[68, 509, 373, 58, 374, 144, 55, 213, 462, 791]]
reco_acct

index,function,state_name,emp_type,job_level,edu_level,region
J_915_IT- Auditor,Accounting,North Carolina,contract,mid_senior,bachelors,south
J_6736_Senior Accountant,Accounting,Texas,contract,associate,bachelors,south
J_4871_CPA Accounting Manager / Medical Billing,Accounting,Texas,full-time,mid_senior,bachelors,south
J_701_Accountant / Book Keeper,Accounting,Virginia,full-time,mid_senior,bachelors,south
J_4873_Controller - High Growth Specialty Finance Company,Accounting,Virginia,full-time,mid_senior,bachelors,south
J_2128_Accounts Payable Supervisor,Accounting,Oklahoma,full-time,associate,bachelors,south
J_687_Accountant,Accounting,New York,full-time,mid_senior,bachelors,northeast
J_3057_Accounting Manager,Accounting,Maine,full-time,mid_senior,bachelors,northeast
J_6081_Corporate Controller,Accounting,New York,full-time,mid_senior,bachelors,northeast
J_10433_Payroll Accountant,Accounting,Pennsylvania,full-time,mid_senior,bachelors,northeast


In [36]:
# Checking out the features of job seeker 82470
cdd_acct = res_df.head(1)
cdd_acct

index,function,state_name,emp_type,job_level,edu_level,region
R_82470_auditor,Accounting,Texas,contract,mid_senior,masters,south


In [37]:
# Put both dataframes together: the 1st row is the job seeker and the remaining rows are recommendations
# (DataFrame.append was removed in pandas 2.0; pd.concat works across versions)
pd.concat([cdd_acct, reco_acct])

index,function,state_name,emp_type,job_level,edu_level,region
R_82470_auditor,Accounting,Texas,contract,mid_senior,masters,south
J_915_IT- Auditor,Accounting,North Carolina,contract,mid_senior,bachelors,south
J_6736_Senior Accountant,Accounting,Texas,contract,associate,bachelors,south
J_4871_CPA Accounting Manager / Medical Billing,Accounting,Texas,full-time,mid_senior,bachelors,south
J_701_Accountant / Book Keeper,Accounting,Virginia,full-time,mid_senior,bachelors,south
J_4873_Controller - High Growth Specialty Finance Company,Accounting,Virginia,full-time,mid_senior,bachelors,south
J_2128_Accounts Payable Supervisor,Accounting,Oklahoma,full-time,associate,bachelors,south
J_687_Accountant,Accounting,New York,full-time,mid_senior,bachelors,northeast
J_3057_Accounting Manager,Accounting,Maine,full-time,mid_senior,bachelors,northeast
J_6081_Corporate Controller,Accounting,New York,full-time,mid_senior,bachelors,northeast
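
The steps above (sort one row of the similarity matrix, look up the matching rows, stack them under the job seeker) can be folded into one helper. This is only a sketch under stated assumptions: `top_matches` is a hypothetical name, and the tiny `res_jobs`, `res_df` and `jobs_df` frames below are made-up stand-ins that only mimic the shape of the real ones. Selecting by label with `.loc` removes the need to find positional indices first.

```python
import pandas as pd

def top_matches(sim, source_df, target_df, label, k=10):
    """Return the row for `label` followed by its k most similar target rows."""
    # Highest similarity first; the index of `top` holds the target labels
    top = sim.loc[label].sort_values(ascending=False).head(k)
    # .loc selects by label, so no positional lookup is needed
    return pd.concat([source_df.loc[[label]], target_df.loc[top.index]])

# Hypothetical toy data standing in for res_jobs, res_df and jobs_df
res_jobs = pd.DataFrame(
    [[0.2, 0.9, 0.5]],
    index=["R_1_auditor"],
    columns=["J_1_Developer", "J_2_Accountant", "J_3_Controller"],
)
res_df = pd.DataFrame({"function": ["Accounting"], "state_name": ["Texas"]},
                      index=["R_1_auditor"])
jobs_df = pd.DataFrame(
    {"function": ["Information Technology", "Accounting", "Accounting"],
     "state_name": ["Illinois", "Texas", "New York"]},
    index=["J_1_Developer", "J_2_Accountant", "J_3_Controller"],
)

result = top_matches(res_jobs, res_df, jobs_df, "R_1_auditor", k=2)
print(result)
```

Passing the transposed similarity matrix and swapping `source_df` and `target_df` would give the recruiter's view with the same function.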


In [38]:
# Check the job recommendation for job seekers in Information Technology function
print('Job seeker id {} is looking for {} job'.format(res_jobs.index[1017][2:7], res_jobs.index[1017][8:45]))
print('---------------------------------------------------------------------')
print('Top 10 job recommendations:')
res_jobs.iloc[1017,:].sort_values(ascending=False)[0:10]

Job seeker id 75818 is looking for oracle database administrator job
---------------------------------------------------------------------
Top 10 job recommendations:


index
J_3500_IT Security Engineer, Immediate full time opening at Fortune 500 Co.    0.994085
J_2193_Sr. Java Developer                                                      0.994085
J_1478_IT Software Tester                                                      0.994085
J_3277_Oracle Applications DBA                                                 0.994085
J_8971_Java Integration Engineer                                               0.994085
J_6044_Lead Software Engineer                                                  0.994085
J_6523_Senior Software Engineer                                                0.994085
J_4945_Senior Systems Administrator                                            0.994085
J_2279_Web Developer                                                           0.994085
J_9842_Director, Information Security                                          0.994085
Name: R_75818_oracle database administrator, dtype: float64
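
All ten scores in the output above are identical (0.994085), so the order within this top 10 is decided by the sort rather than by the data. The sketch below uses made-up scores to show one way such ties could be surfaced: `Series.nlargest` with `keep="all"` returns every entry tied with the k-th value instead of cutting the tie arbitrarily.

```python
import pandas as pd

# Made-up similarity scores for one job seeker; four jobs tie for first place
scores = pd.Series(
    {"J_a": 0.994085, "J_b": 0.994085, "J_c": 0.994085,
     "J_d": 0.994085, "J_e": 0.871152},
    name="R_hypothetical",
)

k = 2
# keep="all" returns every entry tied with the k-th largest value,
# so the result can be longer than k
top_with_ties = scores.nlargest(k, keep="all")
print(top_with_ties)

# Count how many jobs share the best score
n_tied = int((scores == scores.max()).sum())
print(f"{n_tied} jobs share the top score")
```

When many items tie, a secondary criterion (for example, a match on state or employment type) may be needed to rank them meaningfully.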

In [39]:
# Using np.where to find the index number of the top 10 positions
print(np.where(jobs_df.index == "J_2433_Enterprise Architect"))
print(np.where(jobs_df.index == "J_1787_Software Engineer - IL"))
print(np.where(jobs_df.index == "J_1460_MOBILE FRONT END PROGRAMMERS - VLinks Media"))
print(np.where(jobs_df.index == "J_5653_Junior .NET Software Engineer"))
print(np.where(jobs_df.index == "J_4065_Senior Software Engineer - CloudSpotter Technologies"))
print(np.where(jobs_df.index == "J_666_Javascript Developer - Rippleshot"))
print(np.where(jobs_df.index == "J_6544_Linux System Manager - SaaS"))
print(np.where(jobs_df.index == "J_1677_SAP BI/HANA Managing Consultant"))
print(np.where(jobs_df.index == "J_496_Front-End Web Developer"))
print(np.where(jobs_df.index == "J_4498_Curam Developer"))

(array([165], dtype=int64),)
(array([117], dtype=int64),)
(array([100], dtype=int64),)
(array([440], dtype=int64),)
(array([293], dtype=int64),)
(array([53], dtype=int64),)
(array([491], dtype=int64),)
(array([111], dtype=int64),)
(array([29], dtype=int64),)
(array([333], dtype=int64),)


In [40]:
# Creating the list of recommended jobs into a dataframe
reco_it = pd.DataFrame(jobs_df.iloc[[165, 117, 100, 440, 293, 53, 491, 111, 29, 333]])
reco_it

index,function,state_name,emp_type,job_level,edu_level,region
J_2433_Enterprise Architect,Information Technology,Michigan,contract,mid_senior,masters,midwest
J_1787_Software Engineer - IL,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_1460_MOBILE FRONT END PROGRAMMERS - VLinks Media,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_5653_Junior .NET Software Engineer,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_4065_Senior Software Engineer - CloudSpotter Technologies,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_666_Javascript Developer - Rippleshot,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_6544_Linux System Manager - SaaS,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_1677_SAP BI/HANA Managing Consultant,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_496_Front-End Web Developer,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_4498_Curam Developer,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest


In [41]:
# Checking out the features of job seeker 75818
cdd_it = pd.DataFrame(res_df.iloc[1017]).T
cdd_it

index,function,state_name,emp_type,job_level,edu_level,region
R_75818_oracle database administrator,Information Technology,Illinois,contract,associate,bachelors,midwest


In [42]:
# Put both dataframes together: the 1st row is the job seeker and the remaining rows are recommendations
pd.concat([cdd_it, reco_it])

index,function,state_name,emp_type,job_level,edu_level,region
R_75818_oracle database administrator,Information Technology,Illinois,contract,associate,bachelors,midwest
J_2433_Enterprise Architect,Information Technology,Michigan,contract,mid_senior,masters,midwest
J_1787_Software Engineer - IL,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_1460_MOBILE FRONT END PROGRAMMERS - VLinks Media,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_5653_Junior .NET Software Engineer,Information Technology,Illinois,full-time,associate,bachelors,midwest
J_4065_Senior Software Engineer - CloudSpotter Technologies,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_666_Javascript Developer - Rippleshot,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_6544_Linux System Manager - SaaS,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_1677_SAP BI/HANA Managing Consultant,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
J_496_Front-End Web Developer,Information Technology,Illinois,full-time,mid_senior,bachelors,midwest
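
Since success is judged by how closely the features of the recommendations match those of the job seeker, the comparison table can be summarised as a match rate per feature. This is a minimal sketch on a hypothetical three-column table (the labels `R_seeker`, `J_1` to `J_3` are invented); it is one possible way to quantify the match, not the evaluation used in this notebook.

```python
import pandas as pd

# Hypothetical comparison table: first row is the job seeker, the rest are recommendations
combined = pd.DataFrame(
    {"function": ["Information Technology"] * 4,
     "state_name": ["Illinois", "Michigan", "Illinois", "Illinois"],
     "emp_type": ["contract", "contract", "full-time", "full-time"]},
    index=["R_seeker", "J_1", "J_2", "J_3"],
)

seeker = combined.iloc[0]
recos = combined.iloc[1:]

# Compare each recommendation with the seeker, column by column,
# then average the boolean matches to get a rate per feature
match_rate = recos.eq(seeker, axis="columns").mean()
print(match_rate)
```

A rate of 1.0 means every recommendation agrees with the job seeker on that feature; lower rates point to features where the recommendations diverge.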


### Let's check out the candidates for HR/Hiring managers based on job ads

In [43]:
# Let's transpose the dataset
jobs_res = res_jobs.T
jobs_res.head()

index,R_82470_auditor,R_82450_senior accountant,R_82423_senior/mid-level financial services manager,R_82397_ brokerage operations,R_82338_full charge bookkeeper,R_82293_remote bookkeeper or ranch management,R_82266_financial analyst,R_81922_financial analyst,R_81888_full or part time bookkeeping accounting,R_81854_accounting manager,...,R_72754_sql developer,R_72752_network engineer,R_72742_sr java/j2ee developer,R_72741_ms sql bi developer,R_72740_java/j2ee developer,R_72739_biz talk developer,R_72738_oracle pl/sql developer / etl developer,R_72736_qa tester,R_72735_oracle pl/sql developer,"R_72731_vp of it, it director, it project director"
"J_9_ASP.net Developer Job opportunity at United States,New Jersey",0.994142,0.987669,0.994062,0.989066,0.432772,0.403224,0.988026,0.250924,0.992307,0.200162,...,0.984075,0.996966,0.990922,0.871152,-0.739588,0.992559,0.998305,0.620944,0.997268,0.059801
"J_10_Applications Developer, Digital",0.575739,0.481228,0.555691,0.510094,0.533002,-0.182818,0.555159,0.453821,0.567926,0.307941,...,0.497797,0.568389,0.516356,0.359425,0.0533,0.523304,0.568864,0.125739,0.576363,0.493464
J_38_Technical Project Manager,0.992046,0.984821,0.992179,0.987108,0.429062,0.385679,0.986258,0.247222,0.991662,0.189608,...,0.983669,0.997306,0.98997,0.871328,-0.733843,0.991318,0.997614,0.624764,0.997262,0.054791
J_49_Ruby Automation Engineer & Ruby on Rails Engineer - 2 roles,0.135198,0.088784,0.118238,0.058617,0.428746,0.117647,0.089314,0.433501,0.109516,0.450377,...,0.07508,0.120319,0.108106,0.096374,0.364434,0.127263,0.144808,-0.12643,0.128785,0.952661
J_54_Mid-Senior .NET or Xamarin Developer,0.900818,0.858192,0.890917,0.858976,0.619225,0.331539,0.880931,0.47419,0.890607,0.412491,...,0.841922,0.885819,0.863852,0.712923,-0.422885,0.873316,0.894721,0.391921,0.889172,0.447447
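
Transposing is enough to get the recruiter's view because cosine similarity is symmetric: the score of a resume against a job ad equals the score of that job ad against the resume. The sketch below checks this on hypothetical feature vectors with plain NumPy; `cosine_matrix` is an illustrative helper, and how the notebook's own feature vectors were built is not shown here. Negative scores, like some in the table above, can only arise when the vectors contain negative components (for example after centering or dimensionality reduction).

```python
import numpy as np
import pandas as pd

def cosine_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Hypothetical feature vectors: 2 resumes and 3 job ads with 3 features each
resumes = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
jobs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, -1.0]])

res_jobs = pd.DataFrame(cosine_matrix(resumes, jobs),
                        index=["R_1", "R_2"], columns=["J_1", "J_2", "J_3"])
jobs_res = res_jobs.T  # rows are now job ads, columns are resumes

# Computing in the other direction gives the same numbers
direct = cosine_matrix(jobs, resumes)
print(np.allclose(jobs_res.to_numpy(), direct))
```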


In [44]:
# Check the candidates for job ads in Accounting function
print('Job ad position: {} '.format(jobs_res.index[10][6:26]))
print('---------------------------------------')
print('Top 10 candidates recommendations:')
jobs_res.iloc[10,:].sort_values(ascending=False)[0:10]

Job ad position: Financial Accountant 
---------------------------------------
Top 10 candidates recommendations:


index
R_57471_staff accountant                      1.0
R_67483_data entry/accounting                 1.0
R_59270_invoicing specialist                  1.0
R_71690_bookkeeping/ office administrative    1.0
R_74990_senior staff accountant               1.0
R_75007_accounting assistant                  1.0
R_75465_bookkeeping, AR AP                    1.0
R_65285_senior accountant                     1.0
R_64594_accounts payable                      1.0
R_65175_accountant                            1.0
Name: J_187_Financial Accountant, dtype: float64

In [45]:
# Using np.where to find the index number of the top 10 job seekers
print(np.where(res_df.index == "R_75465_bookkeeping, AR AP"))
print(np.where(res_df.index == "R_67483_data entry/accounting"))
print(np.where(res_df.index == "R_59270_invoicing specialist"))
print(np.where(res_df.index == "R_65175_accountant"))
print(np.where(res_df.index == "R_65285_senior accountant"))
print(np.where(res_df.index == "R_57471_staff accountant"))
print(np.where(res_df.index == "R_74990_senior staff accountant"))
print(np.where(res_df.index == "R_75007_accounting assistant"))
print(np.where(res_df.index == "R_71690_bookkeeping/ office administrative"))
print(np.where(res_df.index == "R_68481_accountant"))

(array([136], dtype=int64),)
(array([265], dtype=int64),)
(array([433], dtype=int64),)
(array([320], dtype=int64),)
(array([317], dtype=int64),)
(array([462], dtype=int64),)
(array([152], dtype=int64),)
(array([151], dtype=int64),)
(array([206], dtype=int64),)
(array([246], dtype=int64),)


In [46]:
# Creating the list of recommended job seekers into a dataframe
rec_acct = pd.DataFrame(res_df.iloc[[136, 265, 433, 320, 317, 462, 152, 151, 206, 246]])
rec_acct

index,function,state_name,emp_type,job_level,edu_level,region
"R_75465_bookkeeping, AR AP",Accounting,New York,full-time,associate,bachelors,northeast
R_67483_data entry/accounting,Accounting,New York,full-time,associate,bachelors,northeast
R_59270_invoicing specialist,Accounting,New York,full-time,associate,bachelors,northeast
R_65175_accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_65285_senior accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_57471_staff accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_74990_senior staff accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_75007_accounting assistant,Accounting,New York,full-time,associate,bachelors,northeast
R_71690_bookkeeping/ office administrative,Accounting,New York,full-time,associate,bachelors,northeast
R_68481_accountant,Accounting,New York,full-time,associate,bachelors,northeast


In [47]:
# Checking out the features of job ad 187
job_acct = pd.DataFrame(jobs_df.iloc[10]).T
job_acct

index,function,state_name,emp_type,job_level,edu_level,region
J_187_Financial Accountant,Accounting,New York,full-time,associate,bachelors,northeast


In [48]:
# Put both dataframes together: the 1st row is the job ad and the remaining rows are recommendations
pd.concat([job_acct, rec_acct])

index,function,state_name,emp_type,job_level,edu_level,region
J_187_Financial Accountant,Accounting,New York,full-time,associate,bachelors,northeast
"R_75465_bookkeeping, AR AP",Accounting,New York,full-time,associate,bachelors,northeast
R_67483_data entry/accounting,Accounting,New York,full-time,associate,bachelors,northeast
R_59270_invoicing specialist,Accounting,New York,full-time,associate,bachelors,northeast
R_65175_accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_65285_senior accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_57471_staff accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_74990_senior staff accountant,Accounting,New York,full-time,associate,bachelors,northeast
R_75007_accounting assistant,Accounting,New York,full-time,associate,bachelors,northeast
R_71690_bookkeeping/ office administrative,Accounting,New York,full-time,associate,bachelors,northeast


In [49]:
# Check the candidates for job ads in Information Technology function
print('Job ad position: {} '.format(jobs_res.index[2][5:30]))
print('---------------------------------------')
print('Top 10 candidates recommendations:')
jobs_res.iloc[2,:].sort_values(ascending=False)[0:10]

Job ad position: Technical Project Manager 
---------------------------------------
Top 10 candidates recommendations:


index
R_78734_it service desk lead analyst\it service desk manager    1.000000
R_81761_sql server database administrator                       1.000000
R_79802_business system analyst                                 1.000000
R_72842_it support, it technician, systems administrator        1.000000
R_74072_windows systems engineer                                0.999971
R_74244_technical writer/editor                                 0.999971
R_81453_systems administrator                                   0.999971
R_75521_network engineer                                        0.999971
R_74692_business analyst / business systems analyst             0.999971
R_79968_system administrator                                    0.999971
Name: J_38_Technical Project Manager, dtype: float64

In [50]:
# Using np.where to find the index number of the top 10 job seekers
print(np.where(res_df.index == "R_72842_it support, it technician, systems administrator"))
print(np.where(res_df.index == "R_79802_business system analyst"))
print(np.where(res_df.index == r"R_78734_it service desk lead analyst\it service desk manager"))  # raw string: "\i" is not a valid escape
print(np.where(res_df.index == "R_81761_sql server database administrator"))
print(np.where(res_df.index == "R_74072_windows systems engineer"))
print(np.where(res_df.index == "R_74244_technical writer/editor"))
print(np.where(res_df.index == "R_75521_network engineer"))
print(np.where(res_df.index == "R_81453_systems administrator"))
print(np.where(res_df.index == "R_74692_business analyst / business systems analyst"))
print(np.where(res_df.index == "R_79968_system administrator"))

(array([1268], dtype=int64),)
(array([704], dtype=int64),)
(array([787], dtype=int64),)
(array([541], dtype=int64),)
(array([1183], dtype=int64),)
(array([1171], dtype=int64),)
(array([1069], dtype=int64),)
(array([563], dtype=int64),)
(array([1134], dtype=int64),)
(array([684], dtype=int64),)


In [51]:
# Creating the list of recommended job seekers into a dataframe
rec_it = pd.DataFrame(res_df.iloc[[1268, 704, 787, 541, 1183, 1171, 1069, 563, 1134, 684]])
rec_it

index,function,state_name,emp_type,job_level,edu_level,region
"R_72842_it support, it technician, systems administrator",Information Technology,New York,full-time,associate,bachelors,northeast
R_79802_business system analyst,Information Technology,New York,full-time,associate,bachelors,northeast
R_78734_it service desk lead analyst\it service desk manager,Information Technology,New York,full-time,associate,bachelors,northeast
R_81761_sql server database administrator,Information Technology,New York,full-time,associate,bachelors,northeast
R_74072_windows systems engineer,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_74244_technical writer/editor,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_75521_network engineer,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_81453_systems administrator,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_74692_business analyst / business systems analyst,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_79968_system administrator,Information Technology,New Jersey,full-time,associate,bachelors,northeast


In [52]:
# Checking out the features of job ad 38
job_it = pd.DataFrame(jobs_df.iloc[2]).T
job_it

index,function,state_name,emp_type,job_level,edu_level,region
J_38_Technical Project Manager,Information Technology,New York,full-time,associate,bachelors,northeast


In [53]:
# Put both dataframes together: the 1st row is the job ad and the remaining rows are recommendations
pd.concat([job_it, rec_it])

index,function,state_name,emp_type,job_level,edu_level,region
J_38_Technical Project Manager,Information Technology,New York,full-time,associate,bachelors,northeast
"R_72842_it support, it technician, systems administrator",Information Technology,New York,full-time,associate,bachelors,northeast
R_79802_business system analyst,Information Technology,New York,full-time,associate,bachelors,northeast
R_78734_it service desk lead analyst\it service desk manager,Information Technology,New York,full-time,associate,bachelors,northeast
R_81761_sql server database administrator,Information Technology,New York,full-time,associate,bachelors,northeast
R_74072_windows systems engineer,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_74244_technical writer/editor,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_75521_network engineer,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_81453_systems administrator,Information Technology,New Jersey,full-time,associate,bachelors,northeast
R_74692_business analyst / business systems analyst,Information Technology,New Jersey,full-time,associate,bachelors,northeast


# Conclusion

A study shows that resume screening remains the most time-consuming part of the recruiting process: 52% of talent acquisition leaders say the hardest part of recruitment is screening candidates from a large applicant pool.
Screening takes approximately 23 hours for a single hire, and typically 75% to 88% of the resumes in the pool are unqualified.

Likewise, as job seekers, we tend to send out tons of applications only to hear nothing back. In addition, to make sure a resume stands out, it is strongly recommended to tailor the application materials to each job application.

What we are trying to achieve with the above model is to take away the time-consuming part of the work for both recruiters and job seekers.
The recommender system returns the best-matching profiles or job ads. Instead of looking through a large pool of information, talent acquisition leaders and job seekers can focus on recommendations that are already matched to their requirements.

On top of these recommendations, we plan to introduce a selection function (`Yes` or `No`). Talent acquisition leaders and job seekers can choose `Yes` to show interest or `No` to indicate that they are not interested.
When both the job seeker and the talent acquisition leader select `Yes`, they will be connected in a chatroom to kick-start their conversation.
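
The mutual `Yes`/`No` rule can be sketched in a few lines. Everything here is an assumption made for illustration: the `MatchBoard` class, its method names and the shortened ids are hypothetical, and opening the chatroom itself is left out. The sketch only shows the rule that a pair counts as a match once both sides have chosen `Yes`.

```python
from collections import defaultdict

class MatchBoard:
    """Records Yes/No choices and reports when both sides have said Yes."""

    def __init__(self):
        # choices[(seeker_id, job_id)] maps "seeker"/"recruiter" to True/False
        self.choices = defaultdict(dict)

    def select(self, seeker_id, job_id, side, interested):
        """Store one side's choice; return True once both sides said Yes."""
        if side not in ("seeker", "recruiter"):
            raise ValueError("side must be 'seeker' or 'recruiter'")
        self.choices[(seeker_id, job_id)][side] = interested
        return self.is_match(seeker_id, job_id)

    def is_match(self, seeker_id, job_id):
        """True only when both the seeker and the recruiter chose Yes."""
        pair = self.choices.get((seeker_id, job_id), {})
        return pair.get("seeker") is True and pair.get("recruiter") is True

board = MatchBoard()
first = board.select("R_82470", "J_915", "seeker", True)      # only one side so far
second = board.select("R_82470", "J_915", "recruiter", True)  # both sides said Yes
declined = board.select("R_82470", "J_6736", "seeker", False)
print(first, second, declined)
```

Storing the individual choices, rather than only the final matches, would also provide the feedback data that a later supervised model could learn from.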

In [54]:
display(HTML("<table><tr><td><img src='images/app1.png'></td><td><img src='images/app2.png'></td><td><img src='images/app3.png'></td><td><img src='images/app4.png'></td></tr></table>"))

# Limitations and Future Work

**1. Unsupervised learning**\
The current metric is based on unsupervised learning. After a successful implementation, we will be able to collect data over time on the success rate of matches or hires based on our recommendations. From there, we can enhance our metric to include supervised learning for better recommendations in the future.

**2. Adapt to local data**\
Currently, both datasets are US-based. Next, we need to explore scraping local data, or otherwise collect data through the app.

**3. Include all functions**\
Include more functions so that the app can cater to a wider community.
FOCUS: General workers or students looking for part-time assignments to earn quick bucks. As there are already many platforms that increase recruitment efficiency for white-collar workers / PMETs / executives, this app aims to quickly match manpower that is needed urgently, such as promoters, buffers, etc. Such roles typically don't require a long interview process, so the chat function allows the employer to quickly connect with the job seeker to provide work information.

**4. Skill sets**\
As we observed earlier in EDA, a job description can contain not only the description of the job but also experience, skills, compensation and benefits, etc. The information we needed for better matching was the skill sets.
We need to clean the data further to capture the 'right' skill sets from multiple features.
Additionally, some functions generally require a wider range of skills, and given the fast-changing environment, we need to plan for maintenance work to keep the set of skills up to date.
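
One simple way to pull skill sets out of free text is to match against a maintained vocabulary. The sketch below is illustrative only: the `SKILLS` set, the `extract_skills` helper and the sample ad text are all hypothetical, and a real vocabulary would need the regular updating mentioned above.

```python
import re

# Hypothetical skill vocabulary; a real one would need regular maintenance
SKILLS = {"sql", "python", "accounts payable", "payroll", "oracle"}

def extract_skills(text, vocabulary=SKILLS):
    """Return the vocabulary entries that appear as whole words or phrases in text."""
    lowered = text.lower()
    found = set()
    for skill in vocabulary:
        # \b keeps "sql" from matching inside a longer word such as "mysqldump"
        if re.search(r"\b" + re.escape(skill) + r"\b", lowered):
            found.add(skill)
    return found

# Made-up job ad text mixing skills with compensation details
ad = "Oracle DBA needed. Must know SQL and Python. Competitive salary and health benefits."
print(sorted(extract_skills(ad)))
```

Because only vocabulary terms are kept, text about salary or benefits drops out, which is the kind of filtering the matching step needs.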