# Job Applications Recommendation System

Let me step through a demo of a machine learning model that recommends job listings to job applicants. For this demo, we use the publicly available [Australian job listings data from Seek job board](https://www.kaggle.com/PromptCloudHQ/australian-job-listings-data-from-seek-job-board/downloads/australian-job-listings-data-from-seek-job-board.zip/1) dataset.

Download the data from the link above and place it in a path the notebook can access.

Next, we perform exploratory data analysis on the dataset.

In [1]:
!unzip australian-job-listings-data-from-seek-job-board.zip
!chmod 755 seek_australia_sample.csv

Archive:  australian-job-listings-data-from-seek-job-board.zip
  inflating: seek_australia_sample.csv  


In [2]:
# Generic imports
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('seek_australia_sample.csv', encoding='latin1')
print(f'Our data set has {df.shape[0]} records and {df.shape[1]} features or columns.')

# Identify initial records in the data
df.head()

Our data set has 20030 records and 13 features or columns.


|   | pageurl | crawl_timestamp | job_title | category | company_name | city | post_date | job_description | job_type | job_board | geo | state | salary_offered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.seek.com.au/job/36028685 | 2018-04-20 03:57:04 +0000 | Package Lead - Pipeline Installation | Mining, Resources & Energy | FIRCROFT AUSTRALIA PTY LTD | Perth | 2018-04-19T05:41:52Z | The Role: General Execution Accountable for sa... | Contract/Temp | seek | AU | | |
| 1 | https://www.seek.com.au/job/36028693 | 2018-04-20 03:52:46 +0000 | Department Manager - Bakery - Campbelltown Region | Retail & Consumer Products | Coles | Sydney | 2018-04-19T05:42:19Z | The role As a Coles Bakery Manager you will: w... | Full Time | seek | AU | South West & M5 Corridor | |
| 2 | https://www.seek.com.au/job/36027858 | 2018-04-20 04:12:01 +0000 | Freight Handler | Manufacturing, Transport & Logistics | Zoom Recruitment & Training | Sydney | 2018-04-19T04:51:51Z | Our client is a leader within the Transport / ... | Casual/Vacation | seek | AU | Parramatta & Western Suburbs | $34 - $39 p.h. |
| 3 | https://www.seek.com.au/job/36028687 | 2018-04-20 03:51:23 +0000 | Warehouse Assistant | Manufacturing, Transport & Logistics | Private Advertiser | Bundaberg & Wide Bay Burnett | 2018-04-19T05:42:02Z | One of our Clients is looking for a Warehouse ... | Full Time | seek | AU | | |
| 4 | https://www.seek.com.au/job/36026414 | 2018-04-20 04:14:42 +0000 | HR Truck Subcontractors | Manufacturing, Transport & Logistics | Sands Fridge Lines | Perth | 2018-04-19T03:22:40Z | Sands Fridge Lines WA is seeking Subcontractor... | Contract/Temp | seek | AU | | |


In [4]:
print('Checking the data consistency')
df.isnull().sum()

Checking the data consistency


pageurl                0
crawl_timestamp        0
job_title              0
category               0
company_name           0
city                   0
post_date              0
job_description    15305
job_type               0
job_board              0
geo                    0
state               6544
salary_offered     14286
dtype: int64

The output above shows that most features are complete, but **job_description**, the key feature for building our recommendation model, has many null values (15,305 of the 20,030 records).

For this demo we proceed with the same data. When we build the actual recommendation system, however, we need to ensure this key field is captured for every job listing.

**salary_offered** (14,286 nulls) and **state** (6,544 nulls) are also missing in many records, so we remove these two columns completely and then drop the remaining rows that contain null values, which are the listings without a **job_description**.

In [5]:
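# Drop the two sparsely populated columns
df.drop(columns=['state','salary_offered'], inplace=True)
# Drop rows that still contain nulls (i.e. listings without a job_description)
df.dropna(inplace=True)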

In [6]:
print(f'After removing empty records our data set has {df.shape[0]} records and {df.shape[1]} features or columns.')
df.isnull().sum()

After removing empty records our data set has 4725 records and 11 features or columns.


pageurl            0
crawl_timestamp    0
job_title          0
category           0
company_name       0
city               0
post_date          0
job_description    0
job_type           0
job_board          0
geo                0
dtype: int64

### Jaccard Similarity

To build our recommendation system, we compare each job posting with the summary or skill set uploaded by the job applicant. For this we use a common text similarity metric, **Jaccard similarity**.

Jaccard similarity, or intersection over union, is defined as the size of the intersection divided by the size of the union of two sets. Because it works on sets, each word is counted only once, which is useful here since both job descriptions and applicant summaries can repeat the same words many times.
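As a worked example, the sketch below computes J(A, B) = |A ∩ B| / |A ∪ B| for two short token lists. The tokens are made up for illustration and are not taken from the dataset; `jaccard` is a hypothetical helper name. Note how the duplicated word "business" contributes only once.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity (intersection over union) of two token collections."""
    a, b = set(tokens_a), set(tokens_b)  # duplicate tokens collapse here
    union = a | b
    if not union:  # both inputs empty: define the similarity as 0.0
        return 0.0
    return len(a & b) / len(union)

# Hypothetical tokens: "business" appears twice in the job text
job_tokens = ["business", "analyst", "requirements", "business"]
user_tokens = ["business", "analyst", "stakeholder"]

# Intersection {business, analyst} has 2 words; the union has 4
print(jaccard(job_tokens, user_tokens))  # → 0.5
```

Using |A ∪ B| = |A| + |B| - |A ∩ B| gives the same denominator without building the union explicitly.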

### Text processing using NLTK

Before we run Jaccard similarity on our data, we have to clean up the text data further.

We clean the text with the Natural Language Toolkit (NLTK) library: tokenizing, lowercasing, stripping punctuation, removing non-alphabetic tokens and stop words, and lemmatizing.

In [7]:
!pip install --upgrade pip
!pip install -U nltk

import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # word_tokenize needs this resource in newer NLTK releases
nltk.download('stopwords')
nltk.download('wordnet')

Requirement already up-to-date: pip in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (19.2.3)
Requirement already up-to-date: nltk in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (3.4.5)


[nltk_data] Downloading package punkt to /home/ec2-user/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /home/ec2-user/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [8]:
import string
table = str.maketrans('','', string.punctuation)

from nltk.tokenize import word_tokenize # Word Tokenizer

from nltk.corpus import stopwords
stop_words = stopwords.words('english')
stop_words = set(stop_words)


from nltk.stem.wordnet import WordNetLemmatizer # Word Lemmatizer
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    """
    Cleaning the document before vectorization.
    """
    # Tokenize by word
    tokens = word_tokenize(text)
    # Make all words lowercase
    lowercase_tokens = [w.lower() for w in tokens]
    # Strip punctuation from within words
    no_punctuation = [x.translate(table) for x in lowercase_tokens]
    # Remove words that aren't alphabetic
    alphabetic = [word for word in no_punctuation if word.isalpha()]
    # Remove stopwords
    no_stop_words = [w for w in alphabetic if w not in stop_words]
    # Lemmatize words
    lemmas = [lemmatizer.lemmatize(word) for word in no_stop_words]
    return lemmas

# Clean up the text
df['cleaned_text'] = df.job_description.apply(clean_text)
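To see what the cleaning steps do without downloading NLTK data, here is a simplified, hypothetical stand-in for `clean_text`. It assumes whitespace splitting instead of `word_tokenize`, uses a deliberately tiny stop-word list instead of NLTK's English list, and skips lemmatization, so its output can differ from the real function.

```python
import string

# Hypothetical, deliberately tiny stop-word list (NLTK's English list is much larger)
STOP_WORDS = {"the", "a", "an", "and", "in", "of", "to"}
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def simple_clean(text):
    """Simplified stand-in for clean_text: whitespace split, no lemmatization."""
    tokens = text.split()                                    # instead of word_tokenize
    lowercase = [t.lower() for t in tokens]
    no_punct = [t.translate(PUNCT_TABLE) for t in lowercase]
    alphabetic = [t for t in no_punct if t.isalpha()]       # drops "" and "2018"
    return [t for t in alphabetic if t not in STOP_WORDS]

print(simple_clean("The Role: Managing the team's budget & reporting in 2018."))
# → ['role', 'managing', 'teams', 'budget', 'reporting']
```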

### Job Applicant Input

The cell below contains sample user information, which serves as input to the recommendation system.

Since job descriptions can overlap across different job titles, we also ask the user to enter the specific job title they are looking for.

In [9]:
# 1st Sample User Information
user_title = "Business Analyst"
user_info_summary = """Detail-oriented and proactive Business Analyst with a history of involvement in IT, supply chain and CRM projects.

Always positive, team-focused and actively striving to build my domain knowledge and technical skills to deliver successful outcomes in complex project environments.

BA techniques and underlying competencies include:
Requirements elicitation and documentation
User stories, use cases, feature mapping
Process modelling as-is & to-be
Scope analysis
Stakeholder management
Interviews
Requirements traceability
Data mapping
System testing
Business integration
User guides & training
Problem solving
Creative thinking
Teamwork
Excellent written and oral communication
Facilitation
Leadership
Mentoring (IBL students)"""

# Clean up the user input
cleaned_user_summary = clean_text(user_info_summary)

In [10]:
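def get_jaccard_sim(tokens1, tokens2):
    """Jaccard similarity (intersection over union) of two token lists."""
    a = set(tokens1)
    b = set(tokens2)
    c = a.intersection(b)
    union_size = len(a) + len(b) - len(c)
    # Guard against two empty token lists, which would otherwise divide by zero
    return float(len(c)) / union_size if union_size else 0.0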

In [11]:
df_match_by_title = df[df['job_title']==user_title].copy()
df_match_by_title['jaccard_sim_value'] = df_match_by_title.cleaned_text.apply(get_jaccard_sim, args=(cleaned_user_summary,))

sort_by_jaccard_sim = df_match_by_title.sort_values('jaccard_sim_value', ascending=False)
sort_by_jaccard_sim.head(5)

|   | pageurl | crawl_timestamp | job_title | category | company_name | city | post_date | job_description | job_type | job_board | geo | cleaned_text | jaccard_sim_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4085 | https://www.seek.com.au/job/36033394 | 2018-04-20 02:24:15 +0000 | Business Analyst | Information & Communication Technology | Ignite | ACT | 2018-04-19T23:55:07Z | This Job Profile Business Analysts to assist w... | Contract/Temp | seek | AU | [job, profile, business, analyst, assist, busi... | 0.161538 |
| 3972 | https://www.seek.com.au/job/36032382 | 2018-04-20 02:38:44 +0000 | Business Analyst | Information & Communication Technology | Compas Pty Ltd | ACT | 2018-04-19T22:30:08Z | Are you a Business Analyst with proven experie... | Contract/Temp | seek | AU | [business, analyst, proven, experience, busine... | 0.13986 |
| 3515 | https://www.seek.com.au/job/36034158 | 2018-04-20 02:15:14 +0000 | Business Analyst | Information & Communication Technology | Talent  Winner Seek Large Recruitment Agency... | ACT | 2018-04-20T00:44:07Z | A large government department is seeking to en... | Contract/Temp | seek | AU | [large, government, department, seeking, engag... | 0.132075 |
| 3410 | https://www.seek.com.au/job/36029143 | 2018-04-20 03:31:55 +0000 | Business Analyst | Accounting | HRMatrix Pty Ltd | Perth | 2018-04-19T06:06:44Z | The Company The CFO of this established ASX li... | Contract/Temp | seek | AU | [company, cfo, established, asx, listed, busin... | 0.104972 |
| 18145 | https://www.seek.co.nz/job/35915713 | 2018-04-06 07:48:15 +0000 | Business Analyst | Accounting | Hudson New Zealand | Auckland | 2018-04-05T22:33:00Z | The company Our client is an iconic charitable... | Full Time | seek | NZ | [company, client, iconic, charitable, organisa... | 0.092593 |


Using more of the job applicant's input, we can make the recommendations more specific.

Here the user has entered a preferred job type in addition to the job title. We use both to filter the listings down to the relevant jobs before ranking them by Jaccard similarity.

In [12]:
# 2nd Sample User Information
user_preferred_job_type = "Full Time"
user_title = "Senior Accountant"
user_info_summary = """Seasoned finance professional with 7 years of accelerating career in management finance and accounting and analytical roles:
Proficient in overall financial management & analysis, revenue recognition, accrual accounting, Statutory audit, US GAAP reporting, SOX compliance, intercompany consolidation, risk mitigation, fiscal planning, budgeting and reporting, tax strategies;

Hold a Certification in Accounting from California, Bachelors degree and hands on with , NetSuite ERP, SAP Finance, Xero, MYOB and QuickBooks;

Experience working for listed multinational companies with internal controls in various geographic locations brings immense cultural exposure, leadership and a charismatic personality. """
# Clean up the user input
cleaned_user_summary = clean_text(user_info_summary)

In [13]:
df_match = df[(df['job_title'] == user_title) & (df['job_type']==user_preferred_job_type)].copy()
df_match['jaccard_sim_value'] = df_match.cleaned_text.apply(get_jaccard_sim, args=(cleaned_user_summary,))

sort_by_jaccard_sim = df_match.sort_values('jaccard_sim_value', ascending=False)
sort_by_jaccard_sim.head(5)

|   | pageurl | crawl_timestamp | job_title | category | company_name | city | post_date | job_description | job_type | job_board | geo | cleaned_text | jaccard_sim_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18268 | https://www.seek.co.nz/job/35911481 | 2018-04-06 08:12:28 +0000 | Senior Accountant | Accounting | Private Advertiser | Auckland | 2018-04-05T05:26:45Z | The Role Management and reviewing work of a sm... | Full Time | seek | NZ | [role, management, reviewing, work, small, tea... | 0.083916 |
| 3880 | https://www.seek.com.au/job/36032379 | 2018-04-20 02:55:23 +0000 | Senior Accountant | Accounting | Hays Accountancy & Finance | Melbourne | 2018-04-19T22:29:45Z | Opportunity for Senior Accountant available at... | Full Time | seek | AU | [opportunity, senior, accountant, available, h... | 0.076923 |
| 3198 | https://www.seek.com.au/job/36030624 | 2018-04-20 03:24:34 +0000 | Senior Accountant | Accounting | Hays Accountancy & Finance | Melbourne | 2018-04-19T07:39:25Z | This progressive multi partner firm is seeking... | Full Time | seek | AU | [progressive, multi, partner, firm, seeking, a... | 0.069519 |
| 670 | https://www.seek.com.au/job/36026095 | 2018-04-20 04:27:12 +0000 | Senior Accountant | Accounting | Hays Accountancy & Finance | Melbourne | 2018-04-19T02:54:43Z | Excellent opportunity to work with ex mid-tier... | Full Time | seek | AU | [excellent, opportunity, work, ex, midtier, pa... | 0.066298 |
| 17579 | https://www.seek.com.au/job/36167489 | 2018-05-09 04:34:15 +0000 | Senior Accountant | Accounting | Ambition Finance | Sydney | 2018-05-08T01:44:37Z | My client is one of the largest players in the... | Full Time | seek | AU | [client, one, largest, player, field, especial... | 0.06383 |
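The two examples above follow the same pattern: filter the listings, then rank the matches by Jaccard similarity. As a sketch, that pattern can be wrapped in one function. `recommend_jobs` is a hypothetical helper, not part of the original notebook; it assumes `cleaned_text` holds token lists like those returned by `clean_text`, matches titles case-insensitively (the cells above use an exact match), and accepts extra exact-match filters such as `job_type` or `city` as keyword arguments. The toy DataFrame is made up for illustration.

```python
import pandas as pd

def recommend_jobs(jobs, user_tokens, title, top_n=5, **filters):
    """Hypothetical helper: filter listings, then rank them by Jaccard similarity.

    jobs        -- DataFrame with 'job_title' and 'cleaned_text' (token list) columns
    user_tokens -- cleaned applicant summary (list of tokens)
    filters     -- optional exact-match column filters, e.g. job_type="Full Time"
    """
    user_set = set(user_tokens)

    def jaccard(tokens):
        doc_set = set(tokens)
        union = doc_set | user_set
        return len(doc_set & user_set) / len(union) if union else 0.0

    # Case-insensitive title match (an assumption; the notebook cells use ==)
    mask = jobs["job_title"].str.strip().str.lower() == title.strip().lower()
    for column, value in filters.items():
        mask &= jobs[column] == value
    matches = jobs[mask].copy()
    matches["jaccard_sim_value"] = matches["cleaned_text"].apply(jaccard)
    return matches.sort_values("jaccard_sim_value", ascending=False).head(top_n)

# Made-up listings for illustration
toy = pd.DataFrame({
    "job_title": ["Senior Accountant", "senior accountant", "Business Analyst"],
    "job_type": ["Full Time", "Contract/Temp", "Full Time"],
    "cleaned_text": [["audit", "tax", "xero"], ["audit", "budgeting"], ["requirements"]],
})

# Only row 0 matches both the title and the job type; its score is 2/3
print(recommend_jobs(toy, ["audit", "tax"], "Senior Accountant", job_type="Full Time"))
```

Passing filters as keyword arguments lets new criteria (such as `city`) be added without changing the function's signature.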


### Demo Conclusion

Similarly, we can add more filters to the job recommendations, for example on **city** or **category**.

If job applicants provide feedback on earlier recommendations, we can also use that feedback to build recommendations based on collaborative filtering, a subsequent step that depends on the data available.
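As a minimal sketch of that next step, assuming feedback is collected as a binary applicant-by-listing matrix, user-based collaborative filtering scores unseen listings by how similar applicants rated them. The matrix, the function name `user_based_scores`, and the choice of cosine similarity between applicants are all illustrative assumptions, not part of the original demo.

```python
import numpy as np

# Hypothetical feedback matrix: rows = applicants, columns = job listings.
# 1 = applicant marked the recommendation as relevant, 0 = no positive feedback.
feedback = np.array([
    [1, 1, 0, 0],   # applicant 0
    [1, 1, 1, 0],   # applicant 1
    [0, 0, 1, 1],   # applicant 2
], dtype=float)

def user_based_scores(feedback, user):
    """Score listings for `user` by cosine-similarity-weighted feedback of other applicants."""
    norms = np.linalg.norm(feedback, axis=1)
    norms[norms == 0] = 1.0                         # avoid division by zero for empty rows
    sims = feedback @ feedback[user] / (norms * norms[user])
    sims[user] = 0.0                                # ignore the applicant's own row
    scores = sims @ feedback                        # weighted sum of neighbours' feedback
    scores[feedback[user] > 0] = -np.inf            # do not re-recommend listings already liked
    return scores

scores = user_based_scores(feedback, user=0)
print(int(np.argmax(scores)))  # → 2 (liked by applicant 1, the most similar applicant)
```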