<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria" align="center">Resume Screening using NLP+Different ML Algorithms</h1> 

**A summary of resume screening:**

- **1. Definition**: Resume screening is the process of determining whether a candidate is qualified for a role based on his or her education, experience, and other information captured on their resume.

- **2. How to screen resumes**: First, screen resumes based on the job’s minimum qualifications. Second, screen resumes based on the job’s preferred qualifications. Third, screen resumes based on the shortlist of candidates you want to move on to the interview phase.

- **3. The challenges recruiters face while screening resumes**: The high volume of resumes received – up to 88% of them are unqualified – greatly increases time to fill. Recruiters face increased pressure to show quality of hire but lack tools to link their resume screening to post-hire metrics.

- **4. Tech innovations in resume screening**: Intelligent resume screening uses AI to learn from historical hiring decisions in order to improve quality of hire and reduce employee turnover.

<a href="https://ideal.com/resume-screening/#:~:text=Resume%20screening%20is%20the%20process,candidate%20based%20on%20their%20resume" target="_blank" rel="noopener noreferrer">Source</a> 

In this project, machine learning models are developed for the resume screening task.

<div class="list-group" id="list-tab" role="tablist">
        <h3 class="list-group-item list-group-item-action active" style="background-color:#1E90FF; color:white" data-toggle="list"  role="tab" aria-controls="home" style="font-family: Cambria">Notebook Content</h3>     
        <a class="list-group-item list-group-item-action list-group-item-info" data-toggle="list" href="#import" role="tab" aria-controls="profile" style="font-family: Cambria">Importing Basic Libraries and Loading Dataset<span class="badge badge-primary badge-pill" style="background-color:#1E90FF; color:white">1</span></a>      
        <a class="list-group-item list-group-item-action list-group-item-info" data-toggle="list" href="#understand" role="tab" aria-controls="profile" style="font-family: Cambria">Understanding Dataset<span class="badge badge-primary badge-pill" style="background-color:#1E90FF; color:white">2</span></a>
        <a class="list-group-item list-group-item-action list-group-item-info" data-toggle="list" href="#prep" role="tab" aria-controls="profile" style="font-family: Cambria">Preprocessing<span class="badge badge-primary badge-pill" style="background-color:#1E90FF; color:white">3</span></a>  
        <a class="list-group-item list-group-item-action list-group-item-info" data-toggle="list" href="#model" role="tab" aria-controls="profile" style="font-family: Cambria">Building Models<span class="badge badge-primary badge-pill" style="background-color:#1E90FF; color:white">4</span></a> 
        <a class="list-group-item list-group-item-action list-group-item-info" data-toggle="list" href="#cross" role="tab" aria-controls="profile" style="font-family: Cambria">Cross Validation for Models<span class="badge badge-primary badge-pill" style="background-color:#1E90FF; color:white">5</span></a> 
</div>

<a id='import'></a>
<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria">Importing Basic Libraries and Loading Dataset</h1> 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')


In [None]:
df= pd.read_csv('UpdatedResumeDataSet.csv')
df.head()

<a id='understand'></a>
<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria">Understanding Dataset</h1> 

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df['Category'].unique()

In [None]:
df['Category'].nunique()

In [None]:
categories = df['Category'].value_counts().reset_index()
categories

In [None]:
plt.figure(figsize=(25,8))
plt.xticks(rotation=60)
# count plot on single categorical variable
sns.countplot(x ='Category', data= df, order= df['Category'].value_counts().index)
 
# Show the plot
plt.show()

In [None]:
plt.figure(figsize=(25,8))

#define Seaborn color palette to use
colors= sns.color_palette('bright')[0:5]

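#create pie chart; counts and labels come straight from value_counts(),
#because the column names produced by reset_index() differ across pandas versions
category_counts = df['Category'].value_counts()
plt.pie(category_counts, labels= category_counts.index, colors = colors, autopct='%.1f%%')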
plt.show()

<a id='prep'></a>
<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria">Preprocessing</h1> 

Let's create a helper function to remove URLs, hashtags, mentions, special characters, and punctuation.

First, let's add a new column for the cleaned text:

In [None]:
df1= df.copy()
df1['cleaned_resume']= ""
df1

Function:

In [None]:
import re
import string

def clean_function(resumeText):
    resumeText = re.sub(r'http\S+\s*', ' ', resumeText)  # remove URLs
    # remove standalone RT and cc; word boundaries keep words such as "access" intact
    resumeText = re.sub(r'\b(?:RT|cc)\b', ' ', resumeText)
    resumeText = re.sub(r'#\S+', '', resumeText)  # remove hashtags
    resumeText = re.sub(r'@\S+', '  ', resumeText)  # remove mentions
    resumeText = re.sub('[%s]' % re.escape(string.punctuation), ' ', resumeText)  # remove punctuation
    resumeText = re.sub(r'[^\x00-\x7f]', ' ', resumeText)  # remove non-ASCII characters
    resumeText = re.sub(r'\s+', ' ', resumeText)  # remove extra whitespace
    return resumeText
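
A quick way to sanity-check a cleaning helper like this is to run individual substitutions on a short sample string. The sketch below is self-contained and uses a made-up sentence. It illustrates one design point only: short tokens such as `cc` are safer to remove with word boundaries, because an unanchored pattern also matches inside ordinary words.

```python
import re

# Made-up sample text, for illustration only
sample = "Improved access to SMART accounts, cc RT team"

# Unanchored pattern: also matches the letters inside longer words
unanchored = re.sub(r'RT|cc', ' ', sample)

# Anchored pattern: only matches RT / cc when they stand alone as tokens
anchored = re.sub(r'\b(?:RT|cc)\b', ' ', sample)

# Collapse leftover whitespace and trim the ends before comparing
unanchored = re.sub(r'\s+', ' ', unanchored).strip()
anchored = re.sub(r'\s+', ' ', anchored).strip()

print(unanchored)
print(anchored)
```

Comparing the two printed lines shows whether words like "access" survive the cleaning step.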

Let's apply it to the Resume column:

In [None]:
df1['cleaned_resume'] = df1['Resume'].apply(clean_function)
df1.head()

Let's encode the Category column:

In [None]:
from sklearn.preprocessing import LabelEncoder
df2= df1.copy()
df2['Category']= LabelEncoder().fit_transform(df2['Category'])
df2.head()
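
The cell above calls `LabelEncoder().fit_transform(...)` inline, so the fitted encoder is not kept. If the original category names are needed later (for example, to read predictions), one option is to hold on to the encoder and use `inverse_transform`. A minimal sketch on made-up labels, not the dataset's actual categories:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up category labels, for illustration only
labels = ["Data Science", "HR", "Java Developer", "HR"]

# Keep a reference to the fitted encoder so codes can be mapped back later
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(encoded)                             # integer codes, assigned in sorted label order
print(encoder.classes_)                    # code -> label lookup
print(encoder.inverse_transform(encoded))  # back to the original strings
```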

Let's create a word cloud:

In [None]:
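import nltk
from nltk.corpus import stopwords
import string
from wordcloud import WordCloud
# Fetch the stop-word list and tokenizer models (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('stopwords', quiet=True)
nltk.download('punkt', quiet=True)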

In [None]:
#Stop words are generally the most common words in a language.
#English stop words from nltk:
SetOfStopWords= set(stopwords.words('english')+['``',"''"])

In [None]:
totalWords= []

In [None]:
Sentences= df2['Resume'].values

In [None]:
cleanedSentences= ""

In [None]:
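for records in Sentences:
    cleanedText= clean_function(records)
    # add a space so the last word of one resume is not glued to the first word of the next
    cleanedSentences += cleanedText + " "
    requiredWords = nltk.word_tokenize(cleanedText)
    for word in requiredWords:
        # NLTK stop words are lowercase, so compare in lowercase
        if word.lower() not in SetOfStopWords and word not in string.punctuation:
            totalWords.append(word)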

In [None]:
wordfreqdist = nltk.FreqDist(totalWords)

In [None]:
wordfreqdist

In [None]:
mostcommon = wordfreqdist.most_common(30)

In [None]:
mostcommon
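
The frequency count above can also be sketched without any NLTK downloads. The example below uses `collections.Counter` in place of `nltk.FreqDist`, a plain `split()` in place of `nltk.word_tokenize`, and a tiny hand-written stop-word set. All three are assumptions made for this sketch, and the sample texts are made up.

```python
from collections import Counter

# Tiny hand-written stop-word set (an assumption for this sketch, not NLTK's list)
stop_words = {"and", "in", "with", "of", "the"}

# Made-up cleaned resume snippets
cleaned_resumes = [
    "Experience in Python and SQL",
    "Python developer with experience in machine learning",
]

tokens = []
for text in cleaned_resumes:
    for word in text.split():
        # Compare in lowercase so capitalised stop words are filtered too
        if word.lower() not in stop_words:
            tokens.append(word.lower())

# Counter plays the role of the frequency distribution
word_counts = Counter(tokens)
print(word_counts.most_common(3))
```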

In [None]:
# Use a separate variable name so the WordCloud class is not overwritten
wordcloud_image = WordCloud().generate(cleanedSentences)
plt.figure(figsize=(10,10))
plt.imshow(wordcloud_image, interpolation='bilinear')
plt.axis("off")
plt.show()

<a id='model'></a>
<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria">Building Models</h1>

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
Text= df2['cleaned_resume'].values
Target= df2['Category'].values

Here we will preprocess and convert the ‘cleaned_resume’ column into vectors. We will be using the ‘Tf-Idf’ method to get the vectors:

In [None]:
word_vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english')
word_vectorizer.fit(Text)
WordFeatures= word_vectorizer.transform(Text)

After this step, we have ‘WordFeatures’ as the feature vectors and ‘Target’ as the labels.
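
To make the vectorization step concrete, here is a small self-contained sketch on three made-up resume snippets, using the same `TfidfVectorizer` settings as above. With `sublinear_tf=True`, scikit-learn replaces the raw term frequency `tf` with `1 + log(tf)`, and by default each row is L2-normalized.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up resume snippets, for illustration only
toy_resumes = [
    "python developer with machine learning experience",
    "java developer with spring experience",
    "hr manager with recruitment experience",
]

vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english')
features = vectorizer.fit_transform(toy_resumes)

# One row per document, one column per vocabulary term
print(features.shape)
print(sorted(vectorizer.vocabulary_))

# Each row has unit length under the default norm='l2'
row_norms = np.sqrt(np.asarray(features.multiply(features).sum(axis=1))).ravel()
print(row_norms)
```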

In [None]:
WordFeatures.shape

Let’s split the data into training and test sets:

In [None]:
X_train,X_test,y_train,y_test= train_test_split(WordFeatures, Target, random_state=42)

In [None]:
print(X_train.shape)
print(X_test.shape)
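
Note that the vectorizer above is fitted on all resumes before the split, so the vocabulary and IDF weights also reflect the test rows. One alternative arrangement, sketched here on made-up data rather than the notebook's dataset, is to split the raw text first and wrap the vectorizer and classifier in a `Pipeline`, so that fitting only ever sees the training portion. The texts, labels, and the choice of `LogisticRegression` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Made-up texts and labels, for illustration only
texts = [
    "python machine learning pandas", "deep learning python models",
    "statistics python data analysis", "python numpy data science",
    "recruitment hiring payroll", "employee relations hiring policy",
    "payroll benefits onboarding", "talent acquisition recruitment",
]
labels = ["Data Science"] * 4 + ["HR"] * 4

# Split the raw text first, keeping class proportions in both parts
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training texts only
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(sublinear_tf=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_texts, y_train)

predictions = pipeline.predict(test_texts)
print(len(train_texts), len(test_texts))
print(list(predictions))
```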

We have split the data into training and test sets; now let’s build the models:

In [None]:
from sklearn.multiclass import OneVsRestClassifier

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
models = {
    'K-Nearest Neighbors' : KNeighborsClassifier(),
    'Logistic Regression' : LogisticRegression(),
    'Support Vector Machine' : SVC(),
    'Random Forest' : RandomForestClassifier()    
}

In [None]:
model_list=[]
for model in models.values():
    model_list.append(OneVsRestClassifier(model))
model_list
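
`OneVsRestClassifier` fits one binary classifier per class (that class versus all the others) and predicts the class whose classifier is most confident. The sketch below shows this on synthetic data generated with `make_classification`; the data and the `max_iter` value are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class problem, for illustration only
X, y = make_classification(
    n_samples=150, n_features=8, n_informative=4, n_classes=3, random_state=0
)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

# One fitted binary estimator per class
print(len(ovr.estimators_))
print(ovr.classes_)
```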

In [None]:
for i in model_list:
    i.fit(X_train, y_train)
    print(f'{i} trained')

print("*"*60)
print("all models trained")

In [None]:
for model in model_list:
    print(f"Accuracy of {model} on training set :", model.score(X_train, y_train))
    print(f"Accuracy of {model} on test set :", model.score(X_test, y_test))
    print("*"*100)

print("all scores calculated")

In [None]:
from sklearn.metrics import confusion_matrix as CM
from sklearn.metrics import classification_report

from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

In [None]:
for model in model_list:
    print(f'{model} classification report')
    print("-"*80)
    print(classification_report(y_test, model.predict(X_test)))
    print("*"*100)
    print(" ")
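
The notebook imports `confusion_matrix` (as `CM`), but the cells shown do not call it. As a complement to the classification report, here is a minimal sketch of how a confusion matrix is read, using made-up true and predicted labels rather than the models' actual output.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for three classes, for illustration only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Correct predictions sit on the diagonal
accuracy = cm.trace() / cm.sum()
print(accuracy)
```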

<a id='cross'></a>
<h1 style="background-color:#1E90FF; color:white" style="font-family: Cambria">Cross Validation for Models</h1>

In [None]:
from sklearn.model_selection import cross_val_score, KFold

In [None]:
results = {}

kf = KFold(n_splits= 10)

for model in model_list:
    result = cross_val_score(model, X_train, y_train, scoring= 'accuracy', cv= kf)
    results[model] = result

In [None]:
print("accuracy scores")
print("*********************************")
for name, result in results.items():
   
    print(f'{name} : {round(np.mean(result),3)}')
    print("----------------")
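
The cross-validation above uses a plain `KFold` without shuffling, so the folds follow row order. For classification, a commonly used alternative is `StratifiedKFold` with shuffling, which keeps class proportions roughly equal in every fold. The sketch below runs on synthetic data; the data, the model, and the number of splits are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 4-class problem, for illustration only
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=4, random_state=0
)

# Shuffled, stratified folds: each fold keeps roughly the same class mix
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, scoring="accuracy", cv=cv
)

print(scores)
print(round(scores.mean(), 3))
```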

This project showed how different machine learning algorithms can be applied to build a resume screening system.

The models classified almost 1,000 resumes into their respective categories in a few minutes, with 99% accuracy.

<div style="color:white; font-size:125%; text-align:left; display:fill; border-radius:5px; background-color:#1E90FF; overflow:hidden">Thanks for reading. I hope you enjoyed it and found it helpful.<br>Please don't forget to follow me and give an upvote on<br>
👇👇👇
</div>

**<a href="https://www.kaggle.com/drindeng/" target="_blank" rel="noopener noreferrer">[Kaggle]</a> | 
<a href="https://github.com/drindeng" target="_blank" rel="noopener noreferrer">[GitHub]</a> |
<a href="https://www.linkedin.com/in/turgay-turker/" target="_blank" rel="noopener noreferrer">[Linkedin]</a>**