# AI Resume Analyzer

This project is a tool for analyzing resumes using natural language processing (NLP) techniques. The goal is to help recruiters and hiring managers quickly identify the most qualified candidates for a job opening. The tool extracts skills and qualifications from a resume, and matches them against a list of desired skills provided by the user.


# Code Explanation

The code is divided into several blocks, each performing a specific task:

- Block 1: Imports the required libraries and packages.
- Block 2: Loads the NLP model and defines a function for extracting skills from a resume.
- Block 3: Reads the resume file and preprocesses the data.
- Block 4: Extracts the skills from the resume and creates a TF-IDF matrix.
- Block 5: Trains a nearest neighbors model on the TF-IDF matrix.
- Block 6: Defines a function for finding the best match for a given set of skills.
- Block 7: Runs the function on the test data and displays the results.


# Required Libraries and Packages

In [2]:
from pyresparser import ResumeParser
import os
from docx import Document
import numpy as np
import pandas as pd
import nltk
from nltk.corpus import stopwords

In [3]:
df =pd.read_csv('jobs.csv') 
df

| Unnamed: 0.1 | Unnamed: 0 | url | Position | Company | Location | Job_Description |
|---|---|---|---|---|---|---|
| 0 | 0 | https://www.glassdoor.co.in/partner/jobListing... | Software Testing Internship | Smart Food Safe Solutions Inc | – Bengaluru | About the company:\nSmart Food Safe Solutions ... |
| 1 | 1 | https://www.glassdoor.co.in/partner/jobListing... | Embedded Software Testing | Mobiveil | – Bengaluru | Location : Bangalore\nExperience : 4+ Years\n\... |
| 2 | 2 | https://www.glassdoor.co.in/partner/jobListing... | Senior Engineer - Software Testing (Bangalore ... | Open Systems International | – Bengaluru | Open Systems International, Inc. (OSI) www.osi... |
| 3 | 3 | https://www.glassdoor.co.in/partner/jobListing... | Software Testing Engineer | Bloom Solutions | – Bengaluru | About the Job\n\nSoftware Testing Engineer\n\n... |
| 4 | 4 | https://www.glassdoor.co.in/partner/jobListing... | CIEL/SEL/1888: Software testing Engineer | CIEL HR Services | – Bengaluru | Location: Bangalore\nExperience: 3 to 6Years\n... |
| ... | ... | ... | ... | ... | ... | ... |
| 1919 | 3113 | https://www.glassdoor.co.in/partner/jobListing... | Front End Developer | Cuemath | Bengaluru | Skills and Qualifications:\n\n2+ Years of expe... |
| 1920 | 3115 | https://www.glassdoor.co.in/partner/jobListing... | Technology Lead-Sharepoint Developer | Infogain | Bengaluru | Job ID : TH10519_13189\n\nPosted on: 29th of M... |
| 1921 | 3120 | https://www.glassdoor.co.in/partner/jobListing... | Senior UI Developer | Siemens PLC | Bengaluru | Job Description\nWe spend 90 percent of\n\nour... |
| 1922 | 3122 | https://www.glassdoor.co.in/partner/jobListing... | Web Developer | Fidelity Investments | Bengaluru | (Job Number: 1905027)\n\nJob Title – Web Dev... |


In [4]:
stopw  = set(stopwords.words('english'))
df['test']=df['Job_Description'].apply(lambda x: ' '.join([word for word in str(x).split() if len(word)>2 and word not in (stopw)]))
df['test']

0       About company: Smart Food Safe Solutions Inc. ...
1       Location Bangalore Experience Years Job Descri...
2       Open Systems International, Inc. (OSI) www.osi...
3       About Job Software Testing Engineer Job Descri...
4       Location: Bangalore Experience: 6Years Skills ...
                              ...                        
1919    Skills Qualifications: Years experience Strong...
1920    Job TH10519_13189 Posted on: 29th May, 2019Job...
1921    Job Description spend percent lives buildings....
1922    (Job Number: 1905027) Job Title – Web Develo...
1923    marry design engineering language ways produce...
Name: test, Length: 1924, dtype: object
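
The output above still begins with "About", which is consistent with a case-sensitive lookup against a lowercase stopword list. The following is a minimal sketch of that behaviour, not the notebook's code: `STOPWORDS` is a small hypothetical stand-in for the NLTK list, and `filter_words_casefold` is an illustrative variant that lowercases each token before the lookup.

```python
# Hypothetical stand-in for NLTK's English stopword list (assumed to be all lowercase)
STOPWORDS = {"the", "and", "about", "for", "with"}


def filter_words(text, stopwords=STOPWORDS):
    """Keep whitespace-separated tokens longer than 2 characters that are not stopwords."""
    return " ".join(
        w for w in str(text).split() if len(w) > 2 and w not in stopwords
    )


def filter_words_casefold(text, stopwords=STOPWORDS):
    """Same filter, but lowercase each token before the stopword lookup."""
    return " ".join(
        w for w in str(text).split() if len(w) > 2 and w.lower() not in stopwords
    )


sample = "About the company: 4+ Years of QA"
case_sensitive = filter_words(sample)             # "About" survives the filter
case_insensitive = filter_words_casefold(sample)  # "About" is removed
print(case_sensitive)
print(case_insensitive)
```

The length filter also drops short tokens such as "4+" and "QA", which may or may not be desirable for skill matching.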

In [5]:
import spacy

nlp = spacy.load('en_core_web_sm')

# File Format

The tool supports three file formats for resumes: .txt, .docx, and .pdf. The file should contain the candidate's name, contact information, education, work experience, and skills.
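
As an illustration, the supported formats could be checked up front before any parsing is attempted. This is only a sketch under assumptions: `resume_format` and `SUPPORTED_EXTENSIONS` are hypothetical names that do not appear in the notebook, and the check looks at the file extension only, not at the file contents.

```python
from pathlib import Path

# Extensions named in this section; the set itself is an illustrative helper
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".pdf"}


def resume_format(path):
    """Return the lowercased extension of a resume path, or raise ValueError if unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported resume format: {ext or '(none)'}")
    return ext


print(resume_format("resume.pdf"))
print(resume_format("CV.DOCX"))
```

Validating the extension first gives the user a clear error message instead of a parser failure further down.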


In [6]:
# Prompt the user to specify the file path of a resume in .txt, .docx, or .pdf format
filed = input('Specify the path of the resume (format(.txt, .docx and .pdf)): ')

try:
    # Plain-text path: read the file as text, wrap its contents in a new Word document,
    # save it as .docx, and extract the data from that document
    doc = Document()
    with open(filed, 'r') as file:
        doc.add_paragraph(file.read())
    doc.save("text.docx")
    data1 = ResumeParser('text.docx').get_extracted_data()

except Exception:
    # Reading the file as text failed (typically a binary .docx or .pdf),
    # so let the parser read the original file directly
    data1 = ResumeParser(filed).get_extracted_data()
    
# Extract the 'skills' section from the extracted data and store it in the 'resume' variable
resume = data1['skills']

Specify the path of the resume (format(.txt, .docx and .pdf)): resume.pdf




In [7]:
print(resume)

['Html', 'Engineering', 'Ibm', 'Aws', 'Algorithms', 'C', 'Technical', 'Javascript', 'Css', 'Video', 'Api', 'Machine learning', 'English', 'Audio', 'Python', 'C++', 'Coding']


In [8]:
skills=[]
skills.append(' '.join(word for word in resume))
skills

['Html Engineering Ibm Aws Algorithms C Technical Javascript Css Video Api Machine learning English Audio Python C++ Coding']

In [9]:
import re
from ftfy import fix_text

def ngrams(string, n=3):
    # fix any Unicode encoding issues
    string = fix_text(string)
    # remove non-ASCII characters
    string = string.encode("ascii", errors="ignore").decode()
    # convert to lowercase
    string = string.lower()
    # remove specified characters using regex
    chars_to_remove = [")","(",".","|","[","]","{","}","'"]
    rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
    string = re.sub(rx, '', string)
    # replace '&' with 'and'
    string = string.replace('&', 'and')
    # replace ',' with a space
    string = string.replace(',', ' ')
    # replace '-' with a space
    string = string.replace('-', ' ')
    # normalise case - capital at start of each word
    string = string.title()
    # replace multiple spaces with a single space
    string = re.sub(' +',' ',string).strip()
    # pad names for ngrams
    string = ' '+ string +' '
    # remove certain characters
    string = re.sub(r'[,-./]|\sBD',r'', string)
    # create n-grams
    ngrams = zip(*[string[i:] for i in range(n)])
    # concatenate the n-grams into strings and return them as a list
    return [''.join(ngram) for ngram in ngrams]
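
To make the `zip` idiom in the last step easier to follow, here is a stripped-down sketch that performs only the n-gram step, without any of the cleaning. The helper name `char_ngrams` is an assumption for illustration.

```python
def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of text, in order."""
    # zip stops at the shortest of the n shifted copies, which yields len(text) - n + 1 n-grams
    return ["".join(gram) for gram in zip(*[text[i:] for i in range(n)])]


padded = char_ngrams(" Python ")        # padded with one space on each side
single_unpadded = char_ngrams("C")      # too short for any 3-gram
single_padded = char_ngrams(" C ")      # padding makes one 3-gram possible
print(padded)
print(single_unpadded, single_padded)
```

The padding matters for very short skills: without it, a one-letter skill such as "C" would produce no n-grams at all.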


In [10]:
# Import the TfidfVectorizer class from scikit-learn's feature extraction module
from sklearn.feature_extraction.text import TfidfVectorizer

# Define the vectorizer object with minimum document frequency of 1 and ngram analyzer
vectorizer = TfidfVectorizer(min_df=1, analyzer=ngrams, lowercase=False)

# Use the fit_transform method to learn the vocabulary dictionary and return a term-document matrix
# The input is the list of skills, which will be transformed into a matrix of TF-IDF features
tfidf = vectorizer.fit_transform(skills)

# Because 'skills' contains a single string (all resume skills joined together), the resulting sparse matrix has shape (1, n_features)
# The single row represents the resume and each column represents a character 3-gram found in the skill text
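
The sketch below reproduces the TF-IDF weighting in plain Python, assuming scikit-learn's documented defaults (smoothed IDF and L2 row normalisation), which the cell above does not override. The function `tfidf_rows` and the toy n-gram list are illustrative assumptions, not part of the notebook. It shows one consequence of fitting on a single document: every IDF weight equals 1, so the resume vector is just its normalised n-gram counts.

```python
import math
from collections import Counter


def tfidf_rows(docs_as_terms):
    """TF-IDF with smoothed IDF and L2 row normalisation for pre-tokenised documents."""
    n_docs = len(docs_as_terms)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for terms in docs_as_terms for term in set(terms))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in doc_freq.items()}
    rows = []
    for terms in docs_as_terms:
        counts = Counter(terms)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # Guard against an empty document, whose norm would be zero
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return idf, rows


# A single toy "document" made of character 3-grams
idf, rows = tfidf_rows([[" Py", "Pyt", "yth", "Pyt"]])
print(idf)
print(rows[0])
```

With more than one fitted document, n-grams shared by many documents would receive lower IDF weights than rare ones.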

In [11]:
# Import the NearestNeighbors class from scikit-learn's neighbors module
from sklearn.neighbors import NearestNeighbors

# Define the nearest neighbors object with k=1 (i.e., find the nearest neighbor) and use all available CPUs for parallelism
nbrs = NearestNeighbors(n_neighbors=1, n_jobs=-1).fit(tfidf)

# Define the input data for the nearest neighbors search
# Here, the input data is the preprocessed and filtered text in the 'test' column of the dataframe 'df'
test = (df['test'].values.astype('U'))

In [12]:
# Define a function to get the nearest neighbor(s) of a given query
def getNearestN(query):
    # Transform the query text into a TF-IDF vector using the same vectorizer used for the skills
    queryTFIDF_ = vectorizer.transform(query)
    # Use the fitted nearest neighbors model to find the nearest neighbor(s) of the query TF-IDF vector
    distances, indices = nbrs.kneighbors(queryTFIDF_)
    # Return the distances and indices of the nearest neighbor(s)
    return distances, indices
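
`NearestNeighbors` uses Euclidean distance unless another metric is specified. For two unit-length vectors, which is what L2-normalised TF-IDF rows are, that distance relates to cosine similarity by d² = 2 − 2·cos. The sketch below illustrates the relationship; the helper names and the two toy vectors are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def cosine_from_distance(distance):
    """For two unit-length vectors, d^2 = 2 - 2*cos, so cos = 1 - d^2 / 2."""
    return 1.0 - distance ** 2 / 2.0


resume_vec = {"pyt": 0.6, "yth": 0.8}  # hypothetical unit-length resume vector
job_vec = {"pyt": 1.0}                 # hypothetical unit-length job vector

d = euclidean(resume_vec, job_vec)
cos = cosine_from_distance(d)
print(round(d, 4), round(cos, 4))
```

Because TF-IDF weights are non-negative, the distance between two such unit vectors lies between 0 (identical direction) and √2 (no shared n-grams).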

In [14]:
# Use the getNearestN function to measure how far each job description in the 'test' array is from the resume's skill vector (the only fitted point)
distances, indices = getNearestN(test)

# Convert the 'test' array to a list
test = list(test) 

# Create an empty list to store the match confidence scores
matches = []

# Loop over the indices of the nearest neighbor(s) for each job description
for i,j in enumerate(indices):
    # Get the distance between the job description and its nearest neighbor(s)
    dist = round(distances[i][0], 2)
    
    # Create a temporary list to store the match confidence score for the current job description
    temp = [dist]
    
    # Append the match confidence score to the 'matches' list
    matches.append(temp)
    
# Convert the 'matches' list to a pandas DataFrame with a column named 'Match confidence' (the value is a distance, so lower means a closer match)
matches = pd.DataFrame(matches, columns=['Match confidence'])
matches

| Unnamed: 0 | Match confidence |
|---|---|
| 0 | 0.90 |
| 1 | 1.02 |
| 2 | 0.90 |
| 3 | 0.97 |
| 4 | 0.95 |
| ... | ... |
| 1919 | 1.02 |
| 1920 | 0.82 |
| 1921 | 0.89 |
| 1922 | 0.90 |


In [20]:
df['match']=matches['Match confidence']
df1=df.sort_values('match')
df1[['Position', 'Company','Location']].head(10).reset_index()

| Unnamed: 0 | index | Position | Company | Location |
|---|---|---|---|---|
| 0 | 853 | Data Scientist - Centre of Excellence | Micro Focus | – Bengaluru |
| 1 | 772 | Data Scientist | ITC Infotech India Ltd | – Bengaluru |
| 2 | 822 | Machine Learning Engineer II | American Express | – Bengaluru |
| 3 | 1328 | Principal Machine Learning Engineer | Bengaluru | Bengaluru |
| 4 | 1327 | Machine Learning Engineer | American Express | Bengaluru |
| 5 | 900 | Data Scientist / Scala Engineer | IQLECT | Bengaluru |
| 6 | 512 | Senior Data Engineer – Open Source – Big Data ... | CareerXperts | – India |
| 7 | 143 | Data Science, Statistical modelling_6-9Years_ | Capgemini | – Pune |
| 8 | 1169 | Data Scientist | Alphonso | Bengaluru |
| 9 | 1268 | Programmer Analyst - Data Science and Machine ... | FIS | Bengaluru |
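
The ranking step can be expressed without pandas. The sketch below mirrors what `df.sort_values('match')` followed by `head(10)` does: the smallest distances come first. The function name `top_matches` and the sample job titles and distances are hypothetical.

```python
def top_matches(jobs, distances, n=10):
    """Return the n (distance, job) pairs with the smallest distance, closest first."""
    ranked = sorted(zip(distances, jobs), key=lambda pair: pair[0])
    return ranked[:n]


jobs = ["Data Scientist", "Web Developer", "Machine Learning Engineer"]  # hypothetical titles
distances = [0.82, 1.02, 0.90]                                           # hypothetical distances

best = top_matches(jobs, distances, n=2)
print(best)
```

Sorting in ascending order is the right direction here because the score is a distance: a smaller value means the job description is closer to the resume.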


# Output

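The output of the tool is a ranked list of matches between the candidate's skills and the desired skills, where the desired skills are taken from the job descriptions in `jobs.csv`. Each job posting is assigned a score in the `Match confidence` column. Despite the name, this score is the distance between the posting's TF-IDF vector and the resume's (Euclidean, the scikit-learn `NearestNeighbors` default), so lower values indicate a closer match. The results are therefore sorted in ascending order of score, and the ten closest postings are displayed.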


# Conclusion

In this project, we have developed a tool for analyzing resumes using NLP techniques. The tool can extract skills and qualifications from a resume, and match them against a list of desired skills. The tool has the potential to save recruiters and hiring managers time and effort by quickly identifying the most qualified candidates for a job opening.
