## Simple Job Description to Resume Comparator

This program compares the words found in a job description to the words in a resume. The current version compares all words and gives a naive percentage match.

Many employers use software to analyze applicant resumes. It is better to have as many terms in the resume that match those in the job description.




In [3]:
from nltk import sent_tokenize, word_tokenize, pos_tag
from nltk.corpus import stopwords
import codecs
from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()

# NLTK's default english stopwords
default_stopwords = stopwords.words('english')

#File Locations

document_folder = '../data/'
resume_file = document_folder + 'resume.txt'
job_description_file = document_folder + 'job_description.txt'
custom_stopwords_file = document_folder + 'custom_stopwords.txt'

custom_stopwords = codecs.open(custom_stopwords_file, 'r', 'utf-8').read().splitlines()
all_stopwords = list(map(str.lower,set(default_stopwords + custom_stopwords)))
                        
minimum_word_length = 2

desired_parts_of_speach = set(['NN'
                   ,'NNP'
                   ,'NNS'
                   ,'NNPS']
                 )



def process_text(text,stopwords,pos='',lemmatizer=None):
    tokens = word_tokenize(text)
    tags = pos_tag(tokens)
    lem = lemmatizer

    if len(pos)>0:
        words = [w for w,pos in tags if pos in pos]
        
    words = [t for t in tokens if t.isalpha()]
    words = [w for w in words if len(w)>=minimum_word_length]
    words = [w for w in words if not w.isnumeric()]
    words = [w for w in words if w not in all_stopwords]
    words = [w.lower() for w in words]
    
    if lemmatizer is not None:
        words = [lem.lemmatize(w) for w in words]
#     words = [nouns_only(w) for w in words]
#     words = map(nouns_only, words)

    return words





f_resume=open(resume_file,'r',)
f_desc = open(job_description_file,'r')

raw_resume =f_resume.read()
raw_desc = f_desc.read()

resume_words = process_text(raw_resume,all_stopwords,desired_parts_of_speach,lem)
job_words = process_text(raw_desc,all_stopwords,desired_parts_of_speach,lem)

resume_set = set(resume_words)
job_set = set(job_words)



matching_words = resume_set.intersection(job_set)

print ('You resume matches at ',"{0:.0%}".format(len(matching_words)/len(job_words)))

print('Your resume is missing the following words (naive): ',job_set-resume_set)



You resume matches at  22%
Your resume is missing the following words (naive):  {'literally', 'turning', 'deep', 'absolute', 'directly', 'background', 'scientist', 'large', 'stability', 'providing', 'individual', 'user', 'behalf', 'scale', 'warehouse', 'like', 'primary', 'insight', 'knowledge', 'excellent', 'current', 'inc', 'upload', 'internal', 'seeking', 'pipeline', 'champion', 'this', 'array', 'bachelor', 'writing', 'responsibility', 'understand', 'computer', 'working', 'join', 'enables', 'degree', 'leader', 'power', 'build', 'id', 'taking', 'analyze', 'answer', 'creative', 'enhance', 'passionate', 'influence', 'identify', 'unstructured', 'job', 'our', 'engine', 'pipe', 'environment', 'highly', 'solution', 'find', 'well', 'structured', 'higher', 'presenting', 'are', 'experienced', 'question', 'play', 'news', 'matter', 'extracting', 'relevant', 'complex', 'similar', 'track', 'qualification', 'deliver', 'person', 'alexa', 'validate', 'ensure', 'music', 'description', 'best', 'tool', 

['senior',
 'data',
 'analyst',
 'dsme',
 'alexa',
 'data',
 'services',
 'job',
 'id',
 'services',
 'inc',
 'description',
 'are',
 'excited',
 'passionate',
 'delivering',
 'advanced',
 'analytics',
 'directly',
 'influence',
 'alexa',
 'user',
 'do',
 'champion',
 'innovating',
 'behalf',
 'customer',
 'turning',
 'data',
 'insights',
 'action',
 'come',
 'join',
 'alexa',
 'data',
 'services',
 'team',
 'data',
 'sme',
 'alexa',
 'echo',
 'literally',
 'shaping',
 'voice',
 'recognition',
 'alexa',
 'name',
 'amazon',
 'cloud',
 'service',
 'brain',
 'powers',
 'echo',
 'alexa',
 'deep',
 'learning',
 'engine',
 'answers',
 'questions',
 'plays',
 'music',
 'reads',
 'news',
 'our',
 'goal',
 'deliver',
 'absolute',
 'perfect',
 'communication',
 'customers',
 'alexa',
 'data',
 'services',
 'team',
 'enables',
 'alexa',
 'deep',
 'learning',
 'providing',
 'data',
 'alexa',
 'engine',
 'alexa',
 'data',
 'services',
 'seeking',
 'experienced',
 'data',
 'analyst',
 'strong',
 'tr

## Next Steps: Improve Comparisons

1. Exclude low information parts of speach like prepositions, conjunctions.
2. Develop a list of skills.
3. Break comparisons by parts of speech. (Nouns, verbs, adjectives).
4. Look for key bigrams.
5. Enumerate and compare sentence subjects



## Next Steps: File Import of different formats