We came up with the idea to make a tailored resume creator after I spoke with Rob Kairuz from the UF Career Counseling Center about updating my own resume. He told me that you should tailor each resume to the specific job posting you are applying for, to give it the best chance of getting past automated resume scanners. We held a group meeting and all agreed that it was a good idea for our final project.

Next, I interviewed Rob about everything such a program might need. I wanted to start by taking a job posting and extracting “skill words” from it. Rob mentioned that most skill words are either nouns, such as organization, initiative and creativity, or verbs, such as editing, writing and programming.

The group met and brainstormed ways to extract skill words from job postings. One suggestion was classical natural language processing (NLP); another was ChatGPT. Each has trade-offs: ChatGPT is more accurate but costs money, since each API query costs a couple of cents. I opted for NLP, which is less accurate but does not require a paid ChatGPT API key.

I used the Natural Language Toolkit (NLTK) for Python to tag each word of a job posting with its part of speech.

In [4]:
example_posting = '''Job Opening Summary
The DSS analyst will play a key role in delivering essential data through providing high-level reporting and analytics for key quality improvement, patient safety and service initiatives. This position will provide ad hoc reporting, maintain standard reporting and forecast key business processes using various systems, databases and tools, including MS Office Suite, MS SQL Server, MS Visual Studio and SAP Business Objects.

This position will work with physicians and data science teams to manage and analyze data pertaining to the needs of assigned projects. The incumbent will communicate and explain complex data and information to leaders at all levels of the organization.

This role reports to the manager of DSS (reporting) and interfaces with a wide customer base from unit managers to executive leadership, providing insights on both business and clinical operations. Builds and maintains positive relationships with clients while utilizing industry and subject-matter best practices. Translates data for reports, which requires a keen understanding of the health care business, operational processes and the ability to perform complex analyses using data resources and technical tools.

Job Opening Qualifications
Minimum Education and Experience Requirements:

Minimum Education:

Bachelor's degree in STEM (science, technology, engineering and math).
Business analytics or a related field required.


Minimum Job Experience:

Two years of health care data analysis experience with in-depth knowledge of health care operations required. An advanced degree may substitute as health care data analysis experience on a year-for-year basis.
Job-Related Knowledge, Skills and Abilities:

Strong problem-solving, quantitative and analytical abilities.
Experience with large relational databases and data warehousing, including the ability to query database using SQL or similar language.
Mastery of business intelligence report writing and visualization tools, such as Epic Analytics, Business Objects (WebI), Power BI, Tableau or similar report writing and visualization tools.
Advanced Excel spreadsheet skills, including complex functions, formulas and formatting.
Proven ability to work with and track large amounts of data (millions of records) with accuracy.
Excellent communication, collaboration and delegation skills.
Demonstrated ability to communicate information in an easy-to-understand format.


Motor Vehicle Operator Designation:

Employees in the position will operate vehicles for an assigned business purpose as a "non-frequent driver."


Licensure/Certification/Registration::

To be completed within six months of hire: clinical data model train track (proficiency).
#LI-90

Shift hours: 8 a.m. - 5 p.m., Monday-Friday
'''
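Several lines of this posting are section headers or metadata ("Job Opening Summary", "Minimum Education:", "#LI-90") rather than prose, and their words get tagged like any other. A minimal sketch of filtering such lines out before tagging, assuming a simple heuristic (drop blank lines, lines starting with `#`, lines ending in a colon, and lines with fewer than four words); the function name `strip_header_lines` and the threshold are illustrative choices, not part of the original notebook:

```python
def strip_header_lines(posting, min_words=4):
    """Return the posting with header-like lines removed.

    Heuristic (an assumption; tune it for each posting source):
    - drop blank lines
    - drop lines starting with '#' (tracking codes such as '#LI-90')
    - drop lines ending with ':' (section headers)
    - drop lines with fewer than `min_words` whitespace-separated words
    """
    kept = []
    for line in posting.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith('#') or stripped.endswith(':'):
            continue
        if len(stripped.split()) < min_words:
            continue
        kept.append(stripped)
    return '\n'.join(kept)


sample = '''Job Opening Summary
The DSS analyst will play a key role in delivering essential data.

Minimum Education:
Business analytics or a related field required.
#LI-90
'''
print(strip_header_lines(sample))
```

Tokenizing `strip_header_lines(example_posting)` instead of the raw string should keep words like "Summary" and "LI-90" out of the tagged lists, at the cost of occasionally dropping a short but meaningful line.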

In [8]:
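import nltk

# One-time setup: the tokenizer and tagger models must be downloaded first, e.g.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')
# (NLTK 3.9+ names these 'punkt_tab' and 'averaged_perceptron_tagger_eng').

tokens = nltk.word_tokenize(example_posting)  # split the posting into word tokens
pos_tags = nltk.pos_tag(tokens)               # list of (word, Penn Treebank tag) pairs

# Keep every word tagged as a noun (NN*) or a verb (VB*)
noun_verb = [word for word, tag in pos_tags if tag.startswith(('N', 'V'))]

print(noun_verb)
len(noun_verb)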

['Job', 'Opening', 'Summary', 'DSS', 'analyst', 'play', 'role', 'delivering', 'data', 'providing', 'reporting', 'analytics', 'quality', 'improvement', 'safety', 'service', 'initiatives', 'position', 'provide', 'ad', 'hoc', 'reporting', 'maintain', 'reporting', 'forecast', 'key', 'business', 'processes', 'using', 'systems', 'databases', 'tools', 'including', 'MS', 'Office', 'Suite', 'MS', 'SQL', 'Server', 'MS', 'Visual', 'Studio', 'SAP', 'Business', 'Objects', 'position', 'work', 'physicians', 'data', 'science', 'teams', 'manage', 'analyze', 'data', 'pertaining', 'needs', 'projects', 'incumbent', 'communicate', 'explain', 'data', 'information', 'leaders', 'levels', 'organization', 'role', 'reports', 'manager', 'DSS', 'reporting', 'interfaces', 'customer', 'base', 'unit', 'managers', 'executive', 'leadership', 'providing', 'insights', 'business', 'operations', 'Builds', 'maintains', 'relationships', 'clients', 'utilizing', 'industry', 'practices', 'Translates', 'data', 'reports', 'requir

227
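Many of those 227 words are repeats: "data", "reporting" and "business" each show up several times. A minimal sketch of ranking candidates by frequency with `collections.Counter`, using a handful of words picked from the `noun_verb` output rather than the full list; lowercasing before counting is an assumption so that "Business" and "business" merge:

```python
from collections import Counter

# A handful of words picked from the noun_verb output above (not the full list)
noun_verb_excerpt = ['analyst', 'data', 'providing', 'reporting', 'analytics',
                     'reporting', 'maintain', 'reporting', 'business', 'data',
                     'Business', 'data', 'providing', 'business']

# Lowercase so 'Business' and 'business' count as the same word
counts = Counter(word.lower() for word in noun_verb_excerpt)

# Print the three most frequent words with their counts
for word, n in counts.most_common(3):
    print(word, n)
```

Words that recur across a posting are plausible candidates for the skill list, though frequency alone will also promote generic words like "data".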

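After some research, I learned that NLTK's default tagger assigns much more specific categories than just "noun" and "verb": it uses the Penn Treebank tag set, whose noun, verb and adjective tags are listed below.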

### Types of nouns
- NN: Noun, singular or mass
- NNS: Noun, plural
- NNP: Proper noun, singular
- NNPS: Proper noun, plural

### Types of verbs
- VB: Verb, base form
- VBD: Verb, past tense
- VBG: Verb, gerund or present participle
- VBN: Verb, past participle
- VBP: Verb, non-3rd person singular present
- VBZ: Verb, 3rd person singular present

### Types of adjectives
- JJ: Adjective
- JJR: Adjective, comparative
- JJS: Adjective, superlative
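Because the tags in each family share a prefix, any of them can be collapsed to a coarse part of speech with a prefix check. A minimal sketch; `coarse_pos` and the group names are illustrative, not NLTK functions. (NLTK should also be able to return coarse tags directly via `nltk.pos_tag(tokens, tagset='universal')`, which needs the `universal_tagset` resource; this sketch avoids that dependency.)

```python
# Penn Treebank prefixes for the three families listed above
PREFIX_TO_GROUP = {'NN': 'noun', 'VB': 'verb', 'JJ': 'adjective'}


def coarse_pos(tag):
    """Map a Penn Treebank tag such as 'NNS' or 'VBG' to a coarse group.

    Returns None for tags outside the three families (e.g. 'DT', 'IN').
    """
    for prefix, group in PREFIX_TO_GROUP.items():
        if tag.startswith(prefix):
            return group
    return None


for tag in ['NN', 'NNPS', 'VBG', 'VBZ', 'JJS', 'DT']:
    print(tag, '->', coarse_pos(tag))
```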

In [24]:
def get_all(pos_tags, input_tag):
    """Print every word whose tag starts with input_tag, then the count.

    This is a prefix match, so 'NN' also picks up NNS, NNP and NNPS words,
    and 'VB' picks up every verb form.
    """
    words_with_tag = [word for word, tag in pos_tags if tag.startswith(input_tag)]
    print(f"{input_tag}: {words_with_tag}")
    print(f"Length of {input_tag} list is: {len(words_with_tag)}")


# Penn Treebank tags, grouped as in the lists above
tag_groups = {
    'nouns': ['NN', 'NNS', 'NNP', 'NNPS'],
    'verbs': ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'],
    'adjectives': ['JJ', 'JJR', 'JJS'],
}

for group, tags in tag_groups.items():
    print(f"Types of {group} ----------------------------------------------------")
    for tag in tags:
        get_all(pos_tags, tag)

Types of nouns ----------------------------------------------------
NN: ['Job', 'Opening', 'Summary', 'DSS', 'analyst', 'role', 'data', 'reporting', 'analytics', 'quality', 'improvement', 'safety', 'service', 'initiatives', 'position', 'ad', 'hoc', 'reporting', 'reporting', 'forecast', 'key', 'business', 'systems', 'databases', 'tools', 'MS', 'Office', 'Suite', 'MS', 'SQL', 'Server', 'MS', 'Visual', 'Studio', 'SAP', 'Business', 'Objects', 'position', 'physicians', 'data', 'science', 'teams', 'data', 'needs', 'projects', 'incumbent', 'data', 'information', 'leaders', 'levels', 'organization', 'role', 'manager', 'DSS', 'interfaces', 'customer', 'base', 'unit', 'managers', 'leadership', 'insights', 'business', 'operations', 'Builds', 'maintains', 'relationships', 'clients', 'industry', 'practices', 'data', 'reports', 'understanding', 'health', 'care', 'business', 'processes', 'ability', 'analyses', 'data', 'resources', 'tools', 'Job', 'Opening', 'Qualifications', 'Minimum', 'Education', '
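Because `get_all` matches by prefix, the "NN" list above also contains plural and proper nouns ("initiatives", "systems", "MS", "SQL"), and the same holds for "VB". A minimal sketch contrasting prefix and exact matching; the (word, tag) pairs are written by hand for illustration, not copied from the notebook's tagger output, and both helper names are hypothetical:

```python
# Hand-written (word, tag) pairs in the shape nltk.pos_tag returns
sample_tags = [('analyst', 'NN'), ('initiatives', 'NNS'), ('SQL', 'NNP'),
               ('providing', 'VBG'), ('maintain', 'VB')]


def words_with_prefix(pos_tags, prefix):
    """Words whose tag starts with prefix (the matching get_all uses)."""
    return [word for word, tag in pos_tags if tag.startswith(prefix)]


def words_with_exact_tag(pos_tags, exact_tag):
    """Words whose tag equals exact_tag and nothing else."""
    return [word for word, tag in pos_tags if tag == exact_tag]


print(words_with_prefix(sample_tags, 'NN'))     # ['analyst', 'initiatives', 'SQL']
print(words_with_exact_tag(sample_tags, 'NN'))  # ['analyst']
print(words_with_prefix(sample_tags, 'VB'))     # ['providing', 'maintain']
print(words_with_exact_tag(sample_tags, 'VB'))  # ['maintain']
```

Exact matching is what the per-tag breakdown needs; prefix matching is handy when selecting a whole family at once.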

Which of these categories best map to so-called "skill words"?

In [27]:
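# Tag prefixes whose words look most like skill words.
# 'VB' already matches VBG and every other verb form; extend the tuple as needed.
skill_prefixes = ('VB', 'JJ')

# str.startswith accepts a tuple and returns True if any prefix matches
my_list = [word for word, tag in pos_tags if tag.startswith(skill_prefixes)]

print(my_list)
len(my_list)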

['play', 'key', 'delivering', 'essential', 'providing', 'high-level', 'key', 'patient', 'provide', 'maintain', 'standard', 'processes', 'using', 'various', 'including', 'work', 'manage', 'analyze', 'pertaining', 'assigned', 'communicate', 'explain', 'complex', 'reports', 'reporting', 'wide', 'executive', 'providing', 'clinical', 'positive', 'utilizing', 'subject-matter', 'Translates', 'requires', 'keen', 'operational', 'perform', 'complex', 'using', 'technical', 'Minimum', 'Business', 'related', 'required', 'Minimum', 'Job', 'in-depth', 'required', 'advanced', 'substitute', 'year-for-year', 'Job-Related', 'Strong', 'quantitative', 'analytical', 'large', 'relational', 'including', 'query', 'using', 'similar', 'writing', 'such', 'similar', 'including', 'complex', 'formatting', 'work', 'track', 'large', 'Excellent', 'Demonstrated', 'communicate', 'easy-to-understand', 'operate', 'assigned', 'non-frequent', 'be', 'completed', 'clinical', 'LI-90', 'Monday-Friday']


82
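The 82 words above still contain duplicates ("key", "complex", "using") and posting boilerplate ("Minimum", "Job", "LI-90", "Monday-Friday"). A minimal post-processing sketch, assuming a hand-maintained stop list; `NOISE_WORDS` and `clean_candidates` are hypothetical names, and the stop-list entries are just examples drawn from this output:

```python
import re

# Hypothetical stop list: generic posting words that are not skills
NOISE_WORDS = {'minimum', 'job', 'job-related', 'required', 'such', 'be',
               'monday-friday'}


def clean_candidates(words):
    """Lowercase, drop noise and digit-bearing tokens, de-duplicate in order."""
    seen = set()
    cleaned = []
    for word in words:
        w = word.lower()
        if w in NOISE_WORDS or re.search(r'\d', w):
            continue  # skip stop-listed words and codes like 'LI-90'
        if w not in seen:
            seen.add(w)
            cleaned.append(w)
    return cleaned


# A handful of words picked from the my_list output above
excerpt = ['key', 'complex', 'Minimum', 'analyze', 'complex', 'Job',
           'LI-90', 'communicate', 'communicate', 'Strong']
print(clean_candidates(excerpt))  # ['key', 'complex', 'analyze', 'communicate', 'strong']
```

A stop list like this has to grow by inspection as more postings are processed, which is one of the accuracy costs of choosing NLP over ChatGPT.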