# dsresumatch Tutorial

Welcome to the ResuMatch package documentation! This package is designed to analyze resumes and job descriptions by extracting key sections (Skills, Education, Work Experience, and Contact) and scoring them based on relevant keywords. Here, we’ll illustrate the usage of these functions with a real-life example, featuring Daniel, a junior data scientist on the hunt for his next exciting role.

In [2]:
import dsresumatch

print(dsresumatch.__version__)

0.1.0


## Daniel’s Journey

Daniel has spent countless hours refining his resume, but still struggles to get callbacks from potential employers. After sending off 250 applications to several data science positions, and almost losing hope, he discovers dsresumatch. Intrigued by its ability to pinpoint missing information and relevant keywords, Daniel uses it to transform his resume into a tailored, high-impact profile for each job description he targets.

### Reading the PDF

First things first, Daniel reads the PDF version of his resume using the read_pdf function from our package. This function automatically extracts all readable text and returns it as a string, making it easy to analyze, filter, or score.

In [3]:
from dsresumatch.pdf_cv_processing import read_pdf

resume_text = read_pdf("bad_cv.pdf")



### Checking for Resume Headers
The previous function extracted the text of the PDF into a single string, which is now ready to be passed to the `missing_section()` function to check for missing resume headers.

Let's jump in!

In [4]:
resume_text

'Daniel H. Personal Statement I’m a data scientist, kind of. I like data. I did some Python stuff once. Not sure what else to say. Education Bachelor’s degree in Math.  Internship I worked as a data science intern for some months. Did some machine learning.   '

As we can see, the resume PDF has been converted to a plain string of words and stored in `resume_text`.

### Now, Check Sections!

Let's take a look at which section headers are missing from Daniel's resume by passing the text through the `missing_section()` function.

In [5]:
from dsresumatch.sections_check import missing_section

# This is a simple use case:
section_check = missing_section(resume_text)

# Print the missing sections identified using the function
section_check

['Contact', 'Work Experience', 'Skills']

From the `section_check` variable, Daniel sees that his resume is missing the "Contact", "Work Experience" and "Skills" sections.
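To build intuition for what a check like this involves, here is a minimal sketch of a case-insensitive header search. It is not the dsresumatch implementation: the `find_missing_sections` helper and the `DEFAULT_SECTIONS` list are hypothetical names, the four default sections are assumed from the package description in the introduction, and the sample text is a shortened stand-in for Daniel's resume.

```python
# Hypothetical sketch; not the actual dsresumatch code.
# Assumed defaults, taken from the four sections named in the introduction.
DEFAULT_SECTIONS = ["Skills", "Education", "Work Experience", "Contact"]


def find_missing_sections(text, extra_sections=None):
    """Return benchmark section names that do not appear in the text."""
    benchmark = DEFAULT_SECTIONS + list(extra_sections or [])
    lowered = text.lower()
    # Keep the benchmark order so the result is deterministic.
    return [name for name in benchmark if name.lower() not in lowered]


sample_text = (
    "Daniel H. Personal Statement I like data. "
    "Education Bachelor's degree in Math. "
    "Internship I worked as a data science intern."
)
missing = find_missing_sections(sample_text)
print(missing)
```

A plain substring search like this cannot tell a real header from the same word used in body text, so a production check would likely need something stricter.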

### Include More Sections for Checking

Here, we can use the `add_benchmark_sections` argument to supply additional sections for the `missing_section()` function to check. For example:

Daniel read an article that says "Personal Statement" and "Volunteer" are key sections in a resume. Now he wants the function to check whether these two additional sections are present.

In [6]:
# Additional sections check use case:
add_section_check = missing_section(resume_text, add_benchmark_sections=["Personal Statement", "Volunteer"])

# Print the missing sections with the additional sections included
add_section_check

['Skills', 'Contact', 'Volunteer', 'Work Experience']

With the additional sections supplied through the `add_benchmark_sections` argument, `missing_section()` shows that "Personal Statement" is present in Daniel's resume, while "Volunteer", "Contact", "Work Experience" and "Skills" are missing.
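The two results above list the shared sections in a different order, which suggests that the order of the returned list is not guaranteed. This is an inference from the outputs, not documented behavior. If a stable order matters, for example when comparing results between runs, one option is to sort the list with Python's built-in `sorted()`. The lists below are copied from the outputs above.

```python
# Results copied from the two calls shown above.
section_check = ['Contact', 'Work Experience', 'Skills']
add_section_check = ['Skills', 'Contact', 'Volunteer', 'Work Experience']

# Sorting gives a stable, alphabetical order for display or comparison.
print(sorted(section_check))

# A set difference shows what the extra benchmark sections added.
newly_flagged = sorted(set(add_section_check) - set(section_check))
print(newly_flagged)
```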

### Check for Keywords

Next, Daniel wants to know which important data science keywords his resume lacks. To find out, he passes `resume_text` to the `evaluate_keywords()` function.

In [7]:
from dsresumatch.evaluate_keywords import evaluate_keywords

# This is a simple use case:
check_keywords = evaluate_keywords(resume_text)

# Print the missing keywords identified using the function
check_keywords

['git',
 'data analysis',
 'sql',
 'numpy',
 'teamwork',
 'project management',
 'pytorch',
 'leadership',
 'problem solving',
 'jupyter',
 'communication',
 'pandas',
 'docker',
 'scikit-learn',
 'statistics',
 'tensorflow',
 'aws']

Daniel can now see all the important keywords missing from his resume.

In the same article that discussed resume sections, Daniel read that a data science resume should include a few more keywords. To check for them, he passes the additional keywords to the `evaluate_keywords()` function.

In [8]:
# Additional keywords check use case:
add_keywords_check1 = evaluate_keywords(resume_text, keywords=["hyperparameter", "efficiency", "performance metrics", "A/B testing"])

# Sorting keywords to view them alphabetically
add_keywords_check1.sort()

# Print the missing keywords with the additional keywords included
add_keywords_check1

['a/b testing',
 'aws',
 'communication',
 'data analysis',
 'docker',
 'efficiency',
 'git',
 'hyperparameter',
 'jupyter',
 'leadership',
 'numpy',
 'pandas',
 'performance metrics',
 'problem solving',
 'project management',
 'pytorch',
 'scikit-learn',
 'sql',
 'statistics',
 'teamwork',
 'tensorflow']

Daniel can now see all the important keywords missing from his resume, along with the additional keywords he supplied.

Additionally, Daniel wants to confirm that his resume lists 'Bachelor's Degree', 'Math', and 'Computer Science'. To do this, he passes `resume_text` to the `evaluate_keywords()` function and sets the `use_only_supplied_keywords` argument to `True`, so that only the supplied keywords are checked against `resume_text`.

In [9]:
# Keywords check use case with use_only_supplied_keywords set to True:
add_keywords_check2 = evaluate_keywords(resume_text, keywords=["Bachelor’s degree", "Math", "Computer Science"], use_only_supplied_keywords=True)

# Print which of the supplied keywords are missing
add_keywords_check2

['computer science']

Now Daniel can confirm that 'Bachelor's Degree' and 'Math' are included in his resume but 'Computer Science' is not. 
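The difference between the two modes can be pictured with a small sketch. This is an illustration under assumptions, not the package's code: `find_missing_keywords`, its `use_only_supplied` flag, and the three-item `DEFAULT_KEYWORDS` list are hypothetical stand-ins, and whole-word regex matching is one possible matching rule rather than the one dsresumatch necessarily uses.

```python
import re

# Hypothetical default list for illustration; the real package ships its own.
DEFAULT_KEYWORDS = ["python", "sql", "statistics"]


def find_missing_keywords(text, keywords=None, use_only_supplied=False):
    """Return the lower-cased keywords that do not occur in the text."""
    supplied = [kw.lower() for kw in (keywords or [])]
    candidates = supplied if use_only_supplied else DEFAULT_KEYWORDS + supplied
    lowered = text.lower()
    missing = []
    for kw in candidates:
        # Match whole words or phrases only, so "git" does not match "digital".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if not re.search(pattern, lowered) and kw not in missing:
            missing.append(kw)
    return missing


sample_text = "I did some Python stuff once. Education Bachelor’s degree in Math."
result = find_missing_keywords(
    sample_text,
    keywords=["Bachelor’s degree", "Math", "Computer Science"],
    use_only_supplied=True,
)
print(result)
```

With `use_only_supplied=False`, the same call would also report any of the assumed default keywords that are absent from the text.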

### Resume Score

Daniel wants to know on a numerical scale how good his resume is. Fortunately, dsresumatch has a `resume_score()` function that gives a score to the resume.

In [11]:
from dsresumatch.resume_scoring import resume_score

# Calculate the resume score. To get the score only, pass the argument 'feedback=False'
score = resume_score(resume_text, feedback=False)

# Print the resume score
score

'This resume attained a score of 12.50.'
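As the output shows, the score comes back inside a sentence rather than as a bare number. If Daniel wants to compare scores numerically, one option is to pull the number out with a regular expression. This sketch assumes the message keeps the wording shown above; the string is copied from that output.

```python
import re

# Message copied from the output above.
message = "This resume attained a score of 12.50."

# Capture the first decimal number that follows "score of".
match = re.search(r"score of (\d+(?:\.\d+)?)", message)
numeric_score = float(match.group(1)) if match else None
print(numeric_score)
```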

### Resume Summary

Daniel has received feedback from separate functions. Now he wants a summary of all the feedback in one place so he can improve his resume. He can get this by calling the `resume_score()` function with `feedback` left at its default value of `True`.

In [15]:
# Get the resume summary. By default, feedback = True.
summary = resume_score(resume_text)

# Print the resume summary
print(summary)

This resume attained a score of 12.50. 
 - Missing Keywords: git, data analysis, sql, numpy, teamwork, project management, pytorch, leadership, problem solving, jupyter, communication, pandas, docker, scikit-learn, statistics, tensorflow, aws 
 - Missing Sections: Contact, Work Experience, Skills


### Improving the Resume

Based on all the feedback, Daniel created a new resume. He now wants to see how it performs with dsresumatch, so he runs the following code to load and score it.

In [2]:
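# Re-import the functions in case the kernel was restarted
from dsresumatch.pdf_cv_processing import read_pdf
from dsresumatch.resume_scoring import resume_score
new_resume = read_pdf("good_cv.pdf")
summary = resume_score(new_resume)
print(summary)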