**Automated Resume Skills Parser Project**

In this project, I demonstrate Natural Language Processing (NLP) in the simplest way possible, to help HR teams and candidates match skills against a given job description.

The approach is deliberately simple and works best when there are only a few resumes to compare.


In [None]:
# import required libraries
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

**import spacy** → Loads spaCy, a powerful NLP library for text processing.

**from sklearn.feature_extraction.text import TfidfVectorizer** → Imports the TF-IDF vectorizer, which converts text into numerical vectors based on term frequency and inverse document frequency.

**from sklearn.metrics.pairwise import cosine_similarity** → Imports cosine similarity, which measures how similar two text vectors are.

In [3]:
# load spaCy's NLP model
nlp = spacy.load("en_core_web_sm")

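This code loads the pre-trained English NLP model (en_core_web_sm), which helps extract information from text. The model has to be downloaded once beforehand, for example with `python -m spacy download en_core_web_sm`.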

In [4]:
# create sample resume data (Dictionary Format)
resume_data = {
    "Victoria": "Victoria is a data scientist expertise in Python, SQL, Excel, and Machine Learning.",
    "Dora": "Dora is a software engineer who specializes in Java, JavaScript, and cloud computing.",
    "Mary": "Mary business analyst is skilled in data visualization, Power BI, and Excel.",
    "Carl": "Carl is a data analyst with expertise in SQL, Python, PowerBI, and Excel"
}

This dictionary represents sample resumes, where:

**Keys (Victoria, Dora, Mary, Carl)** → Represent candidates.

**Values (resume text)** → Describe each candidate’s skills.

In practice, these resumes would be extracted from PDF or text files. For the sake of simplicity, the resume text is typed in by hand here.
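As a rough sketch of the plain-text case, the dictionary could be built from a folder of files instead of being typed by hand. The function name `load_resumes` and the folder layout (one `.txt` file per candidate, named after the candidate) are assumptions made for illustration; PDF extraction would need an extra library and is not covered here.

```python
from pathlib import Path


def load_resumes(folder):
    """Read every .txt file in `folder` into a {candidate_name: text} dictionary.

    Hypothetical helper: assumes one plain-text resume per file,
    with the file name (minus extension) used as the candidate name.
    """
    resumes = {}
    for path in sorted(Path(folder).glob("*.txt")):
        resumes[path.stem] = path.read_text(encoding="utf-8").strip()
    return resumes
```

With a folder called `resumes` (a hypothetical name), `resume_data = load_resumes("resumes")` would produce a dictionary with the same shape as the hand-written one above.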

In [5]:
# Let's define a job description to compare against each candidate's skills
job_description = "We are looking for a data scientist with experience in Python, Machine Learning, and SQL."

This job description contains required skills that will be compared to resumes.

In [6]:
# create a function that will extract the keywords using spaCy
def extract_keywords(text):
    doc = nlp(text)  # Process text using spaCy
    keywords = [token.text for token in doc if token.pos_ in ["NOUN", "PROPN"]]
    return " ".join(keywords)

This function processes text with spaCy and keeps only nouns (NOUN) and proper nouns (PROPN), which typically represent important concepts such as skills and job roles (e.g., Python, SQL, Machine Learning). It returns the keywords as a single space-separated string.
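To see the filtering step in isolation, here is a small sketch that applies the same NOUN/PROPN rule to a list of words whose tags were assigned by hand. The tags below are illustrative assumptions, not actual spaCy output; in the real function they come from `token.pos_`.

```python
# Hand-assigned (word, tag) pairs for illustration only;
# a real run would get these tags from spaCy.
tagged_tokens = [
    ("We", "PRON"), ("need", "VERB"), ("a", "DET"), ("data", "NOUN"),
    ("scientist", "NOUN"), ("with", "ADP"), ("Python", "PROPN"),
    ("and", "CCONJ"), ("SQL", "PROPN"),
]

KEEP_TAGS = {"NOUN", "PROPN"}


def filter_keywords(pairs):
    """Keep only the words tagged as nouns or proper nouns."""
    return " ".join(word for word, tag in pairs if tag in KEEP_TAGS)


print(filter_keywords(tagged_tokens))  # data scientist Python SQL
```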

In [7]:
# extract keywords from resumes & job description
resume_keywords = {name: extract_keywords(text) for name, text in resume_data.items()}
job_keywords = extract_keywords(job_description)

The code does the following:
1. Extracts keywords from each resume and stores them in a dictionary.
2. Extracts keywords from the job description.

In [8]:
# convert text into numerical vectors using TF-IDF
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(list(resume_keywords.values()) + [job_keywords])

TF-IDF (TfidfVectorizer) converts text into numerical vectors based on:

Term Frequency (TF): How often a word appears in the text.

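Inverse Document Frequency (IDF): How rare the word is across all documents. Words that appear in fewer documents get a higher weight.

The code vectorizes the extracted keywords from the resumes and the job description together, so all vectors share one vocabulary and the job description ends up as the last row.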
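To make the weighting concrete, the sketch below computes TF-IDF by hand in plain Python. It uses the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalisation, which to my understanding matches scikit-learn's documented defaults. The function name `tfidf_vectors` is hypothetical, and splitting on whitespace is a simplification of the tokenizer that `TfidfVectorizer` actually uses.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using smoothed IDF and L2-normalised rows."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors.append([w / norm for w in row])
    return vocabulary, vectors


docs = ["python sql excel", "java javascript cloud", "python sql"]
vocab, vecs = tfidf_vectors(docs)
for term, weight in zip(vocab, vecs[0]):
    print(f"{term}: {weight:.2f}")
```

In the first document, "excel" receives a higher weight than "python" because it appears in only one of the three documents.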


In [9]:
# Compute Similarity Scores Using Cosine Similarity
similarity_scores = cosine_similarity(vectors[:-1], vectors[-1:]).flatten()

This code computes similarity scores between:

Each resume's vector (vectors[:-1])

Job description's vector (vectors[-1:])

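Cosine similarity measures how similar two text vectors are. Because TF-IDF weights are never negative, the scores here range from 0 to 1, where 1 means the vectors point in the same direction (a perfect match) and 0 means the texts share no terms.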
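The measure itself is the dot product of two vectors divided by the product of their lengths. A minimal plain-Python sketch follows. The function name `cosine` and the toy vectors are made up for illustration, and returning 0.0 for an all-zero vector is a convention chosen here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


job = [1.0, 1.0, 0.0]
same_skills = [2.0, 2.0, 0.0]   # same direction, different length
no_overlap = [0.0, 0.0, 3.0]    # no shared dimensions

print(round(cosine(job, same_skills), 2))  # 1.0
print(round(cosine(job, no_overlap), 2))   # 0.0
```

The score depends only on direction, not length, so a long resume is not favoured merely for repeating the same skills more often.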

In [10]:
# rank resumes based on similarity scores
ranked_resumes = sorted(zip(resume_data.keys(), similarity_scores), key=lambda x: x[1], reverse=True)

This line pairs each resume name with its similarity score and sorts the pairs in descending order (best match first).

In [11]:
# display the ranked resumes
print("Ranked Resumes:")
for name, score in ranked_resumes:
    print(f"{name}: {score:.2f}")

Ranked Resumes:
Victoria: 0.68
Carl: 0.27
Mary: 0.06
Dora: 0.00


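Victoria is the best match for the Data Scientist role because her resume shares the most keywords with the job description. Dora scores 0.00, which indicates that her extracted keywords have essentially nothing in common with it.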

**Conclusion**

This is a beginner-friendly project that shows how spaCy and scikit-learn can work together to extract keywords from text, then match and score documents against a requirement.

**Extensions for More Impact**

In this project, I explored a basic resume skills matching program that extracts keywords and matches them against a job description.

Going forward, I will take it up a notch by adding a Streamlit web app that lets recruiters upload resumes.