<a href="https://colab.research.google.com/github/chewzzz1014/fyp/blob/master/job_resume_score/src/job_resume_score_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [73]:
resume_text = '''
Zi Qing Chew
chewziqing@gmail.com | 016-2892475 | Kuala Lumpur, Malaysia | linkedin.com/in/ziqingchew | github.com/chewzzz1014
EDUCATION

Universiti Putra Malaysia					                                                   Oct 2021 - Current
Bachelor in Computer Science with Honours
Expected to graduate in July 2025. CGPA: 3.99

WORK EXPERIENCE

Ant International 									          	July 2024 – Oct 2024
Java Engineer Intern							                               Kuala Lumpur, Malaysia
Collaborated in developing an audit logging feature for Ant Group’s internal Foreign Exchange (FX) trade strategy system that records changes made by business users to trade strategies.
Conducted comprehensive system analysis and project planning, delivering presentations to project stakeholders and QA teams prior to the development phase.
Utilised Ant Group’s internal frameworks, middleware, and tools to implement the audit logging feature.
Skills: Java, Spring, Sofaboot, Ant Group internal middlewares (ZDAL, DRM, Ant Scheduler, Msg Broker)
Howuku  									          	             Feb 2023 – Sep 2023
Software Developer Intern							                    Kuala Lumpur, Malaysia
Developed and optimized A/B testing features, including code editor and previewer for CSS and JavaScript modifications for experiment variations.
Expanded A/B testing targeting rule by incorporating website visitor's OS, device, and browser rules.
Automated experiment-stopping criteria and email notifications based on user-defined experiment termination conditions.
Collaborated with cross-functional teams to debug, troubleshoot, and enhance Howuku platform features based on user feedback and performance data.
Skills: JavaScript, Bootstrap, Vue.js, Express.js, MySQL

PROJECTS

Personal Portfolio Website (chewzzz1014.github.io/portfolio-website)
Designed, developed and deployed personalised portfolio website featuring skills, selected projects, and downloadable resume.
Skills: JavaScript, React.js, CSS, Bootstrap
Depression Level Detection Chatbot (https://github.com/chewzzz1014/health-ease-project)
Developed machine learning application that evaluates a message's depression level and provided tailored mental health advice and information based on the depression severity.
Skills: Python, pandas, scikit-learn, Keras, FastAPI, Gradio
Clothing Store Website (https://github.com/chewzzz1014/CSC3402-MVC-Project)
Worked in team to build a CRUD Spring Boot application with attractive interfaces, data persistence, authentication and authorisation.
Developed the backend of the application that involves querying the database, building REST endpoints and implementing Thymeleaf in HTML for dynamic contents.
Skills: Spring Boot, Spring MVC, Thymeleaf, Hibernate, Bootstrap

SKILLS
Programming Languages: Java, Python, HTML, CSS, JavaScript, MySQL, OracleSQL
Frameworks and Libraries: Spring, Spring Boot, TypeScript, Node.js, Express.js, React.js, Vue.js, Bootstrap, Tailwind CSS
Tools: Git, Github, Jira, Tableau, Excel, Jupyter Notebook, Google Colab, VSCode, IntelliJ
'''

In [69]:
# load job descriptions from excel
import pandas as pd
job_desc_df = pd.read_excel("/content/drive/MyDrive/FYP/Implementation/Resume Dataset/job_desc.xlsx")
job_desc_df

Unnamed: 0,Job Title,Job Desc
0,Java Developer,This is a technical role as part of an applica...
1,Front End Developer (React),Your roles & responsibilities:\n\nDesign and d...
2,Junior Backend Developer (Golang),Job description\nCompany Description\n\nAbout ...
3,Digital Marketing Manager,Job description\nCompany Description\n\nNuraz ...
4,C&S Design Engineer (Sibu),ROLES & RESPONSIBILITIES:\n\nDesign and prepar...


In [8]:
# make prediction using trained NER model

import spacy
import string
from spacy import displacy

# convert text into small letter then remove punctuation
resume_text = resume_text.lower()
resume_text = resume_text.translate(str.maketrans('', '', string.punctuation))

# load trained model
nlp = spacy.load("/content/drive/MyDrive/FYP/Implementation/spacy_output/model-best")

# create a Spacy doc and add text to it
doc = nlp(resume_text)

# extract entities into a dictionary
entities_dict = {}
for ent in doc.ents:
    if ent.label_ in entities_dict:
        entities_dict[ent.label_].append(ent.text)
    else:
        entities_dict[ent.label_] = [ent.text]

# Print the dictionary
print(entities_dict)

# visualize predicted entities using displacy
colors = {
    "NAME": "lightblue",
    "LOC": "yellow",
    "PHONE": "pink",
    "EMAIL": "lightgreen",
    "JOB": "orange",
    "SKILL": "aqua",
    "COMPANY": "violet",
    "WORK PER": "salmon",
    "DEG": "lightcoral",
    "UNI": "lightgrey",
    "STUDY PER": "peachpuff",
}
options = {"ents": list(colors.keys()), "colors": colors}
displacy.render(doc, style="ent", jupyter=True, options=options)

{'SKILL': ['qing', 'chew', 'java', 'spring', 'sofaboot', 'javascript', 'bootstrap', 'vuejs', 'mysql', 'javascript', 'reactjs', 'css', 'bootstrap', 'python', 'pandas', 'scikitlearn', 'spring boot', 'spring mvc', 'hibernate', 'bootstrap', 'java', 'python', 'html', 'css', 'javascript', 'mysql', 'oraclesql', 'nodejs', 'vuejs', 'bootstrap', 'tailwind', 'css', 'git', 'github', 'jira', 'tableau', 'excel'], 'WORK PER': ['0162892475', 'july 2024', 'oct 2024', 'feb 2023', 'sep 2023'], 'STUDY PER': ['oct 2021', 'july 2025'], 'DEG': ['bachelor in computer science'], 'JOB': ['java engineer intern', 'software developer']}


# Text Preprocessing
1. Text Cleaning
2. Tokenization
3. Preparation
    *   Stop Word Removal
    *   Stemming
    *   Lemmatization


# Feature Extraction Using TF-IDF

# Consine Similarity

In [19]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [70]:
# tokenization
import string
from collections import Counter

def tokenize(text):
    text = text.lower()
    no_punc_text = text.translate(str.maketrans('', '', string.punctuation))
    tokens = nltk.word_tokenize(no_punc_text)
    return tokens

resume_tokens = tokenize(resume_text)
resume_tokens_count = Counter(resume_tokens)
print('10 top tokens in resume:')
print(resume_tokens_count.most_common(10))

job_tokens = tokenize(job_desc_df.loc[0, 'Job Desc'])
job_tokens_count = Counter(job_tokens)
print('10 top tokens in job description:')
print(job_tokens_count.most_common(10))

10 top tokens in resume:
[('spring', 24), ('using', 24), ('java', 20), ('web', 14), ('services', 12), ('developed', 12), ('used', 12), ('jsp', 11), ('hibernate', 11), ('data', 11)]
10 top tokens in job description:
[('and', 17), ('to', 16), ('in', 8), ('design', 6), ('with', 5), ('requirements', 5), ('a', 4), ('technical', 4), ('of', 4), ('application', 4)]


In [71]:
# stop word removal
from nltk.corpus import stopwords

resume_tokens_filtered = [w for w in resume_tokens if not w in stopwords.words('english')]
resume_tokens_filtered_count = Counter(resume_tokens_filtered)
print('50 top tokens in resume after stop word removal:')
print(resume_tokens_filtered_count.most_common(50))

job_tokens_filtered = [w for w in job_tokens if not w in stopwords.words('english')]
job_tokens_filtered_count = Counter(job_tokens_filtered)
print('50 top tokens in job after stop word removal:')
print(job_tokens_filtered_count.most_common(50))

50 top tokens in resume after stop word removal:
[('spring', 24), ('using', 24), ('java', 20), ('web', 14), ('services', 12), ('developed', 12), ('used', 12), ('jsp', 11), ('hibernate', 11), ('data', 11), ('framework', 11), ('application', 10), ('beans', 9), ('mvc', 9), ('business', 9), ('css', 8), ('developer', 7), ('server', 7), ('implemented', 7), ('database', 7), ('environment', 7), ('soap', 6), ('ajax', 6), ('agile', 6), ('testing', 6), ('code', 6), ('html', 6), ('involved', 6), ('created', 6), ('struts', 6), ('developing', 5), ('j2ee', 5), ('servlets', 5), ('restful', 5), ('jquery', 5), ('log4j', 5), ('junit', 5), ('ui', 5), ('jira', 5), ('saic', 5), ('design', 5), ('like', 5), ('user', 5), ('worked', 5), ('system', 4), ('software', 4), ('jdbc', 4), ('html5', 4), ('script', 4), ('json', 4)]
50 top tokens in job after stop word removal:
[('design', 6), ('requirements', 5), ('technical', 4), ('application', 4), ('software', 4), ('analysis', 4), ('develop', 4), ('team', 3), ('workin

In [72]:
# stemming
from nltk.stem.porter import *

def stem_tokens(tokens, stemmer):
    stemmed = []
    for item in tokens:
        stemmed.append(stemmer.stem(item))
    return stemmed

stemmer = PorterStemmer()

resume_tokens_stemmed = stem_tokens(resume_tokens_filtered, stemmer)
resume_tokens_stemmed_count = Counter(resume_tokens_stemmed)
print('50 top tokens in resume after stemming:')
print(resume_tokens_stemmed_count.most_common(50))

job_tokens_stemmed = stem_tokens(job_tokens_filtered, stemmer)
job_tokens_stemmed_count = Counter(job_tokens_stemmed)
print('50 top tokens in job after stemming:')
print(job_tokens_stemmed_count.most_common(50))

50 top tokens in resume after stemming:
[('use', 37), ('develop', 32), ('spring', 24), ('java', 20), ('servic', 15), ('web', 14), ('applic', 13), ('jsp', 12), ('test', 12), ('hibern', 11), ('data', 11), ('framework', 11), ('bean', 10), ('server', 10), ('design', 9), ('rest', 9), ('mvc', 9), ('busi', 9), ('control', 9), ('creat', 9), ('code', 8), ('css', 8), ('work', 7), ('implement', 7), ('databas', 7), ('environ', 7), ('user', 7), ('product', 7), ('servlet', 6), ('soap', 6), ('script', 6), ('ajax', 6), ('agil', 6), ('ui', 6), ('requir', 6), ('respons', 6), ('html', 6), ('involv', 6), ('strut', 6), ('softwar', 5), ('j2ee', 5), ('jqueri', 5), ('log4j', 5), ('junit', 5), ('jira', 5), ('saic', 5), ('valid', 5), ('like', 5), ('configur', 5), ('interact', 5)]
50 top tokens in job after stemming:
[('develop', 7), ('requir', 7), ('work', 6), ('design', 6), ('technic', 4), ('applic', 4), ('softwar', 4), ('perform', 4), ('analysi', 4), ('team', 3), ('abl', 3), ('technolog', 3), ('project', 3), 

In [76]:
# tf-idf

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english')

def compute_similarity(text1, text2):
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([text1, text2])
    return cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

nlp = spacy.load("/content/drive/MyDrive/FYP/Implementation/spacy_output/model-best")
resume_doc = nlp(resume_text)
resume_skills = [ent.text for ent in resume_doc.ents if ent.label_ == 'SKILL']
print('Extracted skills in resume:')
print(set(resume_skills))
print("******************************************")

for index, row in job_desc_df.iterrows():
    job_title = row['Job Title']
    job_desc = row['Job Desc']
    job_doc = nlp(row['Job Desc'])
    job_skills = [ent.text for ent in job_doc.ents if ent.label_ == 'SKILL']
    print(f'Extracted skills in job desc ({job_title})')
    print(set(job_skills))

    similarity_score = compute_similarity(resume_text, job_desc)
    print(similarity_score)
    print(f"Cosine Similarity Score: {similarity_score * 100:.2f}%")
    print("******************************************")

Extracted skills in resume:
{'MySQL', 'Spring', 'Programming', 'Git', 'CSS', 'Spring MVC', 'Zi', 'Intern', 'machine learning', 'Bootstrap', 'Qing Chew\n', 'IntelliJ'}
******************************************
Extracted skills in job desc (Java Developer)
{'SQL'}
0.3623718058163478
Cosine Similarity Score: 36.24%
******************************************
Extracted skills in job desc (Front End Developer (React))
{'Cloud Services', '&', 'Paranoid', 'Work', 'SOCSO', 'JSON', 'Redux', 'React'}
0.30788124193179056
Cosine Similarity Score: 30.79%
******************************************
Extracted skills in job desc (Junior Backend Developer (Golang))
{'OOP', 'Programming', 'Fun', 'Agile', 'Information', 'communication', 'Scrum'}
0.3289583818441827
Cosine Similarity Score: 32.90%
******************************************
Extracted skills in job desc (Digital Marketing Manager)
{'social media', 'Managing', 'Nuraz', 'Description', 'communication'}
0.24956428841790818
Cosine Similarity Score

In [41]:
import spacy
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
import nltk

# Download required NLTK data
nltk.download('stopwords')
nltk.download('wordnet')

# Load the SpaCy model (adjust path to your model)
nlp = spacy.load("/content/drive/MyDrive/FYP/Implementation/spacy_output/model-best")

# Initialize text processing components
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Define preprocessing function
def preprocess_text(text):
    # Text cleaning: lowercase, remove punctuation and extra whitespace
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    text = ' '.join(text.split())

    # Tokenization
    tokens = text.split()

    # Stop word removal
    tokens = [word for word in tokens if word not in stop_words]

    # Stemming and lemmatization
    tokens = [stemmer.stem(word) for word in tokens]
    tokens = [lemmatizer.lemmatize(word) for word in tokens]

    # Rejoin tokens to a single string
    return ' '.join(tokens)

# Preprocess resume and job description
resume_text_processed = preprocess_text(resume_text)
job_description_processed = preprocess_text(job_desc.iloc[0, 'Job Desc'])

# Extract skills from resume using NER
resume_doc = nlp(resume_text)
skills = [ent.text for ent in resume_doc.ents if ent.label_ == 'SKILL']

# Define function to compute cosine similarity
def compute_similarity(text1, text2):
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([text1, text2])
    return cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

# Similarity calculation
if len(skills) >= 5:  # if sufficient skills are detected
    skills_text = ' '.join(skills)
    print(skills_text)
    similarity_score = compute_similarity(skills_text, job_description_processed)
else:
    # Use the entire resume and job description text if skills are insufficient
    similarity_score = compute_similarity(resume_text_processed, job_description_processed)

similarity_score_skill = compute_similarity(skills_text, job_description_processed)
similarity_score_overall = compute_similarity(resume_text_processed, job_description_processed)

similarity_score = similarity_score_skill * 0.3 + similarity_score_overall * 0.7

print(f"Cosine Similarity Score: {similarity_score * 100:.2f}%")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


qing chew java spring sofaboot javascript bootstrap vuejs mysql javascript reactjs css bootstrap python pandas scikitlearn spring boot spring mvc hibernate bootstrap java python html css javascript mysql oraclesql nodejs vuejs bootstrap tailwind css git github jira tableau excel
Cosine Similarity Score: 14.66%
