In this notebook I use the cleaned resume and job datasets prepared in EDA_LN_resumes.ipynb and EDA_jobs.ipynb, respectively.

Note that the resumes span 25 categories, whereas the jobs are all about data science and machine learning. The recommender therefore works best for resumes of people in these fields.

# Import libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load data

In [2]:
df_resume = pd.read_csv('/content/drive/MyDrive/UNIPI AIDE/1° anno/Data mining & ML/Project/Data/resume_cleaned.csv')
df_job = pd.read_csv('/content/drive/MyDrive/UNIPI AIDE/1° anno/Data mining & ML/Project/Job Recommender/job_cleaned.csv')

In [3]:
df_resume = df_resume.reset_index().rename(columns = {'index': 'userID'})

# Similarity matrix

Build a cosine similarity matrix from the TF-IDF vectorization of the resumes and the jobs.

Merge all the relevant text columns into a single 'data' column, then fit the TF-IDF vectorizer on the job texts (excluding English stop words) and use it to transform the resumes.

In [4]:
# Resume columns to merge into a single text field
columns_to_merge = ['category', 'experience', 'position', 'city', 'country', 'skills']
df_resume['data'] = df_resume[columns_to_merge].apply(lambda x: ' '.join(x.astype(str)), axis=1)

# Job columns to merge into a single text field
columns_to_merge = ['category', 'position', 'company', 'description', 'city', 'country', 'skills']
df_job['data'] = df_job[columns_to_merge].apply(lambda x: ' '.join(x.astype(str)), axis=1)

In [5]:
# TF-IDF vectorization
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_job = vectorizer.fit_transform(df_job['data'])
tfidf_resume = vectorizer.transform(df_resume['data'])

In [6]:
# Compute cosine similarity
similarity_matrix = cosine_similarity(tfidf_resume, tfidf_job)
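
To make the shape and meaning of the similarity matrix concrete, here is a minimal sketch on toy data. The corpora and the `toy_*` names are hypothetical and not taken from the project files; only the fit-on-jobs, transform-resumes pattern mirrors the cells above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpora, not taken from the project data
toy_jobs = [
    'data scientist python machine learning',
    'construction manager data center',
    'salesforce data analyst sql',
]
toy_resumes = [
    'python machine learning engineer',
    'project engineer construction',
]

toy_vectorizer = TfidfVectorizer(stop_words='english')
# The vocabulary is learned from the jobs only
toy_tfidf_job = toy_vectorizer.fit_transform(toy_jobs)
# Resume words that never appear in the jobs (e.g. 'engineer') are ignored
toy_tfidf_resume = toy_vectorizer.transform(toy_resumes)

# One row per resume, one column per job
toy_similarity = cosine_similarity(toy_tfidf_resume, toy_tfidf_job)

print(toy_similarity.shape)           # (number of resumes, number of jobs)
print(toy_similarity.argmax(axis=1))  # position of the best-matching job for each resume
```

Because the vectorizer is fitted on the jobs, any resume term outside the job vocabulary contributes nothing to the score.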

# Recommend jobs for a given resumeID

Change resumeID (valid values: 0-1225) to get recommendations for a different resume.

In [7]:
# Recommend the top_n most similar jobs for a given resumeID
def recommend_jobs(resumeID, top_n=10):
    # Row of the similarity matrix that corresponds to this resume
    resume_index = df_resume.index[df_resume['userID'] == resumeID][0]
    similarity_scores = similarity_matrix[resume_index]
    # Indices of the top_n jobs, sorted by decreasing similarity
    top_job_indices = np.argsort(similarity_scores)[::-1][:top_n]
    # Pair each jobID with its similarity score
    recommended_jobs = [(df_job.loc[job_index, 'jobID'], similarity_scores[job_index]) for job_index in top_job_indices]
    return recommended_jobs

In [8]:
# Usage
resumeID = 671   #'your_resumeID_here'
recommended_jobs = recommend_jobs(resumeID)
# Print recommended jobs in columns
for jobID, similarity_score in recommended_jobs:
    print(f"Job ID: {jobID}  |  Similarity Score: {similarity_score}")

Job ID: 2773  |  Similarity Score: 0.7112202723031775
Job ID: 5261  |  Similarity Score: 0.6874799637722402
Job ID: 4870  |  Similarity Score: 0.6733493099956491
Job ID: 2785  |  Similarity Score: 0.6678861057515992
Job ID: 2762  |  Similarity Score: 0.6601667005737827
Job ID: 2784  |  Similarity Score: 0.656028675106282
Job ID: 2826  |  Similarity Score: 0.6542353960549553
Job ID: 2825  |  Similarity Score: 0.6410705838242848
Job ID: 10211  |  Similarity Score: 0.6258437422685513
Job ID: 9698  |  Similarity Score: 0.6240995670733719


# Find the best resume-job match

In [11]:
def find_best_match(top_n=10):
    best_match_score = -1  # Initialize with a value that ensures any actual score will be higher
    best_match_resumeID = None
    best_match_jobID = None

    for resumeID in df_resume['userID']:
        resume_index = df_resume.index[df_resume['userID'] == resumeID][0]
        similarity_scores = similarity_matrix[resume_index]
        top_job_indices = np.argsort(similarity_scores)[::-1][:top_n] # Get indices of top N similar jobs

        for job_index in top_job_indices:
            match_score = similarity_scores[job_index]
            if match_score > best_match_score:
                best_match_score = match_score
                best_match_resumeID = resumeID
                best_match_jobID = df_job.loc[job_index, 'jobID']

    return best_match_resumeID, best_match_jobID, best_match_score

# Find the best resume-job match
best_resumeID, best_jobID, best_score = find_best_match()
print("Best Resume ID:", best_resumeID)
print("Best Job ID:", best_jobID)
print("Best Matching Score:", best_score)


Best Resume ID: 671
Best Job ID: 2773
Best Matching Score: 0.7112202723031775
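
The loop above scans every resume row, but the global maximum of the matrix can also be located directly. The following is a sketch on a hypothetical toy matrix, not a replacement for the function above. It returns row and column positions, which would still have to be mapped to `userID` and `jobID`, for example with `.iloc` on the corresponding columns.

```python
import numpy as np

# Hypothetical 3 resumes x 4 jobs similarity matrix
toy_similarity = np.array([
    [0.10, 0.55, 0.20, 0.05],
    [0.30, 0.15, 0.71, 0.40],
    [0.25, 0.60, 0.35, 0.70],
])

# argmax works on the flattened matrix; unravel_index maps it back to (row, column)
best_resume_pos, best_job_pos = np.unravel_index(
    np.argmax(toy_similarity), toy_similarity.shape
)
best_score = toy_similarity[best_resume_pos, best_job_pos]

print(best_resume_pos, best_job_pos, best_score)
```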


In [12]:
def find_best_and_second_best_match(top_n=10):
    best_match_score = -1  # Initialize with a value that ensures any actual score will be higher
    second_best_match_score = -1  # Initialize with a value that ensures any actual score will be higher
    best_match_resumeID = None
    best_match_jobID = None
    second_best_match_resumeID = None
    second_best_match_jobID = None

    for resumeID in df_resume['userID']:
        resume_index = df_resume.index[df_resume['userID'] == resumeID][0]
        similarity_scores = similarity_matrix[resume_index]
        top_job_indices = np.argsort(similarity_scores)[::-1][:top_n] # Get indices of top N similar jobs

        for job_index in top_job_indices:
            match_score = similarity_scores[job_index]
            if match_score > best_match_score:
                # Update second best match
                second_best_match_score = best_match_score
                second_best_match_resumeID = best_match_resumeID
                second_best_match_jobID = best_match_jobID

                # Update best match
                best_match_score = match_score
                best_match_resumeID = resumeID
                best_match_jobID = df_job.loc[job_index, 'jobID']
            elif match_score > second_best_match_score:
                # Update second best match only
                second_best_match_score = match_score
                second_best_match_resumeID = resumeID
                second_best_match_jobID = df_job.loc[job_index, 'jobID']

    return (best_match_resumeID, best_match_jobID, best_match_score), (second_best_match_resumeID, second_best_match_jobID, second_best_match_score)

# Find the best and second-best resume-job matches
best_match, second_best_match = find_best_and_second_best_match()
print("Best Match:")
print("  Resume ID:", best_match[0])
print("  Job ID:", best_match[1])
print("  Matching Score:", best_match[2])
print("Second Best Match:")
print("  Resume ID:", second_best_match[0])
print("  Job ID:", second_best_match[1])
print("  Matching Score:", second_best_match[2])


Best Match:
  Resume ID: 671
  Job ID: 2773
  Matching Score: 0.7112202723031775
Second Best Match:
  Resume ID: 649
  Job ID: 2773
  Matching Score: 0.7014699803483111
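
Tracking the best and second-best match by hand does not generalize easily to the top k pairs. One possible alternative, sketched here on a hypothetical toy matrix, sorts the flattened matrix once. The helper name `top_k_pairs` is an assumption for illustration. With distinct scores, `k=2` should give the same two pairs as the function above.

```python
import numpy as np

def top_k_pairs(similarity, k=2):
    """Return the k highest-scoring (row, column, score) triples of a 2-D matrix."""
    # Sort all entries of the flattened matrix in decreasing order and keep k
    flat_order = np.argsort(similarity, axis=None)[::-1][:k]
    # Convert flat positions back to (row, column) positions
    rows, cols = np.unravel_index(flat_order, similarity.shape)
    return [(int(r), int(c), float(similarity[r, c])) for r, c in zip(rows, cols)]

# Hypothetical 3 resumes x 4 jobs similarity matrix
toy_similarity = np.array([
    [0.10, 0.55, 0.20, 0.05],
    [0.30, 0.15, 0.71, 0.40],
    [0.25, 0.60, 0.35, 0.70],
])

pairs = top_k_pairs(toy_similarity, k=2)
for resume_pos, job_pos, score in pairs:
    print(resume_pos, job_pos, score)
```

For a very large matrix a full sort may be wasteful; `np.argpartition` is a possible refinement, not shown here.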


Check the results

In [13]:
df_resume[df_resume['userID'] == 671]

Unnamed: 0,userID,category,experience,position,location,skills,exp_length,pos_length,city,country,region,population,population_category,data
671,671,Building & Construction,Project EngineerCompany NameBuilding Construct...,Project Engineer at Building Construction Co.,"Kamrup, Assam, India","Construction, Project Management, Microsoft Of...",7,6,Kamrup,India,Assam,1,1,Building & Construction Project EngineerCompan...


In [14]:
df_resume[df_resume['userID'] == 649]

Unnamed: 0,userID,category,experience,position,location,skills,exp_length,pos_length,city,country,region,population,population_category,data
649,649,Building & Construction,Building Construction servicesCompany NameBuil...,Building Construction services at Building Co...,"Panipat, Haryana, India",,6,7,Panipat,India,Haryana,1,1,Building & Construction Building Construction ...


In [15]:
df_job[df_job['jobID'] == 2773]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
2772,2773,"Sr. Construction Manager, MLZ Data Center Cons...",Amazon Web Services (AWS),"Portage, IN",United States,Civil Engineer,Mid senior,Onsite,"Construction Management, Architectural Enginee...",Description\nAs our Data Center Construction M...,613,7,"Civil Engineer Sr. Construction Manager, MLZ D..."


In [16]:
df_job[df_job['jobID'] == 5261]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
5260,5261,"Construction Manager, Data Center Construction",Amazon Web Services (AWS),"Culpeper, VA",United States,Building Consultant,Mid senior,Onsite,"Data Center Construction, Electrical Engineeri...",Description\nAs a Data Center Construction Man...,741,5,"Building Consultant Construction Manager, Data..."


In [17]:
df_job[df_job['jobID'] == 4870]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
4869,4870,Data Center Construction Project Manager,Element Critical,"Austin, TX",United States,Building Consultant,Mid senior,Onsite,"Construction project management, Construction ...",Element Critical provides hybrid infrastructur...,1191,5,Building Consultant Data Center Construction P...
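
The one-by-one lookups above could be collapsed into a single table of recommendations. The sketch below uses hypothetical stand-ins for `df_job` and for one row of `similarity_matrix`. The helper `recommendations_table` is an assumption for illustration and is not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df_job and one row of similarity_matrix
toy_jobs = pd.DataFrame({
    'jobID': [101, 102, 103, 104],
    'position': ['Data Scientist', 'Construction Manager', 'Data Analyst', 'ML Engineer'],
    'company': ['A', 'B', 'C', 'D'],
})
toy_scores = np.array([0.20, 0.71, 0.05, 0.33])

def recommendations_table(jobs, scores, top_n=3, columns=('jobID', 'position', 'company')):
    """Return the top_n jobs as a DataFrame with an extra 'similarity' column."""
    # Positions of the top_n scores, highest first
    top_positions = np.argsort(scores)[::-1][:top_n]
    # iloc selects by position, so this does not depend on the frame's index labels
    table = jobs.iloc[top_positions][list(columns)].copy()
    table['similarity'] = scores[top_positions]
    return table.reset_index(drop=True)

table = recommendations_table(toy_jobs, toy_scores)
print(table)
```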


# Recommend jobs given an input text

Given an arbitrary input text, the function below returns the recommended jobs.

Just change the input text to reflect your resume!

In [18]:
def recommend_jobs_input_text(input_text, top_n=10):
    # Vectorize input text
    input_vector = vectorizer.transform([input_text])

    # Calculate similarity scores
    similarity_scores = cosine_similarity(input_vector, tfidf_job)

    # Get indices of top N similar jobs
    top_job_indices = np.argsort(similarity_scores[0])[::-1][:top_n]

    # Retrieve recommended jobs and their similarity scores
    recommended_jobs = [(df_job.loc[job_index, 'jobID'], similarity_scores[0][job_index]) for job_index in top_job_indices]

    return recommended_jobs

In [19]:
# Usage
input_text = 'salesforce marketing specialist Austin'
#"Data Scientist with experience in machine learning and Python"
recommended_jobs = recommend_jobs_input_text(input_text)
for jobID, similarity_score in recommended_jobs:
    print("Job ID:", jobID)
    print("Similarity Score:", similarity_score)
    print()  # Add an empty line for readability

Job ID: 9152
Similarity Score: 0.48669903993748986

Job ID: 6142
Similarity Score: 0.4556212829433865

Job ID: 6167
Similarity Score: 0.44779379150679705

Job ID: 6174
Similarity Score: 0.43757878787944016

Job ID: 9927
Similarity Score: 0.39501073426325073

Job ID: 9854
Similarity Score: 0.39320313977824656

Job ID: 9677
Similarity Score: 0.38812913721984266

Job ID: 9262
Similarity Score: 0.3855021676172077

Job ID: 2960
Similarity Score: 0.38055290494324656

Job ID: 5107
Similarity Score: 0.37680809131915693



Check the results

In [20]:
df_job[df_job['jobID'] == 6142]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
6141,6142,Salesforce Data Analyst,Extend Information Systems Inc.,"New York, NY",United States,Data Communications Analyst,Associate,Onsite,"Salesforce, Data Analytics, Reporting, Data Sc...","Hi Jobseekers,\nI hope you are doing well!\nWe...",508,3,Data Communications Analyst Salesforce Data An...


In [21]:
df_job[df_job['jobID'] == 9152]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
9150,9152,Salesforce Data Architect,VeeAR Projects Inc.,"Austin, TX",United States,Architect,Associate,Onsite,"Salesforce, Data modeling, Core Data Managemen...",Job Description:\nThe Salesforce Ascent progra...,150,3,Architect Salesforce Data Architect VeeAR Proj...


In [22]:
df_job[df_job['jobID'] == 6167]

Unnamed: 0,jobID,position,company,city,country,category,job_level,job_type,skills,description,desc_length,pos_length,data
6166,6167,"Looking for Salesforce Data Analyst -New York,...",Extend Information Systems Inc.,"New York, NY",United States,Data Communications Analyst,Associate,Onsite,"Salesforce Analytics, Data Science, SQL Server...","Hi,\nI hope you are doing well!\nWe have an op...",490,11,Data Communications Analyst Looking for Salesf...


As expected, the similarity scores are not very high: the vectorizer is fitted on jobs that reflect a specific field/category, whereas the resumes are more varied.
Even so, the model is able to recommend suitable jobs.
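
One way to see why scores stay moderate is to check how many tokens of an input text the fitted vectorizer actually knows. The following is a sketch on a hypothetical toy job corpus. The helper `vocabulary_coverage` is an assumption for illustration; `build_analyzer()` and `vocabulary_` are standard `TfidfVectorizer` members.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy job corpus, not taken from the project data
toy_jobs = [
    'salesforce data analyst sql',
    'data scientist python machine learning',
]
toy_vectorizer = TfidfVectorizer(stop_words='english').fit(toy_jobs)

def vocabulary_coverage(vectorizer, text):
    """Split the tokens of `text` into those the fitted vectorizer knows and those it ignores."""
    # The analyzer applies the same lowercasing, tokenization and stop-word removal as transform()
    tokens = vectorizer.build_analyzer()(text)
    known = [t for t in tokens if t in vectorizer.vocabulary_]
    unknown = [t for t in tokens if t not in vectorizer.vocabulary_]
    return known, unknown

known, unknown = vocabulary_coverage(toy_vectorizer, 'salesforce marketing specialist Austin')
print('known:', known)
print('ignored:', unknown)
```

Tokens in the ignored list contribute nothing to the input vector, so only the known tokens can drive the similarity with a job.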