# Job Recommendation System

A simple job recommendation system built on TF-IDF and cosine similarity. The notebook loads job data from a CSV file, joins the descriptive columns (job title, key skills, role category, functional area, and industry) into one text field per job, and applies TF-IDF vectorization to turn that text into a numerical representation. The `get_recommendations` function takes a job title as input, computes its cosine similarity to every job in the dataset, and returns the 5 most similar jobs (configurable through `top_n`) together with their key skills and salaries.
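
To make the pipeline concrete, here is a minimal sketch on a three-row toy corpus. The job strings and the `toy_*` names are invented for illustration and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the combined job text
toy_jobs = [
    "Sales Executive retail sales customer service",
    "Software Engineer python java backend",
    "Testing Engineer manual testing test cases",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_jobs)  # one row per job, one column per term

# Vectorize the query with the same vocabulary and score it against every job
query_vec = toy_tfidf.transform(["retail sales"])
scores = cosine_similarity(query_vec, toy_matrix).flatten()

best = scores.argmax()  # position of the most similar toy job
print(toy_jobs[best], scores)
```

Only the first toy job shares any terms with the query, so it is the only one with a non-zero score.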

Dataset: https://statso.io/jobs-dataset/

Example Solution: https://thecleverprogrammer.com/2022/12/12/job-recommendation-system-using-python/

Hugging Face: https://huggingface.co/spaces/alperugurcan/job-recommendation

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
df = pd.read_csv('jobs.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Job Salary,Job Experience Required,Key Skills,Role Category,Functional Area,Industry,Job Title
0,0,Not Disclosed by Recruiter,5 - 10 yrs,Media Planning| Digital Media,Advertising,"Marketing , Advertising , MR , PR , Media Plan...","Advertising, PR, MR, Event Management",Media Planning Executive/Manager
1,1,Not Disclosed by Recruiter,2 - 5 yrs,pre sales| closing| software knowledge| clien...,Retail Sales,"Sales , Retail , Business Development","IT-Software, Software Services",Sales Executive/Officer
2,2,Not Disclosed by Recruiter,0 - 1 yrs,Computer science| Fabrication| Quality check|...,R&D,"Engineering Design , R&D","Recruitment, Staffing",R&D Executive
3,3,"2,00,000 - 4,00,000 PA.",0 - 5 yrs,Technical Support,Admin/Maintenance/Security/Datawarehousing,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Technical Support Engineer
4,4,Not Disclosed by Recruiter,2 - 5 yrs,manual testing| test engineering| test cases|...,Programming & Design,IT Software - QA & Testing,"IT-Software, Software Services",Testing Engineer


In [3]:
df.drop(columns=['Unnamed: 0'], inplace=True)  # Remove the 'Unnamed: 0' column from the DataFrame
# Create a new column by joining the descriptive text columns with spaces
df['combined'] = (df['Job Title'] + ' ' + df['Key Skills'] + ' ' + df['Role Category']
                  + ' ' + df['Functional Area'] + ' ' + df['Industry'])
df.head()  # Display the first few rows of the updated DataFrame

Unnamed: 0,Job Salary,Job Experience Required,Key Skills,Role Category,Functional Area,Industry,Job Title,combined
0,Not Disclosed by Recruiter,5 - 10 yrs,Media Planning| Digital Media,Advertising,"Marketing , Advertising , MR , PR , Media Plan...","Advertising, PR, MR, Event Management",Media Planning Executive/Manager,Media Planning Executive/Manager Media Planni...
1,Not Disclosed by Recruiter,2 - 5 yrs,pre sales| closing| software knowledge| clien...,Retail Sales,"Sales , Retail , Business Development","IT-Software, Software Services",Sales Executive/Officer,Sales Executive/Officer pre sales| closing| s...
2,Not Disclosed by Recruiter,0 - 1 yrs,Computer science| Fabrication| Quality check|...,R&D,"Engineering Design , R&D","Recruitment, Staffing",R&D Executive,R&D Executive Computer science| Fabrication| ...
3,"2,00,000 - 4,00,000 PA.",0 - 5 yrs,Technical Support,Admin/Maintenance/Security/Datawarehousing,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Technical Support Engineer,Technical Support Engineer Technical Support ...
4,Not Disclosed by Recruiter,2 - 5 yrs,manual testing| test engineering| test cases|...,Programming & Design,IT Software - QA & Testing,"IT-Software, Software Services",Testing Engineer,Testing Engineer manual testing| test enginee...
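
Concatenating string columns with `+` produces NaN for a row as soon as any one of its columns is missing, and `TfidfVectorizer` raises an error on NaN documents. Whether this dataset contains missing values is not shown here, so the following is only a defensive sketch on a small hypothetical frame (`demo` and `text_columns` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frame; the second row has a missing 'Key Skills' value
demo = pd.DataFrame({
    'Job Title': ['Sales Executive/Officer', 'R&D Executive'],
    'Key Skills': ['pre sales| closing', None],
    'Industry': ['IT-Software', 'Recruitment, Staffing'],
})

text_columns = ['Job Title', 'Key Skills', 'Industry']

# Replace missing values with empty strings, join each row with spaces,
# then collapse the double space left behind by an empty field
demo['combined'] = (
    demo[text_columns]
    .fillna('')
    .agg(' '.join, axis=1)
    .str.replace(r'\s+', ' ', regex=True)
    .str.strip()
)
print(demo['combined'].tolist())
```

The same pattern could be applied to `df` with the five columns used above, at the cost of keeping rows whose text is only partially filled in.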


In [4]:
tfidf = TfidfVectorizer(stop_words='english')  # Create a TF-IDF vectorizer, ignoring common English words
tfidf_matrix = tfidf.fit_transform(df['combined'])  # Apply TF-IDF to the 'combined' column, creating a TF-IDF matrix
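
With its default `norm='l2'` setting, `TfidfVectorizer` scales every non-empty row to unit length, so the cosine similarity between two rows reduces to their dot product. The sketch below checks this on a hypothetical toy corpus. `linear_kernel` is shown as an equivalent alternative to `cosine_similarity`, not as something this notebook uses.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy documents, invented for illustration
docs = [
    "Technical Support Engineer technical support",
    "Testing Engineer manual testing test cases",
    "Sales Executive retail sales",
]

vec = TfidfVectorizer(stop_words='english')
m = vec.fit_transform(docs)

# Euclidean length of each row of the sparse TF-IDF matrix
row_norms = np.sqrt(np.asarray(m.multiply(m).sum(axis=1)).ravel())

dot_products = linear_kernel(m, m)   # plain dot products between rows
cosines = cosine_similarity(m, m)    # normalizes first, then takes dot products

print(row_norms)
print(np.allclose(dot_products, cosines))
```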

In [5]:
def get_recommendations(job_title, top_n=5):
    job_vec = tfidf.transform([job_title])  # Transform the input job title into a TF-IDF vector

    cosine_similarities = cosine_similarity(job_vec, tfidf_matrix).flatten()  # Calculate cosine similarities between input job and all jobs
    
    related_docs_indices = cosine_similarities.argsort()[::-1][:top_n]  # Sort descending by similarity and keep the indices of the top N jobs
    
    return df.iloc[related_docs_indices][['Job Title', 'Key Skills', 'Job Salary']]  # Return DataFrame with top N similar jobs
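
If none of the words in the input appear in the fitted vocabulary, every similarity is 0 and the rows returned as "top N" are arbitrary. One possible variant, here called `recommend_with_scores` (a hypothetical name, not part of this notebook), returns the similarity scores and drops zero-similarity rows. The sketch builds its own small frame so that it runs standalone:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini dataset reusing the column names from above
jobs = pd.DataFrame({
    'Job Title': ['Retail Store Manager', 'Testing Engineer', 'Counter Sales'],
    'Key Skills': ['Customer Service| Sales| retail sales',
                   'manual testing| test cases',
                   'Retail Sales| Sales'],
    'Job Salary': ['Not Disclosed by Recruiter'] * 3,
})
jobs['combined'] = jobs['Job Title'] + ' ' + jobs['Key Skills']

vectorizer = TfidfVectorizer(stop_words='english')
matrix = vectorizer.fit_transform(jobs['combined'])

def recommend_with_scores(job_title, top_n=5):
    query = vectorizer.transform([job_title])
    scores = cosine_similarity(query, matrix).flatten()
    top = scores.argsort()[::-1][:top_n]  # positions sorted by descending score
    top = top[scores[top] > 0]            # keep only jobs that share a term with the query
    result = jobs.iloc[top][['Job Title', 'Key Skills', 'Job Salary']].copy()
    result['similarity'] = scores[top]
    return result

print(recommend_with_scores('retail sales'))
print(recommend_with_scores('astronaut'))  # no vocabulary overlap, so the result is empty
```

An empty result lets the caller show a "no matches" message instead of unrelated jobs.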

In [6]:
user_job = input("Type your job title: ")  # Prompt user to enter a job title
recommendations = get_recommendations(user_job)  # Get recommendations based on user input
print(recommendations)  # Print the recommendations

                                Job Title  \
19230                Retail Store Manager   
12789                       Counter Sales   
2961   Sales/Business Development Manager   
1380   Sales/Business Development Manager   
4656              Sales Executive/Officer   

                                            Key Skills  \
19230            Customer Service| Sales| retail sales   
12789                              Retail Sales| Sales   
2961    Sales| Retail Sales| selling| sales management   
1380    Retail Sales| Sales| selling| sales management   
4656    Retail Sales| Store Sales| Sales| Direct Sales   

                         Job Salary  
19230   Not Disclosed by Recruiter   
12789   Not Disclosed by Recruiter   
2961    Not Disclosed by Recruiter   
1380    Not Disclosed by Recruiter   
4656    Not Disclosed by Recruiter   
