# Bussiness Problem
### **To create a recommendation sysytem to identify jobs based on specific skills.**

# Data Understanding 
**Understanding each feature**

In [1]:
# Importing all neccessary libraries 
import pandas as pd
import numpy as np
import warnings

from sklearn.feature_extraction.text import TfidfVectorizer # To vectorize the documents
from sklearn.metrics.pairwise import cosine_similarity # To find similarity score between records

warnings.simplefilter("ignore")
pd.set_option("display.max_columns",None)

In [2]:
df = pd.read_csv("job_dataset.csv")
df.head(3)

Unnamed: 0,JobID,Title,ExperienceLevel,YearsOfExperience,Skills,Responsibilities,Keywords
0,NET-F-001,.NET Developer,Fresher,0-1,C#; VB.NET basics; .NET Framework; .NET Core f...,Assist in coding and debugging applications; L...,.NET; C#; ASP.NET MVC; Entity Framework; SQL S...
1,NET-F-002,.NET Developer,Fresher,0-1,C#; .NET Framework basics; ASP.NET; Razor; HTM...,Write simple C# programs under guidance; Suppo...,.NET; C#; ASP.NET MVC; Entity Framework; SQL S...
2,NET-F-003,.NET Developer,Fresher,0-1,C#; VB.NET basics; .NET Core; ASP.NET MVC; HTM...,Contribute to development of small modules; As...,.NET; C#; ASP.NET MVC; SQL Server; Entity Fram...


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1068 entries, 0 to 1067
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   JobID              1068 non-null   object
 1   Title              1067 non-null   object
 2   ExperienceLevel    1068 non-null   object
 3   YearsOfExperience  1068 non-null   object
 4   Skills             1068 non-null   object
 5   Responsibilities   1068 non-null   object
 6   Keywords           1068 non-null   object
dtypes: object(7)
memory usage: 58.5+ KB


`Observation` :   Title consists one null value 

In [4]:
# converting all columns names to lower case (Column name standardization)
df.columns = df.columns.str.lower()
df.columns

Index(['jobid', 'title', 'experiencelevel', 'yearsofexperience', 'skills',
       'responsibilities', 'keywords'],
      dtype='object')

In [5]:
df["jobid"].unique() # Drop

array(['NET-F-001', 'NET-F-002', 'NET-F-003', ..., 'WEB-E-018',
       'WEB-E-019', 'WEB-E-020'], shape=(1038,), dtype=object)

In [6]:
print("No of uniques values   " , df["title"].nunique())
print("------------------------------")
df["title"].value_counts()

No of uniques values    218
------------------------------


title
.NET Developer                 20
AI Prompt Engineer             20
AR/VR Developer                20
Business Analyst               20
Data Engineer                  20
                               ..
Senior Interaction Designer     1
UX Consultant                   1
UX Strategist                   1
UX Design Lead                  1
Staff UX Designer               1
Name: count, Length: 218, dtype: int64

In [7]:

print("No of unique values   " , df["experiencelevel"].nunique())
print("---------------------------")
df["experiencelevel"].value_counts()

No of unique values    11
---------------------------


experiencelevel
Experienced         476
Fresher             363
Entry-Level          66
Senior-Level         66
Mid-Level            60
Senior               15
Lead                  7
Junior                5
Mid-level             5
Mid-Senior Level      3
Mid-Senior            2
Name: count, dtype: int64

In [8]:
print("No of unique values   " , df["yearsofexperience"].nunique())
print("---------------------------")
df["yearsofexperience"].value_counts()

No of unique values    110
---------------------------


yearsofexperience
0-1           247
0–1 year      104
5+             49
0              48
2-5            35
             ... 
5–7 years       1
1–3 years       1
13-17           1
7–10 years      1
7-9             1
Name: count, Length: 110, dtype: int64

In [9]:
print("No of unique values   " , df["skills"].nunique())
print("---------------------------")
df["skills"].value_counts()

No of unique values    969
---------------------------


skills
Salesforce; HubSpot; Predictive Analytics; Lead Generation; Negotiation; Closing Techniques; Sales Automation; Social Selling; Data-driven Sales; Relationship Management               6
Advanced Technical Writing; Structured Authoring; API Documentation; Adobe FrameMaker; MadCap Flare; XML/HTML/CSS; CMS; AI-assisted Documentation; Version Control; UX Documentation    5
Salesforce; HubSpot; Lead Generation; Negotiation; Closing Deals; Pipeline Management; Product Demos; Sales Forecasting; Social Selling; Time Management                                5
Technical Writing; Editing; Proofreading; Adobe FrameMaker; MadCap Flare; CMS; HTML/CSS/XML; AI Writing Tools; Version Control; Agile Documentation                                     5
Clear Writing; Proofreading; Basic HTML/CSS; Research Skills; CMS Basics; Markdown; Collaboration; Time Management; Attention to Detail; Adaptability                                   4
                                                               

In [10]:
print("No of unique values   " , df["responsibilities"].nunique())
print("---------------------------")
df["responsibilities"].value_counts()

No of unique values    1047
---------------------------


responsibilities
Define and drive product strategy; Collaborate with engineering, design, and marketing teams; Analyze product metrics and conduct experiments; Lead agile ceremonies; Engage stakeholders and manage expectations; Mentor junior PMs                                                                                                                                                                                    3
Drive product vision and roadmap; Lead cross-functional teams; Oversee product analytics and experiments; Conduct user research; Ensure agile best practices; Manage stakeholder communication; Optimize product performance metrics                                                                                                                                                                                    3
Define and drive product strategy; Lead product roadmap execution; Collaborate across teams; Conduct user research and analyze metrics; Ensure agile best practices

In [11]:
print("No of unique values   " , df["keywords"].nunique()) # Drop--> these feture values are similar to skills feature
print("---------------------------")
df["keywords"].value_counts()

No of unique values    678
---------------------------


keywords
JavaScript; TypeScript; React; Node.js; Jest; Webpack; Redux; CI/CD; Docker                                                                                             15
Unity; Unreal Engine; C#; ARKit; ARCore; 3D Graphics; Spatial Computing; Performance Optimization; Cross-Platform; Machine Learning                                     15
C++; C#; Unity; Unreal Engine; Game AI; Shader Programming; Multiplayer; Performance Optimization                                                                       15
Product Strategy; Data-Driven Decisions; Roadmapping; Agile Leadership; Stakeholder Management; User-Centered Design; Growth Metrics; Cross-Functional Collaboration    15
Ethical Hacking; Penetration Testing; Network Security; Python; Linux; Windows Security; Metasploit; Burp Suite; Cryptography; Malware Analysis                         14
                                                                                                                                        

# Data Cleaning 

In [12]:
# droping missing values record (title of one job missing )
df.dropna(axis = 0, inplace = True, ignore_index = True)
# droping jobid and keywords features which are not important.
df.drop(["keywords","jobid"], axis = 1, inplace = True )

In [13]:
# Standardizing years of experience feature
for i in range(len(df)):
    df["yearsofexperience"][i] = df["yearsofexperience"][i].split()[0]
for i in range(len(df)):
    if df["yearsofexperience"][i].find("–") >= 0:
        df["yearsofexperience"][i] = df["yearsofexperience"][i].split("–")[0] + "-" + df["yearsofexperience"][i].split("–")[1]

In [14]:
# Standardizing experiencelevel feature
for i in range(len(df)):
    df["experiencelevel"][i] = df["experiencelevel"][i].lower().split(" ")[0]
for i in range(len(df)):
    if "-level" in df["experiencelevel"][i]:
        df["experiencelevel"][i] = df["experiencelevel"][i].split("-level")[0]

In [15]:
# Replacing semi-colon(";") with space(" ") for easy vectorization
df["skills"] = df["skills"].str.replace("; ", " ")  

In [16]:
# Appending experience level and job title to the skills, so that it may become key(important) for the similarity score.
df["combined_features"] = np.nan # creating an empty column
for i in range(len(df)):
    df["combined_features"][i] = df["skills"][i] + " " + df["experiencelevel"][i] + " " + df["title"][i]

In [17]:
df["combined_features"][0] # cross checking single record 

'C# VB.NET basics .NET Framework .NET Core fundamentals ASP.NET MVC HTML CSS JavaScript basics SQL Server Entity Framework basics LINQ Visual Studio Git Unit Testing basics fresher .NET Developer'

### Tf-idf Vectorizer

In [18]:
# TF-IDF Vectorizer initialization
regex = r"[A-Za-z0-9+#/.]+"   # custom regex (pattern) to vectorize 
vectorizer = TfidfVectorizer(token_pattern = regex)

# fit the vectorizer to skills feature
features_vector = vectorizer.fit_transform(df["combined_features"]).toarray()

# Similarity score
similarity = cosine_similarity(features_vector)

In [19]:
vectorizer.get_feature_names_out()

array(['.net', '/', '27001', ..., 'yarn', 'zero', 'zigbee'],
      shape=(1014,), dtype=object)

In [20]:
print(pd.DataFrame(features_vector,columns = vectorizer.get_feature_names_out()).shape)
print(similarity.shape)

(1067, 1014)
(1067, 1067)


In [21]:
similarity

array([[1.        , 0.73962923, 0.77616339, ..., 0.07063744, 0.03603762,
        0.06731921],
       [0.73962923, 1.        , 0.59269215, ..., 0.06000814, 0.04461931,
        0.05718923],
       [0.77616339, 0.59269215, 1.        , ..., 0.07504299, 0.05579854,
        0.07151781],
       ...,
       [0.07063744, 0.06000814, 0.07504299, ..., 1.        , 0.42212184,
        0.88470552],
       [0.03603762, 0.04461931, 0.05579854, ..., 0.42212184, 1.        ,
        0.40229247],
       [0.06731921, 0.05718923, 0.07151781, ..., 0.88470552, 0.40229247,
        1.        ]], shape=(1067, 1067))

## Recommendation Function

In [22]:
def recomender():
    
    skills = input("Enter your skills separated by space : ")
    while True :  
        try :
            top_n_jobs = int(input("Enter how many recomendations do you want : "))
            if top_n_jobs > 0:
                break
        except  ValueError :
            print("Enter a positive number greater than 0")
    vec = vectorizer.transform([skills])
    sim_score = cosine_similarity(vec, features_vector).flatten()
    sorted_indices = np.argsort(sim_score)[::-1]
    
    print(" "*60 + "Recommendations based on your skills.")
    print(' '*60 + len(f"Recommendation based on your skills.") * "=")
    print()
    j = 1
    for job in sorted_indices[1:top_n_jobs+1]:
        
        print(f"{" "*75}**{j}**")
        print(f"★ Job Title :  {df["title"][job]}")
        print(f"{len("★ Job Title :  ") * "="}{len(df["title"][job]) * "="}")
        print("")
        print(f"    ➤ SKILLS     : {df["skills"][job].replace(" ",", ").replace(",,",",").replace(":," , ":")}.")
        print("")
        print(f"    ➤ EXPERIENCE : {df["experiencelevel"][job]} with an experience of {df["yearsofexperience"][job]} years.")
        print("")
        print(f"    ➤ RESPONSIBILITIES ")
        print(f"    {len("➤ RESPONSIBILITIES ") * "-"}")
        for i in range(len(df["responsibilities"][job].split(";"))) :
            print(f"        → {df["responsibilities"][job].split(";")[i].strip()}.")
        print("*" * 200)
        j+= 1

### **Recommendation system is ready**
**Enter your skills & number of recommendations you want to view job roles.**

In [23]:
# Run this command to enter skills and prefered number of roles
recomender()

Enter your skills separated by space :  python seaborn matplotlib ml
Enter how many recomendations do you want :  3


                                                            Recommendations based on your skills.

                                                                           **1**
★ Job Title :  ML Engineer Trainee

    ➤ SKILLS     : Python, R, Java, ML, fundamentals: supervised/unsupervised, learning, classification, regression, clustering, Data, preprocessing, Feature, engineering, ML, libraries: Scikit-learn, TensorFlow, PyTorch, Keras, Statistics, and, probability, Visualization: Matplotlib, Seaborn, Git/GitHub.

    ➤ EXPERIENCE : fresher with an experience of 0-1 years.

    ➤ RESPONSIBILITIES 
    -------------------
        → Assist in developing ML algorithms.
        → Preprocess datasets and build visualization dashboards.
        → Gain experience in ML pipeline development.
        → Collaborate with team members.
        → Learn ML best practices.
*****************************************************************************************************************************