# 💼 Job Recommender

## 📌 Introduction

The job recommender system suggests relevant jobs based on various criteria to enhance user experience and job discovery.

### 🔍 Recommendation Criteria:
- **Location-based Trends**: Identify popular locations among job seekers and creators.  
- **Similar Job Titles/Descriptions**: Recommend jobs based on job title and description similarity.  
- **User Profile Matching**: Suggest jobs based on profiles of similar users.  


## 🖥️ Code Start

In [None]:
%matplotlib inline
import ast
import warnings
import zipfile
from ast import literal_eval

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Silence library warnings to keep the notebook output readable.
warnings.simplefilter('ignore')

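### Packages we import and why:

* **pyplot** from matplotlib: the plotting interface used to draw figures and charts.
* **seaborn**: statistical visualization built on top of matplotlib, used for histograms and other charts.
* **pandas**: DataFrames for loading and manipulating the tabular data.
* **numpy**: numerical arrays and mathematical operations.
* **ast**: Python's abstract syntax tree module. Its `literal_eval` function safely parses a string containing a Python literal (list, dict, number, and so on) into the corresponding object.
* **scipy**: scientific computing library. Its `stats` submodule provides statistical functions and distributions.
* **sklearn**:

    - TfidfVectorizer: converts text documents into a matrix of TF-IDF features, weighting each word by how important it is to a document relative to the whole corpus.
    - CountVectorizer: converts text documents into a matrix of raw token counts.
    - linear_kernel: computes the dot product between vectors. On L2-normalized vectors (the TfidfVectorizer default) this equals cosine similarity.
    - cosine_similarity: computes the cosine of the angle between vectors, a similarity measure that does not depend on vector length.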
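To make the roles of `TfidfVectorizer`, `linear_kernel`, and `cosine_similarity` concrete, here is a minimal sketch on three made-up job titles. The titles are illustrative assumptions, not rows from the dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical job titles, used only for illustration.
titles = [
    "senior python developer",
    "junior python developer",
    "registered nurse",
]

# Each row of the matrix is the TF-IDF vector of one title.
# TfidfVectorizer L2-normalizes rows by default (norm="l2").
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(titles)

# Pairwise similarity between all titles, computed two ways.
cos = cosine_similarity(matrix)
lin = linear_kernel(matrix, matrix)

# On unit-length rows the plain dot product equals cosine similarity.
print(np.allclose(cos, lin))

# The two developer titles share words; the nurse title shares none with them.
print(cos[0, 1] > cos[0, 2])
```

Because the rows are already normalized, `linear_kernel` gives the same result as `cosine_similarity` while skipping the normalization step.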

In [None]:
zip_file_path = "/kaggle/input/job-recommendation/jobs.zip"

# Extract the archive into the current working directory.
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall()

**We use the previously imported `zipfile` module to extract `jobs.tsv` from the input dataset, where it is stored inside a zip archive. The extracted file is then read from the working directory.**
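The extraction step can be tried in isolation with the sketch below. It builds a small throwaway archive in a temporary directory, so the file names `demo.zip` and `demo.tsv` and their contents are assumptions made for illustration, not part of the dataset.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "demo.zip"  # hypothetical archive name

    # Create a tiny archive holding one tab-separated file.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo.tsv", "JobID\tTitle\n1\tNurse\n")

    # List the members first, then extract them into a target folder.
    with zipfile.ZipFile(archive, "r") as zf:
        members = zf.namelist()
        zf.extractall(tmp_path / "out")

    extracted_text = (tmp_path / "out" / "demo.tsv").read_text()

print(members)
print(extracted_text)
```

Calling `namelist()` before `extractall()` is a cheap way to confirm which files the archive holds and where they will land.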

In [None]:
apps = pd.read_csv('/kaggle/input/job-recommendation/apps.tsv', delimiter='\t', encoding='utf-8')
user_history = pd.read_csv('/kaggle/input/job-recommendation/user_history.tsv', delimiter='\t', encoding='utf-8')
jobs = pd.read_csv('/kaggle/working/jobs.tsv', delimiter='\t', encoding='utf-8', on_bad_lines="skip")
users = pd.read_csv('/kaggle/input/job-recommendation/users.tsv', delimiter='\t', encoding='utf-8')
test_users = pd.read_csv('/kaggle/input/job-recommendation/test_users.tsv', delimiter='\t', encoding='utf-8')

All the data is now loaded into the DataFrames `apps`, `user_history`, `jobs`, `users`, and `test_users`.

In [None]:
apps.head()

In [None]:
shape=apps.shape
columns=apps.columns
print(shape)
print(columns)

In [None]:
user_history.head()

In [None]:
shape=user_history.shape
columns=user_history.columns
print(shape)
print(columns)

In [None]:
jobs.head()

In [None]:
shape=jobs.shape
columns=jobs.columns
print(shape)
print(columns)

In [None]:
users.head()

In [None]:
shape=users.shape
columns=users.columns
print(shape)
print(columns)

In [None]:
test_users.head()

In [None]:
shape=test_users.shape
columns=test_users.columns
print(shape)
print(columns)
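The shape and column checks above repeat the same steps for every frame, so they could also be written as one loop. This is a minimal sketch on two toy frames: the `UserID` and `JobID` column names and all values are assumptions for illustration, and only `Split` is known from the notebook.

```python
import pandas as pd

# Toy stand-ins for the real frames; contents are hypothetical.
frames = {
    "apps": pd.DataFrame(
        {"UserID": [1, 2], "JobID": [10, 20], "Split": ["Train", "Test"]}
    ),
    "users": pd.DataFrame(
        {"UserID": [1, 2, 3], "Split": ["Train", "Train", "Test"]}
    ),
}

# Collect (shape, column names) per frame and print them in one pass.
summary = {}
for name, frame in frames.items():
    summary[name] = (frame.shape, list(frame.columns))
    print(name, frame.shape, list(frame.columns))
```

With the real data, the dictionary would map each name to the corresponding loaded DataFrame.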

## 📊 EDA and Data Cleaning

### 🔍 Understanding the Data

Inspecting the columns printed above shows that **three** data frames contain a `'Split'` column. Its value marks each row as either **training** (`Train`) or **testing** (`Test`), so we split each of these data frames into separate training and test data frames.

### 📂 Identified Data Frames:
1. **apps**  
2. **user_history**  
3. **users**  


In [None]:
apps_training=apps.loc[apps['Split']=='Train']
apps_test=apps.loc[apps['Split']=='Test']

In [None]:
user_history_training=user_history.loc[user_history['Split']=='Train']
user_history_test=user_history.loc[user_history['Split']=='Test']

In [None]:
users_training=users.loc[users['Split']=='Train']
users_test=users.loc[users['Split']=='Test']
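The same filter is applied to each frame, so it could be wrapped in a small helper. The sketch below is one possible way to do this: `split_train_test` is a hypothetical function name, and the toy frame with its `UserID` column is an assumption for illustration.

```python
import pandas as pd


def split_train_test(frame, column="Split"):
    """Return (train, test) subsets of frame based on the given column."""
    train = frame.loc[frame[column] == "Train"]
    test = frame.loc[frame[column] == "Test"]
    return train, test


# Toy frame standing in for apps, user_history, or users.
demo = pd.DataFrame(
    {"UserID": [1, 2, 3, 4], "Split": ["Train", "Test", "Train", "Test"]}
)

demo_training, demo_test = split_train_test(demo)
print(demo_training)
print(demo_test)
```

Binding each subset to its own name leaves the original frame intact, so both the training and the test rows stay available.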