# **Project Title: Make Job Hunting Easier: LinkedIn Job Recommendation Based on Applicants' Description**

Team members: Shaoying Zheng, Zhongrui Ning, Xiao Pu


# **Overview**

Our project aims to match applicants with the most suitable jobs on LinkedIn, based on the information they provide: education, skills, preferred industry, expected salary, and so on. Ideally, the model we build will be adaptive: it should still return suitable matches when an applicant's details are incomplete.
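
One possible reading of "adaptive" is to score a job only on the fields the applicant actually filled in, so that a missing field neither helps nor hurts. The sketch below illustrates that idea with plain dictionaries. The function name `match_score`, the field names (`industry`, `skills`, `min_salary`, `salary`), and the equal weighting of fields are all assumptions made for illustration, not the model this project will end up using.

```python
def match_score(applicant, job):
    """Average score over the fields the applicant provided (hypothetical helper)."""
    # Ignore fields the applicant left empty (None)
    provided = {k: v for k, v in applicant.items() if v is not None}
    if not provided:
        return 0.0

    total = 0.0
    for field, wanted in provided.items():
        if field == "skills":
            # Partial credit: share of the applicant's skills the job asks for
            job_skills = set(job.get("skills", []))
            total += len(set(wanted) & job_skills) / len(wanted) if wanted else 0
        elif field == "min_salary":
            # Full credit when the job pays at least the requested salary
            total += 1 if job.get("salary", 0) >= wanted else 0
        else:
            # Exact match for categorical fields such as industry
            total += 1 if job.get(field) == wanted else 0
    return total / len(provided)


# Toy example: the applicant did not state a salary expectation
applicant = {"industry": "Tech", "skills": ["IT", "MGMT"], "min_salary": None}
job = {"industry": "Tech", "skills": ["IT"], "salary": 50000}
score = match_score(applicant, job)
print(score)
```

Because the salary field is skipped rather than counted as a mismatch, an applicant with an incomplete profile is still ranked on what they did provide.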

# **Motivation**

The job hunting process is daunting for college students, especially in today's rapidly evolving labor market, where new graduates face an overwhelming number of job postings. On job sites and apps like LinkedIn, many applicants spend considerable time filtering out irrelevant or unsuitable jobs, which leads to inefficiency and frustration. A smart, data-driven recommendation system that tailors results to each individual's profile would therefore make job hunting more personalized and efficient.

Here are the specific questions we aim to explore:
1. What are the most common skills listed in job postings across various industries?

   What we hope to learn: by identifying the most frequently mentioned skills, we hope to find "universal" skills that are in demand across industries.
2. How can job hunters with different backgrounds find suitable jobs?




# **Data Sources**

Source: LinkedIn Job Postings (2023 - 2024)
- A Snapshot Into the Current Job Market including company, jobs and mapping datasets.
https://www.kaggle.com/datasets/arshkon/linkedin-job-postingslo
This data source contains a nearly comprehensive record of 124,000+ job postings listed in 2023 and 2024. Each posting contains dozens of attributes for both postings and companies, including the title, job description, salary, location, application URL, and work type (remote, contract, etc.), in addition to separate files containing the benefits, skills, and industries associated with each posting.


# **Data description**

We use an ER diagram to describe the relationships between the dataframes and their columns.
1. companies
2. company_industries
3. employee_counts

![image.png](attachment:image.png)


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [2]:
# Import the CSV files and store them as pandas DataFrames
companies = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/company/companies.csv')
company_industries = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/company/company_industries.csv')
job_industries = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/jobs/job_industries.csv')
employee_counts = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/company/employee_counts.csv')
# benefits = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/jobs/benefits.csv')
skills = pd.read_csv('https://raw.githubusercontent.com/Koi4595/SI-618-FA-2024/refs/heads/main/Projectdata/mappings/skills.csv')


In [3]:
# Count the missing values in each DataFrame
print(companies.isnull().sum())
print(employee_counts.isnull().sum())
print(company_industries.isnull().sum())
print(job_industries.isnull().sum())
print(skills.isnull().sum())

company_id         0
name               1
description      297
company_size    2774
state             22
country            0
city               1
zip_code          28
address           22
url                0
dtype: int64
company_id        0
employee_count    0
follower_count    0
time_recorded     0
dtype: int64
company_id    0
industry      0
dtype: int64
job_id         0
industry_id    0
dtype: int64
skill_abr     0
skill_name    0
dtype: int64
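
The printed counts above run together without saying which table each block belongs to. As a minimal sketch, the hypothetical helper `missing_summary` below collects the same information into one labeled table and adds a percentage. The toy frames stand in for the real CSVs and their values are made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frames):
    """Return one row per (table, column) that has at least one missing value."""
    rows = []
    for table_name, df in frames.items():
        n_missing = df.isnull().sum()
        for column, count in n_missing.items():
            if count > 0:
                rows.append({
                    "table": table_name,
                    "column": column,
                    "n_missing": int(count),
                    "pct_missing": round(100 * int(count) / len(df), 2),
                })
    return pd.DataFrame(rows, columns=["table", "column", "n_missing", "pct_missing"])


# Toy frames (assumed values) standing in for the real data
toy_companies = pd.DataFrame({
    "company_id": [1, 2, 3, 4],
    "name": ["A", None, "C", "D"],
    "company_size": [1.0, np.nan, np.nan, 5.0],
})
toy_skills = pd.DataFrame({
    "skill_abr": ["IT", "MGMT"],
    "skill_name": ["Information Technology", "Management"],
})

summary = missing_summary({"companies": toy_companies, "skills": toy_skills})
print(summary)
```

On the real data the same call would take `{"companies": companies, "employee_counts": employee_counts, ...}` and list only the columns that need attention.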


# **Data Manipulation**

## Steps:
**1. Handle missing values:**

**2. Filter out rows:**

**3. Merge dataframes:**

**4. Create new columns:**

1. The data has been successfully imported and displayed in the notebook.
2. The datasets have been successfully merged.
3. The data is in a 'tidy' and usable format. In other words, you have created one dataset that can be used for analysis. This might involve cleaning the data, dealing with missing values, creating new columns based on existing data, etc.
4. The ability to perform some basic data manipulation techniques. This might include creating new columns, filtering out rows, or other techniques that you have learned in class.
5. The ability to use the tools we have learned in class to visualize your data in some appropriate way. This might include using libraries like matplotlib or seaborn to create plots, or using pandas to group data in a variety of ways.
6. You should demonstrate that you are able to produce a summary of the data, in a format that is easy to read and understand.


In [None]:
# 1. Handle missing values
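
One way this step could look for the `companies` table is sketched below. The column names come from the missing-value counts shown earlier. The specific choices (drop rows without a name, fill text fields with placeholders, keep `company_size` missing rather than guess) and the helper name `clean_companies` are assumptions for illustration, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def clean_companies(df):
    """Apply simple, explicit rules for missing values (illustrative choices)."""
    out = df.copy()
    # A company without a name cannot be shown to an applicant, so drop it
    out = out.dropna(subset=["name"])
    # Free-text field: an empty string is easier to process than NaN
    out["description"] = out["description"].fillna("")
    # Location fields: mark missing values explicitly
    for col in ["state", "city", "zip_code", "address"]:
        out[col] = out[col].fillna("Unknown")
    # Do not invent a size; keep it missing but use a nullable integer dtype
    out["company_size"] = out["company_size"].astype("Int64")
    return out


# Toy rows (assumed values) with the same columns as the real table
toy = pd.DataFrame({
    "company_id": [1, 2, 3],
    "name": ["Acme", None, "Globex"],
    "description": ["Makes things", "No name", None],
    "company_size": [3.0, 5.0, np.nan],
    "state": ["MI", "CA", None],
    "city": ["Ann Arbor", "LA", None],
    "zip_code": ["48104", None, None],
    "address": ["1 Main St", None, None],
})

cleaned = clean_companies(toy)
print(cleaned)
```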


In [None]:
# 2. Filtering out rows
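
A filtering rule that may be useful here is keeping only the most recent record per company in `employee_counts`. This is a sketch under two assumptions that the notebook has not verified yet: that a company can appear more than once in that table, and that a larger `time_recorded` value means a later snapshot. The helper name `latest_snapshot` and the toy values are made up.

```python
import pandas as pd


def latest_snapshot(df):
    """Keep the row with the largest time_recorded for each company_id."""
    ordered = df.sort_values("time_recorded")
    return ordered.drop_duplicates(subset="company_id", keep="last").reset_index(drop=True)


# Toy data (assumed values): company 1 was recorded twice
toy_counts = pd.DataFrame({
    "company_id": [1, 1, 2],
    "employee_count": [100, 120, 15],
    "follower_count": [800, 900, 40],
    "time_recorded": [1700000000, 1712000000, 1705000000],
})

latest = latest_snapshot(toy_counts)
print(latest)
```

After this step `company_id` is unique, which makes a later one-to-one merge with `companies` safe.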

In [None]:
# 3. Merge dataframes
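
The three company tables share the `company_id` key, so they can be combined with left merges. The sketch below assumes that `employee_counts` has already been reduced to one row per company and that a company may list several industries. The helper name `merge_company_tables` and the toy rows are illustrative only.

```python
import pandas as pd


def merge_company_tables(companies, employee_counts, company_industries):
    """Left-merge the company tables on company_id."""
    # validate raises pandas.errors.MergeError if either side has
    # duplicate company_id values, so silent row duplication is caught early
    merged = companies.merge(
        employee_counts, on="company_id", how="left", validate="one_to_one"
    )
    # A company may appear in several industries, so this step can add rows
    merged = merged.merge(
        company_industries, on="company_id", how="left", validate="one_to_many"
    )
    return merged


# Toy tables (assumed values)
toy_companies = pd.DataFrame({
    "company_id": [1, 2, 3],
    "name": ["Acme", "Globex", "Initech"],
})
toy_counts = pd.DataFrame({
    "company_id": [1, 2],
    "employee_count": [120, 15],
    "follower_count": [900, 40],
})
toy_industries = pd.DataFrame({
    "company_id": [1, 1, 2],
    "industry": ["Software", "Finance", "Retail"],
})

merged = merge_company_tables(toy_companies, toy_counts, toy_industries)
print(merged)
```

Using `how="left"` keeps companies that have no industry or employee record, which shows up as missing values instead of dropped rows.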

In [None]:
# 4. Create new columns
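
Two derived columns that could be built from `employee_count` and `follower_count` are sketched below: a followers-per-employee ratio and a coarse size bucket. The helper name `add_company_features`, the bucket thresholds (50 and 1000), and the toy values are assumptions chosen for illustration, not values taken from the dataset.

```python
import numpy as np
import pandas as pd


def add_company_features(df):
    """Add a ratio column and a size bucket (illustrative thresholds)."""
    out = df.copy()
    # Treat a zero employee count as missing so the ratio is NaN instead of inf
    employees = out["employee_count"].where(out["employee_count"] > 0)
    out["followers_per_employee"] = out["follower_count"] / employees
    # Bins are (0, 50], (50, 1000], (1000, inf); a count of 0 stays unbucketed
    out["size_bucket"] = pd.cut(
        out["employee_count"],
        bins=[0, 50, 1000, np.inf],
        labels=["small", "medium", "large"],
    )
    return out


# Toy data (assumed values)
toy = pd.DataFrame({
    "company_id": [1, 2, 3],
    "employee_count": [10, 0, 5000],
    "follower_count": [100, 50, 2500],
})

featured = add_company_features(toy)
print(featured)
```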

# **Data visualization**

In [None]:
# 1.
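
A first plot could show which industries appear most often in `company_industries`. The sketch below uses only the `industry` column confirmed by the output above. The helper name `plot_top_industries`, the default of ten bars, and the toy rows are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_industries(company_industries, top_n=10):
    """Draw a horizontal bar chart of the most frequent industries."""
    counts = company_industries["industry"].value_counts().head(top_n)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(list(counts.index), counts.to_numpy())
    # Put the most frequent industry at the top of the chart
    ax.invert_yaxis()
    ax.set_xlabel("Company count")
    ax.set_ylabel("Industry")
    ax.set_title(f"Top {len(counts)} industries by company count")
    fig.tight_layout()
    return counts, ax


# Toy data (assumed values)
toy_industries = pd.DataFrame({
    "company_id": [1, 2, 3, 4, 5, 6],
    "industry": ["Tech", "Tech", "Retail", "Finance", "Tech", "Retail"],
})

top_counts, ax = plot_top_industries(toy_industries, top_n=2)
print(top_counts)
```

In a notebook the figure renders inline after the cell runs; in a plain script, a call to `plt.show()` would be needed.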

# **Reference**

https://www.kaggle.com/code/muhammadrifqimaruf/top10-recommendation-linkedin-job-posting

