In [1]:
!git clone https://github.com/AshishJangra27/datasets

Cloning into 'datasets'...
remote: Enumerating objects: 328, done.
remote: Counting objects: 100% (93/93), done.
remote: Compressing objects: 100% (83/83), done.
remote: Total 328 (delta 19), reused 54 (delta 9), pack-reused 235 (from 1)
Receiving objects: 100% (328/328), 278.62 MiB | 16.81 MiB/s, done.
Resolving deltas: 100% (145/145), done.
Updating files: 100% (225/225), done.


In [2]:
import pandas as pd

### 1. Data Exploration

#### 1.1) Loading the Dataset

In [3]:
df = pd.read_csv('/content/datasets/Job Postings/jobs.csv.zip')
df.head(2)

Unnamed: 0,job_id,job_role,company,experience,salary,location,rating,reviews,resposibilities,posted_on,job_link,company_link
0,70123010000.0,Branch Banking - Calling For Women Candidates,Hdfc Bank,1-6 Yrs,Not disclosed,"Kolkata, Hyderabad/Secunderabad, Pune, Ahmedab...",4.0,39110 Reviews,"Customer Service,Sales,Relationship Management",1 Day Ago,https://www.naukri.com/job-listings-branch-ban...,https://www.naukri.com/hdfc-bank-jobs-careers-213
1,60123910000.0,Product Owner Senior Manager,Accenture,11-15 Yrs,Not disclosed,"Kolkata, Mumbai, Hyderabad/Secunderabad, Pune,...",4.1,32129 Reviews,"Product management,Market analysis,Change mana...",1 Day Ago,https://www.naukri.com/job-listings-product-ow...,https://www.naukri.com/accenture-jobs-careers-...


#### 1.2) Removing the "posted_on" Column

In [4]:
del df['posted_on']

#### 1.3) Check Null Values

In [5]:
df.isnull().sum()

Unnamed: 0,0
job_id,480
job_role,480
company,481
experience,1749
salary,480
location,1706
rating,36199
reviews,36199
resposibilities,500
job_link,480


#### 1.4) Removing Rows with Null Values in the "job_id", "company" and "resposibilities" Columns

In [6]:
df.dropna(subset = ['job_id','company','resposibilities'], inplace = True)

#### 1.5) Filling Null Values in the location and experience Columns with the Most Frequent Value

In [7]:
# Assign the filled column back; fillna(inplace=True) on a column selection
# acts on a copy and stops updating df in pandas 3.0
df['location'] = df['location'].fillna(df['location'].mode()[0])
df['experience'] = df['experience'].fillna(df['experience'].mode()[0])


#### 1.6) Filling Null Values in the rating and reviews Columns with 0

In [8]:
# Assign the filled column back; fillna(inplace=True) on a column selection
# acts on a copy and stops updating df in pandas 3.0
df['rating'] = df['rating'].fillna(0.0)
df['reviews'] = df['reviews'].fillna('0 Reviews')
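
Filling several columns with different constants can also be written as one dictionary-based `fillna` call on the whole frame. The sketch below uses a small made-up frame (`toy` is a hypothetical name, not part of the notebook) and only illustrates the pattern:

```python
import pandas as pd

# Toy frame standing in for the job postings data (values are illustrative).
toy = pd.DataFrame({
    "rating": [4.0, None, 3.5],
    "reviews": ["39110 Reviews", None, "12 Reviews"],
})

# One call with one default per column; the result is assigned back explicitly.
toy = toy.fillna({"rating": 0.0, "reviews": "0 Reviews"})

print(toy)
```

Because the result is assigned back to the frame, this form does not rely on in-place modification of a column selection.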


In [9]:
df.isnull().sum()

Unnamed: 0,0
job_id,0
job_role,0
company,0
experience,0
salary,0
location,0
rating,0
reviews,0
resposibilities,0
job_link,0


#### 1.7) Remove Duplicates

In [10]:
df.drop_duplicates(subset=['job_link'],inplace = True)

### 2. Data Cleaning

#### 2.1) Cleaning Job_id Column

In [11]:
# Use an explicit 64-bit integer: the IDs have up to 12 digits and would overflow a 32-bit int
df['job_id'] = df['job_id'].astype('int64').astype('str')

#### 2.2) Creating Company ID Column

In [45]:
df['company_id'] = df['company_link'].str.split('-').str[-1]
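
To see what the split produces, here is a minimal sketch on a few links. The first two URLs appear in the outputs above; the third is a hypothetical link without a numeric suffix, added to show how the split differs from a stricter regular-expression variant (the regex is an alternative, not something the notebook uses):

```python
import pandas as pd

links = pd.Series([
    "https://www.naukri.com/hdfc-bank-jobs-careers-213",
    "https://www.naukri.com/deloitte-jobs-careers-1706",
    "https://www.naukri.com/example-company",  # hypothetical: no numeric ID at the end
])

# Same approach as the notebook: take whatever follows the last hyphen.
by_split = links.str.split("-").str[-1]

# Stricter alternative: keep only a trailing run of digits, else NaN.
by_regex = links.str.extract(r"-(\d+)$", expand=False)

print(pd.DataFrame({"by_split": by_split, "by_regex": by_regex}))
```

With the split, a link that lacks a numeric suffix yields an arbitrary text fragment as its ID, whereas the regex variant leaves it missing so it can be filtered out afterwards.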

#### 2.3) Removing Companies with company_id = 0

In [80]:
df = df[df['company_id'] != '0']

#### 2.4) Cleaning Compan

In [104]:
df['experience'].value_counts()

experience,count
5-10 Yrs,8720
3-8 Yrs,3900
2-7 Yrs,3047
1-6 Yrs,3012
4-9 Yrs,2913
...,...
21-28 Yrs,1
18-26 Yrs,1
2-21 Yrs,1
25-28 Yrs,1
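
The notebook does not show how the experience ranges are cleaned. One possible step, sketched here under the assumption that every value follows the "min-max Yrs" pattern seen above, is to split each range into two numeric columns (`min_exp` and `max_exp` are hypothetical names):

```python
import pandas as pd

# A few values taken from the value_counts output above.
experience = pd.Series(["5-10 Yrs", "3-8 Yrs", "1-6 Yrs", "25-28 Yrs"])

# Capture the two numbers around the hyphen and convert them to integers.
bounds = experience.str.extract(r"(\d+)\s*-\s*(\d+)").astype(int)
bounds.columns = ["min_exp", "max_exp"]  # hypothetical column names

print(bounds)
```

Values that do not match the pattern would come back as missing and make the integer conversion fail, so a real dataset may need a check before the `astype` call.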


In [105]:
df[df['experience'] == '5-10 Yrs']

Unnamed: 0,job_id,job_role,company,experience,salary,location,rating,reviews,resposibilities,job_link,company_link,company_id
5,60123006373,ServiceNow Consultants / Sr. Consultants ( Dev...,Deloitte,5-10 Yrs,Not disclosed,"Kolkata, Hyderabad/Secunderabad, Pune, Chennai...",4.0,10263 Reviews,"Itam,ITSM,Hrsd,GRC,Itom,CSD",https://www.naukri.com/job-listings-servicenow...,https://www.naukri.com/deloitte-jobs-careers-1706,1706
6,50123005519,C++ Developer (Looking For immediate joiners o...,Capgemini,5-10 Yrs,Not disclosed,"Hybrid - Kolkata, Hyderabad/Secunderabad, Pune...",3.9,23786 Reviews,"C++,stl,Node js,SVN,Go,Git,Python",https://www.naukri.com/job-listings-c-develope...,https://www.naukri.com/capgemini-jobs-careers-649,649
14,50123004500,Automation Test Engineer,HCLTech,5-10 Yrs,Not disclosed,Bangalore/Bengaluru,3.8,18348 Reviews,"webdriverio,javascript,Automation Testing",https://www.naukri.com/job-listings-automation...,https://www.naukri.com/hcltech-jobs-careers-460,460
22,151222006599,PowerBI Developer,EY,5-10 Yrs,Not disclosed,"Hybrid - Pune, Mumbai (All Areas)",3.8,6258 Reviews,"pagination,Power Bi,Dax",https://www.naukri.com/job-listings-powerbi-de...,https://www.naukri.com/ey-jobs-careers-9156,9156
29,81122009258,EY India is Hiring For SAP SD Functional Consu...,EY,5-10 Yrs,"13,00,000 - 22,50,000 PA.","Hybrid - Kolkata, Hyderabad/Secunderabad, Pune...",3.8,6258 Reviews,"SAP SD,Sap Hana,SAP Implementation",https://www.naukri.com/job-listings-ey-india-i...,https://www.naukri.com/ey-jobs-careers-9156,9156
...,...,...,...,...,...,...,...,...,...,...,...,...
73633,161122008857,"Factory Medical Officer at Dahej,Gujrat",G. K. Associates,5-10 Yrs,"10,00,000 - 15,00,000 PA.",Dahej,0.0,0 Reviews,"monitor hospitalized patients,Participate in s...",https://www.naukri.com/job-listings-factory-me...,https://www.naukri.com/g-k-associates-jobs-car...,78426
73707,70123004364,Staff Nurse,Sheeba Medical Services,5-10 Yrs,"1,25,000 - 3,50,000 PA.",Bangalore/Bengaluru,0.0,0 Reviews,"BSC NURSING,DIPLOMA NURSING,ANM,Dmlt,GNM",https://www.naukri.com/job-listings-staff-nurs...,https://www.naukri.com/sheeba-medical-services...,123634285
73711,60123009573,Job opening - Retail Branch Banking- CASA Sale...,Suryoday Small Finance Bank,5-10 Yrs,Not disclosed,Bangalore/Bengaluru,4.0,621 Reviews,"Sales,Banking,Retail Branch Banking,Promotions...",https://www.naukri.com/job-listings-opening-re...,https://www.naukri.com/suryoday-small-finance-...,3357842
73736,50123008739,Hiring For Domestic Customer Support Executive,Blinks India Solutions & Services,5-10 Yrs,"1,25,000 - 2,25,000 PA.",Bangalore/Bengaluru,0.0,0 Reviews,"BPO,Call Center,AEGIS STARTEK,Solving Queries,...",https://www.naukri.com/job-listings-hiring-for...,https://www.naukri.com/blinks-india-solutions-...,123583039


In [23]:
df['company_link'].value_counts()

company_link,count
https://www.naukri.com/lavya-associates-hr-services-jobs-careers-932996,5222
https://www.naukri.com/accenture-jobs-careers-7682,2928
https://www.naukri.com/varite-jobs-careers-82233,892
https://www.naukri.com/hucon-jobs-careers-1001716,852
https://www.naukri.com/ibm-jobs-careers-16987,656
...,...
https://www.naukri.com/gnx-group-jobs-careers-6368018,1
https://www.naukri.com/congruex-jobs-careers-3715576,1
https://www.naukri.com/ajnalens-a-unit-of-dimension-nxg-jobs-careers-7077783,1
https://www.naukri.com/solaura-power-jobs-careers-123628451,1


In [66]:
df

Unnamed: 0,job_id,job_role,company,experience,salary,location,rating,reviews,resposibilities,job_link,company_link
0,70123006070,Branch Banking - Calling For Women Candidates,Hdfc Bank,1-6 Yrs,Not disclosed,"Kolkata, Hyderabad/Secunderabad, Pune, Ahmedab...",4.0,39110 Reviews,"Customer Service,Sales,Relationship Management",https://www.naukri.com/job-listings-branch-ban...,https://www.naukri.com/hdfc-bank-jobs-careers-213
1,60123905908,Product Owner Senior Manager,Accenture,11-15 Yrs,Not disclosed,"Kolkata, Mumbai, Hyderabad/Secunderabad, Pune,...",4.1,32129 Reviews,"Product management,Market analysis,Change mana...",https://www.naukri.com/job-listings-product-ow...,https://www.naukri.com/accenture-jobs-careers-...
2,60123905898,Employee Relations and Policies Associate Manager,Accenture,3-7 Yrs,Not disclosed,"Kolkata, Mumbai, Hyderabad/Secunderabad, Pune,...",4.1,32129 Reviews,"Business process,Change management,Team manage...",https://www.naukri.com/job-listings-employee-r...,https://www.naukri.com/accenture-jobs-careers-...
3,60123905897,Employee Relations and Policies Specialist,Accenture,3-7 Yrs,Not disclosed,"Kolkata, Mumbai, Hyderabad/Secunderabad, Pune,...",4.1,32129 Reviews,"Business process,Change management,Team manage...",https://www.naukri.com/job-listings-employee-r...,https://www.naukri.com/accenture-jobs-careers-...
4,60123008332,SAP BO Consultant,Mindtree,5-7 Yrs,Not disclosed,"Hybrid - Kolkata, Hyderabad/Secunderabad, Pune...",4.1,3759 Reviews,"SAP BO,PL / SQL,Oracle SQL,SAP Business Object...",https://www.naukri.com/job-listings-sap-bo-con...,https://www.naukri.com/mindtree-jobs-careers-3...
...,...,...,...,...,...,...,...,...,...,...,...
73762,20123002989,Partner Success Executive/Edtech/Punjab,Parth Associates,1-5 Yrs,"7,00,000 - 8,50,000 PA.","Jalandhar, Chandigarh, Amritsar",0.0,0 Reviews,"CRM,Communication Skills,Presentation Skills,C...",https://www.naukri.com/job-listings-partner-su...,https://www.naukri.com/parth-associates-jobs-c...
73763,20123002957,Partner Success Associate/Edtech/Punjab,Parth Associates,1-5 Yrs,"7,00,000 - 8,50,000 PA.","Ludhiana, Patiala, Moga",0.0,0 Reviews,"CRM,Communication Skills,Presentation Skills,C...",https://www.naukri.com/job-listings-partner-su...,https://www.naukri.com/parth-associates-jobs-c...
73764,231222003986,Hiring For International Voice Process | Gurga...,First Step Solutions,1-4 Yrs,"3,00,000 - 4,50,000 PA.","New Delhi, Gurgaon/Gurugram",0.0,0 Reviews,"Customer Service,US Process,Hospitality,Custom...",https://www.naukri.com/job-listings-hiring-for...,https://www.naukri.com/first-step-solutions-jo...
73765,171220001449,Fresher Engineer,Sofcon,0-2 Yrs,Not disclosed,"Kota, Udaipur, Banswara, Bhiwadi, Jaipur, Alwa...",2.8,3 Reviews,"ENGINEERING,B Tech Fresher,AutoCAD,Degree,PLC,...",https://www.naukri.com/job-listings-fresher-en...,https://www.naukri.com/sofcon-jobs-careers-110403


In [None]:
- Number of active jobs in any company | Top companies
- Companies offering the maximum/minimum average salary
- Salary vs experience for any company
- Company hiring in the largest number of locations
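
A rough sketch of how some of these questions might be approached with pandas follows. It runs on a small made-up frame (`jobs` and the derived names are hypothetical), assumes salaries look like "13,00,000 - 22,50,000 PA." or "Not disclosed", and assumes locations are comma-separated:

```python
import pandas as pd

# Illustrative rows only; the combinations are made up.
jobs = pd.DataFrame({
    "company": ["EY", "EY", "Accenture", "Sofcon"],
    "salary": ["13,00,000 - 22,50,000 PA.", "Not disclosed",
               "Not disclosed", "7,00,000 - 8,50,000 PA."],
    "location": ["Pune, Mumbai", "Kolkata, Pune", "Kolkata, Mumbai", "Kota, Udaipur"],
})

# Number of postings per company, largest first.
jobs_per_company = jobs["company"].value_counts()

# Midpoint of the salary range; undisclosed salaries stay missing.
sal = (jobs["salary"].str.replace(",", "", regex=False)
                     .str.extract(r"(\d+)\s*-\s*(\d+)")
                     .astype(float))
jobs["avg_salary"] = sal.mean(axis=1)
avg_salary_by_company = jobs.groupby("company")["avg_salary"].mean()

# Distinct locations per company: one row per (posting, location), then count.
loc = jobs.assign(location=jobs["location"].str.split(",")).explode("location")
loc["location"] = loc["location"].str.strip()
locations_per_company = loc.groupby("company")["location"].nunique()

print(jobs_per_company)
print(avg_salary_by_company)
print(locations_per_company)
```

In the real data, location strings also carry prefixes such as "Hybrid - " and suffixes such as "(All Areas)", which would need normalizing before the distinct-location count is meaningful.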

In [None]:
1. Analyze Company Ratings
  - Calculate the average rating for each company
  - Identify companies with the highest and lowest average ratings.
  - Compare the distribution of ratings across different companies.
  - Calculate the total number of reviews for each company. | Sort to get the most and least reviewed

- List the top 10 companies based on average rating and number of reviews.
- Analyze the characteristics and practices of these top-rated companies. | Most Popular Responsibilities of top 100 companies
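
The rating analysis outlined above could start from a per-company summary like the one sketched below. The rows reuse rating and review values visible in the outputs above, but the frame itself (`jobs`) and the derived names are hypothetical. The sketch also assumes the review count is a per-company figure repeated on every posting, so it takes the maximum rather than the sum:

```python
import pandas as pd

jobs = pd.DataFrame({
    "company": ["Hdfc Bank", "Accenture", "Accenture", "Sofcon"],
    "rating": [4.0, 4.1, 4.1, 2.8],
    "reviews": ["39110 Reviews", "32129 Reviews", "32129 Reviews", "3 Reviews"],
})

# Turn "39110 Reviews" into the integer 39110.
jobs["review_count"] = jobs["reviews"].str.extract(r"(\d+)", expand=False).astype(int)

# One row per company: mean rating and review count, best rated first.
summary = (
    jobs.groupby("company")
        .agg(avg_rating=("rating", "mean"), review_count=("review_count", "max"))
        .sort_values(["avg_rating", "review_count"], ascending=False)
)

top10 = summary.head(10)
print(top10)
```

Since missing ratings were filled with 0 earlier in the notebook, rows with a rating of 0 may need to be excluded before averaging, otherwise unrated companies would appear as the lowest rated.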
