# Employer Review Prediction
## Reynara Ezra Pratama

## Background

## Business Understanding

1. Understand the reviews that employees give about their company.
2. Predict the sentiment of each review and classify it as positive, neutral, or negative.
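
This section does not say how the 1 to 5 `Rating` maps onto the three sentiment classes. Below is a minimal sketch of one possible mapping. It assumes ratings 1 to 2 are negative, 3 is neutral, and 4 to 5 are positive. These thresholds are an assumption and do not come from the source. `rating_to_sentiment` is a hypothetical helper.

```python
import pandas as pd

def rating_to_sentiment(rating: int) -> str:
    """Hypothetical helper: map a 1-5 star rating to a sentiment label.

    Assumed thresholds: 1-2 negative, 3 neutral, 4-5 positive.
    """
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"

# Ratings of the five rows shown by df.head() in this notebook
ratings = pd.Series([3, 3, 5, 5, 1], name="Rating")
labels = ratings.map(rating_to_sentiment)
print(labels.tolist())  # ['neutral', 'neutral', 'positive', 'positive', 'negative']
```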

## Data Understanding

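1. `ReviewTitle` : Title (topic) of the review.
2. `CompleteReview` : Full text of the review written by the company's employee.
3. `URL` : *Uniform Resource Locator* of the review page on Indeed (`in.indeed.com`).
4. `Rating` : Rating (1 to 5) given by the employee to the company.
5. `ReviewDetails` : Details about the review: employment status (current or former employee), location, and date.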
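
`ReviewDetails` packs several fields into a single string, such as `(Former Employee) - - August 26, 2021`. The sketch below shows one way to split it into separate columns with a regular expression. It assumes every value follows the layout `(<status>) - <location> - <Month day, Year>` and that the location may be empty. The third sample value is a hypothetical example with a location filled in.

```python
import pandas as pd

# Two ReviewDetails values as they appear in the notebook output,
# plus one hypothetical value that includes a location.
details = pd.Series([
    "(Former Employee) - - August 26, 2021",
    "(Current Employee) - - November 15, 2018",
    "(Current Employee) - Mumbai - May 1, 2020",  # hypothetical example
])

# Assumed layout: "(<status>) - <location> - <Month day, Year>"
# The lazy location group allows an empty location between the two dashes.
pattern = r"^\((?P<status>[^)]*)\)\s*-\s*(?P<location>.*?)\s*-\s*(?P<date>.+)$"
parsed = details.str.extract(pattern)

# Convert the date text (e.g. "August 26, 2021") into datetime values
parsed["date"] = pd.to_datetime(parsed["date"], format="%B %d, %Y")
print(parsed)
```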

## Import Library

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
import nltk

import warnings
warnings.filterwarnings('ignore')
```

## Loading Dataset

**Load Data From GitHub**

```python
# url = "https://raw.githubusercontent.com/ReynaraEzra/Employer-Review/main/data_input/results.json"
# df = pd.read_json(url)
```

**Load Data From Local File**

```python
df = pd.read_json('data_input/results.json')
```

## Checking Dataset

```python
df.head()
```

|  | ReviewTitle | CompleteReview | URL | Rating | ReviewDetails |
|---|---|---|---|---|---|
| 0 | Productive | Good company, cool workplace, work load little... | https://in.indeed.com/cmp/Reliance-Industries-... | 3 | (Current Employee) - Ghansoli - August 30,... |
| 1 | Stressful | 1. Need to work on boss's whims and fancies 2.... | https://in.indeed.com/cmp/Reliance-Industries-... | 3 | (Former Employee) - - August 26, 2021 |
| 2 | Good Company for Every employee | Good company for every Engineers dream, Full M... | https://in.indeed.com/cmp/Reliance-Industries-... | 5 | (Former Employee) - - August 17, 2021 |
| 3 | Productive | I am just pass out bsc in chemistry Typical da... | https://in.indeed.com/cmp/Reliance-Industries-... | 5 | (Current Employee) - - August 17, 2021 |
| 4 | Non productive | Not so fun at work just blame games Target pe... | https://in.indeed.com/cmp/Reliance-Industries-... | 1 | (Former Employee) - - August 9, 2021 |


```python
df.tail()
```

|  | ReviewTitle | CompleteReview | URL | Rating | ReviewDetails |
|---|---|---|---|---|---|
| 145204 | Definitely very good place to work and can hav... | We get a lot to learn in the company. Very sys... | https://in.indeed.com/cmp/Tata-Consultancy-Ser... | 4 | (Former Employee) - - January 20, 2012 |
| 145205 | IT Services Company; Great scope for improvement. | Lot of scope to learn different technologies u... | https://in.indeed.com/cmp/Tata-Consultancy-Ser... | 4 | (Former Employee) - - January 19, 2012 |
| 145206 | Productive, fun to work, great place to do cer... | An overall positive experience, nice environme... | https://in.indeed.com/cmp/Tata-Consultancy-Ser... | 4 | (Former Employee) - - January 19, 2012 |
| 145207 | Great place to start the career. | Happy that I've started my career from such a ... | https://in.indeed.com/cmp/Tata-Consultancy-Ser... | 3 | (Former Employee) - - January 7, 2012 |
| 145208 | Nice place to work | Got good experience and knowledge about my wor... | https://in.indeed.com/cmp/Tata-Consultancy-Ser... | 5 | (Former Employee) - - December 19, 2011 |


```python
df.sample(5)
```

|  | ReviewTitle | CompleteReview | URL | Rating | ReviewDetails |
|---|---|---|---|---|---|
| 114481 | Good place to work | It was a good place to work with friendly peop... | https://in.indeed.com/cmp/Cognizant-Technology... | 4 | (Former Employee) - - January 5, 2017 |
| 50126 | Management of the entire organisation was very... | HDFC is India's most successful private sector... | https://in.indeed.com/cmp/Hdfc-Bank/reviews?st... | 4 | (Former Employee) - - April 13, 2019 |
| 32992 | Great Research Groups at MSR | Was an intern at Microsoft Research India. Ver... | https://in.indeed.com/cmp/Microsoft/reviews?st... | 5 | (Current Employee) - - November 15, 2018 |
| 11266 | fun and challenging envoirment | Marriott Hyderabad have wonderful team with he... | https://in.indeed.com/cmp/Marriott-Internation... | 5 | (Current Employee) - - October 17, 2016 |
| 54045 | Productive and Fun | The job was well secured and paid well. Also t... | https://in.indeed.com/cmp/Wells-Fargo/reviews?... | 4 | (Former Employee) - - April 12, 2020 |
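
`df.sample(5)` has no seed, so it draws different rows each time the notebook runs and the output above cannot be reproduced. The small sketch below uses a toy frame in place of `df` and shows that passing `random_state` makes the draw repeatable. The seed value 42 is arbitrary.

```python
import pandas as pd

# Toy frame standing in for the review data (ratings from the df.head() output)
toy = pd.DataFrame({"Rating": [3, 3, 5, 5, 1]})

# With the same random_state, two calls return the same rows
first = toy.sample(3, random_state=42)
second = toy.sample(3, random_state=42)
print(first.index.tolist() == second.index.tolist())  # True
```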


## Check Data Characteristics

**Data Shape**

```python
df.shape
```

```
(145209, 5)
```

**Data Columns**

```python
df.columns
```

```
Index(['ReviewTitle', 'CompleteReview', 'URL', 'Rating', 'ReviewDetails'], dtype='object')
```

**Data Info**

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145209 entries, 0 to 145208
Data columns (total 5 columns):
 #   Column          Non-Null Count   Dtype 
---  ------          --------------   ----- 
 0   ReviewTitle     145209 non-null  object
 1   CompleteReview  145209 non-null  object
 2   URL             145209 non-null  object
 3   Rating          145209 non-null  int64 
 4   ReviewDetails   145209 non-null  object
dtypes: int64(1), object(4)
memory usage: 5.5+ MB
```


**Descriptive Statistics**

```python
df.describe()
```

|  | Rating |
|---|---|
| count | 145209.0 |
| mean | 4.053661 |
| std | 0.925805 |
| min | 1.0 |
| 25% | 4.0 |
| 50% | 4.0 |
| 75% | 5.0 |
| max | 5.0 |
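
`describe()` reports a 25th percentile of 4, which means roughly three quarters or more of the ratings are 4 or 5. For a classifier, the count per rating, and therefore per sentiment class, matters more than the mean. The sketch below uses the five `df.head()` ratings in place of `df['Rating']`. On the real data the same calls apply to that column.

```python
import pandas as pd

# Ratings from the five df.head() rows, standing in for df["Rating"]
ratings = pd.Series([3, 3, 5, 5, 1], name="Rating")

# Absolute count and share of each star value, ordered by rating
counts = ratings.value_counts().sort_index()
shares = ratings.value_counts(normalize=True).sort_index()
print(counts.to_dict())
print(shares.round(2).to_dict())
```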


**Check Missing Values**

```python
df.isnull().sum()
```

```
ReviewTitle       0
CompleteReview    0
URL               0
Rating            0
ReviewDetails     0
dtype: int64
```
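
`isnull()` only counts real nulls (`None`/`NaN`). Empty or whitespace-only review text still counts as present. The sketch below checks for both on a toy column. Whether the actual dataset contains blank reviews is not shown in this notebook.

```python
import pandas as pd

# Toy text column: "" and "   " are not null, but carry no review text
reviews = pd.Series(["Good company, cool workplace", "", "   ", None])

null_count = reviews.isnull().sum()
# Strip whitespace first, then count values that become empty strings
blank_count = (reviews.str.strip() == "").sum()
print(null_count, blank_count)  # 1 2
```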

**Check and Drop Duplicate Data**

```python
df = df.drop_duplicates(keep='first')
```

```python
df.shape
```

```
(145191, 5)
```
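
The cell above drops duplicates without first reporting how many exist. Comparing the shapes shows that 145,209 - 145,191 = 18 rows were removed. The sketch below shows the missing check on a toy frame: `duplicated().sum()` gives the same number of rows that `drop_duplicates` removes.

```python
import pandas as pd

# Toy frame: rows 0 and 2 are identical
toy = pd.DataFrame({
    "ReviewTitle": ["Productive", "Stressful", "Productive"],
    "Rating": [3, 3, 3],
})

# duplicated(keep="first") flags every repeat of an earlier identical row
n_dupes = toy.duplicated(keep="first").sum()
deduped = toy.drop_duplicates(keep="first")
print(n_dupes, len(toy) - len(deduped))  # 1 1
```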

## Feature Extraction

```python
df.head(3)
```

|  | ReviewTitle | CompleteReview | URL | Rating | ReviewDetails |
|---|---|---|---|---|---|
| 0 | Productive | Good company, cool workplace, work load little... | https://in.indeed.com/cmp/Reliance-Industries-... | 3 | (Current Employee) - Ghansoli - August 30,... |
| 1 | Stressful | 1. Need to work on boss's whims and fancies 2.... | https://in.indeed.com/cmp/Reliance-Industries-... | 3 | (Former Employee) - - August 26, 2021 |
| 2 | Good Company for Every employee | Good company for every Engineers dream, Full M... | https://in.indeed.com/cmp/Reliance-Industries-... | 5 | (Former Employee) - - August 17, 2021 |


```python
dummy = df['URL'].str.split('/')
dummy.head()
```

```
0    [https:, , in.indeed.com, cmp, Reliance-Indust...
1    [https:, , in.indeed.com, cmp, Reliance-Indust...
2    [https:, , in.indeed.com, cmp, Reliance-Indust...
3    [https:, , in.indeed.com, cmp, Reliance-Indust...
4    [https:, , in.indeed.com, cmp, Reliance-Indust...
Name: URL, dtype: object
```

```python
dummy = dummy.str[4]
dummy.head()
```

```
0    Reliance-Industries-Ltd
1    Reliance-Industries-Ltd
2    Reliance-Industries-Ltd
3    Reliance-Industries-Ltd
4    Reliance-Industries-Ltd
Name: URL, dtype: object
```
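
The two cells above take the company slug (e.g. `Reliance-Industries-Ltd`) from position 4 of each split URL. The sketch below cross-checks that against a regex that captures the path segment right after `/cmp/`, then turns the slug into a readable name. It assumes every URL follows the `https://in.indeed.com/cmp/<slug>/...` layout. Both URLs below are hypothetical examples of that layout, including the query string. The regex variant is an alternative and not the notebook's method.

```python
import pandas as pd

# Hypothetical URLs that follow the layout seen in the URL column
urls = pd.Series([
    "https://in.indeed.com/cmp/Reliance-Industries-Ltd/reviews",
    "https://in.indeed.com/cmp/Hdfc-Bank/reviews?p=1",  # hypothetical query string
])

# Splitting on "/" gives ['https:', '', 'in.indeed.com', 'cmp', '<slug>', ...]
slug_split = urls.str.split("/").str[4]

# Alternative: capture the segment after "/cmp/", stopping at "/" or "?"
slug_regex = urls.str.extract(r"/cmp/([^/?]+)", expand=False)

# Readable company name: hyphens become spaces
company = slug_split.str.replace("-", " ", regex=False)
print(company.tolist())  # ['Reliance Industries Ltd', 'Hdfc Bank']
```

Replacing hyphens also affects company names that really contain a hyphen. Keeping the raw slug as the grouping key avoids that problem.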