# Final Project Part 2

# Dataset Basic Info
### Name: Human Resources Data Set
### Access: obtained from Kaggle.com
### URL: https://www.kaggle.com/rhuebner/human-resources-data-set/version/4#salary_grid.csv
### Licence: [Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

- Users are free to share and adapt the dataset as long as they give appropriate credit, link to the license, and indicate what changes were made. Under the ShareAlike term, adapted versions must be distributed under the same license.

### Size: 176kb, 5 items
--------------------------------------------------------------------------------------------------------

- Code:

    Jupyter notebook with an interactive dashboard that helps an expert explore your dataset thoroughly.
    There should be a "dashboard" type aspect to this - i.e. a linked view exploring your dataset in an interactive way
    Do not delete any cells, just comment them out. Show your work.

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
import bqplot
import traitlets
import ipywidgets
%matplotlib inline
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


In [3]:
hr = pd.read_csv('HRDataset_v9.csv')
hr.head()

Unnamed: 0,Employee Name,Employee Number,MarriedID,MaritalStatusID,GenderID,EmpStatus_ID,DeptID,Perf_ScoreID,Age,Pay Rate,...,Date of Hire,Days Employed,Date of Termination,Reason For Term,Employment Status,Department,Position,Manager Name,Employee Source,Performance Score
0,"Brown, Mia",1103024456,1,1,0,1,1,3,30,28.5,...,10/27/2008,3317,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Diversity Job Fair,Fully Meets
1,"LaRotonda, William",1106026572,0,2,1,1,1,3,34,23.0,...,1/6/2014,1420,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Website Banner Ads,Fully Meets
2,"Steans, Tyrone",1302053333,0,0,1,1,1,3,31,29.0,...,9/29/2014,1154,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Internet Search,Fully Meets
3,"Howard, Estelle",1211050782,1,1,0,1,1,9,32,21.5,...,2/16/2015,58,4/15/2015,N/A - still employed,Active,Admin Offices,Administrative Assistant,Brandon R. LeBlanc,Pay Per Click - Google,N/A- too early to review
4,"Singh, Nan",1307059817,0,0,0,1,1,9,30,16.56,...,5/1/2015,940,,N/A - still employed,Active,Admin Offices,Administrative Assistant,Brandon R. LeBlanc,Website Banner Ads,N/A- too early to review


In [4]:
hr = hr[hr.Perf_ScoreID != 9]
hr['RaceID'] = hr.RaceDesc.astype('category')
hr['RaceID']=hr['RaceID'].cat.codes
hr.head()

Unnamed: 0,Employee Name,Employee Number,MarriedID,MaritalStatusID,GenderID,EmpStatus_ID,DeptID,Perf_ScoreID,Age,Pay Rate,...,Days Employed,Date of Termination,Reason For Term,Employment Status,Department,Position,Manager Name,Employee Source,Performance Score,RaceID
0,"Brown, Mia",1103024456,1,1,0,1,1,3,30,28.5,...,3317,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Diversity Job Fair,Fully Meets,2
1,"LaRotonda, William",1106026572,0,2,1,1,1,3,34,23.0,...,1420,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Website Banner Ads,Fully Meets,2
2,"Steans, Tyrone",1302053333,0,0,1,1,1,3,31,29.0,...,1154,,N/A - still employed,Active,Admin Offices,Accountant I,Brandon R. LeBlanc,Internet Search,Fully Meets,5
5,"Smith, Leigh Ann",711007713,1,1,0,5,1,3,30,20.5,...,730,9/25/2013,career change,Voluntarily Terminated,Admin Offices,Administrative Assistant,Brandon R. LeBlanc,Diversity Job Fair,Fully Meets,1
6,"LeBlanc, Brandon R",1102024115,1,1,1,1,1,3,33,55.0,...,691,,N/A - still employed,Active,Admin Offices,Shared Services Manager,Janet King,Monster.com,Fully Meets,5


- Note:
    Perf_ScoreID = 9 is excluded from the analysis since it represents "too early to give a score".
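
The filtering and encoding step can be sketched on a toy frame. The rows below are invented for illustration; only the column names `Perf_ScoreID` and `RaceDesc` and the sentinel value 9 come from the notebook. pandas assigns category codes in sorted order of the labels, so the codes depend on which labels are present.

```python
import pandas as pd

# Toy rows, invented for illustration (not taken from HRDataset_v9.csv)
toy = pd.DataFrame({
    'Perf_ScoreID': [3, 9, 4, 3],
    'RaceDesc': ['White', 'Asian', 'Black or African American', 'Asian'],
})

# Drop the sentinel score 9; .copy() avoids pandas' SettingWithCopyWarning
# when a new column is added to the filtered frame afterwards
scored = toy[toy['Perf_ScoreID'] != 9].copy()

# Category codes are integers assigned in sorted order of the labels
scored['RaceID'] = scored['RaceDesc'].astype('category').cat.codes

print(scored)
```

With only three labels in the toy data, the codes run from 0 to 2; the full dataset has more labels, so its codes differ.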

In [23]:
nPerf = len(np.unique(hr['Perf_ScoreID']))
nRace= len(np.unique(hr['RaceID']))
Perf_bins = np.linspace(0,nPerf,nPerf+1) 
Race_bins = np.linspace(0,nRace,nRace+1)

# print(Perf_bins, Race_bins)

hist2d, perf_edge, race_edge = np.histogram2d(hr['Perf_ScoreID'],hr['RaceID'],
                                                    weights = hr['Pay Rate'],
                                                    bins = [Perf_bins, Race_bins])

perf_center = range(len(perf_edge)-1)
race_center = range(len(race_edge)-1)

hist2d[hist2d <= 0] = np.nan

# heatmap
col_sc = bqplot.ColorScale(scheme='Reds')
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.OrdinalScale()

# axes
c_ax = bqplot.ColorAxis(scale=col_sc, label = 'Pay Rate',orientation='vertical',side='top')
# rows of hist2d are performance scores (y axis), columns are races (x axis)
x_ax = bqplot.Axis(scale=x_sc, label='RaceID')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Perf_ScoreID',tick_values = np.unique(hr['Perf_ScoreID']))

# marks: heatmap
heat_map = bqplot.GridHeatMap(color = hist2d, row = perf_center, column = race_center,
                             scales={'color':col_sc, 'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             anchor_style={'fill':'blue'},
                             selected_style={'opacity':1.0},
                             unselected_style={'opacity':1.0})
heat_fig = bqplot.Figure(marks = [heat_map],axes=[c_ax,x_ax,y_ax])
# heat_fig

In [22]:
# (II) Line plot of pay rate vs. MaritalStatusID
x_scl = bqplot.OrdinalScale() # marital status ID
y_scl = bqplot.LinearScale() # summed pay rate
ax_xcl = bqplot.Axis(label='MaritalStatusID', scale=x_scl,tick_rotate=90)
ax_ycl = bqplot.Axis(label='Pay Rate', scale=y_scl, 
                    orientation = 'vertical', side='left')
# start with the first heatmap cell selected
i,j = 0,0
performance = perf_edge[i] # scalar, so the comparison broadcasts over the column
race = race_edge[j]
mask = ( (hr['Perf_ScoreID']==performance)&(hr['RaceID']==race))

# one bin per marital status: the unique IDs are the left edges,
# plus one extra edge to close the last bin
marital_ids = np.unique(hr['MaritalStatusID'])
marital_bins = np.append(marital_ids, marital_ids[-1]+1)

# sum the pay rate of the selected employees within each marital status
payrate, payrate_edge = np.histogram(hr['MaritalStatusID'][mask], 
                             weights=hr['Pay Rate'][mask],
                             bins=marital_bins)

# x positions are the IDs themselves, one per bin
payrate_centers = marital_ids

payrate_line = bqplot.Lines(x=payrate_centers, y=payrate,
                           scales={'x':x_scl,'y':y_scl})
fig_payrate = bqplot.Figure(marks=[payrate_line], axes=[ax_xcl,ax_ycl])


#(III) interactions
mySelectedLabel = ipywidgets.Label()
def update_marital_line(change):
    # nothing to update when the selection is cleared
    if change['owner'].selected is None or len(change['owner'].selected) == 0:
        return
    i,j = change['owner'].selected[0]
    performance = perf_edge[i]
    race = race_edge[j]
    mask = ( (hr['Perf_ScoreID']==performance)&(hr['RaceID']==race))
    
    # one bin per marital status ID, closed by one extra edge
    marital_ids = np.unique(hr['MaritalStatusID'])
    payrate, payrate_edge = np.histogram(hr['MaritalStatusID'][mask], 
                             weights=hr['Pay Rate'][mask],
                             bins=np.append(marital_ids, marital_ids[-1]+1))
      
    payrate_line.x = marital_ids
    payrate_line.y = payrate

# registered under its own name so the later get_data_value does not replace it
heat_map.observe(update_marital_line, 'selected')


    
# (IV) Line plot of pay rate vs. Age
x_scl = bqplot.OrdinalScale() # age
y_scl = bqplot.LinearScale() # summed pay rate
ax_xcl = bqplot.Axis(label='Age', scale=x_scl,tick_rotate=90)
ax_ycl = bqplot.Axis(label='Pay Rate', scale=y_scl, 
                    orientation = 'vertical', side='left')
# start with the first heatmap cell selected
i,j = 0,0
performance = perf_edge[i] # scalar, so the comparison broadcasts over the column
race = race_edge[j]
mask = ( (hr['Perf_ScoreID']==performance)&(hr['RaceID']==race))

# one bin per age: the unique ages are the left edges,
# plus one extra edge to close the last bin
age_values = np.unique(hr['Age'])
age_bins = np.append(age_values, age_values[-1]+1)

# sum the pay rate of the selected employees within each age
payrate1, payrate1_edge = np.histogram(hr['Age'][mask], 
                             weights=hr['Pay Rate'][mask],
                             bins=age_bins)

# x positions are the ages themselves, one per bin
payrate1_centers = age_values

payrate1_line = bqplot.Lines(x=payrate1_centers, y=payrate1,
                           scales={'x':x_scl,'y':y_scl})
fig_payrate1 = bqplot.Figure(marks=[payrate1_line], axes=[ax_xcl,ax_ycl])


#(V) interactions
mySelectedLabel = ipywidgets.Label()
def get_data_value(change):
    # nothing to update when the selection is cleared
    if change['owner'].selected is None or len(change['owner'].selected) == 0:
        return
    i,j = change['owner'].selected[0]
    v = hist2d[i,j]
    mySelectedLabel.value = 'Pay Rate = ' + str(v)
    performance = perf_edge[i]
    race = race_edge[j]
    mask = ( (hr['Perf_ScoreID']==performance)&(hr['RaceID']==race))
    
    # one bin per age, closed by one extra edge
    age_values = np.unique(hr['Age'])
    payrate1, payrate1_edge = np.histogram(hr['Age'][mask], 
                             weights=hr['Pay Rate'][mask],
                             bins=np.append(age_values, age_values[-1]+1))
      
    payrate1_line.x = age_values
    payrate1_line.y = payrate1
    
    
# (VI) link this label to changes in our heatmap
heat_map.observe(get_data_value, 'selected')
ipywidgets.VBox([mySelectedLabel,ipywidgets.HBox([heat_fig,fig_payrate,fig_payrate1])])


VBox(children=(Label(value=''), HBox(children=(Figure(axes=[ColorAxis(label='Pay Rate', scale=ColorScale(schem…

|Race Desc| Race code| 
|---------|----------|
|American Indian or Alaska Native|0|
|Asian|1|
|Black or African American|2|
|Hispanic 	|3|
|Two or more races 	|4|
|White|5|

|Marital Status| Code| 
|---------|----------|
|single | 0|  
|married | 1|  
|divorced | 2| 
|separated |  3| 
|widowed | 4|

    
- Prose:

    One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.

    A list of 1 or more contextual datasets you have identified, links to where they reside, and a sentence about why they might be useful in telling the final story.

There are three graphs in the dashboard: a heatmap that shows the total pay rate for each combination of employee race and performance score; a line chart that shows the pay rates of employees by marital status; and another line chart that shows the pay rates of employees by age. 
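
Because the notebook passes `Pay Rate` as `weights` to `np.histogram2d`, each heatmap cell holds a sum of pay rates rather than an average. The sketch below, on invented toy values, shows the difference. Dividing by the per-cell count to get a mean is one possible variation, not something the notebook does.

```python
import numpy as np

# Toy values, invented for illustration
perf = np.array([0, 0, 1, 1, 1])
race = np.array([0, 0, 0, 1, 1])
pay = np.array([20.0, 30.0, 40.0, 10.0, 50.0])

# One bin per integer ID: edges 0, 1, 2
edges = np.arange(3)

# With weights, each cell holds the SUM of pay rates falling in that cell
total, _, _ = np.histogram2d(perf, race, bins=[edges, edges], weights=pay)
# Without weights, each cell holds the number of employees
count, _, _ = np.histogram2d(perf, race, bins=[edges, edges])

# Mean pay rate per cell; empty cells become NaN instead of dividing by zero
mean = np.divide(total, count, out=np.full_like(total, np.nan), where=count > 0)

print(total)
print(mean)
```

A sum-based heatmap is brighter wherever there are more employees, so a mean may be easier to compare across cells of very different sizes.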

To use the dashboard, click a square of your choice on the heatmap, and the data in the two line charts will update. For example, if you are interested in the marital status and age of employees who are White and earned a performance score of 5, all you need to do is click the top-right square on the heatmap. 
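
What a click does behind the scenes can be sketched without any widgets. The function name `pay_by_marital_status` and the toy arrays are hypothetical; the logic follows the notebook's approach of masking by the selected cell and summing pay rate per marital status, and it assumes cell indices equal the ID values.

```python
import numpy as np

# Toy values, invented for illustration
perf = np.array([3, 3, 3, 4])
race = np.array([5, 5, 5, 5])
marital = np.array([0, 1, 1, 0])
pay = np.array([20.0, 30.0, 25.0, 40.0])

def pay_by_marital_status(i, j):
    """Sum pay rate per marital status for heatmap cell (i, j)."""
    # Keep only employees in the selected performance/race cell
    mask = (perf == i) & (race == j)
    ids = np.unique(marital)
    # Unique IDs are the left bin edges; one extra edge closes the last bin
    edges = np.append(ids, ids[-1] + 1)
    totals, _ = np.histogram(marital[mask], bins=edges, weights=pay[mask])
    return ids, totals

ids, totals = pay_by_marital_status(3, 5)
print(ids, totals)
```

The returned `ids` and `totals` have the same length, which is what a line mark needs for its x and y values.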

One contextual dataset is "recruiting_cost.csv", which shows how much money the company spends to recruit employees. Knowing how employees of different races perform, we can judge whether the money spent was worthwhile. Also, knowing the pay rates of employees in different marital-status and age groups, we can judge whether the salary paid to an employee is justified by the cost of recruiting them. 