<a href="https://colab.research.google.com/github/abdyraman/hr-deep-learning/blob/main/deep_hr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Employee retention strategies are integral to the success and well-being of a company. Employees leave an organization for many reasons, and in this case study I explore some of the key drivers of employee attrition. Employee attrition measures how many workers have left an organization and is a common metric companies use to assess their performance. Turnover rates vary from industry to industry; the [Bureau of Labor Statistics reported](https://www.bls.gov/news.release/jolts.t18.htm#) an overall turnover rate of 25% for voluntary separations in 2020.


In this notebook, I explore [IBM's dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset) on HR Analytics. The data covers 1,470 current and former employees, with information on their job satisfaction, work-life balance, tenure, experience, salary, and demographics.

**Employee Attrition Analysis**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import hvplot.pandas  # Import hvplot for DataFrame plotting
import panel as pn    # For creating dashboards
import holoviews as hv
from bokeh.io import output_notebook
# import tensorflow as tf

In [2]:
df_full = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

**Data cleaning**

In [3]:
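# Drop four columns: an employee identifier (EmployeeNumber) and three
# columns that hold a single value in this dataset
df = df_full.drop(['Over18', 'EmployeeNumber', 'EmployeeCount', 'StandardHours'], axis=1)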
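
Before dropping columns like these, it can help to confirm that they really carry no signal. The sketch below is one way to do that check. The `toy` frame and its values are hypothetical stand-ins for `df_full`, used only so the example runs without the CSV; the same two list comprehensions could be pointed at `df_full` instead.

```python
import pandas as pd

# Hypothetical toy frame standing in for df_full; values are illustrative only.
toy = pd.DataFrame({
    "EmployeeNumber": [1, 2, 4],
    "Over18": ["Y", "Y", "Y"],
    "EmployeeCount": [1, 1, 1],
    "StandardHours": [80, 80, 80],
    "Age": [41, 49, 41],
})

# Columns with a single distinct value cannot explain differences in attrition.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]

# Columns with one distinct value per row behave like identifiers.
id_like_cols = [col for col in toy.columns if toy[col].nunique() == len(toy)]

print("constant:", constant_cols)
print("identifier-like:", id_like_cols)
```

Note that the identifier check is only a heuristic: a continuous measurement can also be unique in every row, so its result should be read alongside the column names.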

**Descriptive statistics**

Text data analysis: categorical values
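
A categorical summary could be produced along the following lines. This is a minimal sketch, not the notebook's own cell: the `toy` frame is a hypothetical stand-in for `df`, and selecting columns with `exclude='number'` is an assumption about how the text columns would be picked out.

```python
import pandas as pd

# Hypothetical toy frame standing in for df; values are illustrative only.
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No"],
    "Department": ["Sales", "Sales", "Research & Development", "Human Resources"],
    "Age": [41, 49, 37, 33],
})

# Keep only the non-numeric (text) columns.
cat = toy.select_dtypes(exclude="number")

# For text columns, describe() reports count, unique, top (most frequent
# value), and freq (how often that value occurs).
summary = cat.describe().T
print(summary)

# Per-column frequency tables give the full breakdown behind top/freq.
attrition_counts = cat["Attrition"].value_counts()
print(attrition_counts)
```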

Numeric Data Analysis

In [4]:
# Summarize the integer-valued columns
num = df.select_dtypes('int64')
num.describe().T

| Column | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age | 1470.0 | 36.92381 | 9.135373 | 18.0 | 30.0 | 36.0 | 43.0 | 60.0 |
| DailyRate | 1470.0 | 802.485714 | 403.5091 | 102.0 | 465.0 | 802.0 | 1157.0 | 1499.0 |
| DistanceFromHome | 1470.0 | 9.192517 | 8.106864 | 1.0 | 2.0 | 7.0 | 14.0 | 29.0 |
| Education | 1470.0 | 2.912925 | 1.024165 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| EnvironmentSatisfaction | 1470.0 | 2.721769 | 1.093082 | 1.0 | 2.0 | 3.0 | 4.0 | 4.0 |
| HourlyRate | 1470.0 | 65.891156 | 20.329428 | 30.0 | 48.0 | 66.0 | 83.75 | 100.0 |
| JobInvolvement | 1470.0 | 2.729932 | 0.711561 | 1.0 | 2.0 | 3.0 | 3.0 | 4.0 |
| JobLevel | 1470.0 | 2.063946 | 1.10694 | 1.0 | 1.0 | 2.0 | 3.0 | 5.0 |
| JobSatisfaction | 1470.0 | 2.728571 | 1.102846 | 1.0 | 2.0 | 3.0 | 4.0 | 4.0 |
| MonthlyIncome | 1470.0 | 6502.931293 | 4707.956783 | 1009.0 | 2911.0 | 4919.0 | 8379.0 | 19999.0 |


**Analysis**

In [5]:
# Group by Gender and Department, then compute the share of each
# Attrition value ("Yes"/"No") within each group
plot_df = df.groupby(['Gender', 'Department'])['Attrition'].value_counts(normalize=True)

# Convert shares to percentages, name the value column 'Percent',
# and flatten the index into regular columns
plot_df = plot_df.mul(100).rename('Percent').reset_index()

# Ensure color mapping
color_map = {"Yes": "#D4A1E7", "No": "#6faea4"}
plot_df['color'] = plot_df['Attrition'].map(color_map)

# Creating the bar plot using hvplot
bars = plot_df.hvplot.bar(
    x='Department',
    y='Percent',
    by='Attrition',
    groupby='Gender',
    color='color',  # Custom color mapping
    width=800,
    height=600,
    xlabel='Department',
    ylabel='Percent',
    title='Attrition Rates by Department and Gender',
    line_color='black',
    legend='top_right'
)

# Customize the plot for aesthetics
bars.opts(
    show_grid=True,
    tools=['hover'],
    bgcolor='#F4F2F0',
    height=500,
    width=600
)

# Save the plot as an HTML file
hv.save(bars, 'docs/attrition_rates_plot.html', fmt='html')

# Display the plot (if running in Jupyter)
bars
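
To make the grouping step above concrete, the sketch below applies the same `groupby` / `value_counts(normalize=True)` chain to a small frame. The `toy` data is hypothetical and chosen only so the percentages are easy to verify by hand; it is not drawn from the IBM dataset.

```python
import pandas as pd

# Hypothetical toy frame: 4 women and 2 men, all in Sales.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female", "Male", "Male"],
    "Department": ["Sales"] * 6,
    "Attrition": ["No", "No", "No", "Yes", "No", "Yes"],
})

# Same chain as the notebook cell: share of each Attrition value within
# each (Gender, Department) group, expressed as a percentage.
pct = (
    toy.groupby(["Gender", "Department"])["Attrition"]
    .value_counts(normalize=True)
    .mul(100)
    .rename("Percent")
    .reset_index()
)
print(pct)

# Because normalization happens within each group, the percentages for
# every (Gender, Department) pair add up to 100.
group_totals = pct.groupby(["Gender", "Department"])["Percent"].sum()
print(group_totals)
```

Here one of four women left (25%) and one of two men left (50%), so the bars compare rates within each group rather than raw headcounts.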


