<a href="https://colab.research.google.com/github/abdyraman/hr-deep-learning/blob/main/deep_hr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Employee retention strategies are integral to the success and well-being of a company. Employees leave organizations for many reasons, and in this case study, I will explore some of the key drivers of employee attrition. Employee attrition measures how many workers have left an organization and is a common metric companies use to assess their performance. While turnover rates vary from industry to industry, the [Bureau of Labor Statistics reported](https://www.bls.gov/news.release/jolts.t18.htm#) that among voluntary separations, the overall turnover rate was 25% in 2020.


In this notebook, I will explore [IBM's dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset) on HR analytics. The data covers 1,470 current and former employees, with information on job satisfaction, work-life balance, tenure, experience, salary, and demographics.

**Employee Attrition Analysis**

In [1]:
import pandas as pd
import numpy as np
import hvplot.pandas  # registers the .hvplot plotting accessor on pandas objects
import holoviews as hv
import panel as pn
import plotly.express as px
from bokeh.palettes import Category10

# Load Panel's JavaScript extensions for the components used in this notebook
pn.extension("tabulator", "echarts", "plotly", "vega", "vizzu")


In [2]:
df_full = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

**Data cleaning**

In [3]:
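# Drop four columns that cannot explain attrition: Over18, EmployeeCount and
# StandardHours hold a single value for every employee, and EmployeeNumber is an ID
df = df_full.drop(columns=['Over18', 'EmployeeNumber', 'EmployeeCount', 'StandardHours'])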
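Rather than spotting constant or identifier columns by eye, they can be flagged automatically. A minimal sketch: the helper name `uninformative_columns` and the toy frame are illustrative assumptions, not part of the original notebook. The "unique on every row" rule is a heuristic, so review the flagged columns before dropping them (a continuous measure could also be unique per row).

```python
import pandas as pd

def uninformative_columns(frame: pd.DataFrame) -> list[str]:
    """Hypothetical helper: columns that are constant or unique on every row."""
    n_rows = len(frame)
    counts = frame.nunique(dropna=False)
    # One distinct value carries no information; n_rows distinct values
    # usually indicates an identifier such as an employee number.
    return [col for col, n in counts.items() if n <= 1 or n == n_rows]

# Toy frame shaped like a few IBM columns (values are made up)
toy = pd.DataFrame({
    "Over18": ["Y", "Y", "Y"],
    "EmployeeCount": [1, 1, 1],
    "EmployeeNumber": [1, 2, 4],
    "Age": [41, 49, 41],
})
print(uninformative_columns(toy))  # ['Over18', 'EmployeeCount', 'EmployeeNumber']
```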

In [4]:
# Number of distinct values in each categorical (object) column
unique_counts_objects = df.select_dtypes('object').nunique()

# Print every distinct value of each categorical column with its frequency
for col in unique_counts_objects.index:
    print(f'Unique values of {col}:')
    print(df[col].value_counts())
    print()

Unique values of Attrition:
Attrition
No     1233
Yes     237
Name: count, dtype: int64

Unique values of BusinessTravel:
BusinessTravel
Travel_Rarely        1043
Travel_Frequently     277
Non-Travel            150
Name: count, dtype: int64

Unique values of Department:
Department
Research & Development    961
Sales                     446
Human Resources            63
Name: count, dtype: int64

Unique values of EducationField:
EducationField
Life Sciences       606
Medical             464
Marketing           159
Technical Degree    132
Other                82
Human Resources      27
Name: count, dtype: int64

Unique values of Gender:
Gender
Male      882
Female    588
Name: count, dtype: int64

Unique values of JobRole:
JobRole
Sales Executive              326
Research Scientist           292
Laboratory Technician        259
Manufacturing Director       145
Healthcare Representative    131
Manager                      102
Sales Representative          83
Research Director             
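The `Attrition` counts printed above give the overall attrition rate of this dataset directly: 237 leavers out of 1,233 + 237 = 1,470 employees, or about 16.1%. A minimal sketch that rebuilds the column from those printed counts, so it runs without the CSV:

```python
import pandas as pd

# Rebuild the Attrition column from the counts printed above (1,233 "No", 237 "Yes")
attrition = pd.Series(["No"] * 1233 + ["Yes"] * 237, name="Attrition")

# Share of each class; the "Yes" share is the overall attrition rate
shares = attrition.value_counts(normalize=True)
attrition_rate = shares["Yes"]
print(f"Attrition rate: {attrition_rate:.1%}")  # Attrition rate: 16.1%
```

On the full data, the same number is `df['Attrition'].eq('Yes').mean()`.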

In [5]:
# Summary statistics for the numeric columns, transposed so each feature is a row
num = df.select_dtypes(include='number')
num.describe().T

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age | 1470.0 | 36.92381 | 9.135373 | 18.0 | 30.0 | 36.0 | 43.0 | 60.0 |
| DailyRate | 1470.0 | 802.485714 | 403.5091 | 102.0 | 465.0 | 802.0 | 1157.0 | 1499.0 |
| DistanceFromHome | 1470.0 | 9.192517 | 8.106864 | 1.0 | 2.0 | 7.0 | 14.0 | 29.0 |
| Education | 1470.0 | 2.912925 | 1.024165 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| EnvironmentSatisfaction | 1470.0 | 2.721769 | 1.093082 | 1.0 | 2.0 | 3.0 | 4.0 | 4.0 |
| HourlyRate | 1470.0 | 65.891156 | 20.329428 | 30.0 | 48.0 | 66.0 | 83.75 | 100.0 |
| JobInvolvement | 1470.0 | 2.729932 | 0.711561 | 1.0 | 2.0 | 3.0 | 3.0 | 4.0 |
| JobLevel | 1470.0 | 2.063946 | 1.10694 | 1.0 | 1.0 | 2.0 | 3.0 | 5.0 |
| JobSatisfaction | 1470.0 | 2.728571 | 1.102846 | 1.0 | 2.0 | 3.0 | 4.0 | 4.0 |
| MonthlyIncome | 1470.0 | 6502.931293 | 4707.956783 | 1009.0 | 2911.0 | 4919.0 | 8379.0 | 19999.0 |


In [6]:
# hvplot interactive wrapper around df; its pipelines can take Panel widgets as arguments
idf = df.interactive()

**Descriptive statistics**

Text data analysis: categorical values
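For categorical features, the attrition rate within each category is usually more telling than raw counts. A sketch using `pd.crosstab` with `normalize='index'`; the helper name `attrition_rate_by` is hypothetical and the toy rows are made up. On the real data you would call `attrition_rate_by(df, 'Department')` or pass any other object column.

```python
import pandas as pd

def attrition_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: share of Attrition == 'Yes' within each level of `column`."""
    # normalize='index' turns each row of the contingency table into proportions
    table = pd.crosstab(frame[column], frame["Attrition"], normalize="index")
    return table["Yes"].sort_values(ascending=False)

# Made-up rows, not taken from the IBM dataset
toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales",
                   "Research & Development", "Research & Development"],
    "Attrition": ["Yes", "No", "No", "No", "Yes"],
})
print(attrition_rate_by(toy, "Department"))
```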

Numeric Data Analysis
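For numeric features, comparing the medians (or means) of leavers and stayers is a quick way to spot candidate drivers. A sketch with made-up values; on the real data the equivalent call would be `df.groupby('Attrition')[num.columns].median()`, using the `num` frame from the summary-statistics cell.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No"],
    "Age": [25, 40, 38, 29, 45],
    "MonthlyIncome": [2500, 7000, 6000, 3100, 9000],
})

# Median of each numeric column for leavers ("Yes") and stayers ("No")
medians = toy.groupby("Attrition")[["Age", "MonthlyIncome"]].median()
print(medians)

# A ratio above 1 means stayers have the higher median for that feature
print((medians.loc["No"] / medians.loc["Yes"]).round(2))
```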

In [7]:

# Build a bar chart of employee counts by gender and attrition status
def gender_attrition_plot():
    # Count employees in each (Gender, Attrition) combination
    counts = df.groupby(['Gender', 'Attrition']).size().reset_index(name='Count')

    # Combined x-axis label, e.g. "Female - Attrition: Yes"
    counts['Gender_Attrition'] = counts['Gender'] + ' - Attrition: ' + counts['Attrition']

    # Each bar's share of the whole workforce, rounded to a whole percentage
    total_count = counts['Count'].sum()
    counts['Percentage'] = ((counts['Count'] / total_count) * 100).round(0).astype(int).astype(str) + '%'

    bar_plot = counts.hvplot(
        kind='bar',
        x='Gender_Attrition',
        y='Count',
        title='Counts of Employees by Gender and Attrition Status',
        xlabel='Gender and Attrition Status',
        ylabel='Count',
        color='Attrition',  # color bars by attrition status
        width=800,
        height=400
    )

    # Percentage labels, shifted slightly above each bar
    labels = hv.Labels(counts, kdims=['Gender_Attrition', 'Count'], vdims=['Percentage']).opts(
        text_color='black',
        text_font_size='10pt',
        fontsize='12pt',
        show_legend=False,
        yoffset=10
    )

    # Overlay the labels on the bars
    return bar_plot * labels

# Wrap the chart in a Panel layout and mark it as servable
interactive_plot = pn.Column(gender_attrition_plot())
interactive_plot.servable()
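The percentage labels in this chart are shares of the whole workforce (each bar's count divided by all 1,470 employees), so they mix group size with attrition. To compare attrition between genders, the rate within each gender is usually more informative. A sketch with made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Proportion of each Attrition value within each Gender (each row sums to 1)
within_gender = (
    toy.groupby("Gender")["Attrition"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(within_gender)
```

On the real data, replace `toy` with `df`; the `Yes` column is then the attrition rate for each gender.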

In [8]:
import panel as pn

# Define a Markdown pane with introductory text on attrition
pn.pane.Markdown("""
# Attrition

Employee retention strategies are integral to the success and well-being of a company. 
There are often many reasons why employees leave an organization, and in this case study, I will explore some of the key drivers of employee attrition. 
Employee attrition measures how many workers have left an organization and is a common metric companies use to assess their performance. While turnover rates vary from industry to industry, the Bureau of Labor Statistics reported that among voluntary separations, the overall turnover rate was 25% in 2020.  
[In this notebook](https://github.com/abdyraman/diversitydashboard), I will explore IBM's dataset on HR Analytics. 
The data consists of nearly 1,500 current and former employees with information related to their job satisfaction, work-life balance, tenure, experience, salary, and demographic data.
""").servable()


In [16]:
# RadioButtonGroup for choosing which pay measure to plot on the y-axis
yaxis_source = pn.widgets.RadioButtonGroup(
    name='Y axis',
    options=['MonthlyIncome','MonthlyRate','DailyRate','HourlyRate'],
    button_type='success'
)

# Build the line chart for the selected y-axis variable
def update_plot(y_axis):
    # Group by Age and Gender, then average the selected variable
    average_data = df.groupby(['Age', 'Gender'])[y_axis].mean().reset_index()

    # One line per gender, with the average pay measure on the y-axis
    fig = average_data.hvplot.line(
        x="Age",                # Age on the x-axis
        y=y_axis,               # pay measure chosen in the widget
        by="Gender",            # separate line for each gender
        line_width=5,
        title=f"Average {y_axis} by Age and Gender",
        ylabel=f"Average {y_axis}",
        xlabel="Age",
        ylim=(0, average_data[y_axis].max() + 10)  # start at 0, small headroom above the max
    )

    return fig

# Create a Panel layout to include the RadioButtonGroup and plot
interactive_plot = pn.Column(
    yaxis_source,
    pn.bind(update_plot, y_axis=yaxis_source)  # Bind the update_plot function to the selected value
)

# Display the interactive plot
interactive_plot.servable() 
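Each line in this chart is one column of a grouped-mean table: `groupby(['Age', 'Gender'])[y_axis].mean()` yields one value per (age, gender) pair. Pivoting `Gender` into columns makes that table easy to inspect, and checking group sizes alongside shows which points rest on very few employees. A sketch with made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Age": [30, 30, 30, 31, 31],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "MonthlyIncome": [4000, 5000, 6000, 5500, 4500],
})

grouped = toy.groupby(["Age", "Gender"])["MonthlyIncome"]
# Mean per (Age, Gender) with Gender pivoted into columns: one column per plotted line
means = grouped.mean().unstack("Gender")
# Number of employees behind each mean; small groups give noisy averages
sizes = grouped.size().unstack("Gender", fill_value=0)
print(means)
print(sizes)
```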

In [17]:
# RadioButtonGroup for choosing which pay measure to plot on the y-axis
yaxis_source = pn.widgets.RadioButtonGroup(
    name='Y axis',
    options=['MonthlyIncome','MonthlyRate','DailyRate','HourlyRate'],
    button_type='success'
)

# Build the line chart for the selected y-axis variable
def update_plot(y_axis):
    # Group by YearsAtCompany and Gender, then average the selected variable
    average_data = df.groupby(['YearsAtCompany', 'Gender'])[y_axis].mean().reset_index()

    # One line per gender, with tenure on the x-axis
    fig = average_data.hvplot.line(
        x="YearsAtCompany",     # tenure in years on the x-axis
        y=y_axis,               # pay measure chosen in the widget
        by="Gender",            # separate line for each gender
        line_width=5,
        title=f"Average {y_axis} by Years at Company and Gender",
        ylabel=f"Average {y_axis}",
        xlabel="Years at Company",
        ylim=(0, average_data[y_axis].max() + 10)  # start at 0, small headroom above the max
    )

    return fig

# Create a Panel layout to include the RadioButtonGroup and plot
interactive_plot = pn.Column(
    yaxis_source,
    pn.bind(update_plot, y_axis=yaxis_source)  # Bind the update_plot function to the selected value
)

# Display the interactive plot
interactive_plot.servable() 
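This cell and the previous one differ only in the grouping column (`Age` vs `YearsAtCompany`). One option is a single helper that takes the x-axis column as a parameter. The sketch covers only the pandas step, and the name `average_by` is hypothetical; its result can be passed to `.hvplot.line(x=x, y=y, by='Gender')` as in the cells above.

```python
import pandas as pd

def average_by(frame: pd.DataFrame, x: str, y: str, by: str = "Gender") -> pd.DataFrame:
    """Hypothetical helper: mean of `y` for every (x, by) pair, in long format."""
    return frame.groupby([x, by])[y].mean().reset_index()

# Made-up rows for illustration only
toy = pd.DataFrame({
    "YearsAtCompany": [1, 1, 2],
    "Gender": ["Male", "Male", "Female"],
    "HourlyRate": [50, 70, 80],
})
print(average_by(toy, x="YearsAtCompany", y="HourlyRate"))
```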

In [None]:
# Group by Age and Gender, then calculate the mean HourlyRate
average_hourly_rate_df = df.groupby(['Age', 'Gender'])['HourlyRate'].mean().reset_index()

# Plot the average HourlyRate by Age, grouped by Gender
fig = average_hourly_rate_df.hvplot.line(
    x="Age",                # Use Age as the x-axis
    y="HourlyRate",         # Use the averaged HourlyRate as the y-axis
    by="Gender",            # Separate lines for each Gender
    line_width=5,           # Line width
    title="Average Hourly Rate by Age and Gender",
    ylabel="Average Hourly Rate",
    xlabel="Age",
    ylim=(0, average_hourly_rate_df['HourlyRate'].max() + 10)  # Set y-axis limit based on max HourlyRate
)

# Displaying the plot
pn.pane.HoloViews(fig, sizing_mode="stretch_width").servable()


In [None]:

years_with_company

In [None]:
# Define Panel buttons for gender selection
male_button = pn.widgets.Button(name='Male', button_type='primary')
female_button = pn.widgets.Button(name='Female', button_type='primary')

# Pane to display filtered data
filtered_data_pane = pn.pane.DataFrame(width=400)

# Function to filter data based on selected gender
def filter_data(gender):
    filtered_data = df[df['Gender'] == gender]
    filtered_data_pane.object = filtered_data

# Set up button click events
male_button.on_click(lambda event: filter_data('Male'))
female_button.on_click(lambda event: filter_data('Female'))

# Display buttons and filtered data pane
pn.Column(
    male_button, 
    female_button, 
    filtered_data_pane
).servable()

In [None]:
# Define a Toggle button to filter by Attrition "Yes"
attrition_toggle = pn.widgets.Toggle(name='Attrition: Yes', button_type='success')

# Pane to display filtered data
filtered_data_pane = pn.pane.DataFrame(width=400)

# Function to filter data based on toggle state
def filter_data(event):
    if attrition_toggle.value:
        # Show only rows where Attrition is 'Yes'
        filtered_data = df[df['Attrition'] == 'Yes']
    else:
        # Show an empty DataFrame or any other default state (like all data if preferred)
        filtered_data = pd.DataFrame(columns=df.columns)
    
    filtered_data_pane.object = filtered_data

# Set up the toggle to call filter_data whenever its state changes
attrition_toggle.param.watch(filter_data, 'value')

# Display the toggle button and filtered data pane
pn.Column(attrition_toggle, filtered_data_pane).servable()
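When the toggle is off, this callback shows an empty table; the comment in the cell notes that showing all data may be preferred instead. Keeping the filtering logic in a plain function makes that choice explicit and testable apart from the widget. A sketch; `filter_by_attrition` is a hypothetical name, and inside the callback it would be used as `filtered_data_pane.object = filter_by_attrition(df, attrition_toggle.value)`.

```python
import pandas as pd

def filter_by_attrition(frame: pd.DataFrame, leavers_only: bool) -> pd.DataFrame:
    """Hypothetical helper: leavers only when the flag is set, otherwise every row."""
    if leavers_only:
        return frame[frame["Attrition"] == "Yes"]
    return frame

# Made-up rows for illustration only
toy = pd.DataFrame({"Attrition": ["Yes", "No", "Yes"], "Age": [25, 40, 31]})
print(len(filter_by_attrition(toy, True)), len(filter_by_attrition(toy, False)))  # 2 3
```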

**Bar chart with departments and gender distribution**