<a href="https://colab.research.google.com/github/abdyraman/hr-deep-learning/blob/main/deep_hr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Employee retention strategies are integral to the success and well-being of a company. Employees leave an organization for many reasons, and in this case study I explore some of the key drivers of employee attrition. Attrition measures how many workers have left an organization and is a common metric companies use to assess their performance. Turnover rates vary from industry to industry; the [Bureau of Labor Statistics reported](https://www.bls.gov/news.release/jolts.t18.htm#) an overall turnover rate of 25% for voluntary separations in 2020.


In this notebook, I explore [IBM's dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset) on HR analytics. The data covers nearly 1,500 current and former employees, with information on their job satisfaction, work-life balance, tenure, experience, salary, and demographics.

**Employee Attrition Analysis**

In [1]:
import pandas as pd
import numpy as np
import hvplot.pandas  # Registers the .hvplot and .interactive accessors on DataFrames
import holoviews as hv
import panel as pn
# Load the Panel extensions used for rendering widgets and plots
pn.extension("tabulator", "echarts", "plotly", "vega", "vizzu")



In [2]:
df_full = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

**Data cleaning**

In [3]:
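# Drop four columns with no analytical value: Over18, EmployeeCount and
# StandardHours are constant in this dataset, and EmployeeNumber is a row identifier
df = df_full.drop(['Over18', 'EmployeeNumber', 'EmployeeCount', 'StandardHours'], axis=1)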
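A quick check like the following can confirm that a column is safe to drop: a column with a single distinct value cannot help explain attrition. This is a minimal sketch on a small made-up frame rather than the real CSV; the toy values and the helper name `constant_columns` are assumptions for illustration only.

```python
import pandas as pd

def constant_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns that hold a single distinct value."""
    return [col for col in frame.columns if frame[col].nunique(dropna=False) == 1]

# Toy stand-in for the HR data (values are invented for this example).
toy = pd.DataFrame({
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
    "EmployeeNumber": [1, 2, 3],
    "Age": [41, 49, 37],
})

constant = constant_columns(toy)

# An identifier column is the opposite case: every value is different.
is_identifier = toy["EmployeeNumber"].is_unique

print(constant, is_identifier)
```

Running the same helper on `df_full` before the drop would list whichever columns are constant in the actual file.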

In [4]:
# Wrap df so that pipelines built on it can be driven by widgets (hvplot's .interactive API)
idf = df.interactive()

**Descriptive statistics**

Text data analysis: categorical values
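One way to summarize the categorical columns is sketched below. It runs on a small invented frame, not the real data, and the column values are placeholders; the same calls should apply to `df` unchanged.

```python
import pandas as pd

# Toy stand-in for the HR data (values are invented for this example).
toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "Research & Development", "Human Resources"],
    "Attrition": ["Yes", "No", "No", "No"],
    "Age": [41, 49, 37, 33],
})

# Keep only the non-numeric (text) columns.
categorical = toy.select_dtypes(exclude="number")

# For text columns, describe() reports count, unique, top (mode) and freq.
summary = categorical.describe().T

# Share of rows falling in each department.
dept_share = toy["Department"].value_counts(normalize=True)

print(summary)
print(dept_share)
```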

Numeric data analysis
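A comparable summary for the numeric columns might look like this. Again, this is a sketch on invented values, and the choice of statistics to keep is an assumption rather than something the notebook prescribes.

```python
import pandas as pd

# Toy stand-in for the HR data (values are invented for this example).
toy = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "MonthlyIncome": [6000, 5000, 2000, 3000],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# Keep only the numeric columns and report a few basic statistics per column.
numeric = toy.select_dtypes(include="number")
stats = numeric.describe().T[["mean", "std", "min", "max"]]

print(stats)
```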

In [5]:
# Radio buttons for choosing which pay variable goes on the y-axis.
# The same widget drives every plot below.
yaxis_source = pn.widgets.RadioButtonGroup(
    name='Y axis',
    options=['MonthlyIncome', 'MonthlyRate', 'DailyRate', 'HourlyRate'],
    button_type='success'
)


In [6]:

# Define the plot function
def averagelinearplot_gender_age(y_axis):
    # Group by Age and Gender, then calculate the mean for the selected y-axis variable
    average_data = df.groupby(['Age', 'Gender'])[y_axis].mean().reset_index()

    # Plot the average values by Age, grouped by Gender
    fig = average_data.hvplot.line(
        x="Age",                # Use Age as the x-axis
        y=y_axis,               # Plot the selected pay variable
        by="Gender",            # Separate lines for each Gender
        line_width=5,           # Line width
        title=f"Average {y_axis} by Age and Gender",
        ylabel=f"Average {y_axis}",
        xlabel="Age",
        ylim=(0, average_data[y_axis].max() + 10)  # Set y-axis limit based on max of selected variable
    )

    return fig
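To make the aggregation step inside the function concrete, the sketch below applies the same `groupby(...).mean().reset_index()` chain to a tiny invented frame. The values are placeholders; only the shape of the result matters here.

```python
import pandas as pd

# Toy stand-in for the HR data (values are invented for this example).
toy = pd.DataFrame({
    "Age": [30, 30, 30, 31, 31],
    "Gender": ["Female", "Female", "Male", "Female", "Male"],
    "MonthlyIncome": [4000, 6000, 5000, 3000, 7000],
})

# One row per (Age, Gender) pair, holding the mean of the chosen variable.
average_data = toy.groupby(["Age", "Gender"])["MonthlyIncome"].mean().reset_index()

print(average_data)
```

The result is a flat table with one row per age and gender combination, which is the long format that `hvplot.line(..., by="Gender")` splits into one line per gender.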



In [7]:
# Create a Panel layout to include the RadioButtonGroup and plot
averages_age_gender_linear_plot = pn.Column(
    yaxis_source,
    pn.bind(averagelinearplot_gender_age, y_axis=yaxis_source)  # Bind the plot function to the selected value
)
# Display the interactive layout
averages_age_gender_linear_plot


In [8]:
# Function to create the plot based on the selected y-axis variable
def create_averagelinearplot_gender_years_at_company(y_axis):
    # Group by YearsAtCompany and Gender, then calculate the mean for the selected y-axis variable
    average_data = df.groupby(['YearsAtCompany', 'Gender'])[y_axis].mean().reset_index()

    # Plot the average values by YearsAtCompany, grouped by Gender
    fig = average_data.hvplot.line(
        x="YearsAtCompany",     # Use YearsAtCompany as the x-axis
        y=y_axis,               # Plot the selected pay variable
        by="Gender",            # Separate lines for each Gender
        line_width=5,           # Line width
        title=f"Average {y_axis} by Years at Company and Gender",
        ylabel=f"Average {y_axis}",
        xlabel="Years at Company",
        ylim=(0, average_data[y_axis].max() + 10)  # Set y-axis limit based on max of selected variable
    )

    return fig

In [9]:
# Create a Panel layout to include the RadioButtonGroup and plot
averagelinearplot_gender_years_at_company_layout = pn.Column(
    yaxis_source,
    pn.bind(create_averagelinearplot_gender_years_at_company, y_axis=yaxis_source)  # Re-render the plot whenever the selection changes
)

# Display the interactive plot layout
averagelinearplot_gender_years_at_company_layout



In [10]:
# Function to create the plot based on the selected y-axis variable
def update_averages_plot(y_axis):
    # Group by Gender and Attrition, then calculate the mean for the selected y-axis variable
    average_data = df.groupby(['Gender', 'Attrition'])[y_axis].mean().reset_index().round(0)

    # Create a bar plot for the selected y-axis variable grouped by Gender and Attrition
    fig = average_data.hvplot.bar(
        x='Attrition',          # Use Attrition status as the x-axis
        y=y_axis,              # Use the selected y-axis variable
        by='Gender',           # Separate bars for each Gender
        title=f'Average {y_axis} by Gender and Attrition',
        ylabel=f'Average {y_axis}',
        xlabel='Attrition',
        ylim=(0, average_data[y_axis].max() + 10),  # Set y-axis limit based on max of selected variable
        legend='top_left'      # Position of the legend
    )

    return fig



In [11]:
# Create a Panel layout to include the RadioButtonGroup and plot
aver_gend_attr_layout = pn.Column(
    yaxis_source,
    pn.bind(update_averages_plot, y_axis=yaxis_source)  # Re-render the plot whenever the selection changes
)

# Display the interactive plot
aver_gend_attr_layout
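The three plotting functions above repeat the same aggregation with different grouping columns. One possible refactor, offered only as a sketch, is to move that step into a shared helper. The name `mean_by` and its `decimals` parameter are hypothetical, and the toy values are invented.

```python
import pandas as pd

def mean_by(frame, group_cols, value_col, decimals=None):
    """Average value_col within each combination of group_cols."""
    out = frame.groupby(group_cols)[value_col].mean().reset_index()
    if decimals is not None:
        # Round only the aggregated column, leaving the group keys untouched.
        out[value_col] = out[value_col].round(decimals)
    return out

# Toy stand-in for the HR data (values are invented for this example).
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Male"],
    "Attrition": ["Yes", "No", "No", "No", "No", "Yes"],
    "HourlyRate": [50, 61, 70, 81, 82, 40],
})

by_gender_attrition = mean_by(toy, ["Gender", "Attrition"], "HourlyRate", decimals=0)

print(by_gender_attrition)
```

Each plotting function could then call the helper with its own grouping columns and pass the result straight to `hvplot`.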