# HR Analytics Project

## Project Description

This Jupyter Notebook contains the analysis and findings of the HR Analytics project. The project's main objective is to analyze employee attrition within the organization and identify the factors that contribute to it. We explore employee satisfaction, job involvement, career progression, work-life balance, and overtime to understand attrition patterns.

The analysis includes data cleaning, data visualization, and recommendations for HR strategies based on the findings.

LinkedIn: [Prakhar Yadav](https://www.linkedin.com/in/prakhar-yadav-8271231a0/)

GitHub: [98prakhar](https://github.com/98prakhar/HR-Analytics-MeriSkill)


## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


## Load the Data

In [None]:
df = pd.read_csv("C:/Users/Lenovo/Desktop/Meri skill Project/drive-download-20231003T163556Z-001/Project 3 - HR Analytics/Data P3 MeriSKILL/HR-Employee-Attrition.csv")
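
The absolute Windows path above resolves only on the original machine. A minimal sketch of a more portable loader follows, assuming the CSV is stored in a hypothetical `data/` folder next to the notebook. The names `load_hr_data` and `REQUIRED_COLUMNS` are illustrative and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Illustrative subset of columns that later cells rely on.
REQUIRED_COLUMNS = {"Attrition", "JobSatisfaction", "OverTime"}


def load_hr_data(csv_path):
    """Read the HR CSV and fail early if expected columns are missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV not found: {csv_path}")
    frame = pd.read_csv(csv_path)
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return frame


# Hypothetical relative location; adjust to wherever the file actually lives.
# df = load_hr_data(Path("data") / "HR-Employee-Attrition.csv")
```

Failing early on a missing file or column gives a clearer error than a `KeyError` several cells later.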

## Data Exploration

In [None]:
df.head()


In [None]:
df.tail()

In [None]:
df.describe()

In [None]:
df.info()

## Data Cleaning and Preprocessing

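In [None]:
# Drop rows that contain any missing value
df.dropna(inplace=True)

In [None]:
# Confirm that no missing values remain (every count should be 0)
df.isnull().sum()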
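
Because the null count runs after `dropna`, it can only ever report zeros. To see how much data the drop would discard, the report has to be built first. The sketch below uses a made-up toy frame (`toy`) in place of the real data; the values are not from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the HR data (values are made up).
toy = pd.DataFrame({
    "Age": [41, np.nan, 37, 29],
    "MonthlyIncome": [5993, 5130, np.nan, np.nan],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Count and percentage of missing values per column, worst first.
missing_report = pd.DataFrame({
    "missing": toy.isnull().sum(),
    "percent": toy.isnull().mean() * 100,
}).sort_values("missing", ascending=False)
print(missing_report)

# Rows that dropna() would discard (at least one missing value).
rows_lost = int(toy.isnull().any(axis=1).sum())
print(f"dropna() would remove {rows_lost} of {len(toy)} rows")
```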

In [None]:
df = df.drop(["EmployeeCount", "Over18", "StandardHours"], axis=1)
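
The notebook does not say why these three columns are dropped. A plausible reason is that each holds a single value for every employee, which would make it useless for explaining attrition. Under that assumption, such columns can be detected instead of hard-coded, as in this sketch on a made-up toy frame:

```python
import pandas as pd

# Toy frame; the constant values are illustrative assumptions.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
})

# A column with one distinct value carries no information about attrition.
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=False) == 1]
print(constant_cols)

toy = toy.drop(columns=constant_cols)
```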

In [None]:
# Rename JobSatisfaction; later cells refer to it as Job_Satisfaction
df = df.rename(columns={"JobSatisfaction": "Job_Satisfaction"})

In [None]:
df.head()

In [None]:
categorical_vars = ["BusinessTravel", "Department", "Gender", "JobRole", "MaritalStatus"]
for var in categorical_vars:
    plt.figure(figsize=(8, 5))
    sns.countplot(data=df, x=var, hue="Attrition")
    plt.title(f"Distribution of {var} vs. Attrition")
    plt.xticks(rotation=45)
    plt.show()

In [None]:
# Explore the distribution of numeric variables
numeric_vars = ["Age", "MonthlyIncome", "YearsAtCompany", "YearsInCurrentRole"]
for var in numeric_vars:
    plt.figure(figsize=(8, 5))
    sns.boxplot(data=df, y=var, x="Attrition")
    plt.title(f"Distribution of {var} vs. Attrition")
    plt.show()
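
A boxplot shows the group medians visually, and a small table can put numbers on the same comparison. The sketch below is one possible companion to the plots, using a made-up toy frame rather than the real data:

```python
import pandas as pd

# Toy data; incomes are made up for illustration.
toy = pd.DataFrame({
    "Attrition": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "MonthlyIncome": [2000, 3000, 4000, 5000, 7000, 9000],
})

# Per-group count, median, and mean of one numeric variable.
income_summary = toy.groupby("Attrition")["MonthlyIncome"].agg(["count", "median", "mean"])
print(income_summary)
```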


In [None]:
# Calculate and visualize the correlation between numeric variables
correlation_matrix = df[numeric_vars].corr()
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
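
The heatmap above covers only the numeric predictors, so it does not show how each one relates to attrition itself. One option, sketched here on made-up toy data, is to encode `Attrition` as a 0/1 flag and correlate it with each numeric column. The column name `AttritionFlag` is an assumption introduced for this example.

```python
import pandas as pd

# Toy data; values are made up for illustration.
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "Age": [25, 45, 38, 28, 50, 41],
    "MonthlyIncome": [2500, 9000, 6000, 3000, 12000, 7000],
})

# Encode the target as 1 (left) / 0 (stayed) so it can enter corr().
toy["AttritionFlag"] = (toy["Attrition"] == "Yes").astype(int)

# Pearson correlation of each numeric column with the 0/1 flag,
# which is equivalent to a point-biserial correlation.
attrition_corr = (
    toy[["Age", "MonthlyIncome", "AttritionFlag"]]
    .corr()["AttritionFlag"]
    .drop("AttritionFlag")
)
print(attrition_corr)
```

A negative value means that higher values of the variable go with lower attrition in the sample. It does not by itself imply causation.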

## Analyze Attrition

In [None]:
# Understand the distribution of Attrition
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="Attrition")
plt.title("Attrition Distribution")
plt.show()

In [None]:
# Explore factors related to attrition
attrition_factors = ["MaritalStatus", "JobRole", "Education"]
for var in attrition_factors:
    plt.figure(figsize=(8, 5))
    sns.countplot(data=df, x=var, hue="Attrition")
    plt.title(f"{var} vs. Attrition")
    plt.xticks(rotation=45)
    plt.show()
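
Count plots can mislead when groups differ in size: a large group may show the most leavers while having a low attrition rate. A minimal sketch of computing the rate per category instead, on made-up toy data:

```python
import pandas as pd

# Toy data; values are made up for illustration.
toy = pd.DataFrame({
    "MaritalStatus": ["Single", "Single", "Married", "Married", "Married", "Divorced"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Boolean flag: True where the employee left.
left = toy["Attrition"].eq("Yes")

# The mean of a boolean series is the share of True values, i.e. the attrition rate.
rate_by_status = left.groupby(toy["MaritalStatus"]).mean().sort_values(ascending=False)
print(rate_by_status)
```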

## Employee Satisfaction and Engagement

In [None]:
# Analyze employee satisfaction and engagement
satisfaction_vars = ["Job_Satisfaction", "EnvironmentSatisfaction", "RelationshipSatisfaction", "WorkLifeBalance"]
for var in satisfaction_vars:
    plt.figure(figsize=(8, 5))
    sns.countplot(data=df, x=var, hue="Attrition")
    plt.title(f"{var} vs. Attrition")
    plt.show()


In [None]:
# Investigate the relationship between JobInvolvement and Attrition
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="JobInvolvement", hue="Attrition")
plt.title("JobInvolvement vs. Attrition")
plt.show()

## Career Progression

In [None]:
# Analyze employee career progression
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, y="JobLevel", x="YearsInCurrentRole")
plt.title("Job Level vs. Years in Current Role")
plt.xlabel("Years in Current Role")
plt.ylabel("Job Level")
plt.show()

In [None]:
# Investigate the relationship between promotions and attrition
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x="YearsSinceLastPromotion", hue="Attrition")
plt.title("Years Since Last Promotion vs. Attrition")
plt.xlabel("Years Since Last Promotion")
plt.xticks(rotation=45)
plt.show()

## Work-Life Balance and Overtime

In [None]:
# Analyze the impact of overtime on attrition
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x="OverTime", hue="Attrition")
plt.title("OverTime vs. Attrition")
plt.xlabel("Overtime")
plt.show()
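
To compare overtime and non-overtime employees on equal footing, the counts can be turned into row-wise proportions. The sketch below uses `pd.crosstab` with `normalize="index"` on made-up toy data:

```python
import pandas as pd

# Toy data; values are made up for illustration.
toy = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "No"],
})

# Each row sums to 1: the share of stayers and leavers within each overtime group.
overtime_table = pd.crosstab(toy["OverTime"], toy["Attrition"], normalize="index")
print(overtime_table)
```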

In [None]:
# Investigate the relationship between work-life balance and attrition
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x="WorkLifeBalance", hue="Attrition")
plt.title("WorkLifeBalance vs. Attrition")
plt.xlabel("Work-Life Balance")
plt.show()

## Conclusion and Recommendations

In [None]:
# Summarize the findings
print("Summary of Findings:")
print("- Attrition Distribution:")
attrition_counts = df['Attrition'].value_counts()
print(attrition_counts)
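
Raw counts are easier to interpret alongside percentages. A minimal sketch using `value_counts(normalize=True)` on a made-up toy series:

```python
import pandas as pd

# Toy series; values are made up for illustration.
toy_attrition = pd.Series(["No", "No", "No", "Yes", "No"], name="Attrition")

# Share of each class as a percentage, rounded to one decimal place.
attrition_share = toy_attrition.value_counts(normalize=True).mul(100).round(1)
print(attrition_share)
```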

In [None]:
# Provide recommendations
print("\nRecommendations:")
print("- Consider improving work-life balance to reduce attrition among employees.")
print("- Monitor the impact of overtime work on attrition and take necessary actions to manage workload.")
print("- Focus on career development opportunities, such as promotions and skill development, to enhance job satisfaction.")
print("- Conduct exit interviews with departing employees to gather more insights into attrition reasons.")


In [None]:
# Overall Conclusion
print("\nOverall Conclusion:")
print("Based on the analysis, we have identified several factors that are related to attrition within the organization. It's important for the company to address these factors in order to improve employee retention and satisfaction. By implementing the recommended actions, the company can work towards reducing attrition and creating a more positive work environment.")