# IBM Attrition Analysis Using R

*In this project, I conducted an in-depth attrition analysis for IBM employees, exploring various personal and organizational factors contributing to employee turnover. The analysis examined demographic details, job roles, income levels, work-life balance, and other workplace conditions to identify key patterns and potential causes of attrition. The insights aim to help in developing strategies to improve employee retention and satisfaction.*

---

In [None]:
library(tidyverse)
library(tidyr)
library(skimr)
library(janitor)
library(ggplot2)

## Attrition Insight: Personal Factors
In the first phase of the analysis, I focused on personal factors influencing attrition, examining trends across age groups, gender, and marital status. This helped identify which demographic segments were more likely to leave, providing insights into targeted retention strategies.

In [None]:
Employee_detail1 <- read_csv("Ed.csv")

Employee_detail <- Employee_detail1 %>%
  distinct() %>%
  drop_na()

head(Employee_detail)
glimpse(Employee_detail)


This code imports the dataset, removes duplicate records, drops rows with missing values, and previews the cleaned data with `head()` and `glimpse()`.
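As a minimal sketch of what these cleaning steps do, the toy table below (made-up values, not the IBM data) contains one duplicated row and one missing value. The names `toy`, `toy_clean`, and `n_removed` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values): row 2 duplicates row 1, row 4 has a missing Age
toy <- tibble(
  EmployeeID = c(1, 1, 2, 3),
  Age        = c(30, 30, 41, NA),
  Attrition  = c(TRUE, TRUE, FALSE, FALSE)
)

toy_clean <- toy %>%
  distinct() %>%  # drop exact duplicate rows
  drop_na()       # drop rows with a missing value in any column

# Number of rows removed by the two cleaning steps
n_removed <- nrow(toy) - nrow(toy_clean)
n_removed
```

Because `drop_na()` with no arguments removes a row if any column is missing, it can be worth comparing row counts before and after; passing column names, for example `drop_na(Attrition)`, limits the check to those columns.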

In [None]:
Gender_Wise <- Employee_detail %>%
  group_by(Gender) %>%
  summarise(
    total_employee = n(),
    attritions = sum(Attrition == "TRUE"),
    Attrition_rate = attritions / total_employee * 100
  )

glimpse(Gender_Wise)

ggplot(data = Gender_Wise) +
  geom_bar(mapping = aes(x = Gender, y = Attrition_rate , fill = Gender), stat = "identity") + 
  geom_text(aes(x = Gender, y = Attrition_rate, 
                label = paste0(round(Attrition_rate, 2), "%")), 
            vjust = -0.5, size = 4, fontface = "bold", show.legend = FALSE) +
  labs(title = "Attrition Rate by Gender", 
       y = "Attrition Rate (%)", 
       x = "Gender") +
  theme_minimal() + 
  theme(plot.title = element_text(face = "bold", size = 16, colour = "lightblue"),
        axis.title = element_text(face = "bold", size = 12),
        axis.text = element_text(face = "bold"))


This section of the analysis begins by transforming and preparing the dataset for exploration. Using data manipulation functions, the raw employee records are filtered, grouped, and summarized to focus on gender-based attrition trends. After the transformation, ggplot2 is used to visualize the relationship between gender and attrition rates, making it easier to identify patterns at a glance.

_The visualization revealed that male employees in this dataset show a higher attrition rate compared to female employees, indicating a gender-related difference in turnover patterns._
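The same group-and-summarise pattern recurs for every factor in this notebook, so it could be wrapped in a small helper. The sketch below is one possible way to do that; `attrition_rate_by()` is a hypothetical function that is not part of the original analysis, and it assumes `Attrition` is a logical column where `TRUE` means the employee left. The data is made up.

```r
library(dplyr)

# Hypothetical helper (not in the original notebook): attrition rate per group.
# Assumes `Attrition` is logical (TRUE = employee left).
attrition_rate_by <- function(data, group_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(
      total_employee = n(),
      attritions     = sum(Attrition),
      Attrition_rate = attritions / total_employee * 100,
      .groups = "drop"
    )
}

# Toy data with made-up values, only to show the call
toy <- tibble(
  Gender    = c("Female", "Female", "Male", "Male", "Male", "Male"),
  Attrition = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

gender_rates <- attrition_rate_by(toy, Gender)
gender_rates
```

The `{{ }}` operator passes an unquoted column name through to `group_by()`, so the same helper could be called with `MaritalStatus`, `Department`, or any other grouping column.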

In [None]:
Age_wise <- Employee_detail %>%
  group_by(Age) %>%
  summarise(total_employee = n(),
            attritions = sum(Attrition=="TRUE"),
            Attrition_rate = attritions/total_employee*100)

glimpse(Age_wise)

ggplot(data = Age_wise) +
  # geom_col() plots the precomputed rate for each age; geom_histogram() is meant for binning raw values
  geom_col(mapping = aes(x = Age, y = Attrition_rate),
           fill = "blue", color = "black") +
  labs(title = "Attrition Rate by Age", x = "Age", y = "Attrition Rate (%)") +
  theme_minimal() + 
  theme(plot.title = element_text(face = "bold", size = 16),
        axis.title = element_text(face = "bold", size = 12))

Marital_wise <- Employee_detail %>%
  group_by(MaritalStatus) %>%
  summarise(total_employee = n(),
            attritions = sum(Attrition=="TRUE"),
            Attrition_rate = attritions/total_employee*100)
glimpse(Marital_wise)

ggplot(Marital_wise, aes(x = reorder(MaritalStatus, Attrition_rate), y = Attrition_rate)) +
  geom_segment(mapping = aes(xend = MaritalStatus, y = 5, yend = Attrition_rate), 
               size = 2, color = "grey") +
  geom_point(size = 6, color = "#f50538") +
  geom_text(aes(label = paste0(round(Attrition_rate, 1), "%")),
            vjust = -1.1,size = 3,fontface = "bold") +
  labs(title = "Attrition Rate by Marital Status",
       x = "Marital Status", y = "Attrition Rate (%)") +
  theme_light()+ 
  theme(plot.title = element_text(face = "bold",size = 16 ),axis.title = element_text(face = "bold",size = 12),axis.text=element_text(face = "bold"))


*I further analyzed and transformed the data to study attrition patterns by age and marital status.* 

**The age-wise analysis showed that younger employees in the dataset had a higher attrition rate. This may be because many of them are hired as interns or are in the early stages of their careers, making them more likely to explore other opportunities.**

**The marital status analysis indicated that divorced and married employees had comparatively lower attrition rates, which may be due to a greater need for job stability, whereas single employees, with fewer family commitments, showed relatively higher turnover.**

## Attrition Trends Across Job Factors

*In the second part of the analysis, I examined job-related factors such as department, job role, job level, business travel, and distance from home.*

In [None]:
Jd <- read_csv("Job.csv")

glimpse(Jd)

Job_detail <- Jd %>%
  distinct() %>%
  drop_na()

glimpse(Job_detail)
## Attrition Department wise 

Department_Wise <- Job_detail %>%
  group_by(Department) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_rate=Attritions/total_employee*100)

glimpse(Department_Wise)

ggplot(data = Department_Wise) + geom_col(mapping = aes(reorder(Department,Attrition_rate),y=Attrition_rate,fill= Department)) +
  geom_text(mapping = aes(reorder(Department,Attrition_rate),y=Attrition_rate,label=paste0(round(Attrition_rate,1),"%")),vjust = -0.2, fontface = "bold") +
  labs(title = "Attrition Rate By Department",x="Department",y="Attrition Rate (%)") + theme_minimal() + 
  theme(plot.title = element_text(face = "bold",size=16),axis.title=element_text(face ="bold",size= 12))


*I first organized and transformed the dataset to calculate attrition rates across different departments.*

> **The results showed that the Sales department had the highest attrition rate, which may be linked to the higher stress levels often associated with consumer-facing roles, such as meeting sales targets and handling client interactions.** 

> **In contrast, the Research and Development department recorded the lowest attrition rate, possibly due to the nature of work involving long-term projects, structured deadlines, and comparatively less direct interaction with consumers.**

In [None]:
## Attrition job role wise

Role_Wise <- Job_detail %>%
  group_by(JobRole) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_rate=Attritions/total_employee*100)

glimpse(Role_Wise)

ggplot(Role_Wise,aes(reorder(JobRole,Attrition_rate),y=Attrition_rate))+
  geom_segment(aes(xend=JobRole,y=0,yend=Attrition_rate),color ="blue",size=1.5) + 
  geom_point(size =5 , color = "red") +
  labs(title = "Attrition Rate By Job Role",x="Job Role", y = "Attrition Rate (%)") +
  theme_minimal() + 
  theme(plot.title = element_text(face = "bold",size = 16),axis.title =element_text(face = "bold",size =12),axis.text.x=element_text(face= "bold",angle =30 , hjust =1))

## Job level vs Attrition 

Job_wise <- Job_detail %>%
  group_by(JobLevel) %>%
  summarize (total_employee=n(), Attritions = sum(Attrition==TRUE), Attrition_rate=Attritions/total_employee*100)

glimpse(Job_wise)

# Bars are filled by job level; the legend title is set through the `fill` label
ggplot(Job_wise,aes(x=reorder(JobLevel, Attrition_rate),y=Attrition_rate,fill = JobLevel))+
  geom_bar(stat = "identity") +
  scale_fill_viridis_b() +
  labs(title="Attrition Rate by Job Level",x="Job Level",y= "Attrition Rate (%)",fill="Job Level") +
  theme_minimal() +
  theme(plot.title = element_text(face="bold", size=16),axis.title = element_text(face="bold", size=12),axis.text =element_text(face="bold"),legend.title  = element_text(face = "bold")) 


I organized and analyzed the dataset to examine attrition patterns across different job roles and job levels. 

>**The job role analysis showed that positions such as Sales Representative and Laboratory Technician had notably higher attrition rates. These roles are typically associated with lower job levels, which was further confirmed by the job level analysis.**

>**The job level results indicated that levels 1 to 3—which include such roles—had the highest attrition rates, while levels 4 and 5 (covering senior positions like managers and directors) had the lowest. This trend may be influenced by factors such as lower pay and reduced job security in lower-level positions.**
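One way to check the link between job role and job level directly is to summarise both per role. The sketch below uses made-up rows and assumes the job dataset has `JobRole`, `JobLevel`, and a logical `Attrition` column; `role_levels` and `median_level` are illustrative names.

```r
library(dplyr)

# Toy data (made-up values), only to illustrate the summary
toy <- tibble(
  JobRole   = c("Sales Representative", "Sales Representative",
                "Laboratory Technician", "Manager", "Manager"),
  JobLevel  = c(1, 1, 2, 4, 5),
  Attrition = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

role_levels <- toy %>%
  group_by(JobRole) %>%
  summarise(
    total_employee = n(),
    median_level   = median(JobLevel),      # typical job level for the role
    Attrition_rate = mean(Attrition) * 100, # share of the role that left
    .groups = "drop"
  ) %>%
  arrange(desc(Attrition_rate))

role_levels
```

If high-attrition roles also show a low `median_level` on the real data, that would support reading the role and level results together.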

In [None]:
## Attrition vs business travel 

Travel_Wise <- Job_detail %>%
  group_by(BusinessTravel) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_Rate = Attritions/total_employee*100)

glimpse(Travel_Wise)

ggplot(Travel_Wise , aes (x = reorder(BusinessTravel, Attrition_Rate),y = Attrition_Rate, fill = BusinessTravel)) + 
  geom_bar(stat= "identity") +
  coord_flip() +
  labs(title = "Attrition Rate by Business Travel", x = "Business Travel", y = "Attrition Rate (%)",fill = "Business Travel") +
  geom_text(aes (x = reorder(BusinessTravel, Attrition_Rate),y = Attrition_Rate,label=paste0(round(Attrition_Rate,1),"%")),hjust = -0.3, fontface = "bold",color = "black") +
  scale_fill_ordinal(labels = c("Non-Travel" = "No Travel", "Travel_Frequently" = "Too Much Travel", "Travel_Rarely" = "Rarely Travel"))  +
  scale_x_discrete(labels = c("Non-Travel" = "No Travel", "Travel_Frequently" = "Too Much Travel", "Travel_Rarely" = "Rarely Travel")) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold",size = 16 ),axis.title = element_text(face="bold",size = 12),axis.text = element_text(face = "bold") )

## Attrition vs. Distance From Home 

Distance_Wise <- Job_detail %>%
  group_by(DistanceFromHome) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_Rate = round(Attritions/total_employee*100,2))

glimpse(Distance_Wise)

ggplot(Distance_Wise , aes(x= DistanceFromHome,y=Attrition_Rate,size = Attrition_Rate,color = DistanceFromHome)) +
  geom_point()  +
  scale_colour_viridis_b() +
  labs (title = "Attrition By Distance From Home",x="Distance From Home",y="Attrition Rate",color = "Distance",size="Attrition Rate") +
  theme_light() +
  theme(plot.title = element_text(face = "bold",size = 16,color = "orange"),axis.title = element_text(face = "bold", size = 12))

I organized and analyzed the dataset to study the impact of business travel and distance from home on attrition rates.

>**For business travel, the results showed that employees who traveled more frequently for work had higher attrition rates. This may be due to the demands of fieldwork and spending extended periods away from home, which can affect work–life balance and increase stress levels.**

>**For distance from home, attrition did not follow a consistent pattern across individual distances. However, when the data was grouped into three ranges (0–10 km, 10–20 km, and 20–30 km), employees living farther away (the 10–30 km ranges) had the highest attrition rates. A possible reason is longer commuting time, which can lead to greater fatigue and reduced job satisfaction.**
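The grouping into distance ranges is not shown in the code above. A minimal sketch of how it could be done with `cut()` follows, using made-up rows. The handling of the boundary values is an assumption: here 10 falls in the first band and 20 in the second.

```r
library(dplyr)

# Toy data (made-up values); Attrition is assumed to be logical
toy <- tibble(
  DistanceFromHome = c(1, 4, 9, 10, 12, 18, 21, 25, 29),
  Attrition        = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

distance_bands <- toy %>%
  mutate(DistanceBand = cut(DistanceFromHome,
                            breaks = c(0, 10, 20, 30),
                            labels = c("0-10", "10-20", "20-30"),
                            include.lowest = TRUE)) %>%  # bands: [0,10], (10,20], (20,30]
  group_by(DistanceBand) %>%
  summarise(
    total_employee = n(),
    Attritions     = sum(Attrition),
    Attrition_Rate = round(Attritions / total_employee * 100, 2),
    .groups = "drop"
  )

distance_bands
```

Banding pools more employees into each group, which smooths out the noise seen when every individual distance value is plotted separately.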

## Attrition by Job Compensation

*In the third section, I used a pre-organized dataset (prepared in Excel) to analyze the effect of monthly income on attrition.*

In [None]:
## Attrition BY job compensation
Job_Comp <- read_csv("Job_income.csv")

glimpse(Job_Comp)

ggplot(Job_Comp,aes(x="",y=Attrition_Rate,fill = Monthly_Income)) +
  geom_col() +
  coord_polar("y") +
  geom_text(aes(label = paste0(Attrition_Rate, "%")),position = position_stack(vjust = 0.5) , fontface = "bold",color = "white",size = 4) +
  labs(title = "Attrition By Income",x ="Income",y="Attrition Rate(%)",fill = "Monthly Income") +
  scale_fill_manual(values = c("#092c5c", "#ff3232", "#f5d130"))+
  theme_light() +
  theme(plot.title = element_text(face= "bold",size =16),axis.title = element_text(face= "bold",size =12),axis.text = element_blank(),panel.grid = element_blank())


>**The results showed that employees earning between $1,000 and $2,500 accounted for more than 50% of all attrition cases. A possible explanation is that lower salaries may lead to reduced job satisfaction, higher financial pressure, and a stronger motivation to seek better-paying opportunities elsewhere.**
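A share of all attrition cases is a different quantity from the attrition rate within an income band. The sketch below shows the distinction with made-up counts. The layout of `Job_income.csv` is not shown in this notebook, so the columns `total_employee` and `attritions` and the derived `Attrition_Share` are assumptions for illustration.

```r
library(dplyr)

# Toy counts per income band (made-up numbers and band labels)
toy <- tibble(
  Monthly_Income = c("1000-2500", "2500-5000", "5000+"),
  total_employee = c(200, 400, 400),
  attritions     = c(60, 40, 20)
)

income_summary <- toy %>%
  mutate(
    Attrition_Rate  = attritions / total_employee * 100,   # % of each band that left
    Attrition_Share = attritions / sum(attritions) * 100   # % of all leavers in each band
  )

income_summary
```

Shares always sum to 100, which suits a pie chart; rates are computed per band and generally do not sum to 100.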

## Attrition Trends by Satisfaction & Work–Life Balance

*In the last section, I transformed, organized, and analyzed the dataset to study the relationship between Job Satisfaction Score (JSS), work–life balance, and attrition.*

In [None]:
# satisfaction score vs Attrition 
Sts <- read_csv("Sat_score.csv")

glimpse(Sts)

sat_score <- Sts %>%
  distinct()

glimpse(sat_score)

## Job satisfaction vs Attrition

Jsat_Wise <- sat_score %>%
  group_by(JobSatisfaction) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_Rate = round(Attritions/total_employee*100,2))

glimpse(Jsat_Wise)

ggplot(Jsat_Wise,aes(x="",y = Attrition_Rate, fill = JobSatisfaction))+
  geom_col(width = .25) + 
  scale_fill_gradient2() +
  coord_flip() +
  geom_text(aes(label = paste0(Attrition_Rate,"%")),position = position_stack(vjust = 0.5) , fontface = "bold",color = "white",size = 4) +
  labs(title = "Attrition By Job Satisfaction" ,x = "Job Satisfaction",y = "Attrition Rate (%)",fill = "Job Satisfaction Score") +
  theme(plot.title = element_text(face= "bold",size =16,color = "#ffa600"),axis.title = element_text(face= "bold",size =12))

## Attrition vs work life 

Wlb_Wise <- sat_score %>%
  group_by(WorkLifeBalance) %>%
  summarise(total_employee = n(),Attritions = sum(Attrition == TRUE),Attrition_Rate = round(Attritions/total_employee*100,2))

glimpse(Wlb_Wise)

ggplot(Wlb_Wise,aes(x=WorkLifeBalance ,y=Attrition_Rate)) +
  geom_segment(aes(xend=WorkLifeBalance,y=0,yend=Attrition_Rate),color ="#1982c4",size=1.5) + 
  geom_point(size =6 , color = "yellow") +
  geom_text(aes(WorkLifeBalance,y=Attrition_Rate,label = paste0(round(Attrition_Rate,1),"%")),vjust = -1.0,fontface = "bold",size =3 , color = "black") +
  labs(title = "Attrition by Work-Life Balance",x="Work Life Balance Score", y = "Attrition Rate (%)") +
  theme_light() +
  theme(plot.title = element_text(face= "bold",size =16,color = "#ffa600"),axis.title = element_text(face= "bold",size =12))


>**The findings showed that employees with higher job satisfaction scores had lower attrition rates, while those with lower scores were more likely to leave.**

>**A similar trend was observed for work–life balance, where higher scores correlated with better retention. This suggests that when employees feel satisfied in their roles and are able to maintain a healthy balance between work and personal life, they are more likely to remain with the organization.**
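The direction of such a trend can be summarised with a rank correlation between the score and the attrition rate. This is only a sketch on made-up rates, not the notebook's results, using base R's `cor()` with the Spearman method.

```r
# Toy per-score attrition rates (made-up values, not the notebook's results)
toy <- data.frame(
  JobSatisfaction = c(1, 2, 3, 4),
  Attrition_Rate  = c(25, 15, 18, 10)
)

# Spearman rank correlation: negative values mean higher scores go with lower attrition
rho <- cor(toy$JobSatisfaction, toy$Attrition_Rate, method = "spearman")
rho
```

With only a handful of score levels, the coefficient is a descriptive summary of direction rather than evidence of a statistically reliable relationship.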

## Turning Insights into Retention: Conclusion & Recommendations

>**Conclusion & Key Findings**

This IBM attrition analysis explored personal, demographic, and job-related factors influencing employee turnover.
Key findings include:

**Personal Factors:** Younger employees and single employees showed higher attrition rates compared to older and married/divorced employees.

**Department & Role:** Sales and certain lower-level roles (e.g., Sales Representative, Laboratory Technician) had the highest attrition rates, while Research & Development and senior roles had the lowest.

**Job Factors:** Frequent business travel and long commuting distances (10–30 km) were linked to higher attrition.

**Compensation:** Employees earning between $1,000 and $2,500 accounted for over 50% of total attrition.

**Satisfaction & Balance:** Higher Job Satisfaction Scores (JSS) and better work–life balance scores were associated with lower turnover.

>**Suggestions**

- **Targeted Retention for High-Risk Groups: Offer clear career development paths and mentorship programs for younger employees and those in entry-level roles.**

- **Reduce Workload Pressure in Sales: Provide stress management support, training, and realistic performance targets.**

- **Improve Compensation Competitiveness: Review pay scales for lower-income brackets to reduce financial-driven turnover.**

- **Enhance Work–Life Balance: Offer flexible working arrangements or remote work options for employees with long commutes or heavy travel schedules.**

- **Boost Job Satisfaction: Strengthen recognition programs, feedback systems, and opportunities for skill growth.**

**By addressing these factors, IBM can strengthen employee engagement, improve retention rates, and reduce the costs associated with high turnover.**