In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
import numpy as np

In [3]:
df = pd.read_csv(r"C:\Users\LENOVO\Downloads\Employee.csv")

In [5]:
df.head()

,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1


**What is the distribution of educational qualifications among employees?**

In [26]:
df['Education'].value_counts().reset_index(name = 'Number of Employees')

,Education,Number of Employees
0,Bachelors,3601
1,Masters,873
2,PHD,179


- Bachelor’s degree holders make up the majority of the workforce with 3,601 employees (a dominant proportion).

- Master’s degree holders account for 873 employees, showing a smaller but significant representation.

- PhD holders are the least represented, with only 179 employees.
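
The raw counts are easier to compare as shares of the total. A minimal sketch, rebuilding the counts from the output above instead of reloading the CSV (on the full DataFrame, `df['Education'].value_counts(normalize=True)` should give the same proportions):

```python
import pandas as pd

# Counts copied from the value_counts() output above.
education_counts = pd.Series(
    {"Bachelors": 3601, "Masters": 873, "PHD": 179},
    name="Number of Employees",
)

# Share of each qualification as a percentage of all employees.
education_share = (education_counts / education_counts.sum() * 100).round(1)
print(education_share)
```

By these counts, roughly 77% of employees hold a Bachelor's degree, about 19% a Master's, and just under 4% a PhD.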

**Insight:**

*The company’s workforce is heavily concentrated around employees with undergraduate degrees, suggesting it may hire primarily for roles that don’t require advanced academic qualifications. This could also highlight an opportunity for:*

- Upskilling programs for employees with Bachelor’s degrees who want to pursue higher education.

- Attracting more highly specialized talent (e.g., PhD holders) for research-intensive or leadership roles.

**How does the length of service (Joining Year) vary across different cities?**

In [18]:
from datetime import datetime
current_year = datetime.now().year
df['Length of Service'] = current_year - df['JoiningYear']

In [24]:
df.groupby('City')['Length of Service'].mean().reset_index(name = 'Average Length of Service')

,City,Average Length of Service
0,Bangalore,10.140485
1,New Delhi,9.47796
2,Pune,9.998423


- Bangalore has the highest average length of service at 10.14 years.

- Pune comes next with 10.00 years.

- New Delhi has the lowest at 9.48 years.

**Key Insight:**

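*Employees in Bangalore joined earliest on average, with Pune close behind and New Delhi slightly more recent. Because Length of Service is computed as the current year minus JoiningYear for every row, including employees who have since left, it measures time since joining rather than actual tenure. The differences are small (under 0.7 years), but HR might still explore what contributes to the longer-standing workforce in Bangalore and Pune and whether it can improve tenure in New Delhi.*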
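
Because `datetime.now().year` moves forward each year, the averages above change whenever the notebook is rerun. One way to keep the figures reproducible is to pin a reference year. The sketch below assumes a hypothetical `REFERENCE_YEAR` constant and uses only the five rows shown by `df.head()` as a stand-in for the full data:

```python
import pandas as pd

# Hypothetical snapshot year; replace with the year the data was extracted.
REFERENCE_YEAR = 2025

# The five rows shown by df.head(), used here as a stand-in for the full data.
sample = pd.DataFrame({
    "City": ["Bangalore", "Pune", "New Delhi", "Bangalore", "Pune"],
    "JoiningYear": [2017, 2013, 2014, 2016, 2017],
})

# Years since joining, measured against the fixed reference year.
sample["Length of Service"] = REFERENCE_YEAR - sample["JoiningYear"]
avg_service = (
    sample.groupby("City")["Length of Service"]
    .mean()
    .reset_index(name="Average Length of Service")
)
print(avg_service)
```

The ranking between cities does not depend on the reference year, since changing it shifts every city's average by the same amount.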

**Is there a correlation between Payment Tier and Experience in Current Domain?**

In [28]:
from scipy.stats import spearmanr 
x = df['PaymentTier']
y = df['ExperienceInCurrentDomain']
corr, p_value = spearmanr(x,y)
print(f'Spearman correlation: {corr}, P-Value: {p_value}')

Spearman correlation: 0.015191447440754312, P-Value: 0.3001862448012146


**Key Takeaway**

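- The Spearman correlation is close to zero (ρ ≈ 0.015) and not statistically significant (p ≈ 0.30), so people with more experience are not consistently placed in higher payment tiers.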

- Other factors (like education) could be influencing payment tier more strongly.
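
For readers unfamiliar with Spearman's coefficient, it can be understood as the Pearson correlation of the ranked values. The sketch below illustrates that on toy data (not taken from the employee file) and then applies a conventional 5% threshold to the p-value reported above. The helper name `spearman_via_ranks` and the `ALPHA` value are assumptions for illustration:

```python
import pandas as pd

def spearman_via_ranks(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    # rank() assigns average ranks to ties, the usual convention for Spearman.
    return x.rank().corr(y.rank())

# Toy data (not from the employee file): y rises whenever x rises.
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([10, 20, 25, 40, 100])
rho_monotonic = spearman_via_ranks(x, y)

# Values reported by spearmanr in the cell above.
reported_rho, reported_p = 0.015191447440754312, 0.3001862448012146
ALPHA = 0.05  # conventional threshold, an assumption rather than a rule
is_significant = reported_p < ALPHA
print(rho_monotonic, is_significant)
```

A perfectly monotonic relationship gives a coefficient of 1 even when it is not linear, which is why Spearman suits an ordinal variable such as `PaymentTier`.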

**What is the gender distribution within the workforce?**

In [37]:
df['Gender'].value_counts()

Gender
Male      2778
Female    1875
Name: count, dtype: int64

- Male employees make up the majority of the workforce with 2,778 employees (≈60%).

- Female employees account for 1,875 employees (≈40%).

**Insight:**

*The company’s workforce shows a 60:40 male-to-female ratio, which reflects some level of gender diversity but also highlights a gender gap. This could indicate:*

- An opportunity to promote gender balance through targeted recruitment, especially in roles or departments where women are underrepresented (the dataset has no department column, so this cannot be checked here).

- A need to explore policies for inclusion and equity to encourage retention and growth for female employees.

**Are there any patterns in leave-taking behavior among employees?**

In [44]:
leave_counts = df['LeaveOrNot'].value_counts()
leave_percentage = df['LeaveOrNot'].value_counts(normalize=True) * 100
leave_percentage

LeaveOrNot
0    65.613583
1    34.386417
Name: proportion, dtype: float64

In [50]:
education_leave = df.groupby('Education')['LeaveOrNot'].value_counts(normalize=True).unstack() * 100
education_leave

Education,LeaveOrNot = 0,LeaveOrNot = 1
Bachelors,68.647598,31.352402
Masters,51.202749,48.797251
PHD,74.860335,25.139665


In [52]:
gender_leave = df.groupby('Gender')['LeaveOrNot'].value_counts(normalize=True).unstack() * 100
gender_leave 

Gender,LeaveOrNot = 0,LeaveOrNot = 1
Female,52.853333,47.146667
Male,74.226062,25.773938


In [58]:
payment_leave = df.groupby('PaymentTier')['LeaveOrNot'].value_counts(normalize=True).unstack() * 100
payment_leave

PaymentTier,LeaveOrNot = 0,LeaveOrNot = 1
1,63.374486,36.625514
2,40.087146,59.912854
3,72.479954,27.520046


In [60]:
city_leave = df.groupby('City')['LeaveOrNot'].value_counts(normalize=True).unstack() * 100
city_leave

City,LeaveOrNot = 0,LeaveOrNot = 1
Bangalore,73.294434,26.705566
New Delhi,68.366465,31.633535
Pune,49.605678,50.394322
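
The four breakdowns above repeat the same pattern, so they could be wrapped in a small helper. This is a sketch only: `leave_rate_by` is a hypothetical name, and the sample frame holds just the five rows shown by `df.head()`, so its percentages will not match the full-data tables.

```python
import pandas as pd

def leave_rate_by(data: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of employees with LeaveOrNot == 1 within each group."""
    # LeaveOrNot is coded 0/1, so the group mean is the share of leavers.
    return data.groupby(column)["LeaveOrNot"].mean() * 100

# The five rows shown by df.head(), as a stand-in for the full DataFrame.
sample = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "Bachelors", "Masters", "Masters"],
    "City": ["Bangalore", "Pune", "New Delhi", "Bangalore", "Pune"],
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "PaymentTier": [3, 1, 3, 3, 3],
    "LeaveOrNot": [0, 1, 0, 1, 1],
})

# One leave-rate table per grouping column.
rates = {
    col: leave_rate_by(sample, col)
    for col in ["Education", "Gender", "PaymentTier", "City"]
}
for col, rate in rates.items():
    print(rate.round(1), "\n")
```

Taking the mean of the 0/1 column returns only the leave rate, which is the figure the commentary relies on; the stay rate is simply 100 minus that value.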


In [64]:
corr = df['ExperienceInCurrentDomain'].corr(df['LeaveOrNot'])
print(f"Correlation between Experience and Leave: {corr:.4f}")

Correlation between Experience and Leave: -0.0305


**Overall Leave Rate**

- 65.6% of employees stay, while 34.4% leave. This means roughly 1 in 3 employees leave, which is a substantial proportion for HR to address.

**Insight:**

- Master’s degree holders have the highest leave rate (48.8%), suggesting they might be more mobile in the job market or seeking roles that match their advanced qualifications.

- PhD holders show the highest retention (74.9%), likely because of specialized roles and long-term career commitments.

- Female employees are almost twice as likely to leave (47.1%) as male employees (25.8%). This could signal potential issues with workplace inclusion, flexibility, or advancement opportunities for women, which HR could investigate.

- Employees in Payment Tier 2 have the highest attrition (59.9%), possibly because they’re in mid-level roles and are seeking higher compensation or growth opportunities.

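- Employees in Tier 3 have the strongest retention among the payment tiers (72.5% stay). If Tier 3 is the highest salary band, this is likely due to job satisfaction and seniority, but the tier coding (whether 1 or 3 is the highest band) should be confirmed against the dataset documentation.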

- Pune has the highest leave rate (50.4%), meaning roughly 1 in 2 employees in Pune leave.

- Bangalore employees are the most stable (26.7% leave rate).

*This suggests location-specific factors (e.g., job market competitiveness, cost of living) may influence attrition.*

- Correlation coefficient between ExperienceInCurrentDomain and LeaveOrNot = -0.0305

*This very weak negative correlation suggests experience in the current domain does not strongly influence whether employees leave or stay. Other factors (education, gender, location) are likely more important.*
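
The "almost twice as likely" comparison between female and male employees can be checked with a relative risk and a chi-square statistic. The sketch below is not part of the original analysis. It recovers the counts of leavers from the group sizes and percentages reported above and computes both measures in plain Python, without a continuity correction:

```python
# Group sizes and leave rates copied from the outputs above.
n_female, n_male = 1875, 2778
female_leave_pct, male_leave_pct = 47.146667, 25.773938

# Recover the underlying counts of leavers from the percentages.
female_left = round(n_female * female_leave_pct / 100)
male_left = round(n_male * male_leave_pct / 100)

# Relative risk: how many times more likely women are to leave than men.
relative_risk = (female_left / n_female) / (male_left / n_male)

# Pearson chi-square statistic for the 2x2 table (no continuity correction).
observed = [
    [female_left, n_female - female_left],
    [male_left, n_male - male_left],
]
total = n_female + n_male
row_totals = [n_female, n_male]
col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)
print(f"Relative risk: {relative_risk:.2f}, chi-square: {chi2:.1f}")
```

The relative risk comes out at about 1.83. A chi-square statistic above 3.84 (the 5% critical value with one degree of freedom) would indicate that the gap is unlikely to be due to chance alone, although it says nothing about the cause.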

**Key Takeaways**

- Female employees and those in Payment Tier 2 are at the highest risk of leaving.

- Pune is a hotspot for attrition compared to Bangalore and New Delhi.

- Master’s degree holders are more likely to leave than employees with other qualifications.



**Analysis Done by Ediomo Etesin**

*ediomoetesin40@gmail.com*