# HR Analytics - Employee Performance & Retention Analysis

## 1. Problem Statement

The objective of this project is to analyse HR data to assess employee performance, identify trends influencing retention, and provide actionable recommendations to improve employee productivity and reduce turnover.

Specifically, this project aims to:
- Identify performance patterns based on factors such as age, education, department, and training.
- Examine retention trends based on employee characteristics including tenure, performance ratings, and awards.
- Analyse relationships between key employee metrics to recommend strategies for improving employee engagement and retention.

## 2. Importing Libraries & Loading Dataset

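```python
import pandas as pd

# Load the employee records (adjust the path to where the CSV is stored locally)
df = pd.read_csv(r'D:\Projects\employees_dataset.csv')

# Column names, non-null counts and dtypes
df.info()
```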

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17417 entries, 0 to 17416
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   employee_id            17417 non-null  int64  
 1   department             17417 non-null  object 
 2   region                 17417 non-null  object 
 3   education              16646 non-null  object 
 4   gender                 17417 non-null  object 
 5   recruitment_channel    17417 non-null  object 
 6   no_of_trainings        17417 non-null  int64  
 7   age                    17417 non-null  int64  
 8   previous_year_rating   16054 non-null  float64
 9   length_of_service      17417 non-null  int64  
 10  KPIs_met_more_than_80  17417 non-null  int64  
 11  awards_won             17417 non-null  int64  
 12  avg_training_score     17417 non-null  int64  
dtypes: float64(1), int64(7), object(5)
memory usage: 1.7+ MB
```


```python
df.head()
```

|   | employee_id | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | KPIs_met_more_than_80 | awards_won | avg_training_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8724 | Technology | region_26 | Bachelors | m | sourcing | 1 | 24 | NaN | 1 | 1 | 0 | 77 |
| 1 | 74430 | HR | region_4 | Bachelors | f | other | 1 | 31 | 3.0 | 5 | 0 | 0 | 51 |
| 2 | 72255 | Sales & Marketing | region_13 | Bachelors | m | other | 1 | 31 | 1.0 | 4 | 0 | 0 | 47 |
| 3 | 38562 | Procurement | region_2 | Bachelors | f | other | 3 | 31 | 2.0 | 9 | 0 | 0 | 65 |
| 4 | 64486 | Finance | region_29 | Bachelors | m | sourcing | 1 | 30 | 4.0 | 7 | 0 | 0 | 61 |


```python
df.shape
```

```
(17417, 13)
```

## 3. Dataset Overview

**Source:**  
The dataset comes from the Internshala Trainings – HR Analytics Case Study and is provided as a CSV file.

**Dataset Shape:**  
- **Records:** 17,417 employees  
- **Features:** 13 columns  

**Column Summary:**
- `employee_id`: Unique employee identifier
- `department`: Employee’s department
- `region`: Employee location/region code
- `education`: Education qualification
- `gender`: Employee gender
- `recruitment_channel`: Hiring source
- `no_of_trainings`: Number of trainings completed in one year
- `age`: Employee age
- `previous_year_rating`: Last year's performance rating (1–5)
- `length_of_service`: Work tenure in years
- `KPIs_met_more_than_80`: Whether more than 80% of KPIs were met (0/1)
- `awards_won`: Whether an award was received (0/1)
- `avg_training_score`: Training score out of 100
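The column summary implies value constraints that can be checked in code before any analysis. The sketch below is a minimal, hypothetical check. The helper name `check_documented_ranges` is illustrative and not part of the original notebook, and the 0–100 bound for `avg_training_score` is assumed from "out of 100". To run standalone, it rebuilds a few columns of the five rows shown by `df.head()`. In the notebook you would pass the full `df` instead.

```python
import pandas as pd

# A few columns of the five rows shown by df.head(); NaN marks the missing rating.
sample = pd.DataFrame({
    "employee_id": [8724, 74430, 72255, 38562, 64486],
    "previous_year_rating": [float("nan"), 3.0, 1.0, 2.0, 4.0],
    "KPIs_met_more_than_80": [1, 0, 0, 0, 0],
    "awards_won": [0, 0, 0, 0, 0],
    "avg_training_score": [77, 51, 47, 65, 61],
})


def check_documented_ranges(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: count rows that violate the ranges in the column summary."""
    rating = frame["previous_year_rating"].dropna()  # missing ratings are not range violations
    return {
        "previous_year_rating": int((~rating.between(1, 5)).sum()),
        "KPIs_met_more_than_80": int((~frame["KPIs_met_more_than_80"].isin([0, 1])).sum()),
        "awards_won": int((~frame["awards_won"].isin([0, 1])).sum()),
        # Assumed 0-100 range, based on "Training score out of 100"
        "avg_training_score": int((~frame["avg_training_score"].between(0, 100)).sum()),
        "duplicate_employee_id": int(frame["employee_id"].duplicated().sum()),
    }


violations = check_documented_ranges(sample)
print(violations)
```

Returning counts rather than raising on the first problem lets you see every kind of violation in a single pass over the data.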


## 4. Data Quality & Preprocessing

### 4.1 Missing Values Check
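Here is a minimal sketch of the check. `isna().sum()` counts missing entries per column, and `isna().mean()` gives the share of rows affected. So that the snippet runs on its own, it uses a small hypothetical stand-in frame named `demo`. In the notebook you would substitute `df`. On the real data, the `df.info()` output already implies 771 missing `education` values (17,417 − 16,646) and 1,363 missing `previous_year_rating` values (17,417 − 16,054).

```python
import pandas as pd

# Hypothetical stand-in for df: one missing education value, two missing ratings.
demo = pd.DataFrame({
    "education": ["Bachelors", None, "Bachelors", "Bachelors"],
    "previous_year_rating": [float("nan"), 3.0, None, 4.0],
    "age": [24, 31, 31, 30],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "missing_count": demo.isna().sum(),
    "missing_pct": (demo.isna().mean() * 100).round(2),
})

# Keep only columns with gaps, largest first.
missing = missing[missing["missing_count"] > 0].sort_values("missing_count", ascending=False)
print(missing)
```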