# Data Preparation for Predicting Employee Attrition 
*By Bhavya Bhargava*<br>

Predicting employee attrition matters to organizations that want to keep a strong and committed workforce. With data-driven insights, HR analysts can identify the factors that influence employee departures and act early to improve retention. Preparing the IBM employee attrition dataset is a critical step in this process: it ensures the data is clean, structured, and ready for analysis. Proper data preparation, such as handling missing values, encoding categorical variables, and normalizing numerical features, can improve model accuracy and reliability. With a refined dataset, we can build predictive models that help HR teams make informed decisions, reduce turnover costs, and create a more engaged work environment.
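
To make the three preparation steps named above concrete, here is a minimal sketch on a tiny made-up table. It is not the notebook's actual pipeline: the toy values, the `prepare` helper, the choice of median/mode imputation, and the z-score scaling are all assumptions for illustration. Pandas category codes stand in for an encoder such as `LabelEncoder`.

```python
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
toy = pd.DataFrame({
    "Age": [41, 49, None, 33],
    "MonthlyIncome": [5993, 5130, 2090, 2909],
    "Department": ["Sales", "Research & Development", "Sales", None],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

def prepare(frame):
    """Hypothetical helper: impute, encode, then scale a copy of `frame`."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.select_dtypes(exclude="number").columns

    # 1. Handle missing values: median for numbers, most frequent value for categories.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # 2. Encode categorical variables as integer codes (categories sorted alphabetically).
    for col in cat_cols:
        out[col] = out[col].astype("category").cat.codes

    # 3. Normalize numerical features to zero mean and unit variance (z-score).
    out[num_cols] = (out[num_cols] - out[num_cols].mean()) / out[num_cols].std(ddof=0)
    return out

prepared = prepare(toy)
print(prepared)
```

Whether each step is needed depends on what inspection of the real dataset shows; for example, imputation is only relevant if missing values are actually present.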

### About the Dataset:
#### Dataset Source:
The IBM HR Employee Attrition dataset is publicly available and is often used for predictive analytics and HR analytics. It contains various employee-related attributes, helping organizations understand factors influencing attrition.

This dataset was originally developed for IBM Watson Studio by Saishruthi Swaminathan and Rich Hagarty, and has since been made openly available on Kaggle for anyone to explore and draw insights from.

_Sources:_ <br>
[Visit IBM Developer Page](https://developer.ibm.com/patterns/data-science-life-cycle-in-action-to-solve-employee-attrition-problem/)<br>
[Visit Kaggle Page](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset/data)

_*Let's start with the preparation...*_

**Step 1**: Setting up the environment for processing and displaying the data


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
%matplotlib inline

# Set display options for better notebook readability
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 1000)

# Suppress all warnings to keep notebook output tidy
# (note: this also hides deprecation notices)
import warnings
warnings.filterwarnings('ignore')
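
The display options above apply for the whole session. If a wide view is needed only for one output, a scoped alternative is `pd.option_context`, which restores the previous settings on exit. The sketch below uses a hypothetical 30-column frame named `wide` purely to show the effect.

```python
import pandas as pd

# Hypothetical wide frame, used only to demonstrate the display setting.
wide = pd.DataFrame({f"col_{i}": [i] for i in range(30)})

before = pd.get_option("display.max_columns")

# Inside the block every column is rendered; the settings revert afterwards.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    inside = pd.get_option("display.max_columns")
    rendered = repr(wide)

after = pd.get_option("display.max_columns")
print(rendered)
```

This keeps the global configuration untouched, which can be convenient when the same session also prints many small tables.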

**Step 2**: Loading the dataset for processing

In [2]:
print("Loading the dataset...")
# Read the CSV into a DataFrame; the relative path expects the file in the working directory
df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

Loading the dataset...
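
Right after loading, it can help to confirm that the file was read as expected before any cleaning. The sketch below is one possible check, not part of the original notebook: `quick_check` is a hypothetical helper, and the small `sample` frame with its values is invented so the example runs on its own. In the notebook it would be called on `df` instead.

```python
import pandas as pd

def quick_check(frame, target="Attrition"):
    """Hypothetical helper: collect basic facts worth confirming after a load."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "missing_cells": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        "target_counts": frame[target].value_counts().to_dict(),
    }

# Invented sample standing in for the loaded DataFrame.
sample = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Department": ["Sales", "Research & Development", "Research & Development", "Human Resources"],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

report = quick_check(sample)
print(report)
```

The target counts are worth a look early on, since a strongly imbalanced `Attrition` column would affect how later models should be evaluated.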
