## Employee Attrition Prediction Using SVM

## Introduction

Employee attrition is a crucial factor in workforce management, and predicting it helps organizations take proactive steps to retain employees. This report explores an employee attrition dataset and applies a Support Vector Machine (SVM) classifier to predict employee gender from the remaining features, with `Attrition` itself used as one of the inputs.

## Libraries Used

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')
```

## Dataset Overview

The dataset contains 26 columns describing each employee. The categorical attributes, which are label-encoded during preprocessing, are:

- **Gender**: Male or Female (target variable for classification)

- **Marital_Status**: Marital status of the employee

- **Department**: Department in which the employee works

- **Job_Role**: Specific role of the employee

- **Overtime**: Whether the employee works overtime

- **Attrition**: Whether the employee left the company
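Because `Gender` is the classification target, it is worth checking how balanced its classes are before relying on accuracy alone. A minimal sketch, using a small hypothetical DataFrame (`toy`) in place of the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; only the target column is needed here
toy = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Male", "Female"]})

# Share of each class; a strongly skewed split would make plain accuracy misleading
class_share = toy["Gender"].value_counts(normalize=True)
print(class_share)
```

On the real data, `data['Gender'].value_counts(normalize=True)` gives the same summary.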

## Dataset Shape

```python
# Adjust this path to the CSV's location on your machine
data = pd.read_csv(r"C:\Users\Shaik Sakhlaih\Downloads\employee_attrition_dataset.csv")
print("Dataset Shape:", data.shape)
```

```
Dataset Shape: (1000, 26)
```


## Checking for Null Values


```python
print("Missing Values:")
print(data.isnull().sum())
```

```
Missing Values:
Employee_ID                      0
Age                              0
Gender                           0
Marital_Status                   0
Department                       0
Job_Role                         0
Job_Level                        0
Monthly_Income                   0
Hourly_Rate                      0
Years_at_Company                 0
Years_in_Current_Role            0
Years_Since_Last_Promotion       0
Work_Life_Balance                0
Job_Satisfaction                 0
Performance_Rating               0
Training_Hours_Last_Year         0
Overtime                         0
Project_Count                    0
Average_Hours_Worked_Per_Week    0
Absenteeism                      0
Work_Environment_Satisfaction    0
Relationship_with_Manager        0
Job_Involvement                  0
Distance_From_Home               0
Number_of_Companies_Worked       0
Attrition                        0
dtype: int64
```


- The dataset was checked for missing values using `.isnull().sum()`. None of the 26 columns contain missing values, so no imputation step is needed.

## Data Preprocessing

Scikit-learn estimators such as `SVC` require numerical input, so the six categorical columns were converted to integer codes using `LabelEncoder` from `sklearn.preprocessing`:

```python
from sklearn.preprocessing import LabelEncoder

categorical_cols = ['Gender', 'Marital_Status', 'Department',
                    'Job_Role', 'Overtime', 'Attrition']

# Keep one fitted encoder per column so each mapping can be inverted later
# (reusing a single LabelEncoder overwrites its classes_ on every fit)
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])
```
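`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the category values. For a nominal feature such as `Department`, this implies an ordering the categories do not actually have, and a linear SVM will treat the codes as magnitudes. The sketch below, on hypothetical department names, shows the mapping and a one-hot alternative via `pandas.get_dummies`; it illustrates the trade-off and is not the encoding used in this report.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical department values for illustration
departments = pd.Series(["Sales", "HR", "R&D", "Sales"], name="Department")

le = LabelEncoder()
codes = le.fit_transform(departments)
print(le.classes_.tolist())  # ['HR', 'R&D', 'Sales'] (sorted order)
print(codes.tolist())        # [2, 0, 1, 2]

# One-hot alternative: one 0/1 column per category, no implied ordering
one_hot = pd.get_dummies(departments, prefix="Department", dtype=int)
print(one_hot.columns.tolist())  # ['Department_HR', 'Department_R&D', 'Department_Sales']
```

For a binary column such as `Overtime` or the `Gender` target, label encoding and one-hot encoding carry the same information, so the choice mainly matters for multi-category features like `Department` and `Job_Role`.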

## Model Building

### Splitting the Dataset

The dataset was split into training (80%) and testing (20%) sets using `train_test_split` from `sklearn.model_selection`, with `Gender` as the target and all remaining columns as features:

```python
from sklearn.model_selection import train_test_split

# Features: every column except the target; target: Gender
x = data.drop(['Gender'], axis=1)
y = data['Gender']

# 80/20 split with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

print("x_train:", x_train.shape)
print("x_test:", x_test.shape)
print("y_train:", y_train.shape)
print("y_test:", y_test.shape)
```

```
x_train: (800, 25)
x_test: (200, 25)
y_train: (800,)
y_test: (200,)
```
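The split above is purely random. Passing `stratify=y` to `train_test_split` keeps the target's class proportions the same in both sets, which matters when one class is less common. A minimal sketch on hypothetical toy data with a 70/30 class ratio; it does not change the report's results:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 70 of class 0 and 30 of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Both splits keep the 30% share of class 1
print(y_tr.mean(), y_te.mean())  # 0.3 0.3
```

Separately, `x` above still contains `Employee_ID`. If that column is an arbitrary identifier, dropping it together with the target (`data.drop(['Gender', 'Employee_ID'], axis=1)`) would keep it from acting as a spurious feature.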


### Feature Scaling

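`StandardScaler` from `sklearn.preprocessing` was used to standardize each feature to zero mean and unit variance. The scaler is fit on the training set only and then applied to the test set, so no statistics from the test data leak into training. Scaling matters for SVMs because the margin is defined by distances in feature space, so features with large ranges (such as `Monthly_Income`) would otherwise dominate.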

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn mean and std from the training data only, then apply the same transform to both sets
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
```
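`StandardScaler` computes, per feature, z = (x − μ) / σ, where μ and σ are the mean and population standard deviation of the training data. The sketch below, on a hypothetical 3×2 array, checks that the scaler's output matches that formula computed by hand with NumPy, and that the test set is transformed with the training statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test data (rows = samples, columns = features)
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)
Z_test = scaler.transform(X_test)  # reuses the training mean and std

# Same result by hand: StandardScaler uses the population std (ddof=0)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
print(np.allclose(Z_train, (X_train - mu) / sigma))  # True
print(Z_test)
```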

## Training the SVM Model

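The `SVC` classifier from `sklearn.svm` was used with a **linear kernel**, which separates the two classes with a hyperplane in the scaled feature space. Note that `random_state` only affects `SVC` when `probability=True`, so it has no effect in this configuration: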

```python
from sklearn.svm import SVC

# Linear-kernel support vector classifier
model = SVC(kernel='linear', random_state=1)
model.fit(x_train, y_train)
```
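With a linear kernel, `SVC` learns a weight vector w and bias b and predicts the class from the sign of w·x + b; the fitted values are exposed as `coef_` and `intercept_`. The sketch below bundles scaling and the classifier in a scikit-learn `Pipeline`, so the scaler is re-fit inside each cross-validation fold, and checks that `decision_function` equals the linear formula. It uses synthetic data from `make_classification` as a stand-in for the employee features; it is not the report's pipeline and produces no results about this dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 200 samples, 5 features, binary target
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Scaling inside the pipeline: each CV fold fits the scaler on its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean())

# Fit once on all data to inspect the linear decision function w·x + b
pipe.fit(X, y)
svc = pipe.named_steps["svc"]
X_scaled = pipe.named_steps["standardscaler"].transform(X)
manual = X_scaled @ svc.coef_.ravel() + svc.intercept_[0]
print(np.allclose(manual, pipe.decision_function(X)))  # True
```

Cross-validated accuracy gives a less split-dependent estimate than a single 80/20 hold-out, at the cost of fitting the model several times.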

## Model Evaluation

After training the model, predictions were made using `model.predict(x_test)`, and the accuracy was calculated using `accuracy_score` from `sklearn.metrics`: