# Decision Tree Classifier with Employee Attrition Dataset

In this notebook, we will build a decision tree classifier using the scikit-learn library. We will use a hypothetical employee attrition dataset for this example.

## Import Libraries
First, let's import the necessary libraries.

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


## Load and Explore the Dataset
Next, we will load the employee attrition dataset ('employee_attrition_small.csv') and explore its contents.

In [5]:
import pandas as pd
df = pd.read_csv('employee_attrition_small.csv')

## Preprocess the Data
We need to preprocess the data, including handling categorical variables and missing values.

In [None]:
from sklearn.calibration import LabelEncoder
print(df.isnull().sum())
X = df.drop(columns=['Attrition'], axis=1)
y = df['Attrition']

le = LabelEncoder()
X['BusinessTravel'] = le.fit_transform(df['BusinessTravel'])
X['Department'] = le.fit_transform(df['Department'])
X['EducationField'] = le.fit_transform(df['EducationField'])
X['JobRole'] = le.fit_transform(df['JobRole'])
X['MaritalStatus'] = le.fit_transform(df['MaritalStatus'])
X['Gender'] = le.fit_transform(df['Gender'])
X['OverTime'] = le.fit_transform(df['OverTime'])

#Age,Attrition,BusinessTravel,DailyRate,Department,EducationField,Gender,HourlyRate,JobRole,
#JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,TotalWorkingYears
print(X)


Age                   0
Attrition             0
BusinessTravel        0
DailyRate             0
Department            0
EducationField        0
Gender                0
HourlyRate            0
JobRole               0
JobSatisfaction       0
MaritalStatus         0
MonthlyIncome         0
MonthlyRate           0
NumCompaniesWorked    0
OverTime              0
TotalWorkingYears     0
dtype: int64
      Age  BusinessTravel  DailyRate  Department  EducationField  Gender  \
0      41               2       1102           2               1       0   
1      49               1        279           1               1       1   
2      37               2       1373           1               4       1   
3      33               1       1392           1               1       0   
4      27               2        591           1               3       1   
...   ...             ...        ...         ...             ...     ...   
1465   36               1        884           1               3       

## Split the Dataset
We will split the dataset into training and testing sets.

In [8]:
# drop the outcome column

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## Train and evaluate the Decision Tree Model
## Please not that the maximum depth shouldn't be greater than 3

In [9]:
# Create and train the decision tree classifier
dt_classifier = DecisionTreeClassifier(max_depth=3,max_leaf_nodes=10,random_state=42)
dt_classifier.fit(X_train, y_train)


# Make predictions on the test set
y_pred_dt = dt_classifier.predict(X)

# Calculate the accuracy of the decision tree model
accuracy_dt = accuracy_score(y, y_pred_dt)
print(f'Decision Tree Accuracy: {accuracy_dt:.2f}')

Decision Tree Accuracy: 0.86
