# Decision Tree Classifier with Employee Attrition Dataset

In this notebook, we will build a decision tree classifier using the scikit-learn library. We will use a hypothetical employee attrition dataset for this example.

## Import Libraries
First, let's import the necessary libraries.

In [20]:
!pip install numpy pandas scikit-learn

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

## Load and Explore the Dataset
Next, we will load the employee attrition dataset ('employee_attrition_small.csv') and explore its contents.

In [7]:
df = pd.read_csv("employee_attrition_small.csv")
print(df)

      Age Attrition  ... OverTime  TotalWorkingYears
0      41       Yes  ...      Yes                  8
1      49        No  ...       No                 10
2      37       Yes  ...      Yes                  7
3      33        No  ...      Yes                  8
4      27        No  ...       No                  6
...   ...       ...  ...      ...                ...
1465   36        No  ...       No                 17
1466   39        No  ...       No                  9
1467   27        No  ...      Yes                  6
1468   49        No  ...       No                 17
1469   34        No  ...       No                  6

[1470 rows x 16 columns]
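
Because `print(df)` elides most of the 16 columns, a few summary calls can give a fuller picture before preprocessing. The sketch below is illustrative only: it rebuilds a tiny frame from the five head rows and four columns visible in the output above, and the names `sample`, `categorical_cols`, `missing_counts`, and `attrition_share` are assumptions, not part of the notebook. The same calls should apply unchanged to `df`.

```python
import pandas as pd

# Toy frame copied from the first five rows printed above
# (only the four columns that are visible in the truncated output).
sample = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
    "OverTime": ["Yes", "No", "Yes", "Yes", "No"],
    "TotalWorkingYears": [8, 10, 7, 8, 6],
})

# Non-numeric columns are the ones that will need encoding later.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Number of missing values in each column.
missing_counts = sample.isna().sum()

# Share of each class in the target column.
attrition_share = sample["Attrition"].value_counts(normalize=True)

print(categorical_cols)
print(missing_counts)
print(attrition_share)
```

On the full dataset, the class shares are worth checking early: if one class dominates, accuracy alone can look good even for a model that rarely predicts the minority class.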


## Preprocess the Data
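We need to preprocess the data before training. Here, `pd.get_dummies` one-hot encodes the categorical columns, and `drop_first=True` drops the first level of each one, so `Attrition` becomes the single Boolean column `Attrition_Yes`. This cell does not apply any missing-value handling.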

In [15]:
df_processed = pd.get_dummies(df, drop_first=True)
print(df_processed)

      Age  DailyRate  ...  MaritalStatus_Single  OverTime_Yes
0      41       1102  ...                  True          True
1      49        279  ...                 False         False
2      37       1373  ...                  True          True
3      33       1392  ...                 False          True
4      27        591  ...                 False         False
...   ...        ...  ...                   ...           ...
1465   36        884  ...                 False         False
1466   39        613  ...                 False         False
1467   27        155  ...                 False          True
1468   49       1023  ...                 False         False
1469   34        628  ...                 False         False

[1470 rows x 30 columns]
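
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a three-row toy frame taken from the head rows shown earlier. The names `sample` and `encoded` are assumptions for illustration; the notebook itself works on `df` and `df_processed`.

```python
import pandas as pd

# Three head rows from the dataset, restricted to three visible columns.
sample = pd.DataFrame({
    "Age": [41, 49, 37],
    "Attrition": ["Yes", "No", "Yes"],
    "OverTime": ["Yes", "No", "Yes"],
})

# Each two-level column ("No"/"Yes") yields one dummy column:
# the first level in sorted order ("No") is dropped, "Yes" is kept.
encoded = pd.get_dummies(sample, drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```

Numeric columns such as `Age` pass through untouched. Recent pandas versions emit Boolean dummy columns, as in the output above, while older versions emit 0/1 integers; scikit-learn's tree estimators accept either.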


## Split the Dataset
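We will separate the target column (`Attrition_Yes`) from the features, then hold out 20% of the rows as a test set. A fixed `random_state` makes the split reproducible.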

In [16]:
# Separate the features (X) from the outcome column (y)
X = df_processed.drop("Attrition_Yes", axis=1)
y = df_processed["Attrition_Yes"]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
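
The split above is purely random. When the target classes are imbalanced, one optional variation is to pass `stratify=y` so that both sets keep the same class proportions. The sketch below shows this on synthetic data; the 80/20 class mix and the placeholder feature are assumptions for illustration, not values from the attrition dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 80 negatives and 20 positives.
y = np.array([False] * 80 + [True] * 20)
# Placeholder single-feature matrix; its values do not matter here.
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the positive share identical in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train positives:", int(y_train.sum()), "of", len(y_train))
print("test positives:", int(y_test.sum()), "of", len(y_test))
```

Without stratification the minority share in a small test set can drift by chance, which makes accuracy figures harder to compare across runs.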

## Train and Evaluate the Decision Tree Model
Please note that the maximum depth should not be greater than 3.

In [22]:
# Create and train the decision tree classifier
model = DecisionTreeClassifier(
    criterion="gini",  # measure split quality with Gini impurity
    max_depth=3,  # cap the tree depth at 3, as noted above
    random_state=42
)
model.fit(X_train, y_train)
# Make predictions on the test set
pred = model.predict(X_test)
print(pred)
# Calculate the accuracy of the decision tree model
print(model.score(X_test, y_test))

[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False  True False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False  True False False False
 False False False False False False False False False False False False
 False False False False False False False False Fa