# Decision Tree Classifier with Employee Attrition Dataset

In this notebook, we will build a decision tree classifier using the scikit-learn library. We will use a hypothetical employee attrition dataset for this example.

## Import Libraries
First, let's import the necessary libraries.

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

## Load and Explore the Dataset
Next, we will load the employee attrition dataset ('employee_attrition_small.csv') and explore its contents.

In [8]:
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv('employee_attrition_small.csv')
df.head()
le = LabelEncoder()
df['Attrition'] = le.fit_transform(df['Attrition'])  

categorical_cols = ['BusinessTravel', 'Department', 'EducationField', 
                    'Gender', 'JobRole', 'MaritalStatus', 'OverTime']

for col in categorical_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,EducationField,Gender,HourlyRate,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,TotalWorkingYears
0,41,1,2,1102,2,1,0,94,7,4,2,5993,19479,8,1,8
1,49,0,1,279,1,1,1,61,6,2,1,5130,24907,1,0,10
2,37,1,2,1373,1,4,1,92,2,3,2,2090,2396,6,1,7
3,33,0,1,1392,1,1,0,56,6,3,1,2909,23159,1,1,8
4,27,0,2,591,1,3,1,40,2,2,1,3468,16632,9,0,6


## Preprocess the Data
We need to preprocess the data, including handling categorical variables and missing values.

In [9]:
df = df.dropna()

## Split the Dataset
We will split the dataset into training and testing sets.

In [11]:
# drop the outcome column

# Split the dataset into training and testing sets
features = df.drop('Attrition', axis=1)
target = df['Attrition']
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)


## Train and evaluate the Decision Tree Model
## Please not that the maximum depth shouldn't be greater than 3

In [12]:
# Create and train the decision tree classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Calculate the accuracy of the decision tree model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.76
