# Decision Tree Classifier with Employee Attrition Dataset

In this notebook, we will build a decision tree classifier using the scikit-learn library. We will use a hypothetical employee attrition dataset for this example.

## Import Libraries
First, let's import the necessary libraries.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score


## Load and Explore the Dataset
Next, we will load the employee attrition dataset ('employee_attrition_small.csv') and explore its contents.

In [None]:
df = pd.read_csv('employee_attrition_small.csv')
df.head()
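
Beyond `df.head()`, a few more calls give a quick picture of the data before preprocessing. The sketch below runs on a small made-up DataFrame rather than the real CSV; the rows and the `Age` column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the attrition data; all values are invented
demo = pd.DataFrame({
    "Age": [34, 45, 29, 41, 38, 52],
    "Department": ["Sales", "Research & Development", "Sales",
                   "Human Resources", "Sales", "Research & Development"],
    "OverTime": ["Yes", "No", "Yes", "No", "No", "Yes"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Number of rows and columns
print(demo.shape)

# Column dtypes: non-numeric columns will need encoding before training
print(demo.dtypes)

# Class balance of the target, as proportions
class_balance = demo["Attrition"].value_counts(normalize=True)
print(class_balance)

# Summary statistics for the numeric columns
print(demo.describe())
```

Checking the class balance early is useful because it tells you how much weight to give the accuracy score later.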

## Preprocess the Data
Before training, we check for missing values and encode the categorical columns as integers, since scikit-learn's decision tree requires numeric input.

In [None]:
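# Count missing values per column
print(df.isnull().sum())

# Categorical feature columns to encode as integers
cat_cols = ['BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'OverTime']

# Fit one encoder per column and keep it, so each mapping can be inverted later
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Encode the target column
le_target = LabelEncoder()
df['Attrition'] = le_target.fit_transform(df['Attrition'])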
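
If the missing-value check reports gaps, they have to be filled or dropped before encoding. The following is a minimal sketch of one common approach, assuming median imputation for numeric columns and most-frequent imputation for categorical ones. The data and the `MonthlyIncome` column are hypothetical; only `OverTime` comes from the column list above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data with gaps; all values are invented for illustration
demo = pd.DataFrame({
    "MonthlyIncome": [5000.0, np.nan, 3200.0, 4100.0],
    "OverTime": ["Yes", "No", None, "No"],
})

# Fill numeric gaps with the column median
demo["MonthlyIncome"] = demo["MonthlyIncome"].fillna(demo["MonthlyIncome"].median())

# Fill categorical gaps with the most frequent value
demo["OverTime"] = demo["OverTime"].fillna(demo["OverTime"].mode()[0])

# Encode the column and record which integer stands for which label
encoder = LabelEncoder()
demo["OverTime"] = encoder.fit_transform(demo["OverTime"])
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
```

`LabelEncoder` assigns codes in sorted label order, so printing the mapping is a cheap way to confirm what each integer means.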


## Split the Dataset
We will split the dataset into training and testing sets.

In [None]:
# Separate the features (x) from the outcome column (y)
x = df.drop('Attrition', axis=1)
y = df['Attrition']

# Split the dataset into training (80%) and testing (20%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

## Train and Evaluate the Decision Tree Model
Note that the maximum depth should not be greater than 3.

In [None]:
# Create and train the decision tree classifier
dt_classifier = DecisionTreeClassifier(max_depth=3, random_state=42)
dt_classifier.fit(x_train, y_train)
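
With the depth capped at 3, one way to pick among the allowed depths is to compare cross-validated accuracy. This is a sketch on synthetic data from `make_classification`, which stands in for the encoded attrition features as an assumption; it is not a result for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features (an assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Mean 5-fold cross-validated accuracy for each allowed depth
scores = {}
for depth in (1, 2, 3):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores[depth] = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="accuracy"
    ).mean()

# Depth with the highest mean accuracy
best_depth = max(scores, key=scores.get)
print(scores, best_depth)
```

Cross-validation averages over several splits, so the comparison depends less on a single train/test partition.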


In [None]:
# Make predictions on the test set
y_pred = dt_classifier.predict(x_test)
print(y_pred)

# Calculate the accuracy of the decision tree model
acc = accuracy_score(y_test, y_pred)
print(f"accuracy: {acc}")
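
Accuracy alone can look good when one class dominates, so it may help to view the confusion matrix and the learned rules as well. The sketch below uses synthetic, deliberately imbalanced data as an assumption; the `feature_i` names are placeholders, and the stratified split is an optional variation on the split used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic, imbalanced stand-in data (roughly 85% / 15%), for illustration only
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, weights=[0.85, 0.15], random_state=42
)

# Stratify so both splits keep a similar class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Train a depth-limited tree and predict on the held-out split
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)
pred = tree.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, pred)
print(cm)
print(accuracy_score(y_te, pred))

# Text rendering of the decision rules, using placeholder feature names
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

A shallow tree stays small enough that the printed rules can be read directly, which is one practical benefit of limiting the depth.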


