# Decision Tree Classifier with Employee Attrition Dataset

In this notebook, we will build a decision tree classifier using the scikit-learn library. We will use a hypothetical employee attrition dataset for this example.

## Import Libraries
First, let's import the necessary libraries.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


## Load and Explore the Dataset
Next, we will load the employee attrition dataset and explore its contents.

In [4]:
# Load the dataset
df = pd.read_csv('employee_attrition.csv')

# Display the first few rows of the DataFrame
df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,EducationField,Gender,HourlyRate,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,TotalWorkingYears
0,41,Yes,Travel_Rarely,1102,Sales,Life Sciences,Female,94,Sales Executive,4,Single,5993,19479,8,Yes,8
1,49,No,Travel_Frequently,279,Research & Development,Life Sciences,Male,61,Research Scientist,2,Married,5130,24907,1,No,10
2,37,Yes,Travel_Rarely,1373,Research & Development,Other,Male,92,Laboratory Technician,3,Single,2090,2396,6,Yes,7
3,33,No,Travel_Frequently,1392,Research & Development,Life Sciences,Female,56,Research Scientist,3,Married,2909,23159,1,Yes,8
4,27,No,Travel_Rarely,591,Research & Development,Medical,Male,40,Laboratory Technician,2,Married,3468,16632,9,No,6
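
A quick look at column types and the balance of the target can guide the later steps. The sketch below is a minimal illustration, not part of the original notebook: it rebuilds three columns from the five rows shown above as a stand-in for the full DataFrame (the name `preview` is an assumption).

```python
import pandas as pd

# Stand-in frame: three columns copied from the five rows displayed above.
preview = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
    "OverTime": ["Yes", "No", "Yes", "Yes", "No"],
})

# Column dtypes show which columns are text and will need encoding later.
print(preview.dtypes)

# Count of each target class; a strong imbalance changes how accuracy should be read.
class_counts = preview["Attrition"].value_counts()
print(class_counts)
```

On the full dataset the same two calls would be run on `df`. The `Unnamed: 0` column in the output above looks like a row index saved into the CSV; whether to keep it as a feature is a judgment call.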


## Preprocess the Data
We need to preprocess the data, including handling categorical variables and missing values.
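
If the dataset did contain gaps, one possible approach is sketched below. It assumes a small hypothetical frame (`sample`) with one numeric and one categorical column, fills the numeric gap with the median and the categorical gap with the most frequent value. Other strategies, such as dropping rows, may suit a given dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the notebook assumes the real data has none.
sample = pd.DataFrame({
    "MonthlyIncome": [5993.0, np.nan, 2090.0, 2909.0],
    "Department": ["Sales", "Research & Development", None, "Research & Development"],
})

# Count missing values per column before deciding how to handle them.
print(sample.isna().sum())

# Numeric column: fill with the median, which is robust to outliers.
income_median = sample["MonthlyIncome"].median()
sample["MonthlyIncome"] = sample["MonthlyIncome"].fillna(income_median)

# Categorical column: fill with the most frequent value (the mode).
department_mode = sample["Department"].mode()[0]
sample["Department"] = sample["Department"].fillna(department_mode)

print(sample)
```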

In [None]:
# For simplicity, let's assume there are no missing values in this example.
# If there are, you can handle them with techniques like imputation or dropping rows/columns.

from sklearn.preprocessing import LabelEncoder

# Convert categorical variables to integer codes
# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the columns that hold categorical values
columns_to_convert = ['BusinessTravel','Department','EducationField','Gender','JobRole','JobSatisfaction','MaritalStatus','OverTime']
for column in columns_to_convert:
    df[column] = label_encoder.fit_transform(df[column])
df.head()


Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,EducationField,Gender,HourlyRate,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,TotalWorkingYears
0,41,Yes,2,1102,2,1,0,94,7,3,2,5993,19479,8,1,8
1,49,No,1,279,1,1,1,61,6,1,1,5130,24907,1,0,10
2,37,Yes,2,1373,1,4,1,92,2,2,2,2090,2396,6,1,7
3,33,No,1,1392,1,1,0,56,6,2,1,2909,23159,1,1,8
4,27,No,2,591,1,3,1,40,2,1,1,3468,16632,9,0,6
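
Reusing one `LabelEncoder` for every column works, but the encoder only remembers the last column it was fitted on, so the code-to-label mapping for earlier columns is lost. A minimal sketch of an alternative, assuming a small hypothetical frame (`sample`) and a dictionary named `encoders`, keeps one fitted encoder per column so that codes can be decoded later:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column sample; the real notebook encodes eight columns.
sample = pd.DataFrame({
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Travel_Rarely"],
    "OverTime": ["Yes", "No", "Yes"],
})
original_travel = sample["BusinessTravel"].tolist()

# Fit a separate encoder for each column and keep it for later decoding.
encoders = {}
for column in sample.columns:
    encoder = LabelEncoder()
    sample[column] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# LabelEncoder assigns codes in sorted order of the labels it saw.
overtime_classes = encoders["OverTime"].classes_
overtime_mapping = dict(zip(overtime_classes, range(len(overtime_classes))))
print(overtime_mapping)

# Recover the original labels from the integer codes.
decoded_travel = encoders["BusinessTravel"].inverse_transform(sample["BusinessTravel"])
print(list(decoded_travel))
```

Because the codes depend on which labels are present, this sample's codes for `BusinessTravel` differ from those in the full dataset, which has more distinct values.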


## Split the Dataset
We will split the dataset into training and testing sets.

In [None]:
# Separate the target column from the features
y = df['Attrition']
# Drop the target column; every remaining column (including 'Unnamed: 0') is used as a feature
x = df.drop('Attrition', axis=1)
# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)


339
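
When the target classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. As an optional variation on the split above, the sketch below passes `stratify=y` to `train_test_split`. The data is a hypothetical 80/20 target, not the attrition dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 80 "No" and 20 "Yes".
y = pd.Series(["No"] * 80 + ["Yes"] * 20, name="Attrition")
X = pd.DataFrame({"Age": range(100)})

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Both splits should show the same share of each class.
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```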


## Modify the code below to train the Decision Tree Classifier and use MLflow to:
### &emsp; 1. Track the experiment run
### &emsp; 2. Log hyperparameters and metrics
### &emsp; 3. Register the model

In [None]:
# Import all the required MLflow-related libraries

# Modify the code below

# Create and train the decision tree classifier
dt_classifier = DecisionTreeClassifier(max_depth=4,random_state=42)
dt_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred_dt = dt_classifier.predict(X_test)

# Calculate and report the accuracy of the decision tree model
accuracy_dt = accuracy_score(y_test, y_pred_dt)
print(f"Decision Tree Accuracy: {accuracy_dt * 100:.2f}%")

Decision Tree Accuracy: 86.39%
['No']




## Load the registered model and version
### &emsp; Try to make a prediction with the loaded model

In [None]:
## Write your code here