# Decision Tree

This notebook trains a decision tree classifier to predict `booked_energy_consultation`. Decision trees are intuitive, interpretable models that split the data on feature thresholds, which makes them useful for understanding decision logic and identifying key variables.
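
To make "splitting on feature thresholds" concrete, here is a minimal sketch that fits a shallow tree and prints its learned rules with `export_text`. It uses scikit-learn's bundled iris data purely as a stand-in (an assumption for illustration, not the dataset used in this notebook), and the name `demo_tree` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative stand-in data; this is not the notebook's dataset
iris = load_iris()

# A shallow tree keeps the printed rules short
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(iris.data, iris.target)

# Each printed line is a test of the form "feature <= threshold" or a leaf class
rules = export_text(demo_tree, feature_names=list(iris.feature_names))
print(rules)
```

Every internal node compares a single feature against a single threshold, which is why the path from the root to a leaf can be read as a plain if/else rule.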

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score
from sklearn import tree
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_excel(r'combined_data_binary.xlsx', index_col=0)

In [3]:
# Separate features and target variable
X = df.drop('booked_energy_consultation', axis=1)
y = df['booked_energy_consultation']

# Identify numerical and categorical columns
numerical_cols = X.select_dtypes(include=['int64', 'float64']).columns
categorical_cols = X.select_dtypes(include=['object', 'category']).columns

# One-hot encode the categorical variables
encoder = OneHotEncoder(sparse_output=False)
categorical_encoded = encoder.fit_transform(X[categorical_cols])
categorical_encoded_df = pd.DataFrame(categorical_encoded, columns=encoder.get_feature_names_out(categorical_cols))

data_preprocessed = pd.concat([X[numerical_cols].reset_index(drop=True), categorical_encoded_df.reset_index(drop=True)], axis=1)

### Splitting the data into training and test sets (70% training, 30% test)

In [4]:
X_train, X_test, y_train, y_test = train_test_split(data_preprocessed, y, test_size=0.3, random_state=42)

### Scaling the Data

In [5]:
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
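
Min-max scaling is a monotonic, per-feature transformation, so one would expect the threshold-based splits of a decision tree to be largely unaffected by it. The sketch below checks this on synthetic data from `make_classification` (an assumption for illustration, not the notebook's data); all names ending in `_demo` or starting with `Xd_`/`yd_` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit the scaler on the training split only, then reuse it for the test split
demo_scaler = MinMaxScaler()
Xd_train_scaled = demo_scaler.fit_transform(Xd_train)
Xd_test_scaled = demo_scaler.transform(Xd_test)

# Same hyperparameters and seed, raw vs. scaled inputs
tree_raw = DecisionTreeClassifier(random_state=42).fit(Xd_train, yd_train)
tree_scaled = DecisionTreeClassifier(random_state=42).fit(Xd_train_scaled, yd_train)

# Fraction of test rows on which the two trees give the same prediction
agreement = np.mean(tree_raw.predict(Xd_test) == tree_scaled.predict(Xd_test_scaled))
print(f"Prediction agreement: {agreement:.3f}")
```

If the agreement is at or near 1.0, scaling is not changing what the tree learns. Keeping the step is still harmless and keeps the preprocessing comparable with scale-sensitive models.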

### Creating the Decision Tree

In [6]:
# Create a decision tree classifier
dt_classifier = DecisionTreeClassifier(random_state=42)

# Train the classifier
dt_classifier.fit(X_train_scaled, y_train)
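
Once a tree is fitted, its size and impurity-based feature importances give a quick view of which variables drive the splits. The following is a minimal sketch on synthetic data with hypothetical column names (`feature_0` ... `feature_4`), not the notebook's own features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical column names
X_arr, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]
X_demo = pd.DataFrame(X_arr, columns=feature_names)

demo_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Size of the fitted tree; with default settings a tree typically
# keeps splitting until its leaves are pure
print("depth:", demo_tree.get_depth(), "leaves:", demo_tree.get_n_leaves())

# Impurity-based importances are normalized to sum to 1 across features
importances = pd.Series(demo_tree.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

Applied to this notebook, the same idea would presumably read `pd.Series(dt_classifier.feature_importances_, index=X_train.columns)`, since `X_train` is a DataFrame whose columns match the scaled training array. If the unconstrained tree turns out very deep, limiting `max_depth` or `min_samples_leaf` is a common way to reduce overfitting.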

#### Confusion Matrix and Classification Report

In [7]:
# Predictions
y_pred = dt_classifier.predict(X_test_scaled)

# Evaluation
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report Decision Tree:")
print(classification_report(y_test, y_pred))

Confusion Matrix:
[[1367  107]
 [ 110  666]]

Classification Report Decision Tree:
              precision    recall  f1-score   support

       False       0.93      0.93      0.93      1474
        True       0.86      0.86      0.86       776

    accuracy                           0.90      2250
   macro avg       0.89      0.89      0.89      2250
weighted avg       0.90      0.90      0.90      2250



The decision tree achieved an overall accuracy of 90% on the test set, performing similarly to the logistic regression model. For the positive class ("True"), precision and recall are both 86%: of the 773 predicted positives, 666 were correct, while 110 of the 776 actual positives were missed.
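
As a cross-check, the reported scores can be recomputed directly from the confusion matrix printed above. This sketch assumes scikit-learn's convention that rows are true labels and columns are predicted labels, ordered `False`, `True`.

```python
import numpy as np

# Confusion matrix reported above: rows = true labels, columns = predictions
cm = np.array([[1367, 107],
               [110, 666]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # share of predicted "True" that are correct
recall = tp / (tp + fn)     # share of actual "True" that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.90 precision=0.86 recall=0.86 f1=0.86
```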