# CYBR 486 - Lab #6: Decision Trees

## Objective
This lab explores decision tree classifiers by:
1. Viewing the dataset summary.
2. Encoding categorical variables into numeric representations.
3. Splitting the dataset into training and testing sets.
4. Building and training decision tree models using both the 'gini' (Gini index) and 'entropy' split criteria.
5. Evaluating the models using accuracy, recall, and precision.
6. Visualizing the decision trees using Graphviz.

---

## Required Imports


In [32]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

## Step 1: Dataset Overview

In [33]:
# Load the dataset
dataset = pd.read_csv('car_evaluation.csv')

# Display dataset info (info() prints its summary itself and returns None,
# so it is called directly rather than wrapped in print())
dataset.info()
print(dataset.head())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   buying_price       1728 non-null   object
 1   maintenance_cost   1728 non-null   object
 2   number_of_doors    1728 non-null   object
 3   number_of_persons  1728 non-null   object
 4   lug_boot           1728 non-null   object
 5   safety             1728 non-null   object
 6   decision           1728 non-null   object
dtypes: object(7)
memory usage: 94.6+ KB
  buying_price maintenance_cost number_of_doors number_of_persons lug_boot  \
0        vhigh            vhigh               2                 2    small   
1        vhigh            vhigh               2                 2    small   
2        vhigh            vhigh               2                 2    small   
3        vhigh            vhigh               2                 2      med   
4        vhigh            vhigh    

## Step 2: Preprocessing the Dataset

### Encoding Categorical Variables

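Ordinal encoding with an explicit category order is used for the feature columns, so the integer codes preserve each feature's natural ranking (for example, low < med < high < vhigh). Label encoding is used for the target column; it assigns integers to the class labels in alphabetical order.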
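The difference between the two encoders can be seen on a few toy values. This is a minimal sketch: the price values mirror the buying_price categories listed in the lab, while the target labels are hypothetical placeholders assumed to resemble the car dataset's classes, not values read from car_evaluation.csv.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy rows standing in for a single feature column (buying_price).
prices = [["vhigh"], ["low"], ["med"], ["high"]]

# Explicit category order: low < med < high < vhigh -> 0, 1, 2, 3
ordinal = OrdinalEncoder(categories=[["low", "med", "high", "vhigh"]])
price_codes = ordinal.fit_transform(prices)
print(price_codes.ravel())  # [3. 0. 1. 2.]

# Without categories=..., OrdinalEncoder sorts the categories alphabetically,
# which puts "high" before "low" and loses the natural ranking.
auto = OrdinalEncoder().fit(prices)
print(auto.categories_[0])  # ['high' 'low' 'med' 'vhigh']

# LabelEncoder always sorts the target labels alphabetically.
# These label names are hypothetical placeholders.
labels = ["unacc", "acc", "good", "vgood"]
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(labels)
print(label_encoder.classes_)                # ['acc' 'good' 'unacc' 'vgood']
print(label_codes)                           # [2 0 1 3]
print(label_encoder.inverse_transform([2]))  # ['unacc']
```

Alphabetical ordering is harmless for the target, because a classifier treats class codes as unordered labels, but it would impose a misleading order on ordered features. That is why the features get an explicit category list.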


In [34]:
# Define the order for ordinal encoding
buying_price_order = ['low', 'med', 'high', 'vhigh']
maintenance_cost_order = ['low', 'med', 'high', 'vhigh']
number_of_doors_order = ['2', '3', '4', '5more']
number_of_persons_order = ['2', '4', 'more']
lug_boot_order = ['small', 'med', 'big']
safety_order = ['low', 'med', 'high']

# Create the ordinal encoder
ordinal_encoder = OrdinalEncoder(categories=[
    buying_price_order, maintenance_cost_order, number_of_doors_order, 
    number_of_persons_order, lug_boot_order, safety_order
])

# Encode the feature columns
car_X = dataset.iloc[:, :-1]
car_X_encoded = ordinal_encoder.fit_transform(car_X)

# Convert back to DataFrame
encoded_column_list = ["buying_price", "maintenance_cost", "number_of_doors", 
                       "number_of_persons", "lug_boot", "safety"]
car_X_encoded = pd.DataFrame(car_X_encoded, columns=encoded_column_list)
print(car_X_encoded.head())

# Encode the target column
car_y = dataset.iloc[:, -1]
label_encoder = LabelEncoder()
car_y_encoded = label_encoder.fit_transform(car_y)
car_y_encoded = pd.DataFrame(car_y_encoded, columns=['decision'])
print(car_y_encoded.head())


   buying_price  maintenance_cost  number_of_doors  number_of_persons  \
0           3.0               3.0              0.0                0.0   
1           3.0               3.0              0.0                0.0   
2           3.0               3.0              0.0                0.0   
3           3.0               3.0              0.0                0.0   
4           3.0               3.0              0.0                0.0   

   lug_boot  safety  
0       0.0     0.0  
1       0.0     1.0  
2       0.0     2.0  
3       1.0     0.0  
4       1.0     1.0  
   decision
0         2
1         2
2         2
3         2
4         2


## Step 3: Splitting the Dataset

In [35]:
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(car_X_encoded, car_y_encoded, 
                                                    test_size=0.2, random_state=42)
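The split above is not stratified, and the support column in the evaluation reports (235 test samples in the largest class versus 11 in the smallest) shows the classes are imbalanced. Passing stratify= to train_test_split is one option to consider: it keeps the class proportions of the full dataset in both subsets. This sketch uses synthetic labels rather than the car data, and it is not what this lab does. Applying it to the lab's split would change the reported scores.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# Plain random split: the test-set class ratio can drift from 80/20.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=42)

# Stratified split: the test set keeps the 80/20 ratio of y.
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2,
                                         random_state=42, stratify=y)

print("plain:     ", Counter(y_test_plain.tolist()))
print("stratified:", Counter(y_test_strat.tolist()))  # Counter({0: 16, 1: 4})
```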


## Step 4: Building Decision Tree Models

### Decision Tree with Entropy Criterion


In [36]:
# Create and train the model
maxDepth = 6
classifier_tree_entropy = DecisionTreeClassifier(criterion='entropy', max_depth=maxDepth)
classifier_tree_entropy.fit(X_train, y_train)

# Make predictions
predictions_entropy = classifier_tree_entropy.predict(X_test)
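The lab fixes maxDepth = 6 without justification. One common way to choose the depth is to compare cross-validated accuracy for several values using only the training data. The sketch below does this on synthetic data from make_classification as a stand-in for the encoded car features. For the lab, X_train and y_train would replace X and y. It also sets random_state on the tree: scikit-learn randomly permutes the features at each split, so trees built without a fixed seed can differ between runs when candidate splits tie.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded car features (6 columns, 4 classes).
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

# Mean 5-fold cross-validated accuracy for each candidate max_depth.
scores = {}
for depth in range(2, 11, 2):
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                  random_state=0)
    scores[depth] = cross_val_score(tree, X, y, cv=5, scoring="accuracy").mean()

for depth, score in scores.items():
    print(f"max_depth={depth:2d}  mean CV accuracy={score:.3f}")

best_depth = max(scores, key=scores.get)
print("best max_depth on this data:", best_depth)
```

The test set should stay untouched during this search. Otherwise the final test scores no longer estimate performance on unseen data.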


### Decision Tree with Gini Index Criterion

In [37]:
# Create and train the model
classifier_tree_gini = DecisionTreeClassifier(criterion='gini', max_depth=maxDepth)
classifier_tree_gini.fit(X_train, y_train)

# Make predictions
predictions_gini = classifier_tree_gini.predict(X_test)
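The two models differ only in the impurity measure used to score candidate splits. Gini impurity is 1 - Σ p_k² and entropy is -Σ p_k log₂ p_k, where p_k is the fraction of a node's samples in class k. Both are 0 for a pure node and largest when the classes are evenly mixed. The sketch below computes both measures by hand for illustration. The helper names gini, entropy, and impurity_decrease are hypothetical, not scikit-learn functions.

```python
import math


def gini(counts):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits: -sum(p_k * log2 p_k), skipping empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of its child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * measure(child) for child in children)
    return measure(parent) - weighted


# A node with 3 samples of one class and 1 of another,
# split into two pure child nodes.
parent = [3, 1]
children = [[3, 0], [0, 1]]
print(gini(parent))                                  # 0.375
print(entropy(parent))                               # about 0.811
print(impurity_decrease(parent, children, gini))     # 0.375
print(impurity_decrease(parent, children, entropy))  # about 0.811
```

At each node, a tree picks the split with the largest impurity decrease under the chosen criterion. Because the two measures rank candidate splits similarly but not identically, the entropy and Gini trees can end up with different structures and scores.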


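## Step 5: Model Evaluation

Both trees are evaluated on the same 346-sample test set. The class labels 0-3 in the reports are LabelEncoder codes; label_encoder.classes_ maps each code back to its original label, in alphabetical order. In this run the Gini tree reaches higher accuracy (0.94 vs. 0.91), while both trees have low precision (0.45) on class 1, which has only 11 test samples.
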
In [41]:
# Evaluate entropy-based tree
print("Entropy-Based Tree Evaluation:")
print(classification_report(y_test, predictions_entropy))

# Evaluate gini-based tree
print("Gini-Based Tree Evaluation:")
print(classification_report(y_test, predictions_gini))


Entropy-Based Tree Evaluation:
              precision    recall  f1-score   support

           0       0.91      0.73      0.81        83
           1       0.45      0.82      0.58        11
           2       0.96      0.98      0.97       235
           3       0.79      0.88      0.83        17

    accuracy                           0.91       346
   macro avg       0.78      0.85      0.80       346
weighted avg       0.93      0.91      0.92       346

Gini-Based Tree Evaluation:
              precision    recall  f1-score   support

           0       0.96      0.81      0.88        83
           1       0.45      0.82      0.58        11
           2       0.99      1.00      0.99       235
           3       0.79      0.88      0.83        17

    accuracy                           0.94       346
   macro avg       0.80      0.88      0.82       346
weighted avg       0.95      0.94      0.94       346



## Step 6: Visualizing the Decision Trees

In [44]:
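# Export the entropy-based tree as DOT source text
dot_data = export_graphviz(classifier_tree_entropy,
                           out_file=None,
                           feature_names=car_X_encoded.columns.tolist(),
                           class_names=label_encoder.classes_.tolist(),
                           filled=True, rounded=True, special_characters=True)

# Render with Graphviz. This needs the Graphviz "dot" executable on the system PATH,
# not just the graphviz Python package. Writes entropy_tree (DOT source) and
# entropy_tree.pdf, then opens the PDF (view=True avoids rendering a second time).
graph = graphviz.Source(dot_data)
graph.render("entropy_tree", view=True)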


ExecutableNotFound: failed to execute WindowsPath('dot'), make sure the Graphviz executables are on your systems' PATH
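The ExecutableNotFound error means the graphviz Python package could not find Graphviz's dot program. The package only builds DOT text and calls the separately installed Graphviz binaries to render it. Installing Graphviz itself and adding its bin directory to PATH (then restarting the Jupyter kernel) should resolve the error. As a fallback that needs no external binaries, scikit-learn's export_text prints a fitted tree as indented text. In this sketch the bundled iris dataset stands in for the car data. For the lab, pass classifier_tree_entropy or classifier_tree_gini with car_X_encoded.columns.tolist() instead.

```python
import shutil

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: the bundled iris dataset replaces car_evaluation.csv here.
iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Rendering DOT output needs the "dot" executable; report whether it is on PATH.
if shutil.which("dot") is None:
    print("Graphviz 'dot' not found on PATH; using a text view of the tree.")

# export_text builds the tree description in pure Python, with no Graphviz needed.
tree_text = export_text(clf, feature_names=list(iris.feature_names))
print(tree_text)
```

The text view loses the colour coding of the Graphviz output, but it shows the same split thresholds and leaf classes. This makes it a reasonable check of the tree structure until Graphviz is installed.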