# Decision Tree 

* A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It splits data into subsets based on feature values to make predictions.  
* To decide where to split data, Decision Trees use impurity measures such as:      
  * Entropy (Information Gain)
  * Gini index
* These measures help determine the best feature to split on by evaluating how "pure" or "impure" a node is.  

# Entropy in Decision Tree

* Entropy is a measure of uncertainty or randomness in the dataset. It helps determine how mixed the classes are within a given node.
  * If a dataset is pure (all samples belong to the same class), entropy is zero.
  * If a dataset is completely impure (the classes are equally distributed), entropy is at its maximum.
  * The goal of a Decision Tree is to reduce entropy by splitting the dataset in a way that increases class purity.
* To determine the best split, the tree calculates Information Gain, which measures how much entropy decreases after a split. A higher Information Gain means a better split.
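The entropy and Information Gain calculation described above can be sketched in plain Python. This is a minimal illustration, not the code scikit-learn runs internally: it assumes entropy is measured in bits, $H = -\sum_i p_i \log_2 p_i$, and that Information Gain is the parent entropy minus the size-weighted entropy of the child nodes. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Add up -p * log2(p) over the classes that actually occur in the node
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy node: 4 samples of class "a" and 4 of class "b"
parent = ["a"] * 4 + ["b"] * 4
perfect_split = [["a"] * 4, ["b"] * 4]                        # both children pure
useless_split = [["a", "a", "b", "b"], ["a", "a", "b", "b"]]  # children as mixed as the parent

print("parent entropy:    ", entropy(parent))
print("gain, perfect split:", information_gain(parent, perfect_split))
print("gain, useless split:", information_gain(parent, useless_split))
```

In this toy case the perfect split recovers all of the parent's entropy as gain, while the useless split gains nothing, which is why the tree would prefer the former.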

# Gini Index 

* The Gini Index is another impurity measure used in Decision Trees. It represents the probability that a randomly chosen sample would be misclassified if its label were assigned at random according to the node's class distribution.
  * A pure node has a Gini Index of zero (all samples belong to one class).
  * A node with an equal distribution of classes has a higher Gini value, indicating more impurity.
* The Gini Index is computationally simpler than entropy, making it the preferred criterion in many Decision Tree implementations (it is the default in scikit-learn's `DecisionTreeClassifier`). Unlike entropy, it does not involve logarithmic calculations, which can make it faster in some cases.
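As a rough sketch of the definition above, the Gini Index of a node can be computed as one minus the sum of squared class proportions, $G = 1 - \sum_i p_i^2$. The helper name `gini` and the toy label lists are assumptions made for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


# Hypothetical toy nodes
pure_node = ["a"] * 6                            # a single class
two_class_node = ["a"] * 3 + ["b"] * 3           # two classes, evenly mixed
three_class_node = ["a", "a", "b", "b", "c", "c"]  # three classes, evenly mixed

print("pure node:       ", gini(pure_node))
print("two-class node:  ", gini(two_class_node))
print("three-class node:", round(gini(three_class_node), 4))
```

Note that no logarithm appears anywhere in the calculation, only counts, divisions, and squares.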

# Why Use Entropy and Gini Index?

* Entropy and Gini Index help the Decision Tree identify the best feature for splitting data:
  * Entropy (with Information Gain):  
    Ensures that the split maximizes the information gained.
  * Gini Index:  
    Focuses on minimizing the misclassification probability.
* Both methods aim to create homogeneous subsets where most samples belong to the same class, improving the model's accuracy.
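To see how closely the two criteria agree, the sketch below evaluates both for a two-class node as the share `p` of one class varies. It assumes the binary forms of the formulas (entropy in bits); the function names are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a two-class node where one class has proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def binary_gini(p):
    """Gini impurity of a two-class node where one class has proportion p."""
    return 1.0 - (p ** 2 + (1 - p) ** 2)


# Both measures are zero for pure nodes and peak at an even 50/50 mix
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p={p:.1f}  entropy={binary_entropy(p):.3f}  gini={binary_gini(p):.3f}")
```

The two curves have the same shape but different scales (a maximum of 1 bit for entropy versus 0.5 for Gini in the two-class case), so they often, though not always, rank candidate splits the same way.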

# What Do Entropy and Gini Index Do?

* They measure impurity in the dataset.
* They help the Decision Tree decide where to split the data.
* They work by selecting the feature that leads to the purest child nodes.
* They improve classification accuracy by making the tree more efficient at separating different classes.
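The way an impurity measure drives the choice of split can be sketched for a single numeric feature. This is a simplified illustration, not scikit-learn's implementation: it assumes candidate thresholds are the midpoints between consecutive sorted values and that a split is scored by the size-weighted impurity of its two children. The function `best_threshold` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_threshold(values, labels, impurity=gini):
    """Return (threshold, weighted child impurity) for the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Size-weighted impurity of the two child nodes (lower is better)
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical toy feature: the two classes separate cleanly between 12.8 and 13.6
feature = [12.0, 12.4, 12.8, 13.6, 13.9, 14.2]
target = [1, 1, 1, 0, 0, 0]

threshold, score = best_threshold(feature, target)
print(f"best threshold: {threshold:.2f}, weighted impurity: {score:.3f}")
```

A full tree would repeat this search over every feature, pick the feature and threshold with the lowest weighted impurity, and then recurse into each child node. Any other impurity function that takes a list of labels, such as an entropy function, could be passed via the `impurity` argument.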

# Import Libraries

In [35]:
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, classification_report

# Load Dataset

In [36]:
# Load the Wine dataset
wine = load_wine()

In [37]:
# Convert the dataset to a Pandas DataFrame
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target

In [38]:
df.head()

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |


# Train-Test Split

In [44]:
# Split the data into training and test sets
X = wine.data
y = wine.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model Building

In [53]:
# Function to train and evaluate the model
def evaluate_decision_tree(criterion):
    # Train the model with specified criterion (Gini or Entropy)
    clf = DecisionTreeClassifier(random_state=42, criterion=criterion)
    clf.fit(X_train, y_train)
    
    # Make predictions
    y_pred = clf.predict(X_test)
    
    # Calculate precision, recall, and accuracy
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    accuracy = accuracy_score(y_test, y_pred)
    
    # Create a dictionary to hold the results in a structured way
    results = {
        'Accuracy': [accuracy],
        'Precision': [precision],
        'Recall': [recall]
    }

    # Convert results to a Pandas DataFrame
    results_df = pd.DataFrame(results)
    
    # Generate the classification report
    classification_rep = classification_report(y_test, y_pred, target_names=wine.target_names, output_dict=True)
    classification_df = pd.DataFrame(classification_rep).transpose()

    return results_df, classification_df

In [54]:
# Evaluate using Gini
gini_results, gini_classification_report = evaluate_decision_tree('gini')

In [55]:
# Evaluate using Entropy
entropy_results, entropy_classification_report = evaluate_decision_tree('entropy')

In [56]:
# Display Results for Gini and Entropy
print("Gini Criteria - Model Performance Metrics:")
print(gini_results)
print("\nGini Classification Report:")
print(gini_classification_report)

print("\nEntropy Criteria - Model Performance Metrics:")
print(entropy_results)
print("\nEntropy Classification Report:")
print(entropy_classification_report)

Gini Criteria - Model Performance Metrics:
   Accuracy  Precision    Recall
0  0.962963   0.963805  0.962963

Gini Classification Report:
              precision    recall  f1-score    support
class_0        0.947368  0.947368  0.947368  19.000000
class_1        0.954545  1.000000  0.976744  21.000000
class_2        1.000000  0.928571  0.962963  14.000000
accuracy       0.962963  0.962963  0.962963   0.962963
macro avg      0.967305  0.958647  0.962359  54.000000
weighted avg   0.963805  0.962963  0.962835  54.000000

Entropy Criteria - Model Performance Metrics:
   Accuracy  Precision    Recall
0  0.851852   0.855205  0.851852

Entropy Classification Report:
              precision    recall  f1-score    support
class_0        0.818182  0.947368  0.878049  19.000000
class_1        0.894737  0.809524  0.850000  21.000000
class_2        0.846154  0.785714  0.814815  14.000000
accuracy       0.851852  0.851852  0.851852   0.851852
macro avg      0.853024  0.847536  0.847621  54.000000
we