### Decision Trees

A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. It recursively splits the data into subsets, choosing at each step the feature that best separates the target values, which creates a tree-like structure of decisions. Each internal node tests a feature, each branch corresponds to an outcome of that test, and each leaf node holds the predicted output.

<u>Inner Workings of a Decision Tree:</u>

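1. Choosing the Best Split:
At each node, the algorithm evaluates candidate splits and chooses the feature (and, for numeric features, the threshold) that provides the best one. The "best" split is typically the one that maximizes information gain or minimizes Gini impurity (for classification), or that minimizes the variance of the target within the resulting subsets (for regression).

    - ***Information Gain***: Measures the reduction in entropy, that is, in uncertainty about the target variable, achieved by a split.
    - ***Variance***: Measures the spread of the target values in a node; the lower the variance, the more homogeneous the node.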

2. Recursive Splitting:
    - The chosen feature is used to split the data into subsets. This process is repeated recursively for each subset until a stopping condition is met.
    - Stopping conditions may include reaching a maximum depth, falling below a minimum number of samples in a node, or reaching a pure node in which all samples share the same target value.
    
3. Leaf Nodes and Predictions:
    - For classification, the majority class in the leaf node is the predicted class.
    - For regression, the mean or median of the target values in the leaf node is the predicted value.
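
The three steps above can be sketched in plain Python for the classification case. This is a minimal illustration under stated assumptions, not how scikit-learn implements its trees: the helper names (`entropy`, `best_split`, `build_tree`, `predict`), the dictionary representation of nodes, the toy data, and the choice of observed feature values as candidate thresholds are all hypothetical simplifications.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Step 1: return (gain, feature_index, threshold) with the highest information gain."""
    parent_entropy = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        # Assumption: every observed value of the feature is tried as a threshold.
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = parent_entropy - weighted
            if gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Step 2: split recursively until a stopping condition is met."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: pure node, maximum depth reached, or too few samples.
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples:
        return {"leaf": majority}
    gain, f, t = best_split(rows, labels)
    if f is None:  # no split improves on the parent node
        return {"leaf": majority}
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Step 3: follow the decisions down to a leaf and return its majority class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical toy data: two numeric features, two classes.
rows = [[2.0, 1.0], [3.0, 1.5], [1.0, 3.0], [6.0, 1.0], [7.0, 2.0], [8.0, 0.5]]
labels = ["a", "a", "a", "b", "b", "b"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [2.5, 9.0]))
```

On this toy data the first feature separates the two classes perfectly at a threshold of 3.0, so the tree consists of a single split with two pure leaves. A regression variant would follow the same recursion, replacing entropy with the variance of the target values and the majority class with their mean.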

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Instantiate a Decision Tree classifier
decision_tree_classifier = DecisionTreeClassifier(random_state=42)

# Train the classifier
decision_tree_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = decision_tree_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_rep)
```

```text
Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
```
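
The split criterion and stopping conditions described earlier map onto constructor parameters of `DecisionTreeClassifier`. The following sketch, which assumes the same Iris split as above, selects the entropy criterion (the basis of information gain), limits the depth and the minimum leaf size, and prints the learned rules with `export_text`. The particular values `max_depth=2` and `min_samples_leaf=5` and the variable name `shallow_tree` are arbitrary choices for illustration, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Same dataset and split as in the example above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

# criterion="entropy" selects splits by information gain;
# max_depth and min_samples_leaf act as stopping conditions.
shallow_tree = DecisionTreeClassifier(
    criterion="entropy", max_depth=2, min_samples_leaf=5, random_state=42
)
shallow_tree.fit(X_train, y_train)

print("Depth:", shallow_tree.get_depth())
print("Leaves:", shallow_tree.get_n_leaves())
print("Test accuracy:", shallow_tree.score(X_test, y_test))

# Text view of the learned decisions, one line per node
print(export_text(shallow_tree, feature_names=iris.feature_names))
```

Restricting the tree this way trades some flexibility for a model that is small enough to read directly from the printed rules.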

