<a href="https://colab.research.google.com/github/augusto-silva199/ML_course_assets/blob/main/DecisionTreesTut.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌲 A Tutorial on Decision Trees
---
> **Author:** Augusto Silva  
> **Version:** 1.0 | **Date:** February 2026  
> *Developed with the support of Gemini AI*

---
### 🧬 Clinical Context
This tutorial introduces **Decision Trees (DTs)** as transparent "Classifier Engines." Unlike "black-box" AI, DTs mirror the logical flow of medical decision-making.

**Note:** We use a "well-behaved" version of the Wisconsin Breast Cancer dataset (reasonably balanced classes, no missing values) so that we can focus on the mechanics of Gini Impurity and Entropy.

In [None]:
# @title 🛠️ Setup
import ipywidgets as widgets
from IPython.display import display, clear_output


# Define the button
setup_button = widgets.Button(
    description='Initialize Environment',
    button_style='info',
    icon='gears'
)
out = widgets.Output()

def on_setup_clicked(b):
    with out:
        clear_output()
        print("⏳ Loading medical AI libraries...")

        # The actual heavy lifting: declare every imported name global
        # so that later cells can use it (accuracy_score included)
        global pd, np, plt, sns, train_test_split, DecisionTreeClassifier, accuracy_score
        import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import accuracy_score

        print("✅ Success: Libraries imported.")

setup_button.on_click(on_setup_clicked)
display(setup_button, out)

# General Concepts

Decision Trees are a widely used machine learning method for solving classification and regression problems. The model is organized as a hierarchical structure of nodes and branches, where each node represents a decision based on a data attribute.
This method stands out for its interpretability, conceptual simplicity, and ability to model nonlinear relationships between variables. It is a natural starting point in medical contexts, where decision-making workflows often resemble decision trees.

<p align="center">
  <img src="https://raw.githubusercontent.com/augusto-silva199/ML_course_assets/main/Figs/DT/examplemedical.png" width="400">
  <br>
  <em>Figure 1: A sample medical Decision Tree</em>
</p>

# Tree structure
A tree is composed of:
*   A Root node: the starting point, where the first division of the data is made.
*   Internal nodes: points where additional decisions occur.
*   Leaves: terminal nodes that represent the predicted class (in the classification) or a numerical value (in the regression).
*   Branches: paths that represent the result of the decisions made at each node.

At each node, the data subset is partitioned according to a decision rule. A subset becomes purer the more it is dominated by a single class (in classification) or by a narrow numeric interval (in regression).
The criterion for a good split is therefore to maximize the purity of the subsets it creates.
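
To make the vocabulary above concrete, here is a minimal sketch of a tree as a data structure. The class names `Node` and `Leaf`, the function `predict`, and the feature names and thresholds in `toy_tree` are all hypothetical and chosen only for illustration; they are not part of scikit-learn.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str                # class stored at a terminal node


@dataclass
class Node:
    feature: str                   # attribute tested at this node
    threshold: float               # decision rule: feature <= threshold ?
    left: Union["Node", Leaf]      # branch followed when the rule is true
    right: Union["Node", Leaf]     # branch followed when the rule is false


def predict(tree, sample):
    """Follow a single path from the root down to a leaf."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical two-level tree; the thresholds are made up.
toy_tree = Node(
    "radius_mm", 15.0,
    left=Leaf("Benign"),
    right=Node("texture", 20.0, left=Leaf("Benign"), right=Leaf("Malignant")),
)

print(predict(toy_tree, {"radius_mm": 12.0, "texture": 25.0}))  # Benign
print(predict(toy_tree, {"radius_mm": 18.0, "texture": 25.0}))  # Malignant
```

The root and the inner `Node` are the decision points, the two attributes `left` and `right` are the branches, and each `Leaf` holds the predicted class.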

<p align="center">
  <img src="https://raw.githubusercontent.com/augusto-silva199/ML_course_assets/main/Figs/DT/IdealDTree.png" width="700">
  <br>
  <em>Figure 2: Left: An abstract two-feature Decision Tree with 4 leaf nodes (classes). Right: Ideal feature space partitioning. </em>
</p>

# Split criteria
During the learning phase, one feature of the training dataset is chosen at each node to separate the training examples into distinct classes as cleanly as possible. A DT is grown by splitting nodes until each subset is clearly dominated by one class. Once learning is finished, prediction can be carried out: a new sample simply follows a unique path from the root to a leaf (class).

<p align="center">
  <img src="https://raw.githubusercontent.com/augusto-silva199/ML_course_assets/main/Figs/DT/Goalofsplit.png" width="500">
  <br>
  <em>Figure 3: Testing decision rules. </em>
</p>

Decision tree algorithms (such as CART, Classification and Regression Trees) select the best split based on **impurity** measures.
The two most commonly used impurity metrics in classification are Gini impurity and Entropy.

## Gini impurity
The Gini impurity test assesses the degree of mixing between classes in a dataset. The lower the Gini value, the more homogeneous (purer) the dataset.
Mathematically, the Gini Index represents the probability of misclassifying a randomly chosen element from the set if it were randomly labeled according to the distribution of labels in the subset. For a set of data with $N$ classes, let $p_i$ be the probability (or proportion) of an item belonging to class $i$. The Gini Index $G$ is defined as:
$$G = \sum_{i=1}^{N} p_i (1 - p_i)$$
Since $\sum p_i = 1$, we can simplify this to:
$$G = 1 - \sum_{i=1}^{N} p_i^2$$

### A clinical example for breast masses
Assume two classes in a large number of mammography cases: Malignant (M) and Benign (B). Let $p_M$ be the probability of Malignant and $p_B$ the probability of Benign. The formula becomes:
$$G = 1 - (p_M^2 + p_B^2)$$
**Case A**: Perfect Purity (The Goal)

If a node subset contains only Malignant cases:
$$p_M = 1.0, p_B = 0$$
$$G = 1 - (1^2 + 0^2) = \mathbf{0}$$
**Case B**: Maximum Impurity (The "Coin Toss")

If a node subset is perfectly balanced (50/50):
$$p_M = 0.5, p_B = 0.5$$
$$G = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = \mathbf{0.5}$$
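
The two cases above can be checked numerically. This is a minimal sketch; the helper name `gini` is an assumption made for this example, and it takes class proportions that sum to 1.

```python
def gini(proportions):
    """Gini impurity G = 1 - sum(p_i^2) for class proportions summing to 1."""
    return 1.0 - sum(p * p for p in proportions)


print(gini([1.0, 0.0]))            # Case A, pure node: 0.0
print(gini([0.5, 0.5]))            # Case B, 50/50 node: 0.5
print(round(gini([0.9, 0.1]), 2))  # mostly malignant node: 0.18
```

Any node between the two extremes gets a value between 0 and 0.5, and the closer a node is to a single class, the closer its Gini value is to 0.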

### Decreasing Impurity
At each node, the DT algorithm seeks a split that produces purer child nodes, so the optimal split is the one that attains the maximum impurity decrease. For example, for a binary split on the rule "lesion diameter > T cm", the impurity decrease is
$$\Delta G = G_{parent} - \sum_{k=1}^{2} \frac{N_{subset_k}}{N_{parent}} G_{subset_k}$$
The algorithm scans the range of values of the "lesion diameter" feature and determines the value of T that gives the largest impurity decrease for this particular decision rule.
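
The threshold search described above can be sketched in a few lines. This is an illustrative implementation, not the exact routine of any library: the function names, the toy diameters, and the choice of testing midpoints between consecutive distinct values are assumptions made for this example.

```python
from collections import Counter


def gini_from_labels(labels):
    """Gini impurity computed from a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (T, delta_G) for the rule 'value > T' with the largest impurity decrease."""
    n = len(labels)
    parent = gini_from_labels(labels)
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) / n) * gini_from_labels(left) \
                 + (len(right) / n) * gini_from_labels(right)
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Made-up lesion diameters (cm) and diagnoses.
diameters = [0.8, 1.1, 1.4, 1.9, 2.3, 2.8]
diagnoses = ["B", "B", "B", "M", "M", "M"]

t, gain = best_threshold(diameters, diagnoses)
print(f"best T = {t:.2f} cm, impurity decrease = {gain:.2f}")
```

In this toy set the classes separate perfectly, so the best threshold removes all of the parent impurity (0.5). Real features overlap, and the best achievable decrease is smaller.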


## Entropy
Entropy measures the randomness, uncertainty, or "chaotic status" of a set of clinical data. For a classification task with $c$ classes (for example, Malignant vs. Benign), the Entropy $H(S)$ of a set $S$ is defined by the following equation:
$$H(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)$$
Where:
*   $S$: The current subset of patient data.
*   $c$: The number of classes (in this case, $c = 2$)
*   $p_i$: The proportion (probability) of samples belonging to class $i$ in the set $S$.  

### Information Gain
During a split at a node, we seek to gain information in the sense that the resulting subsets are less entropic. Information Gain plays the same role as the impurity decrease with the Gini index, and the two share the same functional definition:
$$\Delta H = H_{parent} - \sum_{k=1}^{2} \frac{N_{subset_k}}{N_{parent}} H_{subset_k}$$

Information Gain may represent the reduction in diagnostic uncertainty achieved by asking a specific question (e.g., Is the margin irregular?). The higher the gain, the more 'useful' that feature is for the final diagnosis.
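
As a worked example of the two formulas in this section, the sketch below computes the entropy of a node and the information gain of one candidate question. The function names and the ten-patient cohort are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up cohort: 5 malignant and 5 benign cases.
parent = ["M"] * 5 + ["B"] * 5
# Hypothetical answer to "Is the margin irregular?"
irregular = ["M", "M", "M", "M", "B"]
regular = ["M", "B", "B", "B", "B"]

print(f"H(parent) = {entropy(parent):.3f}")
print(f"Information gain = {information_gain(parent, [irregular, regular]):.3f}")
```

The parent node starts at the maximum entropy of 1 bit, and the question removes a little over a quarter of that uncertainty.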

# Gini vs Entropy
In Decision Trees, both Gini and Entropy serve the same purpose: they are objective functions used to measure the "disorder" in a group of patients. The algorithm uses these metrics to decide where to place its membership test (the split).

$$\begin{array}{|l|l|l|}
\hline
\mathbf{Feature} & \mathbf{Gini \ Impurity} & \mathbf{Information \ Entropy} \\ \hline
\text{Focus} & \text{Misclassification Probability} & \text{Uncertainty / Chaos} \\ \hline
\text{Math Definition} & G = 1 - \sum p_i^2 & H = -\sum p_i \log_2(p_i) \\ \hline
\text{Peak Value (2 classes)} & 0.5 \text{ (Total Mix)} & 1.0 \text{ (Total Mix)} \\ \hline
\text{Clinical Analogy} & \text{"Random Guessing Risk"} & \text{"Diagnostic Noise Level"} \\ \hline
\text{Goal} & \text{Maximize Impurity Decrease } (\Delta G) & \text{Maximize Information Gain } (IG) \\ \hline
\text{Computation} & \text{Fast (Quadratic)} & \text{Slower (Logarithmic)} \\ \hline
\end{array}$$

Clinical insights:

*   GINI: "How likely am I to be wrong?"
*   Entropy: "How much more do I know now?"
*   Neither is "better." They are different rulers measuring the same "Diagnostic Clarity." In clinical practice, if a feature is a strong predictor (like a BI-RADS 5 score), both Gini and Entropy will identify it as the best split immediately.
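
The "different rulers" idea can be seen by evaluating both measures for a two-class node as the malignant proportion varies. The helper names below are assumptions for this sketch, which is restricted to the binary case.

```python
import math


def gini_binary(p):
    """Gini impurity of a two-class node with malignant proportion p."""
    return 1.0 - (p * p + (1 - p) * (1 - p))


def entropy_binary(p):
    """Entropy (in bits) of a two-class node with malignant proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(f"p_M = {p:.1f}   Gini = {gini_binary(p):.3f}   Entropy = {entropy_binary(p):.3f}")
```

Both curves are zero for pure nodes, symmetric around $p = 0.5$, and peak there (0.5 for Gini, 1.0 for Entropy), which is why they usually agree on which split is best.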



# About data types
One of the greatest strengths of a Decision Tree (DT) in a clinical setting is its "omnivore" nature: it handles various data types with almost zero pre-processing, which mimics how a radiologist combines different types of information.

**Data Type Flexibility**: The "Unfair Advantage"

Unlike many algorithms that require everything to be a specific type of number, DTs adapt their "Membership Tests" to the data type:

*   Numerical (Continuous): As we discussed, the DT finds a threshold (e.g., $Radius > 14.5\text{ mm}$) to split the data.
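*   Categorical (Ordinal/Nominal): For features like "Tissue Density" (Low, Medium, High) or "Shape" (Round, Irregular), the tree simply groups the categories into subsets that maximize purity. (Note that scikit-learn's `DecisionTreeClassifier` expects numeric input, so categorical features must be encoded first.)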

*   No Scaling Required: Because the tree looks at one feature at a time to make a split, it doesn't care if one feature is measured in millimeters and another in entirely different units.

$$\begin{array}{|l|l|l|l|}
\hline
\mathbf{Data \ Type} & \mathbf{Raw \ Input} & \mathbf{Membership \ Rule \ (The \ "Set")} & \mathbf{Logic \ Type} \\ \hline
\text{Numerical} & 14.8 \text{ mm} & x \in \{ \text{all values } \le 15 \} & \text{Boundary Split} \\ \hline
\text{Ordinal} & \text{BI-RADS 3} & x \in \{ \text{Score 1, 2, 3} \} & \text{Range Split} \\ \hline
\text{Categorical} & \text{"Irregular"} & x \in \{ \text{"Irregular", "Spiculated"} \} & \text{Group Split} \\ \hline
\end{array}$$
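
The membership rules in the table can be written as plain predicates, and the "no scaling required" claim can be checked directly. This is a minimal sketch: the function names and the radius values are made up for illustration.

```python
# Membership rules from the table, one per data type.
def numerical_rule(radius_mm):
    return radius_mm <= 15                        # boundary split


def ordinal_rule(birads_score):
    return birads_score in {1, 2, 3}              # range split


def categorical_rule(shape):
    return shape in {"Irregular", "Spiculated"}   # group split


print(numerical_rule(14.8), ordinal_rule(3), categorical_rule("Irregular"))  # True True True

# Scaling check: the same patients land on the same side of the split
# whether the radius is expressed in millimeters or in centimeters.
radius_mm = [11.2, 13.9, 14.8, 16.1, 19.4]
radius_cm = [r / 10 for r in radius_mm]

left_mm = [i for i, r in enumerate(radius_mm) if r <= 15.0]
left_cm = [i for i, r in enumerate(radius_cm) if r <= 1.5]
print(left_mm == left_cm)  # True
```

Because a split only depends on the ordering of values along one feature, changing the unit moves the threshold but not the resulting partition.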




**The Probabilistic Starting Point**
A Decision Tree is essentially a machine that converts Global Probabilities into Local Certainty.

Before any splits occur, every patient has the same "Global" probability of being Malignant (e.g., the prevalence in your dataset). If the dataset is 30% Malignant, your starting point is $p = 0.3$.

**The Membership Filter**

Each split is a filter. By asking a question ($Area > 500$), the model creates two new "Sets":

*   Set A: Patients who met the criteria.
*   Set B: Patients who did not.

**The Probability Shift**
Inside these new sets, the Underlying Probability changes.

*   In Set A (large area), the Malignant probability might jump to $p = 0.85$
*   In Set B (small area), it might drop to $p = 0.05$.
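
The shift from a global to a local probability can be reproduced with a toy cohort. The ten patients below are invented so that the starting prevalence is 0.3; the local values they produce (0.75 and 0.0) therefore differ from the illustrative 0.85 and 0.05 quoted above.

```python
# Hypothetical cohort: (area, diagnosis) for 10 patients, 3 of them malignant.
patients = [
    (300, "B"), (350, "B"), (400, "B"), (420, "B"), (450, "B"), (480, "B"),
    (520, "B"), (610, "M"), (700, "M"), (820, "M"),
]


def malignant_fraction(group):
    """Proportion of malignant cases in a group of (area, diagnosis) pairs."""
    return sum(1 for _, diagnosis in group if diagnosis == "M") / len(group)


set_a = [p for p in patients if p[0] > 500]    # met the criterion Area > 500
set_b = [p for p in patients if p[0] <= 500]   # did not meet it

print(malignant_fraction(patients))  # global probability: 0.3
print(malignant_fraction(set_a))     # Set A: 0.75
print(malignant_fraction(set_b))     # Set B: 0.0
```

One question is enough to turn a single global estimate into two much more decisive local ones.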


# Pruning

Growing a tree means making predictions within ever smaller local probability spaces, so growth should be a controlled process for the sake of robustness. Pruning is the algorithmic way of controlling tree growth: a preventive or "therapeutic" measure that ensures quality control and robustness.

The goal of pruning is to achieve Parsimony. In radiography, we don't want a model that needs 50 different measurements to decide if a lesion is malignant; we want the 3 to 5 most robust indicators that work for every patient, not just the ones in our training set.

In clinical AI, we must decide when the "Information Gain" from a split is no longer representing a medical truth, but rather statistical noise.

$$\begin{array}{|l|l|l|}
\hline
\mathbf{Strategy} & \mathbf{Mechanism} & \mathbf{Clinical \ Analogy} \\ \hline
\text{Pre-Pruning} & \text{Stop-growth rules during training.} & \text{Setting a time limit on a diagnostic exam.} \\ \hline
\text{Post-Pruning} & \text{Trimming branches after full growth.} & \text{Editing a detailed report for clarity.} \\ \hline
\end{array}$$

**Pre-Pruning (Early Stopping)**

Pre-pruning is the most common method used in teaching. You set "Stop Signs" (Hyperparameters) that prevent the tree from becoming too deep or complex.


*   $\texttt{max\_depth}$: Limits the number of "questions" in the sequence.
*   $\texttt{min\_samples\_leaf}$: Ensures every diagnostic "bucket" has a statistically significant number of patients (e.g., at least 5-10).
*   $\texttt{min\_impurity\_decrease}$: Only allows a split if the impurity decrease (the "Information Gain") is at least a certain threshold.

Just as we limit radiation dose to what is "As Low As Reasonably Achievable" (ALARA), we limit tree growth to what is "As Simple As Reasonably Accurate."
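
The effect of these "Stop Signs" can be observed by growing one unrestricted tree and one pre-pruned tree on the same data. This sketch assumes that scikit-learn's bundled copy of the Wisconsin data (`load_breast_cancer`) is an acceptable stand-in for the CSV used later in this tutorial; the hyperparameter values are arbitrary examples, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled WDBC copy stands in for the tutorial's CSV file.
X, y = load_breast_cancer(return_X_y=True)

# No stop signs: the tree grows until its leaves are pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: three stop signs, with illustrative values.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # at most 3 questions in a row
    min_samples_leaf=5,          # every leaf keeps at least 5 patients
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=42,
).fit(X, y)

for name, tree in [("unrestricted", full_tree), ("pre-pruned", pruned_tree)]:
    print(f"{name:>12}: depth = {tree.get_depth()}, leaves = {tree.get_n_leaves()}")
```

The pre-pruned tree is shallower and has fewer leaves; whether it also generalizes better has to be checked on held-out data, as done in the Use Case section.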

**Post-Pruning (Cost-Complexity)**

Post-pruning allows the tree to grow to its maximum potential (where every leaf is pure), and then systematically removes branches that do not significantly improve predictive power.



*   $\alpha$ (Complexity Parameter): A mathematical penalty added to the tree's score for every leaf it has.  
*   The Goal: Find the smallest tree that maintains the highest accuracy on unseen data.

$$\text{Cost}(T) = \text{Error}(T) + \alpha |T|$$
Where $|T|$ is the number of terminal leaves.
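
A small arithmetic example shows how $\alpha$ shifts the balance between error and size. The candidate subtrees and their error rates below are invented for illustration; in scikit-learn the same trade-off is controlled through the `ccp_alpha` parameter.

```python
# Hypothetical candidate subtrees: (number of leaves |T|, Error(T)).
candidates = [(1, 0.37), (3, 0.10), (8, 0.06), (20, 0.05)]


def cost(error, n_leaves, alpha):
    """Cost(T) = Error(T) + alpha * |T|."""
    return error + alpha * n_leaves


def best_subtree(alpha):
    """Return the (leaves, error) pair with the lowest penalized cost."""
    return min(candidates, key=lambda c: cost(c[1], c[0], alpha))


for alpha in [0.0, 0.001, 0.01]:
    leaves, error = best_subtree(alpha)
    print(f"alpha = {alpha:<5}: keep the tree with {leaves} leaves "
          f"(cost = {cost(error, leaves, alpha):.3f})")
```

With no penalty the largest tree wins; as $\alpha$ grows, the extra leaves no longer pay for themselves and progressively smaller trees are selected.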

$$\begin{array}{|l|l|l|}
\hline
\mathbf{Constraint} & \mathbf{Mechanism} & \mathbf{Clinical \ Result} \\ \hline
\texttt{max\_depth} & \text{Limits vertical growth} & \text{Prevents over-specialized rules} \\ \hline
\texttt{min\_samples\_leaf} & \text{Ensures minimum subset size} & \text{Guarantees statistical significance} \\ \hline
\texttt{ccp\_alpha} & \text{Applies a penalty to complexity} & \text{Surgically removes weak branches} \\ \hline
\end{array}$$












# Robustness and Generalization

This table summarizes how data characteristics and model constraints interact to produce a trustworthy diagnostic tool.

$$\begin{array}{|l|l|l|}
\hline
\mathbf{Pillar} & \mathbf{Technical \ Requirement} & \mathbf{Clinical \ Importance} \\ \hline
\text{Robustness} & \text{High } N \text{ (Sample Size)} & \text{Stability: Prevents rules based on outliers.} \\ \hline
\text{Generalization} & \text{Pruning (e.g., } \texttt{max\_depth}\text{)} & \text{Reliability: Ensures logic works on new patients.} \\ \hline
\text{Quality Control} & \text{Impurity Metrics (Gini/Entropy)} & \text{Objectivity: Standardizes the "Membership Tests."} \\ \hline
\text{Demographics} & \text{Representative Distribution} & \text{Equity: Avoids bias against specific subgroups.} \\ \hline
\end{array}$$

**Robustness**: The Law of Large Numbers

A "robust" model is one where the splits do not change drastically if you add or remove a few patients.
* The Math: Robustness depends on having enough samples in each node to ensure the Underlying Probabilities ($p_i$) are accurate.
* The Clinical Risk: Small datasets lead to "Spurious Correlations": finding a "rule" that only exists in your specific training set.
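
One way to quantify "enough samples in each node" is the standard error of an estimated proportion, $\sqrt{p(1-p)/n}$. The sketch below uses this textbook formula with an assumed node probability of 0.3; the function name is chosen for this example.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1 - p) / n)


for n in [5, 50, 500]:
    print(f"n = {n:>3}: p = 0.30 +/- {standard_error(0.3, n):.3f}")
```

With only 5 patients in a node, the estimated probability is uncertain by roughly 0.2, which is why very small leaves tend to encode noise rather than a stable clinical rule.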

**Generalization**: The "Parsimony" Principle
Generalization is the model's ability to perform as well in the "Real World" as it did in the "Lab."

* The Therapy: We use Preventive measures (max_depth) and Therapeutic measures (ccp_alpha) to stop the tree from "memorizing" the training data.
* The Goal: A simpler tree is almost always more generalizable than a complex one.

**Dataset Demographics & Probability Distribution**
The "Intelligence" of the tree is limited by the Diversity of the input.

* The Bias Trap: If your dataset demographics are skewed (e.g., only patients over 60), the probability distributions the tree learns for "Malignancy" will not translate to a younger population.
* Class Imbalance: If 99% of your data is Benign, the tree will have a high "Global Purity" by simply ignoring the Malignant cases. A common mitigation is to train with `class_weight='balanced'`.

In Radiology, we don't trust a single pixel; we look for patterns across the whole image. Similarly, in AI, we don't trust a single split; we ensure that split is supported by Robust data, a Balanced distribution, and Pruned logic.
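
For the class-imbalance point above, the "balanced" weighting can be computed by hand. scikit-learn documents the heuristic as `n_samples / (n_classes * np.bincount(y))`; the sketch below applies it in plain Python to the WDBC class counts (357 benign, 212 malignant).

```python
# Class counts of the WDBC dataset.
counts = {"B": 357, "M": 212}

n_samples = sum(counts.values())
n_classes = len(counts)

# Balanced heuristic: rarer classes receive proportionally larger weights.
weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}

for label, weight in weights.items():
    print(f"class {label}: weight = {weight:.3f}, weighted total = {weight * counts[label]:.1f}")
```

After weighting, both classes contribute the same total weight to the impurity calculation, so the tree can no longer look "pure" by ignoring the minority class.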

# Use Case

The [Wisconsin Diagnostic Breast Cancer](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic) (WDBC) dataset is a widely used, publicly available binary classification dataset from the UCI Machine Learning Repository containing 569 instances (357 benign, 212 malignant). It features 30 numerical, real-valued features computed from digitized images of fine needle aspirate (FNA) of breast masses, representing characteristics of cell nuclei.

<p align="center">
  <img src="https://raw.githubusercontent.com/augusto-silva199/ML_course_assets/main/Figs/DT/BreastCancerNucleus.png" width="400">
  <br>
  <em>Figure 4: A sample case with several outlined nuclei</em>
</p>


**Key Details of the WDBC Dataset**
* Source: Collected in the early 1990s by Dr. William H. Wolberg at the University of Wisconsin Hospital.
* Classes: 569 total instances: 357 benign (B) and 212 malignant (M).
* Features: 30 numerical features (10 real-valued features, each with mean, standard error, and "worst" value) calculated from digitized images.
* Target: Binary classification: Malignant (M) or Benign (B).
* Usage: Frequently used for supervised machine learning, specifically in SVM, KNN, and decision tree classifiers to predict breast cancer.
* Features Included:
  * Radius (mean of distances from center to points on the perimeter)
  * Texture (standard deviation of gray-scale values)
  * Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry, and Fractal dimension.
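
The count of 30 features follows from 10 base measurements times 3 statistics. The sketch below enumerates them under an assumed `<feature>_<statistic>` naming pattern (suggested by column names such as `radius_mean` used later in this notebook); the exact headers in the CSV file may differ.

```python
# The 10 base measurements listed above (spelling of the names is an assumption).
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
]
# The 3 statistics computed for each measurement.
statistics = ["mean", "se", "worst"]

column_names = [f"{feature}_{stat}" for stat in statistics for feature in base_features]

print(len(column_names))   # 30
print(column_names[:3])    # ['radius_mean', 'texture_mean', 'perimeter_mean']
```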


In [None]:
#@title Load Data
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output

# 1. Define the Button
load_button = widgets.Button(
    description='Load Wisconsin Dataset',
    button_style='success', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to fetch data from GitHub',
    icon='database'
)

output = widgets.Output()

def on_button_clicked(b):
    with output:
        clear_output()
        try:
            # The dataset is fetched from the course's GitHub repository
            url = "https://raw.githubusercontent.com/augusto-silva199/ML_course_assets/main/Datafiles/BreastC/breastc_data.csv"

            # Loading the data
            global df
            df = pd.read_csv(url)

            print("✅ Data Loaded Successfully!")
            print(f"Total Patients: {df.shape[0]}")
            print(f"Diagnostic Features: {df.shape[1] - 1}")
            display(df.head(5))

        except Exception as e:
            print(f"❌ Error: Could not load the file. Check the URL path.\n{e}")

load_button.on_click(on_button_clicked)

# 2. Display the Interface
display(load_button, output)

## Elementary Statistics

In [None]:
# Remove the "id" column (errors='ignore' keeps the cell safe to re-run)
df = df.drop(columns=['id'], errors='ignore')
df.describe()

In [None]:
#@title Class Balance


# 1. Define the UI Elements
load_plot_button = widgets.Button(
    description='Analyze Balance',
    button_style='info',
    layout=widgets.Layout(width='250px'),
    icon='chart-bar'
)

output_area = widgets.Output()

def load_and_visualize(b):
    with output_area:
        clear_output()
        try:

            # Setting up the visual style
            sns.set_theme(style="whitegrid")
            fig, ax = plt.subplots(figsize=(5, 3))

            # 2. Create the Seaborn Bar Plot
            # Assuming your target column is named 'diagnosis'
            sns.countplot(data=df,
                         x='diagnosis',
                         hue='diagnosis',    # Link colors to the diagnosis categories
                         palette=['#3498db', '#e74c3c'],
                         legend=False,       # Remove the redundant legend
                         ax=ax)

            plt.title('Clinical Dataset Balance: Benign vs. Malignant', fontsize=9)
            plt.xlabel('Diagnosis', fontsize=9)
            plt.ylabel('Patient Count', fontsize=9)

            # Adding percentage labels on top of bars for "Robustness" context
            total = len(df)
            for p in ax.patches:
                percentage = f'{100 * p.get_height() / total:.1f}%'
                ax.annotate(percentage, (p.get_x() + p.get_width() / 2., p.get_height()/2),
                            ha='center', va='center', xytext=(0, 10), textcoords='offset points')

            plt.show()


        except Exception as e:
            print(f"❌ Error: Check the GitHub path or Column names.\n{e}")

load_plot_button.on_click(load_and_visualize)

# 3. Display
display(load_plot_button, output_area)

## Visual Insights

In [None]:
#@title Histograms
import ipywidgets as widgets
from ipywidgets import interact

def plot_feature_distribution(feature):
    plt.figure(figsize=(5, 3))

    # Create the overlapping histograms
    sns.histplot(data=df, x=feature, hue='diagnosis',
                 palette=['#3498db', '#e74c3c'],
                 kde=True, element="step", common_norm=False)

    # Formatting for clinical clarity
    plt.title(f'Diagnostic Distribution: {feature}', fontsize=12)
    plt.xlabel(f'{feature} Value', fontsize=12)
    plt.ylabel('Patient Frequency', fontsize=12)
    plt.grid(axis='y', alpha=0.3)
    plt.show()

# Create the dropdown menu using the column names (excluding 'diagnosis' and 'id')
features_list = [col for col in df.columns if col not in ['diagnosis', 'id']]

print("📊 Select a Clinical Predictor to analyze the class overlap:\n")
interact(plot_feature_distribution, feature=features_list);

In [None]:
#@title Scatter plots
import ipywidgets as widgets
from ipywidgets import interact

def plot_2d_interaction(feat_x, feat_y):
    plt.figure(figsize=(5, 3))

    # Create the scatter plot
    sns.scatterplot(data=df, x=feat_x, y=feat_y, hue='diagnosis',
                    palette=['#3498db', '#e74c3c'],
                    alpha=0.7, s=60, edgecolor='w')

    # Clinical Aesthetics
    plt.title(f'Diagnostic Space: {feat_x} vs {feat_y}', fontsize=10)
    plt.xlabel(feat_x, fontsize=10)
    plt.ylabel(feat_y, fontsize=10)
    plt.legend(title='Diagnosis', loc='upper right')
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.show()

# List of columns to choose from
features_list = [col for col in df.columns if col not in ['diagnosis', 'id']]

print("🎯 Select two predictors to visualize the Decision Space:\n")
interact(plot_2d_interaction,
         feat_x=widgets.Dropdown(options=features_list, value='radius_mean'),
         feat_y=widgets.Dropdown(options=features_list, value='texture_mean'));

## Train and test

In [None]:
# @title Train and test
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import ipywidgets as widgets

def train_dt(criterion, max_depth, test_size):
    # 1. Prepare Data
    X = df.drop(columns=['diagnosis'])
    y = df['diagnosis']

    # 2. The Split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )

    # 3. The Classifier
    model = DecisionTreeClassifier(
        criterion=criterion,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)

    # 4. Evaluation
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))

    # Visual Output
    print(f"\n🏥 CLINICAL MODEL AUDIT\n")
    print(f"{'='*30}")
    print(f"Logic: {criterion.title()} | Max Depth: {max_depth} | Test Split: {test_size:.0%}\n")
    print(f"{'-'*30}")
    print(f"✅ Training Accuracy: {train_acc:.2%}")
    print(f"🚀 Testing Accuracy:  {test_acc:.2%}")

    # Robustness Feedback (rule-of-thumb thresholds, for teaching purposes only)
    if train_acc > test_acc + 0.05:
        print("\n⚠️ STATUS: Overfitting. (The model is too 'narrow-minded')")
    elif train_acc < 0.85:
        print("\n⚠️ STATUS: Underfitting. (The model is too 'simplistic')")
    else:
        print("\n💎 STATUS: Robust. (Ideal balance for Clinical Use)")

# --- The "Silent" UI Logic ---
# Define the controls manually to avoid the function signature printout
style = {'description_width': 'initial'}
criterion_sel = widgets.Dropdown(options=['gini', 'entropy'], value='gini', description='Criterion:', style=style)
depth_sld = widgets.IntSlider(min=1, max=20, value=3, description='Max Depth:', style=style)
test_sld = widgets.FloatSlider(min=0.1, max=0.5, step=0.05, value=0.3, description='Test Size (fraction):', style=style)

# Use interactive_output to link controls to the function without printing metadata
ui = widgets.HBox([criterion_sel, depth_sld, test_sld])
# Named train_out so it does not overwrite the `out` widget used by the Setup cell
train_out = widgets.interactive_output(train_dt, {'criterion': criterion_sel, 'max_depth': depth_sld, 'test_size': test_sld})

display(ui, train_out)

## Performance report

In [None]:
# @title Confusion Matrix
from sklearn.metrics import confusion_matrix, classification_report
import pandas as pd

def evaluate_performance(criterion, max_depth, test_size):
    # 1. Setup & Train (Reusing our logic)
    X = df.drop(columns=['diagnosis'])
    y = df['diagnosis']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # 2. Confusion Matrix Calculation
    cm = confusion_matrix(y_test, y_pred)

    # 3. Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Heatmap
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax1,
                xticklabels=['Benign', 'Malignant'],
                yticklabels=['Benign', 'Malignant'])
    ax1.set_title('Confusion Matrix (Clinical Audit)', fontsize=14)
    ax1.set_xlabel('Predicted Diagnosis')
    ax1.set_ylabel('Actual Ground Truth')

    # Performance Report Table
    report_dict = classification_report(y_test, y_pred, output_dict=True)
    report_df = pd.DataFrame(report_dict).transpose().round(3)

    # Plotting the table on the second axis
    ax2.axis('off')
    table = ax2.table(cellText=report_df.values,
                      colLabels=report_df.columns,
                      rowLabels=report_df.index,
                      loc='center', cellLoc='center')
    table.set_fontsize(12)
    table.scale(1, 2)
    ax2.set_title('Metric Performance Table', fontsize=14)

    plt.tight_layout()
    plt.show()

# Link back to the same interactive controls for seamless auditing
interact_out = widgets.interactive_output(evaluate_performance,
                                          {'criterion': criterion_sel,
                                           'max_depth': depth_sld,
                                           'test_size': test_sld})
display(ui, interact_out)

In [None]:
#@title ROC Curve
from sklearn.metrics import roc_curve, auc

def plot_roc_curve(criterion, max_depth, test_size):
    # 1. Setup & Train
    X = df.drop(columns=['diagnosis'])
    y = df['diagnosis'].map({'M': 1, 'B': 0}) # Convert to binary for ROC
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)

    # 2. Get Probabilities (Important: ROC uses probabilities, not just labels)
    y_score = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    roc_auc = auc(fpr, tpr)

    # 3. Plotting
    plt.figure(figsize=(5, 5))
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--') # The "Random Chance" line
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate (1 - Specificity)')
    plt.ylabel('True Positive Rate (Sensitivity)')
    plt.title('ROC Analysis: Breast Cancer Diagnostic Power')
    plt.legend(loc="lower right")
    plt.grid(alpha=0.3)
    plt.show()

# Linking to the same UI
roc_out = widgets.interactive_output(plot_roc_curve,
                                    {'criterion': criterion_sel,
                                     'max_depth': depth_sld,
                                     'test_size': test_sld})
display(ui, roc_out)

In [None]:
#@title Tree layout
from sklearn.tree import plot_tree

def visualize_clinical_tree(criterion, max_depth, test_size):
    # 1. Setup & Train (Standard Audit Logic)
    X = df.drop(columns=['diagnosis'])
    y = df['diagnosis']
    feature_names = X.columns.tolist()

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)

    # 2. Plotting the Flowchart
    plt.figure(figsize=(20, 10), dpi=100)
    plot_tree(model,
              feature_names=feature_names,
              class_names=['Benign', 'Malignant'],
              filled=True,
              rounded=True,
              fontsize=10,
              precision=2)

    plt.title(f"Diagnostic Logic Flow (Depth: {max_depth})", fontsize=16)
    plt.show()

# Connect to our dashboard UI
tree_out = widgets.interactive_output(visualize_clinical_tree,
                                     {'criterion': criterion_sel,
                                      'max_depth': depth_sld,
                                      'test_size': test_sld})
display(ui, tree_out)

In [None]:
#@title Feature Importance
import numpy as np

def plot_importance(criterion, max_depth, test_size):
    # 1. Train the model using the current slider values
    X = df.drop(columns=['diagnosis'])
    y = df['diagnosis']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    model = DecisionTreeClassifier(criterion=criterion, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)

    # 2. Extract Importances
    importances = model.feature_importances_
    indices = np.argsort(importances)[-10:]  # Show only top 10 for clarity

    # 3. Plotting
    plt.figure(figsize=(10, 6))
    plt.title('Clinical Weight: Top 10 Diagnostic Predictors', fontsize=10)
    plt.barh(range(len(indices)), importances[indices], color='#3498db', align='center')
    plt.yticks(range(len(indices)), [X.columns[i] for i in indices])
    plt.xlabel(f'Relative Importance ({criterion} impurity reduction)')
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

# Link to UI
imp_out = widgets.interactive_output(plot_importance,
                                    {'criterion': criterion_sel,
                                     'max_depth': depth_sld,
                                     'test_size': test_sld})
display(ui, imp_out)

# Final Remarks

This exercise serves as a fundamental template for understanding how Machine Learning translates into Clinical Decision Support. While we have used the Wisconsin Breast Cancer dataset to build our "Engine," the logic applies to almost any diagnostic pathway in radiology and medicine.

🧬 Summary of our Findings
* Algorithmic Transparency: We moved from a "Black Box" to a visible flowchart. The Decision Tree mimics the "If-Then" logic used by clinicians during differential diagnosis.

* The Power of Simplicity: We saw that a well-behaved dataset allows a simple classifier to achieve high accuracy without needing complex neural networks.

* Quality Control through Pruning: We demonstrated that "more depth" isn't always better. Restricting the tree (Pruning) ensures the model learns stable clinical patterns rather than memorizing individual patient "noise."

🏛️ Why this Dataset is the "Perfect Lab"
It is important to note that this specific exercise used a "Well-Behaved" dataset, which is rare in real-world clinical practice:

* Class Balance: The split between Benign and Malignant cases was relatively even (~60/40). In real screening environments, the "Disease" class might be less than 1%, requiring much more advanced "Class Weighting" techniques.

* Data Integrity: There were no missing values. In a typical hospital database, you would encounter missing labels, corrupted images, or incomplete patient histories that require significant "Data Cleaning" before a tree can be grown.

* Linear Separability: Many features in this dataset (like concave_points) have a very clear "Threshold" that separates the classes, which is why our ROC curve showed such a strong "Knee" early on.

* Feature Importance is specific to this dataset: in a different patient population, the most important "predictor" might change.

