# CBM Metrics – Quantitative Evaluation (Research Notes)

This notebook explains the quantitative metrics used to evaluate Concept Bottleneck
Models (CBMs) and how to read them in practice.

CBMs must be evaluated beyond standard accuracy. In addition to performance, we must
measure interpretability, causal behavior, and whether the chosen concepts preserve
task-relevant information.

These notes explain:
- What each metric measures
- Why the metric exists
- How to interpret its values
- What failure modes it reveals

---

## 1. Concept Bottleneck Models – Notation and Setup

A Concept Bottleneck Model decomposes prediction into two explicit stages.

First, the input is mapped to a set of human-interpretable concepts:

$$
\hat{c} = g(x)
$$

Then, the predicted concepts are used to make the final task prediction:

$$
\hat{y} = f(\hat{c})
$$

Where:
- $x$ is the input (e.g., image, signal, text)
- $c \in \mathbb{R}^K$ is a vector of $K$ concepts (in many settings binary, $c \in \{0, 1\}^K$)
- $y$ is the task label
- $g$ is the concept predictor
- $f$ is the task predictor

We assume access to ground-truth annotations:
- True concepts $c$
- True labels $y$

The goal of CBMs is to enforce that predictions are mediated through concepts, enabling
interpretability and human intervention.
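
As a minimal sketch of the two-stage structure, the snippet below wires a toy $g$ and $f$ together. Both functions, the thresholds, and the weights are hypothetical placeholders chosen for illustration; a real CBM would learn them. Plain Python is used so the example runs without dependencies.

```python
from typing import List, Tuple


def g(x: List[float]) -> List[float]:
    """Hypothetical concept predictor: thresholds each input feature into a binary concept."""
    return [1.0 if xi > 0.5 else 0.0 for xi in x]


def f(c: List[float]) -> int:
    """Hypothetical task predictor: a fixed linear rule that sees only the concepts."""
    weights = [1.0, -1.0, 2.0]  # assumed weights, not learned
    score = sum(w * ck for w, ck in zip(weights, c))
    return 1 if score > 0.5 else 0


def cbm_predict(x: List[float]) -> Tuple[List[float], int]:
    c_hat = g(x)      # stage 1: input -> concepts
    y_hat = f(c_hat)  # stage 2: concepts -> task label (f never sees x)
    return c_hat, y_hat


x = [0.9, 0.2, 0.7]
c_hat, y_hat = cbm_predict(x)
print(c_hat, y_hat)  # [1.0, 0.0, 1.0] 1
```

The point of the structure is that `f` receives only `c_hat`, so any change to the concepts is the only way to change the task output.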

---

## 2. Predictive Performance Metrics

Predictive performance answers the question:

**How accurate is the CBM?**

In CBMs, accuracy must be evaluated at two distinct levels:
1. Task-level performance
2. Concept-level performance

Evaluating only one of these is insufficient.

---

### 2.1 Task Predictive Performance

Task predictive performance evaluates how well the CBM predicts the final task output:

$$
\hat{y} = f(g(x))
$$

This is analogous to standard supervised learning evaluation.

Common metrics include:
- Accuracy
- ROC-AUC
- F1-score
- Mean squared error (for regression)
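
For a classification task, two of these metrics can be sketched in a few lines. The labels below are made-up toy values, and the helper names are illustrative; in practice a library such as scikit-learn would typically be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1-score for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels; y_pred stands in for f(g(x)) on each test sample.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_binary(y_true, y_pred)
print(round(acc, 3), round(f1, 3))  # 0.667 0.667
```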

#### Why this metric matters

A CBM that performs poorly on the task is not useful, regardless of interpretability.
Task performance is therefore a **necessary condition**.

#### Key limitation

High task accuracy does **not** imply:
- Concepts are meaningful
- Concepts are causally used
- Human intervention is effective

A CBM may behave like a black box despite strong accuracy.

---

### 2.2 Concept Predictive Performance

Concept predictive performance evaluates how accurately each concept is predicted:

$$
\hat{c}_k = g_k(x), \quad k = 1, \dots, K
$$

Each concept is treated as an independent prediction problem.

Typical metrics include:
- ROC-AUC per concept
- Accuracy per concept
- Mean ROC-AUC across concepts
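
The per-concept metrics above can be sketched as follows, assuming binary concepts and a 0.5 decision threshold (both assumptions, not requirements of CBMs). ROC-AUC is computed here through its pairwise-ranking interpretation; the toy scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples, fine for a sketch.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Rows are samples, columns are the K concepts (toy values).
c_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
c_score = [[0.9, 0.2], [0.3, 0.6], [0.8, 0.4], [0.1, 0.7]]

K = len(c_true[0])
per_concept_auc, per_concept_acc = [], []
for k in range(K):
    labels = [row[k] for row in c_true]
    scores = [row[k] for row in c_score]
    per_concept_auc.append(roc_auc(labels, scores))
    preds = [int(s >= 0.5) for s in scores]  # assumed threshold
    per_concept_acc.append(sum(p == l for p, l in zip(preds, labels)) / len(labels))

mean_auc = sum(per_concept_auc) / K
print(per_concept_auc, per_concept_acc, mean_auc)  # [1.0, 0.5] [1.0, 0.5] 0.75
```

Reporting the per-concept values alongside the mean is useful here: a mean of 0.75 hides that the second concept is predicted no better than chance.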

#### Why this metric matters

Concepts are the interface between the model and humans.
If concept predictions are unreliable:
- Human trust is undermined
- Interventions become ineffective
- Interpretability claims weaken

#### Critical insight

High concept accuracy does **not** guarantee:
- Correct task predictions
- Meaningful concept usage

Concepts may be predicted well but combined poorly.

---

## 3. Intervention Effectiveness

Intervention effectiveness is the metric most specific to CBMs.

It evaluates whether correcting a concept actually changes the model’s decision.

This directly tests the **causal role of concepts**.

---

### 3.1 Motivation

A core promise of CBMs is human-in-the-loop correction.

Example:
- Model predicts "no pedestrian" and outputs "go"
- A human corrects the concept to "pedestrian present"
- The model should update its prediction to "stop"

If the prediction does not change, the concept bottleneck is ineffective.

---

### 3.2 Definition of an Intervention

An intervention replaces one or more predicted concepts with their ground-truth values.

For a set of concept indices $I \subseteq \{1, \dots, K\}$:

$$
\hat{c}^{(I)}_k =
\begin{cases}
c_k & \text{if } k \in I \\
\hat{c}_k & \text{otherwise}
\end{cases}
$$

The task prediction after intervention is:

$$
\hat{y}_{\text{intervene}} = f(\hat{c}^{(I)})
$$

This simulates a human correcting the model’s internal reasoning.
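
The piecewise definition above translates directly into code. The sketch below reuses the pedestrian example; the task rule `f` and the two-concept layout are hypothetical, and indices are 0-based in the code while the text uses $1, \dots, K$.

```python
def intervene(c_hat, c_true, indices):
    """Return a copy of c_hat with the entries in `indices` (the set I) set to ground truth."""
    chosen = set(indices)
    return [c_true[k] if k in chosen else c_hat[k] for k in range(len(c_hat))]


def f(c):
    """Hypothetical task predictor over [pedestrian_present, light_is_green]."""
    pedestrian, green = c
    return "go" if green == 1 and pedestrian == 0 else "stop"


c_hat = [0, 1]   # model's concepts: no pedestrian, green light
c_true = [1, 1]  # annotation: pedestrian present, green light

y_before = f(c_hat)
y_after = f(intervene(c_hat, c_true, indices=[0]))  # correct the pedestrian concept
y_other = f(intervene(c_hat, c_true, indices=[1]))  # "correct" an already-correct concept
print(y_before, y_after, y_other)  # go stop go
```

Intervening on a concept that was already predicted correctly leaves the output unchanged, which is why the choice of $I$ matters when reporting results.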

---

### 3.3 Measuring Intervention Effectiveness

The standard procedure is:
1. Compute baseline task performance using predicted concepts
2. Perform interventions on selected concepts
3. Recompute task performance
4. Measure the improvement

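Intervention effectiveness quantifies how much task performance improves after correction.
Writing $S(\cdot)$ for the chosen task metric, evaluated against the true labels $y$:

$$
\Delta_I = S(\hat{y}_{\text{intervene}}) - S(\hat{y})
$$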
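
The four steps above can be sketched on a toy dataset. Everything here is assumed for illustration: the task rule `f` (label 1 when at least two of three concepts are active), the concept values, and the choice to intervene on the third concept only.

```python
def f(c):
    """Hypothetical task predictor: label 1 when at least two concepts are active."""
    return int(sum(c) >= 2)


def intervene(c_hat, c_true, indices):
    """Replace the predicted concepts at `indices` (0-based) with ground truth."""
    chosen = set(indices)
    return [c_true[k] if k in chosen else c_hat[k] for k in range(len(c_hat))]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: four samples, three binary concepts each.
c_true = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
c_hat = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
y_true = [1, 0, 1, 0]

# Step 1: baseline task performance from predicted concepts.
baseline = accuracy(y_true, [f(c) for c in c_hat])

# Steps 2-3: intervene on the selected concepts, then recompute performance.
I = [2]
intervened = accuracy(
    y_true, [f(intervene(ch, ct, I)) for ch, ct in zip(c_hat, c_true)]
)

# Step 4: the improvement is the intervention effectiveness for this set I.
gain = intervened - baseline
print(baseline, intervened, gain)  # 0.25 0.75 0.5
```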

#### Interpretation

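- Large improvement: concepts are causally important, and concept errors were limiting task performance
- Small improvement: concepts are weakly used or ignored, or the predicted concepts were already accurate and left little to correct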

#### Important nuance

Interventions may involve:
- Single concepts
- Subsets of concepts
- All concepts

Results are typically averaged across samples and intervention sets.
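
One way to do this averaging, sketched under assumptions, is to sweep the number of intervened concepts and average task accuracy over every subset of that size. The task rule and data are toy placeholders, and exhaustive enumeration is only feasible for small $K$; larger concept sets would need random sampling of subsets.

```python
from itertools import combinations

# Toy data: four samples, K = 3 binary concepts each.
c_true = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
c_hat = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
y_true = [1, 0, 1, 0]
K = 3


def f(c):
    """Hypothetical task predictor: label 1 when at least two concepts are active."""
    return int(sum(c) >= 2)


def accuracy_after(indices):
    """Task accuracy over all samples when the concepts in `indices` are corrected."""
    preds = [
        f([ct[k] if k in indices else ch[k] for k in range(K)])
        for ch, ct in zip(c_hat, c_true)
    ]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Average over all intervention sets of each size, from none (0) to all (K).
curve = {}
for size in range(K + 1):
    subsets = list(combinations(range(K), size))
    curve[size] = sum(accuracy_after(set(s)) for s in subsets) / len(subsets)

print(curve)
```

Size 0 reproduces the baseline and size $K$ gives the accuracy of $f$ on fully corrected concepts, so the resulting curve shows how quickly performance recovers as more concepts are fixed.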

---

## 4. Concept Completeness

Concept completeness evaluates whether the chosen concept set contains sufficient
information to solve the task.

This metric focuses on **concept design**, not model architecture.

---

### 4.1 Motivation

Using concepts restricts the information flow.
If concepts are poorly chosen, critical task information may be lost.

Concept completeness asks:

**How much predictive power is preserved by using only concepts?**

---

### 4.2 Formal Definition

Let:
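- $S_{\text{CBM}}$ be the task performance of the CBM
- $S_{\text{BB}}$ be the task performance of a black-box model trained on raw inputs

Both scores must use the same higher-is-better metric (e.g., accuracy or ROC-AUC) on the
same test set. For error metrics such as mean squared error, the ratio reads the other way around.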

Concept completeness is defined as:

$$
\text{Completeness} =
\frac{S_{\text{CBM}}}{S_{\text{BB}}}
$$
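
A minimal sketch of the ratio, assuming accuracy as the score $S$ and toy predictions for both models (the prediction lists are invented for illustration, not measured results):

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def completeness(score_cbm, score_bb):
    """Ratio S_CBM / S_BB; both scores must be higher-is-better and S_BB positive."""
    if score_bb <= 0:
        raise ValueError("black-box score must be positive")
    return score_cbm / score_bb


# Same test labels for both models (toy values).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_cbm = [1, 0, 1, 0, 0, 1, 1, 0]  # CBM predictions: 6 of 8 correct
y_bb = [1, 0, 1, 1, 0, 1, 1, 0]   # black-box predictions: 7 of 8 correct

s_cbm = accuracy(y_true, y_cbm)
s_bb = accuracy(y_true, y_bb)
ratio = completeness(s_cbm, s_bb)
print(s_cbm, s_bb, round(ratio, 3))  # 0.75 0.875 0.857
```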

---

### 4.3 Interpretation of Completeness

| Value | Interpretation |
|------|---------------|
| $\approx 1.0$ | Concepts are sufficient |
| $< 1.0$ | Concepts miss important information |
| $> 1.0$ | CBM outperforms the black-box model on this evaluation |

#### Important clarification

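Completeness is meant to evaluate the **concept vocabulary**, not:
- Concept prediction accuracy
- Training procedure
- Task model capacity

As defined above, however, $S_{\text{CBM}}$ also depends on these factors. Low completeness
therefore indicates missing or insufficient concepts only once they are ruled out, for
example by evaluating $f$ on the ground-truth concepts $c$ instead of $\hat{c}$.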

---

## 5. Summary of CBM Metrics

| Metric | What it measures | Why it matters |
|------|------------------|---------------|
| Task performance | End-task accuracy | Baseline usefulness |
| Concept performance | Concept prediction quality | Reliability of explanations |
| Intervention effectiveness | Causal impact of concepts | True interpretability |
| Concept completeness | Information sufficiency | Concept design quality |

---

## 6. Final Takeaways

- Accuracy alone is insufficient for CBM evaluation
- High concept accuracy does not guarantee that concepts are used meaningfully
- Intervention effectiveness tests causal reliance
- Concept completeness evaluates concept sufficiency

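A CBM can be called interpretable only if **all metrics align**: strong task and concept
performance, task predictions that respond to interventions, and completeness close to 1.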
