# Mathematical Formulation of CBM Evaluation Metrics

This section presents the mathematical formulation of the main metrics used to
evaluate Concept Bottleneck Models (CBMs).

The goal is to understand:
- The exact quantities being measured
- The probabilistic meaning of each metric
- How these metrics relate to causality, information flow, and interpretability

---

## 1. Notation and Probabilistic Setup

We assume access to a dataset:

$$
\mathcal{D} = \{(x_i, c_i, y_i)\}_{i=1}^N
$$

Where:
- $x \in \mathcal{X}$ is the input
- $c \in \mathcal{C}^K$ is a vector of $K$ ground-truth concepts
- $y \in \mathcal{Y}$ is the task label

A CBM is composed of two functions:

$$
\hat{c} = g(x)
$$

$$
\hat{y} = f(\hat{c})
$$

Probabilistically, the predictive distribution marginalizes over concepts, assuming $y$ is conditionally independent of $x$ given $c$:

$$
p(y \mid x) = \sum_c p(y \mid c) \, p(c \mid x)
$$

This factorization is the core structural assumption behind CBMs.
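
The marginalization above can be sketched numerically. The example below is a minimal illustration, not a reference implementation: it assumes $K$ binary concepts that are independent given $x$ (an assumption added here so that $p(c \mid x)$ is a product of Bernoulli terms), and the names `marginal_task_probs` and `toy_head` are hypothetical.

```python
import numpy as np
from itertools import product

def marginal_task_probs(p_c_given_x, p_y_given_c):
    """Compute p(y | x) = sum_c p(y | c) p(c | x) by enumerating all 2^K concept vectors.

    p_c_given_x: length-K sequence of P(c_k = 1 | x).
    p_y_given_c: callable mapping a binary concept vector to a distribution over y.
    Assumption (not from the text): concepts are independent given x.
    """
    p = np.asarray(p_c_given_x, dtype=float)
    total = 0.0
    for bits in product([0, 1], repeat=len(p)):
        c = np.array(bits)
        # Probability of this concept configuration under independent Bernoullis.
        weight = np.prod(np.where(c == 1, p, 1.0 - p))
        total = total + weight * np.asarray(p_y_given_c(c), dtype=float)
    return total

def toy_head(c):
    """Hypothetical label predictor: P(y = 1 | c) is the fraction of active concepts."""
    p1 = c.mean()
    return [1.0 - p1, p1]

p_y = marginal_task_probs([0.9, 0.2], toy_head)
print(p_y)  # approximately [0.45, 0.55]
```

Enumeration costs $2^K$ evaluations, so this is only practical for small $K$; in practice the sum is often replaced by a point estimate $\hat{c} = g(x)$, as in the two-function view above.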

---

## 2. Task Predictive Performance

### 2.1 Mathematical Definition

Task predictive performance measures the expected correctness of task predictions:

$$
\mathcal{L}_{\text{task}} =
\mathbb{E}_{(x,y) \sim \mathcal{D}}
\left[
\ell\big(y, f(g(x))\big)
\right]
$$

Where:
- $\ell(\cdot, \cdot)$ is a task loss function, such as:
  - 0–1 loss (error rate, i.e. one minus accuracy)
  - cross-entropy
  - squared error

For accuracy, this becomes:

$$
\text{Acc}_{\text{task}} =
\mathbb{E}_{(x,y)}
\left[
\mathbf{1}\{f(g(x)) = y\}
\right]
$$
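
As a rough sketch, the empirical versions of these expectations are plain sample means. The function names `task_accuracy` and `task_cross_entropy` are hypothetical, and the inputs are assumed to be arrays of integer class labels and predicted class probabilities.

```python
import numpy as np

def task_accuracy(y_true, y_pred):
    """Empirical mean of the indicator 1{f(g(x)) = y}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def task_cross_entropy(y_true, probs, eps=1e-12):
    """Empirical mean of -log p(y | x), with probs of shape (N, num_classes)."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(y_true)), y_true]
    # Clip to avoid log(0) when a class receives zero probability.
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

acc = task_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(acc)  # 0.75
```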

---

### 2.2 Interpretation

This metric estimates:

$$
\mathbb{E}_{x}
\left[
p(y = f(g(x)) \mid x)
\right]
$$

It measures whether the full CBM pipeline correctly maps inputs to task labels.

Importantly, this metric does **not** reveal:
- Whether concepts are meaningful
- Whether concepts are causally used
- Whether the model is interpretable

---

## 3. Concept Predictive Performance

### 3.1 Mathematical Definition

Concept predictive performance evaluates how accurately concepts are inferred from inputs.

For each concept $k$:

$$
\mathcal{L}_{\text{concept}}^{(k)} =
\mathbb{E}_{(x,c_k)}
\left[
\ell\big(c_k, g_k(x)\big)
\right]
$$

For binary concepts, ROC-AUC estimates:

$$
\mathbb{P}
\left(
g_k(x^+) > g_k(x^-)
\right)
$$

where:
- $x^+$ is sampled from $c_k = 1$
- $x^-$ is sampled from $c_k = 0$

Mean concept performance is:

$$
\overline{\mathcal{L}}_{\text{concept}} =
\frac{1}{K}
\sum_{k=1}^K
\mathcal{L}_{\text{concept}}^{(k)}
$$
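
The pairwise probability that ROC-AUC estimates can be computed directly by comparing every positive score with every negative score. The sketch below assumes binary concept labels and real-valued concept scores; ties are counted as one half, which is the usual convention but goes slightly beyond the strict inequality written above. It averages per-concept AUC rather than a loss, and the names `pairwise_auc` and `mean_concept_auc` are hypothetical.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """Estimate P(score(x+) > score(x-)) over all positive/negative pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined when one class is missing
    diff = pos[:, None] - neg[None, :]
    # Ties contribute 0.5 (assumed convention).
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def mean_concept_auc(concept_scores, concept_labels):
    """Average the per-concept AUC over the K columns, skipping undefined ones."""
    concept_scores = np.asarray(concept_scores, dtype=float)
    concept_labels = np.asarray(concept_labels)
    aucs = [pairwise_auc(concept_scores[:, k], concept_labels[:, k])
            for k in range(concept_scores.shape[1])]
    return float(np.nanmean(aucs)), aucs

auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc)  # 0.75
```

The pairwise comparison uses $O(N^+ N^-)$ memory, which is fine for illustration; a rank-based computation is preferable on large datasets.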

---

### 3.2 Interpretation

With 0–1 loss on thresholded predictions $\hat{c}_k$, concept performance estimates, for each concept $k$:

$$
\mathbb{E}_{x}
\left[
p(c_k = \hat{c}_k \mid x)
\right]
$$

It measures how reliably the model predicts human-interpretable attributes.

However, high concept accuracy does **not** imply:

$$
p(y \mid \hat{c}) \approx p(y \mid c)
$$

That is, accurate concepts do not guarantee correct task reasoning.

---

## 4. Intervention Effectiveness

Intervention effectiveness measures the **causal influence** of concepts on predictions.

This is the most CBM-specific metric.

---

### 4.1 Intervention Operator

Given a set of concept indices $I \subseteq \{1, \dots, K\}$, define the intervention:

$$
\hat{c}^{(I)}_k =
\begin{cases}
c_k & \text{if } k \in I \\
\hat{c}_k & \text{otherwise}
\end{cases}
$$

This corresponds to a **do-intervention** on the model's concept layer, applied to each $k \in I$:

$$
\text{do}(c_k = c_k^{\text{true}})
$$
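
A minimal sketch of the intervention operator, assuming concept predictions and ground-truth concepts are stored as arrays whose last axis indexes the $K$ concepts. The function name `intervene` is hypothetical.

```python
import numpy as np

def intervene(c_hat, c_true, idx):
    """Return a copy of c_hat with the concepts in idx replaced by ground truth."""
    c_hat = np.asarray(c_hat, dtype=float)
    c_true = np.asarray(c_true, dtype=float)
    idx = np.asarray(list(idx), dtype=int)
    out = c_hat.copy()          # leave the original predictions untouched
    out[..., idx] = c_true[..., idx]
    return out

c_hat = np.array([0.2, 0.7, 0.4])
c_true = np.array([1.0, 0.0, 1.0])
c_int = intervene(c_hat, c_true, [0, 2])
print(c_int)  # [1.  0.7 1. ]
```

Copying before overwriting keeps the un-intervened predictions available, which matters when the same $\hat{c}$ is reused across several intervention sets.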

---

### 4.2 Post-Intervention Prediction

The task prediction after intervention is:

$$
\hat{y}^{(I)} = f(\hat{c}^{(I)})
$$

The post-intervention task loss is:

$$
\mathcal{L}_{\text{task}}^{(I)} =
\mathbb{E}
\left[
\ell\big(y, f(\hat{c}^{(I)})\big)
\right]
$$

---

### 4.3 Intervention Effectiveness Score

Intervention effectiveness is defined as the expected improvement:

$$
\Delta^{(I)} =
\mathcal{L}_{\text{task}} -
\mathcal{L}_{\text{task}}^{(I)}
$$

Averaging across intervention sets:

$$
\text{IE} =
\mathbb{E}_{I}
\left[
\Delta^{(I)}
\right]
$$
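
The score can be estimated by evaluating the label predictor before and after each intervention. The sketch below uses 0-1 loss and a toy label predictor that thresholds the first concept; both are assumptions made for illustration, and `intervention_effectiveness` and `toy_f` are hypothetical names.

```python
import numpy as np

def intervention_effectiveness(f, c_hat, c_true, y, intervention_sets):
    """Average reduction in 0-1 loss over a list of intervention index sets."""
    c_hat = np.asarray(c_hat, dtype=float)
    c_true = np.asarray(c_true, dtype=float)
    y = np.asarray(y)

    def error_rate(c):
        return float(np.mean(f(c) != y))

    base = error_rate(c_hat)  # task loss without intervention
    deltas = []
    for idx in intervention_sets:
        c_int = c_hat.copy()
        c_int[:, idx] = c_true[:, idx]  # replace selected concepts with ground truth
        deltas.append(base - error_rate(c_int))
    return float(np.mean(deltas)), deltas

def toy_f(c):
    """Hypothetical label predictor that only reads concept 0."""
    return (c[:, 0] > 0.5).astype(int)

c_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
c_hat = np.array([[0.3, 0.1], [0.2, 0.9], [0.8, 0.7], [0.6, 0.2]])
y = np.array([1, 0, 1, 0])

ie, deltas = intervention_effectiveness(toy_f, c_hat, c_true, y, [[0], [1], [0, 1]])
print(deltas)  # [0.5, 0.0, 0.5]
```

In this toy setup, correcting concept 0 removes all task errors, while correcting concept 1 changes nothing because the predictor never reads it.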

---

### 4.4 Interpretation

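This metric estimates a **causal effect** of the concept layer on the model's task loss:

$$
\mathbb{E}
\left[
\ell\big(y, f(\hat{c})\big)
\right]
-
\mathbb{E}
\left[
\ell\big(y, f(\hat{c})\big) \mid \text{do}(\hat{c}_I = c_I)
\right]
$$

- Large IE: correcting concepts improves task predictions, so predictions causally depend on concepts
- Small IE: concepts are weak or ignored, or the predicted concepts were already accurate and left little to correct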

---

## 5. Concept Completeness

Concept completeness measures how much task-relevant information is preserved
when predictions are restricted to concepts.

---

### 5.1 Formal Definition

Let $S$ denote a task score for which higher is better (accuracy is used here):
- $S_{\text{CBM}} = \mathbb{E}[\mathbf{1}\{f(g(x)) = y\}]$
- $S_{\text{BB}} = \mathbb{E}[\mathbf{1}\{h(x) = y\}]$

where $h(x)$ is a black-box model trained directly on inputs.

Concept completeness is:

$$
\text{Completeness} =
\frac{S_{\text{CBM}}}{S_{\text{BB}}}
$$
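
A minimal sketch of the ratio, taking both scores to be accuracies. This is an assumption: with a loss in place of accuracy the ratio reads in the opposite direction. The function name `completeness_ratio` is hypothetical.

```python
import numpy as np

def completeness_ratio(y, y_cbm, y_bb):
    """Ratio of CBM accuracy to black-box accuracy on the same examples."""
    y = np.asarray(y)
    s_cbm = float(np.mean(np.asarray(y_cbm) == y))
    s_bb = float(np.mean(np.asarray(y_bb) == y))
    if s_bb == 0.0:
        raise ValueError("black-box accuracy is zero; the ratio is undefined")
    return s_cbm / s_bb

y = [0, 1, 1, 0]
ratio = completeness_ratio(y, y_cbm=[0, 1, 0, 1], y_bb=[0, 1, 1, 0])
print(ratio)  # 0.5
```

A value near 1 suggests the bottleneck loses little relative to the black-box model; values can exceed 1 if the CBM happens to outperform it.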

---

### 5.2 Information-Theoretic Interpretation

Completeness can be read as a proxy for the ratio:

$$
\frac{I(c; y)}{I(x; y)}
$$

where $I(\cdot;\cdot)$ denotes mutual information.

A low score indicates that the concept bottleneck discards task-relevant information.
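
For discrete variables, the mutual information ratio can be illustrated with a plug-in estimate from empirical counts. The toy data below is invented for illustration: the input $x$ takes four values, the label is its parity, and two hypothetical concept encodings either keep or discard that parity. The function name `mutual_information` is also hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete 1-D sequences."""
    _, ai = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, bi = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)      # empirical joint counts
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

x = np.tile(np.arange(4), 25)   # toy input with four equally likely values
y = x % 2                       # label is the parity of x
c_good = x % 2                  # concept that keeps the parity
c_bad = x // 2                  # concept that discards the parity

i_xy = mutual_information(x, y)
ratio_good = mutual_information(c_good, y) / i_xy
ratio_bad = mutual_information(c_bad, y) / i_xy
```

Here `ratio_good` is 1 up to floating-point error and `ratio_bad` is 0: the second encoding is a valid function of $x$ but carries no information about $y$. Plug-in estimates are biased on small samples and do not extend directly to continuous inputs, so this is only a conceptual check.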

---

## 6. Summary of Mathematical Roles

| Metric | Mathematical role |
|------|-------------------|
| Task performance | Estimates agreement between $f(g(x))$ and $y$ |
| Concept performance | Estimates agreement between $g(x)$ and $c$ |
| Intervention effectiveness | Estimates causal effect of $c$ on $y$ |
| Concept completeness | Estimates information sufficiency |

---

## 7. Final Perspective

CBM metrics decompose prediction quality into:
- Statistical accuracy
- Concept reliability
- Causal influence
- Information preservation

Only when all four align can a CBM be considered both accurate and interpretable.