
# Concept Bottleneck Models: Probabilistic Inference View

## 1. From Causal Structure to Inference

In the causal view, we assumed the structure:

$$
X \rightarrow C \rightarrow Y
$$

This notebook focuses on **how inference is performed** in Concept Bottleneck Models (CBMs), and what approximations are made in practice.



## 2. Exact Probabilistic Inference

Given the CBM factorization:

$$
p(Y \mid X) = \sum_C p(Y \mid C) p(C \mid X)
$$

This equation requires summing over **all possible concept configurations**.

If concepts are binary and there are $K$ concepts:

$$
|\mathcal{C}| = 2^K
$$

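Exact inference quickly becomes intractable: the sum has one term per configuration, so its cost doubles with every added concept. With $K = 30$ it already has more than $10^9$ terms.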
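For a handful of concepts the sum can still be evaluated by brute force, which makes the $2^K$ cost concrete. The sketch below is illustrative only: it assumes $p(C \mid X)$ factorizes into independent Bernoulli variables (an assumption not made in the text), and `toy_predictor` with its weights is a hypothetical stand-in for $p(Y = 1 \mid C)$.

```python
import itertools
import numpy as np

def exact_marginal(concept_probs, predictor):
    """Sum p(Y=1 | C=c) * p(C=c | X) over all 2^K binary configurations.

    Assumes p(C | X) is a product of independent Bernoullis
    (an illustrative assumption, not part of the CBM definition).
    """
    concept_probs = np.asarray(concept_probs, dtype=float)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(concept_probs)):
        c = np.array(bits, dtype=float)
        # p(C = c | X) under the independence assumption
        p_c = np.prod(np.where(c == 1, concept_probs, 1.0 - concept_probs))
        total += predictor(c) * p_c
    return total

def toy_predictor(c):
    # Hypothetical p(Y=1 | C): logistic function of a fixed linear score
    w = np.array([2.0, -1.0, 0.5])
    return 1.0 / (1.0 + np.exp(-(c @ w - 0.5)))

probs = [0.9, 0.2, 0.6]        # assumed encoder outputs p(C_k = 1 | X)
p_y = exact_marginal(probs, toy_predictor)
n_terms = 2 ** len(probs)      # number of configurations visited by the loop
```

The loop body runs `n_terms` times, so this approach is only usable for small $K$.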



## 3. Concept Posterior Distribution

The concept encoder models the posterior:

$$
p(C \mid X)
$$

In practice, this is approximated by a neural network:

$$
\hat{C} = g_\theta(X)
$$

This network performs **amortized inference**: instead of solving a new inference problem for each input, it learns a global mapping from $X$ to $C$.



## 4. Deterministic Approximation

Rather than marginalizing over all $C$, CBMs use a point estimate:

$$
C \approx \hat{C}
$$

Thus:

$$
p(Y \mid X) \approx p(Y \mid C = \hat{C})
$$

This leads to the standard CBM prediction:

$$
\hat{Y} = f_\phi(g_\theta(X))
$$

This approximation trades exact probabilistic reasoning for computational efficiency: a single forward pass through $f_\phi$ replaces a sum over $2^K$ configurations.
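The composition $\hat{Y} = f_\phi(g_\theta(X))$ can be sketched with two small functions. This is a minimal NumPy illustration, not a reference implementation: the single-layer sigmoid models, the random parameters `W_g`, `b_g`, `w_f`, `b_f`, and the sizes `D` and `K` are all assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
D, K = 5, 3  # input dimension and number of concepts (arbitrary toy sizes)

# Hypothetical parameters: theta = (W_g, b_g), phi = (w_f, b_f)
W_g, b_g = rng.normal(size=(D, K)), np.zeros(K)
w_f, b_f = rng.normal(size=K), 0.0

def g_theta(x):
    """Concept encoder: one predicted probability per concept."""
    return sigmoid(x @ W_g + b_g)

def f_phi(c):
    """Label predictor: p(Y=1 | C=c) for a binary label."""
    return sigmoid(c @ w_f + b_f)

x = rng.normal(size=(4, D))  # batch of 4 toy inputs
c_hat = g_theta(x)           # point estimate of the concepts, shape (4, K)
y_hat = f_phi(c_hat)         # prediction from the point estimate, shape (4,)
```

The predictor only ever sees `c_hat`, so all information about $X$ must pass through the $K$ concept values.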



## 5. Stochastic vs Deterministic Concepts

Two inference strategies exist:

### Deterministic CBMs
- Use $\hat{C} = \mathbb{E}[C \mid X]$, which for binary concepts is the vector of predicted probabilities $p(C_k = 1 \mid X)$
- Simple and efficient
- Most common in practice

### Stochastic CBMs
- Sample $C \sim p(C \mid X)$
- Approximate the sum via Monte Carlo
- More faithful to probabilistic modeling but harder to train
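The two strategies can be compared on a toy problem. The sketch below assumes independent Bernoulli concepts and a hypothetical logistic `predictor`; the weights, sample count, and concept probabilities are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predictor(c):
    # Hypothetical p(Y=1 | C); accepts one concept vector or a batch of them
    w = np.array([4.0, -3.0, 2.0])
    return sigmoid(c @ w - 1.0)

def monte_carlo_prediction(concept_probs, predictor, n_samples, rng):
    """Estimate sum_C p(Y=1 | C) p(C | X) by sampling C ~ p(C | X)."""
    concept_probs = np.asarray(concept_probs, dtype=float)
    # One binary concept vector per row, sampled independently per concept
    draws = rng.random((n_samples, concept_probs.size)) < concept_probs
    return predictor(draws.astype(float)).mean()

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.5, 0.5])  # maximally uncertain concepts

p_mc = monte_carlo_prediction(probs, predictor, 20_000, rng)  # stochastic
p_det = predictor(probs)                                      # deterministic
```

With uncertain concepts and a nonlinear predictor, `p_mc` and `p_det` generally disagree; the Monte Carlo estimate converges to the exact marginal as `n_samples` grows, while the deterministic value does not.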



## 6. Relation to Variational Inference

CBMs resemble variational models:

- Concepts act as latent variables
- The encoder approximates $p(C \mid X)$
- The predictor approximates $p(Y \mid C)$

However, unlike VAEs:
- Concepts are **observed and supervised**
- There is no KL regularization term
- Latent variables are semantically meaningful
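The absence of a KL term is visible in the training objective. One common way to train a CBM jointly is to minimize a weighted sum of a label loss and a supervised concept loss; the symbols $\mathcal{L}_Y$, $\mathcal{L}_C$, $\lambda$, $N$, and the per-example notation $(x_i, c_i, y_i)$ are introduced here for illustration:

$$
\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \Big[ \mathcal{L}_Y\big(f_\phi(g_\theta(x_i)),\, y_i\big) + \lambda \sum_{k=1}^{K} \mathcal{L}_C\big(g_\theta(x_i)_k,\, c_{i,k}\big) \Big]
$$

The second term compares each predicted concept with its observed label $c_{i,k}$. It plays the role that the KL term plays in a VAE, constraining the intermediate representation, but it does so through direct supervision rather than through a prior.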



## 7. Bias Introduced by Point Estimates

Using $\hat{C}$ instead of full marginalization introduces bias, because in general:

$$
p(Y \mid X) \neq p(Y \mid C = \hat{C})
$$

This bias increases when:
- Concept uncertainty is high
- Concepts are poorly predicted
- The predictor $f_\phi$ is strongly nonlinear in $C$

This explains why CBMs are sensitive to concept quality.
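A single binary concept is enough to see where the bias comes from. Write $q = p(C = 1 \mid X)$ (notation introduced here) and read $f_\phi(c)$ as $p(Y = 1 \mid C = c)$. Then the exact marginal and the point-estimate prediction are:

$$
p(Y = 1 \mid X) = q \, f_\phi(1) + (1 - q) \, f_\phi(0), \qquad p(Y = 1 \mid C = \hat{C}) = f_\phi(q)
$$

If $f_\phi$ is affine in $C$, then $f_\phi(q) = q \, f_\phi(1) + (1 - q) \, f_\phi(0)$ and the two coincide. They also coincide when $q \in \{0, 1\}$, that is, when the concept is certain. For a nonlinear $f_\phi$ and $0 < q < 1$, the two quantities differ in general, and the gap is the gap between $\mathbb{E}[f_\phi(C) \mid X]$ and $f_\phi(\mathbb{E}[C \mid X])$.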



## 8. When Exact Inference Matters

Exact or stochastic inference is important when:
- Concepts are uncertain
- Downstream decisions are sensitive to small changes
- Safety-critical applications require calibrated uncertainty

In such cases, stochastic CBMs or Bayesian variants are preferable.



## 9. Key Takeaways

- CBMs rely on a probabilistic factorization
- Exact inference is intractable for many concepts
- Practical CBMs use deterministic amortized inference
- This introduces bias but improves scalability
- Concept uncertainty plays a central role in performance
