I. Analytical & Structural Simplifications

(Reduce complexity by exploiting structure)

1. Factorization & Conditional Independence

Bayesian networks  
Markov random fields  
Graphical model decomposition  
Chain rule factorization  

Key idea (for a directed acyclic graph):

$$
p(x_1, \ldots, x_n) = \prod_i p\left(x_i \mid \text{parents}(x_i)\right)
$$
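
As a minimal sketch of the factorization above, the joint of a small Bayesian network can be evaluated as a product of local conditionals. The three-node network and all of its probabilities are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
# All numbers are illustrative assumptions, not estimates from data.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}
# P(wet = True | rain, sprinkler)
p_wet_given = {
    (True, True): 0.99, (True, False): 0.8,
    (False, True): 0.9, (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """p(rain, sprinkler, wet) as a product of each node given its parents."""
    p_w = p_wet_given[(rain, sprinkler)]
    return (p_rain[rain]
            * p_sprinkler_given_rain[rain][sprinkler]
            * (p_w if wet else 1.0 - p_w))

# The factorized joint still sums to one over all 2^3 configurations.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
```

The full joint table would need 7 free parameters; the factorized form here needs 1 + 2 + 4, and the gap grows quickly with the number of variables.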

2. Sufficient Statistics

Exponential family models  
Dimension reduction without loss of information  

$$
p(x \mid \theta) = h(x)\exp\left(\eta(\theta)^{\top} T(x) - A(\theta)\right)
$$
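
A worked instance may help here. Writing the Bernoulli distribution in the form above (with $x \in \{0, 1\}$ and $\theta \in (0, 1)$):

$$
p(x \mid \theta) = \theta^{x}(1-\theta)^{1-x}
= \exp\left(x \log\frac{\theta}{1-\theta} + \log(1-\theta)\right)
$$

so $h(x) = 1$, $\eta(\theta) = \log\frac{\theta}{1-\theta}$, $T(x) = x$, and $A(\theta) = -\log(1-\theta)$. For $n$ i.i.d. observations the likelihood depends on the data only through $\sum_{i=1}^{n} x_i$, so a single count replaces the whole sample.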

3. Conjugate Priors

Closed-form posteriors  
Avoid numerical integration  

Examples:  
Beta–Binomial  
Dirichlet–Multinomial  
Gaussian–Gaussian  
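
The Beta–Binomial case can be sketched in a few lines: the posterior is obtained by adding counts to the prior parameters, with no integration. The function names and the numbers below are illustrative assumptions.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial counts."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Illustrative data: uniform Beta(1, 1) prior, 7 successes in 10 trials.
post_alpha, post_beta = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
posterior_mean = beta_mean(post_alpha, post_beta)  # Beta(8, 4) has mean 8 / 12
```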

4. Symmetry & Invariance Exploitation

Exchangeability  
Stationarity  
Rotational / translational invariance  

Example: de Finetti’s theorem  

---

II. Approximation of Distributions

(Replace the true distribution with a tractable surrogate)

5. Moment-Based Approximations

Mean-field approximations  
Moment closure methods  
Gaussian approximations via mean and covariance  

6. Variational Inference (VI)

Replace the intractable posterior $p(z \mid x)$ with a tractable $q(z)$ from a chosen family $\mathcal{Q}$.

Optimization objective:

$$
\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
$$
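
The KL objective above still contains the intractable posterior, so in practice it is optimized through an equivalent bound. The log evidence decomposes as

$$
\log p(x) = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
$$

Since $\log p(x)$ does not depend on $q$, minimizing the KL term is the same as maximizing the ELBO, which needs only the joint $p(x, z)$.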

Variants:  
Mean-field VI  
Structured VI  
Black-box VI  
Stochastic VI  
Amortized VI (VAEs)  

7. Laplace Approximation

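Second-order Taylor expansion of the log posterior around the MAP estimate $\hat{\theta}$, with $H$ the Hessian of the negative log posterior at $\hat{\theta}$: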

$$
p(\theta \mid x) \approx \mathcal{N}\left(\hat{\theta}, H^{-1}\right)
$$
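
A one-dimensional sketch, assuming a Beta(a, b) density stands in for the posterior (a choice made here only because its mode and curvature are available in closed form):

```python
import math

def laplace_beta(a, b):
    """Laplace approximation N(mode, 1/H) to a Beta(a, b) density (needs a, b > 1)."""
    # Mode of log p(t) = (a - 1) log t + (b - 1) log(1 - t) + const.
    mode = (a - 1.0) / (a + b - 2.0)
    # H is the negative second derivative of the log density at the mode.
    hessian = (a - 1.0) / mode**2 + (b - 1.0) / (1.0 - mode) ** 2
    return mode, 1.0 / hessian

# Illustrative parameters: Beta(8, 4).
mean, var = laplace_beta(8.0, 4.0)
std = math.sqrt(var)
```

The approximation is centered at the mode (0.7), not at the exact mean (2/3), which is a typical source of error for skewed posteriors.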

8. Expectation Propagation (EP)

Local moment matching  
Replaces factors with Gaussian approximations  

9. Saddle-Point & Asymptotic Approximations

Large-sample limits  
Stationary phase methods  
WKB approximations  

---

III. Sampling-Based Methods

(Avoid computing distributions explicitly)

10. Monte Carlo Integration

Estimate expectations directly from i.i.d. samples $x_i \sim p$:

$$
\mathbb{E}[f(X)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)
$$
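
The estimator above is a sample average. As a minimal sketch, assuming a standard normal $X$ and $f(x) = x^2$ (so the true value is 1):

```python
import random

def mc_expectation(f, sampler, n):
    """Average f over n independent draws from sampler()."""
    return sum(f(sampler()) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = mc_expectation(lambda x: x * x, lambda: rng.gauss(0.0, 1.0), 100_000)
```

The error shrinks at rate $1/\sqrt{N}$ regardless of dimension, which is the main reason to prefer this over grid-based integration in high dimensions.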

11. Markov Chain Monte Carlo (MCMC)

Require the target density only up to its normalization constant.

Key families:  
Metropolis–Hastings  
Gibbs sampling  
Hamiltonian Monte Carlo (HMC)  
Langevin dynamics  
Slice sampling  
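
A minimal random-walk Metropolis–Hastings sketch shows why the normalization constant never appears. The target (an unnormalized standard normal), the step size, and the burn-in length are all assumptions made for illustration.

```python
import math
import random

def log_unnormalized(x):
    # Target known only up to a constant: exp(-x^2 / 2).
    return -0.5 * x * x

def metropolis_hastings(log_target, x0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    samples, x, log_px = [], x0, log_target(x0)
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_pp = log_target(proposal)
        # Only the ratio of unnormalized densities is needed,
        # so the normalizing constant cancels.
        accept_prob = math.exp(min(0.0, log_pp - log_px))
        if rng.random() < accept_prob:
            x, log_px = proposal, log_pp
        samples.append(x)
    return samples

rng = random.Random(0)
chain = metropolis_hastings(log_unnormalized, 0.0, 50_000, 1.0, rng)[1_000:]  # drop burn-in
chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
```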

12. Sequential Monte Carlo (SMC)

Particle filters  
Annealed importance sampling  
Tempered transitions  

13. Importance Sampling

Reweight samples from an easier proposal $q(x)$, which must be positive wherever $f(x)\,p(x) \neq 0$:

$$
\mathbb{E}_p[f] = \mathbb{E}_q\left[f(x)\frac{p(x)}{q(x)}\right]
$$
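
The identity above can be sketched as follows, assuming a standard normal target $p$, a wider normal proposal $q = \mathcal{N}(0, 2^2)$, and $f(x) = x^2$; these choices are illustrative only.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, n, rng):
    """Estimate E_p[f] with p = N(0, 1) using draws from q = N(0, 2^2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)                                  # draw from q
        weight = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # p(x) / q(x)
        total += f(x) * weight
    return total / n

is_estimate = importance_estimate(lambda x: x * x, 100_000, random.Random(0))
```

A proposal with heavier tails than the target keeps the weights bounded; a proposal that is too narrow can make the estimator's variance very large or infinite.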

14. Pseudo-Marginal Methods

Use unbiased estimators of likelihoods  
Enables MCMC without exact likelihoods  

---

IV. Transformational Techniques

(Change the problem representation)

15. Change of Variables & Normalizing Flows

Invertible transformations:

$$
z = f(x), \quad p_X(x) = p_Z\big(f(x)\big)\left|\det J_f(x)\right|
$$

Examples:  
RealNVP  
Glow  
Neural spline flows  
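
The change-of-variables formula can be checked on the simplest possible flow, an affine map in one dimension. This is a sketch under the assumption of a standard normal base density; real flows such as those listed above stack many learned invertible layers.

```python
import math

def standard_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def affine_flow_logpdf(x, mu, sigma):
    """log p_X(x) for z = f(x) = (x - mu) / sigma with base density N(0, 1)."""
    z = (x - mu) / sigma
    log_abs_det_jacobian = -math.log(abs(sigma))  # df/dx = 1 / sigma
    return standard_normal_logpdf(z) + log_abs_det_jacobian

def normal_logpdf(x, mu, sigma):
    """Reference: log density of N(mu, sigma^2), written out directly."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))

# The flow density and the directly written density agree.
difference = affine_flow_logpdf(1.3, 0.5, 2.0) - normal_logpdf(1.3, 0.5, 2.0)
```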

16. Data Augmentation

Introduce latent variables to simplify conditionals.

Examples:  
EM algorithm  
Polya–Gamma augmentation  
Auxiliary variable methods  

17. Latent Variable Models

Mixture models  
Hidden Markov Models  
State-space models  

---

V. Optimization-Driven Surrogates

(Convert probabilistic inference into optimization)

18. MAP Estimation

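The evidence $p(x)$ does not depend on $\theta$, so it can be dropped:

$$
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} p(x \mid \theta)\,p(\theta)
$$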

19. Penalized Likelihood & Regularization

Lasso  
Ridge  
Elastic net  

Bayesian interpretation: implicit priors  
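
To make the implicit prior concrete, take logs of the MAP objective and assume a zero-mean Gaussian prior $p(\theta) = \mathcal{N}(0, \tau^2 I)$:

$$
\hat{\theta}_{\mathrm{MAP}}
= \arg\max_{\theta}\big[\log p(x \mid \theta) + \log p(\theta)\big]
= \arg\min_{\theta}\left[-\log p(x \mid \theta) + \frac{1}{2\tau^2}\|\theta\|_2^2\right]
$$

This is the ridge penalty with $\lambda = 1/(2\tau^2)$. Replacing the Gaussian with a Laplace prior $p(\theta) \propto \exp(-\|\theta\|_1 / b)$ gives the lasso penalty $\frac{1}{b}\|\theta\|_1$ in the same way.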

20. Score Matching & Contrastive Divergence

Avoid partition function computation.

Used in:  
Energy-based models  
Restricted Boltzmann Machines  

---

VI. Information-Theoretic Methods

(Replace probability with divergence minimization)

21. Entropy Maximization

MaxEnt principle  
Leads to exponential family forms  

22. f-Divergence Minimization

KL  
Jensen–Shannon  
Wasserstein distance (an optimal-transport metric, not an f-divergence, but used the same way)  

Used in:  
GANs  
Variational objectives  

---

VII. Stochastic Differential & Continuous Limits

(Model distributions via dynamics)

23. Stochastic Differential Equations (SDEs)

Langevin diffusion  
Fokker–Planck equations  

Langevin SDE whose stationary distribution is $p$:

$$
dX_t = \nabla \log p(X_t)\,dt + \sqrt{2}\,dW_t
$$
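
Discretizing the SDE above with an Euler–Maruyama step gives the unadjusted Langevin algorithm. The sketch below assumes a standard normal target, so $\nabla \log p(x) = -x$; the step size and run length are illustrative.

```python
import math
import random

def grad_log_p(x):
    # Score of a standard normal target: d/dx log p(x) = -x.
    return -x

def unadjusted_langevin(score, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of dX = score(X) dt + sqrt(2) dW."""
    x, path = x0, []
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = unadjusted_langevin(grad_log_p, 0.0, 0.05, 200_000, random.Random(0))
ula_mean = sum(path) / len(path)
ula_var = sum((x - ula_mean) ** 2 for x in path) / len(path)
```

Without a Metropolis correction the discretization introduces a small step-dependent bias in the stationary distribution, so the sample variance lands near, not exactly at, 1.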

24. Diffusion Models

Score-based generative models  
Reverse-time SDEs  

---

VIII. Discretization & Numerical Schemes

(Approximate continuous objects numerically)

25. Quadrature & Numerical Integration

Gaussian quadrature  
Sparse grids  
Monte Carlo quadrature  
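
In one dimension, even a basic rule recovers a normalizing constant accurately. The sketch below uses the midpoint rule (a simpler stand-in for the Gaussian quadrature listed above) on the unnormalized density $e^{-x^2/2}$, whose exact integral is $\sqrt{2\pi}$; the truncation to $[-8, 8]$ is an assumption.

```python
import math

def midpoint_rule(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal-width cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Normalizing constant of exp(-x^2 / 2), truncated to [-8, 8].
Z = midpoint_rule(lambda x: math.exp(-0.5 * x * x), -8.0, 8.0, 1_000)
exact = math.sqrt(2.0 * math.pi)
```

The same grid in $d$ dimensions needs $1000^d$ evaluations, which is where sparse grids and Monte Carlo quadrature take over.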

26. Grid-Based & Histogram Methods

Curse of dimensionality limits applicability  

---

IX. Model Reduction & Coarse-Graining

(Reduce state space)

27. State Aggregation

Lumpability in Markov chains  
Reduced order models  

28. Projection Methods

Principal component projections  
Reduced sufficient statistics  

---

X. Hybrid & Modern AI-Driven Approaches

(Learn the distribution implicitly)

29. Implicit Generative Models

GANs  
Energy-based neural models  

No normalized likelihood needed.  

30. Amortized Inference

Neural networks learn inference maps  
Used in VAEs and simulators  

31. Likelihood-Free Inference (ABC)

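Approximate Bayesian Computation: keep parameter draws whose simulated data $x$ fall within $\varepsilon$ of the observed data $x^\ast$ under summary statistic $S$ and distance $\rho$: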

$$
p\big(\theta \mid \rho(S(x), S(x^\ast)) < \varepsilon\big)
$$
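
A rejection-ABC sketch under stated assumptions: the simulator draws $n$ values from $\mathcal{N}(\theta, 1)$, the summary $S$ is the sample mean, $\rho$ is absolute difference, and the prior is uniform on $[-5, 5]$. The observed summary of 1.0 is a made-up value.

```python
import random

def simulate_summary(theta, n, rng):
    """Simulate n draws from N(theta, 1) and return their sample mean."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def abc_rejection(observed_summary, n_obs, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the prior
        if abs(simulate_summary(theta, n_obs, rng) - observed_summary) < eps:
            accepted.append(theta)
    return accepted

accepted = abc_rejection(observed_summary=1.0, n_obs=20, eps=0.1,
                         n_draws=20_000, rng=random.Random(0))
abc_mean = sum(accepted) / len(accepted)
```

The likelihood is never evaluated; only the ability to simulate from the model is used. Shrinking `eps` tightens the approximation at the cost of a lower acceptance rate.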

---

XI. Philosophical Meta-Strategies

(How experts think about intractability)

32. Replace Exactness with Guarantees

Bounds instead of values  
Concentration inequalities  
PAC-Bayesian bounds  
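
As one concrete guarantee, Hoeffding's inequality bounds the error of a sample mean of i.i.d. variables $X_i \in [a, b]$ without any knowledge of their distribution:

$$
\Pr\left(\left|\frac{1}{N}\sum_{i=1}^{N} X_i - \mathbb{E}[X]\right| \ge t\right) \le 2\exp\left(-\frac{2Nt^2}{(b-a)^2}\right)
$$

Setting the right-hand side to $\delta$ gives $N \ge \frac{(b-a)^2}{2t^2}\log\frac{2}{\delta}$. For example, with $X_i \in [0, 1]$, $t = 0.01$, and $\delta = 0.05$, about $1.8 \times 10^4$ samples suffice.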

33. Replace Distribution with Expectations

Predictive distributions replaced by moments  
Risk minimization frameworks  

---

XII. Summary Table (Mental Model)

| Strategy             | Core Idea                                   |
|----------------------|---------------------------------------------|
| Analytical           | Exploit structure                           |
| Approximation        | Replace distribution                        |
| Sampling             | Avoid computing the distribution explicitly |
| Transformation       | Change variables                            |
| Optimization         | Ignore normalization                        |
| Information theory   | Minimize divergence                         |
| Dynamics             | Simulate stationary laws                    |
| Discretization       | Approximate continuous objects numerically  |
| Reduction            | Shrink state space                          |
| Learning             | Learn distribution implicitly               |

Final Expert Insight

Intractability is not a failure; it is the signal that probability must be approached indirectly.  
Outside the few cases with closed-form answers, practical probabilistic methods sidestep the full distribution rather than confronting it head-on.
