# Cheat-Sheet


| Concept | Definition |
|--------|------------|
| Self-Information | $I(x) = -\log p(x)$ |
| Entropy | $H(X) = -\sum_{x \in \mathcal{X}} p(x)\log p(x)$ |
| Conditional Entropy | $H(Y \mid X) = -\sum_{x \in \mathcal{X}} p(x)\sum_{y \in \mathcal{Y}} p(y \mid x)\log p(y \mid x)$ |
| Mutual Information (written $I(X;Y)$ in the literature) | $H(X \Cap Y) = H(X) - H(X \mid Y)$ |
| Cross-Entropy | $H(p \,\|\|\, q) = -\sum_x p(x)\log q(x)$ |
| KL Divergence | $D_{KL}(p \,\|\|\, q) = \sum_{x} p(x)\log \frac{p(x)}{q(x)}$ |

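The definitions in the table can be checked numerically. The following is a minimal Python sketch, assuming base-2 logarithms (so all values are in bits) and a joint distribution stored as a nested list `joint[x][y]`. The function names and the example distribution are illustrative and not part of the sheet.

```python
import math

def entropy(probs):
    """H(X) = -sum p(x) log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(Y | X) for joint[x][y] = p(x, y), using p(y | x) = p(x, y) / p(x)."""
    h = 0.0
    for row in joint:
        p_x = sum(row)
        for p_xy in row:
            if p_xy > 0:
                h -= p_xy * math.log2(p_xy / p_x)
    return h

def mutual_information(joint):
    """H(Y) - H(Y | X), which equals H(X) - H(X | Y) by symmetry."""
    p_y = [sum(col) for col in zip(*joint)]
    return entropy(p_y) - conditional_entropy(joint)

# Hypothetical example: X is a fair bit; Y is 0 when X = 0 and a fair bit when X = 1.
joint = [[0.5, 0.0],
         [0.25, 0.25]]

h_x = entropy([sum(row) for row in joint])   # 1.0 bit
h_y_given_x = conditional_entropy(joint)     # 0.5 bits
mi = mutual_information(joint)               # H(Y) - 0.5, roughly 0.311 bits
```

Switching `math.log2` to `math.log` would give the same quantities in nats.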

## Intuitive interpretation

- Entropy $H(p)$ → uncertainty of a distribution
- Mutual information $H(X \Cap Y)$ → shared information between variables
- KL divergence $D_{KL}(p\|q)$ → how *wrong* $q$ is as a model for $p$

KL divergence measures the expected extra code length (in bits for $\log_2$,
in nats for $\ln$) incurred when samples from $p$ are encoded with a code
optimized for $q$ instead of $p$. It is nonnegative and equals zero iff $p = q$.
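
As a small worked example of this reading, the sketch below computes entropy, cross-entropy, and KL divergence for two made-up distributions and confirms that the cross-entropy is the entropy plus the KL term. It assumes base-2 logarithms and that $q(x) > 0$ wherever $p(x) > 0$; the names `cross_entropy` and `kl_divergence` are illustrative.

```python
import math

def entropy(p):
    """H(p) in bits; terms with p(x) = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """-sum p(x) log2 q(x). Assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """sum p(x) log2(p(x) / q(x)). Assumes q(x) > 0 wherever p(x) > 0;
    otherwise the divergence is infinite and this sketch raises an error."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three symbols.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h_p = entropy(p)            # 1.5 bits: optimal average code length for p
h_pq = cross_entropy(p, q)  # 1.75 bits: average length using a code built for q
kl = kl_divergence(p, q)    # 0.25 bits: the extra cost of assuming q
```

Here the code built for $q$ costs 0.25 bits per symbol more than the optimal code for $p$, which is exactly the KL divergence.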

| Property | Expression |
|--------|------------|
| Entropy bounds | $0 \le H(X) \le \log(\|\mathcal{X}\|)$; equality on the left iff $X$ is constant, on the right iff $X$ is uniform on $\mathcal{X}$ |
| Conditional entropy bounds | $0 \le H(Y \mid X) \le H(Y)$; equality on the left iff $Y$ is a function of $X$, on the right iff $X$ and $Y$ are independent |
| Mutual information bounds | $0 \le H(X \Cap Y) \le \min\{H(X),H(Y)\}$; equality on the left iff $X$ and $Y$ are independent, on the right iff one variable is a function of the other |
| Chain rule | $H(Y \mid X) = H(X,Y) - H(X)$ |
| Joint entropy (inclusion-exclusion) | $H(X,Y) = H(X) + H(Y) - H(X \Cap Y)$ |
| Mutual information as KL divergence | $H(X \Cap Y) = D_{KL}\!\left( p_{(X,Y)} \,\|\|\, p_X p_Y \right)$ |
| Cross-entropy decomposition | $H(p \,\|\|\, q) = D_{KL}(p \,\|\|\, q) + H(p)$ |

**Informal set–measure analogy for information quantities**

Let $\tilde{X}$ and $\tilde{Y}$ denote the “uncertainty regions” of
random variables $X$ and $Y$ (as in information diagrams). These are abstract
sets, not the alphabets $\mathcal{X}$ and $\mathcal{Y}$ used in the definitions.  
Then entropy may be viewed as a measure $\mu$ on these regions:

- Entropy:
  $$H(X) = \mu(\tilde{X})$$

- Joint entropy:
  $$H(X,Y) = \mu(\tilde{X} \cup \tilde{Y})$$

- Conditional entropy:
  $$H(Y \mid X) = \mu(\tilde{Y} \setminus \tilde{X})$$

- Mutual information:
  $$H(X \Cap Y) = \mu(\tilde{X} \cap \tilde{Y})$$

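This notation captures the Venn-diagram intuition: the identities in the
property table become set identities for $\mu$, for example
$H(X,Y) = H(X) + H(Y) - H(X \Cap Y)$ is inclusion-exclusion. For two variables
every region has nonnegative measure; with three or more variables the central
region can be negative, so $\mu$ is in general a signed measure.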
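
The analogy can be checked on a concrete distribution. The sketch below is an illustration under stated assumptions (base-2 logarithms, a made-up joint table `joint[x][y]`, and a hypothetical helper `venn_regions`): it derives the three disjoint regions of the two-variable diagram from $H(X)$, $H(Y)$, and $H(X,Y)$, and their sum recovers the joint entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def venn_regions(joint):
    """Split H(X, Y) into the three disjoint regions of the two-variable diagram."""
    h_x = entropy([sum(row) for row in joint])          # measure of the X region
    h_y = entropy([sum(col) for col in zip(*joint)])    # measure of the Y region
    h_xy = entropy([p for row in joint for p in row])   # measure of the union
    both = h_x + h_y - h_xy                             # intersection, by inclusion-exclusion
    return {
        "only_x": h_x - both,   # H(X | Y)
        "both": both,           # mutual information
        "only_y": h_y - both,   # H(Y | X)
        "union": h_xy,          # H(X, Y)
    }

# Hypothetical example: X is a fair bit; Y is 0 when X = 0 and a fair bit when X = 1.
joint = [[0.5, 0.0],
         [0.25, 0.25]]

regions = venn_regions(joint)
total = regions["only_x"] + regions["both"] + regions["only_y"]
```

For this table `total` matches `regions["union"]` (1.5 bits) up to floating-point error, and all three regions are nonnegative, as expected in the two-variable case.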