# Shannon's Information Measures

There are four types of Shannon's Information Measures:

- Entropy
- Conditional entropy
- Mutual information
- Conditional mutual information

**Definition 2.13** The entropy $H(X)$ of a random variable $X$ is defined as

  $$H(X) = -\sum_x p(x) \log p(x)$$
  
- Convention: the summation is taken over $S_X$, the support of $X$.
- When the base of the logarithm is $\alpha$, write $H(X)$ as $H_{\alpha}(X)$.
- Entropy measures the uncertainty of a discrete random variable.
- The unit for entropy is:
  $$\eqalign{
    \text{bit} &\qquad \text{if } \alpha=2\\
    \text{nat} &\qquad \text{if } \alpha=e\\
    D\text{-it} &\qquad \text{if } \alpha=D
  }$$
- A bit in information theory is **different** from a bit in computer science: here it is a unit of information (uncertainty), not a binary digit of storage.

**Remark** $H(X)$ depends only on the distribution of $X$, not on the actual values taken by $X$; hence we also write $H(p_X)$.

**Example** Let $X$ and $Y$ be random variables with $\mathcal{X} = \mathcal{Y} = \{0,1\}$, and let

  $$p_X(0) = 0.3,\qquad p_X(1) = 0.7$$

and

  $$p_Y(0) = 0.7,\qquad p_Y(1) = 0.3$$
  
Although $p_X \neq p_Y$, $H(X)=H(Y)$.
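
The example above can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms; the helper name `entropy` is introduced here for illustration and is not part of the original notes.

```python
import math

def entropy(pmf, base=2.0):
    """Entropy of a pmf given as an iterable of probabilities.

    Zero-probability outcomes are skipped, so the sum runs over the support.
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

p_X = [0.3, 0.7]  # p_X(0), p_X(1)
p_Y = [0.7, 0.3]  # p_Y(0), p_Y(1)

h_x = entropy(p_X)
h_y = entropy(p_Y)

# The two distributions differ, yet the entropies coincide (about 0.8813 bit).
print(h_x, h_y)
```

Only the multiset of probabilities enters the sum, which is why relabeling the outcomes leaves the entropy unchanged.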

## Entropy as Expectation

- Convention

  $$E\,g(X) = \sum_x p(x) g(x)$$
  
  where summation is over $S_X$.
  
- Linearity

  $$E[f(X) + g(X)] = Ef(X) + Eg(X)$$
  
- Can write

  $$H(X) = -E \log p(X) = -\sum_x p(x) \log p(x)$$
  
- In probability theory, when $E\,g(X)$ is considered, $g(x)$ usually depends only on the value of $x$ and not on $p(x)$. Here, by contrast, $g(x) = -\log p(x)$ depends on $p(x)$ itself.
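
Viewing entropy as the expectation $-E \log p(X)$ suggests estimating it by a sample average. The sketch below is an illustration only: the pmf, the seed, and the sample size are arbitrary choices made here, and base-2 logarithms are assumed.

```python
import math
import random

# Hypothetical pmf chosen for illustration.
pmf = {0: 0.3, 1: 0.7}

# Exact value: H(X) = -sum_x p(x) log2 p(x).
exact = -sum(p * math.log2(p) for p in pmf.values())

# Monte Carlo estimate of -E log2 p(X): draw X ~ p, then average -log2 p(X).
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = rng.choices(list(pmf), weights=list(pmf.values()), k=100_000)
estimate = sum(-math.log2(pmf[x]) for x in samples) / len(samples)

print(exact, estimate)
```

The estimate approaches the exact value as the sample size grows, by the law of large numbers.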

## Binary Entropy Function

- For $0 \leq \gamma \leq 1$, define the binary entropy function

  $$h_b(\gamma) = -\gamma \log \gamma - (1-\gamma) \log (1-\gamma)$$
  
  with the convention $0 \log 0 = 0$, which is justified by L'Hôpital's rule:
  
  $$\lim_{\alpha \rightarrow 0^+} \alpha \log \alpha = 0$$
  
- For a binary random variable $X$ with distribution $\{\gamma, 1-\gamma\}$, written $X \sim \{\gamma, 1-\gamma\}$,

  $$H(X)=h_b(\gamma)$$
  
- $h_b(\gamma)$ achieves its maximum value of 1 (in bits, i.e., with the logarithm taken to base 2) at $\gamma = \frac{1}{2}$.

![Binary-Entropy](images/binary-entropy.png)
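
A minimal sketch of the binary entropy function in bits, with the convention $0 \log 0 = 0$ handled explicitly. The function name `binary_entropy` is an illustrative choice, not something defined in the notes.

```python
import math

def binary_entropy(gamma):
    """h_b(gamma) in bits, using the convention 0 log 0 = 0."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if gamma in (0.0, 1.0):
        return 0.0  # deterministic case: both terms vanish by convention
    return -gamma * math.log2(gamma) - (1 - gamma) * math.log2(1 - gamma)

# Tabulate a few points; the values are symmetric about gamma = 0.5.
for g in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(g, binary_entropy(g))
```

The endpoints are special-cased because `math.log2(0)` raises an error rather than returning a limit.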

## Interpretation

Consider tossing a coin with

  $$p(H) = \gamma \quad\text{and}\quad p(T)=1-\gamma$$

Then $h_b(\gamma)$ measures the amount of uncertainty in the outcome of the toss.

  - When $\gamma = 0$ or $1$, the coin is *deterministic* and $h_b(\gamma)=0$. This is consistent with our intuition because in such cases we need 0 bits to convey the outcome.
  - When $\gamma=0.5$, the coin is *fair* and $h_b(\gamma)=1$. This is consistent with our intuition because we need 1 bit to convey the outcome.
  - When $\gamma \notin \{0,0.5,1\}$, $0<h_b(\gamma)<1$, i.e., the uncertainty about the outcome is somewhere between 0 and 1 bit.
  - This interpretation will be justified in terms of the source coding theorem.

**Definition 2.14** The joint entropy $H(X,Y)$ of a pair of random variables $X$ and $Y$ is defined as

  $$H(X,Y)=-\sum_{x,y} p(x,y) \log p(x,y) = -E \log p(X,Y)$$

**Definition 2.15** For random variables $X$ and $Y$, the conditional entropy of $Y$ given $X$ is defined as:

  $$H(Y|X)=-\sum_{x,y} p(x,y) \log p(y|x)=-E \log p(Y|X)$$

- Write

  $$\eqalign{
    H(Y|X) &= -\sum_{x,y}p(x,y) \log p(y|x)\\
           &= -\sum_x\sum_y p(x)p(y|x) \log p(y|x)\\
           &= \sum_x p(x)\big[-\sum_y p(y|x) \log p(y|x)\big]
  }$$
  
- The inner sum is the entropy of $Y$ conditioning on a fixed $x \in S_X$.

- Denoting the inner sum by $H(Y|X=x)$, we have:

  $$H(Y|X)=\sum_xp(x)H(Y|X=x)$$

- Similarly,

  $$H(Y|X,Z)=\sum_zp(z)H(Y|X,Z=z)$$
  
  where

  $$H(Y|X,Z=z)=-\sum_{x,y}p(x,y|z) \log p(y|x,z)$$
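
The two expressions for $H(Y|X)$, the direct double sum and the weighted average of $H(Y|X=x)$, can be compared on a small example. This is a sketch under stated assumptions: the joint pmf below is hypothetical, the helper names are introduced here, and logarithms are base 2.

```python
import math
from collections import defaultdict

# Hypothetical joint pmf p(x, y), chosen only for illustration.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.5}

def marginal_x(joint):
    """p(x) = sum_y p(x, y)."""
    p_x = defaultdict(float)
    for (x, _), p in joint.items():
        p_x[x] += p
    return dict(p_x)

def cond_entropy_direct(joint):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)."""
    p_x = marginal_x(joint)
    return -sum(p * math.log2(p / p_x[x])
                for (x, _), p in joint.items() if p > 0)

def cond_entropy_weighted(joint):
    """H(Y|X) = sum_x p(x) H(Y|X=x)."""
    p_x = marginal_x(joint)
    total = 0.0
    for x0, px in p_x.items():
        if px == 0:
            continue  # x0 is outside the support S_X
        # Conditional pmf p(y|x0), restricted to its support.
        cond = [p / px for (x, _), p in joint.items() if x == x0 and p > 0]
        h_given_x0 = -sum(q * math.log2(q) for q in cond)
        total += px * h_given_x0
    return total

direct = cond_entropy_direct(joint)
weighted = cond_entropy_weighted(joint)
print(direct, weighted)
```

In this example $H(Y|X=0)=1$ bit and $H(Y|X=1)=0$, each weighted by $p(x)=0.5$, so both forms give 0.5 bit.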

**Proposition 2.16**

  $$H(X,Y)=H(X)+H(Y|X)$$
  
and

  $$H(X,Y)=H(Y)+H(X|Y)$$

**Proof**

Consider

  $$\eqalign{
    H(X,Y) &= -E \log p(X,Y)\\
           &= -E \log [p(X)p(Y|X)]\\
           &= -E \log p(X)-E \log p(Y|X)\\
           &= H(X) + H(Y|X)
  }$$

The second identity follows in the same way by writing $p(X,Y)=p(Y)p(X|Y)$.
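
As a numerical sanity check of $H(X,Y)=H(X)+H(Y|X)$, the sketch below evaluates both sides for a hypothetical joint pmf. The pmf and the helper `H` are assumptions made here for illustration, with logarithms taken to base 2.

```python
import math

# Hypothetical joint pmf p(x, y), chosen only for illustration.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def H(pmf):
    """Entropy in bits of a pmf stored as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Marginal p(x) = sum_y p(x, y).
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x).
h_y_given_x = -sum(p * math.log2(p / p_x[x])
                   for (x, y), p in joint.items() if p > 0)

lhs = H(joint)               # H(X, Y)
rhs = H(p_x) + h_y_given_x   # H(X) + H(Y|X)
print(lhs, rhs)
```

The two sides agree up to floating-point rounding.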

**Definition 2.17** For random variables $X$ and $Y$, the mutual information between $X$ and $Y$ is defined as:

  $$I(X;Y)=\sum_{x,y}p(x,y) \log \frac{p(x,y)}{p(x)p(y)}=E \log \frac{p(X,Y)}{p(X)p(Y)}$$
  
**Remark** $I(X;Y)$ is symmetrical in $X$ and $Y$.

**Remark** Alternatively, we can write

  $$I(X;Y)=\sum_{x,y}p(x,y) \log \frac{p(x,y)}{p(x)p(y)}=\sum_{x,y}p(x,y) \log \frac{p(x|y)}{p(x)}=E \log \frac{p(X|Y)}{p(X)}$$
  
However, it is not apparent from this form that $I(X;Y)$ is symmetrical in $X$ and $Y$.

**Proposition 2.18** The mutual information between a random variable $X$ and itself is equal to the entropy of $X$, i.e., $I(X;X)=H(X)$.

**Proof**

  $$\eqalign{
    I(X;X) &= E \log \frac{p(X,X)}{p(X)p(X)}\\
           &= E \log \frac{p(X)}{p(X)p(X)}\\
           &= -E \log p(X)\\
           &= H(X)
  }$$
  
**Remark** The entropy of $X$ is sometimes called the *self-information* of $X$.

**Proposition 2.19**

  $$\eqalign{
    I(X;Y) &= H(X)-H(X|Y)\\
    I(X;Y) &= H(Y)-H(Y|X)
  }$$

and

  $$I(X;Y)=H(X)+H(Y)-H(X,Y)$$
  
provided that all the entropies and conditional entropies are finite.

**Remark**

  $$I(X;Y)=H(X)+H(Y)-H(X,Y)$$

is analogous to

  $$\mu(A\cap B)=\mu(A)+\mu(B)-\mu(A\cup B)$$
  
where $\mu$ is a set-additive function and $A$ and $B$ are sets.
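
The definition of $I(X;Y)$ and the identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$ can be compared on a small example. The following is a sketch with hypothetical joint pmfs and helper names introduced here, using base-2 logarithms.

```python
import math

def marginals(joint):
    """Return (p_x, p_y) for a joint pmf stored as {(x, y): p(x, y)}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def H(pmf):
    """Entropy in bits of a pmf stored as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))]."""
    p_x, p_y = marginals(joint)
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical correlated pair: X and Y agree with probability 0.8.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x, p_y = marginals(joint)

i_xy = mutual_information(joint)
via_entropies = H(p_x) + H(p_y) - H(joint)

# Independent fair bits: the joint pmf factorizes, so I(X;Y) = 0.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
i_indep = mutual_information(independent)

print(i_xy, via_entropies, i_indep)
```

Here both routes give roughly 0.278 bit for the correlated pair, while the independent pair yields zero.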

## Information Diagram

![information-diagram](images/information-diagram.png)

**Definition 2.20** For random variables $X$, $Y$, and $Z$, the mutual information between $X$ and $Y$ conditioning on $Z$ is defined as:

  $$I(X;Y|Z)=\sum_{x,y,z}p(x,y,z) \log \frac{p(x,y|z)}{p(x|z)p(y|z)}=E \log \frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$$
  
**Remark** $I(X;Y|Z)$ is symmetrical in $X$ and $Y$.

Similar to entropy, we have

  $$I(X;Y|Z)=\sum_z p(z)I(X;Y|Z=z)$$
  
where

  $$I(X;Y|Z=z)=\sum_{x,y}p(x,y|z) \log \frac{p(x,y|z)}{p(x|z)p(y|z)}$$

**Proposition 2.21** The mutual information between a random variable $X$ and itself conditioning on a random variable $Z$ is equal to the conditional entropy of $X$ given $Z$, i.e., $I(X;X|Z)=H(X|Z)$.

**Proposition 2.22**

  $$\eqalign{
    I(X;Y|Z) &= H(X|Z)-H(X|Y,Z)\\
    I(X;Y|Z) &= H(Y|Z)-H(Y|X,Z)
  }$$
  
and

  $$I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)$$
  
provided that all the conditional entropies are finite.

**Remark** All of Shannon's information measures are special cases of conditional mutual information. Let $\Phi$ denote a random variable that takes a constant value. Then

  $$\eqalign{
    H(X) &= I(X;X|\Phi)\\
    H(X|Z) &= I(X;X|Z)\\
    I(X;Y) &= I(X;Y|\Phi)
  }$$
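
The reduction to conditional mutual information can be illustrated numerically. In the sketch below, a single function computes $I(X;Y|Z)$ from a joint pmf, and conditioning on a constant recovers $I(X;Y)$ and $H(X)$. The pmfs, the function name, and the placeholder value `PHI` are assumptions made here for illustration; logarithms are base 2.

```python
import math
from collections import defaultdict

def conditional_mutual_information(joint_xyz):
    """I(X;Y|Z) in bits from a dict {(x, y, z): p(x, y, z)}."""
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    for (x, y, z), p in joint_xyz.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
    return sum(
        p * math.log2(p * p_z[z] / (p_xz[(x, z)] * p_yz[(y, z)]))
        for (x, y, z), p in joint_xyz.items() if p > 0
    )

PHI = "phi"  # the single value taken by the constant random variable

# Hypothetical pmfs chosen only for illustration.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.3, 1: 0.7}

# I(X;Y) = I(X;Y|Phi): attach the constant as the third coordinate.
i_xy = conditional_mutual_information(
    {(x, y, PHI): p for (x, y), p in p_xy.items()})

# H(X) = I(X;X|Phi): the pair (X, X) puts all its mass on the diagonal.
h_x = conditional_mutual_information(
    {(x, x, PHI): p for x, p in p_x.items()})

print(i_xy, h_x)
```

Because $p(\Phi)=1$, every conditional probability given $\Phi$ equals the corresponding unconditional one, which is why the same routine returns the unconditional quantities.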