## Entropy and information 
- Kardar 2.7


Previously, we defined $S \equiv \ln [\underbrace{\# \text { of configurations }}_{\Omega}](*)$. This definition is appropriate if all configurations are equally likely.

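Example (coin flipping): given $N_{+}$ heads and $N_{-}=N-N_{+}$ tails in $N$ flips, the number of possible sequences with this number of heads is $\Omega\left(N_{+}\right)=\left(\begin{array}{c}N \\ N_{+}\end{array}\right)=\frac{N !}{N_{+}! N_{-}!}$. Using Stirling's approximation, $\ln N! \approx N \ln N - N$, we thus obtain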

$$
\begin{align}
S\left(N_{+}\right) &\approx N\ln N - N - N_+\ln N_+ + N_+ - N_-\ln N_- + N_- \\
                    &= (N_+ + N_-) \ln N - N_+\ln N_+ - N_-\ln N_-  \\
                    &= -N_{+} \ln(N_{+} / N) - N_{-} \ln(N_{-} / N).
\end{align}
$$
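As a rough numerical check of the Stirling step above, the following sketch compares the exact $\ln \Omega(N_+)$ with the leading-order formula. The function names and the choice $N_+ = 0.3\,N$ are illustrative assumptions, not part of the lecture.

```python
import math

def exact_entropy(n, n_plus):
    """Exact ln of the binomial coefficient C(n, n_plus), via log-gamma."""
    n_minus = n - n_plus
    return math.lgamma(n + 1) - math.lgamma(n_plus + 1) - math.lgamma(n_minus + 1)

def stirling_entropy(n, n_plus):
    """Leading-order Stirling result: -N+ ln(N+/N) - N- ln(N-/N)."""
    total = 0.0
    for k in (n_plus, n - n_plus):
        if k > 0:  # a term with k = 0 contributes nothing (x ln x -> 0)
            total -= k * math.log(k / n)
    return total

for n in (10, 1000, 100000):
    n_plus = 3 * n // 10  # illustrative choice: 30% heads
    exact = exact_entropy(n, n_plus)
    approx = stirling_entropy(n, n_plus)
    print(n, exact, approx, (approx - exact) / exact)
```

The relative deviation should shrink as $N$ grows, because the terms dropped in Stirling's approximation are only logarithmic in $N$ while $S$ itself is extensive.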


In general, $N_+$ is not fixed but is itself a random variable with some distribution $P\left(N_{+}\right)$, so that the entropy too is a random variable, with $P_{S}(S)\, d S=P\left(N_{+}\right) d N_{+}$.

Nonetheless, from last lecture, in the thermodynamic limit we know that $P\left(N_{+}\right)$ is sharply peaked, with

$$
\begin{aligned}
N_{+} \rightarrow\left\langle N_{+}\right\rangle=p N, \quad N_{-} \rightarrow\left\langle N_{-}\right\rangle=q N .
\end{aligned}
$$

Thus,

$$S \to -N (p \ln p+q \ln q).$$

$\Rightarrow$ In the thermodynamic limit $(N \rightarrow \infty)$, we can *only* observe "typical" configurations $\left(N_{+}=p N ; N_{-}=q N\right)$; there are $e^{S}$ of them and all of them are equally likely, $P(\{\sigma_i\}) = 1/e^{S} = p^{N_+} q^{N_-}$.
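The concentration onto typical configurations can be illustrated numerically. The sketch below sums the exact binomial probabilities of all $N_+$ with $|N_+/N - p| \le \epsilon$; the values $p = 0.3$ and $\epsilon = 0.05$, as well as the function names, are arbitrary choices made for illustration.

```python
import math

def log_binom_pmf(n, k, p):
    """ln of the binomial probability of k heads in n flips."""
    log_omega = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_omega + k * math.log(p) + (n - k) * math.log(1 - p)

def mass_near_mean(n, p, eps):
    """Total probability of all outcomes with |N+/N - p| <= eps."""
    lo = max(0, math.ceil(n * (p - eps)))
    hi = min(n, math.floor(n * (p + eps)))
    return sum(math.exp(log_binom_pmf(n, k, p)) for k in range(lo, hi + 1))

for n in (100, 1000, 10000):
    print(n, mass_near_mean(n, 0.3, 0.05))
```

For a fixed window width $\epsilon$, the captured probability should approach one as $N$ grows, since the relative width of $P(N_+)$ shrinks like $1/\sqrt{N}$.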

These observations are easily generalized to a die with $M$ faces. If rolling the die results in face $i$ with probability $p_{i}$, we expect face $i$ to show up $N p_i$ times in the thermodynamic limit, $N \rightarrow \infty$. The number of typical configurations is therefore

$$
\begin{aligned}
\Omega & \equiv\text{nr. of config's}=\frac{N !}{\left(N p_{1}\right) !\left(N p_{2}\right) ! \ldots\left(N p_{M}\right) !}, \\
S & \equiv \ln \Omega \approx N[\ln N-1]-\sum_{i=1}^M N p_{i}\left[\ln \left(N p_{i}\right)-1\right] \\
& =-N \sum_{i=1}^M p_{i} \ln p_{i}. \qquad (*)
\end{aligned}
$$
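The multinomial result can be checked in the same spirit as the coin case. This is a minimal sketch under the assumption of a hypothetical loaded four-faced die with probabilities $(1/2, 1/4, 1/8, 1/8)$; the helper names are not from the lecture.

```python
import math

def log_multinomial(counts):
    """Exact ln( N! / (n_1! n_2! ... n_M!) ) with N = sum(counts)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def mixing_entropy(n, probs):
    """Leading-order entropy -N sum_i p_i ln p_i (in nats)."""
    return -n * sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical loaded die, M = 4
n = 80000
counts = [round(n * p) for p in probs]  # typical counts N p_i

exact = log_multinomial(counts)
approx = mixing_entropy(n, probs)
print(exact, approx, (approx - exact) / exact)
```

For this particular distribution the entropy per throw is $1.75 \ln 2$, which makes the output easy to cross-check by hand.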

In physics, $(*)$ arises as 

- the entropy change when $M$ components are mixed together. It is therefore called the "entropy of mixing".
- the entropy of a system of $N$ non-interacting subsystems. (In practice, it is enough if the subsystems are weakly interacting. For example, we can subdivide a $1\,\mathrm{m}^3$ cube of water into $N=10^6$ subsystems, each a $1\,\mathrm{cm}^3$ cube of water. Even though there is some interaction at the interfaces between the small cubes, the interaction energies are negligible compared to the relevant bulk energies. How finely a given macroscopic system can be subdivided depends on its correlation length.) We further assume that each subsystem can be in one of $M$ states following the probability distribution $\{p_i\}$, i.e. a subsystem is in state $i$ with probability $p_i$. Then $s=-\sum_{i=1}^M p_{i} \ln p_{i}$ is the Gibbs entropy of each subsystem, and $S=N s$ is the total entropy of the system.

#### Interpretation as lack of knowledge

Shannon realized that the number of possible configurations consistent with our macroscopic constraints can be viewed as a *lack of knowledge* about the current microstate.

Examples: 

- Suppose we flip a coin $N$ times and we know $N_+$. Then the actual microstate is one of $e^{S\left(N_{+}\right)}$ microstates.
- If we do not know $N_+$ (i.e. $N_+$ is not fixed), the actual microstate is one of $e^{S}$ typical microstates, with $S=-N \sum_{i} p_{i} \ln p_{i}$. For a coin: $S=-N (p \ln p +q \ln q)$.


#### Consequences:

##### Coding 

Suppose we end up measuring the microstate of our system of $N$ coin flips or dice tosses. How many bits do we need to store this information?

For $N \rightarrow \infty$, simply enumerate only the $e^{S}$ typical microstates, all of which have the *same* probability (namely $1/e^{S}$). This requires $\log_{2}\left(e^{S}\right)=S \cdot \log_{2}(e)$ bits. (Of course, this is not a proof, but it works because of the measure concentration induced by the central limit theorem.)

Shannon thus gave an operational meaning to $S$ in terms of "information" and the resources required to communicate an ensemble of messages, where each message represents a sequence of dice throws. Each symbol of the message represents a discrete random variable $X$, attaining the value $x_i$ with probability $p_i$. To simplify notation, Shannon introduced the information entropy of a discrete random variable $X$:

$$
H(X)\equiv -\left\langle\log_2 p_{i}\right\rangle=-\sum_{i} p_{i} \log_2 p_{i} \;.
$$
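The definition of $H(X)$ translates directly into a few lines of code. The function name and the normalization check below are illustrative choices; terms with $p_i = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math

def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum_i p_i log2 p_i, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Skip p = 0 terms: p log p -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy_bits([0.5, 0.5]))    # fair coin
print(shannon_entropy_bits([1 / 6] * 6))   # fair six-sided die
print(shannon_entropy_bits([0.9, 0.1]))    # biased coin
```

A fair coin carries one bit per flip and a fair six-sided die $\log_2 6$ bits per throw, while a biased coin carries less than one bit per flip.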

The number of bits needed to convey a string of $N$ such random numbers is $N H(X)$ as $N\to \infty$. In our original notation, the entropy $S$ of a sequence of $N$ dice throws is given by $S=N H(X) \ln 2$. Notice that the binary log appears in the information entropy because Shannon cared about bits.
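To make the bit count concrete, the sketch below considers a hypothetical biased coin with $p = 0.1$ and $N = 1000$ flips. It compares the number of bits needed to index all sequences with exactly $pN$ heads against $N H(X)$ and against the naive $N$ bits, and checks the conversion $S = N H(X) \ln 2$. All names and parameter values are assumptions made for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def index_bits(n, n_plus):
    """Bits needed to index every sequence of n flips with exactly n_plus heads."""
    omega = math.comb(n, n_plus)
    # (omega - 1).bit_length() equals ceil(log2(omega)) for omega >= 1.
    return (omega - 1).bit_length()

n, p = 1000, 0.1           # hypothetical biased coin
n_plus = n // 10           # typical number of heads, p * N
h = entropy_bits([p, 1 - p])
bits_typical = index_bits(n, n_plus)

# Entropy in nats, as in the original notation S = -N (p ln p + q ln q).
s_nats = -n * (p * math.log(p) + (1 - p) * math.log(1 - p))

print("naive bits:        ", n)
print("N * H(X):          ", n * h)
print("index typical set: ", bits_typical)
print("S and N H ln 2:    ", s_nats, n * h * math.log(2))
```

The index length stays below $N H(X)$ by an amount that grows only logarithmically with $N$, so the cost per symbol approaches $H(X)$ for long messages.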