## Classical information

(*** This section is not yet complete.)

As in the previous lesson, we will begin with a short discussion of classical information.
Once again, the probabilistic and quantum descriptions are very much analogous at a mathematical level, and recognizing how the mathematics works in the familiar setting of classical information is helpful in understanding why quantum information is described as it is.

### Classical state sets

Let us begin with *classical state sets* of multiple systems.
For simplicity and clarity, we will first discuss just two systems and then generalize this discussion to more than two systems.

Specifically, let us suppose that $\mathsf{X}$ is a system having classical state set $\Sigma$ and $\mathsf{Y}$ is a second system having classical state set $\Gamma$.
As in the previous lesson, because we have referred to these sets as *classical state sets*, we assume that $\Sigma$ and $\Gamma$ are finite and nonempty sets.
It could be that $\Sigma = \Gamma$, but this is not required — and, in any case, it will be convenient for the discussion that follows to use different names to refer to these sets.

Imagine that these two systems are placed side-by-side, with $\mathsf{X}$ on the left and $\mathsf{Y}$ on the right, and viewed together as if they form a single system, which will be denoted by $(\mathsf{X},\mathsf{Y})$.
(The notation $\mathsf{XY}$ may also be used if it is more convenient.)
One may then ask: What is the classical state set of this single, joint system $(\mathsf{X},\mathsf{Y})$?

The answer is that the classical state set of $(\mathsf{X},\mathsf{Y})$ is the *Cartesian product* of $\Sigma$ and $\Gamma$, which is the set defined as

\begin{equation}
  \label{equation1}
  \Sigma\times\Gamma = \bigl\{(a,b)\,:\,a\in\Sigma\;\text{and}\;b\in\Gamma\bigr\}.
\end{equation}

To say that $(\mathsf{X},\mathsf{Y})$ is in the classical state $(a,b)\in\Sigma\times\Gamma$ means that $\mathsf{X}$ is in the classical state $a\in\Sigma$ and $\mathsf{Y}$ is in the classical state $b\in\Gamma$;
and if the classical state of $\mathsf{X}$ is $a\in\Sigma$ and the classical state of $\mathsf{Y}$ is $b\in\Gamma$, then the state of the pair $(\mathsf{X},\mathsf{Y})$ is $(a,b)$.
That is, the Cartesian product is precisely the mathematical notion that captures the situation at hand.

For more than two systems, the situation generalizes in a natural way.
For instance, suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, for any positive integer $n$.
The classical state set of the $n$-tuple $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$, viewed as a single system, is then the Cartesian product

\begin{equation}
  \Sigma_1\times\cdots\times\Sigma_n
  = \bigl\{(a_1,\ldots,a_n)\,:\, a_1\in\Sigma_1,\:\ldots,\:a_n\in\Sigma_n\bigr\}.
\end{equation}
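As a small illustration (a sketch, not part of the formal development), Python's `itertools.product` enumerates a Cartesian product directly. The helper name `joint_state_set` and the example alphabet used for $\Gamma$ are hypothetical. The state sets are passed as ordered lists rather than Python sets, so that the enumeration order is well defined.

```python
from itertools import product

def joint_state_set(*state_sets):
    """Classical state set of (X_1, ..., X_n): the Cartesian product of the
    individual classical state sets, returned as a list of n-tuples."""
    for s in state_sets:
        # Classical state sets are assumed to be finite and nonempty.
        if len(s) == 0:
            raise ValueError("classical state sets must be nonempty")
    return list(product(*state_sets))

# Two systems: X with Sigma = {0, 1} and Y with (hypothetical) Gamma = {a, b, c}.
sigma = [0, 1]
gamma = ["a", "b", "c"]
pairs = joint_state_set(sigma, gamma)
print(len(pairs))  # 6, which is |Sigma| * |Gamma|
```

The same function handles any number of systems, since `product` accepts any number of iterables.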

It is often convenient to write a classical state of the form $(a_1,\ldots,a_n)$ as a *string* $a_1\cdots a_n$ for the sake of brevity, particularly when the classical state sets $\Sigma_1,\ldots,\Sigma_n$ are associated with sets of *symbols* or *characters*.
Indeed, in theoretical computer science, the notion of a string is formalized in mathematical terms through Cartesian products.
In that context, the term *alphabet* is typically used rather than *classical state set*, but the definition (i.e., a finite and nonempty set) is precisely the same.

For example, suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_{10}$ are bits, so that the classical state sets of these systems are all the same:

$$
  \Sigma_1 = \cdots = \Sigma_{10} = \{0,1\}.
$$

There are then $2^{10} = 1024$ classical states of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_{10})$, which are the elements of the set

$$
  \Sigma_1\times\cdots\times\Sigma_{10} = \{0,1\}^{10}.
$$

Written as strings, these classical states look like this:

$$
  \begin{array}{c}
  0000000000\\
  0000000001\\
  0000000010\\
  0000000011\\
  0000000100\\
  \vdots\\[1mm]
  1111111111
  \end{array}
$$

For the classical state $0001001000$, for instance, we see that $\mathsf{X}_4$ and $\mathsf{X}_7$ are in the state $1$, while all of the other systems are in the state $0$.
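The ten-bit example can be checked mechanically. The following minimal sketch lists the classical states of $(\mathsf{X}_1,\ldots,\mathsf{X}_{10})$ as strings and reads off which systems are in the state $1$; the helper `systems_in_state_one` is a hypothetical name, and systems are numbered from $1$ to match the subscripts in the text.

```python
from itertools import product

# All classical states of ten bits (X_1, ..., X_10), written as strings.
states = ["".join(bits) for bits in product("01", repeat=10)]

print(len(states))  # 1024
print(states[:5])   # ['0000000000', '0000000001', '0000000010', '0000000011', '0000000100']
print(states[-1])   # 1111111111

def systems_in_state_one(s):
    """Return the (1-based) indices i such that X_i is in the state 1."""
    return [i for i, symbol in enumerate(s, start=1) if symbol == "1"]

print(systems_in_state_one("0001001000"))  # [4, 7]
```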

### Probabilistic states

As was discussed in the previous lesson, a probabilistic state of a system associates a probability with each classical state of that system.
Thus, a probabilistic state of multiple systems together — viewed collectively as if they form a single system — must associate a probability with each element of the Cartesian product of the classical state sets of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are both bits, so that their corresponding classical state sets are given by $\Sigma = \{0,1\}$ and $\Gamma = \{0,1\}$, respectively, we may have a probabilistic state like this:

$$
  \begin{aligned}
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,0)\bigr) & = \frac{1}{2}\\[1mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,1)\bigr) & = 0\\[1mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,0)\bigr) & = 0\\[1mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,1)\bigr) & = \frac{1}{2}
  \end{aligned}
$$

This probabilistic state is one where both $\mathsf{X}$ and $\mathsf{Y}$ are in random classical states — each is 0 or 1 with probability 1/2 — but the two bits are always in the same classical state.
This is an example of a *correlation* between these systems.

As discussed in the previous lesson, probabilistic states of systems may be represented by probability vectors, which are column vectors whose indices are placed in correspondence with the underlying classical state set of the system being considered.
To represent a probabilistic state of multiple systems together, where the classical state set of these systems together is given by a Cartesian product, one must therefore decide on an ordering of the elements of this Cartesian product.

There is a simple convention for doing this (assuming that the individual classical state sets from which the Cartesian product is formed have already been ordered), which is essentially to use *alphabetical ordering*.
Equivalently, we regard the entries of each $n$-tuple as having significance that decreases from left to right.

For example, the Cartesian product $\{1,2,3\}\times\{0,1\}$ is ordered like this:

$$
  (1,0),\;
  (1,1),\;
  (2,0),\;
  (2,1),\;
  (3,0),\;
  (3,1).
$$

When $n$-tuples are written as strings, we observe familiar patterns: $\{0,1\}\times\{0,1\}$ is ordered as $00, 01, 10, 11$, and $\{0,1\}^{10}$ is ordered as suggested above.
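For two state sets that are already ordered, this convention places the pair $(a,b)$ at position $\operatorname{index}(a)\cdot|\Gamma| + \operatorname{index}(b)$, counting from zero. The sketch below, which uses a hypothetical helper `index_of` and assumes each state set is given as an ordered Python list, checks this formula against `itertools.product`, which emits tuples in exactly this order when its inputs are ordered.

```python
from itertools import product

def index_of(pair, sigma, gamma):
    """Position of (a, b) in the alphabetical ordering of sigma x gamma,
    assuming sigma and gamma are given as already-ordered lists."""
    a, b = pair
    return sigma.index(a) * len(gamma) + gamma.index(b)

sigma = [1, 2, 3]
gamma = [0, 1]
ordered = list(product(sigma, gamma))
print(ordered)  # [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]

# The closed-form index agrees with the enumeration order.
assert all(index_of(p, sigma, gamma) == k for k, p in enumerate(ordered))
```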

Thus, the probabilistic state of two bits described earlier is represented by the following probability vector (with the entries labeled explicitly for clarity):

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
  \begin{array}{l}
    \leftarrow \text{probability associated with state 00}\\[1mm]
    \leftarrow \text{probability associated with state 01}\\[1mm]
    \leftarrow \text{probability associated with state 10}\\[1mm]
    \leftarrow \text{probability associated with state 11}
  \end{array}
$$

A special type of probabilistic state of multiple systems is one in which the systems are *independent*.
Suppose once again that $\mathsf{X}$ and $\mathsf{Y}$ are systems having classical state sets $\Sigma$ and $\Gamma$, respectively.
A probabilistic state of these two systems represents independence between them if

$$
  \operatorname{Pr}((\mathsf{X},\mathsf{Y}) = (a,b)) 
  = \operatorname{Pr}(\mathsf{X} = a) \operatorname{Pr}(\mathsf{Y} = b),
$$

for every choice of $a\in\Sigma$ and $b\in\Gamma$.
Assuming that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, this condition is equivalent to the existence of a probability vector $v$ indexed by $\Sigma$ and a probability vector $w$ indexed by $\Gamma$ such that

$$
  u(a,b) = v(a)w(b)
$$

for all $a\in\Sigma$ and $b\in\Gamma$.
Notice that here we have written $u(a,b)$ rather than $u((a,b))$, simply as a matter of readability.
Although the expression $u((a,b))$ more formally represents the situation at hand, where we are referring to the entry of the vector $u$ indexed by the pair $(a,b)$, it is conventional in mathematics that parentheses are eliminated when they do not serve to add clarity or remove ambiguity.

For example, the probabilistic state described previously does not represent independence between the systems $\mathsf{X}$ and $\mathsf{Y}$.
A simple way to argue this is as follows.
Suppose that there did exist probability vectors $v$ and $w$, both indexed by the set $\{0,1\}$, such that the condition just described were satisfied.
It would then necessarily be that

$$
  v(0) w(1) = \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (0,1)\bigr) = 0.
$$

This implies that either $v(0) = 0$ or $w(1) = 0$, by a property known as the *zero-product property* of the real numbers: the only way that the product of two real numbers can be zero is if either or both numbers are themselves equal to zero.
This in turn implies that either $v(0) w(0) = 0$ (in case $v(0) = 0$) or $v(1) w(1) = 0$ (in case $w(1) = 0$).
Neither of these equalities can hold, however, because we must have $v(0)w(0)=1/2$ and $v(1)w(1)=1/2$.
Hence, there do not exist vectors $v$ and $w$ satisfying the property.
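A computational test for independence can avoid searching over all $v$ and $w$. If $u(a,b) = v(a)w(b)$ for probability vectors $v$ and $w$, then summing over $b$ gives $v(a)$ (because the entries of $w$ sum to $1$), and summing over $a$ gives $w(b)$. So it suffices to compare $u$ against these two sums. The sketch below assumes the probabilistic state is stored as a Python dictionary keyed by pairs $(a,b)$, and the function name `is_independent` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def is_independent(u, sigma, gamma):
    """Check whether the joint probabilities u (a dict keyed by pairs (a, b))
    satisfy u(a, b) = v(a) w(b) for some probability vectors v and w.

    If such v and w exist, summing u(a, b) over b gives v(a) and summing over a
    gives w(b), so it suffices to test against these two sums."""
    v = {a: sum(u[(a, b)] for b in gamma) for a in sigma}
    w = {b: sum(u[(a, b)] for a in sigma) for b in gamma}
    return all(u[(a, b)] == v[a] * w[b] for a, b in product(sigma, gamma))

bits = [0, 1]
half = Fraction(1, 2)
correlated = {(0, 0): half, (0, 1): 0, (1, 0): 0, (1, 1): half}
print(is_independent(correlated, bits, bits))  # False
```

Exact `Fraction` arithmetic keeps the equality test meaningful; with floating-point entries one would compare up to a small tolerance instead.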

On the other hand, the probabilistic state of a pair of bits $(\mathsf{X},\mathsf{Y})$ represented by the vector

$$
  u = \begin{pmatrix}
    \frac{1}{6}\\[2mm]
    \frac{1}{12}\\[2mm]
    \frac{1}{2}\\[2mm]
    \frac{1}{4}
  \end{pmatrix}
$$

is one in which $\mathsf{X}$ and $\mathsf{Y}$ are independent.
Specifically, the condition required for independence is satisfied by the probability vectors

$$
  v = \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
  \quad\text{and}\quad
  w = \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$

This condition of independence can be expressed succinctly through the notion of a *tensor product*.
This is a very general notion that can be defined quite abstractly and applied to a variety of mathematical structures, but for vectors indexed by Cartesian products it can be expressed in very simple and concrete terms.
If $v$ is a vector indexed by a set $\Sigma$ and $w$ is a vector indexed by a set $\Gamma$, then the tensor product $v\otimes w$ of these two vectors is the vector indexed by $\Sigma\times\Gamma$ defined as

$$
  (v\otimes w)(a,b) = v(a) w(b)
$$

for every $a\in\Sigma$ and $b\in\Gamma$.
In these terms, the condition for independence described previously is equivalent to $u$ being equal to the
*tensor product* of two probability vectors $v$ and $w$:

$$
  u = v\otimes w.
$$
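In the alphabetical ordering, the tensor product of two vectors is the list of all products $v(a)w(b)$ with $a$ varying slowest. The minimal sketch below writes this out directly (the function name `tensor` is hypothetical) and checks it against NumPy's `np.kron`, which computes the same vector for one-dimensional arrays. It also confirms the independence example above: $v\otimes w$ has entries $1/6$, $1/12$, $1/2$, $1/4$.

```python
import numpy as np

def tensor(v, w):
    """Tensor product of vectors: (v ⊗ w)(a, b) = v(a) w(b), with the pairs
    (a, b) listed in alphabetical order (a most significant)."""
    return np.array([v_a * w_b for v_a in v for w_b in w])

v = np.array([1/4, 3/4])
w = np.array([2/3, 1/3])
u = tensor(v, w)
print(u)  # entries 1/6, 1/12, 1/2, 1/4, as floating-point numbers

# For one-dimensional arrays, NumPy's Kronecker product gives the same vector.
assert np.allclose(u, np.kron(v, w))
assert np.allclose(u, [1/6, 1/12, 1/2, 1/4])
```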

(*** Mention that correlation is defined as a lack of independence.)

(*** Generalize to three or more systems.)

## Measurements

Now let us move on to measurements of multiple systems.
We find, again, that by viewing multiple systems together as a single system, we immediately obtain a specification of how measurements work for multiple systems, provided that *all* of the systems are measured.

For example, if the probabilistic state of two bits $(\mathsf{X},\mathsf{Y})$ is described by the probability vector

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
$$

then the outcome $(0,0)$ is obtained with probability 1/2 and $(1,1)$ is obtained with probability 1/2, and in each case we update the probability vector description of our knowledge accordingly (so that the probabilistic state becomes $|00\rangle$ or $|11\rangle$, respectively).
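To make this concrete, the following sketch simulates such a measurement with Python's `random.choices`, assuming the probability vector is given as a list in the alphabetical ordering described earlier. The function `measure_all` is hypothetical. It returns the observed classical state together with the updated probability vector, which is the standard basis vector for that state (the vector written $|00\rangle$ or $|11\rangle$ above).

```python
import random

def measure_all(u, labels, rng=random):
    """Sample a classical state of the joint system according to the
    probability vector u, then return the outcome and the updated vector."""
    k = rng.choices(range(len(u)), weights=u)[0]
    updated = [0] * len(u)
    updated[k] = 1  # standard basis vector for the observed classical state
    return labels[k], updated

labels = ["00", "01", "10", "11"]
u = [1/2, 0, 0, 1/2]
outcome, updated = measure_all(u, labels, random.Random(7))
print(outcome, updated)
```

Outcomes with probability zero are never sampled, so for this vector the result is always `00` or `11`.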

Suppose, however, that we choose not to measure every system — or perhaps not every system is available to us — and instead just measure a subset of the systems.
Beginning with two systems, let us suppose as usual that $\mathsf{X}$ is a system having classical state set $\Sigma$, $\mathsf{Y}$ is a system having classical state set $\Gamma$, and the two systems $(\mathsf{X},\mathsf{Y})$ together are in some probabilistic state.
Let us consider what happens when we just measure $\mathsf{X}$ and do nothing to $\mathsf{Y}$.

First, we know that the probability to observe a particular classical state $a\in\Sigma$ in system $\mathsf{X}$ must be consistent with the probabilities we would obtain if $\mathsf{Y}$ had also been measured.
That is, we must have

$$
\operatorname{Pr}(\mathsf{X} = a) = \sum_{b\in\Gamma} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (a,b) \bigr).
$$

This is the formula for the so-called *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ alone.
Notice that this formula makes perfect sense at an intuitive level — and if it were not true, it would mean that the probabilities of obtaining different outcomes for the system $\mathsf{X}$ were somehow influenced by whether or not $\mathsf{Y}$ was also measured, even though $\mathsf{Y}$ is a separate system and possibly in a different location (thereby allowing for superluminal signaling, for instance).

However, because only $\mathsf{X}$ has been measured and $\mathsf{Y}$ has not, there may in general still be uncertainty about the classical state of $\mathsf{Y}$.
Thus, rather than updating our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $|a,b\rangle$ for some selection of $a\in\Sigma$ and $b\in\Gamma$, we must still represent our knowledge of $\mathsf{Y}$ by a probability vector.
The following formula for *conditional probabilities* can be used for this purpose:

$$
\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)
= \frac{\operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (a,b)\bigr)}{\operatorname{Pr}(\mathsf{X} = a)}
$$

Here, the expression $\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)$ denotes the probability that $\mathsf{Y} = b$ *conditioned* on (or *given* that) $\mathsf{X} = a$.
Note that this expression is only defined if $\operatorname{Pr}(\mathsf{X}=a)$ is nonzero — for otherwise we obtain the indeterminate form $\frac{0}{0}$.
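The two formulas can be evaluated directly on the correlated two-bit state from earlier. In this minimal sketch, the joint probabilities are stored in a dictionary keyed by pairs $(a,b)$, exact `Fraction` arithmetic is used, and the function names `pr_x` and `pr_y_given_x` are hypothetical. The conditional probability raises an error when $\operatorname{Pr}(\mathsf{X}=a) = 0$, matching the caveat just stated.

```python
from fractions import Fraction

def pr_x(joint, a):
    """Pr(X = a): sum the joint probabilities Pr((X, Y) = (a, b)) over b."""
    return sum(p for (x, _), p in joint.items() if x == a)

def pr_y_given_x(joint, b, a):
    """Pr(Y = b | X = a), defined only when Pr(X = a) is nonzero."""
    denominator = pr_x(joint, a)
    if denominator == 0:
        raise ValueError("Pr(X = a) is zero, so the conditional probability is undefined")
    return joint.get((a, b), 0) / denominator

half = Fraction(1, 2)
correlated = {(0, 0): half, (0, 1): Fraction(0), (1, 0): Fraction(0), (1, 1): half}
print(pr_x(correlated, 0))             # 1/2
print(pr_y_given_x(correlated, 0, 0))  # 1
print(pr_y_given_x(correlated, 1, 0))  # 0
```

In words: each outcome of $\mathsf{X}$ occurs with probability $1/2$, and once $\mathsf{X} = 0$ is observed, $\mathsf{Y}$ is certainly $0$.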

To express these formulas in terms of probability vectors, let us assume that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, whose indices have been placed in correspondence with the Cartesian product $\Sigma\times\Gamma$.
Measuring just the system $\mathsf{X}$ yields each outcome $a\in\Sigma$ with probability

$$
v(a) = \operatorname{Pr}(\mathsf{X} = a) = \sum_{c\in\Gamma} u(a,c).
$$

The probability vector $v$ defined in this way represents the *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ alone.
Having obtained a particular outcome $a\in\Sigma$ of the measurement of $\mathsf{X}$ (which can happen only when $v(a) > 0$), the probabilistic state of $\mathsf{Y}$ is updated to the probability vector $w_a$ given by the formula for conditional probabilities:

$$
w_a(b) = \frac{u(a,b)}{\sum_{c\in\Gamma} u(a,c)}.
$$
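In vector form, both steps amount to arranging $u$ as a $|\Sigma|\times|\Gamma|$ table: $v$ is the vector of row sums, and $w_a$ is row $a$ divided by its sum. The sketch below assumes the elements of $\Sigma$ and $\Gamma$ are identified with the indices $0,\ldots,m-1$ and $0,\ldots,n-1$. Under that assumption, NumPy's row-major `reshape` matches the alphabetical ordering, because the entry for $(a,b)$ sits at position $a\cdot n + b$. The function name `measure_first` is hypothetical.

```python
import numpy as np

def measure_first(u, m, n):
    """Given the joint vector u of (X, Y), with |Sigma| = m and |Gamma| = n and
    entries in alphabetical order, return the reduced vector v of X and a dict
    mapping each outcome a with v(a) > 0 to the conditional vector w_a of Y."""
    table = np.asarray(u, dtype=float).reshape(m, n)  # row a holds u(a, 0), ..., u(a, n-1)
    v = table.sum(axis=1)                              # v(a) = sum over c of u(a, c)
    w = {a: table[a] / v[a] for a in range(m) if v[a] > 0}
    return v, w

v, w = measure_first([1/6, 1/12, 1/2, 1/4], 2, 2)
print(v)     # approximately [0.25, 0.75]
print(w[0])  # approximately [2/3, 1/3]
```

For this state, in which $\mathsf{X}$ and $\mathsf{Y}$ are independent, $w_0$ and $w_1$ are both $(2/3, 1/3)$, so learning $\mathsf{X}$ does not change our description of $\mathsf{Y}$. For the correlated vector $(1/2, 0, 0, 1/2)$, by contrast, $w_0 = (1, 0)$ and $w_1 = (0, 1)$, so measuring $\mathsf{X}$ determines $\mathsf{Y}$ completely.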

(*** Explain with a couple of simple examples.)



(*** In general for any number of systems.)


## Operations

(*** Not yet written.)