# Multiple systems

The focus of this lesson is on the basics of quantum information when there are *multiple* systems being considered or described.
This is a continuation of the previous lesson's discussion of single quantum systems in isolation.

A simple, yet very important, idea to keep in mind going into this lesson is that one can always choose to view multiple systems *together* as if they form a *single system* — to which the discussion in the previous lesson must then apply.
Indeed, this idea very directly leads to a description of how quantum states, measurements, and operations work for multiple systems.

There is more to understanding multiple quantum systems, however, than recognizing that they may be viewed collectively as single systems.
For instance, we may have multiple quantum systems that are collectively in a particular quantum state, and then choose to measure just one (or a proper subset) of the individual systems.
In general, this will affect the state of the remaining systems, and it is important to understand exactly how when analyzing quantum algorithms and protocols.
An understanding of the sorts of *correlations* among multiple systems — and particularly a type of correlation known as *entanglement* — is also important in quantum information and computation.

## 1. Classical information (incomplete) <a id='classical-info'></a>

As in the previous lesson, we will begin with a short discussion of classical information.
Once again, the probabilistic and quantum descriptions are very much analogous at a mathematical level, and recognizing how the mathematics works in the familiar setting of classical information is helpful in understanding why quantum information is described as it is.

### 1.1 Classical state sets <a id='classical-state-sets'></a>

Let us begin with *classical state sets* of multiple systems.
For simplicity we will begin by discussing just two systems, and then generalize to more than two systems.

Specifically, let us suppose that $\mathsf{X}$ is a system having classical state set $\Sigma$ and $\mathsf{Y}$ is a second system having classical state set $\Gamma$.
As in the previous lesson, because we have referred to these sets as *classical state sets*, we assume that $\Sigma$ and $\Gamma$ are finite and nonempty.
Note that it could be that $\Sigma = \Gamma$, but this is not required — and, in any case, it is helpful to use different names to refer to these sets in the interest of clarity.

Imagine that the two systems $\mathsf{X}$ and $\mathsf{Y}$ are placed side-by-side, with $\mathsf{X}$ on the left and $\mathsf{Y}$ on the right, and viewed together as if they form a single system.
We may denote this new joint system by $(\mathsf{X},\mathsf{Y})$ or $\mathsf{XY}$, depending on our preferences or whichever is more convenient for the case at hand.
One may then ask: What is the classical state set of this single, joint system $(\mathsf{X},\mathsf{Y})$?

The answer is that the classical state set of $(\mathsf{X},\mathsf{Y})$ is the *Cartesian product* of $\Sigma$ and $\Gamma$, which is the set defined as

$$
  \Sigma\times\Gamma = \bigl\{
  (a,b)\,:\,a\in\Sigma\;\text{and}\;b\in\Gamma\bigr\}.
$$

In simple terms, the Cartesian product is the mathematical notion that captures the idea of viewing an element of one set and an element of a second set together as a single element of a single set.
In the case at hand, to say that $(\mathsf{X},\mathsf{Y})$ is in the classical state $(a,b)\in\Sigma\times\Gamma$ means that $\mathsf{X}$ is in the classical state $a\in\Sigma$ and $\mathsf{Y}$ is in the classical state $b\in\Gamma$;
and if the classical state of $\mathsf{X}$ is $a\in\Sigma$ and the classical state of $\mathsf{Y}$ is $b\in\Gamma$, then the classical state of the joint system $(\mathsf{X},\mathsf{Y})$ is $(a,b)$.

For more than two systems, the situation generalizes in a natural way.
Suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, for any positive integer $n$.
The classical state set of the $n$-tuple $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$, viewed as a single joint system, is then the Cartesian product

$$
  \Sigma_1\times\cdots\times\Sigma_n
  = \bigl\{(a_1,\ldots,a_n)\,:\,
  a_1\in\Sigma_1,\:\ldots,\:a_n\in\Sigma_n\bigr\}.
$$

#### Classical states of multiple systems as strings

It is often convenient to write a classical state of the form $(a_1,\ldots,a_n)$ as a *string* $a_1\cdots a_n$ for the sake of brevity, particularly when the classical state sets $\Sigma_1,\ldots,\Sigma_n$ are associated with sets of *symbols* or *characters*.
Indeed, the notion of a string is formalized in mathematical terms through Cartesian products.
The term *alphabet* is commonly used to refer to the set of symbols appearing in strings, but the mathematical definition of an alphabet is precisely the same as the definition of a classical state set: it is a finite and nonempty set.

For example, suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_{10}$ are bits, so that the classical state sets of these systems are all the same:

$$
  \Sigma_1 = \cdots = \Sigma_{10} = \{0,1\}.
$$

There are then $2^{10} = 1024$ classical states of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_{10})$, which are the elements of the set

$$
  \Sigma_1\times\cdots\times\Sigma_{10} = \{0,1\}^{10}.
$$

Written as strings, these classical states look like this:

$$
  \begin{array}{c}
  0000000000\\
  0000000001\\
  0000000010\\
  0000000011\\
  0000000100\\
  \vdots\\[1mm]
  1111111111
  \end{array}
$$

For the classical state $0001001000$, for instance, we see that $\mathsf{X}_4$ and $\mathsf{X}_7$ are in the state $1$, while all of the other systems are in the state $0$.
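
As a small illustration, the correspondence between tuples and strings can be sketched in Python using the standard library's `itertools.product`. The names `state_sets`, `states`, and `ones` are our own choices for this sketch, and representing the bit values as characters is an assumption made purely for convenience.

```python
from itertools import product

# Ten bits: every system has the classical state set {0, 1}.
state_sets = [("0", "1")] * 10

# Cartesian product of the ten state sets, with each tuple written as a string.
states = ["".join(t) for t in product(*state_sets)]

print(len(states))   # number of classical states of the joint system
print(states[:5])    # the first few strings
print(states[-1])    # the last string

# Systems (numbered from 1, left to right) that are in the state 1
# when the joint classical state is 0001001000.
ones = [i + 1 for i, bit in enumerate("0001001000") if bit == "1"]
print(ones)
```

Numbering the systems from 1 on the left matches the indexing $\mathsf{X}_1,\ldots,\mathsf{X}_{10}$ used in the text.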

### 1.2 Probabilistic states <a id='multiple-systems-probabilistic'></a>

As was discussed in the previous lesson, a probabilistic state associates a probability with each classical state of a system.
Thus, a probabilistic state of multiple systems together — viewed collectively as if they form a single system — associates a probability with each element of the Cartesian product of the classical state sets of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are both bits, so that their corresponding classical state sets are given by $\Sigma = \{0,1\}$ and $\Gamma = \{0,1\}$, respectively, we may have a probabilistic state like this:

$$
  \begin{aligned}
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,0)\bigr) 
    & = \frac{1}{2} \\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,1)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,0)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,1)\bigr) 
    & = \frac{1}{2}
  \end{aligned}
$$

This probabilistic state is one in which both $\mathsf{X}$ and $\mathsf{Y}$ are in random classical states — each is 0 with probability 1/2 and 1 with probability 1/2 — but the classical states of the two bits are always in agreement.
This is an example of a *correlation* between these systems; correlations are discussed more below.

#### Ordering Cartesian product state sets

As in the previous lesson, probabilistic states of systems are represented by *probability vectors*, which are column vectors whose indices are placed in correspondence with the underlying classical state set of the system being considered.
To represent a probabilistic state of multiple systems, where the classical state set of these systems together is given by a Cartesian product, one must therefore decide on an ordering of the elements of this Cartesian product.

Working under the assumption that the individual classical state sets from which the Cartesian product is formed have already been ordered, there is a simple convention for doing this, which is essentially to use *alphabetical ordering*.
More precisely, the entries in each $n$-tuple (or, equivalently, the symbols in each string) are viewed as being listed by significance that *decreases from left to right*.

For example, according to this convention, the Cartesian product $\{1,2,3\}\times\{0,1\}$ is ordered like this:

$$
  (1,0),\;
  (1,1),\;
  (2,0),\;
  (2,1),\;
  (3,0),\;
  (3,1).
$$

When $n$-tuples are written as strings and ordered in this way, we observe familiar patterns, such as $\{0,1\}\times\{0,1\}$ being ordered as $00, 01, 10, 11$, and the set $\{0,1\}^{10}$ being ordered as was suggested above.
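
The ordering convention can be checked with a short Python sketch. It relies on the fact that `itertools.product` varies its rightmost argument fastest, which yields this ordering when the input sets are already ordered. The helper `position_formula` is a hypothetical name introduced here only to make the index arithmetic explicit.

```python
from itertools import product

sigma = [1, 2, 3]   # ordered classical state set of X
gamma = [0, 1]      # ordered classical state set of Y

# Pairs listed with the leftmost entry as the most significant one.
ordered = list(product(sigma, gamma))

# Position of each pair within the ordering (counting from 0).
position = {state: i for i, state in enumerate(ordered)}

def position_formula(a, b):
    """Index of (a, b): the left entry selects a block of len(gamma) positions."""
    return sigma.index(a) * len(gamma) + gamma.index(b)

# The same convention applied to two bits, written as strings.
bit_strings = ["".join(map(str, t)) for t in product([0, 1], repeat=2)]

print(ordered)
print(bit_strings)
```

The block structure made explicit by `position_formula` is what determines the layout of probability vectors for joint systems.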

Thus, the probabilistic state described above is represented by the following probability vector (where the entries are labeled explicitly for the sake of clarity):

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
  \begin{array}{l}
    \leftarrow \text{probability associated with state 00}\\[1mm]
    \leftarrow \text{probability associated with state 01}\\[1mm]
    \leftarrow \text{probability associated with state 10}\\[1mm]
    \leftarrow \text{probability associated with state 11}
  \end{array}
$$

#### Independence and tensor products

A special type of probabilistic state of multiple systems is one in which the systems are *independent*.
Suppose once again that $\mathsf{X}$ and $\mathsf{Y}$ are systems having classical state sets $\Sigma$ and $\Gamma$, respectively.
A probabilistic state of these two systems represents a situation of *independence* between these two systems if it is the case that

$$
  \operatorname{Pr}((\mathsf{X},\mathsf{Y}) = (a,b)) 
  = \operatorname{Pr}(\mathsf{X} = a) \operatorname{Pr}(\mathsf{Y} = b),
$$

for every choice of $a\in\Sigma$ and $b\in\Gamma$.
Intuitively speaking, two systems are independent if the probabilities associated with the classical states of either one of the systems are not affected in any way by the classical state of the other system.

Assuming that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, this condition is equivalent to the existence of a probability vector $v$, indexed by $\Sigma$ and given by

$$
v(a) = \operatorname{Pr}(\mathsf{X} = a)
$$

for each $a\in\Sigma$, and a probability vector $w$, indexed by $\Gamma$ and given by

$$
w(b) = \operatorname{Pr}(\mathsf{Y} = b)
$$

for each $b\in\Gamma$, such that

$$
  u(a,b) = v(a)w(b)
$$

for all $a\in\Sigma$ and $b\in\Gamma$.
(Notice that here we have written $u(a,b)$ rather than $u((a,b))$, simply as a matter of readability: although the expression $u((a,b))$ more formally represents the situation at hand, where we are referring to the entry of the vector $u$ indexed by the pair $(a,b)$, it is conventional in mathematics that parentheses are eliminated when they do not serve to add clarity or remove ambiguity.)

For example, the probabilistic state described previously does not represent independence between the systems $\mathsf{X}$ and $\mathsf{Y}$.
A simple way to argue this is as follows.
Suppose that there did exist probability vectors $v$ and $w$, both indexed by the set 
$\{0,1\}$, such that the condition just described was satisfied.
It would then necessarily be that

$$
  v(0) w(1) = u(0,1) = \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (0,1)\bigr) = 0.
$$

This implies that either $v(0) = 0$ or $w(1) = 0$, by a property known as the *zero-product property* of the real numbers: the only way that the product of two real numbers can be zero is if either or both numbers are themselves equal to zero.
This in turn implies that either $v(0) w(0) = 0$ (in case $v(0) = 0$) or $v(1) w(1) = 0$ (in case $w(1) = 0$).
Neither of those equalities can be true, however, because we must have $v(0)w(0) = u(0,0) = 1/2$ and $v(1)w(1) = u(1,1) = 1/2$.
Hence, there do not exist vectors $v$ and $w$ satisfying the property required for independence.

On the other hand, the probabilistic state of a pair of bits $(\mathsf{X},\mathsf{Y})$ represented by the vector

$$
  u = \begin{pmatrix}
    \frac{1}{6}\\[2mm]
    \frac{1}{12}\\[2mm]
    \frac{1}{2}\\[2mm]
    \frac{1}{4}
  \end{pmatrix}
$$

is one in which $\mathsf{X}$ and $\mathsf{Y}$ are independent.
Specifically, the condition required for independence is true for the probability vectors

$$
  v = \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
  \quad\text{and}\quad
  w = \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$
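
Both examples can be checked mechanically. The sketch below is only an illustration under our own conventions: it stores a probability vector as a dictionary keyed by pairs $(a,b)$, uses exact fractions, and tests whether $u(a,b) = v(a)w(b)$ holds when $v$ and $w$ are the probabilities $\operatorname{Pr}(\mathsf{X}=a)$ and $\operatorname{Pr}(\mathsf{Y}=b)$ computed from $u$ itself. The function names `marginals` and `is_independent` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def marginals(u, sigma, gamma):
    """Return v(a) = Pr(X = a) and w(b) = Pr(Y = b) computed from u."""
    v = {a: sum(u[(a, b)] for b in gamma) for a in sigma}
    w = {b: sum(u[(a, b)] for a in sigma) for b in gamma}
    return v, w

def is_independent(u, sigma, gamma):
    """Check u(a, b) == v(a) * w(b) for every pair (a, b)."""
    v, w = marginals(u, sigma, gamma)
    return all(u[(a, b)] == v[a] * w[b] for a, b in product(sigma, gamma))

bits = [0, 1]

# The state in which the two bits always agree.
correlated = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

# The state with entries 1/6, 1/12, 1/2, 1/4.
product_state = {(0, 0): F(1, 6), (0, 1): F(1, 12), (1, 0): F(1, 2), (1, 1): F(1, 4)}

print(is_independent(correlated, bits, bits))
print(is_independent(product_state, bits, bits))
print(marginals(product_state, bits, bits))
```

Exact fractions are used rather than floating-point numbers so that the equality test is not disturbed by rounding.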

This condition of independence can be expressed succinctly through the notion of a *tensor product*.
This is a very general notion that can be defined quite abstractly and applied to a variety of mathematical structures, but for vectors indexed by Cartesian products it can be defined in very simple and concrete terms.
If $v$ is a vector indexed by a set $\Sigma$ and $w$ is a vector indexed by a set $\Gamma$, then the tensor product $v\otimes w$ of these two vectors is the vector indexed by $\Sigma\times\Gamma$ and defined as

$$
  (v\otimes w)(a,b) = v(a) w(b)
$$

for every $a\in\Sigma$ and $b\in\Gamma$.
Thus, the condition for independence described previously is equivalent to $u$ being equal to the *tensor product* of two probability vectors $v$ and $w$:

$$
  u = v\otimes w.
$$

In this situation it is said that $u$ is a *product state* or *product vector*.

Notice that when we use the convention described previously for ordering the elements of Cartesian product sets — meaning alphabetical ordering — we obtain the following specification for the tensor product of two column vectors:

$$
  \begin{pmatrix}
  \alpha_1\\
  \vdots\\
  \alpha_m
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
  \beta_1\\
  \vdots\\
  \beta_k
  \end{pmatrix}
  =
  \begin{pmatrix}
  \alpha_1 \beta_1\\
  \vdots\\
  \alpha_1 \beta_k\\
  \alpha_2 \beta_1\\
  \vdots\\
  \alpha_2 \beta_k\\
  \vdots\\
  \alpha_m \beta_1\\
  \vdots\\
  \alpha_m \beta_k
  \end{pmatrix}
$$

This operation is sometimes referred to specifically as the *Kronecker product*, but for the purposes of this lesson there is little to be gained in distinguishing it from the tensor product.
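
For column vectors stored as plain Python lists, this specification amounts to a single nested comprehension. The following is a minimal sketch, with `tensor` being a name chosen here for illustration rather than a function from any particular library.

```python
from fractions import Fraction as F

def tensor(v, w):
    """Tensor (Kronecker) product of two column vectors given as lists.

    The entries of v select blocks, and within each block the entries
    of w vary, matching the alphabetical ordering convention.
    """
    return [alpha * beta for alpha in v for beta in w]

v = [F(1, 4), F(3, 4)]
w = [F(2, 3), F(1, 3)]
u = tensor(v, w)
print(u)

# A vector with three entries tensored with a vector with two entries.
example = tensor([1, 2, 3], [10, 20])
print(example)
```

Applied to the probability vectors $v$ and $w$ from the independence example, this reproduces the vector with entries $\frac{1}{6}, \frac{1}{12}, \frac{1}{2}, \frac{1}{4}$.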

The tensor product of two vectors has the important property that it is *bilinear*, which means that it is linear in each of the two arguments separately, assuming that the other argument is fixed.
This property can be expressed through these equations:

$$
  \begin{aligned}
    v \otimes (w_1 + w_2) & = v \otimes w_1 + v \otimes w_2\\[2mm]
    v \otimes (\alpha w) & = \alpha (v \otimes w)
  \end{aligned}
$$

and

$$
  \begin{aligned}
    (v_1 + v_2) \otimes w & = v_1 \otimes w + v_2 \otimes w\\[2mm]
    (\alpha v) \otimes w & = \alpha (v \otimes w)
  \end{aligned}
$$
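
These four equations can be spot-checked numerically. The sketch below uses arbitrarily chosen integer vectors and helper functions (`tensor`, `add`, `scale`) that are our own; a check on particular vectors is of course not a proof, only a sanity test of the identities.

```python
def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [a * b for a in v for b in w]

def add(x, y):
    """Entrywise sum of two vectors of equal length."""
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    """Scalar multiple of a vector."""
    return [c * a for a in x]

# Arbitrary test data (integers, so all comparisons are exact).
v, v1, v2 = [1, 2], [3, -1], [0, 5]
w, w1, w2 = [4, 0, -2], [1, 1, 1], [2, -3, 7]
alpha = 6

linear_in_second = (
    tensor(v, add(w1, w2)) == add(tensor(v, w1), tensor(v, w2))
    and tensor(v, scale(alpha, w)) == scale(alpha, tensor(v, w))
)
linear_in_first = (
    tensor(add(v1, v2), w) == add(tensor(v1, w), tensor(v2, w))
    and tensor(scale(alpha, v), w) == scale(alpha, tensor(v, w))
)
print(linear_in_first, linear_in_second)
```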

Having defined independence between two systems in this way, we can now be more precise in defining a *correlation* as a *lack of independence*.
For example, the two bits in the probabilistic state represented by the vector

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
$$

are not independent — because the vector cannot be expressed as a tensor product, as was argued previously — and so they are correlated.
 
Once again, this description generalizes naturally to three or more systems.
If $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, then a probabilistic state of the combined system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ is a *product state* if the associated probability vector takes the form

$$
u = v_1\otimes \cdots \otimes v_n
$$

for probability vectors $v_1,\ldots,v_n$ describing probabilistic states of $\mathsf{X}_1,\ldots,\mathsf{X}_n$.
Here, the definition of the tensor product generalizes in a natural way:

$$
(v_1\otimes \cdots \otimes v_n)(a_1,\ldots,a_n) = v_1(a_1) \cdots v_n(a_n)
$$

for all choices of $a_1\in\Sigma_1, \ldots, a_n\in\Sigma_n$.
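
One way to compute such a product in code is to apply the two-vector tensor product repeatedly from left to right. The sketch below assumes vectors stored as lists in the alphabetical ordering; the three probability vectors are made up for illustration, and `tensor_all` is a hypothetical helper name.

```python
from fractions import Fraction as F
from functools import reduce
from itertools import product

def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [a * b for a in v for b in w]

def tensor_all(vectors):
    """Tensor product v_1 (x) ... (x) v_n, taken left to right."""
    return reduce(tensor, vectors)

# Three illustrative probability vectors for three bits.
v1 = [F(1, 2), F(1, 2)]
v2 = [F(1, 3), F(2, 3)]
v3 = [F(1, 4), F(3, 4)]

u = tensor_all([v1, v2, v3])

# The defining formula: the entry indexed by (a1, a2, a3) is v1(a1) v2(a2) v3(a3).
by_formula = [v1[a1] * v2[a2] * v3[a3] for a1, a2, a3 in product(range(2), repeat=3)]

print(u == by_formula)
print(sum(u))
```

The comparison with `by_formula` confirms that the repeated two-vector product agrees with the entrywise definition under the alphabetical ordering of the index tuples.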

Similar to the tensor product of just two vectors, the tensor product of three or more vectors is linear in each of the arguments, again assuming that the other arguments are fixed.
In this case, we say that the tensor product of three or more vectors is *multilinear*.

As we did in the case of two systems, we may say that the systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are *independent* when they are in such a probabilistic state, but the term *mutually independent* is more precise:
there happen to be other notions of independence for three or more systems, such as *pairwise independence*, that we will not be concerned with at this time.

As an important aside, we observe the following expression for tensor products of standard basis vectors:

$$
\vert a \rangle \otimes \vert b \rangle = \vert a,b \rangle
$$ 

(where we have followed the usual convention of dropping unnecessary parentheses, rather than writing $\vert (a,b)\rangle$).
Alternatively, using the notation of strings, we have 

$$
\vert a \rangle \otimes \vert b \rangle = \vert ab \rangle.
$$

More generally, for any positive integer $n$ and any classical states $a_1,\ldots,a_n$, we have

$$
\vert a_1 \rangle \otimes \cdots \otimes \vert a_n \rangle = \vert a_1,\ldots,a_n \rangle = \vert a_1 \cdots a_n \rangle.
$$
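
This identity is easy to confirm for small state sets. In the sketch below, `ket` is a hypothetical helper that builds the standard basis vector for a given classical state relative to an ordered state set, and the sets $\{1,2,3\}$ and $\{0,1\}$ are reused from the ordering example.

```python
from itertools import product

def ket(a, states):
    """Standard basis vector |a> relative to an ordered list of classical states."""
    return [1 if s == a else 0 for s in states]

def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [x * y for x in v for y in w]

sigma = [1, 2, 3]
gamma = [0, 1]
joint = list(product(sigma, gamma))   # alphabetical ordering of pairs

# |a> (x) |b> should equal |a,b> for every pair (a, b).
identity_holds = all(
    tensor(ket(a, sigma), ket(b, gamma)) == ket((a, b), joint)
    for a, b in joint
)
example = tensor(ket(2, sigma), ket(1, gamma))
print(identity_holds, example)
```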

One final remark on tensor products and the Dirac notation is that the tensor product symbol $\otimes$ is commonly omitted when taking the tensor product of vectors written as kets.
For example, we often write $\vert a\rangle \vert b \rangle$ and $\vert a_1 \rangle \cdots \vert a_n \rangle$ rather than $\vert a \rangle \otimes \vert b \rangle$ and $\vert a_1 \rangle \otimes \cdots \otimes \vert a_n \rangle$, respectively.
This convention captures the idea that the tensor product is, in some sense, the most natural or default way to take the product of two vectors.

### 1.3 Measurements of probabilistic states

Now let us move on to measurements of probabilistic states of multiple systems.
We find that by choosing to view multiple systems together as single systems, we obtain a specification of how measurements must work for multiple systems — assuming that *all* of the systems are measured.

For example, if the probabilistic state of two bits $(\mathsf{X},\mathsf{Y})$ is described by the probability vector

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
$$

then the outcome $(0,0)$ is obtained with probability 1/2 and $(1,1)$ is obtained with probability 1/2, and in each case we update the probability vector description of our knowledge accordingly (so that the probabilistic state becomes $\vert 00\rangle$ or $\vert 11\rangle$, respectively).

#### Partial measurements

Suppose, however, that we choose not to measure *every* system, but instead we just measure some *proper subset* of the systems.

Beginning with two systems, let us suppose (as usual) that $\mathsf{X}$ is a system having classical state set $\Sigma$, $\mathsf{Y}$ is a system having classical state set $\Gamma$, and the two systems $(\mathsf{X},\mathsf{Y})$ together are in some probabilistic state.
We will consider what happens when we just measure $\mathsf{X}$ and do nothing to $\mathsf{Y}$.

First, we know that the probability to observe a particular classical state $a\in\Sigma$ when just $\mathsf{X}$ is measured must be consistent with the probabilities we would obtain had $\mathsf{Y}$ also been measured.
That is, we must have

$$
  \operatorname{Pr}(\mathsf{X} = a) 
  = \sum_{b\in\Gamma} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (a,b) \bigr).
$$

This is the formula for the so-called *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ alone.

This formula makes perfect sense at an intuitive level, in the sense that something very strange would have to happen for it to be false: it would mean that the probabilities of obtaining different outcomes when $\mathsf{X}$ is measured could somehow be influenced simply by whether or not $\mathsf{Y}$ was also measured.
If $\mathsf{Y}$ happened to be in a distant location, for instance, this would allow for superluminal signaling, which we immediately reject based on our understanding of physics.

Now, given the assumption that only $\mathsf{X}$ has been measured and $\mathsf{Y}$ has not, there may in general still exist uncertainty over the classical state of $\mathsf{Y}$.
For this reason, rather than updating our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert a,b\rangle$ for some selection of $a\in\Sigma$ and $b\in\Gamma$, we must update our description so that this uncertainty about $\mathsf{Y}$ is properly reflected.
The following *conditional probability* formula can be used for this purpose:

$$
\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)
= \frac{\operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (a,b)\bigr)}{\operatorname{Pr}(\mathsf{X} = a)}.
$$

Here, the expression $\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)$ denotes the probability that $\mathsf{Y} = b$ *conditioned* on (or *given* that) $\mathsf{X} = a$.
Note that this expression is only defined if $\operatorname{Pr}(\mathsf{X}=a)$ is nonzero:
if $\operatorname{Pr}(\mathsf{X}=a) = 0$, we obtain the indeterminate form $\frac{0}{0}$.
This is not a problem because if $\operatorname{Pr}(\mathsf{X}=a) = 0$, then we will never observe $a$ as an outcome of a measurement of $\mathsf{X}$, so we need not be concerned with this possibility.

To express these formulas in terms of probability vectors, let us assume that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, whose indices have been placed in correspondence with the Cartesian product $\Sigma\times\Gamma$.
Measuring just the system $\mathsf{X}$ alone yields each possible outcome with probabilities as follows:

$$
v(a) = \operatorname{Pr}(\mathsf{X} = a) = \sum_{c\in\Gamma} u(a,c).
$$

As was already suggested, the probability vector $v$ defined in this way represents the *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ by itself.
Having obtained a particular outcome $a\in\Sigma$ of the measurement of $\mathsf{X}$, the probabilistic state of $\mathsf{Y}$ is updated according to the formula for conditional probabilities:

$$
w_a(b) = \frac{u(a,b)}{\sum_{c\in\Gamma} u(a,c)}.
$$

In the event that the measurement of $\mathsf{X}$ resulted in the classical state $a$, we therefore update our description of the probabilistic state of the joint system $(\mathsf{X},\mathsf{Y})$ to $\vert a\rangle \otimes w_a$.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  u = 
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix},
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = u(0,0) + u(0,1) = \frac{1}{2} + 0 = \frac{1}{2},\\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = u(1,0) + u(1,1) = 0 + \frac{1}{2} = \frac{1}{2}.
  \end{aligned}
$$

If the measurement outcome is 0, then the resulting probabilistic state $w_0$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_0(0) & = \frac{u(0,0)}{u(0,0) + u(0,1)} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1\\[2mm]
    w_0(1) & = \frac{u(0,1)}{u(0,0) + u(0,1)} = \frac{0}{\frac{1}{2}} = 0.
  \end{aligned}
$$

That is, we have $w_0 = \vert 0 \rangle$.
Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the resulting probabilistic state $w_1$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_1(0) & = \frac{u(1,0)}{u(1,0) + u(1,1)} = \frac{0}{\frac{1}{2}} = 0\\[2mm]
    w_1(1) & = \frac{u(1,1)}{u(1,0) + u(1,1)} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1,
  \end{aligned}
$$

and so $w_1 = \vert 1 \rangle$.

Thus, for this particular example, there is no uncertainty remaining about $\mathsf{Y}$ when $\mathsf{X}$ is measured: if we obtain the outcome 0, we update our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert 0 \rangle \otimes \vert 0 \rangle = \vert 00\rangle$, and if we obtain the outcome 1, we update our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert 1 \rangle \otimes \vert 1 \rangle = \vert 11\rangle$.

On the other hand, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  u = 
  \begin{pmatrix}
    \frac{1}{6}\\[2mm]
    \frac{1}{12}\\[2mm]
    \frac{1}{2}\\[2mm]
    \frac{1}{4}
  \end{pmatrix},
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = u(0,0) + u(0,1) = \frac{1}{6} + \frac{1}{12} = \frac{1}{4} \\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = u(1,0) + u(1,1) = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}.
  \end{aligned}
$$

If the measurement outcome is 0, then the resulting probabilistic state $w_0$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_0(0) & = \frac{u(0,0)}{u(0,0) + u(0,1)} 
    = \frac{\frac{1}{6}}{\frac{1}{6} + \frac{1}{12}} = \frac{2}{3} \\[2mm]
    w_0(1) & = \frac{u(0,1)}{u(0,0) + u(0,1)} 
    = \frac{\frac{1}{12}}{\frac{1}{6} + \frac{1}{12}} = \frac{1}{3},
  \end{aligned}
$$

which is to say that

$$
w_0 = \begin{pmatrix}
\frac{2}{3} \\[2mm]
\frac{1}{3}
\end{pmatrix}.
$$

Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the resulting probabilistic state $w_1$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_1(0) & = \frac{u(1,0)}{u(1,0) + u(1,1)} 
    = \frac{\frac{1}{2}}{\frac{1}{2} + \frac{1}{4}} = \frac{2}{3}\\[2mm]
    w_1(1) & = \frac{u(1,1)}{u(1,0) + u(1,1)} 
    = \frac{\frac{1}{4}}{\frac{1}{2} +\frac{1}{4}} = \frac{1}{3},
  \end{aligned}
$$

which is to say that

$$
  w_1 = 
  \begin{pmatrix}
    \frac{2}{3} \\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$

This is not a surprise.
Recall that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example: we have 

$$
u = 
  \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix},
$$

and so naturally the probabilities for the two possible outcomes of the measurement of $\mathsf{X}$ are described by the probability vector

$$
  \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
$$

as we have calculated, and in either case the resulting probabilistic state of $\mathsf{Y}$ is described by the probability vector

$$
  \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$

That is, knowing that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example, we did not really need to go through the trouble of performing the calculations above — but doing so served as a good example and reality check.
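
The two worked examples above can also be reproduced with a few lines of Python. This is only a sketch under our own conventions: the joint probability vector is stored as a dictionary keyed by pairs, and `measure_first` is a hypothetical function that returns, for each outcome $a$ with nonzero probability, the pair $(v(a), w_a)$ given by the formulas for the reduced and conditional probabilistic states.

```python
from fractions import Fraction as F

def measure_first(u, sigma, gamma):
    """Measure X only: map each possible outcome a to (Pr(X = a), w_a)."""
    outcomes = {}
    for a in sigma:
        p = sum(u[(a, c)] for c in gamma)        # v(a), the marginal probability
        if p == 0:
            continue                             # outcome a is never observed
        w_a = [u[(a, b)] / p for b in gamma]     # conditional state of Y
        outcomes[a] = (p, w_a)
    return outcomes

bits = [0, 1]
correlated = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}
independent = {(0, 0): F(1, 6), (0, 1): F(1, 12), (1, 0): F(1, 2), (1, 1): F(1, 4)}

for name, u in [("correlated", correlated), ("independent", independent)]:
    for a, (p, w_a) in measure_first(u, bits, bits).items():
        print(name, "outcome", a, "probability", p, "state of Y", w_a)
```

Skipping outcomes of probability zero mirrors the remark that the conditional probability is undefined, and not needed, in that case.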

The sorts of calculations just described, in which the probabilistic state of one system is determined conditioned on another system being in a particular classical state, can be performed directly using the Dirac notation.
To illustrate how the method works, let us consider a new example where the classical state set of $\mathsf{X}$ is $\Sigma = \{1,2,3\}$, the classical state set of $\mathsf{Y}$ is $\Gamma = \{0,1\}$, and the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is

$$
  u = \frac{1}{2}  \vert 1,0 \rangle
    + \frac{1}{12} \vert 1,1 \rangle
    + \frac{1}{6}  \vert 2,1 \rangle
    + \frac{1}{12} \vert 3,0 \rangle
    + \frac{1}{6}  \vert 3,1 \rangle,
$$

which we may alternatively write as a column vector

$$
  u = 
  \begin{pmatrix}
    \frac{1}{2}\\
    \frac{1}{12}\\
    0\\
    \frac{1}{6}\\
    \frac{1}{12}\\
    \frac{1}{6}
  \end{pmatrix}.
$$

This time let us suppose that the *second* system $\mathsf{Y}$ is measured.
Our goal will be to determine the probabilities of the two possible outcomes (0 and 1), and to calculate what the resulting probabilistic state of $\mathsf{X}$ is for the two outcomes.

Using the bilinearity of the tensor product, and specifically the fact that it is linear in the *first* argument, we may rewrite the vector $u$ as follows:

$$
  u = \biggl( \frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle\biggr)
  \otimes \vert 0\rangle
  + \biggl( \frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle\biggr) \otimes \vert 1\rangle.
$$

Specifically, we have isolated the distinct standard basis vectors for the system being measured (which in this example is the second system $\mathsf{Y}$) and, for each one, collected all of the terms for the first system that are tensored with it.
A moment's thought reveals that this is always possible, regardless of what vector we started with.

The probabilities for the two outcomes when $\mathsf{Y}$ is measured are now easily inferred:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{Y} = 0) & = \frac{1}{2} + \frac{1}{12} = \frac{7}{12}\\[2mm]
    \operatorname{Pr}(\mathsf{Y} = 1) & = \frac{1}{12} + \frac{1}{6} + \frac{1}{6} 
    = \frac{5}{12}.
  \end{aligned}
$$

Moreover, the probabilistic state of $\mathsf{X}$, conditioned on each possible outcome, can also be quickly inferred by simply *normalizing* the vectors in parentheses by dividing by the associated probability just calculated, so that these vectors become probability vectors.
That is, conditioned on the measurement of $\mathsf{Y}$ being 0, the probabilistic state of $\mathsf{X}$ becomes

$$
 \frac{\frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle}{\frac{7}{12}}
 = \frac{6}{7} \vert 1 \rangle + \frac{1}{7} \vert 3 \rangle,
$$

and conditioned on the measurement of $\mathsf{Y}$ being 1, the probabilistic state of
$\mathsf{X}$ becomes

$$
  \frac{\frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle}{\frac{5}{12}}
  = \frac{1}{5} \vert 1 \rangle + \frac{2}{5} \vert 2 \rangle + \frac{2}{5} \vert 3 \rangle.
$$
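
The same grouping-and-normalizing procedure can be sketched in Python. Here the vector $u$ is stored sparsely, as a dictionary holding only its nonzero terms (mirroring the Dirac notation expression), and `measure_second` is a hypothetical helper that collects the terms attached to each standard basis vector of $\mathsf{Y}$ and then normalizes them.

```python
from fractions import Fraction as F

def measure_second(u, sigma, gamma):
    """Measure Y only: map each possible outcome b to (Pr(Y = b), state of X)."""
    outcomes = {}
    for b in gamma:
        # Terms of u tensored with |b>; missing keys stand for zero entries.
        collected = {a: u.get((a, b), F(0)) for a in sigma}
        p = sum(collected.values())
        if p == 0:
            continue                      # outcome b is never observed
        outcomes[b] = (p, {a: x / p for a, x in collected.items()})
    return outcomes

sigma = [1, 2, 3]
gamma = [0, 1]
u = {
    (1, 0): F(1, 2),
    (1, 1): F(1, 12),
    (2, 1): F(1, 6),
    (3, 0): F(1, 12),
    (3, 1): F(1, 6),
}

result = measure_second(u, sigma, gamma)
for b, (p, state_of_x) in result.items():
    print("outcome", b, "probability", p, "state of X", state_of_x)
```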

(*** In general for any number of systems.)

### 1.4 Operations on probabilistic states

To conclude this discussion of classical information for multiple systems, we consider operations on multiple systems in probabilistic states. Once again, by viewing multiple systems together as single systems, we are provided with guidance by the previous lesson.

Returning to the typical set-up for two systems, where $\mathsf{X}$ and $\mathsf{Y}$ have classical state sets $\Sigma$ and $\Gamma$, respectively, we can consider classical operations on the joint system $(\mathsf{X},\mathsf{Y})$.
According to the previous lesson, any such operation, whether deterministic or probabilistic, is represented by a stochastic matrix whose rows and columns are indexed by the Cartesian product $\Sigma\times\Gamma$.

For example, suppose that $\mathsf{X}$ and $\mathsf{Y}$ are bits, and consider an operation with the following description:

<p style="padding-left: 5em; padding-right: 5em;">
   If $\mathsf{X} = 1$, then perform a NOT operation on 
   $\mathsf{Y}$, otherwise do nothing.
</p>

This is a deterministic operation known as a *controlled-NOT* operation, where $\mathsf{X}$ is the *control* bit that determines whether or not a NOT operation is applied to the *target* bit $\mathsf{Y}$.
The matrix representation of this operation is as follows:

$$
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0
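\end{pmatrix}
$$

As a second example, again for bits $\mathsf{X}$ and $\mathsf{Y}$, consider the probabilistic operation with the following description:

<p style="padding-left: 5em; padding-right: 5em;">
    With probability 1/2, set $\mathsf{Y}$ to be equal to $\mathsf{X}$, 
    otherwise do nothing.
</p>

The matrix representation of this operation is as follows, where the right-hand side expresses it as the average of the matrix for the operation that sets $\mathsf{Y}$ equal to $\mathsf{X}$ and the identity matrix: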

$$
\begin{pmatrix}
1 & \frac{1}{2} & 0 & 0\\[2mm]
0 & \frac{1}{2} & 0 & 0\\[2mm]
0 & 0 & \frac{1}{2} & 0\\[2mm]
0 & 0 & \frac{1}{2} & 1
\end{pmatrix}
=
\frac{1}{2}
\begin{pmatrix}
1 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 1 & 1
\end{pmatrix}
+
\frac{1}{2}
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 0 & 0 & 1
\end{pmatrix}
$$
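
Both matrices above can be generated from descriptions of the operations as functions on classical states. The sketch below is an illustration under our own conventions: `matrix_of` is a hypothetical helper that builds the matrix of a deterministic operation by placing a 1 in the row indexed by the output state of each column's input state, and the probability vector `u` is simply reused from earlier examples.

```python
from fractions import Fraction as F
from itertools import product

states = list(product([0, 1], repeat=2))   # pairs (x, y) ordered 00, 01, 10, 11

def matrix_of(f):
    """Matrix of a deterministic operation f on pairs (rows and columns follow states)."""
    return [[1 if f(col) == row else 0 for col in states] for row in states]

def apply(matrix, vector):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

cnot = matrix_of(lambda s: (s[0], s[1] ^ s[0]))   # flip y exactly when x = 1
copy_x = matrix_of(lambda s: (s[0], s[0]))        # set y equal to x
identity = matrix_of(lambda s: s)                 # do nothing

# With probability 1/2 apply copy_x, otherwise do nothing.
half = F(1, 2)
mixed = [[half * a + half * b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(copy_x, identity)]

# Every column of a stochastic matrix sums to 1.
columns_ok = all(sum(row[j] for row in mixed) == 1 for j in range(4))

u = [F(1, 6), F(1, 12), F(1, 2), F(1, 4)]
after_cnot = apply(cnot, u)
after_mixed = apply(mixed, u)
print(columns_ok, after_cnot, after_mixed)
```

Building the matrices from functions on states, rather than typing their entries by hand, makes it harder to misplace an entry relative to the ordering $00, 01, 10, 11$.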




## 2. Quantum information

(*** Not yet written.)