# Multiple systems

The focus of this lesson is on the basics of quantum information when there are *multiple* systems being considered or described.
This is a continuation of the previous lesson's discussion of single quantum systems in isolation.

A simple, yet very important, idea to keep in mind going into this lesson is that one can always choose to view multiple systems *together* as if they form a *single system* — to which the discussion in the previous lesson must then apply.
Indeed, this idea very directly leads to a description of how quantum states, measurements, and operations work for multiple systems.

There is more to understanding multiple quantum systems, however, than recognizing that they may be viewed collectively as single systems.
For instance, we may have multiple quantum systems that are collectively in a particular quantum state, and then choose to measure just one (or a proper subset) of the individual systems.
In general, this will affect the state of the remaining systems, and it is important to understand exactly how when analyzing quantum algorithms and protocols.
An understanding of the sorts of *correlations* among multiple systems — and particularly a type of correlation known as *entanglement* — is also important in quantum information and computation.

## 1. Classical information <a id='multiple-systems-classical-info'></a>

As in the previous lesson, we will begin with a discussion of classical information.
Once again, the probabilistic and quantum descriptions are very much analogous at a mathematical level, and recognizing how the mathematics works in the familiar setting of classical information is helpful in understanding why quantum information is described as it is.

### 1.1 Classical state sets <a id='multiple-systems-classical-state-sets'></a>

Let us begin with *classical state sets* of multiple systems.
For simplicity we will begin by discussing just two systems, and then generalize to more than two systems.

Specifically, let us suppose that $\mathsf{X}$ is a system having classical state set $\Sigma$ and $\mathsf{Y}$ is a second system having classical state set $\Gamma$.
As in the previous lesson, because we have referred to these sets as *classical state sets*, we assume that $\Sigma$ and $\Gamma$ are finite and nonempty.
It could be that $\Sigma = \Gamma$, but this is not required — and, in any case, it is helpful to use different names to refer to these sets in the interest of clarity.

Imagine that the two systems $\mathsf{X}$ and $\mathsf{Y}$ are placed side-by-side, with $\mathsf{X}$ on the left and $\mathsf{Y}$ on the right, and viewed together as if they form a single system.
We may denote this new joint system by $(\mathsf{X},\mathsf{Y})$ or $\mathsf{XY}$, depending on our preferences or whichever is more convenient for the case at hand.
One may then ask: What is the classical state set of this single, joint system $(\mathsf{X},\mathsf{Y})$?

The answer is that the classical state set of $(\mathsf{X},\mathsf{Y})$ is the *Cartesian product* of $\Sigma$ and $\Gamma$, which is the set defined as

$$
  \Sigma\times\Gamma = \bigl\{(a,b)\,:\,a\in\Sigma\;\text{and}\;b\in\Gamma\bigr\}.
$$

In simple terms, the Cartesian product is the mathematical notion that captures the idea of viewing an element of one set and an element of a second set together as a single element of a single set.
In the case at hand, to say that $(\mathsf{X},\mathsf{Y})$ is in the classical state $(a,b)\in\Sigma\times\Gamma$ means that $\mathsf{X}$ is in the classical state $a\in\Sigma$ and $\mathsf{Y}$ is in the classical state $b\in\Gamma$;
and if the classical state of $\mathsf{X}$ is $a\in\Sigma$ and the classical state of $\mathsf{Y}$ is $b\in\Gamma$, then the classical state of the joint system $(\mathsf{X},\mathsf{Y})$ is $(a,b)$.

For more than two systems, the situation generalizes in a natural way.
Suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, for any positive integer $n$.
The classical state set of the $n$-tuple $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$, viewed as a single joint system, is then the Cartesian product

$$
  \Sigma_1\times\cdots\times\Sigma_n
  = \bigl\{(a_1,\ldots,a_n)\,:\,
  a_1\in\Sigma_1,\:\ldots,\:a_n\in\Sigma_n\bigr\}.
$$

#### Classical states of multiple systems as strings

It is often convenient to write a classical state of the form $(a_1,\ldots,a_n)$ as a *string* $a_1\cdots a_n$ for the sake of brevity, particularly in the very typical situation that the classical state sets $\Sigma_1,\ldots,\Sigma_n$ are associated with sets of *symbols* or *characters*.
Indeed, the notion of a string, which is a fundamentally important concept in computer science, is formalized in mathematical terms through Cartesian products.
The term *alphabet* is commonly used to refer to sets of symbols used to form strings, but the mathematical definition of an alphabet is precisely the same as the definition of a classical state set: it is a finite and nonempty set.

For example, suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_{10}$ are bits, so that the classical state sets of these systems are all the same:

$$
  \Sigma_1 = \cdots = \Sigma_{10} = \{0,1\}.
$$

There are then $2^{10} = 1024$ classical states of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_{10})$, which are the elements of the set

$$
  \Sigma_1\times\cdots\times\Sigma_{10} = \{0,1\}^{10}.
$$

Written as strings, these classical states look like this:

$$
  \begin{array}{c}
  0000000000\\
  0000000001\\
  0000000010\\
  0000000011\\
  0000000100\\
  \vdots\\[1mm]
  1111111111
  \end{array}
$$

For the classical state $0001001000$, for instance, we see that $\mathsf{X}_4$ and $\mathsf{X}_7$ are in the state $1$, while all of the other systems are in the state $0$.
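As a quick sanity check, the following minimal Python sketch enumerates the classical states of ten bits with `itertools.product` from the standard library. The variable names (`bit`, `states`, `ones`) are illustrative choices and are not part of the lesson.

```python
from itertools import product

# Classical state set of a single bit, written as symbols.
bit = ["0", "1"]

# The Cartesian product {0,1}^10, with each 10-tuple joined into a string.
states = ["".join(t) for t in product(bit, repeat=10)]

print(len(states))  # 1024
print(states[:5])   # the first five strings in the list above
print(states[-1])   # 1111111111

# Which systems are in state 1 for the classical state 0001001000?
# Systems are numbered from 1, so we shift the zero-based position.
ones = [k + 1 for k, symbol in enumerate("0001001000") if symbol == "1"]
print(ones)         # [4, 7]
```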

### 1.2 Probabilistic states <a id='multiple-systems-probabilistic'></a>

As was discussed in the previous lesson, a probabilistic state associates a probability with each classical state of a system.
Thus, a probabilistic state of multiple systems together — viewed collectively as if they form a single system — associates a probability with each element of the Cartesian product of the classical state sets of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are both bits, so that their corresponding classical state sets are given by $\Sigma = \{0,1\}$ and $\Gamma = \{0,1\}$, respectively, we may have a probabilistic state like this:

$$
  \begin{aligned}
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,0)\bigr) 
    & = \frac{1}{2} \\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,1)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,0)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,1)\bigr) 
    & = \frac{1}{2}
  \end{aligned}
$$

This probabilistic state is one in which both $\mathsf{X}$ and $\mathsf{Y}$ are in random classical states — each is 0 with probability 1/2 and 1 with probability 1/2 — but the classical states of the two bits are always in agreement.
This is an example of a *correlation* between these systems.

#### Ordering Cartesian product state sets

Probabilistic states of systems are represented by probability vectors, which are column vectors whose indices are placed in correspondence with the underlying classical state set of the system being considered.
To represent a probabilistic state of multiple systems as a probability vector, where the classical state set of these systems together is given by a Cartesian product, one must therefore decide on an ordering of the elements of this Cartesian product.

Working under the assumption that the individual classical state sets from which the Cartesian product is formed have already been ordered, there is a simple convention for doing this, which is essentially to use *alphabetical ordering*.
That is, the entries in each $n$-tuple (or, equivalently, the symbols in each string) are viewed as being ordered by significance that *decreases from left to right*.

For example, according to this convention, the Cartesian product $\{1,2,3\}\times\{0,1\}$ is ordered like this:

$$
  (1,0),\;
  (1,1),\;
  (2,0),\;
  (2,1),\;
  (3,0),\;
  (3,1).
$$

When $n$-tuples are written as strings and ordered in this way, we observe familiar patterns, such as $\{0,1\}\times\{0,1\}$ being ordered as $00, 01, 10, 11$, and the set $\{0,1\}^{10}$ being ordered as was suggested above.

Thus, the probabilistic state described above is represented by the following probability vector (where the entries are labeled explicitly for the sake of clarity):

$$
  u = 
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
  \begin{array}{l}
    \leftarrow \text{probability associated with state 00}\\[1mm]
    \leftarrow \text{probability associated with state 01}\\[1mm]
    \leftarrow \text{probability associated with state 10}\\[1mm]
    \leftarrow \text{probability associated with state 11}
  \end{array}
  \label{eq:correlatedbits} \tag{1.1}
$$
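The ordering convention is easy to reproduce in code. The sketch below assumes Python's `itertools.product`, which varies the rightmost entry fastest and therefore matches the alphabetical ordering whenever the individual sets are listed in order. The dictionary `prob` and the list `u` are illustrative names, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Ordering of {1,2,3} x {0,1}: the rightmost entry changes fastest.
order = list(product([1, 2, 3], [0, 1]))
print(order)  # [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]

# The correlated-bits state, stored as a map from classical states to
# probabilities (states that are absent have probability 0).
prob = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

# Lay the probabilities out as a vector using the ordering 00, 01, 10, 11.
u = [prob.get(state, Fraction(0)) for state in product([0, 1], [0, 1])]
print([str(p) for p in u])  # ['1/2', '0', '0', '1/2']
```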

#### Independence and tensor products

A special type of probabilistic state of multiple systems is one in which the systems are *independent*.

Suppose once again that $\mathsf{X}$ and $\mathsf{Y}$ are systems having classical state sets $\Sigma$ and $\Gamma$, respectively.
A probabilistic state of these two systems represents a situation of *independence* between these two systems if it is the case that

$$
  \operatorname{Pr}((\mathsf{X},\mathsf{Y}) = (a,b)) 
  = \operatorname{Pr}(\mathsf{X} = a) \operatorname{Pr}(\mathsf{Y} = b),
$$

for every choice of $a\in\Sigma$ and $b\in\Gamma$.
Intuitively speaking, two systems are independent if the probabilities associated with the classical states of either one of the systems are not affected in any way by the classical state of the other system.

Assuming that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, this condition is equivalent to the existence of a probability vector $v$, indexed by $\Sigma$ and given by

$$
v(a) = \operatorname{Pr}(\mathsf{X} = a)
$$

for each $a\in\Sigma$, and a probability vector $w$, indexed by $\Gamma$ and given by

$$
w(b) = \operatorname{Pr}(\mathsf{Y} = b)
$$

for each $b\in\Gamma$, such that

$$
  u(a,b) = v(a)w(b)
  \tag{1.2}
$$

for all $a\in\Sigma$ and $b\in\Gamma$.
(Notice that here we have written $u(a,b)$ rather than $u((a,b))$, simply as a matter of readability: although the expression $u((a,b))$ more formally represents the situation at hand, where we are referring to the entry of the vector $u$ indexed by the pair $(a,b)$, it is conventional in mathematics that parentheses are eliminated when they do not serve to add clarity or remove ambiguity.)

For example, the probabilistic state $(1.1)$ does not represent independence between the systems $\mathsf{X}$ and $\mathsf{Y}$.
A simple way to argue this is as follows.
Suppose that there did exist probability vectors $v$ and $w$, both indexed by the set $\{0,1\}$, satisfying the condition $(1.2)$ for every choice of $a$ and $b$.
It would then necessarily be that

$$
  v(0) w(1) = u(0,1) = \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (0,1)\bigr) = 0.
$$

This implies that either $v(0) = 0$ or $w(1) = 0$, by a property known as the *zero-product property* of the real numbers: the only way that the product of two real numbers can be zero is if either or both numbers are themselves equal to zero.
This in turn implies that either $v(0) w(0) = 0$ (in case $v(0) = 0$) or $v(1) w(1) = 0$ (in case $w(1) = 0$).
We see, however, that neither of those equalities can be true because we must have $v(0)w(0)=1/2$ and $v(1)w(1)=1/2.$
Hence, there do not exist vectors $v$ and $w$ satisfying the property required for independence.

On the other hand, the probabilistic state of a pair of bits $(\mathsf{X},\mathsf{Y})$ represented by the vector

$$
  u = \begin{pmatrix}
    \frac{1}{6}\\[2mm]
    \frac{1}{12}\\[2mm]
    \frac{1}{2}\\[2mm]
    \frac{1}{4}
  \end{pmatrix}
$$

is one in which $\mathsf{X}$ and $\mathsf{Y}$ are independent.
Specifically, the condition required for independence is true for the probability vectors

$$
  v = \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
  \quad\text{and}\quad
  w = \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$
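Both examples can be checked mechanically. The sketch below relies on one observation: if probability vectors $v$ and $w$ satisfying $u(a,b) = v(a)w(b)$ exist, then summing over $b$ (or over $a$) shows that they must be the marginal distributions of $\mathsf{X}$ and $\mathsf{Y}$, so it suffices to test those two candidates. The function names `marginals` and `is_independent` are hypothetical helpers, written for two bits only.

```python
from fractions import Fraction as F

def marginals(u):
    """Marginal vectors (v, w) of a two-bit vector u ordered 00, 01, 10, 11."""
    v = [u[0] + u[1], u[2] + u[3]]  # Pr(X = 0), Pr(X = 1)
    w = [u[0] + u[2], u[1] + u[3]]  # Pr(Y = 0), Pr(Y = 1)
    return v, w

def is_independent(u):
    """Check u(a, b) == v(a) * w(b) for every pair of bit values."""
    v, w = marginals(u)
    return all(u[2 * a + b] == v[a] * w[b] for a in (0, 1) for b in (0, 1))

correlated = [F(1, 2), F(0), F(0), F(1, 2)]
independent = [F(1, 6), F(1, 12), F(1, 2), F(1, 4)]

print(is_independent(correlated))   # False
print(is_independent(independent))  # True
print([str(p) for p in marginals(independent)[0]])  # ['1/4', '3/4']
print([str(p) for p in marginals(independent)[1]])  # ['2/3', '1/3']
```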

This condition of independence can be expressed succinctly through the notion of a *tensor product*.
This is a very general notion that can be defined quite abstractly and applied to a variety of mathematical structures, but for vectors indexed by Cartesian products it can be defined in very simple and concrete terms.
If $v$ is a vector indexed by a set $\Sigma$ and $w$ is a vector indexed by a set $\Gamma$, then the tensor product $v\otimes w$ of these two vectors is the vector indexed by $\Sigma\times\Gamma$ and defined as

$$
  (v\otimes w)(a,b) = v(a) w(b)
$$

for every $a\in\Sigma$ and $b\in\Gamma$.
That is, the condition $(1.2)$ is true for every choice of $a$ and $b$ if and only if $u$ is equal to the tensor product of $v$ and $w$:

$$
  u = v\otimes w.
$$

In this situation it is said that $u$ is a *product state* or *product vector*.

Notice that when we use the convention described previously for ordering the elements of Cartesian product sets — meaning alphabetical ordering — we obtain the following specification for the tensor product of two column vectors:

$$
  \begin{pmatrix}
  \alpha_1\\
  \vdots\\
  \alpha_m
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
  \beta_1\\
  \vdots\\
  \beta_k
  \end{pmatrix}
  =
  \begin{pmatrix}
  \alpha_1 \beta_1\\
  \vdots\\
  \alpha_1 \beta_k\\
  \alpha_2 \beta_1\\
  \vdots\\
  \alpha_2 \beta_k\\
  \vdots\\
  \alpha_m \beta_1\\
  \vdots\\
  \alpha_m \beta_k
  \end{pmatrix}
$$

This operation is sometimes referred to specifically as the *Kronecker product*, but for the purposes of this lesson there is little to be gained in distinguishing it from the tensor product.
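The column-vector formula translates directly into a nested loop. Here is a minimal sketch using plain Python lists; the helper name `tensor` is an assumption made for illustration, and numerical libraries typically offer an equivalent Kronecker product routine.

```python
from fractions import Fraction as F

def tensor(v, w):
    """Tensor (Kronecker) product of two vectors given as lists.

    The outer loop runs over the entries of v and the inner loop over the
    entries of w, which reproduces the alphabetical ordering convention.
    """
    return [alpha * beta for alpha in v for beta in w]

v = [F(1, 4), F(3, 4)]
w = [F(2, 3), F(1, 3)]

u = tensor(v, w)
print([str(p) for p in u])  # ['1/6', '1/12', '1/2', '1/4']
```

The result agrees with the independent two-bit state considered earlier.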

The tensor product of two vectors has the important property that it is *bilinear*, which means that it is linear in each of the two arguments separately, assuming that the other argument is fixed.
This property can be expressed through these equations:

$$
  \begin{aligned}
    v \otimes (w_1 + w_2) & = v \otimes w_1 + v \otimes w_2\\[2mm]
    v \otimes (\alpha w) & = \alpha (v \otimes w)
  \end{aligned}
$$

and

$$
  \begin{aligned}
    (v_1 + v_2) \otimes w & = v_1 \otimes w + v_2 \otimes w\\[2mm]
    (\alpha v) \otimes w & = \alpha (v \otimes w)
  \end{aligned}
$$
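These four identities can be spot-checked numerically. The following sketch uses small integer vectors so that all comparisons are exact; the helpers `tensor`, `add`, and `scale` are illustrative and operate on plain lists. A check on one example is of course not a proof, but it helps confirm that the conventions are being applied correctly.

```python
def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [a * b for a in v for b in w]

def add(x, y):
    """Entrywise sum of two vectors of equal length."""
    return [a + b for a, b in zip(x, y)]

def scale(alpha, x):
    """Scalar multiple of a vector."""
    return [alpha * a for a in x]

v1, v2 = [1, 2], [3, 5]
w1, w2 = [3, 4, 5], [6, 7, 8]
alpha = 10

# Linearity in the second argument.
assert tensor(v1, add(w1, w2)) == add(tensor(v1, w1), tensor(v1, w2))
assert tensor(v1, scale(alpha, w1)) == scale(alpha, tensor(v1, w1))

# Linearity in the first argument.
assert tensor(add(v1, v2), w1) == add(tensor(v1, w1), tensor(v2, w1))
assert tensor(scale(alpha, v1), w1) == scale(alpha, tensor(v1, w1))
```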

Having defined independence between two systems in this way, we can now be more precise in defining a *correlation* as a *lack of independence*.
For example, the two bits in the probabilistic state represented by the vector

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
$$

are not independent — because the vector cannot be expressed as a tensor product, as was argued previously — and so they are correlated.
 
Once again, this description generalizes naturally to three or more systems.
If $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, then a probabilistic state of the combined system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ is a *product state* if the associated probability vector takes the form

$$
u = v_1\otimes \cdots \otimes v_n
$$

for probability vectors $v_1,\ldots,v_n$ describing probabilistic states of $\mathsf{X}_1,\ldots,\mathsf{X}_n$.
Here, the definition of the tensor product generalizes in a natural way:

$$
(v_1\otimes \cdots \otimes v_n)(a_1,\ldots,a_n) = v_1(a_1) \cdots v_n(a_n)
$$

for all choices of $a_1\in\Sigma_1, \ldots, a_n\in\Sigma_n$.
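One way to compute an $n$-fold tensor product in code is to apply the two-vector product repeatedly from left to right. The sketch below does this with `functools.reduce` and then checks the entrywise formula above on a small example; `tensor` and `tensor_all` are hypothetical helper names, and the integer vectors are arbitrary test data rather than probability vectors.

```python
from functools import reduce
from itertools import product

def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [x * y for x in v for y in w]

def tensor_all(vectors):
    """Tensor product of a nonempty list of vectors, taken left to right."""
    return reduce(tensor, vectors)

vs = [[1, 2], [3, 4, 5], [6, 7]]
u = tensor_all(vs)

# Index tuples (a1, a2, a3) in alphabetical order, using positions as states.
index_tuples = list(product(range(2), range(3), range(2)))

# Each entry of u is the product of the corresponding entries of the factors.
for position, (a1, a2, a3) in enumerate(index_tuples):
    assert u[position] == vs[0][a1] * vs[1][a2] * vs[2][a3]

print(len(u))  # 12
```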

Similar to the tensor product of just two vectors, the tensor product of three or more vectors is linear in each of the arguments, again assuming that the other arguments are fixed.
In this case, we say that the tensor product of three or more vectors is *multilinear*.

As we did in the case of two systems, we may say that the systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are *independent* when they are in a product state, but the term *mutually independent* is more precise:
there happen to be other notions of independence for three or more systems, such as *pairwise independence*, that we will not be concerned with at this time.

As an important aside, we observe the following expression for tensor products of standard basis vectors:

$$
\vert a \rangle \otimes \vert b \rangle = \vert a,b \rangle
$$ 

(where we have used the typical convention of dropping unnecessary parentheses, rather than writing $\vert (a,b)\rangle$).
Alternatively, using the notation of strings, we have 

$$
\vert a \rangle \otimes \vert b \rangle = \vert ab \rangle.
$$

More generally, for any positive integer $n$ and any classical states $a_1,\ldots,a_n$, we have

$$
\vert a_1 \rangle \otimes \cdots \otimes \vert a_n \rangle = \vert a_1,\ldots,a_n \rangle = \vert a_1 \cdots a_n \rangle.
$$
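This identity can be confirmed directly for small state sets. In the sketch below, `ket(a, states)` is a hypothetical helper that builds the standard basis vector $\vert a\rangle$ for an ordered list of classical states, and the joint state set is ordered alphabetically via `itertools.product`.

```python
from itertools import product

def ket(a, states):
    """Standard basis vector |a>, indexed by the ordered list `states`."""
    return [1 if s == a else 0 for s in states]

def tensor(v, w):
    """Tensor product of two vectors given as lists."""
    return [x * y for x in v for y in w]

sigma = [1, 2, 3]
gamma = [0, 1]
joint = list(product(sigma, gamma))  # alphabetical ordering of pairs

# |a> tensor |b> equals |a,b> for every pair of classical states.
for a, b in joint:
    assert tensor(ket(a, sigma), ket(b, gamma)) == ket((a, b), joint)

print(tensor(ket(2, sigma), ket(1, gamma)))  # [0, 0, 0, 1, 0, 0]
```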

One final remark on tensor products and the Dirac notation is that it is common that the tensor product symbol $\otimes$ is omitted when taking the tensor product of vectors written as kets.
For example, we often write $\vert a\rangle \vert b \rangle$ and $\vert a_1 \rangle \cdots \vert a_n \rangle$ rather than $\vert a \rangle \otimes \vert b \rangle$ and $\vert a_1 \rangle \otimes \cdots \otimes \vert a_n \rangle$, respectively.
This convention captures the idea that the tensor product is, in some sense, the most natural or default way to take the product of two vectors.

### 1.3 Measurements of probabilistic states <a id='multiple-systems-probabilistic-measurement'></a>

Now let us move on to measurements of probabilistic states of multiple systems.
We find that by choosing to view multiple systems together as single systems, we obtain a specification of how measurements must work for multiple systems — assuming that *all* of the systems are measured.

For example, if the probabilistic state of two bits $(\mathsf{X},\mathsf{Y})$ is described by the probability vector

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
$$

then the outcome $(0,0)$ is obtained with probability 1/2 and $(1,1)$ is obtained with probability 1/2, and in each case we update the probability vector description of our knowledge accordingly (so that the probabilistic state becomes $|00\rangle$ or $|11\rangle$, respectively).

Suppose, however, that we choose not to measure *every* system, but instead we just measure some *proper subset* of the systems.
This will result in a measurement outcome for each measurement that is performed, and will also, in general, affect our knowledge of the remaining systems.

#### Partial measurements for two systems

Beginning with two systems, let us suppose (as usual) that $\mathsf{X}$ is a system having classical state set $\Sigma$, $\mathsf{Y}$ is a system having classical state set $\Gamma$, and the two systems $(\mathsf{X},\mathsf{Y})$ together are in some probabilistic state.
We will consider what happens when we just measure $\mathsf{X}$ and do nothing to $\mathsf{Y}$.

First, we know that the probability to observe a particular classical state $a\in\Sigma$ when just $\mathsf{X}$ is measured must be consistent with the probabilities we would obtain had $\mathsf{Y}$ also been measured.
That is, we must have

$$
  \operatorname{Pr}(\mathsf{X} = a) 
  = \sum_{b\in\Gamma} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (a,b) \bigr).
$$

This is the formula for the so-called *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ alone.

This formula makes perfect sense at an intuitive level, in the sense that something very strange would have to happen for it to be false: it would mean that the probabilities of obtaining different outcomes when $\mathsf{X}$ is measured could somehow be influenced simply by whether or not $\mathsf{Y}$ was also measured.
If $\mathsf{Y}$ happened to be in a distant location, for instance, this would allow for superluminal signaling, which we immediately reject based on our understanding of physics.

Now, given the assumption that only $\mathsf{X}$ has been measured and $\mathsf{Y}$ has not, there may in general still exist uncertainty over the classical state of $\mathsf{Y}$.
For this reason, rather than updating our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert a,b\rangle$ for some selection of $a\in\Sigma$ and $b\in\Gamma$, we must update our description so that this uncertainty about $\mathsf{Y}$ is properly reflected.
The following *conditional probability* formula reflects this uncertainty:

$$
\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)
= \frac{\operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (a,b)\bigr)}{\operatorname{Pr}(\mathsf{X} = a)}.
$$

Here, the expression $\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)$ denotes the probability that $\mathsf{Y} = b$ *conditioned* on (or *given* that) $\mathsf{X} = a$.
Note that this expression is only defined if $\operatorname{Pr}(\mathsf{X}=a)$ is nonzero:
if $\operatorname{Pr}(\mathsf{X}=a) = 0$, we obtain the indeterminate form $\frac{0}{0}$.
This is not a problem because if $\operatorname{Pr}(\mathsf{X}=a) = 0$, then we will never observe $a$ as an outcome of a measurement of $\mathsf{X}$, so we need not be concerned with this possibility.

To express these formulas in terms of probability vectors, let us assume that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector $u$, whose indices have been placed in correspondence with the Cartesian product $\Sigma\times\Gamma$.
Measuring just the system $\mathsf{X}$ alone yields each possible outcome with probabilities as follows:

$$
v(a) = \operatorname{Pr}(\mathsf{X} = a) = \sum_{c\in\Gamma} u(a,c).
$$

As was already suggested, the probability vector $v$ defined in this way represents the *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ by itself.
Having obtained a particular outcome $a\in\Sigma$ of the measurement of $\mathsf{X}$, the probabilistic state of $\mathsf{Y}$ is updated according to the formula for conditional probabilities:

$$
w_a(b) = \frac{u(a,b)}{\sum_{c\in\Gamma} u(a,c)}.
$$

In the event that the measurement of $\mathsf{X}$ resulted in the classical state $a$, we therefore update our description of the probabilistic state of the joint system $(\mathsf{X},\mathsf{Y})$ to $\vert a\rangle \otimes w_a$.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  u = 
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix},
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = u(0,0) + u(0,1) = \frac{1}{2} + 0 = \frac{1}{2},\\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = u(1,0) + u(1,1) = 0 + \frac{1}{2} = \frac{1}{2}.
  \end{aligned}
$$

If the measurement outcome is 0, then the resulting probabilistic state $w_0$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_0(0) & = \frac{u(0,0)}{u(0,0) + u(0,1)} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1\\[2mm]
    w_0(1) & = \frac{u(0,1)}{u(0,0) + u(0,1)} = \frac{0}{\frac{1}{2}} = 0.
  \end{aligned}
$$

That is, we have $w_0 = \vert 0 \rangle$.
Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the resulting probabilistic state $w_1$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_1(0) & = \frac{u(1,0)}{u(1,0) + u(1,1)} = \frac{0}{\frac{1}{2}} = 0\\[2mm]
    w_1(1) & = \frac{u(1,1)}{u(1,0) + u(1,1)} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1,
  \end{aligned}
$$

and so $w_1 = \vert 1 \rangle$.

Thus, for this particular example, there is no uncertainty remaining about $\mathsf{Y}$ when $\mathsf{X}$ is measured: if we obtain the outcome 0, we update our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert 0 \rangle \otimes \vert 0 \rangle = \vert 00\rangle$, and if we obtain the outcome 1, we update our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert 1 \rangle \otimes \vert 1 \rangle = \vert 11\rangle$.

On the other hand, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  u = 
  \begin{pmatrix}
    \frac{1}{6}\\[2mm]
    \frac{1}{12}\\[2mm]
    \frac{1}{2}\\[2mm]
    \frac{1}{4}
  \end{pmatrix},
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = u(0,0) + u(0,1) = \frac{1}{6} + \frac{1}{12} = \frac{1}{4} \\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = u(1,0) + u(1,1) = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}.
  \end{aligned}
$$

If the measurement outcome is 0, then the resulting probabilistic state $w_0$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_0(0) & = \frac{u(0,0)}{u(0,0) + u(0,1)} 
    = \frac{\frac{1}{6}}{\frac{1}{6} + \frac{1}{12}} = \frac{2}{3} \\[2mm]
    w_0(1) & = \frac{u(0,1)}{u(0,0) + u(0,1)} 
    = \frac{\frac{1}{12}}{\frac{1}{6} + \frac{1}{12}} = \frac{1}{3},
  \end{aligned}
$$

which is to say that

$$
w_0 = \begin{pmatrix}
\frac{2}{3} \\[2mm]
\frac{1}{3}
\end{pmatrix}.
$$

Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the resulting probabilistic state $w_1$ of $\mathsf{Y}$ is given by

$$
  \begin{aligned}
    w_1(0) & = \frac{u(1,0)}{u(1,0) + u(1,1)} 
    = \frac{\frac{1}{2}}{\frac{1}{2} + \frac{1}{4}} = \frac{2}{3}\\[2mm]
    w_1(1) & = \frac{u(1,1)}{u(1,0) + u(1,1)} 
    = \frac{\frac{1}{4}}{\frac{1}{2} +\frac{1}{4}} = \frac{1}{3},
  \end{aligned}
$$

which is to say that

$$
  w_1 = 
  \begin{pmatrix}
    \frac{2}{3} \\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$

This is not a surprise.
Recall that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example: we have 

$$
u = 
  \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix},
$$

and so naturally the probabilities for the two possible outcomes of the measurement of $\mathsf{X}$ are described by the probability vector

$$
  \begin{pmatrix}
    \frac{1}{4}\\[2mm]
    \frac{3}{4}
  \end{pmatrix}
$$

as we have calculated, and in either case the resulting probabilistic state of $\mathsf{Y}$ is described by the probability vector

$$
  \begin{pmatrix}
    \frac{2}{3}\\[2mm]
    \frac{1}{3}
  \end{pmatrix}.
$$

That is, knowing that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example, we did not really need to go through the trouble of performing the calculations above — but doing so served as both an example and a reality check.
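The two worked examples above can be reproduced with a few lines of code. This is a minimal sketch in which a joint probabilistic state is stored as a dictionary from pairs $(a,b)$ to probabilities; the function name `measure_first` and this representation are assumptions made for illustration.

```python
from fractions import Fraction as F

def measure_first(u, sigma, gamma):
    """Measure X alone, given a joint state u of (X, Y).

    Returns (v, w), where v[a] = Pr(X = a) and w[a][b] = Pr(Y = b | X = a).
    Outcomes with probability zero are omitted from w, because the
    conditional probabilities are undefined for them.
    """
    v = {a: sum(u[(a, c)] for c in gamma) for a in sigma}
    w = {a: {b: u[(a, b)] / v[a] for b in gamma} for a in sigma if v[a] != 0}
    return v, w

bits = [0, 1]
correlated = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}
independent = {(0, 0): F(1, 6), (0, 1): F(1, 12), (1, 0): F(1, 2), (1, 1): F(1, 4)}

for name, u in [("correlated", correlated), ("independent", independent)]:
    v, w = measure_first(u, bits, bits)
    for a in w:
        state_of_y = [str(p) for p in w[a].values()]
        print(name, "| outcome", a, "| probability", v[a], "| state of Y", state_of_y)
```

For the correlated state the conditional state of $\mathsf{Y}$ depends on the outcome, while for the independent state it is the same in both cases.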

The sorts of calculations just described, where the probabilistic state of one system is determined conditioned on another system taking a particular state, can be performed directly using the Dirac notation.
To illustrate how the method works, let us consider a new example where the classical state set of $\mathsf{X}$ is $\Sigma = \{1,2,3\}$, the classical state set of $\mathsf{Y}$ is $\Gamma = \{0,1\}$, and the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is

$$
  u = \frac{1}{2}  \vert 1,0 \rangle
    + \frac{1}{12} \vert 1,1 \rangle
    + \frac{1}{6}  \vert 2,1 \rangle
    + \frac{1}{12} \vert 3,0 \rangle
    + \frac{1}{6}  \vert 3,1 \rangle,
$$

which we may alternatively write as a column vector

$$
  u = 
  \begin{pmatrix}
    \frac{1}{2}\\
    \frac{1}{12}\\
    0\\
    \frac{1}{6}\\
    \frac{1}{12}\\
    \frac{1}{6}
  \end{pmatrix}.
$$

This time let us suppose that the *second* system $\mathsf{Y}$ is measured.
Our goal will be to determine the probabilities of the two possible outcomes (0 and 1), and to calculate what the resulting probabilistic state of $\mathsf{X}$ is for the two outcomes.

Using the bilinearity of the tensor product, and specifically the fact that it is linear in the *first* argument, we may rewrite the vector $u$ as follows:

$$
  u = \biggl( \frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle\biggr)
  \otimes \vert 0\rangle
  + \biggl( \frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle\biggr) \otimes \vert 1\rangle.
$$

What we have done is to isolate the distinct standard basis vectors for the system being measured (which in this example is the second system $\mathsf{Y}$), collecting all of the terms for the first system as is required to do this.
A moment's thought reveals that this is always possible, regardless of what vector we started with.

The probabilities for the two outcomes when $\mathsf{Y}$ is measured are now easily inferred:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{Y} = 0) & = \frac{1}{2} + \frac{1}{12} = \frac{7}{12}\\[2mm]
    \operatorname{Pr}(\mathsf{Y} = 1) & = \frac{1}{12} + \frac{1}{6} + \frac{1}{6} 
    = \frac{5}{12}.
  \end{aligned}
$$

Moreover, the probabilistic state of $\mathsf{X}$, conditioned on each possible outcome, can also be quickly inferred by simply *normalizing* the vectors in parentheses by dividing by the associated probability just calculated, so that these vectors become probability vectors.
That is, conditioned on the measurement of $\mathsf{Y}$ being 0, the probabilistic state of $\mathsf{X}$ becomes

$$
 \frac{\frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle}{\frac{7}{12}}
 = \frac{6}{7} \vert 1 \rangle + \frac{1}{7} \vert 3 \rangle,
$$

and conditioned on the measurement of $\mathsf{Y}$ being 1, the probabilistic state of
$\mathsf{X}$ becomes

$$
  \frac{\frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle}{\frac{5}{12}}
  = \frac{1}{5} \vert 1 \rangle + \frac{2}{5} \vert 2 \rangle + \frac{2}{5} \vert 3 \rangle.
$$
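The same collect-and-normalize procedure can be sketched in code. Here the vector $u$ is stored as a dictionary with one entry per nonzero term, mirroring the Dirac-notation expression; the names `grouped`, `outcome_probs`, and `conditionals` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

# One entry per ket |a,b> with a nonzero coefficient.
u = {
    (1, 0): F(1, 2),
    (1, 1): F(1, 12),
    (2, 1): F(1, 6),
    (3, 0): F(1, 12),
    (3, 1): F(1, 6),
}

# Collect the terms according to the state b of the measured system Y.
grouped = defaultdict(dict)
for (a, b), p in u.items():
    grouped[b][a] = p

# The probability of each outcome is the sum of the collected coefficients.
outcome_probs = {b: sum(terms.values()) for b, terms in grouped.items()}

# Normalizing each collected vector gives the conditional state of X.
conditionals = {
    b: {a: p / outcome_probs[b] for a, p in terms.items()}
    for b, terms in grouped.items()
}

for b in sorted(grouped):
    state_of_x = {a: str(p) for a, p in conditionals[b].items()}
    print("Y =", b, "| probability", outcome_probs[b], "| state of X", state_of_x)
```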

#### Partial measurements for three or more systems

The preceding discussion and technique can be generalized to three or more systems by grouping the systems into two categories: those systems that are measured and those that are not.
This effectively reduces partial measurements for three or more systems to the case where one of two systems is measured.

For example, here is a probabilistic state of 5 systems $\mathsf{X}_1,\ldots,\mathsf{X}_5$, all sharing the same classical state set $\{\clubsuit,\diamondsuit,\heartsuit,\spadesuit\}$:

$$
\begin{gathered}
\frac{1}{7} 
\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle
+
\frac{2}{7} 
\vert\diamondsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\clubsuit\rangle
+
\frac{1}{7} 
\vert\spadesuit\rangle \vert\spadesuit\rangle \vert\diamondsuit\rangle \vert\diamondsuit\rangle \vert\clubsuit\rangle
\\
+
\frac{2}{7} 
\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\heartsuit\rangle \vert\heartsuit\rangle
+
\frac{1}{7} 
\vert\spadesuit\rangle \vert\heartsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\clubsuit\rangle.
\end{gathered}
$$

In this example, we've omitted the tensor product symbols; they're implicit between the kets, as was suggested above. We will consider the situation in which the first and third systems are measured, and the remaining systems are left alone. 

Conceptually speaking, we can simply imagine that the first and third systems form a single compound system that gets measured, while the remaining systems form a second compound system that is not measured, and then follow the prescription described previously for determining the probabilities for the different outcomes as well as the probabilistic states of the remaining systems conditioned on each possible outcome.
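In code, this grouping amounts to splitting each classical state into the entries that are measured and the entries that are not. The following sketch applies that idea to the five-system example; the suits are abbreviated as single letters, and the helper `split` and the other names are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction as F

# Suits abbreviated as C (clubs), D (diamonds), H (hearts), S (spades).
u = {
    ("H", "C", "D", "S", "S"): F(1, 7),
    ("D", "C", "D", "S", "C"): F(2, 7),
    ("S", "S", "D", "D", "C"): F(1, 7),
    ("H", "C", "D", "H", "H"): F(2, 7),
    ("S", "H", "D", "S", "C"): F(1, 7),
}

measured = (0, 2)  # zero-based positions of the first and third systems

def split(state, measured):
    """Split a tuple into (measured entries, remaining entries)."""
    kept = tuple(state[k] for k in measured)
    rest = tuple(s for k, s in enumerate(state) if k not in measured)
    return kept, rest

# Collect terms according to the state of the measured compound system.
grouped = defaultdict(dict)
for state, p in u.items():
    kept, rest = split(state, measured)
    grouped[kept][rest] = grouped[kept].get(rest, 0) + p

outcome_probs = {kept: sum(terms.values()) for kept, terms in grouped.items()}
conditionals = {
    kept: {rest: p / outcome_probs[kept] for rest, p in terms.items()}
    for kept, terms in grouped.items()
}

for kept, prob in outcome_probs.items():
    print("outcome", kept, "with probability", prob)
```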

Unfortunately, given that the systems that are measured are interspersed with the ones that are not, we face a hurdle in writing down the expressions needed to perform these calculations.
One way to proceed is to subscript the kets to indicate which systems they refer to, and to give ourselves the freedom to change their ordering, as we will now describe.

First, the probability vector above can alternatively be written as

$$
\begin{gathered}
\frac{1}{7} 
\vert\heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+
\frac{2}{7} 
\vert\diamondsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5
+
\frac{1}{7} 
\vert\spadesuit\rangle_1 \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5\\
+
\frac{2}{7} 
\vert\heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
+
\frac{1}{7} 
\vert\spadesuit\rangle_1 \vert\heartsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5.
\end{gathered}
$$

Nothing here has changed except that each ket now has a subscript indicating which system it corresponds to.
Here we have used the subscripts $1,\ldots,5$, but the names of the systems themselves could also be used (in a situation where we have system names such as $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$, for instance).

We can then re-order the kets and collect terms as follows:

$$
\begin{aligned}
& 
\frac{1}{7} 
\vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+
\frac{2}{7} 
\vert\diamondsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5
+
\frac{1}{7} 
\vert\spadesuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5 \\
& \quad +
\frac{2}{7} 
\vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
+
\frac{1}{7} 
\vert\spadesuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\heartsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5\\[2mm]
& \hspace{1.5cm} = \vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 
\biggl(
\frac{1}{7} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+ \frac{2}{7} \vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
\biggr)
+ \vert\diamondsuit\rangle_1 \vert\diamondsuit\rangle_3 
\biggl(
\frac{2}{7} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5 
\biggr)\\
& \hspace{1.5cm} \quad + \vert\spadesuit\rangle_1 \vert\diamondsuit\rangle_3
\biggl(
\frac{1}{7} \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5
+ \frac{1}{7} \vert\heartsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5\biggr).
\end{aligned}
$$

(The tensor products are still implicit, even when parentheses are used as in this example.)

We now see that if the systems $\mathsf{X}_1$ and $\mathsf{X}_3$ are measured, the (nonzero) probabilities of the different outcomes are as follows:

$$
\begin{aligned}
\operatorname{Pr}\bigl((\mathsf{X}_1,\mathsf{X}_3) = (\heartsuit,\diamondsuit)\bigr)
& = \frac{1}{7} + \frac{2}{7} = \frac{3}{7}\\[2mm]
\operatorname{Pr}\bigl((\mathsf{X}_1,\mathsf{X}_3) = (\diamondsuit,\diamondsuit)\bigr)
& = \frac{2}{7}\\[2mm]
\operatorname{Pr}\bigl((\mathsf{X}_1,\mathsf{X}_3) = (\spadesuit,\diamondsuit)\bigr)
& = \frac{1}{7} + \frac{1}{7} = \frac{2}{7}.
\end{aligned}
$$

Conditioned on the event that $(\mathsf{X}_1,\mathsf{X}_3) = (\heartsuit,\diamondsuit)$, for instance, we have that the remaining systems $\mathsf{X}_2$, $\mathsf{X}_4$, and $\mathsf{X}_5$ are in the probabilistic state

$$
\frac{
\frac{1}{7} 
\vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+ 
\frac{2}{7} 
\vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5}
{\frac{3}{7}}
= 
\frac{1}{3} 
\vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+ 
\frac{2}{3} 
\vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5.
$$

The probabilistic state of $\mathsf{X}_2$, $\mathsf{X}_4$, and $\mathsf{X}_5$ for the other possible measurement outcomes can be determined in an analogous way.
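As a check on the calculation above, here is a minimal Python sketch that groups the terms of the probability vector by the classical states of the measured systems. The dictionary representation, the suit names written as strings, and the function name `partial_measurement` are our own choices for illustration and are not part of the lesson's notation.

```python
from collections import defaultdict
from fractions import Fraction

# Joint probabilistic state of (X1, ..., X5): tuple of classical states -> probability.
C, D, H, S = "clubs", "diamonds", "hearts", "spades"
state = {
    (H, C, D, S, S): Fraction(1, 7),
    (D, C, D, S, C): Fraction(2, 7),
    (S, S, D, D, C): Fraction(1, 7),
    (H, C, D, H, H): Fraction(2, 7),
    (S, H, D, S, C): Fraction(1, 7),
}

def partial_measurement(state, measured):
    """Return outcome probabilities and conditional states of the unmeasured systems.

    `measured` lists the 0-based positions of the systems that are measured.
    """
    grouped = defaultdict(dict)
    for labels, p in state.items():
        outcome = tuple(labels[i] for i in measured)
        rest = tuple(x for i, x in enumerate(labels) if i not in measured)
        grouped[outcome][rest] = grouped[outcome].get(rest, 0) + p
    # The probability of an outcome is the sum of the probabilities in its group.
    probs = {o: sum(r.values()) for o, r in grouped.items()}
    # Conditional states are obtained by dividing each group by its total probability.
    conditional = {o: {k: p / probs[o] for k, p in r.items()} for o, r in grouped.items()}
    return probs, conditional

probs, conditional = partial_measurement(state, measured=(0, 2))  # X1 and X3
for outcome, p in probs.items():
    print(outcome, p, conditional[outcome])
```

Positions 0 and 2 correspond to $\mathsf{X}_1$ and $\mathsf{X}_3$. Because each term carries the position of every system with it, no re-ordering of kets is needed in this representation.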

Now, it must be understood that the tensor product is not commutative: if $u$ and $v$ are vectors, then $u\otimes v$ is different from $v\otimes u$ in general.
For instance, 
$\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle$
is a different vector than 
$\vert\heartsuit\rangle \vert\diamondsuit\rangle \vert\clubsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle$.
The technique just described of re-ordering kets should not be interpreted as suggesting otherwise.
Rather, for the sake of performing calculations and expressing the results, we are simply making a decision that it is more convenient to collect the systems $\mathsf{X}_1,\ldots,\mathsf{X}_5$ together as $(\mathsf{X}_1,\mathsf{X}_3,\mathsf{X}_2,\mathsf{X}_4,\mathsf{X}_5)$ rather than $(\mathsf{X}_1,\mathsf{X}_2,\mathsf{X}_3,\mathsf{X}_4,\mathsf{X}_5)$.
The subscripts on the kets serve to keep the order straight.

Analogously, in the closely related but simpler setting of Cartesian products and ordered pairs, if $a$ and $b$ are different classical states, then $(a,b)$ and $(b,a)$ are also different.
Nevertheless, saying that the classical state of two bits $(\mathsf{X},\mathsf{Y})$ is $(1,0)$ is equivalent to
saying that the classical state of $(\mathsf{Y},\mathsf{X})$ is $(0,1)$; when every system has its own unique name, it doesn't really matter what order we choose to list them, so long as the ordering is made clear.


### 1.4 Operations on probabilistic states <a id='multiple-systems-probabilistic-operations'></a>

To conclude this discussion of classical information for multiple systems, we will consider operations on multiple systems that are in probabilistic states.
Similar to measurements, we can view multiple systems collectively as forming single, compound systems and look to the previous lesson on single systems to see how this works.

Returning to the typical set-up where we have two systems $\mathsf{X}$ and $\mathsf{Y}$ having classical state sets $\Sigma$ and $\Gamma$, for instance, we can consider classical operations on the joint system $(\mathsf{X},\mathsf{Y})$.
Based on the previous lesson and the discussion above, we conclude that any such operation is represented by a stochastic matrix whose rows and columns are indexed by the Cartesian product $\Sigma\times\Gamma$.

For example, suppose that $\mathsf{X}$ and $\mathsf{Y}$ are bits, and consider an operation with the following description:

<p style="padding-left: 5em; padding-right: 5em;">
   If $\mathsf{X} = 1$, then perform a NOT operation on 
   $\mathsf{Y}$, otherwise do nothing.
</p>

This is a deterministic operation known as a *controlled-NOT* operation, where $\mathsf{X}$ is the *control* bit that determines whether or not a NOT operation is applied to the *target* bit $\mathsf{Y}$.
Here is the matrix representation of this operation:

$$
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0
\end{pmatrix}
$$

Its action on standard basis states is as follows:

$$
\begin{aligned}
\vert 00 \rangle & \mapsto \vert 00 \rangle\\
\vert 01 \rangle & \mapsto \vert 01 \rangle\\
\vert 10 \rangle & \mapsto \vert 11 \rangle\\
\vert 11 \rangle & \mapsto \vert 10 \rangle
\end{aligned}
$$
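The matrix of a deterministic operation can be built directly from the function it computes. The following sketch does this for the controlled-NOT operation, assuming the ordering $00, 01, 10, 11$ of the classical states; the helper name `cnot` is ours.

```python
from itertools import product

# Classical states of (X, Y) in the order 00, 01, 10, 11.
states = list(product([0, 1], repeat=2))

def cnot(x, y):
    """Flip the target bit y exactly when the control bit x is 1."""
    return (x, y ^ x)

# Column (x, y) holds a single 1, in the row indexed by cnot(x, y).
matrix = [[1 if cnot(*col) == row else 0 for col in states] for row in states]

for row in matrix:
    print(row)
```

Each column contains exactly one 1, which is what it means for a stochastic matrix to represent a deterministic operation.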

Another example is the operation having this description:

<p style="padding-left: 5em; padding-right: 5em;">
    With probability 1/2, set $\mathsf{Y}$ to be equal to $\mathsf{X}$, 
    otherwise do nothing.
</p>

The matrix representation of this operation is as follows:

$$
\begin{pmatrix}
1 & \frac{1}{2} & 0 & 0\\[2mm]
0 & \frac{1}{2} & 0 & 0\\[2mm]
0 & 0 & \frac{1}{2} & 0\\[2mm]
0 & 0 & \frac{1}{2} & 1
\end{pmatrix}
=
\frac{1}{2}
\begin{pmatrix}
1 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 1 & 1
\end{pmatrix}
+
\frac{1}{2}
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 0 & 0 & 1
\end{pmatrix}
$$

The action of this operation on standard basis vectors is as follows:
$$
\begin{aligned}
\vert 00 \rangle & \mapsto \vert 00 \rangle\\[1mm]
\vert 01 \rangle & \mapsto \frac{1}{2} \vert 00 \rangle + \frac{1}{2}\vert 01\rangle\\[1mm]
\vert 10 \rangle & \mapsto \frac{1}{2} \vert 11 \rangle + \frac{1}{2}\vert 10\rangle\\[1mm]
\vert 11 \rangle & \mapsto \vert 11 \rangle
\end{aligned}
$$
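The decomposition of this matrix into an average of two deterministic operations can be checked with a short sketch. The variable names are ours, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Rows and columns are ordered 00, 01, 10, 11 (X first, then Y).
copy_x_to_y = [[1, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 1, 1]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]

# Convex combination: each of the two operations is applied with probability 1/2.
mixed = [[half * copy_x_to_y[i][j] + half * identity[i][j] for j in range(4)]
         for i in range(4)]

# The image of each standard basis state is the corresponding column.
labels = ["00", "01", "10", "11"]
for j, source in enumerate(labels):
    image = {labels[i]: mixed[i][j] for i in range(4) if mixed[i][j] != 0}
    print(source, "->", image)

column_sums = [sum(mixed[i][j] for i in range(4)) for j in range(4)]
```

Every column sums to 1, as it must for a stochastic matrix, and the states $00$ and $11$ are left unchanged because $\mathsf{Y}$ already equals $\mathsf{X}$ in those cases.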


In both examples, we are simply viewing two systems together as a single system and proceeding as in the previous lesson.

The same thing can be done for any number of systems.
For example, suppose that we have three bits and we increment them modulo $8$, meaning that we view the three bits as the binary encoding of a number between $0$ and $7$, add $1$, and take the remainder after dividing by $8$.
There are several ways to write this operation.
One is as a matrix, with rows and columns ordered $000, 001, \ldots, 111$:

$$
\begin{pmatrix}
  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
  1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
  0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
  0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$

A second is in Dirac notation:

$$
\begin{aligned}
  & \vert 001 \rangle \langle 000 \vert
    + \vert 010 \rangle \langle 001 \vert
    + \vert 011 \rangle \langle 010 \vert
    + \vert 100 \rangle \langle 011 \vert\\[1mm]
  & \quad + \vert 101 \rangle \langle 100 \vert
    + \vert 110 \rangle \langle 101 \vert
    + \vert 111 \rangle \langle 110 \vert
    + \vert 000 \rangle \langle 111 \vert
\end{aligned}
$$

A third, assuming that each number $k\in\{0,1,\ldots,7\}$ inside a ket or bra refers to the three-bit binary encoding of $k$, is

$$
\sum_{k = 0}^{7} \vert (k+1) \bmod 8 \rangle \langle k \vert.
$$
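The sum over $k$ translates almost literally into code. The following sketch builds the matrix for incrementing modulo $8$ and applies it to one standard basis vector; the helper name `apply` is ours.

```python
n = 8  # number of classical states of three bits

# Entry (row, col) is 1 exactly when row == (col + 1) mod 8.
increment = [[1 if row == (col + 1) % n else 0 for col in range(n)] for row in range(n)]

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# The standard basis vector for 111 (the number 7) is mapped to the one for 000.
e7 = [1 if k == 7 else 0 for k in range(n)]
print(apply(increment, e7))
```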


#### Independent operations

Now suppose that we have multiple systems and we perform separate operations on these separate systems.

For example, taking our usual set-up of two systems $\mathsf{X}$ and $\mathsf{Y}$ having classical state sets $\Sigma$ and $\Gamma$, respectively, let us suppose that we perform one operation on $\mathsf{X}$ and, completely independently, another operation on $\mathsf{Y}$.
As we know from the previous lesson, these operations are represented by stochastic matrices — and to be precise, let us say that the operation on $\mathsf{X}$ is represented by the matrix $M$ and the operation on $\mathsf{Y}$ is represented by the matrix $N$.
Thus, the rows and columns of $M$ correspond to the elements of $\Sigma$ and the rows and columns of $N$ correspond to the elements of $\Gamma$.
A natural question to ask is this: if we view $\mathsf{X}$ and $\mathsf{Y}$ together as a single, compound system $(\mathsf{X},\mathsf{Y})$, what is the matrix representation of the combined action of the two operations on this compound system?

The answer to this question is that the combined action is represented by the tensor product $M\otimes N$.
(Tensor products represent *independence*, this time between operations.)
Here the tensor product is between two matrices rather than two vectors, but the definition is analogous: the matrix $M\otimes N$ has rows and columns indexed by the Cartesian product $\Sigma\times\Gamma$, and is defined as follows:

$$
(M\otimes N)((a,b),(c,d)) = M(a,c) N(b,d).
$$

An alternative, but equivalent, way to describe $M\otimes N$ is that it is the unique matrix that satisfies the equation

$$
(M \otimes N) (v\otimes w) = (M v) \otimes (N w)
$$

for every possible choice of vectors $v$ and $w$, where the entries of $v$ correspond to the elements of $\Sigma$ and the entries of $w$ correspond to $\Gamma$.
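Both descriptions can be tried out numerically. The sketch below implements the entry-wise definition and then checks the identity $(M\otimes N)(v\otimes w) = (Mv)\otimes(Nw)$ for one choice of matrices and vectors. The particular values of `M`, `N`, `v`, and `w` are arbitrary examples of ours, and a single check like this illustrates the identity rather than proving it.

```python
from fractions import Fraction

def kron(M, N):
    """Tensor product of matrices given as lists of rows.

    Entry ((a, b), (c, d)) is M[a][c] * N[b][d], with pairs ordered (0,0), (0,1), ...
    """
    return [[M[a][c] * N[b][d] for c in range(len(M[0])) for d in range(len(N[0]))]
            for a in range(len(M)) for b in range(len(N))]

def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]

def kron_vec(v, w):
    return [x * y for x in v for y in w]

half = Fraction(1, 2)
M = [[1, half], [0, half]]            # a stochastic matrix on one bit
N = [[0, 1], [1, 0]]                  # the NOT operation
v = [Fraction(1, 4), Fraction(3, 4)]  # probability vector for X
w = [Fraction(2, 3), Fraction(1, 3)]  # probability vector for Y

lhs = matvec(kron(M, N), kron_vec(v, w))
rhs = kron_vec(matvec(M, v), matvec(N, w))
print(lhs == rhs, lhs)
```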

(*** Explain why the two ways of thinking about tensor products of matrices are equivalent? Could also include an exercise to this effect.)

Following the convention described previously for ordering the elements of Cartesian products, we can write the tensor product of two matrices explicitly as follows:

$$
\begin{gathered}
  \begin{pmatrix}
    \alpha_{1,1} & \cdots & \alpha_{1,m} \\
    \vdots & \ddots & \vdots \\
    \alpha_{m,1} & \cdots & \alpha_{m,m}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    \beta_{1,1} & \cdots & \beta_{1,k} \\
    \vdots & \ddots & \vdots\\
    \beta_{k,1} & \cdots & \beta_{k,k}
  \end{pmatrix}
  \hspace{6cm}\\[2mm]
  \hspace{1cm}
 =
  \begin{pmatrix}
    \alpha_{1,1}\beta_{1,1} & \cdots & \alpha_{1,1}\beta_{1,k} & & 
    \alpha_{1,m}\beta_{1,1} & \cdots & \alpha_{1,m}\beta_{1,k} \\
    \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\
    \alpha_{1,1}\beta_{k,1} & \cdots & \alpha_{1,1}\beta_{k,k} & & 
    \alpha_{1,m}\beta_{k,1} & \cdots & \alpha_{1,m}\beta_{k,k} \\[2mm]
    & \vdots & & \ddots & & \vdots & \\[2mm]
    \alpha_{m,1}\beta_{1,1} & \cdots & \alpha_{m,1}\beta_{1,k} & & 
    \alpha_{m,m}\beta_{1,1} & \cdots & \alpha_{m,m}\beta_{1,k} \\
    \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\
    \alpha_{m,1}\beta_{k,1} & \cdots & \alpha_{m,1}\beta_{k,k} & & 
    \alpha_{m,m}\beta_{k,1} & \cdots & \alpha_{m,m}\beta_{k,k}
  \end{pmatrix}
\end{gathered}
$$

For example, let us recall the probabilistic operation on a single bit from the previous lesson:
if the classical state of the bit is 0, it is left alone; and if the classical state of the bit is 1, it is flipped to 0 with probability $1/2$.
As we observed, this operation is represented by the matrix

$$
  \begin{pmatrix}
    1 & \frac{1}{2}\\[1mm]
    0 & \frac{1}{2}
  \end{pmatrix}.
$$

If this operation is performed on a bit $\mathsf{X}$, and a NOT operation is (independently) performed on a second bit $\mathsf{Y}$, then the joint operation on the compound system $(\mathsf{X},\mathsf{Y})$ has the matrix representation

$$
  \begin{pmatrix}
    1 & \frac{1}{2}\\[1mm]
    0 & \frac{1}{2}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    0 & 1\\[1mm]
    1 & 0
  \end{pmatrix}
  =
  \begin{pmatrix}
    0 & 1 & 0 & \frac{1}{2} \\[1mm]
    1 & 0 & \frac{1}{2} & 0 \\[1mm]
    0 & 0 & 0 & \frac{1}{2} \\[1mm]
    0 & 0 & \frac{1}{2} & 0
  \end{pmatrix}.
$$

By inspection, we see that this is a stochastic matrix.
This will always be the case: the tensor product of two or more stochastic matrices is always stochastic.
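The reason can be read off from the entry-wise definition of the tensor product given earlier. Every entry of $M\otimes N$ is a product of two nonnegative real numbers, and the entries in the column indexed by any pair $(c,d)$ sum to

$$
\sum_{(a,b)\in\Sigma\times\Gamma} M(a,c) N(b,d)
= \biggl(\sum_{a\in\Sigma} M(a,c)\biggr)\biggl(\sum_{b\in\Gamma} N(b,d)\biggr)
= 1\cdot 1 = 1.
$$

The case of three or more stochastic matrices follows by applying the same argument repeatedly.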

A common situation that we encounter is one in which one operation is performed on one system and *nothing* is done to another.
In such a case, exactly the same prescription is followed, noting that *doing nothing* is represented by the identity matrix.
For example, resetting the bit $\mathsf{X}$ to the 0 state and doing nothing to $\mathsf{Y}$ yields the probabilistic (and in fact deterministic) operation on $(\mathsf{X},\mathsf{Y})$ represented by the matrix

$$
  \begin{pmatrix}
    1 & 1\\[1mm]
    0 & 0
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    1 & 0\\[1mm]
    0 & 1
  \end{pmatrix}
  =
  \begin{pmatrix}
    1 & 0 & 1 & 0 \\[1mm]
    0 & 1 & 0 & 1 \\[1mm]
    0 & 0 & 0 & 0 \\[1mm]
    0 & 0 & 0 & 0
  \end{pmatrix}.
$$

## 2. Quantum information <a id='multiple-systems-quantum-info'></a>

We are now prepared to move on to quantum information in the setting of multiple systems.
Much like the previous lesson on single systems, the mathematical description of quantum information for multiple systems is quite similar to the probabilistic case and makes use of similar concepts and techniques.

### 2.1 Quantum states <a id='multiple-systems-quantum-states'></a>

Multiple systems can be viewed collectively as single, compound systems;
we already observed this in the probabilistic setting and the quantum setting is analogous.
Quantum states of multiple systems are therefore represented by column vectors having complex number entries and Euclidean norm equal to 1 — just like quantum states of single systems — but this time the indices of the quantum state vectors are placed in correspondence with the Cartesian product of the classical state sets associated with each of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are qubits, so that their classical state sets are both equal to the binary alphabet $\{0,1\}$, then the classical state set of the pair of qubits $(\mathsf{X},\mathsf{Y})$, viewed collectively as a single system, is given by the Cartesian product $\{0,1\}\times\{0,1\}$ — and by representing pairs of binary values as binary strings of length 2, we may associate this Cartesian product set with the set 
$\{00,01,10,11\}$.
The following vectors, expressed in Dirac notation, are therefore all examples of quantum state vectors of the pair $(\mathsf{X},\mathsf{Y})$:

$$
 \frac{1}{2} \vert 00 \rangle
 + \frac{i}{2} \vert 01\rangle
 - \frac{1}{2} \vert 10\rangle
 - \frac{i}{2} \vert 11\rangle, \quad
 \frac{1}{\sqrt{2}} \vert 00\rangle 
   + \frac{1}{\sqrt{2}} \vert 11\rangle, \quad \text{and} \quad
 \vert 01 \rangle.
$$


#### Tensor products of quantum state vectors

Similar to the probabilistic case, tensor products of quantum state vectors are also quantum state vectors.

Suppose first that $v$ is a quantum state vector of a system $\mathsf{X}$ having classical state set $\Sigma$
and $w$ is a quantum state vector of a system $\mathsf{Y}$ having classical state set $\Gamma$.
The indices of the vector $v$ therefore correspond to the elements of $\Sigma$ while the indices of $w$ correspond to
$\Gamma$.
The tensor product $v\otimes w$ is then a quantum state vector of the joint system $(\mathsf{X},\mathsf{Y})$.

As in the probabilistic setting, we refer to a state of this form as a *product state*.
Once again, it represents a state of *independence* between the systems $\mathsf{X}$ and $\mathsf{Y}$.
Intuitively speaking, we may think of the systems $(\mathsf{X},\mathsf{Y})$ being in a product state $v\otimes w$ as if $\mathsf{X}$ is in the quantum state $v$, $\mathsf{Y}$ is in the quantum state $w$, and the states of the two systems have nothing to do with one another.

The fact that the tensor product vector $v\otimes w$ is indeed a quantum state vector is consistent with the Euclidean norm being *multiplicative* with respect to tensor products:

$$
\begin{aligned}
  \| v\otimes w \| 
  & = \sqrt{ \sum_{(a,b)\in\Sigma\times\Gamma} \vert (v\otimes w)(a,b) \vert^2} \\[1mm]
  & = \sqrt{ \sum_{a\in\Sigma} \sum_{b\in\Gamma} \vert v(a) w(b) \vert^2} \\[1mm]
  & = \sqrt{ \biggl(\sum_{a\in\Sigma} \vert v(a) \vert^2\biggr)\biggl(\sum_{b\in\Gamma} \vert w(b) \vert^2\biggr)} 
  \\[1mm]
  & = \| v\| \| w\|.
\end{aligned}
$$

Thus, because $v$ and $w$ are quantum state vectors, we have $\|v\| = 1$ and $\|w\| = 1$, and therefore
$\|v\otimes w\| = 1$, so $v\otimes w$ is also a quantum state vector.

More generally, if $v_1,\ldots,v_n$ are quantum state vectors of systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$, then
$v_1\otimes\cdots\otimes v_n$ is a quantum state vector representing a *product state* of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$.
Again, we know that $v_1\otimes\cdots\otimes v_n$ must be a quantum state vector because

$$
\| v_1\otimes\cdots\otimes v_n \| = \|v_1\| \cdots \|v_n\| = 1^n = 1.
$$
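The multiplicativity of the Euclidean norm is easy to observe numerically. In the following sketch the three unit vectors are arbitrary examples of ours, and the comparison is made up to floating-point rounding.

```python
import math

def kron_vec(v, w):
    """Tensor product of two vectors given as lists of entries."""
    return [x * y for x in v for y in w]

def norm(v):
    """Euclidean norm of a vector with complex entries."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

s = 1 / math.sqrt(2)
v = [s, -s]        # (|0> - |1>)/sqrt(2)
w = [s, 1j * s]    # (|0> + i|1>)/sqrt(2)
u = [0.6, -0.8j]   # another unit vector, used for a three-fold product

print(norm(kron_vec(v, w)), norm(v) * norm(w))
print(norm(kron_vec(kron_vec(v, w), u)))
```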

#### Entangled systems

Just like in the probabilistic setting, not all quantum state vectors of two or more systems, considered as a single joint system, are product states.
For example, the quantum state vector

$$
  \frac{1}{\sqrt{2}} \vert 00\rangle + \frac{1}{\sqrt{2}} \vert 11\rangle \label{eq:phi-plus} \tag{2.1}
$$

is not a product state.
To see this, we may follow exactly the same argument that we used to prove that the probabilistic state represented by the vector

$$
  \frac{1}{2} \vert 00\rangle + \frac{1}{2} \vert 11\rangle,
$$

which is the same vector as in $(1.1)$ but expressed using the Dirac notation, is not a product vector.
That is, if there were qubit quantum state vectors $v$ and $w$ for which

$$
  v\otimes w = \frac{1}{\sqrt{2}} \vert 00\rangle + \frac{1}{\sqrt{2}} \vert 11\rangle,
$$

then it would necessarily be the case that $(v\otimes w)(0,1) = v(0) w(1) = 0$, implying that $v(0) = 0$ or $w(1) = 0$
(or both), contradicting the observation that $(v\otimes w)(0,0) = v(0) w(0)$ and $(v\otimes w)(1,1) = v(1)w(1)$ are both nonzero.
(Specifically, both quantities are $1/\sqrt{2}$ in this case, but what is important for the sake of the argument is that both quantities are nonzero.)

Thus, the quantum state vector $(2.1)$ represents a *correlation* between two systems, and specifically we say that the systems are *entangled*.
Entanglement is a quintessential feature of quantum information that will be discussed in much greater detail in subsequent lessons.
Entanglement can be complicated, particularly for the sorts of noisy quantum states that can be described in the general, density matrix formulation of quantum information that was mentioned in Lesson 1 — but for quantum state vectors in the simplified formulation that we are focusing on in this unit, entanglement is equivalent to correlation.
That is, any quantum state vector that is not a product vector represents an entangled state.

In contrast, the quantum state vector

$$
   \frac{1}{2} \vert 00 \rangle
 + \frac{i}{2} \vert 01\rangle
 - \frac{1}{2} \vert 10\rangle
 - \frac{i}{2} \vert 11\rangle,
$$

which was the first example of a quantum state of two qubits mentioned above, is a product state:
we have

$$
  \biggl( \frac{1}{\sqrt{2}} \vert 0\rangle - \frac{1}{\sqrt{2}} \vert 1\rangle\biggr)
  \otimes 
  \biggl( \frac{1}{\sqrt{2}} \vert 0\rangle + \frac{i}{\sqrt{2}} \vert 1\rangle\biggr)
  = \frac{1}{2} \vert 00 \rangle
  + \frac{i}{2} \vert 01\rangle
  - \frac{1}{2} \vert 10\rangle
  - \frac{i}{2} \vert 11\rangle.
$$
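For two qubits there is a convenient test that is not used in the text above but follows from basic linear algebra: arranging the four entries of a vector $u$ as a $2\times 2$ matrix, $u$ is a product vector exactly when that matrix has rank at most one, which is to say when $u(0,0)\,u(1,1) - u(0,1)\,u(1,0) = 0$. The sketch below applies this test to the two examples just discussed; the function name and the tolerance are our own choices.

```python
import math

def is_product_state(u, tol=1e-12):
    """u lists the amplitudes of 00, 01, 10, 11 of a two-qubit state vector."""
    u00, u01, u10, u11 = u
    # Determinant of the 2x2 matrix of amplitudes; zero means rank at most one.
    return abs(u00 * u11 - u01 * u10) < tol

s = 1 / math.sqrt(2)
phi_plus = [s, 0, 0, s]
first_example = [0.5, 0.5j, -0.5, -0.5j]

print(is_product_state(phi_plus))       # False
print(is_product_state(first_example))  # True
```

This shortcut is specific to pairs of qubits; for larger systems the same idea works with the rank of the matrix of amplitudes rather than a single determinant.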

#### Examples of multiple qubit states

Let's take a look at a few more examples of quantum states of multiple-qubit systems.

The following four two-qubit states, which are conventionally named $\vert \phi^+ \rangle$, $\vert \phi^- \rangle$, 
$\vert \psi^+ \rangle$, and $\vert \psi^- \rangle$, are known as the *Bell states*, named after John Bell. (*** Link info on John Bell.)

$$
\begin{aligned}
  \vert \phi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \\[1mm]
  \vert \phi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \\[1mm]
  \vert \psi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 1 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1\rangle \vert 0 \rangle \\[1mm]
  \vert \psi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 1 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1\rangle \vert 0 \rangle
\end{aligned}
$$

There are a few alternative ways to express these vectors.
For example, focusing on just the first state $\vert \phi^+\rangle$, we may use the fact that 
$\vert a\rangle \vert b\rangle = \vert ab\rangle$ (for any classical states $a$ and $b$) to instead write

$$
\vert \phi^+ \rangle = \frac{1}{\sqrt{2}} \vert 00 \rangle + \frac{1}{\sqrt{2}} \vert 11 \rangle.
$$

Thus, $\vert\phi^+\rangle$ is the same quantum state vector that we just encountered above, but now we see it as just one member of this very important collection.

Alternatively, we may choose to write the tensor product symbol explicitly like this:

$$
\vert \phi^+ \rangle 
= \frac{1}{\sqrt{2}} \vert 0\rangle\otimes\vert 0 \rangle + \frac{1}{\sqrt{2}} \vert 1\rangle\otimes \vert 1 \rangle.
$$

Presuming that $\vert \phi^+ \rangle$ is being viewed as a quantum state of two qubits named $\mathsf{X}$ and $\mathsf{Y}$, we may subscript the kets to indicate which ones correspond to each of these two qubits, like this:

$$
\vert \phi^+ \rangle 
= \frac{1}{\sqrt{2}} \vert 0\rangle_{\mathsf{X}} \vert 0 \rangle_{\mathsf{Y}} 
+ \frac{1}{\sqrt{2}} \vert 1\rangle_{\mathsf{X}} \vert 1 \rangle_{\mathsf{Y}}.
$$                         

Naturally, different names for these qubits could be chosen and used as subscripts in the same way.

Finally, following exactly the same convention discussed previously for ordering Cartesian products, we may write the vector $\vert\phi^+\rangle$ explicitly as a column vector:

$$
\vert \phi^+ \rangle = 
\begin{pmatrix}
  \frac{1}{\sqrt{2}}\\
  0\\
  0\\
  \frac{1}{\sqrt{2}}
\end{pmatrix}.
$$

Depending upon the context in which it appears, one of these expressions may be preferred — but they are all equivalent in the sense that they refer to the same vector.
Analogous expressions may be used for the other three Bell states.


Notice that the same argument that establishes that $\vert\phi^+\rangle$ is not a product state reveals that none of the other Bell states is a product state either — all four of the Bell states represent entanglement between two qubits.

The collection of all four Bell states 
$\{\vert \phi^+ \rangle, \vert \phi^- \rangle, \vert \psi^+ \rangle, \vert \psi^- \rangle\}$ 
is known as the *Bell basis*; any quantum state vector of two qubits, or indeed any complex vector at all having entries corresponding to the four classical states of two bits, can be expressed as a linear combination of the four Bell states.
For example,

$$
\vert 0 \rangle \vert 0 \rangle
= \frac{1}{\sqrt{2}} \vert \phi^+\rangle + \frac{1}{\sqrt{2}} \vert \phi^-\rangle.
$$
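The coefficients in such an expansion can be computed with inner products, using the fact that the four Bell states are orthonormal. The sketch below does this for $\vert 0\rangle\vert 0\rangle$ and then recombines the Bell states to recover the original vector. The dictionary keys and the helper name `inner` are ours.

```python
import math

s = 1 / math.sqrt(2)
# Entries are ordered 00, 01, 10, 11.
bell = {
    "phi+": [s, 0, 0, s],
    "phi-": [s, 0, 0, -s],
    "psi+": [0, s, s, 0],
    "psi-": [0, s, -s, 0],
}

def inner(a, b):
    """Inner product <a|b>, conjugating the entries of a."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

ket00 = [1, 0, 0, 0]
coefficients = {name: inner(vec, ket00) for name, vec in bell.items()}
print(coefficients)

# Recombine the Bell states with these coefficients to recover the vector.
rebuilt = [sum(coefficients[name] * bell[name][i] for name in bell) for i in range(4)]
print(rebuilt)
```

Replacing `ket00` by any other vector with four entries gives its expansion in the Bell basis in the same way.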

(*** Possible exercise: express other states, such as $\vert 01\rangle$ and $\vert+\rangle \vert+\rangle$, as linear combinations of Bell states.)

One example of a quantum state vector of three qubits $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ is the *GHZ state*
(so named in honor of Daniel Greenberger, Michael Horne, and Anton Zeilinger, who first studied some of its properties):

$$
\frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle \vert 0\rangle +
\frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \vert 1\rangle
$$

Another example of a three-qubit state is the W state:

$$
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 1 \rangle \vert 0\rangle +
\frac{1}{\sqrt{3}} \vert 1\rangle \vert 0 \rangle \vert 0\rangle 
$$

Neither of these states is a product state, meaning that they cannot be written as a tensor product of three qubit quantum state vectors.
(*** Possible problem: ask readers to argue this. I don't see how to auto-grade this sort of question, though.)

We will examine these two states further when we discuss partial measurements of quantum states of multiple systems.

#### Additional examples

(*** Include a few more examples. Some possibilities follow.)

$$
\frac{3}{5} \vert 0\rangle \vert \heartsuit \rangle
- \frac{4i}{5} \vert 1\rangle \vert \spadesuit \rangle
$$

$$
  \begin{aligned}
      & \frac{1}{\sqrt{6}} \vert 0 \rangle \vert 1 \rangle \vert 2 \rangle
        - \frac{1}{\sqrt{6}} \vert 0 \rangle \vert 2 \rangle \vert 1 \rangle
        + \frac{1}{\sqrt{6}} \vert 1 \rangle \vert 2 \rangle \vert 0 \rangle \\
      & \quad - \frac{1}{\sqrt{6}} \vert 1 \rangle \vert 0 \rangle \vert 2 \rangle
        + \frac{1}{\sqrt{6}} \vert 2 \rangle \vert 0 \rangle \vert 1 \rangle
        -  \frac{1}{\sqrt{6}} \vert 2 \rangle \vert 1 \rangle \vert 0 \rangle
  \end{aligned}
$$

### 2.2 Measurements of quantum states <a id='multiple-systems-quantum-measurements'></a>

Measurements — more specifically *standard basis measurements* — of quantum states of single systems were discussed in the previous lesson: if a system having classical state set $\Sigma$ is in a quantum state represented by the vector $u$, and that system is measured (with respect to a standard basis measurement), then each classical state $a\in\Sigma$ results with probability $\vert u(a)\vert^2$.

Just as we saw earlier in this lesson when we discussed measurements of probabilistic states of multiple systems, this tells us what happens when we have a quantum state of multiple systems and *every* system is measured.
To be precise, let us suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively.
We may then view $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ collectively as a single system whose classical state set is the Cartesian product $\Sigma_1\times\cdots\times\Sigma_n$.
If the quantum state of this system is represented by the quantum state vector $u$, and every one of the systems is measured, then each $n$-tuple $(a_1,\ldots,a_n)\in\Sigma_1\times\cdots\times\Sigma_n$ is obtained with probability
$\vert u(a_1,\ldots,a_n)\vert^2$.

For example, if systems $\mathsf{X}$ and $\mathsf{Y}$ are jointly in the quantum state

$$
\frac{3}{5} \vert 0\rangle \vert \heartsuit \rangle
- \frac{4i}{5} \vert 1\rangle \vert \spadesuit \rangle,
$$

then measuring both systems with respect to a standard basis measurement yields the outcome $(0,\heartsuit)$ with probability $9/25$ and the outcome $(1,\spadesuit)$ with probability $16/25$.

#### Partial measurements for two systems

Now let us suppose that we have multiple systems that are jointly in a quantum state, and we measure a proper subset of the systems.

(*** Discuss an example with two systems.)
(*** Mention that marginal states don't work in the simplified description of quantum information: we need the density matrix formalism to do this.)





#### Partial measurements for three or more systems

(*** Go through a few examples, including a product state, the GHZ state, and the W state.)


$$
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 1 \rangle \vert 0\rangle +
\frac{1}{\sqrt{3}} \vert 1\rangle \vert 0 \rangle \vert 0\rangle 
= \vert 0 \rangle \biggl(
\frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle\biggr)
+ \vert 1 \rangle \biggl(\frac{1}{\sqrt{3}}\vert 0\rangle \vert 0\rangle\biggr)
$$

The probability that a measurement of the first qubit results in the outcome 0 is therefore equal to

$$
\biggl\| 
\frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle
\biggr\|^2 = \frac{2}{3},
$$

and conditioned upon the measurement producing this outcome, the quantum state of the second and third qubits becomes

$$
  \frac{
    \frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
    \frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle
  }{
    \sqrt{\frac{2}{3}}
  }
= \frac{1}{\sqrt{2}} \vert 0 \rangle \vert 1\rangle + \frac{1}{\sqrt{2}} \vert 1 \rangle \vert 0\rangle 
= \vert \psi^+\rangle.
$$
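The same calculation can be carried out mechanically. The following sketch groups the terms of the W state by the classical state of the first qubit, computes the squared norm of each group, and normalizes. The dictionary representation and the function name `measure_first` are our own, and the sketch assumes a standard basis measurement of the first system only.

```python
import math
from collections import defaultdict

s = 1 / math.sqrt(3)
# Amplitudes of the W state, indexed by the classical states of the three qubits.
w_state = {(0, 0, 1): s, (0, 1, 0): s, (1, 0, 0): s}

def measure_first(state):
    """Return {outcome: (probability, state of the remaining systems)}."""
    branches = defaultdict(dict)
    for (first, *rest), amplitude in state.items():
        branches[first][tuple(rest)] = amplitude
    results = {}
    for outcome, branch in branches.items():
        p = sum(abs(a) ** 2 for a in branch.values())         # squared norm of the branch
        post = {k: a / math.sqrt(p) for k, a in branch.items()}  # normalized branch
        results[outcome] = (p, post)
    return results

results = measure_first(w_state)
for outcome, (p, post) in results.items():
    print(outcome, p, post)
```

For the outcome 1 the remaining two qubits are left in the state $\vert 0\rangle\vert 0\rangle$, which occurs with the complementary probability $1/3$.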



### 2.3 Operations on quantum states <a id='multiple-systems-quantum-operations'></a>

(*** Tensor products of unitary operations are unitary.)

$$
  (U_1 \otimes \cdots \otimes U_n)(V_1\otimes\cdots\otimes V_n)
  = (U_1 V_1) \otimes \cdots \otimes (U_n V_n)
$$

$$
  (M \otimes N)^{\dagger} = M^{\dagger} \otimes N^{\dagger}
$$

$$
\begin{aligned}
  & (U_1 \otimes \cdots \otimes U_n)^{\dagger} (U_1\otimes\cdots\otimes U_n) \\
  & \quad = (U_1^{\dagger} \otimes \cdots \otimes U_n^{\dagger}) (U_1\otimes\cdots\otimes U_n) \\
  & \quad = (U_1^{\dagger} U_1) \otimes \cdots \otimes (U_n^{\dagger} U_n)\\
  & \quad = \mathbb{1}_{\mathsf{X}_1} \otimes \cdots \otimes \mathbb{1}_{\mathsf{X}_n}\\
  & \quad = \mathbb{1}_{\mathsf{X}_1\cdots\mathsf{X}_n}
\end{aligned}
$$


$$
\begin{aligned}
  & (U \otimes V)^{\dagger} (U\otimes V) \\
  & \quad = (U^{\dagger} \otimes V^{\dagger}) (U\otimes V) \\
  & \quad = (U^{\dagger} U) \otimes (V^{\dagger} V)\\
  & \quad = \mathbb{1}_{\mathsf{X}} \otimes \mathbb{1}_{\mathsf{Y}}\\
  & \quad = \mathbb{1}_{\mathsf{XY}}
\end{aligned}
$$


$$
  (U \otimes V)^{\dagger} (U\otimes V) 
      = (U^{\dagger} \otimes V^{\dagger}) (U\otimes V)
      = (U^{\dagger} U) \otimes (V^{\dagger} V)
      = \mathbb{1}_{\mathsf{X}} \otimes \mathbb{1}_{\mathsf{Y}}
      = \mathbb{1}_{\mathsf{XY}}
$$
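The two-system calculation above can be checked numerically for a specific pair of unitary matrices. In the sketch below, the Hadamard operation and a phase operation are used purely as examples of ours, and the check is made up to floating-point rounding.

```python
import math

def kron(M, N):
    """Tensor product of two matrices given as lists of rows."""
    return [[M[a][c] * N[b][d] for c in range(len(M[0])) for d in range(len(N[0]))]
            for a in range(len(M)) for b in range(len(N))]

def dagger(M):
    """Conjugate transpose."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (i == j)) < tol
               for i in range(len(M)) for j in range(len(M)))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]        # Hadamard operation
S_gate = [[1, 0], [0, 1j]]   # a phase operation, chosen only as a second example

UV = kron(H, S_gate)
print(is_identity(matmul(dagger(UV), UV)))
```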

#### Important examples of operations on multiple qubits

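In the examples that follow, $\mathsf{X}$ and $\mathsf{Y}$ denote two systems.
For the swap operation, we assume that they share the same classical state set.
For controlled-unitary operations, $\mathsf{X}$ is a qubit and $\mathsf{Y}$ is an arbitrary system, so $\mathsf{Y}$ could also be a qubit, a joint system of multiple qubits, or a completely different system such as one having 17 classical states.

 - Swap operation

The swap operation exchanges the contents of $\mathsf{X}$ and $\mathsf{Y}$.
For all classical states $a$ and $b$, it acts on standard basis states as

$$
\operatorname{SWAP} \vert a \rangle \vert b \rangle = \vert b \rangle \vert a \rangle.
$$

When $\mathsf{X}$ and $\mathsf{Y}$ are both qubits, this operation is represented by the matrix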

$$
\operatorname{SWAP} =
\begin{pmatrix}
  1 & 0 & 0 & 0\\
  0 & 0 & 1 & 0\\
  0 & 1 & 0 & 0\\
  0 & 0 & 0 & 1
\end{pmatrix}
$$
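The matrix above can be recovered directly from the defining action on standard basis states. The following sketch builds the swap operation as the sum of the outer products $\vert b\rangle\vert a\rangle\langle a\vert\langle b\vert$ over all pairs $(a,b)$; the function names `basis` and `swap_matrix` are introduced here for illustration only.

```python
import numpy as np

def basis(index, dim):
    """Standard basis vector |index> in dimension dim."""
    v = np.zeros(dim)
    v[index] = 1.0
    return v

def swap_matrix(dim):
    """Swap operation on two systems that each have dim classical states."""
    S = np.zeros((dim * dim, dim * dim))
    for a in range(dim):
        for b in range(dim):
            ket_ab = np.kron(basis(a, dim), basis(b, dim))
            ket_ba = np.kron(basis(b, dim), basis(a, dim))
            # Adds the term |b>|a><a|<b|, which maps |a>|b> to |b>|a>.
            S += np.outer(ket_ba, ket_ab)
    return S

SWAP = swap_matrix(2)
expected = np.array([[1, 0, 0, 0],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1]])
print(np.array_equal(SWAP, expected))

# By linearity, the swap also exchanges the factors of any product state.
phi = np.array([0.6, 0.8])
psi = np.array([1.0, 0.0])
print(np.allclose(SWAP @ np.kron(phi, psi), np.kron(psi, phi)))
```

The same function works for systems with more than two classical states, for example `swap_matrix(3)` for a pair of three-state systems.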


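 - Controlled unitary operations

For every unitary operation $U$ acting on the system $\mathsf{Y}$, a *controlled-$U$* operation is a unitary operation on the pair $(\mathsf{X},\mathsf{Y})$ defined as follows:

$$
  \mathrm{c}U = 
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes U
$$

The qubit $\mathsf{X}$ serves as the *control*: when it is in the standard basis state $\vert 0\rangle$, nothing happens to $\mathsf{Y}$, and when it is in the standard basis state $\vert 1\rangle$, the operation $U$ is applied to $\mathsf{Y}$.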

Written explicitly as a matrix, this operation is as follows:

$$
\begin{pmatrix}
  \mathbb{1}_{\mathsf{Y}} & 0\\
  0 & U
\end{pmatrix}
$$

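This expression is an example of a *block matrix*, which is essentially a matrix of matrices.
In this case, each of the four blocks is a square matrix whose number of rows and columns equals the number of classical states of $\mathsf{Y}$, and each $0$ denotes a block of that size in which every entry is zero.
It should be interpreted as the single matrix we obtain by writing out the entries of every block in place, so that $\mathbb{1}_{\mathsf{Y}}$ occupies the upper-left corner and $U$ occupies the lower-right corner.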
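To make the connection between the tensor-product formula and the block matrix concrete, here is a small sketch that builds a controlled operation both ways and compares the results. The choice of $U$ (a cyclic shift on a system with three classical states) is a hypothetical example picked only to show that $\mathsf{Y}$ need not be a qubit, and the function name `controlled` is likewise introduced just for this illustration.

```python
import numpy as np

def controlled(U):
    """Return |0><0| (x) identity + |1><1| (x) U, with the control qubit on the left."""
    dim = U.shape[0]
    P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # |0><0|
    P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # |1><1|
    return np.kron(P0, np.eye(dim)) + np.kron(P1, U)

# Example U: a permutation matrix that cyclically shifts three classical states.
shift = np.roll(np.eye(3), 1, axis=0)

cU = controlled(shift)

# The same operation assembled explicitly as a 2 x 2 arrangement of 3 x 3 blocks.
zero = np.zeros((3, 3))
block_form = np.block([[np.eye(3), zero],
                       [zero, shift]])

print(np.array_equal(cU, block_form))
print(np.allclose(cU.conj().T @ cU, np.eye(6)))  # the result is unitary
```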

For example, if $\mathsf{Y}$ is a qubit and we write $Z = \sigma_z$ to denote the Pauli-z operation, then the controlled-$Z$ operation is given by

$$
  \mathrm{c}Z = 
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes Z
  = 
  \begin{pmatrix}
    1 & 0 & 0 & 0\\
    0 & 1 & 0 & 0\\
    0 & 0 & 1 & 0\\
    0 & 0 & 0 & -1
  \end{pmatrix}
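$$

Similarly, writing $X = \sigma_x$ to denote the NOT operation, the controlled-$X$ operation (also called a controlled-NOT operation) is given by

$$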
  \mathrm{c}X = 
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes X
  = 
  \begin{pmatrix}
    1 & 0 & 0 & 0\\
    0 & 1 & 0 & 0\\
    0 & 0 & 0 & 1\\
    0 & 0 & 1 & 0
  \end{pmatrix}
$$
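As a worked check, the matrix for $\mathrm{c}Z$ can be obtained by expanding the two tensor products separately. The $2\times 2$ matrices used below for $\vert 0\rangle\langle 0\vert$, $\vert 1\rangle\langle 1\vert$, and $Z$ are the standard ones and are assumed to match the conventions of the previous lesson.

$$
\begin{aligned}
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}}
  & =
  \begin{pmatrix}
    1 & 0\\
    0 & 0
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    1 & 0\\
    0 & 1
  \end{pmatrix}
  =
  \begin{pmatrix}
    1 & 0 & 0 & 0\\
    0 & 1 & 0 & 0\\
    0 & 0 & 0 & 0\\
    0 & 0 & 0 & 0
  \end{pmatrix}\\[2mm]
  \vert 1\rangle \langle 1\vert \otimes Z
  & =
  \begin{pmatrix}
    0 & 0\\
    0 & 1
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    1 & 0\\
    0 & -1
  \end{pmatrix}
  =
  \begin{pmatrix}
    0 & 0 & 0 & 0\\
    0 & 0 & 0 & 0\\
    0 & 0 & 1 & 0\\
    0 & 0 & 0 & -1
  \end{pmatrix}
\end{aligned}
$$

Adding the two results gives the matrix for $\mathrm{c}Z$, and replacing $Z$ with $X$ in the second line gives the matrix for $\mathrm{c}X$ in the same way.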


If instead we take $\mathsf{Y}$ to be two qubits, and we take $U$ to be the swap operation between these two qubits, we obtain this operation:

$$
  \mathrm{cSWAP} = 
  \begin{pmatrix}
    1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\  
    0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
    0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
    0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
  \end{pmatrix}
$$

This operation is also known as a *Fredkin operation* (or, more commonly, a *Fredkin gate*), named for Edward Fredkin.
Its action on standard basis states can be described as follows:

$$
  \operatorname{cSWAP} \vert a \rangle \vert b \rangle \vert c \rangle 
  = \begin{cases}
  \vert a \rangle \vert b \rangle \vert c \rangle & a = 0\\
  \vert a \rangle \vert c \rangle \vert b \rangle & a = 1
  \end{cases}
$$

Equivalently, treating the two possible values of $a$ separately, we have

$$
\begin{aligned}
  \operatorname{cSWAP} \vert 0 \rangle \vert b \rangle \vert c \rangle 
  & = \vert 0 \rangle \vert b \rangle \vert c \rangle \\[1mm]
  \operatorname{cSWAP} \vert 1 \rangle \vert b \rangle \vert c \rangle 
  & = \vert 1 \rangle \vert c \rangle \vert b \rangle
\end{aligned}
$$
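The description of the controlled-swap operation on standard basis states can be verified against the controlled-unitary formula. The sketch below assumes the control qubit is the leftmost tensor factor, as in the matrices above; the helper `ket` is a name introduced here for illustration.

```python
import numpy as np
from itertools import product

SWAP = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]])

P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # |0><0|
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # |1><1|

# Controlled-swap: identity on the two target qubits when the control is 0,
# swap on the two target qubits when the control is 1.
cSWAP = np.kron(P0, np.eye(4)) + np.kron(P1, SWAP)

def ket(*bits):
    """Standard basis vector |bits[0]>|bits[1]>... as a NumPy array."""
    v = np.array([1.0])
    for bit in bits:
        e = np.zeros(2)
        e[bit] = 1.0
        v = np.kron(v, e)
    return v

# Check the action on all eight standard basis states.
for a, b, c in product((0, 1), repeat=3):
    expected = ket(a, b, c) if a == 0 else ket(a, c, b)
    assert np.array_equal(cSWAP @ ket(a, b, c), expected)

print(cSWAP.astype(int))
```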
  



A controlled-controlled-NOT operation, which we may denote $\mathrm{cc}X$, is called a *Toffoli gate*, named for Tommaso Toffoli.
Its matrix representation is as follows:

$$
  \mathrm{cc}X = 
  \begin{pmatrix}
    1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
    0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
    0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
    0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
    0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
    0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
    0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
    0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
  \end{pmatrix}
$$
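One way to see where this matrix comes from is to apply the controlled-unitary construction twice: first to $X$, and then to the resulting controlled-NOT operation. The sketch below does this and checks that the result flips the third qubit exactly when the first two qubits are both 1. It assumes, as in the matrices above, that control qubits sit to the left of the target; the helper names are introduced here for illustration.

```python
import numpy as np
from itertools import product

def controlled(U):
    """Return |0><0| (x) identity + |1><1| (x) U, with the control qubit on the left."""
    dim = U.shape[0]
    P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
    P1 = np.array([[0.0, 0.0], [0.0, 1.0]])
    return np.kron(P0, np.eye(dim)) + np.kron(P1, U)

def ket(*bits):
    """Standard basis vector |bits[0]>|bits[1]>... as a NumPy array."""
    v = np.array([1.0])
    for bit in bits:
        e = np.zeros(2)
        e[bit] = 1.0
        v = np.kron(v, e)
    return v

X = np.array([[0.0, 1.0], [1.0, 0.0]])

cX = controlled(X)     # controlled-NOT on two qubits
ccX = controlled(cX)   # controlled-controlled-NOT on three qubits

# The third bit is flipped exactly when a = b = 1.
for a, b, c in product((0, 1), repeat=3):
    assert np.array_equal(ccX @ ket(a, b, c), ket(a, b, c ^ (a & b)))

print(ccX.astype(int))
```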

 