# Multiple systems

The focus of this lesson is on the basics of quantum information when there are *multiple* systems being considered or described.
This is a continuation of the previous lesson's discussion of single quantum systems in isolation.

A simple yet critically important idea to keep in mind going into this lesson is that one can always choose to view multiple systems *together* as if they form a *single system* — to which the discussion in the previous lesson must then apply.
Indeed, this idea very directly leads to a description of how quantum states, measurements, and operations work for multiple systems.

There is more, however, to understanding multiple quantum systems than recognizing that they may be viewed collectively as single systems.
For instance, we may have multiple quantum systems that are collectively in a particular quantum state, and then choose to measure just one (or a proper subset) of the individual systems.
In general, this will affect the state of the remaining systems, and it is important to understand exactly how when analyzing quantum algorithms and protocols.
An understanding of the sorts of *correlations* among multiple systems — and particularly a type of correlation known as *entanglement* — is also important in quantum information and computation.

## 1. Classical information <a id='multiple-systems-classical-info'></a>

As in the previous lesson, we will begin with a discussion of classical information.
Once again, the probabilistic and quantum descriptions are mathematically similar, and recognizing how the mathematics works in the familiar setting of classical information is helpful in understanding why quantum information is described in the way that it is.

### 1.1 Classical state sets <a id='multiple-systems-classical-state-sets'></a>

Let us begin with *classical state sets* of multiple systems.
For simplicity we will begin by discussing just two systems, and then generalize to more than two systems.

Specifically, let us suppose that $\mathsf{X}$ is a system having classical state set $\Sigma$ and $\mathsf{Y}$ is a second system having classical state set $\Gamma$.
As in the previous lesson, because we have referred to these sets as *classical state sets*, we assume that $\Sigma$ and $\Gamma$ are finite and nonempty.
It could be that $\Sigma = \Gamma$, but this is not required — and, in any case, it is helpful to use different names to refer to these sets in the interest of clarity.

Imagine that the two systems $\mathsf{X}$ and $\mathsf{Y}$ are placed side-by-side, with $\mathsf{X}$ on the left and $\mathsf{Y}$ on the right, and viewed together as if they form a single system.
We may denote this new joint system by $(\mathsf{X},\mathsf{Y})$ or $\mathsf{XY}$, depending on our preferences or whichever is more convenient for the case at hand.
One may then ask: What is the classical state set of this single, joint system $(\mathsf{X},\mathsf{Y})$?

The answer is that the classical state set of $(\mathsf{X},\mathsf{Y})$ is the *Cartesian product* of $\Sigma$ and $\Gamma$, which is the set defined as

$$
  \Sigma\times\Gamma = \bigl\{(a,b)\,:\,a\in\Sigma\;\text{and}\;b\in\Gamma\bigr\}.
$$

In simple terms, the Cartesian product is the mathematical notion that captures the idea of viewing an element of one set and an element of a second set together as a single element of a single set.
In the case at hand, to say that $(\mathsf{X},\mathsf{Y})$ is in the classical state $(a,b)\in\Sigma\times\Gamma$ means that $\mathsf{X}$ is in the classical state $a\in\Sigma$ and $\mathsf{Y}$ is in the classical state $b\in\Gamma$;
and if the classical state of $\mathsf{X}$ is $a\in\Sigma$ and the classical state of $\mathsf{Y}$ is $b\in\Gamma$, then the classical state of the joint system $(\mathsf{X},\mathsf{Y})$ is $(a,b)$.

For more than two systems, the situation generalizes in a natural way.
Suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, for any positive integer $n$.
The classical state set of the $n$-tuple $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$, viewed as a single joint system, is then the Cartesian product

$$
  \Sigma_1\times\cdots\times\Sigma_n
  = \bigl\{(a_1,\ldots,a_n)\,:\,
  a_1\in\Sigma_1,\:\ldots,\:a_n\in\Sigma_n\bigr\}.
$$

#### Classical states of multiple systems as strings

It is often convenient to write a classical state of the form $(a_1,\ldots,a_n)$ as a *string* $a_1\cdots a_n$ for the sake of brevity, particularly in the very typical situation that the classical state sets $\Sigma_1,\ldots,\Sigma_n$ are associated with sets of *symbols* or *characters*.
Indeed, the notion of a string, which is a fundamentally important concept in computer science, is formalized in mathematical terms through Cartesian products.
The term *alphabet* is commonly used to refer to sets of symbols used to form strings, but the mathematical definition of an alphabet is precisely the same as the definition of a classical state set: it is a finite and nonempty set.

For example, suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_{10}$ are bits, so that the classical state sets of these systems are all the same:

$$
  \Sigma_1 = \Sigma_2 = \cdots = \Sigma_{10} = \{0,1\}.
$$

There are then $2^{10} = 1024$ classical states of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_{10})$, which are the elements of the set

$$
  \Sigma_1\times\Sigma_2\times\cdots\times\Sigma_{10} = \{0,1\}^{10}.
$$

Written as strings, these classical states look like this:

$$
  \begin{array}{c}
  0000000000\\
  0000000001\\
  0000000010\\
  0000000011\\
  0000000100\\
  \vdots\\[1mm]
  1111111111
  \end{array}
$$

For the classical state $0001010000$, for instance, we see that $\mathsf{X}_4$ and $\mathsf{X}_6$ are in the state $1$, while all of the other systems are in the state $0$.
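
The bit-string example above can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard library; the names `bit`, `states`, and `ones` are chosen here for illustration and do not come from the lesson.

```python
from itertools import product

# Classical state set shared by each of the ten bits.
bit = ["0", "1"]

# The Cartesian product {0,1}^10, with each 10-tuple joined into a string.
states = ["".join(t) for t in product(bit, repeat=10)]

print(len(states))   # 1024 classical states of the joint system
print(states[:5])    # the first five strings in the list
print(states[-1])    # the last string in the list

# Systems are numbered from 1, so X_k corresponds to string index k - 1.
s = "0001010000"
ones = [k for k, c in enumerate(s, start=1) if c == "1"]
print(ones)          # [4, 6]: X_4 and X_6 are in the state 1
```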

### 1.2 Probabilistic states <a id='multiple-systems-probabilistic'></a>

As was discussed in the previous lesson, a probabilistic state associates a probability with each classical state of a system.
Thus, a probabilistic state of multiple systems together — viewed collectively as if they form a single system — associates a probability with each element of the Cartesian product of the classical state sets of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are both bits, so that their corresponding classical state sets are given by $\Sigma = \{0,1\}$ and $\Gamma = \{0,1\}$, we may have a probabilistic state like this:

$$
  \begin{aligned}
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,0)\bigr) 
    & = \frac{1}{2} \\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (0,1)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,0)\bigr) 
    & = 0\\[2mm]
    \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) = (1,1)\bigr) 
    & = \frac{1}{2}
  \end{aligned}
$$

This probabilistic state is one in which both $\mathsf{X}$ and $\mathsf{Y}$ are random bits — each is 0 with probability 1/2 and 1 with probability 1/2 — but the classical states of the two bits are always in agreement.
This is an example of a *correlation* between these systems.

#### Ordering Cartesian product state sets

Probabilistic states of systems are represented by probability vectors, which are column vectors whose indices are placed in correspondence with the underlying classical state set of the system being considered.
To represent a probabilistic state of multiple systems as a probability vector, where the classical state set of these systems together is given by a Cartesian product, one must therefore decide on an ordering of the elements of this Cartesian product.

Working under the assumption that the individual classical state sets from which the Cartesian product is formed have already been ordered, there is a simple convention for doing this, which is essentially to use *alphabetical ordering*.
That is, the entries in each $n$-tuple (or, equivalently, the symbols in each string) are viewed as being ordered by significance that *decreases from left to right*.

For example, according to this convention, the Cartesian product $\{1,2,3\}\times\{0,1\}$ is ordered like this:

$$
  (1,0),\;
  (1,1),\;
  (2,0),\;
  (2,1),\;
  (3,0),\;
  (3,1).
$$

When $n$-tuples are written as strings and ordered in this way, we observe familiar patterns, such as $\{0,1\}\times\{0,1\}$ being ordered as $00, 01, 10, 11$, and the set $\{0,1\}^{10}$ being ordered as was suggested above.
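
As a small sketch of this convention (the variable names are illustrative), Python's `itertools.product` happens to enumerate pairs in the same order, with the rightmost entry varying fastest. The position of a pair $(a,b)$ in the ordering can also be computed directly from the positions of $a$ and $b$ in their own state sets.

```python
from itertools import product

sigma = [1, 2, 3]   # ordered classical state set of X
gamma = [0, 1]      # ordered classical state set of Y

# Pairs in the ordering described above: the left entry is most significant.
ordered = list(product(sigma, gamma))
print(ordered)

# Position of each pair in the ordering, counted from 0.
index = {pair: i for i, pair in enumerate(ordered)}

# The same position, computed from the positions within sigma and gamma.
def position(a, b):
    return sigma.index(a) * len(gamma) + gamma.index(b)

print(all(index[(a, b)] == position(a, b) for a, b in ordered))
```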

Thus, the probabilistic state described above is represented by the following probability vector (where the entries are labeled explicitly for the sake of clarity):

$$
  \begin{pmatrix}
    \frac{1}{2}\\[1mm]
    0\\[1mm]
    0\\[1mm]
    \frac{1}{2}
  \end{pmatrix}
  \begin{array}{l}
    \leftarrow \text{probability associated with state 00}\\[1mm]
    \leftarrow \text{probability associated with state 01}\\[1mm]
    \leftarrow \text{probability associated with state 10}\\[1mm]
    \leftarrow \text{probability associated with state 11}
  \end{array}
  \label{eq:correlatedbits} \tag{1.1}
$$

#### Independence for two systems

A special type of probabilistic state of two systems is one in which the systems are *independent*.

Suppose once again that $\mathsf{X}$ and $\mathsf{Y}$ are systems having classical state sets $\Sigma$ and $\Gamma$, respectively.
A probabilistic state of these two systems represents a situation of *independence* between these two systems if it is the case that

$$
  \operatorname{Pr}((\mathsf{X},\mathsf{Y}) = (a,b)) 
  = \operatorname{Pr}(\mathsf{X} = a) \operatorname{Pr}(\mathsf{Y} = b),
  \tag{1.2}
$$

for every choice of $a\in\Sigma$ and $b\in\Gamma$.
Intuitively speaking, two systems are independent if the probabilities we associate with each classical state of either one of the two systems have no dependence on the classical states of the other system.

Let us suppose that a given probabilistic state of the system $(\mathsf{X},\mathsf{Y})$ is described by a probability vector, which we may express using the Dirac notation as 

$$
\vert \psi \rangle = \sum_{(a,b) \in \Sigma\times\Gamma} p_{a,b} \vert a b\rangle.
$$

The condition (1.2) for independence is then equivalent to the existence of two probability vectors

$$
\vert \phi \rangle = \sum_{a\in\Sigma} q_a \vert a \rangle \quad\text{and}\quad
\vert \pi \rangle = \sum_{b\in\Gamma} r_b \vert b \rangle,
\tag{1.3}
$$

representing the probabilities associated with the classical states of $\mathsf{X}$ and $\mathsf{Y}$, respectively, such that

$$
p_{a,b} = q_a r_b
\tag{1.4}
$$

for all $a\in\Sigma$ and $b\in\Gamma$.

For example, the probabilistic state of a pair of bits $(\mathsf{X},\mathsf{Y})$ represented by the vector

$$
  \vert \psi \rangle
  = \frac{1}{6} \vert 00 \rangle 
  + \frac{1}{12} \vert 01 \rangle 
  + \frac{1}{2} \vert 10 \rangle 
  + \frac{1}{4} \vert 11 \rangle
$$

is one in which $\mathsf{X}$ and $\mathsf{Y}$ are independent.
Specifically, the condition required for independence is true for the probability vectors

$$
  \vert \phi \rangle = \frac{1}{4} \vert 0 \rangle + \frac{3}{4} \vert 1 \rangle
  \quad\text{and}\quad
  \vert \pi \rangle = \frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle.
$$

On the other hand, the probabilistic state $(1.1)$, which we may write as

$$
  \frac{1}{2} \vert 00 \rangle + \frac{1}{2} \vert 11 \rangle, 
  \tag{1.5}
$$

does not represent independence between the systems $\mathsf{X}$ and $\mathsf{Y}$.
A simple way to argue this is as follows.
Suppose that there did exist probability vectors $\vert \phi\rangle$ and $\vert \pi\rangle$, as in equation (1.3) above, for which the condition $(1.4)$ is satisfied for every choice of $a$ and $b$.
It would then necessarily be that

$$
  q_0 r_1 = \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (0,1)\bigr) = 0.
$$

This implies that either $q_0 = 0$ or $r_1 = 0$, by a property known as the *zero-product property* of the real numbers: the only way that the product of two real numbers can be zero is if either or both numbers are themselves equal to zero.
This in turn implies that either $q_0 r_0 = 0$ (in case $q_0 = 0$) or $q_1 r_1 = 0$ (in case $r_1 = 0$).
We see, however, that neither of those equalities can be true because we must have $q_0 r_0 = 1/2$ and 
$q_1 r_1 = 1/2.$
Hence, there do not exist vectors $\vert\phi\rangle$ and $\vert\pi\rangle$ satisfying the property required for independence.

Having defined independence between two systems in this way, we can now be more precise in defining a correlation as a *lack of independence*.
For example, because the two bits in the probabilistic state represented by the vector $(1.5)$ are not independent, they are correlated.
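
The two examples can also be checked mechanically. If vectors satisfying $(1.4)$ exist, then summing $p_{a,b} = q_a r_b$ over $b$ shows that $q_a$ must equal $\operatorname{Pr}(\mathsf{X} = a)$, and likewise $r_b$ must equal $\operatorname{Pr}(\mathsf{Y} = b)$, so it suffices to test condition $(1.2)$ against the two marginal distributions. The sketch below does this with exact fractions; the function names and the dictionary representation of a probabilistic state are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def marginals(p, sigma, gamma):
    """Marginal probabilities of X and Y for a joint state p[(a, b)]."""
    px = {a: sum(p.get((a, b), 0) for b in gamma) for a in sigma}
    py = {b: sum(p.get((a, b), 0) for a in sigma) for b in gamma}
    return px, py

def is_independent(p, sigma, gamma):
    """Check Pr((X,Y) = (a,b)) == Pr(X = a) * Pr(Y = b) for every pair."""
    px, py = marginals(p, sigma, gamma)
    return all(p.get((a, b), 0) == px[a] * py[b]
               for a, b in product(sigma, gamma))

bits = [0, 1]

# The independent example: 1/6 |00> + 1/12 |01> + 1/2 |10> + 1/4 |11>.
independent = {(0, 0): F(1, 6), (0, 1): F(1, 12),
               (1, 0): F(1, 2), (1, 1): F(1, 4)}

# The correlated example (1.5): 1/2 |00> + 1/2 |11>.
correlated = {(0, 0): F(1, 2), (1, 1): F(1, 2)}

print(is_independent(independent, bits, bits))  # True
print(is_independent(correlated, bits, bits))   # False
```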

#### Tensor products

The condition of independence just described can be expressed succinctly through the notion of a *tensor product*.
This is a general notion that can be defined quite abstractly and applied to a variety of mathematical structures, but for vectors whose indices correspond to Cartesian products it can be defined in very simple and concrete terms: for two vectors

$$
\vert \phi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle
\quad\text{and}\quad
\vert \pi \rangle = \sum_{b\in\Gamma} \beta_b \vert b \rangle,
$$

the tensor product $\vert \phi \rangle \otimes \vert \pi \rangle$ of these two vectors is the vector

$$
  \vert \phi \rangle \otimes \vert \pi \rangle
  = \sum_{(a,b)\in\Sigma\times\Gamma} \alpha_a \beta_b \vert ab\rangle.
$$

Equivalently, the vector $\vert \psi \rangle = \vert \phi \rangle \otimes \vert \pi \rangle$ is defined by the equation

$$
\langle ab \vert \psi \rangle = \langle a \vert \phi \rangle \langle b \vert \pi \rangle
$$

being true for every $a\in\Sigma$ and $b\in\Gamma$.

Thus, the condition $(1.4)$ is true for every choice of $a$ and $b$ if and only if $\vert \psi\rangle$ is equal to the tensor product of $\vert \phi\rangle$ and $\vert \pi \rangle$:

$$
  \vert \psi \rangle = \vert \phi \rangle \otimes \vert \pi \rangle.
$$

In this situation it is said that $\vert \psi \rangle$ is a *product state* or *product vector*.

It is common when using the Dirac notation that the tensor product symbol $\otimes$ is omitted when taking the tensor product of vectors written as kets.
For example, we often write $\vert \phi \rangle \vert \pi \rangle$ rather than $\vert \phi \rangle \otimes \vert \pi \rangle$.
This convention captures the idea that the tensor product is, in some sense, the most natural or default way to take the product of two vectors.
Although it is less common, the notation $\vert \phi\otimes\pi\rangle$
is also sometimes used to refer to the tensor product
$\vert \phi \rangle \otimes \vert \pi \rangle$.

When we use the convention described earlier for ordering the elements of Cartesian product sets — meaning alphabetical ordering — we obtain the following specification for the tensor product of two column vectors:

$$
  \begin{pmatrix}
  \alpha_1\\
  \vdots\\
  \alpha_m
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
  \beta_1\\
  \vdots\\
  \beta_k
  \end{pmatrix}
  =
  \begin{pmatrix}
  \alpha_1 \beta_1\\
  \vdots\\
  \alpha_1 \beta_k\\
  \alpha_2 \beta_1\\
  \vdots\\
  \alpha_2 \beta_k\\
  \vdots\\
  \alpha_m \beta_1\\
  \vdots\\
  \alpha_m \beta_k
  \end{pmatrix}.
$$

This operation is sometimes referred to specifically as the *Kronecker product*, but for the purposes of this lesson there is little to be gained in distinguishing it from the tensor product.
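
For column vectors stored as plain lists, the formula above amounts to a nested loop in which the index of the first vector varies slowest. The following is a minimal sketch (the helper name `tensor` is chosen here for illustration), applied to the two probability vectors from the independence example.

```python
from fractions import Fraction as F

def tensor(u, v):
    """Tensor (Kronecker) product of two vectors given as lists.

    The entry u[i] * v[j] lands at position i * len(v) + j, matching
    the alphabetical ordering of the Cartesian product.
    """
    return [a * b for a in u for b in v]

phi = [F(1, 4), F(3, 4)]   # probabilistic state of X
pi = [F(2, 3), F(1, 3)]    # probabilistic state of Y

psi = tensor(phi, pi)
print(psi)  # entries 1/6, 1/12, 1/2, 1/4 as Fractions
```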

As an important aside, we observe the following expression for tensor products of standard basis vectors:

$$
\vert a \rangle \otimes \vert b \rangle = \vert ab \rangle.
$$

Alternatively, writing $(a,b)$ as an ordered pair rather than a string, we could write

$$
\vert a \rangle \otimes \vert b \rangle = \vert (a,b) \rangle,
$$ 

but it is more common to write

$$
\vert a \rangle \otimes \vert b \rangle = \vert a,b \rangle,
$$ 

following a standard convention in mathematics that parentheses are eliminated when they do not serve to add clarity or remove ambiguity.

The tensor product of two vectors has the important property that it is *bilinear*, which means that it is linear in each of the two arguments separately, assuming that the other argument is fixed.
This property can be expressed through these equations:

1. Linearity in the first argument:

$$
  \begin{aligned}
    \bigl(\vert\psi_1\rangle + \vert\psi_2\rangle\bigr)    
    \otimes \vert\phi\rangle 
    & = 
    \vert\psi_1\rangle \otimes \vert\phi\rangle
    + 
    \vert\psi_2\rangle \otimes \vert\phi\rangle \\[1mm]
    \bigl(\alpha \vert \psi \rangle\bigr) \otimes 
    \vert \phi \rangle 
    & =
    \alpha \bigl(\vert \psi \rangle \otimes 
    \vert \phi \rangle \bigr)
  \end{aligned}
$$

2. Linearity in the second argument:

$$
  \begin{aligned}
    \vert \psi \rangle \otimes 
    \bigl(\vert \phi_1 \rangle + \vert \phi_2 \rangle \bigr) 
    & = 
    \vert \psi \rangle \otimes 
    \vert \phi_1 \rangle + 
    \vert \psi \rangle \otimes \vert \phi_2 \rangle\\[1mm]
    \vert \psi \rangle \otimes 
    \bigl(\alpha \vert \phi \rangle \bigr) 
    & = \alpha \bigl(\vert \psi \rangle \otimes 
    \vert \phi \rangle \bigr)
  \end{aligned}
$$

Considering the second equation in each of these pairs of equations, 
we see that scalars "float freely" within tensor products:

$$
\bigl(\alpha \vert \psi \rangle\bigr) \otimes \vert \phi \rangle
= \vert \psi \rangle \otimes \bigl(\alpha \vert \phi \rangle \bigr)
= \alpha \bigl(\vert \psi \rangle \otimes \vert \phi \rangle \bigr).
$$

There is therefore no ambiguity in simply writing 
$\alpha\vert \psi \rangle \otimes \vert \phi \rangle$, or alternatively
$\alpha\vert \psi \rangle \vert \phi \rangle$ or
$\alpha\vert \psi \otimes \phi \rangle$, to refer to this vector.
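
These identities are easy to spot-check numerically. The sketch below uses small integer vectors chosen arbitrarily for illustration; the helpers `tensor`, `add`, and `scale` are hypothetical names, and passing the checks for one choice of vectors is of course not a proof.

```python
def tensor(u, v):
    """Tensor product of two vectors given as lists."""
    return [a * b for a in u for b in v]

def add(u, v):
    """Entrywise sum of two vectors of equal length."""
    return [x + y for x, y in zip(u, v)]

def scale(c, u):
    """Scalar multiple of a vector."""
    return [c * x for x in u]

psi1, psi2 = [1, 2], [3, -1]
phi1, phi2 = [2, 0, 5], [4, 1, -3]
alpha = 7

checks = [
    # Linearity in the first argument.
    tensor(add(psi1, psi2), phi1) == add(tensor(psi1, phi1), tensor(psi2, phi1)),
    tensor(scale(alpha, psi1), phi1) == scale(alpha, tensor(psi1, phi1)),
    # Linearity in the second argument.
    tensor(psi1, add(phi1, phi2)) == add(tensor(psi1, phi1), tensor(psi1, phi2)),
    tensor(psi1, scale(alpha, phi1)) == scale(alpha, tensor(psi1, phi1)),
]
print(all(checks))  # True
```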

#### Independence and tensor products for three or more systems

The notions of independence and tensor products generalize to three or more systems, in the sense that will now be discussed.

If $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively, then a probabilistic state of the combined system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ is a *product state* if the associated probability vector takes the form

$$
  \vert \psi \rangle = \vert \phi_1 \rangle \otimes \cdots \otimes 
  \vert \phi_n \rangle
$$

for probability vectors $\vert \phi_1 \rangle,\ldots,\vert \phi_n\rangle$ describing probabilistic states of $\mathsf{X}_1,\ldots,\mathsf{X}_n$.

Here, the definition of the tensor product generalizes in a natural way:
the vector $\vert \psi \rangle = \vert \phi_1 \rangle \otimes \cdots \otimes \vert \phi_n \rangle$ is defined by the equation

$$
  \langle a_1 \cdots a_n \vert \psi \rangle
  = \langle a_1 \vert \phi_1 \rangle \cdots
  \langle a_n \vert \phi_n \rangle
$$

being true for every $a_1\in\Sigma_1, \ldots, a_n\in\Sigma_n$.
A different, but equivalent, way to define the tensor product of three or more vectors is recursively in terms of tensor products of two vectors:

$$
  \vert \phi_1 \rangle \otimes \cdots \otimes 
  \vert \phi_n \rangle
  = 
  \bigl(\vert \phi_1 \rangle \otimes \cdots \otimes \vert \phi_{n-1}
  \rangle\bigr) \otimes \vert \phi_n \rangle,
$$

assuming $n\geq 3$.

Similar to the tensor product of just two vectors, the tensor product of three or more vectors is linear in each of the arguments individually, assuming that all of the other arguments are fixed.
In this case, we say that the tensor product of three or more vectors is *multilinear*.

As we did in the case of two systems, we could say that the systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are *independent* when they are in a product state, but the term *mutually independent* is more precise.
There happen to be other notions of independence for three or more systems, such as *pairwise independence*, that we will not be concerned with at this time.

Generalizing the observation earlier concerning tensor products of standard basis vectors, for any positive integer $n$ and any classical states $a_1,\ldots,a_n$ we have

$$
\vert a_1 \rangle \otimes \cdots \otimes \vert a_n \rangle 
= \vert a_1 \cdots a_n \rangle
= \vert a_1,\ldots,a_n \rangle.
$$
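
The recursive definition and the identity for standard basis vectors can both be illustrated with a short sketch. Here `functools.reduce` folds a two-vector tensor product from the left, which mirrors the recursion above; the names `tensor`, `tensor_all`, and `ket` are illustrative choices, not notation from the lesson.

```python
from functools import reduce

def tensor(u, v):
    """Tensor product of two vectors given as lists."""
    return [a * b for a in u for b in v]

def tensor_all(vectors):
    """Left fold: ((v1 (x) v2) (x) v3) (x) ... for one or more vectors."""
    return reduce(tensor, vectors)

def ket(a, sigma):
    """Standard basis vector |a> for an ordered classical state set sigma."""
    return [1 if s == a else 0 for s in sigma]

bits = [0, 1]

# |1> (x) |0> (x) |1> should be the standard basis vector |101>.
v = tensor_all([ket(1, bits), ket(0, bits), ket(1, bits)])
print(v)                 # a single 1 among eight entries
print(v.index(1))        # 5, the position of the string 101 in alphabetical order
print(int("101", 2))     # 5
```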

### 1.3 Measurements of probabilistic states <a id='multiple-systems-probabilistic-measurement'></a>

Now let us move on to measurements of probabilistic states of multiple systems.
By choosing to view multiple systems together as single systems, we immediately obtain a specification of how measurements must work for multiple systems — provided that *all* of the systems are measured.

For example, if the probabilistic state of two bits
$(\mathsf{X},\mathsf{Y})$ is described by the probability vector

$$
  \frac{1}{2} \vert 00 \rangle + \frac{1}{2} \vert 11 \rangle, \tag{1.6}
$$

then the outcome $00$ — meaning $0$ for the measurement of $\mathsf{X}$ and $0$ for the measurement of $\mathsf{Y}$ — is obtained with probability 1/2 and $11$ is obtained with probability 1/2.
In each case we update the probability vector description of our knowledge accordingly, so that the probabilistic state becomes $\vert 00\rangle$ or $\vert 11\rangle$, respectively.

#### Partial measurements

Suppose, however, that we choose not to measure *every* system, but instead we just measure some *proper subset* of the systems.
This will result in a measurement outcome for each measurement that is performed, and will also (in general) affect our knowledge of the remaining systems.

Let us focus on the case of two systems, one of which is measured.
The more general situation, in which some subset of any collection of systems is measured, effectively reduces to the case of two systems if we form two joint systems consisting of those systems that are measured and those that are not. 

To be precise, let us suppose (as usual) that $\mathsf{X}$ is a system having classical state set $\Sigma$, $\mathsf{Y}$ is a system having classical state set $\Gamma$, and the two systems $(\mathsf{X},\mathsf{Y})$ together are in some probabilistic state.
We will consider what happens when we just measure $\mathsf{X}$ and do nothing to $\mathsf{Y}$.
The situation where just $\mathsf{Y}$ is measured and $\mathsf{X}$ is not is handled symmetrically.

First, we know that the probability to observe a particular classical state $a\in\Sigma$ when just $\mathsf{X}$ is measured must be consistent with the probabilities we would obtain had $\mathsf{Y}$ also been measured.
That is, we must have

$$
  \operatorname{Pr}(\mathsf{X} = a) 
  = \sum_{b\in\Gamma} \operatorname{Pr}\bigl( (\mathsf{X},\mathsf{Y}) 
  = (a,b) \bigr).
$$

This is the formula for the so-called *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ alone.

This formula makes perfect sense at an intuitive level, in the sense that something very strange would have to happen for it to be wrong.
If the formula were not correct, it would mean that the probabilities of obtaining different outcomes when $\mathsf{X}$ is measured could somehow be influenced simply by whether or not $\mathsf{Y}$ was also measured, irrespective of the actual outcome of this measurement of $\mathsf{Y}$.
If $\mathsf{Y}$ happened to be in a distant location, for instance, this would allow for superluminal signaling, which we immediately reject based on our understanding of physics.

Now, given the assumption that only $\mathsf{X}$ has been measured and $\mathsf{Y}$ has not, there may in general still exist uncertainty over the classical state of $\mathsf{Y}$.
For this reason, rather than updating our description of the probabilistic state of $(\mathsf{X},\mathsf{Y})$ to $\vert ab\rangle$ for some selection of $a\in\Sigma$ and $b\in\Gamma$, we must update our description so that this uncertainty about $\mathsf{Y}$ is properly reflected.
The following *conditional probability* formula reflects this uncertainty:

$$
  \operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)
  = \frac{
    \operatorname{Pr}\bigl((\mathsf{X},\mathsf{Y}) = (a,b)\bigr)
  }{
    \operatorname{Pr}(\mathsf{X} = a)
  }.
$$

Here, the expression 
$\operatorname{Pr}(\mathsf{Y} = b \,|\, \mathsf{X} = a)$ 
denotes the probability that $\mathsf{Y} = b$ *conditioned* on (or *given* that) $\mathsf{X} = a$.

Note that the expression above is only defined if $\operatorname{Pr}(\mathsf{X}=a)$ is nonzero:
if $\operatorname{Pr}(\mathsf{X}=a) = 0$, we obtain the indeterminate form $\frac{0}{0}$.
This is not a problem because if $\operatorname{Pr}(\mathsf{X}=a) = 0$, then we will never observe $a$ as an outcome of a measurement of $\mathsf{X}$, so we need not be concerned with this possibility.

To express these formulas in terms of probability vectors, let us assume that the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is described by a probability vector 

$$
  \vert \psi \rangle 
  = \sum_{(a,b)\in\Sigma\times\Gamma}
  p_{a,b} \vert ab\rangle.
$$

Measuring the system $\mathsf{X}$ alone yields each possible outcome $a\in\Sigma$ with the following probability:

$$
  \operatorname{Pr}(\mathsf{X} = a) = \sum_{b\in\Gamma} p_{a,b}.
$$

As was already suggested, the probabilistic state obtained in this way represents the *reduced* (or *marginal*) probabilistic state of $\mathsf{X}$ by itself.
Having obtained a particular outcome $a\in\Sigma$ of the measurement of $\mathsf{X}$, the probabilistic state of $\mathsf{Y}$ is updated according to the formula for conditional probabilities, so that it is represented by this probability vector:

$$
  \vert \pi_a \rangle 
  = \frac{
    \sum_{b\in\Gamma} p_{a,b} \vert b\rangle
  }{
    \sum_{c\in\Gamma} p_{a,c}
  }.
$$

In the event that the measurement of $\mathsf{X}$ resulted in the classical state $a$, we therefore update our description of the probabilistic state of the joint system $(\mathsf{X},\mathsf{Y})$ to $\vert a\rangle \otimes \vert \pi_a\rangle$.

One way to think about this definition of $\vert \pi_a\rangle$ is to see it as a *normalization* of the vector $\sum_{b\in\Gamma} p_{a,b} \vert b\rangle$, where we divide by the sum of the entries in this vector to obtain a probability vector.
This normalization effectively accounts for a conditioning on the event that the measurement of $\mathsf{X}$ has resulted in the outcome $a$. 

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  \vert \psi \rangle 
  = \frac{1}{2} \vert 00 \rangle + \frac{1}{2} \vert 11 \rangle,
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = \langle 00\vert \psi\rangle + \langle 01\vert \psi\rangle 
    = \frac{1}{2} + 0 = \frac{1}{2},\\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = \langle 10\vert \psi\rangle + \langle 11\vert \psi\rangle  
    = 0 + \frac{1}{2} = \frac{1}{2}.
  \end{aligned}
$$

If the measurement outcome is 0, then the probabilistic state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 0\rangle \otimes
  \frac{
    \frac{1}{2} \vert 0 \rangle + 0 \vert 1 \rangle
  }{
    \frac{1}{2} + 0
  } 
  = \vert 0 0 \rangle.
$$

Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the probabilistic state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 1\rangle \otimes \frac{
    0 \vert 0 \rangle + \frac{1}{2} \vert 1 \rangle
  }{
    0 + \frac{1}{2}
  } 
  = \vert 1 1 \rangle.
$$

Thus, for this particular example, there is no uncertainty remaining about $\mathsf{Y}$ when $\mathsf{X}$ is measured.

On the other hand, if $\mathsf{X}$ and $\mathsf{Y}$ are bits in the probabilistic state

$$
  \vert \psi \rangle
  = \frac{1}{6} \vert 00 \rangle
  + \frac{1}{12}  \vert 01 \rangle
  + \frac{1}{2} \vert 10 \rangle
  + \frac{1}{4}  \vert 11 \rangle,
$$

then a measurement of just the bit $\mathsf{X}$ results in the outcomes 0 and 1 with probabilities as follows:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{X} = 0) 
    & = \langle 00 \vert \psi\rangle + \langle 01 \vert \psi \rangle
    = \frac{1}{6} + \frac{1}{12} = \frac{1}{4}, \\[2mm]
    \operatorname{Pr}(\mathsf{X} = 1) 
    & = \langle 10 \vert \psi\rangle + \langle 11 \vert \psi \rangle
    = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}.
  \end{aligned}
$$

If the measurement outcome is 0, then the probabilistic state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 0\rangle \otimes \frac{
    \frac{1}{6} \vert 0 \rangle + \frac{1}{12} \vert 1 \rangle
  }{
    \frac{1}{6} + \frac{1}{12}
  }
  = \vert 0\rangle \otimes\biggl(\frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle\biggr).
$$

Through a similar calculation, if the outcome of the measurement of $\mathsf{X}$ is 1, the probabilistic state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 1\rangle \otimes
  \frac{
    \frac{1}{2} \vert 0 \rangle + \frac{1}{4} \vert 1 \rangle
  }{
    \frac{1}{2} + \frac{1}{4}
  }
  = \vert 1 \rangle \otimes \biggl(\frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle\biggr).
$$

In both cases, we see that the system $\mathsf{Y}$ is left in the probabilistic state

$$
  \frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle.
$$

This is not a surprise.
Recall that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example: we have 

$$
  \vert\psi\rangle =
  \biggl( \frac{1}{4} \vert 0 \rangle + \frac{3}{4} \vert 1 \rangle
  \biggr)
  \otimes
  \biggl( \frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle
  \biggr),
$$

and so naturally the probabilities for the two possible outcomes of the measurement of $\mathsf{X}$ are described by the probability vector

$$
  \frac{1}{4} \vert 0 \rangle + \frac{3}{4} \vert 1 \rangle
$$

as we have calculated, and in either case the resulting probabilistic state of $\mathsf{Y}$ is described by the probability vector

$$
  \frac{2}{3} \vert 0 \rangle + \frac{1}{3} \vert 1 \rangle.
$$

That is, knowing that $\mathsf{X}$ and $\mathsf{Y}$ are independent in this example, we did not really need to go through the trouble of performing the calculations above — but doing so served as both an example and a reality check.
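
Both worked examples can be reproduced with a small routine that computes, for each outcome $a$ with nonzero probability, the marginal probability $\operatorname{Pr}(\mathsf{X} = a)$ and the normalized vector $\vert\pi_a\rangle$. This is a sketch under the assumption that a joint state is stored as a dictionary keyed by pairs $(a,b)$; the function name `measure_first` is hypothetical.

```python
from fractions import Fraction as F

def measure_first(p, sigma, gamma):
    """Measure X only, for a joint probabilistic state p[(a, b)].

    Returns {a: (Pr(X = a), state of Y given X = a)}, skipping any
    outcome a of probability zero, for which the conditional state is
    undefined.
    """
    results = {}
    for a in sigma:
        row = {b: p.get((a, b), 0) for b in gamma}
        prob = sum(row.values())
        if prob == 0:
            continue
        # Normalize: divide each entry by the sum of the entries.
        results[a] = (prob, {b: w / prob for b, w in row.items()})
    return results

bits = [0, 1]

correlated = {(0, 0): F(1, 2), (1, 1): F(1, 2)}
independent = {(0, 0): F(1, 6), (0, 1): F(1, 12),
               (1, 0): F(1, 2), (1, 1): F(1, 4)}

for name, state in [("correlated", correlated), ("independent", independent)]:
    for a, (prob, y_state) in measure_first(state, bits, bits).items():
        print(name, "X =", a, "with probability", prob, "leaves Y in", y_state)
```

For the correlated state, each outcome leaves $\mathsf{Y}$ in a definite classical state, whereas for the independent state both outcomes leave $\mathsf{Y}$ in the same probabilistic state.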

#### Calculating partial measurements using the Dirac notation

The sorts of calculations just described, where a measurement is performed on a subset of a collection of systems, can be performed directly using the Dirac notation.

To illustrate how the method works, let us consider a new example where the classical state set of $\mathsf{X}$ is $\Sigma = \{1,2,3\}$, the classical state set of $\mathsf{Y}$ is $\Gamma = \{0,1\}$, and the probabilistic state of $(\mathsf{X},\mathsf{Y})$ is

$$
  \vert \psi \rangle 
  = \frac{1}{2}  \vert 1,0 \rangle
  + \frac{1}{12} \vert 1,1 \rangle
  + \frac{1}{6}  \vert 2,1 \rangle
  + \frac{1}{12} \vert 3,0 \rangle
  + \frac{1}{6}  \vert 3,1 \rangle
  =
  \begin{pmatrix}
    \frac{1}{2}\\
    \frac{1}{12}\\
    0\\
    \frac{1}{6}\\
    \frac{1}{12}\\
    \frac{1}{6}
  \end{pmatrix}.
$$

This time let us suppose that the *second* system $\mathsf{Y}$ is measured.
Our goal will be to determine the probabilities of the two possible outcomes (0 and 1), and to calculate what the resulting probabilistic state of $\mathsf{X}$ is for the two outcomes.

Using the bilinearity of the tensor product, and specifically the fact that it is linear in the *first* argument, we may rewrite the vector $\vert \psi \rangle$ as follows:

$$
  \vert \psi \rangle 
  = \biggl( \frac{1}{2} \vert 1 \rangle 
  + \frac{1}{12} \vert 3 \rangle\biggr)
  \otimes \vert 0\rangle
  + \biggl( \frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle\biggr) \otimes \vert 1\rangle.
$$

What we have done is to isolate the distinct standard basis vectors for the system being measured (which in this example is the second system $\mathsf{Y}$), collecting all of the terms for the first system as is required to do this.
A moment's thought reveals that this is always possible, regardless of what vector we started with.

The probabilities for the two outcomes when $\mathsf{Y}$ is measured are now easily inferred:

$$
  \begin{aligned}
    \operatorname{Pr}(\mathsf{Y} = 0)
    & = \frac{1}{2} + \frac{1}{12} = \frac{7}{12}\\[2mm]
    \operatorname{Pr}(\mathsf{Y} = 1) 
    & = \frac{1}{12} + \frac{1}{6} + \frac{1}{6} = \frac{5}{12}.
  \end{aligned}
$$

Moreover, the probabilistic state of $\mathsf{X}$, conditioned on each possible outcome, can also be quickly inferred by simply *normalizing* the vectors in parentheses by dividing by the associated probability just calculated, so that these vectors become probability vectors.
That is, conditioned on the measurement of $\mathsf{Y}$ being 0, the probabilistic state of $\mathsf{X}$ becomes

$$
 \frac{\frac{1}{2} \vert 1 \rangle + \frac{1}{12} \vert 3 \rangle}{\frac{7}{12}}
 = \frac{6}{7} \vert 1 \rangle + \frac{1}{7} \vert 3 \rangle,
$$

and conditioned on the measurement of $\mathsf{Y}$ being 1, the probabilistic state of
$\mathsf{X}$ becomes

$$
  \frac{\frac{1}{12} \vert 1 \rangle + \frac{1}{6} \vert 2\rangle 
  + \frac{1}{6} \vert 3 \rangle}{\frac{5}{12}}
  = \frac{1}{5} \vert 1 \rangle + \frac{2}{5} \vert 2 \rangle + \frac{2}{5} \vert 3 \rangle.
$$
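The same calculation can be sketched in a few lines of Python. This is only an illustration, not part of the lesson's `Statevector` code: the dictionary representation and the helper name `measure_second` are assumptions made for this example, and exact fractions are used so the results can be compared directly with the values above.

```python
from fractions import Fraction as F

# Probabilistic state of (X, Y) from the example, keyed by (x, y).
psi = {
    (1, 0): F(1, 2),
    (1, 1): F(1, 12),
    (2, 1): F(1, 6),
    (3, 0): F(1, 12),
    (3, 1): F(1, 6),
}

def measure_second(joint):
    """Hypothetical helper: for each outcome y of measuring Y, return
    (probability of y, probabilistic state of X conditioned on y)."""
    grouped = {}
    for (x, y), p in joint.items():
        # Collect the terms for X that accompany each basis state of Y.
        bucket = grouped.setdefault(y, {})
        bucket[x] = bucket.get(x, 0) + p
    results = {}
    for y, unnormalized in grouped.items():
        prob = sum(unnormalized.values())
        # Normalize by dividing by the probability of the outcome.
        results[y] = (prob, {x: p / prob for x, p in unnormalized.items()})
    return results

results = measure_second(psi)
for y, (prob, state) in sorted(results.items()):
    print(y, prob, state)
```

Grouping by the value of $\mathsf{Y}$ mirrors the rewriting of $\vert\psi\rangle$ above, and the division by `prob` is the normalization step.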

### 1.4 Operations on probabilistic states <a id='multiple-systems-probabilistic-operations'></a>

To conclude this discussion of classical information for multiple systems, we will consider operations on multiple systems that are in probabilistic states.
Similar to measurements, we can view multiple systems collectively as forming single, compound systems and look to the previous lesson on single systems to see how this works.

Returning to the typical set-up where we have two systems $\mathsf{X}$ and $\mathsf{Y}$ having classical state sets $\Sigma$ and $\Gamma$, for instance, we can consider classical operations on the joint system $(\mathsf{X},\mathsf{Y})$.
Based on the previous lesson and the discussion above, we conclude that any such operation is represented by a stochastic matrix whose rows and columns are indexed by the Cartesian product $\Sigma\times\Gamma$.

For example, suppose that $\mathsf{X}$ and $\mathsf{Y}$ are bits, and consider an operation with the following description:

<p style="padding-left: 5em; padding-right: 5em;">
   If $\mathsf{X} = 1$, then perform a NOT operation on 
   $\mathsf{Y}$, otherwise do nothing.
</p>

This is a deterministic operation known as a *controlled-NOT* operation, where $\mathsf{X}$ is the *control* bit that determines whether or not a NOT operation is applied to the *target* bit $\mathsf{Y}$.
Here is the matrix representation of this operation:

$$
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0
\end{pmatrix}.
$$

Its action on standard basis states is as follows:

$$
\begin{aligned}
\vert 00 \rangle & \mapsto \vert 00 \rangle\\
\vert 01 \rangle & \mapsto \vert 01 \rangle\\
\vert 10 \rangle & \mapsto \vert 11 \rangle\\
\vert 11 \rangle & \mapsto \vert 10 \rangle
\end{aligned}
$$

If we were to exchange the roles of $\mathsf{X}$ and $\mathsf{Y}$, taking $\mathsf{Y}$ to be the control bit and $\mathsf{X}$ to be the target bit, then the matrix representation of the operation would become

$$
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 1 & 0 & 0
\end{pmatrix}
$$

and its action on standard basis states would be like this:

$$
\begin{aligned}
\vert 00 \rangle & \mapsto \vert 00 \rangle\\
\vert 01 \rangle & \mapsto \vert 11 \rangle\\
\vert 10 \rangle & \mapsto \vert 10 \rangle\\
\vert 11 \rangle & \mapsto \vert 01 \rangle
\end{aligned}
$$
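As a quick check, the two matrices can be entered with NumPy and their action on the standard basis read off column by column. This is a sketch only; the names `cnot_xy`, `cnot_yx`, and `action` are chosen for this example and do not come from the lesson.

```python
import numpy as np

# Standard basis ordering for (X, Y): 00, 01, 10, 11.
labels = ["00", "01", "10", "11"]

# X is the control bit, Y is the target bit.
cnot_xy = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

# Y is the control bit, X is the target bit.
cnot_yx = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
])

def action(matrix):
    """Map each basis label to the label of the basis state it is sent to.
    Valid for deterministic operations, whose columns are basis vectors."""
    result = {}
    for j, label in enumerate(labels):
        column = matrix[:, j]  # image of the j-th standard basis vector
        result[label] = labels[int(np.argmax(column))]
    return result

print(action(cnot_xy))
print(action(cnot_yx))
```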

Another example is the operation having this description:

<p style="padding-left: 5em; padding-right: 5em;">
    With probability 1/2, set $\mathsf{Y}$ to be equal to $\mathsf{X}$, 
    otherwise do nothing.
</p>

The matrix representation of this operation is as follows:

$$
\begin{pmatrix}
1 & \frac{1}{2} & 0 & 0\\[2mm]
0 & \frac{1}{2} & 0 & 0\\[2mm]
0 & 0 & \frac{1}{2} & 0\\[2mm]
0 & 0 & \frac{1}{2} & 1
\end{pmatrix}
=
\frac{1}{2}
\begin{pmatrix}
1 & 1 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 0\\[2mm]
0 & 0 & 1 & 1
\end{pmatrix}
+
\frac{1}{2}
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 1 & 0 & 0\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 0 & 0 & 1
\end{pmatrix}.
$$

The action of this operation on standard basis vectors is as follows:

$$
\begin{aligned}
\vert 00 \rangle & \mapsto \vert 00 \rangle\\[1mm]
\vert 01 \rangle & \mapsto \frac{1}{2} \vert 00 \rangle + \frac{1}{2}\vert 01\rangle\\[1mm]
\vert 10 \rangle & \mapsto \frac{1}{2} \vert 11 \rangle + \frac{1}{2}\vert 10\rangle\\[1mm]
\vert 11 \rangle & \mapsto \vert 11 \rangle
\end{aligned}
$$

In these examples, we are simply viewing two systems together as a single system and proceeding as in the previous lesson.
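The probabilistic example can be checked numerically as well. The sketch below (variable names are illustrative) builds the operation as an equal mixture of "set $\mathsf{Y}$ equal to $\mathsf{X}$" and "do nothing", confirms that the result is stochastic, and applies it to the standard basis state $\vert 01\rangle$.

```python
import numpy as np

# "Set Y equal to X": 00 -> 00, 01 -> 00, 10 -> 11, 11 -> 11.
copy_x_to_y = np.array([
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
])
identity = np.eye(4)

# With probability 1/2 apply the copy, otherwise do nothing.
op = 0.5 * copy_x_to_y + 0.5 * identity

# A stochastic matrix has nonnegative entries and columns that sum to 1.
is_stochastic = bool(np.all(op >= 0) and np.allclose(op.sum(axis=0), 1))
print(is_stochastic)

# Apply the operation to the standard basis state |01>.
ket_01 = np.array([0, 1, 0, 0])
print(op @ ket_01)
```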

The same thing can be done for any number of systems.
For example, imagine that we have three bits, and we imagine incrementing modulo $8$ — meaning that we think about the three bits as encoding a number between $0$ and $7$ using binary notation, add $1$, and then take the remainder after dividing by $8$.
We can write this operation like this:

$$
\begin{aligned}
  & \vert 001 \rangle \langle 000 \vert
    + \vert 010 \rangle \langle 001 \vert
    + \vert 011 \rangle \langle 010 \vert
    + \vert 100 \rangle \langle 011 \vert\\[1mm]
  & \quad + \vert 101 \rangle \langle 100 \vert
    + \vert 110 \rangle \langle 101 \vert
    + \vert 111 \rangle \langle 110 \vert
    + \vert 000 \rangle \langle 111 \vert
\end{aligned}
$$

We could also write it like this:

$$
\sum_{k = 0}^{7} \vert (k+1) \bmod 8 \rangle \langle k \vert,
$$

assuming that we have agreed that a number $j\in\{0,1,\ldots,7\}$ inside of a ket, as in $\vert j \rangle$, refers to the three-bit binary encoding of the number $j$.

We can also express this operation as a matrix like this:

$$
\begin{pmatrix}
  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
  1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
  0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
  0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}.
$$
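Because three bits have $8$ classical states, this matrix has $8$ rows and $8$ columns. A minimal NumPy sketch (names are illustrative) builds it directly from the sum $\sum_k \vert (k+1) \bmod 8\rangle\langle k\vert$:

```python
import numpy as np

n = 8  # number of classical states of three bits
increment = np.zeros((n, n), dtype=int)
for k in range(n):
    # |(k+1) mod 8><k| has a single 1 in row (k+1) mod 8, column k.
    increment[(k + 1) % n, k] = 1

# The state 111 (the number 7) wraps around to 000 (the number 0).
ket_7 = np.zeros(n, dtype=int)
ket_7[7] = 1
print(increment @ ket_7)
```

Applying the operation eight times in a row returns every state to where it started, which can be confirmed with `np.linalg.matrix_power(increment, 8)`.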


#### Independent operations and tensor products of matrices

Now suppose that we have multiple systems and we perform separate operations on these separate systems.

For example, taking our usual set-up of two systems $\mathsf{X}$ and $\mathsf{Y}$ having classical state sets $\Sigma$ and $\Gamma$, respectively, let us suppose that we perform one operation on $\mathsf{X}$ and, completely independently, another operation on $\mathsf{Y}$.
As we know from the previous lesson, these operations are represented by stochastic matrices — and to be precise, let us say that the operation on $\mathsf{X}$ is represented by the matrix $M$ and the operation on $\mathsf{Y}$ is represented by the matrix $N$.
Thus, the rows and columns of $M$ have indices that are placed in correspondence with the elements of $\Sigma$ and, likewise, the rows and columns of $N$ correspond to the elements of $\Gamma$.
A natural question to ask is this: if we view $\mathsf{X}$ and $\mathsf{Y}$ together as a single, compound system $(\mathsf{X},\mathsf{Y})$, what is the matrix that represents the combined action of the two operations on this compound system?

The answer to this question is that the combined action is represented by the tensor product $M\otimes N$ — tensor products represent *independence*, this time between operations.
Here the tensor product is between two matrices rather than two vectors, but the definition is analogous.
To be precise, the matrix $M\otimes N$ is defined by the equation

$$
\langle ac \vert M \otimes N \vert bd\rangle
= 
\langle a \vert M \vert b\rangle
\langle c \vert N \vert d\rangle
$$

being true for every selection of $a,b\in\Sigma$ and $c,d\in\Gamma$.

An alternative, but equivalent, way to describe $M\otimes N$ is that it is the unique matrix that satisfies the equation

$$
  (M \otimes N)
  \bigl( \vert \phi \rangle \otimes \vert \pi \rangle \bigr)
  = \bigl(M \vert\phi\rangle\bigr) \otimes 
  \bigl(N \vert\pi\rangle\bigr)
$$

for every possible choice of vectors $\vert \phi\rangle$ and $\vert \pi\rangle$, assuming that the indices of $\vert \phi\rangle$ correspond to the elements of $\Sigma$ and the indices of $\vert \pi\rangle$ correspond to $\Gamma$.
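Both descriptions can be tested numerically. The sketch below assumes NumPy's `np.kron`, which computes the Kronecker product with the same ordering convention used in this lesson; the matrices and vectors are arbitrary choices made for the example.

```python
import numpy as np

M = np.array([[1.0, 0.5],
              [0.0, 0.5]])
N = np.array([[0.25, 1.0],
              [0.75, 0.0]])
MN = np.kron(M, N)

# First description: <ac| M (x) N |bd> = <a|M|b> <c|N|d>.
# With two-element state sets, the pair (a, c) sits at index 2*a + c.
k = N.shape[0]
entrywise_ok = all(
    np.isclose(MN[a * k + c, b * k + d], M[a, b] * N[c, d])
    for a in range(2) for b in range(2)
    for c in range(k) for d in range(k)
)

# Second description: (M (x) N)(phi (x) pi) = (M phi) (x) (N pi).
phi = np.array([0.3, 0.7])
pi_vec = np.array([0.6, 0.4])
product_ok = np.allclose(MN @ np.kron(phi, pi_vec),
                         np.kron(M @ phi, N @ pi_vec))

print(entrywise_ok, product_ok)
```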


Following the convention described previously for ordering the elements of Cartesian products, we can also write the tensor product of two matrices explicitly as follows:

$$
\begin{gathered}
  \begin{pmatrix}
    \alpha_{1,1} & \cdots & \alpha_{1,m} \\
    \vdots & \ddots & \vdots \\
    \alpha_{m,1} & \cdots & \alpha_{m,m}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    \beta_{1,1} & \cdots & \beta_{1,k} \\
    \vdots & \ddots & \vdots\\
    \beta_{k,1} & \cdots & \beta_{k,k}
  \end{pmatrix}
  \hspace{6cm}\\[2mm]
  \hspace{1cm}
 =
  \begin{pmatrix}
    \alpha_{1,1}\beta_{1,1} & \cdots & \alpha_{1,1}\beta_{1,k} & & 
    \alpha_{1,m}\beta_{1,1} & \cdots & \alpha_{1,m}\beta_{1,k} \\
    \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\
    \alpha_{1,1}\beta_{k,1} & \cdots & \alpha_{1,1}\beta_{k,k} & & 
    \alpha_{1,m}\beta_{k,1} & \cdots & \alpha_{1,m}\beta_{k,k} \\[2mm]
    & \vdots & & \ddots & & \vdots & \\[2mm]
    \alpha_{m,1}\beta_{1,1} & \cdots & \alpha_{m,1}\beta_{1,k} & & 
    \alpha_{m,m}\beta_{1,1} & \cdots & \alpha_{m,m}\beta_{1,k} \\
    \vdots & \ddots & \vdots & \hspace{2mm}\cdots\hspace{2mm} & \vdots & \ddots & \vdots \\
    \alpha_{m,1}\beta_{k,1} & \cdots & \alpha_{m,1}\beta_{k,k} & & 
    \alpha_{m,m}\beta_{k,1} & \cdots & \alpha_{m,m}\beta_{k,k}
  \end{pmatrix}
\end{gathered}
$$

Tensor products of three or more matrices are defined in an analogous way.
If $M_1, \ldots, M_n$ are matrices whose indices correspond to classical state sets $\Sigma_1,\ldots,\Sigma_n$, then the tensor product $M_1\otimes\cdots\otimes M_n$ is defined by the condition that

$$
\langle a_1\cdots a_n \vert M_1\otimes\cdots\otimes M_n \vert b_1\cdots b_n\rangle
=
\langle a_1 \vert M_1 \vert b_1 \rangle \cdots\langle a_n \vert M_n \vert b_n \rangle
$$

for every choice of classical states $a_1,b_1\in\Sigma_1,\ldots,a_n,b_n\in\Sigma_n$.

Alternatively, we could also define the tensor product of three or more matrices recursively, in terms of tensor products of two matrices, similar to what we observed for vectors.

The tensor product of matrices is sometimes said to be *multiplicative* because the equation

$$
  (M_1\otimes\cdots\otimes M_n)(N_1\otimes\cdots\otimes N_n)
  = (M_1 N_1)\otimes\cdots\otimes (M_n N_n)
$$

is always true, for any choice of matrices $M_1,\ldots,M_n$ and $N_1,\ldots,N_n$, provided that the products $M_1 N_1, \ldots, M_n N_n$ make sense.
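A numerical spot check of this property for $n = 2$ is sketched below, using randomly generated matrices. The shapes are deliberately not all square, to illustrate that the only requirement is that the products $M_1 N_1$ and $M_2 N_2$ make sense. The seed and shapes are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
M1, N1 = rng.random((2, 3)), rng.random((3, 2))  # M1 @ N1 is 2 x 2
M2, N2 = rng.random((2, 2)), rng.random((2, 4))  # M2 @ N2 is 2 x 4

lhs = np.kron(M1, M2) @ np.kron(N1, N2)
rhs = np.kron(M1 @ N1, M2 @ N2)

print(lhs.shape, np.allclose(lhs, rhs))
```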

Let us take a look at an example, which recalls a probabilistic operation on a single bit from the previous lesson:
if the classical state of the bit is 0, it is left alone; and if the classical state of the bit is 1, it is flipped to 0 with probability $1/2$.
As we observed, this operation is represented by the matrix

$$
  \begin{pmatrix}
    1 & \frac{1}{2}\\[1mm]
    0 & \frac{1}{2}
  \end{pmatrix}.
$$

If this operation is performed on a bit $\mathsf{X}$, and a NOT operation is (independently) performed on a second bit $\mathsf{Y}$, then the joint operation on the compound system $(\mathsf{X},\mathsf{Y})$ has the matrix representation

$$
  \begin{pmatrix}
    1 & \frac{1}{2}\\[1mm]
    0 & \frac{1}{2}
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    0 & 1\\[1mm]
    1 & 0
  \end{pmatrix}
  =
  \begin{pmatrix}
    0 & 1 & 0 & \frac{1}{2} \\[1mm]
    1 & 0 & \frac{1}{2} & 0 \\[1mm]
    0 & 0 & 0 & \frac{1}{2} \\[1mm]
    0 & 0 & \frac{1}{2} & 0
  \end{pmatrix}.
$$

By inspection, we see that this is a stochastic matrix.
This will always be the case: the tensor product of two or more stochastic matrices is always stochastic.

A common situation that we encounter is one in which one operation is performed on one system and *nothing* is done to another.
In such a case, exactly the same prescription is followed, noting that *doing nothing* is represented by the identity matrix.
For example, resetting the bit $\mathsf{X}$ to the 0 state and doing nothing to $\mathsf{Y}$ yields the probabilistic (and in fact deterministic) operation on $(\mathsf{X},\mathsf{Y})$ represented by the matrix

$$
  \begin{pmatrix}
    1 & 1\\[1mm]
    0 & 0
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    1 & 0\\[1mm]
    0 & 1
  \end{pmatrix}
  =
  \begin{pmatrix}
    1 & 0 & 1 & 0 \\[1mm]
    0 & 1 & 0 & 1 \\[1mm]
    0 & 0 & 0 & 0 \\[1mm]
    0 & 0 & 0 & 0
  \end{pmatrix}.
$$
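The two tensor products above can be reproduced with `np.kron`, together with a check that the results are stochastic. This is a sketch; the helper `is_stochastic` and the variable names are introduced only for this example.

```python
import numpy as np

def is_stochastic(matrix):
    """True if all entries are nonnegative and every column sums to 1."""
    return bool(np.all(matrix >= 0) and np.allclose(matrix.sum(axis=0), 1))

flip_to_zero = np.array([[1, 0.5], [0, 0.5]])  # 1 is flipped to 0 with probability 1/2
not_op = np.array([[0, 1], [1, 0]])            # NOT operation
reset = np.array([[1, 1], [0, 0]])             # reset the bit to 0
identity = np.eye(2)                           # do nothing

joint_1 = np.kron(flip_to_zero, not_op)  # first operation on X, NOT on Y
joint_2 = np.kron(reset, identity)       # reset X, do nothing to Y

print(joint_1)
print(joint_2)
print(is_stochastic(joint_1), is_stochastic(joint_2))
```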



## 2. Quantum information <a id='multiple-systems-quantum-info'></a>

We are now prepared to move on to quantum information in the setting of multiple systems.
Much like in the previous lesson on single systems, the mathematical description of quantum information for multiple systems is quite similar to the probabilistic case and makes use of similar concepts and techniques.

### 2.1 Quantum states <a id='multiple-systems-quantum-states'></a>

Multiple systems can be viewed collectively as single, compound systems.
We have already observed this in the probabilistic setting, and the quantum setting is completely analogous.
That is, quantum states of multiple systems are represented by column vectors having complex number entries and Euclidean norm equal to 1 — just like quantum states of single systems — but this time the indices of the quantum state vectors are placed in correspondence with the Cartesian product of the classical state sets associated with each of the individual systems.

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are qubits, so that their classical state sets are both equal to the binary alphabet $\{0,1\}$, then the classical state set of the pair of qubits $(\mathsf{X},\mathsf{Y})$, viewed collectively as a single system, is given by the Cartesian product $\{0,1\}\times\{0,1\}$ — and by representing pairs of binary values as binary strings of length 2, we may associate this Cartesian product set with the set 
$\{00,01,10,11\}$.
The following vectors are all examples of quantum state vectors of the pair $(\mathsf{X},\mathsf{Y})$:

$$
  \frac{1}{2} \vert 00 \rangle
  + \frac{i}{2} \vert 01\rangle
  - \frac{1}{2} \vert 10\rangle
  - \frac{i}{2} \vert 11\rangle, \quad
  \frac{3}{5} \vert 00\rangle - \frac{4}{5} \vert 11\rangle, 
  \quad \text{and} \quad
  \vert 01 \rangle.
$$
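One way to confirm that these three vectors are valid quantum state vectors is to compute their Euclidean norms. The sketch below uses NumPy arrays with entries ordered as $00, 01, 10, 11$; the variable names are illustrative.

```python
import numpy as np

# Entries are ordered 00, 01, 10, 11.
v1 = np.array([1/2, 1j/2, -1/2, -1j/2])
v2 = np.array([3/5, 0, 0, -4/5])
v3 = np.array([0, 1, 0, 0])

# np.linalg.norm takes absolute values of complex entries before squaring.
norms = [np.linalg.norm(v) for v in (v1, v2, v3)]
print(norms)
```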

<!-- ::: q-block.exercise -->

### Code exercise

On the previous page, we saw that Qiskit has a built-in class, `Statevector`, for representing quantum states. On this page, we'll recreate a simpler version of this class ourselves. Coding the operations of the `Statevector` class yourself will require you to fully understand those operations.

In the cell below, we create a bare bones `Statevector`. In this simple implementation, we'll store each possible amplitude in a list (i.e. our column vector). We've also defined two methods: `draw`, and `probabilities`. We've completed the `draw` method for you, but you'll need to fill in the `probabilities` method yourself. The two cells after the `Statevector` definition show an example of these methods in use.

Complete the `probabilities` method in the cell below.

<!-- ::: -->

In [22]:
from numpy import sqrt, abs
class Statevector:
    """This class represents quantum state vectors"""
    def __init__(self, amplitudes):
        """Set up state vector.
        Args:
            amplitudes (list): A list of amplitudes
        """
        self.amplitudes = amplitudes

    def draw(self):
        """Print the state vector's amplitudes"""
        print(self.amplitudes)

    def probabilities(self):
        """Get probability of measuring each basis state.
        Returns:
            (list) Probability of measuring each basis state
        """
        # Your code here
        return probabilities

In [12]:
sv = Statevector([0, 0, 0, 1])
sv.draw()  # Should print [0, 0, 0, 1]

[0, 0, 0, 1]


In [16]:
sv = Statevector([.5, .5, 0, 1/sqrt(2)])
sv.probabilities()  # should return ~[.25, .25, 0, .5]

[0.25, 0.25, 0, 0.4999999999999999]

<!-- ::: q-block.reminder -->

### Exercise solution

<details>
    <summary>Completing the <code>probabilities</code> method</summary>
    Include the line:

<pre><code>    probabilities = [abs(amp)**2 for amp in self.amplitudes]</code></pre>

in the <code>probabilities</code> method. To get the probability of measuring a state, we square the magnitude of that state's amplitude. This line of code iterates through <code>self.amplitudes</code> and makes a new list containing the magnitude (<code>abs</code>) squared of each amplitude.
</details>

<!-- ::: -->

#### Tensor products of quantum state vectors

Similar to what we have for probability vectors, tensor products of quantum state vectors are also quantum state vectors.

Suppose that $\vert \phi \rangle$ is a quantum state vector of a system $\mathsf{X}$ having classical state set $\Sigma$ and $\vert \psi \rangle$ is a quantum state vector of a system $\mathsf{Y}$ having classical state set $\Gamma$.
The indices of the vector $\vert \phi \rangle$ therefore correspond to the elements of $\Sigma$ while the indices of $\vert \psi \rangle$ correspond to $\Gamma$.
The tensor product $\vert \phi \rangle \otimes \vert \psi \rangle$, which may alternatively be written as
$\vert \phi \rangle \vert \psi \rangle$ or as $\vert \phi \otimes \psi \rangle,$ is then a quantum state vector of the joint system $(\mathsf{X},\mathsf{Y})$.
As in the probabilistic setting, we refer to a state of this form as a *product state*.

A product state of this form represents *independence* between the systems $\mathsf{X}$ and $\mathsf{Y}$.
Intuitively speaking, we may think of the systems $(\mathsf{X},\mathsf{Y})$ being in a product state $\vert \phi \rangle \otimes \vert \psi \rangle$ as if $\mathsf{X}$ is in the quantum state $\vert \phi \rangle$, $\mathsf{Y}$ is in the quantum state $\vert \psi \rangle$, and the states of the two systems have nothing to do with one another.

The fact that the tensor product vector $\vert \phi \rangle \otimes \vert \psi \rangle$ is indeed a quantum state vector is consistent with the Euclidean norm being *multiplicative* with respect to tensor products:

$$
\begin{aligned}
  \bigl\| \vert \phi \rangle \otimes \vert \psi \rangle \bigr\| 
  & = \sqrt{ 
    \sum_{(a,b)\in\Sigma\times\Gamma} 
    \bigl\vert\langle ab \vert \phi\otimes\psi \rangle \bigr\vert^2
  }\\[1mm]
  & = \sqrt{ 
    \sum_{a\in\Sigma} \sum_{b\in\Gamma}
    \bigl\vert\langle a \vert \phi \rangle 
    \langle b \vert \psi \rangle \bigr\vert^2
  }\\[1mm]
  & = \sqrt{ 
    \biggl(\sum_{a\in\Sigma} 
    \bigl\vert \langle a \vert \phi \rangle \bigr\vert^2
    \biggr)
    \biggl(\sum_{b\in\Gamma} 
    \bigl\vert \langle b \vert \psi \rangle \bigr\vert^2
    \biggr)
  }\\[1mm]
  & = \bigl\| 
    \vert \phi \rangle \bigr\| \bigl\| \vert \psi \rangle 
  \bigr\|.
\end{aligned}
$$

Thus, because $\vert \phi \rangle$ and $\vert \psi \rangle$ are quantum state vectors, we have $\|\vert \phi \rangle\| = 1$ and $\|\vert \psi \rangle\| = 1$, and therefore
$\|\vert \phi \rangle \otimes \vert \psi \rangle\| = 1$, so $\vert \phi \rangle \otimes \vert \psi \rangle$ is also a quantum state vector.

This discussion may be generalized to more than two systems.
If $\vert \psi_1 \rangle,\ldots,\vert \psi_n \rangle$ are quantum state vectors of systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$, then
$\vert \psi_1 \rangle\otimes\cdots\otimes \vert \psi_n \rangle$ is a quantum state vector representing a *product state* of the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$.
Again, we know that $\vert \psi_1 \rangle\otimes\cdots\otimes \vert \psi_n \rangle$ must be a quantum state vector because

$$
  \bigl\| 
  \vert \psi_1 \rangle\otimes\cdots\otimes \vert \psi_n \rangle 
  \bigr\| 
  = \bigl\|\vert \psi_1 \rangle\bigr\| \cdots 
  \bigl\|\vert \psi_n \rangle \bigr\| = 1^n = 1.
$$
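The multiplicativity of the Euclidean norm can also be observed numerically. In the sketch below the vectors are arbitrary examples: the first pair consists of quantum state vectors, while the second pair is deliberately unnormalized to show that the identity does not depend on the norms being $1$.

```python
import numpy as np

# Two single-qubit quantum state vectors (each has norm 1).
phi = np.array([1, 1j]) / np.sqrt(2)
psi = np.array([3, -4]) / 5
product = np.kron(phi, psi)
print(np.linalg.norm(product))

# Two unnormalized vectors with norms 3 and 5.
u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 4.0])
print(np.linalg.norm(np.kron(u, v)), np.linalg.norm(u) * np.linalg.norm(v))
```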

<!-- ::: q-block.exercise -->

### Code exercise

In the code cell below, we extend our `Statevector` class to include a `tensor` method. This method accepts another `Statevector` as an argument, and returns a new `Statevector` that is the tensor product of itself and the other statevector. Complete the `tensor` method in the code cell below.

_Hint: You'll need to iterate through `self.amplitudes` and `other.amplitudes`_

<!-- ::: -->

In [21]:
class Statevector(Statevector):
    def tensor(self, other):
        """Return new Statevector, which is tensor product
        of `self` (LHS) and `other` (RHS)"""
        new_amplitudes = []
        # Your code here
        new_statevector = Statevector(new_amplitudes)
        return new_statevector

In [20]:
sv1 = Statevector([0, 1])
sv2 = Statevector([1/sqrt(2), -1/sqrt(2)])
sv1.tensor(sv2).draw()  # should print ~[0, 0, 0.707, -0.707]

[0.0, -0.0, 0.7071067811865475, -0.7071067811865475]


<!-- ::: q-block.reminder -->

### Exercise solution

<details>
    <summary>Completing the <code>tensor</code> method</summary>
    The following code sets up <code>new_amplitudes</code> such that returning <code>Statevector(new_amplitudes)</code> is the correct output.

<pre><code>        new_amplitudes = []
        for amp1 in self.amplitudes:
            for amp2 in other.amplitudes:
                new_amplitudes.append(amp1 * amp2)</code></pre>
</details>

<!-- ::: -->


#### Entangled states

Not all quantum state vectors of multiple systems are product states.
For example, the quantum state vector

$$
  \frac{1}{\sqrt{2}} \vert 00\rangle + \frac{1}{\sqrt{2}} \vert 11\rangle
  \tag{2.1}
$$

of two qubits is not a product state.
To see why, we may follow exactly the same argument that we used to prove that the probabilistic state represented by the vector $(1.5)$ is not a product vector.

That is, if $(2.1)$ were a product state, there would exist qubit quantum state vectors $\vert\phi\rangle$ and $\vert\pi\rangle$ for which

$$
  \vert\phi\rangle\otimes\vert\pi\rangle 
  = \frac{1}{\sqrt{2}} \vert 00\rangle 
  + \frac{1}{\sqrt{2}} \vert 11\rangle.
$$

But then it would necessarily be the case that 

$$
  \langle 0 \vert \phi\rangle 
  \langle 1 \vert \pi\rangle
  = \langle 01 \vert \phi\otimes\pi\rangle
  = 0
$$

implying that $\langle 0 \vert \phi\rangle = 0$ or 
$\langle 1 \vert \pi\rangle = 0$ (or both), contradicting the observation that 

$$
  \langle 0 \vert \phi\rangle \langle 0 \vert \pi\rangle
  = \langle 00 \vert \phi\otimes\pi\rangle 
  = \frac{1}{\sqrt{2}}
$$

and

$$
  \langle 1 \vert \phi\rangle \langle 1 \vert \pi\rangle
  = \langle 11 \vert \phi\otimes\pi\rangle 
  = \frac{1}{\sqrt{2}}
$$

are both nonzero.
The fact that these quantities are both $1/\sqrt{2}$ is not important to this argument — what is important is that both quantities are nonzero.
Thus, for instance, the quantum state

$$
  \frac{3}{5} \vert 00\rangle + \frac{4}{5} \vert 11\rangle
$$

is also not a product state, by the same argument.

It follows that the quantum state vector $(2.1)$ represents a *correlation* between two systems, and specifically we say that the systems are *entangled*.

Entanglement is a quintessential feature of quantum information that will be discussed in much greater detail in later lessons.
Entanglement can be complicated, particularly for the sorts of noisy quantum states that can be described in the general, density matrix formulation of quantum information that was mentioned in Lesson 1 — but for quantum state vectors in the simplified formulation that we are focusing on in this unit, entanglement is equivalent to correlation.
That is, any quantum state vector that is not a product vector represents an entangled state.

In contrast, the quantum state vector

$$
   \frac{1}{2} \vert 00\rangle
 + \frac{i}{2} \vert 01\rangle
 - \frac{1}{2} \vert 10\rangle
 - \frac{i}{2} \vert 11\rangle
$$

is an example of a product state:

$$
  \frac{1}{2} \vert 00\rangle
  + \frac{i}{2} \vert 01\rangle
  - \frac{1}{2} \vert 10\rangle
  - \frac{i}{2} \vert 11\rangle
  =
  \biggl( 
    \frac{1}{\sqrt{2}}\vert 0\rangle - \frac{1}{\sqrt{2}}\vert 1\rangle
  \biggr)
  \otimes 
  \biggl( 
    \frac{1}{\sqrt{2}}\vert 0\rangle + \frac{i}{\sqrt{2}}\vert 1\rangle
  \biggr).
$$

Hence, this state is not entangled.
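For two qubits there is a compact numerical test for whether a state vector is a product state. It relies on a standard linear algebra fact that is not stated in this lesson: arranging the four amplitudes in a $2\times 2$ matrix, with rows indexed by the state of $\mathsf{X}$ and columns by the state of $\mathsf{Y}$, the vector is a product vector exactly when the determinant of that matrix is $0$. The function name `is_product_state` and the tolerance are assumptions made for this sketch.

```python
import numpy as np

def is_product_state(state, tol=1e-9):
    """Hypothetical helper: test whether a two-qubit state vector
    (amplitudes ordered 00, 01, 10, 11) is a product state."""
    a = np.asarray(state, dtype=complex).reshape(2, 2)
    # For a product state, a[x, y] = <x|phi> <y|pi>, so the determinant vanishes.
    determinant = a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]
    return bool(abs(determinant) < tol)

s = 1 / np.sqrt(2)
print(is_product_state([s, 0, 0, s]))                 # the state (2.1)
print(is_product_state([3/5, 0, 0, 4/5]))
print(is_product_state([1/2, 1j/2, -1/2, -1j/2]))
```

This agrees with the argument above: when the amplitude of $\vert 01\rangle$ is zero, the determinant reduces to the product of the amplitudes of $\vert 00\rangle$ and $\vert 11\rangle$, which is nonzero for the first two states.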


#### Bell states

We will now take a look at some important examples of multiple-qubit quantum states, beginning with the *Bell states*.
These are the following four two-qubit states:

$$
\begin{aligned}
  \vert \phi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \\[1mm]
  \vert \phi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \\[1mm]
  \vert \psi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 1 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1\rangle \vert 0 \rangle \\[1mm]
  \vert \psi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0\rangle \vert 1 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1\rangle \vert 0 \rangle
\end{aligned}
$$

The Bell states are so named in honor of John Bell.

There are a few alternative ways to express these vectors that you may encounter.
Focusing on just the first state $\vert \phi^+\rangle$, we have the following alternative expressions:

1. We may use the fact that $\vert a\rangle \vert b\rangle = \vert ab\rangle$ (for any classical states $a$ and $b$) to instead write

  $$
  \vert \phi^+ \rangle = \frac{1}{\sqrt{2}} \vert 00 \rangle + \frac{1}{\sqrt{2}} \vert 11 \rangle.
  $$

  We have already encountered this state a couple of times, and now we see it as one member of this important collection.

2. We may choose to write the tensor product symbol explicitly like this:

$$
\vert \phi^+ \rangle 
= \frac{1}{\sqrt{2}} \vert 0\rangle\otimes\vert 0 \rangle + \frac{1}{\sqrt{2}} \vert 1\rangle\otimes \vert 1 \rangle.
$$

3. Presuming that $\vert \phi^+ \rangle$ is being viewed as a quantum state of two qubits named $\mathsf{X}$ and $\mathsf{Y}$, we may subscript the kets to indicate which ones correspond to each of these two qubits, like this:

  $$
  \vert \phi^+ \rangle 
  = \frac{1}{\sqrt{2}} \vert 0\rangle_{\mathsf{X}} \vert 0 \rangle_{\mathsf{Y}} 
  + \frac{1}{\sqrt{2}} \vert 1\rangle_{\mathsf{X}} \vert 1 \rangle_{\mathsf{Y}}.
  $$                         

  Naturally, different names for these qubits could be chosen and used as subscripts in the same way.

4. Finally, following exactly the same convention discussed previously for ordering Cartesian products, we may write the vector $\vert\phi^+\rangle$ explicitly as a column vector:

$$
\vert \phi^+ \rangle = 
\begin{pmatrix}
  \frac{1}{\sqrt{2}}\\
  0\\
  0\\
  \frac{1}{\sqrt{2}}
\end{pmatrix}.
$$

Depending upon the context in which it appears, one of these expressions may be preferred — but they are all equivalent in the sense that they refer to the same vector.
Analogous expressions may be used for the other three Bell states.

Notice that the same argument that establishes that $\vert\phi^+\rangle$ is not a product state reveals that none of the other Bell states is a product state either — all four of the Bell states represent entanglement between two qubits.

The collection of all four Bell states

$$
  \bigl\{\vert \phi^+ \rangle, \vert \phi^- \rangle, \vert \psi^+ \rangle, \vert \psi^- \rangle\bigr\}
$$

is known as the *Bell basis*; any quantum state vector of two qubits, or indeed any complex vector at all having entries corresponding to the four classical states of two bits, can be expressed as a linear combination of the four Bell states.
For example,

$$
  \vert 0 0 \rangle
  = \frac{1}{\sqrt{2}} \vert \phi^+\rangle 
  + \frac{1}{\sqrt{2}} \vert \phi^-\rangle.
$$
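Both claims can be checked with NumPy. In the sketch below (independent of the `Statevector` class used on this page, with illustrative names), the four Bell states are stored as the rows of a matrix; a rank of $4$ confirms that they are linearly independent and therefore form a basis, and the last lines rebuild $\vert 00\rangle$ from $\vert\phi^+\rangle$ and $\vert\phi^-\rangle$.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Rows: phi+, phi-, psi+, psi-; columns ordered 00, 01, 10, 11.
bell = np.array([
    [s, 0, 0,  s],
    [s, 0, 0, -s],
    [0, s,  s, 0],
    [0, s, -s, 0],
])

# Four linearly independent vectors in a four-dimensional space form a basis.
rank = np.linalg.matrix_rank(bell)
print(rank)

# |00> = (1/sqrt(2)) |phi+> + (1/sqrt(2)) |phi->
ket_00 = s * bell[0] + s * bell[1]
print(ket_00)
```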

The following code defines simple arithmetic operations for the `Statevector` class. This allows us to add two `Statevector` objects together, and multiply / divide `Statevector` objects by scalars.

In [41]:
class Statevector(Statevector):
    def __add__(self, other):
        """Return new Statevector, which is sum of `self` (LHS) and
        `other` (RHS)."""
        new_amplitudes = []
        for i in range(len(self.amplitudes)):
            new_amplitudes.append(
                self.amplitudes[i] + other.amplitudes[i]
            )
        new_statevector = Statevector(new_amplitudes)
        return new_statevector

    def __sub__(self, other):
        """Return new Statevector, which is `self` (LHS) subtracted
        by `other` (RHS)."""
        new_amplitudes = []
        for i in range(len(self.amplitudes)):
            new_amplitudes.append(
                self.amplitudes[i] - other.amplitudes[i]
            )
        new_statevector = Statevector(new_amplitudes)
        return new_statevector

    def __mul__(self, scalar):
        """Define behaviour for `Statevector()*scalar`."""
        new_amplitudes = [scalar*amp for amp in self.amplitudes]
        return Statevector(new_amplitudes)

    def __rmul__(self, scalar):
        """Define behaviour for `scalar*Statevector()`.
        This is the same as `__mul__` as scalar multiplication commutes"""
        return self.__mul__(scalar)

    def __truediv__(self, scalar):
        """Define behaviour for `Statevector()/scalar`."""
        return self.__mul__(1/scalar)

<!-- ::: q-block.exercise -->

### Code exercise

The code cell below expresses $|00\rangle$ as $\frac{1}{\sqrt{2}}(|\phi^+\rangle + |\phi^-\rangle)$. Try to express other states, such as $\vert 01\rangle$ and $\vert+\rangle \vert+\rangle$, as linear combinations of Bell states. As a harder exercise, can you find an algorithm that takes any `Statevector` and finds a combination of Bell states that produces it?

<!-- ::: -->

In [42]:
phi_plus  = Statevector([1/sqrt(2), 0,          0,         1/sqrt(2)])
phi_minus = Statevector([1/sqrt(2), 0,          0,        -1/sqrt(2)])
psi_plus  = Statevector([ 0,        1/sqrt(2),  1/sqrt(2), 0])
psi_minus = Statevector([ 0,        1/sqrt(2), -1/sqrt(2), 0])

# Create |00> state vector
sv = (phi_plus + phi_minus)/sqrt(2)
sv.draw()

[0.9999999999999998, 0.0, 0.0, 0.0]


<!-- ::: q-block.reminder -->

### Exercise solution

<details>
    <summary>Algorithm to find Bell state coefficients</summary>
First we write each computational basis state as a linear combination of Bell states:

<pre><code># Bell basis key: [phi+, phi-, psi+, psi-]
sv00 = Statevector([1/sqrt(2),  1/sqrt(2),  0,          0        ])
sv01 = Statevector([0,          0,          1/sqrt(2),  1/sqrt(2)])
sv10 = Statevector([0,          0,          1/sqrt(2), -1/sqrt(2)])
sv11 = Statevector([1/sqrt(2), -1/sqrt(2),  0,          0        ])</code></pre>

In the code above, we use the <code>Statevector</code> class to represent each computational basis state, but using the Bell basis. You can view each amplitude in our state vector as the amplitude of the system being in that Bell state. E.g.: $|00\rangle = \tfrac{1}{\sqrt{2}}|\phi^+\rangle + \tfrac{1}{\sqrt{2}}|\phi^-\rangle$, so its amplitudes are $[\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0, 0]$. Next, we create a mapping that converts a computational basis state to its corresponding Bell basis state vector.

<pre><code># Map: computational basis state => Bell basis states
comp_to_bell = [sv00, sv01, sv10, sv11]</code></pre>

With this in place, we can sum the Bell-basis <code>Statevectors</code> for each computational basis state, weighted by that state's amplitude. The result is our input <code>Statevector</code>, but written in the Bell basis.

<pre><code>def get_bell_coefficients(state_vector):
    """Takes a `Statevector` and returns the coefficients that,
    when multiplied by |phi+>, |phi->, |psi+>, and |psi->
    respectively and summed, give that state vector."""
    bell_coefficients = Statevector([0, 0, 0, 0])
    # Iterate over the stored amplitudes, which are listed in
    # the order 00, 01, 10, 11
    for index, amp in enumerate(state_vector.amplitudes):
        bell_coefficients += amp * comp_to_bell[index]
    return bell_coefficients.amplitudes</code></pre>

For example:

<pre><code>get_bell_coefficients(Statevector([0, 0, 1, 0]))
# Should return: ~[0, 0, 0.707, -0.707]</code></pre>

</details>

<!-- ::: -->

#### GHZ and W states

Next we will consider two interesting examples of states of three qubits.

The first example, which represents a quantum state of three qubits $(\mathsf{X},\mathsf{Y},\mathsf{Z})$, is the *GHZ state* (so named in honor of Daniel Greenberger, Michael Horne, and Anton Zeilinger, who first studied some of its properties):

$$
  \frac{1}{\sqrt{2}} \vert 0\rangle \vert 0 \rangle \vert 0\rangle +
  \frac{1}{\sqrt{2}} \vert 1\rangle \vert 1 \rangle \vert 1\rangle.
$$

The second example is the so-called W state:

$$
  \frac{1}{\sqrt{3}} \vert 0\rangle \vert 0 \rangle \vert 1\rangle +
  \frac{1}{\sqrt{3}} \vert 0\rangle \vert 1 \rangle \vert 0\rangle +
  \frac{1}{\sqrt{3}} \vert 1\rangle \vert 0 \rangle \vert 0\rangle. 
$$

Neither of these states is a product state, meaning that neither can be written as a tensor product of three single-qubit quantum state vectors.

We will examine both of these states further when we discuss partial measurements of quantum states of multiple systems.
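
The claim that neither state is a product state can be checked numerically. The following is a minimal sketch using NumPy rather than the `Statevector` class, and the helper name `rank_across_first_cut` is hypothetical. The idea: if a three-qubit vector were a tensor product of three single-qubit vectors, it would in particular be a tensor product of a vector for $\mathsf{X}$ and a vector for $(\mathsf{Y},\mathsf{Z})$. Arranging the eight amplitudes in a $2\times 4$ matrix whose rows are indexed by the state of $\mathsf{X}$, such a product always has rank $1$, so a rank of $2$ rules out a product state.

```python
import numpy as np

# Amplitudes in the order 000, 001, 010, ..., 111.
ghz = np.zeros(8)
ghz[[0b000, 0b111]] = 1 / np.sqrt(2)

w = np.zeros(8)
w[[0b001, 0b010, 0b100]] = 1 / np.sqrt(3)


def rank_across_first_cut(vector):
    """Rank of the 2x4 matrix whose rows are indexed by the first qubit.

    A vector of the form |a> (x) |rest> always gives rank 1 here.
    """
    return np.linalg.matrix_rank(np.reshape(vector, (2, 4)))


# A genuine product state, |0>|+>|->, for comparison.
product = np.kron([1, 0], np.kron([1, 1], [1, -1])) / 2

print(rank_across_first_cut(ghz))      # rank for the GHZ state
print(rank_across_first_cut(w))        # rank for the W state
print(rank_across_first_cut(product))  # rank for the product state
```

A rank of $1$ across this one cut would not by itself prove that a state is a product of three vectors; the test is only used here in the direction that excludes product states.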

#### Additional examples

The examples of quantum states of multiple systems we have seen so far are states of two or three qubits, but we can also have quantum states of multiple systems having different classical state sets.

For example, here is a quantum state of three systems, $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$, where the classical state set of $\mathsf{X}$ is the binary alphabet (so $\mathsf{X}$ is a qubit) and the classical state set of $\mathsf{Y}$ and $\mathsf{Z}$ is
$\{\clubsuit,\diamondsuit,\heartsuit,\spadesuit\}$:

$$
  \frac{1}{2} \vert 0 \rangle \vert \heartsuit\rangle 
  \vert \heartsuit \rangle
  +
  \frac{1}{2} \vert 1 \rangle \vert \spadesuit\rangle 
  \vert \heartsuit \rangle
  -
  \frac{1}{\sqrt{2}} \vert 0 \rangle \vert \heartsuit\rangle 
  \vert \diamondsuit \rangle.
$$

And, here is an example of a quantum state of three systems
$(\mathsf{X}, \mathsf{Y}, \mathsf{Z})$, where $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$ all share the same classical state set $\{0,1,2\}$:

$$
  \frac{
    \vert 0 \rangle \vert 1 \rangle \vert 2 \rangle
    - \vert 0 \rangle \vert 2 \rangle \vert 1 \rangle
    + \vert 1 \rangle \vert 2 \rangle \vert 0 \rangle
    - \vert 1 \rangle \vert 0 \rangle \vert 2 \rangle
    + \vert 2 \rangle \vert 0 \rangle \vert 1 \rangle
    - \vert 2 \rangle \vert 1 \rangle \vert 0 \rangle
  }{\sqrt{6}}.
$$

Systems having the classical state set $\{0,1,2\}$ are often called
*trits* or, assuming we consider the possibility that they are in quantum states, *qutrits*.
The term *qudit* is often used to refer to a system having classical state set $\{0,\ldots,d-1\}$ for an arbitrary choice of $d$.

### 2.2 Measurements of quantum states <a id='multiple-systems-quantum-measurements'></a>

Measurements — more specifically *standard basis measurements* — of quantum states of single systems were discussed in the previous lesson: if a system having classical state set $\Sigma$ is in a quantum state represented by the vector $\vert \psi \rangle$, and that system is measured (with respect to a standard basis measurement), then each classical state $a\in\Sigma$ appears with probability $\vert \langle a \vert \psi \rangle\vert^2$.
This tells us what happens when we have a quantum state of multiple systems and *every* system is measured.

To be precise, let us suppose that $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are systems having classical state sets $\Sigma_1,\ldots,\Sigma_n$, respectively.
We may then view $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ collectively as a single system whose classical state set is the Cartesian product $\Sigma_1\times\cdots\times\Sigma_n$.
If a quantum state of this system is represented by the quantum state vector $\vert\psi\rangle$, and every one of the systems is measured, then each $n$-tuple $(a_1,\ldots,a_n)\in\Sigma_1\times\cdots\times\Sigma_n$, which we may view as the string $a_1\cdots a_n$, is the result of the measurement with probability 
$\vert\langle a_1\cdots a_n\vert \psi\rangle\vert^2$.

For example, if systems $\mathsf{X}$ and $\mathsf{Y}$ are jointly in the quantum state

$$
\frac{3}{5} \vert 0\rangle \vert \heartsuit \rangle
- \frac{4i}{5} \vert 1\rangle \vert \spadesuit \rangle,
$$

then measuring both systems with respect to a standard basis measurement yields the outcome $(0,\heartsuit)$ with probability $9/25$ and the outcome $(1,\spadesuit)$ with probability $16/25$.
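
As a small illustration of this rule, the sketch below stores the amplitudes in a plain Python dictionary keyed by classical states. The string labels `"heart"` and `"spade"` are stand-ins for $\heartsuit$ and $\spadesuit$ chosen for this example only.

```python
# Amplitudes keyed by the classical state (a, b) of the pair (X, Y).
psi = {
    (0, "heart"): 3 / 5,
    (1, "spade"): -4j / 5,
}

# Standard basis measurement of both systems: the outcome (a, b) appears
# with probability |<ab|psi>|^2, the squared magnitude of its amplitude.
probabilities = {outcome: abs(amp) ** 2 for outcome, amp in psi.items()}

for outcome, p in probabilities.items():
    print(outcome, round(p, 4))
```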

#### Partial measurements for two systems

Next, let's consider the situation in which we have multiple systems in some quantum state, and we measure a proper subset of the systems.
As before, we will begin with two systems $\mathsf{X}$ and $\mathsf{Y}$ having classical state sets $\Sigma$ and $\Gamma$, respectively.

In general, a quantum state vector of $(\mathsf{X},\mathsf{Y})$ takes the form

$$
  \vert \psi \rangle 
  = \sum_{(a,b)\in\Sigma\times\Gamma} \alpha_{a,b} \vert ab\rangle,
$$

where $\{\alpha_{a,b} : (a,b)\in\Sigma\times\Gamma\}$ is a collection of complex numbers satisfying

$$
  \sum_{(a,b)\in\Sigma\times\Gamma} \vert \alpha_{a,b} \vert^2 = 1,
$$

which is equivalent to $\vert \psi \rangle$ being a unit vector.

We already know, from the discussion above, that if both $\mathsf{X}$ and $\mathsf{Y}$ were measured, then each possible outcome $(a,b)\in\Sigma\times\Gamma$ would appear with probability

$$
  \bigl\vert \langle ab \vert \psi \rangle \bigr\vert^2 
  = \vert\alpha_{a,b}\vert^2.
$$

Supposing that just the first system $\mathsf{X}$ is measured, the probability for each outcome $a\in\Sigma$ to appear must therefore be equal to

$$
  \sum_{b\in\Gamma} 
  \bigl\vert \langle ab \vert \psi \rangle \bigr\vert^2 
  = 
  \sum_{b\in\Gamma} 
  \vert\alpha_{a,b}\vert^2.
$$

This is consistent with what we have already seen in the probabilistic setting, and is once again consistent with our understanding of physics.
That is, the probability for each particular outcome to appear when $\mathsf{X}$ is measured cannot possibly depend on whether or not $\mathsf{Y}$ was also measured, as that would otherwise allow for superluminal communication.

Having obtained a particular outcome $a\in\Sigma$ of this measurement of $\mathsf{X}$, we expect that the quantum state of $\mathsf{X}$ changes so that it is equal to $\vert a\rangle$, like we had for single systems.
But what happens to the quantum state of $\mathsf{Y}$?

To answer this question and verify our expectations about the state of $\mathsf{X}$, let us describe the joint quantum state of $(\mathsf{X},\mathsf{Y})$, assuming that $\mathsf{X}$ has been measured (with respect to a standard basis measurement) and the result is the classical state $a$.
It may be observed that this description is, in a sense, analogous to what happens in the classical (probabilistic) setting.

First, let us notice that we can express the vector $\vert\psi\rangle$ as

$$
  \vert\psi\rangle
  = \sum_{a\in\Sigma}
  \vert a \rangle
  \otimes \vert \phi_a \rangle
$$

where

$$
  \vert \phi_a \rangle = \sum_{b\in\Gamma} \alpha_{a,b} \vert b\rangle 
$$

for each $a\in\Sigma$.
This follows from the bilinearity of tensor products, and specifically the linearity of the tensor product in the second argument.

Now, as a result of the standard basis measurement of $\mathsf{X}$ resulting in the outcome $a$, we have that the quantum state of the pair $(\mathsf{X},\mathsf{Y})$ together becomes

$$
  \vert a \rangle \otimes 
  \frac{\vert \phi_a \rangle}{\|\vert \phi_a \rangle\|}.
$$

That is, the state "collapses" like in the single-system case, but only as far as is required for the state to be consistent with the measurement of $\mathsf{X}$ having produced the outcome $a$.

Informally speaking, the vector $\vert a \rangle \otimes \vert \phi_a\rangle$ represents a "portion" or "component" of the quantum state vector $\vert \psi\rangle$ that is consistent with the standard basis measurement of $\mathsf{X}$ resulting in the outcome $a$.
We *normalize* this vector, dividing it by its Euclidean norm $\|\vert\phi_a\rangle\|$, to yield a valid quantum state vector having Euclidean norm equal to $1$.
This normalization step is analogous to what we did in the probabilistic setting when we divided vectors by the sum of their entries to obtain a probability vector.

Let us also observe that the probability that the standard basis measurement of $\mathsf{X}$ results in each outcome $a$ may be written as follows:

$$
  \sum_{b\in\Gamma} \vert\alpha_{a,b}\vert^2 
  = \bigl\| \vert \phi_a \rangle \bigr\|^2.
$$
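
The prescription above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a library routine: the function name `measure_first` and the dictionary representation (pairs $(a,b)$ mapped to amplitudes $\alpha_{a,b}$) are assumptions made for this example.

```python
from math import sqrt


def measure_first(psi):
    """Standard basis measurement of X for a state of (X, Y).

    `psi` maps pairs (a, b) to amplitudes alpha_{a,b}. Returns a dict
    mapping each outcome a that has nonzero probability to a pair
    (probability, post-measurement state of (X, Y)).
    """
    # Group amplitudes by a: phi[a][b] = alpha_{a,b}, i.e. the vector |phi_a>.
    phi = {}
    for (a, b), amp in psi.items():
        phi.setdefault(a, {})[b] = amp

    results = {}
    for a, phi_a in phi.items():
        prob = sum(abs(amp) ** 2 for amp in phi_a.values())  # || |phi_a> ||^2
        if prob > 0:
            norm = sqrt(prob)
            post = {(a, b): amp / norm for b, amp in phi_a.items()}
            results[a] = (prob, post)
    return results


# Usage: the Bell state |phi+> = (|00> + |11>) / sqrt(2).
phi_plus = {(0, 0): 1 / sqrt(2), (1, 1): 1 / sqrt(2)}
for a, (prob, post) in measure_first(phi_plus).items():
    print(a, prob, post)
```

For the Bell state in the usage example, each outcome leaves both qubits in the same standard basis state, which matches the formula $\vert a\rangle \otimes \vert\phi_a\rangle / \|\vert\phi_a\rangle\|$.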

As an example, let us consider two systems $\mathsf{X}$ and $\mathsf{Y}$
where the classical state set of $\mathsf{X}$ is $\Sigma = \{1,2,3\}$ and the classical state set of $\mathsf{Y}$ is $\Gamma = \{0,1\}$, which is similar to one of the examples we saw for partial measurements in the probabilistic setting.
This time, let us consider the quantum state of these systems represented by the quantum state vector

$$
  \vert \psi \rangle 
  = \frac{1}{\sqrt{2}}  \vert 1,0 \rangle
  - \frac{1}{2\sqrt{3}} \vert 1,1 \rangle
  - \frac{i}{\sqrt{6}}  \vert 2,1 \rangle
  + \frac{i}{2\sqrt{3}} \vert 3,0 \rangle
  + \frac{1}{\sqrt{6}}  \vert 3,1 \rangle.
$$

To obtain this vector we've simply taken the square roots of the entries that we had for the example in the probabilistic case and modified it slightly by multiplying some of the entries by $-1$ and $\pm i$, just to spice it up a bit.

To consider what happens when the first system $\mathsf{X}$ is measured, we begin by writing

$$
\vert \psi \rangle 
 = \vert 1 \rangle \otimes \biggl(
  \frac{1}{\sqrt{2}}  \vert 0 \rangle
  - \frac{1}{2\sqrt{3}} \vert 1 \rangle \biggr)
  + \vert 2 \rangle \otimes \biggl(-\frac{i}{\sqrt{6}} \vert 1 \rangle
  \biggr)
  + \vert 3 \rangle \otimes \biggl(
  \frac{i}{2\sqrt{3}} \vert 0 \rangle 
  + \frac{1}{\sqrt{6}} \vert 1 \rangle \biggr).
$$

We now see, based on the description above, that the probability for the measurement to result in the outcome $1$ is

$$
\biggl\|\frac{1}{\sqrt{2}}  \vert 0 \rangle
  - \frac{1}{2\sqrt{3}} \vert 1 \rangle\biggr\|^2
  = \frac{1}{2} + \frac{1}{12}
  = \frac{7}{12},
$$

in which case the state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 1\rangle \otimes 
  \frac{\frac{1}{\sqrt{2}}  \vert 0 \rangle
  - \frac{1}{2\sqrt{3}} \vert 1 \rangle}{\sqrt{\frac{7}{12}}}
  = \vert 1\rangle \otimes \biggl( \sqrt{\frac{6}{7}} \vert 0 \rangle 
  - \frac{1}{\sqrt{7}} \vert 1\rangle\biggr);
$$

the probability for the measurement to result in the outcome $2$ is

$$
  \biggl\| -\frac{i}{\sqrt{6}} \vert 1 \rangle\biggr\|^2 = \frac{1}{6},
$$

in which case the state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 2\rangle \otimes
  \frac{-\frac{i}{\sqrt{6}} \vert 1 \rangle}{\sqrt{\frac{1}{6}}}
  = -i \hspace{1pt}\vert 2 \rangle \otimes \vert 1\rangle;
$$

and the probability for the measurement to result in the outcome $3$ is

$$
\biggl\|\frac{i}{2\sqrt{3}} \vert 0 \rangle 
  + \frac{1}{\sqrt{6}} \vert 1 \rangle\biggr\|^2
  = \frac{1}{12} + \frac{1}{6}
  = \frac{1}{4},
$$

in which case the state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \vert 3 \rangle \otimes
  \frac{\frac{i}{2\sqrt{3}} \vert 0 \rangle 
  + \frac{1}{\sqrt{6}} \vert 1 \rangle}{\frac{1}{2}}
  = \vert 3\rangle \otimes \biggl(\frac{i}{\sqrt{3}} \vert 0 \rangle 
  + \sqrt{\frac{2}{3}} \vert 1\rangle\biggr).
$$

The same technique reveals what happens if the second system $\mathsf{Y}$ is measured rather than the first, where the roles of the two systems are exchanged.
We rewrite the vector $\vert \psi \rangle$ as 

$$
  \vert \psi \rangle 
  = \biggl( 
    \frac{1}{\sqrt{2}} \vert 1 \rangle 
    + \frac{i}{2\sqrt{3}} \vert 3 \rangle
  \biggr) \otimes \vert 0\rangle
  + \biggl(
    -\frac{1}{2\sqrt{3}} \vert 1 \rangle 
    - \frac{i}{\sqrt{6}} \vert 2\rangle 
    + \frac{1}{\sqrt{6}} \vert 3 \rangle
  \biggr) \otimes \vert 1\rangle.
$$

The probability that the measurement of $\mathsf{Y}$ yields the outcome $0$ is 

$$
\biggl\| \frac{1}{\sqrt{2}} \vert 1 \rangle + \frac{i}{2\sqrt{3}} \vert 3 \rangle \biggr\|^2
= \frac{1}{2} + \frac{1}{12} = \frac{7}{12},
$$

in which case the state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
  \frac{\frac{1}{\sqrt{2}} \vert 1 \rangle 
  + \frac{i}{2\sqrt{3}} \vert 3 \rangle}{\sqrt{\frac{7}{12}}} \otimes \vert 0 \rangle
  = \biggl(\sqrt{\frac{6}{7}} \vert 1 \rangle + \frac{i}{\sqrt{7}} \vert 3 \rangle\biggr) \otimes\vert 0 \rangle;
$$

and the probability that the measurement outcome is $1$ is

$$
  \biggl\|
    -\frac{1}{2\sqrt{3}} \vert 1 \rangle 
    - \frac{i}{\sqrt{6}} \vert 2\rangle 
    + \frac{1}{\sqrt{6}} \vert 3 \rangle
  \biggr\|^2
  = \frac{1}{12} + \frac{1}{6} + \frac{1}{6}
  = \frac{5}{12},
$$

in which case the state of $(\mathsf{X},\mathsf{Y})$ becomes

$$
\frac{
  -\frac{1}{2\sqrt{3}}\vert 1\rangle 
  -\frac{i}{\sqrt{6}}\vert 2\rangle 
  +\frac{1}{\sqrt{6}}\vert 3\rangle}{\sqrt{\frac{5}{12}}}
  \otimes \vert 1\rangle
  = \biggl(-\frac{1}{\sqrt{5}} \vert 1\rangle
  - i \sqrt{\frac{2}{5}} \vert 2\rangle 
  + \sqrt{\frac{2}{5}} \vert 3\rangle\biggr) \otimes \vert 1\rangle.
$$
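
The numbers in this example can be double-checked numerically. The sketch below uses NumPy (an assumption for illustration; nothing here depends on Qiskit) and stores the amplitudes $\alpha_{a,b}$ in a $3\times 2$ array, so that the vectors $\vert\phi_a\rangle$ are its rows when $\mathsf{X}$ is measured and its columns play the same role when $\mathsf{Y}$ is measured.

```python
import numpy as np

# alpha[a - 1, b] is the amplitude of |a, b> for a in {1, 2, 3}, b in {0, 1}.
alpha = np.array([
    [1 / np.sqrt(2),        -1 / (2 * np.sqrt(3))],
    [0,                     -1j / np.sqrt(6)],
    [1j / (2 * np.sqrt(3)),  1 / np.sqrt(6)],
])

# Measuring X: sum the squared magnitudes along each row.
prob_x = np.sum(np.abs(alpha) ** 2, axis=1)
# Measuring Y: the roles are exchanged, so sum along each column instead.
prob_y = np.sum(np.abs(alpha) ** 2, axis=0)

print(prob_x)  # compare with 7/12, 1/6, 1/4
print(prob_y)  # compare with 7/12, 5/12

# State of Y after X is measured with outcome 1: the first row, normalized.
post_y_given_x1 = alpha[0] / np.sqrt(prob_x[0])
print(post_y_given_x1)  # compare with sqrt(6/7), -1/sqrt(7)
```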

#### Remark on reduced quantum states

At this point, let us highlight a limitation of the simplified description of quantum information: it offers us no way to describe the reduced (or marginal) quantum state of just one of two systems (or a proper subset of any number of systems) like we did in the probabilistic case.

Specifically, we said that for a probabilistic state of two systems $(\mathsf{X},\mathsf{Y})$ described by a probability vector 

$$
  \vert \psi \rangle 
  = \sum_{(a,b)\in\Sigma\times\Gamma}
  p_{a,b} \vert ab\rangle,
$$

the *reduced* (or *marginal*) state of $\mathsf{X}$ alone is described by the probability vector

$$
  \sum_{(a,b)\in\Sigma\times\Gamma}
  p_{a,b} \vert a\rangle.
$$

For quantum state vectors, there is no analog — for a quantum state vector

$$
  \vert \phi \rangle 
  = \sum_{(a,b)\in\Sigma\times\Gamma}
  \alpha_{a,b} \vert ab\rangle,
$$

the vector

$$
  \sum_{(a,b)\in\Sigma\times\Gamma}
  \alpha_{a,b} \vert a\rangle
$$

is not a quantum state vector in general, and does not properly represent the concept of a reduced or marginal state.
It could be, in fact, that this vector is the zero vector.
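
To see how this can happen, here is one concrete instance (a two-qubit state chosen purely for illustration). Take

$$
  \vert \phi \rangle
  = \frac{1}{\sqrt{2}} \vert 00 \rangle
  - \frac{1}{\sqrt{2}} \vert 01 \rangle,
$$

so that $\alpha_{0,0} = \frac{1}{\sqrt{2}}$, $\alpha_{0,1} = -\frac{1}{\sqrt{2}}$, and $\alpha_{1,0} = \alpha_{1,1} = 0$. Then

$$
  \sum_{(a,b)\in\Sigma\times\Gamma} \alpha_{a,b} \vert a\rangle
  = \biggl(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}\biggr) \vert 0\rangle
  = 0,
$$

even though a standard basis measurement of the first qubit yields the outcome $0$ with probability $\frac{1}{2} + \frac{1}{2} = 1$.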

So, what we must do instead is turn to the general description of quantum information, where reduced states can be defined in a meaningful way that is analogous to the probabilistic setting.
This is the first example we have seen thus far of the advantages of the general description of quantum information, and there will be others.

#### Partial measurements for three or more systems

As was stated earlier, partial measurements for three or more systems can be reduced to the case of two systems by dividing the systems into two collections: those that are measured and those that are not.

Here is a specific example that illustrates how this can be done.
We will use the technique of subscripting kets by the names of the systems they represent, which effectively allows us to express permutations of the systems.
For this example, we have a quantum state of 5 systems $\mathsf{X}_1,\ldots,\mathsf{X}_5$, all sharing the same classical state set $\{\clubsuit,\diamondsuit,\heartsuit,\spadesuit\}$:

$$
\begin{gathered}
\sqrt{\frac{1}{7}} 
\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle
+
\sqrt{\frac{2}{7}}
\vert\diamondsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\clubsuit\rangle
+
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle \vert\spadesuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\clubsuit\rangle
\\
-i
\sqrt{\frac{2}{7}}
\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\heartsuit\rangle \vert\heartsuit\rangle
-
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle \vert\heartsuit\rangle \vert\clubsuit\rangle \vert\spadesuit\rangle \vert\clubsuit\rangle.
\end{gathered}
$$

In this example, we've omitted the tensor product symbols; they're implicit between the kets. We will consider the situation in which the first and third systems are measured, and the remaining systems are left alone. 

Conceptually speaking, we can simply imagine that the first and third systems form a single compound system that gets measured, while the remaining systems form a second compound system that is not measured, and then follow the prescription described previously for two systems.

Unfortunately, given that the systems that are measured are interspersed with the ones that are not, we face a hurdle in writing down the expressions needed to perform these calculations.
A way to proceed is to subscript the kets to indicate which systems they refer to, and to give ourselves the freedom to change their ordering, as we will now describe.

First, the quantum state vector above can alternatively be written as

$$
\begin{gathered}
\sqrt{\frac{1}{7}} 
\vert\heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+
\sqrt{\frac{2}{7}}
\vert\diamondsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5
+
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle_1 \vert\spadesuit\rangle_2 \vert\clubsuit\rangle_3 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5\\
-i
\sqrt{\frac{2}{7}}
\vert\heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert\diamondsuit\rangle_3 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
-
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle_1 \vert\heartsuit\rangle_2 \vert\clubsuit\rangle_3 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5.
\end{gathered}
$$

Nothing here has changed except that each ket now has a subscript indicating which system it corresponds to.
Here we have used the subscripts $1,\ldots,5$, but the names of the systems themselves could also be used (in a situation where we have system names such as $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$, for instance).

We can then re-order the kets and collect terms as follows:

$$
\begin{aligned}
& 
\sqrt{\frac{1}{7}}
\vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
+
\sqrt{\frac{2}{7}}
\vert\diamondsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5
+
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle_1 \vert\clubsuit\rangle_3 \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5 \\
& \quad -i
\sqrt{\frac{2}{7}}
\vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 \vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
-
\sqrt{\frac{1}{7}}
\vert\spadesuit\rangle_1 \vert\clubsuit\rangle_3 \vert\heartsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5\\[2mm]
& \hspace{1.5cm} = \vert\heartsuit\rangle_1 \vert\diamondsuit\rangle_3 
\biggl(
\sqrt{\frac{1}{7}} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
-i \sqrt{\frac{2}{7}} \vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
\biggr)\\
& \hspace{1.5cm} \quad
+ \vert\diamondsuit\rangle_1 \vert\diamondsuit\rangle_3 
\biggl(
\sqrt{\frac{2}{7}} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5 
\biggr)\\
& \hspace{1.5cm} \quad + \vert\spadesuit\rangle_1 \vert\clubsuit\rangle_3
\biggl(
\sqrt{\frac{1}{7}} \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5
- \sqrt{\frac{1}{7}} \vert\heartsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5\biggr).
\end{aligned}
$$

(The tensor products are still implicit, even when parentheses are used, as in this example.)

We now see that if the systems $\mathsf{X}_1$ and $\mathsf{X}_3$ are measured, the (nonzero) probabilities of the different outcomes are as follows:

  - The measurement outcome $(\heartsuit,\diamondsuit)$ occurs with probability
  
  $$
  \biggl\|
  \sqrt{\frac{1}{7}} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
-i \sqrt{\frac{2}{7}} \vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5
  \biggr\|^2 = \frac{1}{7} + \frac{2}{7} = \frac{3}{7}
  $$
  

  - The measurement outcome $(\diamondsuit,\diamondsuit)$ occurs with probability

  
  $$
  \biggl\|
  \sqrt{\frac{2}{7}} \vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5 
  \biggr\|^2 = \frac{2}{7}
  $$
 

 
  - The measurement outcome $(\spadesuit,\clubsuit)$ occurs with probability
  
   $$
   \biggl\|
\sqrt{\frac{1}{7}} \vert\spadesuit\rangle_2 \vert\diamondsuit\rangle_4 \vert\clubsuit\rangle_5
- \sqrt{\frac{1}{7}} \vert\heartsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\clubsuit\rangle_5
   \biggr\|^2 = \frac{1}{7} + \frac{1}{7} = \frac{2}{7}.
   $$
  

If the measurement outcome is $(\heartsuit,\diamondsuit)$, for instance, we have that the state of $(\mathsf{X}_1,\ldots,\mathsf{X}_5)$ becomes 

$$
\begin{aligned}
& \vert \heartsuit\rangle_1 \vert \diamondsuit \rangle_3
\otimes
\frac{
\sqrt{\frac{1}{7}}
\vert\clubsuit\rangle_2 \vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
- i
\sqrt{\frac{2}{7}}
\vert\clubsuit\rangle_2 \vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5}
{\sqrt{\frac{3}{7}}}\\
& \qquad
= 
\sqrt{\frac{1}{3}}
\vert \heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert \diamondsuit \rangle_3\vert\spadesuit\rangle_4 \vert\spadesuit\rangle_5
-i
\sqrt{\frac{2}{3}}
\vert \heartsuit\rangle_1 \vert\clubsuit\rangle_2 \vert \diamondsuit \rangle_3\vert\heartsuit\rangle_4 \vert\heartsuit\rangle_5.
\end{aligned}
$$

For other measurement outcomes the state can be determined in a similar way.
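
The same bookkeeping can be sketched in plain Python. In this illustrative sketch the amplitudes are keyed by the full tuple of classical states of $(\mathsf{X}_1,\ldots,\mathsf{X}_5)$, which plays the role of the subscripts on the kets. The suit labels and the helper name `post_measurement_state` are assumptions made for this example.

```python
from math import sqrt

HEART, DIAMOND, CLUB, SPADE = "heart", "diamond", "club", "spade"

# Amplitudes keyed by the classical states of (X1, X2, X3, X4, X5).
psi = {
    (HEART, CLUB, DIAMOND, SPADE, SPADE): sqrt(1 / 7),
    (DIAMOND, CLUB, DIAMOND, SPADE, CLUB): sqrt(2 / 7),
    (SPADE, SPADE, CLUB, DIAMOND, CLUB): sqrt(1 / 7),
    (HEART, CLUB, DIAMOND, HEART, HEART): -1j * sqrt(2 / 7),
    (SPADE, HEART, CLUB, SPADE, CLUB): -sqrt(1 / 7),
}

measured = (0, 2)  # zero-based positions of X1 and X3

# Group the terms by the classical states of the measured systems and
# add up the squared magnitudes of their amplitudes.
probabilities = {}
for state, amp in psi.items():
    outcome = tuple(state[i] for i in measured)
    probabilities[outcome] = probabilities.get(outcome, 0.0) + abs(amp) ** 2


def post_measurement_state(outcome):
    """Keep the terms consistent with `outcome` and renormalize."""
    norm = sqrt(probabilities[outcome])
    return {
        state: amp / norm
        for state, amp in psi.items()
        if tuple(state[i] for i in measured) == outcome
    }


print(probabilities)
print(post_measurement_state((HEART, DIAMOND)))
```

Because every amplitude stays attached to its full tuple of classical states, no re-ordering of the systems is needed in code; the grouping step does the work that collecting terms does on paper.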

Now, it must be understood that the tensor product is not commutative: if $\vert \phi\rangle$ and $\vert \pi \rangle$ are vectors, then, in general, $\vert \phi\rangle\otimes\vert \pi \rangle$ is different from $\vert \pi\rangle\otimes\vert \phi \rangle$, and likewise for tensor products of three or more vectors.
For instance, 
$\vert\heartsuit\rangle \vert\clubsuit\rangle \vert\diamondsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle$
is a different vector than 
$\vert\heartsuit\rangle \vert\diamondsuit\rangle \vert\clubsuit\rangle \vert\spadesuit\rangle \vert\spadesuit\rangle$.
The technique just described of re-ordering kets should not be interpreted as suggesting otherwise.
Rather, for the sake of performing calculations and expressing the results, we are simply making a decision that it is more convenient to collect the systems $\mathsf{X}_1,\ldots,\mathsf{X}_5$ together as $(\mathsf{X}_1,\mathsf{X}_3,\mathsf{X}_2,\mathsf{X}_4,\mathsf{X}_5)$ rather than $(\mathsf{X}_1,\mathsf{X}_2,\mathsf{X}_3,\mathsf{X}_4,\mathsf{X}_5)$.
The subscripts on the kets serve to keep this all straight.

Analogously, in the closely related but simpler setting of Cartesian products and ordered pairs, if $a$ and $b$ are different classical states, then $(a,b)$ and $(b,a)$ are also different.
Nevertheless, saying that the classical state of two bits $(\mathsf{X},\mathsf{Y})$ is $(1,0)$ is equivalent to
saying that the classical state of $(\mathsf{Y},\mathsf{X})$ is $(0,1)$; when every system has its own unique name, it doesn't really matter what order we choose to list them, so long as the ordering is made clear.

Finally, here are two examples involving the GHZ and W states, as promised earlier.
First let us consider the GHZ state

$$
\frac{1}{\sqrt{2}} \vert 0\rangle\vert 0\rangle\vert 0\rangle + \frac{1}{\sqrt{2}} \vert 1\rangle\vert 1\rangle\vert 1\rangle.
$$

If just the first system is measured, we obtain the outcome $0$ with probability $1/2$, in which case the state of the three qubits becomes $\vert 0\rangle\vert 0\rangle\vert 0\rangle$; and we also obtain the outcome $1$ with probability $1/2$, in which case the state of the three qubits becomes $\vert 1\rangle\vert 1\rangle\vert 1\rangle$.

Next let us consider a W state, which can be written like this:

$$
\begin{aligned}
&
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 0\rangle \vert 1 \rangle \vert 0\rangle +
\frac{1}{\sqrt{3}} \vert 1\rangle \vert 0 \rangle \vert 0\rangle \\
& \qquad
= \vert 0 \rangle \biggl(
\frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle\biggr)
+ \vert 1 \rangle \biggl(\frac{1}{\sqrt{3}}\vert 0\rangle \vert 0\rangle\biggr).
\end{aligned}
$$

The probability that a measurement of the first qubit results in the outcome 0 is therefore equal to

$$
\biggl\| 
\frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
\frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle
\biggr\|^2 = \frac{2}{3},
$$

and conditioned upon the measurement producing this outcome, the quantum state of the three qubits becomes

$$
\vert 0\rangle\otimes
  \frac{
    \frac{1}{\sqrt{3}} \vert 0 \rangle \vert 1\rangle +
    \frac{1}{\sqrt{3}} \vert 1 \rangle \vert 0\rangle
  }{
    \sqrt{\frac{2}{3}}
  }
  = \vert 0\rangle \biggl(\frac{1}{\sqrt{2}} \vert 0 \rangle \vert 1\rangle 
    + \frac{1}{\sqrt{2}} \vert 1 \rangle \vert 0\rangle \biggr)
  = \vert 0\rangle\vert \psi^+\rangle.
$$

The probability that the measurement outcome is $1$ is $1/3$, in which case the state of the three qubits becomes
$\vert 1\rangle \vert 0\rangle \vert 0\rangle$.
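
Both examples can be reproduced with a short NumPy sketch (NumPy and the helper name `measure_first_qubit` are assumptions for illustration). The eight amplitudes are arranged in a $2\times 4$ array whose row $a$ holds the unnormalized state of the last two qubits consistent with the outcome $a$.

```python
import numpy as np

# Amplitudes in the order 000, 001, 010, ..., 111.
ghz = np.zeros(8)
ghz[[0b000, 0b111]] = 1 / np.sqrt(2)
w = np.zeros(8)
w[[0b001, 0b010, 0b100]] = 1 / np.sqrt(3)


def measure_first_qubit(vector):
    """Return (probabilities, post_states) for the outcomes 0 and 1.

    Assumes both outcomes have nonzero probability, as is the case for
    the GHZ and W states.
    """
    rows = np.reshape(vector, (2, 4))
    probs = np.sum(np.abs(rows) ** 2, axis=1)
    post_states = []
    for a in (0, 1):
        post = np.zeros_like(rows)
        post[a] = rows[a] / np.sqrt(probs[a])  # normalize |phi_a>
        post_states.append(post.reshape(8))
    return probs, post_states


probs_ghz, post_ghz = measure_first_qubit(ghz)
probs_w, post_w = measure_first_qubit(w)
print(probs_ghz, probs_w)
print(post_w[0])  # amplitudes of |0>|psi+>
```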

### 2.3 Unitary operations <a id='multiple-systems-quantum-operations'></a>

Following precisely the same line of thought as in the previous sections of this lesson, which is to view multiple systems collectively as single systems, we recognize that operations on multiple systems are represented by unitary matrices having rows and columns placed in correspondence with the Cartesian product of the classical state sets of the individual systems under consideration.
In principle, any unitary matrix whose rows and columns correspond to the classical states of whatever system we're thinking about represents a valid operation that can be applied to quantum state vectors of that system — and this includes joint systems whose classical state sets are Cartesian products of the classical state sets of two or more individual systems.

Focusing on two systems, if $\mathsf{X}$ is a system having classical state set $\Sigma$ and $\mathsf{Y}$ is a system having classical state set $\Gamma$, then the classical state set of the joint system $(\mathsf{X},\mathsf{Y})$ is $\Sigma\times\Gamma$ — and therefore the set of operations that can be performed on this joint system are represented by unitary matrices whose rows and columns are placed in correspondence with the set $\Sigma\times\Gamma$.
The ordering of the rows and columns of these matrices is the same as the ordering used for quantum state vectors of the system $(\mathsf{X},\mathsf{Y})$.

For example, let us suppose that $\Sigma = \{1,2,3\}$ and $\Gamma = \{0,1\}$, and recall that the standard convention for ordering the elements of the Cartesian product $\{1,2,3\}\times\{0,1\}$ is $(1,0)$, $(1,1)$, $(2,0)$, $(2,1)$, $(3,0)$, $(3,1)$.
Here is an example of a unitary matrix representing an operation on $(\mathsf{X},\mathsf{Y})$:

$$
U = 
\begin{pmatrix}
  \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & 0 & 0 & \frac{1}{2} \\
  \frac{1}{2} & \frac{i}{2} & -\frac{1}{2} & 0 & 0 & -\frac{i}{2} \\
  \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} & 0 & 0 & -\frac{1}{2} \\
  0 & 0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\
  \frac{1}{2} & -\frac{i}{2} & -\frac{1}{2} & 0 & 0 & \frac{i}{2} \\
  0 & 0 & 0 &  -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0
\end{pmatrix}.
$$

This unitary operation does not have any special significance, but one may check that $U^{\dagger} U = \mathbb{1}$, so $U$ is unitary.
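
One way to carry out this check is numerically. The sketch below uses NumPy (an assumption for illustration), with rows and columns in the order $(1,0)$, $(1,1)$, $(2,0)$, $(2,1)$, $(3,0)$, $(3,1)$.

```python
import numpy as np

s = 1 / np.sqrt(2)
U = np.array([
    [0.5,  0.5,   0.5, 0, 0,  0.5],
    [0.5,  0.5j, -0.5, 0, 0, -0.5j],
    [0.5, -0.5,   0.5, 0, 0, -0.5],
    [0,    0,     0,   s, s,  0],
    [0.5, -0.5j, -0.5, 0, 0,  0.5j],
    [0,    0,     0,  -s, s,  0],
])

# U is unitary exactly when U^dagger U is the identity matrix.
is_unitary = np.allclose(U.conj().T @ U, np.eye(6))
print(is_unitary)
```

The comparison uses `np.allclose` rather than exact equality because the entries $\frac{1}{\sqrt{2}}$ are stored as floating-point approximations.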


The action of $U$ on the standard basis vector $\vert 1,1 \rangle$, for instance, is

$$
U \vert 1,1\rangle = 
\frac{1}{2} \vert 1,0 \rangle
+ \frac{i}{2} \vert 1,1 \rangle
- \frac{1}{2} \vert 2,0 \rangle
- \frac{i}{2} \vert 3, 0\rangle,
$$

which we can see by examining the second column of $U$, considering our ordering of the set $\{1,2,3\}\times\{0,1\}$.

As an aside, it would be possible to express $U$ using the Dirac notation (as it is always possible to express any matrix using the Dirac notation), using 20 terms for the 20 nonzero entries of $U$.
If we did write down all of these terms rather than writing a $6\times 6$ matrix, we might miss certain patterns that are evident from the matrix expression.
Simply put, the Dirac notation is not always the best choice for how to represent matrices.

Unitary operations on three or more systems work in a similar way, with the unitary matrices having rows and columns corresponding to the Cartesian product of the classical state sets of the systems.

We have already seen an example in this lesson: the three-qubit operation

$$
\sum_{k = 0}^{7} \vert (k+1) \bmod 8 \rangle \langle k \vert
$$

from before, where $\vert j \rangle$ refers to the three-bit binary encoding of the number $j$, is unitary.
Operations that are both deterministic and unitary are called *reversible* operations.
The conjugate transpose of this matrix can be written like this:

$$
\sum_{k = 0}^{7} \vert k \rangle \langle (k+1) \bmod 8 \vert
= 
\sum_{k = 0}^{7} \vert (k-1) \bmod 8 \rangle \langle k \vert.
$$

This matrix represents the *reverse*, or in mathematical terms the *inverse*, of the original operation — as we expect of the conjugate transpose of a unitary matrix.
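
A quick numerical check of these statements, sketched with NumPy (the variable names `P` and `M` are chosen for this example):

```python
import numpy as np

# P = sum_k |(k+1) mod 8><k|: column k has a single 1 in row (k+1) mod 8.
P = np.zeros((8, 8))
for k in range(8):
    P[(k + 1) % 8, k] = 1

# M = sum_k |(k-1) mod 8><k|: the operation that subtracts 1 modulo 8.
M = np.zeros((8, 8))
for k in range(8):
    M[(k - 1) % 8, k] = 1

dagger_is_decrement = np.array_equal(P.conj().T, M)
is_unitary = np.array_equal(P.conj().T @ P, np.eye(8))
print(dagger_is_decrement, is_unitary)
```

Exact comparison with `np.array_equal` is safe here because every entry is either $0$ or $1$.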

We will see other examples of unitary operations on multiple systems as the lesson continues.

#### Unitary operations performed independently on individual systems

When unitary operations are performed independently on a collection of individual systems, the combined action of these independent operations is described by the tensor product of the unitary matrices that represent them.
That is, if $\mathsf{X}_1,\ldots,\mathsf{X}_n$ are quantum systems, $U_1,\ldots, U_n$ are unitary matrices representing operations on these systems, and the operations are performed independently on the systems, the combined action on $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$ is represented by the matrix $U_1\otimes\cdots\otimes U_n$.
Once again, we find that the probabilistic and quantum settings are analogous in this regard.

One would naturally expect, from reading the previous paragraph, that the tensor product of any collection of unitary matrices is unitary.
Indeed this is true, and we can verify it as follows.

Notice first that the conjugate transpose operation satisfies

$$
  (M_1 \otimes \cdots \otimes M_n)^{\dagger} = M_1^{\dagger} \otimes \cdots \otimes M_n^{\dagger}
$$

for any collection of matrices $M_1,\ldots,M_n$.
This can be checked by going back to the definitions of the tensor product and of the conjugate transpose, and verifying that the corresponding entries of the two sides of the equation agree.
This means that 

$$
 (U_1 \otimes \cdots \otimes U_n)^{\dagger} (U_1\otimes\cdots\otimes U_n) 
 = (U_1^{\dagger} \otimes \cdots \otimes U_n^{\dagger}) (U_1\otimes\cdots\otimes U_n).
$$

Because the tensor product of matrices is multiplicative, we find that

$$
  (U_1^{\dagger} \otimes \cdots \otimes U_n^{\dagger}) (U_1\otimes\cdots\otimes U_n)
  = (U_1^{\dagger} U_1) \otimes \cdots \otimes (U_n^{\dagger} U_n)
  = \mathbb{1}_1 \otimes \cdots \otimes \mathbb{1}_n.
$$

Here we have written $\mathbb{1}_1,\ldots,\mathbb{1}_n$ to refer to the matrices representing the identity operation on the systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$ — which is to say that these are identity matrices whose sizes agree with the number of classical states of $\mathsf{X}_1,\ldots,\mathsf{X}_n$.

Finally, the tensor product $\mathbb{1}_1 \otimes \cdots \otimes \mathbb{1}_n$ is equal to the identity matrix whose number of rows (and likewise of columns) is the product of the numbers of rows of the matrices $\mathbb{1}_1,\ldots,\mathbb{1}_n$.
We may view this larger identity matrix as representing the identity operation on the joint system $(\mathsf{X}_1,\ldots,\mathsf{X}_n)$.

In summary, we have the following sequence of equalities:

$$
\begin{aligned}
  & (U_1 \otimes \cdots \otimes U_n)^{\dagger} (U_1\otimes\cdots\otimes U_n) \\
  & \quad = (U_1^{\dagger} \otimes \cdots \otimes U_n^{\dagger}) (U_1\otimes\cdots\otimes U_n) \\
  & \quad = (U_1^{\dagger} U_1) \otimes \cdots \otimes (U_n^{\dagger} U_n)\\
  & \quad = \mathbb{1}_{1} \otimes \cdots \otimes \mathbb{1}_{n}\\
  & \quad = \mathbb{1}.
\end{aligned}
$$

We therefore conclude that $U_1 \otimes \cdots \otimes U_n$ is unitary.
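The argument above can also be checked numerically on a small instance. The following is a minimal sketch using NumPy, where `np.kron` computes the tensor product of matrices. The helper name `is_unitary` and the particular choice of the matrices $H$, $S$, and $X$ are illustrative assumptions, not something fixed by the lesson.

```python
import numpy as np

def is_unitary(M, tol=1e-12):
    """Return True if M^dagger M equals the identity matrix, up to tol."""
    M = np.asarray(M, dtype=complex)
    return np.allclose(M.conj().T @ M, np.eye(M.shape[0]), atol=tol)

# Three single-qubit unitary matrices (an arbitrary illustrative choice).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]])
X = np.array([[0, 1], [1, 0]])

# The combined operation on three qubits is the tensor product.
U = np.kron(np.kron(H, S), X)

# The conjugate transpose distributes over the tensor product.
dagger_of_product = U.conj().T
product_of_daggers = np.kron(np.kron(H.conj().T, S.conj().T), X.conj().T)

print(np.allclose(dagger_of_product, product_of_daggers))
print(is_unitary(U))
```

A check like this confirms the claim only for the matrices chosen; the general statement rests on the sequence of equalities above.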

An important situation that often arises is one in which a unitary operation is applied to just one system — or a proper subset of systems — within a larger joint system.
For instance, suppose that $\mathsf{X}$ and $\mathsf{Y}$ are systems that we can view together as forming a single, compound system $(\mathsf{X},\mathsf{Y})$, and we perform an operation just on the system $\mathsf{X}$.
To be precise, let us suppose that $U$ is a unitary matrix representing an operation on $\mathsf{X}$, so that its rows and columns have been placed in correspondence with the classical states of $\mathsf{X}$.

To say that we perform the operation represented by $U$ just on the system $\mathsf{X}$ implies that we do nothing to $\mathsf{Y}$, meaning that we independently perform $U$ on $\mathsf{X}$ and the *identity operation* on $\mathsf{Y}$.
That is, "doing nothing" to $\mathsf{Y}$ is equivalent to performing the identity operation on $\mathsf{Y}$, which is represented by the identity matrix $\mathbb{1}_{\mathsf{Y}}$.
(Here, by the way, the subscript $\mathsf{Y}$ tells us that $\mathbb{1}_\mathsf{Y}$ refers to the identity matrix having a number of rows and columns in agreement with the classical state set of $\mathsf{Y}$.)
The operation on $(\mathsf{X},\mathsf{Y})$ that is obtained when we perform $U$ on $\mathsf{X}$ and do nothing to $\mathsf{Y}$ is therefore represented by the unitary matrix

$$
  U \otimes \mathbb{1}_{\mathsf{Y}}.
$$

For example, if $\mathsf{X}$ and $\mathsf{Y}$ are qubits, performing a Hadamard operation on $\mathsf{X}$ (and doing nothing to $\mathsf{Y}$) is equivalent to performing the operation

$$
  H \otimes \mathbb{1}_{\mathsf{Y}} = 
  \begin{pmatrix}
    \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
    \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
  \end{pmatrix}
  \otimes 
  \begin{pmatrix}
    1 & 0\\
    0 & 1
  \end{pmatrix}
  =
  \begin{pmatrix}
    \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} & 0\\
    0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}}\\
    \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} & 0\\
    0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}
  \end{pmatrix}
$$

on the joint system $(\mathsf{X},\mathsf{Y})$.

Along similar lines, we may consider that an operation represented by a unitary matrix $U$ is applied to $\mathsf{Y}$ and nothing is done to $\mathsf{X}$, in which case the resulting operation on $(\mathsf{X},\mathsf{Y})$ is represented by the unitary matrix

$$
  \mathbb{1}_{\mathsf{X}} \otimes U.
$$

For example, if we again consider the situation in which both $\mathsf{X}$ and $\mathsf{Y}$ are qubits and $U$ is a Hadamard operation, the resulting operation on $(\mathsf{X},\mathsf{Y})$ is represented by the matrix

$$
  \begin{pmatrix}
    1 & 0\\
    0 & 1
  \end{pmatrix}
  \otimes
  \begin{pmatrix}
    \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
    \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
  \end{pmatrix} 
  =
  \begin{pmatrix}
    \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\
    \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0\\
    0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
    0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
  \end{pmatrix}.
$$
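The two matrices above can be reproduced with NumPy, which makes it easy to see that the order of the tensor factors matters. This is a minimal sketch that follows the convention used in this lesson, with $\mathsf{X}$ on the left and $\mathsf{Y}$ on the right; the variable names are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

H_on_X = np.kron(H, I2)   # Hadamard on X, nothing done to Y
H_on_Y = np.kron(I2, H)   # nothing done to X, Hadamard on Y

# Standard basis vector |00> of the pair (X, Y).
ket00 = np.array([1, 0, 0, 0])

state_X = H_on_X @ ket00  # (|00> + |10>) / sqrt(2)
state_Y = H_on_Y @ ket00  # (|00> + |01>) / sqrt(2)

print(np.round(H_on_X, 3))
print(np.round(H_on_Y, 3))
```

Applying the two operations to $\vert 00\rangle$ gives different states, which reflects the fact that the Hadamard operation acts on a different qubit in each case.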

Not every unitary operation on a collection of systems $\mathsf{X}_1,\ldots,\mathsf{X}_n$ can be written as a tensor product of unitary operations $U_1\otimes\cdots\otimes U_n$, just as not every quantum state vector of these systems is a product state.
For example, neither the swap operation nor the controlled-NOT operation on two qubits, which are described below, can be expressed as a tensor product of unitary operations.
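One way to test this claim numerically, which is not part of the lesson's own argument, is to rearrange the entries of a two-qubit matrix $M$ so that $M = A\otimes B$ holds for some matrices $A$ and $B$ exactly when the rearranged matrix has rank $1$. The rank of the rearranged matrix is sometimes called the operator Schmidt rank. The sketch below assumes this technique; the function name `tensor_rank_two_qubits` is hypothetical, and the swap and controlled-NOT matrices are written out explicitly as they appear later in the lesson.

```python
import numpy as np

def tensor_rank_two_qubits(M, tol=1e-10):
    """Smallest number of terms A_k (tensor) B_k that sum to the 4x4 matrix M."""
    # View M as a 4-index array T[a, b, c, d] = <a b| M |c d>.
    T = np.asarray(M, dtype=complex).reshape(2, 2, 2, 2)
    # Group the indices of the first qubit (a, c) as rows and those of the
    # second qubit (b, d) as columns. A product A (tensor) B becomes the
    # rank-one matrix vec(A) vec(B)^T under this rearrangement.
    R = T.transpose(0, 2, 1, 3).reshape(4, 4)
    return np.linalg.matrix_rank(R, tol=tol)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

print(tensor_rank_two_qubits(np.kron(H, H)))  # 1: a tensor product
print(tensor_rank_two_qubits(SWAP))           # 4: not a tensor product
print(tensor_rank_two_qubits(CNOT))           # 2: not a tensor product
```

A rank larger than $1$ rules out any factorization $A\otimes B$, and in particular any factorization into unitary matrices.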

#### The swap operation 

To conclude the lesson, let's take a look at two classes of examples of unitary operations on multiple systems, beginning with the *swap operation*.

Suppose that $\mathsf{X}$ and $\mathsf{Y}$ are systems that share the same classical state set $\Sigma$.
The *swap* operation on the pair $(\mathsf{X},\mathsf{Y})$ is the operation that exchanges the contents of
the two systems, but otherwise leaves the systems alone (so that $\mathsf{X}$ remains on the left and $\mathsf{Y}$
remains on the right).
  
We will denote this operation as $\operatorname{SWAP}$.
It operates like this for every choice of classical states $a,b\in\Sigma$:
  
$$
\operatorname{SWAP} \vert a \rangle \vert b \rangle = \vert b \rangle \vert a \rangle.
$$

One way to write the matrix associated with this operation using the Dirac notation is as follows:
  
$$
\operatorname{SWAP} = \sum_{c,d\in\Sigma} \vert c \rangle \langle d \vert \otimes \vert d \rangle \langle c \vert.
$$

It may not be immediate why this operation can be expressed in this way, but it can be checked that the matrix
expressed in this way satisfies the condition 
$\operatorname{SWAP} \vert a \rangle \vert b \rangle = \vert b \rangle \vert a \rangle$ 
for every choice of classical states $a,b\in\Sigma$.
  
As a simple example, when $\mathsf{X}$ and $\mathsf{Y}$ are qubits, we find that

$$
  \operatorname{SWAP} =
  \begin{pmatrix}
  1 & 0 & 0 & 0\\
  0 & 0 & 1 & 0\\
  0 & 1 & 0 & 0\\
  0 & 0 & 0 & 1
  \end{pmatrix}.
$$
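The Dirac-notation formula for the swap operation can be turned into a short computation. The sketch below assumes the classical state set is $\Sigma = \{0,\ldots,n-1\}$ and builds the matrix term by term; the function name `swap_matrix` is an illustrative choice.

```python
import numpy as np

def swap_matrix(n):
    """SWAP on two systems that each have classical state set {0, ..., n-1}."""
    basis = np.eye(n)               # basis[a] is the standard basis vector |a>
    result = np.zeros((n * n, n * n))
    for c in range(n):
        for d in range(n):
            ket_c_bra_d = np.outer(basis[c], basis[d])   # |c><d|
            ket_d_bra_c = np.outer(basis[d], basis[c])   # |d><c|
            result += np.kron(ket_c_bra_d, ket_d_bra_c)
    return result

# For qubits this reproduces the 4x4 matrix shown above.
print(swap_matrix(2).astype(int))

# For n = 3, check SWAP |a>|b> = |b>|a> on every pair of classical states.
basis3 = np.eye(3)
all_pairs_swapped = all(
    np.allclose(swap_matrix(3) @ np.kron(basis3[a], basis3[b]),
                np.kron(basis3[b], basis3[a]))
    for a in range(3) for b in range(3)
)
print(all_pairs_swapped)
```

The check on basis vectors is the same verification described in the text, carried out for one small state set.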

#### Controlled-unitary operations
   
Now let us suppose that $\mathsf{X}$ is a qubit and $\mathsf{Y}$ is an arbitrary system, having whatever classical 
state set we wish.

For every unitary operation $U$ acting on the system $\mathsf{Y}$, a *controlled-$U$* operation is a unitary
operation on the pair $(\mathsf{X},\mathsf{Y})$ defined as follows:

$$ 
\mathrm{c}U = 
\vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes U.
$$

For example, if $\mathsf{Y}$ is also a qubit and we write $X = \sigma_x$ to denote the Pauli-x operation, then the
controlled-$X$ operation is given by
  
$$
  \mathrm{c}X = 
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes X
  = 
  \begin{pmatrix}
  1 & 0 & 0 & 0\\
  0 & 1 & 0 & 0\\
  0 & 0 & 0 & 1\\
  0 & 0 & 1 & 0
  \end{pmatrix}.
$$

We already encountered this operation in the context of classical information and probabilistic operations 
earlier in the lesson.

If instead we consider the Pauli-z operation on $\mathsf{Y}$ in place of the $X$ operation, we obtain this
operation:
  
$$
  \mathrm{c}Z = 
  \vert 0\rangle \langle 0\vert \otimes \mathbb{1}_{\mathsf{Y}} + \vert 1\rangle \langle 1\vert \otimes Z
  = 
  \begin{pmatrix}
  1 & 0 & 0 & 0\\
  0 & 1 & 0 & 0\\
  0 & 0 & 1 & 0\\
  0 & 0 & 0 & -1
  \end{pmatrix}.
$$

If instead we take $\mathsf{Y}$ to be two qubits, and we take $U$ to be the *swap operation* between these two
qubits, we obtain this operation:

$$
  \mathrm{cSWAP} = 
  \begin{pmatrix}
  1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
  0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\  
  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
  \end{pmatrix}.
$$

This operation is also known as a *Fredkin operation* (or, more commonly, a *Fredkin gate*), named for
Edward Fredkin. Its action on standard basis states can be described as follows:

$$
  \begin{aligned}
    \operatorname{cSWAP} \vert 0 b c \rangle 
    & = \vert 0 b c \rangle \\[1mm]
    \operatorname{cSWAP} \vert 1 b c \rangle 
    & = \vert 1 c b \rangle
  \end{aligned}
$$
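The definition of a controlled-unitary operation translates directly into code. The following is a minimal sketch in which the function name `controlled` is an illustrative choice; it reproduces the three matrices given above for the controlled-$X$, controlled-$Z$, and controlled-swap operations.

```python
import numpy as np

def controlled(U):
    """Return |0><0| (tensor) identity + |1><1| (tensor) U, with a qubit control on the left."""
    U = np.asarray(U, dtype=complex)
    P0 = np.array([[1, 0], [0, 0]])   # |0><0|
    P1 = np.array([[0, 0], [0, 1]])   # |1><1|
    identity = np.eye(U.shape[0])
    return np.kron(P0, identity) + np.kron(P1, U)

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

cX = controlled(X)
cZ = controlled(Z)
cSWAP = controlled(SWAP)   # here the target system Y consists of two qubits

print(cX.real.astype(int))
print(cZ.real.astype(int))
print(cSWAP.real.astype(int))
```

Because the target $\mathsf{Y}$ may be any system, the same function covers a two-qubit target without modification.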
  
Finally, the *controlled-controlled-NOT operation*, which we may denote as $\mathrm{cc}X$, 
is called a *Toffoli operation* (or *Toffoli gate*), named for Tommaso Toffoli.
Its matrix representation looks like this:

$$
  \mathrm{cc}X = 
  \begin{pmatrix}
    1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
    0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
    0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
    0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
    0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
    0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
    0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
    0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
  \end{pmatrix}.
$$

We may alternatively express it using the Dirac notation as follows:
  
$$
  \mathrm{cc}X = \bigl(
    \vert 00 \rangle \langle 00 \vert 
    + \vert 01 \rangle \langle 01 \vert 
    + \vert 10 \rangle \langle 10 \vert \bigr) \otimes \mathbb{1}
    + \vert 11 \rangle \langle 11 \vert \otimes X.
$$
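As a check on this expression, the sketch below builds $\mathrm{cc}X$ from the Dirac-notation formula and compares it with the action on standard basis states, where the third bit is flipped exactly when the first two bits are both $1$. The helper names `projector` and `toffoli_action` are illustrative assumptions.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)

def projector(k, dim):
    """Matrix |k><k| on a system with dim classical states."""
    P = np.zeros((dim, dim))
    P[k, k] = 1
    return P

# Control pair in state 00, 01, or 10 (indices 0, 1, 2): do nothing to the target.
# Control pair in state 11 (index 3): apply X to the target.
ccX = sum(np.kron(projector(k, 4), I2) for k in range(3)) + np.kron(projector(3, 4), X)

def toffoli_action(a, b, c):
    """Action on classical bits: flip c exactly when a = b = 1."""
    return a, b, c ^ (a & b)

basis = np.eye(8)   # basis[4a + 2b + c] is the standard basis vector |a b c>
matches = True
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            a2, b2, c2 = toffoli_action(a, b, c)
            expected = basis[4 * a2 + 2 * b2 + c2]
            matches = matches and np.allclose(ccX @ basis[4 * a + 2 * b + c], expected)

print(ccX.astype(int))
print(matches)
```

The printed matrix agrees with the $8\times 8$ matrix representation given above.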