# Circuits, protocols, and games

In this lesson we introduce *quantum circuits* and use them to describe a few key examples of *protocols* and *games*, which illustrate interesting capabilities of quantum information.
In particular, we will discuss quantum teleportation, superdense coding, and the CHSH game, all three of which are fundamentally important examples in quantum information.

Along the way, we will discuss *inner products* between vectors together with the notions of *orthogonality* and *orthonormality*, as well as *projections* and *projective measurements*, which generalize standard basis measurements.
We'll also discuss some limitations of quantum information, including the *no-cloning theorem* and the impossibility of perfectly discriminating non-orthogonal quantum states.

## 1. Circuits

In this section we will introduce the quantum circuit model.

To begin, we'll briefly discuss the general notion of a circuit.
In this context, the word *circuit* refers to a model of computation in computer science, where information is carried by wires through a network of *gates*, which represent operations that transform the information carried by the wires.
Although the word "circuit" in English often refers to a circular path through or around something, the use of this terminology to refer to a model of computation is historical in origin: circular paths aren't actually allowed in the most typically studied circuit models of computation.
That is to say, we usually study *acyclic circuits* when we're thinking about circuits as computational models, and quantum circuits are no different in this regard; although we can imagine iterating a particular quantum circuit as many times as we would like, a quantum circuit itself represents a finite sequence of operations that can't contain feedback loops.

### Boolean circuits

Here is an example of a (classical) Boolean circuit, where the wires carry binary values and the gates represent Boolean logic operations:

![Example of a Boolean circuit](images/Boolean-circuit-XOR.png)

The flow of information along the wires goes from left to right: the wires on the left-hand side of the figure labeled $\mathsf{X}$ and $\mathsf{Y}$ are input bits, which can each be set to whatever binary value we choose, and the wire on the right-hand side is the output.
The intermediate wires take whatever values are determined by the gates, which are evaluated from left to right.

The gates are AND gates (labeled $\wedge$), OR gates (labeled $\vee$), and NOT gates (labeled $\neg$).
The functions computed by these gates will likely be familiar to many readers, but here they are represented by tables of values:

$$
\rule[-10mm]{0mm}{15mm}
\begin{array}[t]{c|c}
  a & \neg a\\
  \hline
  0 & 1\\
  1 & 0
\end{array}
\hspace{1.5cm}
\begin{array}[t]{c|c}
  ab & a \wedge b\\
  \hline
  00 & 0\\
  01 & 0\\
  10 & 0\\
  11 & 1
\end{array}
\hspace{1.5cm}
\begin{array}[t]{c|c}
  ab & a \vee b\\
  \hline
  00 & 0\\
  01 & 1\\
  10 & 1\\
  11 & 1
\end{array}
$$

The two small circles on the wires just to the right of the names $\mathsf{X}$ and $\mathsf{Y}$ represent *fanout* operations, which simply create a copy of whatever value is carried on the wire on which they appear, so that this value can be input into multiple gates.
Fanout operations are not always considered to be gates in the classical setting — sometimes they are treated as if they are "free" in some sense — but when we discuss how ordinary Boolean circuits can be converted into equivalent quantum circuits, which we will do in the next unit, we must classify fanout operations explicitly as being gates and account for them correctly.

Here is the same circuit illustrated in a style more common in electrical engineering, where conventional symbols are used for the AND, OR, and NOT gates:

![Boolean circuit in a classic style](images/Boolean-circuit-classic.png)

We will not use this style or these particular gate symbols further, but we do use different symbols to represent gates in quantum circuits, as will be explained as we encounter them.

The particular circuit in this example computes the *exclusive-OR* (or XOR for short), which is denoted by the symbol $\oplus$:

$$
\rule[-10mm]{0mm}{15mm}
\begin{array}[t]{c|c}
  ab & a \oplus b\\
  \hline
  00 & 0\\
  01 & 1\\
  10 & 1\\
  11 & 0
\end{array}
$$

In the following diagram we consider just one possible selection of the inputs: $\mathsf{X}=1$ and $\mathsf{Y}=0$.
Each wire is labeled by the value it carries so that the operation of the circuit can be visualized.
The output value is $1$ in this case, which is the correct value for the XOR: $1 \oplus 0 = 1$.

![Evaluating a Boolean circuit](images/XOR-circuit-evaluate.png)

The other three possible input settings can be checked in a similar way.
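The check over all four inputs can also be done mechanically. The following is a minimal sketch, assuming the circuit computes $(\neg \mathsf{X} \wedge \mathsf{Y}) \vee (\mathsf{X} \wedge \neg \mathsf{Y})$, which is one standard way to build XOR from AND, OR, and NOT gates and may differ in detail from the figure. The function names are illustrative only.

```python
def NOT(a):
    # Table for the NOT gate: 0 -> 1, 1 -> 0
    return 1 - a

def AND(a, b):
    # Outputs 1 only when both inputs are 1
    return a & b

def OR(a, b):
    # Outputs 1 when at least one input is 1
    return a | b

def xor_circuit(x, y):
    # Fanout: each of x and y feeds two different gates.
    upper = AND(NOT(x), y)
    lower = AND(x, NOT(y))
    return OR(upper, lower)

# Evaluate the circuit on every input setting.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_circuit(x, y))
```

Each printed row should agree with the table for $a \oplus b$ given earlier.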

### Other types of circuits

The notion of a circuit in computer science is, in fact, much more general than just Boolean circuits.
Circuits whose wires carry values other than $0$ and $1$ are sometimes studied, as are gates representing different choices of operations.

For example, in *arithmetic circuits* the wires may carry arbitrary integer values (or, alternatively, values in some other ring or field) and the gates may represent arithmetic operations such as addition and multiplication.
For instance, the following figure depicts an arithmetic circuit that takes two variable input values ($x$ and $y$) as well as a third input set to the value $1$.
The values carried by the wires, as functions of the values $x$ and $y$, are shown in the figure.

![Example arithmetic circuit](images/arithmetic-circuit.png)

### Quantum circuits

In the quantum circuit model, the wires represent qubits and the gates represent operations that are performed on these qubits.
We will focus for now on the sorts of operations and measurements we have encountered thus far, namely *unitary operations* and *standard basis measurements*.
As we learn about other sorts of quantum operations and measurements, we will enhance our model accordingly.

Here is a simple example of a quantum circuit:

![Simple quantum circuit](images/simple-quantum-circuit.png)

In this circuit, we have a single qubit named $\mathsf{X}$, which is represented by the horizontal line, and a sequence of gates representing unitary operations on this qubit.
Just like in the examples above, the flow of information goes from left to right — so the first operation to be performed is a Hadamard operation, the second operation is an $S$ operation, the third operation is another Hadamard operation, and the final operation is a $T$ operation.
Applying the entire circuit therefore results in the composition of these operations, $THSH$, being applied to the qubit $\mathsf{X}$.

Sometimes we wish to indicate explicitly that certain states are to be input into a circuit, and we can also explicitly indicate the output states if we wish.
For example, if we apply the operation $THSH$ to the state $\vert 0\rangle$, we obtain the state
$\frac{1+i}{2}\vert 0\rangle + \frac{1}{\sqrt{2}} \vert 1 \rangle$, and so we may indicate this as follows:

![Simple quantum circuit evaluated](images/simple-quantum-circuit-evaluated.png)

It is common to consider the action of a quantum circuit when all of its qubits are initialized to the $\vert 0\rangle$ state, but there are also cases where we wish to set the input qubits to different states.
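The state quoted above for $THSH\vert 0\rangle$ can be checked with plain matrix arithmetic. This is a small sketch using NumPy rather than Qiskit; the variable names (`ket0`, `output`, `expected`) are chosen here for illustration.

```python
import numpy as np

# Standard single-qubit gates as 2x2 matrices.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]])
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])

ket0 = np.array([1, 0], dtype=complex)

# The diagram is read left to right (H, S, H, T), so the matrix
# product is written right to left: T H S H |0>.
output = T @ H @ S @ H @ ket0

# The state given in the text: (1+i)/2 |0> + 1/sqrt(2) |1>.
expected = np.array([(1 + 1j) / 2, 1 / np.sqrt(2)])

print(np.round(output, 4))
print(np.allclose(output, expected))
```

Note the reversal of order: the first gate in the diagram is the rightmost factor in the product.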

Here is how we can specify this circuit in Qiskit:

In [None]:
from qiskit import QuantumCircuit
circuit = QuantumCircuit(1)
# Gates are appended in the order they are applied: H, S, H, T
circuit.h(0)
circuit.s(0)
circuit.h(0)
circuit.t(0)
display(circuit.draw())

The default names for qubits in Qiskit are $\mathsf{q_0}$, $\mathsf{q_1}$, $\mathsf{q_2}$, etc., and when there is just a single qubit like in our example, the default name is $\mathsf{q}$ rather than $\mathsf{q_0}$.
If we wish to choose our own name we can do this using the `QuantumRegister` class like this:

In [None]:
from qiskit import QuantumCircuit, QuantumRegister
X = QuantumRegister(1, "x")
circuit = QuantumCircuit(X)
# Same gate sequence as before (H, S, H, T), now on the named register
circuit.h(X)
circuit.s(X)
circuit.h(X)
circuit.t(X)
display(circuit.draw())

Here the qubit is given the name $\mathsf{x}$ (lower case) — valid names in Qiskit for qubits (or *registers* more generally, which are simply collections of qubits that we wish to view as a single unit) must start with a lowercase letter.

Here is another example of a quantum circuit, this time with two qubits.

![Quantum circuit that creates an ebit](images/ebit-circuit.png)

As always, the gate labeled $H$ refers to a Hadamard operation, while the second gate is a two-qubit gate: it's the controlled-NOT operation, where the solid circle represents the control qubit and the circle resembling the symbol $\oplus$ denotes the target qubit.

Before examining this circuit in greater detail and explaining what it does, it is imperative that we clarify how qubits are ordered in quantum circuits.

<p style="padding-left: 3em; padding-right: 3em;">
<strong>Ordering of qubits in quantum circuits:</strong>
Throughout this textbook (and in Qiskit) the topmost qubit in a circuit corresponds to the rightmost position in a Cartesian or tensor product, the second-to-top qubit corresponds to the position second-from-right, and so on, down to the bottommost qubit, which corresponds to the leftmost position in a Cartesian or tensor product.
</p>

Thus, in the circuit above, we are considering the circuit to be an operation on two qubits $(\mathsf{X},\mathsf{Y})$.
This means, for instance, that if the input takes the form $\vert \psi\rangle \vert \phi\rangle$, then the assumption is that the state $\vert \psi\rangle$ is being fed into the lower qubit $\mathsf{X}$ and the state $\vert \phi\rangle$ is being fed into the upper qubit $\mathsf{Y}$.

Now let us take a look at the circuit itself, moving from left to right through the operations it describes, to see what it does.

<p style="padding-left: 3em;">
<ol start="1">
    <li>
      The first operation that is performed is a Hadamard operation on the qubit $\mathsf{Y}$.
      When a gate is applied to a single qubit like this, the understanding is that nothing is done to the 
        other qubits — which is to say that the identity operation is performed on the other qubits. 
</ol>        
  
![First operation e-bit creator](images/ebit-circuit-first.png)
  
<ol style="list-style-type:none"><li>
In our circuit there is just one other qubit, which is the qubit $\mathsf{X}$, so the operation on both
qubits that is represented by the dotted rectangle in the figure above is given by
</ol>

$$
\mathbb{1}\otimes H
= \begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
\end{pmatrix}.
$$

<ol style="list-style-type:none"><li>
Note that the identity matrix appears on the left-hand side of the tensor product and $H$ appears on the right-hand side because that is the ordering that is consistent with the rule described above.
</ol>

<ol start="2"><li>
      The second operation that is performed is the controlled-NOT operation, where $\mathsf{Y}$ is the control 
      and $\mathsf{X}$ is the target:
</ol>             
            
![Second operation e-bit creator](images/ebit-circuit-second.png)

<ul style="list-style-type:none"><li>        
The controlled-NOT gate's action on standard basis states is as follows:
</ul>

![Controlled-NOT gate](images/cNOT.png)
        
<ul style="list-style-type:none"><li>      
Given that we order the qubits as $(\mathsf{X},\mathsf{Y})$, we get that the matrix representation of the controlled-NOT gate is this:
</ul>

$$
\begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 1 & 0 & 0
\end{pmatrix}.
$$
</p>

The unitary operation on the qubits $(\mathsf{X},\mathsf{Y})$ represented by the entire circuit, to which we will give the name $U$, is therefore obtained by composing the operations:

$$
U = \begin{pmatrix}
1 & 0 & 0 & 0\\[2mm]
0 & 0 & 0 & 1\\[2mm]
0 & 0 & 1 & 0\\[2mm]
0 & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\
0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0
\end{pmatrix}.
$$

In particular, recalling our notation for the Bell states:

$$
\begin{aligned}
  \vert \phi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0 0 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1 1 \rangle \\[1mm]
  \vert \phi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0 0 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1 1 \rangle \\[1mm]
  \vert \psi^+ \rangle & = \frac{1}{\sqrt{2}} \vert 0 1 \rangle 
                         + \frac{1}{\sqrt{2}} \vert 1 0 \rangle \\[1mm]
  \vert \psi^- \rangle & = \frac{1}{\sqrt{2}} \vert 0 1 \rangle 
                         - \frac{1}{\sqrt{2}} \vert 1 0 \rangle
\end{aligned}
$$

we get that

$$
\begin{aligned}
U \vert 00\rangle & = \vert \phi^+\rangle\\
U \vert 01\rangle & = \vert \phi^-\rangle\\
U \vert 10\rangle & = \vert \psi^+\rangle\\
U \vert 11\rangle & = -\vert \psi^-\rangle.
\end{aligned}
$$

So, this circuit gives us a way to create an e-bit $\vert\phi^+\rangle$ if we run it on two qubits initialized to the $\vert 00\rangle$ state — and more generally it gives us a way to convert the standard basis to the Bell basis.
(The $-1$ phase factor on the last state, $-\vert \psi^-\rangle$, will not cause us any difficulties — but if we wanted to eliminate it we could, by either adding a controlled-Z gate at the beginning or a swap gate at the end.)
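The matrix computations above are easy to reproduce numerically. The sketch below uses NumPy (not Qiskit) and follows the ordering $(\mathsf{X},\mathsf{Y})$ from the text; the variable names are illustrative, and the last lines check the remark about a controlled-Z gate at the beginning.

```python
import numpy as np

s = 1 / np.sqrt(2)
I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) * s

# Ordering (X, Y): the top qubit Y is the rightmost tensor factor,
# so a Hadamard on Y alone is the Kronecker product 1 (x) H.
first = np.kron(I2, H)

# Controlled-NOT with Y (right bit) as control and X (left bit) as target.
cnot = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0]])

# The Hadamard is applied first, so it is the rightmost factor.
U = cnot @ first

phi_plus = np.array([s, 0, 0, s])
phi_minus = np.array([s, 0, 0, -s])
psi_plus = np.array([0, s, s, 0])
psi_minus = np.array([0, s, -s, 0])

# Column k of U is the image of the k-th standard basis state.
images = [phi_plus, phi_minus, psi_plus, -psi_minus]
for k, target in enumerate(images):
    print(format(k, "02b"), np.allclose(U[:, k], target))

# A controlled-Z at the beginning flips the sign of |11> only,
# which removes the -1 phase on the last Bell state.
CZ = np.diag([1, 1, 1, -1])
print(np.allclose((U @ CZ)[:, 3], psi_minus))
```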

In general, quantum circuits can contain any number of qubit wires upon which quantum gates can be applied.
We may also include classical bit wires, which are indicated by double lines like in this example:

![Example circuit with measurements](images/ebit-circuit-measured.png)

In this circuit we have a Hadamard gate and a controlled-NOT gate on two qubits $\mathsf{X}$ and $\mathsf{Y}$, just like in the previous example. We also have two *classical* bits $\mathsf{A}$ and $\mathsf{B}$, drawn as double lines, as well as two measurement gates.
The measurement gates indicate that standard basis measurements are made on the qubits upon which they appear:
the qubits are changed into their post-measurement states, while the measurement outcomes are *overwritten* on the classical bits to which the arrows point.

In [None]:
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
X = QuantumRegister(1, "x")
Y = QuantumRegister(1, "y")
A = ClassicalRegister(1,"a")
B = ClassicalRegister(1,"b")
circuit = QuantumCircuit(Y,X,B,A)
circuit.h(Y)
circuit.cx(Y,X)

circuit.measure(Y,B)
circuit.measure(X,A)
display(circuit.draw())

In [None]:
from qiskit import transpile
# AerSimulator is provided by the separate qiskit-aer package
from qiskit_aer import AerSimulator
from qiskit.visualization import plot_histogram

simulator = AerSimulator()
circuit_simulator = simulator.run(transpile(circuit,simulator), shots=1000)
simulation = circuit_simulator.result()
statistics = simulation.get_counts()
display(plot_histogram(statistics))

Sometimes it is convenient in a quantum circuit diagram to depict a measurement as a gate that takes a qubit as input and outputs a classical bit (as opposed to outputting the qubit in its post-measurement state and writing the result to a separate classical bit).
When this is done, it should be interpreted that the qubit that was measured has effectively been discarded and can safely be ignored thereafter.

For example, the following circuit diagram represents the same process as the one in the previous diagram, but where we ignore $\mathsf{X}$ and $\mathsf{Y}$ after they are measured:

![Example circuit with measurements compact](images/ebit-circuit-measured-compact.png)

## 2. Inner products, orthonormality, and projections

To better prepare ourselves to explore the capabilities and limitations of quantum circuits, we will now introduce some additional mathematical concepts — namely the *inner product* between vectors (and its connection to the Euclidean norm), the notions of *orthogonality* and *orthonormality* for sets of vectors, and *projection* matrices, which will allow us to introduce a handy generalization of standard basis measurements.

### 2.1 Inner products

Recall from Lesson 1 that when we use the Dirac notation to refer to an arbitrary column vector as a ket, such as

$$
\vert \psi \rangle =
\begin{pmatrix}
\alpha_1\\
\alpha_2\\
\vdots\\
\alpha_n
\end{pmatrix},
$$

the corresponding bra vector is the *conjugate transpose* of this vector:

$$
\langle \psi \vert = \bigl(\vert \psi \rangle \bigr)^{\dagger}
=
\begin{pmatrix}
\overline{\alpha_1} & \overline{\alpha_2} & \cdots & \overline{\alpha_n}
\end{pmatrix}.
\tag{2.1}
$$

Alternatively, if we have some classical state set $\Sigma$ in mind, and we express a column vector as a ket,
such as

$$
\vert \psi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle,
$$

then the corresponding row (or bra) vector is the conjugate transpose

$$
\langle \psi \vert = \sum_{a\in\Sigma} \overline{\alpha_a} \langle a \vert.
\tag{2.2}
$$

We also observed that the product of a bra vector and a ket vector, viewing them as matrices that either have a single row or a single column, results in a scalar.
Specifically, if we have two (column) vectors

$$
\vert \psi \rangle =
\begin{pmatrix}
\alpha_1\\
\alpha_2\\
\vdots\\
\alpha_n
\end{pmatrix}
\quad\text{and}\quad
\vert \phi \rangle =
\begin{pmatrix}
\beta_1\\
\beta_2\\
\vdots\\
\beta_n
\end{pmatrix},
$$

so that the row vector $\langle \psi \vert$ is as in equation $(2.1)$, then 

$$
\langle \psi \vert \phi \rangle = \langle \psi \vert \vert \phi \rangle
=
\begin{pmatrix}
\overline{\alpha_1} & \overline{\alpha_2} & \cdots & \overline{\alpha_n}
\end{pmatrix}
\begin{pmatrix}
\beta_1\\
\beta_2\\
\vdots\\
\beta_n
\end{pmatrix}
=
\overline{\alpha_1} \beta_1 + \cdots + \overline{\alpha_n}\beta_n.
$$

Alternatively, if we have two column vectors that we have written as 

$$
\vert \psi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle
\quad\text{and}\quad
\vert \phi \rangle = \sum_{b\in\Sigma} \beta_b \vert b \rangle,
$$

so that $\langle \psi \vert$ is the row vector $(2.2)$, we find that

$$
\begin{aligned}
  \langle \psi \vert \phi \rangle & = \langle \psi \vert \vert \phi \rangle\\
  & =
  \Biggl(\sum_{a\in\Sigma} \overline{\alpha_a} \langle a \vert\Biggr)
  \Biggl(\sum_{b\in\Sigma} \beta_b \vert b\rangle\Biggr)\\
  & =
  \sum_{a\in\Sigma}\sum_{b\in\Sigma} \overline{\alpha_a} \beta_b \langle a \vert b \rangle\\
  & = \sum_{a\in\Sigma} \overline{\alpha_a} \beta_a,
\end{aligned}
$$

where the last equality follows from the observation that $\langle a \vert a \rangle = 1$ and $\langle a \vert b \rangle = 0$ for classical states $a$ and $b$ satisfying $a\not=b$.

The value $\langle \psi \vert \phi \rangle$ is called the *inner product* between the vectors $\vert \psi\rangle$ and $\vert \phi \rangle$.
Inner products are critically important in quantum information and computation: we would not get far in understanding quantum information at a mathematical level without this fundamental notion!
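As a concrete illustration, here is a small NumPy sketch that computes an inner product both as "row vector times column vector" and with `np.vdot`, which conjugates its first argument. The two example vectors are arbitrary choices made here for illustration.

```python
import numpy as np

psi = np.array([1 + 1j, 2, 0])
phi = np.array([1j, 1, 3])

# <psi|phi>: conjugate the entries of psi, then multiply entrywise and sum.
by_hand = psi.conj() @ phi

# np.vdot conjugates its first argument, matching the bra-ket convention.
by_vdot = np.vdot(psi, phi)

# Term by term: (1 - i)(i) + 2*1 + 0*3 = (1 + i) + 2 = 3 + i.
print(by_hand, by_vdot)
```

Note that `np.dot(psi, phi)` would not conjugate `psi`, so it does not compute the inner product for complex vectors.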

Let us now collect together some basic facts about inner products of vectors.

<ol><li>
<strong>Relationship to the Euclidean norm.</strong>
The inner product of any vector
</ol>

$$
\vert \psi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle
$$

<ol style="list-style-type:none"><li>
with itself is
</ol>
      
$$
\langle \psi \vert \psi \rangle 
= \sum_{a\in\Sigma} \overline{\alpha_a} \alpha_a 
= \sum_{a\in\Sigma} \vert\alpha_a\vert^2 
= \bigl\| \vert \psi \rangle \bigr\|^2.
$$

<ol style="list-style-type:none"><li>
Thus, the Euclidean norm of a vector may alternatively be expressed as
</ol>

$$
\bigl\| \vert \psi \rangle \bigr\| = \sqrt{ \langle \psi \vert \psi \rangle }.
$$

<ol style="list-style-type:none"><li>
Notice that the Euclidean norm of a vector must always be a nonnegative real number (because it is the square root of a sum of squared absolute values of the entries, each of which is nonnegative).
Moreover, the only way the Euclidean norm of a vector can be equal to zero is if every one of the entries is equal to zero, which is to say that the vector is the zero vector. 
</ol>

<ol style="list-style-type:none"><li>
We can summarize these observations like this: for every vector $\vert \psi \rangle$ we have
</ol>

$$
\langle \psi \vert \psi \rangle \geq 0,
$$

<ol style="list-style-type:none"><li>
with $\langle \psi \vert \psi \rangle = 0$ if and only if $\vert \psi \rangle = 0$.
    This property of the inner product is sometimes referred to as <em>positive definiteness</em>.
</ol>

<ol start="2"><li>
<strong>Conjugate symmetry.</strong>
For any two vectors
</ol>

$$
\vert \psi \rangle = \sum_{a\in\Sigma} \alpha_a \vert a \rangle
\quad\text{and}\quad
\vert \phi \rangle = \sum_{b\in\Sigma} \beta_b \vert b \rangle,
$$

<ol style="list-style-type:none"><li>
we have
</ol>

$$
\langle \psi \vert \phi \rangle = \sum_{a\in\Sigma} \overline{\alpha_a} \beta_a
\quad\text{and}\quad
\langle \phi \vert \psi \rangle = \sum_{a\in\Sigma} \overline{\beta_a} \alpha_a,
$$

<ol style="list-style-type:none"><li>
and therefore
</ol>

$$
\overline{\langle \psi \vert \phi \rangle} = \langle \phi \vert \psi \rangle.
$$

<ol start="3"><li>
<strong>Linearity in the second argument (and conjugate linearity in the first).</strong>
Let us suppose that $\vert \psi \rangle$, $\vert \phi_1 \rangle$, and $\vert \phi_2 \rangle$ are vectors and $\alpha_1$ and $\alpha_2$ are complex numbers. If we define a new vector
</ol>
        
$$
\vert \phi\rangle = \alpha_1 \vert \phi_1\rangle + \alpha_2 \vert \phi_2\rangle,
$$

<ol style="list-style-type:none"><li>
then
</ol>

$$
\langle \psi \vert \phi \rangle
= \langle \psi \vert \bigl( \alpha_1\vert \phi_1 \rangle + \alpha_2\vert \phi_2 \rangle\bigr)
= \alpha_1 \langle \psi \vert \phi_1 \rangle + \alpha_2 \langle \psi \vert \phi_2 \rangle.
$$ 

<ol style="list-style-type:none"><li>
That is to say, the inner product is <em>linear</em> in the second argument.
This can be verified either through the formulas above or simply by noting that matrix multiplication is linear in each argument (and specifically the second argument).
</ol>
    
<ol style="list-style-type:none"><li>
Combining this fact with conjugate symmetry reveals that the inner product is <em>conjugate linear</em> in the first argument. That is, if $\vert \psi_1 \rangle$, $\vert \psi_2 \rangle$, and $\vert \phi \rangle$ are vectors and $\alpha_1$ and $\alpha_2$ are complex numbers, and we define
</ol>

$$
\vert \psi \rangle = \alpha_1 \vert \psi_1\rangle + \alpha_2 \vert \psi_2 \rangle,
$$

<ol style="list-style-type:none"><li>
then
</ol>

$$
\langle \psi \vert \phi \rangle
= 
\bigl( \overline{\alpha_1} \langle \psi_1 \vert + \overline{\alpha_2} \langle \psi_2 \vert \bigr) 
\vert\phi\rangle
= \overline{\alpha_1} \langle \psi_1 \vert \phi \rangle + \overline{\alpha_2} \langle \psi_2 \vert \phi \rangle.
$$ 

<ol start="4"><li>
<strong>The Cauchy&ndash;Schwarz inequality.</strong>
For every choice of vectors $\vert \phi \rangle$ and $\vert \psi \rangle$ having the same number of entries, we have
</ol>

$$
\bigl\vert \langle \psi \vert \phi \rangle\bigr\vert \leq \bigl\| \vert\psi \rangle \bigr\| \bigl\| \vert \phi \rangle \bigr\|.
$$

<ol style="list-style-type:none"><li>
This is an incredibly handy inequality that gets used quite extensively in quantum information (and in many other fields of study).
</ol>
</p>
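The four facts above can be spot-checked numerically on randomly chosen vectors. This is only a sanity check under stated assumptions (random complex vectors in dimension 4, a fixed seed, and the helper names `random_vector` and `inner` introduced here), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_vector(n):
    # A random complex vector with n entries.
    return rng.normal(size=n) + 1j * rng.normal(size=n)

def inner(psi, phi):
    # <psi|phi>, conjugating the first argument.
    return np.vdot(psi, phi)

psi, phi1, phi2 = (random_vector(4) for _ in range(3))
a1, a2 = 0.5 - 2j, 1.5j
combo = a1 * phi1 + a2 * phi2

# 1. Relationship to the Euclidean norm.
norm_ok = np.isclose(np.sqrt(inner(psi, psi).real), np.linalg.norm(psi))

# 2. Conjugate symmetry.
symmetry_ok = np.isclose(np.conj(inner(psi, phi1)), inner(phi1, psi))

# 3. Linearity in the second argument, conjugate linearity in the first.
linear_ok = np.isclose(inner(psi, combo),
                       a1 * inner(psi, phi1) + a2 * inner(psi, phi2))
conj_linear_ok = np.isclose(inner(combo, psi),
                            np.conj(a1) * inner(phi1, psi)
                            + np.conj(a2) * inner(phi2, psi))

# 4. Cauchy-Schwarz inequality.
cs_ok = abs(inner(psi, phi1)) <= np.linalg.norm(psi) * np.linalg.norm(phi1)

print(norm_ok, symmetry_ok, linear_ok, conj_linear_ok, cs_ok)
```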

### 2.2 Orthogonal and orthonormal sets

Two vectors $\vert \phi \rangle$ and $\vert \psi \rangle$ are said to be *orthogonal* if their inner product is zero:

$$
\langle \psi \vert \phi \rangle = 0.
$$

Geometrically, we can think about orthogonal vectors as vectors that form a right angle to each other.

A set of vectors $\{ \vert \psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ is called an *orthogonal set* if it is the case that every vector in the set is orthogonal to every other vector in the set.
That is, this set is orthogonal if

$$
\langle \psi_j \vert \psi_k\rangle = 0
$$

for all choices of $j,k\in\{1,\ldots,m\}$ for which $j\not=k$.

A set of vectors $\{ \vert \psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ is called an *orthonormal* set if it is an orthogonal set and, in addition, every vector in the set is a unit vector.
Alternatively, this set is an orthonormal set if we have

$$
\langle \psi_j \vert \psi_k\rangle =
\begin{cases}
1 & j = k\\
0 & j\not=k
\end{cases}
\tag{2.3}
$$

for all choices of $j,k\in\{1,\ldots,m\}$.

Finally, a set $\{ \vert \psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ is an *orthonormal basis* if, in addition to being an orthonormal set, it forms a basis.
This is equivalent to $\{ \vert \psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ being an orthonormal set and $m$ being equal to the dimension of the space from which $\vert \psi_1\rangle,\ldots,\vert\psi_m\rangle$ are drawn.

For example, for any classical state set $\Sigma$, the set of all standard basis vectors

$$
\big\{ \vert a \rangle \,:\, a\in\Sigma\bigr\}
$$

is an orthonormal basis.
The set $\{\vert+\rangle,\vert-\rangle\}$ is an orthonormal basis for the $2$-dimensional space corresponding to a single qubit, and the Bell basis $\{\vert\phi^+\rangle, \vert\phi^-\rangle, \vert\psi^+\rangle, \vert\psi^-\rangle\}$ is an orthonormal basis for the $4$-dimensional space corresponding to two qubits.
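Condition $(2.3)$ says that the matrix of all pairwise inner products is the identity, which gives a direct numerical test. The helper `is_orthonormal` below is a hypothetical name introduced for this sketch.

```python
import numpy as np

def is_orthonormal(vectors):
    # Stack the vectors as columns; entry (j, k) of the Gram matrix
    # V^dagger V is the inner product <psi_j|psi_k>.
    V = np.column_stack(vectors)
    gram = V.conj().T @ V
    return np.allclose(gram, np.eye(len(vectors)))

s = 1 / np.sqrt(2)
ket0 = np.array([1, 0])
plus = np.array([s, s])
minus = np.array([s, -s])

bell = [np.array([s, 0, 0, s]),    # phi+
        np.array([s, 0, 0, -s]),   # phi-
        np.array([0, s, s, 0]),    # psi+
        np.array([0, s, -s, 0])]   # psi-

print(is_orthonormal([plus, minus]))  # plus/minus basis
print(is_orthonormal(bell))           # Bell basis
print(is_orthonormal([ket0, plus]))   # unit vectors, but not orthogonal
```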

#### Extending orthonormal sets to orthonormal bases

Suppose that $\vert\psi_1\rangle,\ldots,\vert\psi_m\rangle$ are vectors that live in an $n$-dimensional space, and assume moreover that $\{\vert\psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ is an orthonormal set.
Orthonormal sets are always linearly independent sets, so these vectors necessarily span a subspace of dimension $m$.
From this we immediately conclude that $m\leq n$ because the dimension of the subspace spanned by these vectors cannot be larger than the dimension of the entire space from which they're drawn.

If it is the case that $m<n$, then it is always possible to choose an additional $n-m$ vectors
$\vert \psi_{m+1}\rangle,\ldots,\vert\psi_n\rangle$ so that
$\{\vert\psi_1\rangle,\ldots,\vert\psi_n\rangle\}$ forms an orthonormal basis.
A procedure known as the *Gram*&ndash;*Schmidt orthogonalization process* can be used to construct these vectors.
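One way to carry out such an extension is sketched below: run Gram–Schmidt over the standard basis vectors, keeping each one whose component orthogonal to the current set is nonzero. The function name `extend_to_orthonormal_basis` and the tolerance are assumptions made for this illustration, and other choices of candidate vectors would work equally well.

```python
import numpy as np

def extend_to_orthonormal_basis(vectors, tol=1e-10):
    # Start from the given orthonormal set.
    basis = [np.asarray(v, dtype=complex) for v in vectors]
    n = basis[0].shape[0]
    for k in range(n):
        if len(basis) == n:
            break
        # Candidate: the k-th standard basis vector.
        candidate = np.zeros(n, dtype=complex)
        candidate[k] = 1
        # Remove the components along the vectors collected so far.
        for b in basis:
            candidate = candidate - np.vdot(b, candidate) * b
        norm = np.linalg.norm(candidate)
        # Keep the candidate only if something is left after projection.
        if norm > tol:
            basis.append(candidate / norm)
    return basis

s = 1 / np.sqrt(2)
phi_plus = np.array([s, 0, 0, s])

basis = extend_to_orthonormal_basis([phi_plus])
V = np.column_stack(basis)
print(len(basis))
print(np.allclose(V.conj().T @ V, np.eye(4)))
```

Starting from the single vector $\vert\phi^+\rangle$ in a $4$-dimensional space, the procedure adds three more vectors, and the resulting set passes the orthonormality test.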

#### Orthonormal sets and unitary matrices

Orthonormal sets of vectors are closely connected with unitary matrices.
One way to express this connection is to say that the following three statements are logically equivalent (meaning that they are all true or all false) for any choice of a square matrix $U$:

1. The matrix $U$ is unitary (i.e., $U^{\dagger} U = \mathbb{1} = U U^{\dagger}$).
2. The rows of $U$ form an orthonormal set.
3. The columns of $U$ form an orthonormal set.

This equivalence is actually pretty straightforward when we think about how matrix multiplication and the conjugate transpose work.
Suppose, for instance, that we have a $3\times 3$ matrix like this:

$$
U = \begin{pmatrix}
\alpha_{1,1} & \alpha_{1,2} & \alpha_{1,3} \\
\alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
\alpha_{3,1} & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}
$$

The conjugate transpose of $U$ looks like this:

$$
U^{\dagger} = \begin{pmatrix}
\overline{\alpha_{1,1}} & \overline{\alpha_{2,1}} & \overline{\alpha_{3,1}} \\
\overline{\alpha_{1,2}} & \overline{\alpha_{2,2}} & \overline{\alpha_{3,2}} \\
\overline{\alpha_{1,3}} & \overline{\alpha_{2,3}} & \overline{\alpha_{3,3}}
\end{pmatrix}
$$

Multiplying the two matrices, with the conjugate transpose on the left-hand side, gives us this matrix:

$$
\begin{aligned}
&\begin{pmatrix}
\overline{\alpha_{1,1}} & \overline{\alpha_{2,1}} & \overline{\alpha_{3,1}} \\
\overline{\alpha_{1,2}} & \overline{\alpha_{2,2}} & \overline{\alpha_{3,2}} \\
\overline{\alpha_{1,3}} & \overline{\alpha_{2,3}} & \overline{\alpha_{3,3}}
\end{pmatrix}
\begin{pmatrix}
\alpha_{1,1} & \alpha_{1,2} & \alpha_{1,3} \\
\alpha_{2,1} & \alpha_{2,2} & \alpha_{2,3} \\
\alpha_{3,1} & \alpha_{3,2} & \alpha_{3,3}
\end{pmatrix}\\[2mm]
\qquad &=
{\scriptsize
\begin{pmatrix}
\overline{\alpha_{1,1}}\alpha_{1,1} + \overline{\alpha_{2,1}}\alpha_{2,1} + \overline{\alpha_{3,1}}\alpha_{3,1} &
\overline{\alpha_{1,1}}\alpha_{1,2} + \overline{\alpha_{2,1}}\alpha_{2,2} + \overline{\alpha_{3,1}}\alpha_{3,2} &
\overline{\alpha_{1,1}}\alpha_{1,3} + \overline{\alpha_{2,1}}\alpha_{2,3} + \overline{\alpha_{3,1}}\alpha_{3,3} \\[1mm]
\overline{\alpha_{1,2}}\alpha_{1,1} + \overline{\alpha_{2,2}}\alpha_{2,1} + \overline{\alpha_{3,2}}\alpha_{3,1} &
\overline{\alpha_{1,2}}\alpha_{1,2} + \overline{\alpha_{2,2}}\alpha_{2,2} + \overline{\alpha_{3,2}}\alpha_{3,2} &
\overline{\alpha_{1,2}}\alpha_{1,3} + \overline{\alpha_{2,2}}\alpha_{2,3} + \overline{\alpha_{3,2}}\alpha_{3,3} \\[1mm]
\overline{\alpha_{1,3}}\alpha_{1,1} + \overline{\alpha_{2,3}}\alpha_{2,1} + \overline{\alpha_{3,3}}\alpha_{3,1} &
\overline{\alpha_{1,3}}\alpha_{1,2} + \overline{\alpha_{2,3}}\alpha_{2,2} + \overline{\alpha_{3,3}}\alpha_{3,2} &
\overline{\alpha_{1,3}}\alpha_{1,3} + \overline{\alpha_{2,3}}\alpha_{2,3} + \overline{\alpha_{3,3}}\alpha_{3,3}
\end{pmatrix}}
\end{aligned}
$$

If we form three vectors from the columns of $U$, 

$$
\vert \psi_1\rangle = \begin{pmatrix}
\alpha_{1,1}\\
\alpha_{2,1}\\
\alpha_{3,1}
\end{pmatrix},
\quad
\vert \psi_2\rangle = \begin{pmatrix}
\alpha_{1,2}\\
\alpha_{2,2}\\
\alpha_{3,2}
\end{pmatrix},
\quad
\vert \psi_3\rangle =
\begin{pmatrix}
\alpha_{1,3}\\
\alpha_{2,3}\\
\alpha_{3,3}
\end{pmatrix},
$$

then we can alternatively express the product above as follows:

$$
U^{\dagger} U =
\begin{pmatrix}
\langle \psi_1\vert \psi_1 \rangle & \langle \psi_1\vert \psi_2 \rangle & \langle \psi_1\vert \psi_3 \rangle \\
\langle \psi_2\vert \psi_1 \rangle & \langle \psi_2\vert \psi_2 \rangle & \langle \psi_2\vert \psi_3 \rangle \\
\langle \psi_3\vert \psi_1 \rangle & \langle \psi_3\vert \psi_2 \rangle & \langle \psi_3\vert \psi_3 \rangle 
\end{pmatrix}
$$

Referring to equation $(2.3)$, we now see that the condition that this matrix is equal to the identity matrix is equivalent to the orthonormality of the set $\{\vert\psi_1\rangle,\vert\psi_2\rangle,\vert\psi_3\rangle\}$.

This argument generalizes to unitary matrices of any size, and a similar argument reveals that the rows of a unitary matrix, like its columns, must be orthonormal.
In this case, we instead make use of the equation $U U^{\dagger} = \mathbb{1}$.
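The equivalence can be observed numerically. The sketch below obtains a $3\times 3$ unitary matrix as the Q factor of a QR decomposition of a random complex matrix (a convenient source of unitary matrices chosen here for illustration) and then forms both products.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

# The Q factor of a QR decomposition of a square matrix is unitary.
U, _ = np.linalg.qr(A)

# Entry (j, k) of U^dagger U is the inner product of columns j and k.
cols_gram = U.conj().T @ U

# U U^dagger collects the pairwise inner products of the rows of U.
rows_gram = U @ U.conj().T

print(np.allclose(cols_gram, np.eye(3)))
print(np.allclose(rows_gram, np.eye(3)))
```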

Given this equivalence, together with the fact that every orthonormal set can be extended to form an orthonormal basis (which was discussed above), we conclude the following useful fact:
Given any orthonormal set of vectors $\{\vert\psi_1\rangle,\ldots,\vert\psi_m\rangle\}$ drawn from an $n$-dimensional space, there exists a unitary matrix $U$ whose first $m$ columns are the vectors $\vert\psi_1\rangle,\ldots,\vert\psi_m\rangle$.
Pictorially, we can always find a unitary matrix having this form:

$$
U =
\left(
  \begin{array}{ccccccc}
    \rule{0.4pt}{10pt} & \rule{0.4pt}{10pt} & & \rule{0.4pt}{10pt} & \rule{0.4pt}{10pt} & & \rule{0.4pt}{10pt}\\
    \vert\psi_1\rangle & \vert\psi_2\rangle & \cdots & \vert\psi_m\rangle & \vert\psi_{m+1}\rangle & 
    \cdots & \vert\psi_n\rangle\\
    \rule{0.4pt}{10pt} & \rule{0.4pt}{10pt} & & \rule{0.4pt}{10pt} & \rule{0.4pt}{10pt} & & \rule{0.4pt}{10pt}
  \end{array}
\right).
$$

Here, the last $n-m$ columns are filled in with any choice of vectors $\vert\psi_{m+1}\rangle,\ldots,\vert\psi_n\rangle$ that make
$\{\vert\psi_1\rangle,\ldots,\vert\psi_n\rangle\}$ an orthonormal basis.

### 2.3 Projections and projective measurements

#### Projection matrices

A square matrix $\Pi$ is called a *projection* if it satisfies two properties:

  1. $\Pi = \Pi^{\dagger}$.
  2. $\Pi^2 = \Pi$.

Matrices that satisfy the first condition — that they are equal to their own conjugate transpose — are called *Hermitian matrices*, and matrices that satisfy the second condition — that squaring them leaves them unchanged — are called *idempotent* matrices.

As a word of caution, the word *projection* is sometimes used to refer to any matrix that satisfies just the second condition but not necessarily the first, and when this is done the term *orthogonal projection* is typically used to refer to matrices satisfying both properties.
In this textbook, however, we will use the terms *projection* and *projection matrix* to mean matrices satisfying both conditions.

An example of a projection is the matrix

$$
\Pi = \vert \psi \rangle \langle \psi \vert
\tag{2.4}
$$

for any unit vector $\vert \psi\rangle$.
We can see that this matrix is Hermitian as follows:

$$
\Pi^{\dagger} = \bigl( \vert \psi \rangle \langle \psi \vert \bigr)^{\dagger}
= \bigl( \langle \psi \vert \bigr)^{\dagger}\bigl( \vert \psi \rangle \bigr)^{\dagger} 
= \vert \psi \rangle \langle \psi \vert = \Pi.
$$

Here, to obtain the second equality, we have used the formula

$$
(A B)^{\dagger} = B^{\dagger} A^{\dagger},
$$

which is always true — for any two matrices $A$ and $B$ for which the product $AB$ makes sense.

To see that the matrix $\Pi$ in $(2.4)$ is idempotent, we can use the assumption that $\vert\psi\rangle$ is a unit vector, so that it satisfies $\langle \psi \vert \psi\rangle = 1.$
Thus, we have

$$
\Pi^2 
= \bigl( \vert\psi\rangle\langle \psi\vert \bigr)^2 
= \vert\psi\rangle\langle \psi\vert\psi\rangle\langle\psi\vert
= \vert\psi\rangle\langle\psi\vert = \Pi.
$$

More generally, if $\{\vert \psi_1\rangle,\ldots,\vert \psi_m\rangle\}$ is any orthonormal set of vectors, then

$$
\Pi = \sum_{k = 1}^m \vert \psi_k\rangle \langle \psi_k \vert
\tag{2.5}
$$

is a projection.
Specifically, we have

$$
\begin{aligned}
\Pi^{\dagger} 
&= \biggl(\sum_{k = 1}^m \vert \psi_k\rangle \langle \psi_k \vert\biggr)^{\dagger} \\
&= \sum_{k = 1}^m \bigl(\vert\psi_k\rangle\langle\psi_k\vert\bigr)^{\dagger} \\
&= \sum_{k = 1}^m \vert \psi_k\rangle \langle \psi_k \vert\\
&= \Pi,
\end{aligned}
$$

and

$$
\begin{aligned}
\Pi^2 
& = \biggl( \sum_{j = 1}^m \vert \psi_j\rangle \langle \psi_j \vert\biggr)\biggl(\sum_{k = 1}^m \vert \psi_k\rangle \langle \psi_k \vert\biggr) \\
& = \sum_{j = 1}^m\sum_{k = 1}^m \vert \psi_j\rangle \langle \psi_j \vert  \psi_k\rangle \langle \psi_k \vert \\
& = \sum_{k = 1}^m \vert \psi_k\rangle \langle \psi_k \vert\\
& = \Pi,
\end{aligned}
$$

where the orthonormality of $\{\vert \psi_1\rangle,\ldots,\vert \psi_m\rangle\}$ is used just for the second-to-last equality.

In fact, this exhausts all of the possibilities: *every* projection $\Pi$ can be written in the form $(2.5)$ for some choice of an orthonormal set $\{\vert \psi_1\rangle,\ldots,\vert \psi_m\rangle\}$.
(The zero matrix $\Pi=0$, which is a projection, is a special case: to fit it into the general form $(2.5)$ we have to allow the possibility that the sum is empty, resulting in the zero matrix.)
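
As a numerical sanity check, the sketch below builds a projection of the form $(2.5)$ from a small orthonormal set, verifies the two defining properties, and then goes in the other direction by reading an orthonormal set off the eigenvectors of $\Pi$ with eigenvalue $1$. The helper name `projection_from` and the particular vectors are illustrative assumptions.

```python
import numpy as np

def projection_from(vectors):
    """Form the sum of |psi_k><psi_k| over an orthonormal set, as in (2.5)."""
    n = len(vectors[0])
    Pi = np.zeros((n, n), dtype=complex)
    for v in vectors:
        Pi += np.outer(v, np.conj(v))   # |v><v|
    return Pi

psi1 = np.array([1, 1j, 0]) / np.sqrt(2)
psi2 = np.array([0, 0, 1], dtype=complex)
Pi = projection_from([psi1, psi2])

is_hermitian = np.allclose(Pi, Pi.conj().T)
is_idempotent = np.allclose(Pi @ Pi, Pi)

# Conversely, recover an orthonormal set from Pi: its eigenvectors with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eigh(Pi)
basis = [eigenvectors[:, k] for k in range(len(eigenvalues))
         if np.isclose(eigenvalues[k], 1)]
rebuilt = projection_from(basis)        # agrees with Pi up to rounding
```

The recovered set need not equal $\{\vert\psi_1\rangle,\vert\psi_2\rangle\}$ itself; any orthonormal basis of the same subspace yields the same projection.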

#### Projective measurements

As has already been mentioned, the notion of a measurement of a quantum system is more general than just standard basis measurements.
*Projective measurements* are measurements that are described by a collection of projections whose sum is equal to the identity matrix.
In symbols, a collection $\{\Pi_1,\ldots,\Pi_m\}$ of projection matrices describes a projective measurement if

$$
\Pi_1 + \cdots + \Pi_m = \mathbb{1}.
$$

When such a measurement is performed on a system $\mathsf{X}$ while it is in some state $\vert\psi\rangle$, two things happen:

1. For each $k\in\{1,\ldots,m\}$, the outcome of the measurement is $k$ with probability equal to

$$
\operatorname{Pr}\bigl(\text{outcome is $k$}\bigr) = \bigl\| \Pi_k \vert \psi \rangle \bigr\|^2.
$$

2. For whichever outcome $k$ the measurement produces, the state of $\mathsf{X}$ becomes

$$
\frac{\Pi_k \vert\psi\rangle}{\bigl\|\Pi_k \vert\psi\rangle\bigr\|}.
$$

We can also choose different indices besides $\{1,\ldots,m\}$ for projective measurements if we wish.
More generally, for any finite and nonempty set $\Sigma$, if we have a collection of projection matrices
$\{\Pi_a:a\in\Sigma\}$ that satisfies the condition

$$
\sum_{a\in\Sigma} \Pi_a = \mathbb{1},
$$

then this collection describes a projective measurement whose possible outcomes coincide with the set $\Sigma$, where the rules are the same as before:

1. For each $a\in\Sigma$, the outcome of the measurement is $a$ with probability equal to

$$
\operatorname{Pr}\bigl(\text{outcome is $a$}\bigr) = \bigl\| \Pi_a \vert \psi \rangle \bigr\|^2.
$$

2. For whichever outcome $a$ the measurement produces, the state of $\mathsf{X}$ becomes

$$
\frac{\Pi_a \vert\psi\rangle}{\bigl\|\Pi_a \vert\psi\rangle\bigr\|}.
$$

For example, standard basis measurements are projective measurements, where $\Sigma$ is the set of classical states of whatever system $\mathsf{X}$ we're talking about and our set of projection matrices is
$\{\vert a\rangle\langle a\vert:a\in\Sigma\}$.

Another example of a projective measurement, this time on two qubits $(\mathsf{X},\mathsf{Y})$, is given by the set
$\{\Pi_0,\Pi_1\}$, where

$$
\Pi_0 = \vert \phi^+\rangle\langle \phi^+ \vert + \vert \phi^-\rangle\langle \phi^- \vert + \vert \psi^+\rangle\langle \psi^+ \vert
\quad\text{and}\quad
\Pi_1 = \vert\psi^-\rangle\langle\psi^-\vert.
$$
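
To make the two rules concrete for this example, here is a minimal NumPy sketch that applies the measurement $\{\Pi_0,\Pi_1\}$ to the standard basis state $\vert 01\rangle$. The function names `measure` and `ketbra` are hypothetical helpers, and the choice of input state is arbitrary.

```python
import numpy as np

def ketbra(v):
    return np.outer(v, np.conj(v))      # |v><v|

def measure(projectors, state):
    """Return (probability, post-measurement state) for each outcome."""
    results = []
    for Pi in projectors:
        unnormalized = Pi @ state
        prob = np.linalg.norm(unnormalized) ** 2
        # The post-measurement state is only defined for outcomes that can occur.
        post = unnormalized / np.sqrt(prob) if prob > 1e-12 else None
        results.append((prob, post))
    return results

s = 1 / np.sqrt(2)
phi_plus = s * np.array([1, 0, 0, 1])
phi_minus = s * np.array([1, 0, 0, -1])
psi_plus = s * np.array([0, 1, 1, 0])
psi_minus = s * np.array([0, 1, -1, 0])

Pi0 = ketbra(phi_plus) + ketbra(phi_minus) + ketbra(psi_plus)
Pi1 = ketbra(psi_minus)
sums_to_identity = np.allclose(Pi0 + Pi1, np.eye(4))

state = np.array([0, 1, 0, 0])          # |01>
(p0, post0), (p1, post1) = measure([Pi0, Pi1], state)
```

For this input each outcome appears with probability $1/2$, and the state collapses to $\vert\psi^+\rangle$ for outcome $0$ and to $\vert\psi^-\rangle$ for outcome $1$.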

If we have multiple systems that are jointly in some quantum state and a projective measurement is performed on just one of the systems, the action is similar to what we had for standard basis measurements — and in fact we can now describe this action in much simpler terms than we could before.
To be precise, let us suppose that we have two systems $(\mathsf{X},\mathsf{Y})$ in a quantum state $\vert\psi\rangle$, and a projective measurement described by a collection $\{\Pi_a:a\in\Sigma\}$ is performed on the system $\mathsf{X}$, while nothing is done to $\mathsf{Y}$.
Doing this is then equivalent to performing the projective measurement described by the collection

$$
\bigl\{ \Pi_a \otimes \mathbb{1} \,:\, a\in\Sigma\bigr\}
$$

on the joint system $(\mathsf{X},\mathsf{Y})$.
Each measurement outcome $a$ results with probability

$$
\bigl\| (\Pi_a \otimes \mathbb{1})\vert \psi\rangle \bigr\|^2,
$$

and conditioned on the result $a$ appearing, the state of the joint system $(\mathsf{X},\mathsf{Y})$ becomes

$$
\frac{(\Pi_a \otimes \mathbb{1})\vert \psi\rangle}{\bigl\| (\Pi_a \otimes \mathbb{1})\vert \psi\rangle \bigr\|}.
$$
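
A short NumPy sketch of this rule follows, under the assumption that $(\mathsf{X},\mathsf{Y})$ is a pair of qubits in the state $\vert\phi^+\rangle$ and that $\mathsf{X}$ is measured with respect to the projections $\vert +\rangle\langle +\vert$ and $\vert -\rangle\langle -\vert$. The variable names are illustrative only.

```python
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # joint state of (X, Y)

projectors_X = {"+": np.outer(plus, plus), "-": np.outer(minus, minus)}
identity_Y = np.eye(2)

outcomes = {}
for a, Pi_a in projectors_X.items():
    joint = np.kron(Pi_a, identity_Y)             # Pi_a acts on X, nothing is done to Y
    unnormalized = joint @ phi_plus
    prob = np.linalg.norm(unnormalized) ** 2      # nonzero for both outcomes here
    outcomes[a] = (prob, unnormalized / np.sqrt(prob))
```

Each outcome occurs with probability $1/2$, and the joint state becomes $\vert +\rangle\vert +\rangle$ or $\vert -\rangle\vert -\rangle$ accordingly, so the measurement on $\mathsf{X}$ alone also determines the state of $\mathsf{Y}$.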

#### Implementing projective measurements using standard basis measurements

Arbitrary projective measurements can be implemented using unitary operations, standard basis measurements, and an extra workspace system, as we will now explain.

Let us suppose that $\mathsf{X}$ is a system and $\{\Pi_1,\ldots,\Pi_m\}$ is a projective measurement on $\mathsf{X}$. We can easily generalize this discussion to projective measurements having different sets of outcomes, but in the interest of convenience and simplicity we will assume the set of possible outcomes for our measurement is $\{1,\ldots,m\}$.
Let us note explicitly that $m$ is not necessarily equal to the number of classical states of $\mathsf{X}$ — we will let $n$ be the number of classical states of $\mathsf{X}$, which means that each matrix $\Pi_k$ is an $n\times n$ projection matrix.
Because we assume that $\{\Pi_1,\ldots,\Pi_m\}$ represents a projective measurement, it is necessarily the case that

$$
\sum_{k = 1}^m \Pi_k = \mathbb{1}_n.
$$

Our goal is to perform a process that has the same effect as performing this projective measurement on $\mathsf{X}$, but to do this using only unitary operations and standard basis measurements.
We will also make use of an extra workspace system $\mathsf{Y}$ to do this, and specifically we take the classical state set of $\mathsf{Y}$ to be $\{1,\ldots,m\}$ — the same as the set of outcomes of the projective measurement.
The idea is that we will perform a standard basis measurement on $\mathsf{Y}$, and interpret the outcome of this measurement as being equivalent to the outcome of the projective measurement on $\mathsf{X}$.
We will need to assume that $\mathsf{Y}$ is initialized to some fixed state, which we will choose, more or less arbitrarily, to be $\vert 1\rangle$.
(Any other choice of fixed quantum state vector could be made to work, but choosing $\vert 1\rangle$ makes the explanation to follow much simpler.)

Of course, in order for a standard basis measurement of $\mathsf{Y}$ to tell us anything about $\mathsf{X}$, we will need to allow $\mathsf{X}$ and $\mathsf{Y}$ to interact somehow before measuring $\mathsf{Y}$, by performing a unitary operation on the system $(\mathsf{Y},\mathsf{X})$.
First consider this matrix:

$$
M = \sum_{k = 1}^m \vert k \rangle \langle 1 \vert \otimes \Pi_k.
$$

Expressed explicitly as a block matrix, this matrix looks like this:

$$
M = 
\begin{pmatrix}
\Pi_1 & 0 & \cdots & 0\\
\Pi_2 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
\Pi_m & 0 & \cdots & 0
\end{pmatrix}.
$$

(Each $0$ in this matrix represents an $n\times n$ matrix filled entirely with zeros.)

Now, $M$ is certainly not a unitary matrix (unless $m=1$, in which case $\Pi_1 = \mathbb{1}$, giving $M = \mathbb{1}$ in this trivial case) because unitary matrices cannot have any columns (or rows) that are entirely $0$; unitary matrices have columns that form orthonormal bases, and the all-zero vector is not a unit vector.
However, it is the case that the first $n$ columns of $M$ are orthonormal, and we get this from the assumption that $\{\Pi_1,\ldots,\Pi_m\}$ is a measurement.
To verify this claim, notice that for each $j\in\{1,\ldots,n\}$, column number $j$ of $M$ is this vector:

$$
\vert \psi_j\rangle = M \vert 1, j\rangle = \sum_{k = 1}^m \vert k \rangle \otimes \Pi_k \vert j\rangle.
$$

Taking the inner product of column $i$ with column $j$ (still assuming we're talking about the first $n$ columns, so $i,j\in\{1,\ldots,n\}$) gives

$$
\begin{aligned}
\langle \psi_i \vert \psi_j \rangle 
& = 
\biggl(\sum_{k = 1}^m \vert k \rangle \otimes \Pi_k \vert i\rangle\biggr)^{\dagger}
\biggl(\sum_{l = 1}^m \vert l \rangle \otimes \Pi_l \vert j\rangle\biggr) \\
& = 
\sum_{k = 1}^m \sum_{l = 1}^m  
\langle k \vert l \rangle \langle i \vert \Pi_k \Pi_l \vert j\rangle\\
& = 
\sum_{k = 1}^m 
\langle i \vert \Pi_k \Pi_k \vert j\rangle\\
& = 
\sum_{k = 1}^m 
\langle i \vert \Pi_k \vert j\rangle\\
& = \langle i \vert \mathbb{1} \vert j \rangle\\
& = \begin{cases}
1 & i = j\\
0 & i\not=j,
\end{cases}
\end{aligned}
$$

which is what we needed to show.

Thus, because the first $n$ columns of the matrix $M$ are orthonormal, we can replace all of the remaining zero entries by some different choice of complex number entries so that the entire matrix is unitary:

$$
U = \begin{pmatrix}
\Pi_1 & \fbox{?} & \cdots & \fbox{?}\\
\Pi_2 & \fbox{?} & \cdots & \fbox{?}\\
\vdots & \vdots & \ddots & \vdots\\
\Pi_m & \fbox{?} & \cdots & \fbox{?}
\end{pmatrix}
$$

(If we are given the matrices $\Pi_1,\ldots,\Pi_m$, we can compute suitable matrices to fill in for the blocks marked $\fbox{?}$ in the equation, for instance by using the Gram–Schmidt process, but it will not matter specifically what these matrices are for the sake of this discussion.)

Finally we can describe the process: we first perform $U$ on the joint system $(\mathsf{Y},\mathsf{X})$ and then measure $\mathsf{Y}$ with respect to a standard basis measurement.
For an arbitrary state $\vert \phi \rangle$ of $\mathsf{X}$, we obtain the state

$$
U \bigl( \vert 1\rangle \vert \phi\rangle\bigr)
= M \bigl( \vert 1\rangle \vert \phi\rangle\bigr)
= \sum_{k = 1}^m \vert k\rangle \otimes \Pi_k \vert\phi\rangle,
$$

where the first equality follows from the fact that $U$ and $M$ agree on their first $n$ columns.
When we perform a standard basis measurement on $\mathsf{Y}$, we obtain each outcome $k$ with probability

$$
\bigl\| \Pi_k \vert \phi\rangle \bigr\|^2,
$$

in which case the state of $(\mathsf{Y},\mathsf{X})$ becomes

$$
\vert k\rangle \otimes \frac{\Pi_k \vert \phi\rangle}{\bigl\| \Pi_k \vert \phi\rangle \bigr\|}.
$$

Thus, $\mathsf{Y}$ stores a copy of the measurement outcome and $\mathsf{X}$ changes precisely as it would had the projective measurement described by $\{\Pi_1,\ldots,\Pi_m\}$ been performed directly on $\mathsf{X}$.
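
The construction can be checked numerically. The sketch below is written under a few assumptions: it uses the two-outcome Bell-state measurement $\{\Pi_0,\Pi_1\}$ from earlier in this section, it indexes the classical states of $\mathsf{Y}$ from $0$ rather than from $1$ (so $\mathsf{Y}$ starts in its first standard basis state), and it fills in the unknown blocks of $U$ using a full singular value decomposition in place of the Gram–Schmidt process. The function name `measurement_unitary` is hypothetical.

```python
import numpy as np

def ketbra(v):
    return np.outer(v, np.conj(v))

def measurement_unitary(projectors):
    """Build a unitary on (Y, X) whose first n columns are those of M."""
    n = projectors[0].shape[0]
    # The first n columns of M are the projections stacked on top of one another.
    first_columns = np.vstack(projectors).astype(complex)     # shape (m*n, n)
    # The last columns of W in a full SVD span the orthogonal complement.
    W, _, _ = np.linalg.svd(first_columns, full_matrices=True)
    return np.hstack([first_columns, W[:, n:]])

psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)
Pi1 = ketbra(psi_minus)
Pi0 = np.eye(4) - Pi1          # equals the sum of the other three Bell projections
projectors = [Pi0, Pi1]

U = measurement_unitary(projectors)
is_unitary = np.allclose(U.conj().T @ U, np.eye(8))

# A random state |phi> of X (fixed seed so the run is reproducible).
rng = np.random.default_rng(0)
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
phi /= np.linalg.norm(phi)

workspace = np.array([1, 0])                     # initial state of Y
output = U @ np.kron(workspace, phi)
blocks = output.reshape(len(projectors), 4)      # row k holds Pi_k |phi>
probs_via_Y = np.linalg.norm(blocks, axis=1) ** 2
probs_direct = np.array([np.linalg.norm(Pi @ phi) ** 2 for Pi in projectors])
```

The two probability vectors should agree, and row $k$ of `blocks` is the unnormalized post-measurement state $\Pi_k\vert\phi\rangle$ of $\mathsf{X}$, as in the derivation above.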


## 3. Limitations on quantum information

Despite the fact that quantum and classical information have a common underlying mathematical structure, they are different in the way that they work.
As we continue on in this textbook, we will see many examples of tasks that can be performed using quantum information but not classical information.
Before doing this, however, it is appropriate that we take note of some important limitations on quantum information — having an understanding of the sorts of things quantum information cannot do helps us to identify the things it can do.

### 3.1 Irrelevance of global phases

The first limitation of quantum information that we will observe, which is really more of a slight degeneracy in the way that quantum states are represented by quantum state vectors as opposed to an actual limitation, concerns the notion of a *global phase*.

What we mean by a global phase is this.
Suppose that $\vert \psi \rangle$ and $\vert \phi \rangle$ are unit vectors representing quantum states of some system, and assume moreover that there exists a complex number $\alpha$ on the unit circle (which means that
$\vert \alpha \vert = 1$, or alternatively $\alpha = e^{i\theta}$ for some real number $\theta$) such that

$$
\vert \phi \rangle = \alpha \vert \psi \rangle.
$$

The vectors $\vert \psi \rangle$ and $\vert \phi \rangle$ are then said to *differ by a global phase*.
(We also sometimes refer to $\alpha$ as being or representing a global phase, although this is context-dependent in some sense: any number on the unit circle can be thought of as a global phase when it multiplies a unit vector.)

Now, consider what happens when a system is in either one of the two quantum states $\vert\psi\rangle$ or 
$\vert\phi\rangle$ and is measured, with respect to a standard basis measurement.
In the first case, in which the system is in the state $\vert\psi\rangle$, the probability for the measurement to result in any chosen classical state $a$ is

$$
\bigl\vert \langle a \vert \psi \rangle \bigr\vert^2.
$$

In the second case, in which the system is in the state $\vert\phi\rangle$, the probability for the measurement to result in any chosen classical state $a$ is

$$
\bigl\vert \langle a \vert \phi \rangle \bigr\vert^2 
= \bigl\vert \alpha \langle a \vert \psi \rangle \bigr\vert^2
= \vert \alpha \vert^2 \bigl\vert \langle a \vert \psi \rangle \bigr\vert^2
= \bigl\vert \langle a \vert \psi \rangle \bigr\vert^2
$$

because $\vert\alpha\vert = 1$.
That is, the probability for each outcome to appear is the same for both states.

Now consider what happens when any unitary operation $U$ is performed on the system when it is in one of the two states.
In the first case, in which the initial state is $\vert \psi \rangle$, the state becomes

$$
U \vert \psi \rangle
$$

after the operation is performed, and in the second case, in which the initial state is $\vert \phi\rangle$, it becomes

$$
U \vert \phi \rangle = \alpha U \vert \psi \rangle.
$$

That is, the two resulting states still differ by the same global phase $\alpha$.

Consequently, the two quantum states $\vert\psi\rangle$ and $\vert\phi\rangle$ that differ by a global phase are completely indistinguishable:
no matter what operation, or sequence of operations, we apply to the two states, they will always differ by a global phase, and any measurement performed on one will produce each outcome with precisely the same probability as the same measurement performed on the other.
For this reason, any two quantum state vectors that differ by a global phase are considered to be equivalent, and are effectively viewed as being the same state.

For example, the quantum states

$$
\vert - \rangle = \frac{1}{\sqrt{2}} \vert 0 \rangle - \frac{1}{\sqrt{2}} \vert 1 \rangle
\quad\text{and}\quad
-\vert - \rangle = -\frac{1}{\sqrt{2}} \vert 0 \rangle + \frac{1}{\sqrt{2}} \vert 1 \rangle
$$

differ by a global phase (which is $-1$ in this example), and are therefore considered to be the same state.

On the other hand, the quantum states

$$
\vert + \rangle = \frac{1}{\sqrt{2}} \vert 0 \rangle + \frac{1}{\sqrt{2}} \vert 1 \rangle
\quad\text{and}\quad
\vert - \rangle = \frac{1}{\sqrt{2}} \vert 0 \rangle - \frac{1}{\sqrt{2}} \vert 1 \rangle
$$

do not differ by a global phase.
Although the only difference between the two states is that a plus sign turns into a minus sign, this is not a *global* phase difference but a *relative* phase difference, because it does not affect every vector entry, but only a proper subset of the entries.
This is consistent with what we have already observed in Lesson 1, which is that the states $\vert + \rangle$ and $\vert - \rangle$ can be discriminated perfectly — performing a Hadamard operation and then measuring yields outcome probabilities as follows:

$$
\begin{aligned}
\bigl\vert \langle 0 \vert H \vert + \rangle \bigr\vert^2 = 1 & \hspace{1cm} 
\bigl\vert \langle 0 \vert H \vert - \rangle \bigr\vert^2 = 0 \\[1mm]
\bigl\vert \langle 1 \vert H \vert + \rangle \bigr\vert^2 = 0 & \hspace{1cm} 
\bigl\vert \langle 1 \vert H \vert - \rangle \bigr\vert^2 = 1.
\end{aligned}
$$
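
Both observations can be checked numerically. In the following sketch the phase $\alpha = e^{0.73 i}$ and the random unitary are arbitrary choices made purely for illustration, and `random_unitary` is a hypothetical helper (it produces a valid unitary matrix, though not one drawn from any particular distribution).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n):
    """A unitary obtained from the QR decomposition of a random complex matrix."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

def outcome_probabilities(state):
    """Standard basis measurement probabilities |<a|state>|^2."""
    return np.abs(state) ** 2

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
alpha = np.exp(1j * 0.73)                 # an arbitrary number on the unit circle

# Global phase: identical statistics, even after an arbitrary unitary operation.
U = random_unitary(2)
same_statistics = np.allclose(outcome_probabilities(U @ minus),
                              outcome_probabilities(U @ (alpha * minus)))

# Relative phase: a Hadamard operation followed by a measurement tells |+> and |-> apart.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
probs_plus = outcome_probabilities(H @ plus)
probs_minus = outcome_probabilities(H @ minus)
```

Here `same_statistics` should be true, while `probs_plus` and `probs_minus` reproduce the four probabilities displayed above.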

Here, by the way, we find another advantage of the general description of quantum information based on density matrices over the simplified description based on quantum state vectors.
In the general description of quantum information, the degeneracy in which two quantum state vectors can differ by a global phase, and hence effectively represent the same quantum state, disappears: any two density matrices that differ in any way necessarily represent two distinct quantum states that can be discriminated in a statistical sense.

### 3.2 No-cloning theorem

A theorem known as the *no-cloning theorem* establishes that it is not possible to create a perfect copy of an unknown quantum state.
Here is a statement of the theorem.

<p style="padding-left: 3em; padding-right: 3em;">
    <strong>Theorem (No-cloning theorem).</strong>
    Let $\mathsf{X}$ and $\mathsf{Y}$ be systems sharing the same classical state set $\Sigma$ having at least 
    two elements.
    There does not exist a quantum state $\vert \phi\rangle$ of $\mathsf{Y}$ and a unitary operation $U$ on the
    pair $(\mathsf{X},\mathsf{Y})$ such that
</p>

$$
  U \bigl( \vert \psi \rangle \otimes \vert\phi\rangle\bigr)
  = \vert \psi \rangle \otimes \vert\psi\rangle
  \tag{3.1}
$$

<p style="padding-left: 3em; padding-right: 3em;">
  for every state $\vert \psi \rangle$ of $\mathsf{X}$.
</p>
    
That is, there is no way to initialize the system $\mathsf{Y}$ (to any state $\vert\phi\rangle$ whatsoever) and perform a unitary operation $U$ on the joint system $(\mathsf{X},\mathsf{Y})$ so that the effect is for the state $\vert\psi\rangle$ of $\mathsf{X}$ to be *cloned* — resulting in $(\mathsf{X},\mathsf{Y})$ being in the state
$\vert \psi \rangle \otimes \vert\psi\rangle$.

The proof of this theorem is actually quite simple: it boils down to the observation that the mapping

$$
\vert\psi\rangle \otimes \vert \phi\rangle\mapsto\vert\psi\rangle \otimes \vert \psi\rangle
$$ 

is not linear in $\vert\psi\rangle$.
In particular, because $\Sigma$ has at least two elements, we may choose $a,b\in\Sigma$ with
$a\not=b$.
Now, if there did exist a quantum state $\vert \phi\rangle$ of $\mathsf{Y}$ and a unitary operation $U$ on the pair
$(\mathsf{X},\mathsf{Y})$ for which $(3.1)$ is true for every quantum state $\vert\psi\rangle$ of $\mathsf{X}$, then it would be the case that

$$
U \bigl( \vert a \rangle \otimes \vert\phi\rangle\bigr)
= \vert a \rangle \otimes \vert a\rangle
\quad\text{and}\quad
U \bigl( \vert b \rangle \otimes \vert\phi\rangle\bigr)
= \vert b \rangle \otimes \vert b\rangle.
$$

By linearity, meaning specifically the linearity of the tensor product in the first argument and the linearity of matrix-vector multiplication in the second (vector) argument, we must therefore have

$$
U \biggl(\biggl( \frac{1}{\sqrt{2}}\vert a \rangle + \frac{1}{\sqrt{2}} \vert b\rangle \biggr) \otimes \vert\phi\rangle\biggr)
= \frac{1}{\sqrt{2}} \vert a \rangle \otimes \vert a\rangle
+ \frac{1}{\sqrt{2}} \vert b \rangle \otimes \vert b\rangle.
$$

However, the requirement that $(3.1)$ is true for every quantum state $\vert\psi\rangle$ demands that

$$
\begin{aligned}
  & U \biggl(\biggl( \frac{1}{\sqrt{2}}\vert a \rangle + \frac{1}{\sqrt{2}} \vert b\rangle \biggr) 
  \otimes \vert\phi\rangle\biggr)\\
  & \qquad = \biggl(\frac{1}{\sqrt{2}} \vert a \rangle + \frac{1}{\sqrt{2}} \vert b \rangle\biggr)
  \otimes \biggl(\frac{1}{\sqrt{2}} \vert a \rangle + \frac{1}{\sqrt{2}} \vert b \rangle\biggr)\\
  & \qquad = \frac{1}{2} \vert a \rangle \otimes \vert a\rangle
  + \frac{1}{2} \vert a \rangle \otimes \vert b\rangle
  + \frac{1}{2} \vert b \rangle \otimes \vert a\rangle
  + \frac{1}{2} \vert b \rangle \otimes \vert b\rangle\\
  & \qquad \not= \frac{1}{\sqrt{2}} \vert a \rangle \otimes \vert a\rangle 
  + \frac{1}{\sqrt{2}} \vert b \rangle \otimes \vert b\rangle.
\end{aligned}
$$

Therefore there cannot exist a state $\vert \phi\rangle$ and a unitary operation $U$ for which the equation $(3.1)$ is true for every quantum state vector $\vert \psi\rangle$.

A few remarks concerning the no-cloning theorem are in order.
The first one is that the statement of the no-cloning theorem above is absolute, in the sense that it states that *perfect* cloning is impossible — but it does not say anything about possibly cloning with limited accuracy, where we might succeed in producing an approximate clone (with respect to some way of measuring how similar two different quantum states might be).
There are, in fact, statements of the no-cloning theorem that place limitations on approximate cloning, as well as methods to achieve approximate cloning (with limited accuracy), but we will delay this discussion to a later lesson when the pieces needed to explain approximate cloning are in place.

The second remark is that the no-cloning theorem is a statement about the impossibility of cloning an *arbitrary* state $\vert\psi\rangle$.
We can easily create a clone of any standard basis state, for instance.
For example, we can clone a qubit standard basis state using a controlled-NOT operation:

![Classical copy](images/cNOT-copy.png)

While there is no difficulty in creating a clone of a standard basis state, this does not contradict the no-cloning theorem — this approach of using a controlled-NOT gate would not succeed in creating a clone of the state $\vert + \rangle$, for instance.
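
This can be seen in a small NumPy sketch. It assumes that the control qubit is the first tensor factor and that the target qubit starts in the state $\vert 0\rangle$; the function name `try_to_clone` is hypothetical, and the quantity it returns is simply the squared overlap between the output and a perfect clone.

```python
import numpy as np

# Controlled-NOT with the first tensor factor as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

zero = np.array([1, 0])
one = np.array([0, 1])
plus = (zero + one) / np.sqrt(2)

def try_to_clone(psi):
    """Apply CNOT to |psi>|0> and return |<psi, psi|output>|^2."""
    output = CNOT @ np.kron(psi, zero)
    perfect_clone = np.kron(psi, psi)
    return np.abs(np.vdot(perfect_clone, output)) ** 2

overlaps = {name: try_to_clone(v) for name, v in [("0", zero), ("1", one), ("+", plus)]}
```

The overlap is $1$ for $\vert 0\rangle$ and $\vert 1\rangle$ but only $1/2$ for $\vert +\rangle$: in that case the controlled-NOT produces the entangled state $\vert\phi^+\rangle$ rather than $\vert +\rangle\otimes\vert +\rangle$.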

One final remark about the no-cloning theorem is that it really isn't unique to quantum information: it's also impossible to clone an arbitrary probabilistic state using a classical (deterministic or probabilistic) process.
This is pretty intuitive.
Imagine someone hands you a system in some probabilistic state, but you're not sure what that probabilistic state is.
For example, maybe they randomly generated a number between $1$ and $10$, but they didn't tell you how they generated that number.
There's certainly no physical process through which you can obtain two *independent* copies of that same probabilistic state: all you have in your hands is a number between $1$ and $10$, and there just isn't enough information present for you to somehow reconstruct the probabilities for all of the other outcomes to appear.
Mathematically speaking, a version of the no-cloning theorem for probabilistic states can be proved in exactly the same way as the regular no-cloning theorem (for quantum states).
That is, cloning an arbitrary probabilistic state is a non-linear process, so it cannot possibly be represented by a stochastic matrix.

### 3.3 Non-orthogonal states cannot be perfectly discriminated

Next we will show that if we have two quantum states $\vert\psi\rangle$ and $\vert\phi\rangle$ that are not orthogonal, which means that $\langle \phi\vert\psi\rangle \not=0$, then it is not possible to discriminate them (or, in other words, to tell them apart) perfectly.
In fact, what we will show is something logically equivalent: if we do have a way to discriminate two states perfectly, without any error, then they must be orthogonal.

We will restrict our attention to quantum circuits that consist of any number of unitary gates, followed by a single standard basis measurement of the top qubit.
What we require of a quantum circuit, to say that it perfectly discriminates the states $\vert\psi\rangle$ and $\vert\phi\rangle$, is that the measurement always yields the value $0$ for one of the two states and always yields $1$ for the other state.
To be precise, we shall assume that we have a quantum circuit that operates as the following diagrams suggest:

![Discriminate psi](images/discriminate-psi.png)

![Discriminate phi](images/discriminate-phi.png)

The box labeled $U$ denotes the unitary operation representing the combined action of all of the unitary gates in our circuit, but not including the final measurement.
There is no loss of generality in assuming that the measurement outputs $0$ for $\vert\psi\rangle$ and $1$ for $\vert\phi\rangle$ — the analysis would not differ fundamentally if these output values were reversed.

Notice that in addition to the qubits that initially store either $\vert\psi\rangle$ or $\vert\phi\rangle$, the circuit is free to make use of any number of additional *workspace* qubits.
These qubits are initially each set to the $\vert 0\rangle$ state — so their combined state is denoted $\vert 0\cdots 0\rangle$ in the figures — and these qubits can be used by the circuit in any way that might be beneficial.
It is very common to make use of workspace qubits in quantum circuits like this, as we will see in the next unit.

Now, consider what happens when we run our circuit on the state $\vert\psi\rangle$ (along with the initialized workspace qubits).
The resulting state, immediately prior to the measurement being performed, can be written as

$$
U \bigl(  \vert 0\cdots 0 \rangle \vert \psi \rangle\bigr) 
= \vert \gamma_0\rangle\vert 0 \rangle + \vert \gamma_1 \rangle\vert 1 \rangle 
$$

for two vectors $\vert \gamma_0\rangle$ and $\vert \gamma_1\rangle$ that correspond to all of the qubits except the top qubit.
In general, for such a state the probabilities that a measurement of the top qubit yields the outcomes $0$ and $1$ are as follows:

$$
\operatorname{Pr}(\text{outcome is $0$}) = \bigl\| \vert\gamma_0\rangle \bigr\|^2
\qquad\text{and}\qquad
\operatorname{Pr}(\text{outcome is $1$}) = \bigl\| \vert\gamma_1\rangle \bigr\|^2.
$$

Because we assume that our circuit always outputs $0$ for the state $\vert\psi\rangle$, it must be that $\vert\gamma_1\rangle = 0,$ and so

$$
U \bigl( \vert 0\cdots 0\rangle\vert \psi \rangle  \bigr) 
= \vert\gamma_0\rangle\vert 0 \rangle.
$$

Multiplying both sides of this equation by $U^{\dagger}$ yields this equation:

$$
\vert 0\cdots 0\rangle\vert \psi \rangle   
= U^{\dagger} \bigl( \vert \gamma_0\rangle\vert 0 \rangle \bigr).
\tag{3.2}
$$

Reasoning similarly for $\vert\phi\rangle$ in place of $\vert\psi\rangle$, we conclude that

$$
U \bigl( \vert 0\cdots 0\rangle\vert \phi \rangle  \bigr) 
=  \vert \delta_1\rangle\vert 1 \rangle
$$

for some vector $\vert\delta_1\rangle$, and therefore

$$
\vert 0\cdots 0\rangle\vert \phi \rangle   
= U^{\dagger} \bigl(  \vert \delta_1\rangle\vert 1 \rangle\bigr).
\tag{3.3}
$$

Now let us take the inner product of the vectors represented by the equations $(3.2)$ and $(3.3)$, starting with the representations on the right-hand side of each equation.
We have

$$
\bigl(U^{\dagger} \bigl( \vert \gamma_0\rangle\vert 0 \rangle \bigr)\bigr)^{\dagger}
= 
\bigl( \langle\gamma_0\vert\langle 0\vert \bigr)U
$$

so the inner product of the vector $(3.2)$ with the vector $(3.3)$ is

$$
\bigl( \langle\gamma_0\vert\langle 0\vert \bigr)U U^{\dagger} \bigl(  \vert \delta_1\rangle\vert 1 \rangle\bigr)
= \bigl( \langle\gamma_0\vert\langle 0\vert \bigr) \bigl(  \vert \delta_1\rangle\vert 1 \rangle\bigr)
=  \langle \gamma_0 \vert \delta_1\rangle \langle 0 \vert 1 \rangle = 0.
$$

Here we have used the fact that $U U^{\dagger} = \mathbb{1}$, as well as the fact that the inner product of tensor products is the product of the inner products:

$$
\langle u \otimes v \vert w \otimes x\rangle = \langle u \vert w\rangle \langle v \vert x\rangle
$$

for any choices of these vectors (assuming $\vert u\rangle$ and $\vert w\rangle$ have the same number of entries
and $\vert v\rangle$ and $\vert x\rangle$ have the same number of entries, so that it makes sense to form the inner products $\langle u\vert w\rangle$ and $\langle v\vert x \rangle$).
Notice that the value of the inner product $\langle \gamma_0 \vert \delta_1\rangle$ is irrelevant because it is multiplied by $\langle 0 \vert 1 \rangle = 0$.
This is fortunate because we really don't know much about these two vectors.
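
The identity for inner products of tensor products is easy to confirm numerically. The sketch below uses randomly chosen (unnormalized) vectors, which is an arbitrary choice; the identity does not require unit vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_vector(n):
    return rng.normal(size=n) + 1j * rng.normal(size=n)

u, w = random_vector(3), random_vector(3)   # same number of entries
v, x = random_vector(2), random_vector(2)   # same number of entries

# np.vdot conjugates its first argument, matching the bra in <u|w>.
lhs = np.vdot(np.kron(u, v), np.kron(w, x))     # <u (x) v | w (x) x>
rhs = np.vdot(u, w) * np.vdot(v, x)             # <u|w> <v|x>
identity_holds = np.isclose(lhs, rhs)
```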

Finally, taking the inner product of the vectors $(3.2)$ and $(3.3)$ in terms of the left-hand side of the equations must result in the same zero value, and so

$$
0 = \bigl(  \langle 0\cdots 0\vert\langle \psi\vert \bigr) \bigl( \vert 0\cdots 0\rangle\vert \phi\rangle\bigr)
=  \langle 0\cdots 0 \vert 0\cdots 0 \rangle \langle \psi \vert \phi \rangle = \langle \psi \vert \phi \rangle.
$$

We have concluded what we wanted, which is that $\vert \psi\rangle$ and $\vert\phi\rangle$ are orthogonal:
$\langle \psi \vert \phi \rangle = 0.$

It is possible, by the way, to perfectly discriminate any two states that are orthogonal.
Suppose that the two states to be discriminated are $\vert \phi\rangle$ and $\vert \psi\rangle$, where
$\langle \phi\vert\psi\rangle = 0$.
We can then perfectly discriminate these states by performing the projective measurement described by these matrices, for instance:

$$
\bigl\{
\vert\phi\rangle\langle\phi\vert,\,\mathbb{1} - \vert\phi\rangle\langle\phi\vert
\bigr\}.
$$

For the state $\vert\phi\rangle$, the first outcome is always obtained:

$$
\begin{aligned}
& \bigl\| \vert\phi\rangle\langle\phi\vert \vert\phi\rangle \bigr\|^2 = 
\bigl\| \vert\phi\rangle\langle\phi\vert\phi\rangle \bigr\|^2 = 
\bigl\| \vert\phi\rangle \bigr\|^2 = 1,\\[1mm]
& \bigl\| (\mathbb{1} - \vert\phi\rangle\langle\phi\vert) \vert\phi\rangle \bigr\|^2 = 
\bigl\| \vert\phi\rangle - \vert\phi\rangle\langle\phi\vert\phi\rangle \bigr\|^2 = 
\bigl\| \vert\phi\rangle - \vert\phi\rangle \bigr\|^2 = 0.
\end{aligned}
$$

And, for the state $\vert\psi\rangle$, the second outcome is always obtained:

$$
\begin{aligned}
& \bigl\| \vert\phi\rangle\langle\phi\vert \vert\psi\rangle \bigr\|^2 = 
\bigl\| \vert\phi\rangle\langle\phi\vert\psi\rangle \bigr\|^2 = 
\bigl\| 0 \bigr\|^2 = 0,\\[1mm]
& \bigl\| (\mathbb{1} - \vert\phi\rangle\langle\phi\vert) \vert\psi\rangle \bigr\|^2 = 
\bigl\| \vert\psi\rangle - \vert\phi\rangle\langle\phi\vert\psi\rangle \bigr\|^2 = 
\bigl\| \vert\psi\rangle \bigr\|^2 = 1.
\end{aligned}
$$
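
As an illustration, the following sketch applies this measurement to an orthogonal pair ($\vert +\rangle$ and $\vert -\rangle$) and, for contrast, to a non-orthogonal pair ($\vert 0\rangle$ and $\vert +\rangle$). The function name `discriminate` and the choice of example states are assumptions made for the sake of the demonstration.

```python
import numpy as np

def discriminate(phi, state):
    """Outcome probabilities of the measurement {|phi><phi|, 1 - |phi><phi|} on `state`."""
    P = np.outer(phi, np.conj(phi))
    Q = np.eye(len(phi)) - P
    return np.array([np.linalg.norm(P @ state) ** 2,
                     np.linalg.norm(Q @ state) ** 2])

zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

# Orthogonal pair: each state always produces its own outcome.
orthogonal_case = (discriminate(plus, plus), discriminate(plus, minus))

# Non-orthogonal pair: |+> produces the "wrong" outcome half of the time.
non_orthogonal_case = (discriminate(zero, zero), discriminate(zero, plus))
```

In the orthogonal case the probability vectors are $(1,0)$ and $(0,1)$, whereas in the non-orthogonal case the second state yields $(1/2,1/2)$, so this particular measurement makes errors, consistent with the impossibility result proved above.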


## 4. Three fundamental examples

In this last section of the lesson we will take a look at three fundamentally important examples.
The first two are the *teleportation* and *superdense coding* protocols, which are principally concerned with the transmission of information from a *sender* to a *receiver*.
The third example is an abstract game, called the *CHSH game*, which illustrates a phenomenon in quantum information that is sometimes referred to as *nonlocality*.

### 4.1 Teleportation

Quantum teleportation, or just teleportation for short, is a protocol where a sender (who we will name Alice) transmits a qubit to a receiver (who we will name Bob) by making use of a shared entangled quantum state (one e-bit of entanglement, specifically) along with two bits of classical communication.
The name *teleportation* is meant to be suggestive of the concept in science fiction where matter is transported from one location to another by a futuristic process, but it must be understood that nothing is physically teleported in this sense of the word;
what is teleported in this case is quantum information.

#### The protocol

The set-up is as follows.
As the result of something that happened in the past, Alice and Bob are in possession of a shared pair of qubits $(\mathsf{X},\mathsf{Y})$ in the $\vert\phi^+\rangle$ state: Bob holds $\mathsf{X}$ and Alice holds $\mathsf{Y}$.
That is to say, Alice and Bob share one e-bit of entanglement.

It could be, for instance, that Alice and Bob were in the same location in the past, prepared the qubits $\mathsf{X}$ and $\mathsf{Y}$ in the state $\vert \phi^+ \rangle$, and then each went their own way with their qubit in hand.
Or it could be that a different process, such as one involving a third party or a complex distributed process, was used to establish this shared e-bit.
These details are not part of the teleportation protocol itself, but rather the protocol considers the shared e-bit as a resource that is available to be used.

Alice then comes into possession of a third qubit $\mathsf{Z}$ that she wishes to transmit to Bob.
The state of the qubit $\mathsf{Z}$ is considered to be *unknown* to Alice and Bob, and no assumptions can be made about it. 
Indeed, the qubit $\mathsf{Z}$ might even be entangled with other systems that neither Alice nor Bob can access. 
To say that Alice wishes to transmit the qubit $\mathsf{Z}$ to Bob means that Alice would like Bob to be holding a qubit that is in the same state as $\mathsf{Z}$, having whatever correlations with other systems that $\mathsf{Z}$ had.

Here is a diagram that describes the teleportation protocol itself:

![Teleportation circuit](images/teleportation.png)

In words, it is as follows:

1. Alice performs a controlled-NOT operation on the pair $(\mathsf{Y},\mathsf{Z})$, with $\mathsf{Z}$ being the control and $\mathsf{Y}$ being the target, and then performs a Hadamard operation on $\mathsf{Z}$.
   She then measures both $\mathsf{Y}$ and $\mathsf{Z}$, with respect to a standard basis measurement in both cases, and transmits the classical outcomes to Bob.
   Let us refer to the outcome of the measurement of $\mathsf{Y}$ as $a$ and the outcome of the measurement of $\mathsf{Z}$ as $b$.
   
2. Bob receives $a$ and $b$ from Alice, and depending on the values of these bits he performs these operations:
   - If $a = 1$, then Bob performs a bit flip (or $X$ gate) on his qubit $\mathsf{X}$.
   - If $b = 1$, then Bob performs a phase flip (or $Z$ gate) on his qubit $\mathsf{X}$.
   
   That is, conditioned on $ab$ being $00$, $01$, $10$, or $11$, Bob performs one of the operations $\mathbb{1},$ $Z,$ $X,$ or $ZX$ on the qubit $\mathsf{X}$.
   
That is the complete description of the teleportation protocol.
The analysis that follows reveals that when it is complete, the qubit $\mathsf{X}$ will be in whatever state $\mathsf{Z}$ was in prior to the protocol being executed, including whatever correlations it had with any other systems — which is to say that the protocol has effectively implemented a perfect qubit communication channel, where the state of $\mathsf{Z}$ has been "teleported" into $\mathsf{X}$.

Before proceeding to the analysis itself, let us note that this protocol does not succeed in cloning the state of $\mathsf{Z}$, which we already know is impossible by the no-cloning theorem:
when the protocol is finished, the state of the qubit $\mathsf{Z}$ will have changed from its original value to $\vert b\rangle$ as a result of the measurement performed on it.
We also note that the e-bit of entanglement has been "burned" in the process: the state of $\mathsf{Y}$ has changed to $\vert a\rangle$ and is no longer entangled with $\mathsf{X}$ or any other system.
This is the price of teleportation.

#### Analysis of the protocol

Let us assume that the state of Alice's qubit $\mathsf{Z}$ that she wishes to teleport to Bob is $\vert\psi\rangle = \alpha\vert 0\rangle + \beta\vert 1\rangle$.
The state of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ at the start of the protocol is therefore

$$
\begin{aligned}
& \biggl( \frac{1}{\sqrt{2}} \vert 00 \rangle + \frac{1}{\sqrt{2}} \vert 11 \rangle \biggr)
\biggl(\alpha\vert 0\rangle + \beta\vert 1\rangle \biggr)\\
& \qquad = \frac{1}{\sqrt{2}} \biggl( \alpha \vert 00\rangle \vert0 \rangle + \alpha \vert 11\rangle \vert0\rangle + \beta \vert 00\rangle \vert1\rangle + \beta \vert 11\rangle \vert1\rangle \biggr).
\end{aligned}
$$

First the controlled-NOT gate is applied, which transforms the state to

$$
\frac{1}{\sqrt{2}} \biggl( \alpha \vert 00\rangle \vert0 \rangle + \alpha \vert 11\rangle \vert0\rangle + \beta \vert 01\rangle \vert1\rangle + \beta \vert 10\rangle \vert1\rangle \biggr).
$$

Then the Hadamard gate is applied, which transforms the state to

$$
\frac{1}{\sqrt{2}} \biggl( \alpha \vert 00\rangle \vert + \rangle + \alpha \vert 11\rangle\vert +\rangle + \beta \vert 01\rangle\vert -\rangle + \beta \vert 10\rangle\vert -\rangle \biggr).
$$

Expanding the states $\vert +\rangle$ and $\vert - \rangle$ yields this expression of the state:

$$
\begin{aligned}
&
\frac{1}{\sqrt{2}} \biggl( \alpha \vert 00\rangle \vert + \rangle + \alpha \vert 11\rangle\vert +\rangle + \beta \vert 01\rangle\vert -\rangle + \beta \vert 10\rangle\vert -\rangle \biggr) \\
& \qquad = \frac{1}{2}
\bigl(
\alpha \vert 000 \rangle
+ \alpha \vert 001 \rangle
+ \alpha \vert 110 \rangle
+ \alpha \vert 111 \rangle
+ \beta \vert 010 \rangle
- \beta \vert 011 \rangle
+ \beta \vert 100 \rangle
- \beta \vert 101 \rangle
\bigr)\\
& \qquad = \frac{1}{2}
\bigl(
\alpha \vert 000 \rangle
+ \beta \vert 100 \rangle
+ \alpha \vert 001 \rangle
- \beta \vert 101 \rangle
+ \alpha \vert 110 \rangle
+ \beta \vert 010 \rangle
+ \alpha \vert 111 \rangle
- \beta \vert 011 \rangle
\bigr).
\end{aligned}
$$

Finally, by using the multilinearity of tensor products we obtain this expression of the state:

$$
\begin{aligned}
  & \frac{1}{2} \bigl(\alpha\vert 0 \rangle + \beta \vert 1\rangle \bigr)\vert 00\rangle \\[1mm]
+ & \frac{1}{2} \bigl(\alpha\vert 0 \rangle - \beta \vert 1\rangle \bigr)\vert 01\rangle \\[1mm]
+ & \frac{1}{2} \bigl(\alpha\vert 1 \rangle + \beta \vert 0\rangle \bigr)\vert 10\rangle \\[1mm]
+ & \frac{1}{2} \bigl(\alpha\vert 1 \rangle - \beta \vert 0\rangle \bigr)\vert 11\rangle.
\end{aligned}
$$

Although it might look like something magic has already happened — because the numbers $\alpha$ and $\beta$ moved, in some sense, into the part of the state we associate with the first qubit $\mathsf{X}$ — this is an illusion.
Scalars float freely through tensor products, and all we have done is to express the state in a way that facilitates an analysis of these measurements.

Now let us consider the four possible outcomes of Alice's standard basis measurements, together with the actions that Bob performs as a result:

<p style="padding-left: 3em;">
<ul><li>
The outcome of Alice's measurement is $ab = 00$ with probability
</ul>

$$
\Bigl\| \frac{1}{2}\bigl(\alpha \vert 0\rangle + \beta\vert 1\rangle\bigr) \Bigr\|^2 
= \frac{\vert\alpha\vert^2 + \vert\beta\vert^2}{4} = \frac{1}{4},
$$

<ul style="list-style-type:none"><li>
in which case the state of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ becomes
</ul>

$$
\bigl( \alpha \vert 0 \rangle + \beta \vert 1 \rangle \bigr) \vert 00 \rangle.
$$

<ul style="list-style-type:none"><li>
Bob does nothing in this case, and so this is the final state of these three qubits.
</ul>

<ul><li>
The outcome of Alice's measurement is $ab = 01$ with probability
</ul>

$$
\Bigl\| \frac{1}{2}\bigl(\alpha \vert 0\rangle - \beta\vert 1\rangle\bigr) \Bigr\|^2 
= \frac{\vert\alpha\vert^2 + \vert{-\beta}\vert^2}{4} = \frac{1}{4},
$$

<ul style="list-style-type:none"><li>
in which case the state of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ becomes
</ul>

$$
\bigl( \alpha \vert 0 \rangle - \beta \vert 1 \rangle \bigr) \vert 01 \rangle.
$$

<ul style="list-style-type:none"><li>
In this case Bob applies a $Z$ gate to $\mathsf{X}$, leaving $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ in the state
</ul>

$$
\bigl( \alpha \vert 0 \rangle + \beta \vert 1 \rangle \bigr) \vert 01 \rangle.
$$

<ul><li>
The outcome of Alice's measurement is $ab = 10$ with probability
</ul>

$$
\Bigl\| \frac{1}{2}\bigl(\alpha \vert 1\rangle + \beta\vert 0\rangle\bigr) \Bigr\|^2 
= \frac{\vert\alpha\vert^2 + \vert\beta\vert^2}{4} = \frac{1}{4},
$$

<ul style="list-style-type:none"><li>
in which case the state of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ becomes
</ul>

$$
\bigl( \alpha \vert 1 \rangle + \beta \vert 0 \rangle \bigr) \vert 10 \rangle.
$$

<ul style="list-style-type:none"><li>
In this case, Bob applies an $X$ gate to the qubit $\mathsf{X}$, leaving $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ in the state
</ul>

$$
\bigl( \alpha \vert 0 \rangle + \beta \vert 1 \rangle \bigr) \vert 10 \rangle.
$$

<ul><li>
The outcome of Alice's measurement is $ab = 11$ with probability
</ul>

$$
\Bigl\| \frac{1}{2}\bigl(\alpha \vert 1\rangle - \beta\vert 0\rangle\bigr) \Bigr\|^2 
= \frac{\vert\alpha\vert^2 + \vert{-\beta}\vert^2}{4} = \frac{1}{4},
$$

<ul style="list-style-type:none"><li>
in which case the state of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ becomes
</ul>

$$
\bigl( \alpha \vert 1 \rangle - \beta \vert 0 \rangle \bigr) \vert 11 \rangle.
$$

<ul style="list-style-type:none"><li>
In this case, Bob performs the operation $ZX$ on the qubit $\mathsf{X}$, leaving $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ in the state
</ul>

$$
\bigl( \alpha \vert 0 \rangle + \beta \vert 1 \rangle \bigr) \vert 11 \rangle.
$$
</p>

Thus, we see that in all four cases, Bob's qubit $\mathsf{X}$ is left in the state
$\alpha\vert 0\rangle + \beta\vert 1\rangle$ at the end of the protocol, as we wanted to show.

Now we know that if the initial state of the qubit $\mathsf{Z}$ is $\alpha \vert 0\rangle + \beta \vert 1\rangle$, then the final state of $\mathsf{X}$ is the same state $\alpha \vert 0\rangle + \beta \vert 1\rangle$, while the qubits $\mathsf{Y}$ and $\mathsf{Z}$ are left in one of the four states $\vert 00\rangle$, $\vert 01\rangle$, $\vert 10\rangle$, or $\vert 11\rangle$, each with probability $1/4$, depending upon the measurement outcomes that Alice obtained.
Notice that Alice's measurements yield absolutely no information about the state $\alpha \vert 0\rangle + \beta \vert 1\rangle$: the probability of each of the four outcomes is $1/4$, irrespective of $\alpha$ and $\beta$.
This, in fact, is essential for teleportation to work: extracting information from an unknown quantum state necessarily causes a disturbance of that state in general, but here Bob obtains the state without it being disturbed.
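
The analysis above can be checked numerically. The following is a minimal sketch in plain Python rather than a Qiskit implementation: the amplitudes of $(\mathsf{X},\mathsf{Y},\mathsf{Z})$ are stored in a list indexed by $4x + 2y + z$, and the particular values `alpha = 0.6` and `beta = 0.8j` are arbitrary choices made for this example.

```python
import math

alpha, beta = 0.6, 0.8j   # example amplitudes (an assumption); |alpha|^2 + |beta|^2 = 1
s = 1 / math.sqrt(2)

# Amplitudes of (X, Y, Z), indexed by 4*x + 2*y + z:
# |phi+> on (X, Y) tensored with alpha|0> + beta|1> on Z.
state = [0j] * 8
for x, y in [(0, 0), (1, 1)]:
    state[4 * x + 2 * y + 0] = s * alpha
    state[4 * x + 2 * y + 1] = s * beta

# Controlled-NOT with Z as the control and Y as the target.
after_cnot = [0j] * 8
for x in range(2):
    for y in range(2):
        for z in range(2):
            after_cnot[4 * x + 2 * (y ^ z) + z] = state[4 * x + 2 * y + z]

# Hadamard operation on Z.
after_h = [0j] * 8
for x in range(2):
    for y in range(2):
        a0, a1 = after_cnot[4 * x + 2 * y], after_cnot[4 * x + 2 * y + 1]
        after_h[4 * x + 2 * y] = s * (a0 + a1)
        after_h[4 * x + 2 * y + 1] = s * (a0 - a1)

results = {}
for a in range(2):        # outcome of measuring Y
    for b in range(2):    # outcome of measuring Z
        # Unnormalized state of X conditioned on the outcome ab.
        x0, x1 = after_h[2 * a + b], after_h[4 + 2 * a + b]
        prob = abs(x0) ** 2 + abs(x1) ** 2
        x0, x1 = x0 / math.sqrt(prob), x1 / math.sqrt(prob)
        if a == 1:        # Bob's X gate swaps the two amplitudes
            x0, x1 = x1, x0
        if b == 1:        # Bob's Z gate negates the amplitude of |1>
            x1 = -x1
        results[(a, b)] = (prob, x0, x1)
        print(a, b, round(prob, 6), x0, x1)
```

For every outcome $ab$, the stored probability should be $1/4$ and the final amplitudes of $\mathsf{X}$ should equal `alpha` and `beta` up to rounding, matching the four cases worked out above.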

More generally, we may consider the possibility that the qubit $\mathsf{Z}$ was initially entangled with some other system, say $\mathsf{W}$.
Essentially the same analysis reveals that the protocol functions as suggested earlier: it acts like a communication channel, so that at the end of the protocol the qubit $\mathsf{X}$ is entangled with $\mathsf{W}$ in the same way that $\mathsf{Z}$ was at the start of the protocol.

Let us suppose that the state of the pair $(\mathsf{Z},\mathsf{W})$ is initially given by a quantum state vector of the form

$$
\vert 0 \rangle_{\mathsf{Z}} \vert \phi_0\rangle_{\mathsf{W}} 
+ \vert 1 \rangle_{\mathsf{Z}} \vert \phi_1\rangle_{\mathsf{W}}.
$$

To analyze what happens when the teleportation protocol is run, it is helpful to permute the systems: we will consider the state of the systems in the order $(\mathsf{X},\mathsf{W},\mathsf{Y},\mathsf{Z})$.
At the start of the protocol, the state of these systems is as follows:

$$
\begin{aligned}
& \biggl( 
  \frac{1}{\sqrt{2}} \vert 00 \rangle_{\mathsf{XY}} + \frac{1}{\sqrt{2}} \vert 11 \rangle_{\mathsf{XY}} \biggr)
  \biggl(
  \vert 0\rangle_{\mathsf{Z}}\vert\phi_0\rangle_{\mathsf{W}} 
  + \vert 1\rangle_{\mathsf{Z}}\vert\phi_1\rangle_{\mathsf{W}}\biggr)\\
& \qquad = \frac{1}{\sqrt{2}} 
\biggl( 
  \vert 0\rangle_{\mathsf{X}} \vert \phi_0 \rangle_{\mathsf{W}}\vert 00\rangle_{\mathsf{YZ}} 
+ \vert 1\rangle_{\mathsf{X}} \vert \phi_0 \rangle_{\mathsf{W}}\vert 10\rangle_{\mathsf{YZ}}
+ \vert 0\rangle_{\mathsf{X}} \vert \phi_1 \rangle_{\mathsf{W}}\vert 01\rangle_{\mathsf{YZ}}
+ \vert 1\rangle_{\mathsf{X}} \vert \phi_1 \rangle_{\mathsf{W}}\vert 11\rangle_{\mathsf{YZ}}
\biggr)
\end{aligned}
$$

First the controlled-NOT gate is applied, which transforms the state to

$$
\frac{1}{\sqrt{2}} \biggl( 
  \vert 0\rangle_{\mathsf{X}} \vert\phi_0 \rangle_{\mathsf{W}} \vert 0\rangle_{\mathsf{Y}} \vert 0\rangle_{\mathsf{Z}} 
+ \vert 1\rangle_{\mathsf{X}} \vert\phi_0 \rangle_{\mathsf{W}} \vert 1\rangle_{\mathsf{Y}} \vert 0\rangle_{\mathsf{Z}} 
+ \vert 0\rangle_{\mathsf{X}} \vert\phi_1 \rangle_{\mathsf{W}} \vert 1\rangle_{\mathsf{Y}} \vert 1\rangle_{\mathsf{Z}} 
+ \vert 1\rangle_{\mathsf{X}} \vert\phi_1 \rangle_{\mathsf{W}} \vert 0\rangle_{\mathsf{Y}} \vert 1\rangle_{\mathsf{Z}} 
\biggr)
$$

Then the Hadamard gate is applied, which transforms the state to

$$
\frac{1}{\sqrt{2}} \biggl( 
  \vert 0\rangle_{\mathsf{X}}\vert \phi_0 \rangle_{\mathsf{W}} \vert 0\rangle_{\mathsf{Y}} \vert +\rangle_{\mathsf{Z}} 
+ \vert 1\rangle_{\mathsf{X}}\vert \phi_0 \rangle_{\mathsf{W}} \vert 1\rangle_{\mathsf{Y}} \vert +\rangle_{\mathsf{Z}} 
+ \vert 0\rangle_{\mathsf{X}}\vert \phi_1 \rangle_{\mathsf{W}} \vert 1\rangle_{\mathsf{Y}} \vert -\rangle_{\mathsf{Z}} 
+ \vert 1\rangle_{\mathsf{X}}\vert \phi_1 \rangle_{\mathsf{W}} \vert 0\rangle_{\mathsf{Y}} \vert -\rangle_{\mathsf{Z}} 
\biggr)
$$

If we expand this state and then simplify using basic facts about tensor products that were covered in the previous lesson, we obtain this expression of the state:

$$
\begin{aligned}
  & \frac{1}{2}
  \bigl(\vert 0\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} 
  + \vert 1\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}} \bigr) 
  \vert 00\rangle_{\mathsf{YZ}}\\[1mm] 
+\, & \frac{1}{2}
  \bigl(\vert 0\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} 
  - \vert 1\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}} \bigr)
  \vert 01\rangle_{\mathsf{YZ}}\\[1mm]
+\, & \frac{1}{2}
  \bigl(\vert 1\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} 
  + \vert 0\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}} \bigr)
  \vert 10\rangle_{\mathsf{YZ}}\\[1mm]
+\, & \frac{1}{2}
  \bigl(\vert 1\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} 
  - \vert 0\rangle_{\mathsf{X}}\vert\phi_1\rangle _{\mathsf{W}}\bigr)
  \vert 11\rangle_{\mathsf{YZ}}.
\end{aligned}
$$

Proceeding as we did above, where we consider the four different possible outcomes of Alice's measurements along with the corresponding actions performed by Bob, we find that at the end of the protocol, the state of $(\mathsf{X},\mathsf{W})$ is always

$$
\vert 0 \rangle \vert \phi_0\rangle + \vert 1 \rangle \vert \phi_1\rangle.
$$

So, teleportation succeeds in creating a perfect quantum communication channel, effectively transmitting the contents of the qubit $\mathsf{Z}$ into $\mathsf{X}$ and preserving all correlations with any other systems.
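
As a sketch of one of the four cases summarized above, consider the outcome $ab = 11$; the other three cases follow the same pattern. This outcome occurs with probability $1/4$ and leaves the pair $(\mathsf{X},\mathsf{W})$ in the state $\vert 1\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} - \vert 0\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}}$. Bob's operation $ZX$ on $\mathsf{X}$ (first $X$, then $Z$) then acts as follows:

$$
\begin{aligned}
\vert 1\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} - \vert 0\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}}
& \;\overset{X}{\longmapsto}\;
\vert 0\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} - \vert 1\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}}\\[1mm]
& \;\overset{Z}{\longmapsto}\;
\vert 0\rangle_{\mathsf{X}}\vert\phi_0\rangle_{\mathsf{W}} + \vert 1\rangle_{\mathsf{X}}\vert\phi_1\rangle_{\mathsf{W}}.
\end{aligned}
$$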


### 3.2 Superdense coding

Superdense coding is a protocol that, in some sense, achieves a complementary aim to teleportation: rather than allowing for the transmission of one qubit using two classical bits of communication (at the cost of one e-bit of entanglement), it allows for the transmission of two classical bits using one qubit of quantum communication (again at the cost of one e-bit of entanglement).

In greater detail, we again have a sender (Alice) and a receiver (Bob), and we assume that Alice and Bob share one e-bit of entanglement — which is to say that Alice holds a qubit $\mathsf{Y}$, Bob holds a qubit $\mathsf{X}$, and the state of the pair $(\mathsf{X},\mathsf{Y})$ is $\vert \phi^+\rangle$.
This time, Alice wishes to transmit *two classical bits* to Bob, and she will accomplish this by sending him one qubit.

Let us denote the two bits that Alice wishes to transmit to Bob by $a$ and $b$, so we have $a,b\in\{0,1\}$.
Here's what Alice does:

1. Alice first checks to see if $b=1$. If $b=1$, she performs a $Z$ gate on her qubit $\mathsf{Y}$ (and if $b=0$ she doesn't).

2. Alice then checks to see if $a=1$. If $a=1$, she performs an $X$ gate on her qubit $\mathsf{Y}$ (and if $a=0$ she doesn't).

Alice then sends her qubit $\mathsf{Y}$ to Bob.

What Bob does when he receives the qubit $\mathsf{Y}$ is to first perform a controlled-NOT gate, with $\mathsf{Y}$ being the control and $\mathsf{X}$ being the target, and then to apply a Hadamard gate to $\mathsf{Y}$.
He then measures $\mathsf{X}$ to obtain $a$ and $\mathsf{Y}$ to obtain $b$, using standard basis measurements in both cases.

The following circuit diagram illustrates the protocol:

![Superdense coding circuit](images/superdense-coding.png)

The idea behind this protocol is actually pretty simple: by applying $\mathbb{1}$, $X$, $Z$, or $XZ$ to $\mathsf{Y}$, Alice can shift the state of $(\mathsf{X},\mathsf{Y})$ to any one of the four Bell states that she chooses:

$$
\begin{aligned}
(\mathbb{1} \otimes \mathbb{1}) \vert \phi^+ \rangle & = \vert \phi^+\rangle \\
(\mathbb{1} \otimes Z) \vert \phi^+ \rangle & = \vert \phi^-\rangle \\
(\mathbb{1} \otimes X) \vert \phi^+ \rangle & = \vert \psi^+\rangle \\
(\mathbb{1} \otimes XZ) \vert \phi^+ \rangle & = \vert \psi^-\rangle
\end{aligned}
$$

Bob's actions have the following effects on the four Bell states:

$$
\begin{aligned}
\vert \phi^+\rangle & \mapsto \vert 00\rangle\\
\vert \phi^-\rangle & \mapsto \vert 01\rangle\\
\vert \psi^+\rangle & \mapsto \vert 10\rangle\\
\vert \psi^-\rangle & \mapsto -\vert 11\rangle\\
\end{aligned}
$$

It is then a matter of checking each case:

 - If $ab = 00,$ then the state of $(\mathsf{X},\mathsf{Y})$ when Bob receives $\mathsf{Y}$ is $\vert \phi^+\rangle.$ He transforms this state into $\vert 00\rangle$ and obtains $ab = 00.$

 - If $ab = 01,$ then the state of $(\mathsf{X},\mathsf{Y})$ when Bob receives $\mathsf{Y}$ is $\vert \phi^-\rangle.$ He transforms this state into $\vert 01\rangle$ and obtains $ab = 01.$

 - If $ab = 10,$ then the state of $(\mathsf{X},\mathsf{Y})$ when Bob receives $\mathsf{Y}$ is $\vert \psi^+\rangle.$ He transforms this state into $\vert 10\rangle$ and obtains $ab = 10.$

 - If $ab = 11,$ then the state of $(\mathsf{X},\mathsf{Y})$ when Bob receives $\mathsf{Y}$ is $\vert \psi^-\rangle.$ He transforms this state into $-\vert 11\rangle$ and obtains $ab = 11.$ (The negative-one phase factor doesn't have any effect here.)
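
The case analysis can also be checked with a short simulation. This is a sketch in plain Python, not a Qiskit implementation: the amplitudes of $(\mathsf{X},\mathsf{Y})$ are kept in a list indexed by $2x + y$, and the function name `superdense` is a hypothetical helper introduced for this example.

```python
import math

s = 1 / math.sqrt(2)

def superdense(a, b):
    """Outcome probabilities of Bob's measurement of (X, Y) when Alice encodes bits a, b."""
    # |phi+> on (X, Y), amplitudes indexed by 2*x + y.
    state = [s, 0.0, 0.0, s]
    if b == 1:   # Alice's Z gate on Y negates the amplitudes with y = 1
        state = [state[0], -state[1], state[2], -state[3]]
    if a == 1:   # Alice's X gate on Y swaps y = 0 and y = 1
        state = [state[1], state[0], state[3], state[2]]
    # Bob's controlled-NOT with Y as the control and X as the target.
    after_cnot = [0.0] * 4
    for x in range(2):
        for y in range(2):
            after_cnot[2 * (x ^ y) + y] = state[2 * x + y]
    # Bob's Hadamard gate on Y.
    final = [0.0] * 4
    for x in range(2):
        final[2 * x] = s * (after_cnot[2 * x] + after_cnot[2 * x + 1])
        final[2 * x + 1] = s * (after_cnot[2 * x] - after_cnot[2 * x + 1])
    return [amp ** 2 for amp in final]

decoded = {}
for a in range(2):
    for b in range(2):
        probs = superdense(a, b)
        # Index 2*x + y of the most likely outcome, where X yields a and Y yields b.
        decoded[(a, b)] = max(range(4), key=lambda i: probs[i])
        print(a, b, [round(p, 6) for p in probs])
```

In each of the four cases, all of the probability should sit on the index $2a + b$, meaning that Bob's measurements of $\mathsf{X}$ and $\mathsf{Y}$ return $a$ and $b$.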


### 3.3 The CHSH game

The last example we will discuss in this lesson is an example of a game called the CHSH game.
It is named after John Clauser, Michael Horne, Abner Shimony, and Richard Holt.
Although they did not describe it as a game, this is a very natural and intuitive way to view it.
Specifically, it is an example of a *cooperative game*, where two individuals (Alice and Bob) work together to achieve a particular outcome.

The basic assumptions behind the game are as follows:

 - Alice and Bob are in different locations and cannot communicate with one another.
   (Perhaps they are on distant planets and the impossibility of faster than light communication denies
   them the opportunity to communicate while the game is played.)

 - They can, however, prepare in advance for the game in any way that they wish, including the possibility that they
   share a joint quantum state of two systems and agree on exactly how they will coordinate their actions.

And here is the CHSH game itself:

 - Alice is presented with a bit $x$ and Bob is presented with a bit $y$.
   Both $x$ and $y$ are generated uniformly at random.
   
 - Alice must respond with a bit $a$ and Bob must respond with a bit $b$.
 
 - Alice and Bob win if $a \oplus b = x \wedge y$ and they lose if $a \oplus b \not= x \wedge y.$
   
Just like earlier in the lesson, $a \oplus b$ is the XOR of $a$ and $b$ and $x\wedge y$ is the AND of $x$ and $y$.
   
The challenge for Alice and Bob lies in the fact that $x$ and $y$ are chosen randomly, so despite the fact that they may have coordinated their strategies, neither of them has any knowledge of the bit that the other one received.
An alternative way of describing the winning condition is that their answer bits $a$ and $b$ should be equal, except in the case that both $x$ and $y$ are equal to $1$, in which case $a$ and $b$ should not be equal.

#### Classical strategies

If we think first about *deterministic* strategies for Alice and Bob, where they determine their answers according to functions $a = f(x)$ and $b = g(y)$, there aren't that many possibilities: there are $16$ of them, since Alice and Bob each choose one of the four functions from one bit to one bit.
Going through them one by one reveals that none of them wins the game every time: for at least one of the four choices of $(x,y)\in\{(0,0),(0,1),(1,0),(1,1)\}$, the answers $a=f(x)$ and $b=g(y)$ will lead to a loss.
To be precise, the largest winning probability for any deterministic strategy of the form just described is $3/4$.
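
The one-by-one check is easy to automate. In the sketch below, a function from one bit to one bit is represented by the pair of its values on the inputs 0 and 1; this encoding and the variable names are choices made for this example.

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))      # the four question pairs (x, y)
functions = list(product([0, 1], repeat=2))   # a function is stored as (f(0), f(1))

best = 0.0
for f in functions:          # Alice's function
    for g in functions:      # Bob's function
        wins = sum((f[x] ^ g[y]) == (x & y) for x, y in inputs)
        best = max(best, wins / 4)

print(len(functions) ** 2, best)   # prints: 16 0.75
```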

We can also reason about this analytically.
Let us assume toward contradiction that there is a deterministic strategy, represented by functions $f$ and $g$ from one bit to one bit as described above, that wins for every $(x,y)\in\{(0,0),(0,1),(1,0),(1,1)\}$.

Because the strategy wins for $(x,y) = (0,0)$, we see that $f(0) = g(0)$ — Alice and Bob's answers must agree in this case.
Similarly, because it wins for $(x,y) = (0,1)$ we conclude $f(0) = g(1)$, and because it wins for $(x,y)=(1,0)$ we conclude $f(1) = g(0)$.
Thus, $f(0) = g(0) = f(1) = g(1)$.
However, because the strategy wins for $(x,y) = (1,1)$, we conclude that $f(1) \not= g(1)$.

At this point we have obtained a contradiction: $f(1) = g(1)$ and $f(1)\not=g(1)$.
That contradiction stemmed from the assumption of a strategy that always wins, and so we conclude that there cannot be such a strategy.

An example of a strategy that wins for three out of the four possible choices of
$(x,y)\in\{(0,0),(0,1),(1,0),(1,1)\}$ is one where Alice and Bob both output 0 all of the time, regardless of $x$ and $y$.
The only input on which this strategy loses is $(x,y) = (1,1)$.

So, a deterministic strategy, where Alice and Bob answer according to functions $f$ and $g$, wins with probability at most $3/4$, and there are strategies that achieve this winning probability.

But could a *probabilistic* strategy do better?
What we mean is a strategy where Alice and Bob still can't communicate, but where they can use randomness.
We also allow them to agree beforehand on a shared random state, meaning that they each have a system and the two systems are somehow randomly correlated.

The answer is that such a strategy cannot do better.
Just like we observed when we remarked that probabilistic operations can be thought of as random choices of deterministic operations in Lesson 1, any sort of probabilistic strategy of the sort just suggested can always be viewed as a random selection of a deterministic strategy.
That means we're effectively just averaging over a collection of strategies that each win with probability at most $3/4$, so the end result is that our probabilistic strategy wins with probability at most $3/4$.
In simple terms, the average is never larger than the maximum.
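
To spell out the averaging step, suppose (as notation introduced only for this illustration) that all of the randomness used by Alice and Bob, shared or private, is summarized by a value $r$ that occurs with probability $p(r)$, and that for each $r$ they answer according to deterministic functions $f_r$ and $g_r$. Then

$$
\operatorname{Pr}[\text{win}]
= \sum_r p(r)\, \operatorname{Pr}[\text{win} \mid f_r, g_r]
\leq \sum_r p(r) \cdot \frac{3}{4}
= \frac{3}{4}.
$$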



#### A better quantum strategy

We will now describe a quantum strategy for the CHSH game that wins with a higher probability than $3/4$.

The strategy makes use of these four unitary matrices:

$$
\begin{aligned}
A_0 & = 
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}\\[2mm]
A_1 & = 
\begin{pmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{pmatrix}\\[2mm]
B_0 & = 
\begin{pmatrix}
\frac{\sqrt{2 + \sqrt{2}}}{2} & -\frac{\sqrt{2 - \sqrt{2}}}{2}\\
\frac{\sqrt{2 - \sqrt{2}}}{2} & \frac{\sqrt{2 + \sqrt{2}}}{2}
\end{pmatrix}\\[2mm]
B_1 & = 
\begin{pmatrix}
\frac{\sqrt{2 + \sqrt{2}}}{2} & \frac{\sqrt{2 - \sqrt{2}}}{2}\\
-\frac{\sqrt{2 - \sqrt{2}}}{2} & \frac{\sqrt{2 + \sqrt{2}}}{2}
\end{pmatrix}
\end{aligned}
$$

The strategy is as follows, assuming as before that Alice's input bit is $x$ and Bob's input bit is $y$:

 - Prior to the start of the game, Alice and Bob share a pair of qubits $(\mathsf{X},\mathsf{Y})$ in the $\vert\phi^+\rangle$ state.

 - Alice performs the unitary operation $A_x$ on her qubit $\mathsf{X}$, measures with respect to the standard basis, and outputs the result.

 - Bob performs the unitary operation $B_y$ on his qubit $\mathsf{Y}$, measures with respect to the standard basis, and outputs the result.

Here is a quantum circuit diagram that illustrates this strategy:

![CHSH game circuit](images/CHSH.png)

#### Analysis of the strategy

Now we will figure out exactly how well the strategy performs.
First let us write

$$
\alpha =  \frac{\sqrt{2 + \sqrt{2}}}{2} \qquad\text{and}\qquad
\beta = \frac{\sqrt{2 - \sqrt{2}}}{2},
$$

which allows us to express $B_0$ and $B_1$ more succinctly as

$$
B_0 = 
\begin{pmatrix}
\alpha & -\beta\\
\beta & \alpha
\end{pmatrix}
\quad\text{and}\quad
B_1 = 
\begin{pmatrix}
\alpha & \beta\\
-\beta & \alpha
\end{pmatrix}.
$$

It is the case that

$$
\alpha = \cos\Bigl(\frac{\pi}{8}\Bigr) 
\qquad \text{and}\qquad 
\beta = \sin\Bigl(\frac{\pi}{8}\Bigr);
$$

these are both positive real numbers and we can approximate them as $\alpha \approx 0.9239$ and $\beta \approx 0.3827$.
Rewriting the matrices $B_0$ and $B_1$ again as

$$
\begin{aligned}
& B_0 = 
\begin{pmatrix}
  \cos\bigl(\frac{\pi}{8}\bigr) & -\sin\bigl(\frac{\pi}{8}\bigr)\\
  \sin\bigl(\frac{\pi}{8}\bigr) & \cos\bigl(\frac{\pi}{8}\bigr)
\end{pmatrix}\\[2mm]
& B_1 = 
\begin{pmatrix}
\cos\bigl(\frac{\pi}{8}\bigr) & \sin\bigl(\frac{\pi}{8}\bigr)\\
-\sin\bigl(\frac{\pi}{8}\bigr) & \cos\bigl(\frac{\pi}{8}\bigr)
\end{pmatrix}
=
\begin{pmatrix}
\cos\bigl(-\frac{\pi}{8}\bigr) & -\sin\bigl(-\frac{\pi}{8}\bigr)\\
\sin\bigl(-\frac{\pi}{8}\bigr) & \cos\bigl(-\frac{\pi}{8}\bigr)
\end{pmatrix}
\end{aligned}
$$

reveals that $B_0$ represents a rotation by the angle $\pi/8$ while $B_1$ represents a rotation by the angle $-\pi/8$.
In general, a rotation by the angle $\theta$ is represented by the matrix

$$
R_{\theta} = 
\begin{pmatrix}
  \cos(\theta) & -\sin(\theta)\\
  \sin(\theta) & \cos(\theta)
\end{pmatrix}.
$$

As an aside, from this general form for a rotation matrix we see that $A_0$ and $A_1$ are also rotations:
$A_0 = R_{0}$ and $A_1 = R_{\pi/4}$.

Now, we immediately obtain from the *Pythagorean identity* $\sin^2(\theta) + \cos^2(\theta) = 1,$ which is true for any angle $\theta,$ that $\alpha^2 + \beta^2 = 1.$
We can also verify this directly by squaring the expressions above and adding:

$$
\alpha^2 + \beta^2 = \frac{2 + \sqrt{2}}{4} + \frac{2 - \sqrt{2}}{4} = 1.
$$

We can also use the *double angle formula*

$$
2\sin(\theta)\cos(\theta) = \sin(2\theta),
$$

which again is true for any angle $\theta,$ to conclude that

$$
\alpha\beta = \frac{\sin\bigl(\frac{\pi}{4}\bigr)}{2} = \frac{1}{2\sqrt{2}}.
$$

Again, this can also be verified directly:

$$
\alpha\beta = \frac{\sqrt{(2 + \sqrt{2})(2 - \sqrt{2})}}{4} 
= \frac{\sqrt{4-2}}{4}
= \frac{1}{2\sqrt{2}}.
$$

Putting these observations together, we can conclude that these two equations are true:

$$
\frac{(\alpha + \beta)^2}{2}
= 
\frac{\alpha^2 + \beta^2}{2} + \alpha\beta
= \frac{1}{2} + \frac{1}{2\sqrt{2}} = \alpha^2
$$

and

$$
\frac{(\alpha - \beta)^2}{2}
= 
\frac{\alpha^2 + \beta^2}{2} - \alpha\beta
= \frac{1}{2} - \frac{1}{2\sqrt{2}} = \beta^2.
$$

We will make use of these two equations in just a few moments.
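
For readers who would like a quick sanity check, the identities derived so far can be confirmed numerically. This sketch uses only Python's `math` module; the dictionary `checks` and its labels are introduced just for this example.

```python
import math

alpha = math.sqrt(2 + math.sqrt(2)) / 2
beta = math.sqrt(2 - math.sqrt(2)) / 2

# Each entry compares the two sides of one identity from the text.
checks = {
    "alpha = cos(pi/8)": math.isclose(alpha, math.cos(math.pi / 8)),
    "beta = sin(pi/8)": math.isclose(beta, math.sin(math.pi / 8)),
    "alpha^2 + beta^2 = 1": math.isclose(alpha**2 + beta**2, 1),
    "alpha*beta = 1/(2*sqrt(2))": math.isclose(alpha * beta, 1 / (2 * math.sqrt(2))),
    "(alpha+beta)^2/2 = alpha^2": math.isclose((alpha + beta) ** 2 / 2, alpha**2),
    "(alpha-beta)^2/2 = beta^2": math.isclose((alpha - beta) ** 2 / 2, beta**2),
}
for name, ok in checks.items():
    print(name, ok)
```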

Next let us expand the following tensor products:

$$
\begin{aligned}
A_0 \otimes B_0 =
\begin{pmatrix}
\alpha & -\beta & 0 & 0 \\
\beta & \alpha & 0 & 0 \\
 0 & 0 & \alpha & -\beta \\
 0 & 0 & \beta & \alpha 
\end{pmatrix} & \qquad
A_1 \otimes B_0  =
\frac{1}{\sqrt{2}}\begin{pmatrix}
\alpha & -\beta & -\alpha & \beta \\
\beta & \alpha & -\beta & -\alpha \\
 \alpha & -\beta & \alpha & -\beta \\
 \beta & \alpha & \beta & \alpha 
\end{pmatrix} \\[3mm]
A_0 \otimes B_1 =
\begin{pmatrix}
\alpha & \beta & 0 & 0 \\
-\beta & \alpha & 0 & 0 \\
 0 & 0 & \alpha & \beta \\
 0 & 0 & -\beta & \alpha 
\end{pmatrix} &\qquad
A_1 \otimes B_1 =
\frac{1}{\sqrt{2}}\begin{pmatrix}
\alpha & \beta & -\alpha & -\beta \\
-\beta & \alpha & \beta & -\alpha \\
 \alpha & \beta & \alpha & \beta \\
 -\beta & \alpha & -\beta & \alpha 
\end{pmatrix}
\end{aligned}
$$

We can then calculate the state of Alice and Bob's qubits $(\mathsf{X},\mathsf{Y})$ immediately before they are measured by summing the first and last columns and dividing by $\sqrt{2}$.
This allows us to compute the probabilities of winning in each case, as we will now do.

<p style="padding-left: 3em;">
<ul><li>
If $(x,y) = (0,0)$ then the state of Alice and Bob's qubits immediately prior to measurement is
</ul>

$$
(A_0 \otimes B_0) \vert \phi^+\rangle = \frac{1}{\sqrt{2}} \bigl( \alpha \vert 00\rangle + \beta \vert 01 \rangle - \beta \vert 10 \rangle + \alpha \vert 11\rangle\bigr).
$$

<ul style="list-style-type:none"><li>
The probability that their measurements output the same value (which could be either 0 or 1) is therefore
</ul>

$$
\biggl\vert \frac{\alpha}{\sqrt{2}}\biggr\vert^2 + \biggl\vert \frac{\alpha}{\sqrt{2}}\biggr\vert^2 = \alpha^2
$$

<ul style="list-style-type:none"><li>
while the probability they output different values is
</ul>

$$
\biggl\vert \frac{\beta}{\sqrt{2}} \biggr\vert^2 
+ \biggl\vert - \frac{\beta}{\sqrt{2}}\biggr\vert^2 = \beta^2.
$$

<ul style="list-style-type:none"><li>
In the case $(x,y) = (0,0)$, Alice and Bob win if their bits agree and lose if their bits disagree. 
Therefore, they win with probability $\alpha^2$ and lose with probability $\beta^2$.
</ul>

<ul><li>
If $(x,y) = (0,1)$ then the state of Alice and Bob's qubits immediately prior to measurement is
</ul>

$$
(A_0 \otimes B_1) \vert \phi^+\rangle = \frac{1}{\sqrt{2}} \bigl( \alpha \vert 00\rangle - \beta \vert 01 \rangle
+ \beta \vert 10 \rangle + \alpha \vert 11\rangle\bigr)
$$

<ul style="list-style-type:none"><li>
The probability that their measurements output the same value is therefore
</ul>

$$
\biggl\vert \frac{\alpha}{\sqrt{2}}\biggr\vert^2 + \biggl\vert \frac{\alpha}{\sqrt{2}}\biggr\vert^2 = \alpha^2
$$

<ul style="list-style-type:none"><li>
while the probability they output different values is
</ul>

$$
\biggl\vert -\frac{\beta}{\sqrt{2}}\biggr\vert^2 + \biggl\vert \frac{\beta}{\sqrt{2}}\biggr\vert^2 = \beta^2.
$$

<ul style="list-style-type:none"><li>
In the case $(x,y) = (0,1)$, Alice and Bob win if their bits agree and lose if their bits disagree.
Therefore, they win with probability $\alpha^2$ and lose with probability $\beta^2$.
</ul>

<ul><li>
If $(x,y) = (1,0)$ then the state of Alice and Bob's qubits immediately prior to measurement is
</ul>

$$
(A_1 \otimes B_0) \vert \phi^+\rangle = \frac{1}{2} \bigl( (\alpha+\beta) \vert 00\rangle + (\beta-\alpha) \vert 01    \rangle
+ (\alpha-\beta) \vert 10 \rangle + (\alpha+\beta) \vert 11\rangle\bigr)
$$

<ul style="list-style-type:none"><li>
The probability that their measurements output the same value is therefore
</ul>

$$
\biggl\vert \frac{\alpha+\beta}{2}\biggr\vert^2 + \biggl\vert \frac{\alpha+\beta}{2}\biggr\vert^2  
= \frac{(\alpha + \beta)^2}{2} = \alpha^2
$$

<ul style="list-style-type:none"><li>
while the probability that their measurements output different values is
</ul>

$$
\biggl\vert \frac{\beta - \alpha}{2}\biggr\vert^2 + \biggl\vert \frac{\alpha-\beta}{2}\biggr\vert^2  
= \frac{(\alpha - \beta)^2}{2} = \beta^2.
$$

<ul style="list-style-type:none"><li>
In the case $(x,y) = (1,0)$, Alice and Bob win if their bits agree and lose if their bits disagree.
Therefore, they win with probability $\alpha^2$ and lose with probability $\beta^2$.
</ul>

<ul><li>
If $(x,y) = (1,1)$ then the state of Alice and Bob's qubits immediately prior to measurement is
</ul>

$$
(A_1 \otimes B_1) \vert \phi^+\rangle = \frac{1}{2} \bigl( (\alpha-\beta) \vert 00\rangle - (\alpha+\beta) \vert 01 \rangle
+ (\alpha+\beta) \vert 10 \rangle + (\alpha-\beta) \vert 11\rangle\bigr).
$$

<ul style="list-style-type:none"><li>
The probability that their measurements output the same value is therefore
</ul>

$$
\biggl\vert \frac{\alpha-\beta}{2}\biggr\vert^2 + \biggl\vert \frac{\alpha-\beta}{2}\biggr\vert^2  
= \frac{(\alpha - \beta)^2}{2}
= \beta^2  
$$

<ul style="list-style-type:none"><li>
while the probability that their measurements output different values is
</ul>

$$
\biggl\vert \frac{-(\alpha + \beta)}{2}\biggr\vert^2 + \biggl\vert \frac{\alpha+\beta}{2}\biggr\vert^2  
= \frac{(\alpha + \beta)^2}{2} = \alpha^2.
$$

<ul style="list-style-type:none"><li>
In the case $(x,y) = (1,1)$, Alice and Bob win if their bits disagree and lose if their bits agree; this is the only one of the four cases with that winning condition.
Therefore, they win with probability $\alpha^2$ and lose with probability $\beta^2$.
</ul>
</p>
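
The last two cases use the simplifications $(\alpha+\beta)^2/2 = \alpha^2$ and $(\alpha-\beta)^2/2 = \beta^2$. As a short check, assuming only that $\alpha$ and $\beta$ are positive real numbers whose squares are the values $\frac{1}{2} \pm \frac{1}{2\sqrt{2}}$ used in this analysis, we have

$$
\alpha^2 + \beta^2 = 1
\qquad\text{and}\qquad
\alpha^2\beta^2 = \frac{1}{4} - \frac{1}{8} = \frac{1}{8},
\quad\text{so}\quad
2\alpha\beta = \frac{1}{\sqrt{2}},
$$

and therefore

$$
\frac{(\alpha \pm \beta)^2}{2}
= \frac{\alpha^2 + \beta^2 \pm 2\alpha\beta}{2}
= \frac{1}{2} \pm \frac{1}{2\sqrt{2}},
$$

which is $\alpha^2$ for the plus sign and $\beta^2$ for the minus sign.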

Thus, in all four cases, Alice and Bob win the game with probability

$$
\alpha^2 = \frac{1}{2} + \frac{1}{2\sqrt{2}} \approx 0.85355
$$

and lose with probability

$$
\beta^2 = \frac{1}{2} - \frac{1}{2\sqrt{2}} \approx 0.14645,
$$

so these are also their overall probabilities of winning and losing.
For this particular strategy, it does not actually matter for the analysis that each possible pair $(x,y)\in\{(0,0),(0,1),(1,0),(1,1)\}$ is chosen with probability $1/4$: they win with probability $\alpha^2$ and lose with probability $\beta^2$ regardless of which $x$ and $y$ are selected.
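
The four cases can also be checked numerically. The following is a minimal NumPy sketch, not a definitive implementation. It assumes that $A_x$ and $B_y$ are the real rotations $U_\theta = \vert 0\rangle\langle\psi_\theta\vert + \vert 1\rangle\langle\psi_{\theta+\pi/2}\vert$, with angles $0$ and $\pi/4$ for Alice and $\pi/8$ and $-\pi/8$ for Bob. The helper names `U`, `win_probability`, and `weights` are illustrative only. The ordering and signs of individual amplitudes depend on the convention used, but the probabilities of agreeing and disagreeing do not.

```python
import numpy as np

def U(theta):
    """Real rotation U_theta = |0><psi_theta| + |1><psi_(theta + pi/2)| (assumed form)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

# Assumed strategy: Alice uses angles 0 and pi/4, Bob uses pi/8 and -pi/8.
A = {0: U(0.0), 1: U(np.pi / 4)}
B = {0: U(np.pi / 8), 1: U(-np.pi / 8)}

# The shared e-bit |phi+> = (|00> + |11>) / sqrt(2).
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

def win_probability(x, y):
    """Probability that the answers (a, b) satisfy a XOR b == x AND y."""
    state = np.kron(A[x], B[y]) @ phi_plus
    probs = np.abs(state) ** 2          # outcomes ordered 00, 01, 10, 11
    p_agree = probs[0] + probs[3]
    p_differ = probs[1] + probs[2]
    # Only for (x, y) = (1, 1) do differing answers win.
    return p_differ if (x, y) == (1, 1) else p_agree

alpha_sq = 0.5 + 1 / (2 * np.sqrt(2))

win = {(x, y): win_probability(x, y) for x in (0, 1) for y in (0, 1)}
for xy, p in win.items():
    print(xy, round(float(p), 5))

# An arbitrary non-uniform distribution over the questions (illustrative values).
weights = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
overall = sum(weights[xy] * win[xy] for xy in win)
print("overall:", round(float(overall), 5))
```

Under these assumptions, each of the four printed values and the weighted average should be approximately $0.85355$, which illustrates that the winning probability does not depend on how the question pair is chosen.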