# Classical information problem set

## Overview

### What is the purpose of these problems?

One of the most effective ways to learn is to engage actively with the content of interest. By working through some of these problems, you will have the opportunity to

1. hone your skills and knowledge, and
1. receive feedback so that you can make corrections to your understanding.

### Why *classical* information?

Why should you spend time learning about *classical* information, rather than *quantum*? Mastering the material covered in these activities will serve as an excellent foundation for understanding quantum information and more advanced topics. Indeed, one of the central themes of this course is the **surprising mathematical similarities between quantum and classical information**. This material will give you a sharper intuition for quantum information, in addition to the requisite mathematical skills.

### How to use these practice problems

The problems generally increase in difficulty as you go further down, testing increasingly sophisticated skills. The section titles give an idea of what skill the problems intend to improve. If the problems in a given section are too easy for you, **don't hesitate to skip them and take note that you have mastered the corresponding skill**. If a problem is too challenging, try to figure out what makes the problem challenging and **go back to earlier problems** if necessary.

When hints and answers are provided, try your best to **complete as much of the problem as you can first, then look at the hints and answers** when stuck or to check your work. Sometimes the answers will have further explanation and commentary, which will hopefully add perspective to the problem. Don't worry if you don't understand all of the explanation, or if you did the problem a different way. Hopefully, the hints and answers will add some interactivity to the problems and allow you to **reflect** on the material.

### What is the assumed background for these problems?

The problems here complement the classical information material discussed in the [Single Systems lesson](https://learn.qiskit.org/course/basics/single-systems) of the [Understanding Quantum Information and Computation](https://www.youtube.com/watch?v=0Av89fZenSY) course. The assumed background is a facility with linear algebra, and later problems also ask for mathematical arguments, i.e., proofs. Feel free to try out the problems below and decide whether they are a good fit for you. 

### Learning goals

1. Decide whether or not a given mathematical object can represent a probability vector or stochastic operation.
1. Read and express probabilistic states and operations in both matrix and Dirac representations.
1. Construct an appropriate representation of a physical state or operation as a vector or matrix.
1. Understand the reason for certain choices in mathematical representation of probabilistic phenomena.
1. Reason about general properties of probability vectors and stochastic operations.

## Mathematical Skills Development
<font size = '4'>
The problems in this section help develop the essential skills of quantum information science, particularly learning goals 1-3 of the "Overview" section.
</font>

<br><br>
<b><font size = '6'>Recognizing probability vectors</font></b>
<br><br>
<font size = '3'>
Which of the following vectors represents a valid probabilistic state for a classical system? When deciding, consider the two defining properties of a probability vector:
<br>

1. Is each entry (the probability of each classical state) nonnegative?
1. Do the entries sum to one?
</font>

<!-- ::: q-block.exercise -->

### 1. Basic example

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-1") -->

<!-- ::: .question -->

$$
\begin{pmatrix}
    \frac{1}{3} \\
    \frac{1}{6} \\
    \frac{1}{2}
\end{pmatrix}
$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Valid

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
The vector is a valid probability vector. Each entry is nonnegative, and all three entries sum to one. This vector can model a system with three possible outcomes.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 2. Dirac notation

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-2") -->

<!-- ::: .question -->

$$
\begin{aligned}
    &0.25 \vert\text{sitting}\rangle \\
    &+ 0.50 \vert\text{standing}\rangle \\
    &+ 0.25 \vert\text{lying down}\rangle
\end{aligned}
$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Valid

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
Yes, this is a valid probability vector. For example, it could represent a bear in a zoo, with a 25% chance of seeing it sitting, a 25% chance of seeing it lying down, and a 50% chance of seeing it standing.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 3. Dead or alive?

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-3") -->

<!-- ::: .question -->

$$\frac{1}{5} \vert\text{alive}\rangle + \frac{3}{5} \vert\text{dead}\rangle$$

<!-- ::: -->

<!-- ::: .option -->

Valid

<!-- ::: -->

<!-- ::: .option(correct) -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    This is not a valid probability vector. If "alive" and "dead" are (unfortunately) the relevant states of this living system, then the probabilities sum to $\frac{1}{5} + \frac{3}{5} = \frac{4}{5}$ rather than one. It is as if we are missing information about what other states there could be that complete the pie chart, so to speak. In that case, there should be a term such as $\frac{1}{5} \vert\text{something else}\rangle$. Bottom line, <b>the probabilities do not add to one</b>.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 4. Lots of pi

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-4") -->

<!-- ::: .question -->

$$\frac{\pi}{4}\begin{pmatrix}
1 \\
0\\
0\\
0
\end{pmatrix} + \frac{\pi}{4}\begin{pmatrix}
0 \\
0\\
1\\
0
\end{pmatrix} + (1-\frac{\pi}{2}) \begin{pmatrix}
0 \\
1 \\
0 \\
0
\end{pmatrix}$$

<!-- ::: -->

<!-- ::: .option -->

Valid

<!-- ::: -->

<!-- ::: .option(correct) -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
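    One of the entries of this vector is negative, with value $1-\frac{\pi}{2} \approx -0.57$. A negative number makes no sense as a likelihood, so this cannot be a probability vector. Notice that the entries do sum to one, since $\frac{\pi}{4} + \frac{\pi}{4} + \left(1-\frac{\pi}{2}\right) = 1$, so checking the sum alone is not enough.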
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 5. Lots of parentheses

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-5") -->

<!-- ::: .question -->

$$\frac{1}{2} \times \left[\begin{pmatrix}
2 \\
-1
\end{pmatrix} + \begin{pmatrix}
-1 \\
2
\end{pmatrix}\right]$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Valid

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    Despite the complicated appearance, this vector is a probability vector. In fact, with some simplification it reduces to the simple vector $(1/2, 1/2)$. See if you can show this!
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 6. A valid bit?

<!-- ::: q-quiz(goal="Recognizing-probability-vectors-6") -->

<!-- ::: .question -->

$$(3 \vert0\rangle + 4 \vert1\rangle) \times \frac{1}{7}$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Valid

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    This is a valid probability vector. The probabilities of 0 and 1 are 3/7 and 4/7, respectively, which we see by distributing the 1/7 factor.
</details>

<!-- ::: -->

<!-- ::: q-block -->

### Reflection

<font size = '3'>
It is easy to write and use computer code which will do many of the exercises above for you. What is the value of doing these exercises yourself, if any? How will this help you learn quantum information science?
</font>
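
For instance, the two defining properties of a probability vector take only a few lines of Python to check. The sketch below is our own illustration (standard library only; the name `is_probability_vector` and the tolerance `tol` are choices made here, not part of any course library), applied to the six vectors above written as plain lists:

```python
from fractions import Fraction
import math

def is_probability_vector(entries, tol=1e-12):
    """Check the two defining properties of a probability vector.

    Entries must be nonnegative and sum to one. `tol` absorbs floating-point
    rounding; Fraction inputs are exact, so their sum is compared exactly.
    """
    entries = list(entries)
    nonnegative = all(x >= 0 for x in entries)
    sums_to_one = abs(sum(entries) - 1) <= tol
    return nonnegative and sums_to_one

# The six vectors above, written as lists in the order their entries appear.
exercises = {
    "1. Basic example": [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)],
    "2. Dirac notation": [0.25, 0.50, 0.25],
    "3. Dead or alive?": [Fraction(1, 5), Fraction(3, 5)],
    "4. Lots of pi": [math.pi / 4, 1 - math.pi / 2, math.pi / 4, 0],
    "5. Lots of parentheses": [Fraction(1, 2) * (2 - 1), Fraction(1, 2) * (-1 + 2)],
    "6. A valid bit?": [Fraction(3, 7), Fraction(4, 7)],
}

for name, vector in exercises.items():
    verdict = "Valid" if is_probability_vector(vector) else "Invalid"
    print(f"{name}: {verdict}")
```

The tolerance matters for exercise 4: in floating point, those entries sum to one only up to rounding, and the vector is rejected because of its negative entry, not its sum.
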

<!-- ::: -->

<br>

<font size = '6'><b>Recognizing processes & transformations</b></font>
<br><br>
<font size = '3'>
Which of the following represents a valid transformation of a classical, probabilistic system? In particular, decide whether the given process is
<br>

* **deterministic**, meaning every classical state is mapped to a classical state,
<br><br>
* **probabilistic**, meaning some classical states are mapped to probabilistic mixtures of classical states, or
<br><br>
* **invalid**, so that it does not preserve probability vectors.
</font>

<!-- ::: q-block.exercise -->

### 1. Valid, or (c)not?

<!-- ::: q-quiz(goal="recognizing-processes-1") -->

<!-- ::: .question -->

$$\begin{pmatrix}
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 0 & 1 \\
    0 & 0 & 1 & 0 
\end{pmatrix}$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Deterministic

<!-- ::: -->

<!-- ::: .option -->

Probabilistic

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    This map is deterministic. Each standard basis vector is mapped to another standard basis vector, and standard basis vectors represent definitive knowledge of a system's state. 
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 2. All the same

<!-- ::: q-quiz(goal="recognizing-processes-2") -->

<!-- ::: .question -->

$$\begin{bmatrix}
    1/3 & 1/3 & 1/3 \\
    1/3 & 1/3 & 1/3 \\
    1/3 & 1/3 & 1/3 
\end{bmatrix}$$

<!-- ::: -->

<!-- ::: .option -->

Deterministic

<!-- ::: -->

<!-- ::: .option(correct) -->

Probabilistic

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
This matrix is very much probabilistic. In fact, it completely garbles any input state, transforming it into the uniformly mixed state. Try a few different input vectors and see for yourself!
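
One way to try several inputs at once is the short Python sketch below, which uses exact fractions; the helper `apply` (matrix times column vector) is defined here for illustration and is not a library function.

```python
from fractions import Fraction

def apply(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

third = Fraction(1, 3)
garble = [[third, third, third] for _ in range(3)]

# Two classical states and one probabilistic mixture as inputs.
inputs = [
    [1, 0, 0],
    [0, 0, 1],
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
]
for p in inputs:
    # Each output entry is (1/3) * (sum of the input entries), which is 1/3.
    print([str(x) for x in p], "->", [str(x) for x in apply(garble, p)])
```
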
    
You might notice we used square brackets $[\cdot]$ to surround the matrix, instead of the rounded brackets $(\cdot)$ used in most of these exercises. You will encounter both notations, but there is no fundamental difference in their meaning; they are just different conventions for the same thing.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 3. Heads or...?

<!-- ::: q-quiz(goal="recognizing-processes-3") -->

<!-- ::: .question -->

$$\vert\text{heads}\rangle\langle\text{heads}\vert$$

<!-- ::: -->

<!-- ::: .option -->

Deterministic

<!-- ::: -->

<!-- ::: .option -->

Probabilistic

<!-- ::: -->

<!-- ::: .option(correct) -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
This matrix is not a valid stochastic transformation. Since there is a "heads" state, from our experience with coins we would infer a "tails" state also exists. But the vector $\vert \text{tails}\rangle$ will get mapped to zero (meaning the zero vector), which is not a valid probability vector. This shows $\vert \text{heads}\rangle\langle \text{heads}\vert$ is not a valid map on probability vectors.
    
If you answered "deterministic" because you assumed "heads" is the only classical state, don't fret! The point is that you are now thinking about the conditions and assumptions behind these answers, which is valuable in itself.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 4. A row of ones.

<!-- ::: q-quiz(goal="recognizing-processes-4") -->

<!-- ::: .question -->

$$\begin{pmatrix}
    0 & 0 & 0 \\
    1 & 1 & 1 \\
    0 & 0 & 0
\end{pmatrix}$$

<!-- ::: -->

<!-- ::: .option(correct) -->

Deterministic

<!-- ::: -->

<!-- ::: .option -->

Probabilistic

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    This matrix is stochastic and deterministic. Each classical state gets mapped to a particular classical state $(0, 1, 0)$. In fact, <i>every</i> probability vector gets mapped to this state. See for yourself! In the context of computing, we could view this as a "reset" operation. Why is that?
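
To see this for a few inputs, here is a minimal Python sketch; `apply` is a helper defined here for illustration, and the matrix is entered as a list of rows.

```python
from fractions import Fraction

def apply(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

row_of_ones = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

inputs = [
    [1, 0, 0],
    [0, 0, 1],
    [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)],
]
for p in inputs:
    # The middle row adds up all input entries, which total one for a probability vector.
    print([str(x) for x in p], "->", [str(x) for x in apply(row_of_ones, p)])
```
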
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 5. More Dirac

<!-- ::: q-quiz(goal="recognizing-processes-5") -->

<!-- ::: .question -->

$$\vert 0 \rangle\langle 0 \vert + \vert 0 \rangle\langle 1 \vert/2 + \vert 1 \rangle\langle 1 \vert /2$$

<!-- ::: -->

<!-- ::: .option -->

Deterministic

<!-- ::: -->

<!-- ::: .option(correct) -->

Probabilistic

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    As written, it is hard to tell what kind of operation this is. We could make things easier by expressing the operator in matrix form. Another option is to group the last two terms together, to better see what the basis vectors $\vert 0 \rangle$ and $\vert 1\rangle$ get mapped to.
    
$$\vert 0 \rangle\langle 0 \vert + \left(\frac{1}{2} \vert 0 \rangle + \frac{1}{2} \vert 1\rangle \right)\langle 1\vert
$$

In this form, it is easier to see that both $\vert 0\rangle$ and $\vert 1\rangle$ are mapped to probability vectors. Hence any probabilistic mixture of 0 and 1 also gets mapped to a probability vector. However, while $\vert 0\rangle$ is mapped to $\vert 0\rangle$, $\vert 1\rangle$ is mapped to the mixture $\frac{1}{2}\vert 0\rangle + \frac{1}{2}\vert 1\rangle$. Therefore this operation is not deterministic; it is probabilistic.
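
The matrix form can also be assembled term by term from outer products. The sketch below is a minimal Python illustration with hypothetical helpers `ket`, `outer`, `add`, and `scale` (defined here, not taken from any library); column $j$ of the result is the image of $\vert j\rangle$.

```python
from fractions import Fraction

def ket(index, dim=2):
    """Standard basis column vector |index> as a list."""
    return [1 if i == index else 0 for i in range(dim)]

def outer(u, v):
    """Matrix (list of rows) for |u><v|: entry (i, j) is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

def scale(c, matrix):
    """Multiply every entry of a matrix by the number c."""
    return [[c * x for x in row] for row in matrix]

def add(*matrices):
    """Entrywise sum of equally sized matrices."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*matrices)]

half = Fraction(1, 2)
# |0><0| + |0><1|/2 + |1><1|/2
M = add(
    outer(ket(0), ket(0)),
    scale(half, outer(ket(0), ket(1))),
    scale(half, outer(ket(1), ket(1))),
)
for row in M:
    print([str(x) for x in row])
```

The printed rows are $(1, 1/2)$ and $(0, 1/2)$, so the columns are $\vert 0\rangle$ and $\frac{1}{2}\vert 0\rangle + \frac{1}{2}\vert 1\rangle$, matching the regrouped expression above.
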
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 6. In words

<!-- ::: q-quiz(goal="recognizing-processes-6") -->

<!-- ::: .question -->

There are four coins on the table. Any coins you see heads up, you flip exactly once. You leave alone any coins which are tails.

<!-- ::: -->

<!-- ::: .option -->

Deterministic

<!-- ::: -->

<!-- ::: .option(correct) -->

Probabilistic

<!-- ::: -->

<!-- ::: .option -->

Invalid

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    This is a probabilistic transformation. The key point is that the heads-up coins are flipped, which introduces randomness. Even though some configurations lead to deterministic outcomes, such as all the coins being tails, a map is probabilistic if it introduces randomness for <em>some</em> configuration.
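
To check this claim, we can build the full 16-by-16 matrix. The Python sketch below assumes that "flip" means a fair toss, orders each coin's states as (heads, tails), and combines the four coins with a Kronecker (tensor) product, one standard way to model independent systems side by side; the helper `kron` is written here for illustration.

```python
from fractions import Fraction
from functools import reduce

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

half = Fraction(1, 2)
# One coin, states ordered (heads, tails): heads up is tossed, tails up is left alone.
one_coin = [[half, 0],
            [half, 1]]

four_coins = reduce(kron, [one_coin] * 4)  # 16 x 16 matrix

columns = [[row[j] for row in four_coins] for j in range(16)]
is_stochastic = all(min(c) >= 0 and sum(c) == 1 for c in columns)
# In a stochastic column, a largest entry of 1 means a definite (deterministic) outcome.
definite = [j for j, c in enumerate(columns) if max(c) == 1]
print("stochastic:", is_stochastic)
print("inputs with a definite outcome:", definite)
```

Only the all-tails configuration (column 15 with this ordering) is sent to a definite outcome; every other column spreads its probability over several configurations.
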
</details>

<!-- ::: -->
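
The three categories from the start of this section can also be checked mechanically. The sketch below is our own Python illustration (the function `classify` is hypothetical, not from a library); it uses exact comparisons, so entries should be entered as integers or `Fraction`s. For exercise 3 it assumes the two classical states heads and tails.

```python
from fractions import Fraction

def classify(matrix):
    """Classify a matrix (list of rows) acting on column probability vectors.

    "Invalid" unless every column is a probability vector; otherwise
    "Deterministic" if every column is a standard basis vector, else "Probabilistic".
    """
    columns = [[row[j] for row in matrix] for j in range(len(matrix[0]))]
    if any(min(col) < 0 or sum(col) != 1 for col in columns):
        return "Invalid"
    # A nonnegative column summing to one has max entry 1 only if it is a basis vector.
    if all(max(col) == 1 for col in columns):
        return "Deterministic"
    return "Probabilistic"

third = Fraction(1, 3)
half = Fraction(1, 2)
exercises = {
    "1. Valid, or (c)not?": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
    "2. All the same": [[third] * 3 for _ in range(3)],
    "3. Heads or...? (heads, tails)": [[1, 0], [0, 0]],
    "4. A row of ones": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "5. More Dirac": [[1, half], [0, half]],
}
for name, matrix in exercises.items():
    print(f"{name}: {classify(matrix)}")
```
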

<font size = '6'><b> Physical scenarios to probability vectors </b></font>
<br><br>
<font size = '3'>
For each of the following, identify the probability vector that most appropriately models the situation.
</font>

<!-- ::: q-block.exercise -->

### 1. Dice games

A standard six-sided die is accidentally rolled under the table in a game of [Yahtzee](https://en.wikipedia.org/wiki/Yahtzee).

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>

$$ \begin{pmatrix}
1/6 \\ 1/6 \\ 1/6 \\ 1/6 \\ 1/6 \\ 1/6
\end{pmatrix}
$$

Each outcome of a roll of a (fair) die is equally likely, and since there are six sides, each possible roll has a corresponding probability of $1/6$.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 2. Seeing a coin flip

<font size = '3'>
You flip a coin high in the air and it lands on the ground in front of you. Looking down, you see that the resulting flip is heads.
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>

$$ 1 \vert \text{heads}\rangle + 0 \vert \text{tails}\rangle = \vert \text{heads}\rangle
$$ 

Since you’ve seen that the coin is heads up, the process by which it got there (i.e., the flip) is irrelevant. This illustrates that our descriptions in probability often reflect a state of knowledge, rather than objective reality.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 3. Not seeing the coin. 

<font size = '3'>
In the previous scenario, your friend is sitting in the same room, and loses sight of the coin as it hits the ground. Is the probabilistic state the same as for part 2? Why or why not?
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>

$$ \frac{1}{2} \vert \text{heads}\rangle + \frac{1}{2} \vert \text{tails}\rangle
$$ 

   The probabilities in the state vector represent the observer's lack of knowledge, in this case your friend's. From their perspective, without seeing the result, heads and tails are equally likely (assuming a fair coin). Notice that we are using probabilities to model <i>subjective</i> knowledge rather than objective reality, so it is no surprise that you and your friend have different vectors representing your probabilistic states.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 4. The waiter and a die.

<font size = '3'>
A standard die sits with “1” face up on a silver platter, and is then covered with a cloche (a silver cover used in fancy restaurants). A waiter picks up the covered tray and walks towards the expectant customer. The waiter (being quite graceful) takes pains not to disturb the die. However, he thinks there is about a 12% chance that the die has flipped over one of the four edges of its bottom face. Before the cloche is lifted, what is the probabilistic state of the die?
</font>
    
<details>
    <summary><u><font size = '2'>Hint 1</font></u></summary>
    To solve this problem, we need to know a bit about the layout of a standard die. Look this up and determine how it will help you solve the problem.
</details>
<details>
    <summary><u><font size = '2'>Hint 2</font></u></summary>
    The "1" side is opposite the "6" side on a standard die. Since the die flips over at most one edge, there’s no way the "6" side comes up, so this entry has probability zero.
</details>
<details>
    <summary><u><font size = '2'>Hint 3</font></u></summary>
    There’s a 12% chance the die flips, so there is an 88% chance the die stays on the "1" side.
</details>
<details>
    <summary><u><font size = '2'>Hint 4</font></u></summary>
    The remaining 12% must get divided equally amongst the remaining four possibilities, since they are all equally likely to occur. Each has probability 12%/4 = 3%.
</details>
<details>
    <summary><u><font size = '2'>Final answer</font></u></summary>
    Ordering the die sides in the natural way, the probability vector is given by

$$\begin{pmatrix}
0.88 \\ 0.03 \\ 0.03 \\ 0.03 \\ 0.03 \\ 0
\end{pmatrix}.
$$

<br>
We can (and should) check that this is indeed a valid probability vector.
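
Here is one way to carry out that check, as a short Python sketch with exact fractions; it assumes, as in the hints, that the faces are ordered 1 through 6 and that the 6 face cannot come up.

```python
from fractions import Fraction

# Faces ordered 1..6. From the hints: "1" stays up unless the die flips,
# a flip lands on one of the four neighboring faces, and "6" cannot appear.
p_flip = Fraction(12, 100)
state = [1 - p_flip] + [p_flip / 4] * 4 + [0]

print([float(x) for x in state])  # entries as decimals
print("nonnegative:", all(x >= 0 for x in state))
print("sums to one:", sum(state) == 1)
```
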
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 5. The waiter and a cake.

<font size = '3'>
Perhaps a more realistic take on the previous exercise: you are a waiter delivering a slice of cake as dessert to your table. The cake slice is approximately cubical, plain on the sides but frosted on top. You really hope the cake stays upright, but there is a chance that the cake could fall onto one of its sides, ruining the presentation. Even worse, the cake could flip over completely, frosting down. You try your best to avoid these situations as you carefully walk towards the eager customer.

a. What are the relevant states of the system, and how many are there? Only focus on the distinctions in the cake orientation that matter to you, the waiter.
</font>
<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    There are three realistic possibilities we care about: the cake stays upright, the cake falls onto one of its sides, or the cake flips all the way upside down. Notice we don’t care to distinguish between the different sides of the cake. Although we tried to emphasize this point by making the cake like a cube, this wasn’t strictly necessary.
</details>
<br><br>
<font size = '3'>
b. On your way, you get bumped a couple of times by passing customers. As you nervously lift the cover (in front of your table, no less!), you think to yourself that there must be a 20% chance the cake flipped along one edge, and a 10% chance the cake is upside down. What is the probabilistic state of the cake prior to taking off the cover?
</font>
    
<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    As a result of part a, there are only three states we need to assign probabilities to. There is a 10% chance the cake is upside down and a 20% chance it is on its side. Since there is only one other relevant state, upright, it must have probability 70%. Thus, we might express the state as
$$
0.7\vert\text{upright}\rangle+0.2\vert \text{side}\rangle+0.1\vert \text{upside down} \rangle
$$
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 6. Bent coins at a carnival

<font size = '3'>
At a strange carnival game, two coins are in front of you. One you know to be fair (50-50); the other is visibly bent, in such a way that you know (for whatever reason) that the odds of heads are 60%. The gamemaster takes the coins behind her back, shuffles them, and holds one in each closed fist, asking you to choose. You choose the right hand, and the gamemaster quickly flings that coin into the air with such a rapid spin that you cannot tell whether the coin is bent or not. The gamemaster snatches the coin out of the air and presses it against the back of her hand, keeping it covered. Represent your knowledge of the state (heads/tails) of the coin.
</font>
    
<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    If we knew the coin flipped was the fair coin, the probability vector would be $\begin{pmatrix}0.5 \\ 0.5\end{pmatrix}$. If we knew it was the unfair coin, the state would be $\begin{pmatrix}0.6 \\ 0.4\end{pmatrix}$. We have no idea which of these two situations we’re in, and each occurs with probability 1/2. To deal with our uncertainty, we can average over our ignorance. We weight each probability vector by its likelihood $1/2$, which gives us
    
$$
1/2 \begin{pmatrix}0.5 \\ 0.5\end{pmatrix}+1/2 \begin{pmatrix}0.6 \\ 0.4\end{pmatrix}= \begin{pmatrix}0.55 \\ 0.45\end{pmatrix}.
$$
    
Check that this answer makes sense intuitively. What would change if we were, say, 70% sure that the fair coin was flipped?
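
The averaging step is a weighted sum of probability vectors. Below is a minimal Python sketch; the helper `mix` is defined here for illustration, and the vectors are ordered (heads, tails).

```python
from fractions import Fraction

def mix(weights, vectors):
    """Weighted average of probability vectors: sum over k of weights[k] * vectors[k]."""
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]

fair = [Fraction(1, 2), Fraction(1, 2)]   # (heads, tails)
bent = [Fraction(6, 10), Fraction(4, 10)]

# Each coin is equally likely to be the one in the chosen hand.
state = mix([Fraction(1, 2), Fraction(1, 2)], [fair, bent])
print([float(x) for x in state])

# For the follow-up question, try other weights, for example
# mix([Fraction(7, 10), Fraction(3, 10)], [fair, bent])
```
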
</details>

<!-- ::: -->

<font size = '6'><b> Expressing physical transformations as matrices </b></font>
<br><br>
<font size = '3'>
For each of the following scenarios, identify the linear map, in matrix representation, that most appropriately models the situation.
</font>

<!-- ::: q-block.exercise -->

### 1. DnD randomness

Rolling a fair, four-sided (tetrahedral) die in a game of [Dungeons and Dragons](https://en.wikipedia.org/wiki/Dungeons_%26_Dragons).

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    In matrix representation, the operation is given by

 $$
    \begin{pmatrix}
        1/4 & 1/4 & 1/4 & 1/4 \\
        1/4 & 1/4 & 1/4 & 1/4 \\
        1/4 & 1/4 & 1/4 & 1/4 \\
        1/4 & 1/4 & 1/4 & 1/4
    \end{pmatrix}
$$
<br>
    No matter what the initial state of the die, it needs to end up in the uniformly mixed state. Thus, every column must be a uniformly mixed state, and since there are four outcomes, all entries must be $1/4$. 
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 2. Resetting a coin.

<font size = '3'>
A coin is picked up and placed on the table heads up.
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>

$$\begin{pmatrix}
        1 & 1 \\
        0 & 0
    \end{pmatrix}.
$$
<br>
    No matter the input state, the output must be “heads.” Notice we have assumed $\vert \text{heads}\rangle =\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. What would the matrix representation be if instead $\vert\text{tails}\rangle=\begin{pmatrix} 1 \\ 0 \end{pmatrix}$?
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 3. Lights!

<font size = '3'>
A light switch is switched.
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>

$$\begin{pmatrix}
        0 & 1 \\
        1 & 0
    \end{pmatrix}.
$$
<br>
    In computing, this is equivalent to the NOT operation.
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 4. "Roll" the dice.

<font size = '3'>
A die is carefully rolled along the 1-6 axis of the die, like a wheel, so that only 4 of the sides are circulating.
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
We must make use of some properties of standard dice. The problem says the “1-6 axis” because the two faces 1 and 6 are opposite one another. So really, we are mixing the four other states, without knowledge of which one we end up with. We just know that each of the four circulating faces is equally likely to end up on top, with probability $1/4$. This leads to the map
   $$\begin{pmatrix}
       0 & 0 & 0 & 0 & 0 & 0 \\
       1/4 & 1/4 & 1/4 & 1/4 & 1/4 & 1/4 \\
       1/4 & 1/4 & 1/4 & 1/4 & 1/4 & 1/4 \\
       1/4 & 1/4 & 1/4 & 1/4 & 1/4 & 1/4 \\
       1/4 & 1/4 & 1/4 & 1/4 & 1/4 & 1/4 \\
       0 & 0 & 0 & 0 & 0 & 0 \\
   \end{pmatrix} = \frac{1}{4} \; \begin{pmatrix}
   0 \\
   1 \\
   1 \\
   1 \\
   1 \\
   0
   \end{pmatrix} \begin{pmatrix}
       1 & 1 & 1 & 1 & 1 & 1
   \end{pmatrix}.
   $$
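
The factorization on the right can be confirmed directly. The Python sketch below (exact fractions; faces ordered 1 through 6) builds the outer product and compares it with the matrix written out entry by entry.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# Faces ordered 1..6; only faces 2-5 circulate when rolling about the 1-6 axis.
v = [0, 1, 1, 1, 1, 0]
ones = [1] * 6

# Outer product (1/4) v 1^T: entry (i, j) is v[i] * ones[j] / 4.
rank_one = [[quarter * vi * oj for oj in ones] for vi in v]

# The same matrix written out row by row, as on the left-hand side.
written_out = [[0] * 6] + [[quarter] * 6 for _ in range(4)] + [[0] * 6]

print("factorization holds:", rank_one == written_out)
print("every column sums to one:",
      all(sum(row[j] for row in rank_one) == 1 for j in range(6)))
```
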
</details>

<!-- ::: -->

![FanDial.png](attachment:FanDial.png)

<!-- ::: q-block.exercise -->

### 5. Fan twist.

<font size = '3'>
A knob for a fan has four settings as shown above. You attempt to turn one setting to the right.
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    Let's order the states from left to right: "off", "low", "mid", "high". Turning to the right is then given by
    
$$
    \begin{pmatrix}
        0 & 0 & 0 & 0 \\
        1 & 0 & 0 & 0 \\
        0 & 1 & 0 & 0 \\
        0 & 0 & 1 & 1
    \end{pmatrix}
$$
<br>
    The "high" state is left unchanged, because the knob cannot be rotated any further from that position. This is a deterministic operation: performing this operation on a known fan state leads to another known state.
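
Because this operation is deterministic, each column contains a single 1, and its position tells us where that setting goes. The short Python sketch below reads this off; the helper `image_of` is defined here for illustration, with the states in the order used above.

```python
# States ordered (off, low, mid, high); column j is the image of state j.
states = ["off", "low", "mid", "high"]
turn_right = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]

def image_of(state):
    """Return the setting that the basis vector for `state` is sent to."""
    j = states.index(state)
    column = [row[j] for row in turn_right]
    return states[column.index(1)]

for s in states:
    print(s, "->", image_of(s))
```
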
</details>

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 6. Shake a die.

<font size = '3'>
Softly "kicking" a die, so that there is an 80% chance it flips over exactly one edge, and a 20% chance it does not. (Hint: you may want to look at the configuration of a standard die!)
</font>

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    The bottom face of the die has four edges, so there is an $80\%/4 = 20\%$ chance of flipping over each one. On a standard die, opposite sides always add to seven, which we can use to determine the neighbors of each side: the face on top can only change to one of its four neighbors, never to the face opposite it. Here’s the matrix form (ordering the sides of the die by their numbers).

$$\begin{pmatrix}
        1/5 & 1/5 & 1/5 & 1/5 & 1/5 & 0 \\
        1/5 & 1/5 & 1/5 & 1/5 & 0 & 1/5 \\
        1/5 & 1/5 & 1/5 & 0 & 1/5 & 1/5 \\
        1/5 & 1/5 & 0 & 1/5 & 1/5 & 1/5 \\
        1/5 & 0 & 1/5 & 1/5 & 1/5 & 1/5 \\
        0 & 1/5 & 1/5 & 1/5 & 1/5 & 1/5 \\
    \end{pmatrix}
$$
<br>
    Notice this operation is symmetric, so that the transition from side $i$ to side $j$ has the same probability as the transition from $j$ to $i$. If we think of this transformation like a chemical reaction, with $i$ and $j$ being two different chemicals transforming into one another, we have a manifestation of the notion of <i>detailed balance</i>. The "equilibrium state" of the process is a uniform mixture of the six chemicals,
    $$\begin{pmatrix}1/6 \\ 1/6\\ 1/6 \\ 1/6 \\ 1/6 \\ 1/6\end{pmatrix},$$ 
    and at equilibrium, the transitions between any two states occur at equal rates in both directions.
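
The matrix above can be generated from the rule that opposite faces sum to seven, and its stated properties checked directly. The Python sketch below uses exact fractions, orders the faces 1 through 6, and defines its own `apply` helper; the final loop, which applies the kick ten times starting from face 1 up, is an extra illustration of the approach to equilibrium.

```python
from fractions import Fraction

fifth = Fraction(1, 5)
faces = range(1, 7)

# Entry (i, j): probability that face j on top becomes face i on top.
# Stay put with probability 1/5, or move to any face except the opposite one
# (i + j == 7) with probability 1/5 each.
kick = [[0 if i + j == 7 else fifth for j in faces] for i in faces]

def apply(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

columns_ok = all(sum(row[j] for row in kick) == 1 for j in range(6))
symmetric = all(kick[i][j] == kick[j][i] for i in range(6) for j in range(6))
uniform = [Fraction(1, 6)] * 6
print("stochastic:", columns_ok)
print("symmetric:", symmetric)
print("uniform state is unchanged:", apply(kick, uniform) == uniform)

# Starting from face 1 up, repeated kicks approach the uniform state.
state = [1, 0, 0, 0, 0, 0]
for _ in range(10):
    state = apply(kick, state)
print([round(float(x), 4) for x in state])
```
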
</details>

<!-- ::: -->

## Conceptual Understanding

<font size = '3'>
The problems of this section are intended to develop a conceptual understanding of the material.
</font>

<b><font size = '6'>Definitional problems </font></b>

<font size = '3'>
Knowing terminology by heart is not the end-all goal, but gaining a familiarity with standard language helps make understanding complicated ideas easier down the line. Use the exercises below to get practice with the jargon introduced in the lesson.
</font>

<!-- ::: q-block.exercise -->

### 1. Ordering

<!-- ::: q-quiz(goal="Definitional-problems-1") -->

<!-- ::: .question -->

If two operations can be performed on a system, in any order, without affecting the resulting state, the two operations are said to 
    
<!-- ::: -->

<!-- ::: .option(correct) -->

commute.

<!-- ::: -->

<!-- ::: .option -->

completely overlap.

<!-- ::: -->

<!-- ::: .option -->

distribute.

<!-- ::: -->

<!-- ::: .option -->

associate.

<!-- ::: -->

<!-- ::: .option -->

be equal.

<!-- ::: -->

<!-- ::: -->

<!-- ::: -->

<!-- ::: q-block.exercise -->

### 2. The notation of quantum 

<!-- ::: q-quiz(goal="Definitional-problems-2") -->

<!-- ::: .question -->

The notational convention, frequently used in quantum physics, of writing vectors as a symbol or string of symbols, sandwiched between a $\vert$ and $\rangle$, is known as
<!-- ::: -->

<!-- ::: .option -->

Einstein notation.

<!-- ::: -->

<!-- ::: .option -->

matrix representation.

<!-- ::: -->

<!-- ::: .option -->

operator expressions.

<!-- ::: -->

<!-- ::: .option(correct) -->

Dirac notation.

<!-- ::: -->

<!-- ::: -->

<!-- ::: -->

<b><font size = '6'>Flash cards</font></b>

<font size = '3'>
For each of the terms below, think about its meaning and as many associations as you can. Then click the underscore below the term to see a definition.
</font>

<!-- ::: q-block.reminder -->

### Classical state set

<details>
    <summary><u><font size = '3'>_</font></u></summary>
    The set of basic configurations a physical system can take on. These configurations are mutually exclusive in the sense that, loosely speaking, a system that is in a definite classical state cannot be in more than one such state at once.
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Probability vector

<details>
    <summary><font size = '3'>_</font></summary>
    A representation of the state of a system as a (column) vector whose entries are the probabilities of the system's classical states. Its entries are nonnegative and sum to one.
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Classical

<details>
    <summary><font size = '3'>_</font></summary>
    Referring to physics as known to humanity prior to the 20th century, often used in contrast to “quantum.”
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Deterministic

<details>
    <summary><font size = '3'>_</font></summary>
    Having no element of randomness, being completely known or specified. 
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Stochastic

<details>
    <summary><font size = '3'>_</font></summary>
    Having an element of randomness or unpredictability, especially when referring to processes or transformations. 
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Stochastic matrices

<details>
    <summary><font size = '3'>_</font></summary>
    A matrix whose columns are probability vectors. Equivalently, one which transforms any probability vector into another probability vector. Stochastic matrices are a good representation of random processes found in the real world.
</details>
<!-- ::: -->

<!-- ::: q-block.reminder -->

### Standard basis vectors

<details>
    <summary><font size = '3'>_</font></summary>
    Column vectors with exactly one nonzero entry, which is equal to one. The collection of these vectors forms an orthonormal basis, perhaps the simplest one.
</details>
<!-- ::: -->

<b><font size = '6'>Comprehension true or false</font></b>

<font size = '3'>
Answer the following true-or-false questions related to concepts in classical probability theory. 
</font>

<!-- ::: q-block.exercise -->

### 1. Columns

<!-- ::: q-quiz(goal="comprehension-true-or-false-1") -->

<!-- ::: .question -->

If a matrix has nonnegative entries and each of its columns sums to one, the matrix is stochastic.

<!-- ::: -->

<!-- ::: .option(correct) -->

True

<!-- ::: -->

<!-- ::: .option -->

False

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    The statement is correct. Stochastic matrices are exactly those matrices whose columns are probability vectors.
</details>

<!-- ::: -->
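The column condition in this exercise is easy to check mechanically. Below is a minimal sketch in Python with NumPy; the helper name `is_stochastic` and the tolerance `tol` are our own illustrative choices, not part of any library.

```python
import numpy as np

def is_stochastic(matrix, tol=1e-12):
    """Return True if every entry is nonnegative and every column sums to one."""
    m = np.asarray(matrix, dtype=float)
    nonnegative = np.all(m >= -tol)
    columns_sum_to_one = np.allclose(m.sum(axis=0), 1.0, atol=tol)
    return bool(nonnegative and columns_sum_to_one)

# The coin-flip matrix: nonnegative entries, and each column sums to one.
coin_flip = 0.5 * np.array([[1, 1],
                            [1, 1]])
print(is_stochastic(coin_flip))            # True

# Here the rows sum to one but the columns do not, so this is not stochastic.
rows_only = np.array([[0.5, 0.5],
                      [1.0, 0.0]])
print(is_stochastic(rows_only))            # False

# A stochastic matrix maps a probability vector to another probability vector.
psi = np.array([0.25, 0.75])
print(coin_flip @ psi)                     # [0.5 0.5]
```

Note that it is the *columns*, not the rows, that must sum to one, because we act on column vectors from the left.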

<!-- ::: q-block.exercise -->

### 2. Zero states

<!-- ::: q-quiz(goal="comprehension-true-or-false-2") -->

<!-- ::: .question -->

The simplest possible classical system is one that possesses zero states.

<!-- ::: -->

<!-- ::: .option -->

True

<!-- ::: -->

<!-- ::: .option(correct) -->

False

<!-- ::: -->

<!-- ::: -->

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    The answer is false. Any physical system has at least one state. If you don't believe me, try to find a counterexample. Even "nothing" has one state!
</details>

<!-- ::: -->

<b><font size = '6'>Discussion questions</font></b>

<font size = '3'>
Consider the following open-ended questions to reflect on the classical probability material you learned. These are great questions to discuss with peers, or to write about on your own.
</font>

<!-- ::: q-block -->

### 1. Subjective quantum physics?
We saw several situations where the probability vector describing a classical system could vary depending on the observer. This might suggest to you that these descriptions are *subjective*.

> a. Do you find this situation problematic, since science is often about finding "objective" truths?
    
> b. Later on, when we discuss quantum information, we will see that probabilities still come about when a system is "measured" (looked at). Do you think this implies quantum physics has subjective aspects as well? Why or why not?

<!-- ::: -->

<!-- ::: q-block -->
    
### 2. Why linear?

Why should random physical processes correspond mathematically to *linear* transformations? Support your arguments with examples. Can you think of realistic processes which are better represented by *nonlinear* maps? (A nonlinear map is any function that is not linear. A simple example is the map sending every vector in the space to a particular nonzero vector $v_0$.)

<!-- ::: -->
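To see concretely why the map in the parenthetical above is nonlinear, here is a quick numerical check of additivity, $f(u + w) = f(u) + f(w)$. The choice $v_0 = (1, 0)$ and the test vectors $u$ and $w$ are arbitrary assumptions made for illustration; any nonzero $v_0$ behaves the same way.

```python
import numpy as np

v0 = np.array([1.0, 0.0])        # the fixed nonzero vector (an arbitrary choice)

def constant_map(v):
    """Send every vector to v0, ignoring the input."""
    return v0.copy()

u = np.array([0.3, 0.7])
w = np.array([0.6, 0.4])

# Additivity would require f(u + w) == f(u) + f(w); here the left side is v0, the right side 2 * v0.
print(constant_map(u + w))                     # [1. 0.]
print(constant_map(u) + constant_map(w))       # [2. 0.]
```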

<!-- ::: q-block -->

### 3. Column vectors? Why not row vectors?

We've been using column vectors to represent our probabilistic states. Why not row vectors? What is essential and what is conventional with regard to these choices? Could we re-express 'kets' as row vectors and 'bras' as column vectors? How would that change the math?

<!-- ::: -->

<!-- ::: q-block -->

### 4. Non-commuting operators are the norm

Give three different scenarios in which processes or transformations do not commute in real life. For example, if I pick up an apple and then throw it, the result differs from throwing first and then picking up the apple. Thus, it is not at all surprising that linear transformations, in general, do not commute.

Give three scenarios from real life in which processes or transformations do commute. Do you notice any commonalities and differences with respect to the non-commuting case? Can you come up with heuristic rules to help determine when operations commute?

<!-- ::: -->
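Even the simplest classical operations on a single bit can fail to commute. As a small sketch, the snippet below compares a NOT operation with a reset to state 0; the names `NOT` and `RESET_0` are just illustrative labels for the matrices we have already met.

```python
import numpy as np

NOT = np.array([[0, 1],
                [1, 0]])          # swap the two classical states
RESET_0 = np.array([[1, 1],
                    [0, 0]])      # send either classical state to state 0

# Matrices act right to left: RESET_0 @ NOT means "apply NOT first, then reset".
not_then_reset = RESET_0 @ NOT
reset_then_not = NOT @ RESET_0

print(not_then_reset)   # always ends in state 0
print(reset_then_not)   # always ends in state 1
print(np.array_equal(not_then_reset, reset_then_not))   # False
```

Notice that resetting last "wins": whichever operation is applied second determines the final state here, which is one heuristic worth testing against your own examples.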

## Diving Deeper

<b><font size = '6'>Measurement as linear operations</font></b> 

We’ve already considered measurement in a very simple manner: a measurement yields as its result one of the classical states $a \in \Sigma$, with probability equal to that state's coefficient in the probability vector, and then leaves the system in the classical state $\vert a\rangle$ corresponding to the result. This procedure may seem so obvious as to not require any further analysis. However, given how frequently we apply linear algebra, it might seem odd that the action of “measurement” does not correspond to a linear operation, i.e., multiplication by some “measurement matrix” $M$.
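The procedure just described can be sketched as a random sampling step. In the snippet below, NumPy's random generator stands in for nature, and we assume index 0 means heads and index 1 means tails; the function name `measure` is our own. Running it repeatedly gives different outcomes for the same input, which is a useful thing to keep in mind for the exercise that follows.

```python
import numpy as np

rng = np.random.default_rng(seed=7)   # fixed seed only so that runs are repeatable

def measure(psi, rng):
    """Sample outcome a with probability psi[a]; return (a, post-measurement state |a>)."""
    psi = np.asarray(psi, dtype=float)
    outcome = rng.choice(len(psi), p=psi)
    post_state = np.zeros_like(psi)
    post_state[outcome] = 1.0
    return outcome, post_state

psi = np.array([0.25, 0.75])          # e.g. P(heads) = 1/4, P(tails) = 3/4
outcomes = [measure(psi, rng)[0] for _ in range(10_000)]
print(np.bincount(outcomes, minlength=2) / len(outcomes))   # close to [0.25 0.75]
```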

<!-- ::: q-block.exercise -->

### 1. 

Argue that no single matrix $M$ can model the outcome of a measurement of, say, the state of a coin.

<details>
    <summary><u><font size = '2'>Hint</font></u></summary>
    Multiplying a given probability vector by a fixed matrix always yields the same, single output. How many different outcomes can a measurement produce?
</details>
<details>
    <summary><u><font size = '2'>Suggested argument</font></u></summary>
    Our matrix $M$ would have to produce a single outcome, either heads or tails, but measuring a coin can yield either of them. Given an initial state $\vert\psi \rangle$, $M\vert \psi \rangle$ is one fixed vector, so it can only produce one output. This is insufficient for our purposes. 
</details>

<!-- ::: -->

Having ruled out a single linear map (matrix) $M$ to describe a measurement, perhaps we can still express a measurement process in terms of multiple linear maps $\{M_i\}$ with as few complications as possible. Our constructions will carry over straightforwardly to quantum information as well, and become indispensable in that context.

<!-- ::: q-block.exercise -->

### 2. 

Before moving on, think about how you might do this. That is, can we find a set of matrices, together with a well-defined procedure, that gives both the probability of each measurement outcome and the state that results from it? See how much of the following text you can anticipate, and if there are differences, see whether they can be reconciled. Be creative and think!

<!-- ::: -->

Since a single matrix can’t give all possible measurement outcomes, we might instead have a linear operator $M_a$ for each measurement outcome $a \in \Sigma$. A seemingly natural requirement might be to have $M_a \vert\psi\rangle = \vert a\rangle$ for any initial probability vector $\vert\psi\rangle$.

<!-- ::: q-block.exercise -->

### 3. 

Is $M_a$ a stochastic operation? If so, express $M_a$ as a matrix. 

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
Yes, it is stochastic, because it maps any probability vector $\vert\psi\rangle$ to another probability vector, $\vert a \rangle$. To write it in matrix form, note that $M_a \vert b\rangle = \vert a\rangle$ for every classical state $b$, so the only row that can have nonzero entries is the $a$th row (that is, the row corresponding to state $a$ in whatever ordering we chose). Moreover, since each column must have nonnegative entries summing to one, the $a$th row must be filled with ones. The $n\times n$ matrix we get, with $n = \vert \Sigma \vert$, is as follows.
    
$$\begin{pmatrix}
    0 & 0 & \cdots & 0 & 0 \\
    \vdots & \vdots & \ddots & \vdots & \vdots \\
    1 & 1 & \cdots & 1 & 1 \\
    \vdots & \vdots & \ddots & \vdots & \vdots \\
    0 & 0 & \cdots & 0 & 0 \\
\end{pmatrix}
$$
    
We’ve seen before that this operation corresponds to a "reset" type operation. Are the operations RESET and MEASURE the same thing physically?
</details>

<!-- ::: -->

The problem with this formulation is that it tells us nothing about how likely it was that $M_a$ was applied (i.e., how likely it was that result $a$ was measured). For example, if $M_h$ corresponds to measuring heads on a coin, then $M_h \vert\text{tails}\rangle = \vert\text{heads}\rangle$. Yet we know in our hearts that this is nonsense, since we should never measure heads for the state $\vert\text{tails}\rangle$ in the first place.
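A short numerical sketch makes the problem concrete. The helper `reset_matrix` is a hypothetical name for the matrix $M_a$ from the previous exercise (every column equal to $\vert a\rangle$), and we assume the ordering heads $= 0$, tails $= 1$.

```python
import numpy as np

def reset_matrix(a, n):
    """M_a: every column equals |a>, so M_a @ psi = |a> for any probability vector psi."""
    m = np.zeros((n, n))
    m[a, :] = 1.0
    return m

HEADS, TAILS = 0, 1                    # assumed ordering of the coin's classical states
heads = np.array([1.0, 0.0])
tails = np.array([0.0, 1.0])

M_h = reset_matrix(HEADS, 2)
print(M_h @ np.array([0.25, 0.75]))    # [1. 0.]  -> |heads>, whatever the input
print(M_h @ tails)                     # [1. 0.]  -> "measuring heads" on |tails> still gives |heads>
```

The output carries no trace of the input's probabilities, which is exactly the information a measurement needs to report.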

Let’s think back to our original procedure. The number that gave the likelihood for outcome $a$ is the coefficient in front of the classical state $\vert a\rangle$. This is the component of $\vert\psi\rangle$ along $\vert a\rangle$. We can picture $\vert \psi\rangle$ and $\vert a\rangle$ as two arrows like so.

![pic1.png](attachment:pic1.png)

The component is just the “length of $\vert\psi\rangle$ along $\vert a\rangle$”.

![pic2.png](attachment:pic2.png)

This is also like a “shadow” of $\vert\psi\rangle$ along $\vert a\rangle$, thinking of $\vert\psi\rangle$ as casting a shadow on the “ground” defined by $\vert a \rangle$. The *projection of $\vert\psi\rangle$ along $a$* is the vector proportional to $\vert a\rangle$ whose length is the component along $\vert a\rangle$. We might call this new vector $\vert\text{proj}_a \psi \rangle$. In pictures:

![pic3.png](attachment:pic3.png)

<!-- ::: q-block.exercise -->

### 4.

a. Let $P_a$ be the operation which sends any input $\vert \psi \rangle$ to its projection $\vert \text{proj}_a \psi \rangle$ along vector $\vert a \rangle$. Argue that $P_a$ is a linear operator, and write it in matrix form. For a coin, what is $P_\text{heads}$? $P_\text{tails}$?

<details>
    <summary><u><font size = '2'>Answer with explanation</font></u></summary>
    <b>Proof of linearity:</b>
Let $\vert\psi\rangle$ and $\vert\phi\rangle$ be two probability vectors. Written out in terms of the classical states, we have
$$\vert\psi\rangle = \sum_{b\in\Sigma} \psi_b \vert b\rangle \qquad \vert\phi\rangle = \sum_{b\in\Sigma} \phi_b \vert b \rangle.
$$
Here, $\psi_b$ and $\phi_b$ are the probabilities associated with each classical state $b$, i.e., the components along the vector $\vert b\rangle$. An arbitrary linear combination $\lambda \vert \psi\rangle + \eta \vert\phi\rangle$ of the two vectors can be expressed in terms of the standard basis as $\sum_{b\in\Sigma} (\lambda \psi_b + \eta \phi_b) \vert b \rangle$. Let's apply $P_a$ to this combination. The component of the linear combination that lies along state $\vert a \rangle$ is given by $\lambda \psi_a + \eta \phi_a$. Hence, according to our definition of $P_a$,
    
\begin{align}
    P_a (\lambda \vert \psi\rangle + \eta \vert \phi \rangle) &= (\lambda \psi_a + \eta \phi_a) \vert a \rangle \\
    &= \lambda (\psi_a \vert a \rangle) + \eta (\phi_a \vert a \rangle).
\end{align}

On the other hand, $P_a \vert \psi\rangle = \psi_a \vert a \rangle$, and similarly for $P_a \vert \phi \rangle$. Substituting these into the last line above, we have 
$$\lambda (\psi_a \vert a \rangle) + \eta (\phi_a \vert a \rangle) = \lambda P_a \vert \psi\rangle + \eta P_a \vert \phi \rangle.
$$
This shows $P_a (\lambda \vert \psi\rangle + \eta \vert \phi \rangle) = \lambda P_a \vert \psi\rangle + \eta P_a \vert \phi \rangle$, which is exactly the condition for linearity. Thus, $P_a$ is a linear operator.
<br>    
<b>Matrix form:</b>
To obtain the matrix form, we can observe how $P_a$ acts on standard basis vectors $\vert b \rangle$ for $b \in \Sigma$. We see that $P_a \vert b \rangle = \delta_{ab} \vert a \rangle$, meaning the result is zero unless $a = b$, in which case it is $\vert a \rangle$. So every column of $P_a$ is zero except column $a$, which equals $\vert a \rangle$: the only nonzero entry of $P_a$ is a one in the $a$th diagonal position.
    
Try to see if you can write down a $2 \times 2$ matrix representing $P_\text{heads}$ for the case of a coin!
</details>
<br>
b. Are projections $P_a$ also stochastic? 

<!-- ::: -->

If $P_a$ is the projection along $\vert a\rangle$, then $P_a \vert \psi\rangle = \psi_a \vert a\rangle$, where $\psi_a$ is the component of $\vert\psi\rangle$ along $\vert a\rangle$. The right-hand side therefore contains information about both the resulting state ($\vert a\rangle$) after applying $P_a$ and the associated probability ($\psi_a$) of obtaining outcome $a$ in the first place.
We’ve arrived at the concept of projective measurement for classical information. The collection of projection operators $\{P_a : a\in\Sigma\}$ gives full information about a measurement of any chosen state. 
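Here is a minimal sketch of reading both pieces of information off $P_a \vert\psi\rangle$ for a three-state system. The helper name `projector` is our own; building $P_a$ as the outer product $\vert a\rangle\langle a\vert$ matches the matrix form derived above. Recovering the post-measurement state by dividing by $\psi_a$ is an assumption of this sketch, valid only when $\psi_a > 0$.

```python
import numpy as np

def projector(a, n):
    """P_a = |a><a|: the only nonzero entry is a one in diagonal position (a, a)."""
    basis_a = np.zeros(n)
    basis_a[a] = 1.0
    return np.outer(basis_a, basis_a)

n = 3
psi = np.array([0.5, 0.125, 0.375])         # an arbitrary probability vector
projectors = [projector(a, n) for a in range(n)]

for a, P_a in enumerate(projectors):
    shadow = P_a @ psi                      # equals psi[a] * |a>
    probability = shadow.sum()              # psi[a], the probability of outcome a
    post_state = shadow / probability       # |a>, the state after outcome a (needs psi[a] > 0)
    print(a, probability, post_state)

# The outcome probabilities over all a add up to one.
total = sum((P_a @ psi).sum() for P_a in projectors)
print(total)                                # 1.0
```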

Again, this framework may seem excessive, since we've been able to "calculate" measurement probabilities simply by looking at the coefficient for the classical state. The situation becomes more complex in the quantum setting, as we will see in later lessons.

<b><font size = '6'>Decomposing probabilistic operations in terms of deterministic ones</font></b> 

<!-- ::: q-block.exercise -->

### 1. 
<font size = '3'>
Consider the following stochastic matrix.

$$ M = \frac{1}{2} \begin{pmatrix}
    1 & 1 \\
    1 & 1
    \end{pmatrix}
$$

Interpret $M$ physically, in words, and then decompose $M$ into a linear combination of deterministic operations in two different ways. Assign physical interpretations to each of them. Combine these two distinct decompositions to produce yet another one. How many ways does it seem we can do this?
</font>

<details>
    <summary><u><font size = '2'>Physical interpretation</font></u></summary>
$M$ completely randomizes any binary state. It could represent the flipping of a coin.
</details>
<br>
<details>
    <summary><u><font size = '2'>Decomposition 1</font></u></summary>
One way we can go about this is "filling" the top row with one matrix, then "filling" the bottom row with the other. This looks like   

$$
    \frac{1}{2}\begin{pmatrix}
        1 & 1 \\
        1 & 1
    \end{pmatrix} = \frac{1}{2}\begin{pmatrix}
        1 & 1 \\
        0 & 0
    \end{pmatrix} + \frac{1}{2} \begin{pmatrix}
        0 & 0 \\
        1 & 1
    \end{pmatrix}
$$
<br>
Each of the two matrices on the right (ignoring the $1/2$ prefactor) is a deterministic operation: it "resets" the system to one of its two states, 0 or 1. So $M$ performs one reset or the other, each with probability $1/2$. 
</details>
<br>
<details>
    <summary><u><font size = '2'>Decomposition 2</font></u></summary>
Another way we can think about the "coin flip" operation is as follows: leave the coin alone with probability $1/2$, and turn the coin over with probability $1/2$. An equation for this concept is given by
    
$$
    \frac{1}{2} \begin{pmatrix}
        1 & 1 \\
        1 & 1
    \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 
        1 & 0 \\
        0 & 1
    \end{pmatrix} + \frac{1}{2} \begin{pmatrix}
        0 & 1 \\
        1 & 0
    \end{pmatrix}.
$$
   <br>
The first matrix on the right-hand side is the identity, while the second is the NOT operation (you may also see it called the Pauli $X$ operation). 
</details>
<br>
<details>
    <summary><u><font size = '2'>Combining decompositions</font></u></summary>
To combine the decompositions we made above, we can use the fact that
    
$$
    M = p M + (1-p) M
$$
<br>
for any $p \in [0,1]$. Though this may seem uninteresting, we can use a different decomposition of $M$ for each term on the right-hand side. This gives us
    
$$
    \frac{1}{2} \begin{pmatrix}
        1 & 1 \\
        1 & 1
    \end{pmatrix} = \frac{p}{2}\begin{pmatrix}
        1 & 1 \\
        0 & 0
    \end{pmatrix} + \frac{p}{2} \begin{pmatrix}
        0 & 0 \\
        1 & 1
    \end{pmatrix} + \frac{1-p}{2} \begin{pmatrix} 
        1 & 0 \\
        0 & 1
    \end{pmatrix} + \frac{1-p}{2} \begin{pmatrix}
        0 & 1 \\
        1 & 0
    \end{pmatrix}.
$$
   <br>
As an extra challenge, can you show that <i>any</i> decomposition of $M$ into deterministic operations must take this form?
</details>

<!-- ::: -->
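If you want to double-check the algebra in the answers above, the following sketch verifies both decompositions and a few members of the combined family numerically. The matrix names are illustrative labels, and the sampled values of $p$ are arbitrary.

```python
import numpy as np

M = 0.5 * np.array([[1, 1],
                    [1, 1]])

# The four deterministic operations on one bit.
IDENTITY = np.array([[1, 0], [0, 1]])
NOT      = np.array([[0, 1], [1, 0]])
RESET_0  = np.array([[1, 1], [0, 0]])
RESET_1  = np.array([[0, 0], [1, 1]])

decomposition_1 = 0.5 * RESET_0 + 0.5 * RESET_1
decomposition_2 = 0.5 * IDENTITY + 0.5 * NOT
print(np.allclose(decomposition_1, M), np.allclose(decomposition_2, M))   # True True

# The combined family: one decomposition for each value of p in [0, 1].
for p in np.linspace(0.0, 1.0, 5):
    combined = (p / 2) * (RESET_0 + RESET_1) + ((1 - p) / 2) * (IDENTITY + NOT)
    assert np.allclose(combined, M), p
print("every sampled p reproduces M")
```

A numerical check over a handful of $p$ values is, of course, no substitute for the one-line algebraic argument, but it catches sign and index slips quickly.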

<!-- ::: q-block.exercise -->

### 2.

<font size = '3'>
Can a deterministic operation $D$ be written as a probabilistic combination (a weighted sum whose weights are probabilities) of (different) deterministic operations? Prove that if $D = p D_1 + (1-p) D_2$ for deterministic operations $D_1, D_2$ and $p \in (0,1)$, then $D = D_1 = D_2$. Summarize the implication of this result: what does this statement mean to you? Can you generalize to sums of <i>more than two</i> operations?
</font>
    
<details>
    <summary><u><font size = '2'>Tip 1</font></u></summary>
If you're unsure how to proceed, try some examples first. What if $D$ is the NOT operation? What's preventing me from writing it as $p D_1 + (1-p) D_2$?
</details>
<br>
<details>
    <summary><u><font size = '2'>Tip 2</font></u></summary>
Another helpful strategy may be to think about the consequences of the opposite case. What if $D$ <i>could</i> be written as $p D_1 + (1-p) D_2$? Would that lead to strange consequences? 
</details>
<br>
<details> 
    <summary><u><font size = '2'>Hint 1</font></u></summary>
It seems like an operation such as $p D_1 + (1-p) D_2$ should turn at least some classical states (standard basis vectors) into probabilistic mixtures. If $D_1 \neq D_2$, then $D_1$ and $D_2$ must send at least one classical state $\vert a\rangle$ to two different classical states. That is, $D_1 \vert a\rangle \neq D_2 \vert a \rangle$. Call these two output states $\vert a_1 \rangle$ and $\vert a_2\rangle$, respectively.
</details>
<br>
<details>
    <summary><u><font size = '2'>Hint 2</font></u></summary>
On the other hand, $D\vert a\rangle$ is a standard basis vector by assumption. This would imply $D\vert a \rangle = p \vert a_1 \rangle + (1-p)\vert a_2 \rangle$. What is wrong with this?
</details>
<br>
<details>
    <summary><u><font size = '2'>Full proof</font></u></summary>
Suppose $D$ is a deterministic operation such that $D = p D_1 + (1-p) D_2$, where $p \in (0,1)$ and $D_1$, $D_2$ are deterministic operations. Let's consider the action of $D$ on some classical state $\vert a\rangle$. Since $D, D_1$ and $D_2$ are all deterministic, they must transform $\vert a \rangle$ into another classical state. Let $Da, D_1 a$ and $D_2 a$ be these classical states. Then, in slightly odd but useful notation, $D \vert a \rangle = \vert D a \rangle$, and similarly for $D_1$ and $D_2$. That is,
$$
    \vert D a\rangle = p \vert D_1 a \rangle + (1-p) \vert D_2 a \rangle.
$$  
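Recall that the classical state vectors form a linearly independent set. If $\vert D_1 a \rangle \neq \vert D_2 a \rangle$, then, since $p \neq 0$ and $p \neq 1$, the right-hand side would have two nonzero entries, $p$ and $1-p$, while the left-hand side is a single classical state with just one nonzero entry; by linear independence this is impossible. Hence $\vert D_1 a \rangle = \vert D_2 a \rangle$, the right-hand side reduces to this same vector, and so $\vert D a \rangle = \vert D_1 a\rangle = \vert D_2 a \rangle$. We conclude that $D, D_1$ and $D_2$ all act the same way on the state $\vert a \rangle$. 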
    
Since $\vert a \rangle$ was an arbitrary standard basis vector, all three operators act the same way on every element of a basis. By linearity, this implies $D = D_1 = D_2$, as we set out to show. 
</details>
<br>
<details>
    <summary><u><font size = '2'>Interpretation</font></u></summary>
It would be quite strange if a deterministic operation could, in our experience, be given by different deterministic operations applied with some probability. Doing so seems bound to introduce randomness. This is exactly what our proof expresses. 
    
Do you have a different interpretation of this result? Can you support your claim?
</details>

<!-- ::: -->
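As a sanity check (not a proof), we can brute-force the statement for a three-state system: enumerate all $3^3 = 27$ deterministic matrices, mix every pair of distinct ones with a few sample values of $p$, and confirm that no mixture is deterministic. The helper names `deterministic_matrices` and `is_deterministic` are our own, and the sampled values of $p$ are arbitrary.

```python
import itertools
import numpy as np

def deterministic_matrices(n):
    """Yield every n x n deterministic matrix: column b is the basis vector |f(b)> for a function f."""
    for f in itertools.product(range(n), repeat=n):
        d = np.zeros((n, n))
        d[np.array(f), np.arange(n)] = 1.0
        yield d

def is_deterministic(m):
    """True if every entry is 0 or 1 and each column contains exactly one 1."""
    return bool(np.all((m == 0) | (m == 1)) and np.all(m.sum(axis=0) == 1))

n = 3
ds = list(deterministic_matrices(n))            # n**n = 27 matrices
found = []
for d1, d2 in itertools.combinations(ds, 2):    # all pairs with d1 != d2
    for p in (0.1, 0.5, 0.9):
        mix = p * d1 + (1 - p) * d2
        if is_deterministic(mix):
            found.append((d1, d2, p))
print(len(ds), len(found))                       # 27 0
```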

<!-- ::: q-block.exercise -->

### 3.

Using the result of the above exercise, together with the fact that any stochastic matrix may be expressed as a probabilistic (convex) combination of deterministic operations (as discussed in the text), show that a deterministic operation $D$ cannot be expressed as $D = p S_1 + (1-p) S_2$ for two different stochastic matrices $S_1 \neq S_2$ and $p \in (0,1)$. 

<!-- ::: -->
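The exercise relies on writing a stochastic matrix as a probabilistic combination of deterministic ones. One concrete construction, which we assume here and which may differ from the one in the text, assigns to each function $f:\Sigma\to\Sigma$ the deterministic matrix $D_f$ sending $\vert b\rangle$ to $\vert f(b)\rangle$, with weight $\prod_{b} S_{f(b),\,b}$. The weights sum to one because each column of $S$ does. The function name `deterministic_decomposition` is hypothetical, and the loop is exponential in the number of states, so this is only for small examples.

```python
import itertools
import numpy as np

def deterministic_decomposition(S):
    """Return (weight, D) pairs with sum(weight * D) == S, using the product-weight construction."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    terms = []
    for f in itertools.product(range(n), repeat=n):       # every function f on the classical states
        weight = np.prod([S[f[b], b] for b in range(n)])   # product over b of S[f(b), b]
        if weight > 0:
            D = np.zeros((n, n))
            D[np.array(f), np.arange(n)] = 1.0             # D_f sends |b> to |f(b)>
            terms.append((weight, D))
    return terms

S = np.array([[0.5, 0.2],
              [0.5, 0.8]])
terms = deterministic_decomposition(S)
print(sum(w for w, _ in terms))                        # weights sum to one (up to rounding)
print(np.allclose(sum(w * D for w, D in terms), S))    # True
```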

There is a helpful lesson in this, for which I will have to introduce jargon that is used frequently in quantum information theory (and other mathematical disciplines). Let $\mathcal{S}$ denote the set of stochastic matrices. This is a *convex set*, meaning for any two $S_1, S_2 \in \mathcal{S}$, the "chord" given by the set
$$ \{\lambda S_1 + (1-\lambda) S_2 |\lambda \in [0,1]\}
$$
is also contained in $\mathcal{S}$, i.e., every point on the chord is a stochastic matrix. The set $\mathcal{D}$ of deterministic operations forms a subset of $\mathcal{S}$, and is exactly the set of *extreme points* of $\mathcal{S}$. That means no element of $\mathcal{D}$ can be expressed as a point strictly inside a chord between two other stochastic matrices, i.e., as $\lambda S_1 + (1-\lambda) S_2$ with $\lambda \in (0,1)$ and $S_1 \neq S_2$. We essentially showed this above. 
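The convexity claim is easy to spot-check numerically: draw two random stochastic matrices and confirm that sampled points on the chord between them are stochastic. This is a sampling check rather than a proof; the helper `random_stochastic` and the way it draws matrices (uniform entries, then each column normalized) are our own arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_stochastic(n, rng):
    """Draw an n x n matrix with nonnegative entries, then normalize each column to sum to one."""
    m = rng.random((n, n))
    return m / m.sum(axis=0, keepdims=True)

S1, S2 = random_stochastic(3, rng), random_stochastic(3, rng)
for lam in np.linspace(0.0, 1.0, 11):
    chord_point = lam * S1 + (1 - lam) * S2
    assert np.all(chord_point >= 0)                  # entries stay nonnegative
    assert np.allclose(chord_point.sum(axis=0), 1)   # columns still sum to one
print("all sampled chord points are stochastic")
```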

The geometric picture provided by the idea of convex sets is a fruitful one, and will be returned to in later lessons.

<!-- ::: q-block.exercise -->

### Input-output representation of a linear map.

<font size = '3'>
Let $A$ be a linear transformation on a vector space $V$, and suppose $\vert v_1 \rangle, \vert v_2 \rangle, \dots, \vert v_n \rangle$ form an orthonormal basis of $V$. 
</font>

### 1.

<font size = '3'>
Show that

$$ A = \sum_{i=1}^n \vert A v_i \rangle \langle v_i\vert,
$$

where $\vert A v_i\rangle$ is just alternative notation for $A \vert v_i \rangle$. Verify this equality by applying the right-hand side to an arbitrary basis vector $\vert v_k \rangle$ and comparing the result with $A \vert v_k \rangle$. We might call this expression for the operator $A$ the <i>input-output representation</i>. 
</font>

### 2.

<font size = '3'>
Use this result to find a representation of the identity operation $I$ in terms of the basis vectors. This is sometimes called a "resolution of the identity". The resolution of the identity is useful in a variety of contexts.
</font>

### 3.

<font size = '3'>
In the special case where the basis vectors are classical states $\vert a \rangle$ for $a \in \Sigma$, and when $A$ is a deterministic operation, what form does the input-output representation take?
</font>
<!-- ::: -->
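Before (or after) writing the proof for part 1, you may find it reassuring to check the identity numerically. The sketch below assumes a *real* vector space, so each bra $\langle v_i\vert$ is just the transpose of $\vert v_i\rangle$; for complex vectors you would use `np.outer(A @ v, v.conj())` instead. The random matrix $A$ and the orthonormal basis (the columns of a QR factor) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 4

A = rng.standard_normal((n, n))                      # an arbitrary linear map on R^n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))     # columns of Q: an orthonormal basis |v_1>, ..., |v_n>
basis = [Q[:, i] for i in range(n)]

# Right-hand side: the sum over i of the outer products |A v_i><v_i|.
rhs = sum(np.outer(A @ v, v) for v in basis)
print(np.allclose(rhs, A))                           # True

# Acting on one basis vector |v_k> keeps only the k-th term, since <v_i|v_k> = delta_ik.
k = 2
print(np.allclose(rhs @ basis[k], A @ basis[k]))     # True
```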

## Where should I go next?

Do you feel at ease with much of the material in the previous sections, especially the Mathematical Skills and Conceptual Understanding? You are likely ready to begin your studies of information in *quantum* systems. As you move into quantum material, think back periodically to the work you did in this classical section. Try to observe key similarities and differences between the quantum and classical versions of information theory. Those connections will provide a strong basis as you continue your study of quantum information and computation!