## Supplementary Information: VQT

## Constructing the Variational Quantum Thermalizer

Now that we understand what we are actually building, let's look at how the paper constructs the VQT. As mentioned earlier, the process begins by creating a simple mixed state, so let's see how that state is built. The idea is to pick some $\rho_\theta$, parametrized by $\theta$, such that:

<br>
$$\rho_\theta \ = \ \frac{1}{\mathcal{Z}_\theta} e^{- K_\theta}$$
<br>

Where $\mathcal{Z}_\theta$ is simply the partition function that normalizes our density matrix, $\mathcal{Z}_{\theta} \ = \ \text{Tr}(e^{- K_\theta})$. $K_\theta$ is just some simple Hamiltonian. 

Our motivation for constructing the simple mixed state this way is that it has the same structure as the thermal state, which is also the (normalized) exponential of a Hamiltonian. In addition, recall that a density matrix evolves by being multiplied on the left by the unitary that is "doing the evolving" and on the right by its adjoint. If you're not familiar with this: since the evolution $|\psi\rangle \ \rightarrow \ |\psi'\rangle$ is given by $|\psi'\rangle \ = \ U|\psi\rangle$, a density matrix evolves as:

<br>
$$\rho \ = \ \displaystyle\sum_{i} p_i |\psi_i\rangle \langle \psi_i | \ \rightarrow \ \displaystyle\sum_{i} p_i |\psi_i'\rangle \langle \psi_i' | \ = \ \displaystyle\sum_{i} p_i U |\psi_i\rangle \langle \psi_i | U^{\dagger} \ = \ U \rho U^{\dagger}$$
<br>
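As a quick numerical sanity check of this identity, here is a minimal NumPy sketch that evolves each pure state of an ensemble individually and compares the rebuilt mixture with $U \rho U^{\dagger}$. The helpers `random_state` and `random_unitary`, the ensemble weights, and the dimension are illustrative choices, not anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(dim):
    """Return a normalized random pure state vector of length dim."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def random_unitary(dim):
    """Return a random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale columns by phases; the result is still unitary

dim = 4
probs = np.array([0.5, 0.3, 0.2])
states = [random_state(dim) for _ in probs]
U = random_unitary(dim)

# rho = sum_i p_i |psi_i><psi_i|
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(probs, states))

# Route 1: evolve each pure state, then rebuild the mixture.
rho_evolved_states = sum(p * np.outer(U @ psi, (U @ psi).conj()) for p, psi in zip(probs, states))

# Route 2: evolve the density matrix directly.
rho_evolved_direct = U @ rho @ U.conj().T

print(np.allclose(rho_evolved_states, rho_evolved_direct))  # True
```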

Getting back to $K_{\theta}$, we need a method for selecting it. Luckily, the paper gives us a natural choice for this Hamiltonian:

<br>
$$K_{\theta} \ = \ \displaystyle\sum_{x} \theta_x |x\rangle \langle x|$$
<br>

Where each $|x\rangle$ is a computational basis state of our qubit system. This Hamiltonian is very simple: it is diagonal in the basis we are used to working in, and all we have to do is attach one variational parameter to the projector $|x\rangle \langle x|$ onto each basis state. Because these projectors are orthogonal ($|x\rangle \langle x| \, |y\rangle \langle y| \ = \ \delta_{xy} |x\rangle \langle x|$), every power of $K_\theta$ is again diagonal, and we can construct our initial mixed state as follows:

<br>
$$\rho_{\theta} \ = \ \frac{1}{\mathcal{Z}_{\theta}} e^{-K_{\theta}} \ = \ \frac{1}{\mathcal{Z}_{\theta}} \displaystyle\sum_{n \ = \ 0}^{\infty} \ \frac{(-K_{\theta})^n}{n!} \ = \ \frac{1}{\mathcal{Z}_{\theta}} \displaystyle\sum_{n \ = \ 0}^{\infty} \ \frac{(-\sum_{k} \theta_k |k\rangle \langle k|)^n}{n!} \ = \ \frac{1}{\mathcal{Z}_{\theta}} \displaystyle\sum_{n \ = \ 0}^{\infty} \displaystyle\sum_{k} \ \frac{(-\theta_k)^n}{n!} |k\rangle \langle k| \ = \ \frac{1}{\mathcal{Z}_{\theta}} \displaystyle\sum_{k} e^{-\theta_k} |k\rangle \langle k |$$
<br>
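A minimal NumPy sketch of this construction, assuming an arbitrary choice of four values for $\theta$ (two qubits): because $K_\theta$ is diagonal, the matrix exponential reduces to exponentiating each diagonal entry. We check this against a general eigendecomposition-based matrix exponential; the helper `expm_hermitian` is our own, not a library function.

```python
import numpy as np

def expm_hermitian(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(np.exp(evals)) @ evecs.conj().T

theta = np.array([0.3, 1.2, -0.5, 2.0])  # one parameter per basis state |00>, |01>, |10>, |11>
K = np.diag(theta)                        # K_theta = sum_x theta_x |x><x|

# General route: rho = e^{-K} / Tr(e^{-K})
expK = expm_hermitian(-K)
rho_general = expK / np.trace(expK)

# Diagonal shortcut: rho = sum_k e^{-theta_k} |k><k| / Z_theta
weights = np.exp(-theta)
Z = weights.sum()
rho_diag = np.diag(weights / Z)

print(np.allclose(rho_general, rho_diag))  # True
```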

This is great! To use this state, all we have to do is sample $k$ from the probability distribution $p_k \ = \ \frac{1}{\mathcal{Z}_\theta} e^{-\theta_k}$, prepare the corresponding state $|k\rangle$ (which is easy, since it is just a computational basis state), and pass it through the ansatz $\hat{U}(\phi)$, which we will discuss soon. Repeating this process many times lets us estimate expectation values over the mixed state, and therefore the cost function.
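The sampling loop described above can be sketched classically as follows. This is a NumPy simulation, not a hardware implementation, and it rests on two assumptions: a random unitary stands in for the ansatz $\hat{U}(\phi)$, and a random Hermitian matrix stands in for $\hat{H}$. For each sampled $k$ we compute the exact expectation $\langle k | \hat{U}^{\dagger} \hat{H} \hat{U} | k \rangle$ instead of estimating it from repeated measurements, then compare the Monte Carlo average with the exact value $\text{Tr}(\hat{H} \hat{U} \rho_\theta \hat{U}^{\dagger})$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_qubits = 2
dim = 2 ** n_qubits

# Hypothetical stand-ins for the problem Hamiltonian H and the ansatz unitary U(phi).
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))

theta = rng.normal(size=dim)
probs = np.exp(-theta) / np.exp(-theta).sum()  # k ~ e^{-theta_k} / Z_theta

def energy_of_basis_state(k):
    """<k| U^dagger H U |k>: the energy after passing |k> through the ansatz."""
    psi = U[:, k]  # U|k> is the k-th column of U
    return np.real(psi.conj() @ H @ psi)

# Monte Carlo estimate: sample k, prepare |k>, apply U, record the energy.
samples = rng.choice(dim, size=20_000, p=probs)
estimate = np.mean([energy_of_basis_state(k) for k in samples])

# Exact value Tr(H U rho_theta U^dagger) for comparison.
rho_theta = np.diag(probs)
exact = np.real(np.trace(H @ U @ rho_theta @ U.conj().T))

print(estimate, exact)  # agree up to sampling error
```

On real hardware each $\langle k | \hat{U}^{\dagger} \hat{H} \hat{U} | k \rangle$ would itself be estimated from measurement shots, which adds a second source of statistical noise.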

#### The Cost Function

Let's talk more about the cost function. We will define our cost function to be the following, which we call the **relative free energy** of our system:

<br>
$$\mathcal{L}(\theta, \ \phi) \ = \ \beta \langle \hat{H} \rangle \ - \ S_\theta \ = \ \beta \ \text{Tr} (\hat{H} \ \rho_{\theta \phi}) \ - \ S_\theta \ = \ \beta \ \text{Tr}( \hat{H} \ \hat{U}(\phi) \rho_{\theta} \hat{U}(\phi)^{\dagger} ) \ - \ S_\theta$$
<br>
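To make the cost function concrete, here is a minimal NumPy sketch that evaluates $\mathcal{L}(\theta, \ \phi)$ exactly for small systems. It assumes the diagonal latent state above and treats the ansatz as an arbitrary unitary matrix `U`, a hypothetical stand-in for $\hat{U}(\phi)$. The entropy is computed from the latent probabilities alone; the justification for that shortcut (unitary invariance of entropy) is given later in this section.

```python
import numpy as np

def latent_probs(theta):
    """p_k = e^{-theta_k} / Z_theta for the diagonal latent state."""
    w = np.exp(-theta)
    return w / w.sum()

def vqt_loss(theta, U, H, beta):
    """Relative free energy L = beta * Tr(H U rho_theta U^dag) - S_theta."""
    p = latent_probs(theta)
    rho = U @ np.diag(p) @ U.conj().T
    energy = np.real(np.trace(H @ rho))
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))  # depends on theta only
    return beta * energy - entropy

# Usage: one qubit, H = Pauli Z, identity ansatz, maximally mixed latent state.
H = np.diag([1.0, -1.0])
U = np.eye(2)
print(vqt_loss(np.array([0.0, 0.0]), U, H, beta=1.0))  # equals -log(2) here
```

For the maximally mixed qubit the energy term vanishes because $\text{Tr}(Z) = 0$, so only the entropy $\log 2$ contributes.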

This is very similar to the concept of **Helmholtz free energy** from thermal physics, which is given as:

<br>
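$$F \ = \ U \ - \ TS \ = \ U \ - \ \frac{S}{\beta}$$
<br>

Where $U$ is the internal energy of the system (not to be confused with the unitary $\hat{U}(\phi)$), $T$ is the temperature, $\beta \ = \ 1/T$ is the inverse temperature (in units where $k_B \ = \ 1$), and $S$ is the entropy. Mapping this to the language of quantum states, we get: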

<br>
$$F \ = \ \langle \hat{H} \rangle \ - \ \frac{S_\theta}{\beta}$$
<br>

Where the internal energy is mapped to the expectation value of the energy of our quantum system, and the entropy is mapped to the Von Neumann entropy of our state. Multiplying by $\beta$ recovers our loss function, $\mathcal{L} \ = \ \beta F$. Now, let's convince ourselves that the thermal state of $\hat{H}$ at temperature $T \ = \ 1/\beta$ minimizes $\mathcal{L}$. To do so, we consider the **relative entropy** between a trial state and the thermal state. This is an extension of the **Von Neumann entropy** of a quantum state, which is defined as:

<br>
$$S(\rho) \ = \ - \text{Tr} (\rho \log \rho) \ = \ - \displaystyle\sum_{n} p_n \log p_n$$
<br>

Where:

<br>
$$p_n \ = \ \langle n | \rho | n \rangle$$
<br>

Is the probability of obtaining the outcome $|n\rangle$ when $\rho$ is measured in its own eigenbasis; in other words, the $p_n$ are the eigenvalues of $\rho \ = \ \sum_n p_n |n\rangle \langle n|$. The entropy measures how "spread out" the state is over pure states: when the probabilities $p_n$ are distributed over many eigenstates, the Von Neumann entropy is higher, whereas if our state (let's call it $\sigma$) is completely pure, then:
<br>
$$S(\sigma) \ = \ - \displaystyle\sum_{n} p_n \log p_n \ = \ - p_k \log p_k \ = \ - 1 \cdot \log 1 \ = \ -1 \cdot 0 \ = \ 0$$
<br>
Because, by definition, a pure state has one eigenvalue $p_k$ equal to $1$ while all the others are $0$ (and we use the convention $0 \log 0 \ = \ 0$). The entropy is thus zero! In a mixed state, every nonzero $p_n$ satisfies $p_n \ < \ 1$, so $\log p_n$ is negative and each term $-p_n \log p_n$ is positive, making the entropy greater than $0$. Thus, the entropy is minimized exactly when we have a pure state!
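A short NumPy sketch of this definition, computing the entropy from the eigenvalues of $\rho$ (with $0 \log 0 := 0$, implemented by dropping numerically zero eigenvalues). The function name `von_neumann_entropy` and the tolerance are our own choices. Note that $|+\rangle\langle+|$ has diagonal entries $1/2$ in the computational basis but is still pure, which is why the $p_n$ must be eigenvalues rather than diagonal entries in an arbitrary basis.

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """S(rho) = -sum_n p_n log p_n over the eigenvalues p_n of rho (0 log 0 := 0)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > tol]
    return float(-np.sum(p * np.log(p)))

psi = np.array([1.0, 1.0]) / np.sqrt(2)
pure = np.outer(psi, psi.conj())   # |+><+|: pure, but not diagonal in the computational basis
partly_mixed = np.diag([0.9, 0.1])
mixed = np.diag([0.5, 0.5])        # maximally mixed qubit

print(von_neumann_entropy(pure))          # ~0
print(von_neumann_entropy(partly_mixed))  # between 0 and log 2
print(von_neumann_entropy(mixed))         # log 2, about 0.693
```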

Getting back on topic, we must consider the relative entropy between two states. Rather than being the entropy of a single state, it measures how distinguishable one state is from another, which we treat as the "reference point": it is zero if and only if the two states are identical, and positive otherwise. For instance, the relative entropy of a state with respect to an identical copy of itself is $0$. (It is not symmetric in its two arguments, so it is not a true distance.) We define this relative entropy as:

$$D(\rho_1 || \rho_2) \ = \ \text{Tr} (\rho_1 \log \rho_1) \ - \ \text{Tr}(\rho_1 \log \rho_2)$$

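If we take the relative entropy of some state $\rho$ with respect to the thermal state $\rho_{\text{Thermal}} \ = \ e^{-\beta \hat{H}}/\mathcal{Z}$, whose logarithm is $\log \rho_{\text{Thermal}} \ = \ -\beta \hat{H} \ - \ \log \mathcal{Z}$, and use $\text{Tr}(\rho) \ = \ 1$, we get:

$$D(\rho || \rho_{\text{Thermal}}) \ = \ \text{Tr} (\rho \log \rho) \ - \ \text{Tr}(\rho \log \rho_{\text{Thermal}}) \ = \ \text{Tr} (\rho \log \rho) \ - \ \text{Tr}\big(\rho \, (-\beta \hat{H} \ - \ \log \mathcal{Z})\big) \ = \ \beta \text{Tr}(\rho \hat{H}) \ - \ S(\rho) \ + \ \log \mathcal{Z}$$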

Now, since the relative entropy is always greater than or equal to $0$, with equality only when the two states are identical (Klein's inequality), this quantity is minimized, with value $0$, exactly when $\rho$ equals the thermal state. Thus, if we want to variationally learn the thermal state, all we have to do is minimize $\beta \text{Tr}(\rho \hat{H}) \ - \ S(\rho)$ (we can omit $\log \mathcal{Z}$, since it is a constant that does not depend on $\rho$). This is exactly the loss function we proposed initially, so minimizing it does indeed drive $\rho$ to the desired thermal state! This is like a more general version of the variational principle: any state that we prepare will have a "free energy" with respect to $\hat{H}$ greater than or equal to that of the thermal state, just as the expected energy of any pure state is greater than or equal to the ground-state energy!
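The identity $D(\rho || \rho_{\text{Thermal}}) \ = \ \beta \text{Tr}(\rho \hat{H}) \ - \ S(\rho) \ + \ \log \mathcal{Z}$ and its non-negativity are easy to check numerically. The sketch below assumes a random Hermitian $\hat{H}$ and a random full-rank density matrix $\rho$ (both purely illustrative), computes matrix logarithms through eigendecompositions with a helper `herm_fn` of our own, and compares the directly computed relative entropy with the right-hand side.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, beta = 4, 0.7

def herm_fn(A, f):
    """Apply a scalar function f to a Hermitian matrix through its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(f(evals)) @ evecs.conj().T

def random_density_matrix(dim):
    """A random full-rank density matrix rho = G G^dag / Tr(G G^dag)."""
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho)

def relative_entropy(r1, r2):
    """D(r1 || r2) = Tr(r1 log r1) - Tr(r1 log r2)."""
    return np.real(np.trace(r1 @ herm_fn(r1, np.log)) - np.trace(r1 @ herm_fn(r2, np.log)))

A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2

Z = np.real(np.trace(herm_fn(-beta * H, np.exp)))
rho_thermal = herm_fn(-beta * H, np.exp) / Z
rho = random_density_matrix(dim)

S = -np.real(np.trace(rho @ herm_fn(rho, np.log)))
rhs = beta * np.real(np.trace(rho @ H)) - S + np.log(Z)

print(np.isclose(relative_entropy(rho, rho_thermal), rhs))  # True
print(relative_entropy(rho, rho_thermal) >= 0)               # True
```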

For fun, let's now calculate the loss function for the exact representation of the thermal state corresponding to $\hat{H}$ and $\beta$:

$$\mathcal{L}_{\text{Thermal}} \ = \ \beta \text{Tr}(\rho_{\text{Thermal}} \hat{H}) \ + \ \text{Tr} ( \rho_{\text{Thermal}} \ \log \rho_{\text{Thermal}} ) \ = \ \beta \text{Tr}\Bigg( \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \hat{H} \Bigg) \ + \ \text{Tr} \Bigg( \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \log \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \Bigg)$$

Where the partition function is given as:

$$\mathcal{Z}_{\beta} \ = \ \text{Tr}(e^{-\beta \hat{H}})$$

Continuing this expansion, we get:

$$\beta \text{Tr}\Bigg( \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \hat{H} \Bigg) \ + \ \text{Tr} \Bigg( \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \log \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \Bigg) \ = \ \frac{1}{\mathcal{Z}_{\beta}} \ \Bigg[ \beta  \text{Tr} \big( \hat{H} \ e^{ -\beta \hat{H}} \big) \ + \ \text{Tr} \Bigg( e^{-\beta \hat{H}} \ \log \frac{e^{-\beta \hat{H}}}{\mathcal{Z}_{\beta}} \Bigg) \Bigg] \ = \ \frac{1}{\mathcal{Z}_{\beta}} \ \Bigg[ \beta  \text{Tr} \big( \hat{H} \ e^{ -\beta \hat{H}} \big) \ + \ \text{Tr} ( e^{-\beta \hat{H}} \ ( \log e^{-\beta \hat{H}} \ - \ \log \mathcal{Z}_{\beta} )) \Bigg]$$

$$= \ \frac{1}{\mathcal{Z}_{\beta}} \ \Bigg[ \beta  \text{Tr} \big( \hat{H} \ e^{ -\beta \hat{H}} \big) \ - \ \beta \text{Tr} \big( e^{-\beta \hat{H}} \hat{H} \big) \ - \ \text{Tr} \big( e^{-\beta \hat{H}} \log \mathcal{Z}_{\beta} \big) \Bigg] \ = \ - \frac{\text{Tr} \big( e^{-\beta \hat{H}} \big) \log \mathcal{Z}_{\beta}}{\mathcal{Z}_{\beta}} \ = \ - \log \mathcal{Z}_{\beta}$$

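This makes sense: the loss is related to the relative entropy by $D(\rho || \rho_{\text{Thermal}}) \ = \ \mathcal{L} \ + \ \log \mathcal{Z}_{\beta}$, so substituting $\mathcal{L}_{\text{Thermal}} \ = \ - \log \mathcal{Z}_{\beta}$ gives a relative entropy of $0$, exactly as expected when the state we have prepared is the thermal state itself. In other words, $- \log \mathcal{Z}_{\beta}$ is the smallest value the loss can take.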
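A short NumPy check of this calculation, assuming a random Hermitian $\hat{H}$ and an arbitrary $\beta$ (illustrative choices): the loss evaluated on the exact thermal state should equal $-\log \mathcal{Z}_\beta$, and the loss of randomly chosen density matrices should never be smaller. The thermal state is built in the eigenbasis of $\hat{H}$, where it is diagonal with weights $e^{-\beta E_n}/\mathcal{Z}_\beta$.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, beta = 4, 1.3

A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
energies, vecs = np.linalg.eigh(H)

def loss(rho):
    """L(rho) = beta * Tr(rho H) + Tr(rho log rho), with 0 log 0 := 0."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return beta * np.real(np.trace(rho @ H)) + np.sum(p * np.log(p))

# Thermal state in the eigenbasis of H: weights e^{-beta E_n} / Z_beta.
weights = np.exp(-beta * energies)
Z = weights.sum()
rho_thermal = vecs @ np.diag(weights / Z) @ vecs.conj().T

print(np.isclose(loss(rho_thermal), -np.log(Z)))  # True

# Random states never beat the thermal state.
for _ in range(5):
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    rho /= np.trace(rho)
    assert loss(rho) >= loss(rho_thermal) - 1e-12
```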

Using the Von Neumann entropy as part of our cost function is actually super convenient, because **entropy is invariant under unitary transformations**. This means that we don't need to "compute" the entropy of our state after passing it through our ansatz circuit. Instead, we can pick an initial state whose entropy is very easy to determine; the entropy is then a function of only our $\theta$ parameters, and is completely independent of our choice of $\phi$! We can prove this fairly easily. Consider some density matrix $\rho$ that evolves as $\rho \ \rightarrow \ \rho' \ = \ U \rho U^{\dagger}$. The entropy of our initial state is:

$$S(\rho) \ = \ - \text{Tr} ( \rho \log \rho)$$

And then consider the entropy of our transformed state:

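$$S(\rho') \ = \ - \text{Tr} (\rho' \log \rho') \ = \ - \text{Tr} ( U \rho U^{\dagger} \log (U \rho U^{\dagger})) \ = \ - \text{Tr} ( U \rho U^{\dagger} U (\log \rho) U^{\dagger}) \ = \ - \text{Tr} ( U \rho \log \rho \, U^{\dagger}) \ = \ - \text{Tr} ( \rho \log \rho \, U^{\dagger} U) \ = \ - \text{Tr} ( \rho \log \rho) \ = \ S(\rho)$$

Therefore the entropy is invariant under a unitary transformation. The key step is how the logarithm acts on a rotated matrix. If $\rho \ = \ \sum_n p_n |n\rangle \langle n|$, then $U \rho U^{\dagger} \ = \ \sum_n p_n \, U|n\rangle \langle n|U^{\dagger}$ has the same eigenvalues $p_n$ with rotated eigenvectors $U|n\rangle$, so:

$$\log (U \rho U^{\dagger}) \ = \ \displaystyle\sum_{n} \log p_n \, U|n\rangle \langle n|U^{\dagger} \ = \ U (\log \rho) \, U^{\dagger}$$

Note that, in general, $\log (U \rho U^{\dagger}) \ \neq \ \log \rho$; the unitaries only cancel once they meet inside the trace.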

We also used the cyclic property of the trace, $\text{Tr}(AB) \ = \ \text{Tr}(BA)$, which is also easy to prove:

$$\text{Tr}(AB) \ = \ \displaystyle\sum_{j} (AB)_{jj} \ = \ \displaystyle\sum_{j, \ k} A_{jk} B_{kj} \ = \ \displaystyle\sum_{j, \ k} B_{kj} A_{jk} \ = \ \displaystyle\sum_{k} (BA)_{kk} \ = \ \text{Tr} (BA)$$
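Both facts used in this proof can be checked numerically. The sketch below, assuming a random full-rank density matrix and a random unitary (illustrative choices, with helper names of our own), verifies that $\log(U \rho U^{\dagger}) \ = \ U (\log \rho) U^{\dagger}$, that the entropy is unchanged by the rotation, and that the trace is cyclic.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 4

def herm_fn(A, f):
    """Apply a scalar function f to a Hermitian matrix through its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(f(evals)) @ evecs.conj().T

def entropy(rho):
    """Von Neumann entropy from the eigenvalues of rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = G @ G.conj().T
rho /= np.trace(rho)
U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
rho_prime = U @ rho @ U.conj().T

# The logarithm of the rotated state is the rotated logarithm.
log_rotated = herm_fn(rho_prime, np.log)
rotated_log = U @ herm_fn(rho, np.log) @ U.conj().T
print(np.allclose(log_rotated, rotated_log))         # True

# Entropy is unchanged by the unitary.
print(np.isclose(entropy(rho_prime), entropy(rho)))  # True

# Cyclic property of the trace.
B = rng.normal(size=(dim, dim))
print(np.isclose(np.trace(rho @ B), np.trace(B @ rho)))  # True
```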

#### Scalability of the Algorithm

We have now outlined a pretty great process for learning thermal states. However, we have yet to address one large issue: **scalability**. As it stands, the algorithm scales poorly. Specifically, our "basic" Hamiltonian currently assigns a parameter $\theta_j$ to each computational basis state. This means that for $n$ qubits, we require $2^n$ parameters! That design scales **incredibly** poorly. However, we can make the number of $\theta$ parameters scale **linearly** by **factorizing the latent space**. This essentially means that we break our initial mixed state into multiple subcomponents, each determined by its own small set of parameters $\theta_j$. If each component of our latent space is given by the mixed state $\rho_j$, then our total initial state will be given as:

$$\rho(\theta) \ = \ \displaystyle\bigotimes_{j} \ \rho_j (\theta_j)$$

Where, for the purposes of our simulations, we will use a **factorized latent space** of exactly this form. In addition to breaking our initial mixed state into smaller parts, we also break our initial Hamiltonian $K_\theta$ into a series of terms $K_{\theta_j}$, each acting only on subsystem $j$, with the total Hamiltonian being the sum of all these terms:

$$K_\theta \ = \ \displaystyle\sum_{j} K_{\theta_j}$$

Remembering the exponential structure of our initial mixed state:

$$\rho_\theta \ = \ \frac{1}{\mathcal{Z}_{\theta}} e^{-K_{\theta}} \ = \ \frac{1}{\mathcal{Z}_{\theta}} e^{-{\sum_j K_{\theta_j}}} \ = \ \bigotimes_{j} \ \frac{1}{\mathcal{Z}_{\theta_j}} e^{-{K_{\theta_j}}}, \qquad \mathcal{Z}_{\theta} \ = \ \displaystyle\prod_{j} \mathcal{Z}_{\theta_j}$$

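It becomes a tensor product over the individual subspaces if we define each $K_{\theta_j}$ and $\rho_j$ only over the smaller Hilbert space in which it "operates" (for a qubit, the space spanned by $|0\rangle$ and $|1\rangle$). Terms acting on different subsystems commute, so the exponential of their sum factorizes into a tensor product of exponentials, and the partition function becomes the product of the individual $\mathcal{Z}_{\theta_j}$. We can now choose a basic form for our parametrized "sub-Hamiltonians":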
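A quick numerical check of this factorization, assuming two single-qubit diagonal sub-Hamiltonians with arbitrary illustrative parameters. We embed them in the two-qubit space with identities, $K_\theta \ = \ K_{\theta_1} \otimes I \ + \ I \otimes K_{\theta_2}$, and compare the normalized exponential of the sum with the tensor product of the individually normalized factors. The helper `expm_diag` only applies to diagonal matrices, which is all we need here.

```python
import numpy as np

def expm_diag(K):
    """Exponential of a diagonal matrix: exponentiate the diagonal entries."""
    return np.diag(np.exp(np.diag(K)))

I2 = np.eye(2)
K1 = np.diag([0.4, 1.1])   # K_{theta_1}, acting on qubit 1
K2 = np.diag([-0.3, 0.8])  # K_{theta_2}, acting on qubit 2

# Full two-qubit Hamiltonian: the two terms act on different subsystems and commute.
K_total = np.kron(K1, I2) + np.kron(I2, K2)

rho_full = expm_diag(-K_total)
rho_full /= np.trace(rho_full)

rho1 = expm_diag(-K1) / np.trace(expm_diag(-K1))
rho2 = expm_diag(-K2) / np.trace(expm_diag(-K2))

print(np.allclose(rho_full, np.kron(rho1, rho2)))  # True
```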

$$K_{\theta_j} \ = \ \displaystyle\sum_{k} \ \theta_{jk} |k\rangle \langle k|_{j}$$

After we do this, we may re-write our initial state:

$$\rho_{\theta_j} \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} e^{-K_{\theta_j}} \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} \displaystyle\sum_{n \ = \ 0}^{\infty} \ \frac{(-K_{\theta_j})^n}{n!} \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} \displaystyle\sum_{n \ = \ 0}^{\infty} \ \frac{(-\sum_{k} \theta_{jk} |k\rangle \langle k|_j)^n}{n!} \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} \displaystyle\sum_{n \ = \ 0}^{\infty} \displaystyle\sum_{k} \ \frac{(-\theta_{jk})^n}{n!} |k\rangle \langle k|_j \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} \displaystyle\sum_{k} e^{-\theta_{jk}} |k\rangle \langle k |_j$$

So, if we are operating in a Hilbert space of dimension $2$ (which is the case for qubits, although the algorithm can be generalized to $d$-level qudits), then in the basis $|k\rangle_j$ we have:

$$\rho_{\theta_j} \ = \ \frac{1}{\mathcal{Z}_{\theta_j}} \begin{pmatrix} e^{-\theta_{j0}} & 0 \\ 0 & e^{-\theta_{j1}} \end{pmatrix} \ = \ \begin{pmatrix} p_j(\theta_{j0}) & 0 \\ 0 & p_j(\theta_{j1}) \end{pmatrix}$$

This form suggests that we can make the algorithm even more scalable. Right now, we require $2n$ parameters in the set $\theta$; however, due to the normalization condition, it must be true that:

$$p_j(\theta_{j0}) \ + \ p_j(\theta_{j1}) \ = \ 1$$

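So one of the two parameters per qubit is redundant, and we wish to eliminate $\theta_{j1}$, leaving only one parameter for each mixed state. A simple way to do this is to reparametrize each qubit so that the probability of $|0\rangle$ is $p_j \ = \ e^{-\theta_{j0}}$ directly (which requires $\theta_{j0} \ \geq \ 0$), with normalization fixing the other probability:

$$p_j(\theta_{j1}) \ = \ 1 \ - \ e^{-\theta_{j0}}$$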

Each $\rho_j$ is then given as:

$$\rho_{\theta_j} \ = \ \begin{pmatrix} p_j(\theta_{j}) & 0 \\ 0 & 1 \ - \ p_j(\theta_{j}) \end{pmatrix} \ = \ \begin{pmatrix} e^{-\theta_{j}} & 0 \\ 0 & 1 \ - \ e^{-\theta_{j}} \end{pmatrix}$$

Where we have renamed $\theta_{j0}$ to $\theta_{j}$, since only one parameter remains. This is a huge improvement: we have reduced the number of $\theta$ parameters from $2^n$ to $n$!
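Putting the pieces together, here is a minimal sketch of the one-parameter-per-qubit latent state, assuming the parametrization $p_j \ = \ e^{-\theta_j}$ above (so each $\theta_j \ > \ 0$). It builds the full $2^n \times 2^n$ diagonal state as a Kronecker product only to check it, confirms that its entropy equals the sum of the single-qubit (binary) entropies, and shows that sampling a latent basis state only requires $n$ independent biased coin flips. The function names and parameter values are illustrative.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(5)

def qubit_latent_state(theta_j):
    """rho_j = diag(e^{-theta_j}, 1 - e^{-theta_j}); assumes theta_j >= 0."""
    p = np.exp(-theta_j)
    return np.diag([p, 1.0 - p])

def binary_entropy(p):
    """Entropy of one qubit with probabilities (p, 1 - p), using 0 log 0 := 0."""
    return -sum(x * np.log(x) for x in (p, 1.0 - p) if x > 0)

def sample_bitstring(theta):
    """Sample each qubit independently: bit 0 with probability e^{-theta_j}."""
    return [0 if rng.random() < np.exp(-t) else 1 for t in theta]

theta = np.array([0.2, 0.9, 1.5])  # n = 3 parameters instead of 2^3 = 8
rho = reduce(np.kron, [qubit_latent_state(t) for t in theta])

p_full = np.diag(rho)
S_full = -np.sum(p_full * np.log(p_full))
S_sum = sum(binary_entropy(np.exp(-t)) for t in theta)

print(rho.shape)                  # (8, 8)
print(np.isclose(S_full, S_sum))  # True
print(sample_bitstring(theta))    # a random bitstring, e.g. [0, 1, 1]
```

In practice the full matrix never needs to be formed: both the entropy $S_\theta$ and the sampling step work qubit by qubit, which is what makes the factorized latent space scale linearly.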

To conclude our discussion of the theory (so we can actually move on to the experiments), we note that the construction of our unitary $\hat{U}(\phi)$ varies from problem to problem, so we will discuss it in more depth for each simulation that we run.

#### VQT At Zero Temperature: The Variational Quantum Eigensolver

There is one more thing that we have to discuss before we start taking a look at some real simulations. This, in my humble opinion, is one of the coolest things about this entire quantum algorithm: as temperature approaches $0$, the VQT converges towards the well-known VQE! We can prove this more easily than you would think, if we investigate the cost function:

$$\mathcal{L}(\theta, \ \phi) \ = \ \beta \langle \hat{H} \rangle \ - \ S_\theta$$

We then divide by $\beta$ (for a fixed value of $\beta$, this is just an overall positive scaling, so the parameters that minimize the function remain unchanged). We get:

$$\frac{1}{\beta} \ \mathcal{L}(\theta, \ \phi) \ = \ \langle \hat{H} \rangle \ - \ \frac{1}{\beta} \ S_\theta \ = \ \langle \hat{H} \rangle \ - \ T S_\theta$$

So in the limit of zero temperature, since the Von Neumann entropy is bounded ($0 \ \leq \ S_\theta \ \leq \ n \log 2$ for $n$ qubits), the temperature-times-entropy term $T S_\theta$ approaches $0$, and we get:

$$\frac{1}{\beta} \ \mathcal{L}(\theta, \ \phi) \ \rightarrow \ \langle \hat{H} \rangle \ \ \ \ \text{for} \ \ \ T \ \rightarrow \ 0$$

Which is exactly the objective of the Variational Quantum Eigensolver: preparing some parametrized quantum state and then minimizing the expected value of the Hamiltonian to find the ground state! This tells us that the VQT is really a generalization of the VQE to mixed states at temperatures above absolute zero.
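As a numerical illustration of this limit, the sketch below assumes a random Hermitian $\hat{H}$ (an illustrative choice) and evaluates the minimum of $\mathcal{L}/\beta$, which by the earlier calculation of $\mathcal{L}_{\text{Thermal}}$ is $-\frac{1}{\beta} \log \mathcal{Z}_\beta$, for increasing $\beta$. It compares this with the ground-state energy $E_0$ that a VQE would target, and also reports the population of the ground state in the thermal state. Energies are shifted by $E_0$ before exponentiating to avoid overflow at large $\beta$.

```python
import numpy as np

rng = np.random.default_rng(6)
dim = 4
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
energies = np.linalg.eigvalsh(H)  # ascending order: energies[0] is E_0
E0 = energies[0]

def free_energy(beta):
    """Minimum of L / beta, i.e. -(1/beta) log Z_beta, computed stably via a shift by E_0."""
    return E0 - np.log(np.sum(np.exp(-beta * (energies - E0)))) / beta

def ground_population(beta):
    """<E_0| rho_thermal |E_0> = e^{-beta E_0} / Z_beta."""
    w = np.exp(-beta * (energies - E0))
    return w[0] / w.sum()

for beta in [0.1, 1.0, 10.0, 100.0]:
    print(beta, free_energy(beta) - E0, ground_population(beta))
```

As $\beta$ grows, the gap between the optimal free energy and $E_0$ shrinks toward $0$ (from below, since $\mathcal{Z}_\beta \ \geq \ e^{-\beta E_0}$) and the ground-state population approaches $1$, assuming a non-degenerate ground state.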