# NISQ algorithms


# QML frameworks


# Continuous Variable Quantum Computing

# Contextuality and the inductive bias


# Quantum Recurrent Unit on Gaussian platform

The contextual recurrent network was proposed in (Anschuetz et al. 2023) as a development of ideas from (Arute et al. 2019). Arute et al. considered k-gram quantum models and showed that a quantum modification of a Bayesian network performs better than its classical counterpart.

<img src=img/QBayCorrResults.png width=500>\
Comparison between classical and quantum models, from (Arute et al. 2019).

This improvement was attributed to quantum correlations, which are hard to capture with classical models.
The authors also prove a separation in expressive power in the ideal (exact) case: a classical model with the same number of parameters cannot achieve a finite KL divergence, while a quantum model, thanks to quantum nonlocality and contextuality, can, at least in theory.

Anschuetz et al. generalized this proof to neural networks. The quantum model they proposed is based on continuous variables (CV), so it is designed for a photonic quantum computer, where each qubit is replaced by a quantum harmonic oscillator (a qumode).

## QRNN architecture
For ease of simulation, the authors restrict the model to Gaussian states. In general, the state of a CV device is described by a Wigner quasiprobability function that can be complicated and non-Gaussian. However, the initial vacuum state is simply a Gaussian with zero mean and unit variance (in units where $\hbar = 2$). As long as only Gaussian operations are applied (operations that map Gaussian states to Gaussian states, e.g. displacement or squeezing), the state remains Gaussian.

So if the device has $n$ qumodes, its state is fully characterized by a vector of means of length $2n$ (the position and momentum quadratures of each mode) and a $2n\times 2n$ symmetric covariance matrix. The experiments use a simulator that stores only these numbers.
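
A minimal NumPy sketch of this bookkeeping, assuming $\hbar = 2$ (so the vacuum covariance is the identity) and the quadrature ordering $(x_1, p_1, x_2, p_2, \ldots)$, where each mode contributes two quadratures. Gaussian operations only update the mean vector and the covariance matrix; the helper names are hypothetical and not taken from the authors' code.

```python
import numpy as np

# Assumptions: hbar = 2 (vacuum covariance = identity) and ordering (x1, p1, x2, p2, ...).
# Function names are hypothetical, chosen for illustration only.

def vacuum(n_modes):
    """Mean vector (length 2n) and covariance matrix (2n x 2n) of the n-mode vacuum."""
    return np.zeros(2 * n_modes), np.eye(2 * n_modes)

def squeeze(mu, cov, mode, r):
    """Single-mode squeezing, the symplectic map x -> e^{-r} x, p -> e^{r} p."""
    S = np.eye(mu.size)
    S[2 * mode, 2 * mode] = np.exp(-r)
    S[2 * mode + 1, 2 * mode + 1] = np.exp(r)
    return S @ mu, S @ cov @ S.T

def displace(mu, cov, mode, dx, dp=0.0):
    """Displacement shifts the means of one mode; the covariance is unchanged."""
    mu = mu.copy()
    mu[2 * mode] += dx
    mu[2 * mode + 1] += dp
    return mu, cov

mu, cov = vacuum(2)
mu, cov = squeeze(mu, cov, mode=0, r=0.5)
mu, cov = displace(mu, cov, mode=1, dx=1.0)
```

After these two operations the state is still Gaussian: mode 0 has covariance $\mathrm{diag}(e^{-1}, e^{1})$ and mode 1 has its $x$-mean shifted to 1.
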
The evolution of the system is then described not by a specific set of quantum operators (gates) but by a general Gaussian unitary operation, in practice a square matrix $W$ of size $n+m$ (where $n$ is the hidden-state size and $m$ is the output size). This matrix acts as a general, learned Gaussian operation on the whole system and is trained with gradient-based techniques.

<img src=img/qrnn_visual.png width=500>\
The authors' visualization of the model (Anschuetz et al. 2023).

To feed input data into the model, $m$ additional qumodes are allocated on the quantum device. Classical fully connected layers map the input to the means and covariance matrix of these $m$ qumodes. The $n$ memory qumodes are additionally displaced by the output of another fully connected layer applied to the input.
Finally, the whole $(n+m)$-mode system is transformed by the $(n+m) \times (n+m)$ matrix $W$, which mixes the memorized information with the information obtained at the current step.

To obtain the model's output, expectation values of the $m$ temporary qumodes are measured.
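
A heavily simplified NumPy sketch of one recurrent step, written under assumptions that are not the authors' parameterization: the classical layers are single linear maps, the input modes are prepared as displaced squeezed vacua, $W$ is a real orthogonal matrix (a passive linear-optical transform, which acts on the quadrature vector as `np.kron(W, np.eye(2))`), and the output is the vector of $x$-quadrature means of the $m$ temporary modes. Discarding those modes afterwards amounts to keeping the memory blocks of the mean vector and covariance matrix. Parameter names such as `A_disp` are hypothetical, and `W` is random here rather than trained.

```python
import numpy as np

def random_orthogonal(k, rng):
    """Random orthogonal matrix from a sign-corrected QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(k, k)))
    return q * np.sign(np.diag(r))

def qrnn_step(mu_h, cov_h, x, params):
    """One step on n memory modes and m temporary modes (hbar = 2, ordering x1, p1, x2, p2, ...)."""
    n = mu_h.size // 2
    m = params["A_disp"].shape[0]

    # Linear "layers" prepare the m temporary modes: x-displacements and bounded squeezing.
    r = np.tanh(params["A_sq"] @ x)
    mu_in = np.zeros(2 * m)
    mu_in[0::2] = params["A_disp"] @ x
    cov_in = np.diag(np.ravel(np.column_stack([np.exp(-2 * r), np.exp(2 * r)])))

    # Another linear layer displaces the memory modes.
    mu_h = mu_h.copy()
    mu_h[0::2] += params["A_mem"] @ x

    # Joint state: concatenated means and block-diagonal covariance.
    mu = np.concatenate([mu_h, mu_in])
    cov = np.zeros((2 * (n + m), 2 * (n + m)))
    cov[:2 * n, :2 * n] = cov_h
    cov[2 * n:, 2 * n:] = cov_in

    # Passive mixing of memory and input: orthogonal W acts identically on x and p.
    S = np.kron(params["W"], np.eye(2))
    mu, cov = S @ mu, S @ cov @ S.T

    y = mu[2 * n::2]  # x-quadrature means of the m temporary modes
    return mu[:2 * n], cov[:2 * n, :2 * n], y

rng = np.random.default_rng(0)
n, m, d = 3, 2, 4
params = {
    "A_disp": rng.normal(size=(m, d)),
    "A_sq": 0.1 * rng.normal(size=(m, d)),
    "A_mem": rng.normal(size=(n, d)),
    "W": random_orthogonal(n + m, rng),
}
mu_h, cov_h = np.zeros(2 * n), np.eye(2 * n)  # memory starts in the vacuum
for x in rng.normal(size=(5, d)):             # a toy input sequence of length 5
    mu_h, cov_h, y = qrnn_step(mu_h, cov_h, x, params)
```

With all classical weights set to zero, the vacuum stays the vacuum and the outputs are zero, which is a quick sanity check of the bookkeeping.
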


As a result, the QRNN performs slightly better than classical models on a natural-language translation task in terms of final KL divergence.
<img src=img/qrnn_res.png width=500>

An independent implementation of the QRNN is available in the `QRNN` folder; it yields results similar to those reported by the authors.

## Reference
Anschuetz, Eric R., Hong-Ye Hu, Jin-Long Huang, and Xun Gao. 2023. “Interpretable Quantum Advantage in Neural Sequence Learning.” PRX Quantum 4 (2): 020338. https://doi.org/10.1103/PRXQuantum.4.020338.

Arute, Frank, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, et al. 2019. “Quantum Supremacy Using a Programmable Superconducting Processor.” Nature 574 (7779): 505–10. https://doi.org/10.1038/s41586-019-1666-5.


# Category theory and ZX-calculus


# Sentence grammar to Quantum Circuit translation


# Quantum Graph models review

TODO: insert a table of results on classical datasets for the different models.

## All subgraphs model

The all-subgraphs quantum graph model for graph classification was introduced in the paper "Graph kernels encoding features of all subgraphs by quantum superposition" (Kishi et al. 2022).
The model is kernel-based: it embeds a graph into a state vector in Hilbert space, and classification is then done with the standard [SVM kernel trick](https://en.wikipedia.org/wiki/Support_vector_machine). The kernel is simply the Gram matrix of the embeddings in Hilbert space.
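
Once every graph is mapped to a state vector, the kernel is computed pairwise. The sketch below assumes the embeddings are already available as normalized vectors (random placeholders here, not real graph embeddings) and takes the squared modulus of the Gram matrix entries, a common real-valued choice for quantum kernels that stays positive semidefinite; whether Kishi et al. use exactly this normalization is not claimed here. The resulting matrix can be passed to an SVM that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`.

```python
import numpy as np

def fidelity_kernel(states):
    """K[i, j] = |<psi_i|psi_j>|^2 for the rows of `states` (one normalized state per row)."""
    overlaps = states.conj() @ states.T  # Gram matrix of inner products <psi_i|psi_j>
    return np.abs(overlaps) ** 2

# Placeholder embeddings: 5 random normalized 3-qubit states standing in for graph states.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8)) + 1j * rng.normal(size=(5, 8))
states /= np.linalg.norm(states, axis=1, keepdims=True)
K = fidelity_kernel(states)
```
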

### Graph embedding

The main idea is to embed simple properties of a subgraph, such as its number of vertices or edges, into a quantum state. Thanks to superposition, a register of qubits can hold such information about every subgraph of the embedded graph at once.
Furthermore, because quantum operators are linear, this embedding can be evaluated in parallel for all subgraphs.
This is done with a quantum oracle, a technique widely used in well-known quantum algorithms. In Shor's algorithm, for example, quantum parallelism is used to raise the same number $a$ to many powers encoded in a superposition, producing a superposition of the results (modulo the number $N$ being factored):
$$\sum_x |x\rangle|0\rangle \rightarrow \sum_x |x\rangle|a^x \bmod N\rangle$$
Likewise, the exponential speedup in this algorithm comes from performing the computation simultaneously for all $2^{|E|}$ possible subgraphs, where $|E|$ is the number of edges.

In such an embedding, a large scalar product means that the two graphs have many pairs of subgraphs with the same properties.
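
To make this interpretation concrete, here is a brute-force classical sketch. It assumes the two features are the number of vertices touched and the number of edges of each edge subset, and that the kernel is the overlap of the resulting feature distributions; this mirrors the "pairs of subgraphs with the same properties" reading but is not the exact circuit of Kishi et al. The loop over all $2^{|E|}$ edge subsets ($|E|$ being the number of edges) is precisely the cost the quantum superposition is meant to avoid. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def subgraph_feature_distribution(edges):
    """Fraction of the 2^|E| edge subsets having each (vertices touched, edges) feature pair."""
    counts = Counter()
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            touched = {v for edge in subset for v in edge}
            counts[(len(touched), k)] += 1
    total = 2 ** len(edges)
    return {feature: c / total for feature, c in counts.items()}

def subgraph_kernel(p, q):
    """Overlap of two feature distributions = matching subgraph pairs / (2^|E1| * 2^|E2|)."""
    return sum(prob * q.get(feature, 0.0) for feature, prob in p.items())

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
k = subgraph_kernel(subgraph_feature_distribution(triangle), subgraph_feature_distribution(path))
```

For the triangle and the 3-vertex path, 10 of the 8 × 4 = 32 subgraph pairs share the same features, so `k` is 0.3125.
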

## Reference:
Kishi, Kaito, Takahiko Satoh, Rudy Raymond, Naoki Yamamoto, and Yasubumi Sakakibara. 2022. “Graph Kernels Encoding Features of All Subgraphs by Quantum Superposition.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 12 (3): 602–13. https://doi.org/10.1109/JETCAS.2022.3200837.

## Gaussian Boson Sampling Graph model

This graph model was proposed by researchers at Xanadu in (Schuld et al. 2020).
Gaussian boson sampling (GBS) is a variant of boson sampling that uses Gaussian (squeezed) input states instead of single photons, which makes it a natural fit for optical, so-called continuous-variable, quantum platforms. Boson sampling is best known as one of the sampling tasks proposed to demonstrate quantum advantage, alongside the random circuit sampling used in (Arute et al. 2019). It was not originally intended to do anything practically useful.

![GBS circuit](img/gbs_circuit2.png) \
Picture from the [PennyLane demo](https://pennylane.ai/qml/demos/gbs#id1)

In a photonic quantum computer, each memory/computational unit is not a two-level system but a quantum harmonic oscillator with a countably infinite set of energy levels. A simple family of possible states for each channel is the family of Gaussian states.

A Gaussian boson sampling circuit consists of an input stage that prepares squeezed and displaced Gaussian states, followed by a fixed network of beam splitters. Finally, the number of photons detected in each channel is counted ($n_1$, $n_2$, $n_3$, ...).

Theory says that the probability of a specific outcome of a GBS circuit is given by

$$P\left(n_1, n_2 \ldots n_m\right)=\frac{1}{\sqrt{\operatorname{det}(Q)}} \frac{\operatorname{Haf}\left(A_s\right)}{n_{1} ! n_{2} ! \cdots n_{m} !}$$
where all matrices are defined in terms of the covariance matrix $\Sigma$ of a Gaussian state with zero mean, written in the $(a, a^\dagger)$ basis so that the vacuum has $\Sigma = \mathbb{1}/2$:

$$
\begin{array}{r}
Q=\Sigma+\mathbb{1} / 2 \\
A=X\left(\mathbb{1}-Q^{-1}\right) \\
X=\left[\begin{array}{ll}
0 & \mathbb{1} \\
\mathbb{1} & 0
\end{array}\right]
\end{array}
$$

The matrix $A_s$ is built from $A$ as follows: if $n_i=0$, rows and columns $i$ and $i+m$ are deleted; if $n_i \neq 0$, they are repeated $n_i$ times.

The hafnian of a symmetric $N \times N$ matrix $C$ (with $N$ even) is defined as

$$
\operatorname{Haf}(C)=\sum_{\pi \in P_N^{\{2\}}} \prod_{(u, v) \in \pi} C_{u, v} .
$$
Here, $P_N^{\{2\}}$ is the set of all $N ! /\left((N / 2) ! 2^{N / 2}\right)$ ways to partition the index set $\{1,2, \ldots, N\}$ into $N / 2$ unordered pairs, so that each index appears in exactly one pair.
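
A brute-force NumPy sketch of the probability formula, for checking small cases by hand. It assumes a zero-mean Gaussian state with covariance $\Sigma$ in the $(a, a^\dagger)$ basis (vacuum $\Sigma = \mathbb{1}/2$), and it uses the normalization $P = \operatorname{Haf}(A_s) / \left(\sqrt{\det Q}\; n_1! \cdots n_m!\right)$, which reproduces the known photon statistics of a single-mode squeezed vacuum: $P(0) = 1/\cosh r$, $P(2) = \tanh^2 r / (2\cosh r)$. The hafnian is expanded recursively over pairings, so the cost grows exponentially; the function names are hypothetical.

```python
import math
import numpy as np

def hafnian(M):
    """Brute-force hafnian: pair index 0 with each j, recurse on the remaining indices."""
    n = M.shape[0]
    if n == 0:
        return 1.0
    if n % 2:
        return 0.0
    total = 0.0
    for j in range(1, n):
        rest = np.array([k for k in range(1, n) if k != j], dtype=int)
        total += M[0, j] * hafnian(M[np.ix_(rest, rest)])
    return total

def gbs_probability(sigma, pattern):
    """P(n_1, ..., n_m) for a zero-mean Gaussian state with covariance sigma in the (a, a^dagger) basis."""
    m = len(pattern)
    eye = np.eye(2 * m)
    Q = sigma + eye / 2
    X = np.block([[np.zeros((m, m)), np.eye(m)], [np.eye(m), np.zeros((m, m))]])
    A = X @ (eye - np.linalg.inv(Q))
    # Keep rows/columns i and i+m, each repeated n_i times (dropped when n_i = 0).
    idx = [i for i, n_i in enumerate(pattern) for _ in range(n_i)]
    idx = np.array(idx + [i + m for i in idx], dtype=int)
    A_s = A[np.ix_(idx, idx)]
    norm = np.sqrt(np.linalg.det(Q).real) * math.prod(math.factorial(n_i) for n_i in pattern)
    return float(np.real(hafnian(A_s)) / norm)

# Single-mode squeezed vacuum; the sign of the off-diagonal term is a phase
# convention and does not change photon-number probabilities.
r = 0.6
sigma = 0.5 * np.array([[np.cosh(2 * r), np.sinh(2 * r)],
                        [np.sinh(2 * r), np.cosh(2 * r)]])
probs = [gbs_probability(sigma, [n]) for n in range(5)]
```

As expected for a squeezed vacuum, odd photon numbers have zero probability.
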

When GBS is applied to graphs, the more interesting case is when $C$ is the adjacency matrix of a graph $G$. This leads to a simpler interpretation of the hafnian as the number of perfect matchings in $G$.

A perfect matching is a subset of edges such that every node is covered by exactly one of the edges. The hafnian of a weighted adjacency matrix therefore sums the products of the edge weights over all perfect matchings. If all edge weights equal 1, it simply counts the number of perfect matchings in $G$.
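
The equivalence can be checked on small graphs by brute force. The sketch below evaluates the hafnian directly from its definition (summing over all pairings of the indices) and compares it with a direct count of perfect matchings; both are exponential-time and only meant for tiny graphs, and the helper names are hypothetical.

```python
from itertools import combinations
import numpy as np

def pairings(indices):
    """Yield every partition of `indices` into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def hafnian(C):
    """Hafnian from the definition: sum over pairings of the product of paired entries."""
    return sum(np.prod([C[u, v] for u, v in p]) for p in pairings(list(range(C.shape[0]))))

def count_perfect_matchings(n_vertices, edges):
    """Choose n/2 edges and check that together they cover every vertex."""
    return sum(
        len({v for e in subset for v in e}) == n_vertices
        for subset in combinations(edges, n_vertices // 2)
    )

def adjacency(n_vertices, edges):
    """Unweighted symmetric adjacency matrix."""
    A = np.zeros((n_vertices, n_vertices))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

graphs = {
    "4-cycle": [(0, 1), (1, 2), (2, 3), (3, 0)],
    "K4": [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
    "4-path": [(0, 1), (1, 2), (2, 3)],
}
results = {name: (hafnian(adjacency(4, e)), count_perfect_matchings(4, e)) for name, e in graphs.items()}
```

For the 4-cycle both numbers are 2, for the complete graph $K_4$ both are 3, and for the path on 4 vertices both are 1.
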

This algorithm therefore extracts features from a meaningful graph property. Since computing hafnians is hard for classical computers, this offers potential for quantum supremacy.

Finally, the probabilities of obtaining specific GBS outcomes are used as features for an SVM kernel to solve classification tasks.
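
One way to turn GBS outputs into kernel features, sketched under explicit assumptions: photon-count samples are grouped into orbits (the sorted list of nonzero counts, so `(0, 1, 2)` and `(2, 1, 0)` fall into the same orbit), the feature vector holds the empirical frequencies of a few chosen orbits, and the kernel is a plain dot product. The orbit list and the toy samples below are hypothetical and are not data from Schuld et al.

```python
from collections import Counter
import numpy as np

def orbit(sample):
    """Orbit of a photon-count pattern: sorted nonzero counts, e.g. (0, 2, 1) -> (2, 1)."""
    return tuple(sorted((c for c in sample if c > 0), reverse=True))

def feature_vector(samples, orbits):
    """Empirical frequency of each chosen orbit among the samples."""
    counts = Counter(orbit(s) for s in samples)
    return np.array([counts[o] / len(samples) for o in orbits])

ORBITS = [(1, 1), (2,), (1, 1, 1, 1), (2, 1, 1)]  # hypothetical choice of orbits

# Toy samples standing in for GBS runs encoding two different graphs.
samples_a = [(1, 1, 0, 0), (0, 2, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1)]
samples_b = [(0, 0, 0, 0), (2, 1, 1, 0), (0, 1, 1, 0), (2, 0, 0, 0)]

F = np.vstack([feature_vector(samples_a, ORBITS), feature_vector(samples_b, ORBITS)])
K = F @ F.T  # linear kernel on the feature vectors
```
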


## Reference:
Schuld, Maria, Kamil Brádler, Robert Israel, Daiqin Su, and Brajesh Gupt. 2020. “Measuring the Similarity of Graphs with a Gaussian Boson Sampler.” Physical Review A 101 (3): 032314. https://doi.org/10.1103/PhysRevA.101.032314.

Arute, Frank, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, et al. 2019. “Quantum Supremacy Using a Programmable Superconducting Processor.” Nature 574 (7779): 505–10. https://doi.org/10.1038/s41586-019-1666-5.



# Generative models

The use of quantum models for generative tasks began with two articles published together: the more practical (Dallaire-Demers and Killoran 2018) and the more theoretical (Lloyd and Weedbrook 2018).

The authors claimed that quantum models may have an exponential advantage over classical models.

This advantage comes from the fact that measuring a quantum system produces a random sample from an exponentially large Hilbert space.

These results are of more theoretical than practical importance, because sampling itself is rarely used in quantum machine learning. Most approaches (see, for example, the [PennyLane demos](https://pennylane.ai/qml/demonstrations/quantum-machine-learning)) use expectation values of the output on each qubit. On the one hand, the expectation-value approach does not fully exploit the potential of a quantum computer; for example, per-qubit expectation values ignore correlations between different qubits. On the other hand, this approach is more practical.

Estimating the probability of each outcome on a real quantum computer requires exponentially many samples: the number of outcomes grows exponentially with the number of qubits, while the statistical error of each estimate only decreases as $1/\sqrt{n}$ with the number of samples $n$. Expectation values, which average over all samples, are also more robust to noise, a vital property in the NISQ era. Finally, the outcomes themselves form only a finite set, while many practical tasks require a continuous model prediction (such as an expectation value).
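
A small simulation of this trade-off, assuming the state $|+\rangle^{\otimes n}$, whose Born distribution is uniform over all $2^n$ bitstrings (chosen only because it is easy to sample with NumPy). With a fixed shot budget, the mean relative error of the per-outcome probability estimates grows with $n$, while the error of a single-qubit $\langle Z \rangle$ estimate stays around $1/\sqrt{\text{shots}}$. The function name is hypothetical.

```python
import numpy as np

def estimation_errors(n_qubits, shots, rng):
    """Sample bitstrings from |+>^n (uniform Born distribution) and compare two estimators."""
    dim = 2 ** n_qubits
    samples = rng.integers(0, dim, size=shots)           # measured bitstrings as integers
    p_hat = np.bincount(samples, minlength=dim) / shots  # per-outcome probability estimates
    rel_err = np.mean(np.abs(p_hat - 1 / dim) * dim)     # mean relative error (true p = 1/dim)
    z_hat = np.mean(1 - 2 * (samples & 1))               # <Z> of qubit 0 (lowest bit), true value 0
    return rel_err, abs(z_hat)

rng = np.random.default_rng(0)
rel_2, z_err_2 = estimation_errors(2, 10_000, rng)
rel_12, z_err_12 = estimation_errors(12, 10_000, rng)
```

With 10,000 shots, each of the 4 outcomes of 2 qubits is estimated to within a few percent, whereas each of the 4096 outcomes of 12 qubits is seen only a couple of times on average, yet both $\langle Z \rangle$ estimates stay close to 0.
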
Another significant benefit of expectation values is differentiability. The parameter-shift rule replaces the infinitesimally small difference in the definition of the gradient with a finite, fixed shift, which makes it possible to estimate gradients by averaging a moderate number of samples.
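
A minimal NumPy sketch of the parameter-shift rule, assuming a single-qubit circuit $R_Y(\theta)|0\rangle$ measured in $Z$, so that $\langle Z \rangle = \cos\theta$. For rotation gates generated by a Pauli operator, shifting $\theta$ by $\pm\pi/2$ gives the exact derivative, and the same formula can be applied to sample-based estimates. The function names are hypothetical.

```python
import numpy as np

def expval_z(theta):
    """<Z> after RY(theta)|0>, computed from the state vector (equals cos(theta))."""
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2),  np.cos(theta / 2)]])
    state = ry @ np.array([1.0, 0.0])
    return state[0] ** 2 - state[1] ** 2

def sampled_expval_z(theta, shots, rng):
    """Estimate <Z> from simulated measurement samples instead of the exact state."""
    p0 = np.cos(theta / 2) ** 2
    zeros = rng.binomial(shots, p0)
    return (2 * zeros - shots) / shots

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Parameter-shift rule: a finite +-pi/2 shift gives the exact derivative for Pauli rotations."""
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.7
exact_grad = parameter_shift_grad(expval_z, theta)  # matches -sin(theta) up to rounding
rng = np.random.default_rng(0)
shot_grad = parameter_shift_grad(lambda t: sampled_expval_z(t, 100_000, rng), theta)
```

Unlike a finite-difference estimate with a tiny step, the fixed shift keeps the two evaluated expectation values far apart, so sampling noise does not blow up when divided by the step size.
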

Despite these difficulties, there have been several attempts to demonstrate some quantum advantage with generative models on practical tasks.

One such attempt is the QuMolGAN paper (Li, Topaloglu, and Ghosh 2021). The authors took MolGAN (De Cao and Kipf 2018), a classical GAN for generating molecules, and inserted a small (2-4 qubit) quantum circuit at the beginning of the generator, followed by ordinary classical layers. This is why the authors call their model a quantum GAN with a hybrid generator.
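
A toy NumPy sketch of what a hybrid generator looks like: a 2-qubit circuit turns a noise vector into expectation values, which then feed an ordinary classical layer. The circuit layout (RY noise encoding, one CNOT, one trainable RY layer), the single dense layer, and all names are illustrative assumptions, not the QuMolGAN architecture.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)  # qubit 0 controls qubit 1
Z = np.diag([1.0, -1.0])

def quantum_layer(noise, weights):
    """Encode noise with RY, entangle with CNOT, apply trainable RY, return <Z> of each qubit."""
    state = np.zeros(4)
    state[0] = 1.0  # |00>
    state = np.kron(ry(noise[0]), ry(noise[1])) @ state
    state = CNOT @ state
    state = np.kron(ry(weights[0]), ry(weights[1])) @ state
    z0 = state @ np.kron(Z, np.eye(2)) @ state
    z1 = state @ np.kron(np.eye(2), Z) @ state
    return np.array([z0, z1])

def hybrid_generator(noise, weights, W1, b1):
    """Quantum features followed by a classical dense layer with tanh activation."""
    return np.tanh(W1 @ quantum_layer(noise, weights) + b1)

rng = np.random.default_rng(0)
weights = rng.uniform(0, 2 * np.pi, size=2)    # trainable circuit angles
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)  # classical layer: 2 quantum features -> 8 outputs
noise = rng.uniform(-np.pi, np.pi, size=2)     # latent noise encoded as rotation angles
sample = hybrid_generator(noise, weights, W1, b1)
```

In a real model, `weights`, `W1` and `b1` would be trained jointly against a discriminator; here they are random.
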

The authors conclude that their quantum model is more efficient than the classical one: it can be reduced to only 15% of the parameters while performing comparably to the non-reduced classical model.

During training, the Fréchet distance was measured (see the figure below).

<img src=img/qumolgan_a_plots.png width=500>\
The authors' plot of the training of the quantum and classical GANs.
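
For reference, the Fréchet distance between two sets of feature vectors fits a Gaussian to each set and computes $\lVert\mu_1-\mu_2\rVert^2 + \operatorname{Tr}\left(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\right)$. The NumPy sketch below implements that formula, assuming the feature vectors are already given; how the paper extracts features from generated molecules is not reproduced. The trace of $(\Sigma_1\Sigma_2)^{1/2}$ is computed through the symmetric matrix $\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}$, which has the same eigenvalues.

```python
import numpy as np

def sqrtm_psd(M):
    """Square root of a symmetric positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (samples x features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sqrt_a = sqrtm_psd(cov_a)
    cross = sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)  # same trace as (cov_a cov_b)^{1/2}
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2 * cross))

rng = np.random.default_rng(0)
feats_real = rng.normal(size=(500, 3))              # placeholder features
feats_shifted = feats_real + np.array([1.0, 2.0, 2.0])
d_same = frechet_distance(feats_real, feats_real)   # ~0
d_shift = frechet_distance(feats_real, feats_shifted)  # ~9: same covariance, means differ by (1, 2, 2)
```
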

Using the code provided by the authors, we reproduced the model training.

Unfortunately, a proper visualization of the results shows that there is no significant advantage of the quantum model over a comparable classical counterpart.

In terms of Fréchet distance, the Quantum Mean Reduced (MR) model performs even worse than the Classical MR model, as can be seen in the smoothed plots below.

<img src=img/qumolgan_my_plots.png?0 width=500>

Moreover, the plot of the validity score (the fraction of generated molecules that are valid) shows that, for all models, beyond a certain point only a vanishingly small number of valid molecules is generated.

<img src=img/qumolgan_my_plots_validity.png width=500>

## Reference:
Dallaire-Demers, Pierre-Luc, and Nathan Killoran. 2018. “Quantum Generative Adversarial Networks.” Physical Review A 98 (1): 012324. https://doi.org/10.1103/PhysRevA.98.012324.

Lloyd, Seth, and Christian Weedbrook. 2018. “Quantum Generative Adversarial Learning.” Physical Review Letters 121 (4): 040502. https://doi.org/10.1103/physrevlett.121.040502.

Li, Junde, Rasit Topaloglu, and Swaroop Ghosh. 2021. “(QuMolGAN) Quantum Generative Models for Small Molecule Drug Discovery.” arXiv. http://arxiv.org/abs/2101.03438.

De Cao, Nicola, and Thomas Kipf. 2018. “MolGAN: An Implicit Generative Model for Small Molecular Graphs.” ArXiv: Machine Learning, May. https://arxiv.org/abs/1805.11973.

