# Quantum Recurrent Unit on Gaussian platform

<img src="../../img/rnn_translate_vis.png" width=600> \
Authors' visualization of the model (Anschuetz et al. 2023)

The Contextual Recurrent Network was proposed in (Anschuetz et al. 2023) as a development of ideas from (Arute et al. 2019). Arute et al. considered k-gram quantum models. They showed that a quantum-based modification of a Bayesian network performs better than its classical counterpart.

<img src=https://raw.githubusercontent.com/Hacker1337/QML_review/f27334cc67f0675854cfce04df67f6facd5b95de/img/QBayCorrResults.png width=500>\
Comparison between the classical and quantum models from (Arute et al. 2019).

This improvement was attributed to quantum correlations, which are hard to capture with classical models.
The authors also prove a separation in expressive power for the idealized case. They show that a classical model with the same number of parameters cannot reach a finite KL divergence, whereas a quantum model can, at least in theory, thanks to quantum nonlocality and contextuality.
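
To make the phrase "finite KL divergence" concrete, here is a minimal sketch of the definition only (it does not reproduce the authors' proof). The distributions `target`, `close_model` and `blind_model` are made-up illustrations: the divergence becomes infinite as soon as the model assigns zero probability to an outcome that the target produces.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as 1-D arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p_i = 0 contribute nothing
    if np.any(q[support] == 0):
        return np.inf  # the model rules out an outcome the target produces
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

# Illustrative distributions (not taken from the paper).
target = [0.5, 0.25, 0.25]
close_model = [0.4, 0.3, 0.3]
blind_model = [0.5, 0.5, 0.0]  # never produces the third outcome

kl_close = kl_divergence(target, close_model)
kl_blind = kl_divergence(target, blind_model)
print(kl_close, kl_blind)
```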

Anschuetz et al. generalized this proof to neural networks. The quantum model they proposed is based on continuous variables (CV), so it is designed for a photonic quantum computer, where a quantum harmonic oscillator (a qumode) takes the place of each qubit.
## QRNN architecture
To keep the model simple and easy to simulate, the authors restrict it to Gaussian states only. In general, the state of a CV device is described by a Wigner quasiprobability function, which can be a very complicated non-Gaussian function. The initial vacuum state, however, is just a Gaussian with zero mean and unit variance. So if only Gaussian operations are applied (operations that map Gaussian states to Gaussian states, e.g. displacement or squeezing), the state remains Gaussian.

So if the device has $n$ qumodes, its state is fully characterized by a vector of means of length $n$ and an $n\times n$ symmetric covariance matrix. The experiments use a simulation that stores these numbers.
The evolution of the system is then described not by a specific set of quantum operators (gates) but by a general Gaussian unitary operation: in effect, a square matrix $W$ of size $n+m$ (where $n$ is the hidden state size and $m$ is the output size). It is the most general Gaussian operation that can be applied to such a system. This matrix is learned with gradient-based techniques.
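
As an illustration of how a Gaussian state can be tracked through Gaussian operations, here is a minimal NumPy sketch. It is not taken from the `QRNN` folder. It follows both quadratures $(x, p)$ of a single mode, and it assumes a convention in which the vacuum covariance is the identity; the helper names `displace` and `squeeze` are hypothetical.

```python
import numpy as np

# Single mode in phase space (x, p); vacuum variance normalised to 1 (assumed convention).
mean = np.zeros(2)
cov = np.eye(2)

def displace(mean, cov, shift):
    """Displacement: moves the mean, leaves the covariance unchanged."""
    return mean + np.asarray(shift, dtype=float), cov

def squeeze(mean, cov, r):
    """Squeezing along x: linear map S = diag(e^-r, e^r) on phase space."""
    S = np.diag([np.exp(-r), np.exp(r)])
    return S @ mean, S @ cov @ S.T

mean, cov = displace(mean, cov, [1.0, 0.0])
mean, cov = squeeze(mean, cov, r=0.5)
print(mean)  # still just two numbers
print(cov)   # still a symmetric 2x2 matrix
```

Both operations only update the mean and the covariance, so the state never leaves the Gaussian family and no full Wigner function has to be stored.
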
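
For reference, the standard way a linear map acts on a Gaussian state stored as a mean vector $\mu$ and a covariance matrix $\Sigma$ can be written as follows. The symbols $\mu$, $\Sigma$ and the optional shift $d$ are notation introduced here, and whether the implementation places further constraints on $W$ is not stated in this text.

$$
\mu \mapsto W\mu + d, \qquad \Sigma \mapsto W \Sigma W^{T}
$$

The covariance stays symmetric under this update, so the result is again described by a mean vector and a symmetric covariance matrix.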

<img src=https://raw.githubusercontent.com/Hacker1337/QML_review/f27334cc67f0675854cfce04df67f6facd5b95de/img/qrnn_visual.png width=500>\
Authors' visualization of the model (Anschuetz et al. 2023)

To feed the input data to the model, an additional $m$ qumodes are allocated in the quantum device. The input is translated into the covariance matrix and means of these $m$ qumodes via classical fully connected layers. The states of the $n$ memory qumodes are additionally shifted by the result of applying another fully connected layer to the input.
Finally, the whole system of size $n+m$ is transformed with the $(n+m) \times (n+m)$ matrix $W$. This mixes the memorized information with the information obtained at the current step.

To get the output of the model, expectation values on the $m$ temporary qumodes are measured.
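
The recurrent step described above can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the code from the `QRNN` folder: the layer matrices `A_mean`, `A_cov`, `A_shift` are hypothetical single linear layers with random weights, the way the input covariance is built is one arbitrary choice, and the output modes are simply dropped after readout (measurement back-action is ignored).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d_in = 3, 2, 4  # memory modes, input/output modes, input feature size (illustrative)

# Hypothetical classical layers; random weights stand in for trained ones.
A_mean = rng.normal(size=(m, d_in))     # input -> means of the m fresh modes
A_cov = rng.normal(size=(m * m, d_in))  # input -> factor of their covariance
A_shift = rng.normal(size=(n, d_in))    # input -> shift of the n memory modes
W = rng.normal(size=(n + m, n + m))     # stands in for the learned matrix W

def step(mu_h, cov_h, x):
    """One recurrent step on the (mean, covariance) of the n memory modes."""
    mu_in = A_mean @ x
    L = (A_cov @ x).reshape(m, m)
    cov_in = L @ L.T + np.eye(m)         # symmetric positive definite by construction
    mu = np.concatenate([mu_h + A_shift @ x, mu_in])
    cov = np.zeros((n + m, n + m))
    cov[:n, :n] = cov_h                  # memory block
    cov[n:, n:] = cov_in                 # freshly prepared input block
    mu, cov = W @ mu, W @ cov @ W.T      # mix memory with the current input
    y = mu[n:]                           # expectation values of the m temporary modes
    return mu[:n], cov[:n, :n], y

mu_h, cov_h = np.zeros(n), np.eye(n)     # memory starts in the vacuum state
for x in rng.normal(size=(5, d_in)):     # a toy sequence of 5 inputs
    mu_h, cov_h, y = step(mu_h, cov_h, x)
print(y)
```

In a trained model `W` and the classical layers would be optimized by gradient descent; here they are fixed random matrices, so only the data flow is meaningful.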


As a result, the QRNN performs slightly better than classical models on a natural language translation task in terms of final KL divergence.\
<img src=https://raw.githubusercontent.com/Hacker1337/QML_review/f27334cc67f0675854cfce04df67f6facd5b95de/img/qrnn_res.png?0 width=500>

An independent implementation of the QRNN is presented in the `QRNN` folder. It shows results similar to those reported by the authors.

Our results are available on the wandb platform at https://wandb.ai/amirfvb/CRNN_translate and in the interactive plot below.

## References
Anschuetz, Eric R., Hong-Ye Hu, Jin-Long Huang, and Xun Gao. 2023. “Interpretable Quantum Advantage in Neural Sequence Learning.” PRX Quantum 4 (2): 020338. https://doi.org/10.1103/PRXQuantum.4.020338.

Arute, Frank, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, et al. 2019. “Quantum Supremacy Using a Programmable Superconducting Processor.” Nature 574 (7779): 505–10. https://doi.org/10.1038/s41586-019-1666-5.


# Plots of reproduction results

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import wandb

api = wandb.Api()  # client for the wandb public API

In [2]:

# Run ids to include, grouped by wandb project (same order as prefix_list).
# Commented-out ids are left out of the comparison.
good_ids_list = [
    [
        "846g2vi3",
        "pz9d14ng",
        "lr393erl",
        "juuqurqw",
        "o6bzkp1b",
        "0ej81wbm",
        "rc4mqyhy",
        "y2h0j0dx",
        "ck9x4hg6",
        # "8z8dbd2z",
        "uxyly8yb",
        # "0h35d3wx",
        "drxq6qmg",
        # "s9yfkcpp",
        "tw1zhhtn",
        "r422zn0s",
        "nhtn6yyl",
    ],
    [
        # "tbp20rlx",
        # "y5t1e4nd",
    ],
]
prefix_list = [
    "amirfvb/CRNN_translate",
    "amirfvb/CRNN_translate_debug",
]

rows = []

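# For every selected run, record the model type, its size and the best validation loss.
for good_ids, prefix in zip(good_ids_list, prefix_list):
    for run_id in good_ids:  # renamed from `id` to avoid shadowing the builtin
        run = api.run(f"{prefix}/{run_id}")
        rows.append({
            "model": run.config["model"],
            "loss": run.history()["epoch/val_loss"].min(),
            "model_size": run.config["hidden_dims"],
        })
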
# Additional crnn results entered by hand (not fetched from wandb).
rows.append({"model": "crnn", "loss": 4.266, "model_size": 4})
rows.append({"model": "crnn", "loss": 2.966, "model_size": 15})


df = pd.DataFrame(rows)
df = df.sort_values("model_size")
df

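|    | model | loss     | model_size |
|----|-------|----------|------------|
| 2  | gru   | 4.283298 | 4          |
| 10 | lstm  | 4.248574 | 4          |
| 14 | crnn  | 4.266    | 4          |
| 4  | gru   | 3.604011 | 10         |
| 7  | crnn  | 3.430788 | 10         |
| 13 | lstm  | 3.563878 | 10         |
| 1  | gru   | 3.240333 | 15         |
| 6  | crnn  | 3.05851  | 15         |
| 12 | lstm  | 3.076808 | 15         |
| 15 | crnn  | 2.966    | 15         |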


In [3]:
import altair as alt

alt.Chart(df).mark_line(point=True).encode(
    x='model_size:Q',
    y='loss:Q',
    color='model:N'
).properties(
    width=500,
    height=500,
    title="Quantum model vs Classical",
).interactive()

