##### Quantum Vision Transformer Tutorial
##### 1. Description of Quantum Vision Transformer Architecture:
<p> The Quantum Vision Transformer is a state-of-the-art (SOTA) neural network architecture for image data.
Several works report that the quantum vision transformer can outperform its classical counterpart.
This tutorial demonstrates an implementation of the hybrid Quantum Vision Transformer architecture [1][2], in which some operations are quantum (such as the linear layer and the attention layer) and the rest are classical.
</p>


##### 2. Quantum operations:
###### 2.1 Angle Encoding.
###### To feed the input data into the quantum circuit, we need to encode it using the angle encoding procedure: all input tensors are expanded and used as angles for the rotation operations.
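
As a small worked example of angle encoding, the sketch below computes the probability of measuring 1 after applying RX(θ) to a qubit in state |0⟩, which equals sin²(θ/2). The scaling θ = π·x and the helper names `rx_on_zero` and `angle_encode_prob` are assumptions made for illustration; they are not part of any library API.

```python
import math

def rx_on_zero(theta: float) -> tuple[complex, complex]:
    """Amplitudes of RX(theta)|0> = cos(theta/2)|0> - i*sin(theta/2)|1>."""
    return (complex(math.cos(theta / 2), 0.0), complex(0.0, -math.sin(theta / 2)))

def angle_encode_prob(x: float) -> float:
    """Probability of measuring 1 when feature x is encoded as RX(pi * x)."""
    _, amp1 = rx_on_zero(math.pi * x)  # assumed scaling: theta = pi * x
    return abs(amp1) ** 2

for x in (0.0, 0.25, 0.5, 1.0):
    print(f"x={x:.2f} -> P(1)={angle_encode_prob(x):.3f}")
```

With this scaling, a feature in [0, 1] maps monotonically to a measurement probability in [0, 1].
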
###### 2.2 Quantum Layer:
###### The quantum layer is constructed from an RX rotation operator acting on each wire, followed by CNOT gates that entangle the wires.
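
To make the "rotations followed by CNOTs" structure concrete, here is a minimal pure-Python state-vector sketch. It is not the Classiq circuit used later: the function names are hypothetical, bit `q` of the state index is assumed to represent qubit `q`, and only a chain of CNOTs is applied (no ring-closing gate).

```python
import math

def apply_rx(state, qubit, theta):
    """Apply RX(theta) to one qubit of a state vector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    new = list(state)
    for i in range(len(state)):
        if not (i >> qubit) & 1:  # visit each (|..0..>, |..1..>) pair once
            j = i | (1 << qubit)
            new[i] = c * state[i] - 1j * s * state[j]
            new[j] = -1j * s * state[i] + c * state[j]
    return new

def apply_cnot(state, ctrl, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> ctrl) & 1:
            new[i ^ (1 << target)] = state[i]
    return new

def prob_one(state, qubit):
    """Probability of measuring 1 on the given qubit."""
    return sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> qubit) & 1)

def rx_cnot_layer(thetas):
    """Start in |0...0>, rotate every qubit, then entangle neighbors."""
    n = len(thetas)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for q, theta in enumerate(thetas):
        state = apply_rx(state, q, theta)
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1)
    return state

state = rx_cnot_layer([math.pi / 2, 0.0])
print([round(prob_one(state, q), 3) for q in range(2)])
```

Rotating only the first qubit still changes the measurement statistics of the second one, because the CNOT correlates the two wires.
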

###### 2.3 Attention mechanism:

<p> The model works with the sequence representation of the patched image and uses the attention mechanism, which is the backbone of the Transformer architecture family. The multi-head attention block is the standard one for the transformer architecture; to make it quantum, the classical block is replaced with a variational quantum circuit (VQC). Another way to compute the attention is to use a quantum orthogonal layer and calculate the dot product of the i-th and j-th vectors.
</p>
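
The attention computation itself is classical once the query, key, and value vectors are available. The following is a minimal pure-Python sketch of scaled dot-product attention on small lists; the function names are hypothetical, and in the tutorial's model the rows of `q`, `k`, and `v` would come from quantum layers rather than being given directly.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(k[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    out = [
        [sum(w * v_row[c] for w, v_row in zip(w_row, v)) for c in range(len(v[0]))]
        for w_row in weights
    ]
    return out, weights

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(q, k, v)
print(weights)
print(out)
```

Each row of `weights` sums to 1, so every output token is a convex combination of the value vectors.
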

##### 3. Classical operations:
###### 3.1 Positional Encoder - the operation that incorporates the positional information of the image patches.
###### 3.3 FFN - Fully Connected Block; consists of an MLP and LayerNormalization, followed by a residual connection.
###### 3.4 Transformer Block
##### 4. Training Procedure

In [1]:
import classiq
classiq.authenticate()

Generating a new refresh token should only be done if the current refresh token is compromised.
To do so, set the overwrite parameter to true


In [2]:
import torch
import torch.optim as optim
import tqdm
import math
from classiq import *
from classiq import (
    synthesize,
    qfunc,
    QArray,
    QBit,
    RX,
    CArray,
    Output,
    CReal,
    repeat,
    create_model,
    show
)
from classiq.applications.qnn import QLayer
from classiq.qmod.symbolic import pi
from torch.nn.utils.rnn import pad_sequence
import torchvision.transforms as transforms
from torchvision import datasets
from classiq.execution import (
    ExecutionPreferences,
    execute_qnn,
    set_quantum_program_execution_preferences,
)
from classiq.synthesis import SerializedQuantumProgram
from classiq.applications.qnn.types import (
    MultipleArguments,
    ResultsCollection,
    SavedResult,
)



In [3]:
N_QUBITS = 4
num_shots = 1000

In [4]:
from IPython.display import Image 
  
# get the image 
Image(url="axioms-13-00323-g004-550.jpg", width=800, height=400) 

In [5]:
Image(url="axioms-13-00323-g005-550.jpg", width=800, height=300) 


In [6]:
def execute(
    quantum_program: SerializedQuantumProgram, arguments: MultipleArguments
) -> ResultsCollection:
    quantum_program = set_quantum_program_execution_preferences(
        quantum_program, preferences=ExecutionPreferences(num_shots=num_shots)
    )
    return execute_qnn(quantum_program, arguments)

In [7]:
def post_process(result: SavedResult) -> torch.Tensor:
    """Convert measurement counts into the probability of reading "1" on each qubit."""
    res = result.value
    yvec = [
        res.counts_of_qubits(k).get("1", 0) / num_shots
        for k in range(N_QUBITS)
    ]

    return torch.tensor(yvec)
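
The post-processing step turns measurement counts into one probability per qubit. A minimal pure-Python sketch of the same idea, starting from full bitstring counts, is shown below. The `counts` dictionary is made-up illustration data, the function name is hypothetical, and the bit ordering (qubit 0 is the rightmost character) is an assumption that may differ between backends.

```python
def qubit_one_probabilities(counts: dict[str, int], n_qubits: int) -> list[float]:
    """Marginal probability of measuring 1 on each qubit."""
    shots = sum(counts.values())
    probs = []
    for k in range(n_qubits):
        # Assumption: qubit k is the k-th character counted from the right.
        ones = sum(c for bits, c in counts.items() if bits[-1 - k] == "1")
        probs.append(ones / shots)
    return probs

# Hypothetical counts from 1000 shots on two qubits.
counts = {"00": 500, "01": 300, "11": 200}
print(qubit_one_probabilities(counts, 2))
```

The resulting vector has one entry per qubit, which is why the quantum layer maps an input of size `N_QUBITS` to an output of the same size.
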

In [8]:
def get_circuit():
    """Build and synthesize the quantum program: angle encoding followed by the VQC."""

    @qfunc
    def vqc(weight_: CArray[CArray[CReal, N_QUBITS], N_QUBITS], res: QArray) -> None:
        num_qubits = N_QUBITS
        num_qlayers = N_QUBITS

        # Trainable RX rotations: layer i rotates qubit j by pi * weight_[i][j].
        repeat(
            count=num_qlayers,
            iteration=lambda i: repeat(
                count=num_qubits,
                iteration=lambda j: RX(pi * weight_[i][j], res[j]),
            ),
        )

        # Entangle neighboring qubits with a chain of CNOT gates.
        repeat(
            count=num_qubits - 1,
            iteration=lambda index: CX(ctrl=res[index], target=res[index + 1]),
        )

        # Close the ring from the last qubit back to the first.
        CX(ctrl=res[num_qubits - 1], target=res[0])

    @qfunc
    def main(
        input_: CArray[CReal, N_QUBITS],
        weight_: CArray[CArray[CReal, N_QUBITS], N_QUBITS],
        res: Output[QArray[QBit, N_QUBITS]],
    ) -> None:
        # Encode the classical input as rotation angles, then apply the VQC.
        encode_in_angle(input_, res)
        vqc(weight_, res)

    qmod = create_model(main)
    quantum_program = synthesize(qmod)
    return quantum_program


In [9]:
class Patchify(torch.nn.Module):
    """
    Patchify layer implemented using the Conv2d layer
    """
    def __init__(self, in_channels:int, patch_size:int, hidden_size:int):
        super(Patchify, self).__init__()
        self.patch_size = patch_size
        self.conv = torch.nn.Conv2d(in_channels=in_channels, out_channels=hidden_size, kernel_size=self.patch_size, stride=self.patch_size)
        self.hidden_size = hidden_size
        
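    def forward(self, x: torch.Tensor):
        # (bs, c, h, w) -> (bs, hidden_size, h // patch_size, w // patch_size)
        x = self.conv(x)
        # Flatten the patch grid and move channels last:
        # (bs, hidden_size, num_patches) -> (bs, num_patches, hidden_size).
        # A plain view() would mix channel and patch dimensions.
        x = x.flatten(2).transpose(1, 2)
        self.num_patches = x.size(1)
        return x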
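
To illustrate what patchification does to an image, here is a minimal pure-Python sketch that cuts a 2D list into non-overlapping square patches and flattens each one. The function name is hypothetical. The `Patchify` layer above goes one step further: its convolution also projects every patch linearly to `hidden_size` features.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened non-overlapping patches."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patch = [
                image[top + r][left + c]
                for r in range(patch_size)
                for c in range(patch_size)
            ]
            patches.append(patch)
    return patches

# A 4x4 image with pixel values 0..15, cut into 2x2 patches.
image = [[4 * r + c for c in range(4)] for r in range(4)]
for patch in patchify(image, 2):
    print(patch)
```

For a 28x28 MNIST image and a patch size of 7, the same procedure yields (28 // 7)² = 16 patches of 49 pixels each.
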

#### Rotary Positional Embedding:
#### $$f_{\{q, k\}}(x_{m}, m) = R^{d}_{\Theta, m} W_{\{q,k\}}x_{m}$$
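
A two-dimensional worked example may help here. In the rotary scheme, the vector at position m is rotated by the angle m·θ, so the dot product between a rotated query and a rotated key depends only on the difference of their positions. The sketch below checks this numerically; the vectors, the value of `theta`, and the function names are arbitrary illustration choices.

```python
import math

def rotate(vec, m, theta):
    """Rotate a 2D vector by the angle m * theta."""
    x1, x2 = vec
    c, s = math.cos(m * theta), math.sin(m * theta)
    return (c * x1 - s * x2, s * x1 + c * x2)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k, theta = (1.0, 2.0), (0.5, -1.0), 0.3

# Positions (2, 5) and (7, 10) have the same offset of 3.
score_a = dot(rotate(q, 2, theta), rotate(k, 5, theta))
score_b = dot(rotate(q, 7, theta), rotate(k, 10, theta))
print(score_a, score_b)
```

The two scores agree up to floating-point error, which is the property that lets attention scores encode relative positions.
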


In [10]:
class RotaryPositionalEmbedding(torch.nn.Module):
    """
    Rotary Positional Embedding
    """
    def __init__(self, d_model, max_seq_len):
        super(RotaryPositionalEmbedding, self).__init__()

        # Create a rotation matrix.
        self.rotation_matrix = torch.zeros(d_model, d_model)
        for i in range(d_model):
            for j in range(d_model):
                self.rotation_matrix[i, j] = math.cos(i * j * 0.01)

        # Create a positional embedding matrix.
        self.positional_embedding = torch.zeros(max_seq_len, d_model)
        for i in range(max_seq_len):
            for j in range(d_model):
                self.positional_embedding[i, j] = math.cos(i * j * 0.01)

    def forward(self, x):
        """
        Args:
            x: A tensor of shape (batch_size, seq_len, d_model).

        Returns:
            A tensor of shape (batch_size, seq_len, d_model).
        """

        # Add the positional embedding out of place, so the caller's tensor is not mutated.
        x = x + self.positional_embedding[: x.size(1)]

        # Apply the rotation matrix to the input tensor.
        x = torch.matmul(x, self.rotation_matrix)

        return x

##### Quantum Layer:

In [11]:
Image(url="classiq_circuit.png", width=800, height=300) 

In [20]:
class QuantumLayer(torch.nn.Module):
    """
    Quantum Layer
    """
    def __init__(self, in_dim, out_dim):
        super(QuantumLayer, self).__init__()
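        self.quantum_program = get_circuit()
        # Use the `execute` wrapper defined earlier so that the number of shots
        # matches the num_shots used in post_process.
        self.quantum_layer = QLayer(self.quantum_program, execute, post_process)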

    def forward(self, x:torch.Tensor):
        x = self.quantum_layer(x)
        return x

##### Feed Forward Neural Network:
 $$f_{i}(X) =  GELU \circ Dropout \circ QuantumLayer(X)$$
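
For reference, the GELU activation in the formula above can be written as GELU(x) = x·Φ(x), where Φ is the standard normal CDF. The sketch below is a plain-Python version of the exact (erf-based) form, applied to a made-up vector of probabilities such as a quantum layer might return; the input values are illustration data only.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical quantum-layer output: one probability per qubit.
quantum_out = [0.1, 0.5, 0.9, 0.0]
print([round(gelu(v), 4) for v in quantum_out])
```

Because the quantum layer outputs probabilities in [0, 1], GELU only ever sees non-negative inputs here and acts as a smooth, mildly compressing function.
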

In [21]:
class FFN(torch.nn.Module):
    """
    Feed Forward Network
    """
    def __init__(self, in_dim, hidden_size):
        super().__init__()
        self.qlinear = QuantumLayer(hidden_size, hidden_size)
        self.dropout = torch.nn.Dropout(p=0.4)

    def forward(self, x: torch.Tensor):
        seq_len = x.size(1)
        # Apply the quantum layer to every patch (token) separately.
        x = [self.qlinear(x[:, t, :]) for t in range(seq_len)]
        # Stack the per-token outputs back into (batch_size, seq_len, hidden_size).
        x = torch.stack(x, dim=1)
        x = self.dropout(x)
        x = torch.nn.functional.gelu(x)
        return x

#### Multihead Attention:
#### $$Attention = softmax(\frac{Q(X)K(X)^T}{\sqrt{dim}})V(X)$$, where K, Q, V are the quantum linear projections of the input data.

In [22]:
class qMHA(torch.nn.Module):
    """
    Quantum Multihead Attention
    """
    def __init__(self, in_dim: int, num_heads: int) -> None:
        super().__init__()

        self.k_linear = QuantumLayer(in_dim, in_dim)
        self.q_linear = QuantumLayer(in_dim, in_dim)
        self.v_linear = QuantumLayer(in_dim, in_dim)
        self.dropout = torch.nn.Dropout(p=0.1)

        self.num_heads = num_heads
        self.in_dim = in_dim

    def forward(self, X: torch.Tensor):
        seq_len = X.size(1)
        # Quantum projections, applied token by token.
        K = [self.k_linear(X[:, t, :]) for t in range(seq_len)]
        Q = [self.q_linear(X[:, t, :]) for t in range(seq_len)]
        V = [self.v_linear(X[:, t, :]) for t in range(seq_len)]

        # Stack the per-token outputs into (batch_size, seq_len, in_dim).
        k = torch.stack(K, dim=1)
        q = torch.stack(Q, dim=1)
        v = torch.stack(V, dim=1)

        # Scaled dot-product attention.
        attention = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        attention = torch.nn.functional.softmax(attention, dim=-1)

        attention = self.dropout(attention)
        attention = attention @ v
        return attention

#### Transformer Encoder Block:

$$
\begin{cases}
f_{i-1}(X) = X + (GELU \circ Linear \circ Dropout \circ QuantumLinear \circ Linear)(X)\\
f_{i}(X) = f_{i-1}(X) + (GELU \circ Linear \circ Dropout \circ QuantumAttention \circ Linear)(f_{i-1}(X))
\end{cases}
$$
GELU is an activation function. Linear is the linear projection of the input tensor.
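
The residual pattern used by the encoder code in the next cell, a sublayer output that is layer-normalized and then added back to its input, can be sketched in plain Python as follows. This is a simplified illustration: the learnable scale and shift of `torch.nn.LayerNorm` are omitted, and the `sublayer` passed in is a stand-in for the attention or feed-forward block.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def residual_block(x, sublayer):
    """Return x + LayerNorm(sublayer(x))."""
    normed = layer_norm(sublayer(x))
    return [xi + ni for xi, ni in zip(x, normed)]

x = [1.0, 2.0, 3.0, 4.0]
# Stand-in sublayer: doubles every entry.
out = residual_block(x, lambda v: [2.0 * t for t in v])
print(out)
```

Since the normalized term has zero mean, the residual connection preserves the mean of the input while the sublayer reshapes the deviations around it.
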

In [23]:
class qTransformerEncoder(torch.nn.Module):
    """
    Quantum Transformer Encoder Layer
    """
    def __init__(self, in_dim:int, num_heads:int) -> None:
        super().__init__()
        
        self.layer_norm_1 = torch.nn.LayerNorm(normalized_shape=in_dim)
        self.layer_norm_2 = torch.nn.LayerNorm(normalized_shape=in_dim)
        
        self.qMHA = qMHA(in_dim, num_heads)
        self.qFFN = FFN(in_dim, hidden_size=in_dim)
        self.dropout = torch.nn.Dropout(p=0.1)
        

    def forward(self, X:torch.Tensor):
        x = self.qMHA(X)
        
        x = (self.layer_norm_1(x) + X)
        x = self.dropout(x)
        
        y = self.qFFN(x)
        y = self.layer_norm_2(y)+x
        return y

#### Quantum Vision Transformer:

$$
\begin{cases}
X = Patchify(X)\\
X = PositionalEncoding(X)\\
X = TransformerEncoder(X)\\
X = Mean(X)\\
X = Softmax(X)
\end{cases}
$$
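
The last two steps of this pipeline can be sketched in a few lines of plain Python: the token representations are averaged over the sequence dimension, and the pooled vector is turned into a probability distribution. The token values and function names are made up for illustration, and the sketch skips the linear classification head that the model code applies after pooling.

```python
import math

def mean_pool(tokens):
    """Average a list of token vectors over the sequence dimension."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical tokens with three features each.
tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
pooled = mean_pool(tokens)
probs = softmax(pooled)
print(pooled)
print(probs)
```

Mean pooling makes the prediction independent of the number of patches, since the pooled vector always has the feature dimension of a single token.
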


In [24]:
class QVT(torch.nn.Module):
    """
    Quantum Vision Transformer;
    """
    def __init__(self, in_channels, patch_size, in_dim, hidden_size,  num_heads, n_classes, n_layers) -> None:
        super().__init__()
        
        self.d_model = (in_dim//patch_size)**2
        self.n_classes = n_classes

        self.patch_formation = Patchify(in_channels=in_channels, patch_size=patch_size, hidden_size=hidden_size)

        self.pos_encoding = RotaryPositionalEmbedding(hidden_size, self.d_model)
        self.transformer_blocks = torch.nn.ModuleList([qTransformerEncoder(hidden_size, num_heads) for i in range(n_layers)])
                
        self.final_normalization = torch.nn.LayerNorm(hidden_size)
        self.final_layer = torch.nn.Linear(hidden_size, self.n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  
        
        x = self.patch_formation(x)
        x += self.pos_encoding(x)
        
        for trans_block in self.transformer_blocks:
            x = trans_block(x)
        
        x = self.final_normalization(x)
        x = x.mean(axis=1)
        x = self.final_layer(x)
        
        return x

#### Definition of MNIST dataset and dataloader:

In [25]:
#### Example with the MNIST Dataset:
transform=transforms.Compose([
                          transforms.ToTensor(), # first, convert image to PyTorch tensor
                          transforms.Normalize((0.1307,), (0.3081,)) # normalize inputs
                      ])
dataset1 = datasets.MNIST('../data', train=True, download=True,transform=transform)
dataset2 = datasets.MNIST('../data', train=False,transform=transform)

train_loader = torch.utils.data.DataLoader(dataset1,batch_size=256)
test_loader = torch.utils.data.DataLoader(dataset2,batch_size=256)

In [26]:
#### Classifier and optimizer definition:

In [27]:

clf = QVT(in_channels=1, patch_size=7, in_dim=28, hidden_size=4, num_heads=1, n_classes=10, n_layers=1)

opt = optim.SGD(clf.parameters(), lr=0.001, momentum=0.5)

loss_history = []
acc_history = []

#### Training Procedure:

In [28]:
def train():
    clf.train()  # set model in training mode (needed because of dropout)

    # The DataLoader yields (data, label) batches.
    for data, label in tqdm.tqdm(train_loader):
        opt.zero_grad()
        preds = clf(data)
        # The model returns raw class scores (logits), so use cross-entropy,
        # which applies log-softmax internally; nll_loss expects log-probabilities.
        loss = torch.nn.functional.cross_entropy(preds, label)
        loss.backward()
        # Store a detached copy so the autograd graph of each step can be freed.
        loss_history.append(loss.detach())
        opt.step()
    return loss_history

def test():
    clf.eval()  # set model in inference mode (needed because of dropout)
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in tqdm.tqdm(test_loader):
            output = clf(data)
            test_loss += torch.nn.functional.cross_entropy(output, target).item()
            pred = output.argmax(dim=1)  # index of the highest class score per sample
            correct += pred.eq(target).sum().item()

    test_loss /= len(test_loader)  # the loss is already averaged over each batch
    accuracy = 100.0 * correct / len(test_loader.dataset)
    acc_history.append(accuracy)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        accuracy))
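
When computing accuracy, the predicted class has to be taken per sample, that is, along the class dimension of the score matrix rather than over the whole batch at once. A minimal pure-Python sketch with made-up scores and a hypothetical function name:

```python
def accuracy(scores, targets):
    """Fraction of samples whose highest-scoring class equals the target."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in scores]
    correct = sum(int(p == t) for p, t in zip(preds, targets))
    return correct / len(targets)

# Three samples, two classes (illustration data only).
scores = [[0.1, 2.0], [3.0, -1.0], [0.0, 0.5]]
targets = [1, 0, 0]
print(accuracy(scores, targets))
```

Here the first two samples are classified correctly and the third is not, so the accuracy is two out of three.
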

In [None]:
#for epoch in range(0, 3):
train()


In [None]:
from matplotlib import pyplot as plt

In [None]:
plt.style.use('fivethirtyeight')
plt.title('Model Loss')
plt.plot(range(len(loss_history)), [i.detach().numpy() for i in loss_history], label="training")
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.legend()
plt.show()

##### List of references:
##### [1] Quantum Vision Transformer. https://arxiv.org/pdf/2209.08167
##### [2] Quantum Vision Transformers for Quark–Gluon Classification. https://arxiv.org/pdf/2405.10284
##### [3] Quantum Attention for Vision Transformer in High-Energy Physics. https://arxiv.org/pdf/2411.13520
##### [4] First Measurements With A Quantum Vision Transformer: A Naive Approach. https://indico.jlab.org/event/459/papers/11832/files/1318-First_Measurements_With_A_Quantum_Vision_Transformer_A_Naive_Approach_IEEE__CHEP_refereeEdits.pdf
##### [5] Quantum Mixed-State Self-Attention Network. https://arxiv.org/html/2403.02871v1
##### [6] Quantum Self-Attention Neural Networks for Text Classification. https://arxiv.org/pdf/2205.05625