**The original writeup is:**

PennyLane Documentation
Variational Quantum Classifier Tutorial

https://pennylane.ai/qml/demos/tutorial_variational_classifier/

**Intro**

Variational Quantum Classifiers (VQCs)

Variational Quantum Classifiers (VQCs) are hybrid quantum-classical machine learning algorithms: a parameterized quantum circuit computes the model's output, and a classical optimizer tunes the circuit's parameters to solve a classification task.
They belong to the broader family of variational quantum algorithms, which optimize a quantum circuit for a specific task, and they have been studied for applications such as image classification and natural language processing.

[Figure: the variational classifier circuit]

**How VQCs Work**

A VQC consists of two parts: a classical part that pre-processes the data, post-processes the measurement results, and updates the circuit parameters, and a quantum part that prepares and transforms quantum states, with the aim of performing certain calculations more efficiently than a classical computer. The quantum part is a parameterized quantum circuit whose parameters are trained much like the weights of a classical machine learning model. A VQC works in the following steps.


**Step 1: Encode Classical Data into a Quantum State**

Quantum circuits act on quantum states, so classical data must first be loaded into the circuit. This step is called data embedding: the representation of classical data as a quantum state in Hilbert space via a quantum feature map.

Angle Encoding: One common method is angle encoding, where classical features are mapped to the angles of rotation gates applied to qubits.
Example: For a data vector $x$ with features $x_1, x_2, \dots, x_n$, we encode these features into quantum states using rotation gates. If we have a qubit, we apply a rotation $R_x(\theta_i)$ where $\theta_i = x_i \cdot \text{scaling factor}$. The quantum state $|\psi\rangle$ is then prepared as:

$$|\psi\rangle = R_x(\theta_1)\, R_y(\theta_2) \cdots R_z(\theta_n)\, |0\rangle$$

Here, $|0\rangle$ is the initial state of the qubit, and $R_x, R_y, R_z$ are rotation operators applied to the qubit.
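To make the formula concrete, here is a minimal NumPy sketch (not PennyLane code) that builds $R_x$, $R_y$, $R_z$ as 2×2 matrices and applies them to $|0\rangle$ for a three-feature vector on a single qubit, so the product is $R_x(\theta_1) R_y(\theta_2) R_z(\theta_3)|0\rangle$. The feature values and the scaling factor of $\pi/2$ are arbitrary assumptions chosen for illustration.

```python
import numpy as np


# Single-qubit rotations with the convention R(theta) = exp(-i * theta * P / 2)
def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(theta):
    return np.array([[np.exp(-1j * theta / 2), 0], [0, np.exp(1j * theta / 2)]])


def angle_encode(x, scaling=np.pi / 2):
    """Encode 3 features as Rx(t1) Ry(t2) Rz(t3) |0> with t_i = x_i * scaling (hypothetical scaling)."""
    theta = np.asarray(x) * scaling
    ket0 = np.array([1, 0], dtype=complex)
    # The rightmost operator acts first: Rz, then Ry, then Rx
    return rx(theta[0]) @ ry(theta[1]) @ rz(theta[2]) @ ket0


psi = angle_encode([0.2, 0.5, 0.9])  # hypothetical feature values
print(psi, np.linalg.norm(psi))
```

In this particular ordering $R_z$ acts on $|0\rangle$ first and only contributes a global phase, so the last feature has no observable effect; with only one qubit, all features also share a single 2-dimensional state. PennyLane's `AngleEmbedding` template, used later in this notebook, instead places one feature on each qubit.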



**Step 2: Feed the Encoded State into a Parameterized Quantum Circuit**

After encoding, a parameterized quantum circuit processes the quantum state. The circuit consists of various quantum gates such as Hadamard, CNOT, and rotation gates. The parameters of the rotation gates are optimized during training.
Quantum State Evolution: The quantum state evolves according to the unitary transformations defined by the quantum circuit. If $U(\theta)$ represents the unitary operation parameterized by $\theta$, the state after applying the circuit is:

$$|\psi(\theta)\rangle = U(\theta)\,|\psi\rangle$$

where $|\psi\rangle$ is the encoded state.
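As an illustration, the sketch below (plain NumPy, with a toy two-qubit circuit chosen only for this example) builds $U(\theta)$ as a 4×4 matrix from an $R_y$ rotation on each qubit followed by a CNOT, and applies it to an encoded state. Because $U(\theta)$ is unitary, the evolved state stays normalized.

```python
import numpy as np


def ry(theta):
    """Single-qubit rotation about the y-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


# CNOT with qubit 0 (the left factor of each Kronecker product) as control
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])


def U(theta):
    """Toy 2-qubit circuit: an RY rotation on each qubit, then a CNOT."""
    return CNOT @ np.kron(ry(theta[0]), ry(theta[1]))


theta = np.array([0.3, 1.1])           # example parameter values
psi = np.array([1.0, 0.0, 0.0, 0.0])   # encoded state, here simply |00>
psi_theta = U(theta) @ psi             # |psi(theta)> = U(theta)|psi>

print(np.round(psi_theta, 4))
# All matrices are real here, so the adjoint U^dagger is just the transpose
print(np.allclose(U(theta).T @ U(theta), np.eye(4)))
```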

**Step 3: Apply a Parameterized Model**

The goal of the training phase is to find a set of parameters that maximizes the accuracy of the classification model. In the classifying phase, the encoded quantum state will evolve through the well-trained ansatz W(θ) with optimized variational parameters θ.


**The Cost Function and Measurement:**

The cost function measures how well the quantum classifier performs. It typically involves:

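Expectation Value:

Quantum measurements provide expectation values that are used to estimate probabilities. For a quantum observable $\hat{O}$, the expectation value is given by:

$$\langle\psi(\theta)|\,\hat{O}\,|\psi(\theta)\rangle$$

This value is the average of the eigenvalues of $\hat{O}$, weighted by the probability of obtaining each of them in a measurement; in practice it is estimated by averaging over repeated measurements (shots). It is not itself a probability: for the Pauli-Z observable on one qubit, $\langle Z \rangle = p_0 - p_1 \in [-1, 1]$, so the probability of measuring $0$ is $p_0 = (1 + \langle Z \rangle)/2$.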
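For a single qubit and $\hat{O} = Z$, the relation between the expectation value and the outcome probabilities can be checked directly. The following NumPy sketch prepares $R_y(\theta)|0\rangle$ for an arbitrary angle, computes $\langle Z\rangle$, and recovers $p_0$ from it.

```python
import numpy as np

Z = np.diag([1.0, -1.0])  # Pauli-Z: eigenvalue +1 for |0>, -1 for |1>


def expval(psi, obs):
    """Expectation value <psi|O|psi> of observable obs for a normalized state psi."""
    return np.real(np.conj(psi) @ obs @ psi)


theta = 0.8
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])  # the state RY(theta)|0>

exp_z = expval(psi, Z)               # analytically cos(theta)
p0 = (1 + exp_z) / 2                 # probability of measuring 0, recovered from <Z>
print(exp_z, p0, abs(psi[0]) ** 2)   # p0 agrees with |<0|psi>|^2
```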


The loss function (or cost function) quantifies the difference between the predicted and true labels. For classification tasks, the cross-entropy loss is often used:

$$L(\theta) = -\sum_i y_i \log\left(\frac{e^{\hat{y}_i}}{\sum_j e^{\hat{y}_j}}\right)$$

where $\hat{y}_i$ is the model's raw score (logit) for class $i$, which the softmax $e^{\hat{y}_i} / \sum_j e^{\hat{y}_j}$ turns into a predicted probability, and $y_i$ is the true label for class $i$ (1 for the correct class, 0 otherwise).
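A short NumPy sketch of this loss for a single example; the three class scores are made-up values, and storing the label $y$ as a one-hot vector is an assumption about how $y_i$ is represented.

```python
import numpy as np


def softmax(scores):
    # Subtracting the maximum does not change the result but avoids overflow in exp
    e = np.exp(scores - np.max(scores))
    return e / np.sum(e)


def cross_entropy(y_true, scores):
    """L = -sum_i y_i * log(softmax(scores)_i) for a one-hot label vector y_true."""
    return -np.sum(y_true * np.log(softmax(scores)))


scores = np.array([2.0, 0.5, -1.0])   # hypothetical model scores (logits) for 3 classes
y_true = np.array([1.0, 0.0, 0.0])    # the true class is class 0
print(softmax(scores), cross_entropy(y_true, scores))
```

The PennyLane examples below train with the square loss instead, which fits a classifier with a single output in $[-1, 1]$.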


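**Step 4: Parameter Optimization**

The goal is to find the optimal parameters $\theta$ that minimize the cost function. This is typically done using classical optimization techniques.
Gradient-Based Optimization: Gradients of the cost function with respect to the parameters $\theta$ are computed to update the parameters. This can be done using methods like gradient descent or more advanced algorithms.

Parameter Shift Rule: A common method to compute gradients in quantum circuits is the parameter shift rule. For a parameterized gate $U(\theta) = e^{-i\theta G/2}$ whose generator $G$ has eigenvalues $\pm 1$ (for example the rotations $R_x$, $R_y$, $R_z$), the derivative of the measured expectation value $f(\theta) = \langle\psi(\theta)|\hat{O}|\psi(\theta)\rangle$ with respect to $\theta$ is exactly:

$$\frac{\partial f(\theta)}{\partial \theta} = \frac{f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right)}{2}$$

The derivative of the cost function $L(\theta)$ then follows from the chain rule. This is not the same as the central finite-difference approximation

$$\frac{\partial L(\theta)}{\partial \theta} \approx \frac{L(\theta + \delta) - L(\theta - \delta)}{2\delta}$$

where $\delta$ is a small shift value: the parameter shift rule uses a large, fixed shift and gives the exact gradient (up to measurement noise), whereas finite differences are only approximate and are strongly affected by shot noise when $\delta$ is small.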
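The following NumPy sketch compares both formulas on the simplest possible case, a single $R_x(\theta)$ gate followed by a Pauli-Z measurement, where $f(\theta) = \cos\theta$ and the exact derivative is $-\sin\theta$. The circuit is simulated exactly here, so there is no shot noise; the angle and the finite-difference step are arbitrary choices.

```python
import numpy as np


def f(theta):
    """Expectation value <Z> after RX(theta)|0>; analytically equal to cos(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, -1j * s])   # RX(theta)|0>
    Z = np.diag([1.0, -1.0])
    return np.real(np.conj(psi) @ Z @ psi)


def parameter_shift(fun, theta):
    # Exact for gates exp(-i * theta * G / 2) whose generator G has eigenvalues +/-1
    return (fun(theta + np.pi / 2) - fun(theta - np.pi / 2)) / 2


def finite_difference(fun, theta, delta=1e-4):
    # Central difference: only an approximation, accurate for small delta without noise
    return (fun(theta + delta) - fun(theta - delta)) / (2 * delta)


theta = 0.7
print(parameter_shift(f, theta), finite_difference(f, theta), -np.sin(theta))
```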

**Training:**

The optimization algorithm iteratively adjusts the parameters to minimize the cost function. This process continues until convergence.
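A toy version of this loop can be written as plain gradient descent. The sketch below assumes a one-parameter circuit, $R_x(\theta)$ followed by a Pauli-Z measurement, whose output is $f(\theta) = \cos\theta$, and a squared-error cost against a single made-up target; the gradient of the circuit output comes from the parameter shift rule and the gradient of the cost from the chain rule.

```python
import numpy as np


def f(theta):
    """Output <Z> of the circuit RX(theta)|0>, written in closed form as cos(theta)."""
    return np.cos(theta)


def parameter_shift(fun, theta):
    """Parameter-shift derivative for a Pauli rotation gate (shift pi/2, factor 1/2)."""
    return (fun(theta + np.pi / 2) - fun(theta - np.pi / 2)) / 2


target = 0.5          # hypothetical label the output should match
theta = 0.1           # initial parameter value
learning_rate = 0.5
costs = []

for step in range(100):
    prediction = f(theta)
    costs.append((prediction - target) ** 2)   # squared-error cost L(theta)
    # Chain rule: dL/dtheta = 2 (f(theta) - target) * df/dtheta
    grad = 2 * (prediction - target) * parameter_shift(f, theta)
    theta -= learning_rate * grad                # plain gradient-descent update

print(theta, f(theta), costs[-1])  # theta approaches pi/3, where cos(theta) = 0.5
```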

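**Now we will apply these steps with PennyLane on real data**

In this notebook we first fit the parity function on 4-bit strings, and then classify data from the Iris dataset.

Before running the code, make sure to upload the data files to the variational_classifier/data folder.

## 1. Fitting the parity function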

## Setup

In [None]:
pip install pennylane


## Imports

In [None]:
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer


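## Defining the Variational Quantum Classifier

In [None]:
# First version of the model: the classifier output is simply the circuit output.
# `circuit` is defined in the next cell; Python only looks it up when the function is called.
# This definition is replaced further below by a version with a trainable bias.
def variational_classifier(weights, x):
    return circuit(weights, x)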

## Quantum and Classical Nodes


We then create a quantum device that will run our circuits.

In [None]:
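dev = qml.device("default.qubit", wires=4)

# A first example circuit built from PennyLane templates: AngleEmbedding encodes x
# as rotation angles and StronglyEntanglingLayers is the trainable block.
# It is replaced further below by a circuit built from our own layer function.
@qml.qnode(dev)
def circuit(weights, x):
    qml.templates.AngleEmbedding(x, wires=range(4))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(4))
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))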

Variational classifiers usually define a “layer” or “block”, which is an elementary circuit architecture that gets repeated to build the full variational circuit.

Our circuit layer will use four qubits, or wires, and consists of an arbitrary rotation on every qubit, as well as a ring of CNOTs that entangles each qubit with its neighbor. Borrowing from machine learning, we call the parameters of the layer weights.

In [None]:
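def layer(layer_weights):
    # Arbitrary single-qubit rotation Rot(phi, theta, omega) on each of the 4 wires
    for wire in range(4):
        qml.Rot(*layer_weights[wire], wires=wire)

    # Ring of CNOTs entangling each qubit with its neighbor (wire 3 wraps around to wire 0)
    for wires in ([0, 1], [1, 2], [2, 3], [3, 0]):
        qml.CNOT(wires)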

We also need a way to encode data inputs into the circuit, so that the measured output depends on the inputs. In this first example, the inputs are bitstrings, which we encode into the state of the qubits. The quantum state after state preparation is a computational basis state that has 1s where the input bitstring has 1s.

The BasisState function provided by PennyLane is made to do just this. It expects the input to be a list of zeros and ones.

In [None]:
def state_preparation(x):
    qml.BasisState(x, wires=[0, 1, 2, 3])
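To see what this encoding produces, the NumPy sketch below builds the 16-dimensional state vector for a 4-bit input by hand. It assumes that wire 0 is the most significant bit, matching PennyLane's ordering of state-vector entries; plain NumPy is imported as `onp` so that it does not shadow PennyLane's `np` in this notebook.

```python
import numpy as onp  # plain NumPy, aliased so it does not shadow PennyLane's np


def basis_state(bits):
    """State vector of the computational basis state for a bitstring.

    Assumes wire 0 is the most significant bit, so the bitstring is read as a binary number.
    """
    index = int("".join(str(b) for b in bits), 2)
    state_vector = onp.zeros(2 ** len(bits))
    state_vector[index] = 1.0
    return state_vector


sv = basis_state([1, 0, 1, 1])
print(onp.nonzero(sv)[0])  # [11], since 1011 in binary is 11
```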

Now we define the variational quantum circuit as this state preparation routine, followed by a repetition of the layer structure. This definition replaces the template-based circuit above.

In [None]:
@qml.qnode(dev)
def circuit(weights, x):
    state_preparation(x)

    for layer_weights in weights:
        layer(layer_weights)

    return qml.expval(qml.PauliZ(0))

If we want to add a “classical” bias parameter, the variational quantum classifier also needs some post-processing. We define the full model as a sum of the output of the quantum circuit, plus the trainable bias.

In [None]:
def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

## The cost function



In supervised learning, the cost function typically combines a loss function with a regularization term. For simplicity, we focus on the standard squared loss function, which evaluates how far the model's predictions are from the true target labels.

Additionally, to evaluate the performance of the classifier, we use accuracy: the proportion of predictions that match the true target labels.

We start with the square loss.

In [None]:
def square_loss(labels, predictions):
    # We use a call to qml.math.stack to allow subtracting the arrays directly
    return np.mean((labels - qml.math.stack(predictions)) ** 2)

To monitor how many inputs the current classifier predicted correctly, we also define the accuracy, or the proportion of predictions that agree with a set of target labels.



In [None]:
def accuracy(labels, predictions):
    acc = sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions))
    acc = acc / len(labels)
    return acc

During training, the cost is computed on the data passed to it: the features and labels of the batch processed in each optimization step.

In [None]:
def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)
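A small worked example with made-up numbers shows how the two metrics differ: the square loss is computed on the continuous circuit outputs, while accuracy, as in the training loop, is computed on their signs. This sketch uses plain NumPy (imported as `onp`) in place of the PennyLane helpers.

```python
import numpy as onp  # plain NumPy, aliased so it does not shadow PennyLane's np

example_labels = onp.array([1, -1, 1])
example_outputs = onp.array([0.8, -0.6, -0.2])   # hypothetical circuit outputs in [-1, 1]

# Square loss on the continuous outputs: (0.2**2 + 0.4**2 + 1.2**2) / 3
example_loss = onp.mean((example_labels - example_outputs) ** 2)

# Accuracy on the signs of the outputs, as in the training loop
example_preds = onp.sign(example_outputs)
example_acc = onp.mean(onp.abs(example_labels - example_preds) < 1e-5)

print(example_loss, example_acc)  # about 0.5467 and 0.6667
```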

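## Optimization

Before training, we load the parity training data. Each row holds a 4-bit input followed by its parity label, and we shift the labels from {0, 1} to {-1, 1} to match the output range of the Pauli-Z expectation value.

In [None]:
# Training file assumed to sit next to the parity_test.txt file loaded later
data = np.loadtxt("variational_classifier/data/parity_train.txt", dtype=int)
X = np.array(data[:, :-1])  # 4-bit inputs
Y = np.array(data[:, -1]) * 2 - 1  # parity labels shifted from {0, 1} to {-1, 1}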

We initialize the variables randomly but fix a seed for reproducibility.

In [None]:
np.random.seed(0)
num_qubits = 4
num_layers = 2
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

print("Weights:", weights_init)
print("Bias: ", bias_init)

Weights: [[[ 0.01764052  0.00400157  0.00978738]
  [ 0.02240893  0.01867558 -0.00977278]
  [ 0.00950088 -0.00151357 -0.00103219]
  [ 0.00410599  0.00144044  0.01454274]]

 [[ 0.00761038  0.00121675  0.00443863]
  [ 0.00333674  0.01494079 -0.00205158]
  [ 0.00313068 -0.00854096 -0.0255299 ]
  [ 0.00653619  0.00864436 -0.00742165]]]
Bias:  0.0


Next, we create an optimizer instance and choose a batch size.

In [None]:
opt = NesterovMomentumOptimizer(0.5)
batch_size = 5

Then we run the optimizer to train the model.

In [None]:
weights = weights_init
bias = bias_init
for it in range(100):

    # Update the weights by one optimizer step, using only a limited batch of data
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    weights, bias = opt.step(cost, weights, bias, X=X_batch, Y=Y_batch)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X]

    current_cost = cost(weights, bias, X, Y)
    acc = accuracy(Y, predictions)

    print(f"Iter: {it+1:4d} | Cost: {current_cost:0.7f} | Accuracy: {acc:0.7f}")

Next, let's evaluate the performance of our classifier on a separate test set of examples that were not used during training. This shows how well the model generalizes to new, unseen data.

In [None]:
data = np.loadtxt("variational_classifier/data/parity_test.txt", dtype=int)
X_test = np.array(data[:, :-1])
Y_test = np.array(data[:, -1])
Y_test = Y_test * 2 - 1  # shift label from {0, 1} to {-1, 1}

predictions_test = [np.sign(variational_classifier(weights, bias, x)) for x in X_test]

for x,y,p in zip(X_test, Y_test, predictions_test):
    print(f"x = {x}, y = {y}, pred={p}")

acc_test = accuracy(Y_test, predictions_test)
print("Accuracy on unseen data:", acc_test)

## 2. Iris classification

We now tackle a more complex task: classifying data points from two classes of the Iris dataset, which are no longer bitstrings but real-valued vectors. We use two features per data point, so each vector is 2-dimensional. To encode the inputs into the 4 amplitudes of a 2-qubit quantum state, we pad each vector with two "latent dimensions". This lets us explore our variational quantum classifier on a more realistic dataset.

## Quantum and classical nodes

**State Preparation for Real-Valued Vectors**

When working with real-valued vectors, state preparation becomes more intricate compared to representing bitstrings with basis states. Each input x needs to be translated into a set of angles, which are then fed into a small routine for state preparation. To simplify this process, we'll focus on data from the positive subspace, allowing us to ignore signs and avoid additional rotations around the Z-axis.

**Circuit Implementation**


Our circuit is implemented based on the scheme outlined in Möttönen et al. (2004) and Schuld and Petruccione (2018), which is adapted for positive vectors only. Additionally, we've decomposed controlled Y-axis rotations into more basic gates, following the approach described in Nielsen and Chuang (2010).
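The decomposition can be checked numerically. In `state_preparation`, each sequence `CNOT, RY(-β/2), CNOT, RY(β/2)` on wire 1 should act as a rotation $R_y(\beta)$ on wire 1 that is applied only when wire 0 is in $|1\rangle$. The NumPy sketch below, with an arbitrary angle, compares the 4×4 matrix of that sequence with a controlled-$R_y(\beta)$ matrix (plain NumPy is imported as `onp` to avoid shadowing PennyLane's `np`).

```python
import numpy as onp  # plain NumPy, aliased so it does not shadow PennyLane's np

I2 = onp.eye(2)
# CNOT with qubit 0 (left Kronecker factor) as control and qubit 1 as target
CNOT = onp.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])


def ry(theta):
    c, s = onp.cos(theta / 2), onp.sin(theta / 2)
    return onp.array([[c, -s], [s, c]])


beta = 1.2  # arbitrary rotation angle

# Gates applied in time order CNOT, RY(-beta/2), CNOT, RY(beta/2) on qubit 1;
# the matrix product lists them right to left.
decomposed = onp.kron(I2, ry(beta / 2)) @ CNOT @ onp.kron(I2, ry(-beta / 2)) @ CNOT

# Controlled-RY(beta): identity when qubit 0 is |0>, RY(beta) on qubit 1 when qubit 0 is |1>
zeros = onp.zeros((2, 2))
controlled_ry = onp.block([[I2, zeros], [zeros, ry(beta)]])

print(onp.allclose(decomposed, controlled_ry))  # True
```

It works because conjugating $R_y$ by a Pauli X flips the sign of its angle: when the control is $|1\rangle$ the two half-rotations add up to $R_y(\beta)$, and when it is $|0\rangle$ they cancel.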


In [None]:
def get_angles(x):
    beta0 = 2 * np.arcsin(np.sqrt(x[1] ** 2) / np.sqrt(x[0] ** 2 + x[1] ** 2 + 1e-12))
    beta1 = 2 * np.arcsin(np.sqrt(x[3] ** 2) / np.sqrt(x[2] ** 2 + x[3] ** 2 + 1e-12))
    beta2 = 2 * np.arcsin(np.linalg.norm(x[2:]) / np.linalg.norm(x))

    return np.array([beta2, -beta1 / 2, beta1 / 2, -beta0 / 2, beta0 / 2])


def state_preparation(a):
    qml.RY(a[0], wires=0)

    qml.CNOT(wires=[0, 1])
    qml.RY(a[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[2], wires=1)

    qml.PauliX(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[3], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[4], wires=1)
    qml.PauliX(wires=0)

We can test this code by preparing the state for a sample vector and checking that the amplitude vector reproduces it:

In [None]:
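x = np.array([0.53896774, 0.79503606, 0.27826503, 0.0], requires_grad=False)
ang = get_angles(x)

# Use a 2-qubit device so that qml.state() returns the 4 amplitudes of the encoded vector
dev_test = qml.device("default.qubit", wires=2)


@qml.qnode(dev_test)
def test(angles):
    state_preparation(angles)

    return qml.state()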


state = test(ang)

print("x               : ", np.round(x, 6))
print("angles          : ", np.round(ang, 6))
print("amplitude vector: ", np.round(np.real(state), 6))
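The same check can be reproduced without PennyLane. The sketch below re-implements `get_angles` and the gate sequence of `state_preparation` with plain NumPy matrices (imported as `onp`; the helper names are hypothetical), assuming wire 0 is the left Kronecker factor. For a positive input vector, the resulting amplitudes should equal the normalized input.

```python
import numpy as onp  # plain NumPy, aliased so it does not shadow PennyLane's np

I2 = onp.eye(2)
PAULI_X = onp.array([[0, 1], [1, 0]])
# CNOT with wire 0 (left Kronecker factor, most significant bit) as control
CNOT = onp.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])


def ry(theta):
    c, s = onp.cos(theta / 2), onp.sin(theta / 2)
    return onp.array([[c, -s], [s, c]])


def get_angles_np(x):
    """Same formulas as get_angles above, written with plain NumPy."""
    beta0 = 2 * onp.arcsin(onp.sqrt(x[1] ** 2) / onp.sqrt(x[0] ** 2 + x[1] ** 2 + 1e-12))
    beta1 = 2 * onp.arcsin(onp.sqrt(x[3] ** 2) / onp.sqrt(x[2] ** 2 + x[3] ** 2 + 1e-12))
    beta2 = 2 * onp.arcsin(onp.linalg.norm(x[2:]) / onp.linalg.norm(x))
    return onp.array([beta2, -beta1 / 2, beta1 / 2, -beta0 / 2, beta0 / 2])


def simulate_state_preparation(a):
    """Apply the gates of state_preparation, in order, to the state |00>."""
    gates = [
        onp.kron(ry(a[0]), I2),
        CNOT, onp.kron(I2, ry(a[1])), CNOT, onp.kron(I2, ry(a[2])),
        onp.kron(PAULI_X, I2),
        CNOT, onp.kron(I2, ry(a[3])), CNOT, onp.kron(I2, ry(a[4])),
        onp.kron(PAULI_X, I2),
    ]
    amplitudes = onp.array([1.0, 0.0, 0.0, 0.0])
    for gate in gates:
        amplitudes = gate @ amplitudes
    return amplitudes


x_sample = onp.array([0.53896774, 0.79503606, 0.27826503, 0.0])
amplitudes = simulate_state_preparation(get_angles_np(x_sample))
print(onp.round(amplitudes, 6))
print(onp.allclose(amplitudes, x_sample / onp.linalg.norm(x_sample), atol=1e-6))
```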

Now that we're working with 2-qubit circuits, we need to modify the layer function accordingly. We also redefine the cost so that a whole batch of inputs is passed to the circuit in a single call.

In [None]:
def layer(layer_weights):
    for wire in range(2):
        qml.Rot(*layer_weights[wire], wires=wire)
    qml.CNOT(wires=[0, 1])


def cost(weights, bias, X, Y):
    # Transpose the batch of input data in order to make the indexing
    # in state_preparation work
    predictions = variational_classifier(weights, bias, X.T)
    return square_loss(Y, predictions)

## The data

**Preparing the Iris Dataset**


We load the Iris dataset and preprocess it for our quantum classifier. This involves:

- Adding two "latent dimensions" (constant padding values) to each data point, so that it has $2^2 = 4$ entries, one per amplitude of a 2-qubit state
- Normalizing each input to unit length, since the amplitudes of a quantum state must form a unit vector
- Converting inputs to rotation angles using the get_angles function

**Remember**

Download the Iris data file (iris_classes1and2_scaled.txt) and place it in the variational_classifier/data subfolder before proceeding.

In [None]:
data = np.loadtxt("variational_classifier/data/iris_classes1and2_scaled.txt")
X = data[:, 0:2]
print(f"First X sample (original)  : {X[0]}")

# pad the vectors to size 2^2=4 with constant values
padding = np.ones((len(X), 2)) * 0.1
X_pad = np.c_[X, padding]
print(f"First X sample (padded)    : {X_pad[0]}")

# normalize each input
normalization = np.sqrt(np.sum(X_pad**2, -1))
X_norm = (X_pad.T / normalization).T
print(f"First X sample (normalized): {X_norm[0]}")

# the angles for state preparation are the features
features = np.array([get_angles(x) for x in X_norm], requires_grad=False)
print(f"First features sample      : {features[0]}")

Y = data[:, -1]

We've transformed the original data into a new set of features, represented as angles. That's why we've renamed the input data X to features. Now, let's visualize the preprocessing steps and experiment with different dimension combinations (like dim1 and dim2). We'll see that some of these new features still effectively distinguish between classes, while others are less useful for classification.

In [None]:
import matplotlib.pyplot as plt

plt.figure()
plt.scatter(X[:, 0][Y == 1], X[:, 1][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X[:, 0][Y == -1], X[:, 1][Y == -1], c="r", marker="o", ec="k")
plt.title("Original data")
plt.show()

plt.figure()
dim1 = 0
dim2 = 1
plt.scatter(X_norm[:, dim1][Y == 1], X_norm[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X_norm[:, dim1][Y == -1], X_norm[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Padded and normalised data (dims {dim1} and {dim2})")
plt.show()

plt.figure()
dim1 = 0
dim2 = 3
plt.scatter(features[:, dim1][Y == 1], features[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(features[:, dim1][Y == -1], features[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Feature vectors (dims {dim1} and {dim2})")
plt.show()

Next, we want the model to make good predictions on data it has not seen during training, which is known as generalization. To monitor this, we split the data into two subsets: a training set used to fit the model and a validation set used only to track its performance.

In [None]:
np.random.seed(0)
num_data = len(Y)
num_train = int(0.75 * num_data)
index = np.random.permutation(range(num_data))
feats_train = features[index[:num_train]]
Y_train = Y[index[:num_train]]
feats_val = features[index[num_train:]]
Y_val = Y[index[num_train:]]

# We need these later for plotting
X_train = X[index[:num_train]]
X_val = X[index[num_train:]]

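**Optimization**
We reuse the initial weights and bias from the parity example (the 2-qubit layer only reads the rotation parameters of the first two qubits) and create a new optimizer with a smaller step size.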

In [None]:
opt = NesterovMomentumOptimizer(0.01)
batch_size = 5

# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    feats_train_batch = feats_train[batch_index]
    Y_train_batch = Y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, feats_train_batch, Y_train_batch)

    # Compute predictions on train and validation set
    predictions_train = np.sign(variational_classifier(weights, bias, feats_train.T))
    predictions_val = np.sign(variational_classifier(weights, bias, feats_val.T))

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y_train, predictions_train)
    acc_val = accuracy(Y_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, features, Y)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )

We can plot the continuous output of the variational classifier for the first two dimensions of the Iris data set.

In [None]:
plt.figure()
cm = plt.cm.RdBu

# make data for decision regions
xx, yy = np.meshgrid(np.linspace(0.0, 1.5, 30), np.linspace(0.0, 1.5, 30))
X_grid = [np.array([x, y]) for x, y in zip(xx.flatten(), yy.flatten())]

# preprocess grid points like data inputs above
padding = 0.1 * np.ones((len(X_grid), 2))
X_grid = np.c_[X_grid, padding]  # pad each input
normalization = np.sqrt(np.sum(X_grid**2, -1))
X_grid = (X_grid.T / normalization).T  # normalize each input
features_grid = np.array([get_angles(x) for x in X_grid])  # angles are new features
predictions_grid = variational_classifier(weights, bias, features_grid.T)
Z = np.reshape(predictions_grid, xx.shape)

# plot decision regions
levels = np.arange(-1, 1.1, 0.1)
cnt = plt.contourf(xx, yy, Z, levels=levels, cmap=cm, alpha=0.8, extend="both")
plt.contour(xx, yy, Z, levels=[0.0], colors=("black",), linestyles=("--",), linewidths=(0.8,))
plt.colorbar(cnt, ticks=[-1, 0, 1])

# plot data
for color, label in zip(["b", "r"], [1, -1]):
    plot_x = X_train[:, 0][Y_train == label]
    plot_y = X_train[:, 1][Y_train == label]
    plt.scatter(plot_x, plot_y, c=color, marker="o", ec="k", label=f"class {label} train")
    plot_x = X_val[:, 0][Y_val == label]
    plot_y = X_val[:, 1][Y_val == label]
    plt.scatter(plot_x, plot_y, c=color, marker="^", ec="k", label=f"class {label} validation")

plt.legend()
plt.show()

[Figure: continuous classifier output and decision regions, with training and validation points]