# TOPIC 4: Classical Multi-Armed Bandit → Quantum Decision Model

(Strategic decision making — HIGH VALUE FOR PAPER)

In [1]:
%pip install pennylane pennylane-lightning torch scikit-learn matplotlib

Note: you may need to restart the kernel to use updated packages.


## 4A. Classical Bandit (ε-greedy)

In [2]:
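# TOPIC: Classical ML - Multi-Armed Bandit (epsilon-greedy)

import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

arms = [0.2, 0.5, 0.8]           # true Bernoulli success probability of each arm
Q = np.zeros(len(arms))          # running value estimate per arm
epsilon = 0.1                    # exploration probability
alpha = 0.1                      # constant step size for the value update

for t in range(100):
    # Explore with probability epsilon, otherwise exploit the current best estimate
    if rng.random() < epsilon:
        a = int(rng.integers(len(arms)))
    else:
        a = int(np.argmax(Q))

    # Bernoulli reward: 1.0 with probability arms[a], else 0.0
    reward = float(rng.random() < arms[a])
    # Exponential moving average toward the observed reward
    Q[a] += alpha * (reward - Q[a])

print("Estimated values:", np.round(Q, 2))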
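
The loop above runs for only 100 steps with a constant step size of 0.1, so the estimates stay noisy. As a rough sanity check, the same ε-greedy rule can be wrapped in a function and run for longer. The sketch below is an illustration rather than part of the original notebook: the function name `run_epsilon_greedy`, the sample-average step size `1 / counts[a]`, and the use of the standard-library `random` module (to keep the example dependency-free) are all assumptions.

```python
import random

def run_epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy with sample-average updates; returns (Q, counts)."""
    rng = random.Random(seed)
    n = len(arm_probs)
    Q = [0.0] * n        # value estimate per arm
    counts = [0] * n     # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                      # explore
        else:
            a = max(range(n), key=lambda i: Q[i])     # exploit (ties -> lowest index)
        reward = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        # Sample-average update: Q[a] becomes the mean of all rewards seen for arm a
        Q[a] += (reward - Q[a]) / counts[a]
    return Q, counts

Q, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(Q)), key=lambda i: Q[i])
print("estimates:", [round(q, 2) for q in Q], "pulls:", counts, "best arm:", best)
```

With the sample-average step, each estimate converges to the arm's true success probability, whereas the constant step size in the cell above keeps weighting recent rewards more heavily, which matters mainly when arm probabilities drift over time.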


## 4B. Quantum Bandit (VQC Policy)

In [3]:
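# TOPIC: Quantum ML - Quantum Policy for Strategic Decision

import pennylane as qml
import torch

torch.manual_seed(0)

n_arms = 3
arms = torch.tensor([0.2, 0.5, 0.8])  # same Bernoulli arms as in 4A
dev = qml.device("default.qubit", wires=n_arms)

@qml.qnode(dev, interface="torch")
def policy(state, weights):
    # Encode the input as rotation angles, then apply trainable entangling layers
    qml.AngleEmbedding(state, wires=range(n_arms))
    qml.StronglyEntanglingLayers(weights, wires=range(n_arms))
    # One PauliZ expectation value in [-1, 1] per arm
    return [qml.expval(qml.PauliZ(i)) for i in range(n_arms)]

# Shape: (layers, wires, 3 rotation angles)
weights = torch.nn.Parameter(0.01 * torch.randn(3, n_arms, 3))
optimizer = torch.optim.Adam([weights], lr=0.1)

for episode in range(50):
    state = torch.rand(n_arms)  # random context; a pure bandit has no state
    scores = torch.stack(list(policy(state, weights))).float()
    # Assumption: the original loop took an argmax and never used the optimizer.
    # Here the scores are treated as logits and trained with a REINFORCE step.
    dist = torch.distributions.Categorical(logits=scores)
    action = dist.sample()
    reward = torch.bernoulli(arms[action])

    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()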
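
The circuit in `policy` returns one PauliZ expectation value per arm, each in [-1, 1], so the outputs are scores rather than probabilities. One common way to turn such scores into a stochastic policy is a softmax. The sketch below is a plain-Python illustration under stated assumptions: the helper names `softmax` and `sample_action`, the inverse temperature `beta`, and the example scores are hypothetical and not taken from the notebook.

```python
import math
import random

def softmax(scores, beta=1.0):
    """Numerically stable softmax with inverse temperature beta."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    """Draw one arm index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical expectation values, one per arm, each in [-1, 1]
scores = [-1.0, 0.0, 1.0]

probs_soft = softmax(scores, beta=1.0)    # mild preference for the best arm
probs_sharp = softmax(scores, beta=5.0)   # nearly greedy

rng = random.Random(0)
action = sample_action(probs_soft, rng)
print("beta=1:", [round(p, 3) for p in probs_soft])
print("beta=5:", [round(p, 3) for p in probs_sharp])
print("sampled arm:", action)
```

Because the scores are bounded in [-1, 1], a softmax with `beta = 1` over three arms can assign at most about 0.79 probability to the best arm, so a larger `beta` (or a trainable output scale) may be needed if the policy is expected to become close to greedy.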


**Paper framing:**
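“Quantum policies model strategic decisions via entangled action representations: a variational circuit angle-encodes the state, and its entangling layers couple the per-arm expectation values that score each action.”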