## QSVC for Fraud Detection

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from jax_utils import square_kernel_matrix_jax, kernel_matrix_jax, target_alignment_jax
import pennylane.numpy as pnp
import pennylane as qml
import jax
import jax.numpy as jnp
import optax

# Enable 64-bit floats in JAX (set through jax.config directly).
jax.config.update("jax_enable_x64", True)

seed = 42
np.random.seed(seed)
pnp.random.seed(seed)

jax.devices()

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)


[CpuDevice(id=0)]

## Splitting data
 
We split the saved training data into training and validation sets. We also remap the labels from `0, 1` to `-1, 1`, because we will be using the `Pauli-Z` observable in the quantum circuit, whose eigenvalues are `-1, 1`. After remapping, a label of `1` means that the stock price is manipulated and `-1` means that it is not.
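
The remapping is a simple affine transform. A minimal sketch on a made-up label array (the values are illustrative, not taken from the dataset), including the inverse map back to `0, 1`:

```python
import numpy as np

# Hypothetical 0/1 labels, for illustration only.
labels01 = np.array([0, 1, 1, 0])

# Map {0, 1} -> {-1, +1}.
labels_pm = 2 * labels01 - 1

# Inverse map {-1, +1} -> {0, 1}.
recovered = (labels_pm + 1) // 2

print(labels_pm, recovered)
```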

In [2]:
train_df = pd.read_csv('./data/processed_train.csv')
test_df = pd.read_csv('./data/processed_test.csv')

In [3]:
target = 'manipulated'

In [4]:
X, y = train_df.drop(target, axis=1).to_numpy(), train_df[target].to_numpy()

y = 2*y -1

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=seed,
    stratify=y,
)

X_test, y_test = test_df.drop(target, axis=1).to_numpy(), test_df[target].to_numpy()
y_test = 2*y_test - 1

print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)

(421, 3) (421,)
(181, 3) (181,)
(2170, 3) (2170,)


In [5]:
print("Not manipulated and manipulated stocks count")
print("Train:", sum(y_train == -1), sum(y_train == 1))
print("Val:", sum(y_val == -1), sum(y_val == 1))
print("Test:", sum(y_test == -1), sum(y_test == 1))

Not manipulated and manipulated stocks count
Train: 210 211
Val: 91 90
Test: 2142 28
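
The test split is heavily imbalanced, which is worth keeping in mind when reading the reports that follow. A quick check using only the counts printed above suggests that a trivial classifier that always predicts "not manipulated" would already be about 98.7% accurate, so accuracy alone says little here:

```python
# Class counts of the test split, copied from the output above.
test_counts = {"not_manipulated": 2142, "manipulated": 28}

total = sum(test_counts.values())

# Fraction of manipulated stocks in the test split.
prevalence = test_counts["manipulated"] / total

# Accuracy of a classifier that always predicts "not manipulated".
baseline_accuracy = test_counts["not_manipulated"] / total

print(f"prevalence = {prevalence:.4f}, baseline accuracy = {baseline_accuracy:.3f}")
```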


## Classical Model

We first fit the data with a classical model, logistic regression.

In [6]:
clf = LogisticRegression().fit(X_train,y_train)

In [7]:
preds = clf.predict(X_train)
print(classification_report(y_train, preds))

              precision    recall  f1-score   support

          -1       1.00      0.94      0.97       210
           1       0.95      1.00      0.97       211

    accuracy                           0.97       421
   macro avg       0.97      0.97      0.97       421
weighted avg       0.97      0.97      0.97       421



In [8]:
preds = clf.predict(X_val)
print(classification_report(y_val, preds))

              precision    recall  f1-score   support

          -1       1.00      0.96      0.98        91
           1       0.96      1.00      0.98        90

    accuracy                           0.98       181
   macro avg       0.98      0.98      0.98       181
weighted avg       0.98      0.98      0.98       181



In [9]:
preds = clf.predict(X_test)
print(classification_report(y_test, preds))

              precision    recall  f1-score   support

          -1       1.00      0.99      1.00      2142
           1       0.64      1.00      0.78        28

    accuracy                           0.99      2170
   macro avg       0.82      1.00      0.89      2170
weighted avg       1.00      0.99      0.99      2170



### Inference

On all three datasets, logistic regression achieves a recall of `1.00` on the manipulated stocks. On the imbalanced test set, however, the precision for that class drops to `0.64`.
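
The per-class precision and recall in a report can be turned back into approximate confusion counts. The sketch below does this for the manipulated class of the test report above (support 28, recall 1.00, precision 0.64). The helper name `implied_counts` is ours, and because the report rounds to two decimals the result should be read as an estimate:

```python
def implied_counts(support_pos, recall, precision):
    """Estimate true and false positives from rounded report values."""
    # recall = TP / support  ->  TP = recall * support
    true_pos = round(recall * support_pos)
    # precision = TP / (TP + FP)  ->  FP = TP / precision - TP
    false_pos = round(true_pos / precision - true_pos)
    return true_pos, false_pos


# Values read from the logistic regression test report above.
tp, fp = implied_counts(support_pos=28, recall=1.00, precision=0.64)
print(f"true positives ~ {tp}, false positives ~ {fp}")
```

For these numbers the estimate is about 16 non-manipulated stocks flagged as manipulated, alongside the 28 correct detections.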

## QSVC

We created a data re-uploading ansatz inspired by the paper [Data re-uploading for a universal quantum classifier](https://quantum-journal.org/papers/q-2020-02-06-226/). In brief, every feature $x_i$ of the input vector is multiplied by a weight and shifted by a bias, i.e. $z_i = w_i x_i + b_i$. Three consecutive values of $z$ are passed as the angles of PennyLane's `Rot` gate ($R_zR_yR_z$), so the length of the input feature vector has to be a multiple of 3. This block can then be repeated on a single qubit or on several qubits (in which case we add CZ entanglement between neighbouring qubits).

This defines the unitary $U(x)$, where $x$ is the input feature vector. The kernel circuit applies $U(x_1)$ followed by $U^\dagger(x_2)$ and then measures the probabilities of the computational basis states.

We use a single layer of the circuit with 4 qubits.
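
To make the encoding concrete, here is a minimal NumPy sketch of one re-uploading block on a single qubit, without PennyLane. The sample `x`, weights `w`, and biases `b` are made-up values and the helper names are ours; the only convention taken from PennyLane is that `Rot(phi, theta, omega)` equals $R_z(\omega)R_y(\theta)R_z(\phi)$.

```python
import numpy as np


def rz(angle):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-0.5j * angle), 0], [0, np.exp(0.5j * angle)]])


def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rot(phi, theta, omega):
    """Matrix of Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)."""
    return rz(omega) @ ry(theta) @ rz(phi)


def encode(x, w, b):
    """Apply one block: affine map z = w * x + b, then Rot(z) on |+>."""
    z = w * x + b
    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    return rot(z[0], z[1], z[2]) @ plus


x = np.array([0.1, -0.4, 0.7])  # hypothetical 3-feature sample
w = np.array([1.0, 2.0, 0.5])   # hypothetical weights
b = np.array([0.0, 0.3, -0.2])  # hypothetical biases

state = encode(x, w, b)
print(state, np.linalg.norm(state))
```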

In [10]:
def feature_map(x, params, n_layers, n_wires):
    """The embedding ansatz"""
    steps = x.shape[0]//3
    qubits = list(range(n_wires))
    
    # Start with the |+ > state
    for q in qubits:
        qml.Hadamard(wires=q)
    
    for l in range(n_layers):
        for q in qubits:
            for i in range(steps):
                
                # Rotation layer
                z = x[3*i:3*i+3]*params[l,q,0,3*i:3*i+3] + params[l,q,1,3*i:3*i+3]
                qml.Rot(z[0], z[1], z[2], wires=q)
        
        # Entanglement layer
        for i in range(n_wires - 1):
            qml.CZ((i, i + 1))

In [11]:
n_l = 1
n_w = 4
in_shape = 3

params_shape = (n_l, n_w, 2, in_shape)
params = pnp.random.uniform(0, 2 * np.pi, params_shape, requires_grad=True)

dev = qml.device("default.qubit.jax", wires=n_w)

Now, we proceed to define the quantum circuit that implements the kernel. To determine the overlap of the quantum states, we start by applying the embedding of the first data point and then the adjoint of the embedding of the second data point. Finally, we extract the probabilities associated with each basis state for observation.

The kernel function is derived by examining the probability of observing the all-zero state at the end of the kernel circuit.
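
The all-zero probability is the kernel value because the circuit prepares $U^\dagger(x_2)U(x_1)|0\rangle$, whose amplitude on $|0\rangle$ is the overlap $\langle\phi(x_2)|\phi(x_1)\rangle$. A minimal single-qubit sketch, assuming a toy feature map $U(x) = R_y(x)$ rather than the ansatz above, checks that the two views agree:

```python
import numpy as np


def ry(angle):
    """Toy feature map U(x) = RY(x) on one qubit (an assumption for this sketch)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


ZERO = np.array([1, 0], dtype=complex)


def kernel_via_circuit(x1, x2):
    """Probability of measuring |0> after applying U(x1), then U(x2)^dagger."""
    final_state = ry(x2).conj().T @ ry(x1) @ ZERO
    return np.abs(final_state[0]) ** 2


def kernel_via_overlap(x1, x2):
    """Squared overlap |<phi(x2)|phi(x1)>|^2 of the two embedded states."""
    return np.abs(np.vdot(ry(x2) @ ZERO, ry(x1) @ ZERO)) ** 2


k_circuit = kernel_via_circuit(0.3, 1.1)
k_overlap = kernel_via_overlap(0.3, 1.1)
print(k_circuit, k_overlap)
```

Two properties follow directly: the kernel of a point with itself is 1, and the kernel is symmetric in its arguments.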

In [12]:
@qml.qnode(dev, interface = 'jax')
def kernel_circuit(x1, x2, params):
    feature_map(x1, params, n_l, n_w)
    qml.adjoint(feature_map)(x2, params, n_l, n_w)
    return qml.probs(wires=range(n_w))

def kernel(x1, x2, params):
    return kernel_circuit(x1, x2, params)[0]

In [13]:
print(qml.draw(kernel_circuit)(X_train[0], X_train[1], params))

0: ──H──Rot(4.81,1.21,-0.37)─╭●───────────────────────────────────╭●────────────────────
1: ──H──Rot(4.61,0.33,4.99)──╰Z─╭●──────────╭●────────────────────╰(Z)†─────────────────
2: ──H──Rot(3.47,1.96,2.96)─────╰Z─╭●─╭●────╰(Z)†──────────────────Rot(11.99,1.61,3.17)†
3: ──H──Rot(2.08,1.90,1.17)────────╰Z─╰(Z)†──Rot(6.50,1.43,1.87)†──H†───────────────────

───Rot(8.64,-0.35,0.46)†──H†─┤ ╭Probs
───Rot(5.21,-1.09,5.67)†──H†─┤ ├Probs
───H†────────────────────────┤ ├Probs
─────────────────────────────┤ ╰Probs


In [14]:
jit_kernel = jax.jit(kernel)

## QSVC with Random Parameters

In [15]:
init_kernel = lambda x1, x2: jit_kernel(x1, x2, params)
kernel_matrix = lambda X1, X2: kernel_matrix_jax(X1, X2, init_kernel)
qsvc = SVC(kernel=kernel_matrix, random_state=seed)
qsvc.fit(X_train, y_train)
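
The `kernel_matrix_jax` helper comes from a local `jax_utils` module that is not shown here. As a rough idea of what such a helper has to return, the sketch below builds a kernel matrix in plain NumPy from a pairwise kernel function. The names `gram_matrix` and `toy_kernel` and the data are ours. When `SVC` is given a callable kernel, it calls it with two data matrices and expects an array of shape `(len(X1), len(X2))`.

```python
import numpy as np


def gram_matrix(X1, X2, kernel):
    """Evaluate kernel(a, b) for every row a of X1 and every row b of X2."""
    return np.array([[kernel(a, b) for b in X2] for a in X1])


def toy_kernel(a, b):
    """Stand-in classical kernel (Gaussian), used only for illustration."""
    return np.exp(-np.sum((a - b) ** 2))


X_a = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])  # plays the role of training data
X_b = np.array([[0.0, 0.0], [1.0, 1.0]])              # plays the role of new data

K_train = gram_matrix(X_a, X_a, toy_kernel)  # square matrix, used when fitting
K_cross = gram_matrix(X_b, X_a, toy_kernel)  # rectangular matrix, used when predicting

print(K_train.shape, K_cross.shape)
```

Every entry requires one kernel evaluation, so with a quantum kernel the cost grows with the product of the two dataset sizes.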

In [16]:
train_preds = qsvc.predict(X_train)
val_preds = qsvc.predict(X_val)
test_preds = qsvc.predict(X_test)

print("Train Recall Score (random parameters)")
print('-'*50)
print(classification_report(y_train, train_preds))

print()
print("Val Recall Score (random parameters)")
print('-'*50)
print(classification_report(y_val, val_preds))

print()
print("Test Recall Score (random parameters)")
print('-'*50)
print(classification_report(y_test, test_preds))

Train Recall Score (random parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       0.96      0.96      0.96       210
           1       0.96      0.96      0.96       211

    accuracy                           0.96       421
   macro avg       0.96      0.96      0.96       421
weighted avg       0.96      0.96      0.96       421


Val Recall Score (random parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       0.97      0.91      0.94        91
           1       0.92      0.97      0.94        90

    accuracy                           0.94       181
   macro avg       0.94      0.94      0.94       181
weighted avg       0.94      0.94      0.94       181


Test Recall Score (random parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       1.

In [17]:
params

tensor([[[[2.35330497, 5.97351416, 4.59925358],
          [3.76148219, 0.98029403, 0.98014248]],

         [[0.3649501 , 5.44234523, 3.77691701],
          [4.44895122, 0.12933619, 6.09412333]],

         [[5.23039137, 1.33416598, 1.14243996],
          [1.15236452, 1.91161039, 3.2971419 ]],

         [[2.71399059, 1.82984665, 3.84438512],
          [0.87646578, 1.83559896, 2.30191935]]]], requires_grad=True)

### Inference

With randomly initialised (untrained) parameters, we were able to obtain a test recall of `0.96` on the manipulated stocks class. In the next section, we try to improve this score by training the embedding kernel.

## QSVC with Trained Parameters

We use the [kernel-target alignment](https://link.springer.com/article/10.1007/s10462-012-9369-4) to train the embedding kernel.
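
For reference, kernel-target alignment compares the kernel matrix $K$ with the ideal kernel $yy^T$ built from the labels, using the normalised Frobenius inner product. A minimal NumPy sketch of that formula follows. It is not the `target_alignment_jax` helper used in this notebook (whose source is not shown), and it leaves out options such as class-label rescaling:

```python
import numpy as np


def target_alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    T = np.outer(y, y)  # ideal kernel: +1 for same class, -1 otherwise
    inner = np.sum(K * T)  # Frobenius inner product
    return inner / (np.linalg.norm(K) * np.linalg.norm(T))


y = np.array([1, -1, 1, -1])  # hypothetical +/-1 labels

# A kernel equal to the ideal kernel is perfectly aligned.
perfect = target_alignment(np.outer(y, y), y)

# A kernel that only recognises identical points (identity matrix) is less aligned.
identity = target_alignment(np.eye(4), y)

print(perfect, identity)
```

The training loop below maximises this quantity by minimising its negative on small random subsets of the training data.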

In [18]:
opt = optax.adam(learning_rate=0.05)
opt_state = opt.init(params)

for i in range(500):
    # Choose subset of datapoints to compute the KTA on.
    subset = np.random.choice(list(range(len(X_train))), 4)
        
    # Define the cost function for optimization
    cost = lambda _params: -target_alignment_jax(
        X_train[subset],
        y_train[subset],
        lambda x1, x2: jit_kernel(x1, x2, _params),
        assume_normalized_kernel=True,
    )
        
    # Optimization step
    grads = jax.grad(cost)(params)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)

    # Report the alignment on the full dataset every 50 steps.
    if (i + 1) % 50 == 0:
        current_alignment = target_alignment_jax(
            X_train,
            y_train,
            lambda x1, x2: jit_kernel(x1, x2, params),
            assume_normalized_kernel=True,
        )
        print(f"Step {i+1} - Alignment = {current_alignment:.3f}")

Step 50 - Alignment = 0.201
Step 100 - Alignment = 0.234
Step 150 - Alignment = 0.232
Step 200 - Alignment = 0.257
Step 250 - Alignment = 0.258
Step 300 - Alignment = 0.255
Step 350 - Alignment = 0.252
Step 400 - Alignment = 0.236
Step 450 - Alignment = 0.249
Step 500 - Alignment = 0.238
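
One detail of the loop above: `np.random.choice` samples with replacement by default, so a 4-point subset can contain the same index more than once. If distinct points are preferred, a possible variant (an alternative, not what this notebook ran) uses NumPy's `Generator.choice` with `replace=False`:

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 421  # size of the training split printed earlier

# Draw 4 distinct indices for one alignment step.
subset = rng.choice(n_samples, size=4, replace=False)

print(subset)
```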


In [19]:
# First create a kernel with the trained parameter baked into it.
trained_kernel = lambda x1, x2: jit_kernel(x1, x2, params)

# Second create a kernel matrix function using the trained kernel.
trained_kernel_matrix = lambda X1, X2: kernel_matrix_jax(X1, X2, trained_kernel)

# Note that SVC expects the kernel argument to be a kernel matrix function.
qsvc_trained = SVC(probability=True, kernel=trained_kernel_matrix, random_state=seed)

qsvc_trained.fit(X_train, y_train)

In [20]:
train_preds = qsvc_trained.predict(X_train)
val_preds = qsvc_trained.predict(X_val)

print("Train Recall Score (trained parameters)")
print('-'*50)
print(classification_report(y_train, train_preds))

print()
print("Val Recall Score (trained parameters)")
print('-'*50)
print(classification_report(y_val, val_preds))

Train Recall Score (trained parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       0.99      0.95      0.97       210
           1       0.95      0.99      0.97       211

    accuracy                           0.97       421
   macro avg       0.97      0.97      0.97       421
weighted avg       0.97      0.97      0.97       421


Val Recall Score (trained parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       0.97      0.92      0.94        91
           1       0.93      0.97      0.95        90

    accuracy                           0.94       181
   macro avg       0.95      0.94      0.94       181
weighted avg       0.95      0.94      0.94       181



In [21]:
test_preds = qsvc_trained.predict(X_test)

print("Test Recall Score (trained parameters)")
print('-'*50)
print(classification_report(y_test, test_preds))

Test Recall Score (trained parameters)
--------------------------------------------------
              precision    recall  f1-score   support

          -1       1.00      0.89      0.94      2142
           1       0.10      0.96      0.18        28

    accuracy                           0.89      2170
   macro avg       0.55      0.93      0.56      2170
weighted avg       0.99      0.89      0.93      2170



In [22]:
params

Array([[[[ 2.97325131,  4.80120932,  6.69701353],
         [ 4.33892484,  1.65979749,  0.98014248]],

        [[-0.21688676,  3.8526415 ,  2.90745938],
         [ 5.72399049, -1.90861894,  6.09412333]],

        [[ 5.619755  ,  2.37567723,  1.62187717],
         [ 1.75155034,  1.59029223,  3.2971419 ]],

        [[ 3.23121033,  0.03304983,  3.80698805],
         [ 1.34658006,  2.00163159,  2.30191935]]]], dtype=float64)

### Inference

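Training the kernel raised the recall on manipulated stocks from `0.96` to `0.99` on the train dataset, while the validation recall stayed at `0.97`. On the test dataset the recall is `0.96`, but the precision for manipulated stocks is only `0.10`.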