# **Stacking with VQC as base classifier and LGBM as meta classifier**
In this tutorial, we walk through the classical stacking of a variational quantum classifier (VQC) and a LightGBM (LGBM) classifier, with VQC as the base classifier and LGBM as the meta classifier. The stacking algorithm works as follows:\
1) The base classifiers are trained on the training data.\
2) The trained base classifiers predict labels for both the training data and the testing data.\
3) The labels predicted by the base classifiers are appended as features to the original training and testing data.\
4) The meta classifier is trained on the appended training data and evaluated on the appended testing data to obtain the final predictions.
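
The four steps above can be sketched on synthetic data. This is a minimal illustration, not the tutorial's pipeline: `CentroidClassifier` and `stack` are hypothetical names, and a toy nearest-centroid classifier stands in for both VQC and LGBM so that the example runs quickly with NumPy alone.

```python
import numpy as np


class CentroidClassifier:
    """Toy stand-in classifier (assumption): predicts the class with the nearest mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class centroid
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


def stack(base, meta, X_train, y_train, X_test):
    # 1) train the base classifier
    base.fit(X_train, y_train)
    # 2) predict labels for both the training and the testing data
    train_pred = base.predict(X_train)
    test_pred = base.predict(X_test)
    # 3) append the predicted labels as one extra feature column
    X_train_aug = np.column_stack([X_train, train_pred])
    X_test_aug = np.column_stack([X_test, test_pred])
    # 4) train the meta classifier on the appended data and predict
    meta.fit(X_train_aug, y_train)
    return meta.predict(X_test_aug), X_train_aug, X_test_aug


# Synthetic two-class data with 3 features (illustrative only)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(4, 1, (20, 3))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(0, 1, (5, 3)), rng.normal(4, 1, (5, 3))])

y_pred, X_train_aug, X_test_aug = stack(
    CentroidClassifier(), CentroidClassifier(), X_train, y_train, X_test
)
print(X_train_aug.shape, X_test_aug.shape)  # (40, 4) (10, 4)
```

Each data set gains exactly one column per base classifier, which is why the 7-feature data used later in this tutorial ends up with 8 features.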

Each of these steps is explained and implemented in detail below.

We first need to install the qiskit and qiskit_machine_learning modules if they are not already installed. We start by importing the necessary libraries from qiskit, qiskit\_machine\_learning, and some related modules, as shown in the cells below.

In [None]:
from qiskit import BasicAer
from qiskit.utils import QuantumInstance, algorithm_globals
from qiskit.algorithms.optimizers import COBYLA
from qiskit.circuit.library import TwoLocal
from qiskit_machine_learning.algorithms import VQC
from qiskit_machine_learning.circuit.library import RawFeatureVector
from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap, ZFeatureMap, NLocal
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

In [None]:
import sys
print(sys.path)
sys.path.append("")

['/content', '/env/python', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.7/dist-packages/IPython/extensions', '/root/.ipython']


In [None]:
import numpy as np
from sklearn import svm, metrics
from sklearn.metrics import classification_report, confusion_matrix
from time import time
import lightgbm as lgb

### Data set
We use Ethereum network data to perform stacking. Each data point has 7 features:\
1) Degree: in-degree, out-degree, total degree\
2) Strength: in-strength, out-strength, total strength\
3) Number of neighbours

The files 'train_data.npy' and 'train_labels.npy' contain 960 training data points and their corresponding labels, respectively. The first 160 training data points are phishing nodes and the next 800 are non-phishing nodes. The files 'test_data.npy' and 'test_labels.npy' contain 11000 testing data points and their corresponding labels, respectively. The first 1000 testing data points are phishing nodes and the next 10000 are non-phishing nodes.


We choose the first 320 training data points to perform stacking: 160 phishing and 160 non-phishing. Even though real-life data is highly imbalanced, we train our classifiers on an equal number of phishing and non-phishing nodes and then test on the imbalanced testing data. In our experiments on this data set, balanced training data proved more effective at giving good results.
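
The balance of the first 320 points follows from the file layout described above, and can be checked with a short sketch. The labels here are synthetic, and the one-hot convention (column 0 equal to 1 for phishing nodes) is an assumption inferred from the way `train_labels[:, 0]` is used later in this tutorial.

```python
import numpy as np

# Synthetic one-hot labels mimicking the described layout.
# Assumption: column 0 is 1 for phishing nodes, column 1 is 1 for non-phishing nodes.
n_phishing, n_non_phishing = 160, 800
labels = np.vstack([
    np.tile([1, 0], (n_phishing, 1)),
    np.tile([0, 1], (n_non_phishing, 1)),
])

# Same slice as used for the real training data
subset = labels[:320]
phishing_count = int(subset[:, 0].sum())
non_phishing_count = len(subset) - phishing_count
print(phishing_count, non_phishing_count)  # 160 160
```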

In [None]:
train_data = np.load('train_data.npy')
train_labels = np.load('train_labels.npy')
test_data = np.load('test_data.npy')
test_labels = np.load('test_labels.npy')
train_data = train_data[:320]
train_labels = train_labels[:320]

# Classical SVM training and testing
We train a classical support vector machine (SVM) classifier as a baseline against which to compare our final results.

In [None]:
#training classical SVM with rbf kernel

print("*** Training a classical SVM classifier with rbf Kernel ***")

#converting two dimensional labels to 1D
train_labels_svm = train_labels[:,0]
test_labels_svm = test_labels[:,0]

clf = svm.SVC(kernel='rbf')
start_time = time()
clf.fit(train_data, train_labels_svm)
end_time = time()
duration = end_time - start_time
print("training time for classical SVM : ", duration)
y_pred=clf.predict(test_data)
print(confusion_matrix(test_labels_svm, y_pred))
print(classification_report(test_labels_svm, y_pred))

*** Training a classical SVM classifier with rbf Kernel ***
training time for classical SVM :  0.006948947906494141
[[9676  324]
 [ 559  441]]
              precision    recall  f1-score   support

           0       0.95      0.97      0.96     10000
           1       0.58      0.44      0.50      1000

    accuracy                           0.92     11000
   macro avg       0.76      0.70      0.73     11000
weighted avg       0.91      0.92      0.91     11000



# **Stacking**
### Initializing the base and meta classifiers
We first initialize the base classifier, VQC. A ZZ feature map is used to encode the classical data into quantum states for VQC. It can be described by the following equations:\
$\text{ZZ feature map} = U_{\phi(x)}H^{\otimes m}$\
where $U_{\phi(x)} = \exp\left(j\sum_{k\in [m]}\phi_k(x_i)\prod_{l\in k}Z_l\right)$, 
$[m]=\{1,\dots,m,(1,2),(1,3),\dots,(m-1,m)\}$,\
$\phi_p(x_i) = x_i^{(p)}$ and $\phi_{(p,q)}(x_i)=(\pi-x_i^{(p)})(\pi-x_i^{(q)})$.\
Here $x_i^{(p)}$ is the $p$-th feature of data point $x_i$, $m$ is the number of qubits, $Z$ is the Pauli-Z gate, and $j$ is the imaginary unit.
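
To make the equations concrete, the sketch below evaluates the coefficients $\phi_k$ and the resulting diagonal of $U_{\phi(x)}$ for a small input. It follows the formula exactly as written in this tutorial and is not Qiskit's internal implementation, which may differ in constant factors and in the number of repetitions. The function names are illustrative, and indices are 0-based in the code.

```python
import itertools
import numpy as np


def zz_phase_coefficients(x):
    """Return phi_k for every single index (p,) and every pair (p, q) with p < q."""
    x = np.asarray(x, dtype=float)
    singles = {(p,): x[p] for p in range(len(x))}
    pairs = {
        (p, q): (np.pi - x[p]) * (np.pi - x[q])
        for p, q in itertools.combinations(range(len(x)), 2)
    }
    return {**singles, **pairs}


def u_phi_diagonal(x):
    """Diagonal of U_phi(x) in the computational basis.

    U_phi(x) is built only from Pauli-Z operators, so it is diagonal and each
    entry is exp(j * phase), with the phase computed from Z eigenvalues.
    """
    m = len(x)
    coeffs = zz_phase_coefficients(x)
    diag = np.empty(2 ** m, dtype=complex)
    for index, bits in enumerate(itertools.product([0, 1], repeat=m)):
        z = 1 - 2 * np.array(bits)  # Z eigenvalue: +1 for bit 0, -1 for bit 1
        phase = sum(c * np.prod(z[list(k)]) for k, c in coeffs.items())
        diag[index] = np.exp(1j * phase)
    return diag


x = [0.5, 1.0]  # a hypothetical data point with two features
coeffs = zz_phase_coefficients(x)
diag = u_phi_diagonal(x)
print(coeffs)
print(np.abs(diag))  # every entry has modulus 1, so U_phi(x) is unitary
```

For the 7-feature data used here, the sum runs over 7 single-qubit terms and 21 pair terms.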

We use the TwoLocal ansatz (a parameterized quantum circuit with trainable parameters) for this classifier. The two-local ansatz consists of alternating rotation layers and entanglement layers. The rotation layers are single-qubit gates applied to all qubits. The entanglement layers use two-qubit gates to entangle the qubits according to the chosen entanglement strategy. For more information about the two-local ansatz, refer to the [Two local ansatz reference](https://qiskit.org/documentation/stubs/qiskit.circuit.library.TwoLocal.html#:~:text=The%20two-local%20circuit%20is%20a%20parameterized%20circuit%20consisting,rotation%20and%20entanglement%20gates%20can%20be%20specified%20as)

The ansatz can be repeated several times within one VQC. However, after experimenting with different numbers of repetitions, we found that a shallow circuit is more suitable for this data set, so this tutorial uses only two repetitions of the ansatz and does not go beyond that. For reference, see figure 10 in the paper below.
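
One way to see what a repetition costs is to count trainable parameters. The sketch below is a back-of-the-envelope count under stated assumptions: every rotation gate carries one parameter per qubit, the entangling gate (here `cz`) has none, and a final rotation layer follows the last entanglement layer. The helper name is hypothetical.

```python
def two_local_parameter_count(num_qubits, num_rotation_gates, reps):
    # reps rotation layers interleaved with entanglement layers,
    # plus one final rotation layer (assumed to be kept)
    return num_qubits * num_rotation_gates * (reps + 1)


# 7 qubits (one per feature) and two rotation gates per layer ('ry', 'rz')
for reps in (1, 2, 3, 4):
    print(reps, two_local_parameter_count(7, 2, reps))
```

Under these assumptions, two repetitions give 42 parameters for COBYLA to optimize, and each further repetition adds 14 more.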

VQC is trained with the COBYLA optimizer, limited to 100 iterations (maxiter = 100), to update its parameters.

The statevector simulator is used to simulate the results of a quantum computer; it can be replaced with a hardware backend to obtain hardware results.

We also initialize the LGBM meta classifier.

In [None]:
seed = 1376

#feature dimensions
feature_dim = train_data.shape[1]

#feature map of VQC
feature_map = ZZFeatureMap(feature_dim)

#ansatz we use in VQC
ansatz = TwoLocal(feature_dim, ['ry', 'rz'], 'cz', reps = 2)

#initialize VQC
vqc = VQC(feature_map=feature_map,
                 ansatz=ansatz,
                 optimizer=COBYLA(maxiter=100),
                 quantum_instance=QuantumInstance(BasicAer.get_backend('statevector_simulator'),
                                                 seed_simulator=seed,
                                                 seed_transpiler=seed)
                 )

#initialize LGBM classifier
clf = lgb.LGBMClassifier()

### Training the base classifier
In the following cells, we train the base classifier VQC on the 320 training data points and then use the trained VQC to predict the labels of both the training data and the testing data. The predicted labels are appended as a feature to the original training and testing data sets.

In [None]:
#train the base classifier and append its predictions to the data as a feature

def level_0():

    #train VQC on the training data, then predict labels for the same data
    vqc.fit(train_data, train_labels)
    train_pred = vqc.predict(train_data)
    #drop column 1 of the two-column prediction and append column 0 as a new feature
    train_label = np.delete(train_pred, 1, 1)
    train_added = np.append(train_data, train_label, 1)

    #do the same for the testing data
    test_pred = vqc.predict(test_data)
    test_label = np.delete(test_pred, 1, 1)
    test_added = np.append(test_data, test_label, 1)

    return train_added, test_added
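
The two NumPy calls used in `level_0` can be checked on a tiny example. The arrays below are made up for illustration and assume that the predictions have two columns, as the `np.delete(..., 1, 1)` call implies.

```python
import numpy as np

# Hypothetical data: 3 samples with 2 features, and two-column predictions
features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
two_column_pred = np.array([[1, 0], [0, 1], [1, 0]])

# Remove column 1 along axis 1, leaving column 0 with shape (3, 1)
label_column = np.delete(two_column_pred, 1, 1)

# Append that column to the features along axis 1
augmented = np.append(features, label_column, 1)
print(augmented.shape)  # (3, 3)
print(augmented)
```

Slicing with `two_column_pred[:, [0]]` yields the same column and may read more directly.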

In [None]:
#get appended train and test data

start_time = time()
train_added, test_added = level_0()
end_time = time()
duration = end_time - start_time
print("training time : ", duration)

training time :  1414.9394099712372


### Training the meta classifier
The LGBM classifier is now trained on the 320 training data points with 8 features each. The first 7 features come from the original data set; the 8th feature is the label predicted for that training data point by the base classifier VQC, which was appended to the training data in the previous step.

In [None]:
#use meta classifier LGBM on appended train data

start_time = time()
print("training_data", train_data.shape)
clf.fit(train_added, train_labels_svm)
end_time = time()
duration = end_time - start_time
print("training time : ", duration)

training_data (320, 7)
training time :  0.09905099868774414


### Get final prediction
We now apply the trained meta classifier (LGBM) to the appended testing data to get our final prediction results. Each testing data point has 8 features: the original 7 features plus the label predicted for that data point by the trained base classifier.

In [None]:
# predict on appended test data

start_time1 = time()
y_pred_1 = clf.predict(test_added)
end_time1 = time()
duration1 = end_time1 - start_time1
print("testing time : ", duration1)
print(classification_report(test_labels_svm,y_pred_1))
print(confusion_matrix(test_labels_svm,y_pred_1))

testing time :  0.016481876373291016
              precision    recall  f1-score   support

           0       1.00      0.92      0.96     10000
           1       0.55      0.97      0.70      1000

    accuracy                           0.92     11000
   macro avg       0.77      0.94      0.83     11000
weighted avg       0.96      0.92      0.93     11000

[[9201  799]
 [  32  968]]
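
The per-class figures in the two reports can be recomputed from the confusion matrices printed in this tutorial. This is a small sketch that assumes the scikit-learn layout (rows are true labels, columns are predicted labels) and treats class 1 as the positive class; the helper name is illustrative.

```python
import numpy as np


def positive_class_metrics(cm):
    """Precision, recall, F1 for class 1 and overall accuracy from a 2x2 confusion matrix."""
    cm = np.asarray(cm)
    tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
    fn, tp = cm[1]  # true class 1: predicted 0, predicted 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / cm.sum()
    return precision, recall, f1, accuracy


svm_cm = [[9676, 324], [559, 441]]    # classical SVM baseline
stacked_cm = [[9201, 799], [32, 968]]  # VQC + LGBM stacking

for name, cm in [("SVM", svm_cm), ("VQC + LGBM", stacked_cm)]:
    p, r, f1, acc = positive_class_metrics(cm)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
# SVM: precision=0.58 recall=0.44 f1=0.50 accuracy=0.92
# VQC + LGBM: precision=0.55 recall=0.97 f1=0.70 accuracy=0.92
```

Both models reach the same overall accuracy of 0.92, but recall on class 1 rises from 0.44 to 0.97 with stacking, at the cost of more false positives (799 against 324).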
