# QBoost: Binary Classification with a Quantum Computer

The D-Wave quantum computer has been widely studied as a discrete optimization engine that accepts any problem formulated as quadratic unconstrained binary optimization (QUBO). In 2008, Google and D-Wave published a paper, [Training a Binary Classifier with the Quantum Adiabatic Algorithm](https://arxiv.org/pdf/0811.0416.pdf), which describes how the `QBoost` ensemble method makes binary classification amenable to quantum computing. The problem is formulated as a thresholded linear superposition of a set of weak classifiers, and the D-Wave quantum computer is used to optimize the weights in a learning process that strives to minimize both the training error and the number of weak classifiers.

This notebook demonstrates and explains how the QBoost algorithm can be used to solve a binary classification problem. 

We have a set of data as shown below. We want to divide the data into two sets - can you see a clear delineation of the two sets based on the pattern of the data?

![Unclassified_Training_Set](images/DataSet_Unclassified.png)

Once we've figured out what our two sets are, the dividing line between them is a $\textbf{classifier}$. A classifier can help us to determine which set new data points might belong to.  For example, which set do you think each of the data points below belongs to?

![Unclassified_Test_Set](images/TrainingData_Unclassified.png)

There are many different algorithms available for building a classifier.  In this notebook we will explore both CPU-based algorithms (AdaBoost, Decision Trees, and Random Forest) and QPU-based algorithms (QBoost and QBoostPlus).

## Setting up our Data Set

To set up our initial data set, we will use the tools available in the  `sklearn` library.  We generate a set of training data (items with labels in $\{-1, 1\}$) and a set of test data (items with labels in $\{-1,1\}$).  We will use the training data to build each classifier, then use the test data to compare the classifier-produced labels with the provided known test labels.

In [None]:
from __future__ import print_function
%matplotlib inline

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X1, Y1 = make_blobs(n_samples=15000, n_features=2, centers=2, random_state=1)
Y1 = 2*Y1-1

# Use the first 10,000 points for training and the remaining 5,000 for testing
X_train = X1[:10000]
y_train = Y1[:10000]
X_test = X1[10000:]
y_test = Y1[10000:]

plt.title("Training Data", fontsize='small')
plt.scatter(X_train[:, 0], X_train[:, 1], marker='o')

plt.show()

plt.title("Test Data", fontsize='small')
plt.scatter(X_test[:, 0], X_test[:, 1], marker='x', color='r')
plt.show()
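
The label conversion `Y1 = 2*Y1-1` used above can be checked on a few values. The following is a minimal sketch with made-up label arrays (not taken from the data set); it also shows the thresholded variant of the same mapping, which is useful when the original labels have more than two classes.

```python
import numpy as np

# Labels as produced by make_blobs with two centers: 0 or 1 (illustrative values).
labels_01 = np.array([0, 1, 1, 0])

# Affine map 0 -> -1 and 1 -> +1, as in `Y1 = 2*Y1-1`.
labels_pm1 = 2 * labels_01 - 1

# Multi-class labels can be binarized with a threshold first:
# the comparison yields booleans, which behave as 0/1 in arithmetic.
digits = np.array([0, 4, 5, 9])
digit_labels_pm1 = 2 * (digits > 4) - 1

print(labels_pm1)        # [-1  1  1 -1]
print(digit_labels_pm1)  # [-1 -1  1  1]
```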

## Decision Trees

A decision tree uses a tree structure to classify the data.  It uses the non-leaf nodes to map the data to a set of  decision rules and leaf nodes to determine labels for each data item. For more information on decision trees as classifiers, check out [scikit-learn's page](http://scikit-learn.org/stable/modules/tree.html#tree-classification).

In [None]:
def Decision_Tree(X_train, y_train, X_test, y_test):
    from sklearn import tree

    clf1 = tree.DecisionTreeClassifier()
    clf1.fit(X_train, y_train)
    y_train1 = clf1.predict(X_train)
    y_test1 = clf1.predict(X_test)

    from sklearn.metrics import accuracy_score

    print('Accuracy for training data: \t', (accuracy_score(y_train, y_train1)))
    print('Accuracy for test data: \t', (accuracy_score(y_test, y_test1)))
    
    return clf1
    
clf1 = Decision_Tree(X_train, y_train, X_test, y_test)

The output above shows us the accuracy of the classifier that we built on our training data and on our test data.  Note that these are fractions rather than percentages, so a score of 1.00 would indicate that 100% of our data in the given set is labeled correctly using the classifier that we generated.

These great results are expected because of the nature of this simple data set - we would not expect to see results this good on real-world data!
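
To make the accuracy score concrete, here is a small sketch that computes the fraction of correctly labeled items by hand. The label arrays are invented for illustration; the quantity computed is the same one that `accuracy_score` reports with its default settings.

```python
import numpy as np

# Made-up true labels and classifier-produced labels in {-1, +1}.
y_true = np.array([1, -1, 1, 1, -1])
y_pred = np.array([1, -1, -1, 1, -1])

# Accuracy is the fraction of positions where the two label arrays agree.
accuracy = np.mean(y_true == y_pred)

print(accuracy)  # 0.8
```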

## Random Forest

Random forest is an ensemble method, which combines several weaker classifiers to create one strong classifier. It typically uses a set of decision trees as weak classifiers, each trained on a random sample of the training data. By introducing randomness into the underlying decision trees, the ensemble diversifies its collection of weak classifiers, generally resulting in an improved model. For more information on random forests as classifiers, check out [scikit-learn's page.](http://scikit-learn.org/stable/modules/ensemble.html#forest)

In [None]:
def Random_Forest(X_train, y_train, X_test, y_test):
    from sklearn.ensemble import RandomForestClassifier

    clf2 = RandomForestClassifier(max_depth=2, n_estimators=30)
    clf2.fit(X_train, y_train)
    y_train2 = clf2.predict(X_train)
    y_test2 = clf2.predict(X_test)

    from sklearn.metrics import accuracy_score

    print('Accuracy for training data: \t', (accuracy_score(y_train, y_train2)))
    print('Accuracy for test data: \t', (accuracy_score(y_test, y_test2)))
    
    return clf2

clf2 = Random_Forest(X_train, y_train, X_test, y_test)

## AdaBoost

AdaBoost is an ensemble method in which a classifier is constructed in an iterative fashion. In each iteration, one weak classifier is selected and re-learned to minimize a weighted error function.  The final classification model will be decided by a weighted “vote” of all the weak classifiers.  The scikit-learn package implements its AdaBoost method with decision trees of depth 1, also known as tree stumps.  For more information on AdaBoost, check out [scikit-learn's page.](http://scikit-learn.org/stable/modules/ensemble.html#adaboost)

In [None]:
def AdaBoost(X_train, y_train, X_test, y_test):
    from sklearn.ensemble import AdaBoostClassifier

    clf3 = AdaBoostClassifier(n_estimators=30)
    clf3.fit(X_train, y_train)
    y_train3 = clf3.predict(X_train)
    y_test3 = clf3.predict(X_test)

    from sklearn.metrics import accuracy_score

    print('Accuracy for training data: \t', (accuracy_score(y_train, y_train3)))
    print('Accuracy for test data: \t', (accuracy_score(y_test, y_test3)))
    
    return clf3
    
clf3 = AdaBoost(X_train, y_train, X_test, y_test)

## QBoost

Like AdaBoost, QBoost is an ensemble method.  To make use of the optimization power of the D-Wave quantum annealer, we need to formulate a quadratic unconstrained binary optimization (QUBO) objective function. To do this, we modify AdaBoost by replacing the traditional weighted error function with a QUBO.  If your account requires it, enter your D-Wave token and solver name in the commented-out `DWaveSampler` line in the code below.

In [None]:
def QBoost(X_train, y_train, X_test, y_test):
    NUM_READS = 1000
    DW_PARAMS = {'num_reads': NUM_READS,
                 'auto_scale': True,
                 'num_spin_reversal_transforms': 10,
                 'postprocess': 'optimization',
                 }

    from dwave.system.samplers import DWaveSampler
    from dwave.system.composites import EmbeddingComposite

    dwave_sampler = DWaveSampler(solver={'qpu': True}) # Some accounts need to replace this line with the next:
    # dwave_sampler = DWaveSampler(token='ENTER TOKEN HERE', solver='ENTER SOLVER NAME HERE')
    emb_sampler = EmbeddingComposite(dwave_sampler)

    from qboost import WeakClassifiers, QBoostClassifier

    clf4 = QBoostClassifier(n_estimators=30, max_depth=2)
    clf4.fit(X_train, y_train, emb_sampler, lmd=1.0, **DW_PARAMS)
    y_train4 = clf4.predict(X_train)
    y_test4 = clf4.predict(X_test)

    from sklearn.metrics import accuracy_score

    print('Accuracy for training data: \t', (accuracy_score(y_train, y_train4)))
    print('Accuracy for test data: \t', (accuracy_score(y_test, y_test4)))
    
    return clf4
    
clf4 = QBoost(X_train, y_train, X_test, y_test)

## QBoostPlus

QBoostPlus uses all of our previous classifiers to generate a new classifier.  You must run all of the previous classifiers before running QBoostPlus.  If your account requires it, enter your D-Wave token and solver name in the commented-out `DWaveSampler` line in the code below.

In [None]:
def QBoostPlus(X_train, y_train, X_test, y_test, clf1, clf2, clf3, clf4):
    NUM_READS = 1000
    DW_PARAMS = {'num_reads': NUM_READS,
                 'auto_scale': True,
                 'num_spin_reversal_transforms': 10,
                 'postprocess': 'optimization',
                 }

    from dwave.system.samplers import DWaveSampler
    from dwave.system.composites import EmbeddingComposite

    dwave_sampler = DWaveSampler(solver={'qpu': True}) # Some accounts need to replace this line with the next:
    # dwave_sampler = DWaveSampler(token='ENTER TOKEN HERE', solver='ENTER SOLVER NAME HERE')
    emb_sampler = EmbeddingComposite(dwave_sampler)
    
    from qboost import QboostPlus

    clf5 = QboostPlus([clf1, clf2, clf3, clf4])
    clf5.fit(X_train, y_train, emb_sampler, lmd=0.2, **DW_PARAMS)
    y_train5 = clf5.predict(X_train)
    y_test5 = clf5.predict(X_test)

    from sklearn.metrics import accuracy_score

    print('Accuracy for training data: \t', (accuracy_score(y_train, y_train5)))
    print('Accuracy for test data: \t', (accuracy_score(y_test, y_test5)))
    
    return clf5
    
clf5 = QBoostPlus(X_train, y_train, X_test, y_test, clf1, clf2, clf3, clf4)

# Experiments
Now we're ready to run some experiments on real data.


## Experiment 1: Binary Classification on the MNIST Dataset 
This example transforms the MNIST dataset (handwritten digits) into a binary classification problem. All digits smaller than 5 are labelled as -1 and the remaining digits are labelled as +1.

First, let us load the MNIST dataset:

In [None]:
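import numpy as np
from sklearn.datasets import fetch_openml

# Loading the data set.  `fetch_mldata` is no longer available in recent
# scikit-learn releases, so this uses `fetch_openml` instead;
# `as_frame=False` returns NumPy arrays rather than a DataFrame.
mnist = fetch_openml('mnist_784', version=1, data_home='data', as_frame=False)

# OpenML stores the digit labels as strings; convert them to integers
mnist.target = mnist.target.astype(int)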

# Gathering the indices for the data labelled with numbers <= 9
idx_01 = np.where(mnist.target <= 9)[0]

# Shuffling the data for a random selection for training and test data
np.random.shuffle(idx_01)

# Selecting 15,000 items for our total data set
idx_01 = idx_01[:15000]

# Using 2/3 of our data set for training, 1/3 for testing
idx_train = idx_01[:2*len(idx_01)//3]
idx_test = idx_01[2*len(idx_01)//3:]

# Setting up the data points for training and testing
X_train = mnist.data[idx_train]
X_test = mnist.data[idx_test]

# Setting up the labels for training and testing.  Labels should be -1, +1 for QBoost and QBoostPlus.
y_train = 2*(mnist.target[idx_train] >4) - 1
y_test = 2*(mnist.target[idx_test] >4) - 1

print("Training data size: \t%d samples with %d features" %(X_train.shape[0], X_train.shape[1]))
print("Testing data size: \t%d samples" %(X_test.shape[0]))

Let us visualize the digits: digits with class $+1$ are shown as images with a black background, while digits with class $-1$ are shown as images with a white background.

In [None]:
import matplotlib.pyplot as plt
for i in range(16):
    if y_train[i] == 1:
        COLORMAP = 'gray'
    else:
        COLORMAP = 'gray_r'
    plt.subplot(4,4, i+1)
    plt.imshow(X_train[i].reshape(28,28), cmap=COLORMAP)
    plt.axis('off')

Now train the model and compare the results of the selected classifiers.

In [None]:
print('=======================================')
# Decision Tree
print('Decision Tree: ')
clf1 = Decision_Tree(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# Random Forest
print('Random Forest: ')
clf2 = Random_Forest(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# AdaBoost
print('AdaBoost: ')
clf3 = AdaBoost(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# QBoost
print('QBoost: ')
clf4 = QBoost(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# QBoostPlus
print('QBoostPlus: ')
clf5 = QBoostPlus(X_train, y_train, X_test, y_test, clf1, clf2, clf3, clf4)
print('=======================================')

## Experiment 2: Wisconsin Breast Cancer

This example classifies tumors in scikit-learn's Wisconsin breast cancer dataset as either malignant or benign (binary classification).

First, let us load the dataset.

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer

# Loading the data set
wisc = load_breast_cancer()

# Shuffling the data for a random selection for training and test data
idx = np.arange(len(wisc.target))
np.random.shuffle(idx)

# Using 2/3 of our data set for training, 1/3 for testing
idx_train = idx[:2*len(idx)//3]
idx_test = idx[2*len(idx)//3:]

# Setting up the data points for training and testing
X_train = wisc.data[idx_train]
X_test = wisc.data[idx_test]

# Setting up the labels for training and testing.  Labels should be -1, +1 for QBoost and QBoostPlus.
y_train = 2 * wisc.target[idx_train] - 1  
y_test = 2 * wisc.target[idx_test] - 1

print("Training data size: \t%d samples with %d features" %(X_train.shape[0], X_train.shape[1]))
print("Testing data size: \t%d samples" %(X_test.shape[0]))

Now train the model and compare the results of the selected classifiers.

In [None]:
print('=======================================')
# Decision Tree
print('Decision Tree: ')
clf1 = Decision_Tree(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# Random Forest
print('Random Forest: ')
clf2 = Random_Forest(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# AdaBoost
print('AdaBoost: ')
clf3 = AdaBoost(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# QBoost
print('QBoost: ')
clf4 = QBoost(X_train, y_train, X_test, y_test)
print('---------------------------------------')
# QBoostPlus
print('QBoostPlus: ')
clf5 = QBoostPlus(X_train, y_train, X_test, y_test, clf1, clf2, clf3, clf4)
print('=======================================')

## Experiment 3:  Try it Yourself

In the block below, follow the prompts to import a dataset from scikit-learn and try building the different classifiers on the data set.

We will use scikit-learn's wine data set.  This data set is divided into classes 0, 1, and 2.  We will work to classify the sets {class 0, class 1} and {class 2}.  Use the examples in Experiment 1 and Experiment 2 to fill in this code outline and classify the data.

In [None]:
# import numpy to work with numpy arrays


# Import scikit-learn's wine data set library.


# Load the scikit-learn's wine data set.


# Gather the indices of the data we want to use from the dataset (there is a lot more than just points and labels!).


# Shuffle the data for a random selection for training and test data.


# Divide the data into 2/3 for training, 1/3 for testing


# Set up the data points for training and testing


# Set up the labels for training and testing.  Labels should be -1, +1 for QBoost and QBoostPlus.  
# Remember we need classes 0 and 1 to map to set -1 and class 2 to map to set +1.


# Run the different classifiers we have set up in this notebook and compare their performance on this data set.

# Decision Tree


# Random Forest


# AdaBoost


# QBoost


# QBoostPlus


In [None]:
# import numpy to work with numpy arrays
import numpy as np

# Import scikit-learn's wine data set library.
from sklearn.datasets import load_wine

# Load the scikit-learn's wine data set.
wine = load_wine()

# Gather the indices of the data we want to use from the dataset (there is a lot more than just points and labels!).
indices = np.where(wine.target <= 2)[0]

# Shuffle the data for a random selection for training and test data.
np.random.shuffle(indices)

# Divide the data into 2/3 for training, 1/3 for testing
indices_train = indices[:2*len(indices)//3]
indices_test = indices[2*len(indices)//3:]

# Set up the data points for training and testing
X_train = wine.data[indices_train]
X_test = wine.data[indices_test]

# Set up the labels for training and testing.  Labels should be -1, +1 for QBoost and QBoostPlus.  
# Remember we need classes 0 and 1 to map to set -1 and class 2 to map to set +1.
y_train = 2*(wine.target[indices_train] > 1) - 1
y_test = 2*(wine.target[indices_test] > 1) - 1

print("Training data size: \t%d samples with %d features" %(X_train.shape[0], X_train.shape[1]))
print("Testing data size: \t%d samples" %(X_test.shape[0]))

# Run the different classifiers we have set up in this notebook and compare their performance on this data set.
print('=======================================')
# Decision Tree
print('Decision Tree: ')
clf1 = Decision_Tree(X_train, y_train, X_test, y_test) 
print('---------------------------------------')
# Random Forest
print('Random Forest: ')
clf2 = Random_Forest(X_train, y_train, X_test, y_test) 
print('---------------------------------------')
# AdaBoost
print('AdaBoost: ')
clf3 = AdaBoost(X_train, y_train, X_test, y_test) 
print('---------------------------------------')
# QBoost
print('QBoost: ')
clf4 = QBoost(X_train, y_train, X_test, y_test) 
print('---------------------------------------')
# QBoostPlus
print('QBoostPlus: ')
clf5 = QBoostPlus(X_train, y_train, X_test, y_test, clf1, clf2, clf3, clf4) 
print('=======================================')

# A Few More Words on Ensemble Methods

Ensemble methods build a strong classifier (an improved model) by combining weak classifiers with the goal of:

* decreasing variance (bagging)
* decreasing bias (boosting)
* improving prediction (voting)

![Boosting Algorithm](images/boosting.jpg)

### Bagging, Boosting, and Voting

Bagging and boosting both produce new training data sets by random sampling with replacement from the original set. In _bagging_, any element has the same probability to appear in a new dataset; in _boosting_, data elements are weighted before they are collected in the new dataset. Another distinction is that bagging is parallelizable but boosting has to be executed sequentially. You can learn more about the differences between these methods here: https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/.

Voting operates on labels only. Unlike boosting, the aggregated classification performance is not used to further polish each weak classifier. Voting has two typical requirements of its collection of weak classifiers: that there be __many__ and that they be __diverse__.  
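
The three ideas above can be sketched in a few lines of NumPy. This is an illustration only: the weights and predictions are made-up values, and the weighted draw is one simple way to picture how boosting emphasizes some elements, not a description of any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 6
indices = np.arange(n_samples)

# Bagging: every element is equally likely to be drawn (with replacement).
bagging_sample = rng.choice(indices, size=n_samples, replace=True)

# Boosting: elements carry weights (illustrative values), so elements that
# were hard to classify earlier are more likely to appear in the new set.
weights = np.array([0.05, 0.05, 0.05, 0.05, 0.4, 0.4])
boosting_sample = rng.choice(indices, size=n_samples, replace=True, p=weights)

# Voting: combine the {-1, +1} labels predicted by three classifiers.
# Rows are classifiers, columns are data points.
predictions = np.array([[ 1, -1,  1, -1],
                        [ 1,  1, -1, -1],
                        [-1, -1,  1, -1]])
majority_vote = np.sign(predictions.sum(axis=0))

print(majority_vote)  # [ 1 -1  1 -1]
```

With an odd number of voters and labels in $\{-1, +1\}$ the sum is never zero, so the sign is always a valid label.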

# Under the Hood:  Comparing AdaBoost and QBoost

### AdaBoost
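AdaBoost combines $N$ weak classifiers into a strong one as
$$C(x) = \mathrm{sign}\left(\sum_{i=1}^N w_i c_i(x)\right),$$
with $c_i(x) \in \{-1, +1\}$ being the $i$-th weak classifier:

$$c_i(x) = \mathrm{sign}(w' \cdot x + b),$$

where $w'$ and $b$ are the parameters of the weak classifier itself, not to be confused with the ensemble weights $w_i$.

The loss function of AdaBoost is defined as
$$
L = \sum_{s=1}^S \exp\left\{ - y_s \sum_{i=1}^N w_i c_i(x_s)\right\},
$$
where $S$ is the number of training samples $(x_s, y_s)$.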

The strong classifier $C(\cdot)$ is constructed in an iterative fashion. In each iteration, one weak classifier
is selected and re-learned to minimize the weighted error function. Its weight is adjusted and renormalized to make sure the sum of all weights equals 1. 

The final classification model will be decided by a weighted “vote” of all the weak classifiers.
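
The iteration described above can be sketched from scratch. The code below is a minimal textbook-style discrete AdaBoost using threshold stumps on one feature as weak classifiers; it is not scikit-learn's implementation, and the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) and the toy data are assumptions made for illustration. In this variant it is the per-sample weights that are renormalized to sum to 1 after each round.

```python
import numpy as np

def stump_predict(stump, X):
    """Predict -1/+1 with a threshold on a single feature."""
    feature, threshold, polarity = stump
    return polarity * np.where(X[:, feature] <= threshold, -1, 1)

def fit_stump(X, y, sample_weights):
    """Return the stump with the lowest weighted error, and that error."""
    best_stump, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sample_weights[stump_predict(stump, X) != y].sum()
                if err < best_err:
                    best_stump, best_err = stump, err
    return best_stump, best_err

def adaboost_fit(X, y, n_rounds):
    sample_weights = np.full(len(y), 1.0 / len(y))
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, sample_weights)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        # Increase the weight of misclassified samples, then renormalize.
        sample_weights *= np.exp(-alpha * y * stump_predict(stump, X))
        sample_weights /= sample_weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    """Weighted vote of all weak classifiers."""
    votes = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(votes)

# Toy 1-D data that no single stump can separate.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])

stumps, alphas = adaboost_fit(X, y, n_rounds=3)
train_accuracy = np.mean(adaboost_predict(stumps, alphas, X) == y)
print(train_accuracy)  # 1.0
```

On this toy data the best single stump mislabels two of the six points, while the weighted vote of three stumps labels all of them correctly.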

### QBoost
To create QBoost, we replace the exponential loss function in AdaBoost with the following quadratic loss function:
$$
w^* = \arg\min_w\left(\sum_s \left(\frac{1}{N}\sum_{n=1}^N w_nc_n(x_s) - y_s\right)^2 + \lambda \|w\|_0\right),
$$
where the regularization term $\lambda \|w\|_0$ is added to control the sparsity of the weights.

Note that in QBoost, the weight vector is binary.
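
Because the weights are binary, $w_n^2 = w_n$ and $\|w\|_0 = \sum_n w_n$, so the objective above expands into a QUBO: a matrix $Q$ such that $w^\top Q w$ equals the objective up to the constant $\sum_s y_s^2$. The sketch below builds that matrix for a tiny made-up example and minimizes it by exhaustive search, which stands in for the quantum annealer here. The function names and the toy classifier outputs are assumptions for illustration and are not part of the `qboost` package.

```python
import itertools
import numpy as np

def build_qubo(weak_outputs, y, lam):
    """Build Q so that w @ Q @ w equals the QBoost objective minus sum(y**2).

    weak_outputs: (S, N) array with entries c_n(x_s) in {-1, +1}
    y:            (S,) array of labels in {-1, +1}
    lam:          regularization strength lambda
    """
    n_classifiers = weak_outputs.shape[1]
    h = weak_outputs / n_classifiers        # the 1/N scaling from the formula
    Q = h.T @ h                             # quadratic terms
    linear = lam - 2.0 * (h.T @ y)          # linear terms
    Q[np.diag_indices(n_classifiers)] += linear  # w_n**2 == w_n for binary w
    return Q

def qubo_energy(Q, w):
    return float(w @ Q @ w)

def brute_force_minimum(Q):
    """Classical stand-in for the annealer: try every binary weight vector."""
    n = Q.shape[0]
    candidates = (np.array(bits) for bits in itertools.product((0, 1), repeat=n))
    return min(candidates, key=lambda w: qubo_energy(Q, w))

# Toy problem: 4 samples, 3 weak classifiers (columns) of decreasing quality.
y = np.array([1, 1, -1, -1])
weak_outputs = np.array([[ 1,  1, -1],
                         [ 1, -1,  1],
                         [-1, -1,  1],
                         [-1, -1, -1]])

w_small_lam = brute_force_minimum(build_qubo(weak_outputs, y, lam=0.1))
w_large_lam = brute_force_minimum(build_qubo(weak_outputs, y, lam=0.5))
print(w_small_lam)  # [1 1 0]
print(w_large_lam)  # [1 0 0]
```

A larger $\lambda$ selects fewer weak classifiers, which is the sparsity control described above. Exhaustive search scales as $2^N$ and is only practical for very small $N$.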

# Research Using QBoost
For more information on how QBoost is appearing in published research, check out the following references.

Boyda, Edward, et al. "Deploying a quantum annealing processor to detect tree cover in aerial imagery of California." PloS one 12.2 (2017): e0172505. 

Denchev, Vasil S., et al. "Robust classification with adiabatic quantum optimization." Proceedings of the 29th International Conference on Machine Learning. Omnipress, 2012.

Li, Richard Y., et al. "Quantum annealing versus classical machine learning applied to a simplified computational biology problem." NPJ quantum information 4.1 (2018): 14.  

Mott, Alex, et al. "Solving a Higgs optimization problem with quantum annealing for machine learning." Nature 550.7676 (2017): 375.

Neven, H., et al. "Binary classification using hardware implementation of quantum annealing Demonstrations at NIPS-09." 24th Annual Conf. on Neural Information Processing Systems. 2009.

# Loading your Own Dataset

If you're interested in trying out these classifiers on your own data set, check out [scikit-learn's page](http://scikit-learn.org/stable/datasets/index.html#external-datasets) on loading external datasets.
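
As one possible starting point, the sketch below prepares a small CSV table in the same way as the experiments in this notebook: shuffle, split 2/3 for training and 1/3 for testing, and map the labels to -1, +1. The CSV content, column names, and 0/1 label convention are hypothetical; for a real file, pass its path to `np.genfromtxt` in place of the in-memory text.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for your own file:
# two feature columns followed by a 0/1 label column.
csv_text = """feature_a,feature_b,label
0.1,1.2,0
0.4,0.9,0
0.3,1.1,0
2.1,3.0,1
2.4,2.8,1
1.9,3.2,1
"""

# Parse the table, skipping the header row
table = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
X = table[:, :-1]
labels = table[:, -1].astype(int)

# Shuffling the data, then using 2/3 for training and 1/3 for testing
idx = np.arange(len(labels))
np.random.shuffle(idx)
idx_train = idx[:2*len(idx)//3]
idx_test = idx[2*len(idx)//3:]

X_train, X_test = X[idx_train], X[idx_test]

# Labels should be -1, +1 for QBoost and QBoostPlus
y_train = 2*labels[idx_train] - 1
y_test = 2*labels[idx_test] - 1

print(X_train.shape, X_test.shape)  # (4, 2) (2, 2)
```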