## Introduction to Handwritten Digit Classification

*Copyright (c) 2022 Institute for Quantum Computing, Baidu Inc. All Rights Reserved.*

Computer vision (CV) is the field concerned with enabling computers to extract meaningful information from images, videos, and other visual inputs. It is a fundamental area of artificial intelligence. Handwritten digit classification is one of its basic tasks: a model is trained and tested on the MNIST dataset \[1\] to check whether it has basic image-recognition ability.

The MNIST dataset consists of handwritten digits such as those shown in the figure below. It covers 10 classes, the digits 0-9, and each sample is a grayscale image of 28\*28 pixels. The training set contains 60,000 images and the test set contains 10,000 images. Any model designed for image classification can therefore be evaluated on the MNIST dataset.

![mnist-example](mnist_example.png)

## MNIST Classification Using VSQL Model

### Data Encoding

In the handwritten digit classification problem, the input is a picture of a handwritten digit and the output is the class of that picture (i.e., one of the digits 0-9). Because a quantum computer processes quantum states, the picture must first be encoded into a quantum state. We represent the picture as a two-dimensional matrix, flatten the matrix into a one-dimensional vector, and pad the vector with zeros until its length is an integer power of 2. For a 28\*28 image, for example, the 784 entries are padded to 1024 = 2^10, which corresponds to 10 qubits. Finally, the vector is normalized to obtain a quantum state that a quantum computer can process.
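The encoding steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used inside Paddle Quantum: the function name `encode_image` is hypothetical, and the sketch assumes the normalization is the usual L2 norm so that the squared amplitudes sum to 1.

```python
import numpy as np

def encode_image(image):
    """Flatten an image, zero-pad it to a power-of-two length, and L2-normalize it.

    Hypothetical helper for illustration; returns the state vector and qubit count.
    """
    vector = np.asarray(image, dtype=np.float64).flatten()
    # Smallest n such that 2**n >= len(vector).
    num_qubits = (vector.size - 1).bit_length()
    padded = np.zeros(2 ** num_qubits)
    padded[:vector.size] = vector
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("An all-zero image cannot be normalized into a quantum state.")
    return padded / norm, num_qubits

# A random 28x28 array stands in for an MNIST image.
image = np.random.default_rng(0).integers(0, 256, size=(28, 28))
state, num_qubits = encode_image(image)
print(state.shape, num_qubits)  # (1024,) 10
```

The resulting qubit count matches the `num_qubits = 10` setting used in the configuration file later in this document.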


### Introduction to the VSQL Model

Variational shadow quantum learning (VSQL) is a hybrid quantum-classical algorithm for supervised learning. It combines a parameterized quantum circuit (PQC) with classical shadows. Unlike typical variational quantum algorithms (VQAs), VSQL extracts only local features from subspaces, rather than features from the whole Hilbert space in which the quantum state lives.

The schematic diagram of the VSQL model is as follows.

![vsql-model](vsql_model.png)

The input to the VSQL process is a quantum state. A local parameterized quantum circuit is applied to the input state and measured repeatedly, one group of qubits at a time, to obtain local shadow features. The collected shadow features are then fed into a classical neural network, which outputs the predicted label.
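The sliding extraction of local shadow features can be illustrated with a small NumPy simulation. This is a sketch under stated assumptions, not the Paddle Quantum implementation: the function names are hypothetical, the local circuit is represented by an arbitrary unitary matrix instead of a trained PQC, the same unitary is reused at every position, and the measured observable is assumed to be a tensor product of Pauli X operators on the local qubits.

```python
import numpy as np

def reduced_density_matrix(state, start, width):
    """Trace out every qubit except the `width` qubits beginning at `start`."""
    num_qubits = state.size.bit_length() - 1
    psi = state.reshape(2 ** start, 2 ** width, 2 ** (num_qubits - start - width))
    # rho[b, d] = sum over a, c of psi[a, b, c] * conj(psi[a, d, c])
    return np.einsum('abc,adc->bd', psi, psi.conj())

def shadow_features(state, local_unitary, width):
    """Slide a local circuit across the qubits and record one expectation value per position."""
    num_qubits = state.size.bit_length() - 1
    pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
    observable = pauli_x
    for _ in range(width - 1):
        observable = np.kron(observable, pauli_x)  # assumed observable: X on each local qubit
    features = []
    for start in range(num_qubits - width + 1):
        rho = reduced_density_matrix(state, start, width)
        evolved = local_unitary @ rho @ local_unitary.conj().T
        features.append(np.real(np.trace(observable @ evolved)))
    return np.array(features)

rng = np.random.default_rng(0)
# Random 4-qubit state and a random 2-qubit unitary (from a QR decomposition).
state = rng.normal(size=16) + 1j * rng.normal(size=16)
state /= np.linalg.norm(state)
unitary, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
features = shadow_features(state, unitary, width=2)
print(features.shape)  # (3,)
```

With `n` qubits and a local circuit acting on `width` qubits, this sliding scheme yields `n - width + 1` features. In VSQL, a feature vector of this kind is what the classical neural network receives as input.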

### Workflow

Based on the principles above, we train the VSQL model on the MNIST dataset until it converges. The trained model can then be used to classify handwritten digits. The training process is as follows.

![vsql-pipeline](vsql_pipeline_en.png)

## How to Use

### Predict Using the Model

We provide a trained model that can be used directly to classify images of the digits 0 and 1. Set the corresponding options in the `example.toml` configuration file, then run `python vsql_classification.py --config example.toml` to test the input images with the trained VSQL model.

### Online Demo

The following demo can be run online. First, define the contents of the configuration file.

```python
test_toml = r"""
# The overall configuration file of the model.
# Enter the current task, which can be 'train' or 'test', representing training and prediction respectively. Here we use test, indicating that we want to make a prediction.
task = 'test'
# The file path of the image to be predicted.
image_path = 'data_0.png'
# Whether the image path above is a folder or not. For folder paths, we will predict all image files inside the folder. This way you can test multiple images at once.
is_dir = false
# The file path of the trained model parameter file.
model_path = 'vsql.pdparams'
# The number of qubits that the quantum circuit contains.
num_qubits = 10
# The number of qubits that the shadow circuit contains.
num_shadow = 2
# Circuit depth.
depth = 1
# The class to be predicted by the model. Here, 0 and 1 are classified.
classes = [0, 1]
"""
```


Next, run the prediction code.

```python
import os
import warnings

warnings.filterwarnings('ignore')
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

import toml
from paddle_quantum.qml.vsql import train, inference

config = toml.loads(test_toml)
task = config.pop('task')
if task == 'train':
    train(**config)
elif task == 'test':
    prediction, prob = inference(**config)
    if config['is_dir']:
        print(f"The prediction results of the input pictures are {str(prediction)[1:-1]} respectively.")
    else:
        prob = prob[0]
        msg = 'For the input image, the model has'
        for idx, item in enumerate(prob):
            if idx == len(prob) - 1:
                msg += 'and'
            label = config['classes'][idx]
            msg += f' {item:3.2%} confidence that it is {label:d}'
            msg += '.' if idx == len(prob) - 1 else ', '
        print(msg)
else:
    raise ValueError("Unknown task, it can be train or test.")
```

```text
For the input image, the model has 89.22% confidence that it is 0, and 10.78% confidence that it is 1.
```


To test other images, change the image path in the configuration and rerun the code above.
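If you prefer to switch images in code rather than editing the configuration text, one option is to override the parsed settings before they are passed on. The sketch below uses a plain dictionary with the same keys as the configuration shown earlier. The helper name `with_image_path` and the folder name `my_images` are hypothetical and serve only as an illustration.

```python
def with_image_path(config, image_path, is_dir=False):
    """Return a copy of the config that points at a different image or folder."""
    updated = dict(config)
    updated['image_path'] = image_path
    updated['is_dir'] = is_dir
    return updated

# Same keys as the parsed configuration above, after `task` has been removed.
base_config = {
    'image_path': 'data_0.png',
    'is_dir': False,
    'model_path': 'vsql.pdparams',
    'num_qubits': 10,
    'num_shadow': 2,
    'depth': 1,
    'classes': [0, 1],
}

# Hypothetical folder of images to predict in one run.
folder_config = with_image_path(base_config, 'my_images', is_dir=True)
print(folder_config['image_path'], folder_config['is_dir'])  # my_images True
```

Returning a copy leaves the original settings untouched, so the same base configuration can be reused for several images.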

## Note

The model we provide is a binary classifier that can only distinguish the handwritten digits 0 and 1. For any other classification task, the model must be retrained.

### Dataset Structure

To train on a custom dataset, prepare it in the following layout. Place `train.txt` and `test.txt` in the dataset folder, and add `dev.txt` if a validation set is needed. Each line in these files represents one sample and contains the file path of the image and its label, separated by a tab.
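As an illustration of this layout, the following sketch writes and reads such a list file using only the Python standard library. The helper names `write_split` and `read_split` and the image file names are hypothetical, and the sketch assumes labels are integers.

```python
import tempfile
from pathlib import Path

def write_split(list_file, samples):
    """Write one 'image_path<TAB>label' line per sample."""
    lines = [f"{image_path}\t{label}" for image_path, label in samples]
    Path(list_file).write_text("\n".join(lines) + "\n", encoding="utf-8")

def read_split(list_file):
    """Parse a list file back into (image_path, label) pairs."""
    samples = []
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        image_path, label = line.rsplit("\t", 1)
        samples.append((image_path, int(label)))
    return samples

# Hypothetical samples; real entries would point at your own image files.
train_samples = [("images/zero_001.png", 0), ("images/one_001.png", 1)]

with tempfile.TemporaryDirectory() as dataset_dir:
    train_file = Path(dataset_dir) / "train.txt"
    write_split(train_file, train_samples)
    raw_text = train_file.read_text(encoding="utf-8")
    loaded = read_split(train_file)

print(loaded)  # [('images/zero_001.png', 0), ('images/one_001.png', 1)]
```

The same helpers can be used for `test.txt` and, if needed, `dev.txt`.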

### Introduction to the Configuration File

`test.toml` is a complete reference configuration for testing, and `train.toml` is a complete reference configuration for training. You can adapt these files to train and test the model quickly.

## Citation

```tex
@inproceedings{li2021vsql,
  title={VSQL: Variational shadow quantum learning for classification},
  author={Li, Guangxi and Song, Zhixin and Wang, Xin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={9},
  pages={8357--8365},
  year={2021}
}
```

## Reference

\[1\] Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs, New York), and Christopher J.C. Burges (Microsoft Research, Redmond). "The MNIST Database of Handwritten Digits".