In [2]:
# Imports, as always...
from os import listdir
import pandas as pd

# Fingerprint Classification

This notebook aims to replicate and extend the classification approach of [Martina et al. (2021)](https://arxiv.org/pdf/2109.11405), in which an SVM performs binary classification: deciding whether a given classical measurement was or was not produced by a given quantum circuit.

This is a simple task which, combined with their *very* small circuit, yields near-perfect accuracy. Here, we complicate the task to push the capability of the models and investigate more thoroughly what is and is not possible when classifying the membership of a quantum state to a quantum device by its "noise fingerprint".

The ideas explored in this notebook are as follows:
- *Multi-class classification*. Using the data produced by Martina et al. (2021), can we build a multi-class prediction model that is not biased towards any particular device? That is, given a measurement of a quantum state, which device produced it?
- *Larger/deeper circuits*. How does the performance degrade as the number of qubits increases, or as the circuit depth increases?
- *Noise severity analysis*. Under which severities/forms of noise is performance best? Ideally, we can produce a visualisation of performance (e.g. accuracy) vs. noise intensity/severity. We might expect poor performance with little/no noise (not enough distinguishing information between membership classes), good performance with moderate noise, then poor performance again with large amounts of noise (too much randomness).

For clarity, 'membership to a quantum device' in this context refers to 'being produced by that device'.

## Martina et al. (2021)'s (FAST) Dataset

In [7]:
# List all files in the "walker" directory.
filelist = listdir('./martina-data/walker')

# List of machines.
machines = ['ibmq_athens', 'ibmq_bogota', 'ibmq_casablanca', 'ibmq_lima', 'ibmq_quito', 'ibmq_santiago', 'ibmq_5_yorktown']

# Count how many files each machine has.
counts = {machine : 0 for machine in machines}
for file in filelist:
    for machine in machines:
        if machine in file:
            counts[machine] += 1

display(counts)

{'ibmq_athens': 750,
 'ibmq_bogota': 250,
 'ibmq_casablanca': 500,
 'ibmq_lima': 250,
 'ibmq_quito': 250,
 'ibmq_santiago': 250,
 'ibmq_5_yorktown': 250}
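
The counts above are not uniform across machines, which matters for a multi-class model. As a rough sketch, the snippet below computes each machine's share of the files, the accuracy of a trivial majority-class baseline, and "balanced" class weights (inversely proportional to class frequency, the same heuristic scikit-learn documents for `class_weight='balanced'`). It assumes that the number of files is a fair proxy for the number of samples per machine, which has not been checked here; the names `proportions`, `class_weights` and `majority_baseline` are introduced purely for illustration.

```python
# File counts per machine, copied from the cell output above.
counts = {
    'ibmq_athens': 750,
    'ibmq_bogota': 250,
    'ibmq_casablanca': 500,
    'ibmq_lima': 250,
    'ibmq_quito': 250,
    'ibmq_santiago': 250,
    'ibmq_5_yorktown': 250,
}

total = sum(counts.values())
n_classes = len(counts)

# Fraction of all files contributed by each machine.
proportions = {machine: n / total for machine, n in counts.items()}

# "Balanced" class weights: weight = total / (n_classes * count).
# Assumption: file counts stand in for per-class sample counts.
class_weights = {machine: total / (n_classes * n) for machine, n in counts.items()}

# Accuracy of a trivial baseline that always predicts the largest class.
majority_machine = max(counts, key=counts.get)
majority_baseline = proportions[majority_machine]

for machine in counts:
    print(f"{machine:>16}: {proportions[machine]:.2f} of files, "
          f"weight {class_weights[machine]:.3f}")
print(f"Majority baseline ({majority_machine}): {majority_baseline:.2f}")
```

Under that assumption, any multi-class accuracy reported later should be read against the majority baseline rather than against uniform chance (1/7).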

## Multi-class Classification

Given the measurement of a quantum state, what is the probability distribution over the set of devices (i.e. the likelihood of membership to each device), and hence which device is most likely to have produced the state?
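
To make the question concrete, here is a minimal sketch of the final step only: turning one raw score per device into a probability distribution and picking the most likely device. It uses a softmax, which is one common choice and not necessarily what this notebook or Martina et al. use. The `scores` values are arbitrary placeholders, not outputs of any trained model, and the `softmax` helper is defined here for illustration.

```python
import math

machines = ['ibmq_athens', 'ibmq_bogota', 'ibmq_casablanca', 'ibmq_lima',
            'ibmq_quito', 'ibmq_santiago', 'ibmq_5_yorktown']


def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder per-device scores for a single measurement (illustrative only).
scores = [2.0, 0.5, 1.0, -0.3, 0.1, 0.4, -1.2]

# Probability distribution over the set of devices.
probabilities = dict(zip(machines, softmax(scores)))

# The device most likely to have produced the state.
predicted = max(probabilities, key=probabilities.get)

for machine, p in probabilities.items():
    print(f"{machine:>16}: {p:.3f}")
print("Most likely device:", predicted)
```

With an SVM specifically, raw decision scores are not probabilities. In scikit-learn, `SVC(probability=True)` exposes `predict_proba`, which could supply the distribution directly, at the cost of an extra internal calibration step during fitting.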