<img src="https://s3-ap-southeast-1.amazonaws.com/he-public-data/wordmark_black65ee464.png" width="700">

# Week 2 : Final Challenge

**Welcome to the final challenge!**  

In the previous notebook we saw how to use the VQC class in Aqua to classify the digits `0` and `1`. That task is relatively simple because `0` and `1` are easily distinguishable. The digits `4` and `9`, however, are notoriously similar: both have a _loop_ on the top and a _line_ on the bottom. The 2-D t-SNE plot from the previous notebook (Fig.2) corroborates this: `0` and `1` are clustered relatively far from each other, whereas `4` and `9` overlap. In this challenge we provide you with a dataset of digits reduced to **dimension 3**. For example, Fig.1 shows the reduction of the 784-dimensional vector for the digit `4` to a 3-dimensional feature vector.

**Fig.1 : Features of the digit `4` after reducing dimension to 3:** 
<img src="https://s3-ap-southeast-1.amazonaws.com/he-public-data/four2a7701f.png" width="700">

**Fig.2 : MNIST dataset after dimension reduction to 2 as given in the previous notebook:**
<img src="https://s3-ap-southeast-1.amazonaws.com/he-public-data/mnist_plot53adb39.png" width="400">

## Challenge Question   
Use the VQC method from Aqua to classify the digits `4` and `9` as given in the dataset **challenge_dataset_4_9_dim3_training.csv** provided to you. 

## Rules and Guidelines

* Your `QuantumCircuit` can have a **maximum of 6 qubits**.
* **Cost of the circuit should be less than 2000**.  
* You should not change names of the functions `feature_map()` , `variational_circuit()`  and `return_optimal_params()`.
* All the functions must return the value types as mentioned. 
* All circuits must be Qiskit generated.
* Best of all submissions is considered for grading.

## Judging criteria 

* Primary judgement is based on the **accuracy of the model**; the higher, the better. **Accuracies which differ by less than 0.005 will be considered equal**. For example, accuracies 0.7783 and 0.7741 will be considered equal.
* If the accuracies are tied, the tie will be broken using the **cost of the circuit** as the metric; the lower, the better.
* If both the accuracy of the model and the cost of the circuit are equal, the **time of submission** is taken into account; the earlier, the better.
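
The tie-breaking order above can be sketched as a comparison function. This is only an illustration and not the organisers' ranking code: the `Submission` record, its field names and `compare()` are hypothetical, and reading the 0.005 tolerance as a pairwise check is an assumption.

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical record for one submission; the field names are illustrative only.
Submission = namedtuple("Submission", ["name", "accuracy", "cost", "submitted_at"])

ACCURACY_TOLERANCE = 0.005  # accuracies closer than this count as equal


def compare(a, b):
    """Return a negative number if `a` ranks ahead of `b`, positive if behind."""
    if abs(a.accuracy - b.accuracy) >= ACCURACY_TOLERANCE:
        return -1 if a.accuracy > b.accuracy else 1          # higher accuracy wins
    if a.cost != b.cost:
        return -1 if a.cost < b.cost else 1                  # lower circuit cost wins
    if a.submitted_at != b.submitted_at:
        return -1 if a.submitted_at < b.submitted_at else 1  # earlier submission wins
    return 0


# Made-up submissions; 0.7783 and 0.7741 are the tied accuracies from the example above.
submissions = [
    Submission("team_a", 0.7783, 450, 3),
    Submission("team_b", 0.7741, 300, 5),
    Submission("team_c", 0.8100, 900, 9),
]
ranking = sorted(submissions, key=cmp_to_key(compare))
print([s.name for s in ranking])
```

Here `team_a` and `team_b` tie on accuracy, so the cheaper circuit of `team_b` places it ahead. A pairwise tolerance is not transitive in general, so a real leaderboard may group accuracies differently.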

_**Important Note:**_ The **leaderboard shown during the progress of the competition** will only display accuracy of the model and is **not the final leaderboard**. Breaking ties between accuracy of the model by considering lower **cost of circuit** will only be done after the competition ends. **The final leaderboard will be announced post the event** which will take into consideration cost of the circuit and time of submission. 

## Certificate Eligibility

Everyone who scores an **accuracy greater than 0.70 (i.e., 70%) will be eligible for a certificate**.


An explanation of how the accuracy of the model and the cost of the circuit are calculated is given at the end, inside the `grade()` function. Before you submit, make sure the grading function runs on your device. To save time, you can also use the grading function provided to calculate the accuracy and circuit cost without submitting your solution to HackerEarth. Remember, your final score will be determined using the same grading methods as given in this notebook, but will be evaluated on unseen datapoints.

In [None]:
# installing a few dependencies
!pip install --upgrade seaborn==0.10.1
!pip install --upgrade scikit-learn==0.23.1
!pip install --upgrade matplotlib==3.2.0
!pip install --upgrade pandas==1.0.4
!pip install --upgrade qiskit==0.19.6 
!pip install --upgrade plotly==4.9.0

# the output will be cleared after installation
from IPython.display import clear_output
clear_output()

In [None]:
# we have imported a few libraries we think might be useful
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit import Aer

from qiskit import *
import numpy as np
from qiskit.visualization import plot_bloch_multivector, plot_histogram
%matplotlib inline
import matplotlib.pyplot as plt

import time
from qiskit.circuit.library import ZZFeatureMap, ZFeatureMap, PauliFeatureMap, RealAmplitudes, EfficientSU2
from qiskit.aqua.utils import split_dataset_to_data_and_labels, map_label_to_class_name
from qiskit.aqua import QuantumInstance
from qiskit.aqua.algorithms import VQC
from qiskit.aqua.components.optimizers import COBYLA


# The write_and_run() magic function creates a file with the content of the cell in which it is run.
# You have used this in previous exercises for creating your submission files. 
# It will be used for the same purpose here.

from IPython.core.magic import register_cell_magic
@register_cell_magic
def write_and_run(line, cell):
    argz = line.split()
    file = argz[-1]
    mode = 'w'
    with open(file, mode) as f:
        f.write(cell)
    get_ipython().run_cell(cell)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Solution

## Data loading 

This notebook has helper functions and code snippets to save you time and help you concentrate on what's important: increasing the accuracy of your model. Running the cell below will import the challenge dataset and make it available to you as `data`. Before running the cell below, store the dataset in this file structure (or change the `data_path` accordingly):  

- `challenge_notebook.ipynb`
- `dataset`
    - `challenge_dataset_4_9_dim3_training.csv`


In [None]:
data_path="/content/drive/My Drive/QML-49-Challenge/"
data = np.loadtxt(data_path + "challenge_dataset_4_9_dim3_training.csv", delimiter=",")
np.random.shuffle(data)
# extracting the first column which contains the labels
data_labels = data[:, :1].reshape(data.shape[0],)
# extracting all the columns but the first which are our features
data_features = data[:, 1:]
print(len(data_features[1,:]))

3


## Visualizing the dataset

Before we dive into solving the question, it is always beneficial to look at the dataset visually. This helps us spot patterns that we could leverage when designing our feature maps and variational circuits, for example.

In [None]:
import plotly.express as px
import pandas as pd

# creating a dataframe using pandas only for the purpose of plotting
df = pd.DataFrame({'Component 0':data_features[:,0], 'Component 1':data_features[:,1], 
                   'Component 2':data_features[:,2], 'label':data_labels})

fig = px.scatter_3d(df, x='Component 0', y='Component 1', z='Component 2', color='label')
fig.show()

## Extracting the training dataset

The given dataset has already been reduced in dimension and normalized, so further pre-processing isn't technically required. You can do so if you want to, but the testing dataset will have the same dimension and normalisation as the training dataset provided. Training on all 6,000 datapoints will take multiple hours, so you'll need to extract a subset to use as the training dataset. The accuracy of the model may vary with the size of the training dataset and with the datapoints you choose, so experimenting with various sizes and datapoints will be necessary. For example, increasing the training dataset size may increase the accuracy of the model, but it will increase the training time as well.

Use the space below to extract your training dataset from `data`. For your convenience `data` has been segregated into `data_labels` and `data_features`.

* `data_labels` : 6,000 $\times$ 1 column vector with each entry either `4` or `9` 
* `data_features` : 6,000 $\times$ 3 matrix with each row having the feature corresponding to the label in `data_labels`

**Note:** This process was done in the previous [VQC notebook](https://github.com/Qiskit-Challenge-India/2020/blob/master/Day%206%2C%207%2C8/VQC_notebook.ipynb) with `0` and `1` labels and can be modified and used here as well. 
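
As one possible starting point, the sketch below draws a small, class-balanced random subset and stores it in the `{'A': ..., 'B': ...}` layout used in this notebook. The helper `balanced_subset()` is a hypothetical name, and the demo arrays are synthetic stand-ins assumed to have the same shapes as `data_features` and `data_labels`.

```python
import numpy as np


def balanced_subset(features, labels, n_per_class, seed=0):
    """Pick `n_per_class` random rows for each of the labels 4 and 9.

    Returns a dict with 'A' holding the 4s and 'B' holding the 9s.
    """
    rng = np.random.RandomState(seed)
    subset = {}
    for key, digit in (("A", 4), ("B", 9)):
        rows = np.flatnonzero(labels == digit)   # indices of this digit
        chosen = rng.choice(rows, size=n_per_class, replace=False)
        subset[key] = features[chosen]
    return subset


# Synthetic stand-in for `data_features` / `data_labels` (assumption: the real
# arrays have 3 feature columns and labels that are either 4 or 9).
demo_labels = np.array([4, 9] * 50, dtype=float)
demo_features = np.random.RandomState(1).uniform(-1, 1, size=(100, 3))

training_subset = balanced_subset(demo_features, demo_labels, n_per_class=20)
print(training_subset["A"].shape, training_subset["B"].shape)
```

Changing `n_per_class` and `seed` is a cheap way to probe how sensitive the trained model is to the size and choice of the training datapoints.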

In [None]:
### WRITE YOUR CODE BETWEEN THESE LINES - START

# do your classical pre-processing here

# store your training and testing datasets to be input in the VQC optimizer in the "training_input" and 
# "testing_input" variables respectively. These variables will be accessed while creating a VQC instance later. 
from sklearn.model_selection import KFold

#X = ["a", "b", "c", "d"]

spl = 8
kf = KFold(n_splits=spl)
training_input_glob = {}
test_input_glob = {}
i = 0
for train, test in kf.split(data_labels):
    dummy_train = {}
    dummy_test = {}
    
    f_train = []
    n_train = []
    f_test = []
    n_test = []

    for k in train:
      if(data_labels[k]==4):
        f_train.append(data_features[k])
      else:
        n_train.append(data_features[k])

    for k in test:
      if(data_labels[k]==4):
        f_test.append(data_features[k])
      else:
        n_test.append(data_features[k])
    dummy_train = {'A':f_train, 'B':n_train}
    dummy_test = {'A':f_test, 'B':n_test}
    training_input_glob[i] = dummy_train
    test_input_glob[i] = dummy_test
    i+=1

print(len(training_input_glob))
class_label = {'A':4,'B':9}

print(len(training_input_glob[0]['A']))
#print(len(test_input_glob[0]))

8
2628


In [None]:
'''
zero_datapoints_normalized = []
one_datapoints_normalized = []

j=0
for t in data_labels:
  if(t==4):
    zero_datapoints_normalized.append(data_features[j])
  else:
    one_datapoints_normalized.append(data_features[j])
  j+=1

print(j)
'''

In [None]:
'''
train_size = 4000
test_size = 500
dp_size_zero = 500
dp_size_one = 500

zero_train = zero_datapoints_normalized[:train_size]
one_train = one_datapoints_normalized[:train_size]

zero_test = zero_datapoints_normalized[train_size + 1:train_size + test_size + 1]
one_test = one_datapoints_normalized[train_size + 1:train_size + test_size + 1]

training_input = {'A':zero_train, 'B':one_train}
test_input = {'A':zero_test, 'B':one_test}

# datapoints is our validation set
datapoints = []
dp_zero = zero_datapoints_normalized[train_size + test_size + 2:train_size + test_size + 2 + dp_size_zero]
dp_one = one_datapoints_normalized[train_size + test_size + 2:train_size + test_size + 2 + dp_size_one]
datapoints.append(np.concatenate((dp_zero, dp_one)))
dp_y = np.array([4, 4, 4, 4, 4, 9, 9, 9, 9, 9])
datapoints.append(dp_y)

class_to_label = {'A': 4, 'B': 9}
#print(datapoints[0])
'''

## Building a Quantum Feature Map

Given below is the `feature_map()` function. It takes no input and has to return a feature map which is either a `FeatureMap` or `QuantumCircuit` object. In the previous notebook you've learnt how feature maps work and the process of using existing feature maps in Qiskit or creating your own. In the space given **inside the function** you have to create a feature map and return it.   


**IMPORTANT:** 
* If you require Qiskit import statements other than the ones provided in the cell below, please include them inside the appropriate space provided. **All additional import statements must be Qiskit imports.** 
* the first line of the cell below must be `%%write_and_run feature_map.py`. This function stores the content of the cell below in the file `feature_map.py`

In [None]:
%%write_and_run feature_map.py
# the write_and_run function writes the content in this cell into the file "feature_map.py"

### WRITE YOUR CODE BETWEEN THESE LINES - START
    
# import libraries that are used in the function below.
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector
from qiskit.circuit.library import ZZFeatureMap, ZFeatureMap, PauliFeatureMap
    
### WRITE YOUR CODE BETWEEN THESE LINES - END

def feature_map(): 
    # BUILD FEATURE MAP HERE - START
    feature_dim = 3
    r=5
    feature_map = PauliFeatureMap(feature_dimension=feature_dim, reps=r, paulis =['Z', 'ZZ'] )
    #feature_map = ZZFeatureMap(feature_dimension=feature_dim, reps=r, entanglement='linear')
    # build the feature map
    # BUILD FEATURE MAP HERE - END
    
    #return the feature map which is either a FeatureMap or QuantumCircuit object
    return feature_map
  

## Building a Variational Circuit

Given below is the `variational_circuit()` function. It takes no input and has to return a variational circuit which is either a `VariationalForm` or `QuantumCircuit` object. In the previous notebook you've learnt how variational circuits work and the process of using existing variational circuit in Qiskit or creating your own. You have to create a variational circuit in the space given **inside the function** and return it. You can find various variational circuits in the [Qiskit Circuit Library](https://qiskit.org/documentation/apidoc/circuit_library.html) under N-local circuits.

**IMPORTANT:** 
* If you require Qiskit import statements other than the ones provided in the cell below, please include them inside the appropriate space provided. **All additional import statements must be Qiskit imports.** 
* the first line of the cell below must be `%%write_and_run variational_circuit.py`. This function stores the content of the cell below in the file `variational_circuit.py`

In [None]:
%%write_and_run variational_circuit.py
# the write_and_run function writes the content in this cell into the file "variational_circuit.py"

### WRITE YOUR CODE BETWEEN THESE LINES - START
    
# import libraries that are used in the function below.
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector
from qiskit.circuit.library import  RealAmplitudes, EfficientSU2
    
### WRITE YOUR CODE BETWEEN THESE LINES - END

def variational_circuit():
    # BUILD VARIATIONAL CIRCUIT HERE - START
    
    # import required qiskit libraries if additional libraries are required

    num_qubits = 3            
    rep1 = 2              
    # number of times you'd want to repeat the circuit 
    #custom_circ.draw()
        # build the variational circuit
    from qiskit.circuit.library import EfficientSU2, RealAmplitudes
    var = EfficientSU2(num_qubits, reps=rep1)
    #var = RealAmplitudes(num_qubits,reps=rep1,entanglement='sca')
    var.draw()
    var_circuit = var

    # BUILD VARIATIONAL CIRCUIT HERE - END
    
    # return the variational circuit which is either a VariationalForm or QuantumCircuit object
    return var_circuit

In [None]:
'''


    vec = ParameterVector('vec', length=3*num_qubits)  # creating a list of Parameters
    custom_circ = QuantumCircuit(num_qubits)

    # defining our parametric form
    for _ in range(rep1):
        for i in range(num_qubits):
            custom_circ.y(vec[i], i)
            custom_circ.z(vec[i+1], i+1)
        
        for i in range(num_qubits):
            custom_circ.cx(i, j)
            custom_circ.y(vec[i], i)
            custom_circ.z(vec[i+1], i+1)
               
   '''

"\n\n\n    vec = ParameterVector('vec', length=num_qubits)  # creating a list of Parameters\n    custom_circ = QuantumCircuit(num_qubits)\n\n    # defining our parametric form\n    for _ in range(rep1):\n        for i in range(num_qubits):\n            custom_circ.y(vec[i], i)\n            custom_circ.z(vec[i], i)\n        \n        for i in range(num_qubits):\n            for j in range(i + 1, num_qubits):\n                custom_circ.cx(i, j)\n                custom_circ.u1(y[i] * y[j], j)\n                custom_circ.cx(i, j)\n                \n   "

## Choosing a Classical Optimizer

In the `classical_optimizer()` function given below you will have to import the optimizer of your choice from [`qiskit.aqua.components.optimizers`](https://qiskit.org/documentation/apidoc/qiskit.aqua.components.optimizers.html) and return it. This function will not be called by the grading function `grade()`, so the name `classical_optimizer()` can be changed if needed. 

In [None]:
def classical_optimizer():
    # CHOOSE AND RETURN CLASSICAL OPTIMIZER OBJECT - START
    
    # import the required classical optimizer from qiskit.aqua.components.optimizers
    from qiskit.aqua.components.optimizers import AQGD,NFT,SPSA,ADAM,COBYLA

    #cls_opt = NFT(maxiter=500, maxfev=1024)
    cls_opt = SPSA(800,50)
    #cls_opt = ADAM(maxiter=1000,amsgrad=True)
    # create an optimizer object
    #cls_opt = COBYLA(maxiter=1000,tol=0.00001)
    
    # CHOOSE AND RETURN CLASSICAL OPTIMIZER OBJECT - END
    return cls_opt

### Callback Function

The `VQC` class can take in a callback function to which the following parameters will be passed after every optimization cycle of the algorithm:

* `eval_count` : the evaluation counter
* `var_params` : value of parameters of the variational circuit
* `eval_val`  : current cross entropy cost 
* `index` : the batch index
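
Printing every parameter vector quickly floods the output, so an alternative is a callback that only records the cost. The sketch below keeps the four-argument signature listed above; `loss_history` and `recording_callback()` are hypothetical names, and the loop merely simulates the calls an optimizer would make.

```python
loss_history = []  # (eval_count, eval_val) pairs collected during training


def recording_callback(eval_count, var_params, eval_val, index):
    """Store the cost value instead of printing the full parameter vector."""
    loss_history.append((eval_count, eval_val))


# Simulated calls, standing in for what the optimizer would trigger.
for step, cost in enumerate([0.69, 0.66, 0.61]):
    recording_callback(step, [0.0] * 18, cost, step)

best_step, best_cost = min(loss_history, key=lambda pair: pair[1])
print(best_step, best_cost)
```

After training, `loss_history` can be plotted to check whether the cost is still decreasing or has plateaued.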

In [None]:
def call_back_vqc(eval_count, var_params, eval_val, index):
    #print("eval_count: {}".format(eval_count))
    print("var_params: {}".format(var_params))
    #print("length var_params: {}".format(len(var_params)))
    print("eval_val: {}".format(eval_val))
    print("index: {}".format(index))

## Optimization Step

This is where the whole VQC algorithm will come together. First we create an instance of the `VQC` class. 

In [None]:
# a fixed seed so that we get the same answer when the same input is given. 
seed = 10598

# setting our backend to qasm_simulator with the "statevector" method on. This particular setup is given as it was 
# found to perform better than most. Feel free to play around with different backend options.
backend = Aer.get_backend('qasm_simulator')
backend_options = {"method": "statevector"}

# creating a quantum instance using the backend and backend options taken before
quantum_instance = QuantumInstance(backend, shots=1024, seed_simulator=seed, seed_transpiler=seed, 
                                   backend_options=backend_options)

# creating a VQC instance which will be used for training. Make sure you input the correct training_dataset and 
# testing_dataset as defined in your program.

mbs = 100
training_input = training_input_glob[0]
test_input = test_input_glob[0]
vqc = VQC(optimizer=classical_optimizer(), 
          feature_map=feature_map(), 
          var_form=variational_circuit(), 
          callback=call_back_vqc, 
          training_dataset=training_input,   # training_input must be initialized with your training dataset
          test_dataset=test_input, minibatch_size=mbs)  
           # testing_input must be initialized with your testing dataset

Now, let's run the VQC classification routine

In [None]:
def vqc_running_fn(i,mbs,arr):
  training_input = training_input_glob[i]
  test_input = test_input_glob[i]
  
  vqc = VQC(optimizer=classical_optimizer(), 
          feature_map=feature_map(), 
          var_form=variational_circuit(), 
          callback=call_back_vqc, 
          training_dataset=training_input,   # training_input must be initialized with your training dataset
          test_dataset=test_input, minibatch_size=mbs)  
  if(i!=0):
    # warm-start from the parameters found on the previous fold
    vqc.initial_point = np.asarray(arr)

  result = vqc.run(quantum_instance)
  print("testing success ratio: {}".format(result['testing_accuracy']))
  print(repr(vqc.optimal_params))
  # file.write() only accepts strings; writing the float accuracy directly
  # raises the TypeError recorded at the end of the output below
  with open("results.txt","a+") as f:
    f.write("{}\n".format(result['testing_accuracy']))
    f.write(repr(vqc.optimal_params) + "\n")
  return result,result['testing_accuracy'],np.asarray(vqc.optimal_params)


# chain the folds: each run starts from the optimal parameters of the previous one
arr = []
for i in range(4):
  _, _, arr = vqc_running_fn(i,50,arr)

Streaming output truncated to the last 5000 lines.
var_params: [ 1.31759868 -3.28168776 -2.84298632 -4.9206499   1.33583139  0.30027152
 -0.67775337  3.42393497 -0.29089864 -2.56235075 -2.62212161 -1.5837207
  0.07770692  0.32882828  0.41778972  1.04043558 -1.69771728 -6.29198753]
eval_val: 0.6562248186960264
index: 671
var_params: [ 1.42676832 -3.3908574  -2.73381668 -4.81148026  1.44500103  0.29746635
 -0.68055855  3.31476532 -0.40006829 -2.6715204  -2.73129125 -1.47455106
 -0.03146273  0.43799792  0.30862008  1.04324076 -1.70052246 -6.29479271]
eval_val: 0.5979884028954782
index: 672
var_params: [ 1.3147935  -3.27888258 -2.8457915  -4.92345508  1.33302621  0.40944117
 -0.56858373  3.42674014 -0.28809346 -2.55954558 -2.61931643 -1.58652588
  0.08051209  0.3260231   0.4205949   0.93126594 -1.58854763 -6.18281789]
eval_val: 0.59773716085375
index: 673
var_params: [ 1.42646263 -3.39055171 -2.73412237 -4.81178595  1.44469534  0.29777204
 -0.68025286  3.42700965 -0.39976259 

TypeError: ignored

In [None]:
start = time.process_time()

result = vqc.run(quantum_instance)

print("time taken: ")
print(time.process_time() - start)

print("testing success ratio: {}".format(result['testing_accuracy']))

Streaming output truncated to the last 5000 lines.
eval_val: 0.6036383018902218
index: 432
var_params: [-1.45234826  2.93127023 -0.31764092 -1.88711325  2.46054271 -0.6751363
 -2.09493311  3.11902865  1.21018744  5.57785277 -3.51169502  2.63880852
 -1.82270681 -0.66097293 -6.65634489 -2.60044736  0.66246263 -1.60770538]
length var_params: 18
eval_val: 0.6657981540907217
index: 433
var_params: [-1.39437129  3.10678802 -0.25966395 -1.94509023  2.51851968 -0.49961851
 -1.91941532  3.17700562  1.03466965  5.75337056 -3.45371804  2.81432631
 -1.88068378 -0.83649072 -6.83186268 -2.77596516  0.48694484 -1.43218758]
length var_params: 18
eval_val: 0.5802438802811795
index: 434
var_params: [-1.27683047  2.9892472  -0.14212313 -2.06263105  2.6360605  -0.61715932
 -2.03695613  3.29454644  1.15221047  5.63582974 -3.33617723  2.69678549
 -1.9982246  -0.7189499  -6.71432186 -2.65842434  0.60448566 -1.5497284 ]
length var_params: 18
eval_val: 0.6120777262426319
index: 435
var_params: [-

In [None]:
print(repr(vqc.optimal_params))

array([-1.83582417,  3.77919169, -0.11305252, -2.17535393,  3.48514804,
       -0.96419281, -2.30299219,  2.48032294,  0.90725111,  5.82763834,
       -3.19577584,  2.89011003, -1.72820388, -0.8319336 , -6.93210425,
       -2.15331737, -0.25224409, -1.66591165])


In [None]:
#print(repr(vqc.optimal_params))
from qiskit.aqua.components.optimizers import ADAM

i=1
training_input = training_input_glob[i]
test_input = test_input_glob[i]

vqc = VQC(optimizer=ADAM(maxiter=500), 
          feature_map=feature_map(), 
          var_form=variational_circuit(), 
          callback=call_back_vqc, 
          training_dataset=training_input,   # training_input must be initialized with your training dataset
          test_dataset=test_input, minibatch_size=mbs)  
vqc.initial_point = np.asarray([-1.83582417,  3.77919169, -0.11305252, -2.17535393,  3.48514804,
       -0.96419281, -2.30299219,  2.48032294,  0.90725111,  5.82763834,
       -3.19577584,  2.89011003, -1.72820388, -0.8319336 , -6.93210425,
       -2.15331737, -0.25224409, -1.66591165])
result = vqc.run(quantum_instance)


var_params: [-1.83582417  3.77919169 -0.11305252 -2.17535393  3.48514804 -0.96419281
 -2.30299219  2.48032294  0.90725111  5.82763834 -3.19577584  2.89011003
 -1.72820388 -0.8319336  -6.93210425 -2.15331737 -0.25224409 -1.66591165]
length var_params: 18
eval_val: 0.565652071158361
index: 0
var_params: [-1.83582416  3.77919169 -0.11305252 -2.17535393  3.48514804 -0.96419281
 -2.30299219  2.48032294  0.90725111  5.82763834 -3.19577584  2.89011003
 -1.72820388 -0.8319336  -6.93210425 -2.15331737 -0.25224409 -1.66591165]
length var_params: 18
eval_val: 0.565652071158361
index: 0
var_params: [-1.83582417  3.7791917  -0.11305252 -2.17535393  3.48514804 -0.96419281
 -2.30299219  2.48032294  0.90725111  5.82763834 -3.19577584  2.89011003
 -1.72820388 -0.8319336  -6.93210425 -2.15331737 -0.25224409 -1.66591165]
length var_params: 18
eval_val: 0.565652071158361
index: 0
var_params: [-1.83582417  3.77919169 -0.11305251 -2.17535393  3.48514804 -0.96419281
 -2.30299219  2.48032294  0.90725111  5.82

In [None]:
print("testing success ratio: {}".format(result['testing_accuracy']))
#print(np.ndarray(vqc.optimal_params))
print(repr(vqc.optimal_params))
'''
vqc.training_dataset = training_input_glob[1]
vqc.test_dataset = test_input_glob[1]
res = vqc.run(quantum_instance)

data_pts = []
data_labs = []

for k12 in training_input['A']:
  data_pts.append(k12)
  data_labs.append(4)

#data_pts.append(training_input['B'])
for k13 in training_input['B']:
  data_pts.append(k13)
  data_labs.append(9)

#print(np.array(data_pts).shape)
#print(len(data_labs))
#vqc.train(np.array(data_pts),np.array(data_labs),quantum_instance,minibatch_size=100)
'''

testing success ratio: 0.6641666666666667
array([ 0.16788022, -0.21433899, -0.56039106,  0.73329745, -0.15987959,
       -1.60074713, -0.06422788, -0.12448296,  0.22344761,  0.22575428,
        0.46350456,  0.01415431,  0.6410356 ,  0.0235812 ,  1.08156761,
       -0.23201777,  0.6861721 , -0.88586962])


In [None]:
print(repr(vqc.optimal_params))

array([ 0.93836901,  1.66094894,  0.33946952, -0.56326243,  1.32086339,
        1.1047427 ,  1.02720028, -0.84099704,  0.28679475, -0.39123291,
       -0.94092688,  1.39366975])


In [None]:
%%write_and_run optimal_params.py
# the write_and_run function writes the content in this cell into the file "optimal_params.py"

### WRITE YOUR CODE BETWEEN THESE LINES - START
    
# import libraries that are used in the function below.
import numpy as np
    
### WRITE YOUR CODE BETWEEN THESE LINES - END

def return_optimal_params():
    # STORE THE OPTIMAL PARAMETERS AS AN ARRAY IN THE VARIABLE optimal_parameters 
    
#    optimal_parameters = [-0.17220612,  3.7452231 , -2.56806578, -2.49679404,  8.57955158,
 #       2.95204199, -0.15742801, -0.04722189,  3.64109488, -2.15771571,
  #      1.53205487,  2.51904802]
#    optimal_parameters = [ 2.38214527,  6.93614767,  6.76713521,  1.71674236, -0.28096424,
 #       1.22562998,  0.73173688, -1.00503826, 12.1711244 ,  6.20755871,
  #      5.20980614,  3.87807578]

  #  optimal_parameters = [-2.03166672, -0.87014008, -2.31947263, -1.97793026,  1.06065705,
   #     1.23789332, -1.21445619,  1.25799323, -1.28535287, -0.55364733,
    #    1.12190672, -1.681177  ]
  #  optimal_parameters= [ 0.93836901,  1.66094894,  0.33946952, -0.56326243,  1.32086339,
   #     1.1047427 ,  1.02720028, -0.84099704,  0.28679475, -0.39123291,
    #   -0.94092688,  1.39366975]
  #  optimal_parameters = [ 1.74222711, -0.95743734,  2.38861375,  1.22859355, -2.93216338,
   #     -5.13315939, -0.49767292,  2.70484726,  3.56006376,  0.37819003,
    #    -1.139581  ,  0.58041328,  1.71643732,  0.88513928,  8.58641562,
     #   -0.84542103,  6.10141353,  6.19587347]
    optimal_parameters = [ 0.16788022, -0.21433899, -0.56039106,  0.73329745, -0.15987959,
       -1.60074713, -0.06422788, -0.12448296,  0.22344761,  0.22575428,
        0.46350456,  0.01415431,  0.6410356 ,  0.0235812 ,  1.08156761,
       -0.23201777,  0.6861721 , -0.88586962]
    # STORE THE OPTIMAL PARAMETERS AS AN ARRAY IN THE VARIABLE optimal_parameters 
    return np.array(optimal_parameters)

## Submission

Before we go any further, check that you have the three files `feature_map.py`, `variational_circuit.py` and `optimal_params.py` in the **same working directory as this notebook**. If you do not, go back to the start and run the notebook, making sure you have filled in the code where it's required. When you run the cell below, the three files `feature_map.py`, `variational_circuit.py` and `optimal_params.py` are combined into one file named **"answer.py"**. Your working directory will then have four Python (.py) files, of which **"answer.py"** is the submission file: 
* `answer.py` <- upload this file onto HackerEarth and click on "Submit and Evaluate"
* `feature_map.py`
* `variational_circuit.py`
* `optimal_params.py`

In [None]:
solution = ['feature_map.py','variational_circuit.py','optimal_params.py']
# concatenate the three solution files into answer.py, overwriting any previous copy
with open("answer.py","w") as f1:
    for i in solution:
        with open(i) as f:
            for line in f:
                f1.write(line)

## Grading Function

Given below is the grading function that we shall use to grade your submission with a test dataset that is of the same format as `challenge_dataset_4_9.csv`. You can use it to grade your submission by extracting a few points out of the `challenge_dataset_4_9.csv` to get a basic idea of how your model is performing. 
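
A local check needs `test_data` and `test_labels` as `numpy.ndarray` objects. The sketch below shows one way to hold out a few points; `holdout()` is a hypothetical helper, and the demo arrays are synthetic stand-ins assumed to have the same layout as `data_features` and `data_labels`.

```python
import numpy as np


def holdout(features, labels, n_test, seed=0):
    """Return (test_data, test_labels) with `n_test` randomly chosen rows."""
    rng = np.random.RandomState(seed)
    picked = rng.permutation(len(labels))[:n_test]
    return features[picked], labels[picked]


# Synthetic stand-in for the challenge data (assumption: 3 features, labels 4 or 9).
demo_features = np.random.RandomState(2).uniform(-1, 1, size=(60, 3))
demo_labels = np.array([4.0, 9.0] * 30)

test_data, test_labels = holdout(demo_features, demo_labels, n_test=10)

# Same relabelling that grade() applies: 4 -> 0, everything else -> 1
test_labels_transformed = np.where(test_labels == 4, 0.0, 1.0)
print(test_data.shape, test_labels.shape)
```

The resulting arrays can be passed to `grade()` together with the outputs of `feature_map()`, `variational_circuit()` and `return_optimal_params()`. Points that were also used for training will tend to overstate the accuracy, so keep the held-out rows separate.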

In [None]:
#imports required for the grading function 
from qiskit import *
from qiskit.aqua import QuantumInstance
from qiskit.aqua.algorithms import VQC
from qiskit.aqua.components.feature_maps import FeatureMap
from qiskit.aqua.components.variational_forms import VariationalForm
import numpy as np

### Working of the grading function

The grading function `grade()` takes as **input**: 

* `test_data`: (`np.ndarray`) -- **no. of datapoints $\times$ dimension of data** : the datapoints against which we want to test our model. 


* `test_labels`: (`np.ndarray`) -- **no. of datapoints $\times$ 1** : A column vector with each entry either `4` or `9`; `grade()` converts these internally to 0 and 1.


* `feature_map`: (`QuantumCircuit` or `FeatureMap`) -- A quantum feature map which is the output of `feature_map()` defined earlier.


* `variational_form`: (`QuantumCircuit` or `VariationalForm`) -- A variational form which is the output of `variational_circuit()` defined earlier.


* `optimal_params`: (`numpy.ndarray`) -- the optimal parameters obtained after running the VQC algorithm above. These are the values obtained when the function `return_optimal_params()` is run. 


* `find_circuit_cost` : (`bool`) -- Calculates the circuit cost if set to `True`. Circuit cost is calculated by converting the circuit to the basis gate set `['u3', 'cx']` and then applying the formula **cost = 1$\times$(no. of u3 gates) + 10$\times$(no. of cx gates)**.


* `verbose` : (`bool`) -- prints the result message if set to `True`.

And gives as **output**: 

* `model_accuracy` : (`numpy.float64`) -- percent accuracy of the model. 


* `circuit_cost`: (`int`) -- circuit cost as explained above.


* `ans`: (`tuple`) -- Output of the `VQC.predict()` method. 


* `result_msg`: (`str`) -- Result message which also outputs the error message in case of one.


* `unrolled_circuit`: (`QuantumCircuit` or `None`) -- the circuit obtained after substituting the optimal parameters into the full VQC circuit and unrolling it to the basis gate set `['u3', 'cx']`.

**Note:** If you look inside the `grade()` function in Section 2, you'll see that we have initialized a COBYLA optimizer even though the prediction step does not require one. Similarly, we have passed a dummy dataset as `training_dataset`. Both are dummy variables, needed only because they are not optional arguments when instantiating the `VQC` class.  
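
The circuit cost formula from the input list above amounts to a weighted gate count. A minimal sketch, assuming the gate counts are available as a dictionary (for instance from `QuantumCircuit.count_ops()` on the unrolled circuit); `circuit_cost()` and the example counts are hypothetical.

```python
def circuit_cost(op_counts):
    """cost = 1 * (number of u3 gates) + 10 * (number of cx gates)."""
    return op_counts.get("u3", 0) + 10 * op_counts.get("cx", 0)


# Made-up gate counts for illustration only.
example_counts = {"u3": 120, "cx": 45}
cost = circuit_cost(example_counts)
print(cost, cost < 2000)
```

Because each `cx` gate weighs ten times as much as a `u3` gate, reducing entangling gates (fewer repetitions or sparser entanglement) is usually the quickest way to stay under the limit of 2000.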

In [None]:
def grade(test_data, test_labels, feature_map, variational_form, optimal_params, find_circuit_cost=True, verbose=True):
    seed = 10598
    model_accuracy = None 
    circuit_cost=None 
    ans = None
    unrolled_circuit = None
    result_msg=''
    data_dim = np.array(test_data).shape[1]
    dataset_size = np.array(test_data).shape[0]
    dummy_training_dataset=training_input = {'A':np.ones((2,data_dim)), 'B':np.ones((2, data_dim))}
    
    # converting 4's to 0's and 9's to 1's for checking 
    test_labels_transformed = np.where(test_labels==4, 0., 1.)
    max_qubit_count = 6
    max_circuit_cost = 2000
    
    # Section 1
    if feature_map is None:
        result_msg += 'feature_map variable is None. Please submit a valid entry' if verbose else ''
    elif variational_form is None: 
        result_msg += 'variational_form variable is None. Please submit a valid entry' if verbose else ''
    elif optimal_params is None: 
        result_msg += 'optimal_params variable is None. Please submit a valid entry' if verbose else ''
    elif test_data is None: 
        result_msg += 'test_data variable is None. Please submit a valid entry' if verbose else ''
    elif test_labels is None: 
        result_msg += 'test_labels variable is None. Please submit a valid entry' if verbose else ''
    elif not isinstance(feature_map, (QuantumCircuit, FeatureMap)):
        result_msg += 'feature_map variable should be a QuantumCircuit or a FeatureMap not (%s)' % \
                      type(feature_map) if verbose else ''
    elif not isinstance(variational_form, (QuantumCircuit, VariationalForm)):
        result_msg += 'variational_form variable should be a QuantumCircuit or a VariationalForm not (%s)' % \
                      type(variational_form) if verbose else ''
    elif not isinstance(test_data, np.ndarray):
        result_msg += 'test_data variable should be a numpy.ndarray not (%s)' % \
                      type(test_data) if verbose else ''
    elif not isinstance(test_labels, np.ndarray):
        result_msg += 'test_labels variable should be a numpy.ndarray not (%s)' % \
                      type(test_labels) if verbose else ''
    elif not isinstance(optimal_params, np.ndarray):
        result_msg += 'optimal_params variable should be a numpy.ndarray not (%s)' % \
                      type(optimal_params) if verbose else ''
    elif not dataset_size == test_labels_transformed.shape[0]:
        result_msg += 'Dataset size and label array size must be equal'
    # Section 2
    else:
        
        # setting up COBYLA optimizer as a dummy optimizer
        from qiskit.aqua.components.optimizers import COBYLA
        dummy_optimizer = COBYLA()

        # setting up the backend and creating a quantum instance
        backend = Aer.get_backend('qasm_simulator')
        backend_options = {"method": "statevector"}
        quantum_instance = QuantumInstance(backend, 
                                           shots=2000, 
                                           seed_simulator=seed, 
                                           seed_transpiler=seed, 
                                           backend_options=backend_options)

        # creating a VQC instance and running the VQC.predict method to get the accuracy of the model 
        vqc = VQC(optimizer=dummy_optimizer, 
                  feature_map=feature_map, 
                  var_form=variational_form, 
                  training_dataset=dummy_training_dataset)
        
        from qiskit.transpiler import PassManager
        from qiskit.transpiler.passes import Unroller
        pass_ = Unroller(['u3', 'cx'])
        pm = PassManager(pass_)
        # construct circuit with first datapoint
        circuit = vqc.construct_circuit(test_data[0], optimal_params)
        unrolled_circuit = pm.run(circuit)
        gates = unrolled_circuit.count_ops()
        # cost = 1 * (number of u3 gates) + 10 * (number of cx gates);
        # missing gate types count as 0, so the cost is never added to None
        circuit_cost = gates.get('u3', 0) + 10 * gates.get('cx', 0)
        
        if circuit.num_qubits > max_qubit_count:
            result_msg += 'Your quantum circuit is using more than 6 qubits. Reduce the number of qubits used and try again.'
        elif circuit_cost > max_circuit_cost:
            result_msg += 'The cost of your circuit is exceeding the maximum acceptable cost of 2000. Reduce the circuit cost and try again.'
        else: 
            
            ans = vqc.predict(test_data, quantum_instance=quantum_instance, params=np.array(optimal_params))
            model_accuracy = np.sum(np.equal(test_labels_transformed, ans[1]))/len(ans[1])

            result_msg += 'Accuracy of the model is {}'.format(model_accuracy) if verbose else ''
            result_msg += ' and circuit cost is {}'.format(circuit_cost) if verbose else ''
            
    return model_accuracy, circuit_cost, ans, result_msg, unrolled_circuit
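
The label transformation and accuracy computation used inside `grade()` can be tried in isolation. The sketch below uses the same NumPy calls (`np.where`, `np.equal`, `np.sum`); the label and prediction arrays are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical true labels and predicted classes, for illustration only
true_labels = np.array([4, 9, 9, 4, 9])
predicted_classes = np.array([0, 1, 0, 0, 1])  # 0 stands for digit 4, 1 for digit 9

# Map digit 4 to class 0 and every other label (here, digit 9) to class 1
transformed = np.where(true_labels == 4, 0.0, 1.0)

# Fraction of predictions that match the transformed labels
accuracy = np.sum(np.equal(transformed, predicted_classes)) / len(predicted_classes)
print(accuracy)  # 4 of 5 predictions match, so 0.8
```

The result is a fraction between 0 and 1, not a percentage.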

## Process of grading using a dummy grading dataset

Let us create a dummy grading dataset, with features `grading_features` and labels `grading_labels`, from the last `grading_dataset_size` datapoints (800 here) of `data_features` and `data_labels` so that we can get a rough estimate of our accuracy. Note that this may not be a balanced dataset, i.e., it may not have an equal number of `4`'s and `9`'s, which is not best practice. It is given only to demonstrate the `grade()` function. In the final scoring done on HackerEarth, the testing dataset will have a balanced number of class labels `4` and `9`.

In [None]:
grading_dataset_size=800    # this value is not per digit but in total
grading_features = data_features[-grading_dataset_size:]
grading_labels = data_labels[-grading_dataset_size:]
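
Since the slice above may not be balanced, it can help to check the share of each class before trusting the accuracy estimate. Below is a minimal sketch using only the standard library; the helper `class_balance` and the label list are hypothetical, and in the notebook one could presumably pass `grading_labels.tolist()` instead.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of each distinct label value in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical stand-in for the grading labels, for illustration only
hypothetical_labels = [4, 9, 9, 4, 9, 9, 4, 9]
balance = class_balance(hypothetical_labels)
print(balance)  # three 4's and five 9's out of eight labels
```

If one class dominates, a model that always predicts that class can still score well, so the accuracy on such a slice should be read with care.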

In [None]:
start = time.process_time()

accuracy, circuit_cost, ans, result_msg, full_circuit  =  grade(test_data=grading_features, 
                                                                test_labels=grading_labels, 
                                                                feature_map=feature_map(), 
                                                                variational_form=variational_circuit(), 
                                                                optimal_params=return_optimal_params())

print("time taken: {} seconds".format(time.process_time() - start))
print(result_msg)

You can also check your **accuracy**, **circuit_cost** and **full_circuit**; the last of these is the result of combining the feature map and variational circuit and unrolling them into the basis `['u3', 'cx']`.

In [None]:
print("Accuracy of the model: {}".format(accuracy))
print("Circuit Cost: {}".format(circuit_cost))
print("The complete unrolled circuit: ")
full_circuit.draw()

Accuracy of the model: 0.908
Circuit Cost: 126
The complete unrolled circuit: 
