# Quantum Principal Component Analysis

In data analysis, it is common to have many features, some of which are redundant or correlated. During data cleaning or data transformation, one often wants to reduce the features to the few most important ones. One way to do that is to find the directions in feature space along which the data varies the most. Determining these directions of largest variance is the goal of <i>Principal Component Analysis (PCA)</i>.

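Mathematically, this involves taking the mean-centered raw data and computing its covariance matrix. The eigenvectors of the covariance matrix with the largest eigenvalues are the principal components.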

In [79]:
import numpy as np
# Dataset obtained from https://arxiv.org/pdf/1804.03719.pdf
X1 = [4,3,4,4,3,3,3,3,4,4,4,5,4,3,4]
X2 = [3028,1365,2726,2538,1318,1693,1412,1632,2875,3564,4412,4444,4278,3064,3857]
# Center both features on zero; X2 is additionally rescaled by 1/1000
X1 = np.array(X1) - np.average(X1)
X2 = (np.array(X2) - np.average(X2)) / 1000

In [3]:
data = np.array([X1, X2])
data

array([[ 0.33333333, -0.66666667,  0.33333333,  0.33333333, -0.66666667,
        -0.66666667, -0.66666667, -0.66666667,  0.33333333,  0.33333333,
         0.33333333,  1.33333333,  0.33333333, -0.66666667,  0.33333333],
       [ 0.21426667, -1.44873333, -0.08773333, -0.27573333, -1.49573333,
        -1.12073333, -1.40173333, -1.18173333,  0.06126667,  0.75026667,
         1.59826667,  1.63026667,  1.46426667,  0.25026667,  1.04326667]])

In [80]:
from QPCA import get_covariance_matrix
cov = get_covariance_matrix(data)

print(cov)

[[0.38095238 0.57347619]
 [0.57347619 1.29693364]]
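
The `QPCA` helper module is not shown here, so the actual implementation of `get_covariance_matrix` is unknown. The printed values are, however, consistent with the unbiased sample covariance (normalized by N - 1), which NumPy exposes as `np.cov`. A minimal sketch under that assumption, where `sample_covariance` is a hypothetical stand-in name:

```python
import numpy as np

# Same dataset as above (from https://arxiv.org/pdf/1804.03719.pdf)
X1 = np.array([4, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4, 5, 4, 3, 4], dtype=float)
X2 = np.array([3028, 1365, 2726, 2538, 1318, 1693, 1412, 1632,
               2875, 3564, 4412, 4444, 4278, 3064, 3857], dtype=float)
data = np.array([X1 - X1.mean(), (X2 - X2.mean()) / 1000])


def sample_covariance(data):
    """Hypothetical stand-in for QPCA.get_covariance_matrix.

    Rows are features, columns are samples; np.cov divides by N - 1.
    """
    return np.cov(data)


cov_sketch = sample_covariance(data)
print(cov_sketch)
```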


In [81]:
from QPCA import get_density_matrix
den = get_density_matrix(cov)
print(den)

[[0.22704306 0.34178495]
 [0.34178495 0.77295694]]
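
A density matrix must have unit trace, and the printed result is consistent with dividing the covariance matrix by its trace. Since `get_density_matrix` is not shown, the following is only a sketch under that assumption; `normalize_to_density` is a hypothetical name.

```python
import numpy as np

# Covariance matrix as printed above
cov = np.array([[0.38095238, 0.57347619],
                [0.57347619, 1.29693364]])


def normalize_to_density(cov):
    """Hypothetical stand-in for QPCA.get_density_matrix: rho = cov / Tr(cov)."""
    return cov / np.trace(cov)


rho = normalize_to_density(cov)
print(rho)
print(np.trace(rho))
```

Dividing by the trace leaves the eigenvectors (the principal directions) unchanged and only rescales the eigenvalues so that they sum to 1.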


In [82]:
from QPCA import purify_density_matrix
pure_den = purify_density_matrix(den)
print(pure_den)

[[-0.22545283 -0.41977861]
 [ 0.10847494 -0.8724621 ]]
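
The printed array is not itself a density matrix. Read as a 2x2 table of amplitudes `A`, it satisfies `A @ A.T ≈ den` and its squared entries sum to 1, which is what one expects from a purification: a two-qubit pure state whose reduced state on one qubit is `den`. Purifications are not unique, so the sketch below, which builds one from the eigendecomposition, will generally not reproduce the exact numbers above. The name `purify` is hypothetical and this is not the `QPCA` implementation.

```python
import numpy as np

# Density matrix as printed above
den = np.array([[0.22704306, 0.34178495],
                [0.34178495, 0.77295694]])


def purify(rho):
    """Return amplitudes A[i, k] of sum_k sqrt(lam_k) |v_k>|k>, so A @ A^dagger = rho."""
    lam, vecs = np.linalg.eigh(rho)   # rho = sum_k lam_k |v_k><v_k|
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative round-off
    return vecs * np.sqrt(lam)        # scale column k by sqrt(lam_k)


A = purify(den)
print(A @ A.conj().T)            # reduced state of the first subsystem
print(np.sum(np.abs(A) ** 2))    # total probability of the pure state

# The array printed by purify_density_matrix passes the same check
pure_den = np.array([[-0.22545283, -0.41977861],
                     [0.10847494, -0.8724621]])
print(pure_den @ pure_den.T)
```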


In [83]:
from braket.circuits import Circuit

In [91]:
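circ = Circuit()
# First attempt: load one copy of the purified state on qubits (1, 2) and one on (3, 4),
# then run a swap test with qubit 0 as the ancilla.
# This fails: a unitary on two targets must be a 4x4 unitary matrix,
# but pure_den is a 2x2 array of state amplitudes.
circ.unitary(matrix=pure_den, targets=[1,2])
circ.unitary(matrix=pure_den, targets=[3,4])
circ.h(0)
circ.cswap(0, 1, 3)
circ.h(0)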

ValueError: Dimensions of the supplied unitary are incompatible with the targets

In [85]:
# Flatten the 2x2 amplitude array into a 4-entry two-qubit state vector
pure_den_reshaped = pure_den.reshape(4)

In [86]:
# Stack two copies of the amplitude vector to try building a larger matrix for circ.unitary
Uprep = np.concatenate((pure_den_reshaped, pure_den_reshaped), axis=0)
print(Uprep)

[-0.22545283 -0.41977861  0.10847494 -0.8724621  -0.22545283 -0.41977861
  0.10847494 -0.8724621 ]


In [87]:
Uprep = Uprep.reshape(4,2)

In [88]:
Uprep

array([[-0.22545283, -0.41977861],
       [ 0.10847494, -0.8724621 ],
       [-0.22545283, -0.41977861],
       [ 0.10847494, -0.8724621 ]])

In [89]:
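circ = Circuit()
# Second attempt: this also fails, because Uprep is 4x2 and therefore not a square (let alone unitary) matrix
circ.unitary(matrix=Uprep, targets=[1,2])
circ.unitary(matrix=Uprep, targets=[3,4])
circ.h(0)
circ.cswap(0, 1, 3)
circ.h(0)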

ValueError: [[-0.22545283 -0.41977861]
 [ 0.10847494 -0.8724621 ]
 [-0.22545283 -0.41977861]
 [ 0.10847494 -0.8724621 ]] is not a two-dimensional square matrix
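
Both attempts fail for the same reason: `Circuit.unitary` expects a square unitary of size 2^n x 2^n for n targets, whereas `pure_den` holds the amplitudes of a state. One possible workaround, sketched here with NumPy only, is to complete the 4-amplitude state vector to a 4x4 unitary whose first column is that vector, so that applying it to |00> prepares the state. The helper name `state_prep_unitary` is hypothetical, and the qubit-ordering convention of the target SDK would still need checking before passing the result to `circ.unitary(matrix=..., targets=[1, 2])`.

```python
import numpy as np


def state_prep_unitary(state):
    """Return a unitary U with U[:, 0] == state, i.e. U|0...0> = |state>."""
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)
    dim = state.size
    # Start from the identity and overwrite the first column with the state.
    # QR then returns a unitary Q whose first column is the state up to a phase.
    m = np.eye(dim, dtype=complex)
    m[:, 0] = state
    q, r = np.linalg.qr(m)
    # Undo that phase: m[:, 0] = q[:, 0] * r[0, 0] with |r[0, 0]| = 1
    return q * (r[0, 0] / abs(r[0, 0]))


# Amplitudes as printed by purify_density_matrix, flattened row by row
psi = np.array([-0.22545283, -0.41977861, 0.10847494, -0.8724621])
U = state_prep_unitary(psi)
print(np.allclose(U.conj().T @ U, np.eye(4)))            # unitarity check
print(np.allclose(U[:, 0], psi / np.linalg.norm(psi)))   # first column is the state
```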

In [92]:
from qiskit import QuantumCircuit

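# Five qubits: qubit 0 is the swap-test ancilla; qubits (1, 2) and (3, 4) each hold one copy of the purified state
circ = QuantumCircuit(5, 1)
circ.initialize([1, 0], [0])                  # ancilla in |0> (already the default state)
circ.initialize(pure_den_reshaped, [1, 2])    # first copy
circ.initialize(pure_den_reshaped, [3, 4])    # second copy
# Swap test between qubits 1 and 3
circ.h(0)
circ.cswap(0, 1, 3)
circ.h(0)
circ.measure(0, 0)
circ.draw()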
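
The circuit is a swap test between qubits 1 and 3, one from each copy of the purified state. For two copies of the same state, the ancilla is measured as 0 with probability (1 + Tr(ρ²)) / 2, where ρ is the reduced state of the swapped qubit, so the measurement statistics estimate the purity of the density matrix. As a cross-check that needs no quantum SDK, the sketch below simulates the same five-qubit circuit with plain NumPy. The tensor index layout is a choice made for this illustration, intended to mirror Qiskit's little-endian `initialize` convention.

```python
import numpy as np

# Amplitudes as printed by purify_density_matrix
A = np.array([[-0.22545283, -0.41977861],
              [0.10847494, -0.8724621]])
A = A / np.linalg.norm(A)   # remove rounding error from the printed values
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# State tensor with axes (q0, q1, q2, q3, q4); q0 is the ancilla in |0>.
# With little-endian ordering, A.reshape(4) on qubits (1, 2) means amp[q1, q2] = A[q2, q1].
ancilla = np.array([1.0, 0.0])
state = np.einsum('a,cb,ed->abcde', ancilla, A, A)

state = np.einsum('xa,abcde->xbcde', H, state)      # H on the ancilla
state[1] = state[1].transpose(2, 1, 0, 3).copy()    # CSWAP: swap q1 and q3 where q0 = 1
state = np.einsum('xa,abcde->xbcde', H, state)      # H on the ancilla again

p0 = np.sum(np.abs(state[0]) ** 2)   # probability of measuring the ancilla as 0
rho = A.T @ A                        # reduced state of qubit 1
purity = np.trace(rho @ rho)
print(p0, (1 + purity) / 2)          # the two numbers should agree
```

On hardware or a shot-based simulator, the purity would be estimated from the measured frequency of 0 as 2 * P(0) - 1.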