**Quantum Data Processing**

This notebook covers quantum data processing. It shows how quantum data processing algorithms are implemented in practice, with code examples for k-means, k-medians, and quantum clustering.


**Quantum k-means**

The quantum k-means technique is implemented with a quantum circuit that takes training vectors as input. The training vectors serve as the initial centroids, and the centroids are recomputed after every step. The circuit can be varied by rotating a training vector, and the number of training vectors grows with the number of clusters. The steps of the pseudo-algorithm are:

1. Input the number of cluster groups, k, and the data points. 

2. Initialize the k centroids randomly. 

3. Iterate over all the input data points to assign each one to a cluster group. 

4. Compute the quantum interference circuit. 

5. Filter the interference probabilities. 

6. Add the high-probability centroids to the dictionary. 

7. Repeat the previous three steps for every data point in the set.

For k clusters, the quantum k-means technique can be modeled on N quantum machines. The quantum state consists of the group of centroids that the algorithm produces. The work can be parallelized by splitting the input data into separate sets, one for each quantum machine to run the algorithm on. The resulting quantum states are then aggregated and used for the next iteration. The Iris data set that was discussed in the previous chapter can be clustered into three groups (setosa, versicolor, virginica) based on the length and width of the petals and sepals. The technique tries to identify the cluster groups that have the least variation within each cluster. Another application is to classify people into k clusters based on their income and spending. [1]
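The parallelization described above can be sketched classically. The snippet below is only an illustration with NumPy, not the quantum implementation: each "machine" is simulated by a plain function call, and the names `partial_sums` and `parallel_kmeans_step` as well as the synthetic data are assumptions made for this example.

```python
import numpy as np

def assign(points, centroids):
    """Return the index of the nearest centroid for each point."""
    # Pairwise squared Euclidean distances, shape (n_points, k)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def partial_sums(chunk, centroids):
    """Work of one (hypothetical) machine: per-cluster sums and counts."""
    labels = assign(chunk, centroids)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k)
    for j in range(k):
        members = chunk[labels == j]
        sums[j] = members.sum(axis=0)
        counts[j] = len(members)
    return sums, counts

def parallel_kmeans_step(points, centroids, n_machines):
    """One iteration: split the data, process each chunk, aggregate."""
    chunks = np.array_split(points, n_machines)
    results = [partial_sums(chunk, centroids) for chunk in chunks]
    sums = sum(r[0] for r in results)
    counts = sum(r[1] for r in results)
    new_centroids = centroids.copy()
    nonempty = counts > 0  # keep the old centroid for an empty cluster
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return new_centroids

# Synthetic data: three well-separated groups (an assumption for the demo)
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc, 0.1, size=(50, 2)) for loc in (-0.5, 0.0, 0.5)])
initial = points[[0, 50, 100]].copy()

centroids = initial.copy()
for _ in range(10):
    centroids = parallel_kmeans_step(points, centroids, n_machines=4)
print(centroids)
```

Because the per-cluster sums and counts add up across chunks, the aggregated centroids match what a single machine would compute on the full data set.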

In [None]:
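import matplotlib.pyplot as plot
import pandas as pand
!pip install qiskit
# Note: the imports of Aer and execute from qiskit below were removed in
# Qiskit 1.0, so this cell needs an older Qiskit release.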
from qiskit import QuantumRegister, ClassicalRegister
from qiskit import QuantumCircuit
from qiskit import Aer, execute
from numpy import pi

figure, axis = plot.subplots()
axis.set(xlabel='Feature 1', ylabel='Feature 2')


data_input = pand.read_csv('kmeans_input.csv',
    usecols=['Feature 1', 'Feature 2', 'Class'])


isRed = data_input['Class'] == 'Red'
isGreen = data_input['Class'] == 'Green'
isBlack = data_input['Class'] == 'Black'

# Filter data
redData = data_input[isRed].drop(['Class'], axis=1)
greenData = data_input[isGreen].drop(['Class'], axis=1)
blackData = data_input[isBlack].drop(['Class'], axis=1)


# Coordinates of the new data point to classify
y_p = 0.141
x_p = -0.161

# Finding the x-coords of the centroids
xgc = sum(redData['Feature 1']) / len(redData['Feature 1'])
xbc = sum(greenData['Feature 1']) / len(greenData['Feature 1'])
xkc = sum(blackData['Feature 1']) / len(blackData['Feature 1'])

# Finding the y-coords of the centroids
ygc = sum(redData['Feature 2']) / len(redData['Feature 2'])
ybc = sum(greenData['Feature 2']) / len(greenData['Feature 2'])
ykc = sum(blackData['Feature 2']) / len(blackData['Feature 2'])

# Plotting the centroids
plot.plot(xgc, ygc, 'rx')
plot.plot(xbc, ybc, 'gx')
plot.plot(xkc, ykc, 'kx')


plot.plot(x_p, y_p, 'bo')

# Setting the axis ranges
plot.axis([-1, 1, -1, 1])

plot.show()

# Calculating theta and phi values
phi_list = [((x + 1) * pi / 2) for x in [x_p, xgc, xbc, xkc]]
theta_list = [((x + 1) * pi / 2) for x in [y_p, ygc, ybc, ykc]]

quantumregister = QuantumRegister(3, 'quantumregister')


classicregister = ClassicalRegister(1, 'classicregister')

quantum_circuit = QuantumCircuit(quantumregister, classicregister, name='qc')


backend = Aer.get_backend('qasm_simulator')


quantum_results_list = []


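for i in range(1, 4):
    # Put the ancilla qubit into superposition
    quantum_circuit.h(quantumregister[2])

    # Encode the new data point (index 0) and centroid i as single-qubit states
    quantum_circuit.u3(theta_list[0], phi_list[0], 0, quantumregister[0])
    quantum_circuit.u3(theta_list[i], phi_list[i], 0, quantumregister[1])

    # Swap test: controlled swap followed by a second Hadamard on the ancilla
    quantum_circuit.cswap(quantumregister[2], quantumregister[0], quantumregister[1])
    quantum_circuit.h(quantumregister[2])

    # Measure the ancilla; a lower count of '1' means a larger overlap
    quantum_circuit.measure(quantumregister[2], classicregister[0])

    # Reset all qubits so the next iteration starts from |0>
    quantum_circuit.reset(quantumregister)

    job = execute(quantum_circuit, backend=backend, shots=1024)
    result = job.result().get_counts(quantum_circuit)
    # Use .get so a missing '1' outcome counts as zero instead of raising KeyError
    quantum_results_list.append(result.get('1', 0))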

print(quantum_results_list)



class_list = ['Red', 'Green', 'Black']


quantum_p_class = class_list[quantum_results_list.index(min(quantum_results_list))]


distances_list = [((x_p - i[0])**2 + (y_p - i[1])**2)**0.5 for i in [(xgc, ygc), (xbc, ybc), (xkc, ykc)]]
classical_p_class = class_list[distances_list.index(min(distances_list))]

print("""using quantumdistance algorithm,
 the new data point is related to the""", quantum_p_class, 
 'class.\n')
print('Euclidean distances are listed: ', distances_list, '\n')
print("""based on euclidean distance calculations,
 the new data point is related to the""", classical_p_class, 
 'class.')

The input points are read from a CSV file in which each point is labeled with one of the colors red, green, or black. A new point is added, and the quantum k-means algorithm determines which color class it belongs to. In the run shown here, the printed counts of '1' outcomes were [72, 60, 125]; the smallest count corresponds to the Green class, so the quantum distance algorithm relates the new data point to the Green class. The Euclidean distances to the three centroids, listed below, lead to the same class.

[0.520285324797846, 0.4905204028376393, 0.7014755294377704]
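To make the link between the measurement counts and distance more concrete, the following sketch computes the ideal swap-test probability with NumPy instead of running a circuit. It assumes the same angle mapping as the code cell above and a noiseless swap test, for which the probability of measuring 1 on the ancilla is 1/2 - 1/2 |⟨a|b⟩|². The helper names `encode` and `swap_test_p1` and the two example centroids are hypothetical.

```python
import numpy as np

def encode(x, y):
    """State prepared by u3(theta, phi, 0) on |0>, using the cell's angle mapping."""
    phi = (x + 1) * np.pi / 2
    theta = (y + 1) * np.pi / 2
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def swap_test_p1(a, b):
    """Probability of measuring 1 on the ancilla of an ideal swap test."""
    overlap = np.vdot(a, b)  # inner product <a|b>
    return 0.5 - 0.5 * abs(overlap) ** 2

# New data point from the notebook; the two centroids are made-up examples
point = encode(-0.161, 0.141)
near_centroid = encode(-0.1, 0.1)
far_centroid = encode(0.7, -0.6)

p_near = swap_test_p1(point, near_centroid)
p_far = swap_test_p1(point, far_centroid)

# With 1024 shots, the expected number of '1' outcomes is 1024 * probability
print(1024 * p_near, 1024 * p_far)
```

A centroid whose encoded state overlaps more with the encoded data point yields fewer '1' outcomes, which is why the class with the minimum count is selected.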

**Quantum k-medians**

The quantum k-medians technique computes the centroid of each cluster group as the median of its data points. The median is the point with the minimal total distance to all the data points in the group, so the technique minimizes the sum of the distances between the points and the median. The minimization is carried out by a quantum minimization algorithm based on Grover’s search, with the Euclidean distance as the distance measure.
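As a rough classical illustration of this definition, the sketch below picks the member of a cluster with the smallest total Euclidean distance to all other members (strictly speaking, a medoid). The `np.argmin` call is a classical stand-in for the Grover-based minimization and is not a quantum routine; the function name `cluster_median` and the sample points are assumptions for this example.

```python
import numpy as np

def cluster_median(points):
    """Return the member of `points` with the smallest total distance to all members."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))  # pairwise Euclidean distances
    totals = dist.sum(axis=1)                 # total distance from each point
    return points[np.argmin(totals)]          # classical minimum search

# Five points close together plus one outlier (made-up data)
cluster = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [0.1, 0.1], [0.05, 0.05], [2.0, 2.0],
])

median = cluster_median(cluster)
mean = cluster.mean(axis=0)
print('median:', median)
print('mean:  ', mean)
```

The outlier pulls the mean away from the dense group, while the median stays inside it, which is the usual motivation for preferring medians when the data contain outliers.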

The steps for the pseudo-algorithm are shown here: 

1. Input the number of cluster groups, k, and the data points. 

2. Initialize the k centroids randomly based on the median of the data points. 

3. Iterate over all the input data points to assign each one to a cluster group.

4. Compute the quantum interference circuit. 

5. Filter the interference probabilities. 

6. Add the high-probability centroids to the dictionary. 

7. Repeat the previous three steps for every data point in the set.

In [None]:
import matplotlib.pyplot as plot
import pandas as pand
from qiskit import QuantumRegister, ClassicalRegister
from qiskit import QuantumCircuit
from qiskit import Aer, execute
from numpy import pi

figure, axis = plot.subplots()
axis.set(xlabel='Feature 1', ylabel='Feature 2')


data_input = pand.read_csv('kmeans_input.csv',
    usecols=['Feature 1', 'Feature 2', 'Class'])


isRed = data_input['Class'] == 'Red'
isGreen = data_input['Class'] == 'Green'
isBlack = data_input['Class'] == 'Black'

# Filter data
redData = data_input[isRed].drop(['Class'], axis=1)
greenData = data_input[isGreen].drop(['Class'], axis=1)
blackData = data_input[isBlack].drop(['Class'], axis=1)


# Coordinates of the new data point to classify
y_p = 0.141
x_p = -0.161

# Finding the x-coords of the centroids (per-class medians, as k-medians requires)
xgc = redData['Feature 1'].median()
xbc = greenData['Feature 1'].median()
xkc = blackData['Feature 1'].median()

# Finding the y-coords of the centroids
ygc = redData['Feature 2'].median()
ybc = greenData['Feature 2'].median()
ykc = blackData['Feature 2'].median()

# Plotting the centroids
plot.plot(xgc, ygc, 'rx')
plot.plot(xbc, ybc, 'gx')
plot.plot(xkc, ykc, 'kx')


plot.plot(x_p, y_p, 'bo')

# Setting the axis ranges
plot.axis([-1, 1, -1, 1])

plot.show()

# Calculating theta and phi values
phi_list = [((x + 1) * pi / 2) for x in [x_p, xgc, xbc, xkc]]
theta_list = [((x + 1) * pi / 2) for x in [y_p, ygc, ybc, ykc]]

quantumregister = QuantumRegister(3, 'quantumregister')


classicregister = ClassicalRegister(1, 'classicregister')

quantum_circuit = QuantumCircuit(quantumregister, classicregister, name='qc')


backend = Aer.get_backend('qasm_simulator')


quantum_results_list = []


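for i in range(1, 4):
    # Put the ancilla qubit into superposition
    quantum_circuit.h(quantumregister[2])

    # Encode the new data point (index 0) and centroid i as single-qubit states
    quantum_circuit.u3(theta_list[0], phi_list[0], 0, quantumregister[0])
    quantum_circuit.u3(theta_list[i], phi_list[i], 0, quantumregister[1])

    # Swap test: controlled swap followed by a second Hadamard on the ancilla
    quantum_circuit.cswap(quantumregister[2], quantumregister[0], quantumregister[1])
    quantum_circuit.h(quantumregister[2])

    # Measure the ancilla; a lower count of '1' means a larger overlap
    quantum_circuit.measure(quantumregister[2], classicregister[0])

    # Reset all qubits so the next iteration starts from |0>
    quantum_circuit.reset(quantumregister)

    job = execute(quantum_circuit, backend=backend, shots=1024)
    result = job.result().get_counts(quantum_circuit)
    # Use .get so a missing '1' outcome counts as zero instead of raising KeyError
    quantum_results_list.append(result.get('1', 0))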

print(quantum_results_list)



class_list = ['Red', 'Green', 'Black']


quantum_p_class = class_list[quantum_results_list.index(min(quantum_results_list))]


distances_list = [((x_p - i[0])**2 + (y_p - i[1])**2)**0.5 for i in [(xgc, ygc), (xbc, ybc), (xkc, ykc)]]
classical_p_class = class_list[distances_list.index(min(distances_list))]

print("""using quantumdistance algorithm,
 the new data point is related to the""", quantum_p_class, 
 'class.\n')
print('Euclidean distances are listed: ', distances_list, '\n')
print("""based on euclidean distance calculations,
 the new data point is related to the""", classical_p_class, 
 'class.')

**Quantum Clustering** 

The quantum clustering algorithm (see Figure 10-9) is based on the gradient descent technique. Gradient descent with a constant learning rate is used to find the minima of the quantum potential, which give the cluster centers. Because it builds on principles from quantum mechanics, quantum clustering can find groups with complex shapes: it identifies a group of any shape by computing the group's center. Quantum clustering is based on the inverse problem in quantum mechanics. The wave function is treated as a solution of the Schrödinger equation, and the technique yields a particle distribution that is estimated from the potential function. The algorithm finds the group centers and assigns each data point to one of them. Let’s look at the steps for the quantum clustering algorithm:

1. The weights for each data feature are identified, and the parameters are selected based on the input data. 

2. The number of groups/measurement scale is set to zero.

3. The weighted measures are calculated based on the quantum clustering distance method. 

4. The parameter and the potential energy based on the data are estimated.

5. The group number is increased by 1. 

6. The minimum potential energy is identified at the group’s center. 

7. All the data points are grouped by the distance metric, and it needs to be less than the measurement scale. The algorithm ends when the number of data points is zero for the distance metric criterion. Otherwise, the algorithm goes back to step 5. 

The quantum clustering algorithm has a single parameter. The algorithm is partition based and is an unsupervised learning technique. The processing time is higher because of the preprocessing required for grouping. The measurement scale is static, and the method is not dependent on the features. In this method, the precision of the grouping is dependent on the amount of learning. [1]
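To give a feel for the potential that the gradient descent follows, here is a small one-dimensional sketch. It evaluates the same quantity that `getVGradient` in the next cell returns as `v` (the potential up to an additive constant), but without the gradient. The function name `potential`, the sample data, and the value of `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def potential(x, data, sigma):
    """Quantum-clustering potential (up to an additive constant) at 1-D positions x."""
    d2 = (x[:, None] - data[None, :]) ** 2  # squared distances to every data point
    g = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian (Parzen) weights
    return (g * d2).sum(axis=1) / (2 * sigma ** 2 * g.sum(axis=1))

# Two small groups of points, around -1 and +1 (made-up data)
data = np.array([-1.1, -1.0, -0.9, 0.9, 1.0, 1.1])
grid = np.linspace(-2, 2, 401)
v = potential(grid, data, sigma=0.3)

# Local minima of the potential on the grid mark candidate cluster centers
is_min = (v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])
minima = grid[1:-1][is_min]
print(minima)
```

Each group of points produces its own minimum of the potential, so points that slide downhill from their starting positions end up gathered near the center of their group.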

In [None]:
import numpy as nump
from scipy.spatial import distance as spatDistance
from sklearn.decomposition import PCA as decomPCA
import matplotlib.pyplot as plot
from mpl_toolkits.mplot3d import Axes3D as mplAxes3D

def getVGradient(data:nump.ndarray,sigma,x:nump.ndarray=None,coeffs:nump.ndarray=None):
   
    
    if x is None:
        x = data.copy()
    
    if coeffs is None:
        coeffs = nump.ones((data.shape[0],))
    
        
    twoSigmaSquared = 2*sigma**2
        
    data = data[nump.newaxis,:,:]
    x = x[:,nump.newaxis,:]
    differences = x-data
    squaredDifferences = nump.sum(nump.square(differences),axis=2)
    gaussian = nump.exp(-(1/twoSigmaSquared)*squaredDifferences)
    laplacian = nump.sum(coeffs*gaussian*squaredDifferences,axis=1)
    parzen = nump.sum(coeffs*gaussian,axis=1)
    v = 1 + (1/twoSigmaSquared)*laplacian/parzen

    dv = -1*(1/parzen[:,nump.newaxis])*nump.sum(differences*((coeffs*gaussian)[:,:,nump.newaxis])*(twoSigmaSquared*(v[:,nump.newaxis,nump.newaxis])-(squaredDifferences[:,:,nump.newaxis])),axis=1)
    
    v = v-1
    
    return v, dv

def getSGradient(data:nump.ndarray,sigma,x:nump.ndarray=None,coeffs:nump.ndarray=None):
   
    if x is None:
        x = data.copy()
        
    if coeffs is None:
        coeffs = nump.ones((data.shape[0],))
    
    twoSigmaSquared = 2 * sigma ** 2
    
    data = data[nump.newaxis, :, :]
    x = x[:, nump.newaxis, :]
    differences = x - data
    squaredDifferences = nump.sum(nump.square(differences), axis=2)
    gaussian = nump.exp(-(1 / twoSigmaSquared) * squaredDifferences)
    laplacian = nump.sum(coeffs*gaussian * squaredDifferences, axis=1)
    parzen = nump.sum(coeffs*gaussian, axis=1)
    v = (1 / twoSigmaSquared) * laplacian / parzen
    s = v + nump.log(nump.abs(parzen))
    
    ds = (1 / parzen[:, nump.newaxis]) * nump.sum(differences * ((coeffs*gaussian)[:, :, nump.newaxis]) * (
    twoSigmaSquared * (v[:, nump.newaxis, nump.newaxis]) - (squaredDifferences[:, :, nump.newaxis])), axis=1)
    
    return s, ds

def getPGradient(data:nump.ndarray,sigma,x:nump.ndarray=None,coeffs:nump.ndarray=None):

    if x is None:
        x = data.copy()
        
    if coeffs is None:
        coeffs = nump.ones((data.shape[0],))
    
    twoSigmaSquared = 2 * sigma ** 2
    
    data = data[nump.newaxis, :, :]
    x = x[:, nump.newaxis, :]
    differences = x - data
    squaredDifferences = nump.sum(nump.square(differences), axis=2)
    gaussian = nump.exp(-(1 / twoSigmaSquared) * squaredDifferences)
    p = nump.sum(coeffs*gaussian,axis=1)
    
    dp = -1*nump.sum(differences * ((coeffs*gaussian)[:, :, nump.newaxis]) * twoSigmaSquared,axis=1)
    
    return p, dp

def getApproximateParzenValues(data:nump.ndarray,sigma,voxelSize):

    newData = getUniqueRows(nump.floor(data/voxelSize)*voxelSize+voxelSize/2)[0]
    
    nMat = nump.exp(-1*spatDistance.squareform(nump.square(spatDistance.pdist(newData)))/(4*sigma**2))
    mMat = nump.exp(-1 * nump.square(spatDistance.cdist(newData,data)) / (4 * sigma ** 2))
    cMat = nump.linalg.solve(nMat,mMat)
    coeffs = nump.sum(cMat,axis=1)
    coeffs = data.shape[0]*coeffs/sum(coeffs)
    
    return newData,coeffs

def getUniqueRows(x):
    y = nump.ascontiguousarray(x).view(nump.dtype((nump.void, x.dtype.itemsize * x.shape[1])))
    _, inds,indsInverse,counts = nump.unique(y, return_index=True,return_inverse=True,return_counts=True)

    xUnique = x[inds]
    return xUnique,inds,indsInverse,counts

def getGradientDescent(data,sigma,repetitions=1,stepSize=None,clusteringType='v',recalculate=False,returnHistory=False,stopCondition=True,voxelSize=None):
    
    n = data.shape[0]

    useApproximation = (voxelSize is not None)
    
    if stepSize is None:
        stepSize = sigma/10
    
    if clusteringType == 'v':
        gradientFunction = getVGradient
    elif clusteringType == 's':
        gradientFunction = getSGradient
    else:
        gradientFunction = getPGradient

    if useApproximation:
        newData, coeffs = getApproximateParzenValues(data, sigma, voxelSize)
    else:
        coeffs = None

    if recalculate:
        if useApproximation:
            x = nump.vstack((data,newData))
            data = x[data.shape[0]:]
        else:
            x = data
    else:
        if useApproximation:
            x = data
            data = newData
        else:
            x = data.copy()
        
        
    if returnHistory:
        xHistory = nump.zeros((n,x.shape[1],repetitions+1))
        xHistory[:,:,0] = x[:n,:].copy()
        
    if stopCondition:
        prevX = x[:n].copy()

    for i in range(repetitions):
        if ((i>0) and (i%10==0)):
            if stopCondition:
                if nump.all(nump.linalg.norm(x[:n]-prevX,axis=1) < nump.sqrt(3*stepSize**2)):
                    i = i-1
                    break
                prevX = x[:n].copy()
            
        f,df = gradientFunction(data,sigma,x,coeffs)
        df = df/nump.linalg.norm(df,axis=1)[:,nump.newaxis]
        x[:] = x + stepSize*df

        if returnHistory:
            xHistory[:, :, i+1] = x[:n].copy()
            
    x = x[:n]

    if returnHistory:
        xHistory = xHistory[:,:,:(i+2)]
        return x,xHistory
    else:
        return x

def PerformFinalClusteringAlgo(data,stepSize):
    clusters = nump.zeros((data.shape[0]))
    i = nump.array([0])
    c = 1  # 0 marks "not yet assigned", so cluster labels start at 1
    spatDistances = spatDistance.squareform(spatDistance.pdist(data))
    while i.shape[0]>0:
        i = i[0]
        inds = nump.argwhere(clusters==0)
        clusters[inds[spatDistances[i,inds] <= 3*stepSize]] = c
        c += 1
        i = nump.argwhere(clusters==0)
    return clusters

def displayClusteringValues(xHistory,clusters=None):

    plot.ion()
    plot.figure(figsize=(20, 12))
    if clusters is None:
        clusters = nump.zeros((xHistory.shape[0],))
    if xHistory.shape[1] == 1:

        sc = plot.scatter(xHistory[:,:,0],xHistory[:,:,0]*0,c=clusters,s=10)
        plot.xlim((nump.min(xHistory),nump.max(xHistory)))
        plot.ylim((-1,1))
        for i in range(xHistory.shape[2]):
            sc.set_offsets(xHistory[:, :, i])
            plot.title('step #' + str(i) + '/' + str(xHistory.shape[2]-1))
            plot.pause(0.05)
    elif xHistory.shape[1] == 2:

        sc = plot.scatter(xHistory[:, 0, 0], xHistory[:, 1, 0] , c=clusters, s=20)
        plot.xlim((nump.min(xHistory[:,0,:]), nump.max(xHistory[:,0,:])))
        plot.ylim((nump.min(xHistory[:, 1, :]), nump.max(xHistory[:, 1, :])))
        for i in range(xHistory.shape[2]):
            sc.set_offsets(xHistory[:, :, i])
            plot.title('step #' + str(i) + '/' + str(xHistory.shape[2]-1))
            plot.pause(0.2)
    else:
        if xHistory.shape[1] > 3:
            pca = decomPCA(3)
            pca.fit(xHistory[:,:,0])
            newXHistory = nump.zeros((xHistory.shape[0],3,xHistory.shape[2]))
            for i in range(xHistory.shape[2]):
                newXHistory[:,:,i] = pca.transform(xHistory[:,:,i])
            xHistory = newXHistory


        ax = plot.axes(projection='3d')
        sc = ax.scatter(xHistory[:, 0, 0], xHistory[:, 1, 0],xHistory[:, 2, 0], c=clusters, s=20)
        ax.set_xlim((nump.min(xHistory[:, 0, :]), nump.max(xHistory[:, 0, :])))
        ax.set_ylim((nump.min(xHistory[:, 1, :]), nump.max(xHistory[:, 1, :])))
        ax.set_zlim((nump.min(xHistory[:, 2, :]), nump.max(xHistory[:, 2, :])))
        for i in range(xHistory.shape[2]):
            sc._offsets3d =  (nump.ravel(xHistory[:, 0, i]),nump.ravel(xHistory[:, 1, i]),nump.ravel(xHistory[:, 2, i]))
            plot.gcf().suptitle('step #' + str(i) + '/' + str(xHistory.shape[2]-1))
            plot.pause(0.01)
            
    plot.show()

In [None]:
import numpy as nump
import quantum_cluster



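# Read only the four numeric feature columns; this assumes the file has no
# header row and that any species label sits in a fifth column
data_res = nump.loadtxt('iris_input.csv', delimiter=',', usecols=(0, 1, 2, 3))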

print(data_res)
data_res = data_res[:,:4]

sigma_val=0.55
repetitionsVal=100
stepSizeVal=0.1
clusteringTypeVal='v'
isRecalculate=False
isReturnHistory=True
isStopCondition=True
voxelSizeVal = None

xval,xHistoryVal = quantum_cluster.getGradientDescent(data_res,sigma=sigma_val,repetitions=repetitionsVal,stepSize=stepSizeVal,clusteringType=clusteringTypeVal,recalculate=isRecalculate,returnHistory=isReturnHistory,stopCondition=isStopCondition,voxelSize=voxelSizeVal)

clusters_res = quantum_cluster.PerformFinalClusteringAlgo(xval,stepSizeVal)

quantum_cluster.displayClusteringValues(xHistoryVal,clusters_res)

**Quantum Manifold Embedding** 

In quantum manifold embedding, the manifold is made up of sections that are Euclidean spaces. These Euclidean spaces represent the quantum state in a quantum system. A Euclidean space of m dimensions is described by m vectors. [1]


**References**

[1] Bhagvan Kommadi