# Linear regression

Here we describe linear regression using annealing.

Reference: https://arxiv.org/abs/2008.02355

### Loss function for linear regression

Let $\boldsymbol{x}$ be a data vector with $n$ variables, and let $y$ be the target value we want to predict from it.  

Assuming that $y$ can be predicted as a weighted sum of the variables, we look for a weight vector $\boldsymbol{w}$ that satisfies the following.

$$
\boldsymbol{x}^T \cdot \boldsymbol{w} = y
$$

Next, suppose we have a training data set $X$ (an $m \times n$ matrix) consisting of $m$ data points, and target data $Y$ (an $m$-dimensional vector).  
The goal of linear regression is to find a single weight vector $\boldsymbol{w}$ that predicts the corresponding $y$ for every data point $\boldsymbol{x}$ in $X$.

This becomes a function minimization problem as follows.

$$
\min_{\boldsymbol{w}} E(\boldsymbol{w}) =  || X\boldsymbol{w}  - Y ||^2
$$

Expanding $E(\boldsymbol{w})$ gives

$$
\min_{\boldsymbol{w}} E(\boldsymbol{w}) = \boldsymbol{w}^T X^T X \boldsymbol{w} - 2\boldsymbol{w}^T X^T Y + Y^T Y
$$
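
The expansion can be checked numerically. The following is a minimal sketch on random data; the sizes and the seed are arbitrary choices made for illustration and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative only
m, n = 50, 3                    # hypothetical data set size

X = rng.random((m, n))
Y = rng.random(m)
w = rng.random(n)

# Left-hand side: squared norm of the residual
lhs = np.sum((X @ w - Y) ** 2)

# Right-hand side: the expanded quadratic form
rhs = w @ X.T @ X @ w - 2 * w @ X.T @ Y + Y @ Y

print(np.isclose(lhs, rhs))
```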

To encode the weights in qubits, write $\boldsymbol{w} = P\hat{\boldsymbol{w}}$, where $P$ is a fixed encoding matrix.  
$\hat{\boldsymbol{w}}$ is a binary vector whose entries in $\{0, 1\}$ are the qubit measurement results.  
We also omit the constant term $Y^T Y$, which does not affect the minimization.

$$
\min_{\boldsymbol{w}} E(\boldsymbol{w}) = \hat{\boldsymbol{w}}^T P^T X^T X P\hat{\boldsymbol{w}} - 2\hat{\boldsymbol{w}}^T P^T X^T Y
$$

Then it can be reduced to a general QUBO problem, which can be solved by annealing.

$$
\begin{aligned}
\min_{\boldsymbol{w}} E(\boldsymbol{w}) &= \hat{\boldsymbol{w}}^T A \hat{\boldsymbol{w}} + \hat{\boldsymbol{w}}^T \boldsymbol{b} \\
A &= P^T X^T X P \\
\boldsymbol{b} &= -2P^T X^T Y
\end{aligned}
$$
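
As a sanity check, the QUBO energy should differ from the original loss only by the dropped constant $Y^T Y$. The sketch below assumes one particular encoding matrix `P` (two binary fractions per weight) and arbitrary random data; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 40, 3, 2  # illustrative sizes: data points, variables, bits per weight

X = rng.random((m, n))
Y = rng.random(m)

# Assumed encoding: each weight is a sum of K binary fractions (1/2, 1/4, ...)
p = np.array([2.0 ** (-k - 1) for k in range(K)])
P = np.kron(np.eye(n), p)  # shape (n, n*K)

A = P.T @ X.T @ X @ P
b = -2 * P.T @ X.T @ Y

# A random binary vector and the weights it decodes to
w_hat = rng.integers(0, 2, size=n * K)
w = P @ w_hat

qubo_energy = w_hat @ A @ w_hat + w_hat @ b
loss = np.sum((X @ w - Y) ** 2)

# The two agree once the omitted constant Y^T Y is added back
print(np.isclose(qubo_energy + Y @ Y, loss))
```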

It is worth noting that the size of the QUBO matrix does not depend on the number of data points in the data set.  
The matrix is square, and its side length is (the number of variables in the data) $\times$ (the number of qubits used to encode each weight value).  
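
The independence from the number of data points is easy to see in code. In this sketch the helper name `qubo_matrices` and the sizes are hypothetical; only the shapes of the results matter.

```python
import numpy as np

def qubo_matrices(X, Y, P):
    """Return A = P^T X^T X P and b = -2 P^T X^T Y."""
    A = P.T @ X.T @ X @ P
    b = -2 * P.T @ X.T @ Y
    return A, b

rng = np.random.default_rng(3)
n = 3                                          # number of variables
P = np.kron(np.eye(n), np.array([0.5, 0.25]))  # assumed 2 bits per weight

shapes = {}
for m in (10, 10_000):                         # very different data set sizes
    X = rng.random((m, n))
    Y = rng.random(m)
    A, b = qubo_matrices(X, Y, P)
    shapes[m] = (A.shape, b.shape)

print(shapes)
```

Both data set sizes yield a $6 \times 6$ matrix, since only the $n \times n$ product $X^T X$ enters $A$.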

Let's implement this using blueqat.

In [73]:
from blueqat.wq import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

The first step is to create a data set.  
Here we fix the true weights in advance so that the estimates can be checked against them.  
We generate the data set $X$ at random and compute the target data $Y$ as the weighted sum of each row plus Gaussian noise.

In [165]:
w = np.array([0.25, 0.75, 0.5])
X = np.random.rand(100, 3)

y = X @ w + np.random.normal(scale = 0.05, size = X.shape[0])

Let's do a classical linear regression using scikit-learn.  
Although there is a slight deviation due to the addition of noise to the target data $Y$, the predetermined weights can be estimated with good accuracy.

In [166]:
from sklearn import linear_model

skmodel = linear_model.LinearRegression()
skmodel.fit(X, y)
w_sk = skmodel.coef_
print("Predicted weight:", w_sk)
print("True weight:", w)

Predicted weight: [0.23514126 0.72070147 0.48488868]
True weight: [0.25 0.75 0.5 ]
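
For comparison, the same least-squares problem can be solved directly with NumPy. This is a sketch on freshly generated data with a fixed seed, so the numbers will not match the output above. Note that scikit-learn's `LinearRegression` fits an intercept by default (`fit_intercept=True`), whereas the model in this section has no intercept term.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative only
w_true = np.array([0.25, 0.75, 0.5])
X = rng.random((100, 3))
y = X @ w_true + rng.normal(scale=0.05, size=X.shape[0])

# Least-squares solution of min_w ||Xw - y||^2 without an intercept
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer the gradient 2 X^T (Xw - y) vanishes
gradient = 2 * X.T @ (X @ w_ls - y)

print("Least-squares weight:", w_ls)
print("Gradient norm:", np.linalg.norm(gradient))
```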


Next, we perform linear regression using annealing.  

For annealing, each weight in $\boldsymbol{w}$ is encoded with several qubits.  
The set of values that can be represented is determined by the transformation matrix $P$ in the formula above and can be chosen freely.  

Here we simply encode each weight parameter with two qubits, so each weight can take one of the four values $[0, 0.25, 0.5, 0.75]$.  
The total number of qubits is $(\text{number of qubits per weight}) \times (\text{number of variables}) = 2 \times 3 = 6$.
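
The four representable values follow from the bit weights $2^{-1}$ and $2^{-2}$. A minimal sketch that enumerates them:

```python
import itertools
import numpy as np

K = 2  # qubits per weight
p = np.array([2.0 ** (-k - 1) for k in range(K)])  # bit weights [0.5, 0.25]

# Every combination of K bits, decoded to the value it represents
values = sorted(float(np.dot(bits, p)) for bits in itertools.product([0, 1], repeat=K))
print(values)  # [0.0, 0.25, 0.5, 0.75]
```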

In [167]:
K = 2  # number of qubits (bits) per weight
d = 3  # number of features

p = [2 ** (-i-1) for i in range(K)]  # bit weights [0.5, 0.25]
I = np.eye(d)
P = np.kron(I, p)  # encoding matrix of shape (d, d*K)

In [169]:
A = np.dot(np.dot(P.T, X.T), np.dot(X, P))
b = -2 * np.dot(np.dot(P.T, X.T), y)
QUBO = A + np.diag(b)
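
Adding $\boldsymbol{b}$ to the diagonal of $A$ is valid because the variables are binary. Each $\hat{w}_i \in \{0, 1\}$ satisfies $\hat{w}_i^2 = \hat{w}_i$, so the linear term can be written as a quadratic one:

$$
\hat{\boldsymbol{w}}^T \boldsymbol{b} = \sum_i b_i \hat{w}_i = \sum_i b_i \hat{w}_i^2 = \hat{\boldsymbol{w}}^T \mathrm{diag}(\boldsymbol{b}) \hat{\boldsymbol{w}}
$$

Hence the whole objective is a single quadratic form:

$$
\hat{\boldsymbol{w}}^T A \hat{\boldsymbol{w}} + \hat{\boldsymbol{w}}^T \boldsymbol{b} = \hat{\boldsymbol{w}}^T \left( A + \mathrm{diag}(\boldsymbol{b}) \right) \hat{\boldsymbol{w}}
$$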

In [320]:
from collections import defaultdict
import dimod

Q = defaultdict(float)  # QUBO coefficients keyed by (i, j)

for i in range(QUBO.shape[0]):
    for j in range(QUBO.shape[1]):
        Q[(i, j)] += QUBO[i, j]

bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
h, J_dict, offset = bqm.to_ising()  # linear biases, couplings, constant offset

n_spins = len(h)
J = np.zeros((n_spins, n_spins))
for i in range(n_spins):
    J[i, i] = h[i]  # linear biases go on the diagonal

for i in range(n_spins):
    for j in range(i + 1, n_spins):
        # couplings go in the upper triangle; the key may be stored in either order
        J[i, j] = J_dict.get((i, j), J_dict.get((j, i), 0.0))
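
The QUBO-to-Ising conversion can also be written out by hand. The sketch below uses the standard substitution $x_i = (s_i + 1)/2$ with $s_i \in \{-1, +1\}$ and checks it by enumeration on a small random matrix. The function name `qubo_to_ising` is hypothetical, and this is not a description of dimod's internals or of the exact layout blueqat expects for `J`.

```python
import itertools
import numpy as np

def qubo_to_ising(Q):
    """Convert x^T Q x (x in {0,1}) to h.s + sum_{i<j} J_ij s_i s_j + offset (s in {-1,+1})."""
    Q = np.asarray(Q, dtype=float)
    h = (Q.sum(axis=0) + Q.sum(axis=1)) / 4.0  # linear biases
    J = np.triu(Q + Q.T, k=1) / 4.0            # couplings, strictly upper triangular
    offset = (Q.sum() + np.trace(Q)) / 4.0     # constant energy shift
    return h, J, offset

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 4))                    # small random QUBO, illustrative only
h, J, offset = qubo_to_ising(Q)

# Compare both energies on every one of the 2**4 configurations
max_diff = 0.0
for bits in itertools.product([0, 1], repeat=4):
    x = np.array(bits)
    s = 2 * x - 1
    e_qubo = x @ Q @ x
    e_ising = h @ s + s @ J @ s + offset
    max_diff = max(max_diff, abs(e_qubo - e_ising))

print("Largest energy mismatch:", max_diff)
```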

In [340]:
a = Opt()

a.J = J
res = a.run(shots = 10, sampler = 'fast')

In [341]:
res

[[0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 0],
 [1, 0, 1, 1, 0, 1],
 [1, 0, 1, 0, 1, 0],
 [0, 1, 1, 1, 1, 0]]
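
Not every shot returns the same bit string, so it can be safer to pick the sample with the lowest QUBO energy rather than relying on the first one. A minimal sketch, where `best_sample` is a hypothetical helper and the toy matrix is made up for illustration:

```python
import numpy as np

def best_sample(samples, Q):
    """Return the sample with the lowest energy x^T Q x, together with that energy."""
    samples = np.asarray(samples)
    energies = np.einsum("si,ij,sj->s", samples, Q, samples)  # one energy per sample
    idx = int(np.argmin(energies))
    return samples[idx], float(energies[idx])

# Toy example: energies are 0, -1 and 0 for the three samples
Q_toy = np.array([[-1.0, 2.0],
                  [0.0, -1.0]])
samples_toy = [[1, 1], [1, 0], [0, 0]]

x_best, e_best = best_sample(samples_toy, Q_toy)
print(x_best, e_best)
```

In this notebook the call would be `best_sample(res, QUBO)`.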

Convert the result of annealing into the value of the weight parameter and compare it with the value that was originally set.

In [342]:
w_qa = P @ res[0]

print("Predicted weight:", w_qa)
print("True weight:", w)

Predicted weight: [0.25 0.75 0.5 ]
True weight: [0.25 0.75 0.5 ]
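
With only six binary variables, the annealing result can be cross-checked by enumerating all $2^6 = 64$ bit strings. The sketch below regenerates data with a fixed seed (an assumption, so it does not reproduce the exact data set used above) and searches for the lowest-energy configuration exhaustively.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative only
w_true = np.array([0.25, 0.75, 0.5])
X = rng.random((100, 3))
y = X @ w_true + rng.normal(scale=0.05, size=X.shape[0])

K, d = 2, 3
P = np.kron(np.eye(d), np.array([2.0 ** (-k - 1) for k in range(K)]))
Q = P.T @ X.T @ X @ P + np.diag(-2 * P.T @ X.T @ y)  # same construction as QUBO above

def energy(bits):
    """QUBO energy x^T Q x of one bit string."""
    x = np.array(bits)
    return float(x @ Q @ x)

# Enumerate all 2**(d*K) bit strings and keep the lowest-energy one
best_bits = min(itertools.product([0, 1], repeat=d * K), key=energy)
w_best = P @ np.array(best_bits)

print("Ground-state bits:", best_bits)
print("Decoded weight:", w_best)
```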


The weights were recovered exactly.  
This is possible only because the encoding grid $[0, 0.25, 0.5, 0.75]$ was chosen to contain the originally set weights $\boldsymbol{w}$.  

In practice, the true weights are unknown and generally do not lie on the grid, so the estimate carries an encoding (quantization) error.  
Reducing this error requires more qubits per weight.
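
The trade-off between qubit count and encoding error can be sketched as follows. The helper `nearest_encodable` is hypothetical and assumes the same binary-fraction encoding as above, where $K$ bits give a grid with spacing $2^{-K}$ on $[0, 1 - 2^{-K}]$. The weight 0.3 is an arbitrary off-grid example.

```python
import numpy as np

def nearest_encodable(w, K):
    """Closest value to w that K binary-fraction bits can represent."""
    step = 2.0 ** (-K)                                   # grid spacing
    level = np.clip(np.round(w / step), 0, 2 ** K - 1)   # nearest grid index
    return float(level * step)

w_example = 0.3  # a weight that is not on the 2-bit grid
for K in (2, 4, 6, 8):
    w_enc = nearest_encodable(w_example, K)
    print(f"K={K}: encoded={w_enc}, error={abs(w_enc - w_example):.6f}")
```

For a weight inside the representable range the error is at most $2^{-(K+1)}$, while the QUBO grows to $dK \times dK$, so each added bit halves the worst-case error at the cost of $d$ more qubits.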