Notebook version 1.0, 31 Aug 2021. Written by Otto Salmenkivi / CSC - IT Center for Science Ltd. otto.salmenkivi@gmail.com

Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
***
# A quantum solution for the housing market, CSC and OP Lab


In this notebook we use the Quantum Amplitude Estimation (QAE) algorithm to predict prices on the Finnish housing market. In a nutshell, we have data on the average price per square metre in different postal code areas within Helsinki from 2010 to 2020. We model the average price for 2021 as the average price of 2020 multiplied by an annual growth ratio. From the data, we can recover probability distributions for these two variables, from which we can calculate an expected value for the prediction. A classical calculation gives a value of 5009 EUR. While the problem is somewhat trivial, it serves as a good quantum demonstration.

This example is a good candidate for QAE, because classically it could be solved with Monte Carlo methods. QAE promises quadratically faster convergence to the exact value with respect to the number of samples.
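
To make the "quadratic" claim concrete, the sketch below compares two standard error scalings for estimating a probability $a$: the standard error $\sqrt{a(1-a)/N}$ of a classical Monte Carlo estimate from $N$ samples, and the error bound $2\pi\sqrt{a(1-a)}/M + \pi^2/M^2$ given by Brassard et al. for QAE with $M$ applications of the operator $Q$ (a bound that holds with probability at least $8/\pi^2$). The function names are illustrative and not part of this notebook, and equating $N$ with $M$ is only a rough way of comparing costs.

```python
import math

def mc_standard_error(a, n_samples):
    """Standard error of a classical Monte Carlo estimate of a probability a."""
    return math.sqrt(a * (1 - a) / n_samples)

def qae_error_bound(a, big_m):
    """QAE error bound of Brassard et al. (case k = 1) for M = big_m."""
    return 2 * math.pi * math.sqrt(a * (1 - a)) / big_m + math.pi**2 / big_m**2

# Compare both quantities for a growing budget
for budget in (32, 1024, 32768):
    print(budget, mc_standard_error(0.5, budget), qae_error_bound(0.5, budget))
```

Quadrupling the budget halves the Monte Carlo error, whereas the QAE bound shrinks by roughly a factor of four.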

QAE was first introduced by [Brassard et al.][1] in 2000. In the financial sector, the use of QAE has been demonstrated for example in [risk analysis][2] and [option pricing][3].

[1]: https://arxiv.org/abs/quant-ph/0005055
[2]: https://www.nature.com/articles/s41534-019-0130-6
[3]: https://arxiv.org/abs/1905.02666


## Import and process housing market data

In [None]:
import csv
import numpy as np
import matplotlib.pyplot as plt
import time

We import the data from CSV files.

In [None]:
# Importing data from CSV files and creating arrays
hki_2020 = []
with open('Hki_hintadata_2020.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    line_count = 0
    for row in csv_reader:
        hki_2020.append(float(row[0]))
        line_count += 1
    print(f'Processed {line_count} lines.')
hki_2020 = np.array(hki_2020)

hki_growth = []
with open('Hki_kasvudata.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    line_count = 0
    for row in csv_reader:
        hki_growth.append(float(row[0].replace(',','.')))
        line_count += 1
    print(f'Processed {line_count} lines.')
hki_growth = np.array(hki_growth)

We bin the data for each variable into eight bins.

In [None]:
# Function to bin the data, np.histogram could also be used
def create_bins(array, nbbins):
    minimum = min(array)
    maximum = max(array)
    breakpoints = np.linspace(minimum, maximum, nbbins, endpoint = False)
    counts = np.zeros((nbbins))
    
    for value in array:
        for i in range(len(breakpoints)):
            if value >= breakpoints[len(breakpoints)-1-i]:
                counts[len(counts)-1-i] += 1
                break
    return np.array([breakpoints, counts])
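
As the comment in the cell above notes, `np.histogram` could do the same job. The following is a minimal sketch of that alternative on hypothetical toy data (not the Helsinki data); `bin_with_histogram` is an illustrative name. Unlike `create_bins`, it returns normalised probabilities directly.

```python
import numpy as np

def bin_with_histogram(values, nb_bins):
    """Return the lower bin edges and the normalised bin probabilities."""
    counts, edges = np.histogram(values, bins=nb_bins)
    return edges[:-1], counts / counts.sum()

# Hypothetical toy data, not the Helsinki price data
sample = np.array([1.0, 1.2, 2.5, 3.1, 3.3, 4.9, 5.0, 8.7, 9.0])
lower_edges, probs = bin_with_histogram(sample, 4)
print(lower_edges)   # lower edge of each of the four equal-width bins
print(probs)         # fraction of the samples falling in each bin
```

Both approaches use equal-width bins between the minimum and maximum, with the maximum value counted in the last bin.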

In [None]:
# Binning the data and counting corresponding probabilities
price_bins = create_bins(hki_2020,8)
sum_price_freqs = sum(price_bins[1])
price_probs = [freq/sum_price_freqs for freq in price_bins[1]]

growth_bins = create_bins(hki_growth,8)
sum_growth_freqs = sum(growth_bins[1])
growth_probs = [freq/sum_growth_freqs for freq in growth_bins[1]]

print('--These values are used for numerical analysis--')
print(f'Growth function constant : {growth_bins[0][0]+ (growth_bins[0][1]-growth_bins[0][0])/2}')
print(f'Growth function coefficient: {(growth_bins[0][1]-growth_bins[0][0])}')
print(f'Price function constant : {price_bins[0][0]+ (price_bins[0][1]-price_bins[0][0])/2}')
print(f'Price function coefficient: {(price_bins[0][1]-price_bins[0][0])}')

For illustrative purposes, we create pretty figures for each variable.

In [None]:
# plot for 2020 price bins

#strings for states
states = [bin(i)[2:].zfill(3).join(('|','>')) for i in range(8)]

# Weights to normalise the histograms.
priceweights = np.ones(len(hki_2020))/sum_price_freqs
growthweights = np.ones(len(hki_growth))/sum_growth_freqs

plt.figure()
plt.hist(hki_2020,bins = 8,
         weights = priceweights,
         color = 'lightskyblue',
         ec = 'black')
plt.ylim(0,0.32)
plt.ylabel('Probability',fontsize = 13)
plt.xlabel(r'Average price per square metre in euros', fontsize = 13)


plt.plot(price_bins[0]+799/2,price_probs, marker = 'o',c = 'black', ls= '--', lw ='1')
plt.text(price_bins[0][0]+799/2 - 250, price_probs[0]-0.02, states[0])
plt.text(price_bins[0][1]+799/2 - 250, price_probs[1]+0.01, states[1])
plt.text(price_bins[0][2]+799/2 - 250, price_probs[2]-0.02, states[2])
plt.text(price_bins[0][3]+799/2 - 250, price_probs[3]+0.01, states[3])
plt.text(price_bins[0][4]+799/2 - 250, price_probs[4]-0.02, states[4])
plt.text(price_bins[0][5]+799/2 - 250, price_probs[5]+0.01, states[5])
plt.text(price_bins[0][6]+799/2 - 250, price_probs[6]+0.015, states[6])
plt.text(price_bins[0][7]+799/2 - 250, price_probs[7]+0.02, states[7])

for i in range(8):
    lowertext = '{:.0f}'.format(price_bins[0][i])
    uppertext = '{:.0f}'.format(price_bins[0][i]+798)
    text = '-\n'.join((lowertext,uppertext))
    plt.text(price_bins[0][i]+799/2 - 230,0.007,text, fontsize = '9')

#plt.savefig('price_figure', dpi = 300)
plt.show()

# Plot for 2010-2020 growth bins
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)

plt.figure()
plt.hist(hki_growth,
         bins = 8,
         weights = growthweights,
         color = 'orange',
         ec = 'black')

plt.ylim(0,0.47)
plt.ylabel('Probability', fontsize = 13)
plt.xlabel('Growth factor', fontsize = 13)
plt.tick_params(axis='both', which='major', labelsize='11')

plt.plot(growth_bins[0]+0.051518362/2,growth_probs, marker = 'o',c = 'black', ls= '--', lw ='1')
for i, state in enumerate(states):
    if i == 2 or i == 5:
        plt.text(growth_bins[0][i]+0.01, growth_probs[i]-0.04, state)
        continue
    if i == 1:
        plt.text(growth_bins[0][i]+0.01, growth_probs[i]+0.035, state)
        continue
    plt.text(growth_bins[0][i]+0.01, growth_probs[i]+0.02, state)
plt.gca().xaxis.set_minor_locator(MultipleLocator(0.02))  # gca() reuses the current axes; plt.axes() would add a new one
#plt.savefig('growth_factor_figure', dpi = 300)
plt.show()

The variables are linearly mapped to the quantum states. This also allows for the construction of the function for the 2021 price, and consequently its normalisation to [0,1], as required for the QAE algorithm.

In [None]:
# The mapping used for discrete qubit states
def price_mapping(growth_var,price_var):
    # first order approximation of the final value based on mapping
    return 1817.1710025 + 284.12916*growth_var + 826.066125*price_var
def scaled_mapping(growth_var, price_var):
    fmin = price_mapping(0,0)
    fmax = price_mapping(7,7)
    return (price_mapping(growth_var, price_var)-fmin)/(fmax-fmin)

print(f'Original values between {price_mapping(0,0)} and {price_mapping(7,7)} are mapped linearly ',
      f'to values between {scaled_mapping(0,0)} and {scaled_mapping(7,7)}')
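
One way to read the "first order approximation" in `price_mapping` is as a linearisation of the product of the two binned variables. As a sketch, write the 2020 price as $P(y) = P_0 + \Delta_P\, y$ and the growth factor as $G(x) = G_0 + \Delta_G\, x$ for bin indices $x, y \in \{0,\dots,7\}$. These symbols are introduced here for illustration only; the notebook gives just the resulting numbers, and the expansion point $(x_0, y_0)$ is an assumption. Then

$$
V_{2021}(x,y) = G(x)\,P(y) \approx G(x_0)\,P(y_0) + \Delta_G\,P(y_0)\,(x-x_0) + \Delta_P\,G(x_0)\,(y-y_0),
$$

which is linear in $x$ and $y$, the same form as `price_mapping`. The only term dropped is $\Delta_G\,\Delta_P\,(x-x_0)(y-y_0)$.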

# Quantum algorithm
Next up, we build the QAE circuit.

In [None]:
from qat.lang.AQASM import Program, QRoutine, AbstractGate, RY, H, CNOT, build_gate
from qat.lang.AQASM.qftarith import QFT
from qat.qpus import LinAlg

### 1. Distribution loading based on housing data

In [None]:
# Three qubits for each variable
nb_growth_qbits = 3
nb_price_qbits = 3

In [None]:
# We could use these functions to define real circuits for distribution loading,
# but this time we cheat a little by using the matrix formulation.

@build_gate('GROWTH',[])
def growth_routine() -> QRoutine:
    probs = growth_probs
    nb_qbits = nb_growth_qbits
    amps = [np.sqrt(prob) for prob in probs]

    # Initialise a matrix with amps as first column
    matrix = np.identity(2**nb_qbits)
    for i,prob in enumerate(probs):
        matrix[i][0] = amps[i]
    
    # QR decomposition of the matrix to get a unitary. Its first column equals the
    # amplitudes up to an overall sign, which is only a global phase.
    dist_matrix = np.linalg.qr(matrix)[0]

    # Turn that matrix into a gate
    dist_gate = AbstractGate('GROWTH2',[],arity = nb_qbits, matrix_generator = lambda: dist_matrix)
    
    # Apply that gate in a routine
    rout = QRoutine()
    rout.apply(dist_gate(), range(nb_qbits))
    return rout

@build_gate('PRICE',[])
def price_routine() -> QRoutine:
    probs = price_probs
    nb_qbits = nb_price_qbits
    
    amps = [np.sqrt(prob) for prob in probs]
    
    # Initialise a matrix with amps as first column
    matrix = np.identity(2**nb_qbits)
    for i,prob in enumerate(probs):
        matrix[i][0] = amps[i]
    
    # QR decomposition of the matrix to get unitary
    dist_matrix = np.linalg.qr(matrix)[0]
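    # Checking for unitarity (dist_matrix is real, so its transpose is its adjoint)
    assert np.allclose(dist_matrix.T @ dist_matrix, np.identity(2**nb_qbits))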

    # Turn that matrix into a gate
    dist_gate = AbstractGate('PRICE2',[],arity = nb_qbits, matrix_generator = lambda: dist_matrix)
    
    # Apply that gate in a routine
    rout = QRoutine()
    rout.apply(dist_gate(), range(nb_qbits))
    return rout
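
The QR trick used in both routines can be checked in isolation with plain NumPy. The sketch below uses hypothetical probabilities for a two-qubit register (not the housing data) and confirms that the resulting matrix is unitary and that applying it to the all-zero state reproduces the target distribution.

```python
import numpy as np

# Hypothetical probabilities for a two-qubit register (not the housing data)
probs = np.array([0.1, 0.2, 0.3, 0.4])
amps = np.sqrt(probs)

# Identity matrix with the amplitude vector as its first column
matrix = np.identity(len(probs))
matrix[:, 0] = amps

# The Q factor of the QR decomposition is orthogonal, i.e. a real unitary
unitary = np.linalg.qr(matrix)[0]

# Acting on |00> = (1, 0, 0, 0) simply selects the first column
loaded_state = unitary @ np.array([1.0, 0.0, 0.0, 0.0])
print(loaded_state**2)   # measurement probabilities of the loaded state
```

The first column may come out as `-amps` rather than `amps`; the squared amplitudes, and hence the measurement probabilities, are the same either way.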

### 2. The objective function

This structure is responsible for multiplying the variables. The key is to rotate an additional objective qubit based on the distributions. The probability of measuring the objective qubit in state |1>, which we call $a$, encodes the information we want.
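
As a sketch of why this works, assume the usual convention $R_Y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$ and write $\tilde f(x,y) \in [0,1]$ for the scaled mapping of a basis state $|x\rangle|y\rangle$ (the symbol $\tilde f$ is introduced here for illustration). If the rotations on the objective qubit add up to $\theta = 2\left[c\,(\tilde f - \tfrac12) + \tfrac{\pi}{4}\right]$, then

$$
P(1 \mid x,y) = \sin^2\!\left(c\,(\tilde f - \tfrac12) + \tfrac{\pi}{4}\right) = \tfrac12 + \tfrac12 \sin\!\left(2c\,(\tilde f - \tfrac12)\right) \approx \tfrac12 + c\,(\tilde f - \tfrac12).
$$

Averaging over the two loaded distributions gives

$$
a = \sum_{x,y} p(x)\,p(y)\,P(1 \mid x,y) \approx c\left(\mathbb{E}[\tilde f] - \tfrac12\right) + \tfrac12 ,
$$

so $a$ is, to first order, a linear function of the expected value we are after. The small-angle step is more accurate for smaller $c$.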

In [None]:
c = 1/2      #  c is a parameter needed in the estimation of the expected value, defined within (0,1]
f_min = price_mapping(0,0)
f_max = price_mapping(7,7)

@build_gate('F',[float,float,float])
def F_routine(c:float, f_min:float, f_max:float) -> QRoutine:
    # for clarity we define the controlled gate angles individually
    constant = (-c/2 + np.pi/4) *2
    # For x variable 
    xfactor = 0.036561027
    control0 = 4*xfactor*c *2
    control1 = 2*xfactor*c *2
    control2 = xfactor*c *2
    
    # For y variable
    yfactor = 0.1062961155
    control3 = 4*yfactor*c *2
    control4 = 2*yfactor*c *2
    control5 = yfactor*c *2
    
    rout = QRoutine()
    rout.apply(RY(constant), 6)
    rout.apply(RY(control0).ctrl(), 0, 6)
    rout.apply(RY(control1).ctrl(), 1, 6)
    rout.apply(RY(control2).ctrl(), 2, 6)
    rout.apply(RY(control3).ctrl(), 3, 6)
    rout.apply(RY(control4).ctrl(), 4, 6)
    rout.apply(RY(control5).ctrl(), 5, 6)
    
    return rout
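
The hard-coded `xfactor` and `yfactor` appear to be the coefficients of `price_mapping` divided by the full range `f_max - f_min`. The check below reproduces them under that assumption and evaluates the resulting probability of measuring the objective qubit in |1> for a single basis state. The constant and function names are illustrative, and the $R_Y(\theta)$ half-angle convention is assumed.

```python
import math

GROWTH_COEFF = 284.12916    # coefficient of growth_var in price_mapping
PRICE_COEFF = 826.066125    # coefficient of price_var in price_mapping
full_range = 7 * (GROWTH_COEFF + PRICE_COEFF)   # f_max - f_min

xfactor = GROWTH_COEFF / full_range
yfactor = PRICE_COEFF / full_range
print(xfactor, yfactor)

def objective_probability(x, y, c):
    """P(objective qubit = 1) for the basis state with bin indices x and y."""
    scaled = xfactor * x + yfactor * y          # scaled mapping in [0, 1]
    return math.sin(c * (scaled - 0.5) + math.pi / 4) ** 2

print(objective_probability(0, 0, 0.5), objective_probability(7, 7, 0.5))
```

The weights 4, 2, 1 on the controlled rotations in `F_routine` are consistent with reading qubit 0 of each register as the most significant bit of the bin index.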

### Trying out the distribution and objective

We can run the routines we have created so far to make sure they work as expected.

In [None]:
kok_prog = Program()   # Initialize a quantum program
kok_qbits = kok_prog.qalloc(7)   # Allocate 7 qubits
kok_prog.apply(price_routine(),kok_qbits[3:6])  # Apply first distribution
kok_prog.apply(growth_routine(), kok_qbits[0:3])  # And the second
kok_prog.apply(F_routine(c,f_min,f_max),kok_qbits) # Apply objective routine

kok_circ = kok_prog.to_circ() # turn into a circuit

#display the circuit
%qatdisplay kok_circ --depth 1

kok_result = LinAlg().submit(kok_circ.to_job(qubits= [6]))  # Run on quantum simulator

# Print results
print('Last qubit is measured in :')
for sample in kok_result:
    print(f'State: {sample.state},  amplitude: {sample.amplitude},  probability: {sample.probability}')
    
# Visualizing the initial measurement results
plt.figure()
states = [str(sample.state) for sample in kok_result]
probs = [sample.probability for sample in kok_result]
plt.bar(states,probs)
plt.ylabel('Probability')
plt.draw()

### A gate
The $A$ gate is defined as the total effect of the distribution loading and the objective routine, meaning the circuit above. To make the construction of the $Q$ gate easier, we define it next.

In [None]:
# Combine distribution and objective routines to a single routine A in order to build Q
@build_gate('A',[])
def A_routine() -> QRoutine:
    rout = QRoutine()
    if nb_growth_qbits > 0:rout.apply(growth_routine(),range(nb_growth_qbits))
    if nb_price_qbits > 0:rout.apply(price_routine(),range(nb_growth_qbits,nb_growth_qbits+nb_price_qbits))
    rout.apply(F_routine(c,f_min,f_max), range(nb_growth_qbits+nb_price_qbits+1))
    return rout

### 3. Quantum Phase Estimation circuit
QPE is a widely used tool in the world of quantum algorithms. For it we need additional evaluation qubits, the $Q$ gate and an inverse Quantum Fourier Transform.

In [None]:
# Choose number of evaluation qubits
nb_eval_qbits = 5

In [None]:
# A quick function to apply Hadamards to all eval qubits
def hadamard_all(nbqbits):
    rout = QRoutine()
    for i in range(nbqbits):
        rout.apply(H,i)
    return rout

In [None]:
# This function is used to run the A gate on a quantum simulator and to retrieve the amplitudes of the state psi,
# which is the state of the rest of the qubits when the last qubit is 0.
# This could not be implemented on a real quantum processor, since we take advantage of the simulated amplitudes.
    
def get_psi():
    prog = Program()
    qbits = prog.qalloc(nb_growth_qbits+nb_price_qbits+1)
    prog.apply(A_routine(), qbits)
    A_result = LinAlg().submit(prog.to_circ().to_job())
    statevector = A_result.statevector
    
    # Extracting psi
    psi = np.empty((2**(nb_growth_qbits+nb_price_qbits),1),dtype= complex)
    for i in range(0,2**(nb_growth_qbits+nb_price_qbits)):
        psi[i][0] = statevector[2*i]

    # Normalise the state vector
    psi_sum = sum([abs(i)**2 for i in psi])
    
    # Based on psi, we can actually retrieve the value of a, which we are evaluating in the first place
    print(f'Because we are simulating the circuit, we can calculate that the objective qubit is measured as:\n'
        f'0, with probability {psi_sum[0]}\n'
        f'1, with probability {1-psi_sum[0]}\n'
         f'The second value is the one we are trying to approximate.')
    for i in range(len(psi)):
        psi[i][0] = psi[i][0]/np.sqrt(psi_sum)
        
    return psi

In [None]:
# We calculate the matrices of the individual operations within Q = A * S_0 * A^dag * S_psi
def Q_routine(nb_eval_qbits: int) -> QRoutine:
    # S_0
    #Preparing states |0>_n and <0|_n
    ket_zero_n = np.zeros((2**(nb_growth_qbits+nb_price_qbits+1),1),dtype= complex)
    ket_zero_n[0][0] = 1
    bra_zero_n = np.conjugate(np.transpose(ket_zero_n))
    
    def S_0_matrix():
        return np.identity(2**(nb_growth_qbits+nb_price_qbits+1))-2*np.matmul(ket_zero_n,bra_zero_n)
    # Building a corresponding gate
    S_0_gate = AbstractGate('S_0',[],arity = nb_growth_qbits+nb_price_qbits+1, matrix_generator = S_0_matrix)
    
    # S_psi
    # Preparing the needed states |0> and <0| as numpy arrays
    ket_zero = np.array([[1],[0]],dtype=complex)
    bra_zero = np.conjugate(np.transpose(ket_zero))

    # Run a simulation of A to get psi 
    ket_psi = get_psi()
    bra_psi = np.conjugate(np.transpose(ket_psi))

    # Calculate the matrix
    def S_psi_matrix():
        return np.identity(2**(nb_growth_qbits+nb_price_qbits+1))-2*np.matmul(np.kron(ket_psi,ket_zero),np.kron(bra_psi,bra_zero))
    # Build corresponding gate
    S_psi_gate = AbstractGate(r'S_{\psi}',[],arity = nb_growth_qbits+nb_price_qbits+1, matrix_generator = S_psi_matrix)
    
    # Build individual Q gates based on power 2**j
    @build_gate('Q^j',[int],arity = nb_growth_qbits+nb_price_qbits+1)
    def Q_to_j(j: int) -> QRoutine:  # applies Q a total of 2**j times
        A_rout = A_routine()
        Qj_rout = QRoutine()
        i = 0
        while i < 2**j:
            Qj_rout.apply(S_psi_gate(),range(nb_growth_qbits+nb_price_qbits+1))
            Qj_rout.apply(A_rout.dag(),range(nb_growth_qbits+nb_price_qbits+1))
            Qj_rout.apply(S_0_gate(),range(nb_growth_qbits+nb_price_qbits+1))
            Qj_rout.apply(A_rout,range(nb_growth_qbits+nb_price_qbits+1))
            i += 1
        return Qj_rout
            
            
    # The routine of controlled Q gates
    rout = QRoutine()
    for j in range(nb_eval_qbits):
        rout.apply(Q_to_j(j).ctrl(), nb_growth_qbits + nb_price_qbits+1+j, range(nb_growth_qbits+nb_price_qbits+1))
    
    return rout
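
To see why phase estimation on $Q$ reveals $a$, it helps to look at the smallest possible case. The NumPy sketch below is a toy, not the notebook's circuit: it assumes a single objective qubit and no distribution qubits, so that $S_0$ and $S_\psi$ coincide, and a hypothetical value $a = 0.3$. In this setting $Q$ is a rotation whose eigenvalues are $e^{\pm 2i\theta}$ with $a = \sin^2\theta$.

```python
import numpy as np

a = 0.3                                  # hypothetical target probability
theta = np.arcsin(np.sqrt(a))

# Toy A: one objective qubit, A|0> = cos(theta)|0> + sin(theta)|1>
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Reflection I - 2|0><0|; with no distribution qubits S_0 and S_psi coincide
S_0 = np.identity(2) - 2 * np.outer([1, 0], [1, 0])
S_psi = S_0

Q = A @ S_0 @ A.conj().T @ S_psi

# The eigenvalues of Q are exp(+-2i*theta); recover a from the phase
phases = np.angle(np.linalg.eigvals(Q))
a_recovered = np.sin(np.abs(phases[0]) / 2) ** 2
print(phases, a_recovered)
```

QPE with $M = 2^m$ evaluation states estimates this phase as an integer $y$ with $y/M \approx \theta/\pi$, which is where the estimate $\sin^2(y\pi/M)$ used in the post-processing comes from.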

## Building the circuit
Finally, we build the whole quantum circuit.

In [None]:
prog = Program()

dist_qbits = prog.qalloc(nb_growth_qbits + nb_price_qbits)
obj_qbit = prog.qalloc(1)
eval_qbits = prog.qalloc(nb_eval_qbits)

prog.apply(A_routine(),dist_qbits, obj_qbit)

prog.apply(hadamard_all(nb_eval_qbits),eval_qbits)

prog.apply(Q_routine(nb_eval_qbits), dist_qbits, obj_qbit, eval_qbits)

prog.apply(QFT(nb_eval_qbits).dag(), eval_qbits)

In [None]:
# display the whole circuit
circ = prog.to_circ()
%qatdisplay circ

## Running the full circuit on a LinAlg-simulator
We measure only the evaluation qubits because with that information we can retrieve an approximation for $a$. Out of curiosity, we time the simulation.

In [None]:
tic = time.perf_counter()
result = LinAlg().submit(circ.to_job(
                                    nbshots = 0,
                                    qubits = range(nb_growth_qbits+nb_price_qbits+1,nb_growth_qbits+nb_price_qbits+1+nb_eval_qbits)
                                    ))
toc = time.perf_counter()
sim_time = toc-tic
print('Simulation time in seconds:',sim_time)

Measurement result for the evaluation qubits

In [None]:
for sample in result:
    print(f'State: {sample.state},  amplitude: {sample.amplitude},  probability: {sample.probability}')
    
# Visualizing the measurement results of the evaluation qubits
plt.figure()
states = [str(sample.state) for sample in result]
probs = [sample.probability for sample in result]
plt.bar(states,probs)
plt.ylabel('Probability')
plt.xticks(rotation = 60)
plt.draw()

## Post-processing the circuit measurement

Some numerical post-processing is needed to get the approximation for $a$. First, the binary measurement result from the evaluation qubits, read as an integer $y \in \{0, \dots, M-1\}$ with $M = 2^m$ for $m$ evaluation qubits, is mapped to an approximation $\tilde{a} = \sin^2(y\pi/M) \in [0,1]$.

In [None]:
# Creating an array of zeros for all state probabilities and replacing the non-zero values for measured probabilities 
all_probs = np.zeros(2**nb_eval_qbits, dtype=float)
for sample in result:
    state_decimal = sample.state.int
    all_probs[state_decimal] = sample.probability

# The mapping used between measured states and the corresponding estimate for the probability a
a_tildes = [np.sin(i*np.pi/(2**nb_eval_qbits))**2 for i in range(2**(nb_eval_qbits-1)+1)]
#print(f'Possible discrete values for estimator a: \n {a_tildes}')

# Aggregating the data from different states that correspond to same probability bins
probs =[]
probs.append(all_probs[0])
i = 1
while i < 2**nb_eval_qbits/2:
    #print(f'These states correspond to the same bin: {i} and {2**nb_eval_qbits-i}')
    probs.append(all_probs[i] + all_probs[2**nb_eval_qbits-i])
    i += 1
probs.append(all_probs[2**(nb_eval_qbits-1)])


# finding a with highest probability
lucky_a_tilde = a_tildes[np.argmax(probs)]
print(f'The algorithm gives an estimate a = {lucky_a_tilde}')

# Plotting the results for a
plt.figure()
plt.bar(a_tildes, probs, width = 0.005)
plt.ylabel('Probability')
plt.xlabel('Estimator for a')
plt.show()
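
The folding of the states $y$ and $M-y$ works because $\sin^2(y\pi/M) = \sin^2((M-y)\pi/M)$. The sketch below isolates that step with a hypothetical measurement distribution for three evaluation qubits; the function names and the toy probabilities are illustrative and not taken from the simulation.

```python
import numpy as np

def qae_grid(nb_eval_qbits):
    """Distinct estimates sin^2(y*pi/M) for y = 0, ..., M/2."""
    big_m = 2 ** nb_eval_qbits
    return np.sin(np.arange(big_m // 2 + 1) * np.pi / big_m) ** 2

def fold_probabilities(all_probs):
    """Add up the probabilities of y and M - y, which give the same estimate."""
    all_probs = np.asarray(all_probs, dtype=float)
    big_m = len(all_probs)
    folded = all_probs[: big_m // 2 + 1].copy()
    # Indices M-1, M-2, ..., M/2+1 pair up with indices 1, 2, ..., M/2-1
    folded[1 : big_m // 2] += all_probs[: big_m // 2 : -1]
    return folded

# Hypothetical measurement probabilities for M = 8 (not simulation output)
toy_probs = [0.0, 0.4, 0.05, 0.0, 0.0, 0.0, 0.05, 0.5]
folded = fold_probabilities(toy_probs)
best = qae_grid(3)[np.argmax(folded)]
print(folded, best)
```

Here the states 1 and 7 carry most of the probability, and both correspond to the same estimate $\sin^2(\pi/8)$.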

Then $\tilde{a}$ is mapped back to the original function values.

In [None]:
def ExpectedFx(a):
    scaled_value = (a-1/2)/c + 1/2
    value = scaled_value*(f_max-f_min) + f_min
    return value

# True expected value after Taylor approximations, calculated classically.
true_v = 4961

In [None]:
EFx = ExpectedFx(lucky_a_tilde)

print('---RESULTS---\n'
    f'From quantum simulation: Expected value for the 2021 price: {EFx}\n'
    f'Parameter c was set to {c}.\n'
    f'{nb_eval_qbits} evaluation qubits were used in the QPE routine.\n'
    f'The probability to measure the objective qubit in state 1 was approximated to {lucky_a_tilde}.')

In order to understand the effects of the number of evaluation qubits $m$ and the scaling parameter $c$, we can plot all the values that the algorithm could output, and check how many of those values are within realistic margins.

In [None]:
possible_v = [ExpectedFx(a) for a in a_tildes]
print(possible_v)
plt.scatter(a_tildes,possible_v,s=2)
plt.axhline(f_min, linestyle = '--', color='g')
plt.axhline(f_max, linestyle = '--', color='g')
plt.show()
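
To get a feel for how $m$ and $c$ limit the attainable outputs, one can count how many grid values $\tilde{a}$ map back into the interval between `f_min` and `f_max`. The sketch below restates the inverse mapping of `ExpectedFx` as a standalone function; the constants are copied from `price_mapping` and the function names are illustrative.

```python
import numpy as np

F_MIN = 1817.1710025                              # price_mapping(0, 0)
F_MAX = F_MIN + 7 * (284.12916 + 826.066125)      # price_mapping(7, 7)

def expected_fx(a, c):
    """Invert a = c*(scaled - 1/2) + 1/2 and rescale the result to euros."""
    scaled = (a - 0.5) / c + 0.5
    return scaled * (F_MAX - F_MIN) + F_MIN

def count_valid_outputs(nb_eval_qbits, c):
    """Number of grid values sin^2(y*pi/M) that map into [F_MIN, F_MAX]."""
    big_m = 2 ** nb_eval_qbits
    grid = np.sin(np.arange(big_m // 2 + 1) * np.pi / big_m) ** 2
    values = expected_fx(grid, c)
    return int(np.sum((values >= F_MIN) & (values <= F_MAX)))

for m in (3, 5, 7):
    print(m, count_valid_outputs(m, 0.5))
```

With $c = 1/2$ only estimates in $[0.25, 0.75]$ give a price inside the range, so just a handful of the 17 possible outputs for $m = 5$ are usable. More evaluation qubits make the grid finer.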

## Finally, we plot the output from the experiment

In [None]:
plt.bar(possible_v,probs, width = 30)
plt.xlim(f_min, f_max)
plt.ylim(0,1)
plt.xlabel(r'$V_{2021}$', fontsize = '12')
plt.ylabel('Probability', fontsize = '12')
plt.tick_params(axis='both', which='major', labelsize='11')
textstr = '\n'.join((f'c = {c}',f'$m = ${nb_eval_qbits}', f'$M = ${2**nb_eval_qbits}', r'$\mathbb{E}(V_{2021}) = %.0f$' % np.round(EFx,0)))
plt.text(0.75*f_max,0.7, textstr, fontsize ='13')
plt.axvline(x = true_v,linestyle = '--', color = 'r')
#filename = f'housing_price_c{str(c).replace(".","")}_m{nb_eval_qbits}'
#plt.savefig(filename, dpi = 300)
plt.show()

The red line is drawn at 4961 EUR, which is the classically calculated value after the approximations used in the algorithm. The exact value is 5009 EUR. With a different number of evaluation qubits and a different value of $c$ the algorithm gives varying results, but in each case it finds a relatively good approximation. Go ahead and try the algorithm with different numbers of evaluation qubits and values of $c \in (0,1]$. These are defined in the QPE and objective function sections.

# That's it!

Our demonstration of the Quantum Amplitude Estimation is now finished. 