# The Set Cover Problem BQM

CDL Quantum Hackathon 2021  
Team ZebraKet   
Ziwei Qiu (ziweiqiu@g.harvard.edu), Alex Khan, Theo Cleland, Ehsan Torabizadeh

# Problem Definition

As a grocery store manager, you want to restock. This notebook helps you decide which suppliers to buy from so that every item you need is covered.

Your inventory has $M$ items in total and there are $N$ suppliers in the market. Each supplier provides only a subset of the items, so you may have to purchase from several suppliers. You want to minimize the number of suppliers, since each additional supplier adds overhead in the real world (negotiation, travel, etc.). The constraint is that the union of the items provided by the suppliers you choose equals your inventory. This is therefore a set cover problem.
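To make the objective concrete, here is a minimal brute-force sketch of the problem statement: it tries every combination of suppliers in order of increasing size and returns the first one whose union contains the whole inventory. The function name `min_set_cover` and the toy data are hypothetical and are not part of the notebook's code. The search is exponential in $N$, so it is only meant for tiny instances.

```python
from itertools import combinations

def min_set_cover(universe, subsets):
    """Exhaustively search for a smallest collection of subsets covering the universe."""
    target = set(universe)
    for k in range(1, len(subsets) + 1):
        for chosen in combinations(range(len(subsets)), k):
            # Union of the items offered by the chosen suppliers
            covered = set().union(*(subsets[i] for i in chosen))
            if target <= covered:
                return list(chosen)
    return None  # no combination covers the universe

# Hypothetical toy data: 5 items, 4 suppliers
inventory = ["apple", "bread", "milk", "rice", "tea"]
supplier_items = [
    {"apple", "bread", "milk"},
    {"milk", "rice"},
    {"rice", "tea"},
    {"tea"},
]
print(min_set_cover(inventory, supplier_items))  # [0, 2]
```

No single supplier offers everything here, so the smallest cover uses two suppliers.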

### QUBO Representation

We define two sets of binary variables in this problem [1]:  
(1) $x_i$ is a **binary** variable that equals 1 if supplier $i$ is selected.  
(2) $y_{\alpha,m}$ is a **binary** variable that equals 1 if exactly $m$ of the selected suppliers offer item $\alpha$.   

The following Hamiltonian represents the problem:
$$H = A\sum_{\alpha=1}^{M}\left(1-\sum_{m=1}^{N}y_{\alpha,m}\right)^2 + A\sum_{\alpha=1}^{M}\left(\sum_{m=1}^{N}my_{\alpha,m}-\sum_{i:\alpha\in V_i}x_i \right)^2 + 
B\sum_{i=1}^{N} x_i$$

 $$\sum_{i:\alpha\in V_i}x_i = \sum_{i=1}^{N} x_i I_{i,\alpha}$$
 
where $V_i$ is the set of items offered by supplier $i$ and $I_{i,\alpha}$ is an indicator variable that equals 1 if item $\alpha$ is provided by supplier $i$.

The first term enforces that exactly one $y_{\alpha,m}$ equals 1 for each item $\alpha$, which makes the encoding valid. The second term ties that $m$ to the number of selected suppliers offering item $\alpha$; because $m \geq 1$, every item in the inventory (the universe) must be covered. The third term minimizes the number of suppliers. We need $A>B>0$ to get valid solutions: dropping a supplier saves $B$, so violating a constraint must cost more than that.
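The role of $A>B$ can be seen by evaluating $H$ directly on a small instance. The sketch below is an illustration under stated assumptions: the function name `hamiltonian`, its argument layout, and the two-supplier instance are hypothetical. It uses the same values $A=2$, $B=1$ that the notebook's code sets later.

```python
def hamiltonian(x, y, indicator, A=2, B=1):
    """Evaluate H directly from its definition.

    x[i]            -- 1 if supplier i is selected
    y[a][m - 1]     -- 1 if item a is encoded as offered by exactly m selected suppliers
    indicator[i][a] -- 1 if supplier i offers item a
    """
    n_suppliers = len(indicator)
    n_items = len(indicator[0])
    energy = 0
    for a in range(n_items):
        one_hot = sum(y[a])
        encoded = sum((m + 1) * y[a][m] for m in range(n_suppliers))
        offered = sum(indicator[i][a] * x[i] for i in range(n_suppliers))
        energy += A * (1 - one_hot) ** 2 + A * (encoded - offered) ** 2
    return energy + B * sum(x)

# Hypothetical instance: supplier 0 offers items {0, 1}, supplier 1 offers items {1, 2}
indicator = [[1, 1, 0], [0, 1, 1]]

# Both suppliers selected, y consistent with the counts (1, 2, 1): a valid cover
valid = hamiltonian([1, 1], [[1, 0], [0, 1], [1, 0]], indicator)
# Supplier 1 dropped, item 2 left uncovered
invalid = hamiltonian([1, 0], [[1, 0], [1, 0], [0, 0]], indicator)
print(valid, invalid)  # 2 3
```

Dropping supplier 1 saves $B=1$ but incurs a penalty of $A=2$, so the valid cover keeps the lower energy.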

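We expand the Hamiltonian $H$, using $x_i^2=x_i$ and $y_{\alpha,m}^2=y_{\alpha,m}$ for binary variables, to get the linear and quadratic terms in the QUBO representation.

$$H=AM+\sum_{\alpha=1}^{M}\sum_{m=1}^{N}A(m^2-1)y_{\alpha,m}+\sum_{i=1}^{N}\left(\sum_{\alpha=1}^{M}AI_{i,\alpha}+B \right)x_i+2A\sum_{\alpha=1}^{M}\left[\sum_{m<n}(1+mn)y_{\alpha,m}y_{\alpha,n}+\sum_{i<j}I_{i,\alpha}I_{j,\alpha}x_ix_j-\sum_{m,i}my_{\alpha,m}I_{i,\alpha}x_i\right]$$

The constant $AM$ is an energy offset that does not change the minimizer. The two single sums over $y_{\alpha,m}$ and $x_i$ are the **linear** terms and the bracketed sums are the **quadratic** terms, with each unordered pair counted once. The code below omits the offset, so a valid solution with $k$ suppliers is reported with energy $Bk-AM$.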
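As a sanity check on the expansion, the sketch below compares the squared form of $H$ with the expanded form on every binary assignment of a tiny instance. It assumes the reading in which each unordered pair ($m<n$, $i<j$) is counted once and the constant $AM$ is kept. The instance and the helper names `h_direct` and `h_expanded` are hypothetical.

```python
from itertools import product

A, B = 2, 1
I = [[1, 1, 0], [0, 1, 1]]  # hypothetical indicator I[i][a]: 2 suppliers, 3 items
N, M = len(I), len(I[0])

def h_direct(x, y):
    """H in its original squared form."""
    e = B * sum(x)
    for a in range(M):
        offered = sum(I[i][a] * x[i] for i in range(N))
        encoded = sum((m + 1) * y[a][m] for m in range(N))
        e += A * (1 - sum(y[a])) ** 2 + A * (encoded - offered) ** 2
    return e

def h_expanded(x, y):
    """H as constant + linear + quadratic terms (pairs counted once)."""
    e = A * M
    e += sum(A * (m * m - 1) * y[a][m - 1] for a in range(M) for m in range(1, N + 1))
    e += sum((A * sum(I[i]) + B) * x[i] for i in range(N))
    for a in range(M):
        for m in range(1, N + 1):
            for n in range(m + 1, N + 1):
                e += 2 * A * (1 + m * n) * y[a][m - 1] * y[a][n - 1]
            for i in range(N):
                e -= 2 * A * m * y[a][m - 1] * I[i][a] * x[i]
        for i in range(N):
            for j in range(i + 1, N):
                e += 2 * A * I[i][a] * I[j][a] * x[i] * x[j]
    return e

# Enumerate all 2^(N + M*N) assignments and count disagreements
mismatches = 0
for bits in product([0, 1], repeat=N + M * N):
    x = bits[:N]
    y = [bits[N + a * N: N + (a + 1) * N] for a in range(M)]
    if h_direct(x, y) != h_expanded(x, y):
        mismatches += 1
print(mismatches)  # 0
```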


In [1]:
# Ziwei Qiu, ziweiqiu@g.harvard.edu
import os
os.chdir('..')
from dimod import BinaryQuadraticModel
from dimod import ExactSolver
from neal import SimulatedAnnealingSampler
from itertools import combinations
from dwave.system import LeapHybridSampler
import numpy as np
import pandas as pd
from utils.data import read_inventory_optimization_data
from services.classical_optimizers import binary_supplier_optimizer

In [2]:
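def build_setcover_bqm(U, V, verbose = False):
    """Construct the BQM for the set cover problem.

    Args:
        U (array-like):
            Elements of the universe (the inventory items)
        V (array of sets):
            Subsets of U (the items each supplier offers)
        verbose (bool):
            If True, print the indicator values and variable labels
    Returns:
        bqm: BinaryQuadraticModel instance
        x: list of labels of the supplier variables x_1 ... x_N
    """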
    
    # Create indicator variables
    I = []
    for i in range(len(V)):
        I.append([1 if U[a] in V[i] else 0 for a in range(len(U))])
    
    if verbose:
        print('Indicator variables: I_i,a',I)
    
    # Lagrange multipliers A>B>0
    A = 2
    B = 1
    
    ##@  Binary Quadratic Model @##
    bqm = BinaryQuadraticModel('BINARY')

    # Add linear terms
    # x linear terms
    x = [bqm.add_variable('x_'+str(i+1), A*sum(I[i])+B) for i in range(0,len(V))]
    if verbose:
        print('x variables:',x)

    # y_am linear terms
    y = []
    for a in range(1,len(U)+1):
        y.append([bqm.add_variable('y_('+str(a)+', '+str(m)+')', A*(m**2-1)) for m in range(1,len(V)+1)])
    if verbose:
        print('y variables:',y)

    # Add quadratic terms

    # x_i-x_j terms
    for i in range(1,len(V)+1):
        for j in range(i+1,len(V)+1):
            key = ('x_' + str(i), 'x_' + str(j))
            bqm.quadratic[key] = 2*A*np.dot(np.array(I[i-1]),np.array(I[j-1]))

    # y_am - y_an terms
    for m in range(1,len(V)+1):
        for n in range(m+1,len(V)+1):
            for a in range(1,len(U)+1):
                key = ('y_('+str(a)+', '+str(m)+')', 'y_('+str(a)+', '+str(n)+')')
                bqm.quadratic[key] = 2*A*(1+m*n)

    # x_i-y_am terms
    for i in range(1,len(V)+1):
        for m in range(1,len(V)+1):
            for a in range(1,len(U)+1):
                key = ('x_' + str(i), 'y_('+str(a)+', '+str(m)+')')
                bqm.quadratic[key] = -2*A*m*I[i-1][a-1]
    return bqm, x

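def solve_bqm(bqm, x, sampler, **kwargs):
    """Sample the BQM and return the lowest-energy x assignment and its energy."""
    response = sampler.sample(bqm, **kwargs)
    # response.first is the lowest-energy sample; the raw record is not
    # guaranteed to be sorted, so record.energy[0] may belong to another sample
    best = response.first
    best_energy = best.energy
    best_solution = [best.sample[i] for i in x]
    print(best_solution)
    print(f'Energy: {best_energy}')
    
    return best_solution, best_energy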

def display_classical_solution(classical_solution, supplier_data):
    print('\nSolution (Classical Algorithm):')
    print('There are {:d} suppliers selected.'.format(len(classical_solution)))
    idx_supplier = [index for index, data in enumerate(supplier_data) if len([s for s in classical_solution if s == data]) > 0]
    suppliers = [f'supplier{i}' for i in idx_supplier]
    print('Selected Suppliers:', suppliers)

def display_data(inventory, supplier_inventory):
    print('There are {:d} items we need to source in our inventory.'.format(len(inventory)))
    print('There are {:d} suppliers.'.format(len(supplier_inventory)))
    print('Inventory:')
    print(inventory)
    print('\nSupplier data:')
    for idx, supplier_data in enumerate(supplier_inventory):
        print(f'supplier{idx}: ', supplier_data)

# Implementation

In [3]:
# Define a simple set cover problem
U = list(set(np.random.randint(10, size=(10))))

V = [set(U[i] for i in np.random.randint(len(U), size=(8))) for j in range(5)]

print('The universe is',U)
print('Number of elements in the universe: {:d}'.format(len(U)))

print('There are {:d} collections:'.format(len(V)),V)
print('Number of sets: N={:d}'.format(len(V)))

The universe is [0, 2, 3, 4, 9]
Number of elements in the universe: 5
There are 5 collections: [{0, 2, 3, 4, 9}, {0, 9, 2}, {0, 2, 3, 4}, {0, 9, 2, 3}, {0, 2, 3, 4, 9}]
Number of sets: N=5


### Solve the Set Cover Problem with Simulated Annealing

In [4]:
bqm,x = build_setcover_bqm(U, V)
best_solution, best_energy = solve_bqm(bqm, x, SimulatedAnnealingSampler())

[0, 1, 0, 1, 1]
Energy: -7.0


### Solve the Set Cover Problem with Quantum Annealing (Leap Hybrid Solver)

In [5]:
bqm,x = build_setcover_bqm(U, V)
best_solution, best_energy = solve_bqm(bqm, x, LeapHybridSampler())

[0, 0, 0, 0, 1]
Energy: -9.0
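The energy alone does not say whether a returned bit string is actually a cover, so it helps to check the selection against the data. The helper below is a hypothetical addition (the name `check_cover` is not in the notebook). It assumes each entry of the collection supports iteration over its items, as the sets printed above do, and it reuses the universe and the two solutions shown in the outputs above.

```python
def check_cover(solution, universe, subsets):
    """Return (selected indices, uncovered elements) for a 0/1 solution vector."""
    selected = [i for i, bit in enumerate(solution) if bit]
    covered = set()
    for i in selected:
        covered.update(subsets[i])
    uncovered = [u for u in universe if u not in covered]
    return selected, uncovered

# Universe and collections copied from the printed output above
U = [0, 2, 3, 4, 9]
V = [{0, 2, 3, 4, 9}, {0, 9, 2}, {0, 2, 3, 4}, {0, 9, 2, 3}, {0, 2, 3, 4, 9}]

print(check_cover([0, 1, 0, 1, 1], U, V))  # simulated annealing: ([1, 3, 4], [])
print(check_cover([0, 0, 0, 0, 1], U, V))  # hybrid: ([4], [])
print(check_cover([0, 1, 0, 0, 0], U, V))  # hypothetical invalid pick: ([1], [3, 4])
```

Both reported solutions cover the universe; the hybrid one does so with a single set, which is why its energy is lower.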


# Grocery Data 
## Small dataset

In [6]:
inventory, supplier_inventory = read_inventory_optimization_data(os.path.join(os.getcwd(),'data/small-cost-mock.csv'))
# Build the BQM
bqm,x = build_setcover_bqm(inventory, supplier_inventory)
print('There are {:d} items in the universe.\n'.format(len(inventory)))
print('There are {:d} suppliers.\n'.format(len(supplier_inventory)))

There are 20 items in the universe.

There are 10 suppliers.



In [7]:
# Quantum Annealing
print('Solution (Hybrid):')
best_solution, best_energy = solve_bqm(bqm, x, LeapHybridSampler())
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Hybrid):
[0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
Energy: -23.0
There are 3 suppliers selected.
Selected Suppliers: ['supplier3', 'supplier4', 'supplier6']


In [8]:
# Simulated Annealing
print('Solution (Simulated Annealing):')
best_solution, best_energy = solve_bqm(bqm, x, SimulatedAnnealingSampler(), **{"num_reads":100, "num_sweeps": 1000})
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Simulated Annealing):
[1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
Energy: 14.0
There are 7 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier1', 'supplier3', 'supplier4', 'supplier6', 'supplier8', 'supplier9']


In [9]:
# Classical Algo
best_solution_classical = binary_supplier_optimizer(inventory, supplier_inventory)
display_classical_solution(best_solution_classical, supplier_inventory)


Solution (Classical Algorithm):
There are 2 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier7']


## Medium dataset

In [10]:
inventory, supplier_inventory = read_inventory_optimization_data(os.path.join(os.getcwd(),'data/medium-cost-mock.csv'))
# Build the BQM
bqm,x = build_setcover_bqm(inventory, supplier_inventory)
print('There are {:d} items in the universe.\n'.format(len(inventory)))
print('There are {:d} suppliers.\n'.format(len(supplier_inventory)))

There are 100 items in the universe.

There are 40 suppliers.



In [13]:
# Quantum Annealing
print('Solution (Hybrid):')
best_solution, best_energy = solve_bqm(bqm, x, LeapHybridSampler())
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Hybrid):
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
Energy: 888.0
There are 36 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier1', 'supplier2', 'supplier3', 'supplier4', 'supplier5', 'supplier6', 'supplier7', 'supplier8', 'supplier10', 'supplier11', 'supplier14', 'supplier15', 'supplier16', 'supplier17', 'supplier18', 'supplier19', 'supplier20', 'supplier21', 'supplier22', 'supplier23', 'supplier24', 'supplier25', 'supplier26', 'supplier27', 'supplier28', 'supplier29', 'supplier30', 'supplier31', 'supplier32', 'supplier34', 'supplier35', 'supplier36', 'supplier37', 'supplier38', 'supplier39']


In [14]:
# Simulated Annealing
print('Solution (Simulated Annealing):')
best_solution, best_energy = solve_bqm(bqm, x, SimulatedAnnealingSampler(), **{"num_reads":100, "num_sweeps": 1000})
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Simulated Annealing):
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
Energy: 1111.0
There are 36 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier1', 'supplier2', 'supplier3', 'supplier4', 'supplier5', 'supplier6', 'supplier7', 'supplier8', 'supplier9', 'supplier10', 'supplier13', 'supplier14', 'supplier16', 'supplier17', 'supplier18', 'supplier19', 'supplier20', 'supplier21', 'supplier22', 'supplier23', 'supplier24', 'supplier25', 'supplier26', 'supplier27', 'supplier28', 'supplier29', 'supplier30', 'supplier31', 'supplier32', 'supplier34', 'supplier35', 'supplier36', 'supplier37', 'supplier38', 'supplier39']


In [15]:
# Classical Algo
best_solution_classical = binary_supplier_optimizer(inventory, supplier_inventory)
display_classical_solution(best_solution_classical, supplier_inventory)


Solution (Classical Algorithm):
There are 2 suppliers selected.
Selected Suppliers: ['supplier2', 'supplier22']


## Large dataset

In [16]:
inventory, supplier_inventory = read_inventory_optimization_data(os.path.join(os.getcwd(),'data/large-cost-mock.csv'))
# Build the BQM
bqm,x = build_setcover_bqm(inventory, supplier_inventory)
print('There are {:d} items in the universe.\n'.format(len(inventory)))
print('There are {:d} suppliers.\n'.format(len(supplier_inventory)))

There are 200 items in the universe.

There are 80 suppliers.
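The model grows quickly with problem size because every item carries $N$ auxiliary $y$ variables. The sketch below counts the variables and the pair assignments that the loops in `build_setcover_bqm` iterate over, for the three dataset sizes printed in this notebook. The function name `bqm_size` is hypothetical, and whether pairs with zero bias are actually stored depends on the library, so the pair count is an upper bound.

```python
def bqm_size(n_items, n_suppliers):
    """Count variables and pair assignments made by the construction loops."""
    M, N = n_items, n_suppliers
    pairs = N * (N - 1) // 2  # unordered pairs among N indices
    return {
        "variables": N + M * N,  # x_i plus y_(a, m)
        "x-x pairs": pairs,
        "y-y pairs": M * pairs,
        "x-y pairs": N * N * M,
    }

# (items, suppliers) as printed for the small, medium, and large datasets
datasets = {"small": (20, 10), "medium": (100, 40), "large": (200, 80)}
for name, (M, N) in datasets.items():
    size = bqm_size(M, N)
    n_pairs = sum(v for k, v in size.items() if k != "variables")
    print(f"{name}: {size['variables']} variables, up to {n_pairs} pair terms")
```

For the large dataset this gives 16080 variables and close to two million pair terms, compared with only 80 supplier decisions in the original problem.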



In [17]:
# Quantum Annealing
print('Solution (Hybrid):')
best_solution, best_energy = solve_bqm(bqm, x, LeapHybridSampler())
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Hybrid):
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Energy: 3694.0
There are 78 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier1', 'supplier2', 'supplier3', 'supplier4', 'supplier5', 'supplier6', 'supplier7', 'supplier8', 'supplier9', 'supplier10', 'supplier11', 'supplier12', 'supplier13', 'supplier14', 'supplier15', 'supplier16', 'supplier17', 'supplier18', 'supplier19', 'supplier21', 'supplier22', 'supplier23', 'supplier24', 'supplier25', 'supplier26', 'supplier27', 'supplier28', 'supplier29', 'supplier30', 'supplier31', 'supplier32', 'supplier33', 'supplier34', 'supplier35', 'supplier36', 'supplier37', 'supplier38', 'supplier39', 'supplier40', 'supplier41', 'supplier42', 'supplier43', 'supplier44', 'supplier45', 'supplier46', 'supplier47', 'supplier48', 'supplier

In [18]:
# Simulated Annealing
print('Solution (Simulated Annealing):')
best_solution, best_energy = solve_bqm(bqm, x, SimulatedAnnealingSampler(), **{"num_reads":50, "num_sweeps": 100})
print('There are {:d} suppliers selected.'.format(sum(best_solution)))
suppliers = [f'supplier{i}' for i in np.where(best_solution)[0]]
print('Selected Suppliers:', suppliers)

Solution (Simulated Annealing):
[0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
Energy: 5354.0
There are 76 suppliers selected.
Selected Suppliers: ['supplier1', 'supplier2', 'supplier4', 'supplier5', 'supplier6', 'supplier7', 'supplier8', 'supplier9', 'supplier10', 'supplier11', 'supplier12', 'supplier13', 'supplier14', 'supplier15', 'supplier17', 'supplier18', 'supplier19', 'supplier20', 'supplier21', 'supplier22', 'supplier23', 'supplier24', 'supplier25', 'supplier26', 'supplier27', 'supplier28', 'supplier29', 'supplier30', 'supplier31', 'supplier32', 'supplier33', 'supplier34', 'supplier35', 'supplier36', 'supplier37', 'supplier38', 'supplier39', 'supplier40', 'supplier41', 'supplier42', 'supplier43', 'supplier44', 'supplier45', 'supplier46', 'supplier47', 'supplier48', 'supplier49', 'supplie

In [19]:
# Classical Algo
best_solution_classical = binary_supplier_optimizer(inventory, supplier_inventory)
display_classical_solution(best_solution_classical, supplier_inventory)


Solution (Classical Algorithm):
There are 3 suppliers selected.
Selected Suppliers: ['supplier0', 'supplier6', 'supplier13']
