Copyright **`(c)`** 2024 Giovanni Squillero `<giovanni.squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.  

# Set Cover problem

See: https://en.wikipedia.org/wiki/Set_cover_problem
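The weighted variant used in this notebook, where every set $S_j$ has a cost $c_j$, can be stated as a 0/1 integer program. Here $x_j = 1$ means that set $j$ is selected, $m$ is the number of sets and $U$ is the universe:

$$
\min_{x \in \{0,1\}^{m}} \; \sum_{j=1}^{m} c_j \, x_j
\qquad \text{subject to} \qquad
\sum_{j \,:\, e \in S_j} x_j \;\ge\; 1 \quad \forall\, e \in U
$$

In the code below, $m$ corresponds to `NUM_SETS`, $|U|$ to `UNIVERSE_SIZE` and $c_j$ to `COSTS[j]`. `valid()` checks the constraints and `cost()` evaluates the objective.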

In [90]:
from random import random, seed
from itertools import product
import numpy as np

from icecream import ic

## Reproducible Initialization

If you want to get reproducible results, use `rng` (and restart the kernel); for non-reproducible ones, use `np.random`.

In [91]:
UNIVERSE_SIZE = 1000
NUM_SETS = 100
DENSITY = 0.2

rng = np.random.Generator(np.random.PCG64([UNIVERSE_SIZE, NUM_SETS, int(10_000 * DENSITY)]))
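A quick check of what the note on reproducibility means (a sketch, not part of the original notebook). Two generators built from the same `PCG64` seed replay the same stream, while a single generator keeps advancing its internal state. This is why the same numbers come back only when the cells run from a fresh start, as after a kernel restart.

```python
import numpy as np

UNIVERSE_SIZE, NUM_SETS, DENSITY = 1000, 100, 0.2
seed_values = [UNIVERSE_SIZE, NUM_SETS, int(10_000 * DENSITY)]

# Two generators built from the same seed produce identical draws
rng_a = np.random.Generator(np.random.PCG64(seed_values))
rng_b = np.random.Generator(np.random.PCG64(seed_values))
draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
print(np.array_equal(draws_a, draws_b))  # True

# The same generator called again continues its stream with new numbers
next_draws = rng_a.random(5)
print(np.array_equal(draws_a, next_draws))  # False
```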

In [92]:
# DON'T EDIT THESE LINES!

SETS = np.random.random((NUM_SETS, UNIVERSE_SIZE)) < DENSITY
for s in range(UNIVERSE_SIZE):
    if not np.any(SETS[:, s]):
        SETS[np.random.randint(NUM_SETS), s] = True
COSTS = np.power(SETS.sum(axis=1), 1.1)
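The cell above builds each set by including every element independently with probability `DENSITY`. It then assigns any element that ended up in no set to a random set, so a full cover always exists. Each set costs its size raised to the power 1.1. Because the cell calls `np.random`, the instance changes on every run. A reproducible variant on a toy scale could look like this. `make_instance` is a hypothetical name, and it takes a `Generator` like `rng` instead of using `np.random`:

```python
import numpy as np

def make_instance(universe_size, num_sets, density, rng):
    """Hypothetical reproducible variant of the instance generator above."""
    # Each element joins each set independently with probability `density`
    sets = rng.random((num_sets, universe_size)) < density
    # Guarantee feasibility: every element must belong to at least one set
    for s in range(universe_size):
        if not np.any(sets[:, s]):
            sets[rng.integers(num_sets), s] = True
    # Cost grows slightly faster than linearly with the set size
    costs = np.power(sets.sum(axis=1), 1.1)
    return sets, costs

rng = np.random.Generator(np.random.PCG64(42))
sets, costs = make_instance(20, 5, 0.2, rng)
print(sets.any(axis=0).all())  # True: the union of all sets is the universe
```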

## Helper Functions

In [107]:
def valid(solution):
    """Checks whether a solution is valid (i.e., covers the whole universe)"""
    return np.all(np.logical_or.reduce(SETS[solution]))


def cost(solution):
    """Returns the cost of a solution (to be minimized)"""
    return COSTS[solution].sum()
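`SETS[solution]` accepts either a boolean mask over the sets (as in the two cells below) or a list of set indices (as with `min_sets` later on), so both helpers work with either representation. A toy check on a hypothetical instance of 3 sets over 4 elements:

```python
import numpy as np

# Toy instance (hypothetical): 3 sets over a universe of 4 elements
SETS = np.array([
    [True,  True,  False, False],
    [False, True,  True,  False],
    [False, False, True,  True],
])
COSTS = np.power(SETS.sum(axis=1), 1.1)

def valid(solution):
    # OR together the selected rows, then require every element to be covered
    return np.all(np.logical_or.reduce(SETS[solution]))

def cost(solution):
    return COSTS[solution].sum()

mask = np.array([True, False, True])  # boolean mask: take sets 0 and 2
idx = [0, 2]                          # index list: the same selection
print(valid(mask), valid(idx))        # True True
print(cost(mask) == cost(idx))        # True
print(valid([0, 1]))                  # False: element 3 is not covered
```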

## Have Fun!

In [103]:
# A dumb solution of "all" sets
solution = np.full(NUM_SETS, True)
valid(solution), cost(solution)

TypeError: 'numpy.bool' object is not callable

In [102]:
# A random solution with random 50% of the sets
solution = rng.random(NUM_SETS) < .5
valid(solution), cost(solution)

TypeError: 'numpy.bool' object is not callable
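Both cells above fail with `TypeError: 'numpy.bool' object is not callable`. This means that, when they ran, the name `valid` was bound to a NumPy boolean instead of the helper function. The helper cell has a higher execution count (`In [107]`) than these two, so it was re-run afterwards, and re-running these cells should then work. A minimal sketch reproducing the error, assuming the name was overwritten by an assignment such as `valid = valid(solution)` (the notebook does not show which statement did it):

```python
import numpy as np

SETS = np.array([[True, False], [False, True]])  # tiny toy instance

def valid(solution):
    """Same helper as above, on the toy SETS."""
    return np.all(np.logical_or.reduce(SETS[solution]))

# Rebinding the function's name to its own result shadows the function...
valid = valid([0, 1])
print(type(valid))  # a NumPy boolean, no longer a function

# ...so the next call fails like the cells above
try:
    valid([0, 1])
except TypeError as err:
    print("TypeError:", err)
```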

## My Solutions

The following solution has two parts:
1) Find the set that covers the largest number of elements.
2) While some element is still uncovered, take the first uncovered element and add to `min_sets` the first set (by index) that contains it, then mark all elements of that set as covered.
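Step 1 is also the first move of the classic greedy heuristic for set cover, while step 2 picks sets by index rather than by cost. For comparison only, here is a sketch of the textbook weighted greedy, a different technique from the two-step procedure described here. It keeps choosing the set with the lowest cost per newly covered element. `greedy_set_cover` is a hypothetical name:

```python
import numpy as np

def greedy_set_cover(sets, costs):
    """Weighted greedy for set cover: repeatedly pick the set with the lowest
    cost per newly covered element. Returns the chosen set indices in order."""
    not_covered = np.ones(sets.shape[1], dtype=bool)
    chosen = []
    while np.any(not_covered):
        # Number of still-uncovered elements each set would add
        gains = (sets & not_covered).sum(axis=1)
        if gains.max() == 0:
            raise ValueError("some elements belong to no set")
        # Cost per new element; sets that add nothing get an infinite ratio
        with np.errstate(divide="ignore", invalid="ignore"):
            ratios = np.where(gains > 0, costs / gains, np.inf)
        best = int(np.argmin(ratios))
        chosen.append(best)
        not_covered &= ~sets[best]
    return chosen

# Toy instance (hypothetical): two cheap halves vs. one expensive full set
sets = np.array([
    [True,  True,  False, False],
    [False, False, True,  True],
    [True,  True,  True,  True],
])
costs = np.array([1.0, 1.0, 3.0])
print(greedy_set_cover(sets, costs))  # [0, 1]
```

With `COSTS` equal to the set size raised to 1.1, the initial ratio is the size raised to 0.1, so this rule slightly favors smaller sets instead of the largest one. On the notebook's data, `greedy_set_cover(SETS, COSTS)` returns an index list that `valid()` and `cost()` accept directly; no result from it is reported here.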

In [108]:
min_sets = []
not_covered = np.ones(UNIVERSE_SIZE, dtype=bool)

# Step 1: search for the set that covers the highest number of elements
best_set = None
best_set_covered = np.array([], dtype=int)

for i in range(len(SETS)):
    covered_by_set = np.where(SETS[i] & not_covered)[0]
    if len(covered_by_set) > len(best_set_covered):
        best_set = i
        best_set_covered = covered_by_set

min_sets.append(best_set)
not_covered[best_set_covered] = False

# Step 2: while something is uncovered, add the first unused set (by index)
# that contains the first uncovered element
while np.any(not_covered):
    for elem in np.where(not_covered)[0]:
        found = False
        for i in range(len(SETS)):
            if i in min_sets:
                continue
            if SETS[i][elem]:
                min_sets.append(i)
                not_covered[SETS[i]] = False  # every element of set i is now covered
                found = True
                break
        if found:
            break

print("Number of elements in min_sets:", len(min_sets))
print("Indices of the set taken and order:", min_sets)
valid(min_sets), cost(min_sets)


Number of elements in min_sets: 26
Indices of the set taken and order: [39, 5, 3, 1, 0, 9, 8, 14, 10, 19, 2, 6, 7, 4, 25, 13, 12, 33, 21, 11, 24, 15, 18, 28, 20, 22]
Selected sets: [3, 1, 9, 8, 14, 10, 19, 2, 6, 4, 25, 13, 12, 33, 21, 11, 24, 15, 18, 28, 20, 22]
Number of selected sets: 22
Cost of selected sets: 7464.14231914046
Is the solution valid? True
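The output above reports 26 indices in `min_sets` but also 22 "Selected sets" (the same sequence without 39, 5, 0 and 7), which come from code not shown in this cell. One plausible way to obtain such a reduced list, offered only as a hypothetical sketch, is a post-processing pass that drops every set whose elements are already covered by the other chosen sets. `prune_redundant` is a hypothetical name:

```python
import numpy as np

def prune_redundant(sets, selected):
    """Drop each selected set whose elements are all covered by the other
    selected sets. `sets` is a (num_sets, universe_size) boolean matrix."""
    kept = list(selected)
    for s in list(kept):  # iterate over a snapshot while removing from `kept`
        others = [t for t in kept if t != s]
        if others and np.all(np.logical_or.reduce(sets[others])):
            kept.remove(s)
    return kept

# Toy instance (hypothetical): set 1 alone covers everything
sets = np.array([
    [True,  True,  False, False],
    [True,  True,  True,  True],
    [False, False, True,  True],
])
print(prune_redundant(sets, [0, 2, 1]))  # [1]
```

Applied as `prune_redundant(SETS, min_sets)`, such a pass keeps the selection valid and, since all costs are non-negative, cannot increase its cost. The result depends on the order in which the sets are tested.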
