# Set Covering

Set covering problems are a class of combinatorial optimization problems in which the goal is to find a minimum-size collection of sets (subsets of a larger set) such that every element from the universal set is covered by at least one of the selected sets. These problems have applications in various fields, including operations research, computer science, logistics, and facility location.

Here's a more formal definition:

- Given a universal set U and a collection of subsets S = {S1, S2, ..., Sn}, where each Si is a subset of U, the set covering problem aims to find a minimum-size subset C of S such that the union of all sets in C covers every element in U. Mathematically, you want to minimize the cardinality of C, subject to the constraint that the union of sets in C equals U.

Set covering problems can be described as an integer linear programming (ILP) problem. The decision version of the problem is typically referred to as the Set Covering Problem (SCP), and it can be formulated as follows:

**Decision Set Covering Problem (SCP):**

- Input: A finite set U, a collection of subsets S = {S1, S2, ..., Sn} of U, and a positive integer k.
- Question: Is there a subset C of S with at most k sets such that the union of sets in C covers all elements of U?

The optimization version of the problem is known as the Set Covering Optimization Problem (SCOP):

**Set Covering Optimization Problem (SCOP):**

- Input: A finite set U, a collection of subsets S = {S1, S2, ..., Sn} of U.
- Output: Find a subset C of S with the minimum cardinality such that the union of sets in C covers all elements of U.
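
The ILP description mentioned above can be written out explicitly. The binary variables x_j are notation introduced here for illustration (they do not appear elsewhere in this notebook): x_j = 1 means that subset S_j is selected, and each constraint requires that an element e of U lies in at least one selected subset.

```latex
\begin{aligned}
\min \quad & \sum_{j=1}^{n} x_j \\
\text{s.t.} \quad & \sum_{j \,:\, e \in S_j} x_j \ge 1 && \forall e \in U \\
& x_j \in \{0, 1\} && j = 1, \dots, n
\end{aligned}
```

The decision version asks whether a feasible x exists whose objective value is at most k.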


## Problem Generation


In [17]:
import numpy as np
from random import random
from functools import reduce
from collections import namedtuple
from queue import PriorityQueue, SimpleQueue

In [18]:
# constants
PROBLEM_SIZE = 5    # number of elements in the universal set U
NUMBER_SET = 10     # number of subsets in the collection S

# each subset is a boolean mask over U: element i belongs to it with probability 0.3
SETS = tuple(
    np.array([random() < .3 for _ in range(PROBLEM_SIZE)]) for _ in range(NUMBER_SET)
)

State = namedtuple('State', ['taken','not_taken'])

In [19]:
SETS

(array([ True, False, False, False, False]),
 array([False, False, False, False, False]),
 array([False, False, False,  True, False]),
 array([False, False, False, False,  True]),
 array([False, False, False,  True, False]),
 array([ True, False,  True, False, False]),
 array([False, False, False, False, False]),
 array([False, False, False, False,  True]),
 array([False, False, False, False,  True]),
 array([ True, False, False,  True,  True]))

In [20]:
def goal_check(state):
    """
    Check whether the logical OR of all taken sets yields an all-True array,
    i.e. whether the state covers the whole set U.
    """
    # return np.all(reduce(np.logical_or, [SETS[i] for i in state.taken], np.zeros(PROBLEM_SIZE)))

    # Create an array with all False values of the same size as the problem size
    result = np.zeros(PROBLEM_SIZE, dtype=bool)

    # Iterate through the selected sets and accumulate their logical OR.
    # The `out` parameter makes np.logical_or write into `result` in place,
    # which avoids allocating a new array on every iteration.
    for i in state.taken:
        np.logical_or(result, SETS[i], out=result)

    # Check if all elements in the result array are True
    return result.all()

## Solving with path search


The approach used here is not a plain breadth-first search (BFS) but a variant driven by a priority queue, sometimes called "best-first search":

- **PriorityQueue**: The code initializes `frontier` as a `PriorityQueue`, which always returns the smallest item according to the items' own ordering. A standard BFS instead processes states in the order they were added to the queue.

The code explores states by iteratively selecting subsets (actions) and generating a new state for each selection, much like a BFS, where all possible next states are considered. In a best-first search, the priority queue is what lets states with a lower estimated cost (or a better heuristic value) be explored first. Note that the code below puts bare `State` tuples in the queue without an explicit priority, so the extraction order falls back on Python's tuple and set comparison. Since `<` on sets means "proper subset", which is only a partial order, that order does not correspond to a cost.

If the PriorityQueue is replaced with a **SimpleQueue**, the code more closely resembles a proper breadth-first search (BFS). A SimpleQueue follows a first-in, first-out (FIFO) order, which is a fundamental characteristic of BFS.
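
The difference between the two queue types can be seen on a handful of made-up states. This is only a sketch: the states are hypothetical tuples of chosen subset indices, and the explicit priority `len(s)` (number of chosen subsets) is an assumption for illustration, not something the notebook's code does.

```python
from queue import PriorityQueue, SimpleQueue

# Hypothetical states: each is the tuple of chosen subset indices.
states = [(0, 3, 7), (2,), (1, 4), ()]

fifo = SimpleQueue()
best_first = PriorityQueue()
for s in states:
    fifo.put(s)
    # explicit priority: fewer chosen subsets first; the tuple itself breaks ties
    best_first.put((len(s), s))

fifo_order = [fifo.get() for _ in states]
priority_order = [best_first.get()[1] for _ in states]

print(fifo_order)      # [(0, 3, 7), (2,), (1, 4), ()]  (insertion order)
print(priority_order)  # [(), (2,), (1, 4), (0, 3, 7)]  (smallest first)

# `<` on sets tests for a proper subset, so unrelated sets are not ordered
print({1} < {2}, {2} < {1})  # False False
```

Pairing each state with a comparable priority, as in `(len(s), s)`, is one way to make the extraction order of a `PriorityQueue` well defined.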


In [21]:
frontier = PriorityQueue()
# frontier = SimpleQueue()

In [22]:
# initialize the frontier with the initial state, i.e. no sets have been taken yet
initial_state = State(set(), set(range(NUMBER_SET)))
frontier.put(initial_state)
print(initial_state)

State(taken=set(), not_taken={0, 1, 2, 3, 4, 5, 6, 7, 8, 9})


Because the sets are generated at random, some element of U may not belong to any subset (a column of all False in SETS), which makes it impossible to find a solution.

To guard against this, we can simply break the loop after a fixed number of iterations instead of searching forever.

As the bound we use the number of possible combinations: each of the 10 subsets can either be included (True) or excluded (False) from the solution, so there are 2 possibilities for each subset and 2^10 = 1024 combinations in total.

Note that the loop below counts iterations rather than distinct combinations, and the same combination can enter the frontier more than once, so 1024 is a cutoff rather than a proof that every combination has been checked.

It would be better to check solvability when SETS is generated, or to generate the sets in a different way.
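
A check at generation time could look like the following sketch. The helper names `is_solvable` and `generate_solvable_sets`, the `density`, `seed` and `max_tries` parameters, and the use of NumPy's random generator (instead of `random()`) are all assumptions made for this illustration; they are not part of the notebook.

```python
import numpy as np

def is_solvable(sets):
    """Return True if every element of U belongs to at least one subset."""
    # stack the boolean masks into a (number of sets, problem size) matrix
    matrix = np.vstack(sets)
    # an element can be covered only if its column contains at least one True
    return bool(matrix.any(axis=0).all())

def generate_solvable_sets(problem_size, number_set, density=0.3, seed=None, max_tries=1000):
    """Draw random subsets until the collection can cover U (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        sets = tuple(rng.random(problem_size) < density for _ in range(number_set))
        if is_solvable(sets):
            return sets
    raise RuntimeError("no solvable collection found; try a higher density")

sets = generate_solvable_sets(problem_size=5, number_set=10, seed=0)
print(is_solvable(sets))  # True by construction
```

With such a check, an unsolvable instance is detected before the search starts, so the search itself no longer needs an iteration cutoff for that purpose.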


In [23]:
# extract from beginning of frontier
current_state= frontier.get()

# loop until the current state is a solution
counter = 0
while not goal_check(current_state):
    # increment counter of iterations
    counter += 1

    # stop after the cutoff; the generated sets may be unable to cover U
    if counter > 1024:
        print("not solvable with the generated sets") # check SETS print for a col of all False
        break

    # for each not taken action in the current state
    for action in current_state.not_taken:
        # create a new state by taking that action from the not_taken subset and moving it to the taken subset
        newstate = State(current_state.taken ^ {action}, current_state.not_taken ^ {action} )
        # add each new state to the frontier
        frontier.put(newstate)

    # extract from beginning of frontier
    current_state = frontier.get()

else:
    # the else branch runs only when the loop ends without hitting the cutoff break
    print("Solved in", counter, "steps")

not solvable with the generated sets


In [24]:
# final state; it is a solution only if goal_check returns True
print(current_state)
goal_check(current_state)

State(taken={1, 2, 3, 4, 5, 6, 7, 8, 9}, not_taken={0})


False
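
For comparison, here is a minimal sketch of a plain BFS that remembers the combinations it has already queued and stops when the frontier is empty, so an unsolvable instance is reported without an iteration cutoff. The helpers `covers` and `bfs_set_cover`, the use of `frozenset` states, and the toy instance are assumptions for illustration, not the notebook's code.

```python
from collections import deque
import numpy as np

def covers(sets, taken):
    """True if the union of the taken subsets contains every element of U."""
    result = np.zeros(len(sets[0]), dtype=bool)
    for i in taken:
        result |= sets[i]
    return bool(result.all())

def bfs_set_cover(sets):
    """Return a minimum-size cover as a frozenset of indices, or None."""
    start = frozenset()
    frontier = deque([start])   # FIFO queue: smaller combinations come out first
    visited = {start}           # combinations that were already queued
    while frontier:
        taken = frontier.popleft()
        if covers(sets, taken):
            return taken
        for action in range(len(sets)):
            if action in taken:
                continue
            child = taken | {action}
            if child not in visited:
                visited.add(child)
                frontier.append(child)
    return None  # every combination was tried and none covers U

# Toy instance with 3 elements and 4 subsets (made up for this example)
toy_sets = (
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, True]),
    np.array([True, True, False]),
)
solution = bfs_set_cover(toy_sets)
print(sorted(solution))  # [0, 2]
```

Because BFS expands combinations in order of size, the first cover it finds has minimum cardinality; the `visited` set keeps the number of expanded states at or below 2^n for n subsets.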