## Set Covering greedy algorithm
Many real-world optimization problems can be expressed as variants or extensions of the set cover problem. We define a finite set of objects $U = \{x_1, \dots, x_m\}$ as the universe, and $S = \{s_1, \dots, s_n\}$ as a collection of subsets of $U$ such that every element of $U$ belongs to at least one set in $S$. A set cover is a subcollection $C \subseteq S$ whose union is $U$; given a cost for each set, the goal is to find a set cover of minimum total cost.


**input**: collection $S$ of sets over universe $U$, costs $c: S → \mathbb{R}_+$

**output**: set cover $C$

1. Let $C \gets \emptyset$;
2. While $C$ is not a set cover:
    1. find a set $s \in S$ maximizing the number of elements of $s$ not yet covered by any set in $C$, divided by its cost $c_s$;
    2. add $s$ to $C$;
3. Return $C$.
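The pseudocode translates almost line by line into Python. The following is a minimal sketch, assuming the collection is given as a dictionary from (hypothetical) set names to frozensets and the costs as a dictionary over the same names. Ties are broken by dictionary order, a choice the pseudocode leaves open.

```python
from typing import Dict, FrozenSet, Hashable, List


def greedy_set_cover(sets: Dict[str, FrozenSet[Hashable]],
                     cost: Dict[str, float]) -> List[str]:
    """Return the names of the sets chosen by the greedy ratio rule.

    Assumption: `sets` maps illustrative names to their elements and
    `cost` maps the same names to strictly positive costs.
    """
    universe = frozenset().union(*sets.values())
    covered: set = set()
    chosen: List[str] = []
    while covered != universe:
        # Pick the unchosen set with the most newly covered elements per unit cost.
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: len(sets[name] - covered) / cost[name],
        )
        # Add it to the cover and mark its elements as covered.
        chosen.append(best)
        covered |= sets[best]
    return chosen


sets = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({2, 4}),
    "C": frozenset({3, 4}),
    "D": frozenset({4, 5}),
}
cost = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
print(greedy_set_cover(sets, cost))  # ['A', 'D']
```

With unit costs the rule reduces to "pick the set covering the most new elements": `A` covers three, after which only `D` adds two new elements.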

Theoretical guarantees can be proved for the worst-case quality of the greedy solution. In particular, the following result was proved in [1,2,3]:

**Theorem** The greedy set-cover algorithm returns a set cover of cost at most $H(d)\cdot\mathrm{opt}$, where $\mathrm{opt}$ is the minimum cost of any set cover, $d = \max_{s \in S} |s|$ is the maximum set size, and $H(d) = \sum_{k=1}^{d} 1/k \approx \ln d + 0.577$ is the $d$-th harmonic number.
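The factor $H(d)$ is essentially attained by the greedy algorithm itself. The sketch below uses a standard weighted bad case, which is an illustration chosen here rather than an instance from [1,2,3]. The universe is $\{1, \dots, n\}$, each singleton $\{i\}$ costs $1/i$, and one extra set equal to the whole universe costs $1 + \varepsilon$. With $k$ elements still uncovered, the singleton $\{k\}$ has ratio $k$, while the whole universe has ratio $k/(1+\varepsilon) < k$. Greedy therefore buys every singleton and pays $H(n)$, whereas the optimum pays $1 + \varepsilon$ (here $d = n$). The helper names `harmonic`, `greedy_cost` and `tight_instance` are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple


def harmonic(d: int) -> float:
    """d-th harmonic number H(d) = 1 + 1/2 + ... + 1/d."""
    return sum(1.0 / k for k in range(1, d + 1))


def greedy_cost(sets: Sequence[FrozenSet[int]], cost: Sequence[float]) -> float:
    """Total cost paid by the greedy rule (uncovered elements / cost)."""
    universe = frozenset().union(*sets)
    covered: set = set()
    total = 0.0
    while covered != universe:
        i = max(range(len(sets)), key=lambda j: len(sets[j] - covered) / cost[j])
        covered |= sets[i]
        total += cost[i]
    return total


def tight_instance(n: int, eps: float = 0.01) -> Tuple[List[FrozenSet[int]], List[float]]:
    """Hypothetical bad case: singletons {i} of cost 1/i plus the universe at cost 1 + eps."""
    universe = frozenset(range(1, n + 1))
    sets = [frozenset({i}) for i in range(1, n + 1)] + [universe]
    cost = [1.0 / i for i in range(1, n + 1)] + [1.0 + eps]
    return sets, cost


n, eps = 10, 0.01
sets, cost = tight_instance(n, eps)
greedy = greedy_cost(sets, cost)  # equals H(n) up to floating-point rounding
opt = 1.0 + eps                   # buying the whole universe as a single set
print(greedy / opt, harmonic(n))  # the ratio approaches H(d) as eps -> 0
print(harmonic(1000) - math.log(1000))  # close to Euler's constant 0.5772...
```

The last line also checks the approximation $H(d) \approx \ln d + 0.577$ numerically; the gap $H(d) - \ln d$ decreases towards Euler's constant $\gamma \approx 0.5772$.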

The logarithmic approximation guarantee is the best possible in the following sense: if $\mathrm{P} \neq \mathrm{NP}$, then in the worst case no polynomial-time algorithm guarantees a cover of cost $o(\mathrm{opt} \cdot \log m)$, where $m = |U|$ is the number of elements to be covered.
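The greedy guarantee is within a constant factor of this lower bound. Since $1/k \le \int_{k-1}^{k} dx/x$ for every $k \ge 2$, and $d \le m$ because every $s \in S$ is a subset of $U$,

$$
H(d) = 1 + \sum_{k=2}^{d} \frac{1}{k} \le 1 + \int_{1}^{d} \frac{dx}{x} = 1 + \ln d \le 1 + \ln m,
$$

so the greedy cover costs at most $(1 + \ln m)\cdot\mathrm{opt} = O(\mathrm{opt} \cdot \log m)$.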

Bibliography

[1] V. Chvátal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4:233–235, 1979.

[2] D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278, 1974.

[3] L. Lovász. On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13:383–390, 1975.


```python
import numpy as np


# Greedy set covering algorithm.
# a: m x n 0/1 incidence matrix, a[j, i] == 1 iff element j belongs to subset i
# c: sequence of n strictly positive costs, c[i] being the cost of subset i
# Returns the chosen subsets (as sets of row indices), or None if no cover exists.
def set_cover(a, c):
    m, n_subset = a.shape
    # subset i = set of row indices with a 1 in column i
    subsets = [set(np.flatnonzero(a[:, i] == 1).tolist()) for i in range(n_subset)]
    # universe U = {0, ..., m-1}
    U = set(range(m))
    # if the union of all subsets misses some element, no cover exists
    if set().union(*subsets) != U:
        return None
    covered = set()
    visited_idx = set()
    C = []
    # greedily add the subset maximizing (number of elements not yet covered) / cost
    while covered != U:
        best_ratio, idx = 0.0, None
        for i, cur_set in enumerate(subsets):
            if i in visited_idx:
                continue
            ratio = len(cur_set - covered) / c[i]
            if ratio > best_ratio:
                best_ratio, idx = ratio, i
        covered.update(subsets[idx])
        C.append(subsets[idx])
        visited_idx.add(idx)
    return C
```
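Since `set_cover` only guarantees the $H(d)$ bound, it can help on small instances to compare its result with the true optimum $\mathrm{opt}$. Below is a minimal sketch of an exhaustive search, assuming the same conventions as `set_cover` (`a` is an $m \times n$ 0/1 incidence matrix, `c` the vector of $n$ costs). The function name `optimal_set_cover_cost` is hypothetical, and the search is exponential in $n$, so it is only meant for toy inputs.

```python
from itertools import combinations

import numpy as np


def optimal_set_cover_cost(a: np.ndarray, c) -> float:
    """Minimum total cost of a cover, by brute force over all column subsets.

    Returns float('inf') when the columns of `a` cannot cover every row.
    """
    n = a.shape[1]
    best = float("inf")
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            # The chosen columns form a cover iff every row has a 1 in one of them.
            if (a[:, list(cols)] == 1).any(axis=1).all():
                best = min(best, float(sum(c[j] for j in cols)))
    return best


# Column i covers rows i and i+1 (mod 4); two opposite columns suffice.
a = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
print(optimal_set_cover_cost(a, [1, 1, 1, 1]))  # 2.0
```

Running the greedy `set_cover` on the same inputs and summing the costs of the chosen columns gives a cost that, by the theorem above, is at most $H(d)$ times this value.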