# THIS IS THE SOLUTION FOR COMPUTATIONAL INTELLIGENCE LAB 01

> Author: `Daniel Bologna - s310582`
> - You can find the solution below.

Copyright **`(c)`** 2024 Giovanni Squillero `<giovanni.squillero@polito.it>`  
[`https://github.com/squillero/computational-intelligence`](https://github.com/squillero/computational-intelligence)  
Free for personal or classroom use; see [`LICENSE.md`](https://github.com/squillero/computational-intelligence/blob/master/LICENSE.md) for details.  

# Set Cover problem

See: https://en.wikipedia.org/wiki/Set_cover_problem

In [1]:
from random import random, seed
from itertools import product
import numpy as np

from icecream import ic

# EXTRA LIBRARIES (!plot ONLY purpose!)
from matplotlib import pyplot as plt
from tqdm.auto import tqdm
from itertools import accumulate

## Reproducible Initialization

If you want to get reproducible results, use `rng` (and restart the kernel); for non-reproducible ones, use `np.random`.
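As a small illustration of what "reproducible" means here, the sketch below builds two generators from the same seed sequence and checks that they produce identical draws. The seed values and the names `rng_a`, `rng_b`, `draws_a`, `draws_b` are made up for the example and are not part of the lab code.

```python
import numpy as np

# Hypothetical seed values, chosen only for this example
seed_values = [5, 10, 2000]

# Two independent generators built from the same seed sequence
rng_a = np.random.Generator(np.random.PCG64(seed_values))
rng_b = np.random.Generator(np.random.PCG64(seed_values))

# Same seed, same stream of numbers
draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
print(np.array_equal(draws_a, draws_b))
```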

In [2]:
UNIVERSE_SIZE = 5#100_000
NUM_SETS = 10#10_000
DENSITY = 0.2

rng = np.random.Generator(np.random.PCG64([UNIVERSE_SIZE, NUM_SETS, int(10_000 * DENSITY)]))

In [3]:
# DON'T EDIT THESE LINES!

SETS = np.random.random((NUM_SETS, UNIVERSE_SIZE)) < DENSITY
for s in range(UNIVERSE_SIZE):
    if not np.any(SETS[:, s]):
        SETS[np.random.randint(NUM_SETS), s] = True
COSTS = np.pow(SETS.sum(axis=1), 1.1)

## Helper Functions

In [20]:
def valid(solution):
    """Checks whether solution is valid (i.e., covers the whole universe)"""
    return np.all(np.logical_or.reduce(SETS[solution]))


def cost(solution):
    """Returns the cost of a solution (to be minimized)"""
    return COSTS[solution].sum()

## Have Fun!

In [None]:
# A dumb solution of "all" sets
solution = np.full(NUM_SETS, True)
valid(solution), cost(solution)

In [None]:
# A random solution with random 50% of the sets
solution = rng.random(NUM_SETS) < .5
valid(solution), cost(solution)

---

# Solution

## My helper functions

In [4]:
def valid(_current_solution : np.ndarray) -> bool:
    """
    Checks whether the _current_solution is valid (i.e., covers the whole universe).
    Args:
        _current_solution (np.ndarray): The current solution.
    Returns:
        bool: _current_solution validity.
    """
    return np.all(np.logical_or.reduce(SETS[_current_solution]))


def cost(_current_solution : np.ndarray) -> float:
    """
    Computes the cost of the current solution.
    Args:
        _current_solution (np.ndarray): The current solution.
    Returns:
        float: the sum of the costs of each set in the current solution.
    """
    return COSTS[_current_solution].sum()

## Implementation

### Initial values

In [11]:
MAX_STEPS : int = 10_000

starting_solution = rng.random(NUM_SETS) < .5

### Validity function

I use the same one provided in the lab.

In [41]:
def valid(solution):
    """Checks whether solution is valid (i.e., covers the whole universe)"""
    return np.all(np.logical_or.reduce(SETS[solution]))

### Cost function

Our goal is to find the cheapest collection of subsets S whose union covers the universe U. To do this, we need a fitness function that, when maximized, also minimizes the cost. Besides the cost as defined earlier, we can also re-think it as the number of subsets in our current solution.


Starting from the `cost function`, we can see that it computes the sum of the costs of the subsets selected in our current solution.
- the **cost of a single subset** is *the number of universe elements it contains, raised to the power of 1.1*.
- the **cost of a solution** is the sum of the costs of each subset included in the solution itself.
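The two definitions above can be checked on a tiny hand-made instance. This is only a sketch: `toy_sets`, `toy_costs` and `toy_solution` are hypothetical names and values, not the `SETS` and `COSTS` generated earlier.

```python
import numpy as np

# Toy instance (hypothetical): 3 subsets over a universe of 4 elements
toy_sets = np.array([
    [True, True, False, False],   # contains 2 elements
    [False, True, True, True],    # contains 3 elements
    [True, False, False, False],  # contains 1 element
])

# Cost of each subset: (number of elements it contains) ** 1.1
toy_costs = np.power(toy_sets.sum(axis=1), 1.1)

# A solution is a boolean mask over the subsets; here subsets 0 and 1 are taken
toy_solution = np.array([True, True, False])

# Cost of the solution: sum of the costs of the selected subsets
solution_cost = toy_costs[toy_solution].sum()

print(toy_costs)
print(solution_cost)
```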

If the cost of a solution is around `pow(UNIVERSE_SIZE, 1.1)`, we can expect it to contain roughly the right number of elements, **without knowing whether these elements actually cover the universe**.

In the end, we can re-think the cost function as a count of **how many times the universe's elements are chosen again by different subsets**. We then want to minimize this overlap.
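To make the overlap idea concrete, here is a small vectorized sketch on a hand-made instance. The helper `overlap` and the array `toy_sets` are hypothetical names for this example; the helper is assumed to be equivalent to counting, for each element, every cover beyond the first.

```python
import numpy as np

# Toy instance (hypothetical): 3 subsets over a universe of 4 elements
toy_sets = np.array([
    [True, True, False, False],
    [False, True, True, True],
    [True, False, False, True],
])

def overlap(sets: np.ndarray, solution: np.ndarray) -> int:
    """Counts the covers beyond the first one, summed over all elements."""
    # How many selected subsets contain each element of the universe
    counts = sets[solution].sum(axis=0)
    # An element covered k > 1 times contributes k - 1; others contribute 0
    return int(np.maximum(counts - 1, 0).sum())

all_sets = np.array([True, True, True])
two_sets = np.array([True, True, False])
no_sets = np.array([False, False, False])

print(overlap(toy_sets, all_sets))
print(overlap(toy_sets, two_sets))
print(overlap(toy_sets, no_sets))
```

With all three subsets selected, the per-element counts are `[2, 2, 1, 2]`, so three covers are redundant. With only the first two, just the second element is covered twice.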

In [42]:
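def cost(_current_solution : np.ndarray) -> int:
    """Counts how many times universe elements are covered more than once."""
    # Number of selected subsets covering each element of the universe
    counts = np.add.reduce(SETS[_current_solution])
    overlap = 0
    for x in counts:
        # Every cover beyond the first one counts as one overlap
        if x > 1:
            overlap += x - 1
    return overlap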

### Fitness function

The fitness is a tuple containing the following values:

- `valid(solution)`
- the number of universe elements covered by at least one chosen subset
- `-cost(solution)`

> Our objective is to maximize the number of universe elements covered by our subsets, and to maximize `-cost`, which minimizes the overlaps (elements taken more than once by different subsets).
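Python compares tuples lexicographically, so a fitness tuple of this shape ranks validity first, then coverage, then overlap. The sketch below uses made-up fitness values for a universe of 5 elements; all the names and numbers are hypothetical.

```python
# Hypothetical fitness tuples: (is_valid, covered_elements, -overlap)
invalid_small = (False, 3, 0)   # covers 3 elements, no overlap
invalid_large = (False, 4, -2)  # covers 4 elements, 2 overlaps
valid_messy = (True, 5, -6)     # covers everything, 6 overlaps
valid_clean = (True, 5, -1)     # covers everything, 1 overlap

candidates = [valid_messy, invalid_small, valid_clean, invalid_large]

# Sorting goes from worst to best fitness; max() picks the best one
ranked = sorted(candidates)
best = max(candidates)

print(ranked)
print(best)
```

Note that `invalid_large` beats `invalid_small` even though it has more overlaps, because coverage is compared before overlap.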

In [38]:
def fitness(_current_solution : np.ndarray) -> tuple[bool, int, int]:
    """Returns (validity, number of covered elements, -overlap cost)."""
    return (
        valid(_current_solution), 
        np.logical_or.reduce(SETS[_current_solution]).sum(), 
        -cost(_current_solution))

Let's check some values

In [39]:
sol = rng.random(NUM_SETS) < .3
next_sol = rng.random(NUM_SETS) < .3
is_greater = fitness(next_sol) > fitness(sol) 

ic(fitness(next_sol), fitness(sol), is_greater)

ic| fitness(next_sol): (np.False_, np.int64(1), 0)
    fitness(sol): (np.False_, np.int64(3), np.int64(-1))
    is_greater: np.False_


((np.False_, np.int64(1), 0),
 (np.False_, np.int64(3), np.int64(-1)),
 np.False_)

### Tweak function

In [12]:
def tweak(_current_solution : np.ndarray) -> np.ndarray:
    # Placeholder: currently returns the solution unchanged
    return _current_solution
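One common choice for a tweak in this kind of problem is to flip a single randomly chosen entry of the solution mask. The sketch below shows that idea only as an assumption about what a tweak could look like; `tweak_single_flip` and its `generator` parameter are hypothetical and not the function used in this notebook.

```python
import numpy as np

def tweak_single_flip(solution: np.ndarray, generator: np.random.Generator) -> np.ndarray:
    """Returns a copy of the solution with one random subset added or removed."""
    # Work on a copy so the caller's solution is left untouched
    new_solution = solution.copy()
    # Pick one subset index uniformly at random
    index = generator.integers(len(solution))
    # Flip it: take the subset if it was out, drop it if it was in
    new_solution[index] = not new_solution[index]
    return new_solution

# Usage on a toy mask of 6 subsets (seed chosen arbitrarily)
demo_rng = np.random.default_rng(0)
before = np.zeros(6, dtype=bool)
after = tweak_single_flip(before, demo_rng)
print(before, after)
```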

### Iterations

ic| fitness(s1): (np.False_, np.int64(2), 0)
    fitness(s2): (np.False_, np.int64(1), 0)
    fitness(s1) > fitness(s2): np.True_


((np.False_, np.int64(2), 0), (np.False_, np.int64(1), 0), np.True_)
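One way to iterate is a simple hill climber: tweak the current solution and keep the candidate whenever its fitness is not worse. The sketch below runs on a small randomly generated toy instance and is not necessarily the loop used in this notebook; `toy_sets`, `toy_fitness`, `toy_tweak`, the seed and the step count are all assumptions made for the example.

```python
import numpy as np

toy_rng = np.random.default_rng(42)  # arbitrary seed
universe_size, num_sets, density = 8, 12, 0.3

# Random toy instance; make sure every element belongs to at least one subset
toy_sets = toy_rng.random((num_sets, universe_size)) < density
for element in range(universe_size):
    if not toy_sets[:, element].any():
        toy_sets[toy_rng.integers(num_sets), element] = True

def toy_fitness(solution: np.ndarray) -> tuple[bool, int, int]:
    """Returns (validity, covered elements, -overlap) for the toy instance."""
    counts = toy_sets[solution].sum(axis=0)
    covered = int((counts > 0).sum())
    overlap = int(np.maximum(counts - 1, 0).sum())
    return (covered == universe_size, covered, -overlap)

def toy_tweak(solution: np.ndarray) -> np.ndarray:
    """Flips one randomly chosen entry of the solution mask."""
    new_solution = solution.copy()
    index = toy_rng.integers(num_sets)
    new_solution[index] = not new_solution[index]
    return new_solution

current = toy_rng.random(num_sets) < 0.5
current_fitness = toy_fitness(current)
start_fitness = current_fitness

for _ in range(1_000):
    candidate = toy_tweak(current)
    candidate_fitness = toy_fitness(candidate)
    # Accept the candidate when it is at least as good as the current one
    if candidate_fitness >= current_fitness:
        current, current_fitness = candidate, candidate_fitness

print(start_fitness, current_fitness)
```

Accepting equal fitness (`>=`) lets the search drift across plateaus; a strict `>` would be the more conservative alternative.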

### Plot

## Results!

| Instance | Universe size | Number of sets | Density | Time |
|----------|---------------|----------------|---------|------|
| 1        |               |        |         |      |
| 2        |               |        |         |      |
| 3        |               |        |         |      |
| 4        |               |        |         |      |
| 5        |               |        |         |      |
| 6        |               |        |         |      |