## General Info - 10/10/2023

Copyright (c) 2023 Davide Sferrazza

This notebook started from the template produced during the "Set Covering" lecture, available [here](https://github.com/squillero/computational-intelligence/blob/a62479c672c960a49c18532e1b6cc43d0c9eb01f/2023-24/set-covering.ipynb).

I extended the content, reorganized the code and created new functions implementing several search algorithms.

### Nomenclature
In what follows:
- a **boolean value** is a _**tile**_;
- an **array/list of booleans** is a _**set of tiles**_ or _**line of tiles**_.

#### Code Explanation

In [181]:
# import the needed libraries
from random import random
from functools import reduce
from collections import namedtuple
from itertools import count
from queue import PriorityQueue, SimpleQueue, LifoQueue
import math
import numpy as np

To model the problem, we represent a set of tiles as a boolean NumPy array of size `PROBLEM_SIZE`, in which each tile is present with probability `TILE_PROBABILITY`.

A state can be modeled as a tuple of two elements, each holding indices into `SETS`:
- the first element is the set of indices of the sets of tiles already taken;
- the second element is the set of indices of the sets of tiles not yet taken.

> Please note that with this representation, the state tuple already contains the path.
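
As a minimal sketch of this representation, the snippet below builds a state over a hypothetical toy instance with four sets and applies two actions. The helper `take` is an illustrative name, not part of the notebook; it mirrors the symmetric-difference trick used later when successors are generated.

```python
from collections import namedtuple

State = namedtuple("State", ["taken", "not_taken"])

# Hypothetical toy instance with 4 sets, identified by their indices 0..3.
start = State(taken=set(), not_taken={0, 1, 2, 3})


def take(state, action):
    """Return the successor obtained by moving set `action` from not_taken to taken."""
    # XOR with a singleton adds the element if absent and removes it if present.
    return State(state.taken ^ {action}, state.not_taken ^ {action})


after_two = take(take(start, 2), 0)
print(after_two)
```

Because `^` builds new sets, the original `start` state is left untouched, which is what allows many successors to be generated from the same parent.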

In [182]:
PROBLEM_SIZE = 5
NUM_SETS = 10
TILE_PROBABILITY = 0.3
SETS = tuple(np.array([random() < TILE_PROBABILITY for _ in range(PROBLEM_SIZE)]) for _ in range(NUM_SETS))
State = namedtuple("State", ["taken", "not_taken"])

The goal is reached if the taken sets of tiles, stacked and combined with a logical OR, collapse into a single line in which all `PROBLEM_SIZE` tiles are present.
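
A small worked example may help here. It uses a hypothetical fixed instance (`toy_sets`, 3 sets over 4 positions) rather than the random `SETS`, and a helper named `toy_covered` that is only an illustration of the same OR-reduction idea.

```python
from functools import reduce

import numpy as np

# Hypothetical instance: 3 sets over 4 tile positions.
toy_sets = (
    np.array([True, False, False, False]),
    np.array([False, True, True, False]),
    np.array([False, False, True, True]),
)


def toy_covered(taken):
    """OR together the chosen sets, starting from an all-False line."""
    return reduce(np.logical_or, [toy_sets[i] for i in taken], np.zeros(4, dtype=bool))


partial = toy_covered({0, 1})   # position 3 is still uncovered
full = toy_covered({0, 1, 2})   # every position is covered
print(partial, bool(np.all(partial)))
print(full, bool(np.all(full)))
```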

In [183]:
def covered(state):
    return reduce(
        np.logical_or,
        [SETS[i] for i in state.taken],
        np.array([False for _ in range(PROBLEM_SIZE)]),
    )

In [184]:
def goal_check(state):
    return np.all(covered(state))

We can check whether the randomly generated problem is solvable by considering a solution in which we take all sets of tiles.

In [185]:
assert goal_check(State(set(range(NUM_SETS)), set())), "Problem not solvable"

I define some functions that assign a value to a state. `get_progressive_number` ignores the state and returns an increasing counter, while the others are computed from the state's content.

In [186]:
def get_progressive_number(_=None, counter=count(1)):
    return next(counter)

In [187]:
def count_remaining_tiles(state):
    return PROBLEM_SIZE - sum(covered(state))

In [188]:
def count_number_taken_sets(state):
    return len(state.taken)

In [189]:
def sum_occupied_cells(state):
    return sum([sum(SETS[i]) for i in state.taken])

I define a function that implements the generic search algorithm. The user can specify which data structure to use as the frontier and which priority function to use.

If you use a priority queue as the frontier, the lower the priority value of a state, the earlier it is extracted for analysis.
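
The effect of the frontier choice can be sketched in isolation. The helper `drain` and the sample items below are illustrative assumptions; only the three queue classes come from the notebook's imports.

```python
from queue import LifoQueue, PriorityQueue, SimpleQueue


def drain(frontier, items):
    """Put all items into the frontier, then return them in extraction order."""
    for item in items:
        frontier.put(item)
    out = []
    while not frontier.empty():
        out.append(frontier.get())
    return out


items = [(3, "c"), (1, "a"), (2, "b")]
by_priority = drain(PriorityQueue(), items)  # lowest priority value first
fifo = drain(SimpleQueue(), items)           # insertion order, priority ignored
lifo = drain(LifoQueue(), items)             # reverse insertion order, priority ignored
print(by_priority, fifo, lifo, sep="\n")
```

This is why the same search loop behaves as breadth-first with a FIFO queue and as depth-first with a LIFO queue: in both cases the priority stored in the tuple plays no role.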

In [190]:
def generic_search(current_state, frontier=None, priority_function=get_progressive_number):
    if frontier is None:
        frontier = PriorityQueue()

    frontier.put((priority_function(current_state), current_state))

    counter = 0
    _, current_state = frontier.get()
    while not goal_check(current_state):
        counter += 1
        for action in current_state.not_taken:
            new_state = State(current_state.taken ^ {action}, current_state.not_taken ^ {action})
            frontier.put((priority_function(new_state), new_state))
        _, current_state = frontier.get()

    print(f"Solved in {counter:,} steps ({len(current_state.taken)} set of tiles)")
    print(f"Solution: {current_state}")

Given an initial state in which no set of tiles has been taken, we can perform different types of searches by simply changing the arguments of the `generic_search` function.

In [191]:
current_state = State(set(), set(range(NUM_SETS)))

In [192]:
print('-- Breadth-first search')
generic_search(current_state, SimpleQueue())
print('-- Depth-first search')
generic_search(current_state, LifoQueue())
print('-- Dijkstra with sum_occupied_cells as priority function (here for the sake of trying)')
generic_search(current_state, priority_function=sum_occupied_cells)
print('-- Dijkstra with count_number_taken_sets as priority function')
generic_search(current_state, priority_function=count_number_taken_sets)
print('-- Greedy Best-First search')
generic_search(current_state, priority_function=count_remaining_tiles)

-- Breadth-first search
Solved in 156 steps (3 set of tiles)
Solution: State(taken={0, 9, 7}, not_taken={1, 2, 3, 4, 5, 6, 8})
-- Depth-first search
Solved in 5 steps (5 set of tiles)
Solution: State(taken={5, 6, 7, 8, 9}, not_taken={0, 1, 2, 3, 4})
-- Dijkstra with sum_occupied_cells as priority function (here for the sake of trying)
Solved in 2,658 steps (4 set of tiles)
Solution: State(taken={8, 9, 2, 7}, not_taken={0, 1, 3, 4, 5, 6})
-- Dijkstra with count_number_taken_sets as priority function
Solved in 102 steps (3 set of tiles)
Solution: State(taken={9, 5, 6}, not_taken={0, 1, 2, 3, 4, 7, 8})
-- Greedy Best-First search
Solved in 3 steps (3 set of tiles)
Solution: State(taken={9, 1, 6}, not_taken={0, 2, 3, 4, 5, 7, 8})


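The following function implements iterative deepening: it repeats a depth-limited search with an increasing depth `level`, starting from 0 and stopping at `limit`. By default `limit` is infinite, so the depth keeps growing until a solution is found. The printed step count refers only to the last level, because `counter` is reset at every level.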
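
For comparison, here is a recursive sketch of the same idea on a hypothetical fixed instance. It is not the notebook's implementation: the names `toy_sets`, `is_goal`, `depth_limited` and `iterative_deepening` are assumptions made for illustration, and the function returns its result instead of printing it.

```python
import numpy as np

# Hypothetical fixed instance: 4 sets over 4 tile positions.
toy_sets = (
    np.array([True, True, False, False]),
    np.array([False, False, True, False]),
    np.array([False, False, False, True]),
    np.array([False, False, True, True]),
)


def is_goal(taken):
    """True when the OR of the taken sets covers every position."""
    covered = np.zeros(4, dtype=bool)
    for i in taken:
        covered |= toy_sets[i]
    return bool(covered.all())


def depth_limited(taken, limit):
    """Depth-first search that never takes more than `limit` sets."""
    if is_goal(taken):
        return taken
    if len(taken) == limit:
        return None
    for i in range(len(toy_sets)):
        if i not in taken:
            found = depth_limited(taken | {i}, limit)
            if found is not None:
                return found
    return None


def iterative_deepening(max_limit):
    """Retry the depth-limited search with limits 0, 1, ..., max_limit."""
    for limit in range(max_limit + 1):
        found = depth_limited(frozenset(), limit)
        if found is not None:
            return limit, found
    return None


result = iterative_deepening(len(toy_sets))
print(result)
```

Since the limit grows one unit at a time, the first solution found uses the smallest possible number of sets, at the price of re-exploring the shallow levels on every iteration.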

In [193]:
def depth_limited_search(current_state, limit=float('inf')):
    frontier = LifoQueue()
    initial_state = current_state

    level = 0

    while level <= limit:
        frontier.put(initial_state)
        counter = 0

        while not goal_check(current_state) and frontier.qsize() > 0:
            current_state = frontier.get()
            counter += 1
            for action in current_state.not_taken:
                new_state = State(current_state.taken ^ {action}, current_state.not_taken ^ {action})
                if len(new_state.taken) <= level:
                    frontier.put(new_state)

        if goal_check(current_state):
            break

        level += 1

    if goal_check(current_state):
        print(f"Solved in {counter:,} steps")
        print(f"Solution: {current_state}")
    else:
        print("Problem not solved. Try increasing the limit.")

In [194]:
print('-- Depth-limited search - limit: 3')
depth_limited_search(current_state, limit=3)
print('-- Depth-limited search - unbounded')
depth_limited_search(current_state)

-- Depth-limited search - limit: 3
Solved in 15 steps
Solution: State(taken={9, 5, 7}, not_taken={0, 1, 2, 3, 4, 6, 8})
-- Depth-limited search - unbounded
Solved in 15 steps
Solution: State(taken={9, 5, 7}, not_taken={0, 1, 2, 3, 4, 6, 8})


### Lab 01 - 19/10/2023

To implement $A^*$, we can simply modify the `generic_search` algorithm. The new function will receive as input the cost function $g(\cdot)$ and the heuristic function $h(\cdot)$. Each state $n$ will be inserted in the priority queue according to the value of $g(n) + h(n)$.
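
The ordering induced by $g(n) + h(n)$ can be illustrated with a few hand-picked values. The $(g, h)$ pairs and state names below are hypothetical, and the explicit `tie_breaker` is an assumption of this sketch: in the notebook, states with equal priority fall through to a comparison of the `State` tuples themselves.

```python
from itertools import count
from queue import PriorityQueue

# Hypothetical (g, h) pairs for three frontier states.
frontier_states = {"A": (2, 1), "B": (1, 3), "C": (0, 5)}
tie_breaker = count()  # assumption: insertion order breaks ties between equal f values

frontier = PriorityQueue()
for name, (g, h) in frontier_states.items():
    frontier.put((g + h, next(tie_breaker), name))

order = []
while not frontier.empty():
    f, _, name = frontier.get()
    order.append((name, f))
print(order)
```

State `A` is extracted first even though it has the largest cost so far, because its estimated total $f = g + h$ is the smallest.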

In [195]:
def a_star(current_state, cost_function=get_progressive_number, heuristic_function=get_progressive_number):
    frontier = PriorityQueue()
    frontier.put((cost_function(current_state) + heuristic_function(current_state), current_state))

    counter = 0
    _, current_state = frontier.get()
    while not goal_check(current_state):
        counter += 1
        for action in current_state.not_taken:
            new_state = State(current_state.taken ^ {action}, current_state.not_taken ^ {action})
            frontier.put((cost_function(new_state) + heuristic_function(new_state), new_state))
        _, current_state = frontier.get()

    print(f"Solved in {counter:,} steps ({len(current_state.taken)} set of tiles)")
    print(f"Solution: {current_state}")

If we choose $h$ as `count_remaining_tiles`, which is not an admissible heuristic, we obtain the following result:

In [196]:
print('-- A-star with count_remaining_tiles as heuristic function')
a_star(current_state, cost_function=count_number_taken_sets, heuristic_function=count_remaining_tiles)

-- A-star with count_remaining_tiles as heuristic function
Solved in 8 steps (3 set of tiles)
Solution: State(taken={9, 1, 7}, not_taken={0, 2, 3, 4, 5, 6, 8})


`count_remaining_tiles` is not admissible because it is pessimistic: given $n$ remaining tiles, it estimates that $n$ more sets of tiles are needed, i.e. that each set covers at most one new tile. A single set may cover several missing tiles, so the estimate can exceed the true cost. \
Using it as the heuristic in $A^*$ therefore does not guarantee an optimal solution. Nevertheless, with this function the algorithm terminates in very few steps and returns a good solution.
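
A tiny counterexample makes the overestimation concrete. The instance and the helper `remaining_tiles` are hypothetical; the helper only mimics what `count_remaining_tiles` computes.

```python
import numpy as np

# Hypothetical instance: one set covers all three tiles on its own.
toy_sets = (np.array([True, True, True]), np.array([True, False, False]))


def remaining_tiles(taken):
    """Same idea as count_remaining_tiles: number of still-uncovered positions."""
    covered = np.zeros(3, dtype=bool)
    for i in taken:
        covered |= toy_sets[i]
    return int(3 - covered.sum())


h_start = remaining_tiles(set())  # estimate from the empty state
true_cost = 1                     # taking set 0 alone reaches the goal
print(h_start, true_cost)
```

Here the heuristic claims three more sets are needed while one suffices, which is exactly the situation an admissible heuristic must never produce.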

To implement an $A^*$ algorithm which leads to an optimal solution, i.e. the solution with the minimum cost, while expanding the minimum number of nodes, we should use an admissible heuristic. \
The closer the heuristic is to the true cost, the faster the algorithm will converge to the optimum.

To build an admissible heuristic, given the current state, we count the missing tiles and, for each set of tiles not yet taken, the number of missing tiles it would cover; these counts are sorted in decreasing order. \
The heuristic returns the smallest number of not-yet-taken sets whose counts add up to at least the number of missing tiles. Since overlaps between sets are ignored, this never overestimates the number of sets still needed.
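
The following sketch evaluates this kind of bound on a hypothetical fixed instance. The names `toy_sets` and `lower_bound` are illustrative, and unlike the notebook's function the loop simply stops when the candidates run out.

```python
import numpy as np

# Hypothetical instance: 5 tile positions, 4 sets.
toy_sets = (
    np.array([True, True, True, False, False]),
    np.array([False, False, True, True, False]),
    np.array([False, False, False, False, True]),
    np.array([True, False, False, False, False]),
)


def lower_bound(taken):
    """Optimistic estimate of how many more sets are needed from this state."""
    covered = np.zeros(5, dtype=bool)
    for i in taken:
        covered |= toy_sets[i]
    missing = int((~covered).sum())
    if missing == 0:
        return 0
    # New tiles each untaken set would add, largest first.
    gains = sorted(
        (int((toy_sets[i] & ~covered).sum()) for i in range(len(toy_sets)) if i not in taken),
        reverse=True,
    )
    needed, total = 0, 0
    for gain in gains:
        needed += 1
        total += gain
        if total >= missing:
            break
    return needed


print(lower_bound(set()), lower_bound({0}))
```

From the empty state the gains are 3, 2, 1, 1, so the bound is 2, while the best real cover (sets 0, 1 and 2) needs 3 sets: sets 0 and 1 overlap on one tile, and the bound deliberately ignores that overlap.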

In [197]:
def admissible_heuristic(state):
    covered_tiles = covered(state)
    missing_tiles = count_remaining_tiles(state)
    if missing_tiles == 0:
        return 0
    candidates = sorted(
        (sum(np.logical_and(SETS[i], np.logical_not(covered_tiles))) for i in state.not_taken), reverse=True
    )
    taken = 1
    while sum(candidates[:taken]) < missing_tiles:
        taken += 1
    return taken

In [198]:
print('-- A-star with admissible_heuristic as heuristic function')
a_star(current_state, cost_function=count_number_taken_sets, heuristic_function=admissible_heuristic)

-- A-star with admissible_heuristic as heuristic function
Solved in 11 steps (3 set of tiles)
Solution: State(taken={0, 9, 7}, not_taken={1, 2, 3, 4, 5, 6, 8})


Here I reorganized the code and improved the solution proposed by the Professor on [24th October 2023](https://github.com/squillero/computational-intelligence/blob/2683313a93160547d1cb8ab1a275fcea039fe492/2023-24/set-covering_path-search.ipynb) by restricting the sort to the sets we have not yet taken. \
The Professor's solution sorts over all sets (`state.taken.union(state.not_taken)`), but for the taken sets `sum(np.logical_and(SETS[i], np.logical_not(covered_tiles)))` is always equal to zero, so they can simply be ignored.
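
The claim about taken sets can be checked on a small hypothetical instance. The names `toy_sets` and `gain` are assumptions of this sketch; `gain` evaluates the same expression quoted above on the toy data.

```python
import numpy as np

# Hypothetical instance: 3 sets over 4 tile positions, with sets 0 and 2 taken.
toy_sets = (
    np.array([True, True, False, False]),
    np.array([False, True, True, False]),
    np.array([False, False, False, True]),
)
taken, not_taken = {0, 2}, {1}

covered = np.zeros(4, dtype=bool)
for i in taken:
    covered |= toy_sets[i]


def gain(i):
    """Number of still-missing tiles that set i would cover."""
    return int(np.logical_and(toy_sets[i], np.logical_not(covered)).sum())


gains_taken = {i: gain(i) for i in taken}          # every value is zero
gains_not_taken = {i: gain(i) for i in not_taken}  # only these can be positive
print(gains_taken, gains_not_taken)
```

Every tile of a taken set is already part of `covered`, so its intersection with the uncovered positions is empty by construction.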