## 11.5 Searching subsets

For the TSP, the order in which places are visited impacts the total cost
and so the output must be a sequence. For other problems,
the output is a set and we must generate subsets instead of permutations of
a collection of items. Here's one such problem.

### 11.5.1 Problem

Many products (cars, dishwashers, etc.) are designed and manufactured
as product lines with a set of configurable features, to reduce costs.
Not all features are compatible with each other, e.g. the features
'petrol engine' and 'diesel engine' are mutually incompatible.

Given the features and the pairs of incompatible features,
how many different products can be made? This is another example of a CSP:
the constraints are given as incompatible feature pairs.

<div class="alert alert-info">
<strong>Info:</strong> This is an adaptation of problem
<a href="https://open.kattis.com/problems/geppetto">Geppetto</a>.
</div>

Here's one possible definition of the problem.
How each feature is represented (an integer, a string, etc.) is irrelevant.

**Function**: feasible products\
**Inputs**:
*features*, a set of objects; *incompatible*, a set of pairs of objects\
**Preconditions**:

- *features* isn't empty
- every pair in *incompatible* consists of two different objects in *features*
- if *incompatible* has feature pair (A, B), then it hasn't pair (B, A)

**Output**: *products*, an integer\
**Postconditions**: *products* is the number of non-empty subsets of *features* that don't contain a pair of objects in *incompatible*

The second precondition states that no feature is incompatible with itself.
The third precondition prevents redundant incompatibility information.
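
As a rough illustration, the three preconditions can be checked mechanically. The sketch below assumes that pairs are represented as 2-tuples; the function name `satisfies_preconditions` is made up for this example and isn't part of the problem definition.

```python
def satisfies_preconditions(features: set, incompatible: set) -> bool:
    """Return True if the inputs satisfy the three preconditions."""
    # first precondition: there's at least one feature
    if len(features) == 0:
        return False
    for (a, b) in incompatible:
        # second precondition: two different objects, both in features
        if a == b or a not in features or b not in features:
            return False
        # third precondition: the reversed pair isn't present
        if (b, a) in incompatible:
            return False
    return True


print(satisfies_preconditions({1, 2, 3}, {(1, 2)}))          # True
print(satisfies_preconditions({1, 2, 3}, {(1, 1)}))          # False
print(satisfies_preconditions({1, 2, 3}, {(1, 2), (2, 1)}))  # False
```

The second call fails because feature 1 is paired with itself, and the third because the same incompatibility is given twice.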

Let's think about some tests. What are the edge cases?

___

The smallest possible input is a single feature which therefore
has no incompatibilities.
Other edge cases (with more than one feature) are no incompatibilities at all
and all features being mutually incompatible.

To keep lines of code short, I represent features with integers instead of
strings. I use tuples to represent pairs.

In [1]:
from algoesup import check_tests

feasible_products_tests = [
    # case,             features,     incompatible, products
    ('smallest input',  {1},          set(),              1),
    ('all compatible',  {1, 2},       set(),              3),
    ('no compatible',   {1, 2},       {(1, 2)},           2),
    ('some compatible', {1, 2, 3, 4}, {(1, 2), (3, 4)},   8)
]

check_tests(feasible_products_tests, [set, set, int])


The 'all compatible' test has output 3 because there are
two products with one feature each and one product with both features.
The latter isn't a feasible product when the two features are incompatible,
so there are only two single-feature products for the 'no compatible' test.

Can you explain why there are eight feasible products for the last problem instance?

___

There are four single-feature products and another four products with
two features: one from the first incompatible pair and
another from the second pair.
Those four products have features 1 and 3, 1 and 4, 2 and 3, and 2 and 4.
There are no products with three or four features as they would include
two incompatible features.
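
If you want to double-check the count, here's a minimal sketch that lists the feasible products for this problem instance. It relies on `combinations` from the `itertools` module, which is introduced in Section 11.5.4. Sorting the features is my own addition, to make the listing order predictable.

```python
from itertools import combinations

features = {1, 2, 3, 4}
incompatible = {(1, 2), (3, 4)}

feasible = []
for size in range(1, len(features) + 1):
    for candidate in combinations(sorted(features), size):
        # keep the candidate if it doesn't contain both features of any pair
        if not any(a in candidate and b in candidate for (a, b) in incompatible):
            feasible.append(candidate)

print(len(feasible))  # 8
print(feasible)
```

The list has the four single-feature products followed by the four two-feature products mentioned above.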

### 11.5.2 Algorithm

The problem can be solved with an exhaustive search:
generate each non-empty subset of features and test whether it includes
a pair of incompatible features. If not, we found a subset of compatible
features and can increment the product counter.
To check if a candidate feature subset is a solution,
we make a linear search over all
pairs of incompatible features and test if both are in the current candidate.

The overall algorithm thus consists of one exhaustive search
(a linear search over the incompatible pairs) nested within
another exhaustive search (over the subsets of features).

In mathematics, a ***k*-combination** is a selection of *k* items
from a collection, without considering the order in which they were selected.
If the collection from which items are selected is a set,
then a *k*-combination is a subset of size *k*.
The 0-combination is the empty set.

For this problem, *k* is the number of features to be put in the product.
The algorithm has to generate and test all *k*-combinations,
for each *k* from 1 to the total number of features.

1. let *products* be 0
2. for each *k* from 1 to │*features*│:
   1. for each *product* that is a *k*-combination of *features*:
      1. if feasible(*product*, *incompatible*):
         1. let *products* be *products* + 1

Step&nbsp;2.1.1 tests the current product candidate with
an auxiliary Boolean function that does the linear search.

#### Exercise 11.5.1

Copy the above algorithm and change it so that
instead of the number of feasible products
it computes a largest set of mutually compatible features.
(This is known as the **maximum independent set** problem.)
The output variable should be called *compatible*.

_Write your answer here._

[Answer](../32_Answers/Answers_11_5_01.ipynb)

### 11.5.3 Complexity

When generating a subset, we have two choices for each item:
either we put it in the subset or we don't.
This means there are 2ⁿ subsets of a set with *n* items.

A simple rule of thumb is that $2^{10m} = (2^{10})^m = 1024^m$ is about
$1000^m$, so sets of 10, 20 and 30 items ($m$ is 1, 2, and 3)
have about a thousand, a million and a billion subsets, respectively.
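
The following sketch checks both claims numerically. It counts subsets by adding up the number of *k*-combinations for each *k* (using `math.comb`, available from Python 3.8), confirms that the total is 2ⁿ, and shows how close 2ⁿ is to the rule-of-thumb value. The helper name `subset_count` is made up for this example.

```python
from math import comb


def subset_count(n: int) -> int:
    """Count the subsets of an n-item set by adding up the k-combinations."""
    return sum(comb(n, k) for k in range(n + 1))


for m in (1, 2, 3):
    n = 10 * m
    assert subset_count(n) == 2 ** n  # both ways of counting agree
    ratio = 2 ** n / 1000 ** m
    print(f"n = {n}: {2 ** n:,} subsets, {ratio:.3f} times 1000^{m}")
```

The ratio grows from about 1.02 to about 1.07, so the rule of thumb slightly underestimates the number of subsets, but it's good enough for a quick estimate.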

The algorithm does a linear search over the incompatible pairs for each subset.
The worst-case complexity is therefore Θ(2ⁿ × │*incompatible*│),
with *n* = │*features*│.
The complexity is said to be **exponential** when it's of the form *c*ⁿ, with
*c* some constant greater than one and *n* the size of one of the inputs.

<div class="alert alert-info">
<strong>Info:</strong> MU123 Unit&nbsp;13 and MST124 Unit&nbsp;3 Section&nbsp;4 introduce exponential functions.
</div>

Exponential algorithms with *c* = 2 become slow very fast,
but not as fast as factorial algorithms,
which take eons to finish even for very small inputs,
assuming the hardware would last that long.
Here's a comparison of several functions.

n  |  n² |    n³ |        2ⁿ | n!
-:|-:|-:|-:|-:
 0 |    0 |      0 |          1 |                                  1
 5 |   25 |    125 |         32 |                                120
10 |  100 |  1,000 |      1,024 |                          3,628,800
15 |  225 |  3,375 |     32,768 |                  1,307,674,368,000
20 |  400 |  8,000 |  1,048,576 |          2,432,902,008,176,640,000
25 |  625 | 15,625 | 33,554,432 | 15,511,210,043,330,985,984,000,000

For example, if generating and testing one subset of features takes 1&nbsp;ms, then
the exponential algorithm takes about 33.5 thousand seconds (that's over 9&nbsp;hours!)
for 25 features, a rather small input value.
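
Here's the arithmetic behind that estimate, as a small sketch. The 1&nbsp;ms per subset is the assumption stated above, not a measurement.

```python
MS_PER_HOUR = 60 * 60 * 1000  # milliseconds in an hour

subsets = 2 ** 25             # subsets of 25 features, including the empty one
seconds = subsets / 1000      # total time at 1 ms per subset
hours = subsets / MS_PER_HOUR
print(f"{seconds:,.0f} seconds, about {hours:.1f} hours")
# prints: 33,554 seconds, about 9.3 hours
```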

If generating one tour and computing its cost also takes 1&nbsp;ms, then the
factorial exhaustive search algorithm for the TSP in the previous section takes

In [2]:
from math import factorial

MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # milliseconds in a year
print(factorial(25) // MS_PER_YEAR // 1000**3, "billion years")

491857 billion years


to find the best tour to visit 25 places and return to the start place.

<div class="alert alert-warning">
<strong>Note:</strong> Algorithms with best-case exponential complexity can only be applied
to very small input values.
Algorithms with best-case factorial complexity are practically useless.
</div>

### 11.5.4 Code

To generate subsets, we'll use another function from the `itertools` module:
`combinations`. It takes a collection of items and an integer *k*,
and generates one by one all *k*-combinations of those items.
Each combination is represented with a tuple although conceptually it's a set.
Here's a simple example.

In [3]:
from itertools import combinations

items = {"some", "words"}
for size in range(len(items) + 1):
    for subset in combinations(items, size):
        print(subset)

()
('some',)
('words',)
('some', 'words')
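
Python sets have no guaranteed iteration order, so on your machine the words may appear in a different order from the output above. If a predictable order matters, one option (a sketch, not the only way) is to sort the items before generating the combinations. Each tuple can also be converted to a set if you need set operations on it.

```python
from itertools import combinations

items = {"some", "words"}
# sorted() fixes the order of generation; set() turns each tuple into a set
subsets = [
    set(subset)
    for size in range(len(items) + 1)
    for subset in combinations(sorted(items), size)
]
print(subsets)
```

The list always starts with the empty set and ends with the whole set, because the combinations are generated by increasing size.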


Let's implement the algorithm above.

In [4]:
from itertools import combinations
from algoesup import test


def feasible_products(features: set, incompatible: set) -> int:
    """Return the number of subsets of features without incompatibilities.

    Preconditions:
    - len(features) > 0
    - incompatible is a set of pairs of distinct elements of features
    - if pair (a, b) is in incompatible, pair (b, a) isn't
    """

    def feasible(product: tuple) -> bool:
        """Check if product hasn't two incompatible features."""
        for pair in incompatible:
            if pair[0] in product and pair[1] in product:
                return False
        return True

    products = 0
    for size in range(1, len(features) + 1):
        for product in combinations(features, size):
            if feasible(product):
                products = products + 1
    return products


test(feasible_products, feasible_products_tests)

Testing feasible_products...
Tests finished: 4 passed (100%), 0 failed.
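
As noted earlier, features don't have to be integers. The sketch below is a variant of the function above, applied to string features. The name `count_feasible` and the 'sunroof' feature are made up for this example. Instead of looking up each feature in a tuple, this variant converts each product and each pair to a set and tests whether the pair is a subset of the product.

```python
from itertools import combinations


def count_feasible(features: set, incompatible: set) -> int:
    """Variant of feasible_products that tests pairs with set operations."""
    pairs = [set(pair) for pair in incompatible]
    count = 0
    for size in range(1, len(features) + 1):
        for product in combinations(features, size):
            chosen = set(product)
            # feasible if no incompatible pair is a subset of the product
            if not any(pair <= chosen for pair in pairs):
                count = count + 1
    return count


car_features = {"petrol engine", "diesel engine", "sunroof"}
car_incompatible = {("petrol engine", "diesel engine")}
print(count_feasible(car_features, car_incompatible))  # 5
```

Three features have seven non-empty subsets. The two subsets that include both engines are infeasible, which leaves five products.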


#### Exercise 11.5.2

The **0/1 knapsack** problem, another classic optimisation problem,
goes as follows.
Given a set of items as weight–value pairs,
and given the largest weight a rucksack can carry without bursting,
find the highest-valued subset of items that can be packed.

The name of the problem comes from the fact that the solution has
0 or 1 of each item.

Outline an algorithm.

_Write your answer here._

[Hint](../31_Hints/Hints_11_5_02.ipynb)
[Answer](../32_Answers/Answers_11_5_02.ipynb)
