## 22.7 Back to the knapsack

To recap, the 0/1 knapsack problem is as follows.

**Function**: 0/1 knapsack\
**Inputs**: *items*, a set of pairs of integers; *capacity*, an integer\
**Preconditions**:
no integer is negative; each pair represents a weight and a value\
**Output**: *packed*, a set of pairs of integers\
**Postconditions**:

- *packed* is a subset of *items*
- the sum of the weights in *packed* doesn't exceed *capacity*
- the sum of the values in *packed* is largest among all sets satisfying the previous two conditions
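
The postconditions can also be written as a small optimisation model. This is only a sketch in notation that isn't used elsewhere in this section: I assume $n$ items, with $w_i$ and $v_i$ standing for the weight and value of item $i$, $C$ for the capacity, and $x_i$ being 1 if item $i$ is packed and 0 otherwise.

$$
\max \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad
\sum_{i=1}^{n} w_i x_i \le C,
\quad x_i \in \{0, 1\}
$$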

I'll use the following example:

In [1]:
ITEMS = {(1, 2), (2, 3), (3, 4), (4, 20), (5, 30)}

If the knapsack has a capacity of 4, it can hold
the first and second items (weight 1 + 2 = 3, value 2 + 3 = 5) or
the first and third items (weight 1 + 3 = 4, value 2 + 4 = 6)
or the fourth item (weight 4, value 20).
The last of these is the desired output, because it has the largest value.
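
The arithmetic above can be checked with a few lines of code. This is just a sketch: the helper `totals` and the names `CAPACITY` and `packings` are my own, introduced only for this check.

```python
# Items as (weight, value) pairs, copied from the example above.
ITEMS = {(1, 2), (2, 3), (3, 4), (4, 20), (5, 30)}
CAPACITY = 4

# The three packings discussed in the text.
packings = [{(1, 2), (2, 3)}, {(1, 2), (3, 4)}, {(4, 20)}]


def totals(packed: set) -> tuple:
    """Return the total weight and the total value of the packed items."""
    weight = sum(item[0] for item in packed)
    value = sum(item[1] for item in packed)
    return (weight, value)


for packed in packings:
    weight, value = totals(packed)
    print(sorted(packed), "weight", weight, "value", value, "fits:", weight <= CAPACITY)
```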

Let's follow the same procedure as before to solve this problem,
with 'stop and think' lines for you to think along.

### 22.7.1 The problem

The knapsack problem is obviously an optimisation problem on subsets of items,
where the value of the items in the subset is the quantity being maximised.
Each candidate is the subset of items already put in the knapsack and
the corresponding extensions are the items yet to consider.

1. What are the global and local constraints for this problem?
2. Can partial candidates be solutions or only complete candidates?

___

1. The only constraint is that the weight of the items in the subset
   cannot exceed the knapsack's capacity. It's a local constraint because
   it can be checked as each item to be added is considered.
2. As mentioned in the previous section, although partial candidates are
   subsets too, the solutions are the complete candidates,
   after all items have been considered.

We can start implementing the solution, from the auxiliary functions towards
the backtracking and main functions.

### 22.7.2 The value function

Every optimisation problem needs a function to compute the value of a solution,
so let's start with that.
As usual, I define constants for the indices of pairs,
to make the code more readable.

In [2]:
WEIGHT = 0
VALUE = 1


def value(items: set) -> int:
    """Return the total value of the items."""
    total = 0
    for item in items:
        total = total + item[VALUE]
    return total
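
As an aside, the same total can be computed with Python's built-in `sum`. The sketch below uses a hypothetical name, `value_with_sum`, so that it doesn't replace the function above; the rest of this section keeps using `value`.

```python
VALUE = 1  # index of the value in a (weight, value) pair


def value_with_sum(items: set) -> int:
    """Return the total value of the items, using the built-in sum."""
    return sum(item[VALUE] for item in items)


print(value_with_sum({(1, 2), (3, 4)}))  # adds the values 2 and 4
print(value_with_sum(set()))  # the empty set has no values to add
```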

### 22.7.3 The constraints functions

Next are the auxiliary functions to check the global and local constraints.
Looking at the answers to the earlier questions, how does this problem differ
from the other problems in this chapter?

___

There are no global constraints. Any candidate that satisfies the
local constraint (fits the capacity) is a solution, but perhaps not a best one.

This means we don't need a `satisfies_global` function.
We must only check if an item can extend a candidate
(the items already in the knapsack) towards
a better solution than the current one. Let's break that down into two parts.

How can we check whether an item can extend a candidate towards a solution?
Or put differently, how do we know if there's no point in extending
the given candidate with the given item?

___

If the weight of the item plus the weight of the candidate exceeds the capacity,
then the item shouldn't be added to the knapsack.

Now let's assume that an item can extend a candidate towards a solution.
Is there a way to know it won't lead to a better solution than the current best?
(Hint: does extending a candidate worsen its value?)

___

The problem definition above states that no integer is negative,
so adding an item never lowers the value of a candidate.
If an item *can* extend a candidate, we *must* try extending it, because
the new item may lead to a better (i.e. higher value) solution.
The value therefore gives us no way of pruning the search space early.

In summary, an item can extend a candidate if
the sum of their weights doesn't exceed the capacity.

In [3]:
def can_extend(item: tuple, candidate: set, capacity: int) -> bool:
    """Check if adding the item to candidate won't exceed the capacity."""
    total = item[WEIGHT]
    for another_item in candidate:
        total = total + another_item[WEIGHT]
    return total <= capacity
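
Here's a quick check of the function on two cases taken from the example. The function is repeated so that the snippet runs on its own; the variable names `fits` and `too_heavy` are mine.

```python
WEIGHT = 0


def can_extend(item: tuple, candidate: set, capacity: int) -> bool:
    """Check if adding the item to candidate won't exceed the capacity."""
    total = item[WEIGHT]
    for another_item in candidate:
        total = total + another_item[WEIGHT]
    return total <= capacity


fits = can_extend((3, 4), {(1, 2)}, 4)  # weights 3 + 1 = 4: exactly at capacity
too_heavy = can_extend((3, 4), {(2, 3)}, 4)  # weights 3 + 2 = 5: over capacity
print(fits, too_heavy)
```

An item that brings the total weight to exactly the capacity is accepted, because the comparison is `<=` rather than `<`.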

### 22.7.4 The backtracking function

Here's the template for the backtracking function for
optimisation problems on sets.
The `instance` variable stands for the inputs of the problem,
which are passed on to the auxiliary functions.
```python
SOLUTION = 0
VALUE = 1

def extend(candidate: set, extensions: list, instance: object, best: list) -> None:
    """Update best if candidate is a better solution, then try to extend it."""
    print('Visiting node', candidate, extensions)
    if len(extensions) == 0:
        if satisfies_global(candidate, instance):
            candidate_value = value(candidate, instance)
            if candidate_value == best[VALUE]: # replace == with < or >
                print('New best with value', candidate_value)
                best[SOLUTION] = candidate
                best[VALUE] = candidate_value
    else:
        item = extensions[0]
        rest = extensions[1:]
        if can_extend(item, candidate, instance, best):
            extend(candidate.union({item}), rest, instance, best)
        extend(candidate, rest, instance, best)
```
Let's think what changes are needed for the knapsack problem.

Which comparison should be used: less than or greater than?

___

It's a maximisation problem so the candidate is the new best if
its value is greater than the current one.

Looking at the auxiliary functions written before,
can any code or parameters be removed?

___

Yes, the `value` function only needs the candidate parameter,
the `can_extend` function only needs the item, candidate and capacity,
and the `satisfies_global` function isn't needed at all.

Finally, given the previous changes, what should the `instance: object`
parameter in the function header be replaced with?

___

It should be `capacity: int`, which is needed by `can_extend`.

Here's the backtracking function for the knapsack problem.

In [4]:
SOLUTION = 0
VALUE = 1


def extend(candidate: set, extensions: list, capacity: int, best: list) -> None:
    """Update best if candidate is a better solution, then try to extend it."""
    print("Visiting node", candidate, extensions)
    if len(extensions) == 0:
        candidate_value = value(candidate)
        if candidate_value > best[VALUE]:
            print("New best with value", candidate_value)
            best[SOLUTION] = candidate
            best[VALUE] = candidate_value
    else:
        item = extensions[0]
        rest = extensions[1:]
        if can_extend(item, candidate, capacity):
            extend(candidate.union({item}), rest, capacity, best)
        extend(candidate, rest, capacity, best)

### 22.7.5 The main function

We finally have to think of what the main function needs to do.

What are the candidate and the extensions of the root node?

___

The initial candidate is the empty set and the extensions are
all the items given in the input, in a sequence.

What is a possible initial best solution?

___

We need a solution that can be easily constructed.
The only one I could think of is the empty set:
it's a solution of every problem instance though hardly ever the best one
(unless no item fits in the knapsack).

The main function is thus:

In [5]:
def knapsack(items: set, capacity: int) -> list:
    """Return a subset of items and their total value.

    Preconditions:
    - items is a set of weight-value pairs, both integers
    - no integer is negative
    Postconditions:
    - the output is a set-integer pair
    - total weight of the output items <= capacity
    - no other subset of items has higher value and fits the capacity
    """
    candidate = set()
    extensions = list(items)
    solution = set()
    best = [solution, value(solution)]
    extend(candidate, extensions, capacity, best)
    return best

Let's solve the example at the start of the notebook.

In [6]:
knapsack(ITEMS, 4)

Visiting node set() [(2, 3), (1, 2), (3, 4), (5, 30), (4, 20)]
Visiting node {(2, 3)} [(1, 2), (3, 4), (5, 30), (4, 20)]
Visiting node {(2, 3), (1, 2)} [(3, 4), (5, 30), (4, 20)]
Visiting node {(2, 3), (1, 2)} [(5, 30), (4, 20)]
Visiting node {(2, 3), (1, 2)} [(4, 20)]
Visiting node {(2, 3), (1, 2)} []
New best with value 5
Visiting node {(2, 3)} [(3, 4), (5, 30), (4, 20)]
Visiting node {(2, 3)} [(5, 30), (4, 20)]
Visiting node {(2, 3)} [(4, 20)]
Visiting node {(2, 3)} []
Visiting node set() [(1, 2), (3, 4), (5, 30), (4, 20)]
Visiting node {(1, 2)} [(3, 4), (5, 30), (4, 20)]
Visiting node {(1, 2), (3, 4)} [(5, 30), (4, 20)]
Visiting node {(1, 2), (3, 4)} [(4, 20)]
Visiting node {(1, 2), (3, 4)} []
New best with value 6
Visiting node {(1, 2)} [(5, 30), (4, 20)]
Visiting node {(1, 2)} [(4, 20)]
Visiting node {(1, 2)} []
Visiting node set() [(3, 4), (5, 30), (4, 20)]
Visiting node {(3, 4)} [(5, 30), (4, 20)]
Visiting node {(3, 4)} [(4, 20)]
Visiting node {(3, 4)} []
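Visiting node set() [(5, 30), (4, 20)]
Visiting node set() [(4, 20)]
Visiting node {(4, 20)} []
New best with value 20
Visiting node set() []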

[{(4, 20)}, 20]

Compared to an exhaustive search that generates all 2⁵ = 32&nbsp;subsets
of the five items and tests each subset for whether it fits the knapsack and
has a better value, the backtracking approach only generates seven subsets
(those with empty extension sequences in the printout).
However, many partial candidates are generated.
Fortunately, there's a way to further prune the search space.
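
For comparison, the exhaustive search mentioned above could be sketched as follows. This isn't code from this chapter: the function name `exhaustive_knapsack` and the extra count in its output are my own assumptions, and I use `itertools.combinations` to generate the subsets size by size.

```python
from itertools import combinations

ITEMS = {(1, 2), (2, 3), (3, 4), (4, 20), (5, 30)}


def exhaustive_knapsack(items: set, capacity: int) -> list:
    """Return a best subset, its value and the number of subsets generated."""
    ordered = sorted(items)
    best = [set(), 0]  # the empty set is always a solution
    generated = 0
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            generated = generated + 1
            weight = sum(item[0] for item in subset)
            total_value = sum(item[1] for item in subset)
            # Test every subset: does it fit and is it better than the best so far?
            if weight <= capacity and total_value > best[1]:
                best = [set(subset), total_value]
    return [best[0], best[1], generated]


print(exhaustive_knapsack(ITEMS, 4))
```

Every subset is generated and tested, even those that extend a subset already known to be too heavy, which is the work that backtracking avoids.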

### 22.7.6 Sort extensions

When searching a store for products below £20,
we [sorted the products](../11_Search/11_1_linear.ipynb#11.1.3-Sorted-candidates)
by ascending price. That allowed us to stop the linear search as soon as
we found a product costing £20 or more. We can apply the same idea here.

Let's sort the items by ascending weight.
If adding the current item to a candidate exceeds the capacity,
so will adding any subsequent item in the extensions sequence,
because they weigh even more.

At the moment, if adding an item exceeds the capacity, we skip only *that* item.
Sorting the items by ascending weight allows us to skip *all* the remaining
extensions too: a massive reduction in the search space.

Here's the new main function. Items are weight–value pairs, so Python's
lexicographic sorting of tuples puts them in ascending order of weight.
I don't repeat the docstring.

In [7]:
def knapsack(items: set, capacity: int) -> list:  # noqa: D103
    candidate = set()
    extensions = sorted(items)  # changed line
    solution = set()
    best = [solution, value(solution)]
    extend(candidate, extensions, capacity, best)
    return best
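
The effect of sorting tuples can be checked directly. The second set of pairs and the name `by_weight` are made up for this illustration; the `key` argument is a sketch of a more explicit alternative, not what the function above uses.

```python
ITEMS = {(1, 2), (2, 3), (3, 4), (4, 20), (5, 30)}

# Tuples are compared element by element, so the weight (index 0) decides the
# order and the value (index 1) only breaks ties between equal weights.
print(sorted(ITEMS))
print(sorted({(2, 9), (1, 5), (2, 1)}))

# An explicit alternative that sorts on the weight only.
by_weight = sorted(ITEMS, key=lambda item: item[0])
print(by_weight == sorted(ITEMS))
```

Either way, no item in the resulting list is lighter than the items before it, which is all the pruning idea needs.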

If a partial candidate can't be extended because all remaining extensions
go over the capacity, the candidate may still be the best solution so far.
This means we have to check partial candidates against the best solution,
not just complete candidates. Here's the new backtracking function.

In [8]:
def extend(candidate: set, extensions: list, capacity: int, best: list) -> None:
    """Update best if candidate is a better solution, then try to extend it."""
    print("Visiting node", candidate, extensions)
    candidate_value = value(candidate)
    if candidate_value > best[VALUE]:
        print("New best with value", candidate_value)
        best[SOLUTION] = candidate
        best[VALUE] = candidate_value
    if len(extensions) > 0:  # changed line
        item = extensions[0]
        rest = extensions[1:]
        if can_extend(item, candidate, capacity):
            extend(candidate.union({item}), rest, capacity, best)
            extend(candidate, rest, capacity, best)  # changed line

Notice the changes to the previous version.

- I remove the check at the beginning for a complete candidate (no extensions),
  because now partial candidates can be solutions.
- I check if there are any extensions before I look at the next item.
- I indent the last line of code, the one which skips the item.

The last change is subtle but profound.
It implements the skipping of all further extensions
if the current one can't extend the candidate.
Let's see the impact on the search space.

In [9]:
knapsack(ITEMS, 4)

Visiting node set() [(1, 2), (2, 3), (3, 4), (4, 20), (5, 30)]
Visiting node {(1, 2)} [(2, 3), (3, 4), (4, 20), (5, 30)]
New best with value 2
Visiting node {(2, 3), (1, 2)} [(3, 4), (4, 20), (5, 30)]
New best with value 5
Visiting node {(1, 2)} [(3, 4), (4, 20), (5, 30)]
Visiting node {(1, 2), (3, 4)} [(4, 20), (5, 30)]
New best with value 6
Visiting node {(1, 2)} [(4, 20), (5, 30)]
Visiting node set() [(2, 3), (3, 4), (4, 20), (5, 30)]
Visiting node {(2, 3)} [(3, 4), (4, 20), (5, 30)]
Visiting node set() [(3, 4), (4, 20), (5, 30)]
Visiting node {(3, 4)} [(4, 20), (5, 30)]
Visiting node set() [(4, 20), (5, 30)]
Visiting node {(4, 20)} [(5, 30)]
New best with value 20
Visiting node set() [(5, 30)]


[{(4, 20)}, 20]

The search space has halved:
only 13 of the previous 26 nodes are created and visited.
For example, partial candidate {(1, 2), (2, 3)} is not extended because
any further item exceeds the capacity.
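
One way to check such node counts without reading the printouts is to count the visits instead of printing them. The sketch below is my own instrumentation, not part of the solution: `fits`, `count_nodes`, the `prune` flag and `first_order` are hypothetical names, and the unsorted order is copied from the first printout, because the number of nodes visited without sorting depends on the order of the extensions.

```python
WEIGHT = 0


def fits(item: tuple, candidate: set, capacity: int) -> bool:
    """Check if adding the item to candidate won't exceed the capacity."""
    return item[WEIGHT] + sum(other[WEIGHT] for other in candidate) <= capacity


def count_nodes(candidate: set, extensions: list, capacity: int, prune: bool) -> int:
    """Return the number of nodes visited from this node downwards."""
    if len(extensions) == 0:
        return 1
    item = extensions[0]
    rest = extensions[1:]
    visited = 1  # this node
    if fits(item, candidate, capacity):
        visited = visited + count_nodes(candidate | {item}, rest, capacity, prune)
        visited = visited + count_nodes(candidate, rest, capacity, prune)
    elif not prune:
        # First version: skip only this item and carry on with the rest.
        visited = visited + count_nodes(candidate, rest, capacity, prune)
    # With prune set, an item that doesn't fit ends the search from this node,
    # which is only safe if the extensions are sorted by ascending weight.
    return visited


first_order = [(2, 3), (1, 2), (3, 4), (5, 30), (4, 20)]  # as in the first printout
print(count_nodes(set(), first_order, 4, False))  # first version, unsorted
print(count_nodes(set(), sorted(first_order), 4, True))  # sorted, with pruning
```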

<div class="alert alert-warning">
<strong>Note:</strong> If possible, order the extensions sequence so that
if one item in the sequence can't extend a candidate, none of the following can.
</div>

#### Exercise 22.7.1

The original problem asks for *any* subset of the items that maximises the value
and fits the knapsack. Imagine we add one postcondition:
the returned subset should be as small as possible, i.e. we want to pack
the fewest items that maximise the value and fit in the knapsack.

For example, let the items be {(1, 2), (2, 3), (4, 5)} and the capacity be 4.
The largest possible value is 5 and can be obtained in two ways:
pack the two items {(1, 2), (2, 3)} or pack the single item {(4, 5)}.
Either of these two subsets is a solution to the original problem but only
the latter subset is a solution for the new problem, as it has fewer items.

What changes would be required to the `extend` function?

_Write your answer here._

[Hint](../31_Hints/Hints_22_7_01.ipynb)
[Answer](../32_Answers/Answers_22_7_01.ipynb)
