# Knapsack Problem
#### Solving the classic knapsack problem: given a list of item weights and values, and a knapsack with a maximum weight capacity, which items should be put in the knapsack to maximize the total value?
Several algorithms are presented below:
- Exhaustive brute force
- Dynamic programming
- Branch and bound
- Branch and bound with linear relaxation
- OR-Tools' knapsack solver

I present an example with 19 items. As can be seen, all algorithms give the same answer. However, not all of these methods are scalable: in general, the further down the page an algorithm appears, the more efficient it is. To exemplify this, two examples are given at the very bottom of the page, using branch and bound with linear relaxation and OR-Tools to solve a knapsack problem with 10,000 items.

In [2]:
import numpy as np
import pandas as pd

import itertools

from progressbar import ProgressBar


# Exhaustive Brute Force Search

In [5]:
with open('knapsack/data/ks_19_0', 'r') as f:
    input_data = f.read()

In [6]:
lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacity = int(firstLine[1])


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))
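Every section below repeats the same parsing code. A reusable parser could look like the sketch below. It assumes the layout implied by that code (first line `item_count capacity`, then one `value weight` line per item); `parse_knapsack` is a hypothetical helper, not part of the original notebook, and it uses its own variable names so it does not overwrite the notebook's globals.

```python
def parse_knapsack(text):
    """Parse an instance: first line 'n K', then n lines of 'value weight'.

    Assumption: this layout is inferred from the parsing cells in this notebook.
    """
    lines = text.strip().split('\n')
    n, cap = map(int, lines[0].split())
    vals, wts = [], []
    for line in lines[1:n + 1]:
        v, w = map(int, line.split())
        vals.append(v)
        wts.append(w)
    return cap, vals, wts


# A tiny hand-made instance (not one of the course data files)
sample = "3 50\n60 10\n100 20\n120 30\n"
print(parse_knapsack(sample))  # (50, [60, 100, 120], [10, 20, 30])
```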

In [7]:
all_solutions = list(itertools.product([0, 1], repeat=item_count))

solution_values = [-10000] * (2**item_count)

pbar = ProgressBar()

for i in pbar(range(2**item_count)):
    if sum([a*b for a,b in zip(weights, all_solutions[i])]) <= capacity:
        solution_values[i] = sum([c*d for c,d in zip(values, all_solutions[i])])
    

100% |########################################################################|


In [8]:
print('the optimal item selection is', np.array(all_solutions[np.argmax(solution_values)]))
print('the value is', max(solution_values))

the optimal item selection is [0 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0]
the value is 12248
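The brute-force cell materializes all 2^19 = 524,288 selections with `list(itertools.product(...))` before scoring them. A minimal sketch of the same exhaustive search that consumes `itertools.product` lazily keeps memory constant, although the running time is still exponential in the number of items. The function name `brute_force_knapsack` is hypothetical.

```python
import itertools


def brute_force_knapsack(vals, wts, cap):
    """Score every 0/1 selection, generated lazily, and keep the best feasible one."""
    best_value, best_selection = 0, (0,) * len(vals)
    for selection in itertools.product([0, 1], repeat=len(vals)):
        weight = sum(w * s for w, s in zip(wts, selection))
        if weight <= cap:
            value = sum(v * s for v, s in zip(vals, selection))
            if value > best_value:
                best_value, best_selection = value, selection
    return best_value, list(best_selection)


print(brute_force_knapsack([60, 100, 120], [10, 20, 30], 50))  # (220, [0, 1, 1])
```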


# Dynamic Programming

In [9]:
def O(k,j):
    if j == 0:
        return 0
    elif weights[j] <= k:
        return np.max([O(k,j-1), values[j]+ O(k-weights[j], j-1)])
    else:
        return O(k,j-1)
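The function `O` encodes the standard recurrence. Writing $O(k, j)$ for the best value achievable with items $1, \dots, j$ and capacity $k$, with $v_j$ and $w_j$ the value and weight of item $j$ (index 0 being the dummy entry), the code above computes:

$$
O(k, j) =
\begin{cases}
0 & \text{if } j = 0, \\
\max\{\, O(k, j-1),\; v_j + O(k - w_j,\, j-1) \,\} & \text{if } w_j \le k, \\
O(k, j-1) & \text{otherwise,}
\end{cases}
$$

and the optimal value for $n$ items and capacity $K$ is $O(K, n)$.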

In [10]:
with open('knapsack/data/ks_19_0', 'r') as f:
    input_data = f.read()

In [11]:
lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacity = int(firstLine[1])


# the first entry in weights and values lists, 0, is a dummy entry that simplifies the recursion code a bit
weights = [0]
values = [0]


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))

In [13]:
pbar = ProgressBar()

In [14]:
#create empty table
table = np.zeros((capacity+1,len(weights)))

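#fill the table column by column, reusing the previous column instead of re-running the recursion O(k,j) for every cell
#column 0 (the dummy item) stays 0
for j in pbar(range(1, len(weights))):
    for k in range(capacity+1):
        if weights[j] <= k:
            table[k,j] = max(table[k,j-1], values[j] + table[k-weights[j],j-1])
        else:
            table[k,j] = table[k,j-1]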



100% |########################################################################|


In [15]:
table

array([[    0.,     0.,     0., ...,     0.,     0.,     0.],
       [    0.,     0.,     0., ...,     0.,     0.,     0.],
       [    0.,     0.,     0., ...,     0.,     0.,     0.],
       ...,
       [    0.,  1945.,  2266., ..., 12248., 12248., 12248.],
       [    0.,  1945.,  2266., ..., 12248., 12248., 12248.],
       [    0.,  1945.,  2266., ..., 12248., 12248., 12248.]])
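Called directly, the recursive `O` recomputes the same subproblems many times. A minimal sketch of the top-down alternative caches it with `functools.lru_cache`, so each pair `(k, j)` is solved at most once, giving at most (K+1)(n+1) distinct calls. `knapsack_memo` is a hypothetical name; note that the recursion depth equals the number of items, so this version would hit Python's default recursion limit on the 10,000-item instance.

```python
from functools import lru_cache


def knapsack_memo(vals, wts, cap):
    """Top-down dynamic programming: the recurrence O(k, j), memoized."""
    # Prepend a dummy item 0, mirroring the convention used in the notebook
    v = [0] + list(vals)
    w = [0] + list(wts)

    @lru_cache(maxsize=None)
    def best(k, j):
        if j == 0:
            return 0
        if w[j] <= k:
            return max(best(k, j - 1), v[j] + best(k - w[j], j - 1))
        return best(k, j - 1)

    return best(cap, len(vals))


print(knapsack_memo([60, 100, 120], [10, 20, 30], 50))  # 220
```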

In [16]:
selection = np.zeros(len(weights))
j_index = len(weights) - 1
k_index = capacity

In [17]:
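#walk back from the last column; a change between columns j-1 and j means item j was taken
#(stop at column 1 so column 0 is never compared with column -1, i.e. the last column)
for j in range(j_index, 0, -1):
    if table[k_index, j] != table[k_index, j - 1]:
        selection[j] = 1
        k_index -= weights[j]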
        

In [18]:
selection

array([0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0.,
       0., 0., 0.])

In [19]:
list(selection[1:])

[0.0,
 0.0,
 1.0,
 0.0,
 0.0,
 1.0,
 0.0,
 1.0,
 0.0,
 0.0,
 0.0,
 0.0,
 1.0,
 1.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]
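The table fill and the traceback can also be written as one self-contained function. The sketch below uses plain Python lists instead of a NumPy array and 0-based item indices in the returned selection; `knapsack_table` is a hypothetical name, and it assumes integer weights so they can index table rows.

```python
def knapsack_table(vals, wts, cap):
    """Bottom-up DP table plus traceback; returns (best value, 0/1 selection)."""
    n = len(vals)
    # tbl[k][j] = best value using the first j items with capacity k
    tbl = [[0] * (n + 1) for _ in range(cap + 1)]
    for j in range(1, n + 1):
        v, w = vals[j - 1], wts[j - 1]
        for k in range(cap + 1):
            tbl[k][j] = tbl[k][j - 1]
            if w <= k:
                tbl[k][j] = max(tbl[k][j], v + tbl[k - w][j - 1])
    # Walk back: a change between columns j-1 and j means item j was taken
    chosen = [0] * n
    k = cap
    for j in range(n, 0, -1):
        if tbl[k][j] != tbl[k][j - 1]:
            chosen[j - 1] = 1
            k -= wts[j - 1]
    return tbl[cap][n], chosen


print(knapsack_table([60, 100, 120], [10, 20, 30], 50))  # (220, [0, 1, 1])
```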

# Branch and Bound

In [20]:
with open('knapsack/data/ks_19_0', 'r') as f:
    input_data = f.read()

In [21]:
lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacity = int(firstLine[1])


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))

In [22]:
bucket = [[0] * item_count]
best_value = 0
optimal_answer = sum(values)  # optimistic estimate: the total value if every item could be taken
current_level = 0

In [23]:
while len(bucket) != 0:
    new_bucket = []
    for i in range(len(bucket)):
        
        current_node = bucket[i]        
       
        #left node
        weight_test_node = current_node.copy()
        weight_test_node[current_level] = 1
        
        if (sum([a*b for a,b in zip(weights[0:current_level + 1], weight_test_node[0:current_level + 1])]) <= capacity):
            
            new_bucket.append(weight_test_node)
            
            test_value = sum([c*d for c,d in zip(values[0:current_level + 1], weight_test_node[0:current_level + 1])])
            if test_value > best_value:
                best_node = weight_test_node.copy()
                best_value = test_value
            
        #right node
        if (optimal_answer - sum([c*d for c,d in zip(values[0:current_level + 1], list(1 - np.array(current_node))[0:current_level + 1])]) > best_value):
            new_bucket.append(current_node)
            

    
    if current_level < item_count - 1:
        current_level += 1

        
    else:
        break
    
    bucket = new_bucket.copy()

print(best_node)   
print(best_value)

[0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
12248
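The cell above explores the tree breadth-first, keeping a whole level of nodes in `bucket`. A depth-first variant with the same naive bound (total value minus the values of the items already left out) keeps only the current path in memory and tends to find good incumbents early. The sketch below swaps in that depth-first order; `branch_and_bound` is a hypothetical name, and since its recursion depth equals the number of items it is meant for small instances like this one.

```python
def branch_and_bound(vals, wts, cap):
    """Depth-first branch and bound with the bound 'total value minus skipped values'."""
    n = len(vals)
    best_val, best_sel = 0, [0] * n
    chosen = [0] * n  # entries beyond the current depth are always 0

    def explore(j, value, room, estimate):
        nonlocal best_val, best_sel
        if value > best_val:
            best_val, best_sel = value, chosen.copy()
        if j == n:
            return
        # Left branch: take item j if it still fits
        if wts[j] <= room:
            chosen[j] = 1
            explore(j + 1, value + vals[j], room - wts[j], estimate)
            chosen[j] = 0
        # Right branch: skip item j; prune if the estimate can no longer beat the incumbent
        if estimate - vals[j] > best_val:
            explore(j + 1, value, room, estimate - vals[j])

    explore(0, 0, cap, sum(vals))
    return best_val, best_sel


print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # (220, [0, 1, 1])
```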


## With Linear Relaxation

In [24]:
with open('knapsack/data/ks_19_0', 'r') as f:
    input_data = f.read()

In [25]:
lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacity = int(firstLine[1])


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))

In [26]:
#create a DataFrame table to 1) sort the items by value/weight and 2) capture the original order of the items so that we can put them in their original order at the end
position_table = pd.DataFrame()
position_table['Values'] = values
position_table['Weights'] = weights
position_table['original_order'] = range(item_count)
position_table['value_per_weight'] = [a/b for a,b in zip(values, weights)]

value_table = position_table.sort_values('value_per_weight', ascending = False).reset_index(drop = True)


In [27]:
#find an upper bound for the value of the knapsack

current_capacity = capacity
index = 0
optimal_answer = 0

while current_capacity > 0 and index < item_count:
    if value_table.Weights[index] <= current_capacity:
        optimal_answer += value_table.Values[index]
        current_capacity -= value_table.Weights[index]
    else:
        multiplier = current_capacity/value_table.Weights[index]
        optimal_answer += value_table.Values[index] * multiplier
        current_capacity -= value_table.Weights[index] * multiplier 
    index += 1

    
print(optimal_answer)

12901.542093685946
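The cell above computes the linear-relaxation (fractional knapsack) bound: items are taken greedily by value per weight, and a fraction of the first item that does not fit fills the remaining capacity. Because the relaxation allows fractional items, this bound is never below the integer optimum (here 12901.54 versus 12248). A standalone sketch, assuming strictly positive weights; `fractional_bound` is a hypothetical name:

```python
def fractional_bound(vals, wts, cap):
    """Upper bound from the LP relaxation: greedy by value/weight, one fractional item."""
    order = sorted(range(len(vals)), key=lambda i: vals[i] / wts[i], reverse=True)
    bound, room = 0.0, cap
    for i in order:
        if room <= 0:
            break
        take = min(1.0, room / wts[i])  # 1.0 if the whole item fits, else a fraction
        bound += vals[i] * take
        room -= wts[i] * take
    return bound


# The integer optimum of this toy instance is 220; the relaxation gives 240 (up to float rounding)
print(fractional_bound([60, 100, 120], [10, 20, 30], 50))
```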


In [28]:
#get the values and weights in the new order
values = list(value_table.Values)
weights = list(value_table.Weights)

#start at the root node (i.e. taking NO items)
bucket = [[0] * item_count]
best_value = 0

current_level = 0

In [29]:
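#when we choose NOT to include an item, compute the linear-relaxation bound of current_node (decided items as chosen, undecided items filled greedily, the last one fractionally) and check whether it is greater than the current best value; reads the globals current_node, current_level and best_value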
def opt_eval():
    
    opt_capacity = capacity
    opt_evaluation = 0

    
    index = 0
    
    while opt_capacity > 0 and index < item_count: 
        
        if index <= current_level:
            if current_node[index] == 1:
            
                opt_evaluation += value_table.Values[index]
                opt_capacity -= value_table.Weights[index]
          
            
        elif value_table.Weights[index] <= opt_capacity:
            opt_evaluation += value_table.Values[index]
            opt_capacity -= value_table.Weights[index]
   
            
        else:
            mult = opt_capacity/value_table.Weights[index]
            opt_evaluation += value_table.Values[index] * mult
            opt_capacity -= value_table.Weights[index] * mult 
        
            
        index += 1

    return opt_evaluation > best_value



In [30]:
while len(bucket) != 0:
    new_bucket = []
    for i in range(len(bucket)):
        
        current_node = bucket[i]        
       
        #left node
        weight_test_node = current_node.copy()
        weight_test_node[current_level] = 1
        
        if (sum([a*b for a,b in zip(weights[0:current_level + 1], weight_test_node[0:current_level + 1])]) <= capacity):
            
            new_bucket.append(weight_test_node)
            
            test_value = sum([c*d for c,d in zip(values[0:current_level + 1], weight_test_node[0:current_level + 1])])
            if test_value > best_value:
                best_node = weight_test_node.copy()
                best_value = test_value
            
        #right node
        if opt_eval():
            new_bucket.append(current_node)
            

    
    if current_level < item_count - 1:
        current_level += 1
        
        
    else:
        break
        
    #print(best_value)
    #print(current_level)
    bucket = new_bucket.copy()


value_table['Answer'] = best_node
print(list(value_table.sort_values('original_order').Answer))
print(best_value)

[0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
12248
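The level-by-level search above can also be run best-first: always expand the open node with the largest linear-relaxation bound, and discard any node whose bound cannot beat the incumbent. The sketch below swaps in that best-first order using a `heapq` priority queue; `best_first_knapsack` is a hypothetical name, and it assumes strictly positive weights. Like the notebook, it sorts items by value per weight first and maps the answer back to the original order at the end.

```python
import heapq


def best_first_knapsack(vals, wts, cap):
    """Best-first branch and bound using the linear-relaxation bound."""
    n = len(vals)
    order = sorted(range(n), key=lambda i: vals[i] / wts[i], reverse=True)
    v = [vals[i] for i in order]
    w = [wts[i] for i in order]

    def bound(j, value, room):
        # Fill the undecided items j..n-1 greedily, allowing one fractional item
        for i in range(j, n):
            if w[i] <= room:
                value += v[i]
                room -= w[i]
            else:
                return value + v[i] * room / w[i]
        return value

    best_val, best_taken = 0, []
    # Heap entries: (-bound, depth, value, remaining room, taken positions in sorted order)
    heap = [(-bound(0, 0, cap), 0, 0, cap, [])]
    while heap:
        neg_b, j, value, room, taken = heapq.heappop(heap)
        if -neg_b <= best_val or j == n:
            continue  # cannot beat the incumbent, or nothing left to decide
        if w[j] <= room:  # left child: take item j
            nv, nr = value + v[j], room - w[j]
            if nv > best_val:
                best_val, best_taken = nv, taken + [j]
            b = bound(j + 1, nv, nr)
            if b > best_val:
                heapq.heappush(heap, (-b, j + 1, nv, nr, taken + [j]))
        b = bound(j + 1, value, room)  # right child: skip item j
        if b > best_val:
            heapq.heappush(heap, (-b, j + 1, value, room, taken))
    selection = [0] * n
    for j in best_taken:
        selection[order[j]] = 1  # back to the original item order
    return best_val, selection


print(best_first_knapsack([60, 100, 120], [10, 20, 30], 50))  # (220, [0, 1, 1])
```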


# Using OR-Tools

In [4]:
from __future__ import print_function
from ortools.algorithms import pywrapknapsack_solver

In [32]:
with open('knapsack/data/ks_19_0', 'r') as f:
    input_data = f.read()

lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacities = [int(firstLine[1])]


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))

weights = [weights]

In [33]:
solver = pywrapknapsack_solver.KnapsackSolver(
    pywrapknapsack_solver.KnapsackSolver.
    KNAPSACK_MULTIDIMENSION_BRANCH_AND_BOUND_SOLVER, 'KnapsackExample')

In [34]:
solver.Init(values, weights, capacities)
computed_value = solver.Solve()
packed_items = []
packed_weights = []
total_weight = 0
print('Total value =', computed_value)
for i in range(len(values)):
    if solver.BestSolutionContains(i):
        packed_items.append(i)
        packed_weights.append(weights[0][i])
        total_weight += weights[0][i]
print('Total weight:', total_weight)
print('Packed items:', packed_items)
print('Packed_weights:', packed_weights)

Total value = 12248
Total weight: 30996
Packed items: [2, 5, 7, 12, 13]
Packed_weights: [7390, 2744, 7280, 3926, 9656]
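Whatever solver produced it, a reported selection is easy to verify independently. OR-Tools takes the weights as a list of lists (one list per knapsack dimension), which is why the cell above wraps them as `weights = [weights]` and indexes `weights[0][i]`; the hypothetical helper below takes a flat weight list instead.

```python
def check_solution(vals, wts, cap, packed_items):
    """Recompute total value and weight of the packed items and check feasibility."""
    total_value = sum(vals[i] for i in packed_items)
    total_weight = sum(wts[i] for i in packed_items)
    return total_value, total_weight, total_weight <= cap


print(check_solution([60, 100, 120], [10, 20, 30], 50, [1, 2]))  # (220, 50, True)
```

For example, the packed weights reported above, 7390 + 2744 + 7280 + 3926 + 9656, sum to 30,996, matching the reported total weight.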


## Knapsack Problem with 10,000 items

### Branch and bound with linear relaxation

In [7]:
with open('knapsack/data/ks_10000_0', 'r') as f:
    input_data = f.read()

lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacity = int(firstLine[1])


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))
    
#create a DataFrame table to 1) sort the items by value/weight and 2) capture the original order of the items so that we can put them in their original order at the end
position_table = pd.DataFrame()
position_table['Values'] = values
position_table['Weights'] = weights
position_table['original_order'] = range(item_count)
position_table['value_per_weight'] = [a/b for a,b in zip(values, weights)]

value_table = position_table.sort_values('value_per_weight', ascending = False).reset_index(drop = True)


#find an upper bound for the value of the knapsack

current_capacity = capacity
index = 0
optimal_answer = 0

while current_capacity > 0 and index < item_count:
    if value_table.Weights[index] <= current_capacity:
        optimal_answer += value_table.Values[index]
        current_capacity -= value_table.Weights[index]
    else:
        multiplier = current_capacity/value_table.Weights[index]
        optimal_answer += value_table.Values[index] * multiplier
        current_capacity -= value_table.Weights[index] * multiplier 
    index += 1

    

#get the values and weights in the new order
values = list(value_table.Values)
weights = list(value_table.Weights)

#start at the root node (i.e. taking NO items)
bucket = [[0] * item_count]
best_value = 0

current_level = 0

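#when we choose NOT to include an item, compute the linear-relaxation bound of current_node (decided items as chosen, undecided items filled greedily, the last one fractionally) and check whether it is greater than the current best value; reads the globals current_node, current_level and best_value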
def opt_eval():
    
    opt_capacity = capacity
    opt_evaluation = 0

    
    index = 0
    
    while opt_capacity > 0 and index < item_count: 
        
        if index <= current_level:
            if current_node[index] == 1:
            
                opt_evaluation += value_table.Values[index]
                opt_capacity -= value_table.Weights[index]
          
            
        elif value_table.Weights[index] <= opt_capacity:
            opt_evaluation += value_table.Values[index]
            opt_capacity -= value_table.Weights[index]
   
            
        else:
            mult = opt_capacity/value_table.Weights[index]
            opt_evaluation += value_table.Values[index] * mult
            opt_capacity -= value_table.Weights[index] * mult 
        
            
        index += 1

    return opt_evaluation > best_value

while len(bucket) != 0:
    new_bucket = []
    for i in range(len(bucket)):
        
        current_node = bucket[i]        
       
        #left node
        weight_test_node = current_node.copy()
        weight_test_node[current_level] = 1
        
        if (sum([a*b for a,b in zip(weights[0:current_level + 1], weight_test_node[0:current_level + 1])]) <= capacity):
            
            new_bucket.append(weight_test_node)
            
            test_value = sum([c*d for c,d in zip(values[0:current_level + 1], weight_test_node[0:current_level + 1])])
            if test_value > best_value:
                best_node = weight_test_node.copy()
                best_value = test_value
            
        #right node
        if opt_eval():
            new_bucket.append(current_node)
            

    
    if current_level < item_count - 1:
        current_level += 1
        
        
    else:
        break
        
    #print(best_value)
    #print(current_level)
    bucket = new_bucket.copy()


value_table['Answer'] = best_node
#print(list(value_table.sort_values('original_order').Answer))
print(best_value)

1099893


In [21]:
#the items that are in the knapsack
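#sort once, instead of re-sorting the table for every one of the 10,000 indices
answer_in_original_order = list(value_table.sort_values('original_order').Answer)
[i for i in range(item_count) if answer_in_original_order[i] != 0]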

[568,
 1824,
 2192,
 2641,
 2827,
 2946,
 3003,
 3023,
 6113,
 7055,
 7498,
 7577,
 8034,
 9431,
 9756]
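Since `best_node` is expressed in value-per-weight order, recovering the packed items means mapping each sorted position back through `original_order`; in pandas terms this is roughly `sorted(value_table.original_order[value_table.Answer == 1])`. A plain-Python sketch of the same mapping, where `to_original_indices` is a hypothetical helper:

```python
def to_original_indices(sorted_selection, original_order):
    """Map a 0/1 selection given in sorted order to sorted original item indices.

    original_order[p] is the original index of the item at sorted position p
    (the role played by value_table.original_order in the cells above).
    """
    return sorted(original_order[p] for p, taken in enumerate(sorted_selection) if taken)


print(to_original_indices([1, 0, 1, 0], [2, 0, 3, 1]))  # [2, 3]
```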

### Using OR-Tools 
Note: this solver takes a fraction of a second to solve the problem, versus about half an hour for the previous solver.

In [13]:
with open('knapsack/data/ks_10000_0', 'r') as f:
    input_data = f.read()

lines = input_data.split('\n')

firstLine = lines[0].split()
item_count = int(firstLine[0])
capacities = [int(firstLine[1])]


weights = []
values = []


for i in range(1, item_count+1):
    values.append(int(lines[i].split()[0]))
    weights.append(int(lines[i].split()[1]))

weights = [weights]

solver = pywrapknapsack_solver.KnapsackSolver(
    pywrapknapsack_solver.KnapsackSolver.
    KNAPSACK_MULTIDIMENSION_BRANCH_AND_BOUND_SOLVER, 'KnapsackExample')

solver.Init(values, weights, capacities)
computed_value = solver.Solve()
packed_items = []
packed_weights = []
total_weight = 0
print('Total value =', computed_value)
for i in range(len(values)):
    if solver.BestSolutionContains(i):
        packed_items.append(i)
        packed_weights.append(weights[0][i])
        total_weight += weights[0][i]
print('Total weight:', total_weight)
print('Packed items:', packed_items)
print('Packed_weights:', packed_weights)

Total value = 1099893
Total weight: 999994
Packed items: [568, 1824, 2192, 2641, 2827, 2946, 3003, 3023, 6113, 7055, 7498, 7577, 8034, 9431, 9756]
Packed_weights: [69221, 56051, 45426, 104666, 4148, 37964, 83877, 161215, 9, 99123, 21, 98789, 62979, 13082, 163423]
