# Knapsack problem
The knapsack problem is a classic combinatorial optimization problem: given a knapsack with weight capacity $C$ and a set of items, each with a value and a weight, the goal is to choose the items that maximize the total value while keeping the total weight within the capacity.

![knapsack problem illustration](https://upload.wikimedia.org/wikipedia/commons/f/fd/Knapsack.svg "Image from wikipedia: https://commons.wikimedia.org/wiki/File:Knapsack.svg")

We consider here the *0-1 knapsack problem*, where each item is either taken once or left out.
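To make the 0-1 setting concrete, here is a minimal sketch that enumerates every take/leave combination on a tiny instance. The instance and the helper name `brute_force_knapsack` are assumptions made for illustration; they are not part of discrete-optimization, and this exhaustive approach is only practical for a handful of items.

```python
from itertools import product

# Hypothetical toy instance (not the dataset used in this notebook).
values = [8, 10, 15, 4]
weights = [4, 5, 8, 3]
capacity = 11


def brute_force_knapsack(values, weights, capacity):
    """Enumerate all 2^n take/leave vectors and keep the best feasible one."""
    best_value, best_taken = 0, [0] * len(values)
    for taken in product([0, 1], repeat=len(values)):
        weight = sum(w * t for w, t in zip(weights, taken))
        value = sum(v * t for v, t in zip(values, taken))
        # keep the candidate only if it fits and strictly improves the value
        if weight <= capacity and value > best_value:
            best_value, best_taken = value, list(taken)
    return best_value, best_taken


best_value, best_taken = brute_force_knapsack(values, weights, capacity)
print(best_value, best_taken)  # 19 [0, 0, 1, 1]
```

With 500 items this enumeration would need $2^{500}$ evaluations, which is why the rest of the notebook relies on heuristics and dedicated solvers.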

Many optimization approaches can be applied to this combinatorial problem. This notebook covers a few of them:

- [Greedy heuristic methods](#Greedy-heuristic)
- [Mixed Integer Linear Programming (MILP)](#Mixed-integer-linear-programming-(MILP))
- [Constraint Programming (CP)](#Constraint-Programming-(CP))
- [Large neighborhood search](#Large-neighborhood-search), a metaheuristic on top of CP or MILP

## Prerequisites

Before running this notebook, you need to:
- install [minizinc](https://www.minizinc.org/) and configure it so that the Jupyter kernel can find it (on Linux, this means updating the `PATH` variable)
- install discrete-optimization in your Jupyter kernel:
    ```
    pip install discrete-optimization
    ```



### Imports

In [None]:
import logging
import random

import nest_asyncio
import numpy as np

from discrete_optimization.datasets import fetch_data_from_coursera
from discrete_optimization.generic_tools.cp_tools import CPSolverName, ParametersCP
from discrete_optimization.generic_tools.do_problem import get_default_objective_setup
from discrete_optimization.generic_tools.lns_cp import LNS_CP
from discrete_optimization.generic_tools.lp_tools import MilpSolverName, ParametersMilp
from discrete_optimization.knapsack.knapsack_parser import (
    get_data_available,
    parse_file,
)
from discrete_optimization.knapsack.knapsack_solvers import look_for_solver
from discrete_optimization.knapsack.solvers.cp_solvers import CPKnapsackMZN2
from discrete_optimization.knapsack.solvers.greedy_solvers import GreedyBest
from discrete_optimization.knapsack.solvers.knapsack_lns_cp_solver import (
    ConstraintHandlerKnapsack,
)
from discrete_optimization.knapsack.solvers.knapsack_lns_solver import (
    InitialKnapsackMethod,
    InitialKnapsackSolution,
)
from discrete_optimization.knapsack.solvers.lp_solvers import LPKnapsack

# patch asyncio so that applications using async functions can run in jupyter
nest_asyncio.apply()

# set logging level
logging.basicConfig(level=logging.INFO)

### Download datasets

If they are not yet available locally, we download the datasets from the [Coursera discrete optimization assignments](https://github.com/discreteoptimization/assignment).

In [None]:
needed_datasets = ["ks_500_0"]
files_available_paths = get_data_available()

download_needed = False
for dataset in needed_datasets:
    if len([f for f in files_available_paths if dataset in f]) == 0:
        download_needed = True
        break

if download_needed:
    fetch_data_from_coursera()

We will use the dataset [ks_500_0](https://github.com/discreteoptimization/assignment/blob/master/knapsack/data/ks_500_0), which contains 500 candidate items for the knapsack.
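For readers curious about the file layout, here is a minimal sketch of a reader. It assumes the layout of the Coursera assignment files: a first line with the number of items and the capacity, then one `value weight` pair per line. The helper `parse_knapsack_text` and the sample text are hypothetical; the notebook itself relies on the library's `parse_file`.

```python
def parse_knapsack_text(text):
    """Parse 'n capacity' followed by n lines of 'value weight' (assumed layout)."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    n_items, capacity = int(lines[0][0]), int(lines[0][1])
    item_lines = lines[1 : n_items + 1]
    values = [int(v) for v, _ in item_lines]
    weights = [int(w) for _, w in item_lines]
    return values, weights, capacity


# Hypothetical 4-item sample written in the assumed layout.
sample = """4 11
8 4
10 5
15 8
4 3
"""
values, weights, capacity = parse_knapsack_text(sample)
print(values, weights, capacity)  # [8, 10, 15, 4] [4, 5, 8, 3] 11
```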

### Set random seed

If reproducible results are wanted, we can fix the random seed.

In [None]:
def set_random_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)


set_random_seed()

## Parse input data

We parse the dataset file to load it as a discrete-optimization problem. In this case we get a `discrete_optimization.knapsack.knapsack_model.KnapsackModel`.

In [None]:
files_available_paths = get_data_available()
model_file = [f for f in files_available_paths if "ks_500_0" in f][0]
model = parse_file(model_file, force_recompute_values=True)
print(type(model))

Here is a representation of the corresponding model.

In [None]:
print(model)

A first solution that respects the capacity constraint (but is of course far from optimal) is obtained by taking no item at all.

In the following representation of a solution:
- "Value" is the total value of the taken items,
- "Weight" is the total weight of the taken items, which must not exceed the knapsack capacity,
- "Taken" lists, for each item, how many times it is taken (0 or 1 here). For instance [0, 1, 0, ...] means that
  - item 0 is not taken
  - item 1 is taken
  - item 2 is not taken
  - ...

In [None]:
solution = model.get_dummy_solution()
print(solution)
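The "Value", "Weight" and "Taken" fields described above can be recomputed by hand from a take/leave list. The sketch below does so on a hypothetical toy instance; `evaluate_taken` is an illustrative helper, not the library's own evaluation code.

```python
def evaluate_taken(values, weights, capacity, taken):
    """Recompute total value, total weight and feasibility of a 0-1 vector."""
    value = sum(v * t for v, t in zip(values, taken))
    weight = sum(w * t for w, t in zip(weights, taken))
    return {"value": value, "weight": weight, "feasible": weight <= capacity}


# Hypothetical toy instance (not the notebook's dataset).
values = [8, 10, 15, 4]
weights = [4, 5, 8, 3]
capacity = 11

empty = evaluate_taken(values, weights, capacity, [0, 0, 0, 0])
some = evaluate_taken(values, weights, capacity, [0, 1, 0, 1])
print(empty)  # {'value': 0, 'weight': 0, 'feasible': True}
print(some)   # {'value': 14, 'weight': 8, 'feasible': True}
```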

## Solve

We can get the list of solvers compatible with this model.

In [None]:
look_for_solver(model)

### Greedy heuristic

The first solver we try is the greedy solver, which is very fast but generally sub-optimal: the solution it finds respects the constraints but is not necessarily the best possible one.

The greedy method sorts the items by density, defined as $\frac{\text{value}}{\text{weight}}$, and fills the knapsack starting with the densest items. We stop when no further item can be added without violating the capacity constraint.
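The density rule can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code of `GreedyBest`: the function name and the toy instance are hypothetical, weights are assumed strictly positive, and the sketch keeps scanning lighter items after one does not fit, which may differ from the library's behaviour.

```python
def greedy_by_density(values, weights, capacity):
    """Take items in decreasing value/weight order while they still fit."""
    order = sorted(
        range(len(values)), key=lambda i: values[i] / weights[i], reverse=True
    )
    taken = [0] * len(values)
    remaining = capacity
    for i in order:
        if weights[i] <= remaining:  # item fits: take it
            taken[i] = 1
            remaining -= weights[i]
    return taken


# Hypothetical toy instance (not the notebook's dataset).
values = [8, 10, 15, 4]
weights = [4, 5, 8, 3]
capacity = 11

taken = greedy_by_density(values, weights, capacity)
greedy_value = sum(v * t for v, t in zip(values, taken))
print(taken, greedy_value)  # [1, 1, 0, 0] 18
```

On this instance the greedy value is 18, while taking the last two items gives 19 within the same capacity, which illustrates why the method is fast but not optimal.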

We first initialize the solver.

In [None]:
greedy_solver = GreedyBest(knapsack_model=model)

We run it.

In [None]:
results_greedy = greedy_solver.solve()

We retrieve and display the best solution found by the greedy solver.

In [None]:
print(results_greedy.get_best_solution())

Several KPIs of the solution are printed above; you can also retrieve them programmatically by calling the `evaluate` function of the knapsack problem:

In [None]:
kpis = model.evaluate(results_greedy.get_best_solution())
print(kpis)

### Mixed-integer linear programming (MILP)

[Linear programming (LP)](https://en.wikipedia.org/wiki/Linear_programming) is a powerful tool to optimize a mathematical model in which the constraints and the objective function are all linear.

Mixed-integer linear programming is an LP model in which a given subset of the variables must take integer values. This makes it a **combinatorial** optimization problem, NP-hard in general.

However, thanks to LP relaxations and [branch-and-bound](https://en.wikipedia.org/wiki/Branch_and_bound) methods, MILP solvers are often very efficient on discrete optimization problems. This is the case for the knapsack problem, whose formulation is entirely linear.

The linear formulation of the knapsack problem is straightforward:

$X_{opt} = \arg\max_{x} \; V \cdot x \quad \text{s.t.} \quad W \cdot x \leq C \quad \text{and} \quad x \in \{0, 1\}^N$, where $V$ is the value vector, $W$ is the weight vector, $C$ is the capacity of the knapsack and $N$ is the number of items.
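To illustrate how an LP relaxation drives branch and bound, here is a minimal pure-Python sketch on a hypothetical toy instance. It relies on the fact that, with a single capacity constraint and strictly positive weights, the LP relaxation ($x \in [0, 1]^N$) is solved by filling the knapsack in decreasing density order and taking a fraction of the first item that does not fit. This is only a sketch of the principle: it is not how CBC works internally, and all names below are assumptions.

```python
def lp_relaxation_bound(items, start, remaining, value):
    """Optimal value of the LP relaxation over items[start:], given a partial value."""
    bound = value
    for v, w in items[start:]:
        if w <= remaining:
            remaining -= w
            bound += v
        else:
            bound += v * remaining / w  # fractional part of the first misfit
            break
    return bound


def branch_and_bound(values, weights, capacity):
    # Sort by density so that the greedy fractional fill solves the relaxation.
    order = sorted(
        range(len(values)), key=lambda i: values[i] / weights[i], reverse=True
    )
    items = [(values[i], weights[i]) for i in order]
    best = {"value": 0, "taken": [0] * len(values)}
    current = [0] * len(values)

    def explore(k, remaining, value):
        if value > best["value"]:  # new incumbent
            best["value"], best["taken"] = value, current[:]
        if k == len(items):
            return
        if lp_relaxation_bound(items, k, remaining, value) <= best["value"]:
            return  # the relaxation cannot beat the incumbent: prune
        v, w = items[k]
        if w <= remaining:  # branch x_k = 1
            current[order[k]] = 1
            explore(k + 1, remaining - w, value + v)
            current[order[k]] = 0
        explore(k + 1, remaining, value)  # branch x_k = 0

    explore(0, capacity, 0)
    return best["value"], best["taken"]


# Hypothetical toy instance (not the notebook's dataset).
values = [8, 10, 15, 4]
weights = [4, 5, 8, 3]
capacity = 11
opt_value, opt_taken = branch_and_bound(values, weights, capacity)
print(opt_value, opt_taken)  # 19 [0, 0, 1, 1]
```

The bound is what makes the search tractable: any branch whose relaxed optimum does not exceed the best known integer solution is discarded without being enumerated.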

#### COIN-OR Branch-and-Cut solver

We use here a solver that wraps the CBC backend of the [mip python library](https://python-mip.readthedocs.io/en/latest/intro.html), which is itself a wrapper around the [COIN-OR Branch-and-Cut solver - CBC](https://github.com/coin-or/Cbc).

In [None]:
lp_solver_cbc = LPKnapsack(knapsack_model=model, milp_solver_name=MilpSolverName.CBC)

In [None]:
params_milp = ParametersMilp(
    time_limit=100,
    pool_solutions=10000,
    mip_gap_abs=0.0001,
    mip_gap=0.001,
    retrieve_all_solution=False,
    n_solutions_max=10000,
)
results_cbc = lp_solver_cbc.solve(parameters_milp=params_milp)

In [None]:
print(results_cbc.get_best_solution())

#### Use another MILP solver backend: Gurobi (optional)

If you have a license for [Gurobi](https://www.gurobi.com/), a powerful commercial engine, you can also use it to solve the knapsack problem.

Please uncomment the next cell if you want to do so.

### Constraint Programming (CP)

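Constraint programming models a problem with decision variables, each with a finite domain, and constraints linking them. The solver alternates constraint propagation, which removes values that cannot appear in any solution, with a search that branches on the remaining choices.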

We use here the [chuffed](https://github.com/chuffed/chuffed#description) solver, a state-of-the-art lazy clause generation solver.

We let the solver run for at most 50 seconds before it returns the best solution found so far.

In [None]:
cp_solver = CPKnapsackMZN2(knapsack_model=model, cp_solver_name=CPSolverName.CHUFFED)
parameters_cp = ParametersCP.default()
parameters_cp.time_limit = 50
results_cp = cp_solver.solve(parameters_cp)

In [None]:
print(results_cp.get_best_solution())

We see that the CP solver gets a worse solution than the MILP solver, and even worse than the greedy solver. However, it can be wrapped in a large neighborhood search solver.

### Large neighborhood search

This is a metaheuristic on top of CP or MILP solvers.

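Starting from an initial solution, each iteration fixes part of the decision variables to their current values and lets the underlying solver re-optimize the remaining ones. The free variables define the neighborhood explored at that iteration, and the sub-problem is much smaller than the full one.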

We use it here on top of the previous CP solver based on chuffed.
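The fix-and-reoptimize loop can be sketched in plain Python. This is an illustration under assumptions, not the implementation of `LNS_CP`: the function name, the toy instance and the acceptance rule (keep a candidate only if it strictly improves the value) are hypothetical, the initial solution is assumed feasible, and a brute-force enumeration of the free variables stands in for the CP solver.

```python
import random
from itertools import product


def lns_knapsack(values, weights, capacity, start,
                 fraction_to_fix=0.8, nb_iterations=10, seed=42):
    """Large neighborhood search: fix most variables, re-optimize the others."""
    rng = random.Random(seed)
    n = len(values)
    n_free = max(1, n - int(fraction_to_fix * n))
    best = list(start)  # assumed feasible
    best_value = sum(v * t for v, t in zip(values, best))
    for _ in range(nb_iterations):
        free = rng.sample(range(n), n_free)  # variables released this iteration
        fixed_weight = sum(weights[i] * best[i] for i in range(n) if i not in free)
        fixed_value = sum(values[i] * best[i] for i in range(n) if i not in free)
        best_choice = None
        # Exact search over the free variables (stand-in for the CP solver).
        for choice in product([0, 1], repeat=n_free):
            weight = fixed_weight + sum(weights[i] * c for i, c in zip(free, choice))
            value = fixed_value + sum(values[i] * c for i, c in zip(free, choice))
            if weight <= capacity and value > best_value:
                best_value, best_choice = value, choice
        if best_choice is not None:  # accept the improving neighbor
            for i, c in zip(free, best_choice):
                best[i] = c
    return best_value, best


# Hypothetical 10-item instance (not the notebook's dataset).
values = [8, 10, 15, 4, 7, 12, 6, 9, 11, 5]
weights = [4, 5, 8, 3, 4, 7, 3, 6, 6, 2]
capacity = 20

lns_value, lns_taken = lns_knapsack(values, weights, capacity, start=[0] * 10)
print(lns_value, lns_taken)
```

Because a candidate is accepted only when it improves the current value, the result is never worse than the initial solution, whatever that starting point is.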

In [None]:
set_random_seed()

params_objective_function = get_default_objective_setup(problem=model)
print(params_objective_function)
params_cp = ParametersCP.default()
params_cp.time_limit = 5
params_cp.time_limit_iter0 = 5
nb_iteration_lns = 10

cp_solver = CPKnapsackMZN2(
    model,
    cp_solver_name=CPSolverName.CHUFFED,
    params_objective_function=params_objective_function,
)

# initial solution: only 0
initial_solution_provider = InitialKnapsackSolution(
    problem=model,
    initial_method=InitialKnapsackMethod.DUMMY,
    params_objective_function=params_objective_function,
)
print(initial_solution_provider)

# constraint handler
constraint_handler = ConstraintHandlerKnapsack(problem=model, fraction_to_fix=0.8)

# solve
lns_solver = LNS_CP(
    problem=model,
    cp_solver=cp_solver,
    initial_solution_provider=initial_solution_provider,
    constraint_handler=constraint_handler,
    params_objective_function=params_objective_function,
)
result_lns = lns_solver.solve_lns(
    parameters_cp=params_cp, nb_iteration_lns=nb_iteration_lns
)

In [None]:
print(result_lns.get_best_solution())

We note that the result is better than with the CP solver alone, even though the total time spent in the CP solver is at most the same.

*NB: even with the random seed set, the results differ from one run to the next.*

Starting from the greedy solution ensures that the final result is at least as good as the greedy one, and possibly slightly better.

In [None]:
set_random_seed()

params_objective_function = get_default_objective_setup(problem=model)
print(params_objective_function)
params_cp = ParametersCP.default()
params_cp.time_limit = 5
params_cp.time_limit_iter0 = 5
nb_iteration_lns = 10

cp_solver = CPKnapsackMZN2(
    model,
    cp_solver_name=CPSolverName.CHUFFED,
    params_objective_function=params_objective_function,
)

# initial solution: greedy
initial_solution_provider = InitialKnapsackSolution(
    problem=model,
    initial_method=InitialKnapsackMethod.GREEDY,
    params_objective_function=params_objective_function,
)
print(initial_solution_provider)

# constraint handler
constraint_handler = ConstraintHandlerKnapsack(problem=model, fraction_to_fix=0.8)

# solve
lns_solver = LNS_CP(
    problem=model,
    cp_solver=cp_solver,
    initial_solution_provider=initial_solution_provider,
    constraint_handler=constraint_handler,
    params_objective_function=params_objective_function,
)
result_lns = lns_solver.solve_lns(
    parameters_cp=params_cp, nb_iteration_lns=nb_iteration_lns
)

In [None]:
print(result_lns.get_best_solution())

## Conclusion

[text needed]