<a href="https://colab.research.google.com/github/Anakeyn/complexity2/blob/main/complexity2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Complexity analysis - Practical session 2: NP-Completeness and the backpack/knapsack problem

In this practical session we examine different ways of coding and solving problems of the NP class. We are interested in the backpack (knapsack) problem.
We suppose that we have $n$ items, each with a utility (or gain) $u_i$ and a weight $m_i$. We try to maximize the total gain of the items packed
into a bag of maximum capacity $M$. We distinguish two interesting cases:
 - each item is available in a single copy, i.e. we are trying to determine the quantity $x_i \in \{0,1\}$ associated with each item,
 - the same item can be taken several times, i.e. $x_i \in \mathbb{N}$.

The problem is formalized as follows:
$$
\begin{aligned}
U & = \max_{x_i} \sum_i x_i u_i \\
  & \text{s.t.} \sum_i x_i m_i \leq M
\end{aligned}
$$
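
To make the formulation concrete, here is a minimal sketch that evaluates one candidate vector $x$: it computes the objective $\sum_i x_i u_i$ and checks the constraint $\sum_i x_i m_i \leq M$. The function name `evaluate` and the three-item instance are assumptions chosen for illustration; they are not part of the assignment.

```python
def evaluate(x, utilities, weights, M):
    """Return (total gain, total weight, feasible) for the quantity vector x."""
    gain = sum(xi * ui for xi, ui in zip(x, utilities))
    weight = sum(xi * mi for xi, mi in zip(x, weights))
    return gain, weight, weight <= M


# Hypothetical toy instance: three items, capacity M = 10
utilities = [6, 5, 5]
weights = [6, 5, 5]

print(evaluate([1, 0, 0], utilities, weights, 10))  # (6, 6, True)
print(evaluate([0, 1, 1], utilities, weights, 10))  # (10, 10, True)
print(evaluate([1, 1, 0], utilities, weights, 10))  # (11, 11, False): too heavy
```

The same function also works for the second case, since $x_i$ may be any non-negative integer.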

We will examine different methods for solving this problem so that you can get a feel for its complexity. For each test, generate a utility vector and a weight vector
of integers drawn at random in $[1, 10]$. Set $M$ according to the number of items: for example, if you have $n$ items (which will be a
parameter of your test procedure), you can choose $M=7n$. For each variant, write a function `solve_bag` that takes the utility vector, the weight vector and $M$ as parameters and
returns the maximum value (total gain) reached, as well as the computation time.

Write the code below that will generate the possible data for this problem.


In [36]:
# import libraries
import numpy as np

In [102]:

# create the position, utility and weight vectors
def createPositionsUtilitiesWeightsVectors(items = 10) :
  positions  = np.arange(0, items)
  # np.random.randint excludes `high`, so high=11 draws integers in [1, 10]
  utilities = np.random.randint(low=1, high=11, size=(items))
  weights = np.random.randint(low=1, high=11, size=(items))
  return positions, utilities, weights



In [103]:
myPositions, myUtilities, myWeights = createPositionsUtilitiesWeightsVectors(10)

In [104]:
print(myPositions)

[0 1 2 3 4 5 6 7 8 9]


In [39]:
print(myUtilities)

[5 2 2 4 8 2 9 2 4 6]


In [40]:
print(myWeights)

[1 6 4 7 1 1 1 1 1 6]


In [105]:
# Note: test using itertools to generate all combinations. Works fairly well, but if I understood correctly we are not supposed to use itertools.

import itertools

stuff = myPositions
for L in range(2, len(stuff) + 1):
    for subset in itertools.combinations(stuff, L):
        print(L)
        print(subset)

2
(0, 1)
2
(0, 2)
2
(0, 3)
[... output truncated ...]

## Brute force approach

We don't bother with complex considerations here: write a method that enumerates all possible combinations (there are $2^n$ of them), evaluates each one, and returns the optimal gain.
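
One possible way to enumerate the $2^n$ subsets without `itertools` is to read each subset off the bits of an integer counter. The sketch below is an illustration under that assumption, not the expected solution; the name `solve_bag_brute_force` and the toy instance are hypothetical.

```python
import time


def solve_bag_brute_force(utilities, weights, M):
    """Exhaustive 0/1 search: every subset is encoded as the bits of an integer."""
    n = len(utilities)
    start = time.perf_counter()
    best_gain = 0
    for mask in range(1 << n):            # 2**n candidate subsets
        gain = 0
        weight = 0
        for i in range(n):
            if (mask >> i) & 1:           # bit i set: item i is in the subset
                gain += utilities[i]
                weight += weights[i]
        if weight <= M and gain > best_gain:
            best_gain = gain
    return best_gain, time.perf_counter() - start


# Hypothetical toy instance: the best choice is items 1 and 2 (weight 10, gain 10)
gain, elapsed = solve_bag_brute_force([6, 5, 5], [6, 5, 5], 10)
print(gain)  # 10
```

The running time grows as $n \cdot 2^n$, so this is only practical for small $n$ (roughly 20 items or fewer).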


In [90]:
# returns all the position tuples of size n built from myUtility
def myPosCombinations(myUtility,n) :
    if n==0:
        return [[]]
    positions=[]
    for i in range(0, len(myUtility)):
        m=i  # keep every possible position
        remMyUtility=myUtility[i+1:]  # computed but never used
        # NOTE: the recursive call receives the full vector, not remMyUtility, so this
        # enumerates ordered tuples with repetition (len(myUtility)**n results), not combinations
        for p in myPosCombinations(myUtility,n-1):
            positions.append([m]+p)
    return positions



In [91]:
# test: display all possible pairs
myCombinationsby2 = myPosCombinations(myUtilities, 2)
print(myCombinationsby2)

[[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [0, 5], [0, 6], [0, 7], [0, 8], [0, 9], [1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [2, 6], [2, 7], [2, 8], [2, 9], [3, 0], [3, 1], [3, 2], [3, 3], [3, 4], [3, 5], [3, 6], [3, 7], [3, 8], [3, 9], [4, 0], [4, 1], [4, 2], [4, 3], [4, 4], [4, 5], [4, 6], [4, 7], [4, 8], [4, 9], [5, 0], [5, 1], [5, 2], [5, 3], [5, 4], [5, 5], [5, 6], [5, 7], [5, 8], [5, 9], [6, 0], [6, 1], [6, 2], [6, 3], [6, 4], [6, 5], [6, 6], [6, 7], [6, 8], [6, 9], [7, 0], [7, 1], [7, 2], [7, 3], [7, 4], [7, 5], [7, 6], [7, 7], [7, 8], [7, 9], [8, 0], [8, 1], [8, 2], [8, 3], [8, 4], [8, 5], [8, 6], [8, 7], [8, 8], [8, 9], [9, 0], [9, 1], [9, 2], [9, 3], [9, 4], [9, 5], [9, 6], [9, 7], [9, 8], [9, 9]]


In [92]:
# test: number of possible pairs
print(len(myCombinationsby2))

100


In [95]:
print(myCombinationsby2[0][0])

0


In [96]:
# now test with triples
myCombinationsby3= myPosCombinations(myUtilities, 3)
print(myCombinationsby3)

[[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3], [0, 0, 4], [0, 0, 5], [0, 0, 6], [0, 0, 7], [0, 0, 8], [0, 0, 9], [0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 1, 5], [0, 1, 6], [0, 1, 7], [0, 1, 8], [0, 1, 9], [0, 2, 0], [0, 2, 1], [0, 2, 2], [0, 2, 3], [0, 2, 4], [0, 2, 5], [0, 2, 6], [0, 2, 7], [0, 2, 8], [0, 2, 9], [0, 3, 0], [0, 3, 1], [0, 3, 2], [0, 3, 3], [0, 3, 4], [0, 3, 5], [0, 3, 6], [0, 3, 7], [0, 3, 8], [0, 3, 9], [0, 4, 0], [0, 4, 1], [0, 4, 2], [0, 4, 3], [0, 4, 4], [0, 4, 5], [0, 4, 6], [0, 4, 7], [0, 4, 8], [0, 4, 9], [0, 5, 0], [0, 5, 1], [0, 5, 2], [0, 5, 3], [0, 5, 4], [0, 5, 5], [0, 5, 6], [0, 5, 7], [0, 5, 8], [0, 5, 9], [0, 6, 0], [0, 6, 1], [0, 6, 2], [0, 6, 3], [0, 6, 4], [0, 6, 5], [0, 6, 6], [0, 6, 7], [0, 6, 8], [0, 6, 9], [0, 7, 0], [0, 7, 1], [0, 7, 2], [0, 7, 3], [0, 7, 4], [0, 7, 5], [0, 7, 6], [0, 7, 7], [0, 7, 8], [0, 7, 9], [0, 8, 0], [0, 8, 1], [0, 8, 2], [0, 8, 3], [0, 8, 4], [0, 8, 5], [0, 8, 6], [0, 8, 7], [0, 8, 8], [0, 8, 9], [0, 9, 0]

In [97]:
# test: number of possible triples
print(len(myCombinationsby3)) # 1000!!! oops

1000


In [98]:
# loop over all combination sizes  # takes a HUGE amount of time
for L in range(2,len(myUtilities)) :
     myCurrentValues = myPosCombinations(myUtilities,L)
     print(len(myCurrentValues))

KeyboardInterrupt: ignored

[... debug trace output truncated ...]

In [13]:
l = n_length_combo(myUtility, 2)
print(l)

<generator object n_length_combo at 0x7fd558f2f5f0>


Repeat the exercise, but this time the same item can be chosen several times. For each item, the maximum number of copies we can take is the integer part of
$M/m_i$, i.e. $\lfloor M/m_i \rfloor$. Be careful: computation times can become very long for large values of $M$.
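
A possible sketch of this variant is a recursive enumeration over the number of copies of each item. It assumes all weights are at least 1, and it uses the remaining capacity rather than $M$ as the bound, which is never larger than $\lfloor M/m_i \rfloor$. The name `solve_bag_unbounded_brute_force` and the toy data are hypothetical.

```python
import time


def solve_bag_unbounded_brute_force(utilities, weights, M):
    """Exhaustive search where each item can be taken 0, 1, 2, ... times."""
    n = len(utilities)
    start = time.perf_counter()

    def explore(i, remaining):
        """Best gain using items i..n-1 with `remaining` capacity left."""
        if i == n:
            return 0
        best = 0
        # x_i ranges over 0 .. floor(remaining / m_i), assuming m_i >= 1
        for count in range(remaining // weights[i] + 1):
            rest = explore(i + 1, remaining - count * weights[i])
            best = max(best, count * utilities[i] + rest)
        return best

    best_gain = explore(0, M)
    return best_gain, time.perf_counter() - start


# Hypothetical toy instance: taking item 1 twice (gain 10) beats item 0 once (gain 6)
gain, elapsed = solve_bag_unbounded_brute_force([6, 5], [6, 5], 10)
print(gain)  # 10
```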

## Greedy approach

For each item, the gain/weight ratio ($u_i/m_i$) is computed. The items are sorted by decreasing ratio, then the bag is filled in this order until no more items can be added.
Compare the quality of the solution obtained with that of the previous exact solver. In particular, find cases where the greedy strategy does not give the optimal solution of the
problem. Here again you will code two versions of the function (a single copy of each item, and an unlimited number of copies).
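
The greedy rule described above can be sketched as follows. The function name `solve_bag_greedy`, its `unbounded` flag and the three-item instance are assumptions made for illustration. The instance was picked by hand so that the greedy answer is suboptimal in both variants.

```python
import time


def solve_bag_greedy(utilities, weights, M, unbounded=False):
    """Fill the bag by decreasing gain/weight ratio; returns (gain, elapsed time)."""
    start = time.perf_counter()
    order = sorted(range(len(utilities)),
                   key=lambda i: utilities[i] / weights[i],
                   reverse=True)
    remaining = M
    gain = 0
    for i in order:
        if unbounded:
            count = remaining // weights[i]       # as many copies as still fit
        else:
            count = 1 if weights[i] <= remaining else 0
        gain += count * utilities[i]
        remaining -= count * weights[i]
    return gain, time.perf_counter() - start


# Hypothetical counterexample: item 0 has the best ratio (7/6) but blocks the bag.
utilities, weights, M = [7, 5, 5], [6, 5, 5], 10
print(solve_bag_greedy(utilities, weights, M)[0])                  # 7
print(solve_bag_greedy(utilities, weights, M, unbounded=True)[0])  # 7
```

On this instance the optimum is 10 in both cases (items 1 and 2 once each, or item 1 twice), whereas the greedy strategy stops at 7 because the first item leaves only 4 units of capacity.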


## Dynamic programming

We limit ourselves here to the case where only one copy of each item is available.

The idea of dynamic programming is to solve the problem incrementally, starting from simpler versions, and to store the intermediate results needed when
new variables are added. This is a time-space trade-off. The backpack problem has *optimal substructure*, that is to say that we
can find the optimal value of the problem with $k$ variables from the optimal values with $k-1$ variables.
The following quantity $P(k,m)$, the best gain achievable with the first $k$ items and a capacity $m$, is defined by:
$$
\begin{aligned}
P(k,m) & = \max_{x_i} \sum_{i=1}^{k} x_i u_i \\
       & \text{s.t.} \sum_{i=1}^{k} x_i m_i \leq m
\end{aligned}
$$
Then, with $P(0,m)=0$ for every $m$, the optimal value $P(k,m)$ is the larger of:
 - $P(k-1,m)$, where we choose not to add item $k$, i.e. $x_k=0$,
 - $P(k-1,m-m_k)+u_k$, where we choose to add item $k$, i.e. $x_k=1$ (only possible if $m_k \leq m$).

You just have to build the table of values $P(k,m)$ for $k = 0, \dots, n$ and $m = 0, \dots, M$. Once this table is built, start from the element
$P(n,M)$ and go back to the $P(0,\cdot)$ row to find out, for each item, whether it was chosen or not, and thus construct the solution.

We note then that the complexity of the algorithm is $O(nM)$ in both time and space. Although this bound looks polynomial, we have not shown that $P=NP$:
since $M$ is encoded on $\log(M)$ bits, $O(nM)$ is exponential in the size of the input (the algorithm is said to be pseudo-polynomial).
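
As an illustration of the recurrence and of the backtracking step, here is a minimal sketch of the table-based algorithm. It assumes integer weights and an integer capacity $M$. The name `solve_bag_dynamic` is hypothetical, and returning the list of chosen indices in addition to the gain and the time is an extra not required by the assignment.

```python
import time


def solve_bag_dynamic(utilities, weights, M):
    """0/1 knapsack via the P(k, m) table; returns (gain, chosen items, time)."""
    n = len(utilities)
    start = time.perf_counter()
    # P[k][m] = best gain using the first k items with capacity m; P[0][m] = 0
    P = [[0] * (M + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        u, w = utilities[k - 1], weights[k - 1]
        for m in range(M + 1):
            P[k][m] = P[k - 1][m]                          # x_k = 0
            if w <= m:                                     # x_k = 1 is feasible
                P[k][m] = max(P[k][m], P[k - 1][m - w] + u)
    # Backtrack from P[n][M] down to row 0 to recover which items were taken
    chosen = []
    m = M
    for k in range(n, 0, -1):
        if P[k][m] != P[k - 1][m]:                         # item k-1 was needed
            chosen.append(k - 1)
            m -= weights[k - 1]
    chosen.reverse()
    return P[n][M], chosen, time.perf_counter() - start


# Hypothetical toy instance
gain, chosen, elapsed = solve_bag_dynamic([7, 5, 5], [6, 5, 5], 10)
print(gain, chosen)  # 10 [1, 2]
```

The two nested loops fill $(n+1)(M+1)$ cells, which matches the $O(nM)$ time and space bound.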
  

Code and test this algorithm.

Optionally, you can get some help from the Wikipedia page on the [0-1 knapsack problem](https://en.wikipedia.org/wiki/Knapsack_problem).

In [None]:
# "Dynamic" approach with a single copy of each item in the backpack
# Note: M is the maximum weight of the bag
# Note: this routine does not build the P(k, m) table; for each starting item i it adds
# the other items in index order whenever they still fit, so the gain may not be optimal
def DynamicSolveBagOneItem(M, utility, weights) :
  print("Max Weights", M)

  bestSequence = ""
  cumulSequence = ""
  bestGain = 0
  currentGain = 0
  currentWeight = 0
  cumulGain = 0
  cumulWeight = 0
  i=0  # start with the first item
  
  while (i<len(utility) ) :  #first item
      print("i", i)
      cumulSequence = str(i)
      cumulGain = 0
      cumulWeight = 0
      currentGain = utility[i]
      currentWeight = weights[i]
      if currentWeight > M :  # the first item alone exceeds the maximum weight: skip it
        print("currentWeight FIRST item", currentWeight)
      else :
        cumulGain = currentGain 
        cumulWeight = currentWeight
        j=0  
        while (j < len(utility) ) : #next item
          print("j", j)
          print("1 currentWeight + cumulWeight ", currentWeight+cumulWeight) 
          if (i==j) : j=i+1 # do not take the first item again
          if (j>= len(utility)) : break
          currentGain = utility[j]
          currentWeight = weights[j]
          if currentWeight > M :  # the next item alone exceeds the maximum weight: do not take it
            print("currentWeight next item", currentWeight)
            j=j+1  # test the following one
          else :
            if currentWeight+cumulWeight > M : # the cumulative weight would exceed the maximum: do not take it
                print("2 currentWeight + cumulWeight ", currentWeight+cumulWeight)  
                j=j+1  # skip this item and move on to the next one
            else : # take the item
                cumulWeight = currentWeight+cumulWeight
                cumulGain = currentGain+cumulGain
                cumulSequence = cumulSequence + str(j)
                if cumulGain > bestGain :
                    bestSequence = cumulSequence
                    bestGain = cumulGain
                    print("cumulWeight ", cumulWeight)
                    print("bestGain ", bestGain)
                    print("bestSequence ", bestSequence)
                j=j+1
      i=i+1    
  return(bestGain)






In [None]:
# create the utility and weight vectors
_, myUtility, myWeights = createPositionsUtilitiesWeightsVectors(10)

In [None]:
# call the function
DynamicSolveBagOneItem(40, myUtility, myWeights)

Max Weights 40
i 0
j 0
1 currentWeight + cumulWeight  18
cumulWeight  13
bestGain  4
bestSequence  01
j 2
1 currentWeight + cumulWeight  17
cumulWeight  21
bestGain  7
bestSequence  012
j 3
1 currentWeight + cumulWeight  29
cumulWeight  26
bestGain  13
bestSequence  0123
j 4
1 currentWeight + cumulWeight  31
cumulWeight  30
bestGain  19
bestSequence  01234
j 5
1 currentWeight + cumulWeight  34
cumulWeight  36
bestGain  25
bestSequence  012345
j 6
1 currentWeight + cumulWeight  42
cumulWeight  39
bestGain  32
bestSequence  0123456
j 7
1 currentWeight + cumulWeight  42
2 currentWeight + cumulWeight  41
j 8
1 currentWeight + cumulWeight  41
2 currentWeight + cumulWeight  47
j 9
1 currentWeight + cumulWeight  47
cumulWeight  40
bestGain  39
bestSequence  01234569
i 1
j 0
1 currentWeight + cumulWeight  8
j 1
1 currentWeight + cumulWeight  22
j 3
1 currentWeight + cumulWeight  29
j 4
1 currentWeight + cumulWeight  31
j 5
1 currentWeight + cumulWeight  34
j 6
1 currentWeight + cumulWeight  42

39

## Summary and final conclusions
For each method, vary the number of items $n$ and measure the average time taken over $10$ runs of the solver. Plot the corresponding average execution time curves
for the 3 methods in both cases (a single copy or several copies of each item; dynamic programming is excluded from the second case).
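
A possible timing harness for this comparison is sketched below. It uses only the standard library so that it is self-contained. The helper names `random_instance`, `average_time` and `dp_gain` are assumptions. `dp_gain` is a compact one-dimensional form of the 0/1 dynamic program, included only so that there is something to time; any solver with the same `(utilities, weights, M)` signature can be passed instead.

```python
import random
import time


def random_instance(n, rng):
    """Utilities and weights drawn uniformly in [1, 10], capacity M = 7n."""
    utilities = [rng.randint(1, 10) for _ in range(n)]
    weights = [rng.randint(1, 10) for _ in range(n)]
    return utilities, weights, 7 * n


def dp_gain(utilities, weights, M):
    """Compact 0/1 dynamic program (one row of the table, capacities descending)."""
    best = [0] * (M + 1)
    for u, w in zip(utilities, weights):
        for m in range(M, w - 1, -1):
            best[m] = max(best[m], best[m - w] + u)
    return best[M]


def average_time(solver, n, runs=10, seed=0):
    """Average wall-clock time of `solver` over `runs` random instances of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        utilities, weights, M = random_instance(n, rng)
        start = time.perf_counter()
        solver(utilities, weights, M)
        total += time.perf_counter() - start
    return total / runs


for n in (5, 10, 20, 40):
    print(n, average_time(dp_gain, n))
```

The resulting `(n, average time)` pairs can then be collected per method and plotted, for instance with a logarithmic time axis so that the exponential growth of the brute-force solver and the polynomial growth of the other two remain readable on the same figure.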