#### Purpose  
This notebook (written in Python 3) is intended to address 
a Data Science challenge.

#### Candidate    
Andrea Chiappo  

#### Problem  
_Is it possible to fill 10 backpacks with 20 boxes of different sizes?_  


Variables:  
- two different box types: type A and type B  
- boxes have different sizes  
- backpack capacities differ  
- box timing: boxes at the end of the truck might be better  

#### Solution    
I propose two possible solutions to this Data Science challenge:  
a methodic one and a pragmatic one.  

##### Methodic solution  
The problem at hand resembles the so-called _Knapsack Problem_ 
with three additional levels of complexity:   
- the presence of 10 backpacks of different capacities  
- the presence of two classes of boxes (typeA and typeB)  
- the presence of a time-ordering condition on the order in which boxes are packed  

Following a strict reading of the problem statement above, we can temporarily 
set aside the last two points. Considering only the need to allocate the 20 
boxes to 10 different backpacks, a solution can be obtained by applying the 
**0-1 Knapsack Problem** sequentially, one backpack at a time.  

Each box is assigned a size and a value, and the 10 backpack capacities are 
arranged in a given sequence. We first apply the _0-1 Knapsack problem_ 
solution recipe to the first capacity of the sequence, using all 20 boxes. 
Once the first backpack is filled (its value maximised within its capacity), 
we repeat the procedure with the second backpack and only the remaining boxes. 
The process continues until all backpacks have been used. The outcome is a 
series of backpacks, each filled to a certain degree as quantified by the 
maximum value returned by the _0-1 Knapsack_ routine, plus a set of unallocated 
boxes. Summing the per-backpack values gives the total performance of the 
allocation process.  

Because each backpack is filled greedily, this procedure does not necessarily 
produce the most efficient overall allocation: the result depends on the order 
in which the backpacks are filled. To search for better-performing allocations, 
we can shuffle the order of the backpacks (and thus of their capacities) and 
repeat the procedure from the beginning, each time recording the cumulative 
value obtained via the _0-1 Knapsack_ procedure. In the end, the backpack 
ordering that yields the highest cumulative value is the best-performing choice.  
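
For reference, the standard dynamic-programming recurrence behind each _0-1 Knapsack_ step can be written as below. The notation is introduced here for illustration: $s_i$ and $v_i$ are the size and value of box $i$, and $K_{i,c}$ is the best value achievable using only the first $i$ boxes with capacity $c$.

$$
K_{i,c} =
\begin{cases}
0 & \text{if } i = 0 \text{ or } c = 0,\\
K_{i-1,c} & \text{if } s_i > c,\\
\max\left(K_{i-1,c},\; v_i + K_{i-1,\,c-s_i}\right) & \text{otherwise.}
\end{cases}
$$

The best value for a backpack of capacity $W$ with $n$ candidate boxes is $K_{n,W}$, obtained by filling an $(n+1)\times(W+1)$ table. Repeating this for $u$ backpacks and $r$ orderings is what gives the $O(nWur)$ cost quoted in the _Pragmatic solution_ section.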

The time ordering and the two class labels can be reintroduced by adding them 
as features of the boxes. Using this information at the end of the process 
above, one can group the allocated boxes according to their time affinity and 
class membership; inserting these criteria directly into the allocation 
procedure might instead reduce its efficiency. At this point, one can identify 
which sequence of backpacks yields the best trade-off between the size-value 
allocation performance and the time or class grouping of the boxes.
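
As an illustration of how such a trade-off could be quantified, the sketch below scores an allocation as its cumulative value minus a weighted grouping penalty. The penalty form (time spread of the boxes in each backpack plus a fixed cost for mixing `typeA` and `typeB`), the function names and the weights `lam` and `class_weight` are assumptions made here, not part of the original analysis.

```python
# Hypothetical trade-off score (illustrative assumption, not the notebook's method)

def grouping_penalty(times, classes, class_weight=10.0):
    """Hypothetical penalty for one backpack: time spread + cost of mixed classes."""
    if len(times) == 0:
        return 0.0
    time_spread = max(times) - min(times)    # how far apart the boxes are in time
    n_extra_classes = len(set(classes)) - 1  # 0 when all boxes share one class
    return time_spread + class_weight * n_extra_classes

def tradeoff_score(total_value, backpacks, lam=1.0, class_weight=10.0):
    """Hypothetical score: cumulative value minus lam times the grouping penalties."""
    penalty = sum(grouping_penalty(t, c, class_weight) for t, c in backpacks)
    return total_value - lam * penalty

# toy allocation: each backpack given as (time labels, class labels) of its boxes
alloc = [([100, 104], ['typeA', 'typeA']),
         ([150, 190], ['typeA', 'typeB'])]
print(tradeoff_score(300, alloc, lam=0.5))  # 273.0
```

A larger `lam` favours tightly grouped backpacks over raw value; computing this score for each reshuffled backpack order would then single out the best compromise.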


##### Pragmatic solution  
A _brute-force_ solution would explore all possible allocations by simulating 
every scenario. In this case, one would compute all possible combinations of 
boxes $\binom{n}{k}$ and select those that, for a given capacity, fill as much 
of the available space as possible. On the one hand, this approach might more 
easily accommodate the time-ordering condition and should perform well when 
few boxes and backpacks are involved. On the other hand, its computational 
cost grows combinatorially with the number of elements involved. By contrast, 
the _Methodic solution_ presented above scales as $O(nWur)$, with $n$ the 
number of boxes, $W$ the capacity of each backpack, $u$ the number of 
backpacks (assuming, for simplicity, that all have capacity $\sim W$) and $r$ 
the number of reshufflings.
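
To make the scaling concrete, the sketch below is a brute-force variant that enumerates assignments of boxes to backpacks (each box goes into one of the $u$ backpacks or stays out) rather than combinations per backpack. It examines $(u+1)^n$ candidates: for the toy case below that is $3^4 = 81$, while for 20 boxes and 10 backpacks it would be $11^{20} \approx 6.7 \times 10^{20}$, far beyond exhaustive search. The function name and the toy data are illustrative.

```python
from itertools import product

def brute_force(capacities, sizes, values):
    """Try every assignment of boxes to backpacks; return (best value, assignment).

    assignment[b] = k means box b goes into backpack k; k == len(capacities)
    means box b is left out. The assignment is None if no box fits anywhere.
    """
    u = len(capacities)
    best_value, best_assign = 0, None
    # (u + 1) ** len(sizes) candidates: only viable for tiny instances
    for assign in product(range(u + 1), repeat=len(sizes)):
        load = [0] * u
        value = 0
        for box, k in enumerate(assign):
            if k < u:                        # box placed in backpack k
                load[k] += sizes[box]
                value += values[box]
        feasible = all(ld <= cap for ld, cap in zip(load, capacities))
        if feasible and value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign

# toy instance: 2 backpacks (capacities 5 and 4), 4 boxes
best_value, best_assign = brute_force([5, 4], [4, 3, 2, 2], [10, 7, 4, 4])
print(best_value)  # 21
```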

#### Numerical example  
Below is a numerical example implementing the _Methodic solution_ presented 
above. The starting point is a function implementing the _0-1 Knapsack 
problem_ algorithm. Given the capacity of a backpack, the number of available 
boxes and the arrays of box sizes, values, class labels and time labels, this 
function returns the total value attained and the indices, class labels and 
time labels of the boxes allocated to the backpack.

In [281]:
import numpy as np

def knapSack(C, sz, vl, cl, tl, n): 
    # function solving the 0-1 Knapsack problem by dynamic programming
    #
    # input:
    #   - C  = capacity of the backpack
    #   - sz = array of box sizes
    #   - vl = array of box values
    #   - cl = array of class labels of the boxes
    #   - tl = array of time labels of the boxes
    #   - n  = number of boxes available for allocation
    #
    # output: dictionary with keys
    #   - 'v' = total value attained from the allocation process
    #   - 'i' = indices of the boxes placed in the backpack (comma-separated string)
    #   - 'c' = class labels of the allocated boxes (comma-separated string)
    #   - 't' = list of time labels of the allocated boxes
    #
    # NB: the indices stored in 'i' are 1-based positions in the provided
    #     sz and vl arrays (e.g. '1,2,3,' means first, second, third box)
    #
    
    # initialise an (n+1) x (C+1) table of independent dictionaries;
    # row 0 (no boxes) and column 0 (no capacity) hold the empty allocation
    K = [[{'v':0, 'i':'', 'c':'', 't':[]} for y in range(C+1)] for z in range(n+1)]
    
    # build table K[][] in bottom-up manner
    for i in range(1, n+1): 
        for s in range(1, C+1): 
            take = False
            if sz[i-1] <= s:
                T1 = vl[i-1]                  # value of box i
                T2 = K[i-1][s-sz[i-1]]['v']   # best value of the space left after box i
                T3 = K[i-1][s]['v']           # best value without box i
                take = T1+T2 >= T3

            if take:
                # put box i on top of the best allocation of the remaining space
                prev = K[i-1][s-sz[i-1]]
                K[i][s]['v'] = T1+T2
                K[i][s]['i'] = '%i,'%i + prev['i']
                K[i][s]['c'] = '%s,'%cl[i-1] + prev['c']
                K[i][s]['t'] = [tl[i-1]] + prev['t']
            else:
                # box i does not fit or is not worth it: copy the row above
                prev = K[i-1][s]
                K[i][s]['v'] = prev['v']
                K[i][s]['i'] = prev['i']
                K[i][s]['c'] = prev['c']
                K[i][s]['t'] = list(prev['t'])   # copy, so cells never share a list
    return K[n][C]
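
The function above stores an index string, a class string and a time list in every one of the $(n+1)(C+1)$ table cells. A common lighter-weight alternative, sketched below under the name `knapsack_backtrack` (not part of the notebook), keeps only the values in a NumPy table and recovers the chosen boxes by walking back through it. It returns 0-based indices and, on ties, prefers leaving a box out, so it may select a different but equally valuable set than `knapSack`.

```python
import numpy as np

def knapsack_backtrack(C, sz, vl):
    """0-1 knapsack on a value-only table; chosen boxes recovered by backtracking.

    Returns (best value, sorted 0-based indices of the chosen boxes).
    """
    n = len(sz)
    K = np.zeros((n + 1, C + 1), dtype=np.int64)
    for i in range(1, n + 1):
        w, v = int(sz[i - 1]), int(vl[i - 1])
        K[i] = K[i - 1]                       # default: box i is skipped
        if w <= C:
            # take box i wherever it fits and strictly improves the value
            K[i, w:] = np.maximum(K[i - 1, w:], K[i - 1, :C + 1 - w] + v)
    # walk back from K[n, C]: a change between rows means box i was taken
    chosen, c = [], C
    for i in range(n, 0, -1):
        if K[i, c] != K[i - 1, c]:
            chosen.append(i - 1)
            c -= int(sz[i - 1])
    return int(K[n, C]), sorted(chosen)

print(knapsack_backtrack(10, [5, 4, 6, 3], [10, 40, 30, 50]))  # (90, [1, 3])
```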

Now I initialise the arrays containing:
- the backpacks' capacity
- the boxes' value
- the boxes' size
- the boxes' class label
- the boxes' time label

In [282]:
nBc = 10   # total number of backpacks
minC = 10  # minimum backpack capacity
maxC = 30  # maximum backpack capacity (exclusive upper bound of randint)

WW = np.random.randint(minC, maxC, nBc)  # array of backpacks' capacities
OO = list(range(len(WW)))                # initial ordering of backpacks

nBx = 20   # total number of boxes 
minV = 50  # minimum box value
maxV = 100 # maximum box value (exclusive)
minS = 1   # minimum box size
maxS = 20  # maximum box size (exclusive)

vl = np.random.randint(minV, maxV, nBx)  # boxes values
sz = np.random.randint(minS, maxS, nBx)  # boxes sizes
nn = len(vl)                             # initial number of boxes

tmin = 100 # minimum time label
tmax = 200 # maximum time label (exclusive)
tl = np.random.randint(tmin, tmax, nBx)          # boxes time labels
cl = np.random.choice(['typeA', 'typeB'], nBx)   # boxes class labels
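
The arrays above are drawn with `np.random.randint` without a fixed seed, so re-running the notebook produces different data and different numbers from those printed below. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API is available (NumPy 1.17 or later), keeps the same variable names and ranges; the seed value 42 is arbitrary, and the generated data will differ from the run shown below.

```python
import numpy as np

# seeded generator: re-running with the same seed gives the same data
# (the seed value 42 is an arbitrary choice)
rng = np.random.default_rng(seed=42)

nBc, minC, maxC = 10, 10, 30                      # backpacks: count, capacity range
nBx, minV, maxV, minS, maxS = 20, 50, 100, 1, 20  # boxes: count, value and size ranges
tmin, tmax = 100, 200                             # time-label range

# Generator.integers excludes the upper bound by default, like np.random.randint
WW = rng.integers(minC, maxC, nBc)         # backpacks' capacities
OO = list(range(len(WW)))                  # initial ordering of backpacks
vl = rng.integers(minV, maxV, nBx)         # boxes values
sz = rng.integers(minS, maxS, nBx)         # boxes sizes
nn = len(vl)                               # initial number of boxes
tl = rng.integers(tmin, tmax, nBx)         # boxes time labels
cl = rng.choice(['typeA', 'typeB'], nBx)   # boxes class labels
```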

In [290]:
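from random import shuffle

Nint = 10 # number of iterations (reshufflings of the backpack order)

Varr = []  # cumulative value attained in each iteration
Narr = []  # number of allocated boxes in each iteration
Barr = []  # original indices of the allocated boxes
Carr = []  # class labels of the allocated boxes
Tarr = []  # time labels of the allocated boxes

print('{:>10} | {:>10} | {:>10}'.format('iteration', 'local max', 'N boxes used'))
for r in range(Nint):
    
    V = 0
    N = 0
    B = []
    C = []
    T = []

    # every iteration starts again from the full set of boxes
    newvl = vl
    newsz = sz
    newcl = cl
    newtl = tl
    newid = np.arange(nBx)  # original indices of the boxes still available
    newn = nn

    # fill the backpacks one after the other, in the current order OO
    for W in WW[OO]:
        res = knapSack(W, newsz, newvl, newcl, newtl, newn)

        V += res['v']
        
        # convert the 1-based indices returned by knapSack to 0-based positions
        items = res['i'].split(',')
        items = [i-1 for i in map(int,filter(bool,items))]
        
        # positions refer to the *reduced* arrays: look the labels up there
        N += len(items)
        B.extend(newid[items])
        C.extend(newcl[items])
        T.extend(newtl[items])
        
        # remove the allocated boxes before filling the next backpack
        newvl = np.delete(newvl, items)
        newsz = np.delete(newsz, items)
        newcl = np.delete(newcl, items)
        newtl = np.delete(newtl, items)
        newid = np.delete(newid, items)
        newn = len(newvl)

    Varr.append(V)
    Narr.append(N)
    Barr.append(B)
    Carr.append(C)
    Tarr.append(T)
    print('{:10} {:10} {:10}'.format(r+1,V,N))
    shuffle(OO)  # random new backpack order for the next iteration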

 iteration |  local max | N boxes used
         1       1098         14
         2       1098         14
         3       1012         13
         4       1098         14
         5       1084         14
         6       1084         14
         7       1098         14
         8       1084         14
         9       1098         14
        10       1098         14


In [285]:
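bestV = max(Varr)            # highest cumulative value over all orderings
indm = int(np.argmax(Varr))  # first iteration attaining it
bestN = Narr[indm]           # number of boxes allocated in that iteration
print('\n global maximum value attainable : ',bestV)
print('\n number of boxes used by the best allocation : ',bestN)


 global maximum value attainable :  1098

 number of boxes used by the best allocation :  14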
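
The iteration table shows that the cumulative value depends on the order in which the backpacks are filled (from 1012 to 1098 in this run), and 10 random reshufflings sample only a small fraction of the $10! = 3\,628\,800$ possible orders. On a tiny instance every order can be tried. The sketch below does so with its own compact sequential filler; `fill_one`, `sequential_fill` and the toy data are illustrative and independent of the notebook's `knapSack`. Here, filling the small backpack first gives a cumulative value of 8, while filling the large one first gives only 6.

```python
from itertools import permutations

def fill_one(C, sizes, values, avail):
    """0-1 knapsack over the boxes in avail; returns (value, chosen box ids)."""
    # best[c] = (value, chosen box ids) using total size at most c
    best = [(0, ())] * (C + 1)
    for b in avail:
        # iterate capacity downwards so each box is used at most once
        for c in range(C, sizes[b] - 1, -1):
            v, chosen = best[c - sizes[b]]
            if v + values[b] > best[c][0]:
                best[c] = (v + values[b], chosen + (b,))
    return best[C]

def sequential_fill(order, capacities, sizes, values):
    """Fill backpacks one after another, removing the boxes already placed."""
    avail, total = set(range(len(sizes))), 0
    for k in order:
        v, chosen = fill_one(capacities[k], sizes, values, sorted(avail))
        total += v
        avail -= set(chosen)
    return total

# toy instance: backpacks of capacity 2 and 4, three boxes
capacities, sizes, values = [2, 4], [2, 2, 3], [3, 3, 5]
results = {order: sequential_fill(order, capacities, sizes, values)
           for order in permutations(range(len(capacities)))}
print(results)  # {(0, 1): 8, (1, 0): 6}
```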
