#### Purpose  
This notebook (written in Python 3) is intended to address 
the Data Science challenge presented by ABA Finance

#### Candidate    
Andrea Chiappo  

#### Problem  
_Is it possible to fill 10 backpacks with 20 different size boxes?_  


Variables:  
- two different box types: type A and type B  
- boxes have different sizes  
- backpacks capacity differs  
- box timing: boxes at the end of the truck might be better  

The first variable - the presence of two classes of boxes - 
appears to be redundant to the solution of the problem statement above.  
Therefore, this information will be neglect in the remainder of this 
document.

#### Solution    
I propose two possible solutions to this Data Science challenge:  
a methodic one and a pragmatic one.  

##### Methodic solution  
The problem at hand resembles the so-called _Knapsack Problem_ 
with two additional leves of complexity:   
- the presence of 10 backpacks of different capacity;  
- the presence of a time ordering condition to the boxing order.  

Neglecting temporarily the second point above, a solution can be 
obtained implementing a "recursive" version of the **0-1 Knapsack Problem**:  
ordering the 10 backpacks' capacity in a given sequence, we start by applying 
the 0-1 Knapsack solution recipe assuming the first capacity of the sequence 
and using all 20 boxes. Once the first backpack is filled - the maximum 
available space has been occupied - we can repeat the procedure, adopting 
only the remaining boxes and the second backpack. The process is repeated 
until all backpacks have been used. The upshot will be a series of backpacks 
filled to a certain degree, as quantified by the maximum value obtained via 
the 0-1 Knapsack routine. Summing these quantities gives the total performance 
of the allocations.  
This procedure does not necessarily produce the most efficient outcome. 
To search for other, more performing results, we can shuffle the order of 
the backpacks - thus their capacity - and repeat the procedure above, each 
time recording the global (cumulative) maximum value obtained via the 
0-1 Knapsack procedure. In the end, the sequence of backpacks which yields 
the highest cumulative value, represents the best performing choice.  

The time ordering on the boxes can be reintroduced adding a feature to the 
data. Using this information at the end of the process above, one can 
identify which backpacks can be grouped together, following their time 
affinity. Inserting this criteria during the boxes allocation procedure 
above might affect the efficiency of the process.

##### Pragmatic solution  
A _brute force_ solution could entail exploring all possible allocations 
by simulating the scenarios. In this case, one would compute all possible 
combinations of boxes $\binom nk$ and detect the ones which, for a given 
capacity, occupy to large space available. This approach, on one hand, 
might more easily accommodate the time ordering condition and should 
perform well when small quantities (number of boxes and backpacks) are 
involved. However, its performance might quickly escalate as the number 
of elements involved grows. On the other hand, the _Methodic solution_ 
presented above, scale as $O(nWur)$, with $n$ the number of boxes, $W$ 
the capacity of each backpack, $u$ the number of backpacks (assuming, for 
simplicity that all have $\sim W$) and $r$ the number of reshufflings.

In [157]:
import numpy as np

In [576]:
def knapSack(W, wt, sz, n): 
    # function which returns a dictionary consisting of:
    # the total value and items which can be place in the knapsack
    #
    # NB: the labelling of the items entering the second returned value refers
    #     to the positional argument of the provided wt and sz arrays
    #     (e.g. '1,2,3,' means first, second, third argument)
    #
    
    # initialiase an array of empty dictionaries
    K = [[{'v':0, 'i':''} for y in range(W+1)] for z in range(n+1)]

    # Build table K[][] in bottom up manner 
    for i in range(n+1): 
        for w in range(W+1): 
            if i==0 or w==0: 
                K[i][w]['v'] = 0
                K[i][w]['i'] = ''

            elif wt[i-1] <= w:
                C1 = sz[i-1] 
                C2 = K[i-1][w-wt[i-1]]['v']
                C3 = K[i-1][w]['v']
                if C1+C2>=C3:
                    K[i][w]['v'] = C1+C2
                    K[i][w]['i'] = K[i][w]['i'] + '%i,'%i
                    if C2!=0:
                        K[i][w]['i'] = K[i][w]['i'] + K[i-1][w-wt[i-1]]['i']
                else:
                    K[i][w]['v'] = C3
                    K[i][w]['i'] = K[i-1][w]['i']
            else: 
                K[i][w]['v'] = K[i-1][w]['v']
                K[i][w]['i'] = K[i-1][w]['i']
    return K[i][w]

#### Additional material

Below I show that the order of the boxes within each execution of  
the _0-1 Knapsack problem_ algorithm does no influece the outcome

In [651]:
sz = np.random.randint(50, 100, 20)
wt = np.random.randint(1, 20, 20)
n = len(sz) 

In [652]:
ind = np.random.permutation(range(n))
newsz = sz[ind]
newwt = wt[ind]

knapSack(30, wt, sz, n), knapSack(30, newwt, newsz, n)

({'v': 581, 'i': '18,14,11,10,9,6,3,2,'},
 {'v': 581, 'i': '20,18,15,13,4,3,2,1,'})

Above 'v' indicates the maximum value achieved,  
while 'i' are the indices of all allocated boxes

### case 1

In [667]:
print('first knapsack assignment, on ',n,' boxes')

W = [24,30,34]

res = knapSack(W[0], wt, sz, n)
print(res)

V = res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(sz, items)
newwt = np.delete(wt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[1], newwt, newval, newn)
print(res)

V += res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(newsz, items)
newwt = np.delete(newwt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[2], newwt, newval, newn)
print(res)

V += res['v']

print('Cumulative value: ',V)

first knapsack assignment, on  20  boxes
{'v': 540, 'i': '18,16,14,11,10,6,3,2,'}
second knapsack assignment, on  12  boxes
{'v': 281, 'i': '12,9,8,'}
second knapsack assignment, on  9  boxes
{'v': 181, 'i': '8,1,'}
Cumulative value:  1002


### case 2

In [668]:
print('first knapsack assignment, on ',n,' boxes')

W = [24,30,34]

res = knapSack(W[1], wt, sz, n)
print(res)

V = res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(sz, items)
newwt = np.delete(wt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[2], newwt, newval, newn)
print(res)

V += res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(newsz, items)
newwt = np.delete(newwt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[0], newwt, newval, newn)
print(res)

V += res['v']

print('Cumulative value: ',V)

first knapsack assignment, on  20  boxes
{'v': 581, 'i': '18,14,11,10,9,6,3,2,'}
second knapsack assignment, on  12  boxes
{'v': 363, 'i': '12,9,8,7,'}
second knapsack assignment, on  8  boxes
{'v': 96, 'i': '8,'}
Cumulative value:  1040


### case 3

In [669]:
print('first knapsack assignment, on ',n,' boxes')

W = [24,30,34]

res = knapSack(W[2], wt, sz, n)
print(res)

V = res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(sz, items)
newwt = np.delete(wt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[1], newwt, newval, newn)
print(res)

V += res['v']

items = res['i'].split(',')
items = [i-1 for i in map(int,filter(bool,items))]
newsz = np.delete(newsz, items)
newwt = np.delete(newwt, items)
newn = len(newsz)

print('second knapsack assignment, on ',newn,' boxes')

res = knapSack(W[0], newwt, newval, newn)
print(res)

V += res['v']

print('Cumulative value: ',V)

first knapsack assignment, on  20  boxes
{'v': 626, 'i': '20,18,14,13,11,10,6,3,2,'}
second knapsack assignment, on  11  boxes
{'v': 236, 'i': '9,8,6,'}
second knapsack assignment, on  8  boxes
{'v': 96, 'i': '8,'}
Cumulative value:  958


In [670]:
# I am not worring about the periodicity of the reshuffling, as this
# becomes progressively negligible as the arrays' size increases

In [661]:
WW = np.random.randint(20, 40, 10)
OO = list(range(len(WW)))
shuffle(OO)
print(OO)
shuffle(OO)
print(OO)
shuffle(OO)
print(OO)
shuffle(OO)
print(OO)

[8, 1, 0, 9, 6, 4, 3, 2, 5, 7]
[0, 5, 4, 8, 2, 6, 9, 3, 7, 1]
[0, 5, 3, 6, 8, 7, 1, 2, 9, 4]
[8, 9, 6, 2, 0, 1, 7, 5, 3, 4]


In [25]:
V = 0
for w,W in enumerate(WW):
    V += knapSack(W, wt, val, n)
    print(V)

90
180
