## Backpack packing

### Here is the file format for both problems:
[knapsack_size] [number_of_items]

[value_1] [weight_1]

[value_2] [weight_2]

...
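
A minimal reader for this format could look like the sketch below. It assumes whitespace-separated integers and skips blank lines. `parse_knapsack` is a hypothetical helper and is not used by the cells that follow, which read the file inline.

```python
def parse_knapsack(text):
    """Parse '[knapsack_size] [number_of_items]' followed by one
    '[value] [weight]' pair per line. Returns (W, n, items)."""
    lines = text.splitlines()
    W, n = map(int, lines[0].split())      # header: capacity and item count
    items = []
    for line in lines[1:]:
        if line.strip():                   # tolerate blank lines between items
            value, weight = map(int, line.split())
            items.append((value, weight))
    if len(items) != n:
        raise ValueError("header item count doesn't match the number of item lines")
    return W, n, items

print(parse_knapsack("6 3\n10 4\n7 3\n6 2\n"))   # (6, 3, [(10, 4), (7, 3), (6, 2)])
```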

In [2]:
with open('Downloads/algo2knapsack1.txt') as f:
    detes = f.readline()
    items = f.readlines()

In [3]:
detes

'10000 100\n'

In [4]:
len(items)

100

In [5]:
items[0]

'16808 250\n'

In [6]:
detes = detes.strip('\n').split()
W = int(detes[0])  ## weight capacity
n = int(detes[1])  ## num items
print(W, n)

10000 100


In [7]:
items = [(int(item[0]), int(item[1])) for item in [i.strip('\n').split() for i in items]]

In [8]:
items[0]  ## (value, weight)

(16808, 250)

In [9]:
items = [(0, 0)] + items   # prepend an empty (0, 0) item so item i sits at index i, matching column i of the table

In [10]:
items[:3]

[(0, 0), (16808, 250), (50074, 659)]

In [11]:
subprobs = [[0 for _ in range(n+1)]]  # all items have positive weights so nothing fits in the zero-weight capacity row
for w in range(1,W+1):    ## add a row for every unit of weight up to W
    vals = [0]  ## taking empty item 0 adds no value
    for i in range(1, n+1):
        if items[i][1] > w:   ## item can't fit in backpack, so use solution to subproblem without it
            vals.append(vals[i-1])
        else:
            vals.append(max(vals[i-1], subprobs[w-items[i][1]][i-1] + items[i][0]))
    subprobs.append(vals)
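
The cell above fills a table where subprobs[w][i] is the best value achievable using items 1..i with capacity w. Writing $v_i$ and $w_i$ for items[i][0] and items[i][1] (notation introduced here), each cell follows the recurrence below, with base cases subprobs[0][i] = subprobs[w][0] = 0. The answer is subprobs[W][n], i.e. subprobs[-1][-1], and filling the table takes $(W+1)(n+1)$ cell updates, about 1.01M for this instance.

$$
\text{subprobs}[w][i] =
\begin{cases}
\text{subprobs}[w][i-1] & \text{if } w_i > w,\\
\max\bigl(\text{subprobs}[w][i-1],\ \text{subprobs}[w-w_i][i-1] + v_i\bigr) & \text{otherwise.}
\end{cases}
$$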

In [12]:
subprobs[-1][-1]  ## hopefully the answer to first Q

2493893

## Now for the Larger Instance of the Problem

In [13]:
with open('Downloads/algo2knapsack_big.txt') as f:
    detes = f.readline().strip('\n').split()
    items = f.readlines()

In [14]:
W = int(detes[0])
n = int(detes[1])
print(W, n)

2000000 2000


In [15]:
items = [(int(item[0]), int(item[1])) for item in [i.strip('\n').split() for i in items]]

In [16]:
items[:3]   ## (value, weight)

[(16808, 241486), (50074, 834558), (8931, 738037)]

### With 2000 items, might as well save time by looking for shortcuts.
E.g., that 3rd item has a small value and fills 37% of the backpack.  Why not sort the items by value/weight ratio and just pay attention to the ones with good ratios?
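
The ratio ordering also gives cheap bounds on the optimum. This is a sketch of the classic greedy/LP-relaxation (Dantzig) bound, swapped in here as a check rather than taken from the cells below; `ratio_bounds` is a hypothetical helper. The lower bound is the longest best-ratio prefix that fits (a feasible packing, like the top-62 sum used further down); the upper bound adds the fractional share of the first item that doesn't fit.

```python
def ratio_bounds(items, W):
    """Return (lower, upper) bounds on the 0/1 knapsack optimum.

    items: list of (value, weight) pairs with positive weights.
    lower: value of the longest best-ratio prefix that fits (feasible).
    upper: lower plus the fractional share of the first item that doesn't
           fit (the LP-relaxation, or Dantzig, bound).
    """
    ranked = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    value = weight = 0
    for v, w in ranked:
        if weight + w <= W:
            value += v              # whole item fits: take it
            weight += w
        else:
            # Critical item: filling the leftover capacity with a fraction of it
            # is the best any fractional packing can do.
            return value, value + v * (W - weight) / w
    return value, value             # everything fits: bounds coincide

print(ratio_bounds([(60, 10), (100, 20), (120, 30)], 50))   # (160, 240.0)
```

In the toy example the true 0/1 optimum is 220, which sits between the two bounds. On the big instance, the gap between the bounds indicates how much a restricted run (like a top-100 DP) could possibly be missing.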

In [19]:
scores = sorted(items, key=lambda x: x[0]/x[1], reverse=True)

In [21]:
top20 = [score[0]/score[1] for score in scores[:20]]
bot20 = [score[0]/score[1] for score in scores[-20:]]
print(top20)
print(bot20)

[8.68350450639279, 7.698208240442573, 6.716861676550729, 6.3084979564032695, 5.561591734786558, 5.354336966394187, 5.1440735694822886, 5.087565023532326, 4.746957311534969, 4.594335727850714, 4.472665082033741, 4.430410856440669, 4.130336058128973, 3.3303930171338845, 3.2793146446219925, 3.2363760217983653, 3.1861943687556766, 2.9983898934852613, 2.9956276535073822, 2.9603495255097245]
[0.0019888224065114264, 0.0019298864597635677, 0.0018233455137522257, 0.0018218889237191184, 0.0016954588566823283, 0.0016186904794021637, 0.0012009284488848522, 0.001156810761715394, 0.0009614083908245889, 0.0008545800560735991, 0.0008045227561673594, 0.0006020044096507491, 0.0005845197825497508, 0.0005530106237366568, 0.0004616291484480975, 0.00027715732478862367, 0.00019684835424415412, 4.255118679819036e-05, 3.46698785526995e-05, 2.2128822200814636e-05]


In [32]:
print(sum([score[1] for score in scores[:62]]))  ## how much do the top 62 weigh?

1980699


In [29]:
scores[60:70]

[(97318, 75235),
 (89326, 69363),
 (97807, 78905),
 (16146, 13212),
 (66901, 55417),
 (98766, 82942),
 (49539, 42205),
 (38206, 32663),
 (84510, 72666),
 (83455, 74134)]

There's almost certainly no need for the algorithm to look much past scores[70] to find a couple of items to substitute for one of scores[:62] and fill the ~19,300 units of empty capacity (2,000,000 - 1,980,699 = 19,301).  For example, scores[64] + scores[67] sum to ~(105K, 88K) and could substitute for scores[61] with its ~(89K, 69K).  So run it on scores[:100] and check the empty capacity.
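
A quick check of that eyeballed substitution, with the (value, weight) pairs copied by hand from the scores[60:70] output and the top-62 weight printed above. The same listing also shows scores[62] fitting in the gap on its own.

```python
# (value, weight) pairs copied from the scores[60:70] output above
W = 2_000_000
top62_weight = 1_980_699
s61, s62, s64, s67 = (89326, 69363), (16146, 13212), (66901, 55417), (38206, 32663)

gap = W - top62_weight                     # unused capacity after the top 62
swap_weight = s64[1] + s67[1] - s61[1]     # extra weight from swapping 61 for 64 + 67
swap_gain = s64[0] + s67[0] - s61[0]       # extra value from the same swap
print(gap, swap_weight, swap_gain, swap_weight <= gap)   # 19301 18717 15781 True
print(s62[1] <= gap)                       # scores[62] alone also fits: True
```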

In [33]:
subprobs = [[0 for _ in range(101)]]  # all items have positive weights so nothing fits in the zero-weight capacity row
for w in range(1,W+1):    ## add a row for every unit of weight up to W
    vals = [0]  ## taking empty item 0 adds no value
    for i in range(1, 101):    ## Checking the first 100 sorted items
        if scores[i][1] > w:   ## item can't fit in backpack, so use solution to subproblem without it
            vals.append(vals[i-1])
        else:
            vals.append(max(vals[i-1], subprobs[w-scores[i][1]][i-1] + scores[i][0]))
    subprobs.append(vals)

In [38]:
subprobs[-1][-1]  ## see what the largest subsolution is.  Can't be right, based on the lower bound, below.

4171056

In [37]:
print(sum([score[0] for score in scores[:62]]), sum([score[1] for score in scores[:62]])) 
## Has to be at least ~15.8K higher value than this (105,107 - 89,326 = 15,781), based on the eyeballed substitution mentioned above

4225353 1980699


Forgot to add the (0, 0) empty item at the start of scores. Since the loop starts at index 1, the algo was skipping scores[0], the best-ratio item available.

In [40]:
scores = [(0,0)] + scores

In [41]:
subprobs = [[0 for _ in range(101)]]  # all items have positive weights so nothing fits in the zero-weight capacity row
for w in range(1,W+1):    ## add a row for every unit of weight up to W
    vals = [0]  ## taking empty item 0 adds no value
    for i in range(1, 101):    ## Checking the first 100 sorted items
        if scores[i][1] > w:   ## item can't fit in backpack, so use solution to subproblem without it
            vals.append(vals[i-1])
        else:
            vals.append(max(vals[i-1], subprobs[w-scores[i][1]][i-1] + scores[i][0]))
    subprobs.append(vals)

In [42]:
subprobs[-1][-1]  ## this is the correct answer

4243395

In [43]:
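solution = subprobs[-1][-1]  ## 4243395, the optimum at full capacity
weight = W+1
while True:    ## scan capacities downward from W
    weight -= 1
    if subprobs[weight][-1] != solution:   ## largest capacity that can't reach the optimum
        print(weight)
        break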

1999782
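
The downward scan works because subprobs[w][-1] never decreases as w grows (extra room can't make the best packing worse), so the smallest capacity reaching the optimum can also be found by binary search in O(log W) table lookups. A sketch; `min_capacity_for` is a hypothetical helper and the toy column is made up.

```python
def min_capacity_for(best_value_at, W, target):
    """Smallest capacity c in 0..W with best_value_at(c) >= target.

    Assumes best_value_at never decreases as c grows (true for
    c -> subprobs[c][-1]) and that best_value_at(W) >= target.
    """
    lo, hi = 0, W
    while lo < hi:
        mid = (lo + hi) // 2
        if best_value_at(mid) >= target:
            hi = mid          # mid already reaches the target: answer <= mid
        else:
            lo = mid + 1      # mid falls short: answer > mid
    return lo

column = [0, 0, 3, 3, 5, 5, 5]    # toy best value for capacities 0..6
print(min_capacity_for(lambda c: column[c], 6, 5))   # 4
# Hypothetical notebook usage:
# min_capacity_for(lambda c: subprobs[c][-1], W, subprobs[-1][-1])
```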


In [44]:
subprobs[1999782][-1]

4241988

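So the optimum first becomes reachable at a capacity of 1,999,783 units: the loop prints the largest capacity that falls short (1,999,782, where the best value is only 4,241,988), so one more unit is needed. That makes 1,999,783 the total weight of the lightest optimal packing, leaving 217 of the 2,000,000 units unused.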
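
The table can also say which items are packed, not just the value. Below is a sketch of the standard traceback, assuming the subprobs[w][i] layout and the (0, 0) placeholder at index 0 used above; `build_table` and `packed_items` are hypothetical names and the toy data is made up. If the value changes between columns i-1 and i at the current capacity, item i must have been taken.

```python
def build_table(items, W):
    """Same recurrence as the notebook: table[w][i] = best value using
    items[1..i] with capacity w; items[0] must be the (0, 0) placeholder."""
    n = len(items) - 1
    table = [[0] * (n + 1)]                    # capacity 0: nothing fits
    for w in range(1, W + 1):
        row = [0]                              # item 0 adds no value
        for i in range(1, n + 1):
            v, wt = items[i]
            if wt > w:
                row.append(row[i - 1])
            else:
                row.append(max(row[i - 1], table[w - wt][i - 1] + v))
        table.append(row)
    return table

def packed_items(table, items):
    """Trace back from the last cell; returns indices of the packed items."""
    w = len(table) - 1                         # start at full capacity W
    chosen = []
    for i in range(len(table[0]) - 1, 0, -1):  # only the columns the table covers
        if table[w][i] != table[w][i - 1]:     # value changed, so item i was taken
            chosen.append(i)
            w -= items[i][1]
    return chosen[::-1]

toy = [(0, 0), (60, 10), (100, 20), (120, 30)]   # made-up example data
table = build_table(toy, 50)
print(table[-1][-1], packed_items(table, toy))   # 220 [2, 3]
```

In the notebook, `packed_items(subprobs, scores)` would return indices into the padded scores list (only 1..100, since the table has 101 columns), and summing their weights gives a direct check on the total-weight reasoning above.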

### That's all correct, but it takes 4 minutes to run, thanks to the 2M rows, one for each unit of weight.  And the hand-picked 100-item cutoff involves too much art.  Work on finding a more solid, faster solution.
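
One common direction is to keep only a single row of the table and vectorize the inner loop with NumPy. A sketch, assuming NumPy is available; `knapsack_value` is a hypothetical helper. It stores one int64 row (about 16 MB for W = 2,000,000) instead of 2M rows, so it can be run over all 2,000 items rather than a top-100 cutoff. It returns only the optimal value, since discarding earlier rows rules out a traceback, and its speed on the big instance is untimed here.

```python
import numpy as np

def knapsack_value(items, W):
    """Optimal 0/1 knapsack value using a single row of length W + 1.

    items: iterable of (value, weight) pairs with positive integer weights;
    no (0, 0) placeholder is needed.
    """
    best = np.zeros(W + 1, dtype=np.int64)   # best[c] = best value with capacity c
    for v, wt in items:
        if wt > W:
            continue                         # item can never fit
        # The right-hand side is computed from the old row before the slice is
        # overwritten, so each item is used at most once (0/1, not unbounded).
        best[wt:] = np.maximum(best[wt:], best[:-wt] + v)
    return int(best[W])

print(knapsack_value([(60, 10), (100, 20), (120, 30)], 50))   # 220
```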