# XXXIII. APPROXIMATION ALGORITHMS FOR NP-COMPLETE PROBLEMS 

### A Greedy Knapsack Heuristic

So let's talk through a potentially greedy approach to the knapsack problem. Probably the first idea you'd have would be to consider the items in some order. And when you consider an item, you make an irrevocable decision at that point whether to include it in your solution or not.

So the question is, in what order should you look at the items? Well, what's our objective? Our objective is to maximize the value of our set. So obviously, high value items are important. So maybe the first idea would be to sort the items from highest value to lowest value.

But if you think about that proposal for some time, you quickly realize that this is a little naive. This is not the whole story. A high value item that fills up the whole knapsack seems less useful than an almost as high value item whose size is close to zero and which uses up almost none of the knapsack.

Remember that each item has two properties, not just its value but also its size, and both of these are important. We want to prefer items that have a lot of value, but we also want to prefer items that are small.

So, if we want to take the two parameters of each item and form a single parameter by which we can then sort the items, a natural first cut to look at is a ratio.

Since we prefer high values and we prefer small sizes, the sensible thing to look at is the ratio of the value of an item divided by its size. We're then going to consider items from the highest of these ratios to the lowest.
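As a small illustration of this ordering, the sketch below sorts item indices by value-to-size ratio. The function name `order_by_ratio` and the sample lists are made up for the example, and breaking ties by the lower index is an arbitrary assumption.

```python
def order_by_ratio(values, sizes):
    """Return item indices sorted from highest value/size ratio to lowest."""
    # Ties are broken by the lower index, an arbitrary choice for this sketch.
    return sorted(range(len(values)), key=lambda i: (-values[i] / sizes[i], i))


values = [3, 4, 2]
sizes = [3, 3, 1]
print(order_by_ratio(values, sizes))  # ratios 1.0, 1.33..., 2.0 -> [2, 1, 0]
```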

So now that we have our greedy ordering, we just proceed through the items one at a time, and we pack the items into the knapsack in this ordering. Now, what happens here, and what didn't trouble us in the scheduling problem, is that at some point we might no longer be able to pack items into the knapsack; the thing might get full. So once that happens, once we've reached an item which doesn't fit in the knapsack given the items that we've already packed into it, we just stop the greedy algorithm.

![Greedy Knapsack Algorithm](images/18_greedy_Knapsack_Algorithm.png)


E.g., let's consider the following two-item instance.<br>
$v_1$ = 2, $w_1$ = 1<br>
$v_2$ = 1000, $w_2$ = 1000<br>
$W$ = 1000<br>

Here the greedy solution has value 2<br>
and the optimal solution has value 1000.

Since the knapsack capacity is 1000, and the sum of the sizes of the items is 1001, there is no room for both of them. The greedy algorithm, unfortunately, because the first tiny item has a larger ratio (2 versus 1), will pack in item number one. That leaves no room for item number two, so the value of the greedy algorithm's solution is just 2, whereas the optimal solution is of course to just take the second item. Yes, its ratio is worse, but on the other hand, it fills up the whole knapsack. And so overall, your value is 1000, which obviously blows away the greedy solution.

There is, however, a simple fix to address this issue. We're going to add a very simple Step 3 to our greedy heuristic. The new Step 3 is just going to compare two different candidate solutions and return whichever one is better, that is, whichever one has the bigger total value. The first candidate is just what we were doing before; it's the output of Step 2 of the algorithm. The second candidate is whichever item by itself has the maximum value.

In other words, this new greedy algorithm, it just runs the old one, but then it does a final sanity check. It looks at each item individually, and it says, well, if this item, just by itself, dominates the solution I've computed thus far, I return this single item by itself instead.
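To see the effect of the final check on a tiny instance, the sketch below compares the three-step heuristic with an exhaustive search over all subsets. The names `greedy_three_step` and `brute_force_optimum` are hypothetical helpers written for this illustration, and skipping single items that do not fit on their own is an assumption of the sketch.

```python
from itertools import combinations


def greedy_three_step(values, sizes, capacity):
    """Value returned by the ratio-greedy heuristic with the single-item check."""
    order = sorted(range(len(values)), key=lambda i: values[i] / sizes[i], reverse=True)
    used = total = 0
    for i in order:
        if used + sizes[i] > capacity:
            break  # stop at the first item that does not fit
        used += sizes[i]
        total += values[i]
    # Step 3: best single item that fits on its own (0 if none fits).
    best_single = max((v for v, s in zip(values, sizes) if s <= capacity), default=0)
    return max(total, best_single)


def brute_force_optimum(values, sizes, capacity):
    """Try every subset; only sensible for a handful of items."""
    n = len(values)
    best = 0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sum(sizes[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best


print(greedy_three_step([2, 1000], [1, 1000], 1000))    # 1000
print(brute_force_optimum([2, 1000], [1, 1000], 1000))  # 1000
```

On the two-item instance from the text, Step 2 alone would stop at value 2; the single-item check is what lifts the answer to 1000.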

![Greedy Knapsack Algorithm](images/19_greedy_Knapsack_Algorithm.png)




In [13]:
def knapsack_problem_greedy_heuristic(value, weights, W): 
    # Step 1: sort the items by value/weight ratio, highest ratio first
    A = [[value[i]/weights[i], i] for i in range(len(value))]
    A = sorted(A, reverse=True)
    knapsack_weight = 0
    knapsack_value = 0
    knapsack_items = []
    # Step 2: pack items in this order and stop at the first one that does not fit
    for ratio, i in A:
        if knapsack_weight + weights[i] <= W:
            knapsack_weight += weights[i]
            knapsack_value += value[i]
            knapsack_items.append(value[i])
        else:
            break
    # Step 3: compare with the most valuable single item that fits by itself
    knapsack_fix = max((v for v, w in zip(value, weights) if w <= W), default=-1)
    if knapsack_fix > knapsack_value:
        knapsack_value = knapsack_fix
        knapsack_items = [knapsack_fix]
    print("A : {}\n\nValue of greedy solution: {}\nKnapsack items: {}".format(A, knapsack_value, knapsack_items))
 

#knapsack_problem_greedy_heuristic([1, 2, 5, 6], [2, 3, 4, 5], 8)
knapsack_problem_greedy_heuristic([2, 4, 3], [1, 3, 3], 5)
print()
knapsack_problem_greedy_heuristic([2, 1000], [1, 1000], 1000)




A : [[2.0, 0], [1.3333333333333333, 1], [1.0, 2]]

Value of greedy solution: 6
Knapsack items: [2, 4]

A : [[2.0, 0], [1.0, 1]]

Value of greedy solution: 1000
Knapsack items: [1000]


### A Dynamic Programming Heuristic for Knapsack

![Dynamic Programming Heuristic Knapsack](images/20_DynamicProgrammingHeuristic_Knapsack.png)

So here is the algorithm in full detail. It really only has two steps. First, we do the transformation: we round the item values to small integers. Then we invoke the second dynamic programming algorithm on the transformed instance. Precisely, here is how we round each item value $v_i$. To begin, we decrease $v_i$ to the nearest multiple of a parameter $m$. We then divide the result by $m$, so that the rounded value is $\hat{v}_i = \lfloor v_i / m \rfloor$.
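A minimal sketch of the rounding step on made-up numbers. It assumes $m = \epsilon \cdot v_{max} / n$, which is the choice used in this section's code cell, and that the rounded-down value is then divided by $m$; the function name `round_values` is hypothetical.

```python
import math


def round_values(values, epsilon):
    """Round each value down to a multiple of m, then divide by m."""
    n = len(values)
    m = epsilon * max(values) / n  # assumed choice of m, as in this section's code cell
    return m, [math.floor(v / m) for v in values]


m, v_hat = round_values([10, 20, 35, 60], epsilon=0.5)
print(m)      # 7.5
print(v_hat)  # [1, 2, 4, 8]
```

A larger `epsilon` gives a larger `m` and therefore smaller rounded values, which shrinks the table at the cost of accuracy.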

![Dynamic Programming Heuristic Knapsack](images/21_DynamicProgrammingHeuristic_Knapsack.png)

Let's now wrap things up with the pseudocode of the new dynamic programming algorithm. A will be our usual table. It has two dimensions, because subproblems are indexed by two parameters: $i$, the prefix, which ranges from 0 to $n$, and $x$, the value target, which ranges from 0 to the maximum imaginable value, let's say $n \cdot v_{max}$. The base case is when $i$ equals 0, that is, you're not allowed to use any items. In this case, we have to fill the entry in with plus infinity, except if $x$ itself is 0, in which case the answer is 0. Now we just populate the table using the recurrence in a double for loop. The structure here is exactly the same as in our first knapsack dynamic programming algorithm.

In the first dynamic programming solution that we developed for knapsack, we could return the correct answer in constant time given the filled-in table. That's because one of the subproblems in that algorithm literally was the original problem. When $i = n$ and $x$ is equal to the full knapsack capacity $W$, the responsibility of that subproblem was to compute the max value solution subject to the capacity $W$ using all of the items; that's literally the original problem. In this new dynamic programming algorithm, by contrast, none of the subproblems corresponds directly to the original problem that we wanted to solve; none of them tells you the maximum value of a feasible solution.

To see why, let's inspect the largest batch of subproblems, when $i$ is equal to $n$ and you can use whatever subset of items you want. The second index of one of these subproblems is a target value $x$. That might be a number like, say, 17,231. So after you've run this algorithm and filled in the whole table, what do you know in this subproblem? You will have computed the smallest total size of a bunch of items that has total value at least 17,231. So that's going to be some number, maybe it's 11,298. But what if your knapsack capacity is only 10,000? Then this is not a feasible solution, so it doesn't do you any good. Okay, that's not quite true; it does do you some good. If you know that every single solution that gets value 17,231 or more has size bigger than your knapsack capacity, then you know that the optimal value has to be less than 17,231. There is no feasible way to get a total value that high.
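For reference, the base case and recurrence described in words above can be written out as follows. This is a reconstruction from the prose, not a copy of the slide: $w_i$ denotes the size of item $i$, $v_i$ its (rounded) value, and the $\max(0, \cdot)$ encodes the assumption that a target already met by item $i$ alone leaves a remaining target of 0.

$$
A[0, x] = \begin{cases} 0 & \text{if } x = 0 \\ +\infty & \text{if } x > 0 \end{cases}
\qquad
A[i, x] = \min\bigl\{\, A[i-1, x],\; w_i + A[i-1, \max(0, x - v_i)] \,\bigr\}
$$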

Now, you realize that if you knew this information for every single target value $x$, and you do once you've filled in the entire table, that's sufficient to figure out the optimal solution, that is, the biggest value of a feasible solution. All you need to do is scan through this final batch of subproblems. You begin with the largest conceivable target value $x$, and then you get less and less ambitious as you keep finding infeasible solutions. So you scan from high target values to low target values, and you're looking for the first, that is, the largest target value $x$ such that there exists a subset of items meeting that target value whose total size is at most your knapsack capacity. That first target value $x$ that can be physically met given your knapsack capacity is the value of the optimal solution.
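The final scan can be sketched in a few lines. Here `last_row` stands for the batch of subproblems with $i = n$; the function name and the sample row are invented for the illustration.

```python
import math


def largest_feasible_target(last_row, capacity):
    """last_row[x] = smallest total size achieving value at least x."""
    for x in range(len(last_row) - 1, -1, -1):  # most ambitious target first
        if last_row[x] <= capacity:
            return x
    return 0  # not reached when last_row[0] == 0 and capacity >= 0


row = [0, 3, 3, 4, 7, 8, math.inf]
print(largest_feasible_target(row, 5))  # 3
```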

![Dynamic Programming Heuristic Knapsack](images/22_DynamicProgrammingHeuristic_Knapsack_RT.png)




In [38]:
import numpy as np

def knapsack_problem_tabular(v_hat, weights, A, W):
    # A[i][x]: smallest total weight of a subset of the first i items whose
    # total rounded value is at least x (inf if no such subset exists)
    r, c = A.shape
    for i in range(1, r):
        for x in range(c):
            aexcluding = A[i-1][x]
            # if item i alone meets the target, the remaining target is 0
            aincluding = A[i-1][max(0, x - v_hat[i-1])] + weights[i-1]
            A[i][x] = min(aexcluding, aincluding)
    # scan the last row from the largest target down to the first feasible one
    for x in range(c-1, -1, -1):
        if A[-1][x] <= W:
            return x
    return 0
    
 
    
def knapsack_problem_dynamic_heuristic(value, weights, W, epsilon): 
    # value: profit  ,  W: capacity  , epsilon : error
    n = len(value)
    v_max = max(value)
    m = epsilon * v_max / n
    if m != 0 :
        v_hat = [ int(vi//m) for vi in value]
    else:
        v_hat = value
    A = np.full((n+1, (n*max(v_hat))+1), float('inf'))  # min weight per value target
    A[0][0] = 0
    x = knapsack_problem_tabular(v_hat, weights, A, W)
    # trace back through the table to recover the chosen items
    items = []
    for i in range(n, 0, -1):
        if A[i][x] != A[i-1][x]:
            items.append(i-1)
            x = max(0, x - v_hat[i-1])
    print("Value of heuristic solution: {}".format(sum(value[i] for i in items)))
    
 


knapsack_problem_dynamic_heuristic([1, 2, 5, 6], [2, 3, 4, 5], 8, 1)
knapsack_problem_dynamic_heuristic([2, 4, 3], [1, 3, 3], 5, 1)
# epsilon far above 1 rounds every value down to 0, so nothing is packed
knapsack_problem_dynamic_heuristic([2, 4, 3], [1, 3, 3], 5, 30)
knapsack_problem_dynamic_heuristic([2, 1000], [1, 1000], 1000, 1)
    

Value of heuristic solution: 8
Value of heuristic solution: 6
Value of heuristic solution: 0
Value of heuristic solution: 1000


### Challenge Problem

In this problem we will revisit an old friend, the traveling salesman problem (TSP). You will implement a heuristic for the TSP, rather than an exact algorithm, and as a result will be able to handle much larger problem sizes. Here is a data file describing a TSP instance (original source: http://www.math.uwaterloo.ca/tsp/world/bm33708.tsp): nn.txt 

The first line indicates the number of cities. Each city is a point in the plane, and each subsequent line indicates the x- and y-coordinates of a single city.

The distance between two cities is defined as the Euclidean distance --- that is, two cities at locations  $(x,y)$ and $(z,w)$ have distance $\sqrt{(x-z)^2 + (y-w)^2}$  between them.<br><br>

**You should implement the nearest neighbor heuristic:**

1. Start the tour at the first city.
2. Repeatedly visit the closest city that the tour hasn't visited yet. In case of a tie, go to the closest city with the lowest index. For example, if both the third and fifth cities have the same distance from the first city (and are closer than any other city), then the tour should begin by going from the first city to the third city.
3. Once every city has been visited exactly once, return to the first city to complete the tour.

Find the cost of the traveling salesman tour computed by the nearest neighbor heuristic for this instance, rounded down to the nearest integer.

[Hint: when constructing the tour, you might find it simpler to work with squared Euclidean distances (i.e., the formula above but without the square root) than Euclidean distances. But don't forget to report the length of the tour in terms of standard Euclidean distance.]
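The three steps and the hint can be sketched on a tiny in-memory instance as follows. This is an illustration under assumptions, not a solution to the challenge file: the function name `nearest_neighbor_tour` is hypothetical, cities are 0-indexed, and the four sample points are made up.

```python
import math


def nearest_neighbor_tour(points):
    """Return (tour, length) for the nearest neighbor heuristic on (x, y) points."""
    n = len(points)
    visited = [False] * n
    current = 0          # start at the first city (index 0 here)
    visited[0] = True
    tour = [0]
    length = 0.0
    for _ in range(n - 1):
        best, best_d2 = -1, math.inf
        for j in range(n):
            if visited[j]:
                continue
            # squared distance is enough to decide which city is closest
            d2 = (points[current][0] - points[j][0]) ** 2 + (points[current][1] - points[j][1]) ** 2
            if d2 < best_d2:  # strict '<' keeps the lowest index on ties
                best, best_d2 = j, d2
        length += math.sqrt(best_d2)  # tour length uses true Euclidean distance
        visited[best] = True
        tour.append(best)
        current = best
    # close the tour by returning to the first city
    length += math.hypot(points[current][0] - points[0][0], points[current][1] - points[0][1])
    return tour, length


tour, length = nearest_neighbor_tour([(0, 0), (1, 0), (1, 1), (0, 1)])
print(tour, math.floor(length))  # [0, 1, 2, 3] 4
```

From the first corner of the unit square, two cities are tied at distance 1, and the lower index wins. Each step scans all unvisited cities, so the whole tour takes time quadratic in the number of cities without storing any pairwise distances.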

In [None]:
import urllib3


def TSP_nearest_neighbor_heuristic(url):
    # pre-processing input dataset: first line is n, then one "index x y" line per city
    # (cities are assumed to be listed in index order)
    http = urllib3.PoolManager()
    r1 = http.request('GET', url)
    lines = [line for line in r1.data.decode('utf8').splitlines() if line.strip()]
    n = int(lines[0])
    coords_cities = []
    for line in lines[1:n+1]:
        node_coord = line.split()
        coords_cities.append((float(node_coord[1]), float(node_coord[2])))
    # TSP: distances are computed on the fly, because storing a complete graph
    # would need n*(n-1)/2 edges
    visited = [False] * n
    x = 0
    visited[x] = True
    tsp_tour = 0.0
    for _ in range(n - 1):
        cx, cy = coords_cities[x]
        minimum_neighbour = float('inf')
        minimum_neighbour_x = -1
        for nei in range(n):
            if not visited[nei]:
                # squared distance is enough to pick the nearest city
                wt = (cx - coords_cities[nei][0])**2 + (cy - coords_cities[nei][1])**2
                # strict '<' with increasing nei keeps the lowest index on ties
                if wt < minimum_neighbour:
                    minimum_neighbour = wt
                    minimum_neighbour_x = nei
        tsp_tour += minimum_neighbour**0.5
        x = minimum_neighbour_x
        visited[x] = True
    # return to the first city to complete the tour
    tsp_tour += ((coords_cities[x][0]-coords_cities[0][0])**2 + (coords_cities[x][1]-coords_cities[0][1])**2)**0.5
    print("Cost of the traveling salesman tour computed by the nearest neighbor heuristic: {}".format(int(tsp_tour)))


TSP_nearest_neighbor_heuristic("https://d3c33hcgiwev3.cloudfront.net/_ae5a820392a02042f87e3b437876cf19_nn.txt?Expires=1564531200&Signature=IUPe~3Fk8ERGhOqNqtT7Xo6jUL66lzchM2PAgqwxuTvdNYpRmSZRKpCNAydcAP579dU8WNSkwWISgMllNTGWorZdSf9adNgZFWNfY-wpfnTmw4k8mE0PMPydHFlAu6r8ZM6Pv5i3hj5TnfftrawqpeKRS7dNEL~MZV8Es1c0GiU_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A")



