# Python Algorithm 
## Chapter 7 Greed Is Good? Prove It! 
### Staying Safe, Step by Step
+ A set of candidate elements, or pieces, with some value attached
+ A way of checking whether a partial solution is valid, or feasible
### The Knapsack Problem
The knapsack problem: we have a set of items that we want to take with us, each with a certain weight and value. However, our knapsack has a maximum capacity (an upper bound on the total weight), and we want to maximize the total value we get.
#### Fractional Knapsack
This is the simplest of the knapsack problems. Here we’re not required to include or exclude entire objects; we might be stuffing our backpack with tofu, whiskey, and gold dust, for example. We needn’t allow arbitrary fractions, though. We could, for example, use a resolution of grams or ounces.  
The important thing here is to find the value-to-weight ratio. For example, most people would agree that gold dust has the most value per gram and tofu the least; let’s say the whiskey falls between the two. In that case, to get the most out of our backpack, we’d stuff it full of gold dust, or at least with the gold dust we have. If we run out, we start adding the whiskey. If there’s still room left over when we’re out of whiskey, we top it all off with tofu.
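
The ratio-based strategy can be sketched in a few lines. This is only an illustration: the function name `fractional_knapsack` and the weights and values below are made up, not taken from the book.

```python
def fractional_knapsack(items, capacity):
    """items: dict name -> (weight, value). Returns (total value, amounts taken)."""
    # Consider the items by decreasing value-to-weight ratio.
    by_ratio = sorted(items, key=lambda n: items[n][1] / items[n][0], reverse=True)
    taken, total = {}, 0.0
    for name in by_ratio:
        if capacity <= 0:
            break
        weight, value = items[name]
        amount = min(weight, capacity)        # take as much of this item as fits
        taken[name] = amount
        total += value * amount / weight      # value is proportional to the amount
        capacity -= amount
    return total, taken

# Hypothetical weights (in grams) and values
items = {"tofu": (500, 5), "whiskey": (700, 70), "gold dust": (100, 4000)}
total, taken = fractional_knapsack(items, 1000)
print(taken)   # {'gold dust': 100, 'whiskey': 700, 'tofu': 200}
print(total)   # 4072.0
```

All the gold dust and whiskey go in, and the tofu only fills the remaining 200 grams.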
#### Integer Knapsack
For now, let’s say we’re still dealing with categories of objects, so we can add an integer amount from each category. Each category then has a fixed weight and value that holds for all objects. For example, all gold bars weigh the same and have the same value; the same holds for bottles of whiskey and packages of tofu.   
The bounded case assumes we have a fixed number of objects in each category, and the unbounded case lets us use as many as we want. Sadly, greed won’t work in either case. In fact, these are both unsolved problems, in the sense that no polynomial algorithms are known to solve them in general.
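
A small counterexample makes the failure of greed concrete. The sketch below uses two made-up item categories and compares a ratio-based greedy choice with an exhaustive table over all capacities. The table is only there to confirm the optimum; its running time grows with the capacity itself, so it is not a polynomial algorithm in the sense above.

```python
def greedy_unbounded(items, capacity):
    """Greedy by value-to-weight ratio; items are (weight, value) pairs."""
    total = 0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        count = capacity // weight            # as many whole objects as fit
        total += count * value
        capacity -= count * weight
    return total

def best_unbounded(items, capacity):
    """Best value for every capacity 0..capacity, built bottom-up."""
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for weight, value in items:
            if weight <= c:
                best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

items = [(6, 9), (5, 5)]                      # hypothetical (weight, value) pairs
print(greedy_unbounded(items, 10))            # 9
print(best_unbounded(items, 10))              # 10
```

Greed grabs the object with the better ratio and then has 4 units of capacity it cannot use, while two of the "worse" objects fill the knapsack exactly.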
### Huffman’s Algorithm
Let’s say you’re working at an emergency call center where people call for help. You’re trying to put together some simple yes/no questions that can be posed to help the callers diagnose an acute medical problem and decide on the appropriate course of action. You have a list of the conditions that should be covered, along with a set of diagnostic criteria, severity, and frequency of occurrence.  
You want a weighted balancing of the question tree: the expected number of questions should be as low as possible. In other words, you want to minimize the expected depth of your traversal from root to leaf.  
The original (and most common) application is compression (representing a text more compactly) through variable-length codes. Each character in your text has a frequency of occurrence, and you want to exploit this information to give the characters encodings of different lengths so as to minimize the expected length of any text. Equivalently, for any character, you want to minimize the expected length of its encoding.  
Now, instead of minimizing the number of yes/no questions needed to identify some medical affliction, we want to minimize the number of bits needed to identify a character. Both the yes/no answers and the bits uniquely identify paths to leaves in a binary tree. For example, consider the characters a through i. One way of encoding them is given by Figure 7-2. There, the code for g would be 101. Because all characters are in the leaves, there would be no ambiguity when decoding a text that had been compressed with this scheme. This property, that no valid code is a prefix of another, gives rise to the term prefix code.  
![](../images/python%20algorithm/7.jpg)
#### The Algorithm
The most obvious greedy strategy would, perhaps, be to add the characters (leaves) one by one, starting with the one with the greatest frequency. But where would we add them? Another way to go is to let a partial solution consist of several tree fragments and then repeatedly combine the two lightest ones. When we combine two trees, we add a new, shared root and give it a weight equal to the sum of the weights of its children, that is, the previous roots.

In [3]:
from heapq import heapify, heappush, heappop
from itertools import count
def huffman(seq,freq):
    num = count()
    trees = list(zip(freq,num,seq))
    heapify(trees)
    while len(trees) > 1:
        fa,_,a = heappop(trees) # pop the two lightest trees;
        fb,_,b = heappop(trees) # each entry is (frequency, tiebreaker, tree)
        n =  next(num)
        heappush(trees,(fa+fb,n,[a,b])) #combine
    return trees[0][-1] # return the huffman tree
def code(tree,prefix=''): #get the huffman code
    if len(tree) == 1:
        yield (tree,prefix)
        return
    for bit, child in zip('01',tree):
        for pair in code(child,prefix+bit):
            yield pair                                  
seq = "abcdefghi"
frq = [4, 5, 6, 9, 11, 12, 15, 16, 20]
tree = huffman(seq,frq)
print(tree)
list(code(tree))

[['i', [['a', 'b'], 'e']], [['f', 'g'], [['c', 'd'], 'h']]]


[('i', '00'),
 ('a', '0100'),
 ('b', '0101'),
 ('e', '011'),
 ('f', '100'),
 ('g', '101'),
 ('c', '1100'),
 ('d', '1101'),
 ('h', '111')]
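
To see the prefix property at work, here is a small sketch that encodes and decodes a string with the code table printed above. The helper names `encode` and `decode` are hypothetical; the table is copied from the output so that the block runs on its own.

```python
codes = {'i': '00', 'a': '0100', 'b': '0101', 'e': '011', 'f': '100',
         'g': '101', 'c': '1100', 'd': '1101', 'h': '111'}

def encode(text, codes):
    return ''.join(codes[ch] for ch in text)

def decode(bits, codes):
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ''
    for bit in bits:
        current += bit
        if current in lookup:     # no code is a prefix of another, so the
            out.append(lookup[current])   # first match is the only possible one
            current = ''
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return ''.join(out)

bits = encode("bighead", codes)
print(len(bits))              # 23
print(decode(bits, codes))    # bighead
```

No separators are needed between the codes: reading bit by bit, the decoder always knows where one symbol ends and the next begins.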


#### The First Greedy Choice
The greedy choice property means that the greedy choice gives us a new partial solution that is part of an optimal one. The optimal substructure means that the rest of the problem, after we’ve made our choice, can also be solved just like the original: if we can find an optimal solution to the subproblem, we can combine it with our greedy choice to get a solution to the entire problem. In other words, an optimal solution is built from optimal subsolutions.   
To show the greedy choice property for Huffman’s algorithm, we can use an exchange argument. This is a general technique used to show that our solution is at least as good as an optimal one, or in this case, that there exists a solution with our greedy choice that is at least this good. The “at least as good” part is proven by taking a hypothetical optimal solution and then gradually changing it into our solution without making it worse.
#### Going the Rest of the Way
#### Optimal Merging
### Minimum Spanning Trees
We’re basically looking for the cheapest way of connecting all the nodes of a weighted graph, given that we can use only a subset of its edges to do the job. The cost of a solution is simply the weight sum for the edges we use.
#### The Shortest Edge
A cut is simply a partitioning of the graph nodes into two sets, and in this context we’re interested in the edges that pass between these two node sets. We say that these edges cross the cut. For example, imagine drawing a vertical line in Figure 7-3, right between d and g; this would give a cut that is crossed by five edges. By now I’m sure you’re catching on: We can be certain that it will be safe to include the shortest edge across the cut, in this case (d,j).
![](../images/python%20algorithm/8.jpg)
#### Kruskal’s Algorithm
This algorithm is close to the general greedy approach outlined at the beginning of this chapter: Sort the edges and start picking. Because we’re looking for short edges, we sort them by increasing length (or weight). The only wrinkle is how to detect edges that would lead to an invalid solution. The only way to invalidate our solution would be to add a cycle, but how can we check for that? A straightforward solution would be to use traversal; every time we consider an edge (u,v), we traverse our tree from u to see whether there is a path to v. If there is, we discard the edge.
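
The traversal-based check could look roughly like the sketch below. The names `connected` and `traversal_kruskal` and the three-node graph are made up for illustration. Each check may visit every node chosen so far, which is why a cheaper way of tracking components is attractive.

```python
def connected(T, u, v):
    """Depth-first search in the partial forest T (dict: node -> set of neighbors)."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in T[x] - seen:
            seen.add(y)
            stack.append(y)
    return False

def traversal_kruskal(G):
    """G: dict of dicts with each undirected edge stored once, G[u][v] = weight."""
    E = sorted((G[u][v], u, v) for u in G for v in G[u])
    T = {u: set() for u in G}        # adjacency sets of the edges chosen so far
    chosen = set()
    for w, u, v in E:
        if not connected(T, u, v):   # no path yet, so (u, v) cannot close a cycle
            T[u].add(v)
            T[v].add(u)
            chosen.add((u, v))
    return chosen

# Hypothetical triangle: the heaviest edge (a, c) would close a cycle
G_cycle = {'a': {'b': 1, 'c': 3}, 'b': {'c': 2}, 'c': {}}
result = traversal_kruskal(G_cycle)
print(sorted(result))                # [('a', 'b'), ('b', 'c')]
```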

In [5]:
G= {# an undirected weighted graph, but each edge is only represented once
    'a':{'b':10,'c':5},
    'b':{'d':3},
    'c':{'e':6},
    'd':{},
    'e':{}
}

In [7]:
def naivefind(C,u):
    while C[u] != u:
        u = C[u] # continue rep chain, until the initial one
    return u 
def naiveUnion(C,u,v):
    u = naivefind(C,u)
    v = naivefind(C,v)
    C[u] = v
def naiveKruskal(G):
    E = [(G[u][v],u,v) for u in G for v in G[u]]
    T = set() # solution set
    C = {u:u for u in G} #representative, initially itself for any node
    for w,u,v in sorted(E): #sorted by weight
        if naivefind(C,u) != naivefind(C,v):
            T.add((u,v)) # from different tree, safe to add
            naiveUnion(C,u,v)   #combine the two
    return T 
naiveKruskal(G)

{('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'e')}

In [8]:
def find(C,u):
    if C[u] != u:
        C[u] = find(C,C[u]) # path compression, done recursively
    return C[u]
def union(C,R,u,v):
    u,v = find(C,u),find(C,v)
    if R[u] > R[v]:
        C[v] = u
    else:
        C[u] = v
    if R[u] == R[v]:
        R[v] += 1
def Kruskal(G):
    E = [(G[u][v],u,v) for u in G for v in G[u]]
    T = set() # solution set
    C,R = {u:u for u in G},{u:0 for u in G} #representative, initially itself for any node
    for w,u,v in sorted(E): #sorted by weight
        if find(C,u) != find(C,v):
            T.add((u,v)) # from different tree, safe to add
            union(C,R,u,v)   #combine the two
    return T 
Kruskal(G)

{('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'e')}

#### Prim’s Algorithm
The main idea in Prim’s algorithm is to traverse the graph from a starting node, always adding the shortest edge connected to the tree. This is safe because the edge will be the shortest one crossing the cut around our partial solution, as explained earlier.

In [None]:
from heapq import heappop, heappush
def prim(G,s):
    P,Q = {},[(0,None,s)]
    while Q:
        _,p,u = heappop(Q)
        if u in P:
            continue
        P[u] = p 
        for v,w in G[u].items():
            heappush(Q,(w,u,v))
    return P
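
To try `prim` out, note that it expects every edge to be listed under both endpoints, whereas the graph `G` used for Kruskal above stores each edge only once. The sketch below assumes a hypothetical helper `symmetric` and adds an extra edge b-c (weight 2) that is not in the original graph, so that there is a cycle to avoid; `prim` is restated so the block runs on its own.

```python
from heapq import heappop, heappush

def symmetric(G):
    """Return a copy of G in which every edge u-v is listed under both u and v."""
    H = {u: dict(nbrs) for u, nbrs in G.items()}
    for u, nbrs in G.items():
        for v, w in nbrs.items():
            H.setdefault(v, {})[u] = w
    return H

def prim(G, s):
    P, Q = {}, [(0, None, s)]            # predecessor map and heap of (weight, from, to)
    while Q:
        _, p, u = heappop(Q)
        if u in P:                       # already in the tree: skip the stale entry
            continue
        P[u] = p
        for v, w in G[u].items():
            heappush(Q, (w, u, v))
    return P

# The graph from the Kruskal example plus a hypothetical edge b-c of weight 2
G = {'a': {'b': 10, 'c': 5}, 'b': {'d': 3, 'c': 2}, 'c': {'e': 6}, 'd': {}, 'e': {}}
H = symmetric(G)
P = prim(H, 'a')
print(P)        # {'a': None, 'c': 'a', 'b': 'c', 'd': 'b', 'e': 'c'}
weight = sum(H[p][u] for u, p in P.items() if p is not None)
print(weight)   # 16
```

The result maps each node to its parent in the spanning tree; the expensive edge a-b is left out because b is reached more cheaply through c.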

### Greed Works. But When?
#### Keeping Up with the Best
Resource scheduling: the problem involves selecting a set of compatible intervals. Normally, we think of these intervals as time intervals. Compatibility simply means that none of them should overlap, so this could be used to model requests for using a resource, such as a lecture hall, for certain time periods. Another example would be to let you be the “resource” and to let the intervals be various activities you’d like to participate in. Either way, our optimization task is to choose as many mutually compatible (nonoverlapping) intervals as possible. For simplicity, we can assume that no start or end points are identical. The greedy solution is as follows:

1. Include the interval with the lowest finish time in the solution.
2. Remove all of the remaining intervals that overlap with the one from step 1.
3. Any remaining intervals? Go to step 1.
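
The three steps can be folded into a single pass over the intervals sorted by finish time: skipping an interval that starts before the last chosen one finishes has the same effect as removing it in step 2. The function name `max_compatible` and the sample intervals are made up for this sketch.

```python
def max_compatible(intervals):
    """intervals: list of (start, finish). Returns a largest nonoverlapping subset."""
    chosen, last_finish = [], float('-inf')
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_finish:          # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# Hypothetical requests with distinct start and end points
requests = [(0, 7), (1, 4), (2, 9), (3, 6), (5, 8), (10, 12), (11, 13)]
selected = max_compatible(requests)
print(selected)   # [(1, 4), (5, 8), (10, 12)]
```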
#### No Worse Than Perfect
Instead of having fixed starting and ending times, we now have a duration and a deadline for each interval, and we’re free to schedule the intervals (let’s call them tasks) as we want, as long as they don’t overlap. We also have a given starting time, of course.
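
These notes don’t state the objective for this variant. Assuming it is the usual one, minimizing the maximum lateness (how far past its deadline any task finishes), the standard greedy choice is to run the tasks in order of increasing deadline. The sketch below illustrates that assumption with made-up task names and numbers.

```python
def schedule_by_deadline(tasks, start=0):
    """tasks: dict name -> (duration, deadline). Returns (schedule, max lateness)."""
    t, schedule, worst = start, [], 0
    for name in sorted(tasks, key=lambda n: tasks[n][1]):   # earliest deadline first
        duration, deadline = tasks[name]
        schedule.append((name, t, t + duration))            # (task, begin, end)
        t += duration
        worst = max(worst, t - deadline)    # stays 0 if every task is on time
    return schedule, worst

# Hypothetical tasks: (duration, deadline)
tasks = {'report': (3, 4), 'review': (2, 6), 'slides': (4, 8), 'email': (1, 11)}
schedule, worst = schedule_by_deadline(tasks)
print(schedule)   # [('report', 0, 3), ('review', 3, 5), ('slides', 5, 9), ('email', 9, 10)]
print(worst)      # 1
```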
#### Staying Safe
This is where we started: To make sure a greedy algorithm is correct, we must make sure each greedy step along the way is safe. One way of doing this is the two-part approach of showing (1) the greedy choice property, that is, that a greedy choice is compatible with optimality, and (2) optimal substructure, that is, that the remaining subproblem is a smaller instance that must also be solved optimally. The greedy choice property, for example, can be shown using an exchange argument.
### Exercises
1. Give an example of a set of denominations that will break the greedy algorithm for giving change.   
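   The pre-decimal British system, for example. A smaller case is the denominations 1, 3, and 4: greedy pays 6 as 4 + 1 + 1, while 3 + 3 uses fewer coins.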
2. Assume that you have coins whose denominations are powers of some integer $k > 1$. Why can you be certain that the greedy algorithm for making change would work in this case?   
   This is equivalent to writing the amount in a base-$k$ number system.
3. If the weights in some selection problem are unique powers of two, a greedy algorithm will generally maximize the weight sum. Why?   
   Each power of two is greater than the sum of all the smaller ones, so choosing the greatest remaining one will always be worth it.  
4. In the stable marriage problem, we say that a marriage between two people, say, Jack and Jill, is feasible if there exists a stable pairing where Jack and Jill are married. Show that the Gale-Shapley  algorithm will match each man with his highest-ranking feasible wife.   
   If some man were not paired with his best feasible wife, there would be a contradiction.
5. Jill is Jack’s best feasible wife. Show that Jack is Jill’s worst feasible husband.     
   Let’s say Jack was married to Alice and Jill to Adam in a stable pairing. Because Jill is Jack’s best feasible wife, he will prefer her to Alice. Because the pairing is stable, Jill must prefer Adam. Adam could be any feasible husband of Jill’s other than Jack, so Jack is the worst of them.   
6. Let’s say the various things you want to pack into your knapsack are partly divisible. That is, you can divide them at certain evenly spaced points (such as a candy bar divided into squares). The different items have different spacings between their breaking points. Could a greedy algorithm still work?  
   A greedy algorithm would certainly work if the capacity of your knapsack was divisible by all the various increments.
7. Show that the codes you get from a Huffman code are free of ambiguity. That is, when decoding a Huffman-coded text, you can always be certain of where the symbol boundaries go and which symbols go where.  
   This follows from the tree structure: every symbol sits in a leaf, so no code is a prefix of another.  
8. In the proof for the greedy choice property of Huffman trees, it was assumed that the frequencies of a and d were different. What happens if they’re not?   
   Then all the frequencies would be equivalent. 
9. Show that a bad merging schedule can give a worse running time, asymptotically, than a good one and that this really depends on the frequencies.   
    Balanced: loglinear. Unbalanced: quadratic.
10. Under what circumstances can a (connected) graph have multiple minimum spanning trees?   
    When several edges have the same weight and each of them could be part of a solution. 
11. How would you build a maximum spanning tree (that is, one with maximum edge-weight sum)?   
    By negating the edge weights and finding the minimum spanning tree. 
12. Show that the minimum spanning tree problem has optimal substructure.   
    It is impossible to find an edge that would lead to a better global solution that is not the best local solution. 
13. What will Kruskal’s algorithm find if the graph isn’t connected? How could you modify Prim’s algorithm to do the same?   
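    A minimum spanning forest, with one tree per connected component. For Prim’s algorithm, wrap it in a loop over all nodes and restart it from every node that has not yet been visited.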
14. What happens if you run Prim’s algorithm on a directed graph?   
    It would not necessarily find the best traversal  
15. For $n$ points in the plane, no algorithm can find a minimum spanning tree (using Euclidean distance) faster than loglinear in the worst case. How come?    
    Otherwise it could be used to sort real numbers faster than loglinear.
16. Show that $m$ calls to either union or find would have a running time of $\Theta(m \lg n)$ if you used union by rank.    
    The tree can grow at most to a logarithmic height.
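17. Show that when using a binary heap as priority queue during a traversal, adding nodes once for each time they’re encountered won’t affect the asymptotic running time.   
    The heap then holds at most $m$ entries, one per edge, and since $m \le n^2$, each heap operation still takes $O(\lg m) = O(\lg n)$ time.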
18. In selecting the largest nonoverlapping subset of a set of intervals, going left to right, why can’t we use a greedy algorithm based on starting times?  
    The interval that starts first could be long enough to overlap all the rest, even if those don’t overlap one another. 
19. What would the running time be of the algorithm finding the largest set of nonoverlapping intervals?  
    It is dominated by the sorting, loglinear. 
20. Implement the greedy solution for the scheduling problem where each task has a cost and a hard deadline and where all tasks take the same amount of time to perform.


In [12]:
tasks = {# 'cost' is the profit for finishing by 'ddl'; each task takes 5 time units, and the schedule runs from time 0 to 25 (5 slots)
    'a':{'cost':1,'ddl':20},
    'b':{'cost':5,'ddl':10},
    'c':{'cost':10,'ddl':5},
    'd':{'cost':6,'ddl':25},
    'e':{'cost':3,'ddl':30},
    'f':{'cost':15,'ddl':25},
}

In [22]:
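def task(T):
    res = [0 for i in range(5)]  # res[i] is the task run during [5*i, 5*i+5); 0 means the slot is free
    costsort = sorted(T.items(), key= lambda x:x[1]['cost'],reverse= True)  # most valuable first
    for t in costsort:
        d = t[1]['ddl'] // 5  # number of slots that end by the deadline
        for i in range(min(d,5)-1,-1,-1):  # latest free slot that still meets the deadline
            if res[i] == 0:
                res[i] = t[0]
                break
    return res
task(tasks)

['c', 'b', 'e', 'd', 'f']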
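
Returning to exercise 1, the denominations 1, 3, and 4 give a compact way to watch greedy change-making fail. The function names below are made up for this sketch, and `fewest_coins` is a plain table over all smaller amounts, used only to check the greedy result.

```python
def greedy_change(denominations, amount):
    coins = []
    for d in sorted(denominations, reverse=True):   # largest coin first
        while amount >= d:
            coins.append(d)
            amount -= d
    return coins

def fewest_coins(denominations, amount):
    """best[a] is a shortest list of coins summing to a, or None if impossible."""
    best = [[]] + [None] * amount
    for a in range(1, amount + 1):
        options = [best[a - d] + [d] for d in denominations
                   if d <= a and best[a - d] is not None]
        best[a] = min(options, key=len) if options else None
    return best[amount]

print(greedy_change([1, 3, 4], 6))   # [4, 1, 1]
print(fewest_coins([1, 3, 4], 6))    # [3, 3]
```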