# Python algorithm
## Chapter 6 Divide, Combine, and Conquer
### Tree-Shaped Problems: All About the Balance
The skyline problem: You are given a sorted sequence of triples $(L,H,R)$, where $L$ is the left $x$-coordinate of a building, $H$ is its height, and $R$ is its right $x$-coordinate. In other words, each triple represents the (rectangular) silhouette of a building, from a given vantage 
point. Your task is to construct a skyline from these individual building silhouettes.
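
One way to attack the skyline problem by divide and conquer is sketched below. It assumes every building has positive height and `L < R`, and it represents a skyline as a list of `(x, height)` points where the height changes. The names `skyline` and `merge_skylines` are my own choices, not taken from the text.

```python
def merge_skylines(a, b):
    """Merge two skylines, each a list of (x, height) change points sorted by x."""
    i = j = 0
    ha = hb = 0          # current height of each skyline at the sweep position
    out = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            x, ha = a[i]
            i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            x, hb = b[j]
            j += 1
        else:            # both skylines change at the same x
            x, ha = a[i]
            hb = b[j][1]
            i += 1
            j += 1
        h = max(ha, hb)
        if not out or out[-1][1] != h:   # record only real height changes
            out.append((x, h))
    return out


def skyline(buildings):
    """Divide-and-conquer skyline for a list of (L, H, R) triples."""
    if not buildings:
        return []
    if len(buildings) == 1:
        left, height, right = buildings[0]
        return [(left, height), (right, 0)]
    mid = len(buildings) // 2
    return merge_skylines(skyline(buildings[:mid]), skyline(buildings[mid:]))


print(skyline([(1, 21, 3), (4, 40, 8), (7, 20, 12)]))
# [(1, 21), (3, 0), (4, 40), (8, 20), (12, 0)]
```

The merge is linear and the split is balanced, so the recurrence is the same as for merge sort.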
### The Canonical D&C Algorithm
### Searching by Halves
#### Traversing Search Trees with Pruning
When we look at the root, we need to be able to prune one of the subtrees. (If we found the value we wanted in an internal node and the tree didn’t contain duplicates, we wouldn’t continue in either subtree, of course.) The one thing we need is the so-called search tree property: For a subtree rooted at $r$, all the values in the left
subtree are smaller than (or equal to) the value of $r$, while those in the right subtree are greater. In other words, the value at a subtree root bisects the subtree.

In [34]:
class Node:
    lft = None
    rgt = None
    def __init__(self,key,val) :
        self.key = key
        self.val = val
def insert(node,key,val):
    if node is None:
        return Node(key,val)
    if node.key == key:
        node.val = val  # existing key: overwrite the value, do not insert again
    elif node.key > key:
        node.lft = insert(node.lft,key,val)
    else:
        node.rgt = insert(node.rgt,key,val)
    return node
def search(node,key):
    if node is None:
        raise KeyError
    if node.key == key:
        return node.val
    if node.key > key:
        return search(node.lft,key)
    else:
        return search(node.rgt,key)
class Tree:
    root = None
    def __setitem__(self,key,val):
        self.root = insert(self.root,key,val)
    def __getitem__(self,key):
        return search(self.root,key)
tree= Tree()

In [35]:
import random
for i in [random.randint(0,100) for j in range(20)]:
    tree[i] = i

In [38]:
tree.root.lft.lft.val

3
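
A quick way to see the search tree property at work is an in-order traversal, which should yield the keys in sorted order. The sketch below repeats a minimal `Node` and `insert` only so that it runs on its own; `in_order` is a helper name I introduce here.

```python
class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.lft = self.rgt = None


def insert(node, key, val):
    if node is None:
        return Node(key, val)
    if node.key == key:
        node.val = val
    elif node.key > key:
        node.lft = insert(node.lft, key, val)
    else:
        node.rgt = insert(node.rgt, key, val)
    return node


def in_order(node):
    """Yield keys: left subtree, then the node, then the right subtree."""
    if node is not None:
        yield from in_order(node.lft)
        yield node.key
        yield from in_order(node.rgt)


root = None
for k in [50, 20, 70, 10, 30, 60, 80, 20]:
    root = insert(root, k, k)
print(list(in_order(root)))  # [10, 20, 30, 50, 60, 70, 80]
```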

#### Selection
The problem is to find the $k$th largest number in an unsorted sequence, in linear time. The most important case is, perhaps, to find the median—the element that would have been in the middle position (that is, `(n+1)//2`), had the 
sequence been sorted. Interestingly, as a side effect of how the algorithm works, it will also allow us to identify which objects are smaller than the object we seek. That means we’ll be able to find the $k$ smallest (and simultaneously, the $n-k$ largest) elements with a running time of $\Theta(n)$, meaning that the value of $k$ doesn’t matter.    
A straightforward first attempt: keep the $k$ smallest objects found so far either at the beginning of the sequence or in a separate sequence. If you kept track of which one of them was largest, checking each large object in the main sequence would be fast. If you needed to add an object, though, and you already had $k$, you’d have to remove one. You’d remove the largest, of course, but then you’d have to find out which one was now largest. You could keep them sorted, but the running time would be $\Theta(nk)$ anyway.   
One step up from this would be to use a heap, essentially transforming our “partial insertion sort” into a “partial heap sort,” making sure that there are never more than $k$ elements in the heap. This would give you a running time of $\Theta(n \lg k)$, and for a reasonably small $k$, this is almost identical to $\Theta(n)$, and it lets you iterate over the main sequence without jumping around in memory, so in practice it might be the solution of choice.  
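
The bounded-heap idea can be sketched with the standard `heapq` module. `heapq` only provides a min-heap, so this sketch stores negated values to get a max-heap, which assumes the items are numbers. The function name `k_smallest` is my own.

```python
import heapq


def k_smallest(seq, k):
    """Return the k smallest numbers in seq, in ascending order."""
    heap = []                      # negated values: -heap[0] is the largest kept so far
    for x in seq:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif heap and x < -heap[0]:
            # x beats the largest of the k kept values: swap it in, O(lg k)
            heapq.heapreplace(heap, -x)
    return sorted(-x for x in heap)


print(k_smallest([56, 2, 93, 41, 15, 68], 3))  # [2, 15, 41]
```

The standard library already offers `heapq.nsmallest(k, seq)` for the same purpose.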
We divide the problem in half by performing linear work, but we manage to eliminate one half, taking us closer to binary search. What we need to figure out, in order to design this algorithm, is how to partition the data in linear time so that we end up with all our objects in one half.  
We’ve arrived at a point where what we need is to partition a sequence into two halves, one consisting of small values and one of large values. And we don’t have to guarantee that the halves are equal—only that they’ll be equal on average. A simple way of doing this is to choose one of the values as a so-called pivot and use it to divide the others: All those smaller than the pivot end up in the left half, while those larger end up on the right.

In [54]:
def partition(seq):
    pi = seq[0]
    seq = seq[1:]
    small = [i for i in seq if i <= pi]
    large = [i for i in seq if i > pi]
    return small, pi, large
def select(seq,k): # Return the element at index k of sorted(seq) (k is zero-based)
    small, pi, large = partition(seq)
    m = len(small)
    if m ==k : #found
        return pi
    if m > k:
        return select(small,k)
    else:
        return select(large,k-m-1)

In [55]:
n = [random.randint(0,100) for j in range(10)]
print(select(n,4))
print(sorted(n))

45
[2, 15, 33, 41, 45, 56, 65, 68, 92, 93]


### Sorting by Halves

In [59]:
def quicksort(seq):
    if len(seq) <= 1:
        return seq
    small,pi,large = partition(seq)
    #print(small,pi,large)
    return quicksort(small) + [pi] + quicksort(large)
n = [random.randint(0,100) for j in range(10)]
print(n)
quicksort(n)

[83, 56, 98, 25, 94, 87, 93, 7, 1, 66]


[1, 7, 25, 56, 66, 83, 87, 93, 94, 98]

#### How Fast Can We Sort?
### Three More Examples
#### Closest Pair
The problem: You have a set of points in the plane, and you want to find the two that are closest to each other. The first idea that springs to mind is, perhaps, to use brute force: For each point, check all the others, or, at least, the ones we haven’t looked at yet. This is, by the handshake sum, a quadratic algorithm, of course. Using divide and conquer, we can get that down to loglinear.   
We’ll divide the points into two subsets, recursively find the closest pair in each, and then, in linear time, merge the results. By the power of induction/recursion, we have now reduced the problem to this merging operation. The result of the merge must be either (1) the closest pair from the left side, (2) the closest pair on the right side, or (3) a pair consisting of one point from either side. In other words, what we need to do is find the closest pair “straddling” the division line. While doing this, we also have an upper limit $d$ to the distance involved (the minimum of the closest pairs from the left and right sides).   
Let’s say, for the moment, that we have sorted all points in the middle region (of width $2d$) by their $y$-coordinate. We then want to go through them in order, considering other points to see whether we find any points closer than $d$ (the smallest distance found so far). For each point, how many other “neighbors” must we consider?   
On either side of the midline, we know that all points are at least a distance of $d$ apart. Because what we’re looking for is a pair at most a distance $d$ apart, straddling the midline, we need to consider only a vertical slice of height $d$ (and width $2d$) at any one time. 
We have no lower bounds on the distances between left and right, so in the worst case, we may have coinciding points on the middle line. Beyond that, it’s quite easy to show that at most four points with a minimum distance of $d$ can fit inside a $d \times d$ square, which we have on either side. This means that we need to consider at most eight points in total in such a slice, which means our current point at most needs to be compared to its next seven neighbors.
#### Convex Hull
Here’s another geometric problem: Imagine pounding $n$ nails into a board and strapping a rubber band around them; the shape of the rubber band is a so-called convex hull for the points represented by the nails. It’s the smallest convex region containing the points, that is, a convex polygon with lines between the “outermost” of the points.   
Without going into implementation details, assume that you can check whether a line is an upper tangent for 
either half. (The lower part works in a similar manner.) You can then start with the rightmost point of the left half and the leftmost point of the right half. As long as the line between your points is not an upper tangent for the left part, you move to the next point along the subhull, counterclockwise. Then you do the same for the right half. You may have to do this more than once. Once the top is fixed, you repeat the procedure for the lower tangent. Finally, you remove the line segments that now fall between the tangents, and you’re done.
#### Greatest Slice
Here’s the last example: You have a sequence `A` containing real numbers, and you want to find a slice (or segment) 
`A[i:j]` so that `sum(A[i:j])` is maximized.  
A first improvement over summing every slice from scratch is to consider all intervals of length $k$ in one iteration, then move to $k+1$, and so on. This would still give us a quadratic number of intervals to check, but we could use a trick to make each scan linear, for a quadratic total: We calculate the sum for the first interval as normal, but each time the interval is shifted one position to the right, we simply subtract the element that now falls outside it, and we add the new element.    
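
The sliding-sum trick can be sketched as below. `best_window` and `greatest_slice_quadratic` are names I made up for this sketch, and only nonempty slices are considered.

```python
def best_window(A, k):
    """Return (best_sum, start) over all slices A[start:start+k], in linear time."""
    if not 0 < k <= len(A):
        raise ValueError("k must be between 1 and len(A)")
    window = sum(A[:k])               # first interval, summed as normal
    best, start = window, 0
    for i in range(k, len(A)):
        window += A[i] - A[i - k]     # add the new element, drop the old one
        if window > best:
            best, start = window, i - k + 1
    return best, start


def greatest_slice_quadratic(A):
    """Try every length k; n scans of linear cost each."""
    return max(best_window(A, k)[0] for k in range(1, len(A) + 1))


print(best_window([-1, 2, 3, -4, 5], 2))            # (5, 1)
print(greatest_slice_quadratic([-1, 2, 3, -4, 5]))  # 6
```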
Divide the sequence in two, find the greatest slice in either half (recursively), and then see whether there’s a greater one straddling the middle (as in the closest pair example). In other words, the only thing that requires creative problem solving is finding the greatest slice straddling the middle. We can reduce that even further: that slice will necessarily consist of the greatest slice extending from the middle to the left and the greatest slice extending from the middle to the right. We can find these separately, in linear time, by simply traversing and summing from the middle in either direction.
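
A minimal sketch of this divide-and-conquer version follows, assuming the slice must be nonempty. The function name and the `lo`/`hi` index parameters are my own choices.

```python
def greatest_slice(A, lo=0, hi=None):
    """Return the largest sum of any nonempty slice of A[lo:hi]."""
    if hi is None:
        hi = len(A)
    if hi <= lo:
        raise ValueError("need at least one element")
    if hi - lo == 1:
        return A[lo]
    mid = (lo + hi) // 2

    # Greatest slice extending from the middle to the left
    total, best_left = 0, float("-inf")
    for i in range(mid - 1, lo - 1, -1):
        total += A[i]
        best_left = max(best_left, total)

    # Greatest slice extending from the middle to the right
    total, best_right = 0, float("-inf")
    for i in range(mid, hi):
        total += A[i]
        best_right = max(best_right, total)

    straddling = best_left + best_right
    return max(greatest_slice(A, lo, mid), greatest_slice(A, mid, hi), straddling)


print(greatest_slice([-1, 2, 3, -4, 5]))  # 6
```

The linear scan plus two half-size subproblems gives the loglinear recurrence $T(n) = 2T(n/2) + \Theta(n)$.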
### Tree Balance and Balancing
+ Node splitting (and merging). Nodes are allowed to have more than two children (and more than one key), and under certain circumstances, a node can become overfull. It is then split into two nodes (potentially making its parent overfull).
+ Node rotations. Here we still use binary trees, but we switch edges. If $x$ is the parent of $y$, we now make $y$ the parent of $x$. For this to work, $x$ must take over one of the children of $y$.
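
A right rotation, as described in the second bullet, might look like this. The minimal `Node` class and the helper names are assumptions for the sketch, not the notebook's classes.

```python
class Node:
    def __init__(self, key, lft=None, rgt=None):
        self.key, self.lft, self.rgt = key, lft, rgt


def rotate_right(x):
    """Make y = x.lft the parent of x; x takes over y's right child."""
    y = x.lft
    x.lft = y.rgt     # x adopts the child that y gives up
    y.rgt = x         # y is now the parent of x
    return y          # new root of this subtree


def keys_in_order(node):
    if node is None:
        return []
    return keys_in_order(node.lft) + [node.key] + keys_in_order(node.rgt)


root = Node(3, Node(2, Node(1)))          # a left-leaning chain 3 -> 2 -> 1
root = rotate_right(root)
print(root.key, root.lft.key, root.rgt.key)  # 2 1 3
print(keys_in_order(root))                   # [1, 2, 3]
```

The in-order sequence is unchanged, so the search tree property survives the rotation while the height drops.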
### Exercises
1. Write a Python program that implements the solution to the skyline problem.


In [61]:
l,h,r = [random.randint(0,20) for j in range(10)],[random.randint(0,100) for j in range(10)],[random.randint(0,5) for j in range(10)]
d = []
for i in range(10):
    d.append((l[i],h[i],l[i] + r[i]))
d

[(7, 20, 12),
 (1, 21, 3),
 (8, 11, 13),
 (20, 39, 23),
 (17, 7, 17),
 (14, 67, 19),
 (4, 40, 8),
 (14, 7, 14),
 (14, 12, 15),
 (15, 82, 16)]

In [63]:
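def sky(seq):
    # Brute force: heights[x] is the skyline height over [x, x+1); assumes all coordinates lie in 0..25
    heights = [0 for i in range(25)]
    for b in seq:
        for i in range(b[0],b[2]):
            if b[1] > heights[i]:
                heights[i] = b[1]
    return heights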
print(sky(d))

[0, 21, 21, 0, 40, 40, 40, 40, 20, 20, 20, 20, 11, 0, 67, 82, 67, 67, 67, 0, 39, 39, 39, 0, 0]


2. Binary search divides the sequence into two approximately equal parts in each recursive step. Consider ternary search, which divides the sequence into three parts. What would its asymptotic complexity be? What can you say about the number of comparisons in binary and ternary search?  
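   Both are $\Theta(\lg n)$. With $B(n) = B(\dfrac{n}{2}) + 1$ and $T(n) = T(\dfrac{n}{3}) + 2$, binary search needs about $\lg n$ comparisons, while ternary search needs about $2\log_3 n \approx 1.26\lg n$, so ternary search makes more comparisons despite its shallower recursion.  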
3. What is the point of multiway search trees, as opposed to binary search trees?    
   They reduce the number of disk accesses. 
4. How could you extract all keys from a binary search tree in sorted order, in linear time?  
   Traverse the tree recursively in order: the left subtree first, then the node itself, then the right subtree.  
5. How would you delete a node from a binary search tree?  
   If the node has at most one child, replace it with that child. Otherwise, choose the biggest descendant in the left subtree or the smallest in the right one, and replace the original node with it. 
6. Let’s say you insert $n$ random values into an initially empty binary search tree. What would, on average, be the depth of the leftmost (that is, smallest) node?  
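   The leftmost node gets one level deeper each time a new minimum is inserted. The $i$th value inserted is a new minimum with probability $\dfrac{1}{i}$, so the expected depth is $S = \sum_{i=2}^{n} \dfrac{1}{i} = \Theta(\lg n)$.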
7. In a min-heap, when moving a large node downward, you always switch places with the smallest child. Why is that important?   
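   To avoid breaking the heap property: the child that moves up becomes the parent of the other child, so it must be the smaller of the two.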
8. How (or why) does the heap encoding work?  
   With the root at index 0, the children of node $i$ are at indices $2i + 1$ and $2i + 2$, so the tree structure is implicit in the list positions.
9. Why is the operation of building a heap linear?   
    Make a heap of the left subtree and then the right one, and then repair the root: $T(n) = 2T(\dfrac{n}{2}) + \Theta(\lg n)$, which solves to $\Theta(n)$.
10. Why wouldn’t you just use a balanced binary search tree instead of a heap?  
   A heap is always perfectly balanced and can be built in linear time.
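
The heap answers above (exercises 7 to 9) can be tied together in one sketch: the implicit list encoding, moving a node down by swapping with its smallest child, and a linear-time build. This is the usual bottom-up, iterative form of "heapify both subtrees, then repair the root". The names `sift_down` and `build_heap` are my own; the standard library's `heapq.heapify` does the same job in place.

```python
def sift_down(heap, i, n):
    """Move heap[i] down until both children (at 2i+1 and 2i+2) are no smaller."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        # Swap with the smallest child so the new parent is <= both children
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def build_heap(seq):
    """Return a min-heap (as a list) built from seq in linear time."""
    heap = list(seq)
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):   # last internal node up to the root
        sift_down(heap, i, n)
    return heap


print(build_heap([5, 3, 8, 1, 9, 2]))  # [1, 3, 2, 5, 9, 8]
```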