# Chapter 11: Searching

This chapter focuses on searching static data stored in sorted order in an array.

**Binary Search**
* O(log n) time complexity
* requires a sorted array; sorting an unsorted array first takes O(n log n)

**Library**
bisect
* bisect.bisect_left(a, x) returns the index of the first element in a that is not less than x (len(a) if there is none)
* bisect.bisect_right(a, x) returns the index of the first element in a that is greater than x (len(a) if there is none)
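
As a quick illustration of the two calls, the sketch below runs them on the sample array used later in this chapter. Note that both return an insertion index rather than an element; the variable names `left` and `right` are only illustrative.

```python
import bisect

a = [-14, -10, 2, 108, 108, 243, 285, 285, 285, 401]

left = bisect.bisect_left(a, 285)    # index of the first entry >= 285
right = bisect.bisect_right(a, 285)  # index of the first entry > 285

print(left, right)   # 6 9
print(right - left)  # 3, the number of occurrences of 285

# For a key that is absent, both calls return the same insertion point.
missing_left = bisect.bisect_left(a, 500)
missing_right = bisect.bisect_right(a, 500)
print(missing_left, missing_right)  # 10 10
```

Because an absent key still yields a valid insertion point, a caller that needs a "not found" result has to check `a[left] == x` itself.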

In [1]:
def binary_search(a,k):
    start, end = 0, len(a) - 1
    while start <= end:
        mid = (end - start) // 2 + start  # floor division keeps mid an integer index
        if a[mid] == k:
            return mid
        elif a[mid] < k:
            start = mid + 1
        else:
            end = mid - 1
    return -1

assert(binary_search([0,1,2,3,4,5,6], 5) == 5)
assert(binary_search([0,1,2,3,4,5,6], 6) == 6)
assert(binary_search([0,1,2,3,4,5,6], 0) == 0)
assert(binary_search([0,1,2,3,4,5,6], 3) == 3)
assert(binary_search([0,1,2,3,4,5,6], 7) == -1)

assert(binary_search([0,1,2,3,4,5,6,7], 5) == 5)
assert(binary_search([0,1,2,3,4,5,6,7], 6) == 6)
assert(binary_search([0,1,2,3,4,5,6,7], 0) == 0)
assert(binary_search([0,1,2,3,4,5,6,7], 3) == 3)
assert(binary_search([0,1,2,3,4,5,6,7], 8) == -1)

## 11.1 Search a sorted Array for first Occurrence of k
Write a method that takes a sorted array and a key and returns the index of the _first_ occurrence of that key in the array. For example, when applied to the array [-14,-10,2,108,108,243,285,285,285,401], you should get 3 for 108 and 6 for 285.

In [2]:
def binary_search_first_occurance(a,k):
    start, end = 0, len(a) - 1
    first_occurance = -1
    while start <= end:
        mid = (end - start) // 2 + start
        if a[mid] == k:
            # record the match, then keep searching to the left for an earlier one
            first_occurance = mid
            end = mid - 1
        elif a[mid] < k:
            start = mid + 1
        else:
            end = mid - 1
    return first_occurance

assert(binary_search_first_occurance([-14,-10,2,108,108,243,285,285,285,401], 108) == 3)
assert(binary_search_first_occurance([-14,-10,2,108,108,243,285,285,285,401], 285) == 6)
assert(binary_search_first_occurance([-14,-10,2,108,108,243,285,285,285,401], 500) == -1)

import bisect
def binary_first_occurance_bisect(a,k):
    # bisect_left returns an insertion point, so confirm that k is actually there
    i = bisect.bisect_left(a,k)
    return i if i < len(a) and a[i] == k else -1

assert(binary_first_occurance_bisect([-14,-10,2,108,108,243,285,285,285,401], 108) == 3)
assert(binary_first_occurance_bisect([-14,-10,2,108,108,243,285,285,285,401], 285) == 6)
assert(binary_first_occurance_bisect([-14,-10,2,108,108,243,285,285,285,401], 500) == -1)

## 11.3 Search a cyclically sorted array
A cyclically sorted array is one that can be made sorted by cyclically shifting its elements. Design an O(log n) algorithm for finding the position of the smallest element in a cyclically sorted array. Assume all elements are distinct.

For example given [378,478,550,631,103,203,220,234,279,368] return 4.

In [6]:
def cyclic_sorted_array_min(a):
    start, end = 0, len(a) - 1

    # invariant: the smallest element always lies within a[start..end]
    while start < end:
        mid = (end - start) // 2 + start
        if a[mid] > a[end]:
            start = mid + 1
        else:
            end = mid
    return start

assert(cyclic_sorted_array_min([378,478,550,631,103,203,220,234,279,368]) == 4)       
assert(cyclic_sorted_array_min([103,203,220,234,279,368,378,478,550,631]) == 0)
assert(cyclic_sorted_array_min([103,203,220,234,279,368,378,478,550,100]) == 9)       

## 11.4 Compute the Integer Square Root
Write a program which takes a nonnegative integer and returns the largest integer whose square is less than or equal to the given integer. 

For example, the integer square root of 16 is 4, and the integer square root of 300 is 17.

In [22]:
# basic idea: list the squares of the candidates 0 .. value // 2 + 1
# and use bisect_right to find the last candidate whose square is <= value
# (building the list takes O(value) time and space, so this is for illustration only)
import bisect
def int_square_root_bisect(value):
    options = range(0, value // 2 + 2)
    return options[bisect.bisect_right([comp_square(s) for s in options], value) - 1]
    
def comp_square(x): 
    return x*x

assert(int_square_root_bisect(16) == 4)
assert(int_square_root_bisect(300) == 17)


def int_square_root(value):
    start, end = 1, value // 2 + 1
    while start <= end:
        mid = (end - start) // 2 + start
        if mid ** 2 == value:
            return mid
        elif mid ** 2 > value:
            end = mid - 1
        else:
            start = mid + 1
    if mid**2 > value:
        return mid - 1
    return mid

assert(int_square_root(16) == 4)
assert(int_square_root(300) == 17)
assert(int_square_root(4) == 2)

## 11.8 Find the kth Largest Element
Design an algorithm for computing the kth largest element in an array. Assume entries are distinct.

In [5]:
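import random

def kth_largest(a,k):
    # Quickselect: partition a[start..end] around a random pivot so that larger
    # entries come first, then continue only in the side that contains index k - 1.
    # Average O(n) time, O(1) extra space; a is reordered in place.
    start, end = 0, len(a) - 1
    while start <= end:
        pivot_index = random.randint(start, end)
        a[pivot_index], a[end] = a[end], a[pivot_index]
        pivot_value = a[end]
        store = start
        for i in range(start, end):
            if a[i] > pivot_value:
                a[i], a[store] = a[store], a[i]
                store += 1
        # place the pivot in its final position; store entries are larger than it
        a[store], a[end] = a[end], a[store]
        if store == k - 1:
            return a[store]
        elif store < k - 1:
            start = store + 1
        else:
            end = store - 1
    raise ValueError("k must be between 1 and len(a)")

assert(kth_largest([4,3,6,1,2,7,8,0,5,9], 8) == 2)
assert(kth_largest([4,3,6,1,2,7,8,0,5,9], 1) == 9)
assert(kth_largest([4,3,6,1,2,7,8,0,5,9], 10) == 0)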

## Find the Missing IP Address
The storage capacity of hard drives dwarfs that of RAM. This can lead to interesting space-time trade-offs. 

Suppose you were given a file containing roughly one billion IP addresses, each of which is a 32-bit quantity. How would you programmatically find an IP address that is not in the file? Assume you have unlimited drive space but only a few megabytes of RAM at your disposal.
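
One possible approach, sketched below under stated assumptions, reads the file twice. The first pass counts addresses by their upper 16 bits. Since the file holds fewer than 2^32 entries, at least one of the 2^16 buckets must hold fewer than 2^16 addresses and so has a gap. The second pass marks the lower 16 bits seen in that bucket and reports the first unmarked value. The function name `find_missing_ip` and the `read_ips` callable (which stands in for re-reading the file and yields addresses as integers) are hypothetical names introduced for this sketch.

```python
from array import array

def find_missing_ip(read_ips):
    """Return a 32-bit address absent from the data, or None if every address occurs.

    read_ips is a hypothetical callable that returns a fresh iterator over the
    addresses (as ints) on each call, standing in for a pass over the file.
    """
    buckets = 1 << 16

    # Pass 1: count addresses per upper-16-bit prefix (about 256 KB of counters).
    counts = array('I', [0]) * buckets
    for ip in read_ips():
        counts[ip >> 16] += 1

    # A bucket with fewer than 2^16 entries must be missing at least one address.
    prefix = next((p for p, c in enumerate(counts) if c < buckets), None)
    if prefix is None:
        return None

    # Pass 2: mark the lower 16 bits seen within that bucket (8 KB bit array).
    seen = bytearray(buckets // 8)
    for ip in read_ips():
        if ip >> 16 == prefix:
            low = ip & 0xFFFF
            seen[low >> 3] |= 1 << (low & 7)

    # The first unmarked value completes the missing address.
    for low in range(buckets):
        if not seen[low >> 3] & (1 << (low & 7)):
            return (prefix << 16) | low
    return None


sample = [0, 1, 2, 5, 65536]
print(find_missing_ip(lambda: iter(sample)))  # 3
```

The trade-off is two sequential passes over the file in exchange for memory that stays well under a megabyte, regardless of how many addresses the file contains.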