# The Binary Search

<img src=http://interactivepython.org/runestone/static/pythonds/_images/binsearch.png \>

## Loop

In [17]:
def BinarySearch(orderedList,target):
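    # initialize
    first=0
    last=len(orderedList)-1
    found=False
    midpoint=None   # stays None when the list is empty
    
    while(first<=last and not found):
        # integer division keeps midpoint usable as an index
        midpoint=(first+last)//2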
        if(target==orderedList[midpoint]):
            found=True
        elif(target>orderedList[midpoint]):
            first=midpoint+1
        else:
            last=midpoint-1
            
    return (found,midpoint)

In [26]:
a=[17,22]
a[2:1]

[]

In [24]:
BinarySearch(a,55)

(False, 0)

## Recursion

In [43]:
def BinarySearch(orderedList,target):
    # base case first: an empty list cannot contain the target
    if(orderedList==[]):
        return False
    midpoint=(len(orderedList)-1)//2
    if(target==orderedList[midpoint]):
        return True
    elif(target>orderedList[midpoint]):
        return BinarySearch(orderedList[midpoint+1:],target)
    else:
        return BinarySearch(orderedList[:midpoint],target)

In [37]:
a=[17,20,26,31,44,54,55,65,77,93]
a[0:1]

[17]

In [47]:
BinarySearch(a,21)

False

# Hashing

A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially, the hash table contains no items so every slot is empty. We can implement a hash table by using a list with each element initialized to the special Python value None. 

<img src=http://interactivepython.org/runestone/static/pythonds/_images/hashtable.png \>

The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function will take any item in the collection and return an integer in the range of slot names, between 0 and m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first hash function, sometimes referred to as the “remainder method,” simply takes an item and divides it by the table size, returning the remainder as its hash value ($ h(item)=item\%11 $). Table 4 gives all of the hash values for our example items. Note that this remainder method (modulo arithmetic) will typically be present in some form in all hash functions, since the result must be in the range of slot names.
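As a quick check of the remainder method, the hash values of the example items for a table of size 11 can be computed directly. This is a minimal sketch; the helper name `remainder_hash` is an illustrative assumption and does not come from the original text.

```python
def remainder_hash(item, table_size=11):
    # remainder method: h(item) = item % table_size
    return item % table_size

items = [54, 26, 93, 17, 77, 31]
hash_values = {item: remainder_hash(item) for item in items}
print(hash_values)
# {54: 10, 26: 4, 93: 5, 17: 6, 77: 0, 31: 9}
```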

We use two lists to create a HashTable class that implements the Map abstract data type. One list, called slots, will hold the key items and a parallel list, called data, will hold the data values. When we look up a key, the corresponding position in the data list will hold the associated data value. We will treat the key list as a hash table using the ideas presented earlier. Note that the initial size for the hash table has been chosen to be 11. Although this is arbitrary, it is important that the size be a prime number so that the collision resolution algorithm can be as efficient as possible.

The collision resolution technique is linear probing with a "plus 1" rehash function. The put function (see Listing 3) assumes that there will eventually be an empty slot unless the key is already present in self.slots. It computes the original hash value and, if that slot is not empty, iterates the rehash function until an empty slot occurs. If a nonempty slot already contains the key, the old data value is replaced with the new data value. Dealing with the situation where there are no empty slots left is an exercise.

Likewise, the get function (see Listing 4) begins by computing the initial hash value. If the key is not in the initial slot, rehash is used to locate the next possible position. Notice that the nextslot!=hashvalue test in the while loop guarantees that the search will terminate by checking that we have not returned to the initial slot. If that happens, we have exhausted all possible slots and the item must not be present.
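The order in which put and get visit the slots can be made explicit with a small generator. This is only a sketch of the "plus 1" probing described above; the name `probe_sequence` is hypothetical and is not part of the HashTable class.

```python
def probe_sequence(key, size=11):
    # yield the slots visited by linear probing with a "plus 1" rehash,
    # stopping once the probe would return to the initial slot
    start = key % size
    slot = start
    while True:
        yield slot
        slot = (slot + 1) % size
        if slot == start:
            return

print(list(probe_sequence(44))[:3])  # [0, 1, 2]
print(list(probe_sequence(20))[:3])  # [9, 10, 0]
```

Stopping when the probe returns to the initial slot mirrors the termination test in get: after `size` probes every slot has been examined once.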

In [29]:
class HashTable:
    def __init__(self):
        self.size=11
        self.slots=[None]*self.size
        self.data=[None]*self.size
        
    def put(self,key,data):
        hashvalue=self.hashfunction(key,len(self.slots))
        
        if self.slots[hashvalue]==None:
            self.slots[hashvalue]=key
            self.data[hashvalue]=data
        else:
            if self.slots[hashvalue]==key:
                self.data[hashvalue]=data #replace
            else:
                nextslot=self.rehash(hashvalue,len(self.slots))
                while(self.slots[nextslot]!=None and self.slots[nextslot]!=key):
                    nextslot=self.rehash(nextslot,len(self.slots))
                
                if(self.slots[nextslot]==None):
                    self.slots[nextslot]=key
                    self.data[nextslot]=data
                else:
                    self.data[nextslot]=data #replace
                    
    def get(self,key):
        hashvalue=self.hashfunction(key,len(self.slots))
        
        if self.slots[hashvalue]==key:
            return self.data[hashvalue]
        elif self.slots[hashvalue]==None:
            return None
        else:
            nextslot=self.rehash(hashvalue,len(self.slots))
            while(self.slots[nextslot]!=None and self.slots[nextslot]!=key and nextslot!=hashvalue):
                nextslot=self.rehash(nextslot,len(self.slots))
                
            if self.slots[nextslot]==key:
                return self.data[nextslot]
            # an empty slot or a full cycle: the key is not present
            return None
        
    def hashfunction(self,key,size):
        return key%size
        
    def rehash(self,oldhash,size):
        return (oldhash+1)%size
    
    def __setitem__(self,key,data):
        self.put(key,data)
        
    def __getitem__(self,key):
        return self.get(key)

In [30]:
H=HashTable()
H[54]="cat"
H[26]="dog"
H[93]="lion"
H[17]="tiger"
H[77]="bird"
H[31]="cow"
H[44]="goat"
H[55]="pig"
H[20]="chicken"
H.slots

[77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]

In [31]:
H.data

['bird',
 'goat',
 'pig',
 'chicken',
 'dog',
 'lion',
 'tiger',
 None,
 None,
 'cow',
 'cat']

In [21]:
H[20]

'chicken'

In [22]:
H[17]

'tiger'

In [33]:
H[29]='me'
H[30]='you'
print(H[99])

None


## Analysis of Hashing

In the best case, hashing provides $O(1)$ search. However, due to collisions, the number of comparisons is typically not so simple. The most important piece of information we need to analyze the use of a hash table is the load factor, λ, the number of items divided by the table size. Conceptually, if λ is small, then there is a lower chance of collisions, meaning that items are more likely to be in the slots where they belong. If λ is large, meaning that the table is filling up, then there are more and more collisions. This means that collision resolution is more difficult, requiring more comparisons to find an empty slot. With chaining, increased collisions means an increased number of items on each chain.

As before, we will have a result for both a successful and an unsuccessful search. For a successful search using open addressing with linear probing, the average number of comparisons is approximately $\frac{1}{2}\left(1+\frac{1}{1-\lambda}\right)$, and an unsuccessful search gives $\frac{1}{2}\left(1+\left(\frac{1}{1-\lambda}\right)^2\right)$. If we are using chaining, the average number of comparisons is $1+\frac{\lambda}{2}$ for the successful case, and simply $\lambda$ comparisons if the search is unsuccessful.
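To get a feel for these formulas, they can be evaluated for a few load factors. The sketch below simply transcribes the expressions above; the function names are illustrative assumptions, and the linear probing formulas only make sense for a load factor below 1.

```python
def linear_probing_comparisons(load_factor):
    # approximate averages for open addressing with linear probing
    successful = 0.5 * (1 + 1 / (1 - load_factor))
    unsuccessful = 0.5 * (1 + (1 / (1 - load_factor)) ** 2)
    return successful, unsuccessful

def chaining_comparisons(load_factor):
    # averages for chaining
    successful = 1 + load_factor / 2
    unsuccessful = load_factor
    return successful, unsuccessful

for lam in (0.25, 0.5, 0.75):
    print(lam, linear_probing_comparisons(lam), chaining_comparisons(lam))
```

At a load factor of 0.5, linear probing needs about 1.5 comparisons for a successful search and 2.5 for an unsuccessful one, while chaining needs 1.25 and 0.5.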

# Sorting

## The Bubble Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/bubblepass.png \>

In [49]:
def BubbleSort(list):
    size=len(list)
    while(size>=2):
        for i in range(size-1):
            if(list[i]>list[i+1]):
                temp=list[i]
                list[i]=list[i+1]
                list[i+1]=temp
        size-=1
    return list

In [54]:
a=[26,54,17,17,77,31,44,55,20]
BubbleSort(a)

[17, 17, 20, 26, 31, 44, 54, 55, 77]

In [53]:
b=[1]
BubbleSort(b)

[1]

## The Selection Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/selectionsortnew.png \>

In [59]:
def SelectionSort(list):
    size=len(list)
    while(size>=2):
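        # find the position of the largest item in list[0:size]
        maxpos=size-1
        for i in range(size-1):
            if(list[i]>list[maxpos]):
                maxpos=i
        # one exchange per pass puts it at the end of the unsorted part
        temp=list[size-1]
        list[size-1]=list[maxpos]
        list[maxpos]=temp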
        size-=1
    
    return list

In [60]:
a=[26,54,17,17,77,31,44,55,20]
SelectionSort(a)

[17, 17, 20, 26, 31, 44, 54, 55, 77]

In [63]:
b=[21,20]
SelectionSort(b)

[20, 21]

In [64]:
c=[]
SelectionSort(c)

[]

## The Insertion Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/insertionsort.png \>

In [65]:
def InsertionSort(list):
    size=1
    while(size < len(list)):
        for i in range(size):
            insert=list[size]
            if(list[i]>insert):
                temp=list[i:size]
                list[i+1:size+1]=temp
                list[i]=insert
        size+=1
    
    return list

In [66]:
a=[26,54,17,93,77,31,44,55,20]
InsertionSort(a)

[17, 20, 26, 31, 44, 54, 55, 77, 93]

## The Shell Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/shellsortA.png \>
<img src=http://interactivepython.org/runestone/static/pythonds/_images/shellsortB.png \>
<img src=http://interactivepython.org/runestone/static/pythonds/_images/shellsortC.png \>
<img src=http://interactivepython.org/runestone/static/pythonds/_images/shellsortD.png \>

In [None]:
def ShellSort(list):
    
    return list
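The ShellSort cell above is left as a stub. A minimal sketch of one possible implementation follows, assuming the common gap sequence n/2, n/4, ..., 1 (the original does not state which sequence it intends). The name `shell_sort_sketch` is hypothetical, and the function works on a copy rather than sorting in place.

```python
def shell_sort_sketch(items):
    # work on a copy so the caller's list is left untouched
    result = list(items)
    gap = len(result) // 2
    while gap > 0:
        # gapped insertion sort: each pass sorts the sublists
        # result[k], result[k+gap], result[k+2*gap], ...
        for i in range(gap, len(result)):
            current = result[i]
            position = i
            while position >= gap and result[position - gap] > current:
                result[position] = result[position - gap]
                position -= gap
            result[position] = current
        gap //= 2
    return result

print(shell_sort_sketch([54, 26, 93, 17, 77, 31, 44, 55, 20]))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
```

The final pass with a gap of 1 is an ordinary insertion sort, which guarantees the result is sorted; the earlier passes only reduce the amount of shifting it has to do.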

## The Merge Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/mergesortA.png \>

<img src=http://interactivepython.org/runestone/static/pythonds/_images/mergesortB.png \>

<img src=https://upload.wikimedia.org/wikipedia/commons/c/cc/Merge-sort-example-300px.gif \>

Given a list:
* split it recursively until each sublist holds a single element
* merge the sorted sublists back into the whole

In [87]:
def MergeSort(list):
    if(len(list)>1):
        # split until there is only single element in the sublist
        midpoint=len(list)//2
        left=MergeSort(list[:midpoint])
        right=MergeSort(list[midpoint:])
        
        # merge left and right sublist
        i=0
        j=0
        sorted_list=[]
        while(i<len(left) or j<len(right)):
            if(j>=len(right)):
                sorted_list.append(left[i])
                i+=1
            elif(i>=len(left)):
                sorted_list.append(right[j])
                j+=1
            elif(left[i]<right[j]):
                sorted_list.append(left[i])
                i+=1
            else:
                sorted_list.append(right[j])
                j+=1
            
        return sorted_list
    
    #return the full list or list with single element
    return list

In [89]:
a=[26,54,17,93,93,31,44,55,20]
MergeSort(a)

[17, 20, 26, 31, 44, 54, 55, 93, 93]

In [91]:
b=[]
MergeSort(b)

[]

## The Quick Sort

<img src=http://interactivepython.org/runestone/static/pythonds/_images/firstsplit.png \>

<img src=http://interactivepython.org/runestone/static/pythonds/_images/partitionA.png \>

<img src=http://interactivepython.org/runestone/static/pythonds/_images/partitionB.png \>

In [106]:
def QuickSort(list):
    if(len(list)>1):
        #initialize
        pivot=0
        leftmarker=0
        rightmarker=len(list)-1
        #exchange process
        while(leftmarker<=rightmarker):
            if(list[leftmarker]<=list[pivot]):
                leftmarker+=1
            elif(list[rightmarker]>=list[pivot]):
                rightmarker-=1
            else:
                temp=list[leftmarker]
                list[leftmarker]=list[rightmarker]
                list[rightmarker]=temp
                
        #split point found
        temp=list[rightmarker]
        list[rightmarker]=list[pivot]
        list[pivot]=temp
        pivot=rightmarker
        #sublist quicksort
        leftsublist=list[:pivot]
        rightsublist=list[pivot+1:]
        return QuickSort(leftsublist)+[list[pivot]]+QuickSort(rightsublist)
        
    return list

In [107]:
a=[54,26,93,17,77,31,44,55,20]
QuickSort(a)

[17, 20, 26, 31, 44, 54, 55, 77, 93]

In [110]:
b=[54,17]
QuickSort(b)

[17, 54]

# Cracking the Coding Interview: Searching and Sorting

**1**. You are given two sorted arrays, A and B, and A has a large enough buffer at the end to hold B. Write a method to merge B into A in sorted order.

Solution: This is the merge step of the merge sort algorithm.
<img src=https://upload.wikimedia.org/wikipedia/commons/c/cc/Merge-sort-example-300px.gif \>

In [4]:
def merge(L1,L2):
    result=[]
    k=0
    j=0
    while(k<len(L1) and j<len(L2)):
        if(L1[k]<=L2[j]):
            result.append(L1[k])
            k+=1
        else:
            result.append(L2[j])
            j+=1
    if(k==len(L1)):
        result+=L2[j:]
    else:
        result+=L1[k:]
    return result

In [10]:
a=[]
b=[]
merge(a,b)

[]
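The merge function above builds a new list. If A really carries a buffer at its end, the merge can instead be done in place by filling A from the back, so no element is overwritten before it has been read. The sketch below assumes the buffer slots hold `None` and that the caller passes the number of real items in A as `count_a`; both conventions, and the name `merge_into_buffer`, are assumptions made for illustration.

```python
def merge_into_buffer(a, count_a, b):
    # a holds count_a sorted items followed by at least len(b) spare slots
    write = count_a + len(b) - 1   # next position to fill, from the back
    i = count_a - 1                # last real item of a
    j = len(b) - 1                 # last item of b
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[write] = a[i]
            i -= 1
        else:
            a[write] = b[j]
            j -= 1
        write -= 1
    # once b is exhausted, the remaining items of a are already in place
    return a

print(merge_into_buffer([1, 4, 7, None, None, None], 3, [2, 3, 9]))
# [1, 2, 3, 4, 7, 9]
```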

**2**. Write a method to sort an array of strings so that all the anagrams are next to each other.

Solution: We need a comparator that sorts the characters of each string before comparing them. Anagrams then compare as equal, so sorting the list with this comparator places them next to each other.

In [34]:
def anagrams_comparator(a,b):
    a=list(a)
    b=list(b)
    a.sort()
    b.sort()
    if a>b:
        return 1
    elif a==b:
        return 0
    else:
        return -1

In [30]:
a=["dcba","abc","cba","bcad"]

In [31]:
sorted(a)

['abc', 'bcad', 'cba', 'dcba']

In [35]:
import functools
sorted(a,key=functools.cmp_to_key(anagrams_comparator))

['abc', 'cba', 'dcba', 'bcad']
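A key function is a common alternative to a comparator: each string is mapped to its sorted characters once, and the list is ordered by that key. The helper name `anagram_key` below is an illustrative assumption.

```python
def anagram_key(word):
    # anagrams share the same sorted sequence of characters
    return "".join(sorted(word))

words = ["dcba", "abc", "cba", "bcad"]
grouped = sorted(words, key=anagram_key)
print(grouped)
# ['abc', 'cba', 'dcba', 'bcad']
```

Because Python's sort is stable, strings with the same key keep their original relative order.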

**3**. Given a sorted array of n integers that has been rotated an unknown number of times, give an O(log n) algorithm that finds an element in the array. You may assume that the array was originally sorted in increasing order.
EXAMPLE: Input: find 5 in array (15 16 19 20 25 1 3 4 5 7 10 14) Output: 8 (the index of 5 in the array)

Solution: If the whole list were sorted, a binary search would be O(log n). The problem becomes how to locate the rotation point in O(log n). We can halve the list and compare the middle element with the first element: if it is smaller than the first element, the rotation point lies in the left half and we discard the right half; otherwise we discard the left half. The halving is carried out on indices rather than on copies of the list.

In [42]:
def binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    
    return found

In [56]:
def rot_index(alist):
    start=0
    end=len(alist)-1
    while start<end:
        mid=(start+end)//2+1
        if(alist[mid]>=alist[start]):
            start=mid
        else:
            end=mid-1
    return end

In [63]:
a=[6,7,8,9,10,11,12,13,14,15,16, 25, 1, 2,3,4,5]
a_sorted=a[rot_index(a)+1:]+a[:rot_index(a)+1]
a_sorted
binarySearch(a_sorted,25)

True
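The cells above rebuild the sorted list by slicing, which takes O(n) time and loses the index in the original array. A variant that stays O(log n) and returns the index is to run the binary search directly on the rotated list, using the fact that at least one half around the midpoint is always sorted. This is a sketch that assumes distinct values; the name `search_rotated` is hypothetical.

```python
def search_rotated(alist, item):
    # binary search directly on the rotated list; assumes distinct values
    first = 0
    last = len(alist) - 1
    while first <= last:
        mid = (first + last) // 2
        if alist[mid] == item:
            return mid
        if alist[first] <= alist[mid]:
            # the left part alist[first..mid] is sorted
            if alist[first] <= item < alist[mid]:
                last = mid - 1
            else:
                first = mid + 1
        else:
            # the right part alist[mid..last] is sorted
            if alist[mid] < item <= alist[last]:
                first = mid + 1
            else:
                last = mid - 1
    return -1

rotated = [15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14]
print(search_rotated(rotated, 5))  # 8
```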

**4**. If you have a 2 GB file with one string per line, which sorting algorithm would you use to sort the file and why?

Solution: Assume the whole file cannot be brought into memory at once.

How much memory do we have available? Let’s assume we have X MB of memory available.

1. Divide the file into K chunks, where X * K = 2 GB. Bring one chunk into memory, sort its lines using any O(n log n) algorithm, and save the sorted chunk back to disk.
2. Bring the next chunk into memory and sort it the same way, until all K chunks are sorted.
3. Once all chunks are sorted, merge them into a single sorted file.

This algorithm is known as external sort, and step 3 is known as an N-way merge. The reason for using external sort is the size of the data: it does not fit into memory, so only part of it can be sorted at a time.
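The steps above can be sketched with the standard library: `heapq.merge` performs the N-way merge lazily, so only one line per chunk is held in memory during the final pass. The function name `external_sort`, the `lines_per_chunk` parameter, and the use of temporary files for the chunks are assumptions made for this illustration, not a definitive implementation.

```python
import heapq
import itertools
import os
import tempfile

def external_sort(input_path, output_path, lines_per_chunk=100000):
    chunk_paths = []
    try:
        # steps 1 and 2: sort one chunk at a time, each saved to its own file
        with open(input_path) as source:
            while True:
                chunk = list(itertools.islice(source, lines_per_chunk))
                if not chunk:
                    break
                # make sure the last line of the file also ends with a newline
                chunk = [line if line.endswith("\n") else line + "\n"
                         for line in chunk]
                chunk.sort()
                handle, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(handle, "w") as chunk_file:
                    chunk_file.writelines(chunk)
                chunk_paths.append(path)

        # step 3: N-way merge of the sorted chunk files
        chunk_files = [open(path) for path in chunk_paths]
        try:
            with open(output_path, "w") as target:
                target.writelines(heapq.merge(*chunk_files))
        finally:
            for chunk_file in chunk_files:
                chunk_file.close()
    finally:
        # remove the temporary chunk files
        for path in chunk_paths:
            os.remove(path)
```

In practice `lines_per_chunk` would be chosen so that one chunk fits comfortably in the X MB of available memory.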

**5**. Given a sorted array of strings which is interspersed with empty strings, write a method to find the location of a given string.

Example: find “ball” in [“at”, “”, “”, “”, “ball”, “”, “”, “car”, “”, “”, “dad”, “”, “”] will return 4 

Example: find “ballcar” in [“at”, “”, “”, “”, “”, “ball”, “car”, “”, “”, “dad”, “”, “”] will return -1

Solution: First I noticed that the array is sorted, which suggests a binary search. But the mid-point element could be an empty string. If this is the case, we could either move left or move right to find a non-empty string to compare with. If there is no non-empty string in the left/right half, we could simply discard that half.

In [76]:
def emptybinarySearch(alist,item):
    first = 0
    last = len(alist)-1
    found = -1
    
    while first<=last and found==-1:
        midpoint = (first + last)//2
        #move left to find a non-empty element
        while midpoint>=first and alist[midpoint]=="":
            midpoint-=1
        #if non-empty string not found, discard left sublist
        if(midpoint<first):
            first=(first + last)//2+1
        else:        
            if alist[midpoint] == item:
                found = midpoint
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1
    
    return found

In [77]:
alist=["at", "", "", "", "ball", "", "", "car", "", "", "dad", "", ""]
print(emptybinarySearch(alist,"ball"))
blist=[]
print(emptybinarySearch(blist,"ball"))
clist=["at", "", "", "", "", "ball", "car", "", "", "dad", "", ""]
print(emptybinarySearch(clist,"ballcar"))
dlist=["", "", "", "", "", "", "", "", ""]
print(emptybinarySearch(dlist,""))

4
-1
-1
-1


**6**. Given a matrix in which each row and each column is sorted, write a method to find an element in it.

Solution: The key word is "sorted", which suggests binary search. If binary search is applied to each row, the algorithm will be O(n log n). A better approach also uses the fact that each column is sorted.

Assumptions:

1. Every row is sorted in ascending order from left to right; every column is sorted in ascending order from top to bottom.
2. The matrix is $N\times N$.

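The algorithm works by elimination, starting from the top-right corner. Each move to the left rules out the current column, because the elements below the current entry in that column are even larger; each move down rules out the current row, because the elements to the left of the current entry in that row are even smaller. The search therefore takes O(N) steps.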

In [137]:
import numpy as np

def matrixSort(matrix,item):
    found=(-1,-1)
    (m,n)=np.shape(matrix)
    # start at the top-right corner
    i=0
    j=n-1
    
    while(i<m and j>=0):
        if(matrix[i,j]==item):
            found=(i,j)
            return found
        elif(matrix[i,j]<item):
            # everything to the left in row i is smaller: skip the row
            i+=1
        else:
            # everything below in column j is larger: skip the column
            j-=1
            
    return found

In [139]:
import numpy as np
a=np.array([[1,2]])
matrixSort(a,2)

(0, 1)

**7**. <font color='red'>A circus is designing a tower routine consisting of people standing atop one another’s shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.

EXAMPLE: Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)</font>

Solution: Each input is a (ht, wt) tuple, so sorting by ht alone does not guarantee that the wt values are also in order.
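One way to finish the idea is to sort by height and then look for the longest run of people, in that order, whose heights and weights both strictly increase (a longest increasing subsequence). The following is a minimal O(n²) sketch under that assumption; the name `longest_tower` is hypothetical and the original does not say which method it intends.

```python
def longest_tower(people):
    # people is a list of (height, weight) tuples
    ordered = sorted(people)  # by height, ties broken by weight
    if not ordered:
        return []
    # best[i] = length of the longest tower with ordered[i] at the bottom
    best = [1] * len(ordered)
    previous = [-1] * len(ordered)
    for i in range(len(ordered)):
        for j in range(i):
            shorter = ordered[j][0] < ordered[i][0]
            lighter = ordered[j][1] < ordered[i][1]
            if shorter and lighter and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                previous[i] = j
    # walk back from the best bottom person to recover the tower, top first
    index = max(range(len(ordered)), key=lambda k: best[k])
    tower = []
    while index != -1:
        tower.append(ordered[index])
        index = previous[index]
    tower.reverse()
    return tower

people = [(65, 100), (70, 150), (56, 90), (75, 190), (60, 95), (68, 110)]
print(len(longest_tower(people)))  # 6
```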