#### 1. Set up a random experiment to test the difference between a sequential search and a binary search on a list of integers.


In [37]:
import timeit

def seqSearch(alist, item):
    for n in alist:
        if n == item:
            return True
    return False

def binSearchR(alist, item):
    if len(alist) == 0:
        return False
    midpoint = len(alist) // 2
    if alist[midpoint] == item:
        return True
    else:
        if alist[midpoint] < item:
            return binSearchR(alist[midpoint+1:], item)
        else:
            return binSearchR(alist[:midpoint], item)

def main():
    sequential = timeit.Timer("seqSearch(alist, item)", "from __main__ import seqSearch, alist, item")
    binary = timeit.Timer("binSearchR(alist, item)", "from __main__ import binSearchR, alist, item")  
    seq_t = sequential.timeit(number=100000)
    bin_t = binary.timeit(number=100000)
    print("The time of 100,000 times sequential search is %.5fs." % seq_t)
    print("The time of 100,000 times binary search is %.5fs." % bin_t)
    
if __name__ == '__main__':
    alist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
    item = 13
    main()

The time of 100,000 times sequential search is 0.03462s.
The time of 100,000 times binary search is 0.03692s.
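
The run above uses a fixed nine-element list, where the two searches take about the same time. A randomized experiment on larger lists would likely show the gap more clearly. The following is a minimal sketch under stated assumptions: the names `seq_search`, `bin_search`, and `run_experiment`, the list sizes, and the seed are illustrative choices, not part of the exercise text.

```python
import random
import timeit


def seq_search(alist, item):
    """Linear scan: up to len(alist) comparisons."""
    for n in alist:
        if n == item:
            return True
    return False


def bin_search(alist, item):
    """Iterative binary search on a sorted list."""
    first, last = 0, len(alist) - 1
    while first <= last:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True
        if alist[midpoint] < item:
            first = midpoint + 1
        else:
            last = midpoint - 1
    return False


def run_experiment(size, trials=200, seed=0):
    """Time both searches on a random sorted list of `size` distinct integers."""
    rng = random.Random(seed)
    alist = sorted(rng.sample(range(size * 10), size))
    # Targets come from the same range, so some are present and some are not.
    items = [rng.randrange(size * 10) for _ in range(trials)]
    seq_t = timeit.timeit(lambda: [seq_search(alist, x) for x in items], number=1)
    bin_t = timeit.timeit(lambda: [bin_search(alist, x) for x in items], number=1)
    return seq_t, bin_t


if __name__ == '__main__':
    for size in (100, 1000, 10000):
        seq_t, bin_t = run_experiment(size)
        print("n=%6d  sequential: %.5fs  binary: %.5fs" % (size, seq_t, bin_t))
```

Timings depend on the machine, so no expected numbers are given here. The sequential time should grow roughly in proportion to the list size, while the binary time should grow much more slowly.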


#### 2. Use the binary search functions given in the text (recursive and iterative). Generate a random, ordered list of integers and do a benchmark analysis for each one. What are your results? Can you explain them?


In [38]:
import timeit

def binSearchR(alist, item):
    if len(alist) == 0:
        return False
    midpoint = len(alist) // 2
    if alist[midpoint] == item:
        return True
    else:
        if alist[midpoint] < item:
            return binSearchR(alist[midpoint+1:], item)
        else:
            return binSearchR(alist[:midpoint], item)

def binSearchI(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    while first <= last and not found:
        midpoint = (first+last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if alist[midpoint] < item:
                first = midpoint + 1
            else:
                last = midpoint - 1
    return found

def main():
    recursive = timeit.Timer("binSearchR(alist, item)", "from __main__ import binSearchR, alist, item")
    iterative = timeit.Timer("binSearchI(alist, item)", "from __main__ import binSearchI, alist, item")
    rec_t = recursive.timeit(number=100000)
    itr_t = iterative.timeit(number=100000)
    print("The time of 100,000 times recursive binary search is %.5fs." % rec_t)
    print("The time of 100,000 times iterative binary search is %.5fs." % itr_t)
    
if __name__ == '__main__':
    alist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
    item = 17
    main()

The time of 100,000 times recursive binary search is 0.15765s.
The time of 100,000 times iterative binary search is 0.05687s.


The recursive version is slower because each call uses the slice operator to build the left or right half of the list before passing it to the next invocation. The usual O(log n) analysis of binary search assumes that this step takes constant time, but slicing k elements in Python is O(k), so the slice-based binary search does not run in strict logarithmic time. This can be remedied by passing the full list along with the starting and ending indices.
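
To make the cost of slicing concrete, the sketch below counts how many elements the slices copy during one unsuccessful search. The helper name `slice_cost` is hypothetical, and the loop mirrors the recursive calls of `binSearchR` rather than calling it directly.

```python
def slice_cost(n, item):
    """Count the elements copied by slicing while searching list(range(n)) for item."""
    alist = list(range(n))
    copied = 0
    while alist:
        midpoint = len(alist) // 2
        if alist[midpoint] == item:
            break
        if alist[midpoint] < item:
            alist = alist[midpoint + 1:]    # copies the right half
        else:
            alist = alist[:midpoint]        # copies the left half
        copied += len(alist)
    return copied


for n in (1000, 10000, 100000):
    # -1 is smaller than every element, so the search always keeps the left half.
    print(n, slice_cost(n, -1))
```

For n = 1000 the count is 994, close to n itself: the halves of size n/2, n/4, n/8, ... add up to roughly n copied elements, so the total work is linear even though only about log n comparisons are made.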

#### 3. Implement the binary search using recursion without the slice operator. Recall that you will need to pass the list along with the starting and ending index values for the sublist. Generate a random, ordered list of integers and do a benchmark analysis.


In [45]:
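import timeit

def binSearchR2(alist, item, first=0, last=None):
    # A default of len(alist)-1 would be evaluated once, at definition time,
    # against whatever global alist exists then, so resolve it on each call.
    if last is None:
        last = len(alist) - 1
    if first > last:    # empty range: the item is not in the list
        return False
    midpoint = (first+last) // 2
    if alist[midpoint] == item:
        return True
    elif alist[midpoint] < item:
        return binSearchR2(alist, item, midpoint + 1, last)
    else:
        return binSearchR2(alist, item, first, midpoint - 1)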

def main():
    recursive2 = timeit.Timer("binSearchR2(alist, item)", "from __main__ import binSearchR2, alist, item")
    iterative = timeit.Timer("binSearchI(alist, item)", "from __main__ import binSearchI, alist, item")
    rec_t2 = recursive2.timeit(number=100000)
    itr_t = iterative.timeit(number=100000)
    print("The time of 100,000 times recursive binary search is %.5fs." % rec_t2)
    print("The time of 100,000 times iterative binary search is %.5fs." % itr_t)

if __name__ == '__main__':
    alist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
    item = 17
    main()

The time of 100,000 times recursive binary search is 0.09288s.
The time of 100,000 times iterative binary search is 0.05853s.


#### 4. Implement the len method (__len__) for the hash table Map ADT implementation.
#### 5. Implement the in method (__contains__) for the hash table Map ADT implementation.
#### 6. How can you delete items from a hash table that uses chaining for collision resolution? How about if open addressing is used? What are the special circumstances that must be handled? Implement the del method for the HashTable class.
+ Deleting from a hash table that uses chaining: first calculate the item's slot with the hash function. The slot stores the head of a linked list, so traverse that list and compare the item with each key. If the key is found, remove its node exactly as you would remove a node from a linked list.
+ Deleting from a hash table that uses open addressing: first calculate the item's home slot and compare the item with the key stored there. If they do not match, follow the rehash sequence until the item is found, an empty slot is reached, or the probe returns to the starting slot. If the item is found, clear the key and value in that slot. The special circumstance is that an emptied slot can cut the probe chain for keys that were placed after it, so a later lookup may stop too early; marking the slot as deleted instead of empty avoids this.
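
The open-addressing case can be sketched with a "deleted" marker, often called a tombstone. This is a minimal illustration, not the textbook's `HashTable`: the class `ProbingTable`, the marker `_DELETED`, the table size of 11, and the use of linear probing are all assumptions made to keep the example short.

```python
_DELETED = object()  # hypothetical tombstone marker


class ProbingTable:
    """Tiny open-addressing table with linear probing and tombstone deletion."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size
        self.data = [None] * size

    def _probe(self, key):
        """Yield every slot index once, starting at the key's home slot."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def put(self, key, value):
        first_free = None
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is None:            # end of the probe chain
                if first_free is None:
                    first_free = pos
                break
            if slot is _DELETED:        # reusable, but keep looking for the key
                if first_free is None:
                    first_free = pos
            elif slot == key:
                self.data[pos] = value  # replace
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key
        self.data[first_free] = value

    def get(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is None:
                return None
            if slot is not _DELETED and slot == key:
                return self.data[pos]
        return None

    def __delitem__(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is None:
                break
            if slot is not _DELETED and slot == key:
                self.slots[pos] = _DELETED   # keep the probe chain intact
                self.data[pos] = None
                return
        raise KeyError(key)


t = ProbingTable()
t.put(1, "a")
t.put(12, "b")      # 12 % 11 == 1, so this key is probed into slot 2
del t[1]
print(t.get(12))    # still reachable because slot 1 is a tombstone, not None
```

If `__delitem__` wrote `None` into slot 1 instead, `get(12)` would stop at the empty home slot and report the key as missing.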

#### 7. In the hash table map implementation, the hash table size was chosen to be 101. If the table gets full, this needs to be increased. Re-implement the put method so that the table will automatically resize itself when the loading factor reaches a predetermined value (you can decide the value based on your assessment of load versus performance).
Suppose the predetermined value of the loading factor is 75%. 
#### 8. Implement quadratic probing as a rehash technique.


In [94]:
import random

class HashTable:
    def __init__(self):
        self.size = 101
        self.slots = [None] * self.size
        self.data = [None] * self.size
    
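    def put(self, key, data):
        # resize the table if the predetermined load factor is reached
        factor = 0.75
        count = sum(1 for n in self.slots if n is not None)    # 0 is a valid key
        if factor < (count/self.size):
            print("Resizing the hash table since the threshold of load factor is reached......")
            old_slots, old_data = self.slots, self.data
            self.size = self.size * 2 + 1
            self.slots = [None] * self.size
            self.data = [None] * self.size
            # every key must be re-inserted because its slot depends on the table size
            for old_key, old_value in zip(old_slots, old_data):
                if old_key is not None:
                    self.put(old_key, old_value)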
        
        # start put
        hashvalue = self.hashfunction(key, len(self.slots))
        
        if self.slots[hashvalue] == None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        else:
            if self.slots[hashvalue] == key:
                self.data[hashvalue] = data    # replace
            else:
                rehash_counter = 1
                nextslot = self.rehash(hashvalue, len(self.slots), rehash_counter)
                while self.slots[nextslot] != None and self.slots[nextslot] != key:
                    rehash_counter += 1
                    nextslot = self.rehash(nextslot, len(self.slots), rehash_counter)
                    
                if self.slots[nextslot] == None:
                    self.slots[nextslot] = key
                    self.data[nextslot] = data
                else:
                    self.data[nextslot] = data    # replace
                   
                   
    def hashfunction(self, key, size):
        return key%size
    
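    def rehash(self, oldhash, size, rehash_counter):
        # callers pass the previous probe position, so the offsets from the
        # home slot accumulate as 1, 1+4, 1+4+9, ... rather than 1, 4, 9, ...
        return (oldhash + rehash_counter**2) % size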
    
    def get(self, key):
        startslot = self.hashfunction(key, len(self.slots))  
        data = None
        stop = False
        found = False
        position = startslot
        rehash_counter = 1
        while self.slots[position] != None and not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position = self.rehash(position, len(self.slots), rehash_counter)
                rehash_counter += 1
                if position == startslot:
                    stop = True
        return data
    
    def __getitem__(self, key):
        return self.get(key)
    
    def __setitem__(self,key,data):
        self.put(key,data)
        
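    def __len__(self):
        # number of stored key-value pairs, not the capacity of the table
        return sum(1 for key in self.slots if key is not None)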
        
    def __contains__(self, key):
        # use the current table length so lookups still agree with put() after a resize
        startslot = self.hashfunction(key, len(self.slots))
        stop = False
        found = False
        position = startslot
        rehash_counter = 1
        while self.slots[position] != None and not found and not stop:
            if self.slots[position] == key:
                found = True
            else:
                position = self.rehash(position, len(self.slots), rehash_counter)
                rehash_counter += 1
                if position == startslot:
                    stop = True
        return found
    
    def __delitem__(self, key):
        # Note: emptying a slot can hide keys that were probed past it;
        # a "deleted" marker would be needed to keep those keys reachable.
        startslot = self.hashfunction(key, len(self.slots))
        stop = False
        position = startslot
        rehash_counter = 1
        while self.slots[position] != None and not stop:
            if self.slots[position] == key:
                self.slots[position] = None
                self.data[position] = None
                return
            else:
                position = self.rehash(position, len(self.slots), rehash_counter)
                rehash_counter += 1
                if position == startslot:
                    stop = True
        print("The item is not in the hash table.")
            
    
def main():
    H=HashTable()
    H[54]="cat"
    H[26]="dog"
    H[93]="lion"
    H[17]="tiger"
    H[77]="bird"
    H[31]="cow"
    H[44]="goat"
    H[55]="pig"
    H[20]="chicken"
    print(H.slots)
    print(H.data)
    print(len(H))
    
    item = 2
    if item in H:
        print('%d is in the hashtable' % item)
    else:
        print('%d is not in the hashtable' % item)
    
    print(H[20])
    H[20]='duck'
    print(H[20])
    print(H[99])
    del H[77]
    print(H.slots)
    print(H.data)
    del H[88]
    
    for i in range(80):
        n = random.randint(0, 200)
        H[n] = str(n)
    print(H.slots)
    print(H.data)
    

if __name__ == '__main__':
    main()

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 17, None, None, 20, None, None, None, None, None, 26, None, None, None, None, 31, None, None, None, None, None, None, None, None, None, None, None, None, 44, None, None, None, None, None, None, None, None, None, 54, 55, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 77, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 93, None, None, None, None, None, None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'tiger', None, None, 'chicken', None, None, None, None, None, 'dog', None, None, None, None, 'cow', None, None, None, None, None, None, None, None, None, None, None, None, 'goat', None, None, None, None, None, None, None, None, None, 'cat', 'pig', None, None, None, None, None, None, None, None, None, None, 
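
The `rehash` method above adds the squared counter to the previous probe position. Quadratic probing is more commonly described with offsets measured from the home slot: h, h+1, h+4, h+9, and so on. A minimal sketch of that variant follows; the function names `quadratic_probe_sequence` and `insert` and the table size of 11 are illustrative assumptions, not part of the class above.

```python
def quadratic_probe_sequence(key, size):
    """Yield home, home+1, home+4, home+9, ... (all modulo size)."""
    home = key % size
    for i in range(size):
        yield (home + i * i) % size


def insert(slots, key):
    """Place key in the first free slot along its quadratic probe sequence."""
    for pos in quadratic_probe_sequence(key, len(slots)):
        if slots[pos] is None or slots[pos] == key:
            slots[pos] = key
            return pos
    raise RuntimeError("no free slot found along the probe sequence")


slots = [None] * 11
for key in (54, 26, 93, 17, 77, 31, 44, 55, 20):
    insert(slots, key)
print(slots)    # [77, 44, 20, 55, 26, 93, 17, None, None, 31, 54]
```

Here 44 collides at slot 0 and lands in slot 1, 55 tries slots 0, 1, 4, 9, and 5 before settling in slot 3, and 20 tries slots 9 and 10 before settling in slot 2. A quadratic sequence does not always visit every slot, which is why the sketch raises an error instead of looping forever.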

#### 9. Using a random number generator, create a list of 500 integers. Perform a benchmark analysis using some of the sorting algorithms from this chapter. What is the difference in execution speed?
#### 10. Implement the bubble sort using simultaneous assignment.
#### 11. Implement the selection sort using simultaneous assignment.
#### 12. Implement the mergeSort function without using the slice operator.


In [120]:
def bubbleSort(alist):    # O(n^2)
    for passnum in range(len(alist)-1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                alist[i], alist[i+1] = alist[i+1], alist[i]
                
                
def shortBubbleSort(alist):    # O(n^2) in the worst case; stops early once a pass makes no exchange
    exchange = True
    passnum = len(alist) - 1
    while passnum > 0 and exchange:
        exchange = False
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                exchange = True
                alist[i], alist[i+1] = alist[i+1], alist[i]
        passnum -= 1


def selectionSort(alist):    # O(n^2)
    for passnum in range(len(alist), 0, -1):
        index_max = 0
        for i in range(1, passnum):
            if alist[index_max] < alist[i]:
                index_max = i
        alist[index_max], alist[passnum-1] = alist[passnum-1], alist[index_max]
                

def insertionSort(alist):    # O(n^2)
    for i in range(1, len(alist)): # index of the first position in the unsorted right side
        while i > 0 and alist[i] < alist[i-1]:
            alist[i], alist[i-1] = alist[i-1], alist[i]
            i -= 1
            

def shellSort(alist):    # between O(n) and O(n^2)
    gap = len(alist) // 2
    while gap > 0:
        for start in range(gap):    # indexes for all the elements in each sublist
            for i in range(start+gap, len(alist), gap):    # get the elements with same sublist index from the second to the last sublists
                curr = alist[i]    # current value
                pos = i
                while pos >= gap and alist[pos-gap] > curr:    # enter the while loop only when swaps are needed
                    alist[pos] = alist[pos-gap]
                    pos = pos - gap
                alist[pos] = curr    # find the proper position for current value
                
        gap = gap // 2    # reduce the gap by 2
    

def mergeSort(alist, first=0, last=None):
    if last is None:    # a default of len(alist) would be fixed at definition time
        last = len(alist)
    print("Splitting", first, last)
    if (last-first) > 1:
        mid = (last+first) // 2
        
        mergeSort(alist, first, mid)    #left half
        mergeSort(alist, mid, last)    #right half

        i = first    #index of lefthalf
        j = mid    #index of righthalf
        temp = []    #merged values for alist[first:last]
        
        while i < mid and j < last:
            if alist[i] < alist[j]:
                temp.append(alist[i])
                i += 1
            else:
                temp.append(alist[j])
                j += 1
        while i < mid:
            temp.append(alist[i])
            i += 1
        while j < last:
            temp.append(alist[j])
            j += 1
        alist[first:last] = temp
        
    
    

#def quickSort(alist):

alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)

Splitting 0 9
Splitting 0 4
Splitting 0 2
Splitting 0 1
Splitting 1 2
Splitting 2 4
Splitting 2 3
Splitting 3 4
Splitting 4 9
Splitting 4 6
Splitting 4 5
Splitting 5 6
Splitting 6 9
Splitting 6 7
Splitting 7 9
Splitting 7 8
Splitting 8 9
[17, 20, 26, 31, 44, 54, 55, 77, 93]


In [117]:
def mergeSort(a_list, start=0, stop=None):
	if stop is None:
		stop = len(a_list)
	if stop - start > 1:
		mid = (start + stop) // 2
	
		mergeSort(a_list,start,mid)
		mergeSort(a_list,mid,stop)

		j = mid
		i = start
		n_list = []
		while i < mid and j < stop:
			if a_list[i] < a_list[j]:
				n_list.append(a_list[i])
				i += 1
			else:
				n_list.append(a_list[j])
				j += 1

		while i < mid:
			n_list.append(a_list[i])
			i += 1
		
		while j < stop:
			n_list.append(a_list[j])
			j += 1
		a_list[start:stop] = n_list

	
a_list = [54,26,93,17,77,31,44,55,20]
print('before mergeSort\n\t',a_list)
mergeSort(a_list)
print('after mergeSort\n\t',a_list)

before mergeSort
	 [54, 26, 93, 17, 77, 31, 44, 55, 20]
after mergeSort
	 [17, 20, 26, 31, 44, 54, 55, 77, 93]
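
Question 9 asks for a benchmark on 500 random integers, which the cells above do not run. The following is a minimal sketch of one way to set it up, assuming three of the quadratic sorts plus the built-in `list.sort` as a baseline. The function names, the seed, and the `benchmark` helper are illustrative, and the sorts are re-declared so that the block runs on its own.

```python
import random
import timeit


def bubble_sort(alist):
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]


def selection_sort(alist):
    for passnum in range(len(alist), 0, -1):
        index_max = 0
        for i in range(1, passnum):
            if alist[index_max] < alist[i]:
                index_max = i
        alist[index_max], alist[passnum - 1] = alist[passnum - 1], alist[index_max]


def insertion_sort(alist):
    for index in range(1, len(alist)):
        current = alist[index]
        pos = index
        while pos > 0 and alist[pos - 1] > current:
            alist[pos] = alist[pos - 1]
            pos -= 1
        alist[pos] = current


def benchmark(sort_fn, data, repeats=5):
    """Best of `repeats` timings; each run sorts a fresh copy of data."""
    times = timeit.repeat(lambda: sort_fn(list(data)), number=1, repeat=repeats)
    return min(times)


if __name__ == '__main__':
    random.seed(42)
    data = [random.randint(0, 10000) for _ in range(500)]
    for fn in (bubble_sort, selection_sort, insertion_sort, list.sort):
        print("%-15s %.5fs" % (fn.__name__, benchmark(fn, data)))
```

Each timing includes the cost of copying the list, which is the same for every algorithm and small next to the quadratic sorts. Sorting a copy matters because the functions sort in place, so reusing one list would time an already sorted input after the first run.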


#### 13. A bubble sort can be modified to “bubble” in both directions. The first pass moves “up” the list, and the second pass moves “down.” This alternating pattern continues until no more passes are necessary. Implement this variation and describe under what circumstances it might be appropriate.
#### 14. Perform a benchmark analysis for a shell sort, using different increment sets on the same list.
#### 15. One way to improve the quick sort is to use an insertion sort on lists that have a small length (call it the “partition limit”). Why does this make sense? Re-implement the quick sort and use it to sort a random list of integers. Perform an analysis using different list sizes for the partition limit.
#### 16. Implement the median-of-three method for selecting a pivot value as a modification to quickSort. Run an experiment to compare the two techniques.
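
For the bidirectional bubble sort described in the questions above, a minimal sketch might look like the following. The name `cocktail_sort` is an assumption, and this is one possible reading of the exercise rather than a reference solution.

```python
def cocktail_sort(alist):
    """Bubble sort that alternates upward and downward passes, in place."""
    low, high = 0, len(alist) - 1
    swapped = True
    while swapped and low < high:
        # upward pass: the largest remaining value ends up at index high
        swapped = False
        for i in range(low, high):
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]
                swapped = True
        high -= 1
        if not swapped:
            break
        # downward pass: the smallest remaining value ends up at index low
        swapped = False
        for i in range(high, low, -1):
            if alist[i - 1] > alist[i]:
                alist[i - 1], alist[i] = alist[i], alist[i - 1]
                swapped = True
        low += 1


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
cocktail_sort(alist)
print(alist)    # [17, 20, 26, 31, 44, 54, 55, 77, 93]
```

The variation can help when small values sit near the end of an otherwise nearly sorted list. A plain bubble sort moves such a value only one position per pass, while a downward pass carries it all the way to the front at once.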