# Searching and Sorting

In [1]:
from pathlib import Path

In [2]:
p = Path('/Users/olli/Desktop/PythonNotebooks')
sub_dir = 'pics'
pics = p/sub_dir
pics

WindowsPath('/Users/olli/Desktop/PythonNotebooks/pics')

* In ```Python``` we can use the ```in``` operator to test whether an item is in a list

In [3]:
13 in [1,2,30]

False

### Sequential Search 

* Basic searching technique: go through the data structure sequentially, comparing elements as you go along
* In an ordered list you can stop early when the item is not present: when searching for 50, for example, stop as soon as you reach an element larger than 50

![Image](md_images/sequential-search.png)

**Unordered List**

Case | Best Case | Worst Case | Average Case 
---- | --------- | ----------- | ---------
Item is present | 1 | n | $\frac{n}{2}$ 
Item is not present | n | n | n 

**Ordered List**

Case | Best Case | Worst Case | Average Case 
---- | --------- | ----------- | ---------
Item is present | 1 | n | $\frac{n}{2}$ 
Item is not present | 1 | n | $\frac{n}{2}$ 

____
## Sequential Search

In [4]:
def seq_search(arr,ele):
    """
    General Sequential Search. Works on Unordered lists.
    """
    
    # Start at position 0
    pos = 0
    # Target becomes true if ele is in the list
    found = False
    
    # go until end of list
    while pos < len(arr) and not found:
        
        # If match
        if arr[pos] == ele:
            found = True
            
        # Else move one down
        else:
            pos = pos+1
    
    return found

In [5]:
arr = [1,9,2,8,3,4,7,5,6]

In [6]:
print(seq_search(arr,1))

True


In [7]:
print(seq_search(arr,10))

False


## Ordered List

If we know the list is ordered, then we only have to check until we have found the element or an element greater than it.

In [8]:
def ordered_seq_search(arr,ele):
    """
    Sequential search for an Ordered list
    """
    # Start at position 0
    pos = 0
    
    # Target becomes true if ele is in the list
    found = False
    
    # Stop marker
    stopped = False
    
    # go until end of list
    while pos < len(arr) and not found and not stopped:
        # If match
        if arr[pos] == ele:
            found = True
        else:
            # Check if element is greater
            if arr[pos] > ele:
                stopped = True
            # Otherwise move on
            else:
                pos  = pos+1
    
    return found

In [9]:
arr.sort() 

In [10]:
ordered_seq_search(arr,3)

True

In [11]:
ordered_seq_search(arr,1.5)

False
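
To make the best and worst cases in the tables above concrete, here is a minimal sketch that counts how many items a sequential search examines. The helper name `items_examined` and its `ordered` flag are hypothetical, introduced only for this illustration.

```python
def items_examined(arr, ele, ordered=False):
    """Return (found, count) where count is the number of items looked at."""
    count = 0
    for item in arr:
        count += 1
        if item == ele:
            return True, count
        # In an ordered list we can give up once we pass the target
        if ordered and item > ele:
            return False, count
    return False, count


data = list(range(1, 10))  # already sorted: 1..9

print(items_examined(data, 1.5))                # (False, 9): whole list scanned
print(items_examined(data, 1.5, ordered=True))  # (False, 2): stops at the value 2
```

With the ordered variant, a missing item costs anywhere between 1 and n inspections, which is where the $\frac{n}{2}$ average comes from.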

## Binary Search

* Binary search uses Divide and Conquer 
* Divide the problem into smaller pieces in some way, and then reassemble the whole problem to get the result
* Half of the remaining items are eliminated on each comparison 

![Image](md_images/binary-search.png)

**Analysis of operations in binary search**

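The search space is halved at each iteration, so after $i$ comparisons about $\frac{n}{2^{i}}$ items remain. Solving $\frac{n}{2^{i}} = 1$ gives $i = \log_2 n$, which is exactly what is meant by logarithmic time complexity.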

Comparisons | Number of items left 
---- | --------- 
1 | $\frac{n}{2}$
2 | $\frac{n}{4}$
3 | $\frac{n}{8}$
... | $...$
i | $\frac{n}{2^{i}}$

In [12]:
def binary_search(arr,ele):
    
    # First and last index values
    first = 0
    last = len(arr) - 1
    
    found = False
    
    while first <= last and not found:
        
        mid = (first+last)//2
        
        # Match found
        if arr[mid] == ele:
            found = True
        
        # Set new midpoints up or down depending on comparison
        else:
            # Set down
            if ele < arr[mid]:
                last = mid - 1
            # Set up 
            else:
                first = mid + 1
                
    return found

In [13]:
# list must already be sorted!
arr = [1,2,3,4,5,6,7,8,9,10]

In [14]:
binary_search(arr,4)

True

In [15]:
binary_search(arr,2.2)

False
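
Python's standard library also ships a `bisect` module for working with sorted lists. As a rough sketch (the wrapper name `bisect_search` is an assumption, not part of this notebook), the same membership test could be written as:

```python
from bisect import bisect_left


def bisect_search(arr, ele):
    """Binary search on a sorted list using the standard library."""
    # bisect_left returns the leftmost index where ele could be inserted
    # while keeping arr sorted
    i = bisect_left(arr, ele)
    return i < len(arr) and arr[i] == ele


sorted_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(bisect_search(sorted_arr, 4))    # True
print(bisect_search(sorted_arr, 2.2))  # False
```

As with the hand-written version, the list must already be sorted.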

## Recursive Version of Binary Search

In [16]:
def rec_bin_search(arr,ele):
    
    # Base Case!
    if len(arr) == 0:
        return False
    
    # Recursive Case
    else:
        mid = len(arr)//2
        # If match found
        if arr[mid]==ele:
            return True
        else:
            if ele<arr[mid]:
                return rec_bin_search(arr[:mid],ele)
            else:
                return rec_bin_search(arr[mid+1:],ele)

In [17]:
rec_bin_search(arr,3)

True

In [18]:
rec_bin_search(arr,15)

False
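
One thing to keep in mind with the recursive version above is that every slice (`arr[:mid]`, `arr[mid+1:]`) copies part of the list. A possible alternative, sketched here under the hypothetical name `rec_bin_search_idx`, passes index bounds instead of slices:

```python
def rec_bin_search_idx(arr, ele, first=0, last=None):
    """Recursive binary search that narrows index bounds instead of slicing."""
    if last is None:
        last = len(arr) - 1

    # Base case: the search window is empty
    if first > last:
        return False

    mid = (first + last) // 2
    if arr[mid] == ele:
        return True
    if ele < arr[mid]:
        # Continue in the lower half
        return rec_bin_search_idx(arr, ele, first, mid - 1)
    # Continue in the upper half
    return rec_bin_search_idx(arr, ele, mid + 1, last)


sorted_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rec_bin_search_idx(sorted_arr, 3))   # True
print(rec_bin_search_idx(sorted_arr, 15))  # False
```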

# Hash Tables 

* A **hash table** is a collection of items which are stored in such a way as to make it easy to find them later
* Initially the hash table contains no items so every slot is empty

### Hash function 
The mapping between an item and the slot where that item belongs in the hash table is called the **hash function**. In this case we use the remainder function:

```h(item) = item%m```

where m is the size of the table (the number of slots)

![Image](md_images/hash-function.png)

* After the items are placed, 6 out of the 11 slots are occupied
* This ratio is referred to as the **load factor** and is commonly denoted by 

$$\lambda=\frac{\text{number of items}}{\text{table size}}$$

![Image](md_images/load-factor.png)

* What if two items have the same hash value, e.g. ```44%11``` and ```77%11``` (both are 0)? 
* This is known as a **collision**
* A perfect hash function has no collisions
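
The remainder method, the load factor and a collision can be sketched in a few lines. The six item values below are hypothetical, chosen only so that 6 of 11 slots fill without a clash; they are not taken from the figures.

```python
def remainder_hash(item, table_size):
    """Remainder method: h(item) = item % m."""
    return item % table_size


table_size = 11
items = [54, 26, 93, 17, 77, 31]  # hypothetical sample items

# Place each item in the slot chosen by the hash function
slots = [None] * table_size
for item in items:
    slots[remainder_hash(item, table_size)] = item

load_factor = len(items) / table_size

print(slots)        # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(load_factor)  # 6/11, roughly 0.545

# 44 and 77 hash to the same slot, which is a collision
print(remainder_hash(44, table_size) == remainder_hash(77, table_size))  # True
```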

### Folding Method 

* Dividing the item into equal-size pieces 
* Pieces are then added together to give the resulting hash value 

**Example folding method** 436-555-4601

Step | instruction | result 
---- | --------- | ----------- 
1 | divide into groups of 2 | (43,65,55,46,01) 
2 | Add the items together | 210 
3 | 210 % 11 | 1

### Mid-Square Method 

* Square the item
* Extract some portion of the resulting digits 

**Example** 44

Step | instruction | result 
---- | --------- | ----------- 
1 | square | 1936
2 | extract middle two digits | 93
3 | 93 % 11 | 5

### Non-integer elements 
* Strings can be thought of as a sequence of ordinal values 

In [19]:
# hash function for word
res = ord('c') + ord('a') + ord('t')
res%11

4
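
The folding, mid-square and ordinal-sum ideas described above can each be written as a small function. This is only a sketch: the function names are hypothetical, and the choice of which digits count as the "middle" in the mid-square method is an assumption.

```python
def fold_hash(item, table_size, group=2):
    """Folding method: split the digits into groups, add them, take the remainder."""
    digits = "".join(ch for ch in str(item) if ch.isdigit())
    pieces = [int(digits[i:i + group]) for i in range(0, len(digits), group)]
    return sum(pieces) % table_size


def mid_square_hash(item, table_size):
    """Mid-square method: square the item and keep the middle two digits."""
    squared = str(item * item)
    mid = len(squared) // 2
    # Assumption: "middle" means the two digits around the centre of the square
    middle = squared[max(mid - 1, 0):mid + 1]
    return int(middle) % table_size


def string_hash(word, table_size):
    """Sum the ordinal values of the characters, then take the remainder."""
    return sum(ord(ch) for ch in word) % table_size


print(fold_hash("436-555-4601", 11))  # 43+65+55+46+1 = 210, 210 % 11 -> 1
print(mid_square_hash(44, 11))        # 44**2 = 1936, 93 % 11 -> 5
print(string_hash("cat", 11))         # 99+97+116 = 312, 312 % 11 -> 4
```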

### Collision resolution 

* Looks into the hash table and tries to find another open slot to hold the item that caused the collision 

**open addressing** - process that tries to find the next open slot in the hash table 

* Linear probing - keep moving down until you find an empty slot 
* Variation on Linear Probing - skips slots for more even distribution 
* Quadratic probing - h+1, h+4, h+9, h+16 ... 

![Image](md_images/quadratic-probing.jpeg)
* Chaining - an alternative to open addressing that allows many items to exist at the same location in the hash table 

![Image](md_images/chaining.png)
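
As a rough illustration of the strategies above, the sketch below lists the slots each probing scheme would try after a collision, and shows a tiny chained table. All names (`linear_probes`, `quadratic_probes`, `ChainedTable`) are hypothetical and not part of the notebook's implementation.

```python
def linear_probes(h, size, count, skip=1):
    """Slots tried by linear probing; skip > 1 gives the 'skipping' variation."""
    return [(h + i * skip) % size for i in range(1, count + 1)]


def quadratic_probes(h, size, count):
    """Slots tried by quadratic probing: h+1, h+4, h+9, ..."""
    return [(h + i * i) % size for i in range(1, count + 1)]


class ChainedTable:
    """Chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[key % len(self.buckets)]
        for idx, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[idx] = (key, value)  # replace the old value
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[key % len(self.buckets)]:
            if existing_key == key:
                return value
        return None


print(linear_probes(0, 11, 4))          # [1, 2, 3, 4]
print(linear_probes(0, 11, 4, skip=3))  # [3, 6, 9, 1]
print(quadratic_probes(0, 11, 4))       # [1, 4, 9, 5]

table = ChainedTable(11)
table.put(44, "a")
table.put(77, "b")  # same slot as 44, stored in the same chain
print(table.get(44), table.get(77))     # a b
```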

### Implementation of a Hash Table

In this lecture we will be implementing our own Hash Table to complete our understanding of Hash Tables and Hash Functions! Make sure to review the video lecture before this to fully understand this implementation!

Keep in mind that Python already has a built-in dictionary object that serves as a Hash Table, so in practice you would rarely need to implement your own hash table in Python.

___
## Map
The idea of a dictionary used as a hash table to store and retrieve items using **keys** is often referred to as a mapping. In our implementation we will have the following methods:


* **HashTable()** Create a new, empty map. It returns an empty map collection.
* **put(key,val)** Add a new key-value pair to the map. If the key is already in the map then replace the old value with the new value.
* **get(key)** Given a key, return the value stored in the map or None otherwise.
* **del** Delete the key-value pair from the map using a statement of the form del map[key].
* **len()** Return the number of key-value pairs stored in the map.
* **in** Return True for a statement of the form **key in map**, if the given key is in the map, False otherwise.

In [20]:
class HashTable(object):
    
    def __init__(self,size):
        
        # Set up size and slots and data
        self.size = size
        self.slots = [None] * self.size
        self.data = [None] * self.size
        
    def put(self,key,data):
        #Note, we'll only use integer keys for ease of use with the Hash Function
        
        # Get the hash value
        hashvalue = self.hashfunction(key,len(self.slots))

        # If Slot is Empty
        if self.slots[hashvalue] is None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        
        else:
            
            # If key already exists, replace old value
            if self.slots[hashvalue] == key:
                self.data[hashvalue] = data  
            
            # Otherwise, find the next available slot
            else:
                
                nextslot = self.rehash(hashvalue,len(self.slots))
                
                # Get to the next slot - linear probing
                # stop when the next slot has no key or the next slot is equal to the current key
                # Note: this loop never ends if the table is full and the key is absent
                while self.slots[nextslot] is not None and self.slots[nextslot] != key:
                    nextslot = self.rehash(nextslot,len(self.slots))
                
                # Set new key, if NONE
                if self.slots[nextslot] is None:
                    self.slots[nextslot] = key
                    self.data[nextslot] = data
                    
                # Otherwise replace old value
                else:
                    self.data[nextslot] = data 

    def hashfunction(self,key,size):
        # Remainder Method
        return key%size

    def rehash(self,oldhash,size):
        # For finding next possible positions
        return (oldhash+1)%size
    
    
    def get(self,key):
        
        # Getting items given a key
        
        # Set up variables for our search
        startslot = self.hashfunction(key,len(self.slots))
        data = None
        stop = False
        found = False
        position = startslot
        
        # Keep probing until we hit an empty slot, find the key, or wrap around to the start
        while self.slots[position] is not None and not found and not stop:
            
            if self.slots[position] == key:
                found = True
                data = self.data[position]
                
            else:
                position = self.rehash(position,len(self.slots))
                # after the rehash if we end up on startslot again stop 
                # and return data
                if position == startslot:
                    stop = True
        return data

    # Special Methods for use with Python indexing
    def __getitem__(self,key):
        return self.get(key)

    def __setitem__(self,key,data):
        self.put(key,data)

Let's see it in action!

In [21]:
h = HashTable(5)

In [22]:
# Put our first key in
h[1] = 'one'

In [23]:
h[2] = 'two'

In [24]:
h[3] = 'three'

In [25]:
h[1]

'one'

In [26]:
h[1] = 'new_one'

In [27]:
h[1]

'new_one'

In [28]:
print(h[3])

three
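
The method list in the Map section also names `len()` and `in`, which the class above does not provide, and its `put` keeps probing forever once every slot is taken. The following is a minimal sketch of how those gaps might be filled, assuming integer keys, the remainder hash and linear probing as before. `ProbingMap` and `_find` are hypothetical names, and deletion is left out.

```python
class ProbingMap:
    """Hypothetical compact map: remainder hash plus linear probing."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.data = [None] * size
        self.count = 0

    def _find(self, key):
        """Return the slot holding key, else the first empty slot, else None if full."""
        position = key % self.size
        for _ in range(self.size):
            if self.slots[position] is None or self.slots[position] == key:
                return position
            position = (position + 1) % self.size
        return None

    def put(self, key, value):
        position = self._find(key)
        if position is None:
            raise ValueError("hash table is full")
        if self.slots[position] is None:
            self.slots[position] = key
            self.count += 1
        self.data[position] = value

    def get(self, key):
        position = self._find(key)
        if position is None or self.slots[position] is None:
            return None
        return self.data[position]

    def __len__(self):
        return self.count

    def __contains__(self, key):
        position = self._find(key)
        return position is not None and self.slots[position] == key

    # Support m[key] and m[key] = value
    __getitem__ = get
    __setitem__ = put


m = ProbingMap(5)
m[1] = 'one'
m[6] = 'six'      # collides with key 1, lands in the next free slot
print(len(m))     # 2
print(6 in m)     # True
print(16 in m)    # False
print(m[6])       # six
```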


# Bubble sort 
**items bubbling up whilst doing exchanges**

Check out the resources below for a review of Bubble sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Bubble_sort)
* [Visual Algo](http://visualgo.net)
* [Animation](http://www.cs.armstrong.edu/liang/animation/web/BubbleSort.html)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/bubble-sort)

In [29]:
# TERMINATION CONDITION IMPROVES BEST CASE PERFORMANCE TO O(N)
def bubble_sort(arr):
    # For every element (arranged backwards)
    for n in range(len(arr)-1,0,-1):
        noswaps = True
        # this ensures the inner loop doesn't perform extra comparison operations
        for k in range(n):
            # If we come to a point to switch
            if arr[k]>arr[k+1]:
                noswaps = False
                arr[k], arr[k+1] = arr[k+1], arr[k]
        if noswaps:
            break

In [30]:
arr = [3,2,13,4,6,5,7,8,1,20]
bubble_sort(arr)

In [31]:
arr

[1, 2, 3, 4, 5, 6, 7, 8, 13, 20]
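
To see what the early-termination check buys us, here is a small sketch that counts comparisons. The name `bubble_sort_counting` is hypothetical; the logic mirrors the function above but returns the number of comparisons made.

```python
def bubble_sort_counting(arr):
    """Bubble sort with early exit; returns the number of comparisons made."""
    comparisons = 0
    for n in range(len(arr) - 1, 0, -1):
        swapped = False
        for k in range(n):
            comparisons += 1
            if arr[k] > arr[k + 1]:
                arr[k], arr[k + 1] = arr[k + 1], arr[k]
                swapped = True
        # A pass without swaps means the list is already sorted
        if not swapped:
            break
    return comparisons


print(bubble_sort_counting(list(range(1, 11))))      # 9: one pass over sorted input
print(bubble_sort_counting(list(range(10, 0, -1))))  # 45: every pass is needed
```

For already sorted input a single pass of n-1 comparisons is enough, while reversed input needs all n(n-1)/2 comparisons.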

# Selection Sort

The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires n−1 passes to sort n items, since the final item must be in place after the (n−1) st pass.

# Resources for Review

Check out the resources below for a review of Selection sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Selection_sort)
* [Visual Algo](http://visualgo.net)
* [Animation](http://cs.armstrong.edu/liang/animation/web/SelectionSort.html)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/selection-sort)

In [32]:
def selection_sort(arr):
    
    # For every slot in array
    # go in reverse
    for fillslot in range(len(arr)-1,0,-1):
        positionOfMax=0
        
        # Scan positions 1..fillslot for the largest value
        for location in range(1,fillslot+1):
            # Set maximum's location
            if arr[location]>arr[positionOfMax]:
                positionOfMax = location

        # perform the swap 
        temp = arr[fillslot]
        arr[fillslot] = arr[positionOfMax]
        arr[positionOfMax] = temp

In [33]:
arr = [3,5,2,7,6,8,12,40,21]
selection_sort(arr)
arr

[2, 3, 5, 6, 7, 8, 12, 21, 40]

# Insertion Sort

Insertion Sort builds the final sorted array (or list) one item at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. 

![Image](md_images/insert-sort.gif)


* Always maintains a sorted sublist in the lower positions of the list  

![Image](md_images/insertion-sort.png)

# Resources for Review

Check out the resources below for a review of Insertion sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Insertion_sort)
* [Visual Algo](http://visualgo.net)
* [Animation](http://cs.armstrong.edu/liang/animation/web/InsertionSort.html)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/insertion-sort)

In [34]:
def insertion_sort(arr):
    
    # For every index in array
    for i in range(1,len(arr)):
        
        # Set current values and position
        currentvalue = arr[i]
        position = i
        
        # Sorted Sublist
        while position > 0 and arr[position-1] > currentvalue:
            # shift items to the right if they are bigger
            arr[position] = arr[position-1]
            position = position - 1
        # you need to write the value in the correct position after the loop          
        arr[position] = currentvalue
        
    return arr

In [35]:
arr =[3,5,4,6,8,1,2,12,41,25]
insertion_sort(arr)

[1, 2, 3, 4, 5, 6, 8, 12, 25, 41]

# Shell Sort

The method starts by sorting pairs of elements far apart from each other, then progressively reducing the gap between elements to be compared. By starting with far apart elements, it can move some out-of-place elements into position faster than a simple nearest neighbor exchange.

![Image](md_images/shell-sort.png)

* Do insertion sort on sublists 
* Do a final insertion sort pass, which needs fewer operations because the list is already nearly sorted

# Resources for Review

Check out the resources below for a review of Shell sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Shellsort)
* [Visual Algo](http://visualgo.net)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/shell-sort)

The shell sort improves on the insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted using an insertion sort. **The unique way that these sublists are chosen is the key to the shell sort**. Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment i, sometimes called the gap, to create a sublist by choosing all items that are i items apart.
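
To make the gap idea concrete, the sketch below only lists which items fall into each sublist for a given gap; it does not sort anything. The helper name `gap_sublists` is hypothetical.

```python
def gap_sublists(arr, gap):
    """Return the sublists formed by taking every gap-th item from each start."""
    return [arr[start::gap] for start in range(gap)]


arr = [45, 67, 23, 45, 21, 24, 7, 2, 6, 4, 90]
gap = len(arr) // 2  # 5, the first gap a halving scheme would use

for sublist in gap_sublists(arr, gap):
    print(sublist)
# [45, 24, 90]
# [67, 7]
# [23, 2]
# [45, 6]
# [21, 4]
```

Each of these sublists is insertion-sorted in place, then the gap is halved and the process repeats until the gap is 1.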

In [36]:
def shell_sort(arr):
    sublistcount = len(arr)//2
    # While we still have sub lists
    while sublistcount > 0:
        for start in range(sublistcount):
            # Use a gap insertion
            gap_insertion_sort(arr, start, sublistcount)
        sublistcount = sublistcount // 2

def gap_insertion_sort(arr,start,gap):
#     print(f'arr: {arr}')
#     print(f'start: {start}')
#     print(f'gap: {gap}')
    for i in range(start + gap, len(arr), gap):
#         print(f'index : {i}, value : {arr[i]}')
        currentvalue = arr[i]
        position = i
        # Using the Gap
        while position >= gap and arr[position-gap] > currentvalue:
            # shift the larger element one gap to the right
            arr[position] = arr[position-gap]
            position = position - gap
        
        # Set current value
        arr[position] = currentvalue

In [37]:
arr = [45,67,23,45,21,24,7,2,6,4,90]
shell_sort(arr)
arr

[2, 4, 6, 7, 21, 23, 24, 45, 45, 67, 90]

# Merge Sort

Merge sort is a recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list. 

![Image](md_images/merge-sort.gif)

Here is a graphic showing the mergesort algorithm :

![Image](md_images/ms.png)

# Resources for Review

Check out the resources below for a review of Merge sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Merge_sort)
* [Visual Algo](http://visualgo.net)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/merge-sort)

In [38]:
def merge_sort(arr):
    
    if len(arr)>1:
        mid = len(arr)//2
        lefthalf = arr[:mid]
        righthalf = arr[mid:]

        # print(lefthalf,righthalf)
        # recursive call stops when the list has one item         
        merge_sort(lefthalf)
        merge_sort(righthalf)
        
        # sorting part 
        i=0
        j=0
        k=0
        
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                arr[k]=lefthalf[i]
                i=i+1
            else:
                arr[k]=righthalf[j]
                j=j+1
            k=k+1

        # deal with the leftovers
        while i < len(lefthalf):
            arr[k]=lefthalf[i]
            i=i+1
            k+=1

        while j < len(righthalf):
            arr[k]=righthalf[j]
            j=j+1
            k=k+1

In [39]:
arr = [11,2,5,4,7,6,8,1,23]
merge_sort(arr)
arr

[1, 2, 4, 5, 6, 7, 8, 11, 23]
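
The implementation above merges back into `arr` directly. To isolate the merge step itself, here is a sketch of a variant that returns a new list instead of sorting in place. The names `merge` and `merge_sorted` are hypothetical.

```python
def merge(left, right):
    """Combine two sorted lists into a single new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= keeps equal items in their original order
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two lists may still have leftovers
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sorted(arr):
    """Merge sort that returns a new list and leaves arr untouched."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    return merge(merge_sorted(arr[:mid]), merge_sorted(arr[mid:]))


print(merge([1, 4, 7], [2, 3, 9]))                    # [1, 2, 3, 4, 7, 9]
print(merge_sorted([11, 2, 5, 4, 7, 6, 8, 1, 23]))    # [1, 2, 4, 5, 6, 7, 8, 11, 23]
```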

# Quick Sort

Quick sort uses divide and conquer to gain the same advantages as the merge sort while not using additional storage. However, as a trade-off, it is possible that the list may not be divided in half. When the splits are very uneven, performance degrades, down to $O(n^2)$ in the worst case.

A quick sort first selects a value, which is called the pivot value. Although there are many different ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.

1. Pick an element, called a pivot, from the array.
2. Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.
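
The three steps above can be sketched in a few lines if we give up the in-place property. This is only an illustration: `quick_sort_simple` is a hypothetical name, and unlike the quick sort described in this section it builds new lists at every level instead of partitioning the original array.

```python
def quick_sort_simple(arr):
    """Quick sort sketch that returns a new list (not in place)."""
    if len(arr) <= 1:
        return list(arr)

    # Step 1: pick a pivot (here, the first item)
    pivot = arr[0]

    # Step 2: partition the remaining items around the pivot
    smaller = [x for x in arr[1:] if x < pivot]
    larger = [x for x in arr[1:] if x >= pivot]

    # Step 3: sort each side recursively and put the pivot between them
    return quick_sort_simple(smaller) + [pivot] + quick_sort_simple(larger)


print(quick_sort_simple([2, 5, 4, 6, 7, 3, 1, 4, 12, 11]))
# [1, 2, 3, 4, 4, 5, 6, 7, 11, 12]
```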

![Image](md_images/quick-sort.png)

# Resources for Review

Check out the resources below for a review of Quick sort!

* [Wikipedia](https://en.wikipedia.org/wiki/Quicksort)
* [Visual Algo](http://visualgo.net)
* [Sorting Algorithms Animation with Pseudocode](http://www.sorting-algorithms.com/quick-sort)

In [40]:
def quick_sort(arr):
    
    quick_sort_help(arr,0,len(arr)-1)

def quick_sort_help(arr,first,last):
    
    if first<last:
        
        # find the split point
        splitpoint = partition(arr,first,last)
        # split the list 
        quick_sort_help(arr,first,splitpoint-1)
        quick_sort_help(arr,splitpoint+1,last)


def partition(arr,first,last):
    
    pivotvalue = arr[first]

    leftmark = first+1
    rightmark = last

    done = False
    while not done:
        # increment leftmark
        while leftmark <= rightmark and arr[leftmark] <= pivotvalue:
            leftmark += 1
        # decrement rightmark
        while arr[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark -= 1
        # stop once we find the split point
        if rightmark < leftmark:
            done = True
        # swap left and right mark 
        else:
            temp = arr[leftmark]
            arr[leftmark] = arr[rightmark]
            arr[rightmark] = temp
    
    # do the final swap 
    temp = arr[first]
    arr[first] = arr[rightmark]
    arr[rightmark] = temp
    return rightmark

In [41]:
arr = [2,5,4,6,7,3,1,4,12,11]
quick_sort(arr)
arr

[1, 2, 3, 4, 4, 5, 6, 7, 11, 12]