__The content of this notebook is the same as in the BASIC notebook, but it is shortened for better readability and memorability__

## SORTING ALGORITHMS (nLogn)

## Merge Sort (mid)
Usage: sorting __linked lists__, inversion count in nearly sorted array  
( i < j, but A[i] > A[j] ), external sort (data too big for memory)

In [1]:
# time c. O(nLogn), space c. O(n)
def merge_sort(arr):
    # arr of length 1 - returned as is
    if len(arr) > 1:
        mid = len(arr) // 2
        left =  merge_sort(arr[ :mid ])           # sort the first half
        right = merge_sort(arr[ mid: ])           # sort the second half
        i, j, k = 0, 0, 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                arr[k] = left[i]
                i += 1
            else:
                arr[k] = right[j]
                j += 1
            k += 1
        while i < len(left):
            arr[k] = left[i]
            i += 1
            k += 1
        while j < len(right):
            arr[k] = right[j]
            j += 1
            k += 1
    return arr

In [2]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for myarr in [a, b, c, d]:
    print(myarr, end=' ')
    print('=> ', merge_sort(myarr))

[8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55] =>  [1, 2, 4, 5, 7, 8, 25, 29, 38, 45, 55, 72, 73, 74, 97, 101]
[8, 7, 4, 2, 1] =>  [1, 2, 4, 7, 8]
[8, 7, 2, 11] =>  [2, 7, 8, 11]
[8, 7] =>  [7, 8]
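
The inversion count mentioned under Usage can be obtained from the same merge step. The following is a minimal sketch, not part of the original notebook; the helper name `count_inversions` is an assumption. Whenever an element of the right half is merged before the remaining elements of the left half, each of those remaining elements forms one inversion with it.

```python
def count_inversions(arr):
    """Return (sorted_copy, inversions). Illustrative helper, not from the notebook."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, inv_left = count_inversions(arr[:mid])
    right, inv_right = count_inversions(arr[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

print(count_inversions([2, 4, 1, 3, 5]))   # ([1, 2, 3, 4, 5], 3)
```

The three inversions here are the pairs (2, 1), (4, 1) and (4, 3).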


## Quick Sort (pivot)
Pick __pivot__ element (_first, last, random, median_),  
and __partition__ array: all smaller elems before pivot,  
greater elements after pivot  
Usage: sorting arrays

In [11]:
# time c. O(nLogn) on average, worst case O(n^2)
# space c. O(logn) on average (recursion stack), qualifies as in-place
def quick_sort(arr, low, high):
    if low < high:  
        pi = partition(arr, low, high)                        # pi = partitioning index, arr[pi] at right place  
        quick_sort(arr, low, pi-1)                            # sort elements before and after partition
        quick_sort(arr, pi+1, high)       
        
def partition(arr, low, high):        
    pivot = arr[high]                                          # pivot
    i = low - 1                                                # index of smaller element      
    for j in range(low , high):        
        if  arr[j] < pivot:                                    # current element smaller than pivot            
            i += 1                                             # increment index of smaller element
            arr[i], arr[j] = arr[j], arr[i]
    # place pivot at its final position, between the smaller and the greater elements
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i+1
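
With the last element as pivot, an already sorted array triggers the O(n^2) worst case. One common mitigation is the random pivot choice listed above. The following is a hedged sketch; the name `randomized_quick_sort` and its default arguments are assumptions, and `partition` is repeated only to keep the block self-contained.

```python
import random

def partition(arr, low, high):
    # Lomuto partition with the last element as pivot
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def randomized_quick_sort(arr, low=0, high=None):
    # hypothetical helper: sorts arr[low..high] in place
    if high is None:
        high = len(arr) - 1
    if low < high:
        # move a randomly chosen element to the end so it becomes the pivot
        r = random.randint(low, high)
        arr[r], arr[high] = arr[high], arr[r]
        pi = partition(arr, low, high)
        randomized_quick_sort(arr, low, pi - 1)
        randomized_quick_sort(arr, pi + 1, high)

data = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5]
randomized_quick_sort(data)
print(data)   # [1, 2, 4, 5, 7, 8, 25, 29, 38, 45]
```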

## Heap Sort
Usage: sort nearly sorted array,  
k largest (smallest) elems  
__Complete binary tree__ = every level filled,  
except possibly last, where nodes are as far left as possible  
__Binary Heap__ = cbt where parent >= (<=) children  
Procedure: build max heap; swap max w/ last elem;  
reduce heap by 1; heapify root; repeat  
Array repr:  
root = arr[0];  
for any i-th node arr[i]:    
a) arr[(i-1)//2]	= parent node  
b) arr[(2\*i)+1]	= left child  
c) arr[(2\*i)+2]	= right child
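
The index arithmetic above can be checked with a few lines. This is only an illustrative sketch; the helper names `parent`, `left_child`, `right_child` and `is_max_heap` are assumptions and do not appear elsewhere in the notebook.

```python
def parent(i):
    return (i - 1) // 2          # integer division

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def is_max_heap(arr):
    # every non-root node must not exceed its parent
    return all(arr[parent(i)] >= arr[i] for i in range(1, len(arr)))

heap = [45, 38, 29, 8, 7, 4, 25]
print(parent(4), left_child(1), right_child(1))   # 1 3 4
print(is_max_heap(heap))                          # True
print(is_max_heap([1, 2, 3]))                     # False
```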

In [1]:
# time O(nLogn), space in-place
def heap_sort(arr):        
    n = len(arr)    
    for i in range(n // 2 - 1, -1, -1):               # build max heap, starting from the last parent node
        heapify(arr, n, i)    
    for i in range(n-1, 0, -1):                       # One by one extract elements 
        arr[i], arr[0] = arr[0], arr[i]               # swap 
        heapify(arr, i, 0)       

def heapify(arr, n, i):        
    largest = i                                       # find largest among root and children
    l = 2 * i + 1     
    r = 2 * i + 2   
    if (l < n and arr[i] < arr[l]):                   
        largest = l   
    if (r < n and arr[largest] < arr[r]):              
        largest = r    
    if (largest != i):                                # If root is not largest, swap with largest and continue heapifying
        arr[i],arr[largest] = arr[largest],arr[i]           
        heapify(arr, n, largest)                      

In [2]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for arr in [a, b, c, d]:
    print(arr, end=' ')
    heap_sort(arr)                         # in-place sorting    
    print('=> ', arr)

[8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55] =>  [1, 2, 4, 5, 7, 8, 25, 29, 38, 45, 55, 72, 73, 74, 97, 101]
[8, 7, 4, 2, 1] =>  [1, 2, 4, 7, 8]
[8, 7, 2, 11] =>  [2, 7, 8, 11]
[8, 7] =>  [7, 8]
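
For the "k largest (smallest) elems" use case, the standard library module `heapq` already provides `nlargest` and `nsmallest`. The sketch below also shows a manual variant that keeps a min-heap of size k; the function name `k_largest` is an assumption made for illustration.

```python
import heapq

data = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5]
print(heapq.nlargest(3, data))    # [45, 38, 29]
print(heapq.nsmallest(3, data))   # [1, 2, 4]

def k_largest(iterable, k):
    # hypothetical helper: min-heap holding the k largest items seen so far
    if k <= 0:
        return []
    heap = []
    for x in iterable:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k, so replace it
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

print(k_largest(data, 3))         # [45, 38, 29]
```

The manual variant runs in O(n log k) time and needs only O(k) extra space, which helps when the input is a stream.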


## SEARCH ALGORITHMS

## Linear / Sequential Search
* Worst-case performance: O(n)
* Best-case performance: O(1)
* Average performance: O(n)
* Worst-case space complexity: O(1) iterative

Linear search is rarely used in practice because other approaches, such as binary search and hash tables, offer significantly better performance.

In [9]:
def linear_search(arr, x): 
  
    for i in range (0, len(arr)): 
        if (arr[i] == x): 
            return i
    return -1

In [None]:
# If we know the list is ordered, we only have to check until we find the element or an element greater than it
def ordered_seq_search(arr,ele):
    """
    Sequential search for an Ordered list
    """
    # Start at position 0
    pos = 0
    
    # Target becomes true if ele is in the list
    found = False
    
    # Stop marker
    stopped = False
    
    # go until end of list
    while pos < len(arr) and not found and not stopped:
        
        # If match
        if arr[pos] == ele:
            found = True
            
        else:
            
            # Check if element is greater
            if arr[pos] > ele:
                stopped = True
                
            # Otherwise move on
            else:
                pos  = pos+1
    
    return found

In [10]:
arr = [ 2, 3, 4, 10, 40 ]
x = 10
result = linear_search(arr, x)
if(result == -1): 
    print("Element is not present in array") 
else: 
    print("Element is present at index", result)

Element is present at index 3


## Binary Search
Sorted list => reduces time complexity to __O(Log n)__  
Auxiliary Space: __O(1) iterative__ implementation, __O(Logn) recursion__

In [15]:
# time O(Logn), space O(1)
def binary_search(arr, value):    
    if len(arr) == 0: return None    
    min_idx, max_idx= 0, len(arr)        
    while min_idx < max_idx:
        mid = (min_idx + max_idx) // 2    
        if arr[mid] == value:
            return mid
        elif arr[mid] < value:
            min_idx = mid + 1
        else: max_idx = mid    
    return None

In [16]:
# recursive
# have to keep arr intact and pass array bounds to recursion to get the correct mid point index
# if array bounds are not passed, only the boolean version works (found / not found)
def binary_search_rec(arr, value, start=None, end=None):

    # search the half-open range arr[start:end]
    if start is None:
        start = 0
    if end is None:
        end = len(arr)

    # empty range: value is not present
    if start >= end:
        return None

    mid = (start + end) // 2
    if arr[mid] == value:
        return mid

    elif arr[mid] > value:
        return binary_search_rec(arr, value, start=start, end=mid)

    else:
        # keep the current upper bound instead of resetting it to len(arr)
        return binary_search_rec(arr, value, start=mid + 1, end=end)

In [20]:
array = [1,2,3,4,5,6,7,8,9]
num = 8
print(binary_search(array, num))
print(binary_search_rec(array, num))

7
7
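
In everyday code the standard library module `bisect` covers the same ground. The following sketch wraps `bisect_left` so that it returns an index or None like the functions above; the wrapper name `index_of` is an assumption.

```python
from bisect import bisect_left

def index_of(sorted_arr, value):
    # bisect_left returns the leftmost position where value could be inserted
    i = bisect_left(sorted_arr, value)
    if i < len(sorted_arr) and sorted_arr[i] == value:
        return i
    return None

array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(index_of(array, 8))    # 7
print(index_of(array, 10))   # None
```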


## Hash Table
___

Hash Table with Hash Functions (mapping). __Python's dictionary => Hash Table__

Methods:

* **HashTable()** returns an empty map collection.
* **put(key,val)** add a new key-value pair, if key in the map - replace with the new value.
* **get(key)** return the value for a given key or None otherwise.
* **del** delete key-value pair: del map[key].
* **len()** number of key-value pairs 
* key __in__ map: True if key in map, False otherwise
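
For comparison, here is how the operations listed above look with Python's built-in `dict`. This is a small sketch; note that `dict` has no `put` method and uses item assignment instead.

```python
table = {}                 # empty map
table[1] = 'one'           # put(key, val)
table[1] = 'uno'           # same key: value is replaced
print(table.get(1))        # uno
print(table.get(2))        # None (key not present)
print(len(table))          # 1
print(1 in table)          # True
del table[1]               # delete key-value pair
print(1 in table)          # False
```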

## Hash function
Two heuristic methods:

__Hashing by division__ (mod method):  
Map a key into one of the slots of table by taking the remainder of key divided by table_size:  
__h(key) = key % table_size__

Fast - single division.  
Avoid certain values of table_size: if table_size = 2^p, then h(key) is just the p lowest-order bits of key. Unless all low-order p-bit patterns are known to be equally likely, it is better to make the hash function depend on all the bits of the key.  
Best results when table_size is prime, with one additional restriction: if r = number of possible character codes on a computer and r % table_size = 1, then h(key) = key % table_size reduces to the sum of the character codes in key, taken % table_size, so such primes should be avoided.

Example:  
Suppose r = 256 and table_size = 17, in which r % table_size i.e. 256 % 17 = 1.  
Key = 37599, its hash is 37599 % 17 = 12  
But for key = 573, its hash is also 573 % 17 = 12 - collision

A prime not too close to an exact power of 2 is often good choice for table_size.  
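
The effect of r % table_size = 1 can be verified numerically. The sketch below treats an integer key as a string of base-256 "characters" and compares the division hash with the byte sum taken modulo the table size; the helper names `hash_division` and `byte_sum` are assumptions made for illustration.

```python
def hash_division(key, table_size):
    return key % table_size

def byte_sum(key):
    # sum of the base-256 digits (bytes) of a non-negative integer key
    n_bytes = (key.bit_length() + 7) // 8 or 1
    return sum(key.to_bytes(n_bytes, 'big'))

table_size = 17                       # 256 % 17 == 1
for key in (37599, 573):
    print(key, hash_division(key, table_size), byte_sum(key) % table_size)
# 37599 12 12
# 573 12 12
```

Both keys land in slot 12, and in each case the hash equals the byte sum modulo 17, so any permutation of the same bytes would collide as well.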

__Hashing by multiplication__:  
Multiply key k by constant real number c, 0 < c < 1, => extract fractional part => multiply this by table_size m and take floor:  
__h(k) = floor (m * frac (k * c))__  
or  
h(k) = floor (m * (k * c mod 1))  
floor(x) from math.h yields integer part of real number x, and frac(x) yields fractional part (frac(x) = x - floor(x))

Value of m is not critical, typically choose a power of 2 (m = 2^p for some integer p)  

Example:

Suppose k = 123456, p = 14,  
m = 2^14 = 16384, and w = 32.  
Adapting Knuth's suggestion, choose c to be the fraction of the form s / 2^32 closest to (sqrt(5) - 1) / 2, so s = 2654435769.  
Then k * s = 327706022297664 = (76300 * 2^32) + 17612864,  
so r1 = 76300 and r0 = 17612864.  
The 14 most significant bits of r0 yield the value h(k) = 67.
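
The worked example can be reproduced in a few lines. This is a sketch under the assumptions of the example (w = 32, p = 14, s = 2654435769); the function names are hypothetical. The integer version uses only a multiplication, a mask and a shift, while the floating-point version follows the formula h(k) = floor(m * frac(k * c)) directly.

```python
import math

def hash_multiplication(k, p=14, w=32, s=2654435769):
    # s / 2**w approximates (sqrt(5) - 1) / 2
    r0 = (k * s) % (2 ** w)      # low-order w bits of the product
    return r0 >> (w - p)         # p most significant bits of r0

def hash_multiplication_float(k, m=2 ** 14, c=2654435769 / 2 ** 32):
    # h(k) = floor(m * frac(k * c))
    return math.floor(m * ((k * c) % 1))

print(hash_multiplication(123456))         # 67
print(hash_multiplication_float(123456))   # 67
```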

In [7]:
class HashTable:
    
    def __init__(self, size):
        
        # Set up size and keys and values
        self.size = size
        self.keys = [None] * self.size
        self.values = [None] * self.size
        
    def put(self, key, value):
        
        # Note: only integer keys are supported, to keep the hash function simple
        position = self.hashfunction(key, len(self.keys))

        # Linear probing: visit each slot at most once
        for _ in range(self.size):

            # Empty slot: store the new pair
            if self.keys[position] is None:
                self.keys[position] = key
                self.values[position] = value
                return

            # Key already present: replace its value
            if self.keys[position] == key:
                self.values[position] = value
                return

            # Slot taken by a different key: probe the next one
            position = self.rehash(position, len(self.keys))

        # Every slot is occupied by other keys (the original loop would never end here)
        raise RuntimeError("hash table is full")

    def hashfunction(self, key, size):
                
        # Remainder Method
        return key%size

    def rehash(self, oldhash, size):
                
        # For finding next possible keys
        return (oldhash+1)%size
    
    
    def get(self, key):
        
        # Get value by key        
        # Set up variables for search
        startkey = self.hashfunction(key, len(self.keys))
        value = None        
        found = False
        stop = False
        position = startkey
        
        # Probe until an empty slot is hit, the key is found, or we are back at the start
        while self.keys[position] is not None and not found and not stop:
            
            if self.keys[position] == key:
                found = True
                value = self.values[position]
                
            else:
                position=self.rehash(position, len(self.keys))
                                
                if position == startkey:                    
                    stop = True
                                        
        return value

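    # Special methods so the table supports h[key], h[key] = value and `key in h`
    # https://stackoverflow.com/questions/43627405/understanding-getitem-method
    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, value):
        self.put(key, value)

    def __contains__(self, key):
        # Without this method `in` falls back to __getitem__ and compares values, not keys
        # (a key whose stored value is None is reported as absent)
        return self.get(key) is not None

In [8]:
h = HashTable(5)

# Put our first keys in
h[0] = 'one'
h[2] = 'two'
h[3] = 'three'
print(h[0])

one


In [9]:
h[1] = 'new_one'
h[1]

'new_one'

In [10]:
print(h[4])

None


In [11]:
2 in h                                     # membership is tested on keys

True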

## Levenshtein (Edit) Distance
https://stackoverflow.com/questions/2460177/edit-distance-in-python  
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python

In [4]:
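def levenshtein(s1, s2):
    # keep s1 as the shorter string so each row has len(s1) + 1 entries
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    # distances from the empty prefix of s2 to every prefix of s1
    distances = range(len(s1) + 1)
    for idx2, char2 in enumerate(s2):
        # first entry: distance from s2[:idx2+1] to the empty prefix of s1
        distances_ = [ idx2+1 ]
        for idx1, char1 in enumerate(s1):
            if char1 == char2:
                distances_.append( distances[idx1] )
            else:
                # 1 + cheapest of substitution, deletion, insertion
                distances_.append( 1 + min(distances[idx1],
                         distances[idx1+1], distances_[-1]) )
        distances = distances_
    return distances[-1]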

In [5]:
levenshtein('aborigenous', 'sc')

11

## Iterators

__Iterators in Python__  
Iterators are everywhere in Python. They are elegantly implemented within for loops, comprehensions, generators etc. but are hidden in plain sight.

An iterator in Python is simply an object that can be iterated upon: an object which returns data one element at a time.

Technically speaking, a Python iterator object must implement two special methods, `__iter__()` and `__next__()`, collectively called the iterator protocol.

An object is called iterable if we can get an iterator from it. Most built-in containers in Python, such as list, tuple and string, are iterables.

The iter() function (which in turn calls the `__iter__()` method) returns an iterator from them.

__Iterating Through an Iterator__  
We use the next() function to manually iterate through all the items of an iterator. When we reach the end and there is no more data to be returned, it raises the StopIteration exception. Following is an example.

In [13]:
my_list = [4, 7, 0, 3]
my_iter = iter(my_list)                                                             # get an iterator using iter()

# iterate
print(next(my_iter))
print(next(my_iter))

print(my_iter.__next__())                                                           # next(obj) is same as obj.__next__()
print(my_iter.__next__())

next(my_iter)                                                                       # This will raise error, no items left

4
7
0
3


StopIteration: 

__Building Custom Iterators__  
Building an iterator from scratch is easy in Python. We just have to implement the `__iter__()` and the `__next__()` methods.

The `__iter__()` method returns the iterator object itself. If required, some initialization can be performed.

The `__next__()` method must return the next item in the sequence. On reaching the end, and in subsequent calls, it must raise StopIteration.

Here, we show an example that gives us the next power of 2 in each iteration. The exponent starts at zero and goes up to a user-set number.

In [14]:
class PowTwo:
    """Class to implement an iterator
    of powers of two"""

    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration


# create an object
numbers = PowTwo(3)

# create an iterator from the object
i = iter(numbers)

# Using next to get to the next iterator element
print(next(i))
print(next(i))
print(next(i))
print(next(i))
print(next(i))                       # raises StopIteration, no items left

1
2
4
8


StopIteration: 

__Infinite iterators__  
The items of an iterator do not have to be exhaustible. There can be infinite iterators (which never end). We must be careful when handling such iterators.

Here is a simple example to demonstrate infinite iterators.

The built-in function iter() can be called with two arguments, where the first argument must be a callable object (function) and the second is the sentinel. The iterator calls this function until the returned value is equal to the sentinel. In the cell below int() always returns 0, which never equals the sentinel 1, so the iterator never ends.

In [16]:
print(int())
inf = iter(int,1)
print(next(inf))
print(next(inf))

0
0
0
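
The two-argument form of iter() is more often used with a sentinel that is eventually reached. The sketch below reads an in-memory text stream in fixed-size chunks until `read` returns an empty string; the stream contents and the chunk size are arbitrary choices for illustration.

```python
import io
from functools import partial

stream = io.StringIO("abcdefgh")

# stream.read(3) is called repeatedly until it returns '' (the sentinel)
chunks = list(iter(partial(stream.read, 3), ''))
print(chunks)   # ['abc', 'def', 'gh']
```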


In [18]:
class InfIter:
    """Infinite iterator to return all
        odd numbers"""

    def __iter__(self):
        self.num = 1
        return self

    def __next__(self):
        num = self.num
        self.num += 2
        return num
    
    
a = iter(InfIter())
for i in range(5):
    print(next(a))

1
3
5
7
9


Be careful to include a terminating condition when iterating over these types of infinite iterators.

The advantage of using iterators is that they save resources. As shown above, we could get all the odd numbers without storing the entire number system in memory. We can have infinite items (theoretically) in finite memory.
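
The standard library module `itertools` offers ready-made building blocks for this. As a small sketch, `count` produces an endless arithmetic sequence and `islice` supplies the terminating condition by taking only a fixed number of items.

```python
from itertools import count, islice

odds = count(start=1, step=2)        # 1, 3, 5, ... without end
print(list(islice(odds, 5)))         # [1, 3, 5, 7, 9]
print(next(odds))                    # 11, the iterator continues where it stopped
```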

There's an easier way to create iterators in Python: generators using yield.

## Generators
Python generators are a simple way of creating iterators. All the work we mentioned above is automatically handled by generators in Python.

Simply speaking, a __generator is a function that returns an iterator object__ which we can iterate over (one value at a time)

It is fairly simple to create a generator in Python. It is as easy as defining a normal function, but with a yield statement instead of a return statement.

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.

The difference is that while a return statement terminates a function entirely, a yield statement pauses the function, saving all its state, and later continues from there on successive calls.

__Differences between Generator function and Normal function__  
Here is how a generator function differs from a normal function.

* Generator function contains one or more yield statements.
* When called, it returns an object (iterator) but does not start execution immediately.
* Methods like `__iter__()` and `__next__()` are implemented automatically. So we can iterate through the items using next().
* Once the function yields, the function is paused and the control is transferred to the caller.
* Local variables and their states are remembered between successive calls.
* Finally, when the function terminates, StopIteration is raised automatically on further calls.
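
These points can be observed directly. The following sketch uses a hypothetical generator function `countdown` that records in a list named `log` when its body starts and finishes; both names are assumptions made for illustration.

```python
log = []

def countdown(n):
    log.append('started')
    while n > 0:
        yield n              # pause here and hand n to the caller
        n -= 1               # resumes here on the next call to next()
    log.append('finished')

gen = countdown(2)
print(log)                   # []  (calling the function did not run its body)
print(iter(gen) is gen)      # True (a generator is its own iterator)
print(next(gen))             # 2
print(log)                   # ['started']
print(next(gen))             # 1
try:
    next(gen)                # body runs to the end, then StopIteration is raised
except StopIteration:
    print('exhausted')
print(log)                   # ['started', 'finished']
```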

__Memory Efficiency__: A normal function that returns a sequence creates the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.

Generator implementation of such sequences is memory friendly and is preferred since it only produces one item at a time
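
A rough way to see the difference is `sys.getsizeof`. This is only a sketch: the exact byte counts depend on the Python version and platform, and for the list it measures the list object itself, not the integers it refers to.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]     # all items exist at once
squares_gen = (i * i for i in range(n))      # items are produced on demand

print(sys.getsizeof(squares_list))           # grows with n
print(sys.getsizeof(squares_gen))            # small and independent of n
print(sum(squares_gen) == sum(squares_list)) # True, same values either way
```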

Generators can be implemented in a clear and concise way compared to their iterator class counterpart. Following is an example that implements a sequence of powers of 2 using a generator function instead of an iterator class.

In [25]:
def PowTwoGen(max=0):
    n = 0
    while n < max:                   # exclusive bound, unlike PowTwo above (n <= max)
        yield 2 ** n
        n += 1

y = PowTwoGen(3)
for i in range(5):
    print(next(y))

1
2
4


StopIteration: 

Generators are an excellent medium for representing an __infinite stream of data__, which is never stored in memory as a whole.

In [None]:
# do not run :)
def all_even():
    n = 0
    while True:
        yield n
        n += 2

Multiple generators can be used to __pipeline a series of operations__. E.g. the sum of squares of the first 10 Fibonacci numbers. Result: efficient, easy to read, a lot cooler!

In [19]:
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x+y
        yield x

def square(nums):
    for num in nums:
        yield num**2

print(sum(square(fibonacci_numbers(10))))

4895
