# INTERVIEW PREP
* __writing code on a whiteboard__ (see _Recommendations Combined.docx_)
* details of __O(nlogn) sorts__ (at least one, maybe quick sort and merge sort - merge sort can be highly useful in situations where quick sort is impractical: which ones?);
* __binary search__ (write and debug); other search algos?
* Optional, from another [article](http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html): know at least one flavor of __balanced binary tree__: red/black, splay or AVL tree (how it's implemented)
* __hashtables__: the single most important data structure; _know_ how they work; _implement one using only arrays_ during the interview (45 min);
* __edit distance__ and other __DP__ algos;
* __BST traversals__ (various, left-to-right, etc.);
* __BFS & DFS__, and know the difference between __inorder, postorder and preorder__. [More here](http://www.geeksforgeeks.org/fundamentals-of-algorithms/);
* __Graphs__ - important at Google; __3 representations__ in memory (objects and pointers, matrix, and adjacency list); know each representation, pros & cons. Implement basic __graph traversal algorithms__: BFS & DFS (computational complexity, tradeoffs).
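
For the hashtable bullet above, one way to build a table from plain arrays is open addressing with linear probing. The sketch below is only an illustration under stated assumptions: the class name `ArrayHashTable`, its method names, the initial capacity of 8 and the 0.5 load-factor limit are all hypothetical choices, Python lists stand in for fixed-size arrays, and deletion (which would need tombstones) is left out.

```python
class ArrayHashTable:
    """Open-addressing hash table backed by two parallel lists (illustrative sketch)."""

    _EMPTY = object()                                # marks a slot that was never used

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _probe(self, key):
        """Return the slot holding key, or the first empty slot on its probe path."""
        capacity = len(self._keys)
        idx = hash(key) % capacity
        while self._keys[idx] is not self._EMPTY and self._keys[idx] != key:
            idx = (idx + 1) % capacity               # linear probing: try the next slot
        return idx

    def put(self, key, value):
        if (self._size + 1) * 2 > len(self._keys):   # keep load factor <= 0.5
            self._resize(2 * len(self._keys))
        idx = self._probe(key)
        if self._keys[idx] is self._EMPTY:
            self._size += 1
        self._keys[idx] = key
        self._values[idx] = value

    def get(self, key, default=None):
        idx = self._probe(key)
        if self._keys[idx] is self._EMPTY:
            return default
        return self._values[idx]

    def _resize(self, capacity):
        """Allocate bigger arrays and re-insert every stored pair."""
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not self._EMPTY]
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0
        for k, v in old:
            self.put(k, v)


table = ArrayHashTable()
for n in range(20):
    table.put("k%d" % n, n)
table.put("k7", 70)                                  # overwrite an existing key
print(table.get("k7"), table.get("missing"))
```

Because the load factor never exceeds 0.5, every probe sequence reaches an empty slot, so lookups terminate; average cost per operation stays O(1) under the usual uniform-hashing assumption.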

Optional, from another [article](http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html):
* Study fancier graph algorithms: __Dijkstra and A*__ - they're really great. ADVICE: if you get __any problem - first think graphs__: fundamental and flexible way of representing relationships, 50-50 chance that any interesting design problem has a graph involved!
* Learn data structures and algos, should especially know the most famous classes of __NP-complete problems (traveling salesman and knapsack)__ => recognize them when __in disguise__; and what NP-complete means
* __Discrete Math__ (counting problems, probability problems, and other Discrete Math 101 situations)
* __OS__: threads and concurrency; locks, mutexes, semaphores, monitors - how they work; deadlock, livelock - how to avoid. What resources a process needs, a thread needs, how context switching works, how it's initiated by OS and underlying hardware. Know a little about scheduling.
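
As a starting point for the "fancier graph algorithms" bullet above, here is a minimal sketch of Dijkstra's shortest paths over an adjacency-list graph. The function name `dijkstra`, the dict-of-lists graph format and the sample graph are assumptions made for illustration; it uses the standard `heapq` module and assumes non-negative edge weights.

```python
import heapq


def dijkstra(graph, source):
    """graph maps node -> list of (neighbor, weight); weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]                         # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                             # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


sample_graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
shortest = dijkstra(sample_graph, "A")
print(shortest)
```

With a binary heap this runs in O((V + E) log V); pushing duplicate entries and skipping the stale ones avoids the need for a decrease-key operation.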

Data Structures and Algorithms:
* [Karya Umum Dsa](C:/Users/anedilko/Documents/04_Personal/Interviews/coding);
* [Geeks for Geeks: 500 Data Structures and Algorithms](http://www.geeksforgeeks.org/data-structures/);
* [Top 10 Algos in Interview Questions](http://www.geeksforgeeks.org/top-10-algorithms-in-interview-questions/);
* [Big-O Cheat Sheet](http://bigocheatsheet.com/) and [Big-O notation](https://www.interviewcake.com/article/java/big-o-notation-time-and-space-complexity).
* [Competitive Programmer's Handbook](https://cses.fi/book.html?utm_source=hackernewsletter&utm_medium=email&utm_term=books)

Websites Recommended by Google Software Engineers for practice:
* Top Coder: https://www.topcoder.com/
* HackerRank: https://www.hackerrank.com/
* Project Euler: https://projecteuler.net/
* LeetCode: https://leetcode.com/
* Interview Cake: https://www.interviewcake.com/
* CodeFights: https://codefights.com/

__Second company__:  
Critical  
* Arrays and lists 
* Stacks and queues
* Binary trees, Graphs, Graph traversals: BFS, DFS
* Search: __iterator__, binary, hash
* Sort: merge, quick, __bucket__
* Hash tables
* Recursion
* Complexity, Big O notation 
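
For the graph traversals in the list above, a minimal sketch of BFS and DFS over an adjacency list may help as a warm-up. The function names, the dict-of-lists representation and the sample graph are assumptions for illustration only.

```python
from collections import deque


def bfs(graph, start):
    """Breadth-first: visit nodes level by level, using a FIFO queue."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start, visited=None, order=None):
    """Depth-first (preorder): follow one branch to the end before backtracking."""
    if visited is None:
        visited, order = set(), []
    visited.add(start)
    order.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)
    return order


sample_graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(bfs(sample_graph, 1))
print(dfs(sample_graph, 1))
```

Both traversals touch every vertex and edge once, so they run in O(V + E); BFS needs memory for the widest level, DFS for the deepest path.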

Nice to Have
* __Trie__
* __Red-black trees__ (theory)
* __Randomized quicksort__ (theory)
* __Spanning tree, minimum cut__ (theory)
* Heap - part of __heap sort__ (heapify)
* __Radix sort__ (theory) 
* Dynamic programming (However: there is no need to learn or know how to do dynamic programming or memoization)
* Set

Another website for practice: https://leetcode.com/problemset/all/

__THEORY__
* __Red–black__ tree is a kind of __self-balancing BST__. Each node stores an extra bit representing color, used to ensure that the tree remains approximately balanced during insertions and deletions. Rules:


1. Every node is either red or black.  
2. Every leaf (NULL) is black.  
3. If a node is red, then both its children are black.  
4. Every simple path from a node to a descendant leaf contains the same number of black nodes  

![image.png](attachment:image.png)
Lemma  
A red-black tree with n internal nodes has __height at most 2log(n+1)__ (log base 2), so RBTs are good for search: search, insertion and deletion always run in __O(log n) time__ (in a plain BST these operations are O(h), which degrades to O(n) for an unbalanced tree)


* __Randomized quicksort__: random number to pick next __random pivot__ (or shuffle the array). `randpivot = random.randrange(start, stop)`
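
A minimal sketch of the random-pivot idea, assuming a Lomuto-style partition with an inclusive `high` index; the function names are hypothetical. Note that `random.randrange(start, stop)` excludes `stop`, hence the `high + 1`.

```python
import random


def randomized_partition(arr, low, high):
    """Partition arr[low..high] around a randomly chosen pivot; return its final index."""
    pivot_idx = random.randrange(low, high + 1)            # high is inclusive here
    arr[pivot_idx], arr[high] = arr[high], arr[pivot_idx]  # move the pivot to the end
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def randomized_quick_sort(arr, low=0, high=None):
    """Sort arr in place."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = randomized_partition(arr, low, high)
        randomized_quick_sort(arr, low, p - 1)
        randomized_quick_sort(arr, p + 1, high)


already_sorted = list(range(200))      # worst case for a fixed first/last pivot
randomized_quick_sort(already_sorted)
```

Randomizing the pivot does not remove the O(n^2) worst case, but it makes that case depend on the random choices rather than on the input order, so sorted input is no longer a problem.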


* __Minimum cut__ - the smallest set of edges (in a weighted graph, the set with the smallest total weight) that, when removed, __disconnects__ an undirected graph. Application: in communications, the edges of the cut are critical for network integrity


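* __Spanning tree__ - subgraph (tree) of a connected undirected graph that includes __ALL VERTICES & MIN # EDGES__ (n - 1 edges for n vertices). A graph can have several spanning trees; the minimum spanning tree is the one with the minimum sum of edge weights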
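
One common way to find a minimum spanning tree is Kruskal's algorithm: take edges in increasing weight order and keep an edge only if it joins two separate components. The sketch below is an illustration, not part of the original notes; the function name `kruskal`, the `(weight, u, v)` edge format and the sample edges are assumptions.

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v) with vertices numbered 0..num_vertices-1."""
    parent = list(range(num_vertices))       # union-find forest, one tree per component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree, total = [], 0
    for weight, u, v in sorted(edges):       # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                 # edge connects two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
            total += weight
    return tree, total


sample_edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
mst, mst_weight = kruskal(4, sample_edges)
print(mst, mst_weight)
```

For a connected graph the result always has exactly n - 1 edges, matching the definition above.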

## SORTING ALGORITHMS (nLogn)

In [9]:
from __future__ import print_function

## Merge Sort (mid)
https://www.geeksforgeeks.org/merge-sort/  


Find __mid point__, merge sort _left half_, merge sort _right half_, __merge the two sorted halves__  


* Divide and Conquer algorithm
* __O(nLogn)__ - time complexity for all three cases (worst, average, best)
* Stable, but not in place - O(n) space complexity


Usage:  
a) Sorting __linked lists__  
b) Inversion Count Problem in a nearly sorted array ( i < j, but A[i] > A[j] )  
c) Used in External Sorting (data too big to fit into memory, resides in slower external memory (hard disk)) 
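
For usage (b), inversions can be counted during the merge step: whenever an element from the right half is placed before the remaining elements of the left half, each of those remaining elements forms one inversion with it. A minimal sketch, with the hypothetical function name `count_inversions`:

```python
def count_inversions(arr):
    """Return (sorted copy, number of pairs i < j with arr[i] > arr[j])."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, inv_left = count_inversions(arr[:mid])
    right, inv_right = count_inversions(arr[mid:])

    merged, i, j = [], 0, 0
    inversions = inv_left + inv_right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inversions += len(left) - i      # right[j] jumps over all remaining left items
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


print(count_inversions([2, 4, 1, 3, 5]))
```

This keeps the O(nLogn) running time of merge sort, compared with O(n^2) for checking every pair.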

In [1]:
def merge_sort(arr):
        
    # arrays of length 1 will be returned as is
    if len(arr) > 1:
                
        # find mid point, sort each half
        mid = len(arr) // 2
        left =  merge_sort(arr[ :mid ])           # sort the first half
        right = merge_sort(arr[ mid: ])           # sort the second half
        
        # merge the sorted halves 'left' and 'right' back into arr
        i, j, k = 0, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:               # '<=' takes equal items from the left half first (keeps the sort stable)
                arr[k] = left[i]
                i += 1
            else:
                arr[k] = right[j]
                j += 1
            k += 1
        
        # check if anything is left
        while i < len(left):
            arr[k] = left[i]
            i += 1
            k += 1

        while j < len(right):
            arr[k] = right[j]
            j += 1
            k += 1

    return arr

In [2]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for myarr in [a, b, c, d]:
    print(myarr, end=' ')
    print('=> ', merge_sort(myarr))

[8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55] =>  [1, 2, 4, 5, 7, 8, 25, 29, 38, 45, 55, 72, 73, 74, 97, 101]
[8, 7, 4, 2, 1] =>  [1, 2, 4, 7, 8]
[8, 7, 2, 11] =>  [2, 7, 8, 11]
[8, 7] =>  [7, 8]


## Quick Sort (pivot)
https://www.geeksforgeeks.org/quick-sort/  


Picks __pivot__ element (_first, last, random, median_) and __partition__ array: put all smaller elements before pivot, greater elements after pivot (linear time), and the pivot in between


* Divide and Conquer algorithm
* __O(nLogn)__ - time complexity for average and best cases, __O(n^2)__ worst case
* __O(log(n))__ extra space on average for the recursion stack (O(n) in the worst case); still qualifies as in-place because no auxiliary array is allocated
* Not stable, but can be made stable

Usage:  

Although the worst case time complexity is more than many other sorting algorithms, __QuickSort is faster in practice__ because its __inner loop__ can be _efficiently_ implemented.  
__Quick sort__ is preferred over merge sort for __sorting arrays__ because it is an __in-place__ sort, while merge sort needs O(N) extra space (N = array size), and the memory allocation increases run time.  
__Merge sort__ is preferred over quick sort for __linked lists__ (different memory alloc.: unlike arrays, a) _linked list nodes may not be adjacent in memory_, and b) _insert is O(1) extra space and O(1) time_ in linked lists => __merge__ operation  - __without extra space__)

__3 Way QuickSort__, an array arr[l..r] is divided in 3 parts:  
a) arr[l..i] elements less than pivot  
b) arr[i+1..j-1] elements equal to pivot  
c) arr[j..r] elements greater than pivot

In [11]:
# USE THIS!!
# Take last element as pivot, place it in its correct position:
#  elements smaller than pivot - to the left
# elements greater than pivot - to the right 
def quick_sort(arr, low, high):
    '''    
        low   -> Starting index,
        high  -> Ending index
    '''
    if low < high:  
        
        pi = partition(arr, low, high)                        # pi = partitioning index, arr[pi] at right place  
        quick_sort(arr, low, pi-1)                            # sort elements before and after partition
        quick_sort(arr, pi+1, high)
        
        
        
        
def partition(arr, low, high):
        
    pivot = arr[high]                                          # pivot
    i = low - 1                                                # index of smaller element 
     
    for j in range(low , high):  
        
        if  arr[j] < pivot:                                    # current element smaller than pivot            
            i += 1                                             # increment index of smaller element
            arr[i], arr[j] = arr[j], arr[i]
  
    arr[i+1], arr[high] = arr[high], arr[i+1]                  # all elems < pivot are now in arr[low..i]; this line
                                                               # places pivot right after them (it was arr[high] before)
            
    return i+1

#### Comment for the second solution  
* First call - consider all the elements (arr[0 : n-1]).
* If pivot's index = k, repeat the process for elements [0 : k-1] and [k+1 : n-1].
* While sorting the elements from k+1 to n-1, the current pivot would end up in some position p. We'd then sort the elements from k+1 to p-1 and p+1 to n-1, and so on.
* Pivot as first or last element => O(n^2) behavior on already sorted lists, but it is the cheapest choice per call (pivot selection is repeated for every partition, and a random pivot adds a random-number call each time). One can also pick an element from the middle

In [4]:
# second in-place algo
def quick_sort2(array, start, end):
        
    if start >= end:
        return

    p = partition2(array, start, end)
    quick_sort2(array, start, p-1)
    quick_sort2(array, p+1, end)
    
    
def partition2(array, start, end):
        
    pivot = array[start]
    low = start + 1
    high = end

    while True:
        # if current value > pivot => OK, move left,keep track of low pointer (to know when all moved)
        while low <= high and array[high] >= pivot:
            high = high - 1

        # Opposite to the above
        while low <= high and array[low] <= pivot:
            low = low + 1

        # either found an out-of-order value for high and low OR low > high => exit loop
        if low <= high:
            array[low], array[high] = array[high], array[low]    # loop continues            
        else:            
            break                                                # exit loop

    array[start], array[high] = array[high], array[start]

    return high



In [6]:
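# 3-way quick sort: partition3 splits a[l..r] into < pivot, == pivot and > pivot (Dutch national flag scheme).
# The random pivot is optional: without the two lines that pick k and swap it into a[l], a[l] itself is the pivot.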
# https://stackoverflow.com/questions/36972714/implementing-3-way-quicksort
import random
def quick_sort_3way(a, l, r):
    if l >= r:
        return
    k = random.randint(l, r)
    a[l], a[k] = a[k], a[l]
    m1, m2 = partition3(a, l, r)
    quick_sort_3way(a, l, m1 - 1)
    quick_sort_3way(a, m2 + 1, r)

def partition3(a, l, r):
    x, j, t = a[l], l, r
    i = j

    while i <= t :
        if a[i] < x:
            a[j], a[i] = a[i], a[j]
            j += 1

        elif a[i] > x:
            a[t], a[i] = a[i], a[t]
            t -= 1
            i -= 1 # remain in the same i in this case
        i += 1   
    return j, t

In [7]:
# short version of quicksort - not in place, requires storage
def quick_sort3(_list):
    
    if len(_list) <= 1:
        return _list
    
    pivot = _list[len(_list) // 2]                       # it can also be first or last element of the list
    left = [x for x in _list if x < pivot]
    middle = [x for x in _list if x == pivot]
    right = [x for x in _list if x > pivot]
        
    return quick_sort3(left) + middle + quick_sort3(right)

In [7]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for arr in [a, b, c, d]:
    print(arr, end=' ')
    quick_sort(arr, 0, len(arr) - 1)      # in-place sorting    
    print('=> ', arr)
    #print('=> ', quick_sort3(arr))       # not in-place

Partition: pivot =  55
partition: i =  -1
Before last swap: [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55]
After last swap: [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 55, 97, 73, 74, 72, 101]

Partition: pivot =  5
partition: i =  -1
Before last swap: [4, 2, 1, 7, 8, 25, 29, 38, 45, 5, 55, 97, 73, 74, 72, 101]
After last swap: [4, 2, 1, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]

Partition: pivot =  1
partition: i =  -1
Before last swap: [4, 2, 1, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]
After last swap: [1, 2, 4, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]

Partition: pivot =  4
partition: i =  0
Before last swap: [1, 2, 4, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]
After last swap: [1, 2, 4, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]

Recursive call for [1, 2]
Recursive call for [5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]
Recursive call for []
Recursive call for [2, 4, 5, 8, 25, 29, 38, 45, 7, 55, 97, 73, 74, 72, 101]
Partition: pivot =

## Heap Sort

* __O(nLogn)__ - time complexity: building the initial heap (createAndBuildHeap()) is O(n), and each of the n extractions of the next largest (or smallest) element re-heapifies the root with heapify() in O(Logn)
* in-place
* not stable, but can be made stable

Usage:
1. Sort a nearly sorted (or __K sorted) array__
2. k largest (or smallest) elements in an array  
Limited usage because Quicksort and Mergesort are better in practice, but the Heap data structure is widely used
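
Usage 2 can be sketched with Python's standard `heapq` module (a min heap): keep a heap of the k largest items seen so far and replace its smallest item whenever a larger one arrives. The function name `k_largest` is hypothetical.

```python
import heapq


def k_largest(arr, k):
    """Return the k largest items of arr in descending order."""
    if k <= 0:
        return []
    heap = list(arr[:k])
    heapq.heapify(heap)                        # min heap: heap[0] is the smallest of the current top k
    for value in arr[k:]:
        if value > heap[0]:
            heapq.heapreplace(heap, value)     # pop the smallest, push the new value
    return sorted(heap, reverse=True)


print(k_largest([8, 7, 4, 2, 1, 25, 29, 38, 45, 5], 3))
```

This takes O(n log k) time and O(k) extra space; the standard library also offers `heapq.nlargest(k, arr)` for the same task.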

__Complete binary tree__ = binary tree in which _every level is completely filled_, except possibly the last, and all nodes are _as far left_ as possible.  
__Binary Heap__ = Complete Binary Tree where items are in a special order: in a _max heap_ the _value in a parent node is greater than or equal to_ the values in its two children (in a _min heap_, smaller than or equal to); can be __represented as binary tree or array__. The array form is space efficient. If a node's index = i (0-based indexing), its parent = floor((i-1)/2), left child = 2 * i + 1, right child = 2 * i + 2.

Heap Sort for sorting in increasing order (comparison based):
1. Build max heap from input => max item at the root of heap;
2. Replace max with last item of heap and reduce heap size by 1; heapify root;
3. Repeat above while size of heap > 1.

heapify(node) possible _only if node's children nodes are heapified_ - __heapify() is done bottom up__

Array Representation
* root = arr[0];
* for any i-th node arr[i]:  
a) arr[(i-1)//2]	= parent node (integer division)  
b) arr[(2*i)+1]	= left child  
c) arr[(2*i)+2]	= right child

__Traversal method__ to achieve array representation - __Level Order__
![image.png](attachment:image.png)

8, 7, 4, 2, 1

In [1]:
def heap_sort(arr):
        
    n = len(arr)        
    
    for i in range(n // 2 - 1, -1, -1):               # build max heap: heapify non-leaf nodes, bottom up
        heapify(arr, n, i)  
    
    for i in range(n-1, 0, -1):                       # One by one extract elements 
        arr[i], arr[0] = arr[0], arr[i]               # swap 
        heapify(arr, i, 0)                            # heapify root
               
        

def heapify(arr, n, i):
        
    largest = i                                       # find largest among root and children
    l = 2 * i + 1     
    r = 2 * i + 2       
    
    if (l < n and arr[i] < arr[l]):                   
        largest = l  
    
    if (r < n and arr[largest] < arr[r]):              
        largest = r  
    
    if (largest != i):                                # If root is not largest, swap with largest and continue heapifying
        arr[i],arr[largest] = arr[largest],arr[i]           
        heapify(arr, n, largest)                      

In [2]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for arr in [a, b, c, d]:
    print(arr, end=' ')
    heap_sort(arr)                         # in-place sorting    
    print('=> ', arr)

[8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55] =>  [1, 2, 4, 5, 7, 8, 25, 29, 38, 45, 55, 72, 73, 74, 97, 101]
[8, 7, 4, 2, 1] =>  [1, 2, 4, 7, 8]
[8, 7, 2, 11] =>  [2, 7, 8, 11]
[8, 7] =>  [7, 8]


## Bucket Sort
Useful when __input uniformly distributed over a range__  
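Time complexity: __O(n)__ on average if all numbers are uniformly distributed  
1) Create n empty buckets (or lists).  
2) Insert every arr[i] into bucket[n*arr[i]] (this index assumes values in [0, 1); for other ranges, scale arr[i] to the bucket range first).  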
3) Sort individual buckets using insertion sort.  
4) Concatenate all sorted buckets.  
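
The four steps above can be sketched directly for floats in [0, 1). This is an illustration under stated assumptions: the function name `bucket_sort_unit` is hypothetical, the input is assumed to lie in [0, 1), and the built-in `sorted()` stands in for insertion sort on each bucket.

```python
def bucket_sort_unit(values):
    """Sort floats in [0, 1) with n buckets, one per input element."""
    n = len(values)
    buckets = [[] for _ in range(n)]                   # step 1: n empty buckets
    for v in values:
        idx = min(int(n * v), n - 1)                   # step 2: bucket index (clamped against rounding)
        buckets[idx].append(v)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))                  # steps 3 and 4: sort each bucket, concatenate
    return result


unit_data = [0.897, 0.565, 0.656, 0.1234, 0.665, 0.3434]
print(bucket_sort_unit(unit_data))
```

With uniformly distributed input each bucket holds about one element, which is where the average O(n) comes from; if all values land in one bucket, the cost is that of the per-bucket sort.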

In [41]:
# https://www.techconductor.com/algorithms/python/Sort/Bucket_Sort.php
import math

DEFAULT_BUCKET_SIZE = 5      # must exist before bucket_sort is defined: default arguments are evaluated at def time


def insertion_sort(bucket):

    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > key:                       # shift larger items one slot to the right
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

    return bucket


def bucket_sort(arr, bucket_size=DEFAULT_BUCKET_SIZE):

    if len(arr) == 0:
        return []

    min_value, max_value = min(arr), max(arr)                   # finding min and max

    # each bucket covers a value range of width bucket_size
    bucket_count = int(math.floor((max_value - min_value) / bucket_size)) + 1
    buckets = [[] for _ in range(bucket_count)]                 # initialize buckets

    for value in arr:                                           # put values in buckets
        buckets[int(math.floor((value - min_value) / bucket_size))].append(value)

    result = []
    for bucket in buckets:                                      # sort buckets and concatenate them
        result.extend(insertion_sort(bucket))

    return result


arr2 = [12, 23, 4, 5, 3, 2, -12, 81, 56, 95]
sortedArray = bucket_sort(arr2)
print(sortedArray)

[-12, 2, 3, 4, 5, 12, 23, 56, 81, 95]


## Radix sort
Digit by digit sort starting from least significant digit to most significant digit. Use counting sort to sort individual digits.  
Steps:  
For each digit position i, from the least significant digit to the most significant digit: sort the input array using counting sort on the i-th digit

Radix sort is an integer sorting algorithm that sorts data with integer keys by grouping the keys by individual digits that share the same significant position and value (place value). Radix sort uses counting sort as a subroutine to sort an array of numbers. Because integers can be used to represent strings (by hashing the strings to integers), radix sort works on data types other than just integers. Because radix sort is not comparison based, it is not bounded by Ω(n log n) for running time; in fact, radix sort can perform in linear time.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [12]:
# Using counting sort to sort the elements on the basis of significant places (assumes non-negative integers)
def countingSort(array, place):
    size = len(array)
    output = [0] * size
    count = [0] * 10

    # Calculate count of elements
    for i in range(0, size):
        index = array[i] // place
        count[index % 10] += 1

    # Calculate cumulative count
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Place the elements in sorted order
    i = size - 1
    while i >= 0:
        index = array[i] // place
        output[count[index % 10] - 1] = array[i]
        count[index % 10] -= 1
        i -= 1

    for i in range(0, size):
        array[i] = output[i]


# Main function to implement radix sort
def radixSort(array):
    # Get maximum element
    max_element = max(array)

    # Apply counting sort to sort elements based on place value.
    place = 1
    while max_element // place > 0:
        countingSort(array, place)
        place *= 10


data = [121, 432, 564, 23, 1, 45, 788]
radixSort(data)
print(data)

[1, 23, 45, 121, 432, 564, 788]


## Iterators

Iterators in Python
Iterators are everywhere in Python. They are elegantly implemented within for loops, comprehensions, generators etc. but are hidden in plain sight.

An iterator in Python is simply an object that can be iterated upon: an object which returns data one element at a time.

Technically speaking, a Python iterator object must implement two special methods, `__iter__()` and `__next__()`, collectively called the iterator protocol.

An object is called iterable if we can get an iterator from it. Most built-in containers in Python like: list, tuple, string etc. are iterables.

The iter() function (which in turn calls the `__iter__()` method) returns an iterator from them.

Iterating Through an Iterator  
We use the next() function to manually iterate through all the items of an iterator. When we reach the end and there is no more data to be returned, it will raise the StopIteration Exception. Following is an example

In [13]:
my_list = [4, 7, 0, 3]
my_iter = iter(my_list)                                                             # get an iterator using iter()

# iterate
print(next(my_iter))
print(next(my_iter))

print(my_iter.__next__())                                                           # next(obj) is same as obj.__next__()
print(my_iter.__next__())

next(my_iter)                                                                       # This will raise error, no items left

4
7
0
3


StopIteration: 
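
The manual `next()` calls above are roughly what a `for` loop does behind the scenes: it asks for an iterator, calls `next()` repeatedly, and treats StopIteration as the signal to stop. A minimal sketch (the helper name `manual_for` is hypothetical):

```python
def manual_for(iterable, action):
    """Roughly equivalent to: for item in iterable: action(item)"""
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends quietly instead of raising
        action(item)


collected = []
manual_for([4, 7, 0, 3], collected.append)
print(collected)
```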

Building Custom Iterators
Building an iterator from scratch is easy in Python. We just have to implement the `__iter__()` and the `__next__()` methods.

The `__iter__()` method returns the iterator object itself. If required, some initialization can be performed.

The `__next__()` method must return the next item in the sequence. On reaching the end, and in subsequent calls, it must raise StopIteration.

Here, we show an example that will give us the next power of 2 in each iteration. Power exponent starts from zero up to a user set number.

In [14]:
class PowTwo:
    """Class to implement an iterator
    of powers of two"""

    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration


# create an object
numbers = PowTwo(3)

# create an iterator from the object
i = iter(numbers)

# Using next to get to the next iterator element
print(next(i))
print(next(i))
print(next(i))
print(next(i))
print(next(i))

1
2
4
8


StopIteration: 

Infinite iterators
An iterator does not have to be exhausted; there can be infinite iterators (which never end). We must be careful when handling such iterators.

Here is a simple example to demonstrate infinite iterators.

The built-in function iter() can be called with two arguments, where the first argument must be a callable object (function) and the second is the sentinel. The iterator calls this function until the returned value is equal to the sentinel. In the example, int() always returns 0, which never equals the sentinel 1, so the iterator never ends.

In [16]:
print(int())
inf = iter(int,1)
print(next(inf))
print(next(inf))

0
0
0
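
The two-argument form is more often used with a sentinel that does eventually appear. The sketch below is an illustrative assumption, not from the original notes: it reads an in-memory stream in fixed-size chunks until `read()` returns the empty string, using `io.StringIO` and `functools.partial` from the standard library.

```python
import io
from functools import partial

stream = io.StringIO("abcdefghij")

# stream.read(4) is called again and again until it returns the sentinel ""
chunks = list(iter(partial(stream.read, 4), ""))
print(chunks)
```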


In [18]:
class InfIter:
    """Infinite iterator to return all
        odd numbers"""

    def __iter__(self):
        self.num = 1
        return self

    def __next__(self):
        num = self.num
        self.num += 2
        return num
    
    
a = iter(InfIter())
for i in range(5):
    print(next(a))

1
3
5
7
9


Be careful to include a terminating condition, when iterating over these types of infinite iterators.
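
One way to add such a terminating condition without writing the loop by hand is the standard `itertools` module. A minimal sketch, using `itertools.count` as a stand-in infinite iterator of odd numbers:

```python
from itertools import count, islice, takewhile

# islice stops after a fixed number of items
first_five = list(islice(count(1, 2), 5))

# takewhile stops at the first item that fails the condition
below_twenty = list(takewhile(lambda n: n < 20, count(1, 2)))

print(first_five)
print(below_twenty)
```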

The advantage of using iterators is that they save resources. Like shown above, we could get all the odd numbers without storing the entire number system in memory. We can have infinite items (theoretically) in finite memory.

There's an easier way to create iterators in Python: generators, which use yield.

## Generators
Python generators are a simple way of creating iterators. All the work we mentioned above is automatically handled by generators in Python.

Simply speaking, a __generator is a function that returns an iterator object__ which we can iterate over (one value at a time)

It is fairly simple to create a generator in Python. It is as easy as defining a normal function, but with a yield statement instead of a return statement.

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.

The difference is that while a return statement terminates a function entirely, yield statement pauses the function saving all its states and later continues from there on successive calls

Differences between Generator function and Normal function  
Here is how a generator function differs from a normal function.

* Generator function contains one or more yield statements.
* When called, it returns an object (iterator) but does not start execution immediately.
* Methods like `__iter__()` and `__next__()` are implemented automatically. So we can iterate through the items using next().
* Once the function yields, the function is paused and the control is transferred to the caller.
* Local variables and their states are remembered between successive calls.
* Finally, when the function terminates, StopIteration is raised automatically on further calls.
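
The points above can be observed directly. In this small sketch (the names `countdown` and `events` are hypothetical), a list records when each part of the generator body actually runs:

```python
events = []


def countdown(start):
    events.append("started")        # runs on the first next(), not when countdown() is called
    while start > 0:
        yield start                 # pause here; 'start' is remembered until the next call
        start -= 1
    events.append("finished")


gen = countdown(2)
events.append("created")            # recorded before "started": the body has not run yet
first = next(gen)
second = next(gen)
try:
    next(gen)                       # the body runs to its end, then StopIteration is raised
except StopIteration:
    events.append("exhausted")

print(first, second, events)
```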

__Memory Efficiency__: A normal function that returns a sequence will create the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.

A generator implementation of such sequences is memory friendly and is preferred, since it only produces one item at a time.
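
A rough way to see the difference is `sys.getsizeof`, keeping in mind that it reports only the size of the container object itself (for a list, the array of references), not the items it refers to. The variable names are illustrative.

```python
import sys

n = 100000
squares_list = [i * i for i in range(n)]    # all n results exist at once
squares_gen = (i * i for i in range(n))     # results are produced on demand

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)

# both give the same total; the generator never holds more than one item
same_total = sum(squares_gen) == sum(squares_list)
print(same_total)
```

The exact byte counts depend on the Python version and platform, but the generator's size does not grow with n.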

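Generators can be implemented in a clear and concise way as compared to their iterator class counterpart. Following is an example that implements a sequence of powers of 2 using a generator function (note that it stops before `max`, yielding exponents 0 to max-1, whereas the PowTwo class above includes `max`)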

In [25]:
def PowTwoGen(max=0):
    n = 0
    while n < max:
        yield 2 ** n
        n += 1

y = PowTwoGen(3)
for i in range(5):
    print(next(y))

1
2
4


StopIteration: 

Generators are an excellent medium to represent an __infinite stream of data__ which is not stored in memory

In [None]:
# do not run :)
def all_even():
    n = 0
    while True:
        yield n
        n += 2

Multiple generators can be used to __pipeline a series of operations__. E.g. sum of squares of the first 10 Fibonacci numbers. Result: efficient, easy to read, a lot cooler!

In [19]:
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x+y
        yield x

def square(nums):
    for num in nums:
        yield num**2

print(sum(square(fibonacci_numbers(10))))

4895


## SEARCH ALGORITHMS

### Linear / Sequential  Search
* Worst-case performance: O(n)
* Best-case performance: O(1)
* Average performance: O(n)
* Worst-case space complexity: O(1) iterative

Linear search is rarely used in practice because other search algorithms, such as binary search and hash tables, have significantly better performance

In [9]:
def linear_search(arr, x): 
  
    for i in range (0, len(arr)): 
        if (arr[i] == x): 
            return i
    return -1

In [None]:
# If we know the list is ordered, then we only have to check until we have found the element or an element greater than it
def ordered_seq_search(arr,ele):
    """
    Sequential search for an Ordered list
    """
    # Start at position 0
    pos = 0
    
    # Target becomes true if ele is in the list
    found = False
    
    # Stop marker
    stopped = False
    
    # go until end of list
    while pos < len(arr) and not found and not stopped:
        
        # If match
        if arr[pos] == ele:
            found = True
            
        else:
            
            # Check if element is greater
            if arr[pos] > ele:
                stopped = True
                
            # Otherwise move on
            else:
                pos  = pos+1
    
    return found

In [10]:
arr = [ 2, 3, 4, 10, 40 ]
x = 10
result = linear_search(arr, x)
if(result == -1): 
    print("Element is not present in array") 
else: 
    print("Element is present at index", result)

Element is present at index 3


## Binary Search
Sorted list => reduces time complexity to __O(Log n)__  
Auxiliary Space: __O(1) iterative__ implementation, __O(Logn) recursion__

In [15]:
# iterative
def binary_search(arr, value):
    
    if len(arr) == 0: return None
    
    min_idx, max_idx= 0, len(arr)
        
    while min_idx < max_idx:
        mid = (min_idx + max_idx) // 2
    
        if arr[mid] == value:
            return mid
        elif arr[mid] < value:
            min_idx = mid + 1
        else: max_idx = mid
    
    return None

In [16]:
# recursive
# have to keep arr intact and pass array bounds to recursion to get the correct mid point index
# if array bounds are not passed, only the boolean version works (found / not found)
def binary_search_rec(arr, value, start=0, end=None):

    # search the half-open range arr[start:end]; 'end' is exclusive, as in the iterative version
    if end is None:
        end = len(arr)

    if start >= end:                                  # empty range => not found
        return None

    mid = (start + end) // 2
    if arr[mid] == value:
        return mid

    elif arr[mid] > value:
        return binary_search_rec(arr, value, start=start, end=mid)

    else:
        return binary_search_rec(arr, value, start=mid + 1, end=end)

In [20]:
array = [1,2,3,4,5,6,7,8,9]
num = 8
print(binary_search(array, num))
print(binary_search_rec(array, num))

7
7
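
In everyday Python code, the standard `bisect` module already provides the binary search step. A minimal sketch of an index lookup built on `bisect.bisect_left` (the wrapper name `binary_search_bisect` is hypothetical):

```python
import bisect


def binary_search_bisect(arr, value):
    """Return the index of the leftmost occurrence of value in sorted arr, or None."""
    idx = bisect.bisect_left(arr, value)       # first position where value could be inserted
    if idx < len(arr) and arr[idx] == value:
        return idx
    return None


print(binary_search_bisect([1, 2, 3, 4, 5, 6, 7, 8, 9], 8))
```

Unlike the hand-written versions above, this one always returns the leftmost match when the value occurs several times.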


## Interpolation search
Idea - return higher value of pos when searched value closer to arr[hi], and smaller when closer to arr[lo]  
pos = lo + [ (x-arr[lo])*(hi-lo) / (arr[hi]-arr[lo]) ]


On average the interpolation search makes about log(log(n)) comparisons (if the elements are uniformly distributed), where n is the number of elements to be searched. In the worst case (for instance where the numerical values of the keys increase exponentially) it can make up to O(n) comparisons.

O(1) space complexity

Usage:
improvement over Binary Search for instances where the values in a sorted array are uniformly distributed. Binary search always goes to the middle, while interpolation search may go closer to the element to be found. Formula for position (the square brackets denote rounding down to an integer index):  
pos = lo + [ (x-arr[lo])*(hi-lo) / (arr[hi]-arr[lo]) ]

arr[] ==> Array where elements need to be searched  
x     ==> Element to be searched  
lo    ==> Starting index in arr[]  
hi    ==> Ending index in arr[]
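
As a worked example of the formula, the helper below (hypothetical name `probe_position`, integer arithmetic, and assuming arr[lo] != arr[hi]) computes the two probes needed to find 18 in a sample sorted array.

```python
def probe_position(arr, lo, hi, x):
    """Interpolation probe index; assumes arr[lo] != arr[hi]."""
    return lo + (x - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])


sample = [10, 12, 13, 16, 18, 19, 20, 21, 22, 23, 24, 33, 35, 42, 47]

# first probe: 0 + (18 - 10) * (14 - 0) // (47 - 10) = 112 // 37 = 3
first_probe = probe_position(sample, 0, len(sample) - 1, 18)

# sample[3] = 16 < 18, so lo moves to 4; second probe: 4 + (18 - 18) * 10 // 29 = 4
second_probe = probe_position(sample, first_probe + 1, len(sample) - 1, 18)

print(first_probe, second_probe, sample[second_probe])
```

Binary search on the same array would start at index 7; the interpolation probe starts at index 3, right next to the target.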

In [11]:
# If x is present in arr[0..n-1], then returns 
# index of it, else returns -1 
def interpolationSearch(arr, n, x):
    # Find indexes of two corners
    lo = 0
    hi = n - 1

    # Since array is sorted, an element present
    # in array must be in range defined by corner
    while lo <= hi and arr[lo] <= x <= arr[hi]:
        # All values in arr[lo..hi] are equal (this also covers lo == hi);
        # the loop condition then implies arr[lo] == x, and returning here
        # avoids a division by zero in the probe formula
        if arr[lo] == arr[hi]:
            return lo

        # Probing the position with keeping
        # uniform distribution in mind.
        pos = lo + int((float(hi - lo) / (arr[hi] - arr[lo])) * (x - arr[lo]))

        # Condition of target found
        if arr[pos] == x:
            return pos

        # If x is larger, x is in upper part
        if arr[pos] < x:
            lo = pos + 1

        # If x is smaller, x is in lower part
        else:
            hi = pos - 1

    return -1

In [14]:
# Driver Code 
# Array of items on which search will be conducted 
arr = [10, 12, 13, 16, 18, 19, 20, 21, 22, 23, 24, 33, 35, 42, 47] 
n = len(arr) 
  
x = 18 # Element to be searched 
index = interpolationSearch(arr, n, x) 
  
if index != -1: 
    print("Element found at index", index) 
else: 
    print("Element not found")

Element found at index 4
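
The average-case and worst-case claims above can be checked empirically. The sketch below counts probes with a hypothetical helper, `interpolation_probes` (not part of the original notebook), on evenly spaced keys and on keys that grow exponentially. Both sample arrays are assumptions chosen only for illustration.

```python
def interpolation_probes(arr, x):
    """Return (index or -1, number of probes) for interpolation search on sorted ints."""
    lo, hi, probes = 0, len(arr) - 1, 0
    while lo <= hi and arr[lo] <= x <= arr[hi]:
        # Equal corner values: compare directly, avoid division by zero
        if arr[lo] == arr[hi]:
            return (lo if arr[lo] == x else -1), probes + 1
        # Integer version of the position formula
        pos = lo + (x - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        probes += 1
        if arr[pos] == x:
            return pos, probes
        if arr[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes

uniform = list(range(0, 2000, 2))            # 1000 evenly spaced keys
exponential = [2 ** i for i in range(40)]    # keys grow exponentially

print(interpolation_probes(uniform, 1234))         # (617, 1)
print(interpolation_probes(exponential, 2 ** 20))  # (20, 21)
```

On the evenly spaced keys the first probe lands on the target, while on the exponential keys the probe position advances by one slot per iteration, which is the linear worst case.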


## Jump Search

* Time Complexity : O(√n)
* Auxiliary Space : O(1)

Like Binary Search, Jump Search is a searching algorithm for sorted arrays. The basic idea is to check fewer elements (than linear search) by jumping ahead by fixed steps or skipping some elements in place of searching all elements.

For example, suppose we have an array arr[] of size n and a block (to be jumped) of size m. Then we search at the indexes arr[0], arr[m], arr[2m], ..., arr[km] and so on. Once we find the interval (arr[km] < x < arr[(k+1)m]), we perform a linear search from the index km to find the element x.

Let’s consider the following array: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). The length of the array is 16. Jump search will find the value 55 with the following steps, assuming that the block size to be jumped is 4:

* STEP 1: Jump from index 0 to index 4;
* STEP 2: Jump from index 4 to index 8;
* STEP 3: Jump from index 8 to index 12;
* STEP 4: Since the element at index 12 (144) is greater than 55, jump back a step to index 8;
* STEP 5: Perform a linear search from index 8 to get the element 55 (at index 10).

What is the optimal block size to be skipped?
In the worst case, we have to do n/m jumps and if the last checked value is greater than the element to be searched for, we perform m-1 comparisons more for linear search. Therefore the total number of comparisons in the worst case will be ((n/m) + m-1). The value of the function ((n/m) + m-1) will be minimum when m = √n. Therefore, the best step size is m = √n.
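
The claim that m = √n minimizes the worst-case count can be checked numerically. This is a minimal sketch: `worst_case_comparisons` is a hypothetical helper that simply evaluates the formula above, and n = 10,000 is an arbitrary example size.

```python
import math

def worst_case_comparisons(n, m):
    """Worst-case count from the formula above: n/m jumps plus m-1 linear steps."""
    return n / m + m - 1

n = 10_000  # hypothetical array length

# Try every block size and keep the one with the smallest worst case
best_m = min(range(1, n + 1), key=lambda m: worst_case_comparisons(n, m))

print(best_m, math.isqrt(n))              # 100 100
print(worst_case_comparisons(n, best_m))  # 199.0
```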

Important points:

* Works only on sorted arrays.
* The optimal size of a block to be jumped is √n. This makes the time complexity of Jump Search O(√n).
* The time complexity of Jump Search is between Linear Search (O(n)) and Binary Search (O(log n)).
* Binary Search is better than Jump Search, but Jump Search has the advantage that we traverse back only once (Binary Search may require up to O(log n) jumps; consider a situation where the element to be searched is the smallest element or smaller than the smallest). So in a system where jumping back is costly, we use Jump Search.

In [17]:
import math

def jumpSearch(arr, x, n):

    # Nothing to search in an empty array
    if n == 0:
        return -1

    # Block size to be jumped (an int, so it can be used as an index)
    step = max(1, int(math.sqrt(n)))

    # Finding the block where element is
    # present (if it is present)
    prev = 0
    curr = step
    while arr[min(curr, n) - 1] < x:
        prev = curr
        curr += step
        if prev >= n:
            return -1

    # Doing a linear search for x in
    # block beginning with prev.
    while arr[prev] < x:
        prev += 1

        # If we reached next block or end
        # of array, element is not present.
        if prev == min(curr, n):
            return -1

    # If element is found
    if arr[prev] == x:
        return prev

    return -1

In [18]:
# Driver code to test function 
arr = [ 0, 1, 1, 2, 3, 5, 8, 13, 21, 
    34, 55, 89, 144, 233, 377, 610 ] 
x = 55
n = len(arr) 
  
# Find the index of 'x' using Jump Search 
index = jumpSearch(arr, x, n) 
  
# Print the index where 'x' is located 
print("Number" , x, "is at index" ,"%.0f"%index) 

Number 55 is at index 10


## Exponential Search
Exponential search involves two steps:

* First, find the range where the element is present.
* Then, do Binary Search in the range found.

How to find the range where the element may be present?  
The idea is to start with subarray size 1 and compare its last element with x, then try size 2, then 4, and so on, until the last element of the subarray is greater than x.  
Once we find such an index i (after repeated doubling of i), we know that the element, if present, must lie between i/2 and i (why i/2? because we could not find a greater value in the previous iteration).

Time Complexity: O(log n)  
Auxiliary Space: O(log n) if the Binary Search step is recursive; with iterative Binary Search, only O(1) space is needed.

Applications of Exponential Search:

* Exponential Binary Search is particularly useful for unbounded searches, where the size of the array is infinite or unknown.
* For bounded arrays, it works better than Binary Search when the element to be searched is close to the first element: the running time is O(log i), where i is the index of the element.
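
The unbounded case can be sketched without any array at all. The example below assumes a monotone predicate that is False for small n and True from some point on; `first_true` is a hypothetical helper name, not something from the original notebook. It doubles an upper bound until the predicate holds, then binary-searches the last doubled interval.

```python
def first_true(pred):
    """Smallest n >= 0 with pred(n) True, assuming pred is False ... False, True ... True."""
    if pred(0):
        return 0

    # Exponential phase: double hi until the predicate holds
    hi = 1
    while not pred(hi):
        hi *= 2

    # Binary phase: pred(lo) is False and pred(hi) is True
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Smallest n whose square is at least 1000
print(first_true(lambda n: n * n >= 1000))  # 32
```

The number of predicate calls grows with the logarithm of the answer, not with any fixed array size.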

In [None]:
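from bisect import bisect_left

# Returns an index of x in sorted arr[0..n-1],
# or -1 if x is not present
def exponentialSearch(arr, n, x):
    # Nothing to search in an empty array
    if n == 0:
        return -1

    # If x is present at the first
    # location itself
    if arr[0] == x:
        return 0

    # Find range for binary search
    # by repeated doubling
    i = 1
    while i < n and arr[i] <= x:
        i = i * 2

    # x, if present, now lies in arr[i // 2 : min(i, n)].
    # Binary search that range; the standard-library bisect_left
    # stands in for a binarySearch helper here
    lo, hi = i // 2, min(i, n)
    pos = bisect_left(arr, x, lo, hi)
    if pos < hi and arr[pos] == x:
        return pos
    return -1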

In [None]:
# Driver Code 
arr = [2, 3, 4, 10, 40] 
n = len(arr) 
x = 10
result = exponentialSearch(arr, n, x) 
if result == -1: 
    print("Element not found in the array")
else: 
    print("Element is present at index %d" % result) 

#### Binary vs. Linear
* Input data needs to be sorted for Binary Search, but not for Linear Search.
* Linear Search accesses data sequentially, whereas Binary Search needs random access to the data.
* Time complexity of Linear Search is O(n); Binary Search has time complexity O(log n).
* Linear Search performs equality comparisons, and Binary Search performs ordering comparisons.


#### Binary search does fewer comparisons than ternary
The following is the recursive formula for counting comparisons in the worst case of Binary Search:

   $T(n) = T(n/2) + 2$, $T(1) = 1$

The following is the recursive formula for counting comparisons in the worst case of Ternary Search:

   $T(n) = T(n/3) + 4$, $T(1) = 1$

In binary search, there are $2\log_2 n + 1$ comparisons in the worst case. In ternary search, there are $4\log_3 n + 1$ comparisons in the worst case.

Time Complexity for Binary search = $2c\log_2 n + O(1)$  
Time Complexity for Ternary search = $4c\log_3 n + O(1)$

Therefore, the comparison of Ternary and Binary Searches boils down to the comparison of the expressions $2\log_3 n$ and $\log_2 n$. The value of $2\log_3 n$ can be written as $(2 / \log_2 3) \times \log_2 n$. Since the value of $2 / \log_2 3$ is more than one (about 1.26), Ternary Search does more comparisons than Binary Search in the worst case.
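
As a quick numeric check of the argument above, the sketch below evaluates both worst-case formulas. The function names are hypothetical, and the sample sizes are arbitrary.

```python
import math

def binary_comparisons(n):
    """Worst-case comparisons for binary search: 2*log2(n) + 1."""
    return 2 * math.log2(n) + 1

def ternary_comparisons(n):
    """Worst-case comparisons for ternary search: 4*log3(n) + 1."""
    return 4 * math.log(n, 3) + 1

# Constant factor between 2*log3(n) and log2(n)
ratio = 2 / math.log2(3)
print(round(ratio, 3))  # 1.262

for n in (10, 1_000, 1_000_000):
    print(n, round(binary_comparisons(n), 1), round(ternary_comparisons(n), 1))
```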

## Hash Table
___

Hash Table with Hash Functions (mapping). __Python's dictionary => Hash Table__

Methods:

* **HashTable()** returns an empty map collection.
* **put(key,val)** adds a new key-value pair; if the key is already in the map, replaces its value with the new one.
* **get(key)** returns the value for a given key, or None if the key is absent.
* **del** deletes a key-value pair: del map[key].
* **len()** returns the number of key-value pairs.
* key __in__ map: True if key is in the map, False otherwise.

## Hash function
Two heuristic methods:

__Hashing by division__ (mod method):  
Map a key into one of the slots of table by taking the remainder of key divided by table_size:  
__h(key) = key % table_size__

Fast: a single division.  
Avoid certain values of table_size: if table_size = 2^p, then h(key) is just the p lowest-order bits of key. It is better to design the hash function to depend on all the bits of the key, unless we know that all low-order p-bit patterns are equally likely.  
Best results come when table_size is prime, with one additional restriction: if r = number of possible character codes on a computer, and table_size is a prime such that r % table_size = 1, then h(key) = key % table_size is just the sum of the character codes in key, taken % table_size, so keys made of the same characters in a different order collide.

Example:  
Suppose r = 256 and table_size = 17, for which r % table_size, i.e. 256 % 17, is 1.  
For key = 37599, its hash is 37599 % 17 = 12.  
But for key = 573, its hash is also 573 % 17 = 12: a collision.

A prime not too close to an exact power of 2 is often good choice for table_size.  
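
The two pitfalls of the division method can be reproduced in a few lines. This is a minimal sketch: `h_division` and `char_code_sum` are hypothetical helper names, and the keys are illustrative values only.

```python
def h_division(key, table_size):
    """Hashing by division: remainder of key divided by table_size."""
    return key % table_size

def char_code_sum(key, r=256):
    """Sum of the base-r digits (character codes) of an integer key."""
    total = 0
    while key:
        total += key % r
        key //= r
    return total

# table_size = 2^p keeps only the p lowest-order bits of the key
print(h_division(0b1011_0110, 16) == 0b0110)  # True

# r = 256 and table_size = 17 give 256 % 17 == 1, so the hash
# equals the sum of the character codes modulo 17
for key in (37599, 573):
    print(key, h_division(key, 17), char_code_sum(key) % 17)
# 37599 12 12
# 573 12 12
```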

__Hashing by multiplication__:  
Multiply key k by a constant real number c, 0 < c < 1, extract the fractional part of the product, multiply it by table_size m, and take the floor:  
__h(k) = floor (m * frac (k * c))__  
or  
h(k) = floor (m * (k * c mod 1))  
floor(x) (available in C's math.h) yields the integer part of the real number x, and frac(x) yields the fractional part (frac(x) = x - floor(x)).

The value of m is not critical; typically choose a power of 2 (m = 2^p for some integer p).  

Example:

Suppose k = 123456, p = 14,  
m = 2^14 = 16384, and w = 32.  
Adapting Knuth’s suggestion, take c to be a fraction of the form s / 2^32 (here s = 2654435769).  
Then k * s = 327706022297664 = (76300 * 2^32) + 17612864,  
so r1 = 76300 and r0 = 17612864.  
The 14 most significant bits of r0 yield the value h(k) = 67.
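
The worked example can be reproduced with integer arithmetic only. In this sketch, `h_multiplication` is a hypothetical helper, and the default s = 2654435769 is the constant implied by the product in the example (k * s = 327706022297664).

```python
def h_multiplication(k, p=14, w=32, s=2654435769):
    """Multiplication-method hash: top p bits of the low w bits of k * s."""
    product = k * s
    r0 = product % (1 << w)   # low-order w bits of the product
    return r0 >> (w - p)      # p most significant bits of r0

k = 123456
product = k * 2654435769
r1, r0 = divmod(product, 1 << 32)

print(product)              # 327706022297664
print(r1, r0)               # 76300 17612864
print(h_multiplication(k))  # 67
```

Because m = 2^p, the floor and fractional-part operations reduce to a modulo and a bit shift.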

In [7]:
class HashTable:
    
    def __init__(self, size):
        
        # Set up size and keys and values
        self.size = size
        self.keys = [None] * self.size
        self.values = [None] * self.size
        
    def put(self, key, value):
        
        # Note, we'll only use integer keys for ease of use with the Hash Function
        # Get the hash value
        startkey = self.hashfunction(key, len(self.keys))
        position = startkey

        # Probe until we find an empty slot or the slot already holding this key
        while self.keys[position] is not None and self.keys[position] != key:
            position = self.rehash(position, len(self.keys))

            # Back at the start: every slot holds a different key
            if position == startkey:
                raise RuntimeError('hash table is full')

        # Set the new key (if the slot was empty) and store or replace the value
        self.keys[position] = key
        self.values[position] = value

    def hashfunction(self, key, size):
                
        # Remainder Method
        return key%size

    def rehash(self, oldhash, size):
                
        # For finding next possible keys
        return (oldhash+1)%size
    
    
    def get(self, key):
        
        # Get value by key        
        # Set up variables for search
        startkey = self.hashfunction(key, len(self.keys))
        value = None        
        found = False
        stop = False
        position = startkey
        
        # Keep probing until we hit an empty slot, find the key, or wrap around to the start
        while self.keys[position] is not None and not found and not stop:
            
            if self.keys[position] == key:
                found = True
                value = self.values[position]
                
            else:
                position=self.rehash(position, len(self.keys))
                                
                if position == startkey:                    
                    stop = True
                                        
        return value

    # Special Methods for use with Python list indexing
    # https://stackoverflow.com/questions/43627405/understanding-getitem-method
    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, value):
        self.put(key, value)

In [8]:
h = HashTable(5)

# Put our first key in
h[0] = 'one'
h[2] = 'two'
h[3] = 'three'
print(h[0])

one


In [9]:
h[1] = 'new_one'
h[1]

'new_one'

In [10]:
print(h[4])

None


In [11]:
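# No __contains__ or __iter__ is defined, so `in` falls back to h[0], h[1], h[2], ...
# and compares against the stored *values*, not the keys
'two' in h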

True
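
The method list at the top of this section also mentions `del`, `len()` and `in`, which the class above does not define. The sketch below is one possible way to support them with linear probing. `ProbingMap`, `_EMPTY` and `_DELETED` are hypothetical names, and the tombstone marker for deleted slots is an assumption of this sketch rather than something taken from the original class.

```python
_EMPTY = object()     # slot that has never been used
_DELETED = object()   # tombstone: slot was used, then deleted

class ProbingMap:
    """Linear-probing map with integer keys (illustration only)."""

    def __init__(self, size):
        self.size = size
        self.keys = [_EMPTY] * size
        self.values = [None] * size
        self.count = 0

    def _find(self, key):
        """Return (slot holding key or None, first reusable slot or None)."""
        pos = key % self.size
        free = None
        for _ in range(self.size):
            k = self.keys[pos]
            if k is _EMPTY:
                return None, (pos if free is None else free)
            if k is _DELETED:
                if free is None:
                    free = pos
            elif k == key:
                return pos, free
            pos = (pos + 1) % self.size
        return None, free

    def __setitem__(self, key, value):
        found, free = self._find(key)
        if found is not None:
            self.values[found] = value      # key exists: replace value
            return
        if free is None:
            raise RuntimeError('hash table is full')
        self.keys[free] = key
        self.values[free] = value
        self.count += 1

    def __getitem__(self, key):
        found, _ = self._find(key)
        return None if found is None else self.values[found]

    def __delitem__(self, key):
        found, _ = self._find(key)
        if found is None:
            raise KeyError(key)
        self.keys[found] = _DELETED         # keep the probe chain intact
        self.values[found] = None
        self.count -= 1

    def __contains__(self, key):
        return self._find(key)[0] is not None

    def __len__(self):
        return self.count

m = ProbingMap(5)
m[0], m[5], m[10] = 'zero', 'five', 'ten'   # all three hash to slot 0
del m[5]
print(10 in m, 5 in m, len(m))              # True False 2
```

The tombstone matters because clearing a slot outright would cut the probe chain and make key 10 unreachable after key 5 is deleted.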

## Edit (Levenshtein) Distance
Sources:

* https://stackoverflow.com/questions/2460177/edit-distance-in-python
* https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python

In [18]:
# wiki
def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)

    # len(s1) >= len(s2)
    if len(s2) == 0:
        return len(s1)

    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            # j+1 instead of j since previous_row and current_row are one char longer than s2
            insertions    = previous_row[j + 1] + 1
            substitutions = previous_row[j] + (c1 != c2)
            deletions     = current_row[j] + 1
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
        print(previous_row)     # debug output: one DP row per character of s1
    
    return previous_row[-1]

In [19]:
# USE THIS ONE (stackoverflow)
def levenshtein2(s1, s2):
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        distances = distances_
    return distances[-1]

In [20]:
levenshtein('aborigenous', 'sc')

[1, 1, 2]
[2, 2, 2]
[3, 3, 3]
[4, 4, 4]
[5, 5, 5]
[6, 6, 6]
[7, 7, 7]
[8, 8, 8]
[9, 9, 9]
[10, 10, 10]
[11, 10, 11]


11

In [17]:
levenshtein2('aborigenous', 'sc')

11

## Appendix

## Selection Sort
Based on https://github.com/agnedil/Code-Python-Algorithms-Notebooks/blob/master/ipython_nbs/sorting/selection_sort.ipynb

* One of the **simplest** and classic sorting algorithms.
* Complexity of __O(n^2)__ (n - 1 passes over a list of n elements).
* In-place sorting of an array:
  * **Find the smallest element** in $[array[1] : array[n-1]]$ and **swap it with array[0]** if it is < $array[0]$;
  * **Repeat** the same for $[array[2] : array[n-1]]$, swapping with $array[1]$, and so on.
* First iteration checks n elements, second iteration checks n-1 elements, third checks n-2, and so forth => the number of checks is about $n\times 1/2 \times n$. But Big O omits constants => $O(n^2)$ instead of $O(n\times 1/2 \times n)$.
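
The comparison count can be measured directly. The sketch below uses a textbook selection sort with a counter; `selection_sort_count` is a hypothetical helper written for this check. It makes exactly n(n-1)/2 comparisons for every input of length n, which is where the roughly n²/2 estimate comes from.

```python
def selection_sort_count(values):
    """Sort a copy of values with selection sort; return (sorted list, comparisons)."""
    arr = list(values)
    comparisons = 0
    for i in range(len(arr) - 1):
        min_idx = i
        # Find the smallest element in arr[i:]
        for j in range(i + 1, len(arr)):
            comparisons += 1
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Move it to position i
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr, comparisons

print(selection_sort_count([8, 7, 4, 2, 1]))  # ([1, 2, 4, 7, 8], 10)
```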

In [10]:
# sorting f(x)
def selection_sort(myarr):
    
    # iterate over the array
    for i in range (len(myarr)-1):
        min_idx = i + 1
        
        # find the smallest element on the right excluding the leftmost element
        for j in range(i + 2, len(myarr)):
            if myarr[j] < myarr[min_idx]:
                min_idx = j
                
        # compare the smallest element found on the right w/the leftmost element and swap if smaller
        if myarr[min_idx] < myarr[i]:
            myarr[min_idx], myarr[i] = myarr[i], myarr[min_idx]
            
    return myarr

In [21]:
# test sorting f(x)
a, b, c, d = [8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55], [8, 7, 4, 2, 1], [8, 7, 2, 11], [8, 7]
for arr in [a, b, c, d]:
    print(arr, end=' ')
    print('=> ', selection_sort(arr))

[8, 7, 4, 2, 1, 25, 29, 38, 45, 5, 101, 97, 73, 74, 72, 55] =>  [1, 2, 4, 5, 7, 8, 25, 29, 38, 45, 55, 72, 73, 74, 97, 101]
[8, 7, 4, 2, 1] =>  [1, 2, 4, 7, 8]
[8, 7, 2, 11] =>  [2, 7, 8, 11]
[8, 7] =>  [7, 8]


## Find kth Smallest Value Among m Sorted Arrays (Facebook)

You have m arrays of sorted integers. The sum of the array lengths is n. Find the kth smallest value of all the values.

For example, if m = 3, n = 8, and we have these lists:

list1 = [3, 6, 9]  
list2 = [8, 15]  
list3 = [4, 7, 12]

* if k = 1, then the returned value should be 3
* if k = 2, then the returned value should be 4
* if k = 3, then the returned value should be 6

Brute force: combine all arrays, sort, and take the kth smallest value. The version below collects only the first k values of each array, since the kth smallest value overall cannot sit deeper than that.

Complexity: O(n log n) time in the worst case, O(n) space.

In [None]:
# test inputs
arrays1 = [
    [8, 15],
    [4],
    [5, 6, 9],
]
# merged: 4, 5, 6, 8, 9, 15 => for k = 2 the answer is 5

arrays2 = [[], [], []]
# no values at all => any k is out of range

def find_kth(arrays, k):
    
    # drop empty arrays
    arrays = [array for array in arrays if array]
    
    # k must point at an existing value
    total = sum(len(array) for array in arrays)
    if k < 1 or k > total:
        raise ValueError('k is out of range')
    
    # candidates: the first k values of every array
    temp_array = []
    for array in arrays:
        temp_array.extend(array[:k])
    
    temp_array = sorted(temp_array)
    return temp_array[k-1]
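
An alternative that avoids sorting all candidates is a k-way merge with a heap. The sketch below relies on the standard-library `heapq.merge`, which lazily merges already-sorted inputs, and stops after k values. `find_kth_heap` is a hypothetical name. Under these assumptions the cost is roughly O(k log m) time and O(m) extra space.

```python
import heapq
from itertools import islice

def find_kth_heap(arrays, k):
    """kth smallest value (1-based) among several sorted arrays."""
    if k < 1:
        raise ValueError('k is out of range')

    # Lazily merge the sorted arrays; only k values are ever produced
    merged = heapq.merge(*arrays)
    for value in islice(merged, k - 1, k):
        return value

    # Fewer than k values in total
    raise ValueError('k is out of range')

lists = [[3, 6, 9], [8, 15], [4, 7, 12]]
print([find_kth_heap(lists, k) for k in (1, 2, 3)])  # [3, 4, 6]
```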