# Notes for Algorithm Questions

## Resources
- Programming Interviews Exposed, 3rd edition
- [Elements of Programming Interviews in Python](http://elementsofprogramminginterviews.com/sample/epilight_python_new.pdf)

## 1. Arrays and Strings
- Most array problems have a brute-force solution that uses O(n) extra space, but an in-place implementation is usually expected.
- `Pivot variables` can split an array into several partitions (sometimes even overlapping ones), which is especially useful for in-place implementations. The even-odd problem is one example.
- For matching/searching in an array, optimal solutions usually involve `data structures` such as trees, sorted arrays, or hash tables.
- Most array problems can be solved in `O(nlogn)` time, e.g., by sorting first. An interview question usually targets `O(n)` or even `O(logn)`.
- Arrays are equally efficient to access from BOTH ends.
- When pivot variables define the boundaries of partitions, I find it useful to define each partition as [start, end). This asymmetry makes empty partitions easy to represent: start == end.
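The half-open convention can be sketched on a small in-place problem. `dedupe_sorted` below is a hypothetical helper chosen only for illustration (it is not one of the problems in these notes); it assumes the input list is already sorted.

```python
def dedupe_sorted(xs):
    """Remove duplicates from a sorted list in place; return the new length.

    Partitions (all half-open):
      kept, unique values : [0, write)
      scanned, discarded  : [write, read)
      not yet scanned     : [read, len(xs))
    """
    write = 0  # the kept partition starts empty: [0, 0)
    for read in range(len(xs)):
        # keep xs[read] only if it differs from the last kept value
        if write == 0 or xs[read] != xs[write - 1]:
            xs[write] = xs[read]
            write += 1
    del xs[write:]  # drop everything outside the kept partition
    return write
```

Because `write` is an exclusive end, the empty partition needs no special sentinel such as -1.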

### Find the first non-duplicate letter in a string
***key***: use a data structure to facilitate searching

In [1]:
def first_non_duplicate(s):
    # first O(n) scan: count how often each letter occurs
    letter_occurrences = {}
    for x in s:
        letter_occurrences[x] = letter_occurrences.get(x, 0) + 1
    # second O(n) scan: return the first letter that occurs exactly once
    for x in s:
        if letter_occurrences[x] == 1:
            return x
    return None

def test():
    assert first_non_duplicate("total") == "o"
    assert first_non_duplicate("teeter") == "r"
    
test()
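The same two-scan idea can also be sketched with `collections.Counter` from the standard library, which does the counting pass for us. The function name `first_non_duplicate_counter` is made up here to avoid clashing with the version above.

```python
from collections import Counter


def first_non_duplicate_counter(s):
    counts = Counter(s)  # O(n): letter -> number of occurrences
    for ch in s:         # O(n): scan in the original order
        if counts[ch] == 1:
            return ch
    return None          # every letter is repeated (or s is empty)
```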

### Delete a set of letters from a string
- It is much easier with O(n) extra space.
- For an in-place implementation, we decide which letters to delete and move the kept letters forward in the same pass, which is another application of `pivot variables`.
- `src` is where a letter currently is, `dst` is where it belongs in the new string. Since we only remove letters, `dst` is always less than or equal to `src`.
- `src` advances on every letter, while `dst` advances only for letters that are kept.
- We copy only when `src` != `dst`.

In [2]:
def remove(s, to_remove):
    s = list(s)
    src, dst = 0, 0
    for i, x in enumerate(s):
        if src != dst:
            s[dst] = s[src]
        src += 1
        dst += 1
        if x in to_remove: # `in` is O(1) on average if to_remove is a set
            dst -=  1
    return ''.join(s[:dst]) # the kept letters occupy s[:dst], even when nothing was removed

## After the first implementation, I refactored it to the following

def remove(s, to_remove):
    s = list(s)
    dst = 0
    for src, x in enumerate(s):
        if src != dst:
            s[dst] = s[src]
        if x not in to_remove: # `in` is O(1) on average if to_remove is a set
            dst +=  1
    return ''.join(s[:dst]) # the kept letters occupy s[:dst]; also safe for an empty s

assert remove("Battle of the Vowels: Hawaii vs. Grozny", "aeiou") == "Bttl f th Vwls: Hw vs. Grzny"

### Reversing the words in a string
- words are delimited by spaces
- collect words, copy over each in reverse order
- O(n) time and O(n) space

In [3]:
def reverse_words(s):
    s = list(s)
    n = len(s)
    w_start = 0
    word_pos = []  # [start, end) span of each word
    while w_start < n:
        # skip the spaces before the next word
        while w_start < n and s[w_start] == ' ':
            w_start += 1
        if w_start == n:
            break  # only trailing spaces were left
        w_end = w_start
        while w_end < n and s[w_end] != ' ':
            w_end += 1
        word_pos.append((w_start, w_end))
        w_start = w_end
    # copy the words over in reverse order, separated by single spaces
    xs = [' '] * n
    i = 0
    for start, end in word_pos[::-1]:
        xs[i:(i+end-start)] = s[start:end]
        i = i+end-start+1
    return ''.join(xs)

assert reverse_words("Do or do not, there is no try.") == "try. no is there not, do or Do"
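For comparison, here is a sketch of a common alternative that needs no list of word positions: reverse the whole character list, then reverse each word back. The names `reverse_range` and `reverse_words_in_place` are hypothetical. Since Python strings are immutable, the list copy still costs O(n) space; the approach is truly in-place only in languages with mutable strings.

```python
def reverse_range(chars, start, end):
    """Reverse chars[start:end] in place (half-open range)."""
    i, j = start, end - 1
    while i < j:
        chars[i], chars[j] = chars[j], chars[i]
        i += 1
        j -= 1


def reverse_words_in_place(s):
    chars = list(s)
    n = len(chars)
    reverse_range(chars, 0, n)  # step 1: reverse everything
    start = 0
    while start < n:
        while start < n and chars[start] == ' ':  # skip spaces
            start += 1
        end = start
        while end < n and chars[end] != ' ':      # find the end of the word
            end += 1
        reverse_range(chars, start, end)          # step 2: restore the word
        start = end
    return ''.join(chars)
```

Unlike the version above, this variant keeps runs of spaces where they are (mirrored) rather than collapsing them to single separators.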

### Even-Odd Problem
Rearrange the array so that even elements appear first.

***key***: Use pivot variables to partition the array into even, unclassified and odd 

In [4]:
def even_odd(xs):
    # even partition: [0, unclassified)
    # unclassified partition: [unclassified, odd)
    # odd partition: [odd, len(xs))
    unclassified = 0
    odd = len(xs)
    while unclassified < odd:
        if xs[unclassified] % 2 == 1:
            xs[unclassified], xs[odd-1] = xs[odd-1], xs[unclassified]
            odd -= 1
        else: # xs[unclassified] is even
            unclassified += 1
    return xs

r = even_odd([1, 1, 3, 5, 6, 9, 2, 7])
assert set(r[:2]) == set([2, 6])
r

[2, 6, 5, 3, 9, 1, 7, 1]

### Dutch national flag problem
- Background: quicksort can become slow when the two-way partition into < and >= is unbalanced, e.g., when the array contains many duplicate values. A better approach is a three-way partition into <, = and >.
- It is trivial with O(n) extra space.
- The solution is similar to the previous even-odd problem.
- It is a difficult problem because we need to maintain four partitions: less-than, equal, greater-than and unclassified. One way to simplify this is to make several passes.

In [5]:
## To make the implementation easier, and thus less bug-prone,
## I use two passes: one moving smaller elements, one moving bigger elements
def dutch_flag(xs, pivot_index=0):
    """xs: list of ints
    Partition xs in place around xs[pivot_index] and return it.
    """
    p = xs[pivot_index]
    
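    # first pass - moving smaller to the front
    # invariant: xs[0..i_lt] < p and xs[i_lt+1..i_unclassified-1] >= p
    i_lt, i_unclassified = -1, 0
    while i_unclassified < len(xs):
        x = xs[i_unclassified]
        if x < p: 
            xs[i_lt+1], xs[i_unclassified] = xs[i_unclassified], xs[i_lt+1]
            i_lt += 1
        # whatever sits at i_unclassified now is classified, so always advance
        i_unclassified += 1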
    # second pass - moving greater to the end
    i_unclassified, i_gt = i_lt+1, len(xs)
    while i_unclassified < i_gt:
        x = xs[i_unclassified]
        if x > p:
            xs[i_gt-1], xs[i_unclassified] = xs[i_unclassified], xs[i_gt-1]
            i_gt -= 1
        else:
            i_unclassified += 1
        
    return xs


xs = [5, 1, 1, 3, 6, 9, 5, 2, 7]

In [6]:
dutch_flag(xs)

[1, 1, 3, 2, 5, 5, 9, 7, 6]

In [7]:
def dutch_flag(xs, pivot_index=0):
    p = xs[pivot_index]
    # define the partitions
    # <  : [0, lt_end)
    # == : [lt_end, eq_end)
    # unclassified: [eq_end, gt_start)
    # >  : [gt_start, n)
    n = len(xs)
    lt_end, eq_end, gt_start = 0, 0, n
    while eq_end < gt_start:
        if xs[eq_end] == p:
            eq_end += 1
        elif xs[eq_end] < p: 
            # xs[lt_end] is the first == element (or xs[eq_end] itself if there is none yet)
            xs[lt_end], xs[eq_end] = xs[eq_end], xs[lt_end]
            lt_end += 1
            eq_end += 1
        else:
            xs[gt_start-1], xs[eq_end] = xs[eq_end], xs[gt_start-1]
            gt_start -= 1
    return xs

In [14]:
xs = [5, 1, 1, 3, 6, 9, 5, 2, 7]
print(xs[0], dutch_flag(xs, 0))
xs = [5, 1, 1, 3, 6, 9, 5, 2, 7]
print(xs[1], dutch_flag(xs, 1))
xs = [5, 1, 1, 3, 6, 9, 5, 2, 7]
print(xs[5], dutch_flag(xs, 5))
xs = [5, 1, 1, 3, 6, 9, 5, 2, 7]
print(xs[8], dutch_flag(xs, 8))

5 [1, 1, 3, 2, 5, 5, 9, 7, 6]
1 [1, 1, 3, 6, 9, 5, 2, 7, 5]
9 [5, 1, 1, 3, 6, 5, 2, 7, 9]
7 [5, 1, 1, 3, 6, 5, 2, 7, 9]
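Eyeballing printed lists is error-prone, so a small checker can help. The sketch below uses a hypothetical helper, `is_three_way_partitioned`, which only verifies the ordering of the three regions; it does not check that the output is a permutation of the input.

```python
def is_three_way_partitioned(xs, p):
    """True if xs reads as: all values < p, then all == p, then all > p."""
    def region(x):
        if x < p:
            return 0
        if x == p:
            return 1
        return 2

    regions = [region(x) for x in xs]
    # a valid partition has non-decreasing region labels
    return regions == sorted(regions)
```

It can be applied to the outputs printed above, for example `is_three_way_partitioned([1, 1, 3, 2, 5, 5, 9, 7, 6], 5)`.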


### Palindromic string
- In Python, `s[~i]` for i in [0, len(s)) is s[-(i+1)], i.e., the i-th element counted from the end

In [18]:
~0, ~1

(-1, -2)

In [22]:
def is_palindromic(s):
    return all(s[i] == s[~i] for i in range(len(s) // 2))


assert is_palindromic("aabaa")
assert is_palindromic("a")
assert not is_palindromic("aaba")

## 2. Static Searching
- Usually assumes the elements are sorted

### Binary search in python by bisect

In [42]:
## traditional binary search in Python
import bisect
xs = [1, 1, 2, 3, 5, 5, 6, 7, 9] # sorted

assert bisect.bisect_left(xs, 1) == 0
assert bisect.bisect_left(xs, 5) == 4
assert bisect.bisect_left(xs, 9) == 8
assert bisect.bisect_left(xs, -1) == 0
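`bisect.bisect_right` is the counterpart of `bisect_left`: it returns the insertion point after any existing equal entries. As a small sketch, the two together give the number of occurrences of a value in a sorted list in O(logn); `count_occurrences` is a made-up name for this example.

```python
import bisect


def count_occurrences(xs, x):
    """Count how often x appears in the sorted list xs."""
    left = bisect.bisect_left(xs, x)    # index of the first element >= x
    right = bisect.bisect_right(xs, x)  # index of the first element > x
    return right - left
```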

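### Search for FIRST occurrence
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array. When the key is present, that is what `bisect.bisect_left` returns; when the key is absent, `bisect_left` returns the insertion point instead.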

***key***: twist the test condition

In [48]:
def first_occurance(xs, x):
    n = len(xs)
    lower, upper = 0, n-1
    while lower <= upper:
        middle = lower + (upper-lower) // 2 # avoids overflow in fixed-width-int languages; harmless in Python
        if xs[middle] > x:
            upper = middle - 1
        elif xs[middle] < x:
            lower = middle + 1
        else: #xs[middle] == x
            while middle >= 0 and xs[middle] == x: # this might be O(n)!
                middle -= 1
            return middle+1
    return None

xs = [-14, -10, 2, 108, 108, 243, 285, 285, 401]

assert first_occurance(xs, 108) == 3
assert first_occurance(xs, 285) == 6
assert first_occurance(xs, -14) == 0
assert first_occurance(xs, -100) == None

The worst-case complexity of the above implementation is O(logn) + O(n) = O(n), because the backward scan can take O(n), e.g., when every element equals the key.

A better solution: the fundamental idea of binary search is to maintain a set of candidate solutions (like in `backtracking`) and to search by eliminating candidates. So when a match is found, record it and keep searching the left half instead of scanning linearly.

In [53]:
## A better solution - keep the O(logn) in binary search

def first_occurance(xs, x):
    n = len(xs)
    lower, upper = 0, n-1
    found = None
    while lower <= upper:
        middle = lower + (upper - lower) // 2
        if xs[middle] < x:
            lower = middle + 1
        elif xs[middle] > x:
            upper = middle - 1
        else:
            found = middle # keep the answer
            upper = middle - 1 # keep looking
    return found

xs = [-14, -10, 2, 108, 108, 243, 285, 285, 401]

assert first_occurance(xs, 108) == 3
assert first_occurance(xs, 285) == 6
assert first_occurance(xs, -14) == 0
assert first_occurance(xs, -100) == None
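For reference, the same result can be sketched on top of `bisect.bisect_left`. The only extra work is the membership check, because `bisect_left` returns an insertion point even when the key is absent. The name `first_occurrence_bisect` is hypothetical.

```python
import bisect


def first_occurrence_bisect(xs, x):
    i = bisect.bisect_left(xs, x)  # first index whose element is >= x
    if i < len(xs) and xs[i] == x:
        return i
    return None  # x is not in xs
```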

## 3. Dynamic Searching
- Heaps
- Hash Tables
- Binary Search Trees

## 4. Sorting
- Know the pros and cons of different sorting algorithms
    - selection sort: swapping, in-place, O(n^2), not stable (because of the long-distance swaps)
    - insertion sort: O(n^2), in-place, stable, fast when inserting a few new records into an already sorted array
    - quick sort: never merges, in-place, O(nlogn) on average, O(n^2) in the worst case (e.g., with poor pivot choices or many repeated elements), typically not stable
    - merge sort: not in-place (O(n) extra space), O(nlogn), stable
- Think about recursion
- A common way to make an unstable sort stable: add each element's original position to its key
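The position-in-key trick can be sketched with a heap-based sort, since heap sort is not stable in general. `heap_sort_stable` is a hypothetical name. Python's built-in `sorted` is already stable, so this is only an illustration of the technique.

```python
import heapq


def heap_sort_stable(items, key):
    # Decorate each item with (key, original position, item). The position
    # breaks ties between equal keys, so they come out in input order and
    # the items themselves are never compared.
    heap = [(key(item), pos, item) for pos, item in enumerate(items)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, sorting `[("bob", 2), ("amy", 1), ("cat", 2), ("dan", 1)]` by the second field keeps `"amy"` before `"dan"` and `"bob"` before `"cat"`.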

### Find the intersection of two sorted arrays (e.g., in search engine implementation)

- The two arrays may contain duplicate elements, but the result should be duplicate-free
- It is a sorting problem because it is closely related to merge sort: both walk two sorted arrays in parallel and compare the current elements
- It is O(n + m)

In [62]:
def intersection(xs1, xs2):
    common = []
    i1, i2 = 0, 0
    while i1 < len(xs1) and i2 < len(xs2):
        if xs1[i1] == xs2[i2]:
            x = xs1[i1]
            if len(common) == 0 or x != common[-1]:
                common.append(x)
            i1 += 1
            i2 += 1
        elif xs1[i1] < xs2[i2]:
            i1 += 1
        else: # xs1[i1] > xs2[i2]
            i2 += 1
    return common

xs1 = [2, 3, 3, 5, 5, 6, 7, 7, 8, 12]
xs2 = [5, 5, 6, 8, 8, 9, 10, 10]
assert intersection(xs1, xs2) == [5, 6, 8]

### Given (start, end) times of events, find the max # of simultaneous events
***key***: sort the interval endpoints, counting each start as "(" and each end as ")"; the answer is the maximum nesting depth

- A common topic in sorting is sorting intervals
- Intervals are like parentheses
- It is O(nlogn) for sorting + O(n) for the scan
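The endpoint-sorting idea could be sketched as follows. The function name `max_simultaneous` and the tie-breaking rule are assumptions: here intervals are treated as closed, so an event ending at time t overlaps one starting at t. For half-open intervals, the two marker values would be swapped.

```python
def max_simultaneous(events):
    """events: iterable of (start, end) pairs with start <= end."""
    endpoints = []
    for start, end in events:
        endpoints.append((start, 0))  # 0 marks an opening "("
        endpoints.append((end, 1))    # 1 marks a closing ")"
    # Sort by time; at equal times, openings (0) sort before closings (1).
    endpoints.sort()
    best = current = 0
    for _, kind in endpoints:
        if kind == 0:
            current += 1              # one more event is running
            best = max(best, current)
        else:
            current -= 1              # one event finished
    return best
```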

## 5. SQL

## 6. Data Structures

### 6.1 Linked List

### 6.2 Stacks & Queues
- See Pattern and Tricks for some usage of Stacks/Queues
- Good for implementation of iteration in different orders

#### Enhanced API of Stack with max()
- Use an additional data structure to return the current max() value in the stack.

***Solution***: Use another stack to store the maxima and their occurrence counts.

In [22]:
class Stack(object):
    def __init__(self):
        self.data_stack = []
        self.max_stack = []
    def push(self, x):
        # update data-stack
        self.data_stack.append(x)
        # update max-stack
        if len(self.max_stack)==0:
            self.max_stack.append((x, 1))
        else:
            if self.max() == x:
                _, times = self.max_stack[-1]
                self.max_stack[-1] = (x, times+1)
            elif self.max() < x:
                self.max_stack.append((x, 1))
            else:
                pass
    def pop(self):
        # update data-stack
        x = self.data_stack[-1]
        del self.data_stack[-1]
        # update max-stack
        if x < self.max():
            pass
        elif x == self.max():
            _, times = self.max_stack[-1]
            if times > 1:
                self.max_stack[-1] = (x, times-1)
            else:
                del self.max_stack[-1]
        return x
    def top(self):
        return self.data_stack[-1]
    def max(self):
        return self.max_stack[-1][0]
    
    
s = Stack()
s.push(2); assert s.top() == 2; assert s.max() == 2
s.push(2); assert s.top() == 2; assert s.max() == 2
s.push(1); assert s.top() == 1; assert s.max() == 2
s.push(4); assert s.top() == 4; assert s.max() == 4
s.push(5); assert s.top() == 5; assert s.max() == 5
s.push(5); assert s.top() == 5; assert s.max() == 5
s.push(3); assert s.top() == 3; assert s.max() == 5
s.pop(); assert s.top() == 5; assert s.max() == 5
s.pop(); assert s.top() == 5; assert s.max() == 5
s.pop(); assert s.top() == 4; assert s.max() == 4
s.pop(); assert s.top() == 1; assert s.max() == 2
s.push(0); assert s.top() == 0; assert s.max() == 2
s.push(3); assert s.top() == 3; assert s.max() == 3

## 7. Patterns and Tricks
- Most puzzles that involve comparing every pair can potentially be optimized by sorting first and then comparing only some of the elements (e.g., neighbors). An example is [finding the shortest unique prefixes](http://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/)
- Stacks/Queues can be very useful when we need to iterate and manipulate (e.g., remove/insert) at the same time, e.g. [delete consecutive same words in a sequence](http://www.geeksforgeeks.org/delete-consecutive-words-sequence/)
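As a sketch of the stack pattern, consider the linked problem under the assumption that two equal adjacent words cancel each other and that newly adjacent words may cancel in turn. The function name `remove_adjacent_pairs` is made up for this example.

```python
def remove_adjacent_pairs(words):
    stack = []  # words that survived so far, in order
    for w in words:
        if stack and stack[-1] == w:
            stack.pop()      # w cancels the word on top of the stack
        else:
            stack.append(w)  # nothing to cancel with, so keep w
    return stack
```

The stack top always holds the current left neighbor, so removing a pair exposes the previous survivor in O(1), with no rescan of the sequence.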

# Exercises