# Find the missing number in n-1 numbers

You're given an unsorted array of length n-1 containing distinct numbers in the range 1 to n, so exactly one number is missing.  Find the missing number.

Solution:
    
    The sum of all integers from 1 to n is n*(n+1)/2.
    Subtracting the sum of the array from that total leaves the missing number.
    This takes O(n) time with constant extra memory.
    
    Another approach is to create a set/hashtable of the numbers from 1 to n,
    and walk the array, removing each element from the set until only the missing one is left.
    This is O(n) time (two passes over n items) and O(n) extra memory.


In [4]:
from random import shuffle

def find_missing_number(arr):
    n = len(arr) + 1 # O(1)
    expected_sum = n * (n + 1) // 2 # integer division avoids float rounding for large n
    s = sum(arr) # O(n)
    return expected_sum - s

arr = list(range(1,11))
shuffle(arr)
print(arr.pop(4))
print(find_missing_number(arr))

8
8
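
The set-based alternative described in the solution can be sketched as follows. This is a minimal sketch, assuming the input holds n-1 distinct integers from 1 to n; the function name `find_missing_number_with_set` is made up for illustration.

```python
def find_missing_number_with_set(arr):
    # Assumes arr holds n-1 distinct integers drawn from 1..n.
    n = len(arr) + 1
    remaining = set(range(1, n + 1))  # O(n) time and O(n) extra memory
    for value in arr:
        remaining.discard(value)  # O(1) on average
    # Exactly one value is left: the missing one.
    return remaining.pop()


print(find_missing_number_with_set([3, 1, 5, 2]))  # 4
```

It trades the constant memory of the sum approach for O(n) memory, but it does not rely on arithmetic over the values.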


# Create an LRU cache

Solution:
    
  Use a hashmap and a doubly-linked list with pointers to the head and tail.  The doubly-linked list maintains the order of use, while the hashmap points from each key to its node.  When an element is accessed, unlink it from the doubly-linked list and re-insert it at the front.  To drop an element from the cache when it is full, pop the node at the end of the list (the least recently used one) and delete its key from the hashmap.  Lookup, insertion, and eviction are all O(1).
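
A minimal sketch of that design is shown here. The class and method names (`LRUCache`, `get`, `put`), the fixed `capacity` parameter, and the use of sentinel head and tail nodes are assumptions made for illustration, not part of the original notes.

```python
class _Node:
    """Doubly-linked list node holding one cache entry."""

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # assumed fixed maximum number of entries
        self.nodes = {}  # key -> node
        # Sentinel head and tail nodes remove the empty-list edge cases.
        self.head = _Node()
        self.tail = _Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.nodes.get(key)
        if node is None:
            return default
        # Accessed: move the node to the front (most recently used).
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.nodes.get(key)
        if node is not None:
            # Existing key: update the value and mark it as most recently used.
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.nodes) >= self.capacity:
            # Evict the least recently used entry, at the back of the list.
            lru = self.tail.prev
            self._unlink(lru)
            del self.nodes[lru.key]
        node = _Node(key, value)
        self.nodes[key] = node
        self._push_front(node)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

In everyday Python, `collections.OrderedDict` or `functools.lru_cache` would usually do this job; the explicit list is spelled out only to mirror the description.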


# Find if a name, from a list of names, exists in a document.  Do this in O(n) time.

Solution:

Build a trie (prefix tree) from the list of names, including the spaces inside multi-word names.  This is pre-processing, so it won't affect the run-time cost of the search itself.  Then walk the document and test the trie at the start of each word.  You can return a full list of the names found in the doc if you choose.

This should run in O(k*n), where n is the length of the document and k is the average length of the names in the trie.  k does not grow with n, so for large n it is negligible and the search is basically O(n).
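
A rough sketch of the trie walk follows, assuming case-sensitive matching and that a name must start and end on a word boundary. The helper names `build_trie` and `find_names` and the `END` marker are made up for illustration.

```python
END = None  # marker key; it can never collide with a single character


def build_trie(names):
    # Each trie node is a dict mapping a character to the next node.
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[END] = name  # remember which name ends at this node
    return root


def find_names(document, trie):
    found = []
    n = len(document)
    for start in range(n):
        # Only start matching at the beginning of a word.
        if start > 0 and document[start - 1].isalnum():
            continue
        node = trie
        pos = start
        while pos < n and document[pos] in node:
            node = node[document[pos]]
            pos += 1
            # A name matches if the trie marks an end and the word ends too.
            if END in node and (pos == n or not document[pos].isalnum()):
                found.append(node[END])
    return found


trie = build_trie(["Ada Lovelace", "Alan Turing", "Grace Hopper"])
print(find_names("Notes on Ada Lovelace and Alan Turing.", trie))
# ['Ada Lovelace', 'Alan Turing']
```

Each starting position walks at most the length of the longest name, which is where the O(k*n) bound comes from.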

# Given an array of letters, find the second most frequent item


Solution:
    
    Create a hashmap for the items and their counts.  Also keep two pointers for the first and second most frequent elements.  After each increment, check whether the item's count is now higher than the first or the second highest count; if it is, replace that pointer and shift the other one down.  This will take O(n) time and O(n) extra memory.  This is very specific to the second most frequent element, and becomes very complicated as k becomes larger for the kth most frequent.
    
    There is also an approach with a max-heap, but this requires O(n*logn), because each insertion takes logn, and you may have to insert n times if every item is unique.  We can get the kth most frequent element more easily this way.  However, this is basically as expensive as counting and sorting.


In [23]:
def find_second_most_frequent(arr):
    counts_map = {}
    first_most = None   # most frequent item seen so far
    second_most = None  # runner-up seen so far

    for i in arr:
        # increment
        counts_map[i] = counts_map.get(i, 0) + 1

        # the leader only extends its lead, so nothing changes
        if i == first_most:
            continue

        # check if it's now the first most; the old leader becomes the second most
        if first_most is None or counts_map[i] > counts_map[first_most]:
            first_most, second_most = i, first_most

        # check if it's now the second most
        elif i != second_most and (second_most is None
                                   or counts_map[i] > counts_map[second_most]):
            second_most = i

    # fewer than two distinct items: there is no second most frequent
    if second_most is None:
        return None

    # on a tie for first place, return both items
    if counts_map[second_most] == counts_map[first_most]:
        return (first_most, second_most)
    else:
        return second_most

    
arr1 = ['a','b','c','a','a','a','b','b','b','c','d','d','d','e','e','e','e','e','e','e','e']
print(find_second_most_frequent(arr1))
arr2 = ['a','b','c','a','a','a','b','a','a','a','a','b','b','c','d','d','d','e','e','e','e','e','e','e','e']
print(find_second_most_frequent(arr2))

a
('a', 'e')
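
For the general kth most frequent item, the heap-based idea can be sketched with the standard library. This is a sketch only: it uses `collections.Counter` and `heapq.nlargest` instead of a hand-built max-heap, the function name `kth_most_frequent` is made up, and how ties are broken is left unspecified.

```python
from collections import Counter
import heapq


def kth_most_frequent(arr, k):
    counts = Counter(arr)  # O(n) to count every item
    if k < 1 or k > len(counts):
        return None  # fewer than k distinct items
    # Select the k (item, count) pairs with the largest counts.
    top_k = heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])
    return top_k[-1][0]


print(kth_most_frequent(['a', 'b', 'b', 'c', 'c', 'c'], 2))  # b
```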


# Test if a tree is a valid binary tree

Solution:
    
    DFS or BFS the tree and test if each node has at most two children.
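
A small BFS sketch of that check, assuming a hypothetical `TreeNode` class that stores its children in a list. It also rejects a node that is reached twice (a shared child or a cycle), which is an extra check beyond the description above.

```python
from collections import deque


class TreeNode:
    # Hypothetical general tree node with any number of children.
    def __init__(self, val, children=None):
        self.val = val
        self.children = children if children is not None else []


def is_binary_tree(root):
    if root is None:
        return True
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if id(node) in seen:
            return False  # reached twice, so this is not a tree
        seen.add(id(node))
        if len(node.children) > 2:
            return False  # more than two children
        queue.extend(node.children)
    return True


print(is_binary_tree(TreeNode(1, [TreeNode(2), TreeNode(3)])))               # True
print(is_binary_tree(TreeNode(1, [TreeNode(2), TreeNode(3), TreeNode(4)])))  # False
```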
   
# Test if a tree is a valid binary search tree

Solution:
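    Recurse through the tree while passing down the minimum and maximum values allowed for each subtree; every node must fall within its bounds.  The code below assumes distinct integer keys.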

In [33]:
INT_MAX = 10000000000
INT_MIN =-10000000000
 
# A binary tree node
class Node:
 
    # Constructor to create a new node
    def __init__(self, val, left = None, right = None):
        self.val = val 
        self.left = left
        self.right = right


def is_BST(root, mi, ma):
    
    def is_BST_rec(node, mi, ma):
        if node is None:
            return True
        
        if node.val < mi or node.val > ma:
            return False
        
        return is_BST_rec(node.left, mi, node.val - 1) and is_BST_rec(node.right, node.val + 1, ma)

    
    return is_BST_rec(root, mi, ma)

root = Node(4,
           Node(2, 
                Node(1), 
                Node(3)
               ),
           Node(5)
           )

print(is_BST(root, INT_MIN, INT_MAX))

True


# Get the height difference of two nodes in a tree

Solution:
    

DFS for both nodes and keep track of depth.  As you find the nodes, update the depth pointers.  Compare at the end.  Run time is O(n) to find the nodes with DFS.  Extra memory is the two depth values plus the recursion stack, which is O(h) for a tree of height h.

In [34]:
class Node:
    
    def __init__(self, val, children=None):
        self.val = val
        # avoid sharing one mutable default list between all nodes
        self.children = children if children is not None else []

def get_depth_difference(root, val1, val2):
    
    # depth at which each value was first found, or None if not found yet
    depths = {val1: None, val2: None}
    
    def dfs_rec(node, cur_d):
        # stop early once both depths are known
        if all(d is not None for d in depths.values()):
            return
        if node.val in depths and depths[node.val] is None:
            depths[node.val] = cur_d
        
        for c in node.children:
            dfs_rec(c, cur_d + 1)
            
    dfs_rec(root, 0)
    if depths[val1] is None or depths[val2] is None:
        return None  # at least one value is not in the tree
    return abs(depths[val1] - depths[val2])


# Design a media player, to which songs can be added, and can play the songs in random order, without repeats

Solution:

We can store the songs in a sequential directory structure, with indices from 0 to n-1.  Adding a song requires only appending to the sequential directory.  We can play randomly by keeping an array of indices from 0 to n-1 and selecting from it at random.  After the first song, we can avoid repeating the current song by swapping its entry with the item at the end of the index array, and subsequently sampling only from positions 0 to n-2.

In [37]:
from random import randint

class MediaPlayer:
    
    def __init__(self, songs):
        self.songs = songs
        self.n = len(songs)
        self.indices = list(range(self.n))
    
    def add_song(self, song):
        self.songs.append(song)
        self.indices.append(self.n)  # index of the new song
        self.n += 1
        if self.n > 1:
            # put newest at second to last to maintain current at index -1
            self.indices[-2], self.indices[-1] = self.indices[-1], self.indices[-2]
    
    def get_song(self, idx):
        # read the song from directory and return it
        return self.songs[idx]
    
    def play(self, song):
        # play the song
        print("playing song")
    
    def start(self):
        # randint is inclusive on both ends, so sample positions 0..n-1
        pos = randint(0, self.n - 1)
        # put current song's idx at the end
        self.indices[pos], self.indices[-1] = self.indices[-1], self.indices[pos]
        return self.indices[-1]
    
    def next(self):
        if self.n < 2:
            # only one song, so it has to repeat
            return self.indices[-1]
        # sample positions 0..n-2; the last slot holds the current song
        pos = randint(0, self.n - 2)
        self.indices[pos], self.indices[-1] = self.indices[-1], self.indices[pos]
        return self.indices[-1]