# Lesson 6: DATA STRUCTURE -- Hashtable
---
In this lesson, we will cover the following parts:
* 6.1: Lecture Note
* 6.2: Leetcode Training (Basic)
* 6.3: Leetcode Practice (Advanced)

(Trade space for time.)
Both BA/DS: the hash table is a basic data structure used in a wide range of applications. In interview questions, whenever you need to look up elements in a collection repeatedly, consider using a hash table.

hash_table (general term)
* each key maps to exactly one value
* a hash_table stores <key, value> pairs; each key maps to its value
* does NOT allow duplicate keys
* allows duplicate values
* a hash_set is a set such as {1, 3}; it contains only keys

In Python, we can use set and dictionary to represent hash_set and hash_table.

## 6.1 Lecture Note -- Dictionary and Set

### 6.1.1 Dictionary
![Dictionary](source/lesson8_hashtable_dictionary.png)

A data structure stores pairs of data:
* key
* value
```python
my_dict = {}
grades = {'Ana':'B', 'John':'A+', 'Denise':'A', 'Katy':'A'}
```

Looking up a key is similar to indexing into a list:  
the dictionary looks up the key  
and returns the value associated with it;  
if the key isn't found, a `KeyError` is raised.
```python
grades = {'Ana':'B', 'John':'A+', 'Denise':'A', 'Katy':'A'}
grades['John']  # evaluates to 'A+'
grades['Sylvan']  # raises KeyError: a dict indexed with a key that is not in the dictionary (value = a_dict[key]) always raises KeyError
```
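
When a missing key should not be treated as an error, the lookup can be guarded. The following is a minimal sketch reusing the `grades` dictionary from above; the default value `'N/A'` and the variable names are chosen only for illustration.

```python
grades = {'Ana': 'B', 'John': 'A+', 'Denise': 'A', 'Katy': 'A'}

# dict.get returns a default (None unless specified) instead of raising
sylvan_grade = grades.get('Sylvan', 'N/A')

# Alternatively, catch the KeyError explicitly
try:
    john_grade = grades['John']
    missing_grade = grades['Sylvan']   # raises KeyError
except KeyError as err:
    missing_grade = None
    missing_key = err.args[0]          # the key that was not found

print(sylvan_grade, john_grade, missing_grade, missing_key)
```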

#### <font color='red'>How is a hash table implemented at the low level?</font>

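It depends on how hash collisions are handled:
1. Open hashing (space for the table is allocated dynamically; when another key already occupies my slot, we share the slot through a linked list): array + linked list
2. Closed hashing (a fixed-size block of memory is allocated for the table up front; when another key occupies your slot, you take some other slot instead): array + linear / quadratic probing

Hash function:
* Job: for any key, produce a deterministic yet irregular-looking integer in the range 0 to capacity-1

Rehashing: once a certain fraction of the allocated space is in use (say 1/10), allocate a larger block (for example, twice the previous size) and move the entries into it.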
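
To make these ideas concrete, here is a minimal sketch of open hashing (array + chained buckets) with rehashing. The class name `ChainedHashTable`, the `max_load` threshold of 0.75, and the starting capacity are assumptions for illustration only. Python's built-in `dict` works differently (CPython uses open addressing), so treat this as a teaching toy rather than a description of the real implementation.

```python
class ChainedHashTable:
    """Toy hash table: a list of buckets, each bucket a list of [key, value] pairs."""

    def __init__(self, capacity=8, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load      # assumed rehash threshold (load factor)
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # Hash function: map any hashable key to an integer in 0..capacity-1
        return hash(key) % self.capacity

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:        # key already present: overwrite its value
                pair[1] = value
                return
        bucket.append([key, value])   # empty slot or collision: chain the entry
        self.size += 1
        if self.size > self.max_load * self.capacity:
            self._rehash()

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self):
        # Allocate twice the space and re-insert every entry
        old_pairs = [pair for bucket in self.buckets for pair in bucket]
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for k, v in old_pairs:
            self.put(k, v)


table = ChainedHashTable(capacity=2)
for i in range(10):
    table.put(i, i * i)
print(table.get(3), table.capacity)
```

Starting from a deliberately tiny capacity forces several rehashes, which shows that lookups keep working after the buckets are rebuilt.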


#### Operations:
* Add an entry
```python
grades['Sylvan'] = 'A'
```
* Check if key in dictionary
```python
'John' in grades  # return True
'Daniel' in grades  # return False
```
* Delete entry
```python
del grades['Ana']
```
* Get all keys as a list
```python
list(grades.keys())  # e.g. ['Ana', 'John', 'Denise', 'Katy'] for the original grades
```
* Get all values as a list
```python
list(grades.values())  # e.g. ['B', 'A+', 'A', 'A'] for the original grades
```

Dictionary
* values
  * any type (immutable and mutable)
  * can be duplicates
  * dictionary value can be a list or another dictionary
* keys
  * must be unique
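  * must be of an immutable (hashable) type
* Keys and values cannot be accessed by position! (Since Python 3.7 a dictionary preserves insertion order, but it still has no positional indexing.)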
![listvsdict](source/lesson8_hashtable_dictionary2.png)
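
The key and value rules above can be checked directly. A small sketch follows; the dictionary name `points` and its contents are made up for illustration.

```python
points = {}
points[(1, 2)] = 'tuple keys work'        # tuples are immutable and hashable

try:
    points[[1, 2]] = 'list keys do not'   # lists are mutable, hence unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Values have no such restriction: a value can be a list or another dictionary
points['nested'] = {'inner': [1, 2, 3]}
print(list_key_ok, len(points))
```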

#### Dictionary Comprehension
Similar to list comprehension:  
```python
[val for val in collection if condition]
```
Change `[]` to `{}` and give a `key: value` pair:
```python
{key_expr: value_expr for value in collection if condition}
```
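
As a concrete instance of the template above, the sketch below maps each word to its length and keeps only words longer than two characters. The word list is arbitrary.

```python
words = ['how', 'hi', 'me', 'you']

# key_expr = word, value_expr = len(word), condition = len(word) > 2
lengths = {word: len(word) for word in words if len(word) > 2}
print(lengths)
```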

#### Question 1: Count word occurrences

In [1]:
class Solution(object):
    def count_words_freqs(self, words):
        if not words:
            return {}
        
        my_dict = {}
        
        for word in words:
            my_dict[word] = my_dict.get(word, 0) + 1
            
        return my_dict

# Time Complexity: O(n)
# Space Complexity: O(n) in the worst case (one dictionary entry per distinct word)

if __name__ == "__main__":
    soln = Solution()
    freqs = soln.count_words_freqs(words=['how', 'hi', 'me', 'hi', 'how', 'hi', 'you', 'you', 'you'])
    print(freqs)

{'how': 2, 'hi': 3, 'me': 1, 'you': 3}


#### Question 2: Most Common Word
Input: word --> count dictionary

In [2]:
class Solution(object):
    def get_most_common_words(self, freqs):
        if not freqs:
            return ()
        
        values = freqs.values()
        best = max(values)
        most_common_words = [
            key
            for key, val in freqs.items()
            if val == best
        ]
        
        return (most_common_words, best)
    
# Time Complexity: O(n)
# Space Complexity: O(n) in the worst case, since the result list may hold every key

if __name__ == "__main__":
    soln = Solution()
    most_common_words = soln.get_most_common_words(freqs)
    print(most_common_words)

(['hi', 'you'], 3)


#### Question 3: Merge the counts of upper- and lower-case keys

In [3]:
class Solution(object):
    def merge_upper_lower(self, dic):
        if not dic:
            return {}
        
        my_dict = {}
        
        for key, val in dic.items():
            my_dict[key.lower()] = dic.get(key.lower(), 0) + dic.get(key.upper(), 0)
            
        return my_dict
    
    # Alternative with a dict comprehension (this definition replaces the one above)
    def merge_upper_lower(self, dic):
        if not dic:
            return {}
        
        my_dict = {
            key.lower(): dic.get(key.lower(), 0) + dic.get(key.upper(), 0)
            for key in dic
        }
            
        return my_dict
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.merge_upper_lower(dic={'a': 10, 'b':34, 'A':7, 'Z':3}))

{'a': 17, 'b': 34, 'z': 3}


#### Question 4: Exchange keys and values (no duplicate in values)

```
Input: {'a':10, 'b':34}
Output: {10: 'a', 34: 'b'}
```

In [5]:
class Solution(object):
    def swap_key_val(self, dic):
        if not dic:
            return {}
        
        my_dic = {
            val: key
            for key, val in dic.items()
        }
        
        return my_dic
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.swap_key_val(dic={'a':10, 'b':34}))

{10: 'a', 34: 'b'}


### 6.1.2 Set

Sets are mutable, unordered collections of unique elements.  
Sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.  
Like dictionaries, sets are implemented with hash tables, so their elements must be hashable. They cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or frozensets.
[Set Reference](https://python-reference.readthedocs.io/en/latest/sets)
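
The restriction on elements can be seen in a short sketch. The set name `groups` and its contents are hypothetical.

```python
groups = set()
groups.add(frozenset({'a', 'b'}))   # immutable collection: allowed
groups.add(('c', 'd'))              # tuple of hashable items: allowed

try:
    groups.add(['e', 'f'])          # list: unhashable, raises TypeError
    added_list = True
except TypeError:
    added_list = False

print(len(groups), added_list)
```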

#### Set Operations
* Add one element
```python
x = {'a', 'b', 'c', 'd'}
x.add('e')
```
* Add multiple elements
```python
x = {'a', 'b', 'c', 'd'}
x.update({'e', 'f'})
```
* Delete an element from the set. If the element doesn't exist, raise a KeyError
```python
x = {'a', 'b', 'c', 'd'}
x.remove('a')
```
* Delete an element from the set. If the element doesn't exist, do nothing (no error is raised)
```python
x = {'a', 'b', 'c', 'd'}
x.discard('a')
```
* Delete one arbitrary element from the set and return it. If the set is empty, raise a KeyError
```python
x = {'a', 'b', 'c', 'd'}
x.pop()
```
* Clear elements from set
```python
x = {'a', 'b', 'c', 'd'}
x.clear()
```
* Union two sets
```python
x = set(['a', 'b', 'c', 'd'])
y = set(['e', 'f'])
x.union(y)
```
* Difference between two sets. To find the difference between two sets, use set1.difference(set2). Be aware that x.difference(y) is different from y.difference(x).
```python
x = set(['Postcard', 'Radio', 'Telegram'])
y = set(['Radio', 'Television'])
x.difference(y)  # {'Postcard', 'Telegram'}
y.difference(x)  # {'Television'}
```
* Subset. To test if a set is a subset, use:
```python
x = set(['a', 'b', 'c', 'd'])
y = set(['c', 'd'])
x.issubset(y)  # False
y.issubset(x)  # True
```
* Super-set. To test if a set is a super-set, use:
```python
x = set(['a', 'b', 'c', 'd'])
y = set(['c', 'd'])
x.issuperset(y)  # True
y.issuperset(x)  # False
```
* Intersection. To test for intersection, use:
```python
x = set(['a', 'b', 'c', 'd'])
y = set(['c', 'd'])
x.intersection(y)  # {'c', 'd'}
y.intersection(x)  # {'c', 'd'}
```

**Operators for sets**  
Sets and frozensets support the following operators:
* key in set1      # containment check
* key not in set1  # non-containment check
* set1 == set2     # set1 is equivalent to set2
* set1 != set2     # set1 is not equivalent to set2
* set1 <= set2     # set1 is subset of set2
* set1 < set2      # set1 is proper subset of set2
* set1 >= set2     # set1 is superset of set2
* set1 > set2      # set1 is proper superset of set2
* set1 | set2      # the union of set1 and set2
* set1 & set2      # the intersection of set1 and set2
* set1 - set2      # the set of elements in set1 but not in set2
* set1 ^ set2      # the set of elements in precisely one of set1 or set2
* x in my_set      # membership

#### Set Comprehension
Similar to list comprehension:  
```python
[val for val in collection if condition]
```
Change `[]` to `{}`:
```python
{expr for value in collection if condition}
```

For example:  
```python
squared = {x**2 for x in [1, 1, 2]}
print(squared)
# {1, 4}
```

#### Question 5: Two Sum
Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice.
```
A = [1, 2, 0, 5, 8, 10], target = 15
idx  0  1  2  3  4  5
Return (3, 5)
```

In [6]:
class Solution(object):
    def two_sum(self, nums, target):
        if not nums:
            return ()
        
        my_dict = dict()
        for idx in range(len(nums)):
            num = nums[idx]
            if target - num in my_dict:
                return (my_dict[target-num], idx)
            else:
                my_dict[num] = idx
                
        return ()

if __name__ == "__main__":
    soln = Solution()
    print(soln.two_sum(nums=[1, 2, 0, 5, 8, 10], target=15))
    print(soln.two_sum(nums=[1, 2, 0, 5, 8, 10], target=14))

(3, 5)
()


#### Question 5\* (Follow up): Two Sum with a sorted array

In [7]:
class Solution(object):
    def two_sum(self, nums, target):
        if not nums:
            return ()
        
        left, right = 0, len(nums) - 1
        while left < right:
            if nums[left] + nums[right] < target:
                left += 1
            elif nums[left] + nums[right] > target:
                right -= 1
            else:
                return (left, right)
            
        return ()
    
if __name__ == "__main__":
    soln = Solution()
    # the two-pointer approach requires sorted input
    print(soln.two_sum(nums=[0, 1, 2, 5, 8, 10], target=15))
    print(soln.two_sum(nums=[0, 1, 2, 5, 8, 10], target=14))

(3, 5)
()


#### Question 5\*\* (Follow Up): 3 Sum

Given an unsorted integer array, check whether any three of its numbers sum to a target.

input: array, target

Example
```
array = [1, 4, 7, 10, 13], target 18
```

<font color='blue'>Solution 1: </font>  keep the array unsorted and reduce to the unsorted 2 sum problem.
```
array = [1, 4, 7, 10, 13]
         i
```
Fix i; the sub-problem then becomes an unsorted 2 sum problem on the remaining elements.

Time $O(n^2)$, Space $O(n)$
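
Solution 1 can be sketched as follows. The function name `three_sum_hash` is hypothetical, and the hash set is rebuilt for every fixed `i`, which keeps the extra space at $O(n)$.

```python
def three_sum_hash(nums, target):
    """Return True if three entries at distinct indices sum to target."""
    for i in range(len(nums) - 2):
        seen = set()                      # values at indices strictly between i and j
        remaining = target - nums[i]      # the unsorted 2 sum target for this i
        for j in range(i + 1, len(nums)):
            if remaining - nums[j] in seen:
                return True
            seen.add(nums[j])
    return False


print(three_sum_hash([1, 4, 7, 10, 13], 18))
print(three_sum_hash([1, 4, 7, 10, 13], 100))
```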

<font color='blue'>Solution 2: </font>  sort first, then reduce to the sorted 2 sum problem.
```
array = [1, 4, 7, 10, 13]
         i
```
Why sort first? Sorting takes $O(n\log n)$, which is dominated by $O(n^2)$, so it does not change the time complexity of the whole problem.

Time $O(n^2)$, Space $O(1)$

In [13]:
class Solution(object):
    def three_sum(self, nums, target):
        if not nums:
            return False
        
        nums.sort()
        
        for index in range(0, len(nums)-2):
            left, right = index + 1, len(nums) - 1
            while left < right:
                temp_sum = nums[index] + nums[left] + nums[right]
                # print(temp_sum, nums[index], nums[left], nums[right])
                if temp_sum < target:
                    left += 1
                elif temp_sum > target:
                    right -= 1
                else:
                    return True

        return False
        
if __name__ == "__main__":
    soln = Solution()
    print(soln.three_sum(nums=[1, 2, 0, 5, 8, 10], target=13))
    print(soln.three_sum(nums=[1, 4, 7, 10, 13], target=18))

True
True


#### Question 5\*\*\* (Follow Up): 4 Sum

Given an unsorted integer array, check whether any four of its numbers sum to a target.

input: array, target

Example
```
array = [1, 4, 7, 10, 13], target 22
```

<font color='blue'>Solution 1: </font>  keep the array unsorted and reduce to the unsorted 3 sum problem.  
```
array = [1, 4, 7, 10, 13]
         i
```
Fix i; the sub-problem then becomes an unsorted 3 sum problem.
```python
for i in range(0, len(arr) - 3):
    for j in range(i + 1, len(arr) - 2):
        # start 2 sum on arr[j+1:] with target - arr[i] - arr[j]
        pass
```
Time $O(n^3)$, Space $O(n)$

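<font color='blue'>Solution 2: </font>  keep the array unsorted and reduce to the unsorted 2 sum problem.  

The problem becomes: how to find two pair sums that add up to target
1. First build all the pairs
2. Then either sort the pair sums or use a hash set (the two sum method)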

```
[1,  4,  7,  10, 13]

step 1: find all pairs 
[ 1 + 4, 1 + 7, 1 + 10, 1 + 13,
  4 + 7, 4 +10, 4 + 13,
  7 +10, 7 +13,
  10+13
]

step 2: run two sum on the array of pairs
```

Potential caveat: the same number may be used twice, e.g. the pairs 1+4 and 1+7 both use 1. How do we detect a reused number?

```python
class TwoSumPair(object):
    # number1, number2
    # sum
    # index1, index2
    def __init__(self, index1, index2):
        self.index1 = index1
        self.index2 = index2
        self.number1 = array[index1]
        self.number2 = array[index2]
        self.sum = self.number1 + self.number2
        
pair_list = [TwoSumPair(0,1), TwoSumPair(0,2), TwoSumPair(0,3), TwoSumPair(0,4), ...,
                              TwoSumPair(1,2), TwoSumPair(1,3), TwoSumPair(1,4), ...,
                                               TwoSumPair(2,3), TwoSumPair(2,4), ...
            ]
```
1. Step 1: Create the array of pairs (TwoSumPair objects), $O(n^2)$
2. Step 2.1: Run two sum on the array of pairs, $O(n^2)$; the array of pairs has about $n^2$ entries because $C(n,2)$ ~ $n^2$
3. Step 2.2: Check for a duplicate index, $O(1)$  
  e.g. `pair1.index1 != pair2.index1 and pair1.index1 != pair2.index2 and pair1.index2 != pair2.index1 and pair1.index2 != pair2.index2`
  
Time $O(n^2)$, Space $O(n^2)$
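
One possible way to realize the pair-sum idea is sketched below. Note that it swaps the explicit index comparison of Step 2.2 for an ordering trick: a pair sum is stored only after both of its indices lie to the left of the pair currently being examined, so the two pairs can never share an index. The function name `four_sum_exists` is hypothetical.

```python
def four_sum_exists(array, target):
    """Return True if four entries at distinct indices sum to target."""
    seen_pair_sums = set()   # sums of pairs (i, j) with i < j < current index
    n = len(array)
    for j in range(n):
        # Treat (j, k) with k > j as the right-hand pair and look up its complement
        for k in range(j + 1, n):
            if target - array[j] - array[k] in seen_pair_sums:
                return True
        # Only now store the pairs ending at j, for use by later iterations
        for i in range(j):
            seen_pair_sums.add(array[i] + array[j])
    return False


print(four_sum_exists([1, 4, 7, 10, 13], 22))
```

Both nested loops run $O(n^2)$ times in total and the set holds at most $C(n,2)$ sums, which matches the bounds stated above (assuming average $O(1)$ set operations).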

#### Question 5\*: 2 Difference

Example
```
[1,  4,  6,  10, 13], target, check if any two numbers diff == target
```

Note that:
* |1 - 4| = |4 - 1|

<font color='blue'>Solution 1: </font> assume sorted, slow and fast pointers moving in the same direction  
```
[1,  4,  6,  10, 13], diff_target = 4
         i
              j
```

Time $O(n)$, Space $O(1)$

<font color='blue'>Solution 2: </font> unsorted, hashtable
* step 1: create HashSet
* step 2: loop over the array and check whether array[i] - target or array[i] + target is in the set:
  * If yes, return true
  * If no, add array[i] into the HashSet

Time $O(n)$, Space $O(n)$

In [16]:
class Solution(object):
    def two_difference_1(self, array, target):
        if not array:
            return (-1, -1)
        
        slow = 0
        fast = 1
        while fast < len(array):
            if array[fast] - array[slow] < target:
                fast += 1
            elif array[fast] - array[slow] > target:
                slow += 1
            else:
                return (slow, fast)
            
        return (-1, -1)
    
    def two_difference_2(self, array, target):
        if not array:
            return (-1, -1)
        
        hashtable = {}
        for index, num in enumerate(array):
            if (num - target) in hashtable:
                return (hashtable[num - target], index)
            elif (num + target) in hashtable:
                return (index, hashtable[num + target])
            else:
                hashtable[num] = index
                
        return (-1, -1)
    
if __name__ == "__main__":
    soln = Solution()
    
    print(soln.two_difference_1(array=[1, 4, 6, 10, 13], target=4))
    print(soln.two_difference_2(array=[1, 4, 6, 10, 13], target=4))

(2, 3)
(2, 3)


#### Question 6: Continuous Subarrays
Given an array of integers and an integer k, find the total number of continuous subarrays whose sum equals k.

Example:
```
nums = [   1, 6,  5,  2,  3, 4,   0], k = 7, 
return 4
```

*Solution*:
```
Explanation:
nums =       [   1, 6,  5,  2,  3, 4,   0], k = 7, 
prefix_sum = [0, 1, 7, 12, 14, 17, 21, 21]
                           i        j
prefix_sum[j] - prefix_sum[i] == target, which is equivalent to
prefix_sum[j] - target == prefix_sum[i]
save prefix_sum[i] to a dictionary and then check if (prefix_sum[j]-target) in dict
```

In [17]:
class Solution(object):
    def subarray_sum(self, nums, target):
        if not nums:
            return 0
        
        count = 0
        sums = {}
        sums[0] = 1
        cur_sum = 0
        
        for idx, num in enumerate(nums):
            cur_sum = cur_sum + num
            # print(cur_sum, cur_sum-target, sums.keys(), cur_sum-target in sums)
            if cur_sum - target in sums:
                count += sums[cur_sum-target]
            sums[cur_sum] = sums.get(cur_sum, 0) + 1
            
        return count
    
# Time Complexity: O(n)
# Space Complexity: O(n)

if __name__ == "__main__":
    soln = Solution()
    print(soln.subarray_sum(nums=[1, 6, 5, 2, 3, 4, 0], target=7))
    print(soln.subarray_sum(nums=[1, 6, 0, 5, 2, 3, 4, 0], target=7))

4
6


* Given an element x in a list, you need to spend $O(n)$ to find the location of x.
* Given a key x in a hashtable, you need to spend $O(1)$ to find the location of x.

**Hash Collision**  
Given two keys, x!=y, if hash(x) == hash(y) ==> collision

Handle Hash Collision:
1. Open addressing
2. Separate chaining

[Hash Table Reference](https://en.wikipedia.org/wiki/Hash_table)
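
A collision can be forced on purpose to see that a dictionary still behaves correctly. In the sketch below, `BadKey` is a hypothetical key type whose hash is deliberately constant, so every instance collides; the dictionary falls back on `__eq__` to tell the keys apart.

```python
class BadKey:
    """Hypothetical key type whose hash is deliberately constant."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42                      # every instance collides

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.name == other.name


x, y = BadKey('x'), BadKey('y')
table = {x: 1, y: 2}                   # x != y but hash(x) == hash(y)
print(hash(x) == hash(y), table[x], table[y], len(table))
```

Lookups remain correct under collisions, but with many colliding keys they degrade from $O(1)$ toward $O(n)$.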

#### Question 7: Top K Frequent Words
For a composition containing many different words, find the top-k most frequent words.

Example:
```
['A','A','A','B','B','C', 'D', 'D', 'D', 'D'], k = 2 
==> {'A':3, 'B':2, 'C':1, 'D':4} ==> ['A', 'D']
```


*Solution 1:*  
dictionary + heap  
{key = word_i: value = counter for word_i}  
For example
```
{'A':3, 'B':2, 'C':1, 'D':4}
heap (-4, 'D') (-3, 'A') (-2, 'B') (-1, 'C')
```

*Solution 2:*  
1. Step 1: read the composition and count the frequency of each word using a hash_table.
2. Step 2: build a min-heap from the first k elements of the hash_table.
3. Step 3: iterate over the (k+1)-th to the n-th element, and update the min-heap as follows:
  * if the i-th element > min_heap.top(), then remove the top and insert the i-th element.
  * otherwise, do nothing.

For example
```
{'A':3, 'B':2, 'C':1, 'D':4}, k = 2
heap (1, 'C') (2, 'B') (3, 'A') (4, 'D')
```
Time Complexity: $O(k + (n-k)\log{k})$

In [18]:
import heapq

class Solution(object):
    def get_top_k_words_1(self, words, k):
        if not words:
            return []
        
        freqs = {}
        freqs_heap = []
        top_list = []
        
        # from list to dictionary
        for word in words:
            freqs[word] = freqs.get(word, 0) + 1
        
        # push to heap
        for key, val in freqs.items():
            heapq.heappush(freqs_heap, (-val, key))
            
        # top k
        for idx in range(0, k):
            top_list.append(heapq.heappop(freqs_heap)[1])
            
        return top_list
    
    # Time Complexity: O(n + nlogn + klogn)
    # O(n) for creating the dictionary
    # O(nlogn) for n calls to heapq.heappush (heapq.heapify on a full list would be O(n))
    # O(klogn) for k calls to heapq.heappop. logn is the height of the heap tree
    
    def get_top_k_words_2(self, words, k):
        if not words:
            return []
        
        freqs = {}
        freqs_heap = []
        top_list = []
        
        # from list to dictionary
        for word in words:
            freqs[word] = freqs.get(word, 0) + 1
       
        for key, val in freqs.items():
            # push to heap
            heapq.heappush(freqs_heap, (val, key))
            # pop
            if len(freqs_heap) > k:
                heapq.heappop(freqs_heap)
                            
        # top k
        for idx in range(0, k):
            top_list.append(heapq.heappop(freqs_heap)[1])
            
        return top_list
    
    # Time Complexity: O(n + nlogk)
    # O(n) for creating the dictionary
    # O(logk) for each heappush / heappop, since the heap never holds more than k + 1 entries
    # The result is ordered from least to most frequent among the top k
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.get_top_k_words_1(words=['A','A','A','B','B','C', 'D', 'D', 'D', 'D'], k=2))
    print(soln.get_top_k_words_2(words=['A','A','A','B','B','C', 'D', 'D', 'D', 'D'], k=2))

['D', 'A']
['A', 'D']


#### Question 8: Palindromic Permutations
Given a string, determine if a permutation of the string could form a palindrome.  
For example, "edified" can be permuted to form "deified".

A palindrome is a word that reads the same forwards and backwards, e.g. 'level', 'rotator'.


Example 1:
```
Input: "code"
Output: false
```
Example 2:
```
Input: "aab"
Output: true
```
Example 3:
```
Input: "carerac"
Output: true
```

*Solution*: 

At most one character may appear an odd number of times.

In [19]:
class Solution(object):
    def is_palindrome(self, word):
        if not word:
            return True
        
        freqs = {}
        
        for char in word:
            freqs[char] = freqs.get(char, 0) + 1
            
        odd_count = 0
        for key, val in freqs.items():
            if val % 2 == 1:
                odd_count += 1
                if odd_count > 1:
                    return False
                
        return True
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.is_palindrome(word="code"))
    print(soln.is_palindrome(word="aab"))
    print(soln.is_palindrome(word="carerac"))

False
True
True


#### Question 9: Find the length of longest subarray with distinct entries.

Example:
```
Input: arr = [f, s, f, e, t, w, e, n, w, e]
Output: the longest distinct subarray is [s, f, e, t, w]
```

*Solution 1*: Brute-Force  
For each subarray, test if all its elements are distinct using a hash table. 

How many subarrays? $O(n^2)$, and average length of subarray is $O(n)$, total time is $O(n^3)$.

*Solution 2*: expand outward from a center, to the left and to the right, $O(n^2)$

*Solution 3*:  
1.[f, _s, f, e, t_, w, e, n, w, e]  
2.[f, _s, f, e, t, w_, e, n, w, e]  
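Suppose we know the longest duplicate-free subarray that ends at index k. Then the longest duplicate-free subarray ending at k+1 is:  
1) the subarray ending at k plus the element at k+1, if the element at k+1 does not appear in the longest duplicate-free subarray ending at k;  
2) otherwise, the subarray that starts one position after the earlier occurrence of that element (inside the subarray ending at k) and ends at k+1.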

Running Example:
```
arr = [f, s, f, e, t, w, e, n, w, e]
idx    0  1  2  3  4  5  6  7  8  9
start = 0, k = 2, maxlen = 0    : f --> 0, s --> 1
start = 1, k = 2, maxlen = 2    : f --> 2, s --> 1
start = 1, k = 6, maxlen = 6-1=5: f --> 2, s --> 1, e --> 3, t --> 4, w --> 5
start = 4, k = 6, maxlen = 6-1=5: f --> 2, s --> 1, e --> 6, t --> 4, w --> 5
start = 4, k = 8, maxlen = 6-1=5: f --> 2, s --> 1, e --> 6, t --> 4, w --> 5, n --> 7
start = 6, k = 8, maxlen = 6-1=5: f --> 2, s --> 1, e --> 6, t --> 4, w --> 8, n --> 7
start = 7, k = 9, maxlen = 6-1=5: f --> 2, s --> 1, e --> 9, t --> 4, w --> 8, n --> 7

Corner case: maxlen = max(maxlen, len(arr)-start) = max(5,3) = 5
```

In [3]:
class Solution(object):
    def longest_subarray(self, arr):
        recent_occur = {}
        start = 0  # or left = 0
        max_len = 0

        for index, element in enumerate(arr):
            if element in recent_occur:
                if recent_occur[element] >= start:
                    max_len = max(max_len, index - start)
                    start = recent_occur[element] + 1
            recent_occur[element] = index

        max_len = max(max_len, len(arr) - start)
        return max_len

    # Time Complexity: O(n)
    # Space Complexity: O(d)  d -- # distinct elements in the array
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.longest_subarray(arr=['f', 's', 'f', 'e', 't', 'w', 'e', 'n', 'w', 'e']))

5


#### Question 10: Given a nested dictionary as follows:

```python
data = {'one': {'label': 'This is shot 001', 'start': 1, 'end': 10},
        'two': {'label': 'This is shot 002', 'start': 11, 'end': 25},
        'three': 'This is shot 003'
       }
```
print the unfolded version of each element.  
E.g. 
```python
'one->label->This is shot 001', 'one->start->1', 'one->end->10',
'two->label->This is shot 002', 'two->start->11', 'two->end->25', ...
```


<font color="blue">*Solution*:</font>      
Actually, this can be viewed as a tree structure 
```
              root (data)
          /        |        \
        one        two      three
    /    |    \
 label  start  end
   |     |      |
 (str)  (1)    (10)  <----- they are not dictionary
```
The unfolded version of each element is a pre-order traversal of the tree.

Time Complexity: $O(n)$, since each node is visited once  
Space Complexity: $O(h)$ where h is the height of the tree

In [5]:
class Solution(object):
    def nest_dict(self, data):
        # Base Case: the type of data is not a dict
        if not isinstance(data, dict):
            return [str(data)]  # wrapped in a list so the caller can loop over it like any other result

        # Recursion Part
        result = []
        for key, val in data.items():
            # what to get from your children
            children_data = self.nest_dict(val)
            # what to do in the current stage
            for element in children_data:
                result.append(str(key) + '->' + element)

        # what to return to your parent
        return result
    
if __name__ == "__main__":
    soln = Solution()
    data = {'one': {'label': 'This is shot 001', 'start': 1, 'end': 10},
        'two': {'label': 'This is shot 002', 'start': 11, 'end': 25},
        'three': 'This is shot 003'
       }
    print(soln.nest_dict(data))

['one->label->This is shot 001', 'one->start->1', 'one->end->10', 'two->label->This is shot 002', 'two->start->11', 'two->end->25', 'three->This is shot 003']


#### Question 11: Find the nearest repeated entries in an array.

Write code that takes as input an array and finds the distance between a closest pair of equal entries.  
E.g.
```python
arr = ['All','work','and','no','play','makes','for','no','work','no','fun', 'and','no', 'results']
                                                     i           j    
```

In [9]:
# Dictionary: word -> index of last appearance

class Solution(object):
    
    def nearest_repeat(self, arr):
        counter = dict()
        minimum = len(arr)

        for i, element in enumerate(arr):
            if element in counter:
                minimum = min(minimum, i - counter[element])
            counter[element] = i

        return minimum

# Time Complexity: O(n),  n -- length of the arr
# Space Complexity: O(d), d -- #distinct entries in arr
    
if __name__ == "__main__":
    soln = Solution()
    arr = ['All','work','and','no','play','makes','for','no','work','no','fun', 'and','no', 'results']
    print(soln.nearest_repeat(arr))

2


#### Question 12: Find the length of a longest contained range.
Given an integer array, find the size of a largest subset of integers in the array having the property that if two integers are in the subset, then so are all integers between them.

Example
```python
arr = [3, -2, 7, 9, 8, 1, 2, 0, -1, 5, 8]
the biggest contained range is [-2,-1,0,1,2,3], return 6
```

<font color="blue">*Solution 1*: Brute Force Time Complexity $O(n\log n)$</font>     
1. sort the array, $O(n\log n)$, 
2. search from each position, $O(n)$
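
The sort-based approach might look like the sketch below. The function name `longest_contained_range_sorted` is hypothetical, and duplicates are dropped with a set before sorting (an extra $O(n)$ of space) so that repeated values such as the two 8s do not break a run.

```python
def longest_contained_range_sorted(arr):
    """Length of the longest run of consecutive integers, via sorting."""
    if not arr:
        return 0
    values = sorted(set(arr))          # drop duplicates, then sort: O(n log n)
    best = current = 1
    # One linear pass: extend the run while neighbours differ by exactly 1
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur == prev + 1 else 1
        best = max(best, current)
    return best


print(longest_contained_range_sorted([3, -2, 7, 9, 8, 1, 2, 0, -1, 5, 8]))
```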

<font color="blue">*Solution 2*: expand outward from a center, to the left and to the right. Time Complexity $O(n)$</font>
1. convert the list to a set named sets, $O(n)$
2. iterate from the first element to the last one in the set, $O(n)$
  * pick up an element by sets.pop(), e.g., 3, and expand outward from it
    * search for the left -- whether 3-1=2 in the set, Yes, sets.remove(2)
      * search for the left again -- whether 1  in the set, Yes, sets.remove(1)
      * search for the left again -- whether 0  in the set, Yes, sets.remove(0)
      * search for the left again -- whether -1 in the set, Yes, sets.remove(-1)
      * search for the left again -- whether -2  in the set, Yes, sets.remove(-2)
      * search for the left again -- whether -3  in the set, No, Stop
    * search for the right -- whether 3+1=4 in the set, No, stop
  * pick up the next element by sets.pop()
  * ......

In [11]:
class Solution(object):
    def longest_contained_range(self, arr):
        sets = set(arr)
        max_len = 0

        # expand from a center: consume the set one element at a time
        while sets:
            elem = sets.pop()

            # expand to the left
            lower = elem - 1
            while lower in sets:
                sets.remove(lower)
                lower = lower - 1

            # expand to the right
            upper = elem + 1
            while upper in sets:
                sets.remove(upper)
                upper = upper + 1

            max_len = max(max_len, upper - lower - 1)

        return max_len    

# Time Complexity: O(n)
# Space Complexity: O(d), d -- #distinct elements in array

if __name__ == "__main__":
    soln = Solution()
    arr = [3, -2, 7, 9, 8, 1, 2, 0, -1, 5, 8]
    print(soln.longest_contained_range(arr))

6


#### Question 13: [Laicode 68 Medium] [Missing Number I](https://app.laicode.io/app/problem/68)
Given an integer array of size N - 1, containing all the numbers from 1 to N except one, find the missing number.

Assumptions
* The given array is not null, and N >= 1

Examples
```
A = {2, 1, 4}, the missing number is 3
A = {1, 2, 3}, the missing number is 4
A = {}, the missing number is 1
```

In [12]:
class Solution(object):
    def missing(self, array):
        """
        input: int[] array
        return: int
        Time Complexity: O(n)
        Space Complexity: O(n)
        """
        # write your solution here
        if not array:
            return 1
        
        sets = set(array)
        N = len(array) + 1
        
        for item in range(1, N + 1):
            if item not in sets:
                return item
            
        return -1

    
    def missing_2(self, array):
        """
        input: int[] array
        return: int
        Time Complexity: O(n)
        Space Complexity: O(1)
        """
        # write your solution here
        if not array:
            return 1
        
        N = len(array) + 1
        
        sum_expected = (N * ( 1 + N)) // 2
        sum_actual = sum(array)
        diff = sum_expected - sum_actual
        
        return diff
    
if __name__ == "__main__":
    soln = Solution()
    print(soln.missing(array=[2, 1, 4]))
    print(soln.missing(array=[1, 2, 3]))
    print(soln.missing(array=[]))

    print(soln.missing_2(array=[2, 1, 4]))
    print(soln.missing_2(array=[1, 2, 3]))
    print(soln.missing_2(array=[]))

3
4
1
3
4
1


## 6.2: Leetcode Training (Basic)

[Leetcode 0001 Easy] [Two Sum](Leetcode_0001.ipynb) (HashTable)

[Leetcode 0003 Medium] [Longest Substring Without Repeating Characters](Leetcode_0003.ipynb) (HashTable)

## 6.3: Leetcode Practice (Advanced)