#### [Leetcode 480 Hard] [Sliding Window Median](https://leetcode.com/problems/sliding-window-median/)

The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.

Examples:
```
[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5
```

Given an array nums, there is a sliding window of size k that moves from the very left of the array to the very right. You can only see the k numbers in the window, and each time the sliding window moves right by one position. Your job is to output the median of each window, as an array.

For example, given nums = [1,3,-1,-3,5,3,6,7] and k = 3:
```
Window position                Median
---------------                ------
[1  3  -1] -3  5  3  6  7       1
 1 [3  -1  -3] 5  3  6  7       -1
 1  3 [-1  -3  5] 3  6  7       -1
 1  3  -1 [-3  5  3] 6  7       3
 1  3  -1  -3 [5  3  6] 7       5
 1  3  -1  -3  5 [3  6  7]      6
```
Therefore, return the median sliding window as [1,-1,-1,3,5,6].

Note: 
* You may assume k is always valid, i.e., k is always smaller than the input array's size for a non-empty array.
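As a cross-check of the definition and the worked example, Python's standard-library `statistics.median` follows the same odd/even rule, so calling it once per window gives a slow but simple oracle. This is a sketch for verification only; `reference_medians` is a hypothetical helper name, not part of any solution below.

```python
import statistics

# Definition check on the two examples from the problem statement.
odd_median = statistics.median([2, 3, 4])   # middle element of the sorted list
even_median = statistics.median([2, 3])     # mean of the two middle elements

def reference_medians(nums, k):
    # Hypothetical oracle: recompute the median of every window from scratch.
    return [statistics.median(nums[i:i + k]) for i in range(len(nums) - k + 1)]

print(reference_medians([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [1, -1, -1, 3, 5, 6]
```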

<img src="source/Leetcode_480_0.png">

<font color='blue'>Solution: </font> Brute Force  
* Time Complexity: O((n-k+1) k log k)
* Space Complexity: O(k)

```python
class Solution(object):
    def medianSlidingWindow(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[float]
        """
        results = []

        # Re-sort every window from scratch: O(k log k) per window.
        for index in range(0, len(nums) - k + 1):
            window = sorted(nums[index : index + k])
            if k % 2 == 1:
                median = window[k // 2]
            else:
                # Even k: mean of the two middle values.
                median = (window[k // 2 - 1] + window[k // 2]) / 2
            results.append(median)

        return results
```

```python
soln = Solution()
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=3))
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=4))
```

Output:
```
[1, -1, -1, 3, 5, 6]
[0.0, 1.0, 1.0, 4.0, 5.5]
```


<font color='blue'>Solution: </font> Insertion Sort  
* Time Complexity: O((n-k+1) k)
* Space Complexity: O(k)

<img src="source/Leetcode_480_1.png">

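```python
import bisect

class Solution:
    def medianSlidingWindow(self, nums, k):
        if k == 0:
            return []
        ans = []
        window = sorted(nums[0:k])  # the window is kept sorted at all times
        for i in range(k, len(nums) + 1):
            # Works for odd and even k: both indices coincide when k is odd.
            ans.append((window[k // 2] + window[(k - 1) // 2]) / 2.0)
            if i == len(nums):
                break
            # Remove the outgoing element nums[i - k] ...
            index = bisect.bisect_left(window, nums[i - k])  # O(log k) search
            window.pop(index)                                 # O(k) shift
            # ... and insert the incoming element nums[i] in sorted position.
            bisect.insort_left(window, nums[i])               # O(k) shift
        return ans
```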

```python
soln = Solution()
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=3))
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=4))
```

Output:
```
[1.0, -1.0, -1.0, 3.0, 5.0, 6.0]
[0.0, 1.0, 1.0, 4.0, 5.5]
```


<font color='blue'>Solution: </font> [Max and Min Heap](https://leetcode.com/problems/sliding-window-median/discuss/262689/Python-Small-and-Large-Heaps)    
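* Time Complexity: O(n log n) in the worst case; out-of-window entries are removed lazily, so a heap can hold up to O(n) entries
* Space Complexity: O(n) in the worst case, for the same reason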

To calculate the median, we can divide the window's elements into two halves, small and large, such that every element in small is no larger than any element in large. If small and large have the same size, the median is (largest in small + smallest in large) / 2; if large has one more element than small, the median is the smallest element in large.

Thus, we can keep small in a max-heap and large in a min-heap: the largest element of small and the smallest element of large are then available at the heap tops in O(1), and each push or pop costs logarithmic time.

We maintain the two invariants, "size(large) - size(small) is 0 or 1" and "smallest in large >= largest in small", by moving heap tops: once size(large) - size(small) > 1, we pop the smallest element from large and push it onto small, and vice versa when size(small) > size(large).
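To see the two invariants in isolation, here is a minimal sketch (not part of the sliding-window solution) that computes the median of a fixed, non-empty list with the same small/large heap pair. It routes every value through small into large, a common variant that keeps the ordering invariant automatically, so only the "large is too big" rebalance is ever needed; `two_heap_median` is a hypothetical helper name.

```python
import heapq

def two_heap_median(values):
    # small: max-heap simulated with negated values; large: min-heap.
    small, large = [], []
    for v in values:
        # Pass v through small so the largest of small migrates to large;
        # this keeps every element of small <= every element of large.
        heapq.heappush(small, -v)
        heapq.heappush(large, -heapq.heappop(small))
        # Rebalance: large may hold at most one extra element.
        if len(large) - len(small) > 1:
            heapq.heappush(small, -heapq.heappop(large))
    if len(large) > len(small):
        return float(large[0])            # odd count: smallest in large
    return (large[0] - small[0]) / 2.0    # even count: -small[0] is largest in small

print(two_heap_median([7, -2, 10, 4]))  # 5.5
```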

Besides, since this is a sliding-window median, we need to know when an element leaves the window, so we also push each element's index into the heap; each entry takes the form (val, index). Since Python's heapq is a min-heap, we turn small into a max-heap by pushing (-val, index).

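Initially, we push all of the first k elements into small and then move k - k//2 (that is, ceil(k/2)) elements from small to large, so that large holds the extra element when k is odd.
Then we can initialize the answer array as [large[0][0] if k & 1 else (large[0][0] - small[0][0]) / 2], as discussed above; because small stores negated values, subtracting small[0][0] adds the largest element of small.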
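The initialization step can be checked on its own with a small sketch that inlines the solution's `move` logic (pop from one heap, flip the sign, push to the other); `first_window_median` is a hypothetical helper name.

```python
import heapq

def first_window_median(nums, k):
    small, large = [], []
    # Push the first k elements into small as (-val, index): a max-heap on val.
    for i, n in enumerate(nums[:k]):
        heapq.heappush(small, (-n, i))
    # Move ceil(k/2) = k - k//2 entries to large, flipping the sign back.
    for _ in range(k - k // 2):
        neg_val, i = heapq.heappop(small)
        heapq.heappush(large, (-neg_val, i))
    if k % 2 == 1:
        return float(large[0][0])
    # small[0][0] is the negated largest value in small.
    return (large[0][0] - small[0][0]) / 2.0

print(first_window_median([1, 3, -1, -3, 5, 3, 6, 7], 3))  # 1.0
```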

For the remaining iterations, each step adds a new element x whose index is i+k and removes the old element nums[i], which has left the window. Then we calculate the median of the current window the same way as before.
If the incoming x is no smaller than large[0], it belongs in the large heap. If the outgoing nums[i] is no larger than large[0], it is treated as belonging to the small heap. In that case we add one element to large while removing one from small, which unbalances the heaps, so we move large[0] to small to rebalance them.
The reverse applies when we have to add one element to small while removing one from large.

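However, we don't have to remove the outgoing element right away. As long as nums[i] is neither small[0] nor large[0], it has no effect on the median calculation. So we defer the removal and use a while loop to discard out-of-window entries only when they reach the top of small or large. This also makes the whole logic clearer. Note that the heap lengths then include stale entries, which is why the balance is tracked through the add/remove logic above rather than with len().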
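Lazy deletion is easiest to see on a simpler problem. The sketch below applies the same (value, index) trick to a sliding-window minimum with a single min-heap; it illustrates the technique only and is not part of the median solution. `lazy_window_min` is a hypothetical name.

```python
import heapq

def lazy_window_min(nums, k):
    heap, out = [], []
    for i, n in enumerate(nums):
        heapq.heappush(heap, (n, i))
        # The window is nums[i-k+1 .. i]. Stale entries (index <= i-k) may sit
        # anywhere in the heap; only a stale top affects the answer, so pop
        # until the top is inside the window. The entry just pushed is always
        # in the window, so the heap never empties here.
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        if i >= k - 1:
            out.append(heap[0][0])
    return out

print(lazy_window_min([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [-1, -3, -3, -3, 3, 3]
```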

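```python
import heapq

class Solution(object):
    def medianSlidingWindow(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[float]
        """
        # small: max-heap of (-val, index); large: min-heap of (val, index).
        small, large = [], []

        for i, n in enumerate(nums[:k]):
            heapq.heappush(small, (-n, i))
        # Move ceil(k/2) entries to large, so large holds the extra one for odd k.
        for _ in range(k - (k >> 1)):
            self.move(small, large)

        ans = [large[0][0] * 1. if k & 1 else (large[0][0] - small[0][0]) / 2.]
        # i is the index of the outgoing element nums[i]; n = nums[i + k] is incoming.
        for i, n in enumerate(nums[k:]):
            if n >= large[0][0]:
                heapq.heappush(large, (n, i + k))
                # Outgoing element is treated as part of small: large now has one extra.
                if nums[i] <= large[0][0]:
                    self.move(large, small)
            else:
                heapq.heappush(small, (-n, i + k))
                # Outgoing element is treated as part of large: small now has one extra.
                if nums[i] >= large[0][0]:
                    self.move(small, large)
            # Lazy deletion: discard out-of-window entries only once they reach the top.
            while small and small[0][1] <= i:
                heapq.heappop(small)
            while large and large[0][1] <= i:
                heapq.heappop(large)
            ans.append(large[0][0] * 1. if k & 1 else (large[0][0] - small[0][0]) / 2.)
        return ans

    def move(self, h1, h2):
        # Pop the top of h1 and push it onto h2, flipping the sign of the value.
        x, i = heapq.heappop(h1)
        heapq.heappush(h2, (-x, i))
```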

```python
soln = Solution()
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=3))
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=4))
```

Output:
```
[1.0, -1.0, -1.0, 3.0, 5.0, 6.0]
[0.0, 1.0, 1.0, 4.0, 5.5]
```


<font color='blue'>Solution: </font> BST  
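* Time Complexity: O((n-k+1) log k) while the tree stays balanced; this BST does not rebalance itself, so the worst case is O((n-k+1) k)
* Space Complexity: O(k)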

<img src="source/Leetcode_480_2.png">

The solution is to store the k elements of the current window in a BST. Each BST node has two extra attributes: dup stores the number of duplicates of the node's value, and left_count stores the number of elements (counting duplicates) in the node's left subtree.

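Three methods of the BST class are implemented: insert adds a new value to the tree, delete removes a value from the tree, and find_kth finds the k-th smallest value in the tree. Each time the window moves, we delete the outgoing value, insert the incoming one, and use find_kth to find the median. Each of these methods runs in time proportional to the tree's height, which is O(log k) when the tree stays balanced, giving O(N log k) overall; since this BST does not rebalance itself, adversarial inputs such as sorted arrays can make the height, and therefore each operation, O(k).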

```python
class TreeNode(object):
    def __init__(self, val):
        self.val = val
        self.dup = 1          # number of copies of val stored in this node
        self.left_count = 0   # number of elements (with duplicates) in the left subtree
        self.left = None
        self.right = None

    def __repr__(self):
        return "val: %r dup: %r small: %r" % (self.val, self.dup, self.left_count)


class BST(object):
    def __init__(self):
        self.root = None
        self.count = 0  # total number of elements, counting duplicates

    def insert(self, num):
        # Insert number num into the BST
        self.count += 1

        if self.root is None:
            self.root = TreeNode(num)
            return

        prev, cur = None, self.root
        while cur is not None:
            prev = cur
            if cur.val == num:
                cur.dup += 1
                return
            elif cur.val < num:
                cur = cur.right
            else:
                # num goes into cur's left subtree, so update left_count
                cur.left_count += 1
                cur = cur.left
        if prev.val > num:
            prev.left = TreeNode(num)
        else:
            prev.right = TreeNode(num)

    def delete(self, num):
        # Delete one copy of num from the BST; num is guaranteed to be present
        self.count -= 1

        prev, cur = None, self.root
        right_dir = True  # whether cur is prev's right child
        while cur.val != num:
            if cur.val > num:
                # num lies in cur's left subtree, which loses one element
                cur.left_count -= 1
                prev, cur = cur, cur.left
                right_dir = False
            else:
                prev, cur = cur, cur.right
                right_dir = True

        # Found the node; if it holds duplicates, just drop one copy
        if cur.dup > 1:
            cur.dup -= 1
            return

        # Node with no children or one child: splice it out
        if cur.left is None or cur.right is None:
            child = cur.left if cur.left is not None else cur.right
            if cur is self.root:
                self.root = child
            elif right_dir:
                prev.right = child
            else:
                prev.left = child
            return

        # Node with two children: copy in the in-order successor, then unlink it
        next_prev, next_cur = cur, cur.right
        while next_cur.left is not None:
            next_prev, next_cur = next_cur, next_cur.left

        cur.val = next_cur.val
        cur.dup = next_cur.dup
        if next_cur is cur.right:
            cur.right = next_cur.right
        else:
            next_prev.left = next_cur.right
            # Nodes on the path cur.right -> next_prev had the successor in their
            # left subtree, so their left_count shrinks by the successor's dup
            p = cur.right
            while p is not None and p.val >= next_prev.val:
                p.left_count -= cur.dup
                p = p.left

    def find_median(self):
        k = (self.count + 1) // 2
        if self.count % 2 == 1:
            return float(self.find_kth(k))
        return (self.find_kth(k) + self.find_kth(k + 1)) / 2.0

    def find_kth(self, k):
        # Find the k-th smallest element in the tree (1-indexed)
        return self.find_kth_help(k, self.root)

    def find_kth_help(self, k, node):
        if node.left_count < k <= node.left_count + node.dup:
            return node.val
        if k <= node.left_count:
            return self.find_kth_help(k, node.left)
        return self.find_kth_help(k - node.left_count - node.dup, node.right)


class Solution(object):
    def medianSlidingWindow(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[float]
        """
        if k == 0:
            return []
        if k == 1:
            return [i + 0. for i in nums]

        tree = BST()
        for i in range(k):
            tree.insert(nums[i])

        result = []
        for i in range(k, len(nums)):
            result.append(tree.find_median())
            tree.delete(nums[i - k])
            tree.insert(nums[i])
        result.append(tree.find_median())
        return result
```

```python
soln = Solution()
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=3))
print(soln.medianSlidingWindow(nums=[1,3,-1,-3,5,3,6,7], k=4))
```

Output:
```
[1.0, -1.0, -1.0, 3.0, 5.0, 6.0]
[0.0, 1.0, 1.0, 4.0, 5.5]
```


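#### Follow-up:

Mean within a sliding window. Personally, I think this problem is a variant of LeetCode 480, except that the median is replaced by the mean after removing the largest and smallest 10%.
The input is an array `array` and an integer k, with 10 <= k <= array.length; k is the width of the sliding window. For each sliding window, compute the mean of the numbers that remain after removing the largest 10% and the smallest 10%, and return these means in an array.
It can be solved with a min-heap/max-heap or with a BST, but the latter is the optimal solution.
Since Python has no TreeMap, I went straight for the first approach... and then the interviewer said outright that the BST was the optimal solution.
In the end, with hints, I described the BST approach, and then time ran out.

====  
Given a list, maintain a sublist of length k that moves one position at a time; after each move, compute the mean of the sublist once its largest 10% and smallest 10% of numbers are removed.... The best algorithm uses a tree (the post suggests `collections.OrderedDict`, although that container keeps insertion order, not sorted order).

My idea is to keep 3 TreeSets: the lowest 10%, the middle, and the highest 10%. Then each time the window moves, both the deletion on the left and the insertion on the right take O(log k), so the total complexity is O(N log k)?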
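The three-partition idea above can be sketched as follows, under stated assumptions. "10%" is taken as m = k // 10 values trimmed from each end (the posts do not say how to round), and because Python's standard library has no TreeSet, plain sorted lists maintained with `bisect` stand in for the three trees. Each update therefore costs O(k) for list shifting rather than O(log k); a balanced tree would be needed to reach the O(N log k) bound. `trimmed_window_means` is a hypothetical name.

```python
import bisect

def trimmed_window_means(nums, k):
    # Assumption: trim m = k // 10 values from each end of every window.
    m = k // 10
    s = sorted(nums[:k])
    # Sorted lists standing in for the three TreeSets: low (m smallest),
    # mid (the kept values), high (m largest).
    low, mid, high = s[:m], s[m:k - m], s[k - m:]
    mid_sum = sum(mid)  # running sum of the kept values
    out = [mid_sum / len(mid)]
    for i in range(k, len(nums)):
        x, y = nums[i - k], nums[i]
        # 1) Remove outgoing x from a part that is guaranteed to hold a copy.
        if low and x <= low[-1]:
            low.pop(bisect.bisect_left(low, x))
        elif high and x >= high[0]:
            high.pop(bisect.bisect_left(high, x))
        else:
            mid.pop(bisect.bisect_left(mid, x))
            mid_sum -= x
        # 2) Insert incoming y into the part its value belongs to.
        if low and y < low[-1]:
            bisect.insort(low, y)
        elif high and y > high[0]:
            bisect.insort(high, y)
        else:
            bisect.insort(mid, y)
            mid_sum += y
        # 3) Rebalance so low and high hold exactly m values each;
        #    moving boundary elements keeps all three lists sorted.
        while len(low) > m:
            v = low.pop(); mid.insert(0, v); mid_sum += v
        while len(high) > m:
            v = high.pop(0); mid.append(v); mid_sum += v
        while len(low) < m:
            v = mid.pop(0); low.append(v); mid_sum -= v
        while len(high) < m:
            v = mid.pop(); high.insert(0, v); mid_sum -= v
        out.append(mid_sum / len(mid))
    return out

print(trimmed_window_means(list(range(1, 21)), 10)[:3])  # [5.5, 6.5, 7.5]
```

Keeping a running sum of the middle part means each mean is read off in O(1) once the partition is updated; only the container operations would need to change to get the logarithmic bound.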