### Median Maintenance
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 3 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting $x_i$ denote the $i$th number of the file, the $k$th median $m_k$ is defined as the median of the numbers $x_1, \ldots, x_k$. (So, if $k$ is odd, then $m_k$ is the $((k+1)/2)$th smallest number among $x_1, \ldots, x_k$; if $k$ is even, then $m_k$ is the $(k/2)$th smallest number among $x_1, \ldots, x_k$.)

In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute $(m_1 + m_2 + m_3 + \cdots + m_{10000}) \bmod 10000$.
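
As a reference point for the definition above, here is a minimal brute-force sketch that re-sorts the prefix at every step. The function names `kth_median` and `median_sum_bruteforce` are illustrative and not part of the assignment; the approach is far slower than a heap-based one, but it is handy for checking a faster implementation on small inputs.

```python
def kth_median(prefix):
    """Median as defined in the problem statement.

    For odd k this is the ((k+1)/2)th smallest value, for even k the (k/2)th
    smallest. Both cases map to the 0-based index (k - 1) // 2 in sorted order.
    """
    k = len(prefix)
    return sorted(prefix)[(k - 1) // 2]


def median_sum_bruteforce(stream, modulus=10000):
    """Sum the running medians of `stream` and reduce modulo `modulus`."""
    total = 0
    seen = []
    for x in stream:
        seen.append(x)
        total += kth_median(seen)
    return total % modulus


# Running medians of 5, 1, 4, 2, 3 are 5, 1, 4, 2, 3, so the sum is 15.
print(median_sum_bruteforce([5, 1, 4, 2, 3]))  # 15
```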

OPTIONAL EXERCISE: Compare the performance achieved by heap-based and search-tree-based implementations of the algorithm.
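
One possible baseline for such a comparison is sketched below. Note that it is not a search tree: it keeps a plain sorted Python list and inserts with the standard library's `bisect.insort`, which finds the position by binary search but still shifts list elements in O(n). It only stands in for an ordered container that supports lookup by rank; a balanced search tree with subtree sizes would be needed for O(log n) per insert. The function name `running_medians_sorted_list` is illustrative.

```python
from bisect import insort


def running_medians_sorted_list(stream):
    """Return the list of running medians, keeping all seen values sorted."""
    ordered = []
    medians = []
    for x in stream:
        # Binary search for the insertion point, then an O(n) shift in the list.
        insort(ordered, x)
        # Same rank as the problem definition: 0-based index (k - 1) // 2.
        medians.append(ordered[(len(ordered) - 1) // 2])
    return medians


print(running_medians_sorted_list([42, 21, 31]))  # [42, 21, 31]
```

Timing this against a heap-based version with `time.perf_counter` on the same input would be one way to start the comparison.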

In [1]:
import sys
sys.path.insert(1, '/Users/gauravbakale/projects/practice/Algorithms')

from common.heaps import HeapMax, HeapMin

In [None]:
# QC: heap min
h = HeapMin()

# insert
items = list(range(0,32))
for i in items:
    h.insert(i) 
h.display()

# insert
h.insert(100)
h.insert(-100)
h.display()

# extract min 
print(h.extract_min())
h.display()
print(h.extract_min())
h.display()

# delete 
h.delete(20)
h.display()

# assert heap property
h.assert_heap_property()

In [None]:
# QC: heap max
h = HeapMax()

# insert
items = list(range(0,32))
for i in items:
    h.insert(i)
h.display()

# insert
h.insert(-100)
h.insert(100)
h.display()

# extract max
print(h.extract_max())
h.display()
print(h.extract_max())
h.display()

# delete 
h.delete(20)
h.display()

# assert heap property
h.assert_heap_property()

In [2]:
# median maintenance class

class MedianMaintain:
    """
        Maintain the running median of a stream of values. Each insert costs O(log n),
        and the current median is then read from a heap root.

        Let the left heap (hl) be a HeapMax holding the smaller half of the data and the
        right heap (hr) be a HeapMin holding the larger half.
        We maintain the invariant that the sizes of hl and hr differ by at most 1.
        How do we do that?
            - Suppose hl and hr hold 3 elements each. A new element i is added to hr if it is
            greater than the root of hl, otherwise to hl. If it goes to hr, hl has 3 elements
            and hr has 4.
            - If the next element j is also greater than the root of hl and is added to hr, then
            hl has 3 elements and hr has 5, which breaks the invariant. Therefore, after inserting
            j into hr, we extract the root of hr and push it onto hl. Each heap again holds ~n/2
            elements, and the median can be retrieved from the root nodes.

    """
    def __init__(self):
        self.hl = HeapMax()
        self.hr = HeapMin()
    
    def fetch_median(self):

        # calculate the difference in heap sizes
        elements_count = len(self.hl)-len(self.hr)
        
        # if both heaps have the same number of elements
        if elements_count == 0:
            return self.hl.root_node()
        # if hl has 1 extra element
        if elements_count == 1:
            return self.hl.root_node()
        # if hr has 1 extra element
        if elements_count == -1:
            return self.hr.root_node()        

    def insert(self,i):

        # when the heaps are empty
        if len(self.hl) == 0:
            self.hl.insert(i)
            return i
        
        # insert the new value to the appropriate heap
        if i > self.hl.root_node():
            self.hr.insert(i)
        else:
            self.hl.insert(i)
        
        # calculate the difference in heap sizes
        elements_count = len(self.hl)-len(self.hr)
        
        # if the invariant is broken
        if elements_count == 2:
            hl_root_node = self.hl.extract_max()
            self.hr.insert(hl_root_node)
        if elements_count == -2:
            hr_root_node = self.hr.extract_min()
            self.hl.insert(hr_root_node)
        
        if abs(len(self.hl) - len(self.hr)) > 1:
            raise AssertionError("The insert operation is breaking the ~n/2 invariant")
        
        return self.fetch_median()
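

For readers without the local `common.heaps` module, the same two-heap idea can be sketched with the standard library's `heapq`. This is an independent sketch, not the class above: `heapq` only provides a min-heap, so the lower half is stored as negated values, and the invariant is simplified so that the lower half always has the same size as the upper half or exactly one more element, which puts the median at the root of the lower half at all times. The class name `HeapqMedian` is hypothetical.

```python
import heapq


class HeapqMedian:
    """Running median with two heaps built on heapq (illustrative sketch)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half, stored as negated values
        self._high = []  # min-heap of the larger half

    def insert(self, x):
        """Add x to the stream and return the current median."""
        # Route x to the lower half if it does not exceed that half's maximum.
        if not self._low or x <= -self._low[0]:
            heapq.heappush(self._low, -x)
        else:
            heapq.heappush(self._high, x)

        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

        # The root of the lower half is the ((k+1)/2)th or (k/2)th smallest value.
        return -self._low[0]


m = HeapqMedian()
print([m.insert(v) for v in [5, 1, 4, 2, 3]])  # [5, 1, 4, 2, 3]
```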


In [3]:
# QC: MedianMaintain class
from random import Random
rand = Random()

mm = MedianMaintain()

lst = []
for _ in range(20):
    val = rand.randint(-20,50)
    lst.append(val)
    med = mm.insert(val)

    print(sorted(lst), " : ", med)



[42]  :  42
[21, 42]  :  21
[21, 31, 42]  :  31
[21, 31, 31, 42]  :  31
[9, 21, 31, 31, 42]  :  31
[9, 21, 28, 31, 31, 42]  :  28
[9, 21, 28, 31, 31, 36, 42]  :  31
[-5, 9, 21, 28, 31, 31, 36, 42]  :  28
[-5, 9, 21, 28, 31, 31, 36, 42, 45]  :  31
[-5, 1, 9, 21, 28, 31, 31, 36, 42, 45]  :  28
[-5, 1, 9, 21, 28, 31, 31, 36, 42, 45, 45]  :  31
[-5, 1, 9, 21, 28, 31, 31, 31, 36, 42, 45, 45]  :  31
[-5, 1, 9, 21, 28, 31, 31, 31, 36, 36, 42, 45, 45]  :  31
[-5, 1, 9, 21, 28, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-5, -2, 1, 9, 21, 28, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-5, -2, 1, 9, 21, 21, 28, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-5, -2, 1, 3, 9, 21, 21, 28, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-5, -2, 1, 3, 9, 21, 21, 28, 31, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-12, -5, -2, 1, 3, 9, 21, 21, 28, 31, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  31
[-12, -5, -5, -2, 1, 3, 9, 21, 21, 28, 31, 31, 31, 31, 35, 36, 36, 42, 45, 45]  :  28


### Assignment

In [5]:
nums = []
with open('Median.txt','r') as file:
    for row in file:
        nums.append(int(row))

In [9]:
mm = MedianMaintain()

sum_of_medians = 0

for num in nums:
    median = mm.insert(num)
    sum_of_medians+=median

In [24]:
sum_of_medians%10000

1213

### Solution = *1213*