# 295. Find Median from Data Stream
The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.

For example,
[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) - Add an integer from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.
 

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3) 
findMedian() -> 2
 

Follow up:

If all integer numbers from the stream are between 0 and 100, how would you optimize it?
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

## Approach 1: Simple Sorting
Intuition

Do what the question says.


Store the numbers in a resizable container. Every time the median is requested, sort the container and return its middle value (or the mean of the two middle values).
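
A minimal sketch of this approach, assuming a plain Python list as the resizable container. The class name `SortingMedianFinder` is hypothetical and chosen only for illustration.

```python
class SortingMedianFinder:
    """Approach 1 sketch: append on add, sort on every median query."""

    def __init__(self):
        self.nums = []  # resizable container, not kept sorted between adds

    def addNum(self, num):
        self.nums.append(num)  # amortized O(1)

    def findMedian(self):
        self.nums.sort()  # in-place comparison sort
        n = len(self.nums)
        mid = n // 2
        if n % 2 == 1:
            # Odd count: a single middle value exists.
            return float(self.nums[mid])
        # Even count: mean of the two middle values.
        return (self.nums[mid - 1] + self.nums[mid]) / 2.0


finder = SortingMedianFinder()
finder.addNum(1)
finder.addNum(2)
first = finder.findMedian()
finder.addNum(3)
second = finder.findMedian()
```

With the inputs from the example in the problem statement, `first` is 1.5 and `second` is 2.0.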

Complexity Analysis

> Time complexity: O(n log n) + O(1) ≈ O(n log n).

Adding a number takes amortized O(1) time for a container with an efficient resizing scheme.
Finding the median is dominated by the sort, which takes O(n log n) time for a standard comparison sort.
> Space complexity: O(n) linear space to hold input in a container. No extra space other than that needed (since sorting can usually be done in-place).

## Approach 2: Insertion Sort
Intuition

Keep the input container sorted at all times (i.e., maintain sortedness as an invariant of the container).

Algorithm

Which algorithm allows a number to be added to a sorted list of numbers and yet keeps the entire list sorted? Well, for one, insertion sort!

We assume that the current list is already sorted. When a new number comes, we have to add it to the list while maintaining the sorted nature of the list. This is achieved easily by finding the correct place to insert the incoming number, using a binary search (remember, the list is always sorted). Once the position is found, we need to shift all higher elements by one space to make room for the incoming number.

This method works well when the number of insertion queries is lower than or about the same as the number of median-finding queries.
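
The binary search plus shift described above can be sketched with `bisect.insort` from the Python standard library, which does both in one call. The class name `SortedListMedianFinder` is hypothetical; this is an illustration under that assumption, not a reference implementation.

```python
from bisect import insort


class SortedListMedianFinder:
    """Approach 2 sketch: keep the list sorted on every insertion."""

    def __init__(self):
        self.nums = []  # invariant: always sorted in ascending order

    def addNum(self, num):
        # insort finds the position by binary search (O(log n)) and then
        # inserts, which shifts the larger elements (O(n)).
        insort(self.nums, num)

    def findMedian(self):
        # No sorting needed here: the invariant gives O(1) access.
        n = len(self.nums)
        mid = n // 2
        if n % 2 == 1:
            return float(self.nums[mid])
        return (self.nums[mid - 1] + self.nums[mid]) / 2.0


finder = SortedListMedianFinder()
for value in [5, 1, 3]:
    finder.addNum(value)
median_of_three = finder.findMedian()   # list is [1, 3, 5]
finder.addNum(2)
median_of_four = finder.findMedian()    # list is [1, 2, 3, 5]
```

Here `median_of_three` is 3.0 and `median_of_four` is 2.5.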

Complexity Analysis

> Time complexity: O(n)+O(logn) ≈ O(n).

Binary Search takes O(logn) time to find correct insertion position.
Insertion can take up to O(n) time since elements have to be shifted inside the container to make room for the new element.
> Space complexity: O(n) linear space to hold input in a container.

## Topic analysis
This problem gives us a data stream and asks for its median. For dynamic data such as a stream, storing the values in an array and re-sorting it every time a new value arrives is inefficient.

The data structures generally used for processing dynamic data are stacks, queues, binary trees, and heaps.

In this problem, we use a heap.

First, the data is split into two parts: a max-heap holding the smaller half and a min-heap holding the larger half, so every value in the max-heap is no larger than any value in the min-heap.

To keep the data evenly distributed between the two heaps, their sizes must never differ by more than one during the dynamic operations.

To ensure that every value in the max-heap is no larger than the smallest value in the min-heap, each newly added number must be compared against the top of the max-heap or the min-heap as it is inserted.

I keep two heaps (or priority queues):

> Max-heap small has the smaller half of the numbers.

> Min-heap large has the larger half of the numbers.

This gives me direct access to the one or two middle values (they're the tops of the heaps), so getting the median takes O(1) time. And adding a number takes O(log n) time.

Supporting both min- and max-heaps is more or less cumbersome, depending on the language, so I simply negate the numbers in the heap in which I want the reverse of the default order. To prevent this from causing a bug with -2^31 (which negated is itself, when using 32-bit ints), I use integer types larger than 32 bits.

Note that `heapq` in Python is a min-heap, so we need to negate the values in the smaller half to mimic a max-heap.

Using larger integer types also prevents an overflow error when taking the mean of the two middle numbers.
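
The negation trick can be sketched in isolation as follows. The variable names are illustrative only. Python integers have arbitrary precision, so the 32-bit overflow concern mentioned above does not arise in this language.

```python
import heapq

# heapq only provides a min-heap, so store negated values:
# the smallest stored value corresponds to the largest original value.
max_heap = []
for value in [3, 1, 4, 1, 5]:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]               # peek at the maximum without removing it
popped = -heapq.heappop(max_heap)    # remove and return the maximum
next_largest = -max_heap[0]          # the new maximum after the pop
```

After this runs, `largest` and `popped` are both 5 and `next_largest` is 4.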

Before a new number is added, there are two possible scenarios (n numbers in total, k = n / 2 with integer division):

1. length of (small, large) == (k, k)
2. length of (small, large) == (k, k + 1)

After adding the number, for a total of n + 1 numbers, they become:

1. length of (small, large) == (k, k + 1)
2. length of (small, large) == (k + 1, k + 1)

Take the first scenario as an example. We know large will gain one item and small will stay the same size, but we cannot simply push the new number into large, because it might belong in the smaller half. Instead, we push the new number into small, pop the maximum item from small, and push that item into large (every push and pop here is a heappush or heappop). The second scenario is symmetric: push into large, pop its minimum, and push that into small. Applying these operations in the two scenarios preserves the invariant.



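from heapq import heappush, heappushpop

class MedianFinder(object):

    def __init__(self):
        # Max-heap emulated with negated values: the smaller half.
        # -small[0] is the largest value in the smaller half.
        self.small = []
        # Min-heap: the larger half. large[0] is its smallest value.
        self.large = []

    def addNum(self, num):
        if len(self.small) == len(self.large):
            # Equal sizes: pass num through small so that large receives
            # small's maximum (returned here in negated form).
            max_from_small = heappushpop(self.small, -num)
            heappush(self.large, -max_from_small)
        else:
            # large has one extra item: pass num through large so that
            # small receives large's minimum.
            min_from_large = heappushpop(self.large, num)
            heappush(self.small, -min_from_large)

    def findMedian(self):
        # Assumes at least one number has been added.
        if len(self.small) == len(self.large):
            # Even count: mean of the two tops (small[0] is negated).
            return float(self.large[0] - self.small[0]) / 2.0
        else:
            # Odd count: large holds the extra, middle item.
            return float(self.large[0])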
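
One way to gain confidence in the two-heap scheme is to compare it against `statistics.median` on random input. The sketch below restates the same logic as a generator so that it is self-contained. The function name `running_medians`, the seed, and the value range are arbitrary choices made for illustration.

```python
import random
from heapq import heappush, heappushpop
from statistics import median


def running_medians(nums):
    """Yield the median after each insertion, using the two-heap scheme."""
    small, large = [], []  # small stores negated values (max-heap emulation)
    for num in nums:
        if len(small) == len(large):
            # Route num through small; large gains small's maximum.
            heappush(large, -heappushpop(small, -num))
        else:
            # Route num through large; small gains large's minimum.
            heappush(small, -heappushpop(large, num))
        if len(small) == len(large):
            yield (large[0] - small[0]) / 2.0
        else:
            yield float(large[0])


random.seed(0)  # arbitrary seed, only to make the run repeatable
data = [random.randint(-1000, 1000) for _ in range(200)]
results = list(running_medians(data))

# Indices where the two-heap median disagrees with the reference median.
mismatches = [i for i, m in enumerate(results) if m != median(data[:i + 1])]
```

If the invariant holds, `mismatches` is empty. The reference check recomputes the median of every prefix from scratch, so it is far slower than the heap version and is only meant for testing.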