# Heapsort

**Heapsort** is another comparison-based sorting algorithm. Like merge sort, but unlike insertion sort, heapsort’s running time is O(n lg n). Like insertion sort, but unlike merge sort, heapsort sorts in place: only a constant number of array elements are stored outside the input array at any time. Thus, heapsort combines the better attributes of the two sorting algorithms we have already discussed.

## Heaps

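A (binary) heap is a nearly complete binary tree stored in an array: every level is full except possibly the lowest, which is filled from the left. The root of the tree is A[1], and given the index i of a node, we can easily compute the indices of its parent, left child, and right child:

parent: floor(i/2) (`i//2` in Python)

left: 2*i

right: 2*i+1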

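**Max heap**: A[parent(i)] >= A[i] for every node i other than the root, so the largest element sits at the root. Application: heapsort

**Min heap**: A[parent(i)] <= A[i] for every node i other than the root, so the smallest element sits at the root. Application: priority queue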
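The max-heap property can be checked directly from the index formulas. The following is a minimal sketch, assuming the 1-indexed layout used in these notes (A[0] is an unused placeholder); the helper name `is_max_heap` is made up for illustration.

```python
def is_max_heap(A, size=None):
    """Return True if A[1..size] satisfies A[parent(i)] >= A[i] for all i > 1."""
    if size is None:
        size = len(A) - 1  # A[0] is a placeholder, so the heap is A[1..len(A)-1]
    # the parent of node i is i // 2; the root (i == 1) has no parent
    return all(A[i // 2] >= A[i] for i in range(2, size + 1))
```

For example, `is_max_heap([0, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1])` holds, while the same array with 4 stored above 14 does not.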

**Operations**: 

* The MAX-HEAPIFY procedure, which runs in O(lg n) time, is the key to maintaining the max-heap property.
* The BUILD-MAX-HEAP procedure, which runs in linear time, produces a max-heap from an unordered input array.
* The HEAPSORT procedure, which runs in O(n lg n) time, sorts an array in place.
* The MAX-HEAP-INSERT, HEAP-EXTRACT-MAX, HEAP-INCREASE-KEY, and HEAP-MAXIMUM procedures, which run in O(lg n) time, allow the heap data structure to implement a priority queue.
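
The linear bound for BUILD-MAX-HEAP is not obvious, since it makes about n/2 calls to an O(lg n) procedure. A sketch of the standard argument: an n-element heap has height $\lfloor \lg n \rfloor$ and at most $\lceil n/2^{h+1} \rceil$ nodes of height $h$, and MAX-HEAPIFY on a node of height $h$ costs $O(h)$. Summing over all heights:

$$\sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h) = O\left(n \sum_{h=0}^{\infty} \frac{h}{2^{h}}\right) = O(n),$$

because $\sum_{h=0}^{\infty} h/2^{h} = 2$. Most nodes are near the bottom of the tree, where MAX-HEAPIFY is cheap.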

## Maxheapify

In [29]:
A=[0,16,4,10,14,7,9,3,2,8,1]

In [1]:
def max_heapify(A,size,i):
    l=2*i
    r=2*i+1
    largest=i
    if l<=size and A[l]>A[i]:
        largest=l
    if r<=size and A[r]>A[largest]:
        largest=r
    if largest!=i:
        tmp=A[i]
        A[i]=A[largest]
        A[largest]=tmp
        max_heapify(A,size,largest)

In [34]:
max_heapify(A,len(A)-1,2)
A

[0, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

In [4]:
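def build_max_heap(A):
    n=len(A)
    # A[0] is a placeholder, so the heap is A[1..n-1]; nodes after (n-1)//2 are leaves.
    # Use // so the division stays an integer in both Python 2 and Python 3.
    for i in range((n-1)//2,0,-1):
        max_heapify(A,n-1,i)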

In [36]:
A=[0,16,4,10,14,7,9,3,2,8,1]
B=[A[0]]+A[1:][::-1]
print(B)
build_max_heap(B)
B

[0, 1, 8, 2, 3, 9, 7, 14, 10, 4, 16]


[0, 16, 10, 14, 4, 9, 7, 2, 3, 1, 8]

## The heapsort algorithm

The heapsort algorithm starts by using build_max_heap to build a max-heap on the input array A[1..n], where $n=A.length$. Since the maximum element of the array is stored at the root A[1], we can put it into its correct final position by exchanging it with A[n]. Because A[n] is now in its final position, we discard it from the heap by decrementing the heap size by 1. The children of the root remain max-heaps, but the new root element might violate the max-heap property. To restore it, we call MAX-HEAPIFY(A,1), which leaves a max-heap in A[1..n-1]. The heapsort algorithm then repeats this process for the max-heap of size n-1 down to a heap of size 2.

In [9]:
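def heapsort(A):
    build_max_heap(A)
    n=len(A)-1
    for i in range(n,1,-1):
        # move the current maximum A[1] to its final position A[i]
        A[1],A[i]=A[i],A[1]
        # the heap now occupies A[1..i-1]; restore the max-heap property at the root
        max_heapify(A,i-1,1)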

In [11]:
A=[0,-5,13,2,25,7,17,20,8,4]
heapsort(A)
A

[0, -5, 2, 4, 7, 8, 13, 17, 20, 25]

In [3]:
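# 0-indexed variant: the root is A[0], so the children of node i are 2*i+1 and 2*i+2.
# size is the index of the last element that still belongs to the heap.
def max_heapify(A,size,i):
    l=2*i+1
    r=2*i+2
    largest=i
    if l<=size and A[l]>A[i]:
        largest=l
    if r<=size and A[r]>A[largest]:
        largest=r
    if largest!=i:
        A[i],A[largest]=A[largest],A[i]
        max_heapify(A,size,largest)

def build_max_heap(A):
    n=len(A)
    # the last node with a child is (n-2)//2; everything after it is a leaf
    for i in range((n-2)//2,-1,-1):
        max_heapify(A,n-1,i)

def heapsort(A):
    build_max_heap(A)
    for i in range(len(A)-1,0,-1):
        # move the current maximum to its final position, then shrink the heap
        A[0],A[i]=A[i],A[0]
        max_heapify(A,i-1,0)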

In [7]:
A=['b','d','e','c','a']
heapsort(A)
A

['a', 'b', 'c', 'd', 'e']

## Priority queues

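An application that uses a priority queue usually needs to know which application object corresponds to each queue element (for example, which job a given priority belongs to). When we use a heap to implement a priority queue, we therefore often need to store a handle to the corresponding application object in each heap element. The exact makeup of the handle (such as a pointer or an integer) depends on the application.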
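
The four priority-queue procedures listed earlier can be sketched on top of the same 1-indexed array. This is an illustrative sketch rather than a reference implementation: the function names mirror the pseudocode names, the heap size is taken to be `len(A) - 1`, and storing `(priority, handle)` tuples is just one assumed way to carry a handle.

```python
def max_heapify(A, size, i):
    """Sift A[i] down within the 1-indexed heap A[1..size]."""
    l, r, largest = 2 * i, 2 * i + 1, i
    if l <= size and A[l] > A[largest]:
        largest = l
    if r <= size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, size, largest)

def heap_maximum(A):
    return A[1]

def heap_extract_max(A):
    if len(A) < 2:
        raise IndexError("heap underflow")
    top = A[1]
    last = A.pop()              # remove the last leaf
    if len(A) > 1:
        A[1] = last             # move it to the root and sift it down
        max_heapify(A, len(A) - 1, 1)
    return top

def heap_increase_key(A, i, key):
    if key < A[i]:
        raise ValueError("new key is smaller than current key")
    A[i] = key
    # sift up while the parent is smaller
    while i > 1 and A[i // 2] < A[i]:
        A[i], A[i // 2] = A[i // 2], A[i]
        i //= 2

def max_heap_insert(A, key):
    A.append(key)               # new leaf, then restore the heap property upward
    heap_increase_key(A, len(A) - 1, key)

# usage: (priority, handle) tuples; the job names are made-up handles
pq = [None]                     # index 0 is the unused placeholder
for item in [(3, "write"), (9, "deploy"), (5, "test")]:
    max_heap_insert(pq, item)
first = heap_extract_max(pq)    # the entry with the highest priority
```

Tuples compare by priority first, so the handle only matters as a tie-breaker here.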

# LeetCode

## 23. Merge k Sorted Lists

Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

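Solution: We can use a priority queue (heap queue) to do the job. With N nodes in total, every node is pushed and popped once on a heap of at most k entries, so the running time is O(N log k).

1. Initialize the heap queue h with the first element of each of the k sorted lists. We need to label the list each element comes from, so h will be a list of tuples (key, list_index).
2. Pop the smallest element from the heap queue and push the next element from the list the popped element came from. Repeat until the heap queue is empty.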

In [55]:
# Definition for singly-linked list.
class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

import heapq

class Solution(object):
    def mergeKLists(self, lists):
        """
        :type lists: List[ListNode]
        :rtype: ListNode
        """
        pq=[]
        sorted_list=[]
        k=len(lists)
        # initialization
        for i in range(k):
            if lists[i]:
                heapq.heappush(pq,(lists[i].val,i))
                lists[i]=lists[i].next
        # pop and push new element
        while pq!=[]:
            current=heapq.heappop(pq)
            index=current[1]
            sorted_list.append(current[0])
            if lists[index]:
                heapq.heappush(pq,(lists[index].val,index))
                lists[index]=lists[index].next
        return sorted_list

In [59]:
head1=ListNode(3)
head2=ListNode(2)
head2.next=ListNode(4)
lists=[head2,head1]
o=Solution()
o.mergeKLists(lists)

[2, 3, 4]
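
The cell above collects the merged values in a Python list. The problem statement asks for a linked list, so here is a sketch of a variant that relinks the existing nodes instead. The function name `merge_k_lists` and the helper `to_list` are made up for this example; the heap entries carry the list index as a tie-breaker so that two nodes are never compared directly.

```python
import heapq

class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

def merge_k_lists(lists):
    heap = []
    for i, node in enumerate(lists):
        if node:
            # at most one node per list is in the heap, so (val, i) is unique
            heapq.heappush(heap, (node.val, i, node))
    dummy = tail = ListNode(None)   # dummy head simplifies appending
    while heap:
        _, i, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next

def to_list(head):
    """Hypothetical helper: copy the values of a linked list into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

head1 = ListNode(3)
head2 = ListNode(2)
head2.next = ListNode(4)
merged = merge_k_lists([head2, head1])
```

Calling `to_list(merged)` walks the result so it can be compared with the output of the cell above.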

## 215. Kth Largest Element in an Array

Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

For example,
Given [3,2,1,5,6,4] and k = 2, return 5.

Note: 
You may assume k is always valid, 1 ≤ k ≤ array's length.

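Solution: If we first sort the list and then read off the k-th largest element, the time complexity is O(n log n). With a binary heap, building the heap takes O(n). A max-heap would then need k pops, which is O(k log n). The code below uses Python's heapq, which is a min-heap, so it pops n-k+1 times instead, for O(n + (n-k) log n) in total. Note that heapify and heappop modify nums in place.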

In [111]:
import heapq

class Solution(object):
    def findKthLargest(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        heapq.heapify(nums)
        for i in range(len(nums)-k):
            heapq.heappop(nums)
        return heapq.heappop(nums)

In [115]:
nums=[1,2,3,4,-1,-2]
o=Solution()
o.findKthLargest(nums,3)

2
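
When k is much smaller than n, an alternative is to keep a min-heap holding only the k largest values seen so far, which takes O(n log k) time and O(k) extra space and leaves the input untouched. This is a sketch of that idea, not the method used in the cell above; `find_kth_largest` is a name chosen for the example.

```python
import heapq

def find_kth_largest(nums, k):
    heap = []                       # min-heap of the k largest values seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k: replace it
            heapq.heapreplace(heap, x)
    return heap[0]                  # the smallest of the k largest
```

The root of the heap is always the k-th largest value among the elements processed so far.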

## 264. Ugly Number II

Write a program to find the n-th ugly number.

Ugly numbers are positive numbers whose prime factors only include 2, 3, 5. For example, 1, 2, 3, 4, 5, 6, 8, 9, 10, 12 is the sequence of the first 10 ugly numbers.

Note that 1 is typically treated as an ugly number.

In [78]:
class Solution(object):
    def nthUglyNumber(self, n):
        """
        :type n: int
        :rtype: int
        """
        if n<=0:
            return None
        # ugly numbers list
        ugly_list=[1]
        # initialize Q2,Q3,Q5
        Q2=[2]
        Q3=[3]
        Q5=[5]
        while len(ugly_list)<n:
            # next minimum
            next_min=min(Q2[0],Q3[0],Q5[0])
            if next_min==Q2[0]:
                ugly_list.append(Q2.pop(0))
                Q2.append(next_min*2)
                Q3.append(next_min*3)
                Q5.append(next_min*5)
            elif next_min==Q3[0]:
                ugly_list.append(Q3.pop(0))
                Q3.append(next_min*3)
                Q5.append(next_min*5)
            else:
                ugly_list.append(Q5.pop(0))
                Q5.append(next_min*5)
        return ugly_list[n-1]

In [79]:
o=Solution()
o.nthUglyNumber(7)

8

In [103]:
class Solution(object):
    def nthUglyNumber(self, n):
        """
        :type n: int
        :rtype: int
        """
        if n==1:
            return 1
        uglies=[1]
        primes=[2,3,5]
        compare_list=[e for e in primes]
        # i2,i3,i5
        index_lists=[0]*len(primes)
        for i in range(1,n):
            min_index=0
            min_ugly=compare_list[0]
            for j in range(len(compare_list)):
                if compare_list[j]<min_ugly:
                    min_index=j
                    min_ugly=compare_list[j]
            uglies.append((min_ugly,min_index))
            index_lists[min_index]+=1
            while uglies[index_lists[min_index]][1]>min_index:
                index_lists[min_index]+=1
            compare_list[min_index]=primes[min_index]*uglies[index_lists[min_index]][0]
        return uglies[-1][0]

In [105]:
o=Solution()
o.nthUglyNumber(12)

16
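
Since these notes are about heaps, the same sequence can also be produced with a min-heap: pop the smallest candidate, then push its multiples by 2, 3 and 5, using a set to avoid pushing the same number twice. This is a sketch of that alternative (not one of the two solutions above), and `nth_ugly_number` is a name chosen for the example.

```python
import heapq

def nth_ugly_number(n):
    heap = [1]
    seen = {1}                      # numbers that have already been pushed
    value = None
    for _ in range(n):
        value = heapq.heappop(heap) # the next ugly number in increasing order
        for p in (2, 3, 5):
            candidate = value * p
            if candidate not in seen:
                seen.add(candidate)
                heapq.heappush(heap, candidate)
    return value
```

Each of the n pops pushes at most three candidates, so the heap holds O(n) entries and the running time is O(n log n).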

## 313. Super Ugly Number

Write a program to find the nth super ugly number.

Super ugly numbers are positive numbers all of whose prime factors are in the given prime list primes of size k. For example, [1, 2, 4, 7, 8, 13, 14, 16, 19, 26, 28, 32] is the sequence of the first 12 super ugly numbers given primes = [2, 7, 13, 19] of size 4.

Note:

(1) 1 is a super ugly number for any given primes.

(2) The given numbers in primes are in ascending order.

(3) 0 < k ≤ 100, 0 < n ≤ $10^6$, 0 < primes[i] < 1000.

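Solution: The first method for Ugly Number II would need one queue per prime, and with k up to 100 the queues together can hold up to about n*k candidates, which is impractical. Instead we generalize the second method: keep a single list of the super ugly numbers found so far, plus one pointer (index_lists) and one candidate (compare_list) per prime.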

In [106]:
class Solution(object):
    def nthSuperUglyNumber(self, n, primes):
        """
        :type n: int
        :type primes: List[int]
        :rtype: int
        """
        if n==1:
            return 1
        uglies=[1]
        compare_list=[e for e in primes]
        index_lists=[0]*len(primes)
        for i in range(1,n):
            min_index=0
            min_ugly=compare_list[0]
            for j in range(len(compare_list)):
                if compare_list[j]<min_ugly:
                    min_index=j
                    min_ugly=compare_list[j]
            uglies.append((min_ugly,min_index))
            index_lists[min_index]+=1
            while uglies[index_lists[min_index]][1]>min_index:
                index_lists[min_index]+=1
            compare_list[min_index]=primes[min_index]*uglies[index_lists[min_index]][0]
        return uglies[-1][0]

In [110]:
o=Solution()
o.nthSuperUglyNumber(9,[2, 7, 13, 19])

19

The above algorithm is O(n*k) in time complexity. It can be improved by keeping compare_list in a binary heap, so that finding the minimum costs O(log k) instead of O(k) and the total time is about O(n log k), but we have to use tuples as elements in compare_list to keep track of the index.
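
A minimal sketch of the heap idea follows. It is a simplified variant rather than a line-by-line port of the cell above: each heap entry is a tuple `(candidate, prime, index)`, and duplicates are skipped by comparing with the last number appended instead of tracking which prime generated each number. The function name is made up for the example.

```python
import heapq

def nth_super_ugly_number(n, primes):
    uglies = [1]
    # (candidate, prime, idx) means candidate == prime * uglies[idx]
    heap = [(p, p, 0) for p in primes]
    heapq.heapify(heap)
    while len(uglies) < n:
        candidate, prime, idx = heapq.heappop(heap)
        if candidate != uglies[-1]:     # candidates pop in nondecreasing order
            uglies.append(candidate)
        # advance this prime's pointer and push its next candidate
        heapq.heappush(heap, (prime * uglies[idx + 1], prime, idx + 1))
    return uglies[-1]
```

Each heap operation costs O(log k). Duplicate candidates are popped and discarded, so the number of heap operations can exceed n.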

## 347. Top K Frequent Elements

Given a non-empty array of integers, return the k most frequent elements.

For example,
Given [1,1,1,2,2,3] and k = 2, return [1,2].

Note: 
* You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
* Your algorithm's time complexity must be better than O(n log n), where n is the array's size.


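Solution: We use a hash table to record the frequency of each number, then heapify a list of (-frequency, number) tuples. Because heapq is a min-heap, negating the frequency makes the most frequent number pop first. Building the table and the heap takes O(n), and the k pops take O(k log n).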

In [3]:
import heapq

class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        freq_dict={}
        heap=[]
        result=[]
        for i in nums:
            if freq_dict.get(i):
                freq_dict[i]+=1
            else:
                freq_dict[i]=1
        for i in freq_dict.keys():
            heap.append((-freq_dict[i],i))
        heapq.heapify(heap)
        for i in range(k):
            result.append(heapq.heappop(heap)[1])
            
        return result

In [4]:
o=Solution()
o.topKFrequent([1,1,1,2,2,3],2)

[1, 2]

A solution from the discussion forum: a concise O(n + k log n) Python solution using a min-heap and a dict. It follows the same approach as the cell above.

In [None]:
import heapq

class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        freq = {}
        freq_list=[]  
        for num in nums:
            if num in freq:
                freq[num] = freq[num] + 1
            else:
                freq[num] = 1
                
        for key in freq.keys():
           
            freq_list.append((-freq[key], key))
        heapq.heapify(freq_list)
        topk = []
        for i in range(0,k):
            topk.append(heapq.heappop(freq_list)[1])
        return topk
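
The same idea can be written more compactly with the standard library, assuming `collections.Counter` for the frequency table and `heapq.nlargest` for the selection. This is a sketch; `top_k_frequent` is a name chosen for the example.

```python
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    counts = Counter(nums)          # number -> frequency
    # iterate over the distinct numbers, ranking them by their frequency
    return heapq.nlargest(k, counts, key=counts.get)
```

`heapq.nlargest` keeps a heap of size k internally, so the selection step is O(m log k) for m distinct numbers.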

## 373. Find K Pairs with Smallest Sums

You are given two integer arrays nums1 and nums2 sorted in ascending order and an integer k.

Define a pair (u,v) which consists of one element from the first array and one element from the second array.

Find the k pairs (u1,v1),(u2,v2) ...(uk,vk) with the smallest sums.

Example 1:
Given nums1 = [1,7,11], nums2 = [2,4,6],  k = 3

Return: [1,2],[1,4],[1,6]

The first 3 pairs are returned from the sequence:
[1,2],[1,4],[1,6],[7,2],[7,4],[11,2],[7,6],[11,4],[11,6]

Example 2:
Given nums1 = [1,1,2], nums2 = [1,2,3],  k = 2

Return: [1,1],[1,1]

The first 2 pairs are returned from the sequence:
[1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3]

Example 3:
Given nums1 = [1,2], nums2 = [3],  k = 3 

Return: [1,3],[2,3]

All possible pairs are returned from the sequence:
[1,3],[2,3]


Solution: We could build a heap containing all $n_1 n_2$ pairs, which takes $O(n_1 n_2)$ time. Then we pop the heap until we have k pairs or the heap becomes empty, which takes $O(k \log(n_1 n_2))$ time and is at most $O(n_1 n_2 \log(n_1 n_2))$ (not a tight bound).

In [31]:
import heapq

class Solution(object):
    def kSmallestPairs(self, nums1, nums2, k):
        """
        :type nums1: List[int]
        :type nums2: List[int]
        :type k: int
        :rtype: List[List[int]]
        """
        result=[]
        heap=[(nums1[i]+nums2[j],[nums1[i],nums2[j]]) for i in range(len(nums1)) for j in range(len(nums2))]
        heapq.heapify(heap)
        while k>0 and heap:
            result.append(heapq.heappop(heap)[1])
            k-=1
            
        return result

In [33]:
o=Solution()
o.kSmallestPairs([1,2],[3],3)

[[1, 3], [2, 3]]

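The above solution gets a Memory Limit Exceeded (MLE) verdict, because it takes $O(n_1 n_2)$ memory. We can do a little better by shrinking the memory to O(k): keep a max-heap (using negated sums) of the k smallest pairs seen so far.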

In [51]:
import heapq

class Solution(object):
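    def kSmallestPairs(self, nums1, nums2, k):
        # create the heap per call: a mutable default argument (heap=[]) would be
        # shared between calls and leak pairs from one call into the next
        heap=[]
        for n1 in nums1:
            for n2 in nums2:
                if len(heap) < k:
                    heapq.heappush(heap, (-n1-n2, [n1, n2]))
                elif heap and -heap[0][0] > n1 + n2:
                    # the new pair beats the largest sum kept so far
                    heapq.heappop(heap)
                    heapq.heappush(heap, (-n1-n2, [n1, n2]))
                else:
                    # nums2 is ascending, so later n2 can only give larger sums
                    break
        # sums are negated, so the pairs pop out largest sum first
        return [heapq.heappop(heap)[1] for _ in range(k) if heap]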

In [52]:
o=Solution()
o.kSmallestPairs([1,7,11],[2,4,6],3)

[[1, 6], [1, 4], [1, 2]]
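
The version above still scans up to n1*n2 pairs. Because both arrays are sorted, a min-heap can instead hold just the frontier: start with each nums1[i] paired with nums2[0], and after popping (i, j) push (i, j+1). The following is a sketch of that approach with a made-up function name; it runs in O(k log k) after seeding and returns the pairs in ascending order of sum.

```python
import heapq

def k_smallest_pairs(nums1, nums2, k):
    if not nums1 or not nums2 or k <= 0:
        return []
    # only the first k rows can contribute to the k smallest pairs
    heap = [(nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1)))]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _, i, j = heapq.heappop(heap)
        result.append([nums1[i], nums2[j]])
        if j + 1 < len(nums2):
            # the next pair in row i is the only new candidate this pop unlocks
            heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
    return result
```

The heap never holds more than min(k, n1) entries, so the memory use is O(k) as well.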

## 378. Kth Smallest Element in a Sorted Matrix

Given a n x n matrix where each of the rows and columns are sorted in ascending order, find the kth smallest element in the matrix.

Note that it is the kth smallest element in the sorted order, not the kth distinct element.

Example:

matrix = [
   [ 1,  5,  9],
   [10, 11, 13],
   [12, 13, 15]
],
k = 8,

return 13.

Note: 

You may assume k is always valid, $1 ≤ k ≤ n^2$.



Solution: The naive solution ignores the sorted structure of the matrix and pushes all n^2 elements into a heap, which costs O(n^2) time and O(n^2) space. We can improve on it by keeping a heap of size O(n): seed a min-heap with the first row, then repeatedly pop the smallest element and push the element directly below it in the same column. Since every column is sorted, the k-th pop is the k-th smallest element. Building the heap takes O(n), and the k pops and pushes take O(k log n) time.

In [None]:
import heapq

class Solution(object):
    def kthSmallest(self, matrix, k):
        """
        :type matrix: List[List[int]]
        :type k: int
        :rtype: int
        """
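        # seed a min-heap with the first row; entries are (value, row, column)
        kheap=[(matrix[0][j],0,j) for j in range(len(matrix[0]))]
        heapq.heapify(kheap)
        result=matrix[0][0]
        for _ in range(k):
            # heappop/heappush are module functions, not list methods
            result,row,col=heapq.heappop(kheap)
            if row<len(matrix)-1:
                # push the element just below the popped one (same column)
                heapq.heappush(kheap,(matrix[row+1][col],row+1,col))
                
        return result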
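
Because each row is itself sorted, the same k-way merge can also be delegated to the standard library. This is a sketch using `heapq.merge`, which lazily merges sorted iterables with a heap, and `itertools.islice` to stop after k elements; `kth_smallest` is a name chosen for the example.

```python
import heapq
from itertools import islice

def kth_smallest(matrix, k):
    merged = heapq.merge(*matrix)           # lazy merge of the sorted rows
    # skip the first k-1 elements and take the next one
    return next(islice(merged, k - 1, None))

matrix = [
    [1, 5, 9],
    [10, 11, 13],
    [12, 13, 15],
]
answer = kth_smallest(matrix, 8)
```

This only relies on the rows being sorted, so it also works for matrices whose columns are not sorted.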

In [12]:
a=[(1,3),(1,2)]
heapq.heapify(a)
a

[(1, 2), (1, 3)]

## 451. Sort Characters By Frequency

Given a string, sort it in decreasing order based on the frequency of characters.

Example 1:

Input:
"tree"

Output:
"eert"

Explanation:
'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input:
"cccaaa"

Output:
"cccaaa"

Explanation:
Both 'c' and 'a' appear three times, so "aaaccc" is also a valid answer.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input:
"Aabb"

Output:
"bbAa"

Explanation:
"bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.


In [24]:
import heapq

class Solution(object):
    def frequencySort(self, s):
        """
        :type s: str
        :rtype: str
        """
        freq_dict={}
        heap=[]
        result=[]
        for c in s:
            if freq_dict.get(c):
                freq_dict[c]+=1
            else:
                freq_dict[c]=1
        for c in freq_dict.keys():
            heap.append((-freq_dict[c],c))
        heapq.heapify(heap)
        while heap:
            (minus_freq,c)=heapq.heappop(heap)
            result.append((-minus_freq)*c)
            
        return ''.join(result)

In [25]:
o=Solution()
o.frequencySort("raaeaedere")

'eeeeaaarrd'

Alternatively, there is an O(n) solution using buckets: create an array indexed by frequency (from 0 to len(s)), put each character into the bucket for its frequency, and then read the array backwards from the highest frequency to the lowest.

In [26]:
class Solution(object):
    def frequencySort(self, s):
        """
        :type s: str
        :rtype: str
        """
        freq_dict={}
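        # one bucket per possible frequency 0..len(s); each bucket is a list so that
        # characters with the same frequency do not overwrite each other
        bucket=[[] for i in range(len(s)+1)]
        result=[]
        
        # create hashtable
        for c in s:
            if freq_dict.get(c):
                freq_dict[c]+=1
            else:
                freq_dict[c]=1
        
        # place each character in the bucket for its frequency
        for c in freq_dict.keys():
            bucket[freq_dict[c]].append(c)
        # read the buckets from the highest frequency down to 1
        for i in range(len(bucket)-1,0,-1):
            for c in bucket[i]:
                result.append(i*c)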
            
        return ''.join(result)

In [27]:
o=Solution()
o.frequencySort("raaeaedere")

'eeeeaaarrd'