# **Assignment 3 Solutions** #

**Delivery Instructions**:  Similar to assignment 2. See this [**Canvas announcement**](https://njit.instructure.com/courses/11882/discussion_topics/42914) for more details. 



### **Q1. A task scheduling problem** ###

The input for this problem is a set of $n$ tasks. Task $i$ has a start time and an end time $(s_i, t_i)$. The goal is to complete the **maximum possible number of tasks** under this constraint: if we choose to perform some task $i$, then no other task that overlaps with it can be performed. In other words, a task that starts or ends strictly between $s_i$ and $t_i$ cannot be performed. 

**(i)** For this problem you should give an algorithm that returns a set of tasks of the maximum possible size that can be performed. You should target an algorithm that keeps augmenting a list of tasks by a simple 'greedy' criterion. Please describe your algorithm in a text cell.  

**(ii)** You should also give an implementation of the algorithm. You can assume that the input is given in the format specified in the following code cell.











In [None]:

# Tasks = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]

We first sort the tasks according to their end time. Then the idea is to keep picking the earliest-ending task, as long as it does not intersect with the ones we have already picked. Because the tasks are sorted, it suffices to compare each task's start time with the end time of the last task picked, so this can be done in a single pass over the list.

The overall running time is $O(n \log n)$ due to the sort. 

In [None]:
Tasks = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]

sorted(Tasks, key = lambda task :task[1])  # this is how to sort by the second number of the tuple (end time)

[(1, 4),
 (3, 5),
 (0, 6),
 (5, 7),
 (3, 8),
 (5, 9),
 (6, 10),
 (8, 11),
 (8, 12),
 (2, 13),
 (12, 14)]

In [None]:
def taskScheduler(Tasks):
  if not Tasks:                 # nothing to schedule
    return []

  Tasks = sorted(Tasks, key = lambda task: task[1])  # sort by end time

  last_task_added = Tasks[0]    # the earliest-ending task is always picked first
  ToDo = [last_task_added]
  
  for j in range(1,len(Tasks)):

    task = Tasks[j]                     # next possible task 
    if task[0]>= last_task_added[1]:    # check if task starts after the end of previous task
      ToDo.append(task)
      last_task_added = task
  
  return ToDo


# check how it works
taskScheduler(Tasks)

[(1, 4), (5, 7), (8, 11), (12, 14)]
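As a sanity check on the greedy result above, the following sketch tries every subset of the sample tasks and reports the size of the largest compatible one. The names `compatible`, `brute_force_max`, and `sample_tasks` are illustrative and not part of the assignment. The check uses the same convention as `taskScheduler`: a task may start exactly when the previous one ends. It is exponential in $n$, so it is only meant for small inputs like this one.

```python
from itertools import combinations

sample_tasks = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9),
                (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]

def compatible(subset):
    """True if no two tasks in the subset overlap."""
    ordered = sorted(subset, key=lambda task: task[1])   # sort by end time
    # after sorting, it is enough to check consecutive pairs
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))

def brute_force_max(tasks):
    """Size of the largest compatible subset, found by exhaustive search."""
    for k in range(len(tasks), 0, -1):          # try the largest sizes first
        for subset in combinations(tasks, k):
            if compatible(subset):
                return k
    return 0

best = brute_force_max(sample_tasks)
print("largest compatible subset has size", best)
```

The size found this way should agree with the length of the list returned by `taskScheduler` on the same input.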

### **Q2. A Heap with composite values**

**(i)** The heap class we discussed in lecture 2 implicitly assumes that the heap stores only **keys** (numbers). However, as we discussed, we may want to store composite elements that contain values along with the keys. In this question you are asked to modify the following implementation of the class in order to handle elements that are assumed to have a special **key** field. You can demonstrate this on the *heapInsert* function -- the max extraction function will be very similar. 

**(ii)** You can see Lecture-3 notebook to see how part (i) is done when the data structure is a simple list. This list was used for a suboptimal implementation of Prim's MST algorithm. Assuming you have a modified class from part (i), how would you modify Prim's implementation in order to get an $O(n \log n)$ running time? Please give a short answer in a text cell. 



In [None]:
class myMaxHeap:
  def __init__(self):
    self.H = []


  def heapInsert(self,x):
    n = len(self.H)

    if n == 0:
      self.H.append(x)   # empty heap: x becomes the root
      return


    self.H.append(x)   # append in last leaf (next available position in array/list)
    
    # now bubble up x
    pos = n       # current position of bubble-up
    while pos>0:
      parent_pos = (pos-1)//2 
      if self.H[parent_pos].key < self.H[pos].key:  
        self.H[pos] = self.H[parent_pos]     # copy parent value to current position
        self.H[parent_pos] = x               # move x to parent's position
        pos = parent_pos                     # update current position
      else:
        break                                # break the bubble-up loop
    # return H    

**(i)** The above code takes composite elements into account. The only difference from the previous code is the comparison in the bubble-up loop (line 20 of the cell), which adds .key to access the keys of the two elements and compares based on those. 
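To illustrate the key-based comparison, here is a small self-contained sketch. The `Element` class is hypothetical (the assignment only assumes that elements have a **key** field), and `KeyedMaxHeap` is a compact stand-in that mirrors the *heapInsert* logic rather than the exact class from the lecture.

```python
class Element:
    """Hypothetical composite element: a numeric key plus an arbitrary value."""
    def __init__(self, key, value):
        self.key = key
        self.value = value

class KeyedMaxHeap:
    """Compact stand-in for myMaxHeap that compares elements by their .key."""
    def __init__(self):
        self.H = []

    def heapInsert(self, x):
        self.H.append(x)                 # append in the last leaf
        pos = len(self.H) - 1
        while pos > 0:                   # bubble up x
            parent_pos = (pos - 1) // 2
            if self.H[parent_pos].key < self.H[pos].key:
                self.H[pos], self.H[parent_pos] = self.H[parent_pos], self.H[pos]
                pos = parent_pos
            else:
                break

heap = KeyedMaxHeap()
for key, value in [(3, "c"), (7, "g"), (1, "a"), (9, "i"), (4, "d")]:
    heap.heapInsert(Element(key, value))

root = heap.H[0]                         # element with the largest key
print(root.key, root.value)
```

The values travel with their keys, so after the insertions the root holds the element whose key is largest.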

**(ii)** The implementation of Prim's algorithm in lecture 3 uses a class 'edgeBox' to store edges and extract the maximum weight edge. The only change required is to replace 'H = edgeBox()' with 'H = myMaxHeap()'. Both classes have methods with the same names and specifications, so the rest of the code stays the same. Each heap insertion and extraction then takes time logarithmic in the number of stored elements. 
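For reference, a heap-based Prim's algorithm can be sketched as follows. This is not the lecture's implementation: it uses Python's `heapq` module (a min-heap keyed on edge weight) as a stand-in for `edgeBox` or `myMaxHeap`, and the adjacency-list format, the function name `prim_mst`, and the example graph are all assumptions made for illustration.

```python
import heapq

def prim_mst(adj, start):
    """adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns (total weight, list of tree edges) for a connected graph."""
    visited = {start}
    heap = [(w, start, v) for v, w in adj[start]]   # candidate edges leaving the tree
    heapq.heapify(heap)
    total, tree_edges = 0, []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)               # lightest candidate edge
        if v in visited:
            continue                                # both endpoints already in the tree
        visited.add(v)
        total += w
        tree_edges.append((u, v, w))
        for nxt, w2 in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return total, tree_edges

example_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 5), ("c", 3)],
}
mst_weight, mst_edges = prim_mst(example_graph, "a")
print(mst_weight, mst_edges)
```

Every edge is pushed and popped at most a constant number of times, so with $m$ edges the heap operations cost $O(m \log m)$ in total.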


In [None]:
# your code correction goes here

### **Q3. Augmented heap for insertion performance tracking**

**(i)** Give a modification of *class myMaxHeap* to include the following attribute: 

*   *n_comparisons*: The number of comparisons that *heapInsert* performed since the initialization of the heap instance. 

**(ii)** Suppose that you want to insert in a heap the numbers $1,\ldots, n$. In what sequence should you insert them in order to cause the maximum possible number of comparisons in *heapInsert* ? What is the asymptotic number of comparisons as a function of $n$?

**(iii)** Describe and implement an insertion sequence that will make the number of comparisons significantly smaller. (Hint: think random)

**(iv)** Use the modified class from part (i) in order to count exactly the number of comparisons for the two strategies in (ii) and (iii). Do that for $n=10^2, 10^4, 10^6$, and report the numbers. 





**(i)** The modification is to add a counter (*ncomparisons* in the code below) that is incremented each time *heapInsert* performs a comparison.

In [None]:
class myMaxHeap:
  def __init__(self):
    self.H = []
    self.ncomparisons = 0

  def heapInsert(self, x):

    
    self.H.append(x)  # append in last leaf (next available position in array/list)

    n = len(self.H)-1
    if n==0:
      return


    # now bubble up x
    pos = n  # current position of bubble-up
    while pos>0:
        parent_pos = (pos - 1) // 2

        self.ncomparisons = self.ncomparisons+1      # here  a comparison will be performed
        if self.H[parent_pos] < self.H[pos]:
            self.H[pos] = self.H[parent_pos]
            self.H[parent_pos] = x  # move x to parent's position
            pos = parent_pos  # update current position
        else:
            break  # break the bubble-up loop
    # nothing to return: the heap self.H is updated in place


  # function for removing max element from heap

  def heapMaxRemove(self):

    if len(self.H) == 0:
      print('Empty Heap!')
      return []

    max_elem = self.H[0]
    x = self.H.pop()           # detach the last leaf

    if len(self.H) == 0:       # the heap held a single element, which was the max
      return max_elem


    self.H[0] = x   # put x in the position of max

    # now bubble-down x
    pos = 0
    while True:

        if 2 * pos + 1 < len(self.H) and 2 * pos + 2 < len(self.H):   # both children exist
          c1_pos = 2 * pos + 1  # child 1 position
          c2_pos = 2 * pos + 2  # child 2 position

          if self.H[c1_pos] > self.H[c2_pos]:
            c_pos = c1_pos
          else:
            c_pos = c2_pos  # which child is active in possible swap

        elif 2 * pos + 1 < len(self.H):                               # only one child
          c_pos = 2 * pos + 1  # child 1 position

        else:
          break  # x reached a leaf: nothing left to compare

        if self.H[pos] < self.H[c_pos]:
            self.H[pos] = self.H[c_pos]  # swap
            self.H[c_pos] = x
            pos = c_pos  # update current position
        else:
            break  # break : no possible swap 
    
    return max_elem

      


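**(ii)** To cause the maximum number of comparisons, we should insert the numbers in sorted order, from min to max. Each newly inserted element is then larger than everything already in the heap, so it has to 'bubble up' all the way to the root. An element inserted at depth $d$ therefore costs $d$ comparisons, and since most positions have depth close to $\log_2 n$, the total number of comparisons is $\Theta(n \log n)$.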
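The worst-case count can also be computed directly, without building a heap. The sketch below (the function name `worst_case_comparisons` is illustrative) adds up the depth of every array position, which is the number of comparisons *heapInsert* makes when the element placed there climbs all the way to the root.

```python
def worst_case_comparisons(n):
    """Comparisons made by heapInsert when n increasing keys are inserted."""
    # the item appended at 0-based position pos sits at depth floor(log2(pos + 1))
    # and is compared once per level on its way to the root
    return sum((pos + 1).bit_length() - 1 for pos in range(n))

print(worst_case_comparisons(10**5))
```

For $n = 10^5$ this gives 1468946, the same number that the bad-pattern experiment in this section reports.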



**(iii)** If we know all the elements to be inserted, we can take a random permutation (shuffle) of these elements and insert them in that random order. In expectation, each bubble-up then performs only a constant number of comparisons, so the total grows linearly in $n$. 

In [55]:
H = myMaxHeap()

L = list(range(10**5))

for j in range(len(L)):
  H.heapInsert(L[j])

print("For bad-pattern insertion, the number of comparisons is", H.ncomparisons)


# reset heap
H = myMaxHeap()


import numpy as np
L = np.random.permutation(10**5)   # this gives the first 10^5 numbers in random order

for j in range(len(L)):
  H.heapInsert(L[j])

print("For random insertion, the number of comparisons is",H.ncomparisons)





For bad-pattern insertion, the number of comparisons is 1468946
For random insertion, the number of comparisons is 228768
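Part (iv) asks for several values of $n$. A possible way to run the same experiment for each of them is sketched below. It is a standalone re-implementation of the counting logic (the helper names `count_insert_comparisons` and `compare_strategies` are assumptions, not part of the class above), and it uses the standard `random` module with a fixed seed instead of NumPy so that runs are repeatable.

```python
import random

def count_insert_comparisons(sequence):
    """Insert the items of `sequence` into a max-heap one by one and
    return the number of comparisons made while bubbling up."""
    heap = []
    comparisons = 0
    for x in sequence:
        heap.append(x)                     # append in the last leaf
        pos = len(heap) - 1
        while pos > 0:
            parent_pos = (pos - 1) // 2
            comparisons += 1               # one comparison per bubble-up step
            if heap[parent_pos] < heap[pos]:
                heap[pos], heap[parent_pos] = heap[parent_pos], heap[pos]
                pos = parent_pos
            else:
                break
    return comparisons

def compare_strategies(n, seed=0):
    """Return (sorted-order count, random-order count) for the numbers 1..n."""
    ascending = list(range(1, n + 1))
    shuffled = ascending[:]
    random.Random(seed).shuffle(shuffled)
    return count_insert_comparisons(ascending), count_insert_comparisons(shuffled)

for n in (10**2, 10**4):                   # add 10**6 for the full experiment (slower)
    bad, rnd = compare_strategies(n)
    print("n =", n, "sorted order:", bad, "random order:", rnd)
```

The random-order count depends on the seed, so the exact figures will vary between runs with different seeds, while the sorted-order count is deterministic.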


In [None]:
L = list(range(10**5))
len(L)

100000

### **Q4. Manual construction of a Huffman Tree**

In Lecture 3 we discussed a specific example of how to build a Huffman prefix tree for a set of input numbers. Recall that the algorithm starts from different one-node trees and then keeps merging two trees until it is left with only one tree. In this exercise you are asked to use the Node class contained in lecture-3 notebook and build a tree following exactly the merges that we saw in the lecture. The final outcome should be a single variable of the *Node* class, that contains the entire tree. (Hint: The pseudocode in the lecture notes may be useful)




In [None]:
# create six different trees

# merge two trees (x5)


class Node:
  def __init__(self,key):
    self.key = key
    self.lchild = None
    self.rchild = None

T1 = Node(5)
T2 = Node(9)
T3 = Node(16)
T4 = Node(12)
T5 = Node(13)
T6 = Node(45)

# merge 1
T_12 = Node(T1.key+T2.key)
T_12.lchild = T1
T_12.rchild = T2

# merge 2 
T_45 = Node(T4.key+T5.key)
T_45.lchild = T4
T_45.rchild = T5

# merge 3
T_123 = Node(T_12.key+T3.key)
T_123.lchild = T_12
T_123.rchild = T3

# merge 4
T_12345 = Node(T_123.key+T_45.key)
T_12345.lchild = T_123
T_12345.rchild = T_45

# merge 5
T_all = Node(T_12345.key + T6.key)
T_all.lchild = T_12345
T_all.rchild = T6







In [None]:
T_all.key

100
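The manual merges above can be cross-checked by building the tree automatically. The sketch below is an illustration only: it uses Python's `heapq` as the priority queue rather than the heap class from the lectures, and the helper names `build_huffman` and `leaf_depths` are assumptions. It repeatedly merges the two trees with the smallest keys, then lists each leaf together with its depth, which is the length of its codeword.

```python
import heapq
from itertools import count

class Node:
    def __init__(self, key):
        self.key = key
        self.lchild = None
        self.rchild = None

def build_huffman(weights):
    """Merge the two lightest trees until a single tree remains."""
    tie = count()                          # tie-breaker so Nodes are never compared
    heap = [(w, next(tie), Node(w)) for w in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        k1, _, a = heapq.heappop(heap)
        k2, _, b = heapq.heappop(heap)
        merged = Node(k1 + k2)
        merged.lchild, merged.rchild = a, b
        heapq.heappush(heap, (merged.key, next(tie), merged))
    return heap[0][2]

def leaf_depths(node, depth=0):
    """List of (leaf key, depth) pairs for the tree rooted at node."""
    if node.lchild is None and node.rchild is None:
        return [(node.key, depth)]
    return leaf_depths(node.lchild, depth + 1) + leaf_depths(node.rchild, depth + 1)

root = build_huffman([5, 9, 16, 12, 13, 45])
depths = sorted(leaf_depths(root))
print(root.key, depths)
```

The left and right children may be arranged differently from the manual construction, but the root key and the depth of every leaf should be the same.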