In [2]:
import random
from heapq import heappop, heappush

# **Python Background**

The [heapq](https://docs.python.org/3/library/heapq.html) module implements a min-heap which can be used as a priority queue.  Take a look at the documentation to get a feel for how it works.  Understand enough so that the following chunk of code makes sense to you.

In [3]:
heap_array = []
tasks = [(10,'A'), (8,'B'), (11, 'C'), (9,'D')]
for T in tasks:
  heappush(heap_array, T)

print([heappop(heap_array) for _ in range(len(heap_array))])

[(8, 'B'), (9, 'D'), (10, 'A'), (11, 'C')]


**How are tuples compared in Python?  Which is bigger $(2,3,1,5)$ or $(2,3,2,-10)$?**

Tuples are compared lexicographically: Python compares the first elements, and only if they are equal does it move on to the next position. For the two tuples in the question, the first elements tie (2 = 2) and the second elements tie (3 = 3). At the third position 1 < 2, so the comparison stops there and (2, 3, 1, 5) is smaller than (2, 3, 2, -10). The fourth elements are never examined, so $(2,3,2,-10)$ is the bigger tuple.
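A quick check of the comparison rule, using the two tuples from the question. The variable names `a` and `b` are just for illustration.

```python
# Tuples compare lexicographically: the first differing position decides.
a = (2, 3, 1, 5)
b = (2, 3, 2, -10)

print(a < b)       # True: positions 0 and 1 tie, then 1 < 2 at position 2
print(max(a, b))   # (2, 3, 2, -10)

# If one tuple is a prefix of the other, the shorter one is smaller.
print((2, 3) < (2, 3, 0))  # True
```

This is why pushing `(priority, task_name)` tuples onto the heap orders tasks by priority first and by name only when priorities tie.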

# **Building a Priority Queue**



We have a large collection of tasks which we'd like to dispatch to a single processor. We'll receive a sequence of tasks in the form (task_name, priority), where priority is an integer. For this implementation, the smaller priority is, the more urgent the task.  A task with priority=0 is higher priority than a task with priority=100.

We'll support two operations: ```insert(task_name, priority)``` which adds a new task with a given name and priority and ```next_task()``` which returns the name of the highest priority task that hasn't been scheduled yet.  We'll also have an ```is_empty()``` function for convenience.

Here's a simple implementation that we'll build off of.

In [4]:
class TaskScheduler:

  def __init__(self):
    self.heap = []
  
  def insert(self, task_name, priority):
    heappush(self.heap, (priority, task_name))
  
  def next_task(self):
    _,  task_name = heappop(self.heap) #(underscore is conventionally used to represent a variable that we don't use)
    return task_name
  
  def is_empty(self):
    return len(self.heap) == 0

You can use TaskScheduler as follows.

In [5]:
scheduler = TaskScheduler()
scheduler.insert("Task 1", 5)
scheduler.insert("Task 2", 8)
scheduler.insert("Task 3", 1)
print(scheduler.next_task())
print(scheduler.next_task())
scheduler.insert("Task 4", 0)
print(scheduler.next_task())

Task 3
Task 1
Task 4


**Suppose that TaskScheduler holds two tasks with the same priority.  You might hope that the first task to be added would be scheduled first.  This isn't necessarily the case, although it sometimes is.  Give two code snippets showing that two tasks with the same priority can be scheduled either in order of insertion, or in the opposite order.**

In [6]:
# This code snippet shows that two tasks with the same priority can be scheduled in order of insertion.
# The priorities tie, so the heap compares the task names, and "Task 1" < "Task 2" in lexicographic order.

scheduler1 = TaskScheduler()
scheduler1.insert("Task 1", 5)
scheduler1.insert("Task 2", 5)
print(scheduler1.next_task())
print(scheduler1.next_task())

Task 1
Task 2


In [7]:
# This code snippet shows that two tasks with the same priority can be scheduled in the opposite order of insertion.
# The priorities tie, so the heap compares the task names, and "Task 1" < "Task 2" in lexicographic order, regardless of insertion order.

scheduler2 = TaskScheduler()
scheduler2.insert("Task 2", 5)
scheduler2.insert("Task 1", 5)
print(scheduler2.next_task())
print(scheduler2.next_task())

Task 1
Task 2


**Make a task scheduler that's "stable", in the sense that any two tasks with equal priority are removed in the order that they were added.**

In [8]:
class StableTaskScheduler:

  def __init__(self):
    self.heap = []
    self.entryCount = 0
    '''
    You can add other variables here if you want.
    For Python reasons, call them self.<name of the variable> and
    refer to them like that throughout.
    '''
  
  def insert(self, task_name, priority):
    heappush(self.heap, (priority, self.entryCount, task_name))
    self.entryCount += 1
  
  def next_task(self):
    _, _, task_name = heappop(self.heap) #(underscore is conventionally used to represent a variable that we don't use)
    return task_name
  
  def is_empty(self):
    return len(self.heap) == 0

In [9]:
#Add a quick test here to make sure it works.

scheduler2 = StableTaskScheduler()
scheduler2.insert("Task 2", 5)
scheduler2.insert("Task 1", 5)
print(scheduler2.next_task())
print(scheduler2.next_task())

Task 2
Task 1


**Another property that you might want is "liveness".  We'd like to guarantee that any job currently in the queue will eventually be scheduled as long as ```next_task()``` is called infinitely many times.  Why is this not currently guaranteed?**

Because more urgent tasks may keep arriving. If tasks with a smaller priority value are inserted at least as fast as ```next_task()``` is called, a task with a large priority value never reaches the top of the heap and so is never scheduled.
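The starvation scenario can be sketched in a few lines. This is a minimal stand-alone illustration, not the notebook's class: the module-level `heap`, `counter`, `insert` and `next_task` are hypothetical helpers that mimic a stable scheduler.

```python
from heapq import heappop, heappush

heap = []
counter = 0  # insertion counter, used only to break ties stably

def insert(task_name, priority):
    global counter
    heappush(heap, (priority, counter, task_name))
    counter += 1

def next_task():
    _, _, task_name = heappop(heap)
    return task_name

insert("low priority", 100)

scheduled = []
for i in range(1000):
    insert("urgent " + str(i), 0)   # a more urgent task arrives...
    scheduled.append(next_task())   # ...and is scheduled right away

print("low priority" in scheduled)  # False
print(len(heap))                    # 1: only the starved task remains
```

However many times the loop runs, the task with priority 100 stays in the heap.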

**To ensure liveness, we'll only allow a nonnegative initial priority for each task. Then every time a new task is added to the queue, we'll decrease by 1 the priorities of all tasks currently in the queue (priorities are allowed to become negative).  Why does this guarantee liveness?**

Suppose we insert a task, task_x, with priority n, and at that moment m tasks in the heap are ahead of it. Every later insertion lowers the priority of task_x by 1, while each newly inserted task starts at a priority of at least 0. After n + 1 insertions the priority of task_x is negative, so every task inserted from then on is scheduled after it. In the worst case each of the earlier insertions has priority 0 and jumps ahead of task_x, but there are at most n of those. Ties go to task_x because the scheduler is stable.

To sum up, at most m + n tasks can ever be scheduled before task_x, so it is returned within m + n + 1 calls to ```next_task()```.
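As a sanity check on this argument, here is a small simulation sketch. It uses the literal "decrease everything by 1" rule and an adversary that inserts a priority-0 task before every call that removes a task. The function name `calls_until_scheduled` and the task names are made up for this illustration, and the check assumes that task_x should come out within m + n + 1 removals.

```python
from heapq import heappop, heappush

def calls_until_scheduled(n, m):
    """Count removals until task_x (priority n, with m tasks ahead) is popped.

    Adversary: before every removal, insert a new task with priority 0.
    """
    heap = []
    count = 0

    def insert(name, priority):
        nonlocal count
        heappush(heap, [priority, count, name])
        count += 1
        for entry in heap:   # naive O(n) step: lower every stored priority
            entry[0] -= 1    # a uniform shift keeps the heap order valid

    for i in range(m):
        insert("ahead " + str(i), 0)
    insert("task_x", n)

    calls = 0
    while True:
        insert("new " + str(calls), 0)
        calls += 1
        if heappop(heap)[2] == "task_x":
            return calls

print(calls_until_scheduled(n=5, m=3))  # 8

# The bound holds for a range of small cases.
for n in range(0, 15):
    for m in range(0, 15):
        assert calls_until_scheduled(n, m) <= m + n + 1
```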

**Implement a task scheduler that behaves as described above.  Like the previous scheduler, it should also be stable.  The details of the implementation are up to you, as long as the tasks come out in the right order.**

In [41]:
class LiveTaskScheduler:

  def __init__(self):
    self.heap = []
    self.entryCount = 0
  
  def insert(self, task_name, priority):
    priority = max(priority, 0)
    heappush(self.heap, [priority + self.entryCount, self.entryCount, task_name])
    self.entryCount += 1
    # Decrementing every stored priority on each insert costs O(n) per insert, which is too inefficient and causes the program to run ~10 minutes.
    # Equivalently, add the number of tasks inserted so far to the priority of the new task: every key is shifted by the same amount, so the order is unchanged, and the extra work per insert is O(1).
    # Now the program only takes ~1 second.

    # In case you still want the original logic, which decrements each entry by 1, the code for insert would be:
    # def insert(self, task_name, priority):
    #   priority = max(priority, 0)
    #   heappush(self.heap, [priority, self.entryCount, task_name])
    #   self.entryCount += 1
    #   for i in self.heap:
    #     i[0] -= 1
    
  def next_task(self):
    _, _, task_name = heappop(self.heap) #(underscore is conventionally used to represent a variable that we don't use)
    return task_name
  
  def is_empty(self):
    return len(self.heap) == 0
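To check the claim in the comments above that the two versions agree, here is a sketch that runs the literal decrement-everything version and the offset version side by side on the same random operations and compares the schedules. The class names `NaiveLiveScheduler` and `OffsetLiveScheduler` are hypothetical stand-ins written for this comparison. They mirror, but are not, the `LiveTaskScheduler` above.

```python
import random
from heapq import heappop, heappush

class NaiveLiveScheduler:
    """Literal version: decrement every stored priority on each insert (O(n))."""
    def __init__(self):
        self.heap = []
        self.entry_count = 0

    def insert(self, task_name, priority):
        heappush(self.heap, [max(priority, 0), self.entry_count, task_name])
        self.entry_count += 1
        for entry in self.heap:
            entry[0] -= 1

    def next_task(self):
        return heappop(self.heap)[2]

class OffsetLiveScheduler:
    """Offset version: store priority + insertion count instead."""
    def __init__(self):
        self.heap = []
        self.entry_count = 0

    def insert(self, task_name, priority):
        key = max(priority, 0) + self.entry_count
        heappush(self.heap, (key, self.entry_count, task_name))
        self.entry_count += 1

    def next_task(self):
        return heappop(self.heap)[2]

random.seed(0)
naive, offset = NaiveLiveScheduler(), OffsetLiveScheduler()
naive_order, offset_order = [], []
for i in range(2000):
    if not naive.heap or random.random() < 0.5:
        priority = random.randint(0, 100)
        naive.insert("Task " + str(i), priority)
        offset.insert("Task " + str(i), priority)
    else:
        naive_order.append(naive.next_task())
        offset_order.append(offset.next_task())

print(naive_order == offset_order)  # True
```

The two keys for any task always differ by the same amount (the number of tasks inserted so far), so both heaps rank the tasks identically.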

**Now let's test this out.**

In [16]:
def stress_test(scheduler, iters, M, seed=42):
  '''
  Runs scheduler for iters iterations.  
  For the first M iterations no tasks are removed. 
  In other iterations, we'll add a task with random priority if there are either
  no tasks left to remove, or if a random coin comes up heads.
  Returns the task schedule.
  '''
  random.seed(seed)

  task_schedule = []

  for i in range(iters):
    if scheduler.is_empty() or i < M or random.randint(0,1):
      name = "Task " + str(i)
      priority = random.randint(0,100)
      scheduler.insert(name, priority)
    else:
      task_schedule.append(scheduler.next_task())

  return task_schedule

**Test 1**

In [17]:
scheduler = LiveTaskScheduler()
tasks = stress_test(scheduler, 100, 1)
print(tasks[-10:])  #print the last 10 tasks scheduled

['Task 72', 'Task 46', 'Task 82', 'Task 56', 'Task 88', 'Task 91', 'Task 89', 'Task 83', 'Task 71', 'Task 92']


**Test 2**

Ideally this should run in a couple seconds.

In [42]:
scheduler = LiveTaskScheduler()
tasks = stress_test(scheduler, 500000, 10000)
print(tasks[-10:])

['Task 479000', 'Task 479128', 'Task 478975', 'Task 479034', 'Task 479042', 'Task 479182', 'Task 479179', 'Task 479129', 'Task 479124', 'Task 478978']
