# Data Structures Question (Intermediate)
You are designing a task scheduling system for a distributed computing platform. The system needs to manage tasks with different priority levels and track their execution history.
Design a data structure that supports the following operations efficiently:

1. **scheduleTask(taskId, priority, timestamp)** – Add a new task with a given priority (integer 1-10, where 10 is highest) and arrival timestamp
2. **executeNext()** – Remove and return the highest priority task. If multiple tasks have the same priority, return the one with the earliest timestamp (FIFO within a priority level)
3. **cancelTask(taskId)** – Remove a specific task by its ID if it exists in the queue
4. **updatePriority(taskId, newPriority)** – Change the priority of an existing task
5. **getTasksByPriority(priority)** – Return all task IDs currently scheduled at a specific priority level

## Constraints

- The system may handle hundreds of thousands of concurrent tasks
- `scheduleTask` should be O(log n) or better
- `executeNext` should be O(log n) or better
- `cancelTask` and `updatePriority` should be faster than O(n)
- `getTasksByPriority` should return results without scanning all tasks

## Tasks

1. Describe the data structures you would use and justify your choices
2. Explain how each operation works with pseudocode or detailed descriptions
3. Analyze the time and space complexity of each operation
4. Discuss potential trade-offs – What optimizations did you choose and what did you sacrifice?

> **Bonus Challenge:** How would you modify your design if tasks also had deadlines, and you needed to support `getExpiringSoon(timeWindow)` to retrieve tasks expiring within the next timeWindow seconds?

## 1. Describe the data structures you would use and justify your choices

> Challenges noticed:
>
> 1. Tasks must be inserted so that the highest-priority task is always the next one out.
>
> 2. Among tasks of equal priority the earlier timestamp wins, so timestamps use a min-ordering (earlier < later).
>

## Idea 

1. Task manager heap (serves `scheduleTask` and `executeNext`), named `task_manager`

|index| task id| priority (highest first)| time in (earliest first)|
|:---:|:----:|:----:|:----|
|0| task 5| 10| 0000001|
|1| task 2| 10| 0001000|
|2| task 3|  9| 0000002|
|3| task 4|  9| 0000004|
|4| task 1|  9| 0000102|
|5| task 6|  8| 0000003|


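The task manager heap stores tuples `(-priority, time_in, task_id)`. Python's `heapq` is a min-heap, so negating the priority puts the highest priority first, and `time_in` breaks ties in favor of the earliest arrival.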

2. Priority hashmap (serves `getTasksByPriority`): `{priority: set(task_id)}`, named `priority`
3. Task hashmap (serves `cancelTask` and `updatePriority`): `{task_id: (priority, time_in)}`, named `task_map`
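
To illustrate the heap key, here is a minimal sketch that pushes the rows of the table above in arbitrary order and pops them back out. The `rows` list and the variable names are illustrative only; the time values are the table's `time in` column read as integers.

```python
import heapq

# Rows from the example table, deliberately out of order:
# (task_id, priority, time_in)
rows = [
    ("task 1", 9, 102),
    ("task 2", 10, 1000),
    ("task 3", 9, 2),
    ("task 4", 9, 4),
    ("task 5", 10, 1),
    ("task 6", 8, 3),
]

heap = []
for task_id, priority, time_in in rows:
    # Negating the priority makes the min-heap pop the highest priority first;
    # time_in then breaks ties in favor of the earliest arrival.
    heapq.heappush(heap, (-priority, time_in, task_id))

execution_order = []
while heap:
    _neg_priority, _time_in, task_id = heapq.heappop(heap)
    execution_order.append(task_id)

print(execution_order)
# ['task 5', 'task 2', 'task 3', 'task 4', 'task 1', 'task 6']
```

The pop order matches the table: priority 10 before 9 before 8, and earlier arrivals first within each priority.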

## Steps for each method

### Schedule a task O(log n)

1. If `task_id` is already in `task_map`, return False (duplicate ID).
2. `heap.push((-task_priority, task_time_in, task_id))`. O(log n). An empty heap needs no special case.
3. `task_map[task_id] = (task_priority, task_time_in)` (to support update and cancel). O(1)
4. `priority[task_priority].add(task_id)`. O(1)
5. Return True.

### Execute task O(log n) amortized

> **Execute the task** means pop the task from the task heap and delete its ID from the priority hashmap and the task hashmap.
>

1. Loop while `task_manager` is not empty:
   1. `task = task_manager.pop()` O(log n)
   2. If `task.task_id` is not in `task_map`, the task was cancelled: skip it. O(1)
   3. If `(task.task_priority, task.time_in) != task_map[task.task_id]`, the entry is stale from an update: skip it. O(1)
   4. Otherwise `task_map.pop(task.task_id)`, `priority[task.task_priority].remove(task.task_id)`, and return `task`.
2. Return None if the heap empties without finding a live task.


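### Cancel a task O(1)

The logic relies on the task hashmap.

1. If `id_to_delete` is not in `task_map`, return False.
2. `task_priority, _ = task_map.pop(id_to_delete)` O(1)
3. `priority[task_priority].remove(id_to_delete)` O(1)
4. Leave the heap entry in place (lazy deletion, to save time); `executeNext` discards it when it surfaces.
5. Return True.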


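### Update a task O(log n)

1. If the task is not in the task hashmap, return False.
2. Update the task priority in the task hashmap. O(1)
3. Update the priority hashmap: remove the ID from the old priority's set and add it to the new one. O(1)
4. Push a new entry with the updated priority onto the heap; the old entry becomes stale and is skipped by `executeNext`. O(log n)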

### Get tasks by priority

1. Look up the priority in the priority hashmap: O(1) to find the set, O(k) to copy out its k task IDs.

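### Get expiring soon

1. Add a secondary min-heap ordered by deadline.
2. Maintain another hashmap: `{taskId: deadline}`.
3. `getExpiringSoon(window)`: collect tasks with deadline ≤ current_time + window. Peeking shows only the earliest deadline, so the query pops entries up to the cutoff, skips stale ones, and pushes the live ones back. That costs O(k log n) for k entries at or before the cutoff.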
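
One way the bonus design could look is sketched below. `DeadlineIndex` and its method names are hypothetical and not part of the original design. The sketch assumes the caller passes the current time explicitly and that removals are lazy, as in the main heap. Following the rule above, it also returns tasks whose deadline has already passed.

```python
import heapq
from typing import Dict, List, Tuple


class DeadlineIndex:
    """Hypothetical helper that tracks deadlines alongside the main scheduler."""

    def __init__(self) -> None:
        self.deadline_heap: List[Tuple[float, int]] = []  # (deadline, task_id)
        self.deadline_map: Dict[int, float] = {}          # task_id -> deadline

    def add(self, task_id: int, deadline: float) -> None:
        # Re-adding a task with a new deadline makes its old heap entry stale.
        self.deadline_map[task_id] = deadline
        heapq.heappush(self.deadline_heap, (deadline, task_id))

    def remove(self, task_id: int) -> None:
        # Lazy deletion: the heap entry is dropped when it next surfaces.
        self.deadline_map.pop(task_id, None)

    def get_expiring_soon(self, now: float, window: float) -> List[int]:
        cutoff = now + window
        live: List[Tuple[float, int]] = []
        seen = set()
        # Pop everything at or before the cutoff, discarding stale entries.
        while self.deadline_heap and self.deadline_heap[0][0] <= cutoff:
            deadline, task_id = heapq.heappop(self.deadline_heap)
            if task_id in seen or self.deadline_map.get(task_id) != deadline:
                continue
            seen.add(task_id)
            live.append((deadline, task_id))
        # Push live entries back so the query does not consume them.
        for entry in live:
            heapq.heappush(self.deadline_heap, entry)
        return [task_id for _, task_id in live]


index = DeadlineIndex()
index.add(1, deadline=100)
index.add(2, deadline=130)
index.add(3, deadline=500)
index.remove(2)  # e.g. the task was cancelled or executed

soon = index.get_expiring_soon(now=90, window=60)  # cutoff = 150
print(soon)  # [1]
```

If queries are far more frequent than updates, a sorted container with binary search would avoid the pop-and-push-back cost; the heap version is kept here because it mirrors the lazy-deletion approach used for the main queue.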

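>  **Space Complexity**
>
>  Heap: O(n + s), where n = live tasks and s = stale entries
>
> task_hashmap: O(n)
>
> priority_hashmap: O(n)
>
> Total: O(n + s). Each cancel or update leaves at most one stale entry, so s is bounded by the number of those operations since the last cleanup rather than by n; without cleanup the heap can grow well beyond the live task count.
>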
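
A possible cleanup step is sketched below, assuming `task_map` is the source of truth so the heap can be rebuilt from it. The function name `maybe_compact` and the `max_ratio` threshold of 2.0 are illustrative choices, not part of the original design.

```python
import heapq
from typing import Dict, List, Tuple

HeapEntry = Tuple[int, int, int]  # (-priority, time_in, task_id)


def maybe_compact(
    heap: List[HeapEntry],
    task_map: Dict[int, Tuple[int, int]],
    max_ratio: float = 2.0,
) -> List[HeapEntry]:
    """Rebuild the heap from task_map once stale entries dominate it."""
    if len(heap) <= max_ratio * max(len(task_map), 1):
        return heap  # still small enough relative to the live tasks
    fresh = [
        (-priority, time_in, task_id)
        for task_id, (priority, time_in) in task_map.items()
    ]
    heapq.heapify(fresh)  # O(n) rebuild containing only live entries
    return fresh


# Two live tasks, but five heap entries after several updates and cancels.
task_map = {1: (5, 10), 2: (7, 11)}
heap = [(-3, 10, 1), (-5, 10, 1), (-7, 11, 2), (-1, 9, 3), (-2, 8, 4)]
heapq.heapify(heap)

heap = maybe_compact(heap, task_map)
print(len(heap), heap[0])  # 2 (-7, 11, 2)
```

Calling something like this after each cancel or update would keep the heap within a constant factor of n, so total space stays O(n); the occasional O(n) rebuild is paid for by the stale entries that accumulated since the previous one.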

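## Trade-offs

1. Lazy deletion makes cancel O(1) and update O(log n), at the cost of stale entries in the heap (space overhead, plus extra pops in `executeNext`).
2. Simple implementation vs. worst-case guarantees: `executeNext` is O(log n) amortized, but a single call may pop many stale entries before it finds a live task.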



import heapq
from collections import defaultdict
from typing import Optional, List, Tuple


class TaskManager:
    def __init__(self):
        # Heap: (-priority, time_in, task_id)
        self.task_manager = []

        # task_id -> (priority, time_in)
        self.task_map = {}

        # priority -> set(task_id)
        self.priority_map = defaultdict(set)

    # --------------------------------------------------
    # Schedule a task : O(log n)
    # --------------------------------------------------
    def schedule_task(self, task_id: int, priority: int, time_in: int) -> bool:
        if task_id in self.task_map:
            return False  # task already exists

        heapq.heappush(
            self.task_manager,
            (-priority, time_in, task_id)
        )

        self.task_map[task_id] = (priority, time_in)
        self.priority_map[priority].add(task_id)
        return True

    # --------------------------------------------------
    # Execute task : O(log n) amortized
    # --------------------------------------------------
    def execute_task(self) -> Optional[Tuple[int, int, int]]:
        while self.task_manager:
            neg_priority, time_in, task_id = heapq.heappop(self.task_manager)

            # Lazy deletion check
            if task_id not in self.task_map:
                continue

            current_priority, current_time = self.task_map[task_id]

            # Stale heap entry (due to update)
            if -neg_priority != current_priority or time_in != current_time:
                continue

            # Valid task → execute
            del self.task_map[task_id]
            self.priority_map[current_priority].remove(task_id)

            if not self.priority_map[current_priority]:
                del self.priority_map[current_priority]

            return (task_id, current_priority, time_in)

        return None

    # --------------------------------------------------
    # Cancel a task : O(1)
    # --------------------------------------------------
    def cancel_task(self, task_id: int) -> bool:
        if task_id not in self.task_map:
            return False

        priority, _ = self.task_map.pop(task_id)
        self.priority_map[priority].remove(task_id)

        if not self.priority_map[priority]:
            del self.priority_map[priority]

        # Heap cleanup is lazy
        return True

    # --------------------------------------------------
    # Update a task : O(log n)
    # --------------------------------------------------
    def update_task(self, task_id: int, new_priority: int) -> bool:
        if task_id not in self.task_map:
            return False

        old_priority, time_in = self.task_map[task_id]

        # Update task_map
        self.task_map[task_id] = (new_priority, time_in)

        # Update priority_map
        self.priority_map[old_priority].remove(task_id)
        if not self.priority_map[old_priority]:
            del self.priority_map[old_priority]

        self.priority_map[new_priority].add(task_id)

        # Push updated entry (old one becomes stale)
        heapq.heappush(
            self.task_manager,
            (-new_priority, time_in, task_id)
        )

        return True

    # --------------------------------------------------
    # Get tasks by priority : O(1) lookup, O(k) to copy k task IDs
    # --------------------------------------------------
    def get_tasks_by_priority(self, priority: int) -> List[int]:
        return list(self.priority_map.get(priority, []))
