# Dijkstra's Shortest-Path Algorithm

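Given a directed graph $G = (V, E)$ in which every edge has a non-negative weight, the shortest path between two vertices is the path that minimises the sum of the edge weights along it. Dijkstra's algorithm computes the shortest paths from a source vertex $s$ to every other vertex.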

If the graph has edges with negative weights, consider the Bellman-Ford algorithm instead.

Pseudocode
```
Initialise

X = {s} // set of processed nodes, whose shortest paths are final
A[s] = 0 // computed shortest-path distance from s to s

// as a simplification, also remember the path itself
B[s] = empty path

Main Loop

while X != V:
    among all edges (v, w) with v in X and w in V-X,
    pick the edge that minimises,

        A[v] + l_vw //Dijkstra's Greedy Criterion

    Where l_vw is the weight of the edge (v, w)

    Let this edge be (v*, w*)

    add w* to X
    A[w*] = A[v*] + l_v*w*
    B[w*] = B[v*] + (v*, w*)

```
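The naive approach outlined above runs in $O(mn)$ time, where $n$ is the number of vertices and $m$ the number of edges: each of the $n - 1$ iterations scans up to $m$ edges. This can be greatly improved using a heap.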
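The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the dictionary-of-lists graph format, the function name `dijkstra_naive`, and the small example graph are all assumptions made for the example.

```python
def dijkstra_naive(graph, s):
    """graph: dict mapping vertex -> list of (neighbour, weight) pairs (assumed format)."""
    A = {s: 0}      # shortest-path distances; the keys of A play the role of X
    B = {s: [s]}    # the corresponding paths, stored as lists of vertices
    while True:
        best = None  # (greedy score, v*, w*)
        # Scan every edge (v, w) with v in X and w in V - X.
        for v in A:
            for w, l_vw in graph.get(v, []):
                if w not in A:
                    score = A[v] + l_vw  # Dijkstra's greedy criterion
                    if best is None or score < best[0]:
                        best = (score, v, w)
        if best is None:  # no crossing edge left: remaining vertices are unreachable
            break
        score, v_star, w_star = best
        A[w_star] = score
        B[w_star] = B[v_star] + [w_star]
    return A, B


# Hypothetical example graph
graph = {
    "s": [("v", 1), ("w", 4)],
    "v": [("w", 2), ("t", 6)],
    "w": [("t", 3)],
    "t": [],
}
A, B = dijkstra_naive(graph, "s")
print(A)       # {'s': 0, 'v': 1, 'w': 3, 't': 6}
print(B["t"])  # ['s', 'v', 'w', 't']
```

Each pass of the `while` loop rescans all crossing edges, which is where the $O(mn)$ bound comes from.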


## Heaps 

A heap is a container for items with ordered keys that supports fast insertion and fast extraction of the minimum item.

For example, a binary heap is a structured tree of nodes in which each parent has at most two children and the value of each parent is less than or equal to the value of each of its children.

A binary heap has two properties that have to be maintained:
1. The shape is preserved
2. The parent is never greater than either of its children.

In order to preserve the shape of the tree, a new node is always added in the same order: the bottom-most layer is filled from left to right, and a new layer is only started once the current one is full.

In order to add a new node,
1. Add the node according to the shape rule
2. Check if the node is less than its parent; if so, swap the node with its parent. Repeat until the heap is correct.

This runs in only $O(\log{n})$ time, as there will be at most $\log{n}$ parents to check before arriving at the topmost node.
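The two insertion steps can be sketched as below. The sketch assumes the heap is stored in a Python list in which the children of index `i` sit at `2i + 1` and `2i + 2`; the function name `heap_insert` is made up for the example.

```python
def heap_insert(heap, value):
    """Insert value into a min-heap stored as a list (assumed array layout)."""
    heap.append(value)  # step 1: the next free slot preserves the shape
    i = len(heap) - 1
    # Step 2: swap with the parent while the new node is smaller than it.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] >= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


heap = []
for x in [5, 3, 8, 1]:
    heap_insert(heap, x)
print(heap)  # [1, 3, 8, 5]
```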

For deletion / extraction, we are only allowed to extract the topmost node. After doing so,
1. Move the last node from the bottom (according to the shape rule) into the root position
2. Perform swaps going down, comparing the new parent to each of its children. If it is greater than either of its children, swap the parent with the smaller of the two. Repeat until the heap is correct.

This runs in $O(\log{n})$ time as well.
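Extraction can be sketched in the same style, again assuming a list-based heap with children at `2i + 1` and `2i + 2`; `heap_extract_min` is a hypothetical name.

```python
def heap_extract_min(heap):
    """Remove and return the smallest value of a list-based min-heap."""
    if not heap:
        raise IndexError("extract from an empty heap")
    smallest = heap[0]
    last = heap.pop()  # the bottom-most, right-most node
    if heap:
        heap[0] = last  # step 1: move it into the root position
        i, n = 0, len(heap)
        while True:  # step 2: swap down with the smaller child
            child = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


heap = [1, 3, 8, 5]
print(heap_extract_min(heap))  # 1
print(heap)                    # [3, 5, 8]
```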

A heap gives us insertion of any element and extraction of the minimum element in $O(\log{n})$ time each.

We can also delete from the middle of the heap. We move the last node from the bottom of the heap into the position of the deleted node, and then "bubble" it up or down as needed until the heap is correct.
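A rough sketch of deleting from the middle, assuming a list-based heap and that the position `i` of the node to delete is already known (finding that position is a separate problem, typically solved with a lookup table). The name `heap_delete_at` is made up for the example.

```python
def heap_delete_at(heap, i):
    """Remove and return heap[i] from a list-based min-heap."""
    removed = heap[i]
    last = heap.pop()  # the bottom-most, right-most node
    if i < len(heap):  # nothing more to do if we removed the last slot itself
        heap[i] = last
        # Bubble up while the moved node is smaller than its parent.
        while i > 0 and heap[i] < heap[(i - 1) // 2]:
            parent = (i - 1) // 2
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        # Bubble down while the moved node is larger than a child.
        n = len(heap)
        while True:
            child = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and heap[c] < heap[child]:
                    child = c
            if child == i:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return removed


heap = [1, 3, 8, 5, 4, 9]
print(heap_delete_at(heap, 1))  # 3
print(heap)                     # [1, 4, 8, 5, 9]
```

Only one of the two loops ever does any work for a given deletion, so the cost stays at $O(\log{n})$.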



## Implementing Dijkstra's with Heaps

The heap invariants will be as follows:
1. Elements in the heap = vertices of $V - X$
2. For a vertex $v$ not in $X$, key[$v$] = smallest greedy score of an edge $(u, v) \in E$ with $u \in X$
3. Vertices with no such edge are assigned a key of $+\infty$

Maintaining Invariant #2

Consider what happens when a new vertex $w$ is added to the set $X$. Every vertex $v$ still in the heap that has an incoming edge $(w, v)$ may now have a smaller greedy score, so its key needs to be updated.

```
for each edge (w, v) in E:
    if v in V-X: // v is still in the heap
        delete v from heap
        recompute key[v] = min(key[v], A[w] + l_wv)
        re-insert v into heap
```

Each key update takes $O(\log{n})$ time, since it is one deletion followed by one insertion.

Run time analysis:
1. $(n-1)$ extract-mins
2. Each edge triggers at most one delete/insert combo, so there are $O(m)$ of these

The total number of heap operations is therefore $O(m + n) = O(m)$, since we assume that there is a path from $s$ to every other vertex and so $m \geq n - 1$.

Overall running time: $O(m \log{n})$
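For comparison, here is a short sketch of a heap-based Dijkstra using Python's standard `heapq` module. Note that it swaps in a different technique from the one described above: `heapq` has no delete operation, so instead of deleting and re-inserting a vertex, the sketch pushes a new entry and skips stale entries when they are popped ("lazy deletion"). The graph format and function name are assumptions for the example.

```python
import heapq


def dijkstra_heap(graph, s):
    """graph: dict mapping vertex -> list of (neighbour, weight) pairs (assumed format)."""
    A = {}           # final distances; the keys of A play the role of X
    heap = [(0, s)]  # entries are (greedy score, vertex)
    while heap:
        score, w = heapq.heappop(heap)
        if w in A:   # stale entry: w was already processed with a smaller score
            continue
        A[w] = score
        for v, l_wv in graph.get(w, []):
            if v not in A:
                # Lazy alternative to the delete / re-insert key update.
                heapq.heappush(heap, (A[w] + l_wv, v))
    return A


# Hypothetical example graph
graph = {
    "s": [("v", 1), ("w", 4)],
    "v": [("w", 2), ("t", 6)],
    "w": [("t", 3)],
    "t": [],
}
print(dijkstra_heap(graph, "s"))  # {'s': 0, 'v': 1, 'w': 3, 't': 6}
```

The heap can hold up to $m$ entries with this approach, but since $\log{m} = O(\log{n})$ the running time is still $O(m \log{n})$.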

# Optional Theory Problems

## 1

In lecture we define the length of a path to be the sum of the lengths of its edges. Define the bottleneck of a path to be the maximum length of one of its edges. A minimum-bottleneck path between two vertices $s$ and $t$ is a path with bottleneck no larger than that of any other $s-t$ path. Show how to modify Dijkstra's algorithm to compute a minimum-bottleneck path between two given vertices. The running time should be $O(m \log{n})$, as in lecture.

Let us modify the argument for Dijkstra's algorithm.

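We define two sets $X$ and $V-X$ such that all vertices within $X$ have correct minimum-bottleneck values, starting with $X = \{s\}$.

Then we consider the edges that start in $X$ and end in $V-X$.

We then pick the edge $(v, w)$ that has the minimum length, and add the head $w$ of that edge into $X$.

Then we set the minimum-bottleneck value of the newly added vertex $w$ as
```
Q[w] = max(Q[v], l_vw)
```

This enables us to add another node into the set $X$ with the correct minimum-bottleneck value: every edge on the path recorded for $w$ was the shortest crossing edge at the moment it was picked, and any other path from $s$ to $w$ had to leave $X$ through a crossing edge at that same moment. Keeping the crossing edges in a heap keyed by length gives the required $O(m \log{n})$ running time.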
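The selection rule above can be sketched with `heapq` as follows. This is a sketch under assumptions, not a verified solution to the exercise: the heap holds candidate edges keyed by length, entries whose head is already in $X$ are skipped when popped, and the bottleneck of the empty path from $s$ to itself is taken to be 0. The function name and example graph are made up.

```python
import heapq


def min_bottleneck(graph, s, t):
    """graph: dict mapping vertex -> list of (neighbour, length) pairs (assumed format).

    Returns the bottleneck of a minimum-bottleneck s-t path, or None if t is unreachable.
    """
    Q = {s: 0}  # bottleneck values for vertices in X (assumption: Q[s] = 0)
    heap = [(l_sw, s, w) for w, l_sw in graph.get(s, [])]
    heapq.heapify(heap)
    while heap and t not in Q:
        l_vw, v, w = heapq.heappop(heap)  # shortest candidate edge
        if w in Q:                        # head already in X: not a crossing edge
            continue
        Q[w] = max(Q[v], l_vw)
        for x, l_wx in graph.get(w, []):
            if x not in Q:
                heapq.heappush(heap, (l_wx, w, x))
    return Q.get(t)


# Hypothetical example: s -> b -> a -> t has bottleneck 2, s -> a -> t has bottleneck 5
graph = {
    "s": [("a", 5), ("b", 1)],
    "b": [("a", 2), ("t", 10)],
    "a": [("t", 1)],
    "t": [],
}
print(min_bottleneck(graph, "s", "t"))  # 2
```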

## 2

We can do better. Suppose now that the graph is undirected. Give a linear-time $O(m)$ algorithm to compute a minimum-bottleneck path between two given vertices.

Let's try to modify BFS.

```
Initialise all edges to unexplored

Let Q = FIFO data structure initialised with s
Let M[v] = minimum-bottleneck value of v, initialised to inf
Let B[v] = empty path //store the paths

While Q is not empty:
    remove first node in Q, call it v

    for each edge (v, w) incident to v:
        if edge has not yet been explored:
            M[w] = max(M[w], length of edge)
            B[w] = B[v] + (v, w)

            add w into Q

            mark edge as explored
```

This algorithm iterates once over each edge, doing a sort of BFS and updating the minimum-bottleneck value of each node as it runs.

Does this work? Not as written: `M[w]` starts at inf, so `max(M[w], length of edge)` always stays inf, and a node's value is never lowered when a better path turns up later. Check out [Camerini's Algorithm](https://en.wikipedia.org/wiki/Minimum_bottleneck_spanning_tree#:~:text=necessarily%20a%20MST.-,Camerini's%20algorithm%20for%20undirected%20graphs,than%20that%20in%20the%20other.)

## 3

What if the graph is directed? Can you compute a minimum-bottleneck path between two given vertices faster than $O(m\log{n})$?

I think the solution for #2 will work for this one as well...

# Programming Assignment 2

The file contains an adjacency list representation of an undirected weighted graph with 200 vertices labeled 1 to 200.  Each row consists of the node tuples that are adjacent to that particular vertex along with the length of that edge. For example, the 6th row has 6 as the first entry indicating that this row corresponds to the vertex labeled 6. The next entry of this row "141,8200" indicates that there is an edge between vertex 6 and vertex 141 that has length 8200.  The rest of the pairs of this row indicate the other vertices adjacent to vertex 6 and the lengths of the corresponding edges.
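As an illustration of the row format just described, the sketch below parses a single row. It assumes the fields are separated by whitespace (tabs in the actual file, going by the loading code in this notebook); the function name is made up, and the second pair in the sample row is invented for the example.

```python
def parse_row(line):
    """Parse one adjacency-list row into (vertex, [(neighbour, length), ...])."""
    fields = line.split()  # splits on tabs/spaces and drops the trailing newline
    vertex = int(fields[0])
    edges = []
    for pair in fields[1:]:
        neighbour, length = pair.split(",")
        edges.append((int(neighbour), int(length)))
    return vertex, edges


# "141,8200" comes from the description above; "98,5594" is a made-up second pair.
print(parse_row("6\t141,8200\t98,5594\t\n"))  # (6, [(141, 8200), (98, 5594)])
```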

Your task is to run Dijkstra's shortest-path algorithm on this graph, using 1 (the first vertex) as the source vertex, and to compute the shortest-path distances between 1 and every other vertex of the graph. If there is no path between a vertex $v$ and vertex 1, we'll define the shortest-path distance between 1 and $v$ to be 1000000. 

You should report the shortest-path distances to the following ten vertices, in order: 7,37,59,82,99,115,133,165,188,197.  You should encode the distances as a comma-separated string of integers. So if you find that all ten of these vertices except 115 are at distance 1000 away from vertex 1 and 115 is 2000 distance away, then your answer should be 1000,1000,1000,1000,1000,2000,1000,1000,1000,1000. Remember the order of reporting DOES MATTER, and the string should be in the same order in which the above ten vertices are given. The string should not contain any spaces.  Please type your answer in the space provided.
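The answer string can be built with a one-line join. The sketch below reproduces the worked example from the paragraph above; `format_answer` is a hypothetical helper name.

```python
def format_answer(distances, vertices, unreachable=1000000):
    """Comma-separated distances, in the given vertex order, with no spaces."""
    return ",".join(str(distances.get(v, unreachable)) for v in vertices)


required = [7, 37, 59, 82, 99, 115, 133, 165, 188, 197]
# Example from the text: every vertex at distance 1000 except 115, which is at 2000.
example = {v: 1000 for v in required}
example[115] = 2000
print(format_answer(example, required))
# 1000,1000,1000,1000,1000,2000,1000,1000,1000,1000
```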

IMPLEMENTATION NOTES: This graph is small enough that the straightforward $O(mn)$ time implementation of Dijkstra's algorithm should work fine.  OPTIONAL: For those of you seeking an additional challenge, try implementing the heap-based version.  Note this requires a heap that supports deletions, and you'll probably need to maintain some kind of mapping between vertices and their positions in the heap.

A binary heap can be implemented using an array. We can set up the array such that for any item at index $i$, its children are at indices $2i + 1$ and $2i + 2$. Therefore the parent of the item at index $i$ is at index $\lfloor \frac{i-1}{2} \rfloor$.
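A quick check of this index arithmetic, using hypothetical helper names `children` and `parent`:

```python
def children(i):
    """Indices of the left and right children of the item at index i."""
    return 2 * i + 1, 2 * i + 2


def parent(i):
    """Index of the parent of the item at index i (for i > 0)."""
    return (i - 1) // 2


print(children(1))  # (3, 4)
print(parent(3))    # 1
print(parent(4))    # 1

# In a valid min-heap array, every item is at least as large as its parent.
heap = [1, 3, 8, 5]
print(all(heap[parent(i)] <= heap[i] for i in range(1, len(heap))))  # True
```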

In [1]:
from typing import Tuple, TypeVar, Generic

T = TypeVar('T')

class MinHeap(Generic[T]):
    def __init__(self):
        self.arr: list[T] = []
        self.keys: dict[T, int] = dict()
        self.index: dict[T, int] = dict()
        return

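    def heapify(self):
        # Restore the heap property over the whole array in O(n) time by
        # sifting down every non-leaf node, starting from the last parent.
        for idx in reversed(range(len(self.arr) // 2)):
            self.heapify_down(idx)
        return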

    def heapify_up(self, start_idx: int):
        if start_idx == 0:
            return
        
        parent_idx = (start_idx -1) // 2

        parent_node = self.arr[parent_idx]
        start_node = self.arr[start_idx]
        
        parent_key = self.keys[parent_node]
        node_key = self.keys[start_node]

        if parent_key > node_key:
            self.arr[parent_idx], self.arr[start_idx] = self.arr[start_idx], self.arr[parent_idx]
            self.index[parent_node], self.index[start_node] = self.index[start_node], self.index[parent_node]

            self.heapify_up(parent_idx)
        
        return
    
    def heapify_down(self, start_idx: int):
        arr_size = len(self.arr)

        left_idx = 2 * start_idx + 1
        right_idx = 2 * start_idx + 2

        start_node = self.arr[start_idx]
        node_key = self.keys[start_node]
        
        if left_idx > arr_size-1:
            return
        
        left_node = self.arr[left_idx]
        left_key = self.keys[left_node]

        if right_idx > arr_size-1:
            smaller_key, smaller_idx, smaller_node = left_key, left_idx, left_node
        else:
            right_node = self.arr[right_idx]
            right_key = self.keys[right_node]

            smaller_key, smaller_idx, smaller_node = (left_key, left_idx, left_node) if left_key < right_key else (right_key, right_idx, right_node)
        
        if node_key > smaller_key:

            self.arr[smaller_idx], self.arr[start_idx] = self.arr[start_idx], self.arr[smaller_idx]
            self.index[smaller_node], self.index[start_node] = self.index[start_node], self.index[smaller_node]

            self.heapify_down(smaller_idx)
        return
    
    def extract_min(self) -> Tuple[T, int]:
        arr_size = len(self.arr)

        if arr_size == 0:
            return None
        
        if arr_size == 1:
            min_element = self.arr.pop()

            self.index.pop(min_element)
            return min_element, self.keys.pop(min_element)
        
        last_element = self.arr[-1]
        
        self.arr[0], self.arr[-1] = self.arr[-1], self.arr[0]
        self.index[last_element] = 0
        
        min_element = self.arr.pop()
        key = self.keys.pop(min_element)
        self.index.pop(min_element)

        self.heapify_down(0)
        
        return min_element, key
    
    def insert(self, item: T, key: int=None):
        size = len(self.arr)

        self.arr.append(item)
        self.index[item] = size
        
        if key is None:
            key = item
        
        self.keys[item] = key
        
        if size > 0:
            self.heapify_up(size)
        return
    
    def delete(self, item: T):
        idx = self.index[item]

        if idx == len(self.arr) -1:
            self.arr.pop()
            self.index.pop(item)
            self.keys.pop(item)
            
            return

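        last = self.arr[-1]

        self.arr[idx], self.arr[-1] = self.arr[-1], self.arr[idx]
        self.index[last] = idx
        
        self.arr.pop()
        self.index.pop(item)
        deleted_key = self.keys.pop(item)
        
        # Compare keys, not items: the node moved into idx sinks if its key is
        # larger than the deleted node's key, and rises otherwise.
        if deleted_key < self.keys[last]:
            self.heapify_down(idx)
        else:
            self.heapify_up(idx)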
            
        return
    
    def contains(self, item: T):
        return item in self.index

    def validate(self):

        def report():
            print(self.arr)
            print(self.index)
            print(self.keys)
            return
        
        if len(self.arr) != len(self.keys) or len(self.arr) != len(self.index):
            report()
            return False

        for idx, item in enumerate(self.arr):
            if self.index[item] != idx:
                report()
                return False
        
        nodeStack = [0]
        while len(nodeStack) != 0:
            parent_idx = nodeStack.pop()

            lchild_idx = 2 * parent_idx + 1
            rchild_idx = 2 * parent_idx + 2

            arr_size = len(self.arr)
            
            parent_key = self.keys[self.arr[parent_idx]]

            if lchild_idx > arr_size -1:
                continue
            
            left_key = self.keys[self.arr[lchild_idx]]
            
            if rchild_idx > arr_size -1:
                if parent_key > left_key:
                    report()
                    return False
                else:
                    continue
            
            right_key = self.keys[self.arr[rchild_idx]]
            
            if parent_key <= left_key and parent_key <= right_key:
                nodeStack.append(lchild_idx)
                nodeStack.append(rchild_idx)
            else:
                report()
                return False

        return True

In [2]:
def test_heap():
    heap = MinHeap()
    
    for i in [2, 5, 3, 8, 9, 7, 4, 11, 1, 40, 23, 41, 52]:
        heap.insert(i)
    
    print(f'Insert: {"PASSED" if heap.validate() else "FAILED"}')
    print(f'Contains: {"PASSED" if heap.contains(4) else "FAILED"}')
    
    for i in [3, 8, 7, 1, 11, 23]:
        heap.delete(i)

    print(f'Delete: {"PASSED" if heap.validate() else "FAILED"}')

    heap.extract_min()
    
    print(f'Extract Min: {"PASSED" if heap.validate() else "FAILED"}')

test_heap()

Insert: PASSED
Contains: PASSED
Delete: PASSED
Extract Min: PASSED


In [159]:
g: dict[int, list[Tuple[int, int]]] = dict()

with open('Week 2 DijkstraData.txt', 'r') as f:
    for line in f:
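        # Split on any whitespace; this also drops the trailing tab and newline.
        items = line.split()
        if not items:
            continue  # skip blank lines
        
        head = int(items[0])
        tails = [(int(node), int(length))
                for node,length 
                in [x.split(",") for x in items[1:]]]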
        
        g[head] = tails

In [160]:
def dijkstra(g: dict[int, list[Tuple[int, int]]], source: int = 1):
    # Distance reported for vertices that cannot be reached from the source.
    UNREACHABLE = 1000000

    nodeHeap = MinHeap[int]()
    processedNodes = {source: 0}
    
    # Every vertex other than the source starts in the heap as "unreachable".
    for node in g:
        if node != source:
            nodeHeap.insert(node, UNREACHABLE)
    
    def update_keys(tail: int, tail_score: int):
        # Lower the key of each neighbour still in the heap if the path
        # through tail is shorter than its current key.
        for head, weight in g[tail]:
            if nodeHeap.contains(head):
                old_key = nodeHeap.keys[head]
                nodeHeap.delete(head)
                
                new_key = min(old_key, tail_score + weight)
                nodeHeap.insert(head, new_key)
    
    update_keys(source, 0)
    
    while len(processedNodes) < len(g):
        
        greedy_node, greedy_score = nodeHeap.extract_min()
        processedNodes[greedy_node] = greedy_score
        
        update_keys(greedy_node, greedy_score)
    
    return processedNodes

In [169]:
shortestPaths = dijkstra(g)
required_nodes = [7,37,59,82,99,115,133,165,188,197]
answers = [shortestPaths[x] for x in required_nodes]

In [170]:
answers

[2599, 2610, 2947, 2052, 2367, 2399, 2029, 2442, 2505, 3068]