# Dijkstra

### WHY

The time complexity of Dijkstra's algorithm is O(V^2) with a plain array-based queue, but with a min-priority queue (binary heap) it drops to O((V + E) \* log V).

- **Queue**
  1. Adding all |V| vertices to Q takes O(|V|) time.
  2. Removing the node with minimal dist takes O(|V|) time, and we only need O(1) to recalculate dist[v] for a single neighbor and update Q. Since we use an adjacency matrix here, we'll need to loop over all |V| vertices to find the neighbors and update their dist.
  3. The loop runs |V| times, as one vertex is deleted from Q per iteration, and each iteration takes O(|V|) time.
  4. Thus, total time complexity becomes O(|V|) + O(|V|) \* O(|V|) = O(|V|^2).
- **Binary Heap**
  1. It takes O(|V|) time to construct the initial priority queue of |V| vertices.
  2. With an adjacency list representation, all vertices of the graph can be traversed in BFS fashion. Each edge is examined once, when its source vertex is removed from Q, so iterating over all vertices' neighbors and updating their dist values over the course of a run involves O(|E|) updates.
  3. The loop runs at most |V| times, as one vertex is removed from Q per iteration.
  4. The binary heap data structure allows us to extract-min (remove the node with minimal dist) and update an element (recalculate dist[u]) in O(log|V|) time.
  5. Therefore, the time complexity becomes O(|V|) + O(|E| \* log|V|) + O(|V| \* log|V|), which is O((|E| + |V|) \* log|V|) = O(|E| \* log|V|), since |E| >= |V| - 1 when G is a connected graph.
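
To make the O(|V|^2) count in the **Queue** case concrete, here is a minimal sketch of the array-based variant on an adjacency matrix. The function name `dijkstra_matrix` and the use of `math.inf` to mark a missing edge are assumptions made for this illustration, not part of the notebook's own code.

```python
import math


def dijkstra_matrix(matrix, start):
    """Array-based Dijkstra on an adjacency matrix; math.inf marks a missing edge."""
    n = len(matrix)
    dist = [math.inf] * n
    dist[start] = 0
    done = [False] * n
    for _ in range(n):  # the outer loop runs at most |V| times
        # Linear scan for the closest unfinished vertex: O(|V|)
        u = -1
        for i in range(n):
            if not done[i] and (u == -1 or dist[i] < dist[u]):
                u = i
        if u == -1 or dist[u] == math.inf:
            break  # every remaining vertex is unreachable
        done[u] = True
        # Scan row u of the matrix to relax each outgoing edge: O(|V|)
        for v in range(n):
            if dist[u] + matrix[u][v] < dist[v]:
                dist[v] = dist[u] + matrix[u][v]
    return dist


INF = math.inf
matrix = [
    [INF, 5, 3, 7, INF],
    [INF, INF, INF, INF, 2],
    [INF, INF, INF, INF, 2],
    [INF, INF, INF, INF, 4],
    [INF, INF, INF, INF, INF],
]
print(dijkstra_matrix(matrix, 0))  # [0, 5, 3, 7, 5]
```

Both inner scans touch every vertex, which is where the O(|V|) \* O(|V|) term comes from.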

### WHAT

Dijkstra's shortest path algorithm finds the cheapest route from a starting vertex to the other vertices of a graph whose edge costs are non-negative. It was conceived while Dijkstra was out shopping with his fiancée and they stopped for a coffee break. He was wondering how an algorithm would express the shortest route between two destinations of interest.

### HOW

#### High-Level

1. We calculate the cost of travelling to each vertex one degree away.
2. We prioritize the cheapest routes we can see from a source vertex _u_ to its neighbors _v_. It's important to realize that the "cost" to travel is not the cost of the edge from _u_ to _v_ alone, but the cost of travelling from the starting vertex _s_ to _v_ via _u_. The total costs are saved in a central state (typically called _distance_), which tallies the total cost from _s_ to each vertex _v_. Each distance value represents the **minimum total cost** found so far. So if we want the combined cost of the cheapest routes from _s_ to every node, we simply sum the values in _distance_.
3. If we want the cheapest cost to travel from a starting node to an ending node, the answer is the value stored under the ending node's key in _distance_.
4. Dijkstra is essentially a breadth-first search, but it cherry-picks from the Queue the node with the cheapest total cost among all the observations made so far. So it's important to understand intuitively that the Queue holds only vertices _v_ that have been observed from some source _u_, that's it. It's the algorithm's job to determine where to go next by choosing the cheapest entry currently in the Queue.
5. _Why it works_: We need to remember where we've come from and be sure that it was the cheapest route so far, and we need to keep looking ahead so that there are no surprises in the future. Because edge costs are non-negative, any route that continues through a more expensive Queue entry can only cost more, so the cheapest entry in the Queue can never be undercut later. An entry that is skipped in favor of cheaper ones simply waits in the Queue: if it belongs to the cheapest total cost, it will eventually be picked, and if a cheaper route to the same vertex turns up first, _distance_ already holds the better value. This dynamic is the responsibility of the _distance_ set and the **get_min** algorithm.
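
The "cost via _u_" idea can be sketched as a single relaxation step. The helper name `relax` and the three-vertex example are hypothetical, chosen only to illustrate how _distance_ is updated and then queried.

```python
def relax(distance, u, v, edge_cost):
    """Update distance[v] if reaching v through u is cheaper; return True on improvement."""
    via_u = distance[u] + edge_cost  # total cost from s to v when travelling via u
    if via_u < distance[v]:
        distance[v] = via_u
        return True
    return False


inf = float("inf")
distance = {"s": 0, "a": inf, "b": inf}
relax(distance, "s", "a", 4)  # s -> a costs 4
relax(distance, "s", "b", 1)  # s -> b costs 1
relax(distance, "b", "a", 2)  # s -> b -> a costs 1 + 2 = 3, cheaper than 4
print(distance["a"])  # 3
print(sum(distance.values()))  # 4
```

Looking up `distance["a"]` answers the single-destination question, while the sum gives the combined cost of the cheapest routes to every vertex.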

#### Low-Level

1. Initialize the graph, generating the following:
   1. Adjacency Map: Keeps track of neighbors _v_ for a given vertex _u_
   2. Visited Set: Keeps track of each source vertex _u_ we've analyzed and finished inspecting.
   3. Minimum Costs dictionary: Keeps track of the minimum cost for a given vertex (key). Initially all values will be set to Infinity.
2. We start the process by enqueuing the initial node and setting its _costs_ value to `0`. After we pop the cheapest node _u_ off the queue, we traverse all of its neighbors and calculate the distance _through_ _u_ to each neighbor _v_, keeping it only if it is cheaper than the cost recorded so far. For example ...
   ```python
   self.costs[start] = 0
   pq = [start]
   while pq:
       u = self.get_min(pq)  # node in pq with the lowest recorded cost
       self.visited.add(u)
       for v, cost in self.adj_map.get(u):
           cost_u, cost_v = self.costs[u], self.costs[v]
           if cost_v > cost_u + cost:  # cheaper route to v found through u
               self.costs[v] = cost_u + cost
               pq.append(v)
   ```
   **Note 1** : We allow the control flow to revisit nodes already visited, in order to fully relax the edges if a cheaper cost is found.
3. When we pop off the Q, we can use a priority queue / min-heap. This ensures that, out of all the available entries, we always pick the cheapest one.
   - **Note** We don't have to implement our own min-heap. Python provides one out of the box: `heapq`. The API looks something like the following:
     - If you have an initial list of elements then you have to heapify that initial list.
       ```python
       import heapq
       pq = list(initial_list)  # copy of an existing list (initial_list is assumed to be defined)
       heapq.heapify(pq)
       ```
     - If you have a single value, and you dynamically push and pop from the pq, then...
       ```python
       import heapq
       pq = [start_vertex]
       while pq:
           u = heapq.heappop(pq)
           ...
           heapq.heappush(pq, next_vertex)
       ```
     - You can also use tuples as values within the heap's nodes. Just make sure that the first value in the tuple is the value you want to evaluate/sort on.
       ```python
       heapq.heappush(pq, (sort_value, non_sort_value))
       ```
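
As a quick check of the tuple behavior described above, the following sketch pushes `(cost, vertex)` pairs and pops them back in cost order. The costs and vertex labels are made-up values for illustration.

```python
import heapq

pq = []
for cost, vertex in [(7, "d"), (3, "c"), (5, "b")]:
    heapq.heappush(pq, (cost, vertex))  # the first tuple element drives the ordering

order = []
while pq:
    order.append(heapq.heappop(pq))  # always removes the smallest remaining tuple

print(order)  # [(3, 'c'), (5, 'b'), (7, 'd')]
```

When two entries share the same first value, Python compares the second element, so the non-sort value should itself be comparable.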


In [1]:
class Graph:
    def __init__(self, edges, V):
        self.V = V
        self.adj_map = Graph.build_graph(V, edges)
        self.visited = set()
        self.costs = {i: float("inf") for i in range(1, V + 1)}

    def dijkstra(self, start):
        self.visited.add(start)
        self.costs[start] = 0
        pq = [start]
        while pq:
            u = self.get_min(pq)
            self.visited.add(u)
            for v, cost in self.adj_map.get(u):
                cost_u, cost_v = self.costs[u], self.costs[v]
                if cost_v > cost_u + cost:
                    self.costs[v] = cost_u + cost
                    pq.append(v)

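    def get_min(self, q):
        # Costs may have changed since the last call, so rebuild the min-heap
        # bottom-up, then move the root (cheapest vertex) to the end and pop it.
        for i in range(len(q) // 2 - 1, -1, -1):
            self.heapify(q, i, len(q))
        self.swap(q, 0, -1)
        return q.pop()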

    def heapify(self, q, i, size):
        min_i, l, r = i, 2 * i + 1, 2 * i + 2
        # Compare each child against the smallest cost found so far.
        if l < size and self.costs[q[l]] < self.costs[q[min_i]]:
            min_i = l
        if r < size and self.costs[q[r]] < self.costs[q[min_i]]:
            min_i = r
        if min_i != i:
            self.swap(q, min_i, i)
            self.heapify(q, min_i, size)

    @staticmethod
    def swap(a, l, r):
        a[l], a[r] = a[r], a[l]

    @staticmethod
    def build_graph(v, edges):
        adj_map = {i: set() for i in range(1, v + 1)}
        for u, v, c in edges:
            adj_map[u].add((v, c))
        return adj_map

    def get_max_cost(self):
        # NOTE - example of how to extract some solution post-analysis
        unvisited = set(list(self.adj_map.keys())) - self.visited
        max_cost = max(self.costs.values())
        return -1 if max_cost in [float("inf"), 0] or unvisited else max_cost

In [None]:
import heapq


class Graph:
    def __init__(self, edges, V):
        self.V = V
        self.adj_map = Graph.build_graph(V, edges)
        self.visited = set()
        self.min_cost = {i: float("inf") for i in range(1, V + 1)}

    def dijkstra(self, start):
        self.min_cost[start] = 0
        pq = [(0, start)]  # heap entries are (cost so far, vertex)
        while pq:
            distance_u, u = heapq.heappop(pq)
            if distance_u > self.min_cost[u]:
                continue  # stale entry: a cheaper route to u was already processed
            self.visited.add(u)
            for v, cost in self.adj_map[u]:
                if self.min_cost[v] > distance_u + cost:
                    self.min_cost[v] = distance_u + cost
                    heapq.heappush(pq, (distance_u + cost, v))
        return self.get_max_cost()

    def get_max_cost(self):
        # NOTE - example of how to extract some solution post-analysis
        # unvisited should be empty if every vertex is reachable from the start
        unvisited = set(self.adj_map.keys()) - self.visited
        # The largest of the minimum costs is the cost of reaching the farthest
        # vertex, i.e. the point at which every vertex has been reached.
        max_cost = max(self.min_cost.values())
        # Return -1 if max_cost is infinite or 0, or if any vertex is unvisited.
        return -1 if max_cost in [float("inf"), 0] or unvisited else max_cost

    def get_final_cost(self, end):
        return self.min_cost[end]

    @staticmethod
    def build_graph(v, edges):
        adj_map = {i: set() for i in range(1, v + 1)}
        for u, v, c in edges:
            adj_map[u].add((v, c))
        return adj_map

#### Terse | Lite

A very fast & light version of Dijkstra is available below. In order to verify that all nodes have been traversed, additional logic would be necessary, but it is easily accomplished by taking the set difference between `adj_map.keys()` and the values in the `visited` set. If the difference is empty, then all nodes have been visited.


In [9]:
import heapq


def dijkstra(graph, size, start):
    adj_map, costs, visited = {}, {}, set()
    for i in range(size):
        adj_map[i], costs[i] = set(), float("inf")
    for u, v, w in graph:
        adj_map[u].add((v, w))
    pQ = [(0, start)]
    costs[start] = 0  # the start vertex costs nothing to reach
    while pQ:
        total_cost, u = heapq.heappop(pQ)
        if u in visited:
            continue
        visited.add(u)
        for v, v_cost in adj_map.get(u):
            travel_cost = total_cost + v_cost
            if costs[v] > travel_cost:
                costs[v] = travel_cost
                heapq.heappush(pQ, (travel_cost, v))
    return costs, visited, adj_map


graph = [[0, 1, 5], [0, 2, 3], [0, 3, 7], [1, 4, 2], [2, 4, 2], [3, 4, 4]]
size = 5
start = 0
dijkstra(graph, size, start)

({0: 0, 1: 5, 2: 3, 3: 7, 4: 5},
 {0, 1, 2, 3, 4},
 {0: {(1, 5), (2, 3), (3, 7)},
  1: {(4, 2)},
  2: {(4, 2)},
  3: {(4, 4)},
  4: set()})
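
The traversal check mentioned in the Terse | Lite description could look something like the sketch below. The helper name `unreached` and the small sample graph are assumptions for illustration; the inputs are shaped like the `adj_map` and `visited` values that `dijkstra` returns.

```python
def unreached(adj_map, visited):
    """Return the vertices that appear in adj_map but were never visited."""
    return set(adj_map) - set(visited)


# Vertex 2 has an edge into the graph, but nothing leads to it from vertex 0.
adj_map = {0: {(1, 5)}, 1: set(), 2: {(1, 1)}}
visited = {0, 1}
print(unreached(adj_map, visited))  # {2}
```

An empty result means every vertex was reached from the start.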