In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Graph Search - Shortest Path


## Weighted graphs

Up to now, we have focused on unweighted graphs. 

For many problems, we need to associate real-valued **weights** to each edge.

E.g., consider a graph where nodes are cities and edges represent the distance between them. 

<img src="figures/weighted.png" width=70%/>

The **weight of a path** in the graph is the sum of the weights of the edges along that path.

The **shortest weighted path** (or just **shortest path**) between **s** and **e** is the one with minimal weight.
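
To make the definition concrete, here is a minimal sketch that sums edge weights along a path. The graph, its weights, and the helper name `path_weight` are made up for illustration and are not taken from the figure.

```python
# Hypothetical weighted graph: weights[u][v] is the weight of edge (u, v).
weights = {
    's': {'a': 2, 'b': 7},
    'a': {'b': 3, 'e': 8},
    'b': {'e': 1},
    'e': {},
}

def path_weight(weights, path):
    """Sum the edge weights over consecutive pairs of vertices in path."""
    return sum(weights[u][v] for u, v in zip(path, path[1:]))

print(path_weight(weights, ['s', 'a', 'b', 'e']))  # 2 + 3 + 1 = 6
print(path_weight(weights, ['s', 'b', 'e']))       # 7 + 1 = 8
```

In this toy graph the path with more edges has the smaller weight, which is why counting edges alone is not enough.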

**What is the shortest path from s to e**?

We saw that we can use BFS to get the distance from the source to each node.

<span style="color:red">**Question:**</span> Can we use BFS to solve the shortest path problem for weighted graphs?

<img src="figures/bfs_fail.png" width=50%/>

BFS will:
- visit b
- visit a
- but it will not consider the path from a to b, since it never visits a node more than once

Thus, BFS will not discover that the shortest path from $s$ to $b$ is $s \rightarrow a \rightarrow b$.
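
The failure can be reproduced with a small sketch. The edge weights below are assumptions chosen for illustration (the figure's actual values may differ), and `bfs_path_weights` is a hypothetical helper that records the weight of the first path by which BFS reaches each node.

```python
from collections import deque

# Assumed weights: s->a costs 1, a->b costs 1, s->b costs 5.
graph = {
    's': [('a', 1), ('b', 5)],
    'a': [('b', 1)],
    'b': [],
}

def bfs_path_weights(graph, source):
    """Weight of the first path by which BFS discovers each node."""
    found = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, weight in graph[node]:
            if neighbor not in found:  # BFS never revisits a node
                found[neighbor] = found[node] + weight
                queue.append(neighbor)
    return found

print(bfs_path_weights(graph, 's'))  # {'s': 0, 'a': 1, 'b': 5}
```

With these weights BFS reports 5 for $b$, although the path $s \rightarrow a \rightarrow b$ has weight 2.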

## SSSP: Single-Source Shortest Path

Given a weighted graph $G=(V,E,w)$ and a source vertex $s$, the single-source shortest path (SSSP) problem is to find a shortest weighted path from $s$ to every other vertex in $V$.

Consider this figure:

<img src="figures/subpaths.png" width="40%"/>

<span style="color:red">**Question:**</span> Suppose that an oracle has told us the shortest paths from $s$ to all vertices except for the vertex $v$, shown in red squares. How can we find the shortest path to $v$?



Let's define $\delta_G(i,j)$ as the weight of the shortest path from $i$ to $j$ in graph $G$. Then:

$$
\begin{aligned}
\delta_G(s,v) = \min(&\delta_G(s,a)+3,\\
&\delta_G(s,b)+6,\\
&\delta_G(s,c)+5 )
\end{aligned}
$$
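
As a quick numeric sketch of this formula: the edge weights into $v$ (3, 6, 5) come from the equation above, while the oracle values in `delta` are hypothetical numbers picked only to make the arithmetic concrete.

```python
# Edge weights into v, as in the equation above.
in_edges_v = {'a': 3, 'b': 6, 'c': 5}

# Hypothetical oracle answers delta_G(s, x); not read from the figure.
delta = {'a': 7, 'b': 3, 'c': 6}

# Try every in-neighbor x of v and keep the cheapest delta(s, x) + w(x, v).
best_pred = min(in_edges_v, key=lambda x: delta[x] + in_edges_v[x])
delta_v = delta[best_pred] + in_edges_v[best_pred]

print(best_pred, delta_v)  # b 9
```

Under these assumed values the candidates are 10, 9, and 11, so the shortest path to $v$ would arrive via $b$.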



### Sub-paths property [Optimal Substructure]
> Any sub-path of a shortest path is itself a shortest path. 

The sub-paths property makes it possible to construct shortest paths from smaller shortest paths. 

## Dijkstra's property

For a graph with non-negative edge weights and any partition of the vertices $V$ into $X$ and $Y = V \setminus X$ with $s \in X$:

Define $p(v) = \min_{x \in X} (\delta_G(s,x) + w(x,v))$. Then

$$\min_{y \in Y} p(y) = \min_{y \in Y} \delta_G(s, y)$$

<center>
<img src="figures/dijkstra_example.jpg" width=50%/>
</center>

> The smallest weight obtained by following a shortest path from $s$ to some vertex in $X$ and then taking one edge directly to a neighbor in $Y$ (the frontier) is as small as the weight of any path from $s$ to any vertex in $Y$.


This property suggests that we can start with shortest paths to the nodes in a frontier, then extend them beyond the frontier to get shortest paths to more distant nodes.
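
The property can be checked numerically on a small example. This is a sketch under stated assumptions: the graph is a hand-built example with non-negative weights, the `delta` values were worked out by hand for that graph, and the choice $X = \{s, a\}$ is arbitrary.

```python
import math

# Hand-built example graph with non-negative weights.
graph = {
    's': [('a', 1), ('c', 5)],
    'a': [('b', 2)],
    'b': [('c', 1), ('d', 5)],
    'c': [('d', 3)],
    'd': [],
}
# True shortest-path weights from s for this graph, computed by hand.
delta = {'s': 0, 'a': 1, 'b': 3, 'c': 4, 'd': 7}

X = {'s', 'a'}
Y = set(graph) - X

def p(v):
    """Cheapest way to reach v by one edge leaving X (infinity if none)."""
    return min(
        (delta[x] + w for x in X for nbr, w in graph[x] if nbr == v),
        default=math.inf,
    )

lhs = min(p(y) for y in Y)       # min over the frontier of p(y)
rhs = min(delta[y] for y in Y)   # true distance to the closest vertex in Y
print(lhs, rhs)  # 3 3
```

Note that $p(c) = 5$ overestimates $\delta_G(s,c) = 4$; the property only guarantees that the minimum over $Y$ is exact.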

But in what order should we visit nodes? Consider this graph again:

<center>
<img src="figures/bfs_fail.png" width=50%/>
</center>

If we visit $b$ before $a$, we will still not discover that the shortest path from $s$ to $b$ is $s \rightarrow a \rightarrow b$.

Instead, we must visit nodes in increasing distance from the source.



## Dijkstra's Algorithm

Assume we know the shortest paths to $\{a,b,c,e\}$. We can then use these to determine whether $u$ or $v$ is closer to $s$.

<center>
<img src="figures/distance.png" width=50%/>
</center>

The idea of Dijkstra's algorithm is:
- Maintain a visited set of vertices whose distances have already been computed correctly.
- Calculate distances to each node in the frontier.
- Extend the frontier by visiting the closest vertex.

<br>
<br>
<br>

<span style="color:red">**Question:**</span> Is this a greedy algorithm or dynamic programming?


The final algorithm can be viewed as an instance of **priority-first search**, using the path length as the priority criterion.

1. Initialize the frontier to $(s, 0)$
2. While the frontier is not empty:
  - pop from the frontier the node $v$ with minimum distance $d$ from the source.
  - if $v$ has already been visited, skip it; otherwise set $result(v) = d$, the weight of the shortest path from $s$ to $v$
  - for each neighbor $x$ of $v$ with edge weight $w$, add $x$ to the frontier with distance $d + w$
3. Return $result$

<center>
    <img src="figures/dijkstra-0.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-1.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-2.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-3.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-4.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-5.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-6.jpg" width=50%/>
</center>

<center>
    <img src="figures/dijkstra-7.jpg" width=50%/>
</center>

In [1]:
# Heaps in Python
from heapq import heappush, heappop 
  
# Creating empty heap 
heap = [] 
  
# Adding items to the heap using heappush function 
heappush(heap, (10, 'a')) 
heappush(heap, (30, 'b')) 
heappush(heap, (20, 'c')) 
heappush(heap, (400, 'd')) 
print("Head value of heap : "+str(heappop(heap)))
print("Head value of heap : "+str(heappop(heap)))
print("Head value of heap : "+str(heappop(heap)))
print("Head value of heap : "+str(heappop(heap)))

Head value of heap : (10, 'a')
Head value of heap : (20, 'c')
Head value of heap : (30, 'b')
Head value of heap : (400, 'd')


In [2]:
# Creating empty heap 
heap1 = [] 
  
# Adding items to the heap using heappush function 
heappush(heap1, ('a',10)) 
heappush(heap1, ('b',30)) 
heappush(heap1, ('c',20)) 
heappush(heap1, ('d',400)) 
print("Head value of heap : "+str(heappop(heap1)))
print("Head value of heap : "+str(heappop(heap1)))
print("Head value of heap : "+str(heappop(heap1)))
print("Head value of heap : "+str(heappop(heap1)))

Head value of heap : ('a', 10)
Head value of heap : ('b', 30)
Head value of heap : ('c', 20)
Head value of heap : ('d', 400)
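
The second heap pops in alphabetical order rather than by number because Python compares tuples element by element, left to right. A short sketch of that behavior (the sample values are arbitrary):

```python
from heapq import heappush, heappop

# Tuples compare lexicographically: first elements decide unless they tie.
print((10, 'a') < (20, 'c'))   # True, because 10 < 20
print(('b', 30) < ('c', 20))   # True, because 'b' < 'c'; 30 and 20 are never compared

# With equal priorities, the second element breaks the tie.
heap = []
for item in [(5, 'z'), (5, 'a'), (1, 'm')]:
    heappush(heap, item)
order = [heappop(heap) for _ in range(3)]
print(order)  # [(1, 'm'), (5, 'a'), (5, 'z')]
```

This is why a priority queue for shortest paths stores `(distance, node)` with the distance first.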


In [3]:
# Dijkstra's algorithm, using a heap as the priority queue
from heapq import heappush, heappop 

def dijkstra(graph, source):
    def dijkstra_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            # Pick next closest node from heap
            distance, node = heappop(frontier)
            print('visiting', node)
            if node in visited:
                # Already visited, so ignore this longer path
                return dijkstra_helper(visited, frontier)
            else:
                # We now know the shortest path from source to node.
                # insert into visited dict.
                visited[node] = distance
                print('...distance=', distance)
                # Visit each neighbor of node and insert into heap.
                # We may add same node more than once, heap
                # will keep shortest distance prioritized.
                for neighbor, weight in graph[node]:
                    heappush(frontier, (distance + weight, neighbor))                
                return dijkstra_helper(visited, frontier)
        
    frontier = []
    heappush(frontier, (0, source))
    visited = dict()  # store the final shortest paths for each node.
    return dijkstra_helper(visited, frontier)

graph = {
            's': {('a', 1), ('c', 5)},
            'a': {('b', 2)},
            'b': {('c', 1), ('d', 5)}, 
            'c': {('d', 3)},
            'd': set(),        # no outgoing edges ({} would create a dict, not a set)
            'e': {('d', 0)}    # not reachable from 's', so absent from the result
        }
dijkstra(graph, 's')

visiting s
...distance= 0
visiting a
...distance= 1
visiting b
...distance= 3
visiting c
...distance= 4
visiting c
visiting d
...distance= 7
visiting d


{'s': 0, 'a': 1, 'b': 3, 'c': 4, 'd': 7}
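
The recursive helper above makes one nested call per heap entry, and Python does not eliminate tail calls, so a large graph may exceed the default recursion limit. A minimal iterative sketch of the same priority-first search follows; the name `dijkstra_iterative` and the list-based adjacency format are choices made here for illustration.

```python
from heapq import heappush, heappop

def dijkstra_iterative(graph, source):
    """Same search as the recursive version, written as a loop."""
    visited = {}                  # node -> final shortest distance
    frontier = [(0, source)]      # heap of (distance, node)
    while frontier:
        distance, node = heappop(frontier)
        if node in visited:
            continue              # stale entry for an already-finalized node
        visited[node] = distance
        for neighbor, weight in graph[node]:
            heappush(frontier, (distance + weight, neighbor))
    return visited

graph = {
    's': [('a', 1), ('c', 5)],
    'a': [('b', 2)],
    'b': [('c', 1), ('d', 5)],
    'c': [('d', 3)],
    'd': [],
    'e': [('d', 0)],
}
print(dijkstra_iterative(graph, 's'))
# {'s': 0, 'a': 1, 'b': 3, 'c': 4, 'd': 7}
```

The result matches the recursive run; vertex `e` is missing because it cannot be reached from `s`.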

### Correctness of Dijkstra's Algorithm

The algorithm maintains the invariant that, for each visited vertex $x \in X$, `visited[x]` stores the weight of the shortest path from $s$ to $x$.
- That is, `visited[x]` $=\delta_G(s,x)$


- We know this is true after visiting the source, since $\delta_G(s,s)=$ `visited[s]` $=0$
- Dijkstra's property ensures that each element we remove from the heap also satisfies the invariant, provided all edge weights are non-negative
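
Dijkstra's property relies on edge weights being non-negative. The sketch below uses a hypothetical three-node graph with one negative edge to show the invariant breaking: a node is finalized before a cheaper path through the negative edge is seen. The function is a compact loop version of the search, written here only for this demonstration.

```python
from heapq import heappush, heappop

def dijkstra_loop(graph, source):
    visited = {}
    frontier = [(0, source)]
    while frontier:
        distance, node = heappop(frontier)
        if node in visited:
            continue  # first pop is treated as final, later entries are ignored
        visited[node] = distance
        for neighbor, weight in graph[node]:
            heappush(frontier, (distance + weight, neighbor))
    return visited

# Hypothetical graph with a negative edge b -> a.
negative_graph = {
    's': [('a', 2), ('b', 3)],
    'a': [],
    'b': [('a', -2)],
}
result = dijkstra_loop(negative_graph, 's')
print(result)  # {'s': 0, 'a': 2, 'b': 3}

# The path s -> b -> a has weight 3 + (-2) = 1, which is less than result['a'].
true_distance_to_a = 3 + (-2)
print(result['a'], true_distance_to_a)  # 2 1
```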

## Work of Dijkstra's Algorithm

The two key lines are:

```python
distance, node = heappop(frontier)
```

and


```python
for neighbor, weight in graph[node]:
    heappush(frontier, (distance + weight, neighbor))
```    

What are the work and span of `heappop` and `heappush`?

$O(\lg n)$ work and span for each, for a heap of size $n$.

How many times will we call these functions?

At most once per edge (plus once for the source), since a node is added to the heap once for each incoming edge from a visited node.

Thus, the total work is $O(|E| \log |E|)$


Note that we also assume constant-time `dict` lookups and insertions, and constant time per element when iterating over a node's neighbors:
- `visited[node] = distance`
- `for neighbor, weight in graph[node]:`

These result in an additional $O(|V| + |E|)$ work, which is dominated by the heap operations above.

Because this is a serial algorithm, the span is also $O(|E| \log |E|)$
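
The bound on the number of heap operations can be checked empirically. This is a sketch with an instrumented loop version of the search; the counters and the name `dijkstra_counting` are additions for illustration, and the graph is a small hand-built example.

```python
from heapq import heappush, heappop

def dijkstra_counting(graph, source):
    """Run the search and count heappush / heappop calls."""
    visited = {}
    frontier = [(0, source)]
    pushes, pops = 1, 0           # the source counts as one push
    while frontier:
        distance, node = heappop(frontier)
        pops += 1
        if node in visited:
            continue
        visited[node] = distance
        for neighbor, weight in graph[node]:
            heappush(frontier, (distance + weight, neighbor))
            pushes += 1
    return visited, pushes, pops

graph = {
    's': [('a', 1), ('c', 5)],
    'a': [('b', 2)],
    'b': [('c', 1), ('d', 5)],
    'c': [('d', 3)],
    'd': [],
    'e': [('d', 0)],
}
num_edges = sum(len(adj) for adj in graph.values())
visited, pushes, pops = dijkstra_counting(graph, 's')
print(num_edges, pushes, pops)  # 7 7 7
```

Here six edges are reachable from `s`, giving six pushes plus one for the source, which stays within $|E| + 1$.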

## Question

Consider the directed graph shown in the figure below. There are multiple shortest paths between vertices S and T. Which one will be reported by Dijkstra’s shortest path algorithm? Assume that, in any iteration, the shortest path to a vertex v is updated only when a strictly shorter path to v is discovered.

<center>
    <img src="figures/quiz1.png" width=50%/>
</center>

Choose one:

- SDT 
- SBDT 
- SACDT 
- SACET
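
To experiment with the tie-breaking rule in the question, here is a minimal sketch of a variant that keeps tentative distances and parent pointers and updates them only on a strictly shorter path. The graph below is hypothetical and is not the one in the figure; `dijkstra_with_parents` and `path_to` are names invented for this sketch.

```python
import math
from heapq import heappush, heappop

def dijkstra_with_parents(graph, source):
    dist = {v: math.inf for v in graph}   # tentative distances
    dist[source] = 0
    parent = {source: None}
    done = set()
    frontier = [(0, source)]
    while frontier:
        d, node = heappop(frontier)
        if node in done:
            continue
        done.add(node)
        for neighbor, weight in graph[node]:
            if d + weight < dist[neighbor]:   # strictly shorter paths only
                dist[neighbor] = d + weight
                parent[neighbor] = node
                heappush(frontier, (d + weight, neighbor))
    return dist, parent

def path_to(parent, target):
    """Follow parent pointers back to the source (target must be reachable)."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]

# Hypothetical graph with two shortest s-t paths of weight 3.
tie_graph = {
    's': [('a', 1), ('b', 2)],
    'a': [('t', 2)],
    'b': [('t', 1)],
    't': [],
}
dist, parent = dijkstra_with_parents(tie_graph, 's')
print(dist['t'], path_to(parent, 't'))  # 3 ['s', 'a', 't']
```

In this toy graph `a` is visited before `b`, so the path through `a` is recorded first and the equally short path through `b` does not replace it.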


### How can we solve the shortest, shortest path problem?