Chapter 21 Graph Search

In [1]:
from ds2.graph import EdgeSetGraph

class Graph(EdgeSetGraph):  # make sure we are using a directed graph
    pass

In [2]:
# preorder graph traversal

def printall(G, v):
    print(v)
    for n in G.nbrs(v):
        printall(G, n)

G = Graph({1,2,3,4}, {(1,2), (1,3), (1,4)})
printall(G, 1)

1
2
3
4


This works for a tree, but it fails as soon as the graph has a cycle.
In that case, nothing in the code keeps us from going around and around
the cycle, so the recursion never ends and we get a RecursionError.
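To see the failure concretely, here is a minimal sketch that does not depend on ds2. The graph is a plain dictionary mapping each vertex to a list of its out-neighbors, and `visitall` is a hypothetical stand-in for printall that skips the printing; both are assumptions made only for this illustration.

```python
# Hypothetical stand-in for the graph: vertex -> list of out-neighbors.
cycle = {1: [2], 2: [3], 3: [1]}

def visitall(nbrs, v):
    """Preorder traversal that keeps no record of visited vertices."""
    for n in nbrs[v]:
        visitall(nbrs, n)

try:
    visitall(cycle, 1)
    outcome = "finished"
except RecursionError:
    # The traversal keeps circling 1 -> 2 -> 3 -> 1 until Python gives up.
    outcome = "RecursionError"

print(outcome)
```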

In [3]:
def printall(G, v, visited):
    visited.add(v)
    print(v)
    for n in G.nbrs(v):
        if n not in visited:
            printall(G, n, visited)

G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,1)})
printall(G, 1, set())

1
2
3
4


This is the most direct generalization of a recursive tree traversal into
something that also traverses the vertices of a graph.

21.1 Depth-First Search<br>

A depth-first search (or DFS) of a graph G starting from a vertex v
will visit all the vertices reachable from v. It will always prioritize moving
"outward" in the direction of new vertices, backtracking as little as possible.
The printall function above prints the vertices in a depth-first order.
Below is the general form of this algorithm.

In [4]:
def dfs(G, v):
    visited = {v}
    _dfs(G, v, visited)
    return visited

def _dfs(G, v, visited):
    for n in G.nbrs(v):
        if n not in visited:
            visited.add(n)
            _dfs(G, n, visited)

G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,2)})
print("reachable from 1:", dfs(G, 1))
print("reachable from 2:", dfs(G, 2))

reachable from 1: {1, 2, 3, 4}
reachable from 2: {2, 3, 4}


With this code, it will be easy to check if two vertices are connected.

In [5]:
def connected(G, u, v):
    return v in dfs(G, u)

In [6]:
G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,2)})
print("1 is connected to 4:", connected(G, 1, 4))
print("4 is connected to 3:", connected(G, 4, 3))
print("4 is connected to 1:", connected(G, 4, 1))

1 is connected to 4: True
4 is connected to 3: True
4 is connected to 1: False


It's possible to modify our dfs code to provide not only the set of
connected vertices, but also the paths used in the search. The idea is to store
a dictionary that maps each vertex to the previous vertex on the path from the
starting vertex.

In [7]:
def dfs(G, v):
    tree = {v: None}
    _dfs(G, v, tree)
    return tree

def _dfs(G, v, tree):
    for n in G.nbrs(v):
        if n not in tree:
            tree[n] = v
            _dfs(G, n, tree)

G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,2)})
print("dfs tree from 1:", dfs(G, 1))
print("dfs tree from 2:", dfs(G, 2))

dfs tree from 1: {1: None, 2: 1, 3: 2, 4: 3}
dfs tree from 2: {2: None, 3: 2, 4: 3}
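Because each entry points back toward the start, a path can be recovered by following these pointers and reversing the result. The helper below is a small sketch of that idea; the name `path_from_start` is hypothetical, and the dictionary is copied from the output above so the example runs without ds2.

```python
def path_from_start(tree, v):
    """Follow parent pointers from v back to the start, then reverse."""
    path = []
    while v is not None:
        path.append(v)
        v = tree[v]
    path.reverse()
    return path

# The tree printed above for a search starting at vertex 1.
tree = {1: None, 2: 1, 3: 2, 4: 3}
print(path_from_start(tree, 4))  # [1, 2, 3, 4]
print(path_from_start(tree, 2))  # [1, 2]
```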


21.2 Removing the Recursion<br>

The dfs code above uses recursion to keep track of previous vertices, so
that we can backtrack (by returning) when we reach a vertex from which
we can't move forward. To remove the recursion, we replace the function
call stack with our own stack.

In [9]:
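# Note: this rebinds Graph to ds2's Graph class, which the output below
# suggests is undirected (vertex 1 is reached from vertex 2).
from ds2.graph import Graph

def dfs(G, v):
    tree = {}
    tovisit = [(None, v)]  # stack of (previous vertex, vertex) pairs
    while tovisit:
        a, b = tovisit.pop()
        if b not in tree:
            tree[b] = a
            for n in G.nbrs(b):
                tovisit.append((b, n))
    return tree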


G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,2)})
print("dfs tree from 1:", dfs(G, 1))
print("dfs tree from 2:", dfs(G, 2))

dfs tree from 1: {1: None, 2: 1, 4: 2, 3: 4}
dfs tree from 2: {2: None, 4: 2, 3: 4, 1: 2}
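One practical benefit of the explicit stack is that the search depth is no longer bounded by Python's recursion limit (1000 by default in CPython). The sketch below runs the same stack-based search on a long directed path. It assumes a plain dictionary of adjacency lists in place of the ds2 graph, and the name `dfs_iterative` is hypothetical.

```python
def dfs_iterative(nbrs, v):
    """Stack-based DFS over a dict of adjacency lists; returns parent pointers."""
    tree = {}
    tovisit = [(None, v)]
    while tovisit:
        a, b = tovisit.pop()
        if b not in tree:
            tree[b] = a
            for n in nbrs[b]:
                tovisit.append((b, n))
    return tree

# A directed path 0 -> 1 -> ... -> N-1, deeper than the default recursion limit.
N = 5000
path_graph = {i: [i + 1] for i in range(N - 1)}
path_graph[N - 1] = []

deep_tree = dfs_iterative(path_graph, 0)
print(len(deep_tree))  # 5000: every vertex on the path was reached
```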


21.3 Breadth-First Search<br>

We get another important traversal by replacing the stack with a queue. In
this case, the search prioritizes breadth over depth, resulting in a
breadth-first search or BFS.

In [10]:
from ds2.queue import ListQueue as Queue

def bfs(G, v):
    tree = {}
    tovisit = Queue()
    tovisit.enqueue((None, v))
    while tovisit:
        a, b = tovisit.dequeue()
        if b not in tree:
            tree[b] = a
            for n in G.nbrs(b):
                tovisit.enqueue((b, n))
    return tree

G = Graph({1,2,3,4}, {(1,2), (2,3), (3,4), (4,2)})
print("bfs tree from 1:", bfs(G, 1))
print("bfs tree from 2:", bfs(G, 2))

bfs tree from 1: {1: None, 2: 1, 3: 2, 4: 2}
bfs tree from 2: {2: None, 3: 2, 1: 2, 4: 2}
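The same search can be sketched with only the standard library, using `collections.deque` as the queue. This is an illustration rather than the ds2 implementation: the graph is assumed to be a plain dictionary of adjacency lists, and `bfs_deque` is a hypothetical name.

```python
from collections import deque

def bfs_deque(nbrs, v):
    """BFS over a dict of adjacency lists; returns parent pointers."""
    tree = {}
    tovisit = deque([(None, v)])
    while tovisit:
        a, b = tovisit.popleft()          # dequeue from the front
        if b not in tree:
            tree[b] = a
            for n in nbrs[b]:
                tovisit.append((b, n))    # enqueue at the back
    return tree

# Directed graph with a long route 1 -> 2 -> 3 -> 4 and a shortcut 1 -> 4.
nbrs = {1: [2, 4], 2: [3], 3: [4], 4: []}
print(bfs_deque(nbrs, 1))  # {1: None, 2: 1, 4: 1, 3: 2}
```

Vertex 4 is recorded with parent 1, the one-edge shortcut, because the queue processes all edges leaving the start before any edge that is two steps away.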


In [11]:
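def distance(G, u, v):
    # Number of edges on a shortest path from u to v, read off the BFS tree.
    tree = bfs(G, u)
    if v not in tree:
        return float('inf')  # v is not reachable from u
    edgecount = 0
    while v != u:  # compare by value, not identity
        edgecount += 1
        v = tree[v]
    return edgecount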

G = Graph({1,2,3,4,5}, {(1,2), (2,3), (3,4), (4,5)})
print("distance from 1 to 5:", distance(G, 1, 5))
print("distance from 2 to 5:", distance(G, 2, 5))
print("distance from 3 to 4:", distance(G, 3, 4))

distance from 1 to 5: 4
distance from 2 to 5: 3
distance from 3 to 4: 1


21.4 Weighted Graphs and Shortest Paths<br>

In the single source, all shortest paths problem, the goal is to find the
shortest path from every vertex to a given source vertex. If the edges are
assumed to have the same length, then BFS solves this problem. However, it
is common to consider weighted graphs in which a (positive) real number
called the weight is assigned to each edge. We will augment our graph ADT
to support a function wt(u,v) that returns the weight of an edge. Then,
the weight of a path is the sum of the weights of the edges on that path.
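As a small illustration of path weight, the sketch below sums edge weights along a list of vertices. It assumes the weights are stored in a nested dictionary `wt[u][v]` rather than behind the wt(u,v) method, and the name `path_weight` is hypothetical.

```python
# Hypothetical weighted digraph: wt[u][v] is the weight of edge (u, v).
wt = {
    1: {2: 4.6, 3: 3.1},
    2: {3: 9.2},
    3: {2: 1.1},
}

def path_weight(wt, path):
    """Sum the weights of consecutive edges along a list of vertices."""
    return sum(wt[u][v] for u, v in zip(path, path[1:]))

direct = path_weight(wt, [1, 2])      # uses the single edge (1, 2)
detour = path_weight(wt, [1, 3, 2])   # uses edges (1, 3) and (3, 2)
print(direct, detour)
print(detour < direct)  # True: the two-edge path is lighter
```

The path with more edges has the smaller weight here, which is why BFS alone no longer finds shortest paths once weights are involved.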

In [12]:
from ds2.graph import AdjacencySetGraph
from ds2.priorityqueue import PriorityQueue

class Digraph(AdjacencySetGraph):
    def addedge(self, u, v, weight = 1):
        self._nbrs[u][v] = weight

    def removeedge(self, u, v):
        del self._nbrs[u][v]

    def addvertex(self, v):
        self._V.add(v)
        self._nbrs[v] = {}

    def wt(self, u, v):
        return self._nbrs[u][v]

In [13]:
from ds2.graph import Digraph

class Graph(Digraph):
    def addedge(self, u, v, weight = 1):
        Digraph.addedge(self, u, v, weight)
        Digraph.addedge(self, v, u, weight)
    
    def removeedge(self, u, v):
        Digraph.removeedge(self, u, v)
        Digraph.removeedge(self, v, u)
    
    def edges(self):
        E = {frozenset(e) for e in Digraph.edges(self)}
        return iter(E)

One nice algorithm for the single source, all shortest paths problem on
weighted graphs is called **Dijkstra's algorithm**. It looks a lot like DFS and
BFS, except that the stack or queue is replaced by a priority queue. The
vertices will be visited in order of their distance to the source. These
distances will be used as the priorities in the priority queue.<br>

We'll see two different implementations. The first, although less efficient,
is very close to DFS and BFS. Recall that in those algorithms, we visit
the vertices, recording the edges used in a dictionary and adding all the
neighboring vertices to a stack or a queue to be traversed later. We'll do
the same here except that we'll use a priority queue to store the edges to be
searched. We'll also keep a dictionary of the distances from the start vertex
that will be updated when we visit a vertex. The priority for an edge (u,v)
will be the distance to u plus the weight of (u,v). So, if we use this edge,
the shortest path to v will go through u. In this way, the tree will encode
all the shortest paths from the start vertex. Thus, the result will be not
only the lengths of all the paths, but also an efficient encoding of the paths
themselves.

In [14]:
def dijkstra(G, v):
    tree = {}
    D = {v: 0}
    tovisit = PriorityQueue()
    tovisit.insert((None,v), 0)
    for a,b in tovisit:
        if b not in tree:
            tree[b] = a
            if a is not None:
                D[b] = D[a] + G.wt(a,b)
            for n in G.nbrs(b):
                tovisit.insert((b,n), D[b] + G.wt(b,n))
    return tree, D

def path(tree, v):
    path = []
    while v is not None:
        path.append(str(v))
        v = tree[v]
    return ' --> '.join(path)

def shortestpaths(G, v):
    tree, D = dijkstra(G, v)
    for v in G.vertices():
        print('Vertex', v, ':', path(tree, v), ", distance = ", D[v])

G = Graph({1,2,3}, {(1,2, 4.6), (2, 3, 9.2), (1, 3, 3.1)})
shortestpaths(G, 1)
print('------------------------')
# Adding an edge creates a shortcut to vertex 2.
G.addedge(3, 2, 1.1)
shortestpaths(G, 1)

Vertex 1 : 1 , distance =  0
Vertex 2 : 2 --> 1 , distance =  4.6
Vertex 3 : 3 --> 1 , distance =  3.1
------------------------
Vertex 1 : 1 , distance =  0
Vertex 2 : 2 --> 3 --> 1 , distance =  4.2
Vertex 3 : 3 --> 1 , distance =  3.1


21.5 Prim’s Algorithm for Minimum Spanning Trees<br>

Recall that a subgraph of an undirected graph G = (V, E) is a spanning
tree if it is a tree with vertex set V. For a weighted graph, the weight
of a spanning tree is the sum of the weights of its edges. The Minimum
Spanning Tree (MST) Problem is to find a spanning tree of an input
graph with minimum weight.<br>

To find an algorithm for this problem, we start by trying to describe
which edges should appear in the minimum spanning tree. That is, we
should think about the object we want to construct first, and only then can
we think about how to construct it.

In [15]:
def prim(G):
    v = next(iter(G.vertices()))
    tree = {}
    tovisit = PriorityQueue()
    tovisit.insert((None, v), 0)
    for a, b in tovisit:
        if b not in tree:
            tree[b] = a
            for n in G.nbrs(b):
                tovisit.insert((b,n), G.wt(b,n))
    return tree

In [16]:
G = Graph({1,2,3,4,5}, {(1, 2, 1),
                        (2, 3, 1),
                        (1, 3, 2),
                        (3, 4, 1),
                        (3, 5, 3),
                        (4, 5, 2),
                        })
mst = prim(G)
sp, D = dijkstra(G, 1)
print(mst)
print(sp)

{1: None, 2: 1, 3: 2, 4: 3, 5: 4}
{1: None, 2: 1, 3: 1, 4: 3, 5: 3}
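The two dictionaries printed above describe different trees. A quick way to compare them is to total the edge weights of each. The sketch below does this with the outputs copied in by hand; the `weights` dictionary and the name `tree_weight` are assumptions introduced for the illustration.

```python
# Edge weights of the undirected example graph above, keyed by frozenset.
weights = {
    frozenset({1, 2}): 1,
    frozenset({2, 3}): 1,
    frozenset({1, 3}): 2,
    frozenset({3, 4}): 1,
    frozenset({3, 5}): 3,
    frozenset({4, 5}): 2,
}

def tree_weight(tree, weights):
    """Total weight of the (parent, child) edges in a parent dictionary."""
    return sum(weights[frozenset({child, parent})]
               for child, parent in tree.items() if parent is not None)

mst = {1: None, 2: 1, 3: 2, 4: 3, 5: 4}   # printed by prim(G) above
sp = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}    # tree printed by dijkstra(G, 1) above

print(tree_weight(mst, weights))  # 5
print(tree_weight(sp, weights))   # 7
```

The shortest-path tree is a spanning tree too, but its total weight is larger, so the two problems really do call for different priorities.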


21.6 An Optimization for Priority-First Search<br>

When thinking about how to improve an algorithm, an easy first place to
look is for wasted work. In this case, we can see that many edges added to
the priority queue are later removed without being used, because they lead
to a vertex that has already been visited (by a shorter path).<br>
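The amount of wasted work can be measured directly. The sketch below is an illustration, not the ds2 code: it uses `heapq` from the standard library in place of the PriorityQueue class, stores the weights in a nested dictionary, and counts the queue entries that are discarded. The name `dijkstra_count` is hypothetical.

```python
import heapq

def dijkstra_count(wt, source):
    """Edge-based Dijkstra on wt[u][v] weights; also counts discarded entries."""
    dist = {}
    wasted = 0
    heap = [(0, source)]  # entries are (priority, vertex); parents omitted
    while heap:
        d, b = heapq.heappop(heap)
        if b in dist:
            wasted += 1   # b was already reached by a path at least as short
            continue
        dist[b] = d
        for n, w in wt[b].items():
            heapq.heappush(heap, (d + w, n))
    return dist, wasted

# The undirected five-vertex example graph from the previous section.
edges = [(1, 2, 1), (2, 3, 1), (1, 3, 2), (3, 4, 1), (3, 5, 3), (4, 5, 2)]
wt = {v: {} for v in range(1, 6)}
for u, v, w in edges:
    wt[u][v] = w
    wt[v][u] = w

dist, wasted = dijkstra_count(wt, 1)
print(dist)    # {1: 0, 2: 1, 3: 2, 4: 3, 5: 5}
print(wasted)  # 8
```

Of the 13 entries pushed onto the queue, only 5 set a distance. The other 8 are removed and thrown away.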

The idea is to store vertices rather than edges in the priority queue.
Then, we’ll use the changepriority method to update an entry when we
find a new shorter path to a given vertex. Although we won’t know the
distances at first, we’ll store the shortest distance we’ve seen so far. If we
find a shortcut to a given vertex, we will reduce its priority and update the
priority queue. Updating after finding a shortcut is called edge relaxation.<br>

It works as follows. The distances to the source are stored in a dictionary D
that maps vertices to the distance, based on what we’ve searched so far. If
we find that D[n] > D[u] + G.wt(u,n), then it would be a shorter path to
n if we just took the shortest path from the source to u and appended the
edge (u,n). In that case, we set D[n] = D[u] + G.wt(u,n) and update
the priority queue. Note that we had this algorithm in mind when we added
changepriority to our Priority Queue ADT.

In [17]:
def dijkstra2(G, v):
    tree = {v: None}
    D = {u: float('inf') for u in G.vertices()}
    D[v] = 0
    tovisit = PriorityQueue(entries = [(u, D[u]) for u in G.vertices()])
    for u in tovisit:
        for n in G.nbrs(u):
            if D[u] + G.wt(u,n) < D[n]:
                D[n] = D[u] + G.wt(u,n)
                tree[n] = u
                tovisit.changepriority(n, D[n])
    return tree, D

from ds2.graph import Digraph

V = {1,2,3,4,5}
E = {(1,2,1),
     (2,3,2),
     (1,3,2),
     (3,4,2),
     (2,5,2)
    }
G = Digraph(V, E)
tree, D = dijkstra2(G, 1)
print(tree, D)

{1: None, 2: 1, 3: 1, 5: 2, 4: 3} {1: 0, 2: 1, 3: 2, 4: 4, 5: 3}


Visualize Prim's algorithm [here](https://visualgo.net/en/mst)<br>
and Dijkstra's algorithm [here](https://visualgo.net/en/sssp).