In [54]:
# setup
from IPython.display import display,HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Graph Search

### Last Time:

We reviewed basic concepts related to graphs.

### This Time:

We will study Breadth-First Search and Depth-First Search, which are standard algorithms to examine every node in a connected graph by moving along the edges. We will assume that the graphs are connected. Otherwise, we can apply our search methods on each connected component separately. These methods are also useful because they identify the connected components of the graph.
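
A search started at a source reaches exactly the connected component containing that source, so repeating the search from every node not yet reached enumerates the components. The following is a minimal sketch, assuming an undirected graph stored as a dict that maps each vertex to a set of neighbors. The names `reachable`, `connected_components`, and the `example` graph are illustrative and not part of the course code.

```python
def reachable(graph, source):
    """Return the set of nodes reachable from source (any search order works)."""
    visited = set()
    frontier = [source]
    while frontier:
        node = frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        frontier.extend(n for n in graph[node] if n not in visited)
    return visited

def connected_components(graph):
    """Run one search per component; each vertex ends up in exactly one set."""
    seen = set()
    components = []
    for v in graph:
        if v not in seen:
            component = reachable(graph, v)
            components.append(component)
            seen |= component
    return components

# hypothetical graph with two components: {a, b, c} and {d, e}
example = {
    'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'},
    'd': {'e'}, 'e': {'d'},
}
components = connected_components(example)
print(components)
```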

### Breadth-First Search

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif" width=25%/>
</center>

**Input:**
- A connected graph $G$
- source vertex $s$



**Process:**
1. Visit neighbors of $s$.
2. Visit neighbors of neighbors of $s$.
3. Visit neighbors of neighbors of neighbors of $s$ ... (and so forth)

while ensuring that each node is visited only once.



<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif" width=25%/>
</center>

- Nodes visited at step $i$ have graph distance of $i$ from $s$
- BFS proceeds one step at a time, until there are no new neighbors to visit.


## Building BFS

<center>
<img src="figures/bfs_1.png" width=30%/>
</center>

What variables will we need to keep track of?



- `visited` $X$: the nodes already visited, so we don't visit them more than once.
- `frontier` $F$: the nodes to visit next.


<center>
<img src="figures/bfs_1.png" width=30%/>
</center>

At step $i$ of BFS:

- `visited` (written $X_i$) contains all nodes with distance less than $i$ from $s$.
- `frontier` (written $F_i$) contains all nodes with distance exactly $i$ from $s$.
  - $F_i$ is the set of unvisited neighbors of $X_i$.
 

 
e.g., for $i=1$:

- $X_1 = \{a\}$
- $F_1 = \{b,c\}$


<center>
<img src="figures/bfs_1.png" width=30%/>
</center>

How do we update `visited` and `frontier` at each iteration?

- To update `visited`, we add any new values encountered in the frontier:
  - $X_{i+1} = X_i \cup F_i$.


- To update `frontier`, we take the neighborhood of $F_i$ and remove any vertices that have already been visited:
  - $F_{i+1} = N(F_i) \setminus X_{i+1}$.
  - Here, $N(F_i)$ is the neighbors of the nodes in $F_i$.
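
The two update rules can be tried out in isolation as a single step. This is a sketch under the assumption that the graph is a dict of neighbor sets; `bfs_step` and the small `tree` are hypothetical names used only for illustration.

```python
from functools import reduce

def bfs_step(graph, visited, frontier):
    """Apply one round of the update rules and return (X_{i+1}, F_{i+1})."""
    # X_{i+1} = X_i union F_i
    visited_next = visited | frontier
    # N(F_i): all neighbors of the nodes in the frontier
    neighbors = reduce(set.union, (graph[f] for f in frontier), set())
    # F_{i+1} = N(F_i) \ X_{i+1}
    frontier_next = neighbors - visited_next
    return visited_next, frontier_next

# hypothetical tree: A has children B and C, B has child D, C has child E
tree = {
    'A': {'B', 'C'},
    'B': {'A', 'D'},
    'C': {'A', 'E'},
    'D': {'B'},
    'E': {'C'},
}
X1, F1 = {'A'}, {'B', 'C'}
X2, F2 = bfs_step(tree, X1, F1)
print(X2, F2)
```

Here $N(F_1) = \{A, D, E\}$, and removing $X_2 = \{A, B, C\}$ leaves $F_2 = \{D, E\}$.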



<center>
<img src="figures/bfs_1.png" width=30%/>
</center>

e.g. for $i=1$:

$X_1 = \{a\}$

$F_1 = \{b,c\}$



update $X$ and $F$:



- $X_2 = \{a\} \cup \{b,c\} = \{a,b,c\}$
    
- $F_2 = \{a, d, e, f, g\} \setminus \{a,b,c\} = \{d,e,f,g\}$
    


In [38]:
from functools import reduce

def bfs_recursive(graph, source):
    # Expects graph to be an adjacency list (dict of sets) and source to be a vertex.
    # Returns all nodes found by BFS (the connected component containing source).
    def bfs_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            # update visited
            # X_{i+1} = X_i Union F_i
            visited_new = visited | frontier
            print('visiting', (visited_new - visited))
            visited = visited_new

            # update frontier
            # F_{i+1} = N(F_i) \ X_{i+1}
            frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
            frontier = frontier_neighbors - visited
            return bfs_helper(visited, frontier)

    visited = set()
    frontier = set([source])        
    return bfs_helper(visited, frontier)


<center>
<img src="figures/dfs-graph.jpeg" width="40%"/>
</center>

In [55]:

# same as example graph above
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C'},
            'H': {'E'}
        }

In [40]:

bfs_recursive(graph, 'A')

visiting {'A'}
visiting {'C', 'B'}
visiting {'F', 'G', 'E', 'D'}
visiting {'H'}


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}

## Work of BFS

- We implemented BFS as a recursive function, but there is no obvious recursive equation to calculate the work.
- Instead we add up costs of each step.
- The number of steps is at most the diameter of the graph plus one (one step for each distance from $s$).
- But, the work at each step varies depending on how many nodes are in the frontier of that step, $|F_i|$.



What we do know:

- Every reachable node appears in the frontier exactly **once**.
- Likewise, each edge is processed a constant number of times: at most **twice**, once from each endpoint.


How much work is done for each node/edge?



- `visited_new = visited | frontier`
  - Each node is added to the visited set at most once.


- `frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])`
  - Each edge contributes to processing a node in `frontier_neighbors` at most twice (a->b, b->a).
  - All edges are examined at least once, since every node is in the frontier at some point.


- `frontier = frontier_neighbors - visited`
  - Each node is removed from the `frontier` at most once.
  - Each node $v$ is removed from `frontier_neighbors` at most $\Delta(v)$ times.
  - Removing a node from a list of length $n$ takes $O(n)$ work in Python.
  - Checking if a node is in the list `visited` takes $O(|visited|)$ work in Python.
  - By implementing `visited` as a set (hash), checking if a node has been visited can be done in $O(1)$ work.
  - We will see that we can use doubly linked lists so that removing the first node takes constant time.
  - So, if implemented correctly, each node $v$ contributes $O(1)+O(\Delta(v))$ work.
  - Summing over all nodes gives $O(|V|+|E|)$ work, since $\sum_v \Delta(v) = 2|E|$.




- Therefore the work of BFS is $O(|V| + |E|)$, assuming the graph is connected and we implement frontier as a doubly linked list.
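
One way to check this accounting is to instrument a level-by-level BFS and count how often nodes and neighbor entries are touched. This is only a sketch: `bfs_counts` and the 4-cycle `square` are hypothetical, and the counters measure set operations rather than actual running time.

```python
def bfs_counts(graph, source):
    """Level-by-level BFS that counts frontier entries and neighbor examinations."""
    visited, frontier = set(), {source}
    node_visits = 0   # how many times a node sits in a frontier
    edge_checks = 0   # how many neighbor entries are examined
    while frontier:
        visited |= frontier
        node_visits += len(frontier)
        neighbors = set()
        for f in frontier:
            edge_checks += len(graph[f])   # Delta(f) examinations for node f
            neighbors |= graph[f]
        frontier = neighbors - visited
    return node_visits, edge_checks

# hypothetical 4-cycle: |V| = 4, |E| = 4
square = {
    'a': {'b', 'd'},
    'b': {'a', 'c'},
    'c': {'b', 'd'},
    'd': {'a', 'c'},
}
node_visits, edge_checks = bfs_counts(square, 'a')
print(node_visits, edge_checks)  # 4 8
```

Each of the 4 nodes appears in a frontier once, and each of the 4 edges is examined twice, once from each endpoint.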



## Parallelism in BFS

There is some limited parallelism possible in BFS. While we must perform the steps sequentially, each step itself can be parallelized.

We can represent a set as a balanced binary search tree, which supports union, intersection, and difference with $O(\log n)$ span, where $n$ is the size of the set. (See Vol II Ch 17 of the textbook Parallel and Sequential Algorithms for more details.)

- `visited_new = visited | frontier`
  - $O(\log n)$ span, where $n = \max(|visited|,|frontier|)$.

- `frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])`
  - $O(\log^2 n)$ span
    - reduce has $O(\log n)$ levels, and the union call at each level has $O(\log n)$ span.

- `frontier = frontier_neighbors - visited`
  - $O(\log n)$ span
  
If the diameter of the graph is $d$, then the span is $O(d \log^2 |V|)$.

What shape graph results in the worst span?


![figures/chain.png](figures/chain.png)


Span $O(|V| \log^2 |V|)$.

Note that this span is actually worse than the work of a sequential search on the chain, which is $O(|E|+|V|)=O(|V|)$ since $|E| = |V| - 1$.
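
To see how the number of sequential rounds depends on the shape of the graph, the sketch below counts frontier rounds on a chain and on a star. The helpers `bfs_rounds`, `chain`, and `star` are hypothetical names introduced for this illustration.

```python
def bfs_rounds(graph, source):
    """Return the number of frontier rounds a level-by-level BFS performs."""
    visited, frontier, rounds = set(), {source}, 0
    while frontier:
        visited |= frontier
        frontier = set().union(*(graph[f] for f in frontier)) - visited
        rounds += 1
    return rounds

def chain(n):
    """Hypothetical helper: path graph 0 - 1 - ... - (n-1)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def star(n):
    """Hypothetical helper: node 0 connected to nodes 1 .. n-1."""
    g = {i: {0} for i in range(1, n)}
    g[0] = set(range(1, n))
    return g

print(bfs_rounds(chain(8), 0))  # 8: one round per node
print(bfs_rounds(star(8), 0))   # 2: the center, then all leaves at once
```

Both graphs have 8 nodes and 7 edges, but the chain forces one round per node, while the star finishes in two rounds.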

### Serial BFS

**Alternatively**: represent frontier with a queue and remove one node at a time.

"first in first out"

- Add newly discovered nodes to the end of the queue.
- At each iteration, remove the first node in the queue.
- Queues cannot be parallelized, since parallel access to the queue would disrupt the notions of "first in" and "first out".

In [41]:
# deque is a double ended queue
# a doubly linked list 
from collections import deque
q = deque()
q.append(1)
q.append(2)
q.append(3)
print(q)
print('popleft returns: %d' %  q.popleft())
print(q)
print('pop returns: %d' %  q.pop())
print(q)

deque([1, 2, 3])
popleft returns: 1
deque([2, 3])
pop returns: 3
deque([2])


In [42]:
# compare with:
a = [1,2,3]
print(a)
print('pop returns', a.pop(0))

[1, 2, 3]
pop returns 1


**What is the running time to remove the first item of a dynamic array of $n$ items (a list in Python)?**

$O(n)$: Need to shift all remaining elements to the left.

**What is the running time to remove the first item of a doubly linked list of $n$ items (a deque in Python)?**

$O(1)$.

See more:
https://wiki.python.org/moin/TimeComplexity
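
The difference can be observed with a rough timing experiment. This is a sketch only: `drain_list` and `drain_deque` are hypothetical helpers, the size `n` is arbitrary, and the measured times will vary by machine.

```python
from collections import deque
from timeit import timeit

def drain_list(n):
    """Remove items from the front of a list; each pop(0) shifts the rest."""
    items = list(range(n))
    out = []
    while items:
        out.append(items.pop(0))
    return out

def drain_deque(n):
    """Remove items from the front of a deque; each popleft() takes constant time."""
    items = deque(range(n))
    out = []
    while items:
        out.append(items.popleft())
    return out

n = 20_000  # arbitrary size chosen for illustration
print('list :', timeit(lambda: drain_list(n), number=1))
print('deque:', timeit(lambda: drain_deque(n), number=1))
```

Both functions return the items in the same first-in-first-out order; only the cost of each removal differs.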



In [43]:
def bfs_serial(graph, source):
    # Has the same behavior as bfs_recursive.
    def bfs_serial_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.popleft()
            print('visiting', node)
            visited.add(node)
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return bfs_serial_helper(visited, frontier)

    frontier = deque()
    frontier.append(source)
    visited = set()
    return bfs_serial_helper(visited, frontier)

bfs_serial(graph, 'A')

visiting A
visiting C
visiting B
visiting F
visiting G
visiting E
visiting D
visiting H


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}
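
Two caveats apply to `bfs_serial` on larger or cyclic graphs: it makes one recursive call per node, so a big graph can exceed Python's recursion limit, and a neighbor that is already waiting in the queue is still "not in visited", so it can be enqueued and visited a second time. A possible variant, sketched here under the hypothetical name `bfs_iterative`, uses a `while` loop and marks nodes when they are enqueued.

```python
from collections import deque

def bfs_iterative(graph, source):
    """Queue-based BFS; returns nodes in the order they are visited."""
    discovered = {source}      # nodes that have ever been enqueued
    frontier = deque([source])
    order = []
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for n in sorted(graph[node]):   # sorted only to make the order repeatable
            if n not in discovered:
                discovered.add(n)       # mark on enqueue to avoid duplicates
                frontier.append(n)
    return order

# hypothetical triangle: every pair of nodes is adjacent
triangle = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
print(bfs_iterative(triangle, 'a'))  # ['a', 'b', 'c']
```

On the triangle, `c` is a neighbor of both `a` and `b`, yet it enters the queue only once.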

### Serial BFS Work/Span

Work and span are both $O(|V| + |E|)$: each vertex is visited once, each edge is examined a constant number of times, and nothing runs in parallel.


### How can we keep track of the distance each node is from the source?

In [44]:
def bfs_recursive_depths(graph, source):
    # Expects graph to be an adjacency list and source to be a vertex.
    # Returns the distance of each vertex from source.
    def bfs_helper_depths(visited, frontier, cur_depth, depths):
        if len(frontier) == 0:
            return depths
        else:
            # update visited
            # X_{i+1} = X_i Union F_i
            visited_new = visited | frontier
            print('visiting', (visited_new - visited))
            # record the depths of visited nodes
            for v in visited_new - visited:
                depths[v] = cur_depth
            visited = visited_new        
            # update frontier
            # F_{i+1} = N(F_i) \ X_{i+1}
            frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
            frontier = frontier_neighbors - visited
            return bfs_helper_depths(visited, frontier, cur_depth+1, depths)    

    depths = dict()
    visited = set()
    frontier = set([source])
    return bfs_helper_depths(visited, frontier, 0, depths)
     
bfs_recursive_depths(graph, 'A')

visiting {'A'}
visiting {'C', 'B'}
visiting {'F', 'G', 'E', 'D'}
visiting {'H'}


{'A': 0, 'C': 1, 'B': 1, 'F': 2, 'G': 2, 'E': 2, 'D': 2, 'H': 3}
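
The same distances can also be recorded in the queue-based version. A minimal sketch, assuming the dict-of-sets representation used above; `bfs_distances` and `example_graph` are illustrative names, and the distance dict doubles as the record of discovered nodes.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return a dict mapping each reachable node to its distance from source."""
    distance = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for n in graph[node]:
            if n not in distance:   # first discovery gives the shortest distance
                distance[n] = distance[node] + 1
                frontier.append(n)
    return distance

# same shape as the example graph used in this section
example_graph = {
    'A': {'B', 'C'}, 'B': {'A', 'D', 'E'}, 'C': {'A', 'F', 'G'},
    'D': {'B'}, 'E': {'B', 'H'}, 'F': {'C'}, 'G': {'C'}, 'H': {'E'},
}
print(bfs_distances(example_graph, 'A'))
```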



## Depth-First Search

Agenda:

- depth-first search
- comparison with breadth-first search
- cycle detection

<center>
<table border=0>
    <tr style="background-color: #ffffff;"><td><h2>DFS</h2></td><td><h2>BFS</h2></td></tr>
    <tr style="background-color: #ffffff;">
    <td><img src="https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif" width=50%/></td>
    <td><img src="https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif" width=100%/></td>
    </tr>
</table>
</center>

[source](https://commons.wikimedia.org/w/index.php?curid=6342841)


While BFS uses a queue, we can implement DFS with a stack: **last in first out**


In [45]:
from collections import deque

def dfs_stack(graph, source):
    def dfs_stack_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.pop()
            print('visiting', node)
            visited.add(node)
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return dfs_stack_helper(visited, frontier)
        
    frontier = deque()
    frontier.append(source)
    visited = set()
    return dfs_stack_helper(visited, frontier)
  


<center>
<img src="figures/dfs-graph.jpeg" width=40%/>
</center>
  


In [46]:
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C'},
            'H': {'E'}
        }



<center>
<img src="figures/dfs-graph.jpeg" width=40%/>
</center>

In [ ]:
dfs_stack(graph, 'A')

### Compare with `bfs_serial`!


In [47]:

def bfs_serial(graph, source):
    def bfs_serial_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.popleft() # <==== DIFFERENCE
            print('visiting', node)
            visited.add(node)
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return bfs_serial_helper(visited, frontier)
    frontier = deque()
    frontier.append(source)
    visited = set()
    return bfs_serial_helper(visited, frontier)

def dfs_stack(graph, source):
    def dfs_stack_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.pop() # <======== DIFFERENCE
            print('visiting', node)
            visited.add(node)
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return dfs_stack_helper(visited, frontier)   
    frontier = deque()
    frontier.append(source)
    visited = set()
    return dfs_stack_helper(visited, frontier)



`dfs_stack`:

- `node = frontier.pop()`


`bfs_serial`:

- `node = frontier.popleft()`
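
Because the two searches differ only in which end of the deque is popped, they can be folded into one function. The sketch below is an illustration, not the course implementation: `graph_search`, its `depth_first` flag, and `small_tree` are hypothetical, and neighbors are sorted only so that the visiting order is repeatable.

```python
from collections import deque

def graph_search(graph, source, depth_first):
    """Visit every reachable node; the pop end decides BFS vs DFS order."""
    visited = set()
    order = []
    frontier = deque([source])
    while frontier:
        # stack behavior (pop) gives DFS, queue behavior (popleft) gives BFS
        node = frontier.pop() if depth_first else frontier.popleft()
        if node in visited:      # skip entries that were added more than once
            continue
        visited.add(node)
        order.append(node)
        frontier.extend(n for n in sorted(graph[node]) if n not in visited)
    return order

# hypothetical tree: a has children b and c; b has child d
small_tree = {'a': {'b', 'c'}, 'b': {'a', 'd'}, 'c': {'a'}, 'd': {'b'}}
print(graph_search(small_tree, 'a', depth_first=False))  # ['a', 'b', 'c', 'd']
print(graph_search(small_tree, 'a', depth_first=True))   # ['a', 'c', 'b', 'd']
```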



### DFS with recursion


But wait, can't we just use recursion?

Recursion maintains a stack of calls automatically.

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif" width=25%/>
</center>



In [48]:

def dfs_recursive(graph, source):
    
    def dfs_recursive_helper(visited, node):
        if node in visited:
            return visited
        else:
            print('visiting', node)
            visited.add(node)
            iterate(dfs_recursive_helper, visited, list(graph[node]))
            return visited

    visited = set()
    return dfs_recursive_helper(visited, source)

def iterate(f, x, a):
    if len(a) == 0:
        return x
    else:
        return iterate(f, f(x, a[0]), a[1:])



<center>
<img src="figures/dfs-graph.jpeg" width=40%/>
</center>


In [49]:
dfs_recursive(graph, 'A')


visiting A
visiting C
visiting F
visiting G
visiting B
visiting E
visiting H
visiting D


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}
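
The visiting order above depends on the iteration order of Python sets, which is not fixed. For comparison, here is a sketch of the same recursive idea written with a plain `for` loop in place of `iterate`, with neighbors sorted so the order is repeatable. The names `dfs_loop`, `visit`, and `example_graph` are hypothetical.

```python
def dfs_loop(graph, source):
    """Recursive DFS; returns nodes in the order they are first visited."""
    visited = set()
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        order.append(node)
        for n in sorted(graph[node]):   # sorted only to make the order repeatable
            visit(n)

    visit(source)
    return order

# same shape as the example graph used in this section
example_graph = {
    'A': {'B', 'C'}, 'B': {'A', 'D', 'E'}, 'C': {'A', 'F', 'G'},
    'D': {'B'}, 'E': {'B', 'H'}, 'F': {'C'}, 'G': {'C'}, 'H': {'E'},
}
print(dfs_loop(example_graph, 'A'))  # ['A', 'B', 'D', 'E', 'H', 'C', 'F', 'G']
```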


## Cost of DFS

As in BFS, we add a node to the visited set exactly once ($\Theta(|V|)$).

For each edge, we do a constant number of lookups (one from each endpoint) to check whether the neighbor is already in the visited set ($\Theta(|E|)$).

Thus, the total work is equivalent to BFS: $\Theta(|V| + |E|)$.
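
This count can be checked by instrumenting a recursive DFS. The sketch below assumes a connected undirected graph stored symmetrically; `dfs_lookups` and `example_graph` are hypothetical names. The membership test runs once for the source and once per edge direction, so it is evaluated $2|E| + 1$ times.

```python
def dfs_lookups(graph, source):
    """Recursive DFS that counts how many times `node in visited` is evaluated."""
    visited = set()
    lookups = 0

    def visit(node):
        nonlocal lookups
        lookups += 1               # one membership test per call
        if node in visited:
            return
        visited.add(node)
        for n in graph[node]:
            visit(n)

    visit(source)
    return len(visited), lookups

# same shape as the example graph used in this section: |V| = 8, |E| = 7
example_graph = {
    'A': {'B', 'C'}, 'B': {'A', 'D', 'E'}, 'C': {'A', 'F', 'G'},
    'D': {'B'}, 'E': {'B', 'H'}, 'F': {'C'}, 'G': {'C'}, 'H': {'E'},
}
num_visited, lookups = dfs_lookups(example_graph, 'A')
num_edges = sum(len(nbrs) for nbrs in example_graph.values()) // 2
print(num_visited, num_edges, lookups)  # 8 7 15
```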


## Parallelism in DFS?
<center>
<img src="figures/dfs_nop.jpg" width="30%"/>
</center>

Is there any opportunity for parallelism?

One idea is to just run the search for each child in parallel. 
- E.g., in this example, search the subtrees rooted at $a$ and $b$ in parallel.

What potential problems arise?



<center>
<img src="figures/dfs_nop.jpg" width="30%"/>
</center>

- We may end up visiting $b$ twice (or $c$, or $f$)
- This isn't in DFS order! We shouldn't be visiting $b$ before $e$.

DFS (more precisely, computing the DFS visit order for a fixed ordering of neighbors) belongs to a class of problems called **P**-complete: computations that most likely do not admit solutions with **polylogarithmic** span.

## Cycle detection

How can we modify DFS to determine if the graph has a cycle?



**cycle**: a path in which all nodes are distinct except the first and last
- In an undirected graph, a cycle must contain at least three nodes.


**idea**: determine whether a vertex is visited more than once.

<center>
<img src="figures/triangle.png" width="50%"/>
</center>

Determine whether a vertex is reached more than once, but the repeat visit must not simply go back along the edge we just arrived on (from a node back to its parent).


In [50]:

def dfs_recursive_helper(visited, node):
    if node in visited:  # base case: node was already visited
        return visited
    else:
        print('visiting', node)
        visited.add(node)
        iterate(dfs_recursive_helper, visited, list(graph[node]))
        return visited



e.g., if $a$ is the source, then we will see $b$ twice:
- once when it is added to `visited`
- once in the base case of the recursive call (`if node in visited`), with `c` as the parent.

<center>
<img src="figures/triangle.png" width="50%"/>
</center>

We will see $a$ three times:
- once when it is added to `visited`
- twice in the base case of the recursive call (`if node in visited`)
  - with `b` as the parent
  - with `c` as the parent
  
So, we need to keep track of the parent of each recursive call and make sure not to make a recursive call back to the parent.
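
The parent-tracking idea can be sketched compactly with a plain loop. This is an illustration under stated assumptions: the graph is a simple undirected graph (no self-loops or parallel edges) stored symmetrically, and `has_cycle`, `triangle`, and `path` are hypothetical names.

```python
def has_cycle(graph, source):
    """Return True if the component containing source has a cycle.

    Assumes a simple undirected graph stored symmetrically:
    v in graph[u] whenever u in graph[v].
    """
    visited = set()

    def visit(node, parent):
        visited.add(node)
        for n in graph[node]:
            if n == parent:
                continue            # the edge we arrived on is not a cycle
            if n in visited or visit(n, node):
                return True         # reached a visited node by another route
        return False

    return visit(source, None)

# hypothetical examples: a triangle has a cycle, a path does not
triangle = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
path = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}
print(has_cycle(triangle, 'a'), has_cycle(path, 'a'))  # True False
```

Unlike a full traversal, this version stops as soon as the first cycle is found.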


In [51]:

def dfs_cycle(graph, source):
    visited = set()

    def dfs_cycle_helper(result, node, parent):
        """
        We pack (visited, has_cycle) variables into a single result variable,
        so we can use iterate.
        """
        visited, has_cycle = result
        if node in visited:
            print('found cycle from %s to %s' % (parent, node))
            return (visited, True)
        else:
            print('visiting', node)
            visited.add(node)
            # ignore the parent!
            neighbors = list(filter(lambda n: n != parent, graph[node]))
            # curry the dfs_cycle_helper function to set the parent variable 
            # to be the node we are visiting now.                         
            fn = lambda r, n: dfs_cycle_helper(r, n, node)
            res = iterate(fn, (visited, has_cycle), neighbors)
            return res
    
    return dfs_cycle_helper((visited, False), source, source)
    


<center>
<img src="figures/dfs-graph.jpeg" width="40%"/>
</center>

In [52]:

dfs_cycle(graph, 'A')
graph2 = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C', 'A'},  # add cycle back to A from G
            'H': {'E'}
        }


visiting A
visiting C
visiting F
visiting G
visiting B
visiting E
visiting H
visiting D


<center>
<img src="figures/dfs-graph-cycle.jpeg" width="40%"/>
</center>

In [53]:
dfs_cycle(graph2, 'A')

visiting A
visiting C
visiting F
visiting G
found cycle from G to A
visiting B
visiting E
visiting H
visiting D


({'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}, True)