In [10]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Review and Graph Search


Today's Agenda

- Review of Graph and Graph Search
- BFS Analysis
- DFS 

## Graph Review

#### Graph $G$: Node $V$, Edge $E$, Neighbor $\mathcal{N}(e)$

<center>
<img src="figures/graph_ex.png" width=70%/>
</center>


####  Graph Representation 
  - Adjacency matrix
  - Edge Sets
  - Map of Neighbors
```python
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D'},
            'C': {'A', 'F'},
            'D': {'B', 'E'},
            'E': {'D'},
            'F': {'C'}
        }
```
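
The three representations carry the same information. As a minimal sketch (the helper names `to_edge_set` and `to_adjacency_matrix` are illustrative, not part of the course code), the map of neighbors above can be converted into the other two:

```python
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D'},
    'C': {'A', 'F'},
    'D': {'B', 'E'},
    'E': {'D'},
    'F': {'C'},
}

def to_edge_set(graph):
    # store each undirected edge once, as a frozenset of its two endpoints
    return {frozenset((u, v)) for u in graph for v in graph[u]}

def to_adjacency_matrix(graph):
    # fix an ordering so that every node gets a row/column index
    nodes = sorted(graph)
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u in graph:
        for v in graph[u]:
            matrix[index[u]][index[v]] = 1
    return nodes, matrix

edges = to_edge_set(graph)
nodes, matrix = to_adjacency_matrix(graph)
print(len(edges), 'edges')
for name, row in zip(nodes, matrix):
    print(name, row)
```

The matrix uses $|V|^2$ entries regardless of how many edges exist, while the map of neighbors stores one entry per node plus two per undirected edge.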

### Graph Search

- Is node $t$ reachable from node $s$ [Lab 8, Task 1]
- Is the graph *connected*? Components? [Lab 8, Task 2 and 3]
- Shortest path from $s$ to $t$ [BFS for Unweighted Graph]

<br>

##### Three Sets Considered
- **visited**: the set of vertices already visited
- **frontier**: the unvisited neighbors of the visited vertices
- **unseen**: everything else
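
To make the three sets concrete, here is a small sketch that computes the frontier and the unseen set from a given visited set. The function name `classify` is an assumption made for illustration, and the graph is the map-of-neighbors example from the Graph Representation slide.

```python
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D'},
    'C': {'A', 'F'},
    'D': {'B', 'E'},
    'E': {'D'},
    'F': {'C'},
}

def classify(graph, visited):
    # frontier: neighbors of visited vertices that are not visited themselves
    frontier = set()
    for v in visited:
        frontier |= graph[v]
    frontier -= visited
    # unseen: everything that is neither visited nor in the frontier
    unseen = set(graph) - visited - frontier
    return frontier, unseen

frontier, unseen = classify(graph, {'A', 'B'})
print('frontier:', frontier)
print('unseen:', unseen)
```

With `A` and `B` visited, the frontier is `{C, D}` and the unseen set is `{E, F}`. The three sets always partition the nodes of the graph.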


<center>
<img src="figures/graph_search_ex.png" width=50%/>
</center>

##### Three Algorithms
- Breadth-first Search
- Depth-first Search
- Priority-first Search [Weighted Graph]

In [2]:
from functools import reduce


# same as example graph above
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C'},
            'H': {'E'}
        }


def bfs_recursive(graph, source):
    
    def bfs_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            # update visited
            # X_{i+1} = X_i OR F_i
            visited_new = visited | frontier
            print('visiting', (visited_new - visited))
            visited = visited_new

            # update frontier
            # F_{i+1} = N(F_i) \ X_{i+1}
            frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
            frontier = frontier_neighbors - visited
            return bfs_helper(visited, frontier)

    ## Start Here
    ## Initialize visited and frontier
    visited = set()
    frontier = set([source])        
    return bfs_helper(visited, frontier)
    

bfs_recursive(graph, 'A')

visiting {'A'}
visiting {'C', 'B'}
visiting {'E', 'F', 'D', 'G'}
visiting {'H'}


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}

## Work of BFS

- We will simply add up costs of each level.
- Work done at each level varies depending on how many nodes it contains.

What we do know:

- Every reachable node appears in the frontier exactly **once**
- Likewise, each edge is processed exactly **once** per direction (an undirected edge stored as a->b and b->a is processed twice)

How much work is done for each node/edge?

- `visited_new = visited | frontier`
  - each node is added to the visited set at most once
- `frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])`
  - each edge is added to `frontier_neighbors` at most twice (a->b, b->a)
- `frontier = frontier_neighbors - visited`
  - each node is removed from the frontier at most once.


- Therefore work is $O(|V| + |E|)$
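
The counting argument above can be checked empirically. The sketch below is a loop-based variant of `bfs_recursive` with two counters added; the name `bfs_counts` and the counters are assumptions made for this illustration.

```python
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}

def bfs_counts(graph, source):
    visited = set()
    frontier = {source}
    nodes_added = 0    # number of insertions into the visited set
    edges_scanned = 0  # number of neighbor entries read
    while frontier:
        nodes_added += len(frontier)
        visited |= frontier
        neighbors = set()
        for f in frontier:
            edges_scanned += len(graph[f])
            neighbors |= graph[f]
        frontier = neighbors - visited
    return nodes_added, edges_scanned

nodes_added, edges_scanned = bfs_counts(graph, 'A')
print(nodes_added, edges_scanned)
```

On this graph ($|V| = 8$, $|E| = 7$) the counters come out to 8 and 14: one insertion per node and two neighbor reads per undirected edge.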



## Parallelism in BFS

There is some limited parallelism possible in BFS. While we must visit each level sequentially, at each level we can parallelize the set operations:

`visited_new = visited | frontier`

`frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])`

`frontier = frontier_neighbors - visited`

We can represent a set as a balanced binary search tree, which supports union, intersection, and difference with $O(\lg n)$ span. (See [Vol II Ch 17](https://www.diderot.one/courses/43/books/185/part/334/chapter/2689) for more details.)

So, the first and third lines have $O(\lg n)$ span, but the second has $O(\lg^2 n)$ span.
  - reduce has $O(\lg n)$ levels, and the union call at each level has $O(\lg n)$ span, so the two factors multiply
  
If the distance from the source to the most distant node is $d$, then the span is $O(d \lg^2 n)$
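
How much this helps depends entirely on $d$. The sketch below counts the number of BFS rounds (which is $d + 1$, one per non-empty frontier) on two extreme graphs. The helpers `bfs_levels`, `path_graph`, and `star_graph` are hypothetical names introduced only for this example.

```python
def bfs_levels(graph, source):
    # count the rounds of level-by-level BFS (one round per non-empty frontier)
    visited = set()
    frontier = {source}
    levels = 0
    while frontier:
        levels += 1
        visited |= frontier
        frontier = set().union(*(graph[f] for f in frontier)) - visited
    return levels

def path_graph(n):
    # 0 - 1 - 2 - ... - (n-1)
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def star_graph(n):
    # node 0 in the center, connected to nodes 1..n-1
    g = {0: set(range(1, n))}
    for i in range(1, n):
        g[i] = {0}
    return g

path_rounds = bfs_levels(path_graph(100), 0)
star_rounds = bfs_levels(star_graph(100), 0)
print(path_rounds, star_rounds)
```

On the path, every round contains a single node, so there are 100 sequential rounds and nothing to parallelize. On the star, all 99 leaves are handled in the second round.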

Question: When will BFS stop searching?


### Serial BFS

**Alternatively**: represent frontier with a queue, and remove one node at a time.

"first in first out"

- add newly discovered nodes to the end of the list
- at each iteration, remove the first node in the list


In [2]:
# deque is a double ended queue
# a doubly linked list 
from collections import deque
q = deque()
q.append(1)
q.append(2)
q.append(3)
print(q)
print('popleft returns: %d' %  q.popleft())
print(q)
print('pop returns: %d' %  q.pop())
print(q)

deque([1, 2, 3])
popleft returns: 1
deque([2, 3])
pop returns: 3
deque([2])


In [4]:
def bfs_serial(graph, source):
    def bfs_serial_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.popleft()
            # a node can be enqueued more than once when the graph has
            # cycles, so skip it if it was already visited
            if node in visited:
                return bfs_serial_helper(visited, frontier)
            print('visiting', node)
            visited.add(node)
            # enqueue the unvisited neighbors; equivalent to:
            #   for n in graph[node]:
            #       if n not in visited:
            #           frontier.append(n)
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return bfs_serial_helper(visited, frontier)

    frontier = deque()
    frontier.append(source)
    visited = set()
    return bfs_serial_helper(visited, frontier)

bfs_serial(graph, 'A')

visiting A
visiting B
visiting C
visiting D
visiting E
visiting F
visiting G
visiting H


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}

### Serial BFS Work/Span

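Work and span are both $O(|V| + |E|)$: each vertex is visited once and each edge is examined once from each endpoint. Because the nodes are processed one at a time, there is no parallelism, so the span equals the work.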
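
One practical caveat: the recursive helper makes one call per dequeued node, and Python limits recursion depth (see `sys.getrecursionlimit()`, commonly 1000), so large graphs need a loop. A minimal loop-based sketch follows; the name `bfs_iterative` and the choice to return the visit order as a list are assumptions made for this example.

```python
from collections import deque

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}

def bfs_iterative(graph, source):
    visited = set()
    order = []                  # nodes in the order they are visited
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node in visited:     # may have been enqueued more than once
            continue
        visited.add(node)
        order.append(node)
        frontier.extend(n for n in graph[node] if n not in visited)
    return order

order = bfs_iterative(graph, 'A')
print(order)
```

The work is unchanged: each node is visited once and each neighbor entry is read once.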


### How can we keep track of the distance each node is from the source?

- Shortest Path for Unweighted Graph


In [2]:
def bfs_recursive_depths(graph, source):

    def bfs_helper_depths(visited, frontier, cur_depth, depths):
        if len(frontier) == 0:
            return depths
        else:
            # update visited
            # X_{i+1} = X_i OR F_i
            visited_new = visited | frontier
            print('visiting', (visited_new - visited))
            # record the depths of visited nodes
            for v in visited_new - visited:
                depths[v] = cur_depth
            visited = visited_new        
            # update frontier
            # F_{i+1} = N(F_i) \ X_{i+1}
            frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
            frontier = frontier_neighbors - visited

            return bfs_helper_depths(visited, frontier, cur_depth+1, depths)    

    depths = dict()
    visited = set()
    frontier = set([source])
    return bfs_helper_depths(visited, frontier, 0, depths)
    

        
bfs_recursive_depths(graph, 'A')

visiting {'A'}
visiting {'C', 'B'}
visiting {'G', 'F', 'D', 'E'}
visiting {'H'}


{'A': 0, 'C': 1, 'B': 1, 'G': 2, 'F': 2, 'D': 2, 'E': 2, 'H': 3}
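
The depths give the length of a shortest path but not the path itself. One common way to recover it is to remember, for each node, the node it was discovered from. The sketch below uses a queue-based BFS for this; `bfs_shortest_path` and the `parent` dictionary are names assumed for illustration.

```python
from collections import deque

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}

def bfs_shortest_path(graph, source, target):
    parent = {source: None}     # also serves as the set of discovered nodes
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            break
        for n in graph[node]:
            if n not in parent:
                parent[n] = node
                frontier.append(n)
    if target not in parent:
        return None             # target is not reachable from source
    # walk the parent pointers back from target to source
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

path = bfs_shortest_path(graph, 'A', 'H')
print(path)
```

For `A` to `H` this yields `['A', 'B', 'E', 'H']`, whose three edges match the depth of 3 recorded for `H` above.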

<center>
<img src="figures/graph_search_ex.png" width=50%/>
</center>

While BFS uses a queue, we can implement DFS with a stack:

**last in first out**


In [2]:
from collections import deque

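def dfs_stack(graph, source):
    def dfs_stack_helper(visited, frontier):
        if len(frontier) == 0:
            return visited
        else:
            node = frontier.pop()
            # a node can be pushed more than once when the graph has
            # cycles, so skip it if it was already visited
            if node in visited:
                return dfs_stack_helper(visited, frontier)
            print('visiting', node)
            visited.add(node)
            # push the unvisited neighbors onto the stack
            frontier.extend(filter(lambda n: n not in visited, graph[node]))
            return dfs_stack_helper(visited, frontier)

    frontier = deque()
    frontier.append(source)
    visited = set()
    return dfs_stack_helper(visited, frontier)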
    
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C'},
            'H': {'E'}
        }

dfs_stack(graph, 'A')

visiting A
visiting C
visiting F
visiting G
visiting B
visiting E
visiting H
visiting D


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}

### Compare with `bfs_serial`!

`dfs_stack`:

- `node = frontier.pop()`


`bfs_serial`:

- `node = frontier.popleft()`
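
Since the two functions differ only in which end of the deque they remove from, both can be written as one search parameterized by that choice. This is a sketch under that observation; `graph_search` and its `take_next` parameter are hypothetical names, and neighbors are sorted only to make the order reproducible.

```python
from collections import deque

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}

def graph_search(graph, source, take_next):
    # take_next removes and returns one node from the frontier
    visited = set()
    order = []
    frontier = deque([source])
    while frontier:
        node = take_next(frontier)
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        frontier.extend(sorted(n for n in graph[node] if n not in visited))
    return order

bfs_order = graph_search(graph, 'A', deque.popleft)  # first in, first out
dfs_order = graph_search(graph, 'A', deque.pop)      # last in, first out
print(bfs_order)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print(dfs_order)  # ['A', 'C', 'G', 'F', 'B', 'E', 'H', 'D']
```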

## Cost of DFS

As in BFS, we add a node to the visited set exactly once ($|V|$).

For each edge, we do one lookup to see whether the node at its other end is already in the visited set ($|E|$).

Thus, the total work is equivalent to BFS: $O(|V| + |E|)$.
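
The same accounting can be seen in a recursive formulation, where the call stack plays the role of the explicit stack. The sketch below counts the visited-set lookups; `dfs_recursive` and the `lookups` counter are assumptions made for this example, and neighbors are sorted only so the order is reproducible.

```python
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}

def dfs_recursive(graph, source):
    visited = set()
    order = []
    lookups = 0  # number of "is this neighbor visited?" checks

    def visit(node):
        nonlocal lookups
        visited.add(node)        # happens once per node
        order.append(node)
        for n in sorted(graph[node]):
            lookups += 1         # happens once per neighbor entry
            if n not in visited:
                visit(n)

    visit(source)
    return order, lookups

order, lookups = dfs_recursive(graph, 'A')
print(order)    # ['A', 'B', 'D', 'E', 'H', 'C', 'F', 'G']
print(lookups)  # 14
```

Because each undirected edge is stored in both directions, the number of lookups here is $2|E| = 14$, which is still $O(|E|)$.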



## Parallelism in DFS?
<img src="figures/dfs_nop.jpg" width="30%"/>

Is there any opportunity for parallelism?

One idea is to just run the search for each child in parallel. 
- E.g., in this example, search the subtree starting at $a$ in parallel with the subtree starting at $b$

What potential problems arise?

- We may end up visiting $b$ twice (or $c$, or $f$)
- This isn't in DFS order! We shouldn't be visiting $b$ before $e$.
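
Both problems can be reproduced with a sequential simulation. The graph below is a hypothetical example (not the one in the figure), and the two branch searches are simply run one after the other with independent visited sets, standing in for two parallel tasks that cannot see each other's progress.

```python
# hypothetical graph: s has children a and b, which both lead to c
graph = {
    's': ['a', 'b'],
    'a': ['s', 'c'],
    'b': ['s', 'c'],
    'c': ['a', 'b', 'f'],
    'f': ['c'],
}

def dfs_branch(graph, start, already_visited):
    # each branch works on its own copy of the visited set
    visited = set(already_visited)
    order = []

    def visit(node):
        visited.add(node)
        order.append(node)
        for n in graph[node]:
            if n not in visited:
                visit(n)

    visit(start)
    return order

branch_a = dfs_branch(graph, 'a', {'s'})
branch_b = dfs_branch(graph, 'b', {'s'})
sequential = ['s'] + dfs_branch(graph, 'a', {'s'})  # true DFS from s

print(branch_a)    # ['a', 'c', 'b', 'f']
print(branch_b)    # ['b', 'c', 'a', 'f']
print(sequential)  # ['s', 'a', 'c', 'b', 'f']
```

Every node other than `s` is visited by both branches, so the work is duplicated. In the true DFS order, `b` is reached through `c`, whereas the second branch starts at `b` immediately.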

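Computing the DFS order (more precisely, the lexicographically first DFS order) belongs to a class of problems called **P**-complete: problems that can be solved with polynomial work but most likely do not admit solutions with **polylogarithmic** span.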