# Graph Exploration

One of the fundamental operations we wish to perform over a graph is to explore it!

This is often referred to as **Graph Search**, since it can be, and often is, used to search for information within a graph.

A **graph search** is a traversal over the graph. Starting from some source vertex $s$, visit every vertex that is reachable from $s$.

## General Algorithm

If we consider a general traversal of a graph, at any given point we can consider every vertex to be in one of three sets:

- **visited**: the set of vertices already visited
- **frontier**: the unvisited neighbors of the visited vertices
- **unseen**: everything else

A general graph traversal algorithm would then be:


- add the source vertex $s$ to the frontier
- while the frontier is not empty:
    - remove a vertex from the frontier and visit it
    - add it to visited
    - add all its unvisited neighbors to the frontier
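The steps above leave one choice open: *which* frontier vertex to visit next. As a minimal sketch of that idea, the function below takes that choice as a parameter. The name `graph_search`, the `take_next` parameter, and the `square` example graph are illustrative (not from any library), and graphs are assumed to be dictionaries mapping each vertex to a list of its neighbors. Because a vertex can be added to the frontier more than once, the sketch skips it if it has already been visited.

```python
from collections import deque


def graph_search(graph, source, take_next):
    """Generic traversal sketch. `take_next` removes and returns one vertex
    from the frontier deque, so it alone decides the visiting order."""
    visited = set()
    order = []                   # vertices in the order they were visited
    frontier = deque([source])   # start with only the source in the frontier
    while frontier:
        v = take_next(frontier)  # remove a vertex from the frontier
        if v in visited:         # v was added more than once; skip repeats
            continue
        visited.add(v)
        order.append(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                frontier.append(neighbor)
    return order


# A small undirected graph shaped like a square: A-B-D-C-A
square = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(graph_search(square, 'A', deque.popleft))  # ['A', 'B', 'C', 'D']
print(graph_search(square, 'A', deque.pop))      # ['A', 'C', 'D', 'B']
```

Passing `deque.popleft` removes the oldest frontier vertex (queue behavior), while `deque.pop` removes the newest one (stack behavior). These two choices give the Breadth First and Depth First orders discussed in the rest of this section.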


# Breadth First Search (BFS)

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif" width=20%/>
</center>

In breadth first search, the vertices are traversed across the levels starting from a source vertex $s$. In other words, after $s$ is visited, all vertices one step away from it are visited, then all vertices two steps away, and so on.

To implement this, we need structures for the `frontier` and for `visited`. 

For the `visited` structure, our two main operations will be inserting each vertex when we visit it and checking whether a vertex is already in `visited` before adding it to the frontier. We'll use a set: Python's sets support both insertion and membership tests in $O(1)$ time on average.



> When we first learned about Python's built-in data structures (lists, sets, tuples, and dictionaries), we focused on their syntax and how to use them. When deciding which one to use for an application, their runtimes are often the deciding factor, so it is useful to become familiar with them. Python's wiki page on [Time Complexity](https://wiki.python.org/moin/TimeComplexity) gives the runtimes of operations on the built-in Python data structures.
> 
> In this reference you will see that tuples are omitted, but that a structure known as a **deque** is included. This is a "double-ended queue", pronounced "deck". In CPython it is implemented as a Linked List (specifically, a doubly linked list of fixed-size blocks), so adding or removing an element at either end takes $O(1)$ time. It can be used as a Stack or Queue depending on the operations used. Tuples are omitted because, for these operations, they behave like immutable lists. As you've seen, tuples are mostly used to pass around or store multiple pieces of data as a single variable.
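A short illustration of the two behaviors, using only the `deque` methods `append`, `popleft`, and `pop`:

```python
from collections import deque

# As a queue: add at the right end, remove from the left end (first in, first out)
queue = deque()
for x in [1, 2, 3]:
    queue.append(x)
print(queue.popleft())  # 1

# As a stack: add and remove at the right end (last in, first out)
stack = deque()
for x in [1, 2, 3]:
    stack.append(x)
print(stack.pop())      # 3
```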

For the `frontier`, you might notice in the animation above that we visit the vertices in the same order in which we added them to the `frontier`. This is exactly the behavior of a Queue. Therefore, we can use a Queue for the frontier and its behavior will allow us to visit the vertices in exactly the order we want!

### Pseudocode

We can sketch this out pretty easily.

We'll start by creating `visited` and `frontier` and by adding the source vertex $s$ to the frontier. When we implement this, we will write it as a function which takes in the graph and source vertex.

Then, while the frontier isn't empty, visit the next vertex and add all its unvisited neighbors to the frontier.

``` python
from collections import deque

visited = set()
# We'll use a deque as a queue
frontier = deque() # use ops `append` and `popleft`

frontier.append(s)

while len(frontier) > 0:
    # Remove the next vertex from the frontier
    # (skip it if it has already been visited)
    # add it to visited
    # add all unvisited neighbors to the frontier
# return the set of visited vertices
```

### Implementation

``` python
from collections import deque

def breadth_first_search(graph, source):
    visited = set()
    frontier = deque()  # used as a queue: `append` to add, `popleft` to remove
    frontier.append(source)
    while len(frontier) > 0:
        v = frontier.popleft()
        if v in visited:
            # v can be added to the frontier more than once (e.g. in a graph
            # with a cycle); only visit it the first time it is removed
            continue
        visited.add(v)
        print("Visiting vertex {}".format(v))
        for neighbor in graph[v]:
            if neighbor not in visited:
                frontier.append(neighbor)

    return visited
```


``` python
# Same as animated example
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F', 'G'],
    'D': ['B'],
    'E': ['B', 'H'],
    'F': ['C'],
    'G': ['C'],
    'H': ['E']
}

breadth_first_search(graph, 'A')
```

```
Visiting vertex A
Visiting vertex B
Visiting vertex C
Visiting vertex D
Visiting vertex E
Visiting vertex F
Visiting vertex G
Visiting vertex H

{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}
```
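BFS's level-by-level behavior can also be made explicit. The sketch below is an illustrative variant (the name `bfs_levels` is our own) that, instead of a single queue, builds each level from the unseen neighbors of the previous level, so `levels[k]` holds the vertices that are $k$ steps from the source. It assumes the same dictionary-of-lists representation used above.

```python
def bfs_levels(graph, source):
    """Group the vertices reachable from `source` by BFS level.

    levels[k] holds the vertices that are k steps from `source`.
    """
    levels = [[source]]
    seen = {source}            # vertices already placed in some level
    while True:
        next_level = []
        for v in levels[-1]:
            for neighbor in graph[v]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_level.append(neighbor)
        if not next_level:     # nothing new is reachable: done
            return levels
        levels.append(next_level)


graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F', 'G'],
    'D': ['B'],
    'E': ['B', 'H'],
    'F': ['C'],
    'G': ['C'],
    'H': ['E'],
}
print(bfs_levels(graph, 'A'))  # [['A'], ['B', 'C'], ['D', 'E', 'F', 'G'], ['H']]
```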

### Tracking Distances

If a vertex $a$ is adjacent to $v$, we say that $a$ is at distance $1$ from $v$. If the shortest path between two vertices uses two edges, then they are at distance $2$ from each other.

More generally, we can define the distance between vertices as the number of edges on the shortest path between them.

Breadth First Search visits all vertices that are one step away from the source before visiting any vertex that is two steps away, and so on. We can therefore modify Breadth First Search to calculate and return the distance from the source to every reachable vertex.

For practice, modify Breadth First Search given above so that instead of returning a set of the visited vertices, it returns a dictionary whose keys are vertices and values are the distance of that vertex from the source vertex.

``` python
from collections import deque

def breadth_first_search_distances(graph, source):
    distances = {}
    # TO-DO: Modify BFS to store the distance from
    # each vertex to source in the dictionary distances.
    # You may copy the implementation of BFS above to use
    # as a starting point

    return distances
```

# Depth First Search (DFS)

Whereas Breadth First Search visits every vertex across each level before moving on to the next, Depth First Search visits a chain of vertices starting from the source as far as it can go before moving to the next neighbor of the source.

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif" width=25%/>
</center>

Depth First Search can be used to detect cycles in undirected graphs: if the search reaches an already-visited vertex by a second path from the root (rather than by walking straight back along the edge it just used), there must be a cycle. [Among other uses](https://en.wikipedia.org/wiki/Depth-first_search#Applications), Depth First Search can also serve as a strategy for generating or solving mazes.
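For an undirected graph, this idea can be sketched with the same stack-based loop as the DFS implementation later in this section, with one change: each stack entry also remembers the vertex it was reached from, so that walking back along the edge just used is not mistaken for a cycle. This is a sketch under stated assumptions (every edge is listed in both endpoints' neighbor lists, and there are no parallel edges); `has_cycle` is an illustrative name, and a plain Python list serves as the stack.

```python
def has_cycle(graph):
    """Return True if the undirected graph contains a cycle (DFS sketch).

    Assumes every edge u-v is listed in both graph[u] and graph[v]
    and that there are no parallel edges.
    """
    visited = set()
    for start in graph:                    # restart so every component is searched
        if start in visited:
            continue
        stack = [(start, None)]            # (vertex, neighbor we reached it from)
        while stack:
            v, parent = stack.pop()
            if v in visited:               # reached a second time by another path
                return True
            visited.add(v)
            for neighbor in graph[v]:
                if neighbor != parent:     # don't walk straight back along the same edge
                    stack.append((neighbor, v))
    return False


tree = {1: [2, 3], 2: [1], 3: [1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(has_cycle(tree))      # False
print(has_cycle(triangle))  # True
```

In a tree, each vertex is pushed only by the neighbor that leads back toward the root, so no vertex is ever popped twice; any extra edge creates a second route to some vertex, and the search finds it.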

Depth First Search is similar to Breadth First Search, except that we use the frontier differently.

In BFS, we visit vertices in the same order that we add them to the frontier, treating the frontier like a Queue.

In DFS, we want to visit vertices further down a path before visiting those along a level. In the animation above, when visiting vertex $2$, vertex $3$ is added to the frontier and then is the immediate next one to visit. The behavior we want is that the last vertex added to the frontier will be the next vertex visited. This is the behavior of a stack!

### Pseudocode

The algorithm for Depth First Search is VERY similar to BFS.

``` python
from collections import deque

visited = set()
# We'll use a deque as a stack
frontier = deque() # use ops `append` and `pop`

frontier.append(s)

while len(frontier) > 0:
    # Remove the next vertex from the frontier, the top of the stack
    # (skip it if it has already been visited)
    # add it to visited
    # add all unvisited neighbors to the frontier
# return the set of visited vertices
```

### Implementation

``` python
from collections import deque

def depth_first_search(graph, source):
    visited = set()
    frontier = deque()  # used as a stack: `append` to add, `pop` to remove
    frontier.append(source)
    while len(frontier) > 0:
        v = frontier.pop()
        if v in visited:
            # v can be on the stack more than once; only visit it the first time
            continue
        visited.add(v)
        print("Visiting vertex {}".format(v))
        for neighbor in graph[v]:
            if neighbor not in visited:
                frontier.append(neighbor)

    return visited
```

``` python
# Same as animated example
graph = {
    1: [9, 5, 2],
    2: [1, 3],
    3: [2, 4],
    4: [3],
    5: [1, 8, 6],
    6: [5, 7],
    7: [6],
    8: [5],
    9: [1, 10],
    10: [9]
}

depth_first_search(graph, 1)
```

```
Visiting vertex 1
Visiting vertex 2
Visiting vertex 3
Visiting vertex 4
Visiting vertex 5
Visiting vertex 6
Visiting vertex 7
Visiting vertex 8
Visiting vertex 9
Visiting vertex 10

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
```
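The stack can also be implicit. A common alternative, shown here as a sketch (the name `dfs_recursive` is illustrative), writes DFS as a recursive function, so Python's call stack keeps track of where to resume after each branch is exhausted. The example uses the graph from the animated example.

```python
def dfs_recursive(graph, source):
    """Recursive DFS sketch: returns vertices in the order they are visited."""
    visited = set()
    order = []

    def visit(v):
        visited.add(v)
        order.append(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                visit(neighbor)   # follow this branch all the way down first

    visit(source)
    return order


graph = {
    1: [9, 5, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1, 8, 6],
    6: [5, 7], 7: [6], 8: [5], 9: [1, 10], 10: [9],
}
print(dfs_recursive(graph, 1))  # [1, 9, 10, 5, 8, 6, 7, 2, 3, 4]
```

Neighbors are tried in list order, so this version goes to 9 before 2, whereas the explicit stack pops the most recently added neighbor (2) first; both are valid depth-first orders. Each level of recursion uses a Python stack frame, so very deep graphs can exceed the default recursion limit (see `sys.getrecursionlimit()`), which is one reason to prefer an explicit stack.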

## Connectedness

We can use either BFS or DFS to discover connected components in undirected graphs. Both search algorithms start from a source vertex and return the set of all vertices that are reachable from that vertex.

If the whole graph is connected, then the returned set will contain every vertex in the graph. If the graph is not connected, then the set will necessarily be a proper subset of the vertices: exactly the connected component that contains the source.

To get all connected components, run the search repeatedly, each time starting from a vertex that no previous search has reached, until every vertex has been seen.
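A minimal sketch of that procedure, assuming an undirected graph stored as a dictionary of neighbor lists like the examples above; `connected_components` is a name chosen for illustration. It uses BFS for each search, though DFS would work equally well.

```python
from collections import deque


def connected_components(graph):
    """Return a list of sets, one per connected component (undirected graph)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:          # already part of a component we found
            continue
        # BFS from `start` collects every vertex reachable from it
        component = {start}
        frontier = deque([start])
        while frontier:
            v = frontier.popleft()
            for neighbor in graph[v]:
                if neighbor not in component:
                    component.add(neighbor)
                    frontier.append(neighbor)
        seen |= component
        components.append(component)
    return components


graph = {'A': ['B'], 'B': ['A'], 'C': ['D'], 'D': ['C'], 'E': []}
print([sorted(c) for c in connected_components(graph)])  # [['A', 'B'], ['C', 'D'], ['E']]
```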



# Where to go from here

Our introduction to graphs is short, but they are incredibly applicable and useful data structures. We've seen how to represent graphs and used the two major graph search algorithms: Breadth First Search and Depth First Search.

These algorithms can be used to solve a variety of problems, but there are many more graph algorithms that you may find useful in the future. Searching graphs is just a first step. 



Other useful algorithms include:

**Path Finding Algorithms**
- Given a source vertex, find the shortest paths from it to all (or particular) vertices in the graph, or find the shortest paths between all pairs of vertices.
    - [Dijkstra's Algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
    - [Bellman-Ford Algorithm](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm)
    - [Johnson's Algorithm](https://en.wikipedia.org/wiki/Johnson%27s_algorithm)

**Minimum Spanning Trees**
- Identify a set of edges that connects all vertices, contains no cycles, and has the smallest possible total edge weight.
- This is useful for designing utilities (electricity should reach all customers while using as little cable as possible!).
- Internet traffic routing is another application.
    - [Prim's Algorithm](https://en.wikipedia.org/wiki/Prim%27s_algorithm)
    - [Kruskal's Algorithm](https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
    - [Boruvka's Algorithm](https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm)

**[Maximum Cliques](https://en.wikipedia.org/wiki/Clique_problem)/[Independent Sets](https://en.wikipedia.org/wiki/Independent_set_(graph_theory))**

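- These two problems are closely related: a clique in a graph is exactly an independent set in its complement graph (the graph whose edges are the pairs of vertices that are *not* adjacent in the original). A maximum clique is the largest set of vertices in a graph that are completely connected. That is, every pair of vertices is directly connected by an edge.
- An Independent Set is a set of vertices where no pair of vertices shares an edge.
    - No efficient (polynomial-time) algorithm is known for either problem; both are NP-hard, and the known exact algorithms take exponential time in the worst case, essentially refined forms of brute-force search.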

And the list goes on.



It can be very profitable to build up a familiarity with the types of problems that can be solved using graphs. With that familiarity, when you are investigating a new problem, if you can represent it as a graph and run a well-known algorithm to address your problem, you can very quickly make headway and get results.

Given their generality and the abundance of algorithms that can operate on them, graphs can be a very powerful tool in your problem solving toolbox.
