# Breadth-First and Depth-First Search

### BFS

- Initialise:
  - All vertices `undiscovered`
  - Initialise Queue `Q`
  - `Q.enqueue(root)` and mark the root as `discovered`
- While Q is not empty:
  - **v** = `Q.dequeue()`
  - For all neighbouring vertices **n** of **v**:
    - if **n** is not `discovered`:
      - mark it as `discovered`
      - `Q.enqueue(n)`
    
Here is an example:<br>
<img src="Graphics/bfsexample.png" width="60%" align="left">

##### Notes:
We will show an implementation that represents the graph with a dictionary of sets instead of an adjacency list. The reasoning behind this choice:
- The main operation on an adjacency list is to get the neighbours of a given node. The fastest way to do this is via a dictionary whose keys are the node ids.
- Next, the main operation you want to do with the neighbours is to see which of them are undiscovered, so as to add them to the queue. If you keep a set of discovered nodes, the set difference of the neighbours minus the discovered nodes gives exactly the newly discovered nodes.
- Minor point: we will use a Python list as a queue.
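
As a small illustration of the two points above, here is a sketch of the neighbour lookup and the set difference on a graph fragment. The fragment and the variable names are chosen for illustration only.

```python
# A fragment of a graph in dictionary-of-sets form (illustrative values)
g = {
    's': {'w', 'r'},
    'w': {'s', 't', 'x'},
}

discovered = {'s', 'w', 'r'}   # nodes seen so far

neighbours = g['w']            # dictionary lookup by node id
newly_discovered = neighbours - discovered  # neighbours minus discovered

# newly_discovered is now {'t', 'x'}
```
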

In [3]:
def bfs(g, start):
    q = []
    q.insert(0, start)
    discovered={start}
        
    while q: # pythonic for a non-empty queue
        popped = q.pop() # pop the oldest node: we insert at the front and pop from the back, so the list acts as a FIFO queue
            
        neighbs = g.get(popped) # get the node's neighbours
        
        # set difference: the undiscovered nodes are the neighbours minus the discovered nodes:
        undiscovered = neighbs - discovered 
        
        if undiscovered: 
            # add each newly discovered node to the queue and to the set of discovered
            for u in undiscovered: 
                discovered.add(u)
                q.insert(0, u)
            print "set of discovered nodes:", discovered
     
    return discovered
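
Inserting at the front of a Python list takes time proportional to the list's length, so for larger graphs `collections.deque` is a common alternative for the queue. The following is a minimal sketch of that variant, not the implementation used in this notebook; it also records how many edges each node is away from the start. The function name `bfs_distances` and the sample graph are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(g, start):
    """Return a dict mapping each reachable node to its edge count from start."""
    distance = {start: 0}      # doubles as the set of discovered nodes
    q = deque([start])
    while q:
        v = q.popleft()        # O(1) removal from the front of the queue
        for n in g[v]:
            if n not in distance:
                distance[n] = distance[v] + 1
                q.append(n)    # O(1) insertion at the back
    return distance

# Hypothetical sample graph, in the same dictionary-of-sets form
sample = {
    'a': {'b', 'c'},
    'b': {'a', 'd'},
    'c': {'a', 'd'},
    'd': {'b', 'c', 'e'},
    'e': {'d'},
}
result = bfs_distances(sample, 'a')
# result == {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
```
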

Here is a dictionary/set representation of the example graph shown above:

In [4]:
G={
    's': set(['w', 'r']),
    'r': set(['s', 'v']),
    'w': set(['s', 't', 'x']),
    't': set(['w', 'x', 'u']),
    'x': set(['w', 't', 'u', 'y']),
    'y': set(['u', 'x']),
    'v': set(['r']),
    'u': set(['x', 'y', 't'])
}
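
If the graph is available as a list of undirected edges instead, the same dictionary-of-sets form can be built programmatically. This is a sketch under that assumption; the helper name `build_graph` and the edge list are hypothetical and were derived from the dictionary above.

```python
from collections import defaultdict

def build_graph(edges):
    """Build a dictionary of neighbour sets from undirected (a, b) edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)   # record the edge in both directions
        graph[b].add(a)
    return dict(graph)

# The ten edges of the example graph
edges = [('s', 'w'), ('s', 'r'), ('r', 'v'), ('w', 't'), ('w', 'x'),
         ('t', 'x'), ('t', 'u'), ('x', 'u'), ('x', 'y'), ('u', 'y')]
G_from_edges = build_graph(edges)
```
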

### DFS
- Initialise:
  - All vertices `undiscovered`
  - Initialise Stack `S`
  - `S.push(root)` and mark the root as `discovered`
- While S is not empty:
  - **v** = `S.pop()`
  - For all neighbouring vertices **n** of **v**:
    - if **n** is not `discovered`:
      - mark it as `discovered`
      - `S.push(n)`
    
This is the same algorithm as BFS, except that it uses a stack (last in, first out) instead of a queue (first in, first out).

In [5]:
def dfs(g, start):
    s = []
    s.insert(0, start)
    discovered={start}
    
    while s: # pythonic for a non-empty list (used here as a stack)
        popped = s.pop(0) # pop a node from the stack:
            
        neighbs = g.get(popped) # get the node's neighbours
        
        # set difference: the undiscovered nodes are the neighbours minus the discovered nodes:
        undiscovered = neighbs - discovered
            
        if undiscovered: 
            # add each newly discovered node to the stack and to the set of discovered
            for u in undiscovered: 
                discovered.add(u)
                s.insert(0, u)
            print "set of discovered nodes:", discovered
            
    return discovered
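
DFS is also often written recursively, with the call stack playing the role of the explicit stack. The sketch below is that alternative formulation, not the stack-based version above. The function name `dfs_recursive` is an assumption, and the neighbours are sorted only to make the visiting order reproducible.

```python
def dfs_recursive(g, node, discovered=None):
    """Return the nodes reachable from node, in the order DFS visits them."""
    if discovered is None:
        discovered = []            # a list, so the visiting order is kept
    discovered.append(node)
    for n in sorted(g[node]):      # sorted only for a reproducible order
        if n not in discovered:
            dfs_recursive(g, n, discovered)
    return discovered

# The example graph in dictionary-of-sets form
G = {
    's': {'w', 'r'},
    'r': {'s', 'v'},
    'w': {'s', 't', 'x'},
    't': {'w', 'x', 'u'},
    'x': {'w', 't', 'u', 'y'},
    'y': {'u', 'x'},
    'v': {'r'},
    'u': {'x', 'y', 't'},
}
order = dfs_recursive(G, 's')
# order == ['s', 'r', 'v', 'w', 't', 'u', 'x', 'y']
```

Note how the traversal follows one branch (`s`, `r`, `v`) to its end before backing up and exploring `w`.
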

### Comparison

Here is a possible execution that shows the difference between BFS and DFS on the above example. It is not necessarily exactly how the actual program will unfold, because Python does not guarantee the order in which the elements of a set are iterated:<br>
<img src="Graphics/graphds5.png" width="40%" align="left">

In [6]:
bfs(G, 's')

set of discovered nodes: set(['s', 'r', 'w'])
set of discovered nodes: set(['s', 'r', 'w', 'v'])
set of discovered nodes: set(['s', 'r', 't', 'w', 'v', 'x'])
set of discovered nodes: set(['s', 'r', 'u', 't', 'w', 'v', 'y', 'x'])


{'r', 's', 't', 'u', 'v', 'w', 'x', 'y'}

In [7]:
dfs(G, 's')

set of discovered nodes: set(['s', 'r', 'w'])
set of discovered nodes: set(['x', 's', 'r', 't', 'w'])
set of discovered nodes: set(['s', 'r', 'u', 't', 'w', 'x'])
set of discovered nodes: set(['s', 'r', 'u', 't', 'w', 'y', 'x'])
set of discovered nodes: set(['s', 'r', 'u', 't', 'w', 'v', 'y', 'x'])


{'r', 's', 't', 'u', 'v', 'w', 'x', 'y'}

### Exercises
#### Find the shortest path between two nodes

We will tweak BFS to keep track of the predecessor for each discovered node. At the end we will backtrack the path.

Here is an example of how the two trackers work in the code below. Say we want to find the path from a to s:<br>
`track=[a f g e g t v k s b j]`<br>
`prede=[0 a r l f o g u v n g]`<br>
Starting from the target s (add s to the path), its predecessor is v (add v to the path). The predecessor of v is g (add g to the path); scanning backwards from v, the first g encountered is the one whose predecessor is f (add f to the path). The predecessor of f is a, which is the start (add a to the path). The resulting path is a, f, g, v, s.
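
The backtracking over the two lists above can be sketched as follows. The function name `backtrack` is an assumption made for illustration, and the sketch assumes the target actually appears in the tracker.

```python
def backtrack(tracker, previous, start, target):
    """Rebuild the path from start to target out of the two parallel lists."""
    # Locate the target, scanning from the end of the tracker
    i = len(tracker) - 1
    while tracker[i] != target:
        i -= 1
    path = [target]
    # Keep scanning backwards for the predecessor of the current node
    while tracker[i] != start:
        p = previous[i]
        while tracker[i] != p:
            i -= 1
        path.insert(0, p)
    return path

track = ['a', 'f', 'g', 'e', 'g', 't', 'v', 'k', 's', 'b', 'j']
prede = ['0', 'a', 'r', 'l', 'f', 'o', 'g', 'u', 'v', 'n', 'g']
example_path = backtrack(track, prede, 'a', 's')
# example_path == ['a', 'f', 'g', 'v', 's']
```
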

In [14]:
def bfs_paths(g, start, target):
    q = []
    q.insert(0, start)
    
    discovered={start}
    
    # We will keep track of all nodes in the order they get discovered
    tracker=[start]
    # We will keep the predecessor of each discovered node in a separate list
    # We will use this predecessor list to build the path
    previous=['0'] # the start element does not have a predecessor
    
    # This code is standard BFS
    while q: 
        popped = q.pop() 
            
        neighbs = g.get(popped) 
        
        undiscovered = neighbs - discovered 
            
        if undiscovered: 
            for u in undiscovered: 
                discovered.add(u)
                tracker.append(u) # Here we keep track of the newly discovered node
                previous.append(popped) # and its predecessor. The rest of the code remains the same
                q.insert(0, u)
    
    # We will now use the trackers to backtrack the path from the target to the start
    if target not in discovered:
        print "no path from", start, "to", target
        return
    
    # Start from the end and locate the target node in the tracker
    i=len(tracker)-1
    while tracker[i]!=target:
        i-=1
    path=[target]
    
    # Then get the predecessor of each node in the path until we reach the start.
    # A predecessor is always discovered before its successor, so it sits at a
    # smaller index and we can keep scanning backwards.
    while tracker[i]!=start:
        p=previous[i]
        while tracker[i]!=p:
            i-=1
        path.insert(0, p)
    
    print path

In [15]:
bfs_paths(G, 'v', 't')

['v', 'r', 's', 'w', 't']
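
An alternative to the two parallel lists is a single dictionary that maps each discovered node to its predecessor, which removes the backward scans. The following is a sketch of that variant, not the implementation above; the function name `bfs_shortest_path` is an assumption, and it returns `None` when the target cannot be reached.

```python
from collections import deque

def bfs_shortest_path(g, start, target):
    """Return a shortest path from start to target as a list, or None."""
    predecessor = {start: None}    # also serves as the set of discovered nodes
    q = deque([start])
    while q:
        v = q.popleft()
        if v == target:
            break                  # the target has been reached, stop early
        for n in g[v]:
            if n not in predecessor:
                predecessor[n] = v
                q.append(n)
    if target not in predecessor:
        return None
    # Walk the predecessor links back from the target to the start
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = predecessor[node]
    path.reverse()
    return path

# The example graph in dictionary-of-sets form
G = {
    's': {'w', 'r'},
    'r': {'s', 'v'},
    'w': {'s', 't', 'x'},
    't': {'w', 'x', 'u'},
    'x': {'w', 't', 'u', 'y'},
    'y': {'u', 'x'},
    'v': {'r'},
    'u': {'x', 'y', 't'},
}
shortest = bfs_shortest_path(G, 'v', 't')
# shortest == ['v', 'r', 's', 'w', 't']
```
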


#### Given a "maze" in the form of a graph, find the path to the exit from a given start
This is a very similar problem to the one above, only it is solved with DFS. Use DFS to span the graph. Use a tracker and a predecessor tracker as above and backtrack the path from the "exit" to the start after DFS runs. You can optimise by stopping DFS when the "exit" is reached.
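
One possible approach to this exercise is sketched below. It uses a predecessor dictionary in place of the two tracker lists and stops as soon as the exit is popped from the stack. The function name `dfs_path` and the small maze are hypothetical. Unlike BFS, DFS does not guarantee that the path found is the shortest one.

```python
def dfs_path(maze, start, exit_node):
    """Return some path from start to exit_node found by DFS, or None."""
    predecessor = {start: None}    # also serves as the set of discovered nodes
    stack = [start]
    while stack:
        v = stack.pop()            # list used as a stack: push and pop at the end
        if v == exit_node:
            break                  # optimisation: stop once the exit is reached
        for n in maze[v]:
            if n not in predecessor:
                predecessor[n] = v
                stack.append(n)
    if exit_node not in predecessor:
        return None
    # Backtrack from the exit to the start through the predecessors
    path = []
    node = exit_node
    while node is not None:
        path.insert(0, node)
        node = predecessor[node]
    return path

# Hypothetical maze with a dead end at 'b'
maze = {
    'in': {'a'},
    'a': {'in', 'b', 'c'},
    'b': {'a'},
    'c': {'a', 'out'},
    'out': {'c'},
}
way_out = dfs_path(maze, 'in', 'out')
# way_out == ['in', 'a', 'c', 'out']
```
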