# Depth First Search

### Introduction

In this lesson, we'll show how we can search through the components of a tree using a technique called depth first search.  

* Benefits of searching

With depth first search and breadth first search, we can discover the connected components of a graph.  By that we mean finding the nodes that have a path from one to another.  This has various use cases, from discovering the extent of someone's social network (think LinkedIn first and second connections), to discovering links between webpages, to seeing how a supply chain can be affected downstream.
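As a preview of what "connected components" means in code, here is a minimal sketch. The `connected_components` function and the `network` friendship graph are hypothetical names made up for illustration, and the graph is assumed to list every edge in both directions. The traversal inside is the stack-based search that the rest of this lesson builds up step by step.

```python
def connected_components(graph):
    """Group the nodes of an undirected graph into connected components."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Collect every node reachable from `start` using a stack.
        component = []
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(graph.get(node, []))
        components.append(sorted(component))
    return components


# Hypothetical friendship graph: each edge is listed in both directions.
network = {
    'ana': ['ben'],
    'ben': ['ana', 'cy'],
    'cy': ['ben'],
    'dee': ['eli'],
    'eli': ['dee'],
}

print(connected_components(network))
# [['ana', 'ben', 'cy'], ['dee', 'eli']]
```

Here `ana`, `ben`, and `cy` can all reach one another, while `dee` and `eli` form a separate group.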

Ok, let's get started.

### Depth first search approach

Let's say that we have the following tree, and that we want to take a depth first search approach.  The approach is to begin at the root node, and fully explore a path down to the leaf node.  So beginning on the left, we could visit `6 -> 8 -> 4` taking us from root node to leaf node.  And then from there visit `7`.  After this we visit the right half of the tree.

<img src="./tree-eg.png" width="40%">

We'll get into more details as to how this works soon.  But first, just notice the idea.  We want to explore each path fully, going from root node to leaf node, and then move down through the next path. 

### Our approach

Ok, so now how do we approach something like this?  Let's start by making our example smaller, by just looking at the right half of the tree.

<img src="./smaller-tree.png" width="20%">

Our approach will be to add a node to a stack (beginning with the root node), print the node, and then add that node's direct children to the stack.  

So first we add 6 to the stack.  We then remove 6, print it, and add its child 9.  Next we remove 9, print it, and add its children 12 and 11.  At this point the stack looks like the following:

`stack = [12, 11]`

We then remove the last element from the stack, `11`, print it, and add its child, `3`.  At this point:

`stack = [12, 3]`

So 3 gets printed next, and then 12.

### Implementing in code

Once again, our tree looks like the following:

<img src="./smaller-tree.png" width="20%">

And we can represent it as the following:

```python
# set start to the root node of '6'
root = '6'

tree = {
    '6': ['9'],
    '9': ['12', '11'],
    '11': ['3']
}
```

And remember our approach:

* Add a node to a stack (beginning with the root node), process the node, and then add that node's direct children to the stack.

```python
root = '6'

tree = {
    '6': ['9'],
    '9': ['12', '11'],
    '11': ['3']
}

# Add a node to a stack (beginning with the root node)
# process the node,
# and then add that node's direct children to the stack.

# initialize stack
stack = [root]

while stack:
    print('current stack is', stack)
    node = stack.pop()  # Pop the most recently added node from the stack.

    print('current node is', node)  # Process the node (e.g., print it).
    # Push all direct child nodes onto the stack (leaf nodes have none).
    for child in tree.get(node, []):
        stack.append(child)
```

```text
current stack is ['6']
current node is 6
current stack is ['9']
current node is 9
current stack is ['12', '11']
current node is 11
current stack is ['12', '3']
current node is 3
current stack is ['12']
current node is 12
```


Notice that this code works well.  It follows our approach of moving fully down one path to the leaf nodes before moving onto other paths.

<img src="./smaller-tree.png" width="20%">

And really all we do is get the current node by removing it from the stack, print it, and then add its direct children to the stack.  The stack is really the key to DFS, because by using last in, first out, the effect is to **keep digging** down a path until we reach a leaf node.

Take a second and copy down the code now, making sure to understand each piece.
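To see how much the last in, first out behavior matters, here is a small sketch that runs the same loop twice on our tree, once removing the newest node and once removing the oldest. The `traversal_order` function and its `lifo` flag are hypothetical names introduced only for this comparison.

```python
from collections import deque


def traversal_order(tree, root, lifo=True):
    """Return the order in which nodes are removed from the container."""
    pending = deque([root])
    order = []
    while pending:
        # pop() takes the newest node (stack); popleft() takes the oldest (queue).
        node = pending.pop() if lifo else pending.popleft()
        order.append(node)
        pending.extend(tree.get(node, []))
    return order


tree = {
    '6': ['9'],
    '9': ['12', '11'],
    '11': ['3'],
}

print(traversal_order(tree, '6', lifo=True))   # ['6', '9', '11', '3', '12']
print(traversal_order(tree, '6', lifo=False))  # ['6', '9', '12', '11', '3']
```

With the stack, `3` is reached before `12` because we keep digging below `11`. With first in, first out, both children of `9` are handled before `3`, which is the breadth first order mentioned in the introduction.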

### One tweak

One issue with our current code is that we may revisit nodes.  For example, take a look at the following structure, where our graph has cycles.

```md
    A
   / \
  B   C
 / \   \
D   E   F
 \ /   /
  G - H
```

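This means that our DFS approach can lead to an infinite loop: if an edge can be followed back to a node we already processed, our keep digging approach will run in a circle.  Even without that, nodes such as `G` and `H` can be reached along more than one path, so they would be processed more than once.

So instead we can update our code to keep track of the nodes that we visited.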
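A rough sketch of the problem, under a few assumptions: `dfs_without_visited` is a hypothetical helper that returns the processing order instead of printing it, and its `max_steps` cap exists only so the demonstration terminates. The `downward` dictionary reads the diagram with edges pointing downward only, and `two_way` is a made-up two-node graph whose single edge can be followed in both directions.

```python
def dfs_without_visited(graph, start, max_steps=20):
    """Stack-based DFS with no visited check, capped at max_steps pops."""
    stack = [start]
    order = []
    while stack and len(order) < max_steps:
        node = stack.pop()
        order.append(node)
        stack.extend(graph.get(node, []))
    return order


# Downward edges only: G and H are reachable along more than one path.
downward = {
    'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F'],
    'D': ['G'], 'E': ['G'], 'F': ['H'], 'G': ['H'],
}
order = dfs_without_visited(downward, 'A')
print(order.count('G'), order.count('H'))  # 2 3

# An edge that works in both directions: the search bounces between
# A and B, and only the step cap stops it.
two_way = {'A': ['B'], 'B': ['A']}
print(len(dfs_without_visited(two_way, 'A')))  # 20
```

In the first case the search finishes but does repeated work; in the second it would never finish without the cap.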

```python
def dfs(tree, node):
    stack = [node]
    visited = set()  # initialize visited

    while stack:
        node = stack.pop()
        if node not in visited:  # only process node if not in visited
            print('current node is', node)
            visited.add(node)  # processing includes adding to visited

            for child in tree.get(node, []):
                if child not in visited:
                    stack.append(child)
```

```python
node = '6'

tree = {
    '6': ['9'],
    '9': ['12', '11'],
    '11': ['3']
}

dfs(tree, node)
```

```text
current node is 6
current node is 9
current node is 11
current node is 3
current node is 12
```


And even on our graph with cycles, each node is processed exactly once.

```python
tree_with_cycles = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': ['G'],
    'E': ['G'],
    'F': ['H'],
    'G': ['H'],
    'H': []
}
```

```md
    A
   / \
  B   C
 / \   \
D   E   F
 \ /   /
  G - H
```

```python
node = 'A'
dfs(tree_with_cycles, node)
```

```text
current node is A
current node is C
current node is F
current node is H
current node is B
current node is E
current node is G
current node is D
```
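Notice that the output explores `C` before `B`: the last child pushed is the first one popped, so the rightmost branch goes first. If a left-to-right order is preferred, one option is to push the children in reverse. The sketch below assumes that tweak; `dfs_left_to_right` is a hypothetical variant that returns the order as a list rather than printing it.

```python
def dfs_left_to_right(graph, start):
    """Iterative DFS that returns nodes in left-to-right visiting order."""
    stack = [start]
    visited = set()
    order = []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push children in reverse so the first-listed child is popped first.
        for child in reversed(graph.get(node, [])):
            if child not in visited:
                stack.append(child)
    return order


tree_with_cycles = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': ['G'],
    'E': ['G'],
    'F': ['H'],
    'G': ['H'],
    'H': []
}

print(dfs_left_to_right(tree_with_cycles, 'A'))
# ['A', 'B', 'D', 'G', 'H', 'E', 'C', 'F']
```

Either order is a valid depth first search; the difference is only which sibling gets explored first.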


### Summary

Our approach for DFS is the following:

* Remove the current node from the stack, process it (print it), and add its direct children to the stack.  Keep repeating until there are no nodes left on the stack.

We then update this approach so that we do not revisit nodes.

```python
def dfs(tree, node):
    stack = [node]
    visited = set()  # initialize visited

    while stack:
        node = stack.pop()
        if node not in visited:  # only process node if not in visited
            print('current node is', node)
            visited.add(node)  # processing includes adding to visited

            for child in tree.get(node, []):
                if child not in visited:
                    stack.append(child)
```

```python
dfs(tree, root)
```

```text
current node is 6
current node is 9
current node is 11
current node is 3
current node is 12
```


### Resources

[DFS](https://github.com/learn-co-curriculum/graph-dfs)

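A recursive version of the same idea is shown below.  Note that `graph` is not defined in this lesson; the call assumes an adjacency dictionary shaped like the ones above.

```python
def dfs_recursive(graph, node, visited=None):
    if visited is None:
        visited = set()

    if node not in visited:
        print(node)  # Process the node (e.g., print it).
        visited.add(node)  # Mark the node as visited.

        # Recursively visit all unvisited adjacent nodes.
        # .get() lets leaf nodes be omitted from the dictionary.
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                dfs_recursive(graph, neighbor, visited)

dfs_recursive(graph, 'A')
```

```text
A
B
D
E
F
C
```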
