# Depth-First Search

The depth-first search (DFS) algorithm follows a strategy that, as its name implies, goes "deeper" into the graph whenever possible.

The search explores the edges leaving the most recently discovered vertex $v$ that still has unexplored edges. Once all of $v$'s edges have been explored, the search backtracks to explore the remaining edges leaving the vertex from which $v$ was discovered. This process repeats until every vertex reachable from the source has been discovered.

## Procedure

```
func DFS(G, u) {
    u.visited = true
    for each v in G.adj[u]
        if v.visited == false
            DFS(G, v)
}
```
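A runnable Python translation of the pseudocode above, offered as a sketch: the graph is assumed to be a dict mapping each vertex to a list of its neighbours, and a `visited` set (a name chosen for this illustration) plays the role of the `u.visited` flag. The function also records the order in which vertices are discovered.

```python
def dfs(adj, u, visited=None, order=None):
    """Recursive DFS from u over an adjacency-list graph (dict: vertex -> neighbours)."""
    if visited is None:
        visited = set()
    if order is None:
        order = []
    visited.add(u)      # u.visited = true
    order.append(u)     # remember the discovery order
    for v in adj[u]:
        if v not in visited:
            dfs(adj, v, visited, order)
    return order


# A small directed graph; vertex 5 is not reachable from 0.
graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [], 5: [0]}
print(dfs(graph, 0))  # [0, 1, 3, 2, 4]
```

Vertex 3 is reached through 1 first, so when the search later backtracks to 2 it skips 3 and only goes on to 4.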

As we can see from the pseudocode above, DFS by itself does nothing other than traverse the graph. In many situations, that is not sufficient to solve a particular problem.

In those cases we need to do extra work at each state of the traversal, which in turn may require extending the state representation, as the examples below show.
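As a small illustration of doing extra work during the traversal, the sketch below records, for every vertex, the vertex from which it was discovered (its parent in the DFS tree), which is exactly the vertex the search backtracks to. The names `dfs_with_parent` and `path_to` are hypothetical, and the graph is again assumed to be a dict of adjacency lists.

```python
def dfs_with_parent(adj, source):
    """DFS that, besides traversing, records each vertex's parent.

    parent[v] is the vertex from which v was discovered (None for the source).
    Membership in `parent` doubles as the visited flag.
    """
    parent = {source: None}

    def visit(u):
        for v in adj[u]:
            if v not in parent:  # v has not been discovered yet
                parent[v] = u
                visit(v)

    visit(source)
    return parent


def path_to(parent, v):
    """Follow parent links back to the source to recover the DFS-tree path to v."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: []}
parent = dfs_with_parent(graph, 0)
print(parent)              # {0: None, 1: 0, 3: 1, 2: 0, 4: 2}
print(path_to(parent, 4))  # [0, 2, 4]
```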

## An Extremely Simple Example

### **Minimum Depth of Binary Tree** ([Leetcode 111](https://leetcode.com/problems/minimum-depth-of-binary-tree/description/))

Given a binary tree, find its minimum depth.

The minimum depth is the number of nodes along the shortest path from the root node down to the nearest leaf node.

Note: A leaf is a node with no children.

**Sample**

Given binary tree `[3,9,20,null,null,15,7]`
```
    3
   / \
  9  20
    /  \
   15   7
```

**Output:** 2 (the minimum depth)

**Analysis**

Each node (vertex) has a depth equal to its parent's depth plus 1, where the root has depth 1. Every root-to-leaf path ends at a leaf, and its length in nodes is that leaf's depth, so the minimum depth of the tree is the minimum depth among all leaf nodes.

One approach is to compute the depths of all nodes, starting from the root. In this approach, let $DFS(u, d)$ denote computing the depths of all nodes in the subtree rooted at $u$, given that $u$ itself has depth $d$. Applying this recursively, starting from $DFS(root, 1)$, assigns a depth to every node.

$DFS(u, d) \to DFS(\mathrm{left}_u, d+1),\ DFS(\mathrm{right}_u, d+1)$

When we reach a leaf node, we compare its depth with the best answer found so far and update the answer if the new depth is smaller.
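Tracing the recursion by hand on the sample tree (writing each node by its value) shows how the depths propagate:

$$
\begin{aligned}
DFS(3, 1) &\to DFS(9, 2),\ DFS(20, 2)\\
DFS(20, 2) &\to DFS(15, 3),\ DFS(7, 3)
\end{aligned}
$$

The leaves are 9 (depth 2), 15 (depth 3) and 7 (depth 3), so the answer is $\min\{2, 3, 3\} = 2$, matching the expected output.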

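```python
# optimal_answer holds the smallest leaf depth seen so far;
# 1 << 20 is a sentinel assumed to exceed any depth in the input
optimal_answer = 1 << 20

def DFS(u, d):
    # u is the current node and d is its depth (the root has depth 1)
    global optimal_answer  # required because we reassign the module-level variable
    if u is None:
        return
    if u.left is None and u.right is None:
        # leaf node: try to improve the answer
        optimal_answer = min(optimal_answer, d)
        return
    DFS(u.left, d + 1)
    DFS(u.right, d + 1)

# root is the root node of the input tree (e.g. LeetCode's TreeNode)
DFS(root, 1)
if optimal_answer == 1 << 20:
    # no leaf was found, i.e. the tree is empty
    optimal_answer = 0
```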

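Another approach achieves the same goal.

Let $f(u)$ be the minimum depth of the subtree rooted at $u$. By the problem statement, a leaf has $f(u) = 1$. Otherwise, the shortest path to a leaf must pass through one of $u$'s children, so $f(u) = \min\big\{ f(c) : c \in \{\mathrm{left}_u, \mathrm{right}_u\},\ c \text{ exists} \big\} + 1$. The minimum ranges only over children that exist: a node with a single child is not a leaf, so a missing child must not be counted as a subtree of depth 0.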
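Evaluating $f$ bottom-up on the sample tree (nodes written by value):

$$
\begin{aligned}
f(9) &= f(15) = f(7) = 1 &&\text{(leaves)}\\
f(20) &= \min\{f(15), f(7)\} + 1 = 2\\
f(3) &= \min\{f(9), f(20)\} + 1 = \min\{1, 2\} + 1 = 2
\end{aligned}
$$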

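```python
def DFS(u):
    # returns f(u), the minimum depth of the subtree rooted at u (u is not None)
    if u.left is None and u.right is None:
        return 1  # a leaf has depth 1
    depth = float('inf')
    # only existing children can lead to a leaf
    if u.left is not None:
        depth = min(depth, DFS(u.left))
    if u.right is not None:
        depth = min(depth, DFS(u.right))
    return depth + 1

answer = DFS(root) if root is not None else 0  # an empty tree has depth 0
```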
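To see why the minimum must skip missing children, here is a self-contained check. It assumes a minimal `TreeNode` class mirroring LeetCode's (`val`, `left`, `right`); the function names are hypothetical. The naive variant treats a missing child as a subtree of depth 0, which happens to work on the sample but fails on the tree `[1,null,2]`, whose only leaf is at depth 2.

```python
class TreeNode:
    # Minimal binary-tree node, assumed to mirror LeetCode's definition.
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def min_depth(u):
    """f(u) with the minimum taken only over children that exist."""
    if u is None:
        return 0  # empty tree
    if u.left is None and u.right is None:
        return 1  # leaf
    children = [c for c in (u.left, u.right) if c is not None]
    return min(min_depth(c) for c in children) + 1


def naive_min_depth(u):
    """Incorrect variant: a missing child counts as a subtree of depth 0."""
    if u is None:
        return 0
    return min(naive_min_depth(u.left), naive_min_depth(u.right)) + 1


sample = TreeNode(3, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
skewed = TreeNode(1, None, TreeNode(2))  # [1,null,2]
print(min_depth(sample), naive_min_depth(sample))  # 2 2
print(min_depth(skewed), naive_min_depth(skewed))  # 2 1
```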

## There's more to it

Depth-first search was originally formulated as an algorithm that runs on a graph. With proper abstraction and mathematical modelling, however, it can solve a much wider range of problems. The key is to break the original problem into states with a well-defined order of transitions between them; these states can be abstracted as vertices in a graph. DFS then traverses this graph until it reaches the final state.
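The idea can be sketched generically. In this minimal Python sketch the states are assumed to be hashable, and a caller-supplied `neighbors` function enumerates the transitions out of a state; the names `dfs_path`, `neighbors`, `is_goal`, and the toy `moves` graph are hypothetical, chosen only for illustration.

```python
def dfs_path(state, neighbors, is_goal, visited=None):
    """DFS over an implicit state graph.

    Returns a list of states from `state` to the first goal found,
    or None if no goal state is reachable.
    """
    if visited is None:
        visited = set()
    visited.add(state)
    if is_goal(state):
        return [state]
    for nxt in neighbors(state):
        if nxt not in visited:
            rest = dfs_path(nxt, neighbors, is_goal, visited)
            if rest is not None:
                return [state] + rest
    return None


# Toy state graph (hypothetical): from n we may move to 2*n or n+3, staying <= 20.
def moves(n):
    return [m for m in (2 * n, n + 3) if m <= 20]


print(dfs_path(1, moves, lambda n: n == 11))  # [1, 2, 4, 8, 11]
```

Because the graph is implicit, no adjacency list is ever built; the vertices are generated on demand, and the `visited` set keeps the search from exploring a state twice.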

Let's look at an example that, at first glance, we would not likely categorize as a graph problem.

### Number of Islands ([Leetcode 200](https://leetcode.com/problems/number-of-islands/description/))

Given a 2d grid map of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are surrounded by water.

**Sample 1**
```
11110
11010
11000
00000
```
**Output:** 1

**Sample 2**
```
11000
11000
00100
00011
```
**Output:** 3

**Analysis**

The problem asks us to count the islands in a given 0-1 matrix, where two land cells belong to the same island if they are horizontally or vertically adjacent.

One approach is to abstract each cell as a vertex in a graph, each with a value in $\{0, 1\}$. Each "land" cell has an edge to every adjacent "land" cell in the 4 directions (up, down, left, right).

After abstracting the whole matrix as a graph, we repeatedly pick a "land" vertex that has not been colored yet and assign it, together with every vertex directly or indirectly connected to it, a new color.

After scanning the whole graph, we simply count the number of distinct colors used, which equals the number of islands.
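To make the abstraction concrete, the sketch below builds the graph explicitly (vertices are land cells `(i, j)`, edges join 4-directionally adjacent land cells) and then colors each connected component. The helper names `grid_to_graph` and `color_components` are hypothetical; in practice the graph is usually left implicit, as in the solution that follows.

```python
def grid_to_graph(grid):
    """Vertices: land cells (i, j). Edges: 4-directionally adjacent land cells."""
    m = len(grid)
    n = len(grid[0]) if m else 0
    adj = {}
    for i in range(m):
        for j in range(n):
            if grid[i][j] != '1':
                continue  # water cells are not vertices
            adj[(i, j)] = [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < m and 0 <= j + dj < n and grid[i + di][j + dj] == '1'
            ]
    return adj


def color_components(adj):
    """Assign each connected component its own color 0, 1, 2, ..."""
    color = {}

    def paint(u, c):
        color[u] = c
        for v in adj[u]:
            if v not in color:
                paint(v, c)

    next_color = 0
    for u in adj:
        if u not in color:  # an uncolored land vertex starts a new component
            paint(u, next_color)
            next_color += 1
    return color


grid = ["11000", "11000", "00100", "00011"]
color = color_components(grid_to_graph(grid))
print(len(set(color.values())))  # 3
```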

We can think of coloring a connected block of vertices as pouring water into one vertex and letting the water flood through all other reachable vertices. With this picture in mind, we define the state as $f(u, c)$, where $u$ is the vertex we are currently pouring water into and $c$ is the color of the water. This technique is called **flood fill**.

```
func flood_fill(u, c) {
    color[u] = c
    for each v adjacent to u (up, down, left, right)
        if v is land and v has not been colored yet
            flood_fill(v, c)
}

func main_process(matrix) {
    m = matrix.row_number
    n = matrix.column_number
    c = 0
    for i = 1 to m
        for j = 1 to n
            if matrix[i][j] is land and color[matrix[i][j]] == null
                flood_fill(matrix[i][j], c)
                c = c + 1
    return c    // the number of colors used is the number of islands
}
```

```python
class Solution:
    def valid_position(self, x, y):
        # True if (x, y) lies inside the grid
        m = len(self.grid)
        n = len(self.grid[0]) if m else 0
        return 0 <= x < m and 0 <= y < n

    def flood_fill(self, x, y):
        # stop at the border, at water, or at a cell that is already flooded
        if not self.valid_position(x, y) or self.grid[x][y] != '1' or self.visited[x][y]:
            return
        self.visited[x][y] = True
        steps = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
        for (xstep, ystep) in steps:
            x_, y_ = x + xstep, y + ystep
            self.flood_fill(x_, y_)

    def numIslands(self, grid):
        self.grid = grid
        self.visited = [[False for _ in range(len(line))] for line in grid]
        m = len(grid)
        n = len(grid[0]) if m else 0
        lands = 0
        for i in range(m):
            for j in range(n):
                # every land cell not yet flooded starts a new island (a new "color")
                if not self.visited[i][j] and grid[i][j] == '1':
                    self.flood_fill(i, j)
                    lands += 1
        return lands
```

```python
s = Solution()
grid_raw = [
    '11000',
    '11000',
    '00100',
    '00011'
]
# convert each row string into a list of characters
grid = [list(line) for line in grid_raw]
answer = s.numIslands(grid)
print(answer)
```

```
3
```
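One practical caveat: CPython's default recursion limit (see `sys.getrecursionlimit()`) is usually 1000, and the recursive `flood_fill` can recurse roughly once per cell of an island, so a large, mostly-land grid may raise `RecursionError`. A common workaround, swapped in here as a sketch, replaces recursion with an explicit stack. Cells are marked when pushed, so the visiting order differs slightly from the recursive version, which does not matter for counting. The function name `num_islands_iterative` is hypothetical.

```python
def num_islands_iterative(grid):
    """Count islands using an explicit stack instead of recursion."""
    m = len(grid)
    n = len(grid[0]) if m else 0
    visited = [[False] * n for _ in range(m)]
    lands = 0
    for i in range(m):
        for j in range(n):
            if grid[i][j] != '1' or visited[i][j]:
                continue
            lands += 1  # (i, j) starts a new island
            visited[i][j] = True
            stack = [(i, j)]
            while stack:
                x, y = stack.pop()
                for dx, dy in ((-1, 0), (0, 1), (1, 0), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < m and 0 <= ny < n and grid[nx][ny] == '1' and not visited[nx][ny]:
                        visited[nx][ny] = True  # mark on push so each cell is pushed once
                        stack.append((nx, ny))
    return lands


big = ["1" * 300] * 300  # a single 300 x 300 island
print(num_islands_iterative(big))  # 1
```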


### Concatenated Words ([Leetcode 472](https://leetcode.com/problems/concatenated-words/description/))

Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words.
A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

**Note**
1. The number of elements of the given array will not exceed **10,000**.
2. The sum of the lengths of all elements in the given array will not exceed **600,000**.
3. All the input strings will only include lowercase letters.
4. The returned elements order does not matter.

**Sample Input**
`["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]`

**Sample Output**
`["catsdogcats","dogcatsdog","ratcatdogcat"]`

**Analysis**

Clearly, we need an algorithm that checks whether each word is a concatenated word. The **Note** tells us there are at most $n = 10^4$ words. As a rough rule of thumb, assuming a time limit of about one second, we can afford approximately $10^7$ simple operations, which rules out an algorithm whose overall time complexity is $O(n^2)$ or worse.

Therefore, checking an individual word should cost well below $O(n)$ in the number of words, for example $O(\log n)$, so that the total stays around $O(n \log n)$.
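Plugging the bound from the **Note** into this back-of-the-envelope estimate (using the rough $10^7$ budget assumed above):

$$
\begin{aligned}
n^2 &= (10^4)^2 = 10^8 \gg 10^7\\
n \log_2 n &\approx 10^4 \times 13.3 \approx 1.3 \times 10^5 \ll 10^7
\end{aligned}
$$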

The process of validating a word can be broken into 2 subroutines:

1. Check whether there exists an $i$ such that the prefix $word[0 \ldots i]$ is one of the words in the input list.
2. If (1) holds, all that remains is to validate the rest of the word, $word[i+1 \ldots]$, by recursively applying the same process to the remainder.

These steps define a search graph whose initial state, the starting vertex, is the original word. From there we derive the subsequent vertices: each one holds a suffix of the original word and is reached when the corresponding prefix is in the input list.

**Our task then is converted to finding a path describing how the original word can be gradually disassembled using components in the given word list.**
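To see the path itself rather than a yes/no answer, here is a small standalone variant of the same search (the name `decompose` is hypothetical). Each call is one vertex of the search graph, holding the remaining suffix, and each matching prefix is an edge to a shorter suffix; the function returns the components along the first path found.

```python
def decompose(word, words, count=0):
    """Return components from `words` that concatenate to `word` (at least two
    components in total), or None if no such path exists.

    count is the number of components already used on the way to this suffix.
    """
    if not word:
        return [] if count > 1 else None
    for prefix in words:
        if prefix and word.startswith(prefix):
            rest = decompose(word[len(prefix):], words, count + 1)
            if rest is not None:
                return [prefix] + rest
    return None


words = ["cat", "cats", "catsdogcats", "dog", "dogcatsdog",
         "hippopotamuses", "rat", "ratcatdogcat"]
print(decompose("catsdogcats", words))  # ['cats', 'dog', 'cats']
```

The first edge tried from `catsdogcats` is the prefix `cat`, which leads to the dead end `sdogcats`; the search then backtracks and succeeds through `cats`.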

```python
class Solution:
    def dfs(self, word, word_count):
        # An empty remainder means every piece of the original word has been matched.
        # If more than one component was used, we have found a valid path that
        # disassembles the original word.
        if len(word) == 0:
            if word_count > 1:
                return True
            return False
        success = False
        # try every word in the list as a possible prefix
        for prefix in self.words:
            if len(prefix) == 0:
                continue
            if not word.startswith(prefix):
                continue
            length = len(prefix)
            # if the remainder has a path, so does the current word;
            # in that case, stop the loop
            success = self.dfs(word[length:], word_count + 1)
            if success:
                break
        return success

    def findAllConcatenatedWordsInADict(self, words):
        self.words = words
        answer = []
        for word in self.words:
            if self.dfs(word, 0):
                answer.append(word)
        return answer

s = Solution()
words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
answer = s.findAllConcatenatedWordsInADict(words)
print(answer)
```

```
['catsdogcats', 'dogcatsdog', 'ratcatdogcat']
```


The code above produces the correct answer for the sample input, but it has an important issue:

1. 