# Python Algorithms
## Chapter 5 Traversal: The Skeleton Key of Algorithmics
+ traversal: discovering, and later visiting, all the nodes in a graph.   

Finding the connected components of a graph. A graph is connected if there is a path from each node to each of the others, and the connected components are the maximal subgraphs that are connected.

One way of finding a connected component would be to start at some place in the graph and gradually grow a larger connected subgraph until we can't get any further. Let's look at the following related problem. Show that you can order the nodes in a connected graph, $v_1, v_2, \dots, v_n$, so that for any $i = 1, \dots, n$, the subgraph over $v_1, \dots, v_i$ is connected. If we can show this and we can figure out how to do the ordering, we can go through all the nodes in a connected component and know when they're all used up.

We argue by induction on $i$. The base case, $i = 1$, is trivial: a single node is connected. For the induction step we need to get from $i-1$ to $i$. We know that the subgraph over the first $i-1$ nodes is connected. Because there are paths between any pair of nodes, consider a node $u$ in the first $i-1$ nodes and a node $v$ in the remainder. On the path from $u$ to $v$, consider the last node that is in the component we've built so far, as well as the first node outside it. Let's call them $x$ and $y$. Clearly there must be an edge between them, so adding $y$ to the nodes of our growing component keeps it connected, which completes the induction step.
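The induction argument is constructive, so it translates almost directly into code. Keep a set of nodes that have an edge into the prefix built so far (the candidates for $y$), and repeatedly append one of them. This is a sketch only: `connected_order` and `prefixes_connected` are hypothetical helper names, not functions from the book. The checker simply re-verifies the claim for every prefix.

```python
def connected_order(G, start):
    """Order the nodes of start's component as v1, ..., vn so that
    every prefix v1..vi induces a connected subgraph."""
    order, in_order = [start], {start}
    # Nodes outside the prefix that have an edge into it (candidates for y)
    frontier = set(G[start])
    while frontier:
        y = frontier.pop()  # y is adjacent to some x already in the prefix
        order.append(y)
        in_order.add(y)
        frontier.update(v for v in G[y] if v not in in_order)
    return order


def prefixes_connected(G, order):
    """Check that each prefix order[:i] is connected using only its own edges."""
    for i in range(1, len(order) + 1):
        prefix = set(order[:i])
        seen, stack = {order[0]}, [order[0]]
        while stack:
            u = stack.pop()
            for v in (G[u] & prefix) - seen:
                seen.add(v)
                stack.append(v)
        if seen != prefix:
            return False
    return True


G = {
    'a': set('e'), 'b': set('efg'), 'c': set('df'), 'd': set('cf'),
    'e': set('abf'), 'f': set('bcde'), 'g': set('b'),
    '1': set('23'), '2': set('13'), '3': set('12'),
}
order = connected_order(G, 'd')
print(order, prefixes_connected(G, order))
```

The exact order depends on set iteration order. The prefix property does not.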


```python
G = {
    'a':set('e'),
    'b':set('efg'),
    'c':set('df'),
    'd':set('cf'),
    'e':set('abf'),
    'f':set('bcde'),
    'g':set('b'),
    '1':set('23'),
    '2':set('13'),
    '3':set('12')
}
```

```python
def walk(G,start='d'):
    # Generic traversal: ToVisit is a set, so the visiting order is arbitrary
    visited,ToVisit = dict(), set()
    ToVisit.add(start)
    visited[start] = None
    while ToVisit:
        u = ToVisit.pop()
        for v in G[u].difference(visited):
            ToVisit.add(v)
            visited[v] = u  # the parent node of this node in the traversal tree
    return visited
walk(G)
```

```
{'d': None, 'c': 'd', 'f': 'd', 'e': 'f', 'b': 'f', 'a': 'e', 'g': 'b'}
```

```python
def components(G):  # find all connected components
    comp = []
    seen = set()
    for node in G:
        if node in seen:
            continue
        C = walk(G,node)
        seen.update(C)
        comp.append(C)
    return comp
components(G)
```

```
[{'a': None, 'e': 'a', 'f': 'e', 'b': 'e', 'c': 'f', 'd': 'f', 'g': 'b'},
 {'1': None, '2': '1', '3': '1'}]
```

### A Walk in the Park
#### No Cycles Allowed
#### How to Stop Walking in Circles
+ Start walking in any direction, backtracking whenever you come to a dead end or an intersection you have already walked through.
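The walking rule can be simulated directly. Keep the path back to the start (the "thread") on a stack and log every step, including the steps taken while backtracking. This is a sketch: `explore` is a hypothetical name, and neighbours are tried in sorted order only to make the walk reproducible.

```python
def explore(G, start):
    """Return every node the walker steps on, in order, including backtracking steps."""
    visited = {start}
    path = [start]   # the way back to the start
    moves = [start]
    while path:
        u = path[-1]
        nxt = next((v for v in sorted(G[u]) if v not in visited), None)
        if nxt is None:
            path.pop()  # dead end or already explored: backtrack one step
            if path:
                moves.append(path[-1])
        else:
            visited.add(nxt)
            path.append(nxt)
            moves.append(nxt)
    return moves


tiny = {'a': {'b', 'c'}, 'b': {'a'}, 'c': {'a'}}
print(explore(tiny, 'a'))  # ['a', 'b', 'a', 'c', 'a']
```

Each edge the walker uses to reach a new node is walked exactly twice, once in each direction. That is why the walk ends where it started.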

```python
G = {
    'a':set('e'),
    'b':set('efg'),
    'c':set('df'),
    'd':set('cf'),
    'e':set('abf'),
    'f':set('bcde'),
    'g':set('b')
}
```

```python
def DFS_rec(G,start,visited=None):
    # A mutable default (visited=[]) would be shared between calls, so create it here
    if visited is None:
        visited = []
    visited.append(start)
    print(visited)
    for n in G[start]:
        if n in visited:
            continue
        DFS_rec(G,n,visited)
DFS_rec(G,'b')
```

```
['b']
['b', 'e']
['b', 'e', 'a']
['b', 'e', 'a', 'f']
['b', 'e', 'a', 'f', 'd']
['b', 'e', 'a', 'f', 'd', 'c']
['b', 'e', 'a', 'f', 'd', 'c', 'g']
```


#### Go Deep!

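```python
def DFS_it(G,start):
    visited, ToVisit = set(),[]
    ToVisit.append(start)
    while ToVisit:
        u = ToVisit.pop()  # a list used as a stack (LIFO) gives depth-first order
        if u in visited:
            continue  # a node may be pushed several times; only the first pop counts
        visited.add(u)
        ToVisit.extend(G[u])
        yield u
list(DFS_it(G,'a'))
```

```
['a', 'e', 'b', 'g', 'f', 'c', 'd']
```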

#### Depth-First Timestamps and Topological Sorting

```python
def DFS_time(G,d=None,f=None,start='a',visited=None,t=0):  # DFS with timestamps
    # Fresh containers per top-level call (mutable defaults would be shared between calls)
    d = {} if d is None else d
    f = {} if f is None else f
    visited = set() if visited is None else visited
    visited.add(start)
    d[start] = t  # discovery time
    t += 1
    print('d:{}'.format(d))
    for n in G[start]:
        if n in visited:
            continue
        t = DFS_time(G,d,f,start=n,visited=visited,t=t)
    f[start] = t  # finish time
    t += 1
    print('f:{}'.format(f))
    return t  # next free timestamp
DFS_time(G)
```

```
d:{'a': 0}
d:{'a': 0, 'e': 1}
d:{'a': 0, 'e': 1, 'f': 2}
d:{'a': 0, 'e': 1, 'f': 2, 'd': 3}
d:{'a': 0, 'e': 1, 'f': 2, 'd': 3, 'c': 4}
f:{'c': 5}
f:{'c': 5, 'd': 6}
d:{'a': 0, 'e': 1, 'f': 2, 'd': 3, 'c': 4, 'b': 7}
d:{'a': 0, 'e': 1, 'f': 2, 'd': 3, 'c': 4, 'b': 7, 'g': 8}
f:{'c': 5, 'd': 6, 'g': 9}
f:{'c': 5, 'd': 6, 'g': 9, 'b': 10}
f:{'c': 5, 'd': 6, 'g': 9, 'b': 10, 'f': 11}
f:{'c': 5, 'd': 6, 'g': 9, 'b': 10, 'f': 11, 'e': 12}
f:{'c': 5, 'd': 6, 'g': 9, 'b': 10, 'f': 11, 'e': 12, 'a': 13}

14
```

```python
def DFS_topsort(G):
    # This can be used to sort the nodes of a general graph by decreasing finish times,
    # when looking for strongly connected components
    visited,res=set(),[]
    def rec(u):
        if u in visited:
            return
        visited.add(u)
        for v in G[u]:
            rec(v)
        res.append(u) #finished exploring its children, add to res
    for n in G:
        rec(n)
    res.reverse()
    return res
print(DFS_topsort(G))
```

```
['a', 'e', 'f', 'b', 'g', 'd', 'c']
```


#### Infinite Mazes and Shortest (Unweighted) Paths
+ If we're looking for the shortest paths (disregarding edge weights, for now) from our start node to all the others, DFS will, most likely, give us the wrong answer, because it may reach a node along a long detour before it ever tries a shorter route.
+ One fix is iterative deepening depth-first search, or IDDFS, which simply consists of running a depth-constrained DFS with an iteratively incremented depth limit.
+ There is really only one situation where IDDFS would be preferable over BFS: when searching a huge tree (or some state space “shaped” like a tree). Because there are no cycles, we don’t need to remember which nodes we’ve visited, which means that IDDFS needs only store the path back to the starting node. BFS, on the other hand, must keep the entire fringe in memory (as its queue), and as long as there is some branching, this fringe will grow exponentially with the distance to the root. In other words, in these cases IDDFS can save a significant amount of memory, with little or no asymptotic slowdown.
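Here is a minimal sketch of that tree case. It assumes a made-up, infinite state space in which `children(n)` returns `n + 1` and `2 * n`. This space is not strictly a tree, because some numbers can be reached in several ways. As in the text, though, no visited set is kept: only the current path is stored. Because the depth limit grows one step at a time, the first path found uses the fewest steps. `depth_limited`, `iddfs` and `children` are illustrative names, and `max_depth` is an assumed safety cap.

```python
def depth_limited(children, node, goal, limit, path):
    """DFS at most `limit` steps below node; return the path to goal, or None."""
    path.append(node)
    if node == goal:
        return list(path)
    if limit > 0:
        for child in children(node):
            found = depth_limited(children, child, goal, limit - 1, path)
            if found is not None:
                return found
    path.pop()  # backtrack: memory use is just the current path
    return None


def iddfs(children, root, goal, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until goal is found."""
    for limit in range(max_depth + 1):
        found = depth_limited(children, root, goal, limit, [])
        if found is not None:
            return found
    return None


def children(n):
    return (n + 1, 2 * n)


print(iddfs(children, 1, 10))  # [1, 2, 4, 5, 10]
```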

In [53]:
def idDFS(G,start):
    yielded = set() #visited
    def rec(G,s,depth,visited = set()):
        if s not in yielded:
            yield s
            yielded.add(s)
        if depth == 0: 
            return  # max depth reached
        visited.add(s)
        for u in G[s]:
            if u in visited:
                continue
            for v in rec(G,u,depth=depth-1,visited=visited):
                yield v
    n = len(G)
    for d in range(n):
        if len(yielded) == n: #all nodes visited
            break
        for u in rec(G,start,d):
            yield u
list(idDFS(G,'a'))

['a', 'e', 'f', 'b']

In [58]:
from collections import deque
def bfs(G,start):
    visited,ToVisit = {start:None},deque([start])
    while ToVisit:
        u = ToVisit.popleft()
        for v in G[u]:
            if v in visited:
                continue
            visited[v] = u
            ToVisit.append(v)
    return visited
bfs(G,'b')

{'b': None, 'e': 'b', 'f': 'b', 'g': 'b', 'a': 'e', 'd': 'f', 'c': 'f'}

### Strongly Connected Components
+ A connected component is a maximal subgraph where all nodes can reach each other if you ignore edge directions (or if the graph is undirected). To get strongly connected components, though, you need to follow the edge directions; so, SCCs are the maximal subgraphs where there is a directed path from any node to any other.
+ In fact, in general, if there is an edge from any strong component X to another strong component Y, the last finish time in X will be later than the latest in Y.
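Both statements can be checked mechanically on a small graph. The sketch below first computes SCCs straight from the definition, using mutual reachability. This is quadratic and only meant as a cross-check. It then tests the finish-time claim on every edge that crosses between components. The helper names are illustrative, and the graph is the one used in the next cell.

```python
def reachable(G, s):
    """All nodes reachable from s by following edge directions."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in G[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def scc_by_definition(G):
    """u and v share an SCC iff each is reachable from the other."""
    reach = {u: reachable(G, u) for u in G}
    comps, assigned = [], set()
    for u in G:
        if u not in assigned:
            comp = {v for v in reach[u] if u in reach[v]}
            assigned |= comp
            comps.append(comp)
    return comps


def finish_times(G):
    """Finish order (0, 1, 2, ...) of a DFS restarted until all nodes are visited."""
    f, seen = {}, set()

    def rec(u):
        seen.add(u)
        for v in sorted(G[u]):
            if v not in seen:
                rec(v)
        f[u] = len(f)  # u finishes after everything it reached

    for u in sorted(G):
        if u not in seen:
            rec(u)
    return f


def finish_property_holds(G):
    """For every edge X -> Y between SCCs, the latest finish in X beats the latest in Y."""
    comps = scc_by_definition(G)
    comp_of = {v: i for i, c in enumerate(comps) for v in c}
    f = finish_times(G)
    latest = [max(f[v] for v in c) for c in comps]
    return all(latest[comp_of[u]] > latest[comp_of[v]]
               for u in G for v in G[u] if comp_of[u] != comp_of[v])


G = {
    'a': set('bc'), 'b': set('dei'), 'c': set('d'), 'd': set('ah'),
    'e': set('f'), 'f': set('g'), 'g': set('eh'), 'h': set('i'), 'i': set('h'),
}
print(sorted(sorted(c) for c in scc_by_definition(G)))
print(finish_property_holds(G))
```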

```python
G = {
    'a':set('bc'),
    'b':set('dei'),
    'c':set('d'),
    'd':set('ah'),
    'e':set('f'),
    'f':set('g'),
    'g':set('eh'),
    'h':set('i'),
    'i':set('h'),
}
```

```python
def walk(G,start='d',S=frozenset()):  # S: nodes to stay out of (e.g., already assigned)
    visited,ToVisit = dict(), set()
    ToVisit.add(start)
    visited[start] = None
    while ToVisit:
        u = ToVisit.pop()
        for v in G[u].difference(visited,S):
            ToVisit.add(v)
            visited[v] = u  # the parent node of this node in the traversal tree
    return visited
```

```python
DFS_topsort(G)
```

```
['a', 'b', 'e', 'f', 'g', 'c', 'd', 'h', 'i']
```

```python
def tr(G):  # reverse all edges of G
    GT = {}
    for u in G:
        GT[u] = set()
    for u in G:
        for v in G[u]:
            GT[v].add(u)
    return GT
def scc(G):  # Kosaraju's algorithm
    GT = tr(G)
    comps,seen = [],set()
    for u in DFS_topsort(G):  # nodes by decreasing finish time
        if u in seen:
            continue
        C = walk(GT,start=u,S=seen)  # don't walk into components already found
        seen.update(C)
        comps.append(C)
    return comps
scc(G)
```

```
[{'a': None, 'd': 'a', 'c': 'd', 'b': 'd'},
 {'e': None, 'g': 'e', 'f': 'g'},
 {'h': None, 'i': 'h'}]
```

### Exercises
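1. In the components function in Listing 5-2, the set of seen nodes is updated with an entire component at a time. Another option would be to add the nodes one by one inside `walk`. How would that be different (or, perhaps, not so different)?   
   Not very different. Every node is still added to `seen` exactly once, so the total work stays linear in the size of the graph. The change is mostly in constant factors (one set update per node instead of one bulk update per component), and `walk` needs access to `seen`.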
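A sketch of the alternative, for comparison. Here the walk marks nodes in the shared `seen` set as it discovers them, so it no longer needs a private visited set. `walk_marking` and `components_incremental` are hypothetical names.

```python
def walk_marking(G, start, seen):
    """Traverse from start, adding each node to the shared `seen` set on discovery."""
    seen.add(start)
    tree = {start: None}  # node -> parent in the traversal tree
    todo = [start]
    while todo:
        u = todo.pop()
        for v in G[u]:
            if v not in seen:
                seen.add(v)
                tree[v] = u
                todo.append(v)
    return tree


def components_incremental(G):
    seen, comps = set(), []
    for node in G:
        if node not in seen:
            comps.append(walk_marking(G, node, seen))
    return comps


G = {
    'a': set('e'), 'b': set('efg'), 'c': set('df'), 'd': set('cf'),
    'e': set('abf'), 'f': set('bcde'), 'g': set('b'),
    '1': set('23'), '2': set('13'), '3': set('12'),
}
print(components_incremental(G))
```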
2. If you're faced with a graph where each node has an even degree, how would you go about finding an Euler tour?

```python
G = {
    'a':set('bcde'),
    'b':set('ae'),
    'c':set('af'),
    'd':set('ae'),
    'e':set('abdf'),
    'f':set('ce'),
}
```

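```python
def isBridge(G,u,v):
    # The edge u-v is a bridge if, once it is removed, v can no longer be reached from u
    G[u].remove(v)
    G[v].remove(u)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in G[x] - seen:
            seen.add(y)
            stack.append(y)
    G[u].add(v)  # put the edge back
    G[v].add(u)
    return v not in seen
def FindEuler(G,start='a'):
    # Fleury's rule: never cross a bridge unless it is the only edge left.
    # Every edge is used up exactly once, so work on a copy of G.
    G = {u: set(nbrs) for u, nbrs in G.items()}
    tour, u = [start], start
    while G[u]:  # stop when no unused edges remain at the current node
        nbrs = sorted(G[u])  # sorted only to make the output reproducible
        v = next((n for n in nbrs if not isBridge(G,u,n)), nbrs[0])
        G[u].remove(v)
        G[v].remove(u)
        tour.append(v)
        u = v
    print('-'.join(tour))
FindEuler(G)
```

```
a-b-e-a-c-f-e-d-a
```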


3. If every node in a directed graph has the same in-degree as out-degree, you could find a directed Euler tour. Why is that? How would you go about it, and how is this related to Trémaux's algorithm?
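Equal in- and out-degree means that whenever a walk enters a node along an unused edge, an unused edge is left to leave by. A walk can therefore only get stuck back at its starting node. The sketch below uses Hierholzer's algorithm, which is swapped in here and may not be the book's intended answer. It walks along unused edges until stuck, then backs up and splices in further closed walks. That backing up resembles the backtracking in Trémaux's algorithm. Assumptions: the graph is given as `{node: list of successors}`, and all edges belong to one strongly connected piece. `euler_circuit` is an illustrative name.

```python
def euler_circuit(G, start):
    """Hierholzer's algorithm: an Euler circuit of a directed graph, as a node list."""
    out = {u: list(vs) for u, vs in G.items()}  # outgoing edges not yet used
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())   # follow an unused edge
        else:
            circuit.append(stack.pop())  # stuck: back up, emitting u
    circuit.reverse()
    return circuit


G = {'a': ['b'], 'b': ['c', 'a'], 'c': ['b']}
print(euler_circuit(G, 'a'))  # ['a', 'b', 'c', 'b', 'a']
```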
4. One basic operation in image processing is the so-called flood fill, where a region in an image is filled with a single color. In painting applications (such as GIMP or Adobe Photoshop), this is typically done with a paint bucket tool. How would you implement this sort of filling?   
   Traverse the grid from the clicked pixel, treating adjacent pixels of the same color as neighbours, and recolor every pixel you reach.
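A minimal BFS sketch, assuming the image is a list of rows of color values and that only the four horizontal and vertical neighbours count as adjacent. Recoloring a pixel as soon as it is queued doubles as the visited set. `flood_fill` is an illustrative name, and it modifies the image in place.

```python
from collections import deque


def flood_fill(image, row, col, new_color):
    """Recolor the 4-connected region of same-colored pixels containing (row, col)."""
    old = image[row][col]
    if old == new_color:
        return image  # nothing to do (and avoids looping forever)
    h, w = len(image), len(image[0])
    image[row][col] = new_color
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and image[nr][nc] == old:
                image[nr][nc] = new_color  # recolor now = mark as visited
                queue.append((nr, nc))
    return image


print(flood_fill([[1, 1, 0], [1, 0, 0], [0, 0, 1]], 0, 0, 2))
# [[2, 2, 0], [2, 0, 0], [0, 0, 1]]
```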
5. In Greek mythology, when Ariadne helped Theseus overcome the Minotaur and escape the labyrinth, she gave him a ball of fleece thread so he could find his way out again. But what if Theseus forgot to fasten the thread outside on his way in and remembered the ball only once he was thoroughly lost—what could he use it for then?
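6. In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?  
   It is implicit in the stack. When the loop pops a node that was pushed earlier by a shallower node, the traversal jumps back up the tree; that pop is the backtracking step.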
7. Write a nonrecursive version of DFS that can determine finish times.

```python
def DFS_ittime(G,start='a'):  # iterative DFS with discovery (d) and finish (f) times
    visited,ToVisit = set(),[]
    ToVisit.append(start)
    d,f=dict(),dict()
    finished = set()  # a set gives constant-time membership tests
    t=0
    while ToVisit:
        u = ToVisit.pop()
        if u in finished:
            continue
        if u in visited:
            # Second pop of u: its children were pushed above it and have been explored
            finish = True
            for n in G[u]:
                if n not in visited:
                    finish = False
            if finish:
                f[u] = t
                t += 1
                finished.add(u)
            continue
        visited.add(u)
        d[u] = t  # discovery time
        t+= 1
        finish = True
        ToVisit.append(u)  # push u back so it is popped again after its children
        for n in G[u]:
            if n not in visited:
                ToVisit.append(n)
                finish = False
        if finish:  # no unvisited neighbours: u finishes right away
            f[u] = t
            t += 1
            finished.add(u)
    print(d)
    print(f)
DFS_ittime(G)
```

```
{'a': 0, 'e': 1, 'd': 2, 'f': 4, 'c': 5, 'b': 8}
{'d': 3, 'c': 6, 'f': 7, 'b': 9, 'e': 10, 'a': 11}
```
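A more direct alternative keeps, for every node on the stack, an iterator over its remaining neighbours. A node then finishes exactly when its iterator runs out, with no need to re-check its neighbours. This is a sketch under the same adjacency-set representation. `dfs_times` is an illustrative name, and neighbours are sorted only so the timestamps are reproducible.

```python
def dfs_times(G, start):
    """Iterative DFS returning discovery (d) and finish (f) timestamps."""
    d, f, t = {start: 0}, {}, 1
    stack = [(start, iter(sorted(G[start])))]  # (node, its remaining neighbours)
    while stack:
        u, nbrs = stack[-1]
        for v in nbrs:  # resumes where u left off last time
            if v not in d:
                d[v] = t  # discovery
                t += 1
                stack.append((v, iter(sorted(G[v]))))
                break
        else:  # iterator exhausted: all of u's neighbours are explored
            stack.pop()
            f[u] = t  # finish
            t += 1
    return d, f


G = {'a': set('bcde'), 'b': set('ae'), 'c': set('af'),
     'd': set('ae'), 'e': set('abdf'), 'f': set('ce')}
print(dfs_times(G, 'a'))
```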


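8. In `dfs_topsort` (Listing 5-8), a recursive DFS is started from every node (although it terminates immediately if the node has already been visited). How can we be sure that we will get a valid topological sorting, even though the order of the start nodes is completely arbitrary?   
   If `u` must go before `v`, there is a path from `u` to `v`. If DFS discovers `v` first, it can never reach `u` from `v` (that would close a cycle), so `v` finishes before `u` is even discovered. If it discovers `u` first, `v` is discovered and finished inside `u`'s call. Either way `v` finishes before `u`, so reversing the finish order puts `u` first.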
9. Write a version of DFS where you have hooks (overridable functions) that let the user perform custom processing in pre- and postorder.
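One way to do it, as a sketch: a small base class whose `pre` and `post` methods do nothing and are meant to be overridden. The class names `DFS` and `Orders` are illustrative. The example subclass records preorder and postorder, and reversed postorder on a DAG gives a topological sort.

```python
class DFS:
    """Recursive DFS whose pre- and postorder steps are overridable hooks."""

    def __init__(self, G):
        self.G = G
        self.visited = set()

    def pre(self, u):   # hook: called when u is discovered
        pass

    def post(self, u):  # hook: called when u is finished
        pass

    def run(self, start):
        self.visited.add(start)
        self.pre(start)
        for v in sorted(self.G[start]):  # sorted only for a reproducible order
            if v not in self.visited:
                self.run(v)
        self.post(start)


class Orders(DFS):
    """Example subclass: record preorder and postorder."""

    def __init__(self, G):
        super().__init__(G)
        self.preorder, self.postorder = [], []

    def pre(self, u):
        self.preorder.append(u)

    def post(self, u):
        self.postorder.append(u)


dag = {'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}, 'd': set()}
o = Orders(dag)
o.run('a')
print(o.preorder, o.postorder[::-1])  # ['a', 'b', 'd', 'c'] ['a', 'c', 'b', 'd']
```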
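10. Show that if (and only if) DFS finds no back edges, the graph being traversed is acyclic.   
    If there is a cycle, let $u$ be the first node on it that DFS discovers. Every other node on the cycle is reachable from $u$ and still undiscovered, so each is discovered and finished inside $u$'s call, including $u$'s predecessor $w$ on the cycle. When DFS examines the edge $(w, u)$, $u$ is still on the stack, so that edge is a back edge. Conversely, a back edge $(w, u)$ together with the tree path from $u$ down to $w$ forms a cycle.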
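The argument translates into the usual three-state check for directed graphs, sketched here. A node is white until discovered, gray while it is on the recursion stack, and black once finished. An edge into a gray node is a back edge. `has_cycle` is an illustrative name.

```python
def has_cycle(G):
    """Return True if the directed graph G has a cycle, i.e., if DFS finds a back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2  # undiscovered, on the stack, finished
    color = {u: WHITE for u in G}

    def visit(u):
        color[u] = GRAY
        for v in G[u]:
            if color[v] == GRAY:  # edge to an ancestor still on the stack
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    # Restart from every still-white node so all components are covered
    return any(color[u] == WHITE and visit(u) for u in G)


print(has_cycle({'a': {'b'}, 'b': {'c'}, 'c': {'a'}}))                           # True
print(has_cycle({'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}, 'd': set()}))          # False
```

In the second graph, the edge `c -> d` reaches an already finished (black) node. That is a cross or forward edge, not a back edge, so no cycle is reported.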
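11. What challenges would you face if you wanted to use other traversal algorithms than DFS to look for cycles in directed graphs? Why don't you face these challenges in undirected graphs?  
    Other traversals, such as BFS, don't keep track of the current path. When an edge leads to an already visited node, you can't tell whether it closes a cycle (a back edge) or just reaches a node first found along another branch (a forward or cross edge). In undirected graphs the distinction doesn't matter: any edge to an already visited node, other than the edge you arrived by, closes a cycle.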
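12. If you run DFS in an undirected graph, you won't have any forward or cross edges. Why is that?   
    Take any edge $\{u, v\}$ and say DFS discovers $u$ first. Then $v$ is discovered before $u$ finishes, so $v$ is a descendant of $u$. Either the edge is used to discover $v$ (a tree edge), or $v$ is reached some other way, and the edge is first examined from $v$'s side, pointing back to its ancestor $u$ (a back edge). By the time the edge could look like a forward or cross edge from $u$'s side, it has already been traversed and classified.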
13. Write a version of BFS that finds the distances from the start node to each of the others, rather than the actual paths.

```python
from collections import deque
def bfs_dis(G,start):
    dist,ToVisit = {start:0},deque([start])  # distance from start, counted in edges
    while ToVisit:
        u = ToVisit.popleft()
        for v in G[u]:
            if v in dist:
                continue
            dist[v] = dist[u] + 1  # one step further than the node it was found from
            ToVisit.append(v)
    return dist
bfs_dis(G,'b')
```

```
{'b': 0, 'a': 1, 'e': 1, 'c': 2, 'd': 2, 'f': 2}
```

14. As mentioned in Chapter 4, a graph is called bipartite if you can partition the nodes into two sets so that no neighbors are in the same set. Another way of thinking about this is that you're coloring each node either black or white (for example) so that no neighbors get the same color. Show how you'd find such a bipartition (or two-coloring), if one exists, for any undirected graph.    
    Traverse the graph, giving each newly discovered node the opposite color of the node it was discovered from. If some edge ever connects two nodes of the same color, the graph contains an odd cycle and no bipartition exists.
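A BFS sketch of that idea, restarting in every connected component. `two_color` is an illustrative name. It returns a mapping from node to color 0 or 1, or `None` when no two-coloring exists.

```python
from collections import deque


def two_color(G):
    """Return {node: 0 or 1} with no edge inside a color class, or None if impossible."""
    color = {}
    for s in G:  # restart in every connected component
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in G[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # opposite color of its discoverer
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: not bipartite
    return color


square = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
triangle = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
print(two_color(square), two_color(triangle))
```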
15. If you reverse all the edges of a directed graph, the strongly connected components remain the same. Why is that?
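16. Let $X$ and $Y$ be two strongly connected components of the same graph, $G$. Assume that there is at least one edge from $X$ to $Y$. If you run DFS on $G$ (restarting as needed, until all nodes have been visited), the latest finish time in $X$ will always be later than the latest in $Y$. Why is that?   
    If DFS enters $X$ first, all of $Y$ is reachable from that first node of $X$, so every node in $Y$ is discovered and finished before that node finishes. If DFS enters $Y$ first, it cannot reach $X$ from $Y$ (otherwise $X$ and $Y$ would be a single component), so all of $Y$ is finished before $X$ is even discovered. Either way, the latest finish time in $X$ beats every finish time in $Y$.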
17. In Kosaraju’s algorithm, we find starting nodes for the final traversal by descending finish times from an initial DFS, and we perform the traversal in the transposed graph (that is, with all edges reversed). Why couldn’t we just use ascending finish times in the original graph?

```python
G = {
    'a':set('bc'),
    'b':set('a'),
    'c':set('ad'),
    'd':set('c'),
}
```
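A small counterexample, sketched under assumed adjacency lists so the DFS order is fixed. In it, `b` finishes first, but its component `{a, b}` has an edge out to `c`. Walking the original graph from `b` therefore leaks into `c` and merges two components. By exercise 16, the node with the latest finish time lies in a component with no incoming edges from other components. In the transposed graph, that component has no outgoing edges to other components, so the walk cannot leak. The earliest finish time carries no such guarantee. The helper names are illustrative.

```python
def finish_order(G):
    """Nodes in ascending order of DFS finish time (restarting as needed)."""
    seen, order = set(), []

    def rec(u):
        seen.add(u)
        for v in G[u]:
            if v not in seen:
                rec(v)
        order.append(u)  # u finishes

    for u in G:
        if u not in seen:
            rec(u)
    return order


def reach(G, s, exclude):
    """Nodes reachable from s without entering `exclude`."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in G[u]:
            if v not in seen and v not in exclude:
                seen.add(v)
                stack.append(v)
    return seen


def components_from(G, order):
    comps, assigned = [], set()
    for u in order:
        if u not in assigned:
            c = reach(G, u, assigned)
            assigned |= c
            comps.append(c)
    return comps


G = {'a': ['b', 'c'], 'b': ['a'], 'c': []}
GT = {'a': ['b'], 'b': ['a'], 'c': ['a']}  # all edges of G reversed
order = finish_order(G)                    # ['b', 'c', 'a']
print(components_from(G, order))           # ascending, original graph: one blob, wrong
print(components_from(GT, order[::-1]))    # descending, transposed graph: correct
```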