**Exercise:** In my implementation of `reachable_nodes`, you might be bothered by the apparent inefficiency of adding *all* neighbors to the stack without checking whether they are already in `seen`.  Write a version of this function that checks the neighbors before adding them to the stack.  Does this "optimization" change the order of growth?  Does it make the function faster?

The following functions from the textbook are included for testing purposes only.

In [10]:
import networkx as nx

def all_pairs(nodes):
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            if i < j:
                yield u, v

def make_complete_graph(n):
    G = nx.Graph()
    nodes = range(n)
    
    G.add_nodes_from(nodes)
    G.add_edges_from(all_pairs(nodes))
    return G

def reachable_nodes(G, start):
    seen = set()
    stack = [start]

    while stack:
        node = stack.pop()
        
        if node not in seen:
            seen.add(node)
            stack.extend(G.neighbors(node))
            
    return seen

complete = make_complete_graph(10)

The following function is a modified `reachable_nodes` that checks each neighbor against `seen` before pushing it onto the stack.

In [11]:
def reachable_nodes_precheck(G, start):
    seen = set()
    stack = [start]
    
    while stack:
        node = stack.pop()
        
        if node not in seen:
            seen.add(node)
            
            neighbors = G.neighbors(node)
            
            for k in neighbors:
                if k not in seen:       # Push only neighbors that have not been visited yet
                    stack.append(k)
                
    return seen

In [15]:
print(reachable_nodes(complete, 0), reachable_nodes_precheck(complete, 0))

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


This shows that `reachable_nodes` and the modified `reachable_nodes_precheck` return the same result for this graph.

In [16]:
%timeit len(reachable_nodes(complete, 0))
%timeit len(reachable_nodes_precheck(complete, 0))

11.8 µs ± 87.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.6 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


The modified version checks whether each node in `G.neighbors(node)` is already in `seen` before pushing it onto the stack. Because `seen` is a set, each check takes constant time on average; it is not a linear search. The check runs once per neighbor of each visited node, so it adds work proportional to the number of edges $m$, and the order of growth is $O(n + m + m) = O(n + 2m) = O(n + m)$, the same as the original. As was explained in exercise 2.2, two algorithms with the same leading term are equivalent in this sense, and as is noted in **Think Python**, which of them is faster depends on the details. In my case, `reachable_nodes` has a run time of 11.8 µs ± 87.5 ns per loop while `reachable_nodes_precheck` has a run time of 12.6 µs ± 34.3 ns per loop. On this 10-node complete graph, the extra check added about 0.8 µs of run time rather than saving any.
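One way to see why the order of growth does not change is to count how many times each version pushes a node onto the stack. The sketch below is only an illustration, not the textbook's code: `make_complete_adjacency` and `count_pushes` are hypothetical helpers, and a plain dictionary of adjacency lists stands in for the NetworkX graph so that the example runs without extra dependencies.

```python
def make_complete_adjacency(n):
    """Adjacency lists for a complete graph on nodes 0..n-1 (stand-in for nx.Graph)."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


def count_pushes(adj, start, precheck):
    """Run the stack-based search and count pushes made inside the loop.

    With precheck=False this mirrors reachable_nodes; with precheck=True
    it mirrors reachable_nodes_precheck. The initial `start` entry is not counted.
    """
    seen = set()
    stack = [start]
    pushes = 0

    while stack:
        node = stack.pop()

        if node not in seen:
            seen.add(node)

            for k in adj[node]:
                # Optional precheck: skip neighbors that were already visited
                if precheck and k in seen:
                    continue
                stack.append(k)
                pushes += 1

    return seen, pushes


adj = make_complete_adjacency(10)
seen_plain, pushes_plain = count_pushes(adj, 0, precheck=False)
seen_pre, pushes_pre = count_pushes(adj, 0, precheck=True)

print(pushes_plain, pushes_pre)
```

For the 10-node complete graph this prints `90 45`: without the check every visited node pushes all 9 of its neighbors, and with the check the total drops to the number of edges, $10 \cdot 9 / 2 = 45$. Both counts are proportional to $m$, so the precheck changes the constant factor but not the order of growth, and the time saved on pushes has to be weighed against the cost of the additional set lookups.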