Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Lars Janssen"

---

For those not familiar with Python, a quick overview is given [here](https://github.com/palcu/python-for-competitive-programming/blob/master/python-for-competitive-programming.ipynb).

# Notebook BAPC week 11: Toposort

## A general remark on reading input
When reading *large* amounts of data, `input()` is sometimes too slow.
A much faster alternative is to `import sys` and replace every `input()` call with `sys.stdin.readline()`. Note that a line returned by `readline` ends in a newline character `\n`; this doesn't matter when parsing integers with `int()`, but strip it when reading strings. See [this link](https://stackoverflow.com/a/58537094/12354474) for details.
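
As a minimal sketch of the remark above: the helpers `read_ints` and `read_string` are hypothetical names introduced only for illustration, and the input is emulated with `StringIO` so the cell runs without a real input stream.

```python
import sys
from io import StringIO

def read_ints(stream=sys.stdin):
    """Read one line and parse it as whitespace-separated integers."""
    # split() discards the trailing newline together with other whitespace.
    return [int(token) for token in stream.readline().split()]

def read_string(stream=sys.stdin):
    """Read one line and drop the trailing newline left by readline()."""
    return stream.readline().rstrip("\n")

# Emulated input; in a real submission the default sys.stdin would be used.
demo = StringIO("3 4\nhello world\n")
numbers = read_ints(demo)
text = read_string(demo)
print(numbers, repr(text))
```

Stripping only `"\n"` keeps any leading or trailing spaces that belong to the string itself; use `strip()` instead if those should go as well.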

## Prologue: graphs with vertices represented as strings
In the lecture of week 4, we briefly commented on examples of graphs where the vertices are not *numbers* from 0 to $V-1$, but rather strings. This week, you will almost certainly come across a problem of this form (for instance [Build Dependencies](https://open.kattis.com/problems/builddeps)). For this problem, the easiest solution will be to store "adjacency dictionaries" instead of the normal adjacency lists, like so:

In [2]:
# The following code allows us to emulate command line I/O.
from io import StringIO
from sys import stdin
# Overwrite the jupyter input function.
def input():
    return stdin.readline()

# Load the example graph from the Kattis problem "builddeps".
stdin = StringIO("""6
gmp:
solution: set map queue
base:
set: base gmp
map: base gmp
queue: base
gmp""")

V = int(input())
adj = {}
for _ in range(V):
    vertex, parents = input().split(":")
    parents = parents.strip()
    if len(parents) == 0:
        adj[vertex] = []
    else:
        adj[vertex] = parents.split(" ")

assert adj == {'gmp' : [], 'solution' : ['set', 'map', 'queue'],
               'base' : [], 'set' : ['base', 'gmp'],
               'map': ['base', 'gmp'], 'queue' : ['base']}
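
If you would rather reuse code written for integer vertices, one option is to number the string vertices first. The sketch below assumes an adjacency dictionary shaped like `adj` above; the helper name `index_vertices` is hypothetical and not part of the lecture material.

```python
def index_vertices(adj):
    """Translate an adjacency dict with string vertices into integer adjacency lists."""
    names = list(adj)                                 # id -> name, in insertion order
    ids = {name: i for i, name in enumerate(names)}   # name -> id
    adjlist = [[ids[neighbour] for neighbour in adj[name]] for name in names]
    return names, ids, adjlist

# The parents dictionary of the "builddeps" example.
adj = {"gmp": [], "solution": ["set", "map", "queue"], "base": [],
       "set": ["base", "gmp"], "map": ["base", "gmp"], "queue": ["base"]}
names, ids, parents = index_vertices(adj)
print(parents)
```

Keeping `names` around lets you translate the integer answer back into strings before printing it.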

## Exercise 0: Kahn's algorithm
A topological sort of a Directed Acyclic Graph (DAG) is an ordering of its vertices, such that for every edge $u \to v$, the vertex $u$ comes before $v$ in the ordering. There are two popular algorithms for computing a topological sort of a DAG. Today we will focus on *Kahn's algorithm*.

The code below is Kahn's algorithm as described in the slides.

In [3]:
def kahn(parents, children):
    """ Performs Kahn's Algorithm given the incoming and outgoing adjlists. """
    n = len(parents)
    assert n == len(children)
    num_parents = [len(l) for l in parents]
    stack = [i for i in range(n) if num_parents[i] == 0]
    toposort = [] # will contain the toposort ordering

    while len(stack) > 0:
        cur = stack.pop()
        toposort.append(cur)
        for child in children[cur]:
            num_parents[child] -= 1
            if num_parents[child] == 0:
                stack.append(child)
    return toposort

In [4]:
def slides_example():
    parents = [[1, 2], [3], [3], [], [0, 2, 5], []]
    children = [[4], [0], [0, 4], [1, 2], [], [4]]
    return parents, children
print("A toposort for the graph from the lecture is", kahn(*slides_example()))

A toposort for the graph from the lecture is [5, 3, 2, 1, 0, 4]


For the following graph, we can also compute a toposort, but it will be different from the one pictured:
<img src="https://i1.wp.com/algorithms.tutorialhorizon.com/files/2018/03/Topological-Sort.png?w=750&ssl=1">

In [5]:
def notebook_example():
    parents = [[1], [2, 3], [5], [6], [5, 6], [7], [7], []]
    children = [[], [0], [1], [1], [], [2, 4], [3, 4], [5, 6]]
    return parents, children
print("A toposort for this graph is", kahn(*notebook_example()))

A toposort for this graph is [7, 6, 3, 5, 4, 2, 1, 0]


## Exercise 1: generating input for Kahn
Kahn's algorithm as implemented above takes two adjacency lists as input: one for incoming edges, and one for outgoing edges. Use the cell below to write a function that produces these two adjacency lists from an edge list.

In [14]:
def outgoing_to_incoming_adjlist(outgoing_adjlist):
    incoming_adjlist = [[] for _ in outgoing_adjlist]
    for u, lst in enumerate(outgoing_adjlist):
        for v in lst:
            incoming_adjlist[v].append(u)
    return incoming_adjlist

def edgelist_to_adjlists(num_vertices, edgelist):
    incoming_adjlist = [[] for _ in range(num_vertices)]
    outgoing_adjlist = [[] for _ in range(num_vertices)]
    for u, v in edgelist:  # each entry is a directed edge u -> v
        incoming_adjlist[v].append(u)
        outgoing_adjlist[u].append(v)
    return incoming_adjlist, outgoing_adjlist

In [15]:
def equivalent(adjlist1, adjlist2):
    """ Returns true exactly when two adjacency lists contain the same elements. """
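    # zip() stops at the shorter list, so compare the lengths explicitly.
    return len(adjlist1) == len(adjlist2) and all(set(l1) == set(l2) for (l1, l2) in zip(adjlist1, adjlist2))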
        
# The example from the lecture
edgelist = [(3, 1), (3, 2), (1, 0), (2, 0), (0, 4), (2, 4), (5, 4)]
parents, children = slides_example()
assert equivalent(parents, outgoing_to_incoming_adjlist(children))
assert all(equivalent(a, b) for (a,b) in zip([parents, children], edgelist_to_adjlists(len(parents), edgelist)))

## Exercise 2: testing if there is only one toposort
We have previously seen that in general, a graph will have more than one topological ordering. In the example from the lecture, both nodes 3 and 5 have no incoming edges so both `[5, 3, 2, 1, 0, 4]` and `[3, 5, 2, 1, 0, 4]` are valid toposorts. Finish the function below to determine if a given graph has a unique toposort ordering.

In [16]:
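def has_unique_toposort(parents, children):
    """ Runs Kahn's algorithm; the toposort is unique exactly when there is never a choice. """
    num_parents = [len(l) for l in parents]
    stack = [i for i in range(len(parents)) if num_parents[i] == 0]
    visited = 0
    while len(stack) > 0:
        if len(stack) > 1:  # two vertices could come next, so the ordering is not unique
            return False
        cur = stack.pop()
        visited += 1
        for child in children[cur]:
            num_parents[child] -= 1
            if num_parents[child] == 0:
                stack.append(child)
    return visited == len(parents)  # fewer visited vertices means there is a cycle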

In [17]:
# Test that the result holds for some example graphs
def path_graph(V):  # Connected DAG with E=V-1 edges. It's a path.
    parents = [[]] + [[i-1] for i in range(1, V)]
    children = [[i+1] for i in range(0, V-1)] + [[]]
    return parents, children

def full_dag(V):  # Connected DAG with E=V(V-1)/2 edges. It connects a vertex v to all vertices after it.
    parents = [list(range(i)) for i in range(V)]
    children = [list(range(i+1, V)) for i in range(V)]
    return parents, children

assert has_unique_toposort(*path_graph(10))
assert has_unique_toposort(*full_dag(10))
assert not has_unique_toposort(*slides_example())
assert not has_unique_toposort(*notebook_example())
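
Another way to look at uniqueness: a DAG has exactly one toposort precisely when every pair of consecutive vertices in some toposort is joined by an edge, since otherwise those two vertices could be swapped. The sketch below checks this; `toposort` is a self-contained copy of Kahn's algorithm and `unique_by_consecutive_edges` is a hypothetical helper name.

```python
def toposort(parents, children):
    """Kahn's algorithm on integer-labelled adjacency lists."""
    num_parents = [len(p) for p in parents]
    stack = [v for v in range(len(parents)) if num_parents[v] == 0]
    order = []
    while stack:
        cur = stack.pop()
        order.append(cur)
        for child in children[cur]:
            num_parents[child] -= 1
            if num_parents[child] == 0:
                stack.append(child)
    return order

def unique_by_consecutive_edges(parents, children):
    """True when each consecutive pair in a toposort is connected by an edge."""
    order = toposort(parents, children)
    if len(order) != len(parents):  # a cycle: no toposort exists at all
        return False
    child_sets = [set(c) for c in children]  # O(1) edge lookups
    return all(v in child_sets[u] for u, v in zip(order, order[1:]))

# A path 0 -> 1 -> 2, and the graph from the lecture.
path = ([[], [0], [1]], [[1], [2], []])
lecture = ([[1, 2], [3], [3], [], [0, 2, 5], []],
           [[4], [0], [0, 4], [1, 2], [], [4]])
print(unique_by_consecutive_edges(*path), unique_by_consecutive_edges(*lecture))
```

Both the toposort and the edge check visit every vertex and edge a constant number of times, so this runs in $\mathcal O(V + E)$.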

## Exercise 3: testing if an ordering is a toposort
Kahn's algorithm produces an ordering of the vertices. We can use the definition of a toposort directly to check whether a vertex ordering is a toposort. The runtime of this direct solution will be either $\mathcal O(VE)$ or $\mathcal O(V^2 E)$, depending on your implementation. Write the `is_toposort` function below and, if you manage, try to come up with a different implementation that runs in *linear time*.

In [18]:
def is_toposort(parents, children, ordering):
    # The ordering must contain every vertex 0..V-1 exactly once.
    if sorted(ordering) != list(range(len(parents))):
        return False

    # Every parent must appear before its child; each index() call costs O(V).
    for child in range(len(parents)):
        for parent in parents[child]:
            if ordering.index(parent) > ordering.index(child):
                return False
    return True

In [19]:
# TEST that the implementation is correct with some examples.

# Example from the slides
parents, children = slides_example()
assert is_toposort(parents, children, kahn(parents, children))
assert is_toposort(parents, children, [3, 5, 2, 1, 0, 4])
assert is_toposort(parents, children, [3, 5, 1, 2, 0, 4])

# Example from above
parents, children = notebook_example()
assert is_toposort(parents, children, [7, 6, 3, 5, 4, 2, 1, 0])
assert is_toposort(parents, children, [7, 6, 5, 4, 3, 2, 1, 0])

assert not is_toposort(parents, children, [0, 1, 2, 3, 4, 5, 6, 7])
assert not is_toposort(parents, children, [0, 0, 0, 0, 0, 0, 0, 0]), "Did you check duplicate vertices?"
assert not is_toposort(parents, children, []), "Did you check missing vertices?"
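
For the linear-time variant, one possible approach is to record the position of every vertex once, so that each edge check becomes a constant-time lookup. This is a sketch under that assumption, not necessarily the intended solution; `is_toposort_linear` is a hypothetical name, and it only needs the `parents` lists.

```python
def is_toposort_linear(parents, ordering):
    """Check a candidate toposort in O(V + E) using a position table."""
    n = len(parents)
    if len(ordering) != n:
        return False  # missing or extra vertices
    position = [-1] * n  # position[v] = index of v in the ordering
    for idx, v in enumerate(ordering):
        if not (0 <= v < n) or position[v] != -1:
            return False  # out-of-range or duplicate vertex
        position[v] = idx
    # Every parent must come before its child.
    return all(position[p] < position[v] for v in range(n) for p in parents[v])

# The graph from the lecture.
lecture_parents = [[1, 2], [3], [3], [], [0, 2, 5], []]
print(is_toposort_linear(lecture_parents, [3, 5, 2, 1, 0, 4]))
```

The position table replaces the repeated `ordering.index(...)` scans, which is where the extra factor of $V$ in the direct solution comes from.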

In [12]:
# Test that runtime is not worse than O(V^2 E)
from math import log
time_P5K = %timeit -r 1 -q -o is_toposort(*path_graph(5000), list(range(5000)))
time_P10K = %timeit -r 1 -q -o is_toposort(*path_graph(10000), list(range(10000)))
assert log(time_P10K.best/time_P5K.best)/log(2) < 3.2, "Your algorithm is slower than O(V^3) for the path graph :-("

time_F400 = %timeit -r 1 -q -o is_toposort(*full_dag(400), list(range(400)))
time_F800 = %timeit -r 1 -q -o is_toposort(*full_dag(800), list(range(800)))
assert log(time_F800.best/time_F400.best)/log(2) < 4.2, "Your algorithm is slower than O(V^4) for the full graph :-("

In [13]:
# Test that runtime is linear

import sys
!{sys.executable} -m pip install stopit
import stopit

with stopit.ThreadingTimeout(3.0) as t:
    assert is_toposort(*path_graph(1000000), list(range(1000000)))
    assert is_toposort(*full_dag(2000), list(range(2000)))

