# SCC Finding (Kosaraju-Sharir Algorithm)

Here we implement the strongly connected component (SCC) finding algorithm from DPV (Dasgupta, Papadimitriou, and Vazirani's *Algorithms*), more commonly known as the Kosaraju-Sharir algorithm.

In [None]:
import base64
import networkx as nx
import typing
import numpy as np
ok = base64.b85decode(b'AmWyU;+BBomVn}xgyNQf;+BBomVn}xgdiZ|mW1M#fZ~>b;+BZwmV_W6AmWyU;+BBomVn}xfZ~>b;+BLUAmWyU;+BBomVn}xfZ~>b;+BLU3gVWCARr*(mWUuAAmWyYAmWygAmWyU;+B9Q;+BBomV_YUmWUwYmWUuAAmWygAmWyYAmWyYARr*(mXII{;+BvgAmWyY;+BBomVn}xfZ~>bARr(h;+BZwmW1M#gyNQj;+BXYARr*(mVn}xgdiXwARr*(mVn}xgdiXw3LqdL;+BXYARr(hARywFgyNQfARr*(mWUwYmVn}xgdiXw;+BXYAmWyQ;+BLUARywFh#(*eAmWyU;+B9QARr(hARywFh#(*!;+BNsmVh82AmWyY;+BBomVn}xfFK|sAmWyY;+BBomVn}xfFK|s3gVWCARr(hARr(hAmWygARr*(mXIJIARywFkRTu+ARr(h;+BvgARr(hAPVA^kRTu+ARr(hARr(hARr(hARr(hARr(hARr(hARr(hARr0~').decode()

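Below is a generator for random graphs with a characteristic SCC structure. It generates four random subgraphs A, B, C, and D, each likely (but not guaranteed) to be strongly connected, then adds one edge from A to B, one from A to C, one from B to D, and one from C to D.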

In [None]:
try:
    nx.gnp_random_graph(7, 0.65, seed=np.random.default_rng(seed=1145141919), directed=True)
    prng = np.random.default_rng(seed=114514)
except Exception as e:
    prng = np.random.RandomState(seed=1919810)

In [None]:
def random_scc_graph():
    sizes = [np.random.randint(6, 20) for _ in range(4)]
    random_graph1 = nx.gnp_random_graph(sizes[0], 0.65, seed=prng, directed=True)
    
    random_graph2 = nx.gnp_random_graph(sizes[1], 0.65, seed=prng, directed=True)
    nx.relabel_nodes(random_graph2, {i:i + sizes[0] for i in range(sizes[1])}, copy=False)
    
    random_graph3 = nx.gnp_random_graph(sizes[2], 0.65, seed=prng, directed=True)
    nx.relabel_nodes(random_graph3, {i:i + sizes[0] + sizes[1] for i in range(sizes[2])}, copy=False)
    
    random_graph4 = nx.gnp_random_graph(sizes[3], 0.65, seed=prng, directed=True)
    nx.relabel_nodes(random_graph4, {i:i + sizes[0] + sizes[1] + sizes[2] for i in range(sizes[3])}, copy=False)
    
    random_graph = nx.compose(nx.compose(nx.compose(random_graph1, random_graph2), random_graph3), random_graph4)
    random_graph.add_edge(np.random.randint(0, sizes[0]), np.random.randint(sizes[0], sizes[0] + sizes[1]))
    random_graph.add_edge(np.random.randint(0, sizes[0]), np.random.randint(sizes[0] + sizes[1], sizes[0] + sizes[1] + sizes[2]))
    random_graph.add_edge(np.random.randint(sizes[0], sizes[0] + sizes[1]), np.random.randint(sizes[0] + sizes[1] + sizes[2], sizes[0] + sizes[1] + sizes[2] + sizes[3]))
    random_graph.add_edge(np.random.randint(sizes[0] + sizes[1], sizes[0] + sizes[1] + sizes[2]), np.random.randint(sizes[0] + sizes[1] + sizes[2], sizes[0] + sizes[1] + sizes[2] + sizes[3]))
    return random_graph

In [None]:
random_graph = random_scc_graph()
nx.draw_networkx(random_graph)
print([len(c) for c in nx.strongly_connected_components(random_graph)])

## Utilities
### Graph Reversal
One of the most characteristic components of the Kosaraju-Sharir algorithm is graph reversal. Write a function that, given a graph as a set of directed edges `(u, v)`, returns the edge set of the reversed graph.

In [None]:
def reverse_graph(g: typing.Set[typing.Tuple[int, int]]) -> typing.Set[typing.Tuple[int, int]]:
    # Your code here
    return {(v, u) for u, v in g}


In [None]:
for _ in range(50):
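    # Draw from the shared prng so each iteration tests a different graph
    # (a fixed integer seed would regenerate the same graph every time).
    random_graph = nx.gnp_random_graph(50, 0.6, seed=prng, directed=True)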
    gre0 = reverse_graph(set(random_graph.edges))
    gre1 = set(random_graph.reverse().edges)
    assert gre0 == gre1
print(ok)

As in Prof. Hilfinger's CS61B, once you have implemented this function yourself you are free to use the `networkx` implementation: instead of `reverse_graph`, you can use [`nx.DiGraph.reverse`](https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.reverse.html?highlight=reverse#digraph-reverse) (click for documentation).

### DFS
The other utility you will need is an augmented DFS: one that records each vertex's `post` number while traversing the graph, plus a separate `explore(v)` function that returns every vertex reachable from `v`. When `v` lies in a sink SCC, that set is exactly the SCC.  

You might want to define a helper function in `dfs()`, and use `nonlocal` to modify variables in `dfs()`'s scope.
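As a minimal sketch of the helper-plus-`nonlocal` pattern suggested above (the names `dfs_post` and `visit` are illustrative, not part of the notebook, and the graph is assumed to be an adjacency dict in which every vertex appears as a key):

```python
import typing


def dfs_post(g: typing.Dict[int, typing.List[int]]) -> typing.Dict[int, int]:
    """Return the post number of every vertex (illustrative sketch)."""
    visited = set()
    post = {}
    clock = 1

    def visit(v: int) -> None:
        # Rebinding the integer `clock` requires `nonlocal`;
        # mutating `visited` and `post` in place does not.
        nonlocal clock
        visited.add(v)
        for u in g[v]:
            if u not in visited:
                visit(u)
        post[v] = clock  # v is finished once all its descendants are
        clock += 1

    for v in g:
        if v not in visited:
            visit(v)
    return post


example = {0: [1], 1: [2], 2: [0], 3: [2]}
print(dfs_post(example))
```

Keeping the state inside `dfs_post` means repeated calls cannot interfere with one another, which module-level globals do not guarantee. The recursion is fine for the graph sizes used here; much deeper graphs could hit Python's recursion limit.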

In [None]:
visited = set()
post = dict()
clock = 0
# Given a graph g (adjacency dict) and a vertex v, traverse from v,
# skipping vertices already in the global `visited` set.
# Return: the set of vertices newly reached from v (including v itself)
def explore(g, v) -> typing.Set[int]:
    # Your code here
    global clock
    visited.add(v)
    clock += 1
    reachable = {v}
    for u in g[v]:
        if u not in visited:
            reachable.update(explore(g, u))
    post[v] = clock
    clock += 1
    return reachable

# Given a graph, do DFS and return a dict mapping every vertex to its post number
def dfs(g) -> typing.Dict[int, int]:
    # Your code here
    global visited, post, clock
    visited = set()
    post = {v: 0 for v in g}
    clock = 1
    for v in g:
        if v not in visited:
            explore(g, v)
    answer = {i: post[i] for i in g}
    return answer


For the sake of abstraction, the autograder only checks that `explore` correctly finds all vertices in the sink SCC.

In [None]:
for _ in range(100):
    random_graph = random_scc_graph()
    visited = set()
    sccs = {frozenset(c) for c in nx.strongly_connected_components(random_graph)}
    scc = explore(nx.to_dict_of_lists(random_graph), len(random_graph.nodes) - 1)
    # if scc not in sccs: 
    #     nx.draw_networkx(nx.gnp_random_graph(10, 0.65, seed=prng, directed=True))
    assert scc in sccs, f"scc = {scc};\nsccs = {sccs}"
print(ok)

## The Algorithm
Now, implement the Kosaraju-Sharir algorithm. In essence, it does the following:

0. Do DFS on $G^R$.
1. Find the vertex $v$ with the highest post number in $G^R$ among the remaining vertices. It must lie in a source SCC of $G^R$, which is a sink SCC of $G$.
2. `explore` from $v$ in $G$ to extract all the vertices in this SCC, and only those.
3. (Effectively) remove this SCC (for example, by maintaining a set of "disabled" vertices).
4. Repeat steps 1-3 until the graph is empty.

You should return a list of sets, where each set contains the vertices of one SCC.

As above, now that you have implemented DFS correctly, you are free to use `nx.dfs_postorder_nodes`.
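The steps above can be sketched with a single DFS on $G^R$: the post order computed once stays valid after SCCs are removed, so it is enough to walk the vertices in decreasing post order and skip the ones already removed. This is a hedged sketch, not the reference solution; `kosaraju_sketch`, `visit_rev`, and `collect` are illustrative names, and the graph is assumed to be an adjacency dict in which every vertex appears as a key (as `nx.to_dict_of_lists` produces).

```python
import typing

Graph = typing.Dict[int, typing.List[int]]


def kosaraju_sketch(g: Graph) -> typing.List[typing.Set[int]]:
    # Step 0: build G^R and run DFS on it, recording finish (post) order.
    g_rev: Graph = {v: [] for v in g}
    for u in g:
        for v in g[u]:
            g_rev[v].append(u)

    seen: typing.Set[int] = set()
    finish_order: typing.List[int] = []

    def visit_rev(v: int) -> None:
        seen.add(v)
        for u in g_rev[v]:
            if u not in seen:
                visit_rev(u)
        finish_order.append(v)  # later in the list = higher post number

    for v in g:
        if v not in seen:
            visit_rev(v)

    # Steps 1-4: take vertices by decreasing post number; each one that is
    # not yet removed lies in a sink SCC of the remaining graph.
    removed: typing.Set[int] = set()
    sccs: typing.List[typing.Set[int]] = []

    def collect(v: int, component: typing.Set[int]) -> None:
        removed.add(v)  # "disabled vertices" set from step 3
        component.add(v)
        for u in g[v]:
            if u not in removed:
                collect(u, component)

    for v in reversed(finish_order):
        if v not in removed:
            component: typing.Set[int] = set()
            collect(v, component)
            sccs.append(component)
    return sccs


# Two SCCs: {0, 1, 2} feeds into the sink {3, 4}.
print(kosaraju_sketch({0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}))
```

The components come out sink-first, that is, in reverse topological order of the SCC DAG of $G$.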

In [None]:
def graph_from_edges(edges):
    graph = dict()
    for u, v in edges:
        if u not in graph: graph[u] = list()
        if v not in graph: graph[v] = list()
        graph[u].append(v)
    return graph

def add_edges(graph, edges):
    for u, v in edges:
        if u in graph and v in graph:
            graph[u].append(v)

def kosaraju(g) -> typing.List[typing.Set]:
    # Your code here
    sccs = list()
    edges = set()
    for u in g:
        for v in g[u]:
            edges.add((u, v))
    while g:
        edge_rev = reverse_graph(edges)
        g_rev = {u: list() for u in g}
        add_edges(g_rev, edge_rev)
        postorder = dfs(g_rev)
        v = max(postorder, key=postorder.get)
        visited.clear()
        scc = explore(g, v)
        sccs.append(scc)
        new_edges = set()
        for e in edges:
            if e[0] not in scc and e[1] not in scc:
                new_edges.add(e)
        edges = new_edges
        g = {u: list() for u in g_rev if u not in scc}
        add_edges(g, edges)
    return sccs           


In [None]:
for _ in range(100):
    random_graph = random_scc_graph()
    sccs_tarjan = {frozenset(c) for c in nx.strongly_connected_components(random_graph)}
    sccs_kosaraju = {frozenset(c) for c in kosaraju(nx.to_dict_of_lists(random_graph))}
    assert sccs_tarjan == sccs_kosaraju, f"sccs_tarjan: {sccs_tarjan};\nsccs_kosaraju: {sccs_kosaraju}"
for _ in range(100):
    random_graph = nx.gnp_random_graph(100, 0.6, directed=True)   
    sccs_tarjan = {frozenset(c) for c in nx.strongly_connected_components(random_graph)}
    sccs_kosaraju = {frozenset(c) for c in kosaraju(nx.to_dict_of_lists(random_graph))}
    assert sccs_tarjan == sccs_kosaraju

print(ok)

## A Question
Now that you have completed the algorithm, you are free to use `nx.kosaraju_strongly_connected_components`.  
This question is adapted from Question A of the ICPC 2020 North America Qualifier.

#### Problem Statement
Characters in Star Wars each speak a language, but they typically understand a lot more languages that they don’t or can’t speak. For example, Han Solo might speak in Galactic Basic and Chewbacca might respond in Shyriiwook; since they each understand the language spoken by the other, they can communicate just fine like this.

We’ll say two characters can converse if they can exchange messages in both directions. Even if they didn’t understand each other’s languages, two characters can still converse as long as there is a sequence of characters who could translate for them through a sequence of intermediate languages. For example, Jabba the Hutt and R2D2 might be able to converse with some help. Maybe when Jabba spoke in Huttese, Boba Fett could translate to Basic, which R2D2 understands. When R2D2 replies in Binary, maybe Luke could translate to Basic and then Bib Fortuna could translate back to Huttese for Jabba.

In Star Wars Episode IV, there’s a scene with a lot of different characters in a cantina, all speaking different languages. Some pairs of characters may not be able to converse (even if others in the cantina are willing to serve as translators). This can lead to all kinds of problems, fights, questions over who shot first, etc. You’re going to help by asking some of the patrons to leave. The cantina is a business, so you’d like to ask as few as possible to leave. You need to determine the size of the smallest set of characters $S$ such that if all the characters in $S$ leave, all pairs of remaining characters can converse.

For example, in the first sample input below, Chewbacca and Grakchawwaa can converse, but nobody else understands Shyriiwook, so they can’t converse with others in the bar. If they leave, everyone else can converse. In the second sample input, Fran and Ian can converse, as can Polly and Spencer, but no other pairs of characters can converse, so either everyone but Polly and Spencer must leave or everyone but Fran and Ian.

#### Input
Input starts with a positive integer, $1\leq N \leq 100$, the number of characters in the cantina. This is followed by $N$ lines, each line describing a character. Each of these $N$ lines starts with the character’s name (which is distinct), then the language that character speaks, then a list of $0$ to $20$ additional languages the character understands but doesn’t speak. All characters understand the language they speak. All character and language names are sequences of $1$ to $15$ letters (a-z and A-Z), numbers, and hyphens. Character names and languages are separated by single spaces.

#### Output
Print a line of output giving the size of the smallest set of characters $S$ that should be asked to leave so that all remaining pairs of characters can converse.

#### Samples
##### Sample Input 1
```
7
Jabba-the-Hutt Huttese
Bib-Fortuna Huttese Basic
Boba-Fett Basic Huttese
Chewbacca Shyriiwook Basic
Luke Basic Jawaese Binary
Grakchawwaa Shyriiwook Basic Jawaese
R2D2 Binary Basic
```
##### Sample Output 1
```
2
```

##### Sample Input 2
```
6
Fran French Italian
Enid English German
George German Italian
Ian Italian French Spanish
Spencer Spanish Portugese
Polly Portugese Spanish
```
##### Sample Output 2
```
4
```
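One possible way to model the problem, offered as a sketch rather than the intended solution: draw an edge from character `a` to character `b` whenever `b` understands the language `a` speaks. A group whose members can all converse using only members of the group as translators is then a strongly connected set, so the answer is $N$ minus the size of the largest SCC. Because $N \leq 100$, the sketch below uses plain reachability sets instead of Kosaraju-Sharir; `min_to_leave` and its list-of-lines input are assumptions made for illustration.

```python
import typing


def min_to_leave(lines: typing.List[str]) -> int:
    n = int(lines[0])
    speaks: typing.Dict[str, str] = {}
    understands: typing.Dict[str, typing.Set[str]] = {}
    for line in lines[1:n + 1]:
        name, spoken, *extra = line.split()
        speaks[name] = spoken
        understands[name] = {spoken, *extra}

    # Edge a -> b when b understands the language a speaks.
    heard_by = {
        a: [b for b in understands if speaks[a] in understands[b]]
        for a in speaks
    }

    def reachable(start: str) -> typing.Set[str]:
        # Iterative DFS over the "heard by" edges.
        seen = {start}
        stack = [start]
        while stack:
            a = stack.pop()
            for b in heard_by[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    reach = {a: reachable(a) for a in speaks}
    # The SCC of a is every b that a reaches and that reaches a back.
    largest = max(sum(1 for b in reach[a] if a in reach[b]) for a in speaks)
    return n - largest


sample_2 = """6
Fran French Italian
Enid English German
George German Italian
Ian Italian French Spanish
Spencer Spanish Portugese
Polly Portugese Spanish""".splitlines()
print(min_to_leave(sample_2))
```

For a Kattis submission, the same function could be fed `sys.stdin.read().splitlines()`.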

Below is a playground in which you can test your implementation of this question if you'd like. You can also register an account on Kattis and submit your solution [here](https://open.kattis.com/problems/cantinaofbabel) (but you can't use `networkx` there).

In [None]:
def solution():
    pass