# Networks and their Structure Assignment

## Network Science Topic 3

Note that the networks in this exercise are all undirected.

Here is a definition of a further kind of centrality known as *assignment centrality*.  It is similar to closeness centrality in that it depends on distances, but the definition is slightly different.  Again, let $d_{ij}$ be the distance from $i$ to $j$.  Then the assignment centrality of $j$ is

$$ \sum_{i \neq j} \frac{1}{d_{ij}} $$

where $1/d_{ij}$ is 0 if there is no path from $i$ to $j$.

1. [3 marks] Describe one advantage of using assignment centrality in preference to closeness centrality.

**The advantage is that assignment centrality produces numbers that are easier to work with than those of closeness centrality, especially in larger networks. This follows from their respective definitions. Closeness centrality adds the distances in the denominator with a constant numerator of 1, so the resulting numbers are small and close to zero, which makes them hard to interpret. In contrast, assignment centrality adds the fractions $1/d_{ij}$ together, so the result is larger, each nearby node contributes more than a distant one, and a higher value directly indicates a more central node, which is more meaningful.**
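One further point follows from the convention that $1/d_{ij} = 0$ when no path exists. As a minimal illustration (the three-node network is hypothetical, and closeness is taken to be $1/\sum_{i \neq j} d_{ij}$, as described in the answer above), consider nodes $x$ and $y$ joined by an edge, plus an isolated node $z$:

$$ \text{closeness}(x) = \frac{1}{d_{yx} + d_{zx}} = \frac{1}{1 + \infty} = 0, \qquad \text{assignment centrality}(x) = \frac{1}{d_{yx}} + \frac{1}{d_{zx}} = 1 + 0 = 1 $$

Under these assumptions closeness is 0 for all three nodes, whereas assignment centrality still separates $x$ and $y$ (both 1) from $z$ (0).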


The Medici family were, through success in commerce and banking, wealthy and politically powerful in Florence beginning in the 13th century.  It has been suggested that their prominence can be explained by considering the network, displayed below, of Florentine families and their links by marriage.

<img src="medici.jpg" width="400">

2. [5 marks] Calculate the assignment centrality of each node in the network and comment on whether the results support the theory that the Medici's power was a result of their position in this network.


In [1]:
from itertools import combinations
from queue import PriorityQueue

In [2]:
# a, b, c, d, e are the numbers of nodes at distance 1, 2, 3, 4 and 5 from the node in question
def assignment_centrality(a, b, c, d, e):

    return (1 * a) + (1/2 * b) + (1/3 * c) + (1/4 * d) + (1/5 * e)

Peruzzi: $(\frac{1}{1} * 3) + (\frac{1}{2} * 3) + (\frac{1}{3} * 4) + (\frac{1}{4} * 3) + (\frac{1}{5} * 1) + 0 = 6.783$  
Bischeri: $(\frac{1}{1} * 3) + (\frac{1}{2} * 5) + (\frac{1}{3} * 3) + (\frac{1}{4} * 2) + (\frac{1}{5} * 1) + 0 = 7.200$  
Castellani: $(\frac{1}{1} * 3) + (\frac{1}{2} * 3) + (\frac{1}{3} * 5) + (\frac{1}{4} * 3) + 0 = 6.917$  
Lamberteschi: $(\frac{1}{1} * 1) + (\frac{1}{2} * 3) + (\frac{1}{3} * 5) + (\frac{1}{4} * 4) + (\frac{1}{5} * 1) + 0 = 5.367$  
Strozzi: $(\frac{1}{1} * 4) + (\frac{1}{2} * 4) + (\frac{1}{3} * 4) + (\frac{1}{4} * 2) + 0 = 7.833$  
Guadagni: $(\frac{1}{1} * 4) + (\frac{1}{2} * 5) + (\frac{1}{3} * 4) + (\frac{1}{4} * 1) + 0 = 8.083$  
Ridolfi: $(\frac{1}{1} * 3) + (\frac{1}{2} * 8) + (\frac{1}{3} * 3) + 0 = 8.000$  
Tornabuoni: $(\frac{1}{1} * 3) + (\frac{1}{2} * 7) + (\frac{1}{3} * 4) + 0 = 7.833$  
Barbadori: $(\frac{1}{1} * 2) + (\frac{1}{2} * 7) + (\frac{1}{3} * 4) + (\frac{1}{4} * 1) + 0 = 7.083$  
Medici: $(\frac{1}{1} * 6) + (\frac{1}{2} * 5) + (\frac{1}{3} * 3) + 0 = 9.500$  
Albizzi: $(\frac{1}{1} * 3) + (\frac{1}{2} * 7) + (\frac{1}{3} * 4) + 0 = 7.833$  
Acciaiuoli: $(\frac{1}{1} * 1) + (\frac{1}{2} * 5) + (\frac{1}{3} * 5) + (\frac{1}{4} * 3) + 0 = 5.917$  
Salviati: $(\frac{1}{1} * 2) + (\frac{1}{2} * 5) + (\frac{1}{3} * 4) + (\frac{1}{4} * 3) + 0 = 6.583$  
Ginori: $(\frac{1}{1} * 1) + (\frac{1}{2} * 2) + (\frac{1}{3} * 7) + (\frac{1}{4} * 3) + (\frac{1}{5} * 1) + 0 = 5.283$  
Pazzi: $(\frac{1}{1} * 1) + (\frac{1}{2} * 1) + (\frac{1}{3} * 5) + (\frac{1}{4} * 4) + (\frac{1}{5} * 3) + 0 = 4.767$  
Pucci: 0

**From the calculations it is clear that the Medici family had the highest assignment centrality, 9.5. The other connected families range from about 4.8 (Pazzi) to about 8.1 (Guadagni), and Pucci, which has no marriage links, scores zero (an outlier). The results therefore support the theory that the Medici family's power was due to their position in this network.**
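The hand calculations above can be cross-checked with a short breadth-first search. This is only a sketch: the edge list `MARRIAGES` is an assumption, transcribed from the commonly distributed version of this marriage network rather than read from `medici.jpg`, and should be checked against the figure. The names `build_graph` and `assignment_centrality_of` are hypothetical helpers introduced for illustration.

```python
from collections import deque

# Assumed edge list (not taken from the figure): verify before relying on it.
MARRIAGES = [
    ("Acciaiuoli", "Medici"), ("Castellani", "Peruzzi"), ("Castellani", "Strozzi"),
    ("Castellani", "Barbadori"), ("Medici", "Barbadori"), ("Medici", "Ridolfi"),
    ("Medici", "Tornabuoni"), ("Medici", "Albizzi"), ("Medici", "Salviati"),
    ("Salviati", "Pazzi"), ("Peruzzi", "Strozzi"), ("Peruzzi", "Bischeri"),
    ("Strozzi", "Ridolfi"), ("Strozzi", "Bischeri"), ("Ridolfi", "Tornabuoni"),
    ("Tornabuoni", "Guadagni"), ("Albizzi", "Ginori"), ("Albizzi", "Guadagni"),
    ("Bischeri", "Guadagni"), ("Guadagni", "Lamberteschi"),
]
# Pucci has no marriage links, so it must be added explicitly.
FAMILIES = sorted({family for edge in MARRIAGES for family in edge} | {"Pucci"})


def build_graph(nodes, edges):
    """Build an undirected adjacency-set dictionary."""
    graph = {node: set() for node in nodes}
    for u, w in edges:
        graph[u].add(w)
        graph[w].add(u)
    return graph


def assignment_centrality_of(graph, source):
    """Sum of 1/d over every node reachable from source; unreachable nodes add 0."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(1 / d for node, d in dist.items() if node != source)


graph = build_graph(FAMILIES, MARRIAGES)
scores = {family: assignment_centrality_of(graph, family) for family in graph}

# Print families from most to least central.
for family, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{family:13s}{score:.3f}")
```

With this assumed edge list, the Medici score comes out as 9.5 and Pucci as 0, in agreement with the values computed by hand.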

The next two questions require an implementation of Newman's agglomerative algorithm for community detection (see the hints at the end of these questions).  We met it in `topic3b.pdf`: it is described on Slide 4.  Recall that the algorithm finds many community decompositions.  It starts with the decomposition in which every node alone is a community, and then merges communities until the decomposition in which every node is in the same community is reached.  Thus the output of the algorithm contains a list of many decompositions. The algorithm also calculates the change in the modularity $Q$ at each step, so the relative value of $Q$ for each decomposition is part of the output.  For example, here is the output for the example on Slide 7.

$$ \begin{array}{ll}
\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\} & Q=0\\
\{1, 2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\} & Q=0.086 \\
\{1, 2, 3\}, \{4\}, \{5\}, \{6\}, \{7\} & Q=0.210 \\
\{1, 2, 3\}, \{4\}, \{5\}, \{6, 7\} & Q=0.296 \\
\{1, 2, 3\}, \{4\}, \{5, 6, 7\} & Q=0.420 \\
\{1, 2, 3, 4\}, \{5, 6, 7\} & Q=0.432\\
\{1, 2, 3, 4, 5, 6, 7\} & Q=0.160
\end{array}
$$

(Note that this implies that $\{1, 2, 3, 4\}, \{5, 6, 7\}$ is the best community decomposition as it has the greatest modularity.  Note also that these are relative values as the modularity of the initial decomposition is not zero, but this is enough information for us to determine which decomposition is best.)
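To make the remark about relative values concrete, here is a minimal sketch that computes the absolute modularity $Q = \sum_c (e_{cc} - a_c^2)$ for a given partition. The function name `modularity` and the variable `lecture_graph` are illustrative assumptions. The adjacency lists are those of the seven-node test network used in this notebook, whose output matches the Slide 7 example.

```python
def modularity(graph, partition):
    """Absolute modularity of a partition of an undirected graph.

    graph: dict mapping each node to an iterable of its neighbours.
    partition: iterable of disjoint sets of nodes that together cover the graph.
    """
    total_endpoints = sum(len(neighbours) for neighbours in graph.values())  # equals 2m
    q = 0.0
    for community in partition:
        # all edge endpoints attached to this community
        endpoints = sum(len(graph[u]) for u in community)
        # endpoints whose other end also lies inside the community (2 per internal edge)
        internal = sum(1 for u in community for w in graph[u] if w in community)
        q += internal / total_endpoints - (endpoints / total_endpoints) ** 2
    return q


lecture_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5],
                 5: [3, 4, 6, 7], 6: [5, 7], 7: [5, 6]}

singletons = [{node} for node in lecture_graph]
best = [{1, 2, 3, 4}, {5, 6, 7}]

q_start = modularity(lecture_graph, singletons)
q_best = modularity(lecture_graph, best)

print("initial decomposition:", round(q_start, 3))
print("best decomposition (absolute):", round(q_best, 3))
print("best decomposition (relative):", round(q_best - q_start, 3))  # 0.432, as in the table
```

The initial decomposition has negative absolute modularity, which is why the tabulated values, measured from $Q = 0$ at the start, are offsets rather than absolute modularities.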

3. [8 marks] Construct the network defined in `zachary.txt` ignoring edge weights.  Run Newman's agglomerative algorithm on this network and write down the output in the same format used in the example above.  The file relates to the example mentioned in the lecture of a karate club that split in two (see the description in the file).  How do your results compare with what actually happened?

In [3]:
# load the zachary graph ignoring edge weights
karate_graph = {}

# the with-block closes the file once it has been read
with open("zachary.txt") as graph:
    for line in graph:

        data = line.split(' ')

        # skip the text at the beginning: only lines with three fields are edges
        if len(data) == 3:

            node = int(data[0])
            neighbour = int(data[1])

            # the network is undirected, so record the edge in both directions
            karate_graph.setdefault(node, set()).add(neighbour)
            karate_graph.setdefault(neighbour, set()).add(node)

nodes = len(karate_graph)
print("Loaded graph with", nodes, "nodes")

Loaded graph with 34 nodes


In [4]:
test_graph = {1: [2,3],
 2: [1,3],
 3: [1,2,4,5],
 4: [3,5],
 5: [3,4,6,7],
 6: [5,7],
 7: [5,6]}

In [5]:
# calculates change in modularity if this pair of communities is merged
def deltaQ(graph, communities, pair, edges, total_edges, total_endpoints):
    
    a1 = communities[pair[0]][1]
    a2 = communities[pair[1]][1]
        
    modularity_change = (edges / total_edges) - (2 * (a1 / total_endpoints) * (a2 / total_endpoints))
    return modularity_change

In [6]:
# gets the number of nodes in a pair of communities
def get_nodes(community1, community2, communities, num_nodes):
        
    nodes = communities[community1][0] + communities[community2][0]
    return nodes

In [7]:
# gets the number of edges between two communities
def get_num_edges(graph, community1, community2, nodes2, communities):
    
    edges = 0
    nodes1 = communities[community1][0]
    
    for i in nodes1:
        for j in nodes2:
            if i in graph[j]:
                edges += 1
    
    return edges

In [8]:
# runs Newman's agglomerative algorithm on a given graph
def Newman_algorithm(graph, graph_name):
    
    # keys are current communities, values are nodes and number of endpoints
    communities = {key:([key], len(graph[key])) for key in graph}
    num_nodes = len(graph)
    
    # key for the next merged community
    if graph_name == 'karate' or graph_name == 'test':
        new_key = num_nodes + 1  
    else:
        new_key = 'A'
    
    # get total numbers of edges and endpoints for calculating modularity change
    total_endpoints = sum([len(graph[key]) for key in graph])
    total_edges = total_endpoints // 2
    
    # keys are pairs of current communities, values are number of edges between the pair
    pairs = {}
    nodes = [node for node in graph]
    queue = PriorityQueue()  # initialise priority queue
    
    # initialise pairs dictionary
    for combination in list(combinations(nodes, 2)):
        
        if combination[1] in graph[combination[0]]:
            pairs[combination] = 1
        else:
            pairs[combination] = 0
    
    # initialise values
    prev_Q = 0
    total_Q = 0
    written = False
    new_pairs = pairs
    
    if graph_name == 'karate' or graph_name == 'test':
        print([set(i[0]) for i in list(communities.values())], 'Q = ' + str(total_Q))

    while len(communities) > 1:
        
        # add all community pairs to priority queue
        for pair, edges in new_pairs.items():

            Q = deltaQ(graph, communities, pair, edges, total_edges, total_endpoints)
            queue.put((Q * -1, pair))

        # get community pair with greatest increase in modularity and the nodes in the merged community
        while True:
            Q, pair = queue.get()
            if pair in pairs:
                break
        
        # update total Q (the priority queue stores negated changes)
        total_Q += Q * -1
        
        # write the best community decomposition to file
        if graph_name == 'yeast' and total_Q < prev_Q and not written:
            
            # total_Q has just fallen below prev_Q, so the communities as they
            # stand before this merge form the best decomposition found
            best_communities = [set(i[0]) for i in list(communities.values())]

            # write nodes of each community on separate lines; the with-block
            # closes the file so that its contents are flushed to disk
            with open("yeast_communities.txt", "w") as file:
                for community in best_communities:
                    file.write(' '.join(community))
                    file.write('\n')

            written = True
        
        # get nodes in these communities and the number of endpoints
        nodes = get_nodes(pair[0], pair[1], communities, num_nodes)
        endpoints = 0
        for node in nodes:
            endpoints += len(graph[node])

        # update communities
        del communities[pair[0]]
        del communities[pair[1]]
        
        # update pairs by removing entries containing nodes that have been merged 
        pairs_copy = pairs.copy()
        for p in pairs_copy:
            if pair[0] in p or pair[1] in p:
                del pairs[p]
        
        new_pairs = {}
        
        # add new pairs of the merged community with all other communities
        for i in communities.keys():
            
            edges = get_num_edges(graph, i, new_key, nodes, communities)
            pairs[i, new_key] = edges
            new_pairs[i, new_key] = edges

        communities[new_key] = (nodes, endpoints)
        
        if graph_name == 'karate' or graph_name == 'test':
            new_key += 1
            print([set(i[0]) for i in list(communities.values())], 'Q = {:.3f} '.format(round(total_Q, 3)))
        else:
            new_key += 'A'
            print('Q = ' + str(total_Q))
        
        prev_Q = total_Q

In [88]:
Newman_algorithm(test_graph, 'test')

[{1}, {2}, {3}, {4}, {5}, {6}, {7}] Q = 0
[{3}, {4}, {5}, {6}, {7}, {1, 2}] Q = 0.086 
[{4}, {5}, {6}, {7}, {1, 2, 3}] Q = 0.210 
[{4}, {5}, {1, 2, 3}, {6, 7}] Q = 0.296 
[{4}, {1, 2, 3}, {5, 6, 7}] Q = 0.420 
[{5, 6, 7}, {1, 2, 3, 4}] Q = 0.432 
[{1, 2, 3, 4, 5, 6, 7}] Q = 0.160 


In [87]:
Newman_algorithm(karate_graph, 'karate')

[{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {11}, {12}, {13}, {14}, {18}, {20}, {22}, {32}, {31}, {10}, {28}, {29}, {33}, {17}, {34}, {15}, {16}, {19}, {21}, {23}, {24}, {26}, {30}, {25}, {27}] Q = 0
[{1}, {2}, {3}, {4}, {5}, {7}, {8}, {9}, {11}, {12}, {13}, {14}, {18}, {20}, {22}, {32}, {31}, {10}, {28}, {29}, {33}, {34}, {15}, {16}, {19}, {21}, {23}, {24}, {26}, {30}, {25}, {27}, {17, 6}] Q = 0.012 
[{1}, {2}, {3}, {4}, {5}, {8}, {9}, {11}, {12}, {13}, {14}, {18}, {20}, {22}, {32}, {31}, {10}, {28}, {29}, {33}, {34}, {15}, {16}, {19}, {21}, {23}, {24}, {26}, {30}, {25}, {27}, {17, 6, 7}] Q = 0.036 
[{2}, {3}, {4}, {5}, {8}, {9}, {11}, {12}, {13}, {14}, {18}, {20}, {22}, {32}, {31}, {10}, {28}, {29}, {33}, {34}, {15}, {16}, {19}, {21}, {23}, {24}, {26}, {30}, {25}, {27}, {1, 17, 6, 7}] Q = 0.048 
[{2}, {3}, {4}, {8}, {9}, {11}, {12}, {13}, {14}, {18}, {20}, {22}, {32}, {31}, {10}, {28}, {29}, {33}, {34}, {15}, {16}, {19}, {21}, {23}, {24}, {26}, {30}, {25}, {27}, {1, 5, 6, 7, 17}] Q

**The results show that the modularity increases with each merge up to the third-to-last step of the algorithm, where it reaches its maximum of 0.430 with 3 communities. If the algorithm goes further and merges 2 of these 3 communities, the modularity decreases for the first time, slightly, to 0.422. In the final step, when the remaining two communities are merged, the modularity decreases significantly to 0.050. This suggests that these two communities are very distinct from each other, with very few edges between them. The results therefore clearly show two different groups, which is consistent with what actually happened: the karate club split into two separate clubs.**

Living cells can be considered as complex webs of macromolecular interactions known as *interactome networks*.  Finding communities in these networks can aid understanding of how the cells function.

4. [9 marks] Construct the network defined in `CCSB-Y2H.txt`.  The first two items on each line are a pair of nodes joined by an edge (the rest of the line can be ignored).  The network is of interactions amongst proteins in yeast. Run Newman's agglomerative algorithm on this network.  The best community decomposition found should be written to a text file with the nodes of each community on a separate line.  Are there many other decompositions of a similar quality?  (See the link in the text file for further information on this dataset.)


In [9]:
# load the yeast graph ignoring edge weights
yeast_graph = {}

# the with-block closes the file once it has been read
with open("CCSB-Y2H.txt") as graph:
    for line in graph:

        data = line.split('\t')

        # skip the text at the beginning: only lines with three tab-separated fields are edges
        if len(data) == 3:

            node = data[0]
            neighbour = data[1]

            if node != neighbour:  # remove self loops

                # the network is undirected, so record the edge in both directions
                yeast_graph.setdefault(node, set()).add(neighbour)
                yeast_graph.setdefault(neighbour, set()).add(node)

nodes = len(yeast_graph)
print("Loaded graph with", nodes, "nodes")

Loaded graph with 1233 nodes


In [84]:
Newman_algorithm(yeast_graph, 'yeast') # max = 0.7686086017168975

Q = 0.0006091988468855475
Q = 0.001218397693771095
Q = 0.0018275965406566426
Q = 0.00243679538754219
Q = 0.0030459942344277374
Q = 0.003655193081313285
Q = 0.004264391928198832
Q = 0.00487359077508438
Q = 0.005482789621969928
Q = 0.006091988468855476
Q = 0.0067011873157410235
Q = 0.007310386162626571
Q = 0.007919585009512118
Q = 0.008528783856397666
Q = 0.009137982703283214
Q = 0.009747181550168762
Q = 0.01035638039705431
Q = 0.010965579243939858
Q = 0.011574778090825405
Q = 0.012183976937710953
Q = 0.012793175784596501
Q = 0.013402374631482049
Q = 0.014011573478367597
Q = 0.014620772325253144
Q = 0.015229971172138692
Q = 0.01583917001902424
Q = 0.016448368865909786
Q = 0.017057567712795332
Q = 0.01766676655968088
Q = 0.018275965406566425
Q = 0.01888516425345197
Q = 0.019494363100337517
Q = 0.020103561947223063
Q = 0.02071276079410861
Q = 0.021321959640994155
Q = 0.0219311584878797
Q = 0.022540357334765247
Q = 0.023149556181650793
Q = 0.02375875502853634
Q = 0.024367953875421886
Q = 0.

Q = 0.21367357116783056
Q = 0.2142818416409781
Q = 0.21489011211412565
Q = 0.2154983825872732
Q = 0.21610665306042073
Q = 0.21671492353356828
Q = 0.21732319400671582
Q = 0.21793146447986336
Q = 0.2185397349530109
Q = 0.21914800542615845
Q = 0.219756275899306
Q = 0.22036454637245353
Q = 0.22097281684560108
Q = 0.22158108731874862
Q = 0.22218935779189616
Q = 0.2227976282650437
Q = 0.22340589873819125
Q = 0.2240141692113388
Q = 0.22462243968448634
Q = 0.22523071015763388
Q = 0.22583898063078142
Q = 0.22644725110392896
Q = 0.2270555215770765
Q = 0.22766379205022405
Q = 0.2282720625233716
Q = 0.22888033299651914
Q = 0.22948860346966668
Q = 0.23009687394281422
Q = 0.23131007274365242
Q = 0.23241242372017162
Q = 0.23302069419331917
Q = 0.2336289646664667
Q = 0.23423723513961425
Q = 0.2348455056127618
Q = 0.23545359041116173
Q = 0.23606167520956167
Q = 0.2366697600079616
Q = 0.23727784480636155
Q = 0.2378859296047615
Q = 0.23849401440316143
Q = 0.23910209920156136
Q = 0.2397101839999613
Q = 0.

Q = 0.5371024564397758
Q = 0.5380382571676952
Q = 0.5389371086208414
Q = 0.5397124863668317
Q = 0.5404811798219082
Q = 0.541243188986071
Q = 0.5419985138593202
Q = 0.5433565389632887
Q = 0.5448668173602917
Q = 0.5469032979920018
Q = 0.5490162766197243
Q = 0.54966186771714
Q = 0.5502682814428115
Q = 0.550874695168483
Q = 0.5520696978440561
Q = 0.5526761115697276
Q = 0.5532823396206515
Q = 0.5538883819968279
Q = 0.5544944243730042
Q = 0.5551004667491806
Q = 0.5557065091253569
Q = 0.5569135806595242
Q = 0.5587209386526921
Q = 0.5593269810288685
Q = 0.5599330234050448
Q = 0.5605390657812211
Q = 0.5617394530244747
Q = 0.562345495400651
Q = 0.5629515377768274
Q = 0.563557394478256
Q = 0.5641630655049371
Q = 0.5647687365316182
Q = 0.5653744075582993
Q = 0.5659800785849803
Q = 0.5665857496116614
Q = 0.5671914206383425
Q = 0.5677970916650236
Q = 0.5684027626917046
Q = 0.5690082480436381
Q = 0.5696137333955716
Q = 0.5702192187475051
Q = 0.5708247040994386
Q = 0.5720174786780405
Q = 0.57322287913

Q = 0.7601238227756816
Q = 0.7605783545578138
Q = 0.7610046637783103
Q = 0.7614309729988069
Q = 0.7618565395203131
Q = 0.7622774641731291
Q = 0.7626584687552106
Q = 0.7630246193574838
Q = 0.7633571628304409
Q = 0.7636880352306694
Q = 0.764018907630898
Q = 0.7643358544250564
Q = 0.7646511301464863
Q = 0.7649647347951879
Q = 0.765276668371161
Q = 0.7655858168259201
Q = 0.7658932942079507
Q = 0.7661991005172529
Q = 0.7665032357538267
Q = 0.7667897318893782
Q = 0.7670114275380162
Q = 0.767232751837159
Q = 0.7674466491463977
Q = 0.767632695243496
Q = 0.7677751077749075
Q = 0.767912878437629
Q = 0.7680460072316604
Q = 0.7681744941570017
Q = 0.7682760582439405
Q = 0.7683739088359273
Q = 0.768470831054176
Q = 0.7685640397774728
Q = 0.7686086017168975
Q = 0.7686078590179071
Q = 0.7686071163189166
Q = 0.7686063736199262
Q = 0.7686056309209358
Q = 0.7686048882219454
Q = 0.768604145522955
Q = 0.7686034028239646
Q = 0.7686026601249741
Q = 0.7686019174259837
Q = 0.7686011747269933
Q = 0.768600432028

**The modularity of the best community decomposition found for this network is 0.7686086017168975 (relative to the initial decomposition). There are many other decompositions of similar quality: as the results above show, a long run of decompositions on either side of the maximum has a modularity of about 0.768.**

Let us describe an algorithm called *community finder*.  Rather than find a community decomposition, it just finds the community that a particular vertex $v$ belongs to.   The input to community finder is a network $N$, one of its nodes $v$ and a threshold value $a$ (a positive real number).  Let $d_i$ be the number of edges between nodes at distance $i$ from $v$ to nodes at distance $i+1$ from $v$.  So $d_0$ is just the degree of $v$.  Let $\Delta_i=d_i/d_{i-1}$.   Community finder calculates $\Delta_1, \Delta_2, \ldots$ and stops when it finds a value $\ell$ such that $\Delta_\ell < a$.  Then the community of $v$ is simply all nodes at distance at most $\ell$ from $v$.
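As a small worked illustration of the definition, take the seven-node network defined earlier in this notebook as `test_graph`, with $v = 1$ and $a = 1.5$. The choice of network, vertex and threshold is an assumption made purely for illustration. The nodes at each distance from $v$ are $\{1\}$, $\{2, 3\}$, $\{4, 5\}$ and $\{6, 7\}$, so

$$
\begin{array}{ll}
d_0 = 2 & \text{(edges 1-2 and 1-3)} \\
d_1 = 2 & \text{(edges 3-4 and 3-5)} \\
d_2 = 2 & \text{(edges 5-6 and 5-7)} \\
\Delta_1 = d_1 / d_0 = 1 < 1.5 & \text{so the algorithm stops with } \ell = 1
\end{array}
$$

The community of node 1 is then every node at distance at most 1 from it, namely $\{1, 2, 3\}$.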

5. [3 marks] Does the definition of community suggest that the community finder algorithm is likely to succeed in finding communities?  In what circumstances might it perform well or badly?

**The definition of community suggests that the community finder algorithm is unlikely to succeed at finding communities. It would perform well in graphs with a large diameter: these graphs have greater distances between nodes, so the algorithm is less likely to run out of nodes at distance $\ell + 1$ before it reaches a value $\Delta_\ell$ that is less than $a$, which is what terminates it normally. This means the algorithm would not terminate due to an error. However, in graphs with a smaller diameter, the algorithm would perform badly because it would run into this problem more frequently and terminate before finding a community, unless the parameter $a$ were finely tuned for each vertex $v$.**

6. [6 marks] Implement community finder and apply to the network of `zachary.txt` and comment on the results compared to your findings of Question 3.  You will need to run it for various choices of $v$ and $a$.

In [10]:
# returns dictionary of distances from v as keys and nodes at those distance from v as values
def breadth_first_search(graph, v):
    
    # create queue containing source vertex
    queue = [v]
    distance_nodes_dict = {
        0: set([v])
    }
    nodes_distances_dict = {
        v: 0
    }
    
    while queue:
        
        u = queue.pop(0)
        u_distance = nodes_distances_dict[u]
        
        neighbours = graph[u]
        new_distance = u_distance + 1
        
        
        for neighbour in neighbours:
            
            if neighbour not in nodes_distances_dict:
                
                queue.append(neighbour)
                
                if new_distance not in distance_nodes_dict:
                    distance_nodes_dict[new_distance] = set()
            
                distance_nodes_dict[new_distance].add(neighbour)
                nodes_distances_dict[neighbour] = new_distance
    
    return distance_nodes_dict

In [11]:
# calculates delta_i = d_i / d_{i-1}
def delta_calculator(graph, distance_nodes_dict, v, i):
    
    edges_i = edges_i_minus_1 = 0
    
    # d_i counts the edges from nodes at distance i from v to nodes at distance i + 1 from v;
    # d_{i-1} counts the edges from nodes at distance i - 1 to nodes at distance i
    nodes_distance_i_minus_1 = distance_nodes_dict[i - 1]
    nodes_distance_i = distance_nodes_dict[i]
    nodes_distance_i_plus_1 = distance_nodes_dict[i + 1]  # raises KeyError if no node is at distance i + 1
    
    # a separate loop variable is used so that the parameter i is not shadowed
    for node in nodes_distance_i:
        
        for j in nodes_distance_i_plus_1:
            if node in graph[j]:
                edges_i += 1
        
        for k in nodes_distance_i_minus_1:
            if node in graph[k]:
                edges_i_minus_1 += 1
    
    delta_i = edges_i / edges_i_minus_1
    return delta_i

In [40]:
# finds and returns the community that v belongs to
def community_finder(graph, v, a):
    
    l = 0
    delta_l = float('inf')
    distance_nodes_dict = breadth_first_search(graph, v)
    
    # calculate delta l starting at l = 1 and stop if delta l found smaller than a
    while delta_l >= a:
        
        l += 1
        delta_l = delta_calculator(graph, distance_nodes_dict, v, l)
        print(l, delta_l)

    # add all nodes at distance <= l from v to community
    community = []
    for distance in distance_nodes_dict:
        if distance <= l:
            community += (list(distance_nodes_dict[distance]))
        
    return set(sorted(community))

In [43]:
community = community_finder(karate_graph, 3, 4)
print(community, len(community))

# Newman's algorithm results
#[{2, 3, 4, 8, 10, 13, 14, 18, 22}, {32, 33, 34, 9, 15, 16, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 
#  {1, 5, 6, 7, 11, 12, 17, 20}] Q = 0.430
#[{32, 33, 34, 9, 15, 16, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 
#  {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 17, 18, 20, 22}] Q = 0.422

1 3.4
{1, 2, 3, 4, 33, 8, 9, 10, 14, 28, 29} 11


In [44]:
community = community_finder(karate_graph, 3, 2)
print(community, len(community))

1 3.4
2 0.20588235294117646
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 31, 32, 33, 34} 31


In [47]:
community = community_finder(karate_graph, 12, 1.5)
print(community, len(community))

1 15.0
2 1.1333333333333333
{32, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22} 17


In [50]:
community = community_finder(karate_graph, 12, 1.1)
print(community, len(community))

1 15.0
2 1.1333333333333333
3 1.0
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 20, 22, 25, 26, 28, 29, 31, 32, 33, 34} 26


In [53]:
community = community_finder(karate_graph, 20, 1.1)
print(community, len(community))

1 12.333333333333334
2 0.16216216216216217
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 27, 28, 29, 30, 31, 32, 33, 34} 31


In [54]:
community = community_finder(karate_graph, 20, 13)
print(community, len(community))

1 12.333333333333334
{1, 2, 20, 34} 4


In [63]:
community = community_finder(karate_graph, 17, 1.5)
print(community, len(community))

1 2.0
2 3.0
3 1.25
{32, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 20, 22} 18


In [64]:
community = community_finder(karate_graph, 17, 1.2)
print(community, len(community))

1 2.0
2 3.0
3 1.25
4 1.1333333333333333
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 20, 22, 25, 26, 28, 29, 31, 32, 33, 34} 26


**The results of community finder on the karate club graph make an interesting comparison with those of Newman's algorithm. With larger values of $a$, the algorithm stops earlier and, for most of the nodes tested, the community found is smaller than the corresponding community in the second-best decomposition found by Newman's algorithm (two communities, Q = 0.422). With smaller values of $a$, the community found is larger than the corresponding one from Newman's algorithm. Whether they are smaller or larger, the communities largely overlap: for all nodes tested, most of their members are the same.**

7. [6 marks] You could run community finder for every node in a network, but then you would have as many communities as there are nodes and they would overlap.  Suggest how community finder could be used as the basis of a community decomposition algorithm (that, as usual, partitions the nodes into disjoint sets) and test your idea on the network of `zachary.txt`.  
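One possible approach, offered as a sketch under stated assumptions rather than a model answer: repeatedly pick the highest-degree node that has not yet been assigned, run community finder on the subgraph induced by the unassigned nodes, record the result as one community, and remove its nodes. The names `bfs_layers`, `community_of` and `decompose` are hypothetical, and so is the rule that a search which never finds $\Delta_\ell < a$ returns every reachable node. The demonstration uses the seven-node test network, not `zachary.txt`.

```python
from collections import deque


def bfs_layers(graph, v):
    """Return a list of sets; layers[i] holds the nodes at distance i from v."""
    dist = {v: 0}
    layers = [{v}]
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(layers):
                    layers.append(set())
                layers[dist[w]].add(w)
                queue.append(w)
    return layers


def community_of(graph, v, a):
    """Community finder: nodes within distance l of v, for the first l with d_l / d_{l-1} < a."""
    layers = bfs_layers(graph, v)
    # d[i] = number of edges between layer i and layer i + 1
    d = [sum(1 for u in layers[i] for w in graph[u] if w in layers[i + 1])
         for i in range(len(layers) - 1)]
    radius = len(layers) - 1  # assumption: if no ratio falls below a, take everything reachable
    for l in range(1, len(d)):
        if d[l] / d[l - 1] < a:
            radius = l
            break
    return set().union(*layers[:radius + 1])


def decompose(graph, a):
    """Partition the nodes by repeatedly carving out the community of a high-degree seed."""
    unassigned = set(graph)
    communities = []
    while unassigned:
        # restrict the graph to the nodes that are still unassigned
        sub = {u: [w for w in graph[u] if w in unassigned] for u in unassigned}
        # highest remaining degree; ties go to the smallest node label
        seed = max(sorted(sub), key=lambda u: len(sub[u]))
        found = community_of(sub, seed, a)
        communities.append(found)
        unassigned -= found
    return communities


lecture_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5],
                 5: [3, 4, 6, 7], 6: [5, 7], 7: [5, 6]}
partition = decompose(lecture_graph, 1.5)
print(partition)
```

Because each community is removed before the next seed is chosen, the resulting sets are disjoint and cover every node. The result depends on the seed rule and on $a$, so both would need tuning before applying the idea to `zachary.txt`.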

----
#### Newman's agglomerative algorithm for community detection
 
If you can find a library to provide the algorithm, you can use that although it might require you to define networks in a different way.  Here is an outline approach to the implementation.  You should first reread the description of the algorithm.

Define a dictionary called, say, `communities` that throughout the run of the algorithm will contain as keys the current communities (so the keys will keep changing as communities are merged).  The values in the dictionary will record the nodes in the community and the number of endpoints incident with those nodes.

Define a dictionary called, say, `pairs` that throughout the run of the algorithm will contain as keys each pair of current communities.  The values in the dictionary will record the number of edges between the pair.

Define a function called, say, `deltaQ` whose input is a pair of communities and whose output is the change in modularity if that pair of communities is merged. 
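For reference, the change in modularity on merging communities $i$ and $j$ can be written in terms of the quantities stored in the two dictionaries. The symbols $m$ (total number of edges), $E_{ij}$ (edges between the pair) and $k_i$ (endpoints incident with community $i$) are introduced here for illustration:

$$ \Delta Q = 2\,(e_{ij} - a_i a_j) = \frac{E_{ij}}{m} - 2 \cdot \frac{k_i}{2m} \cdot \frac{k_j}{2m}, \qquad e_{ij} = \frac{E_{ij}}{2m}, \quad a_i = \frac{k_i}{2m} $$

As a check against the example output, merging $\{1\}$ and $\{2\}$ in the lecture network ($m = 9$, $E_{12} = 1$, $k_1 = k_2 = 2$) gives

$$ \Delta Q = \frac{1}{9} - 2 \cdot \frac{2}{18} \cdot \frac{2}{18} = \frac{7}{81} \approx 0.086 $$

which matches the first step of the example.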

Create a priority queue that contains each pair of communities keyed by the change in modularity if that pair were merged.

Then the algorithm repeatedly picks the pair of communities from the priority queue whose merger will give the maximum change in modularity and merges that pair (and updates `communities` and `pairs` and pushes new pairs onto the priority queue).

For the priority queue you could use

```python
from queue import PriorityQueue
```

but note that this only allows queues from which you can pick the entry with the minimum value.  You can make such a queue perform as a *maximum* priority queue by multiplying all the keys by -1.
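The negation trick can be sketched as follows. The community labels and modularity changes in `candidate_merges` are made-up values for illustration only.

```python
from queue import PriorityQueue

# hypothetical modularity changes for three candidate merges
candidate_merges = {("A", "B"): 0.086, ("B", "C"): 0.124, ("C", "D"): -0.010}

queue = PriorityQueue()
for pair, delta_q in candidate_merges.items():
    # store the negated change so that the largest change is retrieved first
    queue.put((-delta_q, pair))

neg_delta_q, best_pair = queue.get()
best_delta_q = -neg_delta_q  # undo the negation
print(best_pair, best_delta_q)
```

Entries are compared as tuples, so when two merges have the same change in modularity the tie is broken by comparing the pairs themselves, which therefore need to be mutually comparable.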

Below is the network from the lecture that you can use to test your implementation.

Note that, with a reasonable implementation, no computation required by this exercise should take more than a couple of minutes.