# Homework 5 - Visit the Wikipedia hyperlinks graph!
In this assignment we analyse the Wikipedia hyperlink graph. In particular, given extra information about the categories to which each article belongs, we want to rank the articles according to some criteria.

In [1]:
import pandas as pd
import json
import pickle

from collections import defaultdict
from heapq import heappop, heappush
import numpy as np
from tqdm import tqdm

## Research questions


### **[RQ1]** 
Build the graph <img src="https://latex.codecogs.com/gif.latex?G=(V,&space;E)" title="G=(V, E)" /> where *V* is the set of articles and *E* the hyperlinks among them, and provide its basic information:
 
- Whether it is directed or not
- The number of nodes
- The number of edges 
- The average node degree. Is the graph dense?

###### Build the graph!

In [2]:
grafo = {}  # adjacency sets: source article -> set of linked articles
reciver_nodes = set()  # every node that receives at least one link
with open('wiki-topcats-reduced.txt', 'r') as F:
    rows = F.read().split('\n')  # split the rows
for row in rows:
    link = row.split('\t')
    if link[0] not in grafo:  # add the vertex if it doesn't exist
        grafo[link[0]] = set()
    try:
        grafo[link[0]].add(link[1])  # add the edge
        reciver_nodes.add(link[1])
    except IndexError:  # the row has no tab-separated target
        print('empty row')

empty row


In [3]:
onlyreciver=(reciver_nodes-set(grafo.keys()))
for node in onlyreciver:
    grafo[node]=set()

###### Find out if it's directed or not:

In [4]:
for neighbors in grafo['52']:
    print('52' in grafo[neighbors])

False
False
False


We see that the graph is directed: node `'52'` links to these neighbours, but none of them links back, so the edges are not symmetric.
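The check above looks at a single node. As a rough sketch of how the same idea could be extended to the whole graph, the function below (a hypothetical helper named `reciprocity`, not part of the original notebook) measures the fraction of edges whose reverse edge also exists. It assumes the same adjacency structure as `grafo`, a dictionary mapping each node to a set of neighbours, and is shown here on a toy graph only.

```python
def reciprocity(graph):
    """Fraction of edges (u, v) for which the reverse edge (v, u) also exists."""
    total = 0
    mutual = 0
    for u, neighbours in graph.items():
        for v in neighbours:
            total += 1
            # the edge is reciprocated if u appears among the neighbours of v
            if u in graph.get(v, ()):
                mutual += 1
    return mutual / total if total else 0.0


# toy graph (illustrative data): a->b, a->c, b->a
toy = {'a': {'b', 'c'}, 'b': {'a'}, 'c': set()}
print(reciprocity(toy))
```

Any value below 1 means that at least one edge has no mirror edge, which is enough to treat the graph as directed.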

###### Get the number of nodes!

In [5]:
nodes=len(grafo)
nodes

461194

###### Get the number of edges!

In [6]:
edges = sum(len(neighbours) for neighbours in grafo.values())  # one edge per stored neighbour
edges

2645247

###### Get the average node degree. Is the graph dense?

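In graph theory, the degree of a vertex is the number of edges incident to it. The degree of a vertex $v$ is denoted $\deg(v)$.

In a directed graph every edge contributes exactly one outgoing link, so the average (out-)degree is $\frac{E}{N}$, where $E$ is the number of edges and $N$ the number of nodes: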


In [7]:
avg_degree= edges/nodes
avg_degree

5.735649206190887

As we see, the average node degree is about 5.7, slightly less than six.
In mathematics, a dense graph is a graph in which the number of edges is close to the maximal number of edges.
Looking at our numbers, we can immediately say that the graph is not dense, but very sparse.

To confirm this mathematically, we can compute the density of a directed graph as $D={\frac{|E|}{|N|\,(|N|-1)}}$

In [8]:
density = edges/(nodes*(nodes-1))
density

1.2436548703451455e-05

Our hypothesis is verified.
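The quantities computed so far are linked: the density equals the average degree divided by $N-1$, which is why an average degree of about 5.7 on roughly 461 thousand nodes gives a density of the order of $10^{-5}$. The sketch below gathers the four numbers in one hypothetical helper, `graph_summary`, assuming a dictionary of neighbour sets like `grafo`; the toy graph is illustrative data.

```python
def graph_summary(graph):
    """Return (nodes, edges, average degree, density) of a directed graph."""
    nodes = len(graph)
    edges = sum(len(neighbours) for neighbours in graph.values())
    avg_degree = edges / nodes
    # a directed graph has at most nodes * (nodes - 1) edges (no self-loops)
    density = edges / (nodes * (nodes - 1))
    return nodes, edges, avg_degree, density


# toy graph (illustrative data): a->b, a->c, b->c
toy = {'a': {'b', 'c'}, 'b': {'c'}, 'c': set()}
nodes, edges, avg_degree, density = graph_summary(toy)
print(nodes, edges, avg_degree, density)
```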

For completeness we create a dictionary that maps every page id to its name:

In [9]:
articles = {}  # page id -> page title
with open('wiki-topcats-page-names.txt', 'r') as F:
    for line in F:
        num, *tit = line.split()  # first token is the id, the rest is the title
        articles[num] = ' '.join(tit)

### **[RQ2]** 
Given a category <img src="https://latex.codecogs.com/gif.latex?C_0&space;=&space;\{article_1,&space;article_2,&space;\dots&space;\}" title="C_0 = \{article_1, article_2, \dots \}" /> as input we want to rank all of the nodes in *V* according to the following criteria:
	
* Obtain a *block-ranking*, where the blocks are represented by the categories. In particular, we want:

<img src="https://latex.codecogs.com/gif.latex?block_{RANKING}&space;=\begin{bmatrix}&space;C_0&space;\\&space;C_1&space;\\&space;\dots&space;\\&space;C_c\\&space;\end{bmatrix}" title="block_{RANKING} =\begin{bmatrix} C_0 \\ C_1 \\ \dots \\ C_c\\ \end{bmatrix}" />
	
Each category $C_i$ corresponds to a list of nodes. 

The first category of the rank, $C_0$, always corresponds to the input category. The order of the remaining categories is given by:

<img src="https://latex.codecogs.com/gif.latex?$$distance(C_0,&space;C_i)&space;=&space;median(ShortestPath(C_0,&space;C_i))$$" title="distance(C_0, C_i) = median(ShortestPath(C_0, C_i))" />

The lower the distance from $C_0$, the higher the position of $C_i$ in the rank. $ShortestPath(C_0, C_i)$ is the set of all the possible shortest paths between the nodes of $C_0$ and $C_i$. Moreover, the length of a path is given by the sum of the weights of the edges it is composed of.
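To make the definition concrete, here is a minimal sketch of how the median behaves when some pairs of nodes are not connected. It assumes that an unreachable pair is recorded with length infinity, which is a modelling choice and not something the assignment text prescribes. The helper name `category_distance` and the lists of lengths are made up for illustration, and the standard library `statistics.median` is used here as a stand-in for `np.median`.

```python
from statistics import median

INF = float('inf')


def category_distance(pair_lengths):
    """Median of the shortest-path lengths over all pairs (u, v), u in C_0 and v in C_i."""
    return median(pair_lengths)


# illustrative lengths: one unreachable pair out of five
mostly_reachable = [1, 2, 2, 3, INF]
# illustrative lengths: two unreachable pairs out of three
mostly_unreachable = [2, INF, INF]

print(category_distance(mostly_reachable))
print(category_distance(mostly_unreachable))
```

As long as fewer than half of the pairs are unreachable the median stays finite; otherwise it becomes infinite and some extra rule is needed to order such categories.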



###### Creating the categories dictionary

We create a dictionary that associates every category with its articles.

In [4]:
categorie = {}  # category name -> list of article ids
with open('wiki-topcats-categories.txt', 'r') as F:
    for line in F:
        riga = line.split(' ')
        categoria = riga[0].replace('Category:', '').replace(';', '')
        # use a separate name so the `articles` id-to-title dictionary is not overwritten
        cat_articles = riga[1:-1]
        cat_articles.append(riga[-1].replace('\n', ''))
        categorie[categoria] = cat_articles

Some categories contain articles that are not in our graph; we take care of them after the next step.

First, we must keep only the categories that have at least 3500 articles, so we clean up:

In [5]:
categories={}
for cat in categorie:
    if len(categorie[cat])>=3500:
        categories[cat]=categorie[cat]

We remove every article that is not in our graph:

In [9]:
for cat in categories:
    # keep only the articles that appear as nodes of the graph
    categories[cat] = [art for art in categories[cat] if art in grafo]

        

We store our final categories dictionary in an external pickle file so we can reload it when needed.

In [12]:
with open('categories.pickle', 'wb') as handle:
    pickle.dump(categories, handle, protocol=pickle.HIGHEST_PROTOCOL)


In [4]:
with open('categories.pickle', 'rb') as handle:
    categories = pickle.load(handle)

In [5]:
len(categories)

35

###### The block ranking

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/DijkstraDemo.gif/220px-DijkstraDemo.gif)

We can try to solve the problem with Dijkstra's algorithm:

In [13]:
def dijkstra(graph, f, t):
    q = [(0, f)]  # initializing the priority queue with (cost, node)
    seen = set()  # initializing the set of settled nodes
    distances = {f: 0}  # initializing the dict of the best known distances
    while q:
        (cost, v1) = heappop(q)
        if v1 not in seen:
            seen.add(v1)
            if v1 == t:
                return cost
            # use the graph passed as argument (the original copied the global `grafo`)
            for v2 in graph.get(v1, ()):
                if v2 in seen:
                    continue
                prev = distances.get(v2, None)
                new = cost + 1  # every edge has weight 1
                if prev is None or new < prev:
                    distances[v2] = new
                    heappush(q, (new, v2))

    return float("inf")


In [14]:
def catdistance(graph, catdict, cat1, cat2):
    cat1nodes= catdict[cat1]
    cat2nodes= catdict[cat2]
    distances=[]
    for node in cat1nodes:
        for node2 in cat2nodes:
            distances.append(dijkstra(graph=graph, f=node, t=node2))
    return np.median(distances)

In [15]:
catdistance(cat1='Japanese_rock_music_groups', cat2='Visual_kei_bands', catdict=categories, graph=grafo)


2.5

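This works but is really slow, because it runs one search for every pair of nodes. We want something more efficient. We can try a different approach using the BFS algorithm, since the edges are not weighted.

BFS should be faster since its running time is $\mathcal{O}{\bigl (}N+E{\bigr )}$, while Dijkstra's algorithm with a binary heap (as in `heapq`) takes $\mathcal{O}{\bigl (}(N+E)\log N{\bigr )}$, or $\mathcal{O}{\bigl (}N^{2}{\bigr )}$ with a plain array. Moreover, a single BFS returns the distances from its source to every reachable node, so we need one run per node of the input category instead of one per pair.

We create a function that takes a graph and a node as input and returns a dictionary of the distances from that node to every other reachable node: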

In [80]:
def bfs(graph, node):
    shortestpath={}
    visited=set()
    queue=[node]
    dist=0
    while queue:
        vicini = [] #neighbours of the nodes
        for vertex in queue:
            if vertex not in visited:
                visited.add(vertex)
                neighbours = graph[vertex]
                vicini.extend(neighbours - visited) 
                shortestpath[vertex] = dist
        queue=vicini
        dist += 1
    return shortestpath
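For comparison, the same level-by-level idea can be written with `collections.deque`, which assigns each node its distance the first time it is discovered and so never enqueues a node twice. This is only a sketch of an alternative formulation, not the function used in the rest of the notebook; the name `bfs_deque` and the toy graph are assumptions made for the example.

```python
from collections import deque


def bfs_deque(graph, source):
    """Return {node: number of edges on a shortest path from source} for reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            # the first time we meet v we have found its shortest distance
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# toy graph (illustrative data); 'e' links to 'a' but cannot be reached from it
toy = {'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}, 'd': set(), 'e': {'a'}}
print(bfs_deque(toy, 'a'))
```

Nodes that cannot be reached from the source, such as `'e'` here, are simply absent from the returned dictionary, which matches the behaviour of `bfs` above.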

Now we want a function that collects the BFS results starting from every node of the input category.

We also store this list in an external file, since it is a long operation.

In [88]:
def solutiondata(graph, inp_cat, catdict):
    inpnodes = catdict[inp_cat]
    bfssol = []  # one distance dictionary per node of the input category
    for node in tqdm(inpnodes):
        bfssol.append(bfs(graph, node))  # use the graph passed as argument, not the global one
    with open('bfssol.pickle', 'wb') as handle:
        pickle.dump(bfssol, handle, protocol=pickle.HIGHEST_PROTOCOL)


Solving the task:

In [8]:
inputcat=input()

Year_of_birth_unknown


In [89]:
solutiondata(catdict=categories, graph=grafo, inp_cat=inputcat)







 96%|███████████████████████████████████████████████████████████████████████████   | 2177/2264 [17:22<00:49,  1.77it/s]







In [12]:
with open('bfssol.pickle', 'rb') as handle:
    bfssol = pickle.load(handle)

Now we want to create a dict that has every reached node of the graph as key and, as value, the list of shortest-path lengths from the nodes of the input category.

In [13]:
finaldict = {}

for diz in (bfssol):
    for vert, value in diz.items():
        if vert not in finaldict:
            finaldict[vert] = [value] 
        else:
            finaldict[vert].append(value)

In [14]:
with open('finaldict.pickle', 'wb') as handle:
    pickle.dump(finaldict, handle, protocol=pickle.HIGHEST_PROTOCOL)
    

In [6]:
with open('finaldict.pickle', 'rb') as handle:
    finaldict = pickle.load(handle)

Now we can finally build our blockranking:

In [9]:
blockranking=[(0, inputcat)]
blockranking

[(0, 'Year_of_birth_unknown')]

In [10]:
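lung = len(categories[inputcat])  # number of nodes in the input category
for cat in tqdm(categories):
    if cat != inputcat:
        paths = []
        catnodes = categories[cat]
        for node in catnodes:
            if node in finaldict:
                paths += finaldict[node]
                # input nodes that cannot reach `node` count as infinite distance
                imppaths = lung - len(finaldict[node])
                paths += [float('inf')] * imppaths
            else:
                # `node` is unreachable from every input node
                paths += [float('inf')] * lung
        median = np.median(paths)
        if median == float('inf'):
            # order the unreachable categories by their number of infinite paths
            median = 10000 + paths.count(float('inf'))
        blockranking.append((median, cat))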


100%|██████████████████████████████████████████████████████████████████████████████████| 35/35 [01:57<00:00,  1.11it/s]


In [11]:
print(sorted(blockranking))

[(0, 'Year_of_birth_unknown'), (6.0, 'American_film_actors'), (6.0, 'American_films'), (6.0, 'American_television_actors'), (6.0, 'British_films'), (6.0, 'English-language_films'), (6.0, 'English_television_actors'), (7.0, 'American_Jews'), (7.0, 'Article_Feedback_Pilot'), (7.0, 'Black-and-white_films'), (7.0, 'Indian_films'), (7.0, 'Members_of_the_United_Kingdom_Parliament_for_English_constituencies'), (7.0, 'People_from_New_York_City'), (8.0, 'Debut_albums'), (8.0, 'English-language_albums'), (8.0, 'Fellows_of_the_Royal_Society'), (8.0, 'Rivers_of_Romania'), (9.0, 'Place_of_birth_missing_(living_people)'), (9.0, 'Windows_games'), (10.0, 'American_military_personnel_of_World_War_II'), (10.0, 'Living_people'), (10.0, 'The_Football_League_players'), (11.0, 'English_cricketers'), (11.0, 'Harvard_University_alumni'), (13.0, 'English_footballers'), (4971590, 'Association_football_goalkeepers'), (5502511, 'Association_football_defenders'), (5657119, 'Major_League_Baseball_pitchers'), (57220
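The ranking above replaces an infinite median with `10000 + ` the number of unreachable pairs, so that categories with an infinite median are still ordered among themselves. A possible alternative, sketched below under the same convention (unreachable pairs stored as infinity), is to sort on a tuple `(median, number of infinite paths)`, which avoids mixing two different quantities in one number. The function `rank_blocks`, the category names and the path lists are all hypothetical, and `statistics.median` stands in for `np.median`.

```python
from statistics import median

INF = float('inf')


def rank_blocks(input_cat, paths_by_cat):
    """Order categories by median path length, breaking infinite ties by unreachable pairs."""
    def key(cat):
        paths = paths_by_cat[cat]
        # the category name is the last tie-breaker, to keep the order deterministic
        return (median(paths), paths.count(INF), cat)

    others = sorted((c for c in paths_by_cat if c != input_cat), key=key)
    return [input_cat] + others  # the input category always comes first


# illustrative path lengths for four made-up categories
toy_paths = {
    'C0': [0],
    'near': [1, 2, 2],
    'far': [4, 5, INF],
    'cut_off_a': [3, INF, INF],
    'cut_off_b': [INF, INF, INF],
}
print(rank_blocks('C0', toy_paths))
```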

In [135]:
len(categories[inputcat])

2264

In [167]:
categories.keys()

dict_keys(['English_footballers', 'The_Football_League_players', 'Association_football_forwards', 'Association_football_goalkeepers', 'Association_football_midfielders', 'Association_football_defenders', 'Living_people', 'Year_of_birth_unknown', 'Harvard_University_alumni', 'Major_League_Baseball_pitchers', 'Members_of_the_United_Kingdom_Parliament_for_English_constituencies', 'Indian_films', 'Year_of_death_missing', 'English_cricketers', 'Year_of_birth_missing_(living_people)', 'Rivers_of_Romania', 'Main_Belt_asteroids', 'Asteroids_named_for_people', 'English-language_albums', 'English_television_actors', 'British_films', 'English-language_films', 'American_films', 'Fellows_of_the_Royal_Society', 'People_from_New_York_City', 'American_Jews', 'American_television_actors', 'American_film_actors', 'Debut_albums', 'Black-and-white_films', 'Year_of_birth_missing', 'Place_of_birth_missing_(living_people)', 'Article_Feedback_Pilot', 'American_military_personnel_of_World_War_II', 'Windows_gam



* Once you obtain the $"block_{RANKING}"$ vector, you want to sort the nodes in each category. The way you should sort them is explained by this example:

	*	Suppose the categories order, given from the previous point, is <img src="https://latex.codecogs.com/gif.latex?C_0,&space;C_1,&space;C_2" title="C_0, C_1, C_2" />


__[STEP1]__ Compute the subgraph induced by <img src="https://latex.codecogs.com/gif.latex?C_0" title="C_0" />. For each node compute the sum of the weights of the in-edges.

 <img src="https://latex.codecogs.com/gif.latex?score_{article_i}&space;=&space;\sum_{i&space;\in&space;in-edges}&space;w_i" title="score_{article_i} = \sum_{j \in in-edges(article_i)} w_j" />

__[STEP2]__ Extend the graph to the nodes that belong to <img src="https://latex.codecogs.com/gif.latex?C_1" title="C_1" />. Thus, for each article in <img src="https://latex.codecogs.com/gif.latex?C_1" title="C_1" /> compute the score as before. __Note__ that the in-edges coming from the previous category, <img src="https://latex.codecogs.com/gif.latex?C_0" title="C_0" />, have as weights the score of the node that sends the edge.


__[STEP3]__ Repeat Step2 up to the last category of the ranking. In the last step of the example you clearly see the weight update of the edge coming from node *E*.
	
![alt text](imgs/algorithm.PNG)
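The three steps can be sketched as follows. This is one possible reading of the procedure, under assumptions that the text does not state explicitly: edges between two nodes of the block currently being scored have weight 1, edges coming from any already scored block carry the score of the sending node, edges coming from blocks not yet processed are ignored, and a node that belongs to several categories is scored only in the first block where it appears. The function `block_scores` and the toy data are hypothetical.

```python
def block_scores(graph, ranked_blocks):
    """graph: node -> set of out-neighbours; ranked_blocks: lists of nodes in block-ranking order."""
    scores = {}
    for block in ranked_blocks:
        # nodes of this block that have not been scored in an earlier block
        current = {v for v in block if v not in scores}
        new_scores = dict.fromkeys(current, 0)
        # in-edges from already scored nodes weigh the score of the sender
        for u, score_u in scores.items():
            for v in graph.get(u, ()):
                if v in current:
                    new_scores[v] += score_u
        # in-edges from nodes of the same block weigh 1
        for u in current:
            for v in graph.get(u, ()):
                if v in current:
                    new_scores[v] += 1
        scores.update(new_scores)
    return scores


# toy data (illustrative): C_0 = {A, B, C}, C_1 = {D, E}
toy_graph = {'A': {'B', 'D'}, 'B': {'A', 'D'}, 'C': {'A'}, 'D': {'E'}, 'E': set()}
toy_blocks = [['A', 'B', 'C'], ['D', 'E']]
scores = block_scores(toy_graph, toy_blocks)
for block in toy_blocks:
    # inside each block, nodes are sorted by decreasing score
    print(sorted(block, key=lambda v: scores[v], reverse=True))
```

In the toy example `D` receives an edge from `A` and one from `B`, so its score is the sum of their scores from the first step rather than a plain count of its in-edges.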
