## Self study 4

In this self study we implement PageRank and test it on some real graphs.

In [None]:
import numpy as np
import networkx as nx

We are using a social network consisting of 71 lawyers. A description of the network and the original data can be found here:

http://moreno.ss.uci.edu/data.html#lazega

Of the three different relationships included in the data we will only be using the 'friendship' relation. This is a directed relationship, i.e., friends(a,b) does not necessarily imply friends(b,a) according to the data.

We load a version of the Lazega network data that only contains the 'friends' edges:

In [None]:
# Note: read_graphml returns node ids as strings by default (e.g. '3', not 3)
lazega = nx.read_graphml('lazega.gml')

The nodes in the graph have the attributes "Practice", "Age", "Seniority", "Office", "Gender", and "Status". To obtain a dictionary with the values of a specified attribute for all nodes, we can use:

In [None]:
nx.get_node_attributes(lazega,'Office')

The following is a little helper function that returns an array of attribute values, ordered in the same way as the nodes returned by the G.nodes() function:

In [None]:
def get_att_array(G, att_name):
    """Return the (numeric) values of attribute att_name in G.nodes() order."""
    ret_array = np.zeros(nx.number_of_nodes(G))
    for i, n in enumerate(G.nodes()):
        ret_array[i] = G.nodes[n][att_name]
    return ret_array

We can use this to plot the graph with the Kamada-Kawai layout algorithm, coloring the nodes according to the Office attribute. (Note: the first time you run the following command you may only get a deprecation warning; run it again to get the plot.)

In [None]:
nx.draw_kamada_kawai(lazega,with_labels=True,node_color=get_att_array(lazega,'Office'))

**Task 1:** Implement the computation of PageRank on the Lazega graph (or on any NetworkX graph in general -- this makes no real difference). At the end, each node should have an additional attribute 'PR' whose value is equal to the PageRank of the node.

Some useful functions: to set an attribute value for a single node, one can use (a bit counterintuitively) the add_node function:

In [None]:
# Node ids are strings here; an integer 3 would create a new, separate node
lazega.add_node('3', PR=0.29)

nx.get_node_attributes(lazega,'PR')

To set an attribute for several nodes at once, one can use the set_node_attributes function, which takes a dictionary of node:value pairs to define the attribute:


In [None]:
newatt = {}
newatt['3']={'newatt':0.8}
newatt['15']={'newatt':1.2}



nx.set_node_attributes(lazega,newatt)
nx.get_node_attributes(lazega,'newatt')
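
When every node gets a value for the same attribute, `set_node_attributes` also accepts a flat node:value dictionary together with a `name` argument. The following is a minimal sketch; the toy graph `G` and its values are made up for illustration and are not part of the Lazega data:

```python
import networkx as nx

# Toy directed graph with string node ids, as in the GraphML file
G = nx.DiGraph()
G.add_edges_from([('1', '2'), ('2', '3'), ('3', '1')])

# Flat dictionary node -> value; the attribute name is passed separately
values = {'1': 0.2, '2': 0.3, '3': 0.5}
nx.set_node_attributes(G, values, name='PR')

pr = nx.get_node_attributes(G, 'PR')
print(pr)
```

This form is convenient at the end of a PageRank computation, when the ranks are typically already available as one dictionary.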

To get all outgoing edges of a node, and hence the neighbors reachable from it, we can use:

In [None]:
node = '5'  # a single node id (note that ('5') is just the string '5', not a tuple)
for e in nx.edges(lazega, nbunch=node):
    print(e)
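
For PageRank it is often more convenient to work with successor nodes and out-degrees directly rather than with edge tuples. A small sketch on a hypothetical toy graph (the node ids are invented for illustration):

```python
import networkx as nx

# Toy directed graph: '5' links to '1' and '2', and '1' links back to '5'
G = nx.DiGraph()
G.add_edges_from([('5', '1'), ('5', '2'), ('1', '5')])

succ = list(G.successors('5'))     # nodes reachable via outgoing edges
pred = list(G.predecessors('5'))   # nodes with an edge pointing to '5'
out_deg = G.out_degree('5')        # number of outgoing edges

print(succ, pred, out_deg)
```

The same calls should work on `lazega`, since it is loaded as a directed graph.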

**Task 2:** The definition of PageRank depends on the parameter 'd' that determines the relative weight of the 'teleportation' transitions (random restart of the random walk at an arbitrary node). Try your PR computations with different settings of 'd' and compare the results.
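
One possible way to approach Tasks 1 and 2 is power iteration. The sketch below is not a reference solution: the function name `pagerank_power`, the tolerance, and the toy graph are assumptions made for illustration. It also assumes the convention that `d` is the probability of following an outgoing edge and `1 - d` the probability of teleporting to a uniformly random node; if your definition uses `d` for the teleportation probability, the roles are swapped.

```python
import networkx as nx

def pagerank_power(G, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank sketch for a DiGraph; stores the result as 'PR'.

    Assumption: d = probability of following an edge, 1 - d = teleportation.
    """
    nodes = list(G.nodes())
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly
        dangling = sum(pr[v] for v in nodes if G.out_degree(v) == 0)
        new_pr = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            out_deg = G.out_degree(u)
            for v in G.successors(u):
                new_pr[v] += d * pr[u] / out_deg
        change = sum(abs(new_pr[v] - pr[v]) for v in nodes)
        pr = new_pr
        if change < tol:
            break
    nx.set_node_attributes(G, pr, name='PR')
    return pr

# Toy graph, invented for illustration
G = nx.DiGraph([('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'a')])
for d in (0.5, 0.85, 1.0):
    print(d, pagerank_power(G, d=d))
```

Under this convention, smaller values of `d` pull all ranks toward the uniform value 1/n, while `d = 1.0` corresponds to a pure random walk without teleportation.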

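**Task 3:** What happens when you drop the directions of the edges? Verify that the PageRank is now essentially determined by the degree of the nodes: without teleportation, the PageRank of a node in a connected undirected graph is proportional to its degree, and with teleportation this holds only approximately.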

To obtain an undirected version, you can use:

In [None]:
u_lazega=lazega.to_undirected()
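
The relation between PageRank and degree can be checked numerically on a small example before trying it on `u_lazega`. The sketch below assumes a plain random walk without teleportation on a connected, non-bipartite toy graph (a triangle with one pendant node, invented for illustration) and compares the limiting distribution with the normalized degrees:

```python
import numpy as np
import networkx as nx

# Toy undirected graph: a triangle with one pendant node
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
nodes = list(G.nodes())

A = nx.to_numpy_array(G, nodelist=nodes)   # symmetric adjacency matrix
P = A / A.sum(axis=1, keepdims=True)       # row-stochastic transition matrix

p = np.full(len(nodes), 1.0 / len(nodes))  # start from the uniform distribution
for _ in range(200):                       # random walk steps, no teleportation
    p = p @ P

degrees = np.array([G.degree(v) for v in nodes], dtype=float)
print(p)
print(degrees / degrees.sum())
```

Both printed vectors should agree up to numerical precision. With teleportation switched on, the ranks are pulled toward the uniform distribution, so the match with the normalized degrees is only approximate.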

**Task 4:** To face a somewhat bigger challenge, use the Stanford web graph, which you can download from http://snap.stanford.edu/data/web-Stanford.html. Does your implementation scale to a graph of this size?

You can read the web-Stanford.txt file using:

In [None]:
stanford=nx.read_edgelist('web-Stanford.txt',nodetype=int,create_using=nx.DiGraph()) 
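
Per-node Python loops can become slow on a graph of this size (roughly 280,000 nodes and 2.3 million edges). One option, sketched below under stated assumptions, is to work on edge arrays with NumPy instead of iterating over nodes. The function name `pagerank_edges` is hypothetical; it assumes that node ids have been remapped to the integers 0..n-1, that the edge list has no duplicate edges, and that `d` is the probability of following an edge.

```python
import numpy as np

def pagerank_edges(src, dst, n, d=0.85, tol=1e-10, max_iter=200):
    """Vectorized PageRank sketch on an edge list src[i] -> dst[i].

    Assumptions: node ids are 0..n-1, d = probability of following an edge.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    out_deg = np.bincount(src, minlength=n).astype(float)
    dangling = out_deg == 0                 # nodes without outgoing edges
    weight = 1.0 / out_deg[src]             # each edge carries 1/out_deg of its source
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Sum the rank arriving at every node over all incoming edges
        flow = np.bincount(dst, weights=p[src] * weight, minlength=n)
        new_p = (1.0 - d) / n + d * (flow + p[dangling].sum() / n)
        converged = np.abs(new_p - p).sum() < tol
        p = new_p
        if converged:
            break
    return p

# Tiny check on a directed 3-cycle, where all ranks must be equal
ranks = pagerank_edges([0, 1, 2], [1, 2, 0], n=3)
print(ranks)
```

For a NetworkX graph, the two arrays could be built from `G.edges()` after mapping each node to its position in `list(G.nodes())`.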