# Social Computing - Summer 2018

# Exercise 2 - Centrality
Centrality is a key concept in social network analysis. It measures the importance or influence of a certain node/edge in a network. The interpretation of importance or influence, however, depends on the type of centrality and the application for which it is measured. Different types of centrality were discussed in the lecture: Degree Centrality, Closeness Centrality, Betweenness Centrality, and Eigenvector Centrality.
In this exercise, you are going to implement different centrality algorithms.

### Introduction Problem: The Krackhardt Kite Graph
We will use the Krackhardt Kite for the first exercise. As you know from exercise 1, the Krackhardt Kite is a simple connected graph: unweighted and undirected. The following figure illustrates the Krackhardt Kite. <p>

<b>Plot the graph to make sure that all packages are correctly installed. Then calculate the degree centrality of the Krackhardt Kite Graph (just a list of 10 values - one for each node). You can use the implemented function of the igraph library. </b> <p>

Optional: Look at the graph and the list with the degree centrality values. Can you identify which node has which degree centrality?
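
For the optional question, it can help to pair each node index with its degree before comparing against the plot. The following is a minimal sketch in plain Python (no igraph needed). The edge list is the commonly published Krackhardt Kite; the 0-9 numbering is an assumption and may not match the labels igraph draws.

```python
# Edge list of the Krackhardt Kite (assumed numbering 0..9, see note above).
KITE_EDGES = [
    (0, 1), (0, 2), (0, 3), (0, 5),
    (1, 3), (1, 4), (1, 6),
    (2, 3), (2, 5),
    (3, 4), (3, 5), (3, 6),
    (4, 6),
    (5, 6), (5, 7),
    (6, 7),
    (7, 8),
    (8, 9),
]

# In an undirected graph every edge adds one to the degree of both endpoints.
kite_degree = [0] * 10
for u, v in KITE_EDGES:
    kite_degree[u] += 1
    kite_degree[v] += 1

for node, deg in enumerate(kite_degree):
    print(node, deg)
```

The degrees always sum to twice the number of edges, which is a quick sanity check for any degree list.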

<b>Important Note:</b><br>
Remember from exercise 1: you need to install the Cairo graphics library for Python to enable igraph plotting in the IPython Notebook:
http://cairographics.org/pycairo/ (an installation tutorial can be found on Piazza)

In [None]:
import igraph
#Import the Graph
krackhardt_kite = igraph.Graph.Famous('Krackhardt_Kite') # Connected, Unweighted, undirected social network
#Formatting the Graph
visual_style = {}
visual_style["vertex_size"] = 30
visual_style["bbox"] = (300, 300)
visual_style["margin"] = 20
#Plot the Graph
igraph.plot(krackhardt_kite, **visual_style)
# If the graph is plotted below, all libraries are correctly installed

#TODO Calculate the degree centrality for the 10 nodes in the graph.
print(krackhardt_kite.degree())
# Closeness and betweenness are printed as well, for comparison
print(krackhardt_kite.closeness())
print(krackhardt_kite.betweenness())

## Problem 1.1 Degree Centrality
Now you are working with an anonymized social network represented by the three files: FriendshipNetwork.graphml, FriendshipNetwork_Like.graphml, FriendshipNetwork_Comment.graphml. You can find them on Piazza. <p>
The nodes in all of these graphs are user profiles. The edges are the friend relationships (FriendshipNetwork.graphml), the exchanged likes (FriendshipNetwork_Like.graphml), and the comments written to others (FriendshipNetwork_Comment.graphml). <p>

<b>Your task in this exercise is to calculate the degree centrality of all the nodes in the three graphs. Also plot at least one of the graphs. </b><p>

Degree centrality of a graph node is the number of edges (incoming and outgoing) of that node.
Using igraph, write a Python program that calculates the degree centrality of each node in the three graphs. The output should be a list of integers (nodes with a centrality of 0 do not need to be listed, but can be). <p>

It is interesting to notice that the degree centrality for the three graphs can be quite different - even though the graphs are extracted from the same social network.
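
As a rough illustration of the definition above, the total degree of a node in a directed graph is its in-degree plus its out-degree. The sketch below uses plain Python and a small made-up edge list (the node names `a`, `b`, `c` are hypothetical and not taken from the FriendshipNetwork files).

```python
from collections import Counter

# Hypothetical directed edges (source, target); not real course data.
example_edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]

# Count how often each node appears as a source and as a target.
out_degree = Counter(src for src, _ in example_edges)
in_degree = Counter(dst for _, dst in example_edges)

# Total degree = outgoing + incoming edges. A Counter returns 0 for missing keys.
example_nodes = sorted(set(out_degree) | set(in_degree))
total_degree = {v: out_degree[v] + in_degree[v] for v in example_nodes}

print(total_degree)
```

Nodes that appear in no edge never show up in the edge list, so with this approach they would have to be added separately with a degree of 0.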

In [1]:
from igraph import *

#import your Graph here:
friends = Graph.Read_Ncol("friendship.csv")
likes = Graph.Read_Ncol("friendship_like.csv")
comments = Graph.Read_Ncol("friendship_comment.csv")

#visual formatting for the Graph
visual_style = {}
visual_style["vertex_size"] = 30
visual_style["vertex_label"] = friends.vs["name"]
visual_style["bbox"] = (800, 800)
visual_style["margin"] = 50

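#insert your code here
def calc(g):
    # Degree of node i = its outgoing edges (length of its own adjacency list)
    # plus one for every other node whose adjacency list contains i.
    degrees = []
    adjlist = g.get_adjlist()  # default mode is "out"
    for i in range(g.vcount()):
        total = 0  # avoid shadowing the built-in sum()
        for j in range(len(adjlist)):
            if i == j:
                total += len(adjlist[j])
            elif i in adjlist[j]:
                total += 1
        degrees.append(total)
    return degrees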

#plot one of the graphs
plot(friends, **visual_style)
print(calc(friends))
visual_style["vertex_label"] = likes.vs["name"]
plot(likes, **visual_style)
print(calc(likes))
visual_style["vertex_label"] = comments.vs["name"]
plot(comments, **visual_style)
print(calc(comments))


## Problem 1.2 Closeness Centrality
Closeness centrality measures how close a node is to the other nodes in the graph. It is based on the sum of the shortest-path distances from that node to all other nodes: the smaller this sum, the higher the closeness (the value is the reciprocal of the sum, usually scaled by the number of other nodes). <p>
<b>Write a Python program that computes the closeness centrality for each node in the Like-Graph </b> (FriendshipNetwork_Like.graphml) from our social network! The output should be a list where each item contains the value of the closeness centrality of a node.

<p><b>Remember:</b></p>

<li>Calculating the shortest paths is a common problem, maybe there is a pre-defined function for that?</li>
<li>The formula for the calculation can be found in the documentation or on Wikipedia</li>
<li>You are <b>not allowed</b> to use the pre-defined function closeness(), but you can use it as inspiration</li>
<li>The edges of the Like-Graph have weights: the number of likes that were sent from one node to the other.</li>
<li>When you calculate the shortest path you need to take the weights of the edges into account </li>
<li>You can print the list of node names with <b>print(YOURGRAPHNAME.vs["name"])</b></li>
<li>You can print the edge list with <b>print(YOURGRAPHNAME)</b></li>
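
The hints above can be sketched without igraph as follows. This is only an illustration under stated assumptions: the graph is a small made-up weighted digraph, the edge weights are used directly as distances, and closeness is taken as the number of reachable other nodes divided by the sum of the distances to them (one common convention). The function names `dijkstra` and `closeness_all` are hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Weighted shortest distances from source; adj maps node -> {neighbour: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for w, weight in adj[v].items():
            new_dist = d + weight
            if new_dist < dist.get(w, float("inf")):
                dist[w] = new_dist
                heapq.heappush(heap, (new_dist, w))
    return dist  # contains only nodes reachable from source

def closeness_all(adj):
    """Closeness per node: reachable others / sum of distances, 0.0 if nothing is reachable."""
    result = {}
    for v in adj:
        dist = dijkstra(adj, v)
        total = sum(d for node, d in dist.items() if node != v)
        reachable = len(dist) - 1
        result[v] = reachable / total if total > 0 else 0.0
    return result

# Hypothetical weighted, directed graph (not the Like-Graph itself).
example_graph = {"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {}}
example_closeness = closeness_all(example_graph)
print(example_closeness)
```

Here the direct edge from `a` to `c` has weight 5, but the path through `b` has total weight 2, so the weighted distance from `a` to `c` is 2 rather than 5. Counting hops alone would give 1.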


In [None]:
from igraph import *

likes = Graph.Read_Ncol("friendship_like.csv")

#visual style for the Graph
visual_style = {}
visual_style["vertex_size"] = 30
visual_style["vertex_label"] = likes.vs["name"]
visual_style["bbox"] = (800, 800)
visual_style["margin"] = 50

#TODO: Import graph and define Graph Style

#TODO: Calculate all shortest paths of each node

#TODO: Calculate a list where each item contains the value of the closeness centrality of a node
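centralities = []
vcount = likes.vcount()
# Note: no weights are passed here, so distances are hop counts
paths = likes.shortest_paths_dijkstra()
for i in range(vcount):
    total = 0  # avoid shadowing the built-in sum()
    for d in paths[i]:
        if d != float("inf"):
            total += d
        else:
            # unreachable node (infinite distance): counted as 1
            total += 1
    centralities.append(float(vcount - 1) / total)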

print(centralities)

## Problem 1.3 Betweenness centrality

Betweenness centrality measures centrality based on shortest paths. For every pair of vertices in a connected graph there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality of a vertex is the number of these shortest paths that pass through it. <p>
Vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others. <p>
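
To make the idea of "lying on shortest paths" concrete, here is a small sketch of Brandes' algorithm for an unweighted, undirected graph in plain Python. It is meant as an illustration only, not as a replacement for igraph's `betweenness()`; the function name `betweenness_unweighted` and the five-node example graph are hypothetical.

```python
from collections import deque

def betweenness_unweighted(adj):
    """Betweenness for an unweighted, undirected graph given as {node: [neighbours]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes and accumulate path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: a 4-cycle 0-1-2-3-0 with a pendant node 4 attached to node 0.
example_adj = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [0, 2], 4: [0]}
example_betweenness = betweenness_unweighted(example_adj)
print(example_betweenness)
```

In this example node 4 has only one neighbour and lies on no shortest path between other nodes, so its betweenness is 0, while node 0 connects node 4 to the rest of the graph and scores highest. When two shortest paths tie, each intermediate node receives a fraction of the pair's count.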

This time you are working with the FriendshipNetwork.graphml. <p>

<b>Calculate the betweenness centrality with the help of the pre-defined functions of the igraph library. Interpret the resulting values based on two exemplary nodes. </b> To do that, pick two nodes and explain how their betweenness centrality relates to the graph structure. Name the two nodes that you discussed (and their betweenness centrality). (Do not write more than 5 sentences)

<p><b>Remember:</b></p>

<li>You can print the list of node names with <b>print(YOURGRAPHNAME.vs["name"])</b></li>


In [26]:
from igraph import *

# use FriendshipNetwork.graphml
friends = Graph.Read_Ncol("friendship.csv")

#visual formatting for the Graph
visual_style = {}
visual_style["vertex_size"] = 30
visual_style["vertex_label"] = friends.vs["name"]
visual_style["bbox"] = (800, 800)
visual_style["margin"] = 50

print(friends.betweenness())

[189.10428813734046, 635.6578088806694, 39.675072150072154, 12.599633699633696, 774.3561377396086, 33.19813194234538, 397.83715893421197, 295.43924827905386, 13555.268982404457, 3735.640341293793, 4348.50282720292, 169.64285885479734, 14.801828341749927, 91.63172512231128, 5.964917932309236, 39.51563041476725, 229.11186371100166, 0.0, 576.308624203481, 1135.971146818244, 147.77372600052914, 19.052319902319898, 154.38494955564306, 44.896280461028766, 22.839263163798304, 11.828455123308059, 0.0, 1144.4619413623464, 0.0, 382.4065392071154, 2536.1030560634417, 6.257936507936509, 259.6841686091686, 1118.8969322469438, 535.3241079970754, 42.69780724916402, 59.808964553018285, 427.14038894312137, 245.25159366777015, 10.287276612276612, 491.0561165919653, 31.41537100595521, 11.182047107980363, 197.97066254450704, 850.8977070825855, 171.7534041517975, 0.0, 322.07485390277157, 76.20812023775368, 294.3676210775497, 404.5309202146149, 181.9877413891069, 367.1997631774569, 0.0, 70.45786598566868, 1

I chose the vertices labeled 4067 and 261. Vertex 261 has the higher betweenness centrality because it lies in the middle of the graph and is therefore more likely to be on shortest paths between other vertices. Vertex 4067 is one of the vertices far from the center with few connections, so it has a lower betweenness centrality (in fact 0).