##### Let's change gears and talk about Game of Thrones, or shall I say Network of Thrones.

Surprising, right? What is the relationship between a fantasy TV show/novel and network science or Python (which, in this case, is not a dragon)?

![](images/got.png)

Andrew J. Beveridge, an associate professor of mathematics at Macalester College, and Jie Shan, an undergraduate, created a network from the book A Storm of Swords by extracting relationships between characters, in order to find the most important characters in the book (or GoT).

The dataset is publicly available for the 5 books at https://github.com/mathbeveridge/asoiaf. These are interaction networks, created by connecting two characters whenever their names (or nicknames) appeared within 15 words of one another in one of the books. The edge weight corresponds to the number of interactions.
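
To make the construction rule concrete, here is a minimal sketch of window-based co-occurrence counting on a made-up sentence. It is not the authors' actual pipeline: the function name `cooccurrence_edges`, the single-word name matching and the toy text are all assumptions for illustration, and nicknames are ignored.

```python
from collections import Counter
import re

def cooccurrence_edges(text, names, window=15):
    """Count pairs of distinct names appearing within `window` words of each other.

    Hypothetical helper: assumes every name is a single lowercase word.
    """
    # Lowercase the tokens so matching is case-insensitive.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Positions of every token that is a known name, in text order.
    hits = [(i, w) for i, w in enumerate(words) if w in names]
    edges = Counter()
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            (i, u), (j, v) = hits[a], hits[b]
            if j - i > window:
                break  # later hits are even farther away
            if u != v:
                # Sort the pair so (u, v) and (v, u) count as the same edge.
                edges[tuple(sorted((u, v)))] += 1
    return edges

# Made-up text, not taken from the books.
text = ("Jon spoke to Sam at the Wall. Later Sam wrote to Jon "
        "while Arya trained far away in Braavos.")
edges = cooccurrence_edges(text, names={"jon", "sam", "arya"}, window=5)
for (u, v), weight in sorted(edges.items()):
    print(u, v, weight)
```

Each counted pair plays the role of an edge, and the count plays the role of the `weight` column in the edge lists.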

Credits:

Blog: https://networkofthrones.wordpress.com

Math Horizons Article: https://www.maa.org/sites/default/files/pdf/Mathhorizons/NetworkofThrones%20%281%29.pdf

In [None]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import community
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

In [None]:
book1 = pd.read_csv('data/asoiaf-book1-edges.csv')
book2 = pd.read_csv('data/asoiaf-book2-edges.csv')
book3 = pd.read_csv('data/asoiaf-book3-edges.csv')
book4 = pd.read_csv('data/asoiaf-book4-edges.csv')
book5 = pd.read_csv('data/asoiaf-book5-edges.csv')

In [None]:
G_book1 = nx.Graph()
G_book2 = nx.Graph()
G_book3 = nx.Graph()
G_book4 = nx.Graph()
G_book5 = nx.Graph()

In [None]:
# Add one weighted edge per row of each book's edge list
for G, book in zip([G_book1, G_book2, G_book3, G_book4, G_book5],
                   [book1, book2, book3, book4, book5]):
    for _, row in book.iterrows():
        G.add_edge(row['Source'], row['Target'], weight=row['weight'], book=row['book'])

In [None]:
# Have a look at the edges of book 1 with data parameter True


### Finding the most important node, i.e. character, in these networks

We'll compare different centrality measures to find the importance of nodes in this network. There is no single right way of calculating importance; every approach captures a different meaning. Let's start with degree centrality, which is defined as the degree of a node divided by a normalising factor n-1, where n is the number of nodes.
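
As a quick check of that definition, the sketch below computes degree / (n - 1) by hand on a small hypothetical graph (the node names are made up, not characters from the dataset) and prints it next to the result of `nx.degree_centrality`.

```python
import networkx as nx

# Hypothetical toy network, not taken from the dataset.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

n = G.number_of_nodes()
# Degree of each node divided by the normalising factor n - 1.
manual = {node: degree / (n - 1) for node, degree in G.degree()}
builtin = nx.degree_centrality(G)

for node in G:
    print(node, manual[node], builtin[node])
```

Node "A" is connected to every other node, so its degree centrality is 1, the maximum possible value in a simple graph.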

In [None]:
# Use G.neighbors(node) to look at the neighbors of Jaime-Lannister


#### nx.degree_centrality(graph) returns a dictionary where keys are the nodes and values are the corresponding degree centralities. Let's find the ten most important characters according to degree centrality.

In [None]:
sorted(nx.degree_centrality(G_book1).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
# Plot a histogram of degree centrality


### Exercise

Create a new centrality measure, weighted_degree_centrality(Graph, weight), which takes a graph and the name of the weight attribute and returns a dictionary of weighted degree centralities. The weighted degree of a node is the sum of the weights of all its edges; normalise it by dividing by the total weight of the graph (the sum of the weighted degrees of all nodes). Then find the top five characters according to this measure.

In [None]:
def weighted_degree_centrality(G, weight):
    pass
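
One possible solution, shown as a sketch rather than the only correct answer. It relies on `G.degree(weight=...)`, which sums the given edge attribute over each node's edges; the toy graph used to try it out is hypothetical.

```python
import networkx as nx

def weighted_degree_centrality(G, weight):
    """Weighted degree of each node divided by the sum of all weighted degrees."""
    # G.degree(weight=...) yields (node, sum of edge weights) pairs.
    weighted_degree = dict(G.degree(weight=weight))
    total = sum(weighted_degree.values())
    if total == 0:
        # Graph without edges (or with all-zero weights): nothing to normalise.
        return {node: 0.0 for node in weighted_degree}
    return {node: value / total for node, value in weighted_degree.items()}

# Hypothetical toy graph to try the function on.
toy = nx.Graph()
toy.add_edge("A", "B", weight=3)
toy.add_edge("A", "C", weight=1)

toy_wdc = weighted_degree_centrality(toy, "weight")
top = sorted(toy_wdc.items(), key=lambda x: x[1], reverse=True)
print(top)
```

With this normalisation the values sum to 1, so each value can be read as the node's share of all interaction weight in the graph.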

##### Betweenness centrality
From Wikipedia:
For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.
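
A small sketch of this idea on a hypothetical toy graph: two triangles joined through a single node called "bridge". Every shortest path between the two triangles has to pass through that node, so it should come out on top, while nodes that lie on no shortest path between other nodes score 0.

```python
import networkx as nx

# Hypothetical toy network: two triangles joined through the node "bridge".
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "bridge"), ("bridge", "b1"),
])

# Unweighted betweenness, normalised by NetworkX to the range [0, 1].
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)

for node, value in sorted(bc.items(), key=lambda x: x[1], reverse=True):
    print(node, round(value, 3))
```

When a `weight` argument is passed, NetworkX uses it to compute weighted shortest paths, so the weights are interpreted as distances. In an interaction network, a high weight then means two characters are treated as farther apart, which is worth keeping in mind when reading the weighted results.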


In [None]:
# unweighted betweenness_centrality

sorted(nx.betweenness_centrality(G_book1).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
sorted(nx.betweenness_centrality(G_book1, weight='weight').items(), key=lambda x:x[1], reverse=True)[0:10]

#### PageRank
PageRank, the billion-dollar algorithm, works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
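
The idea can be sketched as a simple power iteration in plain Python. This is an illustration under stated assumptions, not NetworkX's implementation: the function name `pagerank_sketch`, the `links` dictionary and the fixed iteration count are made up for this example, and the damping factor of 0.85 matches the default `alpha` of `nx.pagerank`.

```python
def pagerank_sketch(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to;
    every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web: most pages link to "hub".
links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub", "a"]}
ranks = pagerank_sketch(links)
for page, value in sorted(ranks.items(), key=lambda x: x[1], reverse=True):
    print(page, round(value, 3))
```

The character networks here are undirected; NetworkX handles that case by converting each undirected edge into two directed edges before running PageRank.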

In [None]:
# pagerank uses the 'weight' edge attribute by default, so we pass weight=None to get the unweighted result

sorted(nx.pagerank(G_book1, weight=None).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
# find the weighted pagerank of this graph



### Is there a correlation between these techniques?

#### Exercise

Find the correlation between these three techniques.

In [None]:
# One row per measure: weighted PageRank, weighted betweenness, weighted degree centrality
cor = pd.DataFrame.from_records([nx.pagerank(G_book1, weight='weight'), nx.betweenness_centrality(G_book1, weight='weight'), weighted_degree_centrality(G_book1, 'weight')])

In [None]:
cor

#### What can we infer from this correlation matrix between these three methods?

In [None]:
cor.T.corr()

So far we have analysed only the first book, but what about the other 4 books? We can now look at how this character interaction network evolves from book to book, which adds a temporal dimension to the analysis.

In [None]:
evol = [weighted_degree_centrality(graph, 'weight') for graph in [G_book1, G_book2, G_book3, G_book4, G_book5]]

In [None]:
# create a dataframe from evol and fill N/A entries with 0
evol_df = pd.DataFrame.from_records(evol).fillna(0)
evol_df

In [None]:
pd.DataFrame.from_records(evol).max(axis=0).sort_values(ascending=False)[0:10]

##### Exercise

Plot the evolution of weighted degree centrality of the above mentioned characters over the 5 books, and repeat the same exercise for betweenness centrality.

Where is Stannis Baratheon in the degree centrality measure? Not even in the top 10. Strange?

#### Community detection in networks
A network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally.

We will use the Louvain community detection algorithm to find the modules in our graph.

In [None]:
# Louvain partition: maps each node to a community id
partition = community.best_partition(G_book1)
size = float(len(set(partition.values())))
pos = nx.spring_layout(G_book1)
count = 0.
for com in set(partition.values()):
    count = count + 1.
    list_nodes = [node for node in partition.keys()
                  if partition[node] == com]
    # Draw each community in its own shade of grey
    nx.draw_networkx_nodes(G_book1, pos, nodelist=list_nodes, node_size=20,
                           node_color=str(count / size))

nx.draw_networkx_edges(G_book1, pos, alpha=0.5)
plt.show()

In [None]:
d = {}
for character, par in partition.items():
    if par in d:
        d[par].append(character)
    else:
        d[par] = [character]
d

In [None]:
nx.draw(nx.subgraph(G_book1, d[1]))

In [None]:
nx.density(G_book1)

In [None]:
nx.density(nx.subgraph(G_book1, d[1]))

In [None]:
nx.density(nx.subgraph(G_book1, d[1]))/nx.density(G_book1)
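
The density comparison above mirrors the definition of community structure: a community should be noticeably denser than the graph as a whole. The sketch below shows the same effect on a hypothetical toy graph made of two 5-node cliques joined by a single edge. As an assumption for this example, it uses NetworkX's built-in `louvain_communities` (available from NetworkX 2.8) instead of the `community` package imported earlier, and fixes `seed` so the result is reproducible.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy network: two 5-node cliques joined by one edge (4-5).
G = nx.barbell_graph(5, 0)

# Each community is returned as a set of nodes.
communities = louvain_communities(G, seed=42)

overall = nx.density(G)
for members in communities:
    inside = nx.density(G.subgraph(members))
    # Ratio > 1 means the community is denser than the whole graph.
    print(sorted(members), inside, inside / overall)
```

On this toy graph each clique is fully connected internally, so the density inside a community is 1, well above the density of the whole graph.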

#### Exercise 

Find the most important node in the partitions according to pagerank, degree centrality and betweenness centrality of the nodes.