# Game Of Thrones Network Analysis
> "This section will generate and investigate the Game Of Thrones Network based on the characters and their interactions."

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [network, characters]
- hide: true
- search_exclude: false

In [1]:
#hide
import networkx as nx
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import plotly.express as px
import json
import requests
import os
import contextlib
import powerlaw
import plotly.graph_objects as go


import nltk
from community import community_louvain
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
nltk.download('punkt')
nltk.download('wordnet')
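# Load the pre-built network covering all seasons.
# Note: nx.read_gpickle was removed in NetworkX 3.0, so this cell assumes an older release.
G = nx.read_gpickle("/work/got_G.gpickle")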

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


<h1 align="center">How is the network generated?</h1>  

This section explains how the network of Game Of Thrones characters was produced. It is divided into two parts: the first describes how the network across all seasons is generated, and the second describes how a separate network is generated for each season. 

<h3 align="center"> Part 1 </h3>

This part describes the character network covering interactions across all seasons. To build it, the [Game Of Thrones Wiki-page](https://gameofthrones.fandom.com/wiki/Game_of_Thrones_Wiki) was scraped. The wiki contains a page for each season listing the characters present in that season. By iterating through these pages, the characters could be extracted with regular expressions. With the list of all characters present across the seasons in hand, the next task was to extract information about each character and find the links between them. 
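
The extraction step can be sketched with a regular expression over wiki markup. This is only an illustration: the sample text, the pattern and the `extract_characters` helper are assumptions, and the real season pages may be laid out differently.

```python
import re

# Hypothetical wikitext snippet; the real season pages may be formatted differently.
SAMPLE_WIKITEXT = """
==Cast==
* [[Sean Bean]] as [[Eddard Stark]]
* [[Kit Harington]] as [[Jon Snow|Jon]]
* [[Emilia Clarke]] as [[Daenerys Targaryen]]
"""

# Matches "as [[Target]]" or "as [[Target|label]]" and captures the link target.
CHARACTER_LINK = re.compile(r"as \[\[([^\]|#]+)(?:\|[^\]]*)?\]\]")


def extract_characters(wikitext):
    """Return unique character names in order of first appearance."""
    seen = []
    for match in CHARACTER_LINK.finditer(wikitext):
        # Use underscores so names match the node-name style used in the notebook.
        name = match.group(1).strip().replace(" ", "_")
        if name not in seen:
            seen.append(name)
    return seen


characters = extract_characters(SAMPLE_WIKITEXT)
print(characters)
```

Capturing the link target rather than the displayed label keeps `[[Jon Snow|Jon]]` and `[[Jon Snow]]` mapped to the same character.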

Next, we again used the [Game Of Thrones Wiki-page](https://gameofthrones.fandom.com/wiki/Game_of_Thrones_Wiki) API, this time to fetch the page of each character. Using regular expressions, we extracted the allegiance, culture, religion, appearances and status of each character, since these are listed in a fact-box on the character page (though some pages did not contain this information). Furthermore, links between characters could be found, since each page links to the other characters that the character somehow interacts with. 

Because we wanted to capture how strong the link between two characters is (a pair that is connected only once likely has little impact), we also saved the frequency of each unique connection between characters. 
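
A minimal sketch of counting link frequencies, assuming each character page is available as wiki markup. The page texts, the `CHARACTERS` set and the `weighted_edges` helper are made up for illustration and are not the notebook's actual scraping code.

```python
import re
from collections import Counter

# Hypothetical set of known characters and hypothetical page texts.
CHARACTERS = {"Jon_Snow", "Eddard_Stark", "Daenerys_Targaryen"}
PAGES = {
    "Jon_Snow": (
        "[[Eddard Stark]] raised Jon. Jon later met [[Daenerys Targaryen]]. "
        "He often remembered [[Eddard Stark|Ned]]."
    ),
    "Eddard_Stark": "Ned is the father of [[Jon Snow]] and lives in [[Winterfell]].",
}

# Captures the target of [[Target]] and [[Target|label]] links.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:\|[^\]]*)?\]\]")


def weighted_edges(pages, characters):
    """Count how often each page links to each other known character."""
    counts = Counter()
    for source, text in pages.items():
        for match in WIKI_LINK.finditer(text):
            target = match.group(1).strip().replace(" ", "_")
            # Skip self-links and links to non-character pages (e.g. locations).
            if target != source and target in characters:
                counts[(source, target)] += 1
    return counts


edge_weights = weighted_edges(PAGES, CHARACTERS)
for (source, target), weight in sorted(edge_weights.items()):
    print(source, "->", target, weight)
```

Each `(source, target)` key corresponds to one directed edge, and its count can be stored as the edge weight.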

With the network information collected, the following attributes were available for each character: 

- Allegiance 
- Culture
- Religion
- Status (dead, alive, uncertain)
- Appearances 
- Which other characters it is linked to
- Frequency of link

We are now ready to generate the visualizations of the network, and this will be presented a little later in this section. 

<h3 align="center"> Part 2 </h3>

This part explains how the networks for each season were generated, which makes it possible to investigate how the interactions between houses, cultures, characters etc. evolve through the series. 
As in the previous part, we use the wiki API to iterate through each season and extract the characters present in it. 

When extracting information for the characters in a given season, we did not use the whole character page but only the part describing their presence in that season. Each character page contains a fact-box, a short introduction, and then a number of sections describing the character's role in each season (if present in multiple seasons). These sections were used to extract the interactions for each season. 
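
Splitting a character page into per-season sections could look roughly like this. The heading format `==Season N==` and the sample page are assumptions about the wiki markup, and `season_sections` is a hypothetical helper.

```python
import re

# Hypothetical character page layout; heading names on the real wiki may differ.
PAGE = """Intro paragraph with [[Eddard Stark]].
==Season 1==
Jon joins the [[Night's Watch]] with [[Tyrion Lannister]].
==Season 2==
Jon meets [[Ygritte]].
==Family==
Not a season section.
"""

SEASON_HEADING = re.compile(r"^==\s*Season (\d+)\s*==\s*$", re.MULTILINE)
ANY_HEADING = re.compile(r"^==[^=].*==\s*$", re.MULTILINE)


def season_sections(page):
    """Map season number -> text between that heading and the next heading."""
    sections = {}
    for match in SEASON_HEADING.finditer(page):
        start = match.end()
        next_heading = ANY_HEADING.search(page, start)
        end = next_heading.start() if next_heading else len(page)
        sections[int(match.group(1))] = page[start:end].strip()
    return sections


sections = season_sections(PAGE)
for season, text in sections.items():
    print(season, text)
```

Links found in `sections[1]` would then contribute only to the season 1 network, and so on for the other seasons.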

Again, we include information about allegiance, culture and religion, taken from the network generated in the previous part, together with the frequency of each link. 

In [2]:
#hide
print("Number of edges ", G.number_of_edges())
print("Number of nodes ", G.number_of_nodes())

Number of edges  3085
Number of nodes  162


<h1 align="center">Degree distribution </h1>  

The network across all seasons contains 162 nodes (i.e. characters) linked by 3085 edges. Having generated the network, we now investigate some of its properties.

We start by examining the degree distribution of the network. Since the network is directed, this covers both the in-degree and the out-degree distribution. The in-degree of a node is the number of edges pointing to it from other nodes, and the out-degree is the number of edges pointing from it to other nodes. The degree thus indicates how well connected a character is in the network. 
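
As a small worked example of the two definitions, the degrees can be counted directly from a list of directed edges. The edges below are a toy example and are not taken from the real network.

```python
from collections import Counter

# Toy directed edges (source page -> character it links to).
edges = [
    ("Jon_Snow", "Eddard_Stark"),
    ("Sansa_Stark", "Eddard_Stark"),
    ("Arya_Stark", "Eddard_Stark"),
    ("Eddard_Stark", "Jon_Snow"),
    ("Jon_Snow", "Sansa_Stark"),
]

# Out-degree: number of edges leaving a node. In-degree: number of edges arriving at it.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)

for node in sorted(set(out_degree) | set(in_degree)):
    print(node, "in:", in_degree[node], "out:", out_degree[node])
```

Here `Eddard_Stark` has the highest in-degree because three other pages point to him, even though his own page links to only one character.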

In [None]:
#hide_input
in_degrees = [G.in_degree(n) for n in G.nodes()]
out_degrees = [G.out_degree(n) for n in G.nodes()]

df = pd.DataFrame({
    "Degree" : np.concatenate((in_degrees,out_degrees)),
    "Degree type" : np.concatenate((["In-degree"]*len(in_degrees), ["Out-degree"]*len(out_degrees))),
})

# Compute in- and out-degree for every node
in_degrees = [[n, G.in_degree(n)] for n in G.nodes()]
out_degrees = [[n, G.out_degree(n)] for n in G.nodes()]

# Sort in descending order and keep only the top 5 most connected
top5_in_degrees = sorted(in_degrees, key = lambda i: i[1], reverse = True)[:5]
top5_out_degrees = sorted(out_degrees, key = lambda i: i[1], reverse = True)[:5]

fig = px.histogram(df, x="Degree",color = 'Degree type', marginal="box", title = "Degree distribution")
fig.show()

From the figure above it can be seen that the in-degree and out-degree distributions are very similar and appear to come from the same distribution. Both the histogram and the boxplot show that the in-degree distribution has more extreme points but a lower median. Furthermore, the out-degree (i.e. outgoing links) has a higher peak around degree 5-9 than the in-degree. Note also that the network is quite dense: all characters have at least a couple of other characters they interact with. From the degrees we can find the top 5 most connected characters by in- and out-degree, which are presented in the table below. 

In [None]:
#hide_input
fig = go.Figure(data=[go.Table(header=dict(values=['<b>Character name</b>', '<b>In degree</b>','<b>Character name</b>', '<b>Out degree</b>']),
                 cells=dict(values=[
                     [name[0].replace('_',' ') for name in top5_in_degrees], 
                     [name[1] for name in top5_in_degrees],
                     [name[0].replace('_', ' ') for name in top5_out_degrees], 
                     [name[1] for name in top5_out_degrees]]))
                     ])
fig.update_layout(
    height=310,
    showlegend=False,
    title_text = "Most connected characters based on in- and out-degree"
)

fig.show()

From the table above, *Jon Snow* is the most connected character based on in-degree. All five characters are main characters in Game Of Thrones, so it makes sense that they are well connected in the network. Furthermore, all but *Eddard Stark* appear in most episodes (see [Basic Statistics](https://mikkelmathiasen23.github.io/GameOfThrones_Network/02_Basic_Statistics/)). 

*Eddard Stark* dies quite early in the series, so it might come as a surprise that he is one of the most connected characters. Since this ranking is based on in-degree, however, it is likely because many other characters' pages reference Eddard. This makes sense, as his children probably mention him often, making him very connected compared to many other characters. 


Based on out-degree, *Jon Snow* is again the most connected character, but *Eddard Stark*, *Daenerys Targaryen* and *Cersei Lannister* are replaced by *Sansa Stark*, *Arya Stark* and *Jaime Lannister*, who are also very well connected and appear as main characters in the series. 

<h1 align="center">Network graph</h1>  

The generated network is presented in the interactive figure below. The figure offers several settings. First, the user can choose whether the network should be based on all text from the character pages (i.e. all seasons) or on a single season. Furthermore, different overlays can be chosen based on the character attributes: religion, allegiance or culture. 

When a character is selected, the figure shows an image of the character, a short description of the character's attributes, a link to the wiki-page, and the words most frequently used by the character based on TF-IDF, which is explained in the [Text Analysis section](https://mikkelmathiasen23.github.io/GameOfThrones_Network/textanalysis/).

<iframe src="https://gameofthronesnetworknx.herokuapp.com/" height = "1200" width = "1000"> </iframe>

<h1 align="center">Attribute relationships</h1>  

Next we investigate how the attributes relate to and interact with each other; it could also be that some values of an attribute (e.g. some religions) are better connected than others. We start by examining the allegiance attribute. 

In [None]:
#hide_input
def most_connected_on_attribute(att, n, restrict = False):
    attribute = nx.get_node_attributes(G,att)

    # Get connection between houses and also find most connected houses: 
    connection_dict = {}
    house_connection = {}
    for ef, et in list(G.edges()):
        a_from = attribute[ef]
        a_to = attribute[et]
        
        if a_from == "": continue
        if a_to == "" : continue
        if "No known" in a_from: continue
        if "No known" in a_to: continue
        if a_from in connection_dict.keys():
            house_connection[a_from] += 1
            if a_to in connection_dict[a_from]:
                connection_dict[a_from][a_to] += 1
            else:
                connection_dict[a_from][a_to] = 1
        else:
            house_connection[a_from] = 1
            connection_dict[a_from] = {a_to : 1}
    most_connected_houses = sorted(house_connection, key = lambda i: house_connection[i],reverse = True)[:n]
    tmp = [[house, house_connection[house]] for house in most_connected_houses]
    df_count = pd.DataFrame(tmp, columns = [att.capitalize(), 'Count'])
    fig_count = px.bar(df_count, x = att.capitalize(), y = 'Count', title = "Most connected "+ att)
    
    df_connection = pd.DataFrame.from_dict(connection_dict)
    df_connection = df_connection.fillna(0)

    if restrict:
        df_connection = df_connection.loc[most_connected_houses,most_connected_houses]
        
    fig_heatmap = px.imshow(df_connection, title = "Heatmap of connection between " + str(att)+"s")

    return fig_count, df_count, fig_heatmap, df_connection
    
fig_count, df_count, fig_heatmap, df_connection = most_connected_on_attribute("allegiance",8 ,True)
fig_heatmap.show()

The figure above shows the connections between the 8 most connected allegiances in Game Of Thrones. It can clearly be seen that *House Stark* is the most connected allegiance, although much of its connectivity derives from interaction within the allegiance itself. Furthermore, *House Stark* is well connected with *House Lannister*, the *Night's Watch* and *House Bolton*. The connections with *House Lannister* and the *Night's Watch* are easily explained: *Ned Stark* serves as Hand of the King and *Sansa Stark* is betrothed to Joffrey, while *Jon Snow*, who belongs to the *House Stark* allegiance, joins the *Night's Watch*.

In general, each allegiance interacts more with itself than with other allegiances. 

Next, we look at how the religions interact. From the figure below it can clearly be seen that the two main religions are the ones that mainly interact with each other, which does not come as a surprise. 


In [None]:
#hide_input
fig_count, df_count, fig_heatmap, df_connection = most_connected_on_attribute("religion",5 ,True)
fig_heatmap.show()

In the figure below, the connections between cultures are investigated. The *Andals* are the most connected culture, followed by the *Northmen*, and these two cultures interact a lot with each other. 

The third most connected culture is the *Valyrians*, who do not mainly interact with themselves but are instead most connected to the *Andals*. 

In [None]:
#hide_input
fig_count, df_count, fig_heatmap, df_connection = most_connected_on_attribute("culture",3 ,True)
fig_heatmap.show()

<h1 align="center">Network assortativity and centrality</h1>  

In this section we investigate further properties of the character network across all seasons by computing its assortativity and centrality. 

The assortativity coefficient of an attribute measures the tendency of characters to link to characters with the same attribute value. From the table below it can be seen that religion has the highest assortativity score, which indicates that this attribute is the one that best separates the characters into groups that link among themselves. None of the scores are very high, however, indicating that the characters are linked in a more complex pattern, or based on another attribute. 
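
To make the score more concrete, here is a minimal sketch of the categorical assortativity coefficient in Newman's mixing-matrix form, $r = (\sum_i e_{ii} - \sum_i a_i b_i) / (1 - \sum_i a_i b_i)$. The notebook itself relies on `nx.attribute_assortativity_coefficient`; the function below and the toy data are illustrative assumptions, not that library's code.

```python
from collections import defaultdict


def attribute_assortativity(edges, attribute):
    """Categorical assortativity for a list of directed (source, target) edges."""
    m = len(edges)
    e = defaultdict(float)  # e[(i, j)]: fraction of edges going from value i to value j
    a = defaultdict(float)  # a[i]: fraction of edges that start at value i
    b = defaultdict(float)  # b[j]: fraction of edges that end at value j
    for source, target in edges:
        i, j = attribute[source], attribute[target]
        e[(i, j)] += 1 / m
        a[i] += 1 / m
        b[j] += 1 / m
    trace = sum(frac for (i, j), frac in e.items() if i == j)
    expected = sum(a[v] * b[v] for v in set(a) | set(b))
    # Undefined (division by zero) if every node shares the same attribute value.
    return (trace - expected) / (1 - expected)


# Toy data: two religions with two characters each (hypothetical labels).
religion = {"A": "Old Gods", "B": "Old Gods", "C": "The Seven", "D": "The Seven"}
within = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]
across = [("A", "C"), ("C", "A"), ("B", "D"), ("D", "B")]

r_within = attribute_assortativity(within, religion)
r_across = attribute_assortativity(across, religion)
print(r_within, r_across)
```

A score near 1 means links stay within groups, a score near -1 means links mostly cross groups, and a score near 0 means the attribute says little about who is linked to whom.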


In [None]:
#hide_input
A, assor = [], []
attributes = ["religion", "appearances", "culture", "allegiance"]
for attribute in attributes: 
    A.append(attribute.capitalize())
    assor.append(np.round(nx.attribute_assortativity_coefficient(G, attribute),3))

fig = go.Figure(data=[go.Table(header=dict(values=['<b>Attribute</b>', '<b>Assortativity</b>']),
                 cells=dict(values=[A, assor]))
                     ])
fig.update_layout(
        height=300,
        showlegend=False,
        title_text = "Assortativity score of each character attribute"
    )
fig.show()

We further investigate the closeness centrality of each node in the network, which is the reciprocal of the average shortest-path distance between the given node and all other nodes. A node with a high closeness centrality score is therefore close to the rest of the network, and vice versa. This gives an indication of which characters are well connected and, by extension, important. 
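
The definition can be illustrated with a breadth-first search on a toy graph. This sketch assumes every other node can reach the node in question and measures distances along incoming paths, which to our understanding is also the convention `nx.closeness_centrality` uses for directed graphs. The `closeness` helper and the edges are hypothetical.

```python
from collections import deque


def closeness(edges, node):
    """(n - 1) / sum of shortest-path distances from every other node to `node`."""
    # Build a reversed adjacency list so a BFS from `node` follows incoming edges.
    incoming = {}
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
        incoming.setdefault(target, []).append(source)

    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for prev in incoming.get(current, []):
            if prev not in dist:
                dist[prev] = dist[current] + 1
                queue.append(prev)

    total = sum(dist.values())
    # Assumes all other nodes can reach `node`; no correction for unreachable nodes.
    return (len(nodes) - 1) / total if total else 0.0


# Toy path a <-> b <-> c: the middle node is closest to everyone.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
scores = {n: round(closeness(toy_edges, n), 3) for n in ("a", "b", "c")}
print(scores)
```

The middle node `b` is one step from both neighbours and gets the maximum score, while the end nodes are on average further away and score lower.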

In the table below the closeness centrality score is computed for all characters and sorted in descending order. As expected, *Jon Snow*, *Daenerys Targaryen* and *Tyrion Lannister* are among the characters closest to the others. *Stannis Baratheon* also has a high centrality score, which makes sense given his role as a claimant to the throne and his involvement with the red priestess *Melisandre*. 

Furthermore, *Gregor Clegane*, *Eddard Stark*, *Bronn* and the *Night King* have high centrality scores, which does not come as a surprise, as these are key characters in the story who interact with many characters. 

In [None]:
#hide_input
sort_cent = sorted(nx.algorithms.centrality.closeness_centrality(G).items(), key=lambda kv: kv[1], reverse = True)
fig = go.Figure(data=[go.Table(header=dict(values=['<b>Character</b>', '<b>Centrality score (closeness)</b>']),
                 cells=dict(values=[
                     [name[0].replace('_', ' ') for name in sort_cent],
                      [np.round(name[1],3) for name in sort_cent]
                      ]))
                     ])
fig.update_layout(
        height=310,
        showlegend=False,
        title_text = "Centrality score of each character"
    )
fig.show()

<h1 align="center">Most connected characters</h1>  

In this section we investigate which characters are the most connected in each season, using the per-season networks. As in the degree distribution section above, this is based on in- and out-degree. 



In [None]:
#hide
import pickle

# Build one table per season with in-degree, out-degree and closeness centrality.
# Note: this overwrites the full-series graph G, and nx.read_gpickle requires NetworkX < 3.0.
tables = {}
for i in range(1,9):
    G = nx.read_gpickle("/work/got_G_s"+str(i)+".gpickle")
    centrality = nx.closeness_centrality(G)
    
    att = [[n.replace('_',' '),G.in_degree(n), G.out_degree(n), np.round(centrality[n],3)] for n in G.nodes()]

    df = pd.DataFrame(att)
    df.columns = ['Character name', 'In degree', 'Out degree', 'Closeness centrality']
    tables[i] = df

# Save the per-season tables to disk
with open('tables.pickle', 'wb') as handle:
    pickle.dump(tables, handle, protocol=pickle.HIGHEST_PROTOCOL)

In the interactive figure below, the in-degree, out-degree and closeness centrality of each character can be inspected for each season. This makes it possible to investigate how connected the characters are in each season, which may indicate which characters are most important in that season. 

It is possible to search for specific characters and to sort the values; this applies to both the table and the bar plots below. Furthermore, attributes and characters can be deleted if needed. 

<iframe src="https://gameofthronestables.herokuapp.com/" width = "1000" height = "1000"> </iframe>

From the table it can be seen that the most connected, and perhaps most important, characters in season 1 appear to be *Eddard Stark* and *Robert Baratheon*. This makes perfect sense: Eddard features heavily in season 1, both in Winterfell and later when he becomes Hand of the King. Robert is also very central in season 1, as he rules as king until he is fatally wounded while hunting, *killed by a pig*, as he puts it. 

In season 2, *Joffrey Baratheon*, the new king, rules, and *Robb Stark* goes to war to avenge his father's execution in season 1. 

Later, in season 7, *Jon Snow* and *Daenerys Targaryen* become key players as winter approaches and the focus moves from King's Landing to the Wall and the fight against the dead. 

<h1 align="center">Subconclusion</h1>
  
In this section we have investigated the Game Of Thrones character network both across the full series and in each season. We have found the most connected (and perhaps most important) characters across the full story and in each season by investigating network properties, including the centrality and degree of each character. 

Furthermore, by investigating the character attributes we examined how these interact with each other, which revealed patterns in which religions, allegiances and cultures mainly interact. 

Lastly, the assortativity of each attribute showed that none of them alone describes how the network is organized, and that other methods are needed. Later on we will try to detect communities in the network without relying on these attributes alone. 