# 1.7 Network Graph Visualization

#### The purpose of this script is to take our previously generated entity-relationship document of 20th-century events and turn it into a network graph that shows the relationships between countries during this period. Using centrality functions, we'll be able to estimate which countries held more importance during the 20th century (at least as far as the original wiki text is concerned).

#### The script is broken down into the following sections:
- 1 - Library and Dataset Importing
- 2 - NetworkX Network Graph
- 3 - Visualizations Using Pyvis & the Leiden Algorithm to Identify Possible Communities Within the Nodes
- 4 - Centrality Charts (Degree, Closeness, Betweenness)


### Step 1: Importing Libraries & Data

In [None]:
!pip install pyvis
!pip install cdlib
!pip install leidenalg



In [None]:
import pandas as pd
import numpy as np
import networkx as nx
import os
import matplotlib.pyplot as plt
import pyvis
from pyvis.network import Network
import seaborn as sns
import cdlib

In [None]:
# Import relationship data

relationship_df = pd.read_csv('country_relationship.csv', index_col = 0)
relationship_df.head()

### Step 2: NetworkX Graphs (Limited Customization)

In [None]:
# Create a graph from a pandas dataframe

G = nx.from_pandas_edgelist(relationship_df, 
                            source = "source", 
                            target = "target", 
                            edge_attr = "value", 
                            create_using = nx.Graph())
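One thing worth checking at this step: with `create_using = nx.Graph()`, a pair that appears twice in the edge list (for example once as Germany/France and once as France/Germany) becomes a single edge, and `from_pandas_edgelist` keeps the `value` from the last such row rather than adding them. Whether `country_relationship.csv` actually contains reversed duplicates is an assumption we have not verified. The sketch below uses hypothetical toy rows to show the effect and one way to sum the pairs before building the graph.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy rows standing in for country_relationship.csv
relationship_df = pd.DataFrame({
    "source": ["Germany", "France", "Japan"],
    "target": ["France", "Germany", "China"],
    "value": [4, 2, 3],
})

# Naive build: the later (France, Germany) row overwrites the earlier row's value
G_naive = nx.from_pandas_edgelist(relationship_df, source="source", target="target",
                                  edge_attr="value", create_using=nx.Graph())

# Put each pair in alphabetical order so (A, B) and (B, A) share one key, then sum
a, b = relationship_df["source"], relationship_df["target"]
undirected = (
    pd.DataFrame({"source": a.where(a < b, b), "target": b.where(a < b, a),
                  "value": relationship_df["value"]})
    .groupby(["source", "target"], as_index=False)["value"].sum()
)
G = nx.from_pandas_edgelist(undirected, source="source", target="target",
                            edge_attr="value", create_using=nx.Graph())

print(G_naive["Germany"]["France"]["value"])  # 2
print(G["Germany"]["France"]["value"])        # 6
```

Summing is one design choice; averaging or taking the maximum are equally valid if `value` means something other than a co-occurrence count.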

In [None]:
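h = plt.figure(figsize = (7,7))
pos = nx.kamada_kawai_layout(G)
# edge_cmap only takes effect when edge_color holds numbers, so colour edges by their 'value' weight
edge_values = [G[u][v]['value'] for u, v in G.edges()]
nx.draw(G, with_labels = True, node_color = 'red', edge_color = edge_values, edge_cmap = plt.cm.Blues, pos = pos)
plt.show()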

In [None]:
h.savefig('networkx_plt_countries.png')
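A variant of the plot above, sketched on hypothetical toy edges, that also scales edge width by the `value` attribute and saves the figure before it is displayed (outside a notebook, a figure saved after `plt.show()` can come out blank). The file name `networkx_weighted_edges.png`, the 1 to 5 point width range, and the use of `spring_layout` are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit this line in Jupyter
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical toy edges; in the notebook, G comes from relationship_df instead
G = nx.Graph()
G.add_edge("Germany", "France", value=6)
G.add_edge("Germany", "Poland", value=3)
G.add_edge("Japan", "China", value=1)

values = [G[u][v]["value"] for u, v in G.edges()]
widths = [1 + 4 * v / max(values) for v in values]  # map weights onto 1 to 5 points

fig = plt.figure(figsize=(7, 7))
pos = nx.spring_layout(G, seed=42)  # fixed seed so the layout is reproducible
nx.draw(G, pos=pos, with_labels=True, node_color="red",
        width=widths, edge_color=values, edge_cmap=plt.cm.Blues)
fig.savefig("networkx_weighted_edges.png")  # save before any plt.show() call
plt.close(fig)
```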

### Step 3: Pyvis and Leiden visualizations

In [None]:
# Define the pyvis Network object (notebook mode renders the graph inline)

net = Network(notebook = True, width="900px", height="800px", bgcolor='#222222', font_color='white')

# Compute each node's degree (number of connections) from the G object
node_degree = dict(G.degree)

# Store degree as the 'size' node attribute, which pyvis reads when importing the graph
nx.set_node_attributes(G, node_degree, 'size')
net.from_nx(G)
net.show_buttons(filter_ = True)
net.repulsion()
net.show("Country.html")
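Using raw degree as `size` means a country with a single connection is drawn at size 1, which can be hard to see next to a hub. A minimal sketch, assuming a linear rescale is acceptable: the helper `scale_sizes` and the 10 to 40 range are hypothetical choices, not part of pyvis.

```python
import networkx as nx

def scale_sizes(degrees, min_size=10, max_size=40):
    """Hypothetical helper: linearly map node degree onto a node-size range."""
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:  # every node has the same degree
        return {n: (min_size + max_size) / 2 for n in degrees}
    return {n: min_size + (d - lo) * (max_size - min_size) / (hi - lo)
            for n, d in degrees.items()}

G = nx.star_graph(4)  # toy graph: node 0 linked to nodes 1-4
sizes = scale_sizes(dict(G.degree))
nx.set_node_attributes(G, sizes, "size")
print(G.nodes[0]["size"], G.nodes[1]["size"])  # 40.0 10.0
```

In the notebook, `sizes` would replace `node_degree` in the `set_node_attributes` call before `net.from_nx(G)`.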

In [None]:
from cdlib import algorithms
coms = algorithms.leiden(G)

In [None]:
type(coms)

In [None]:
new_com = coms.to_node_community_map()
new_com

In [None]:
# new_com maps each node to a list of community ids; Leiden assigns each node to exactly one community, so keep the first id

dict_com = {k:v[0] for k,v in new_com.items()}
dict_com
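Taking `v[0]` is safe only while every node belongs to exactly one community. As we understand cdlib, `to_node_community_map()` returns lists so that it can also represent overlapping methods, where `v[0]` would silently drop memberships. The sketch below makes that assumption explicit with a guard, using a hypothetical toy map in place of the real `new_com`.

```python
# Hypothetical node -> [community ids] map, shaped like to_node_community_map() output
new_com = {"Germany": [0], "France": [0], "USA": [1], "Vietnam": [1]}

# Refuse to flatten if any node sits in zero or several communities
overlapping = {node: coms for node, coms in new_com.items() if len(coms) != 1}
if overlapping:
    raise ValueError(f"nodes not in exactly one community: {overlapping}")

dict_com = {node: coms[0] for node, coms in new_com.items()}
print(dict_com)  # {'Germany': 0, 'France': 0, 'USA': 1, 'Vietnam': 1}
```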

In [None]:
# Redraw the network with each node's Leiden community stored in the 'group' attribute, which pyvis uses to colour the nodes

nx.set_node_attributes(G, dict_com, 'group')
com_net = Network(notebook = True, width="1300px", height="700px", bgcolor='#222222', font_color='white')
com_net.from_nx(G)
com_net.show_buttons(filter_ = True)
com_net.repulsion()
com_net.show("country_communities_leiden.html")

#### Interestingly, the communities identified by the Leiden algorithm do a fairly good job of grouping countries by things like alliances and conflicts. For example, the United States has its own community of connections to specific events it took part in, such as the Vietnam War and the Korean War. This community is distinct from the community of European countries, which were presumably affected by WW2 and by events that took place in the USSR.

#### There are definitely some events or connections that have not been made for one reason or another (e.g., limited connections between the US and Iran, or between the US and Japan), but these are probably due to how the data was wrangled rather than to the algorithm itself.
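A visual check of the communities can be backed by a number: modularity compares the share of edges falling inside communities with what a random graph with the same degrees would give, so higher values mean better-separated groups. The sketch below uses networkx's `modularity` function (not cdlib) on a hypothetical toy graph, with a `dict_com`-shaped map converted into the list of sets networkx expects. The helper `to_partition` is a hypothetical name; `weight="value"` mirrors the notebook's edge attribute and falls back to 1 where it is missing, as in this toy graph.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3)
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

# A node -> community id map, shaped like dict_com in the notebook
dict_com = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

def to_partition(node_to_com):
    """Hypothetical helper: group a node -> community dict into a list of node sets."""
    groups = {}
    for node, com in node_to_com.items():
        groups.setdefault(com, set()).add(node)
    return list(groups.values())

communities = to_partition(dict_com)
print(round(modularity(G, communities, weight="value"), 4))  # 0.3571
```

For comparison, placing every node in one community always scores 0, so a value well above 0 on the real `G` would support the reading that the Leiden groupings are meaningful.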

### Step 4: Identifying the most important countries of the 20th century by reviewing centrality values

In [None]:
# Degree centrality

degree_dict = nx.degree_centrality(G)
degree_dict

In [None]:
degree_df = pd.DataFrame(degree_dict.items(), columns=['country','centrality'])
degree_df
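Degree centrality in networkx is a node's degree divided by n - 1, the number of other nodes it could connect to. That is why this chart ranks countries in the same order as their raw connection counts. A quick check on a hypothetical toy graph:

```python
import math
import networkx as nx

# Hypothetical toy graph: A is linked to everyone, B only to A
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")])
n = G.number_of_nodes()

# Degree centrality = degree / (n - 1): the share of the other nodes each node touches
manual = {node: deg / (n - 1) for node, deg in G.degree()}
library = nx.degree_centrality(G)

print(manual["A"])  # A touches all 3 other nodes -> 1.0
assert all(math.isclose(manual[v], library[v]) for v in G)
```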

In [None]:
degree_df.sort_values(by = ['centrality'], ascending=False, inplace = True)

In [None]:
# Plot the degree centrality

plt.figure(figsize = (10, 11))
with sns.dark_palette("xkcd:blue", 22):
    sns.barplot(x = "centrality", y = "country",
    saturation = 0.9, data = degree_df).set_title("20th Century Countries - degree centrality")

#### The output of this degree centrality measurement is in line with expectations. Based on the earlier HTML output, we know that Germany has by far the largest number of connections in the network graph. Given common historical knowledge, Germany is a key player in 20th-century events, so it makes sense that it is positioned as one of the key figures. The output also makes some data limitations obvious: 'Allied' countries such as the UK and the USA fail to make an impact despite their prevalence in events.
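One possible contributor to the UK and USA ranking low (a hypothesis, not something checked against this data): `nx.degree_centrality` counts distinct neighbours and ignores the `value` edge attribute, so a country that co-occurs often with a few partners scores lower than one with many one-off links. Weighted degree, sometimes called strength, sums `value` instead. The sketch compares the two on hypothetical toy nodes.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy network: "Hub" has three one-off links,
# while P1 and P2 share a single edge that co-occurs 10 times (value=10)
G = nx.Graph()
G.add_edge("Hub", "X", value=1)
G.add_edge("Hub", "Y", value=1)
G.add_edge("Hub", "Z", value=1)
G.add_edge("P1", "P2", value=10)

scores = pd.DataFrame({
    "degree": dict(G.degree()),                  # number of distinct partners
    "strength": dict(G.degree(weight="value")),  # sum of 'value' over a node's edges
})
print(scores.sort_values("strength", ascending=False))
```

Running the same two columns on the notebook's `G` would show whether weighting moves the UK and USA up the ranking.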

In [None]:
# Closeness centrality

closeness_dict = nx.closeness_centrality(G)
closeness_df = pd.DataFrame(closeness_dict.items(), columns=['country','centrality'])

In [None]:
closeness_df.sort_values(by = ['centrality'], ascending=False, inplace = True)

In [None]:
plt.figure(figsize = (10,11))
with sns.dark_palette("xkcd:blue", 22):
    sns.barplot(x = "centrality", y = "country",
    saturation = 0.9, data = closeness_df).set_title("20th Century Countries - closeness centrality")

#### Closeness tells a slightly different story from degree. While there is still a prevailing consensus about Germany's and Japan's influence, this output indicates that overall no single country exerts significantly more sway than the rest (all values are under 0.4). Although some countries appear in more than one event during the period, the fact that multiple countries hover around the 0.2 to 0.3 range suggests that the period is made up of many fragmented events, each overlapping with a country here or there.
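The low values are partly a mechanical effect of fragmentation. networkx computes closeness as (r - 1) divided by the sum of shortest-path distances to the r - 1 nodes u can reach, and by default (`wf_improved=True`) multiplies this by (r - 1)/(n - 1), the Wasserman-Faust correction, which shrinks scores for nodes that can only reach a small component. The sketch re-implements that formula on a hypothetical two-component toy graph and checks it against the library.

```python
import math
import networkx as nx

# Hypothetical toy graph with two components: the path 0-1-2 and a separate edge 3-4
G = nx.Graph([(0, 1), (1, 2), (3, 4)])

def closeness(G, u, wf_improved=True):
    """Closeness of u, following the formula documented for nx.closeness_centrality."""
    dist = nx.single_source_shortest_path_length(G, u)  # includes u itself at distance 0
    reach = len(dist) - 1                               # other nodes reachable from u
    total = sum(dist.values())
    if total == 0:
        return 0.0
    c = reach / total
    if wf_improved:
        # Wasserman-Faust scaling: penalise nodes that only reach a small component
        c *= reach / (len(G) - 1)
    return c

print({u: round(closeness(G, u), 3) for u in G})
# {0: 0.333, 1: 0.5, 2: 0.333, 3: 0.25, 4: 0.25}
print({u: round(closeness(G, u, wf_improved=False), 3) for u in G})
# {0: 0.667, 1: 1.0, 2: 0.667, 3: 1.0, 4: 1.0}

library = nx.closeness_centrality(G)  # wf_improved=True is the default
assert all(math.isclose(closeness(G, u), library[u]) for u in G)
```

Without the correction, the two nodes of the isolated edge would tie with the centre of the path at 1.0, which is why the default is usually the better choice for a fragmented graph like this one.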

In [None]:
# Betweenness centrality

betweenness_dict = nx.betweenness_centrality(G)
betweennes_df = pd.DataFrame(betweenness_dict.items(), columns=['country','centrality'])
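Betweenness measures how often a node lies on the shortest paths between other pairs of nodes. For each pair (s, t), a node v gets the fraction of s-t shortest paths that pass through it, and networkx's default normalisation divides the total by (n - 1)(n - 2)/2, the number of pairs not involving v. The brute-force sketch below (hypothetical helper name, unweighted like the notebook's call, only practical for small graphs) reproduces `nx.betweenness_centrality` on a toy graph where a single bridge joins two triangles.

```python
import math
from itertools import combinations
import networkx as nx

# Hypothetical toy graph: two triangles joined by a single bridge edge 2-3
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

def brute_force_betweenness(G):
    """Unweighted, normalised betweenness by enumerating all shortest paths."""
    n = G.number_of_nodes()
    score = dict.fromkeys(G, 0.0)
    for s, t in combinations(G, 2):
        if not nx.has_path(G, s, t):
            continue  # pairs in different components contribute nothing
        paths = list(nx.all_shortest_paths(G, s, t))
        for v in G:
            if v not in (s, t):
                # fraction of the s-t shortest paths that pass through v
                score[v] += sum(v in p for p in paths) / len(paths)
    # divide by the number of node pairs that exclude v: (n - 1)(n - 2) / 2
    norm = 2 / ((n - 1) * (n - 2))
    return {v: b * norm for v, b in score.items()}

manual = brute_force_betweenness(G)
print({v: round(b, 3) for v, b in manual.items()})
# {0: 0.0, 1: 0.0, 2: 0.6, 3: 0.6, 4: 0.0, 5: 0.0}

library = nx.betweenness_centrality(G)  # normalized=True by default
assert all(math.isclose(manual[v], library[v], abs_tol=1e-12) for v in G)
```

Because the normaliser uses the total node count n, a node that bridges a small, self-contained community can only score modestly, which is consistent with the spread of mid-range values discussed below.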

In [None]:
betweennes_df.sort_values(by = ['centrality'], ascending=False, inplace = True)

In [None]:
plt.figure(figsize = (10,11))
with sns.dark_palette("xkcd:blue", 22):
    sns.barplot(x = "centrality", y = "country",
    saturation = 0.9, data = betweennes_df).set_title("20th Century Countries - betweenness centrality")

#### Finally, betweenness corroborates the findings of the closeness chart. Germany and Japan are still identified as the primary countries forming the bridges between nodes. Compared to some of the examples given in the CF readings, where the spread of relationships is much more concentrated, there is a lot of variance in the number and strength of connections in this DataFrame. For example, the original network graph contains a community of African countries that appears to be somewhat isolated and distinct from the other communities and assumed events. Because some of these communities are so distinct, I believe the values in this betweenness chart are not as drastic as initially expected. A large number of countries possess some 'influence' or 'importance' (values between 0.1 and 0.2, at least as far as this DataFrame goes) because each self-contained community seems to have its own important players. In my opinion, Japan and Germany exert the most pressure because of the size of their community and their placement within the graph.
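Since the conclusions above compare the three charts against each other, it can help to put all three measures in one table and check how closely their rankings agree. A minimal sketch on a hypothetical toy graph (swap in the notebook's `G` to use it there); the Pearson correlation of ranks computed here is Spearman's rank correlation, done in pandas without extra dependencies.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph: two triangles bridged by the edge 2-3
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

centrality = pd.DataFrame({
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
})
print(centrality.sort_values("betweenness", ascending=False))

# Rank agreement between measures: Pearson correlation of the ranks (Spearman's rho)
rank_corr = centrality.rank().corr()
print(rank_corr.round(2))
```

A high correlation between closeness and betweenness on the real data would put a number on the "corroborates" observation.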