# Remarks:

- Save your adjacency with a descriptive name in the subfolder "<b><i>./Data/Adjacencies/</i></b>" and update the list in the next cell.
- Only overwrite existing files if you are sure of what you are replacing them with.
- If you change the imports, make sure the rest of the notebook still runs.
- Please check your results.

### Adjacencies available:
- **adjacency_hyperlinks**: built from all categories, with links given by the hyperlinks between pages. **It is directed, and this is intended!** If you need an undirected version, symmetrise it and save it in a separate CSV.
- ...
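One common way to symmetrise a 0/1 directed adjacency is to keep an edge whenever it exists in either direction, i.e. the element-wise maximum of A and its transpose. The following is a minimal sketch. It assumes the CSV was written with `DataFrame.to_csv`, so the first column is the index. The helper `symmetrise` and the output filename `adjacency_hyperlinks_sym.csv` are hypothetical.

```python
import numpy as np

def symmetrise(adjacency: np.ndarray) -> np.ndarray:
    """Undirected 0/1 adjacency: i ~ j if there is a link i -> j or j -> i (hypothetical helper)."""
    return np.maximum(adjacency, adjacency.T)

# Toy directed graph: 0 -> 1 and 1 -> 2.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
A_sym = symmetrise(A)
print(A_sym)

# Applied to the saved matrix (hypothetical output filename):
# import pandas as pd
# A = pd.read_csv('./Data/Adjacencies/adjacency_hyperlinks.csv', index_col=0).values
# pd.DataFrame(symmetrise(A)).to_csv('./Data/Adjacencies/adjacency_hyperlinks_sym.csv')
```

Using `A + A.T` instead would give weight 2 to reciprocal links, which may be useful if you want to keep that information.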

# Imports:

In [None]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import copy

from pygsp import graphs, filters
from scipy import sparse

%matplotlib inline

# Part I: Structure of the Graph with Hyperlink Connections between sites

## Import Data

In [None]:
Nodes_Linked = pd.read_csv("./Data/Nodes_Linked.csv", sep='\t', encoding='utf-16')
All_Nodes = pd.read_csv("./Data/All_Nodes.csv", sep='\t', encoding='utf-16')


## Make Adjacency (great again)

In [None]:
All_Nodes.reset_index(level=0, inplace=True)
All_Nodes = All_Nodes.rename(columns={'index':'node_idx'})


# Create a conversion table from name to node index.
name2idx = All_Nodes[['node_idx', 'Node']]
name2idx = name2idx.set_index('Node')

Nodes_Linked = Nodes_Linked.join(name2idx, on='Nodes')
Nodes_Linked = Nodes_Linked.join(name2idx, on='Links', rsuffix='_target')
Nodes_Linked_Full = Nodes_Linked.copy(deep=True)
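Nodes_Linked = Nodes_Linked.drop(columns=['Nodes', 'Links', 'Node_Category'])
# Links to pages missing from All_Nodes get NaN indices, on which astype(int) fails:
# drop them before casting.
Nodes_Linked = Nodes_Linked.dropna(subset=['node_idx', 'node_idx_target'])
Nodes_Linked = Nodes_Linked.astype({'node_idx': int, 'node_idx_target': int})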

<b>Check whether any value is NaN:</b>

In [None]:
Nodes_Linked.isnull().any().any()
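NaN indices come from links whose page name does not appear in `All_Nodes`, so the join finds no index for them. The small sketch below uses toy data (the page names are invented for illustration; the column names follow the notebook). It reproduces the two joins and shows how such rows can be inspected and dropped.

```python
import pandas as pd

# Toy stand-ins for All_Nodes and Nodes_Linked (invented names, notebook column names).
all_nodes = pd.DataFrame({'Node': ['Luka Modric', 'Croatia', 'Croatia national football team']})
nodes_linked = pd.DataFrame({
    'Nodes': ['Luka Modric', 'Croatia', 'Luka Modric'],
    'Links': ['Croatia national football team', 'Croatia', 'Real Madrid'],
})

# Name -> integer index lookup table, as in the notebook.
all_nodes = all_nodes.reset_index().rename(columns={'index': 'node_idx'})
name2idx = all_nodes[['node_idx', 'Node']].set_index('Node')

# The second join would clash on 'node_idx', hence rsuffix='_target'.
edges = nodes_linked.join(name2idx, on='Nodes')
edges = edges.join(name2idx, on='Links', rsuffix='_target')

# 'Real Madrid' is not in all_nodes, so its target index is NaN.
missing = edges[edges[['node_idx', 'node_idx_target']].isnull().any(axis=1)]
print(missing)

# Drop unresolved links, then cast the indices to int.
edges = edges.dropna(subset=['node_idx', 'node_idx_target'])
edges = edges.astype({'node_idx': int, 'node_idx_target': int})
print(edges)
```

Note that the `Croatia -> Croatia` row resolves to a self-link (1, 1). Entries like this are what end up on the diagonal of the adjacency matrix.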

<b>Great! Now build the Adjacency Matrix</b>:

In [None]:
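n_nodes = len(All_Nodes)
print("Number of nodes:", n_nodes)
adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
# One entry per hyperlink: row = source page, column = target page.
# Links without a target index (NaN) are skipped.
valid = Nodes_Linked['node_idx_target'].notna()
sources = Nodes_Linked.loc[valid, 'node_idx'].astype(int).values
targets = Nodes_Linked.loc[valid, 'node_idx_target'].astype(int).values
adjacency[sources, targets] = 1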

<b>Set Diagonal to 0</b>:

In [None]:
Sum = np.trace(adjacency)  # number of self-links
np.fill_diagonal(adjacency, 0)
print("Sum of values on the diagonal was " + str(Sum) + ". Now it's 0.")

**Display:**

In [None]:
fig = plt.figure(figsize = (15,8))
ax1 = fig.add_subplot(1,2,1)
ax1.spy(adjacency, markersize=1)
ax1.set_title('Adjacency Matrix')
ax2= fig.add_subplot(1,2,2)
ax2.spy(adjacency[700:, 700:], markersize=1)
ax2.set_title('Adjacency Matrix Zoomed on [700:,700:]')

plt.show()

print("Diagonal stripe in the left plot? Example: adjacency[760, 792] = " + str(adjacency[760, 792]) +
      ". Corresponds to link (" + str(All_Nodes.iloc[760, 1]) + ", " + str(All_Nodes.iloc[792, 1]) + ").")

We can clearly see that the first 732 entries are players, which link to almost everything. They are followed by the 32 countries taking part in the World Cup. These link only to one another and to their respective national teams. Some other teams may appear in a country's sports history because of a notable event; this is the case for Iceland, as can be seen below. Finally, the national teams link to everyone, and heavily to one another, since their pages record their match histories.
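The block structure described above can be quantified by counting the links between index ranges. The following is a minimal sketch, and `block_link_counts` is a hypothetical helper. The boundaries are an assumption inferred from the paragraph above: 732 players, then 32 countries, then 32 national teams, i.e. `[0, 732, 764, 796]`. Check them against `All_Nodes`, and append `n_nodes` as a final boundary if any nodes come after the teams.

```python
import numpy as np

def block_link_counts(adjacency, boundaries):
    """Count links between consecutive index blocks (hypothetical helper).

    boundaries: block start indices followed by the end of the last block,
    e.g. [0, 732, 764, 796]. Entry (a, b) is the number of links from block a to block b.
    """
    n_blocks = len(boundaries) - 1
    counts = np.zeros((n_blocks, n_blocks), dtype=int)
    for a in range(n_blocks):
        for b in range(n_blocks):
            block = adjacency[boundaries[a]:boundaries[a + 1], boundaries[b]:boundaries[b + 1]]
            counts[a, b] = block.sum()
    return counts

# Toy example: 2 "players" (0-1), 1 "country" (2), 1 "team" (3).
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
counts = block_link_counts(A, [0, 2, 3, 4])
print(counts)
```

In the notebook, this would be called as `block_link_counts(adjacency, [0, 732, 764, 796])`, under the boundary assumption stated above.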

In [None]:
Nodes_Linked_Full.iloc[[24642, 24643, 24651], :]

## Save the Adjacency

In [None]:
SAVE_ADJACENCY = False  # set to True to (over)write the CSV
if SAVE_ADJACENCY:
    df_adjacency = pd.DataFrame(adjacency)
    df_adjacency.to_csv('./Data/Adjacencies/adjacency_hyperlinks.csv')
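`to_csv` writes the DataFrame index as the first column by default. Whoever loads the file back therefore needs `index_col=0` to recover a square matrix. The minimal round-trip sketch below uses an in-memory buffer in place of the real file.

```python
import io

import numpy as np
import pandas as pd

A = np.array([[0, 1],
              [1, 0]])

# to_csv writes the row index as an extra first column by default...
buffer = io.StringIO()
pd.DataFrame(A).to_csv(buffer)
buffer.seek(0)

# ...so read it back with index_col=0, otherwise the index becomes an extra data column.
A_loaded = pd.read_csv(buffer, index_col=0).values
print(A_loaded)
```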

## Let's make the matrix sparse and display the associated graph

Check that it is indeed connected (no isolated components).
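For a directed graph, "connected" can mean two things. A graph is weakly connected if it is connected once edge directions are ignored. It is strongly connected if every node can be reached from every other node along directed edges. The sketch below uses SciPy's `connected_components` to report both counts, as a cross-check independent of PyGSP. `connectivity_report` is a hypothetical helper.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

def connectivity_report(adjacency):
    """Numbers of weakly and strongly connected components (hypothetical helper)."""
    A = sparse.csr_matrix(adjacency)
    n_weak, _ = connected_components(A, directed=True, connection='weak')
    n_strong, _ = connected_components(A, directed=True, connection='strong')
    return n_weak, n_strong

# Toy chain 0 -> 1 -> 2: a single piece if directions are ignored, but no way back.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
print(connectivity_report(A))
```

A result of `(1, 1)` on `adjacency` would mean the hyperlink graph is strongly connected.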

In [None]:
adjacency_sparsed = sparse.csr_matrix(adjacency)
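The sparse matrix can also be built directly from the two index columns, without the dense intermediate. The following is a minimal sketch. It assumes the index columns are already integers with no NaN, and `edges_to_sparse` is a hypothetical helper. It drops self-links, mirroring the diagonal clean-up done earlier.

```python
import numpy as np
from scipy import sparse

def edges_to_sparse(src, dst, n_nodes, drop_self_loops=True):
    """Binary CSR adjacency from parallel source/target index arrays (hypothetical helper)."""
    src = np.asarray(src, dtype=int)
    dst = np.asarray(dst, dtype=int)
    if drop_self_loops:
        keep = src != dst
        src, dst = src[keep], dst[keep]
    data = np.ones(len(src), dtype=int)
    # Converting COO -> CSR sums duplicate (i, j) pairs, so clip back to 0/1.
    A = sparse.coo_matrix((data, (src, dst)), shape=(n_nodes, n_nodes)).tocsr()
    A.data = np.minimum(A.data, 1)
    return A

# Toy edge list with a duplicate link (0 -> 1 twice) and a self-loop (2 -> 2).
A = edges_to_sparse([0, 0, 1, 2], [1, 1, 2, 2], n_nodes=3)
print(A.toarray())
```

In the notebook, this would be called as `edges_to_sparse(Nodes_Linked['node_idx'], Nodes_Linked['node_idx_target'], n_nodes)`.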

In [None]:
G = graphs.Graph(adjacency_sparsed)
print('{} nodes, {} edges'.format(G.N, G.Ne))

print('Connected: {}'.format(G.is_connected()))
print('Directed: {}'.format(G.is_directed()))
fig = plt.figure(figsize = (15,8))
plt.hist(G.d)
plt.title('Degree Distribution of the Graph')
plt.xlabel('Degree Value')
plt.ylabel('Number of nodes')
plt.show()

idx_max = int(np.argmax(G.d))
print("Maximum of " + str(G.d[idx_max]) + " corresponds to " + str(All_Nodes.iloc[idx_max, 1]) + ".")

So we do have a connected and directed graph. Note that the average number of connections per node is quite high (27532/800 = 34.42)! The maximum degree, 122, belongs to the Croatian national football team (https://en.wikipedia.org/wiki/Croatia_national_football_team), whose page is remarkably complete.
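In a directed graph, each link adds one to the out-degree of its source and one to the in-degree of its target. The mean out-degree and the mean in-degree are therefore both equal to (number of links) / (number of nodes). The sketch below computes both degree sequences from a 0/1 adjacency. It does not check which of them PyGSP reports in `G.d`. `degree_summary` is a hypothetical helper.

```python
import numpy as np
from scipy import sparse

def degree_summary(adjacency):
    """Out-degrees (row sums), in-degrees (column sums) and mean degree (hypothetical helper)."""
    A = sparse.csr_matrix(adjacency)
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    in_deg = np.asarray(A.sum(axis=0)).ravel()
    mean_deg = A.nnz / A.shape[0]  # each directed link counted once
    return out_deg, in_deg, mean_deg

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
out_deg, in_deg, mean_deg = degree_summary(A)
print(out_deg, in_deg, mean_deg)
```

Calling `np.argmax(out_deg)` or `np.argmax(in_deg)` on `adjacency` gives the page with the most outgoing or incoming links, respectively.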

In [None]:
fig = plt.figure(figsize=(10, 10))
G.set_coordinates()
G.plot()

plt.show()