### Task 2: Temporal Network Analysis
* Degree (in-degree/out-degree)
* Diameter
* Dyads and Reciprocity

We analyse a temporal network consisting of 678907 vertices and 4729035 edges, where each edge carries time information. Some edges share the same source and target vertices but are associated with different timestamps. We use these timestamps to assess how graph properties evolve: the measures listed above are computed over time windows defined by different time intervals and scales.

### <font color="darkgreen">Imports, configuration and preprocessing</font>

In [1]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
import math
import igraph
import datetime

In [2]:
temp_nw = pd.read_table("./data/tgraph_real_wikiedithyperlinks.txt", header = None, sep = " ",
                       names = ["src", "trg", "start", "end"])

In [3]:
temp_nw.head()

| | src | trg | start | end |
|---|---|---|---|---|
| 0 | 1 | 6 | 1028243088 | 1120917090 |
| 1 | 1 | 8 | 1029885647 | 1136791625 |
| 2 | 1 | 9 | 1029885647 | 1136791625 |
| 3 | 1 | 3 | 1028243088 | 1143227562 |
| 4 | 1 | 3 | 1146727453 | 1148998304 |


In [5]:
temp_nw['start'] = pd.to_datetime(temp_nw['start'], unit = 's') # convert Unix timestamps (seconds) to datetimes (UTC)
temp_nw['end'] = pd.to_datetime(temp_nw['end'], unit = 's')

In [6]:
temp_nw.head()

| | src | trg | start | end |
|---|---|---|---|---|
| 0 | 1 | 6 | 2002-08-01 23:04:48 | 2005-07-09 13:51:30 |
| 1 | 1 | 8 | 2002-08-20 23:20:47 | 2006-01-09 07:27:05 |
| 2 | 1 | 9 | 2002-08-20 23:20:47 | 2006-01-09 07:27:05 |
| 3 | 1 | 3 | 2002-08-01 23:04:48 | 2006-03-24 19:12:42 |
| 4 | 1 | 3 | 2006-05-04 07:24:13 | 2006-05-30 14:11:44 |


We assume that most changes will show up over the course of months or years. Let's create a timestamp for the start of every year: $T_i = $ 2002$+i$-01-01 00:00:00, for $i = 0, \dots, 10$.

In [7]:
# Create timestamps
dTime={}
for x in range(0,11):
    year = 2002 + x
    T = datetime.datetime(year, 1, 1, 0, 0, 0)
    dTime["T{0}".format(x)]=T
dTime

{'T0': datetime.datetime(2002, 1, 1, 0, 0),
 'T1': datetime.datetime(2003, 1, 1, 0, 0),
 'T10': datetime.datetime(2012, 1, 1, 0, 0),
 'T2': datetime.datetime(2004, 1, 1, 0, 0),
 'T3': datetime.datetime(2005, 1, 1, 0, 0),
 'T4': datetime.datetime(2006, 1, 1, 0, 0),
 'T5': datetime.datetime(2007, 1, 1, 0, 0),
 'T6': datetime.datetime(2008, 1, 1, 0, 0),
 'T7': datetime.datetime(2009, 1, 1, 0, 0),
 'T8': datetime.datetime(2010, 1, 1, 0, 0),
 'T9': datetime.datetime(2011, 1, 1, 0, 0)}

Now that we have these timestamps, let's generate the graphs $G_0, G_1, G_2, G_3, G_4, G_5, G_6, G_7, G_8, G_9$, where $G_i$ contains only the edges that are active at some point between $T_i$ and $T_{i+1}$: edges that start in the window, end in the window, or persist throughout it.

In [9]:
# Create one DataFrame per yearly window between T_i and T_{i+1}
dfG={}
for x in range(0,10):
    T0 = "T{0}".format(x)
    T1 = "T{0}".format(x+1)
    T0 = dTime.get(T0)
    T1 = dTime.get(T1)
    df_G = temp_nw.copy()
    df_G = df_G[((df_G['start'] >= T0) & (df_G['start'] < T1)) | ((df_G['end'] > T0) & (df_G['end'] <= T1)) | 
                ((df_G['start'] < T0) & (df_G['end'] > T1))]
    dfG["df_G{0}".format(x)]=df_G
    print("df_G{0}".format(x) + ": " + str(df_G.shape))

df_G0: (56459, 4)
df_G1: (210603, 4)
df_G2: (502719, 4)
df_G3: (1115759, 4)
df_G4: (1685662, 4)
df_G5: (1758664, 4)
df_G6: (1618639, 4)
df_G7: (1331689, 4)
df_G8: (960662, 4)
df_G9: (443777, 4)
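
The three-way condition in the filter above is an interval-overlap test. As a minimal sketch, assuming every edge satisfies `start <= end`, it can be written as a single comparison pair; the two forms differ only for zero-length edges that sit exactly on a window boundary. The helper name `edges_active_in_window` and the toy values are hypothetical and are not part of the notebook.

```python
import pandas as pd

def edges_active_in_window(edges, t0, t1):
    """Return the rows whose [start, end] lifetime overlaps the window (t0, t1)."""
    # An edge overlaps the window if it starts before the window closes
    # and ends after the window opens.
    return edges[(edges["start"] < t1) & (edges["end"] > t0)]

# Toy edge list (hypothetical values, not taken from the data set).
toy = pd.DataFrame({
    "src": [1, 1, 2, 3],
    "trg": [2, 3, 3, 1],
    "start": pd.to_datetime(["2002-03-01", "2001-06-01", "2003-02-01", "2001-01-01"]),
    "end": pd.to_datetime(["2002-09-01", "2004-01-01", "2003-05-01", "2001-12-01"]),
})

window = edges_active_in_window(toy, pd.Timestamp("2002-01-01"), pd.Timestamp("2003-01-01"))
active_pairs = window[["src", "trg"]].values.tolist()
print(active_pairs)
```

The first toy edge lies inside the window and the second spans it, so both are kept; the other two start after the window or end before it.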


In [10]:
# Create a DataFrame with the number of edges active during every period, so we can visualize it.
lstEdges = list()
lstTime = list()
for x in range(0,10):
    T = "{0}".format(2002+x) + " - " + "{0}".format(2002+x+1) 
    lstTime.append(T)
    df = "df_G{0}".format(x); df = dfG.get(df)
    lstEdges.append(df.shape[0])
edges = pd.DataFrame({'Time': lstTime, 'Edges': lstEdges})
cols = edges.columns.tolist()
cols = cols[-1:] + cols[:-1]
edges = edges[cols]
edges

| | Time | Edges |
|---|---|---|
| 0 | 2002 - 2003 | 56459 |
| 1 | 2003 - 2004 | 210603 |
| 2 | 2004 - 2005 | 502719 |
| 3 | 2005 - 2006 | 1115759 |
| 4 | 2006 - 2007 | 1685662 |
| 5 | 2007 - 2008 | 1758664 |
| 6 | 2008 - 2009 | 1618639 |
| 7 | 2009 - 2010 | 1331689 |
| 8 | 2010 - 2011 | 960662 |
| 9 | 2011 - 2012 | 443777 |
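
One possible way to visualize these counts is a simple bar chart. The sketch below rebuilds the table from the numbers printed above so that it runs on its own; the axis labels, title, and color are illustrative choices, not something the notebook prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Edge counts copied from the table above.
edges = pd.DataFrame({
    "Time": ["{0} - {1}".format(2002 + x, 2003 + x) for x in range(10)],
    "Edges": [56459, 210603, 502719, 1115759, 1685662,
              1758664, 1618639, 1331689, 960662, 443777],
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(list(edges["Time"]), list(edges["Edges"]), color="darkgreen")
ax.set_xlabel("Time window")
ax.set_ylabel("Number of active edges")
ax.set_title("Active edges per yearly window")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

The counts rise until the 2007 - 2008 window and decline afterwards.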


In [11]:
# Initialize a directed graph G from each of the given pandas dataframes.
dG={}
for x in range(0,10):
    G = "df_G{0}".format(x)
    G = dfG.get(G)
    # from_pandas_edgelist is the name of from_pandas_dataframe in newer NetworkX releases
    G = nx.from_pandas_edgelist(G, 'src', 'trg', edge_attr=None, create_using=nx.DiGraph())
    dG["G{0}".format(x)]=G
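
Because some rows share the same source and target but carry different timestamps, a `DiGraph` built from a window can have fewer edges than the window's DataFrame has rows: parallel edges collapse into one. The sketch below illustrates this on hypothetical toy values, assuming a NetworkX release that provides `from_pandas_edgelist`.

```python
import networkx as nx
import pandas as pd

# Toy window (hypothetical values): the pair (1, 3) appears twice
# with different lifetimes.
toy = pd.DataFrame({
    "src": [1, 1, 1, 3],
    "trg": [6, 3, 3, 1],
    "start": [10, 10, 50, 20],
    "end": [40, 30, 60, 70],
})

G = nx.from_pandas_edgelist(toy, "src", "trg", create_using=nx.DiGraph())

# Counting distinct (src, trg) pairs in pandas gives the same number of edges.
distinct_pairs = len(toy.drop_duplicates(["src", "trg"]))

print(len(toy), G.number_of_edges(), distinct_pairs)
```

If the multiplicity of a link within a window matters, `nx.MultiDiGraph()` could be passed as `create_using` instead; the notebook uses a plain `DiGraph`.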

#### Directed Graph used for temporal network analysis (igraph)

Since we also need some functionality from the $igraph$ library, we are going to build the same graphs in $igraph$.

In [32]:
for x in range(0,10):
    G = "G{0}".format(x); G = dG.get(G)
    nx.write_gml(G, 'data/gml_G{0}.gml'.format(x))

G_i0 = igraph.Graph.Read_GML('data/gml_G0.gml')
G_i1 = igraph.Graph.Read_GML('data/gml_G1.gml')
G_i2 = igraph.Graph.Read_GML('data/gml_G2.gml')
G_i3 = igraph.Graph.Read_GML('data/gml_G3.gml')
G_i4 = igraph.Graph.Read_GML('data/gml_G4.gml')
G_i5 = igraph.Graph.Read_GML('data/gml_G5.gml')
G_i6 = igraph.Graph.Read_GML('data/gml_G6.gml')
G_i7 = igraph.Graph.Read_GML('data/gml_G7.gml')
G_i8 = igraph.Graph.Read_GML('data/gml_G8.gml')
G_i9 = igraph.Graph.Read_GML('data/gml_G9.gml')

In [45]:
len(list(G_i0.vs)) == len(list(dG.get('G0'))) # sanity check: both libraries report the same number of vertices

True

### <font color="darkgreen">1. Temporal Network Analysis:</font> Degree distribution

In the static network analysis we found some interesting correlations between the degree distribution and certain peculiarities of (groups of) nodes and of the graph in general. In this section we state some hypotheses based on the context of the temporal data and on the insights gained in the static network analysis. The nodes of the network represent pages and the edges represent links between the pages of a certain website/domain, so we expect notable changes over time intervals of a year. This assumption was confirmed by the yearly edge counts computed in the introduction of this task.
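
As a minimal sketch of how the in-degree and out-degree distributions of one yearly graph could be computed, the helper below counts how many vertices have each degree value. The function name `degree_distribution` and the toy graph are hypothetical; the notebook does not specify how the distributions are derived.

```python
from collections import Counter

import networkx as nx

def degree_distribution(G, direction="in"):
    """Map each degree value to the number of vertices with that degree."""
    degrees = G.in_degree() if direction == "in" else G.out_degree()
    # dict(...) makes this work whether the library returns a view or a dict.
    return Counter(dict(degrees).values())

# Toy directed graph (hypothetical, not taken from the data set).
toy = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 1)])

in_dist = degree_distribution(toy, "in")
out_dist = degree_distribution(toy, "out")
print(dict(in_dist), dict(out_dist))
```

Applied to the yearly graphs, a comprehension such as `{name: degree_distribution(G) for name, G in dG.items()}` would give one distribution per window, which can then be compared across years.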

### <font color="darkgreen">2. Temporal Network Analysis:</font> Diameter
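
A minimal sketch of one way to measure the diameter of a yearly graph, assuming it is taken on the largest strongly connected component (the notebook does not state which definition it uses). Directed shortest paths can be undefined between vertices in different components, which is why the sketch restricts itself to one component. The helper name `scc_diameter` and the toy graph are hypothetical.

```python
import networkx as nx

def scc_diameter(G):
    """Diameter of the largest strongly connected component of a directed graph."""
    largest = max(nx.strongly_connected_components(G), key=len)
    # Inside a strongly connected component every directed distance is finite.
    return nx.diameter(G.subgraph(largest))

# Toy graph (hypothetical): a directed 3-cycle plus one dangling vertex.
toy = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])
toy_diameter = scc_diameter(toy)
print(toy_diameter)
```

Exact diameter computation runs a shortest-path search from every vertex, so on graphs with more than a million edges it may be slow; this is one reason a faster library such as igraph may be preferable for this measure.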

### <font color="darkgreen">3. Temporal Network Analysis:</font> Dyads and Reciprocity
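
A minimal sketch of how dyads and reciprocity could be computed for one yearly graph. A dyad (an unordered pair of distinct vertices) is mutual if edges run in both directions, asymmetric if an edge runs in one direction only, and null if there is no edge. The helper name `dyad_census` and the toy graph are hypothetical; self-loops are ignored as an assumption of this sketch.

```python
import networkx as nx

def dyad_census(G):
    """Return (mutual, asymmetric, null) dyad counts of a directed graph."""
    n = G.number_of_nodes()
    edges = [(u, v) for u, v in G.edges() if u != v]  # ignore self-loops
    # Each mutual dyad contributes two directed edges, hence the division by 2.
    mutual = sum(1 for u, v in edges if G.has_edge(v, u)) // 2
    asymmetric = len(edges) - 2 * mutual
    null = n * (n - 1) // 2 - mutual - asymmetric
    return mutual, asymmetric, null

# Toy graph (hypothetical): one mutual pair and two one-way links.
toy = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 4)])

census = dyad_census(toy)
reciprocity = nx.reciprocity(toy)  # share of edges that are reciprocated
print(census, reciprocity)
```

Computing both values for each of the yearly graphs would show whether links between pages become more or less mutual over time.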