# Stochastic Blockmodeling of Layered Graphs
**Data**: GPS-recorded, time-stamped movements of about 200 users in eastern Germany, with locations clustered into 50 or 100 network nodes.

**Analysis**: Detection of mobility patterns via stochastic blockmodeling with <font face='Courier'>graph-tool</font>.
## Dependencies

In [1]:
import numpy as np
import pandas as pd
import pickle
from graph_tool.all import *

## Data Preparation
Load the edge list for the network with either 50 or 100 nodes. Each directed edge corresponds to one user's movement from one location in the region (a network node) to another:

In [2]:
edgelist = pd.read_csv('../data/edgelist_chemnitz_dec4_distTrue_cluAgglomerative50.txt', sep='\t')
#edgelist = pd.read_csv('../data/edgelist_chemnitz_dec4_distTrue_cluAgglomerative100.txt', sep='\t')

Compute the hour of the week at which each movement begins. Hour 0 is Monday 00:00 to 00:59 and hour 167 is Sunday 23:00 to 23:59:

In [3]:
edgelist['time_begin'] = pd.to_datetime(edgelist['time_begin'])
edgelist['time_end'] = pd.to_datetime(edgelist['time_end'])

In [4]:
edgelist['hourofweek'] = edgelist['time_begin'].dt.dayofweek*24+edgelist['time_begin'].dt.hour
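A quick check of the hour-of-week formula using only the standard library. `datetime.weekday()` uses the same convention as pandas' `dt.dayofweek` (Monday is 0), so the first movement in the table below (Wednesday 2015-11-04, 10:12) should map to hour 58. The helper name `hour_of_week` is ours, not part of the notebook:

```python
from datetime import datetime

def hour_of_week(ts: datetime) -> int:
    """Hour of the week with Monday 00:00 as hour 0 (mirrors dt.dayofweek*24 + dt.hour)."""
    return ts.weekday() * 24 + ts.hour

# Range endpoints: Monday 2015-11-02 00:00 and Sunday 2015-11-08 23:00
print(hour_of_week(datetime(2015, 11, 2, 0, 0)))        # 0
print(hour_of_week(datetime(2015, 11, 8, 23, 0)))       # 167
# First movement in the edge list (a Wednesday)
print(hour_of_week(datetime(2015, 11, 4, 10, 12, 28)))  # 58
```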

In [5]:
edgelist.head()

| | time_begin | time_end | cluster_id_begin | cluster_id_end | type | speed | hourofweek |
|---|---|---|---|---|---|---|---|
| 0 | 2015-11-04 10:12:28 | 2015-11-04 11:06:35 | 0 | 0 | foot | 0.64 | 58 |
| 1 | 2015-11-04 11:06:35 | 2015-11-04 11:06:43 | 0 | 0 | car | 785.12 | 59 |
| 2 | 2015-11-04 12:56:46 | 2015-11-04 12:57:59 | 0 | 0 | foot | 2.59 | 60 |
| 3 | 2015-11-04 12:57:59 | 2015-11-04 13:11:34 | 0 | 0 | foot | 1.95 | 60 |
| 4 | 2015-11-04 13:11:34 | 2015-11-04 13:14:24 | 0 | 0 | car | 22.64 | 61 |


Draw a random sample of movements to speed up computation:

In [6]:
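sample = 100
# draw distinct rows (replace=False); .loc replaces the deprecated .ix indexer, which was removed in pandas 1.0
edgelist = edgelist.loc[np.random.choice(edgelist.index.values, sample, replace=False)]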
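For reproducible sampling without replacement, pandas also provides `DataFrame.sample`. A minimal sketch on a toy frame (the values are made up, not the study data); the `random_state=42` seed is an arbitrary choice that makes reruns draw the same rows:

```python
import pandas as pd

# Toy stand-in for the edge list (hypothetical values)
toy = pd.DataFrame({'cluster_id_begin': [0, 6, 0, 17, 0, 3],
                    'cluster_id_end':   [0, 6, 0, 17, 0, 5],
                    'hourofweek':       [118, 82, 83, 109, 33, 58]})

# Draw 4 distinct rows; the fixed random_state makes the draw repeatable
sampled = toy.sample(n=4, replace=False, random_state=42)
print(sampled)
print(sampled.index.is_unique)  # True: no row is drawn twice
```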

Keep only the columns needed for the analysis:

In [7]:
edgelist_hourofweek = edgelist[['cluster_id_begin', 'cluster_id_end', 'hourofweek']]

In [8]:
edgelist_hourofweek.head()

| | cluster_id_begin | cluster_id_end | hourofweek |
|---|---|---|---|
| 6030 | 0 | 0 | 118 |
| 11801 | 6 | 6 | 82 |
| 171 | 0 | 0 | 83 |
| 19875 | 17 | 17 | 109 |
| 7449 | 0 | 0 | 33 |


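The sampled edge list can contain multiple entries for the same edge at the same hour. Aggregate them into a weighted edge list in which <font face='Courier'>weight</font> counts the movements per (begin, end, hour) combination: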

In [9]:
edgelist_hourofweek_weight = (edgelist_hourofweek
                              .groupby(['cluster_id_begin', 'cluster_id_end', 'hourofweek'])
                              .size()
                              .reset_index(name='weight'))
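What the aggregation does, shown on a toy frame with hypothetical values: rows that share the same begin cluster, end cluster and hour are merged, and `size()` counts them. Naming the count column directly with `reset_index(name='weight')` gives the same result as renaming column `0` afterwards:

```python
import pandas as pd

# Hypothetical movements: the (0, 0, hour 12) movement occurs twice
moves = pd.DataFrame({'cluster_id_begin': [0, 0, 0, 6],
                      'cluster_id_end':   [0, 0, 0, 6],
                      'hourofweek':       [12, 12, 16, 82]})

weighted = (moves
            .groupby(['cluster_id_begin', 'cluster_id_end', 'hourofweek'])
            .size()                      # number of movements per (begin, end, hour)
            .reset_index(name='weight'))
print(weighted)  # three rows with weights 2, 1, 1
```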

In [10]:
edgelist_hourofweek_weight.head()

| | cluster_id_begin | cluster_id_end | hourofweek | weight |
|---|---|---|---|---|
| 0 | 0 | 0 | 9 | 1 |
| 1 | 0 | 0 | 11 | 1 |
| 2 | 0 | 0 | 12 | 2 |
| 3 | 0 | 0 | 16 | 1 |
| 4 | 0 | 0 | 19 | 1 |


Construct a directed multigraph with <font face='Courier'>hourofweek</font> as an edge property:

In [11]:
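g = Graph(directed=True)
hourofweek = g.new_edge_property('int')
# columns: source vertex, target vertex, hourofweek value; cluster IDs are used directly as vertex indices
g.add_edge_list(edgelist_hourofweek.values, eprops=[hourofweek])
g.edge_properties['hourofweek'] = hourofweek  # make the property internal (accessible as g.ep.hourofweek)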

In [12]:
print(g)

<Graph object, directed, with 37 vertices and 100 edges at 0x7fc3bbe34400>
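Why 37 vertices rather than 50? With the default `hashed=False`, `add_edge_list` uses the values in the first two columns directly as vertex indices and creates vertices up to the largest index seen. The vertex count is therefore the largest cluster ID in the sample plus one, and clusters below that ID that never occur in the sample still become isolated vertices. The helper `vertex_summary` below is a name we made up; it reproduces that count in plain Python on hypothetical rows:

```python
def vertex_summary(edges):
    """Vertex count graph-tool would create for index-valued edges, plus the unused indices."""
    used = {v for edge in edges for v in edge[:2]}
    n = max(used) + 1                        # vertices 0..max index are created
    isolated = sorted(set(range(n)) - used)  # created but never touched by an edge
    return n, isolated

# Hypothetical (begin, end, hourofweek) rows
n, isolated = vertex_summary([(0, 0, 118), (6, 6, 82), (17, 17, 109), (0, 3, 33)])
print(n, isolated)  # 18 [1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

If all 50 (or 100) nodes should be present regardless of the sample, one option is to call `g.add_vertex(50)` on the empty graph before `add_edge_list`.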


In [13]:
g.list_properties()

hourofweek     (edge)    (type: int32_t)


Construct a directed multigraph with <font face='Courier'>hourofweek</font> and <font face='Courier'>weight</font> as edge properties:

In [14]:
g_weight = Graph(directed=True)
hourofweek = g_weight.new_edge_property('int')
weight = g_weight.new_edge_property('int')
# columns: source vertex, target vertex, hourofweek value, weight value
g_weight.add_edge_list(edgelist_hourofweek_weight.values, eprops=[hourofweek, weight])
g_weight.edge_properties['hourofweek'] = hourofweek
g_weight.edge_properties['weight'] = weight

<font color='red'>Q: Is the weighted graph constructed correctly?</font>

In [15]:
print(g_weight)

<Graph object, directed, with 37 vertices and 95 edges at 0x7fc3bbe349e8>


In [16]:
g_weight.list_properties()

hourofweek     (edge)    (type: int32_t)
weight         (edge)    (type: int32_t)
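One way to check the construction is to verify that the aggregation lost nothing: expanding every weighted row `weight` times should recover the sampled movements exactly. The sketch assumes the column names used above; `check_weighted` is a hypothetical helper, shown here on toy data:

```python
import pandas as pd

COLS = ['cluster_id_begin', 'cluster_id_end', 'hourofweek']

def check_weighted(raw: pd.DataFrame, weighted: pd.DataFrame) -> bool:
    """True if repeating each weighted row `weight` times recovers the raw rows (as a multiset)."""
    expanded = weighted.loc[weighted.index.repeat(weighted['weight'].to_numpy()), COLS]
    lhs = raw[COLS].sort_values(COLS).reset_index(drop=True)
    rhs = expanded.sort_values(COLS).reset_index(drop=True)
    return lhs.equals(rhs)

# Toy example: four movements, one of them duplicated
raw = pd.DataFrame({'cluster_id_begin': [0, 0, 0, 6],
                    'cluster_id_end':   [0, 0, 0, 6],
                    'hourofweek':       [12, 12, 16, 82]})
weighted = raw.groupby(COLS).size().reset_index(name='weight')
print(check_weighted(raw, weighted))                    # True
print(check_weighted(raw, weighted.assign(weight=1)))   # False: the duplicate is lost
```

Applied to the notebook's frames, `check_weighted(edgelist_hourofweek, edgelist_hourofweek_weight)` should return `True`. The printed edge counts are consistent with this: `g` has one edge per sampled movement (100), and `g_weight` one edge per distinct (begin, end, hour) triple (95).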


## Data Analysis
The goal is to perform an analysis like the one in Fig. 8 of "<a href='https://arxiv.org/abs/1504.02381'>Inferring the mesoscale structure of layered, edge-valued, and time-varying networks</a>" (Peixoto, 2015). The idea is to treat the hour of the week as the time stamp of each weighted edge and to detect change points in aggregate mobility.
### Are layers informative?
To test this, we (a) infer the nested blockmodel for a weighted graph with collapsed edge time, (b) infer the nested blockmodel for a weighted graph with edge time as layers, and (c) compare the two fits via the posterior odds ratio. Both fits use the degree-corrected variant of the blockmodel (<font face='Courier'>deg_corr=True</font>).
#### (a) Nested blockmodel for weighted graph with collapsed edge time

In [17]:
# outer layers=True: LayeredBlockState; inner layers=False: "edge covariates" variant; weights modeled as Poisson counts
state_nested_covariates = minimize_nested_blockmodel_dl(g_weight, layers=True, deg_corr=True,
    state_args=dict(ec=g_weight.ep.hourofweek, layers=False, recs=[g_weight.ep.weight], rec_types=['discrete-poisson']))
#pickle.dump(state_nested_covariates, open('state_nested_covariates.pickle', 'wb'))

In [18]:
#state_nested_covariates = pickle.load(open('state_nested_covariates.pickle', 'rb'))

<font color='red'>Q: To use both edge properties (hourofweek and weight), can the function be called like that?</font>
<font color='red'>Q: Are the covariate types correct ('discrete-poisson')?</font>
<font color='red'>Q: I'm confused about the two layers parameters. Does the first one specify that the observed data has edge weights, and the second one (in state_args) whether that information is collapsed?</font>
A: "Note the different meanings of the two 'layers' parameters below: The first enables the use of LayeredBlockState, and the second selects the 'edge layers' version (instead of 'edge covariates')." (https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#layered-networks)

#### (b) Nested blockmodel for weighted graph with edge time as layers


In [19]:
# outer layers=True: LayeredBlockState; inner layers=True: "edge layers" variant (one layer per hourofweek value)
state_nested_layers = minimize_nested_blockmodel_dl(g_weight, layers=True, deg_corr=True,
    state_args=dict(ec=g_weight.ep.hourofweek, layers=True, recs=[g_weight.ep.weight], rec_types=['discrete-poisson']))
#pickle.dump(state_nested_layers, open('state_nested_layers.pickle', 'wb'))

In [20]:
#state_nested_layers = pickle.load(open('state_nested_layers.pickle', 'rb'))

#### (c) Model selection

In [21]:
# difference in description length (nats); negative values favor the covariates model (a)
state_nested_covariates.entropy() - state_nested_layers.entropy()

-1307.7894563209875

<font color='red'>Q: Is this done correctly?</font>
<font color='red'>Q: If the result is negative, does that mean the layers are not informative?</font>
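In Peixoto's framework, two fits are compared through the posterior odds ratio Λ = exp(-ΔΣ), where ΔΣ = Σ_a - Σ_b is the difference in description lengths. Assuming `entropy()` returns the description length in nats and that both states describe the same data (same graph, edge labels and weights, same default entropy options), the printed difference can be turned into an order of magnitude. A minimal sketch; `log10_posterior_odds` is a name we chose:

```python
import math

def log10_posterior_odds(delta_sigma: float) -> float:
    """log10 of Λ = exp(-ΔΣ), with ΔΣ = Σ_a - Σ_b in nats; positive values favor model a."""
    return -delta_sigma / math.log(10)

# ΔΣ = Σ_covariates - Σ_layers as printed above
delta = -1307.7894563209875
print(round(log10_posterior_odds(delta), 2))  # 567.97
```

Read this way, a negative ΔΣ gives Λ ≈ 10^568 in favor of model (a), i.e. on this 100-edge sample the collapsed (edge-covariate) description is strongly preferred over treating each hour as a separate layer.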

### Change Point Extraction

<font color='red'>Q: How can I extract the bin / change point information from state_nested_layers?</font>
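As far as we know, the layered state does not store change points directly; with `layers=True` every distinct `hourofweek` value is simply its own layer. One possible approach, based on our reading of the temporal analysis in Peixoto (2015) rather than on a graph-tool function, is to treat bin boundaries as part of the model: relabel each edge's hour into coarser bins, refit with the binned labels as `ec`, and keep the binning with the smallest description length. The sketch covers only the search. `description_length` is a hypothetical callback that would wrap the refit (building a new edge property from `hours_to_bins` and returning the fitted state's `entropy()`); the toy objective stands in for it:

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

def hours_to_bins(hours: Sequence[int], change_points: Sequence[int]) -> List[int]:
    """Map each hour of week to a bin index; a change point c starts a new bin at hour c."""
    cps = sorted(change_points)
    return [bisect_right(cps, h) for h in hours]

def greedy_merge(change_points: Sequence[int],
                 description_length: Callable[[List[int]], float]) -> List[int]:
    """Repeatedly drop the change point whose removal lowers the description length most."""
    current = sorted(change_points)
    best_dl = description_length(current)
    while current:
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        scores = [description_length(c) for c in candidates]
        i_best = min(range(len(scores)), key=scores.__getitem__)
        if scores[i_best] >= best_dl:
            break  # no single merge of adjacent bins improves the fit
        current, best_dl = candidates[i_best], scores[i_best]
    return current

# Toy objective (hypothetical): a single change point at hour 48 is "best"
def toy_dl(cps: List[int]) -> float:
    return abs(len(cps) - 1) + (0 if 48 in cps else 10)

print(greedy_merge([12, 24, 48, 96], toy_dl))             # [48]
print(hours_to_bins([0, 23, 24, 47, 48, 167], [24, 48]))  # [0, 0, 1, 1, 2, 2]
```

Starting from `range(1, 168)` (one bin per hour) would mirror the current layered fit. Greedy merging can stop in a local optimum, and because each refit is stochastic, repeated fits per candidate binning may be needed.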

### Overlapping Blocks

<font color='red'>Q: Having read Peixoto (2015), it is still not clear to me what the overlap option does or how it is enabled. For what purpose, and how, could it be integrated into this analysis?</font>
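In the overlapping variant of the blockmodel, group labels are attached to edge endpoints (half-edges) rather than to nodes. A node's mixed membership is then given by the groups of its incident half-edges, so a location could take part in more than one mobility pattern. In the graph-tool version used here, `minimize_nested_blockmodel_dl` appears to accept an `overlap` flag next to `layers` and `deg_corr`; whether and how it combines with layered states should be checked against the installed version's documentation (see `OverlapBlockState`). The sketch below is not graph-tool code. It only illustrates how per-node membership shares follow from hypothetical half-edge labels; `mixed_membership` is our own name:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def mixed_membership(half_edges: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """Share of each node's half-edges assigned to each group (node -> {group: share})."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for node, group in half_edges:
        counts[node][group] += 1
    shares = {}
    for node, cnt in counts.items():
        total = sum(cnt.values())
        shares[node] = {g: c / total for g, c in sorted(cnt.items())}
    return shares

# Hypothetical labels: node 0 has 3 half-edges in group 0 and 1 in group 1
labels = [(0, 0), (0, 0), (0, 0), (0, 1), (6, 1), (6, 1)]
print(mixed_membership(labels))  # {0: {0: 0.75, 1: 0.25}, 6: {1: 1.0}}
```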