In [1]:
# import numpy as np
# import pandas as pd
import json

from ahyper import utils, annotated_hypergraph
# from itertools import groupby
# from collections import Counter


In [2]:
with open('data/enron_hypergraph_annotated.json') as file:
    data = json.load(file)

roles = ['cc', 'from', 'to']

# Add a unique negative ID to each edge: -1, -2, ..., -m
for i, record in enumerate(data, start=1):
    record['eid'] = -i

In [3]:
data[0:2]

[{'cc': [],
  'date': '1998-11-13 12:07:00',
  'eid': -1,
  'from': [67],
  'to': [108]},
 {'cc': [],
  'date': '1998-11-19 15:19:00',
  'eid': -2,
  'from': [67],
  'to': [73]}]

# Construct an Annotated Hypergraph

In [4]:
A = annotated_hypergraph.annotated_hypergraph(data, roles)

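First, `A` stores arrays of the node and edge ids. Node ids are nonnegative integers starting at $0$, and edge ids are negative integers running from $-1$ to $-m$, so the two sets of ids never collide. The node list need not be contiguous: $9$ is absent from the first ten entries shown below.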

In [5]:
A.get_node_list()[0:10]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 10])

In [6]:
A.get_edge_list()[0:10]

array([-10504, -10503, -10502, -10501, -10500, -10499, -10498, -10497,
       -10496, -10495])

We can also get the node degree sequence, optionally broken down by role: 

In [7]:
# Degree sequence
A.node_degrees()                   # all node degrees (totals)
A.node_degrees(by_role=True)[4]    # role-degrees of node 4

{'cc': 0, 'from': 12, 'to': 5}

Similarly, we can get the edge dimension sequence, again optionally broken down by role: 

In [8]:
# Edge dimension sequence
A.edge_dimensions()                   # all edge dimensions (totals)
A.edge_dimensions(by_role=True)[-5]   # role-dimensions of edge -5

{'cc': 2, 'from': 1, 'to': 0}
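
To make the two `by_role` quantities concrete, here is a minimal sketch that computes them directly from records shaped like `data`. The function names `node_degrees_by_role` and `edge_dimensions_by_role` are hypothetical and are not part of `ahyper`; the library's own implementation may differ. The three sample records are the first three edges of the dataset as shown in this notebook.

```python
from collections import Counter, defaultdict

def node_degrees_by_role(records, roles):
    """Hypothetical helper: count, per node, how often it appears in each role."""
    counts = defaultdict(Counter)
    for record in records:
        for role in roles:
            for node in record[role]:
                counts[node][role] += 1
    # Fill in explicit zeros so every node reports every role.
    return {node: {role: c[role] for role in roles} for node, c in counts.items()}

def edge_dimensions_by_role(records, roles):
    """Hypothetical helper: count, per edge, how many nodes it holds in each role."""
    return {r['eid']: {role: len(r[role]) for role in roles} for r in records}

roles = ['cc', 'from', 'to']
sample = [
    {'cc': [], 'date': '1998-11-13 12:07:00', 'eid': -1, 'from': [67], 'to': [108]},
    {'cc': [], 'date': '1998-11-19 15:19:00', 'eid': -2, 'from': [67], 'to': [73]},
    {'cc': [73], 'date': '1998-11-19 16:24:00', 'eid': -3, 'from': [67], 'to': []},
]

degrees = node_degrees_by_role(sample, roles)
dimensions = edge_dimensions_by_role(sample, roles)
```

Summing the values of one of these inner dicts gives the corresponding total degree or total dimension.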

Internally, `A` represents the data as an annotated node-edge incidence list. It is convenient to think of this as the edge list of a bipartite graph between nodes and hyperedges, in which each bipartite edge is labeled with a role. Each entry has the form `[node, role, edge id, date]`. The list can be accessed directly:

In [9]:
A.get_IL()[0:10]

[[67, 'from', -1, '1998-11-13 12:07:00'],
 [108, 'to', -1, '1998-11-13 12:07:00'],
 [67, 'from', -2, '1998-11-19 15:19:00'],
 [73, 'to', -2, '1998-11-19 15:19:00'],
 [73, 'cc', -3, '1998-11-19 16:24:00'],
 [67, 'from', -3, '1998-11-19 16:24:00'],
 [108, 'cc', -4, '1998-11-24 10:23:00'],
 [96, 'cc', -4, '1998-11-24 10:23:00'],
 [22, 'cc', -4, '1998-11-24 10:23:00'],
 [67, 'from', -4, '1998-11-24 10:23:00']]
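
The relationship between the input records and this list can be sketched as a simple flattening step. The function `records_to_incidence_list` below is a hypothetical illustration, not the `ahyper` constructor, and it assumes each record carries the `eid` and `date` keys used in this notebook.

```python
def records_to_incidence_list(records, roles):
    """Hypothetical helper: flatten records into [node, role, edge_id, date] rows."""
    incidence_list = []
    for record in records:
        for role in roles:                 # roles are visited in the given order
            for node in record[role]:
                incidence_list.append([node, role, record['eid'], record['date']])
    return incidence_list

sample = [
    {'cc': [], 'date': '1998-11-13 12:07:00', 'eid': -1, 'from': [67], 'to': [108]},
    {'cc': [], 'date': '1998-11-19 15:19:00', 'eid': -2, 'from': [67], 'to': [73]},
]
rows = records_to_incidence_list(sample, ['cc', 'from', 'to'])
```

Each row is one edge of the bipartite graph: it joins a node to a hyperedge and carries the role as its label.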

# Stub-Labeled MCMC 

We can define a simple version of stub-labeled Markov chain Monte Carlo (MCMC) in this space. It essentially amounts to swapping edges of the bipartite graph, with the restriction that two edges can be swapped only if their roles agree. This MCMC algorithm preserves the node degree sequence and the edge dimension sequence, including their `by_role` variants.
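
As a rough illustration of the idea, the sketch below performs role-preserving swaps on a list of `[node, role, edge id, date]` rows. It is an assumption-laden toy, not the code behind `A.stub_labeled_MCMC`: the name `role_preserving_swaps` is hypothetical, pairs are drawn uniformly at random, and no proposal is rejected for any reason other than mismatched roles (for example, it does not check whether a node ends up in the same edge twice).

```python
import random
from collections import Counter

def role_preserving_swaps(incidence_list, n_steps, seed=None):
    """Hypothetical sketch: swap node endpoints between rows that share a role."""
    rng = random.Random(seed)
    rows = [list(row) for row in incidence_list]   # work on a copy
    for _ in range(n_steps):
        i, j = rng.sample(range(len(rows)), 2)     # two distinct rows
        if rows[i][1] != rows[j][1]:
            continue                               # roles differ: reject the proposal
        rows[i][0], rows[j][0] = rows[j][0], rows[i][0]
    return rows

def role_profile(rows):
    """Role-degrees of nodes and role-dimensions of edges, as multiset counts."""
    node_side = Counter((node, role) for node, role, _, _ in rows)
    edge_side = Counter((eid, role) for _, role, eid, _ in rows)
    return node_side, edge_side

before = [
    [67, 'from', -1, '1998-11-13 12:07:00'],
    [108, 'to', -1, '1998-11-13 12:07:00'],
    [67, 'from', -2, '1998-11-19 15:19:00'],
    [73, 'to', -2, '1998-11-19 15:19:00'],
    [73, 'cc', -3, '1998-11-19 16:24:00'],
    [67, 'from', -3, '1998-11-19 16:24:00'],
]
after = role_preserving_swaps(before, n_steps=1000, seed=0)
```

Because a swap only exchanges the node entries of two rows with the same role, every node keeps the same number of rows per role and every edge keeps the same number of rows per role, which is why `role_profile(before)` and `role_profile(after)` agree.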

# Check for preservation of node degrees and edge dimensions

In [10]:
d0 = A.node_degrees(by_role = True)
k0 = A.edge_dimensions(by_role = True)

In [11]:
A.stub_labeled_MCMC(n_steps = 100000)
A.get_IL()[0:10] # not the same list as above

[[67, 'from', -1, '1998-11-13 12:07:00'],
 [57, 'to', -1, '1998-11-13 12:07:00'],
 [108, 'from', -2, '1998-11-19 15:19:00'],
 [54, 'to', -2, '1998-11-19 15:19:00'],
 [53, 'cc', -3, '1998-11-19 16:24:00'],
 [114, 'from', -3, '1998-11-19 16:24:00'],
 [100, 'cc', -4, '1998-11-24 10:23:00'],
 [86, 'cc', -4, '1998-11-24 10:23:00'],
 [66, 'cc', -4, '1998-11-24 10:23:00'],
 [66, 'from', -4, '1998-11-24 10:23:00']]

In [12]:
d = A.node_degrees(by_role = True)
k = A.edge_dimensions(by_role = True)

In [13]:
d0 == d, k0 == k # but the degree and dimension sequences are preserved

(True, True)

# Output

You can read out data from `A` either as a list of dicts ("records", suitable for output as JSON) or as an incidence list.

In [14]:
A.get_records()[0:2]

[{'cc': [],
  'date': '1998-11-13 12:07:00',
  'eid': -1,
  'from': [67],
  'to': [57]},
 {'cc': [],
  'date': '1998-11-19 15:19:00',
  'eid': -2,
  'from': [108],
  'to': [54]}]

In [15]:
A.get_IL()[0:4]

[[67, 'from', -1, '1998-11-13 12:07:00'],
 [57, 'to', -1, '1998-11-13 12:07:00'],
 [108, 'from', -2, '1998-11-19 15:19:00'],
 [54, 'to', -2, '1998-11-19 15:19:00']]
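
The two output formats carry the same information. A minimal sketch of the conversion from incidence rows back to records follows; `incidence_list_to_records` is a hypothetical helper written for this illustration and is not the implementation of `A.get_records()`.

```python
def incidence_list_to_records(incidence_list, roles):
    """Hypothetical helper: group [node, role, edge_id, date] rows into one dict per edge."""
    records = {}
    for node, role, eid, date in incidence_list:
        if eid not in records:
            # Start each record with an empty node list for every role.
            records[eid] = {'eid': eid, 'date': date, **{r: [] for r in roles}}
        records[eid][role].append(node)
    return list(records.values())

rows = [
    [67, 'from', -1, '1998-11-13 12:07:00'],
    [57, 'to', -1, '1998-11-13 12:07:00'],
    [108, 'from', -2, '1998-11-19 15:19:00'],
    [54, 'to', -2, '1998-11-19 15:19:00'],
]
records = incidence_list_to_records(rows, ['cc', 'from', 'to'])
```

The resulting list contains only built-in types, so it can be passed to `json.dump` as is.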

# Next?

Possible next steps for this software include refactoring the internals to use pandas and implementing alternative MCMC schemes, possibly including vertex-labeled ones.