We analyze the dataset 'CollegeMsg' available from the SNAP repository: http://snap.stanford.edu/data/index.html

Downloaded from here: https://snap.stanford.edu/data/CollegeMsg.html

The data consists of private messages sent on an online social network at the University of California, Irvine over a period of 193 days. Each row corresponds to an edge, indicating a message sent from the source node (user) to the target node (user) at a specific time.
Time is specified as a Unix timestamp: the number of seconds that have passed since January 1, 1970 (the Unix epoch).
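
To make the timestamp format concrete, here is a minimal sketch that converts one raw value into a calendar date. It uses the timestamp of the first row of the dataset preview; interpreting it in UTC is an assumption, since the dataset page does not need to be consulted for this illustration and the notebook does not state a time zone.

```python
from datetime import datetime, timezone

# Timestamp of the first message in the dataset preview.
first_timestamp = 1082155839

# Interpret the value as seconds since the Unix epoch (UTC is an assumption).
sent_at = datetime.fromtimestamp(first_timestamp, tz=timezone.utc)
print(sent_at.isoformat())
```

The resulting date falls in April 2004, which matches the first monthly interval reported further down.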

Let's start by looking at the data.


In [1]:
#import libraries 
import sys
import pandas as pd
import networkx as nx
import os

module_path = '/Users/mariafedericanorelli/Desktop/humannetworkscience/drafts/FromData2Graphs'
sys.path.append(os.path.abspath(module_path))
from fromData2Graphs import txt_to_csv, read_csv, normalize_graph, create_temporal_graph, save_graph

In [2]:
_DATACACHE = "~/polygraphs-cache/data"

In [3]:

#convert from txt to csv  
input_file = '/Users/mariafedericanorelli/Desktop/humannetworkscience/datasets/CollegeMsg.txt'
output_file = '/Users/mariafedericanorelli/Desktop/humannetworkscience/datasets/CollegeMsg.csv'
txt_to_csv(input_file, output_file)

Conversion completed. CSV file saved at: /Users/mariafedericanorelli/Desktop/humannetworkscience/datasets/CollegeMsg.csv


In [4]:
#read it 
messages, source, dest, timestamps = read_csv(output_file, source_column_name='SRC', destination_column_name='DST', timestamp_column_name='timestamp')

messages.head()

Unnamed: 0,SRC,DST,timestamp
0,3,4,1082155839
1,5,2,1082414391
2,6,7,1082439619
3,8,7,1082439756
4,9,10,1082440403


Next, we preprocess the data into the format we need (source, target, timestamp).

In [5]:
#create the graph 
G = create_temporal_graph(source, dest, timestamps)

The temporal graph has 1899 nodes and 20295 edges


In [6]:
#normalize the graph
normalized_graph = normalize_graph(G)
print(f"The graph has {len(normalized_graph.nodes())} nodes and {len(normalized_graph.edges())} edges")

The graph has 1899 nodes and 20295 edges


In [7]:
# check for Unix timestamps 
counter = 0  # Initialize a counter to limit the output
max_edges = 4  # Set the maximum number of edges you want to print

print("Inspecting edge attributes...")

for u, v, data in normalized_graph.edges(data=True):
    print(f"Edge from {u} to {v} - Attributes: {data}")
    
    # Check if 'timestamp' is present in the edge attributes
    timestamp = data.get('timestamp')
    if timestamp is not None:
        print(f"Edge ({u}, {v}) - Timestamp: {timestamp}")
    else:
        print("No timestamp found in this edge.")
        
    counter += 1  # Increment the counter
    if counter >= max_edges:  # Stop if the counter reaches the limit
        break

Inspecting edge attributes...
Edge from 0 to 1 - Attributes: {'timestamp': 1082155839}
Edge (0, 1) - Timestamp: 1082155839
Edge from 0 to 2 - Attributes: {'timestamp': 1082624696}
Edge (0, 2) - Timestamp: 1082624696
Edge from 0 to 3 - Attributes: {'timestamp': 1088378565}
Edge (0, 3) - Timestamp: 1088378565
Edge from 0 to 4 - Attributes: {'timestamp': 1082751801}
Edge (0, 4) - Timestamp: 1082751801


In [8]:
#save graph
destination_folder = '/Users/mariafedericanorelli/Desktop/humannetworkscience/graphs'
save_graph(normalized_graph, "gml", destination_folder, 'College_Message_NG')

Graph saved as /Users/mariafedericanorelli/Desktop/humannetworkscience/graphs/College_Message_NG.gml


In [9]:
#check node ids are integers 
gml_file = '/Users/mariafedericanorelli/Desktop/humannetworkscience/graphs/College_Message_NG.gml'
Graph = nx.read_gml(gml_file)
# Function to check if all node IDs are integers
def check_node_ids_are_integers(graph):
    for node in graph.nodes():
        # Check if node ID can be cast to an integer
        try:
            int(node)
        except ValueError:
            print(f"Node ID '{node}' is not an integer.")
            return False
    print("All node IDs are integers.")
    return True

# Run the check
check_node_ids_are_integers(Graph)

All node IDs are integers.


True

In [10]:
# double check for Unix timestamps 
counter = 0  # Initialize a counter to limit the output
max_edges = 4  # Set the maximum number of edges you want to print

print("Inspecting edge attributes...")

for u, v, data in Graph.edges(data=True):
    print(f"Edge from {u} to {v} - Attributes: {data}")
    
    # Check if 'timestamp' is present in the edge attributes
    timestamp = data.get('timestamp')
    if timestamp is not None:
        print(f"Edge ({u}, {v}) - Timestamp: {timestamp}")
    else:
        print("No timestamp found in this edge.")
        
    counter += 1  # Increment the counter
    if counter >= max_edges:  # Stop if the counter reaches the limit
        break

Inspecting edge attributes...
Edge from 0 to 1 - Attributes: {'timestamp': 1082155839}
Edge (0, 1) - Timestamp: 1082155839
Edge from 0 to 2 - Attributes: {'timestamp': 1082624696}
Edge (0, 2) - Timestamp: 1082624696
Edge from 0 to 3 - Attributes: {'timestamp': 1088378565}
Edge (0, 3) - Timestamp: 1088378565
Edge from 0 to 4 - Attributes: {'timestamp': 1082751801}
Edge (0, 4) - Timestamp: 1082751801


Running the Simulation

In Polygraphs, we run simulations on graphs using a configuration file with specific parameters. To adapt the software to temporal networks, our core idea is to use the timestamp data to create subgraphs that represent snapshots of the network at different time intervals (e.g., weeks or months) and to run simulations on these subgraphs.
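
The snapshot idea can be illustrated with a small standalone sketch. This is not the Polygraphs implementation: the helper `bucket_edges_by_month` is a hypothetical name, the grouping uses only the standard library, and months are computed in UTC, which is an assumption. The four edges are the ones printed when inspecting the normalized graph above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_edges_by_month(edges):
    """Group (source, target, unix_timestamp) edges by calendar month (UTC)."""
    buckets = defaultdict(list)
    for src, dst, ts in edges:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        buckets[moment.strftime("%Y-%m")].append((src, dst, ts))
    # Sort by month label so the snapshots come out in chronological order.
    return dict(sorted(buckets.items()))


# Edges taken from the edge-attribute inspection earlier in the notebook.
edges = [
    (0, 1, 1082155839),
    (0, 2, 1082624696),
    (0, 3, 1088378565),
    (0, 4, 1082751801),
]

snapshots = bucket_edges_by_month(edges)
for month, month_edges in snapshots.items():
    print(month, len(month_edges))
```

Each bucket holds the edges of one snapshot; a weekly interval would only change the key used for grouping.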

The logic is to switch subgraphs based on time steps: for each subgraph, the simulation runs for a specified number of steps before switching to the next subgraph. We want temporal continuity between simulations, so each interval's simulation should continue from where the previous one left off. This means that the beliefs (node states) at the end of one simulation are the starting beliefs for the next interval's simulation.
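
A toy sketch of this continuity requirement follows. Everything in it is hypothetical: `run_interval` is an illustrative name, the update rule (each node moves halfway toward the mean belief of its neighbours) stands in for whatever rule the simulator actually applies, and edges are treated as undirected for simplicity. The only point is that the beliefs returned for one interval are passed in as the initial beliefs of the next.

```python
def run_interval(beliefs, edges, steps):
    """Run a toy belief update on one snapshot and return the final beliefs."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    for _ in range(steps):
        updated = dict(beliefs)
        for node, nbrs in neighbours.items():
            mean = sum(beliefs[n] for n in nbrs) / len(nbrs)
            updated[node] = 0.5 * beliefs[node] + 0.5 * mean
        beliefs = updated
    return beliefs


# Two made-up snapshots over three nodes; labels mimic monthly intervals.
snapshots = {
    "2004-04": [(0, 1)],
    "2004-05": [(1, 2)],
}

beliefs = {0: 1.0, 1: 0.0, 2: 0.0}
for interval, edges in snapshots.items():
    # Carry the state forward: the output of one interval seeds the next.
    beliefs = run_interval(beliefs, edges, steps=1)
    print(interval, beliefs)
```

Node 2 never interacts with node 0, yet its belief changes in the second interval because node 1 carried over what it learned in the first. Re-initializing the beliefs for each snapshot would lose this effect.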

We need to add a new parameter 'interval' to our configuration file to specify the interval for subgraph creation (e.g., 'week' or 'month').




In [11]:
#check the interval creation
from polygraphs.graphs import create_subgraphs

create_subgraphs(Graph, 'month')

Converted NetworkX graph to DGL graph.
Total edges: 20295
Created subgraph for interval 2004-04: 1516 edges.
Graph ID for interval 2004-04: 6179508496
SubGraph ID for interval 2004-04: 6120272080
Edge (0, 1) - Timestamp: 1082155839
Edge (0, 1011) - Timestamp: 1082624696
Edge (0, 1233) - Timestamp: 1082751801
Edge (0, 1344) - Timestamp: 1082832584
Edge (0, 1677) - Timestamp: 1083040089
Created subgraph for interval 2004-05: 12208 edges.
Graph ID for interval 2004-05: 6179508496
SubGraph ID for interval 2004-05: 6280721552
Edge (0, 1566) - Timestamp: 1083900742
Edge (0, 224) - Timestamp: 1083655411
Edge (0, 446) - Timestamp: 1083808953
Edge (0, 557) - Timestamp: 1083809094
Edge (0, 668) - Timestamp: 1083901316
Created subgraph for interval 2004-06: 3259 edges.
Graph ID for interval 2004-06: 6179508496
SubGraph ID for interval 2004-06: 6280972560
Edge (0, 1122) - Timestamp: 1088378565
Edge (0, 1234) - Timestamp: 1086234548
Edge (0, 1245) - Timestamp: 1086233742
Edge (0, 1267) - Timestamp:

[(Period('2004-04', 'M'),
  Graph(num_nodes=1899, num_edges=1516,
        ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)}
        edata_schemes={'timestamp': Scheme(shape=(), dtype=torch.int64)})),
 (Period('2004-05', 'M'),
  Graph(num_nodes=1899, num_edges=12208,
        ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)}
        edata_schemes={'timestamp': Scheme(shape=(), dtype=torch.int64)})),
 (Period('2004-06', 'M'),
  Graph(num_nodes=1899, num_edges=3259,
        ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)}
        edata_schemes={'timestamp': Scheme(shape=(), dtype=torch.int64)})),
 (Period('2004-07', 'M'),
  Graph(num_nodes=1899, num_edges=1309,
        ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)}
        edata_schemes={'timestamp': Scheme(shape=(), dtype=torch.int64)})),
 (Period('2004-08', 'M'),
  Graph(num_nodes=1899, num_edges=887,
        ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)}
        edata_schemes={'timesta

In [12]:
#path to configuration file
conf_path = '/Users/mariafedericanorelli/Desktop/humannetworkscience/polygraphs/configs/College_Message.yaml'

# Add the path to polygraphs module
sys.path.append('/Users/mariafedericanorelli/Desktop/humannetworkscience/polygraphs')

# test_yaml_loading.py
from polygraphs.hyperparameters import PolyGraphHyperParameters

# check that the interval is the one we want
params = PolyGraphHyperParameters.fromYAML(conf_path)

print("Interval loaded from config:", params.interval)

Interval loaded from config: month


In [13]:
#simulation
!polygraphs -f "{conf_path}"

Timestamps added to the graph.
Total edges: 20295
Created subgraph for interval 2004-04: 1516 edges.
Graph ID for interval 2004-04: 6974342480
SubGraph ID for interval 2004-04: 7004064144
Edge (0, 1) - Timestamp: 1082155839
Edge (0, 2) - Timestamp: 1082624696
Edge (0, 4) - Timestamp: 1082751801
Edge (0, 5) - Timestamp: 1082832584
Edge (0, 8) - Timestamp: 1083040089
Created subgraph for interval 2004-05: 12208 edges.
Graph ID for interval 2004-05: 6974342480
SubGraph ID for interval 2004-05: 7077250000
Edge (0, 7) - Timestamp: 1083900742
Edge (0, 12) - Timestamp: 1083655411
Edge (0, 14) - Timestamp: 1083808953
Edge (0, 15) - Timestamp: 1083809094
Edge (0, 16) - Timestamp: 1083901316
Created subgraph for interval 2004-06: 3259 edges.
Graph ID for interval 2004-06: 6974342480
SubGraph ID for interval 2004-06: 7076689296
Edge (0, 3) - Timestamp: 1088378565
Edge (0, 40) - Timestamp: 1086234548
Edge (0, 41) - Timestamp: 1086233742
Edge (0, 43) - Timestamp: 1086291360
Edge (0, 46) - Timestamp

Let's run the same simulation on the static version of the graph, with the timestamps removed.

In [14]:
from fromData2Graphs import create_graph
G_static = create_graph(source, dest)

#normalize it
normalized_graph = normalize_graph(G_static)

#save graph
destination_folder = '/Users/mariafedericanorelli/Desktop/humannetworkscience/graphs'
save_graph(normalized_graph, "gml", destination_folder, 'College_Message_SG')

#simulate
conf_path_S = '/Users/mariafedericanorelli/Desktop/humannetworkscience/polygraphs/configs/College_Message_SG.yaml'
!polygraphs -f "{conf_path_S}"

The graph has 1899 nodes and 20295 edges
Graph saved as /Users/mariafedericanorelli/Desktop/humannetworkscience/graphs/College_Message_SG.gml
No timestamps found, running as a static graph without temporal information.
Beliefs at step 1: A/B = 895/1004
[MON] Interval None: step 0001 Ksteps/s   0.00 A/B 0.47/0.53
Beliefs at step 100: A/B = 388/1511
[MON] Interval None: step 0100 Ksteps/s   0.07 A/B 0.20/0.80
Beliefs at step 200: A/B = 229/1670
[MON] Interval None: step 0200 Ksteps/s   0.07 A/B 0.12/0.88
Beliefs at step 300: A/B = 139/1760
[MON] Interval None: step 0300 Ksteps/s   0.07 A/B 0.07/0.93
 INFO polygraphs> Sim #0001:    300 steps    4.57s; action: ? undefined: 0 converged: 0 polarized: 0 
Bye.
