## Example Notebook of the Social Thermodynamics Library

### Introduction
In this notebook, we will go through the main functionality of the Social Thermodynamics Library (STDL).
The library helps to analyse graphs that evolve over time by projecting one layer onto another and subsequently calculating the distances between entities of the projected layer.

### Usage

In [6]:
#!conda install --yes --file requirements.txt

In [7]:
from STDL import STDC

In [8]:
stdc = STDC()

Unless data is passed as an argument to the class, STDC generates data randomly.

In [9]:
stdc.raw_data

| | id1 | id2 | timestamp |
|---|---|---|---|
| 0 | L1_1 | L2_4 | 2020-01-02 05:15:19.318326 |
| 1 | L1_1 | L2_12 | 2020-01-03 15:04:28.753555 |
| 2 | L1_2 | L2_6 | 2020-01-03 21:37:44.036459 |
| 3 | L1_2 | L2_12 | 2020-01-05 14:46:08.367102 |
| 4 | L1_1 | L2_2 | 2020-01-07 15:56:30.055236 |
| ... | ... | ... | ... |
| 995 | L1_1 | L2_18 | 2024-12-27 13:44:06.306210 |
| 996 | L1_0 | L2_15 | 2024-12-27 16:47:29.445331 |
| 997 | L1_2 | L2_8 | 2024-12-30 04:01:00.821555 |
| 998 | L1_1 | L2_14 | 2024-12-30 04:19:53.729693 |


We are working with 2 layers, where the first layer has 3 unique entities and the second has 20. The timestamps of the interactions between the two layers range from 2020 through 2024.

In [10]:
stdc.calculate_timeframe()

| | id1 | id2 | timestamp | timeframe |
|---|---|---|---|---|
| 0 | L1_1 | L2_4 | 2020-01-02 05:15:19.318326 | 2020 |
| 1 | L1_1 | L2_12 | 2020-01-03 15:04:28.753555 | 2020 |
| 2 | L1_2 | L2_6 | 2020-01-03 21:37:44.036459 | 2020 |
| 3 | L1_2 | L2_12 | 2020-01-05 14:46:08.367102 | 2020 |
| 4 | L1_1 | L2_2 | 2020-01-07 15:56:30.055236 | 2020 |
| ... | ... | ... | ... | ... |
| 995 | L1_1 | L2_18 | 2024-12-27 13:44:06.306210 | 2024 |
| 996 | L1_0 | L2_15 | 2024-12-27 16:47:29.445331 | 2024 |
| 997 | L1_2 | L2_8 | 2024-12-30 04:01:00.821555 | 2024 |
| 998 | L1_1 | L2_14 | 2024-12-30 04:19:53.729693 | 2024 |


Using the function calculate_timeframe(), we allocate the timestamps to user-specified bins. By default, the bins are yearly. This function is usually not called explicitly; it runs inside the functions that follow. It is shown here for demonstration purposes only.
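The yearly binning described above can be sketched in plain pandas. This is an illustration only, not STDL's implementation; the toy rows and the variable name `raw` are made up for the example.

```python
import pandas as pd

# Toy interaction log in the same shape as stdc.raw_data (illustrative values).
raw = pd.DataFrame(
    {
        "id1": ["L1_1", "L1_2", "L1_0"],
        "id2": ["L2_4", "L2_6", "L2_15"],
        "timestamp": pd.to_datetime(
            ["2020-01-02 05:15:19", "2021-06-03 21:37:44", "2024-12-27 16:47:29"]
        ),
    }
)

# Yearly bins: every timestamp is mapped to its calendar year.
raw["timeframe"] = raw["timestamp"].dt.year

print(raw)
```

Other bin widths (for example months or quarters) could be obtained the same way with `dt.to_period(...)`, although how STDL exposes this choice is not shown in this notebook.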

By default, the filter_always_present argument of this function is set to True. This retains only the entities of layer one that have been active in every timeframe.

Pay particular attention to the filter_always_present argument when initialising STDC with time_type = 'intrinsic'.
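To make the effect of filter_always_present concrete, here is a minimal pandas sketch of the idea: keep only the entities that appear in every timeframe. The data and variable names are hypothetical, and the library may implement the filter differently.

```python
import pandas as pd

# Hypothetical binned interactions: A is active in all three timeframes,
# B misses 2021 and C is only active in 2021.
events = pd.DataFrame(
    {
        "id1": ["A", "A", "A", "B", "B", "C"],
        "timeframe": [2020, 2021, 2022, 2020, 2022, 2021],
    }
)

# Count the distinct timeframes in which each entity is active.
n_timeframes = events["timeframe"].nunique()
frames_per_entity = events.groupby("id1")["timeframe"].nunique()

# Keep only the entities that are present in every timeframe.
always_present = frames_per_entity[frames_per_entity == n_timeframes].index
filtered = events[events["id1"].isin(always_present)]

print(filtered)
```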

In [11]:
stdc.calculate_biadjacency_matrix()

| id1 | timeframe | L2_0 | L2_1 | L2_10 | L2_11 | L2_12 | L2_13 | L2_14 | L2_15 | L2_16 | L2_17 | L2_18 | L2_19 | L2_2 | L2_3 | L2_4 | L2_5 | L2_6 | L2_7 | L2_8 | L2_9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L1_0 | 2020 | 5.0 | 6.0 | 3.0 | 1.0 | 2.0 | 5.0 | 5.0 | 2.0 | 6.0 | 3.0 | 2.0 | 5.0 | 4.0 | 1.0 | 0.0 | 4.0 | 6.0 | 6.0 | 4.0 | 3.0 |
| L1_0 | 2021 | 2.0 | 3.0 | 3.0 | 2.0 | 1.0 | 1.0 | 2.0 | 4.0 | 4.0 | 0.0 | 9.0 | 3.0 | 4.0 | 1.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 5.0 |
| L1_0 | 2022 | 1.0 | 3.0 | 4.0 | 8.0 | 4.0 | 0.0 | 1.0 | 4.0 | 7.0 | 4.0 | 3.0 | 3.0 | 4.0 | 9.0 | 2.0 | 1.0 | 6.0 | 0.0 | 0.0 | 2.0 |
| L1_0 | 2023 | 4.0 | 1.0 | 6.0 | 4.0 | 6.0 | 2.0 | 4.0 | 4.0 | 1.0 | 3.0 | 2.0 | 6.0 | 4.0 | 4.0 | 2.0 | 2.0 | 4.0 | 7.0 | 1.0 | 4.0 |
| L1_0 | 2024 | 3.0 | 5.0 | 2.0 | 4.0 | 1.0 | 2.0 | 4.0 | 4.0 | 5.0 | 3.0 | 1.0 | 3.0 | 2.0 | 5.0 | 3.0 | 5.0 | 2.0 | 3.0 | 1.0 | 3.0 |
| L1_1 | 2020 | 5.0 | 7.0 | 1.0 | 6.0 | 3.0 | 3.0 | 2.0 | 3.0 | 3.0 | 5.0 | 2.0 | 3.0 | 6.0 | 2.0 | 1.0 | 5.0 | 1.0 | 4.0 | 3.0 | 2.0 |
| L1_1 | 2021 | 2.0 | 3.0 | 5.0 | 1.0 | 4.0 | 0.0 | 4.0 | 5.0 | 5.0 | 3.0 | 3.0 | 4.0 | 3.0 | 4.0 | 4.0 | 2.0 | 1.0 | 2.0 | 2.0 | 3.0 |
| L1_1 | 2022 | 4.0 | 4.0 | 3.0 | 4.0 | 4.0 | 1.0 | 5.0 | 2.0 | 1.0 | 2.0 | 5.0 | 2.0 | 6.0 | 1.0 | 3.0 | 4.0 | 5.0 | 3.0 | 7.0 | 3.0 |
| L1_1 | 2023 | 4.0 | 0.0 | 4.0 | 2.0 | 6.0 | 5.0 | 2.0 | 5.0 | 3.0 | 6.0 | 6.0 | 2.0 | 3.0 | 4.0 | 2.0 | 0.0 | 6.0 | 3.0 | 4.0 | 3.0 |
| L1_1 | 2024 | 3.0 | 1.0 | 3.0 | 2.0 | 3.0 | 1.0 | 2.0 | 4.0 | 5.0 | 5.0 | 2.0 | 3.0 | 1.0 | 5.0 | 5.0 | 7.0 | 9.0 | 4.0 | 3.0 | 3.0 |


The function calculate_biadjacency_matrix() outputs a (n x timeframes) x m matrix. Each row corresponds to an entity of one layer in a given timeframe, and each column to an entity of the other layer. By default, each cell holds the number of interactions between the two in that timeframe.
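A count-based biadjacency matrix of this shape can be built with a pandas group-by. The following is a sketch under the assumption that the input has the columns id1, id2 and timeframe shown earlier; the toy rows are illustrative and this is not necessarily how STDL computes it.

```python
import pandas as pd

# Toy binned interactions (illustrative values only).
events = pd.DataFrame(
    {
        "id1": ["L1_0", "L1_0", "L1_0", "L1_1", "L1_1"],
        "id2": ["L2_0", "L2_0", "L2_1", "L2_1", "L2_0"],
        "timeframe": [2020, 2020, 2020, 2020, 2021],
    }
)

# Count interactions per (id1, timeframe, id2), then pivot id2 into columns.
# Pairs that never interacted in a timeframe get a count of 0.
biadjacency = (
    events.groupby(["id1", "timeframe", "id2"])
    .size()
    .unstack("id2", fill_value=0)
)

print(biadjacency)
```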

In [12]:
stdc.calculate_positions()

| id1 | timeframe | L1_0 | L1_1 | L1_2 |
|---|---|---|---|---|
| L1_0 | 2020 | 0.0 | 0.159273 | 0.266871 |
| L1_1 | 2020 | 0.159273 | 0.0 | 0.352768 |
| L1_2 | 2020 | 0.266871 | 0.352768 | 0.0 |
| L1_0 | 2021 | 0.0 | 0.192433 | 0.254844 |
| L1_1 | 2021 | 0.192433 | 0.0 | 0.116217 |
| L1_2 | 2021 | 0.254844 | 0.116217 | 0.0 |
| L1_0 | 2022 | 0.0 | 0.358947 | 0.309701 |
| L1_1 | 2022 | 0.358947 | 0.0 | 0.209311 |
| L1_2 | 2022 | 0.309701 | 0.209311 | 0.0 |
| L1_0 | 2023 | 0.0 | 0.165319 | 0.241803 |


The function calculate_positions() outputs a (n x timeframes) x n matrix that, by default, holds the cosine distances between the entities of the projected layer. By default, the comparison is 'relative', meaning that distances are computed between entities within the same timeframe.
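As a rough illustration of the 'relative' cosine distance, the sketch below computes pairwise cosine distances between the rows of one timeframe's count matrix with NumPy. The helper name `cosine_distance_matrix` and the count values are hypothetical, and the sketch assumes no row is all zeros.

```python
import numpy as np


def cosine_distance_matrix(rows: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between the rows."""
    # Assumes every row has a non-zero norm; an all-zero row would divide by zero.
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / norms
    return 1.0 - unit @ unit.T


# Hypothetical interaction counts of three layer-one entities in one timeframe.
counts_one_timeframe = np.array(
    [
        [5.0, 6.0, 3.0, 1.0],
        [5.0, 7.0, 1.0, 6.0],
        [0.0, 2.0, 4.0, 4.0],
    ]
)

distances = cosine_distance_matrix(counts_one_timeframe)
print(distances.round(6))
```

The result is symmetric with zeros on the diagonal, matching the structure of the output shown above.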

In [13]:
stdc_r = STDC(dimensions=2)
stdc_r.calculate_reduced_positions()

| id1 | timeframe | 0 | 1 |
|---|---|---|---|
| L1_0 | 2020 | -0.254798 | 0.068845 |
| L1_1 | 2020 | 0.107004 | -0.19621 |
| L1_2 | 2020 | 0.177052 | 0.142098 |
| L1_0 | 2021 | -0.232295 | -0.084397 |
| L1_1 | 2021 | 0.021433 | -0.114685 |
| L1_2 | 2021 | 0.225955 | 0.069064 |
| L1_0 | 2022 | -0.188824 | 0.079558 |
| L1_1 | 2022 | 0.080983 | -0.16595 |
| L1_2 | 2022 | 0.11269 | 0.13247 |
| L1_0 | 2023 | -0.206678 | -0.02302 |


The function calculate_reduced_positions() works in the same fashion as calculate_positions() but additionally applies dimensionality reduction, by default PCA. To calculate reduced positions, the desired number of dimensions must be specified when initialising STDC.
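For readers unfamiliar with PCA, the following is a minimal NumPy sketch of the technique applied to one timeframe's distance matrix. The helper `pca_reduce` is hypothetical and STDL's own reduction may differ in centring, scaling or sign conventions, so the numbers will not match the output above (which also comes from a separately initialised, randomly generated instance).

```python
import numpy as np


def pca_reduce(points: np.ndarray, dimensions: int) -> np.ndarray:
    """Project the rows of `points` onto their first principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dimensions].T


# 2020 distance matrix taken from the calculate_positions() output above.
positions_2020 = np.array(
    [
        [0.0, 0.159273, 0.266871],
        [0.159273, 0.0, 0.352768],
        [0.266871, 0.352768, 0.0],
    ]
)

reduced = pca_reduce(positions_2020, dimensions=2)
print(reduced.round(6))
```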

In [14]:
stdc_r.calculate_aligned_reduced_positions()

| id1 | t1 | t2 | 0 | 1 |
|---|---|---|---|---|
| L1_0 | 2020 | 2021 | -0.243546 | -0.007776 |
| L1_0 | 2021 | 2022 | -0.210559 | -0.00242 |
| L1_0 | 2022 | 2023 | -0.197751 | 0.028269 |
| L1_0 | 2023 | 2024 | -0.176465 | 0.058588 |
| L1_1 | 2020 | 2021 | 0.064218 | -0.155448 |
| L1_1 | 2021 | 2022 | 0.051208 | -0.140318 |
| L1_1 | 2022 | 2023 | 0.04914 | -0.159134 |
| L1_1 | 2023 | 2024 | 0.048232 | -0.167568 |
| L1_2 | 2020 | 2021 | 0.201503 | 0.105581 |
| L1_2 | 2021 | 2022 | 0.169322 | 0.100767 |


calculate_aligned_reduced_positions() averages each entity's reduced positions over two consecutive timeframes (t1 and t2).
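The first row of the output above (L1_0, 2020 to 2021) is consistent with a simple element-wise mean of the two reduced positions of L1_0 printed by calculate_reduced_positions(). The check below uses those printed values; the variable names are only for illustration.

```python
import numpy as np

# Reduced positions of L1_0 as printed by calculate_reduced_positions().
pos_2020 = np.array([-0.254798, 0.068845])
pos_2021 = np.array([-0.232295, -0.084397])

# Element-wise mean over the two consecutive timeframes.
aligned_2020_2021 = (pos_2020 + pos_2021) / 2

print(aligned_2020_2021)
```

Up to rounding of the displayed values, this agrees with the row `L1_0, 2020, 2021` shown above.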

In [15]:
stdc.calculate_graphs()

{'2020': <Graph object, undirected, with 3 vertices and 3 edges, 1 internal edge property, at 0x195b3ae40>,
 '2021': <Graph object, undirected, with 3 vertices and 3 edges, 1 internal edge property, at 0x19598ec10>,
 '2022': <Graph object, undirected, with 3 vertices and 3 edges, 1 internal edge property, at 0x19598f110>,
 '2023': <Graph object, undirected, with 3 vertices and 3 edges, 1 internal edge property, at 0x195b54510>,
 '2024': <Graph object, undirected, with 3 vertices and 3 edges, 1 internal edge property, at 0x195b54c30>}

STDC uses the graph-tool library to build one graph per timeframe.

In [16]:
stdc.calculate_communities()

{'2020': <VertexPropertyMap object with value type 'int32_t', for Graph 0x195b3ae40, at 0x193f63980>,
 '2021': <VertexPropertyMap object with value type 'int32_t', for Graph 0x19598ec10, at 0x193f63d40>,
 '2022': <VertexPropertyMap object with value type 'int32_t', for Graph 0x19598f110, at 0x194beadd0>,
 '2023': <VertexPropertyMap object with value type 'int32_t', for Graph 0x195b54510, at 0x194beaa50>,
 '2024': <VertexPropertyMap object with value type 'int32_t', for Graph 0x195b54c30, at 0x195af8870>}

STDC can also detect communities.
The user can choose between:
- the Stochastic Block Modelling (SBM) algorithm, using graph-tool
- the Leiden algorithm, using igraph

In [17]:
stdc.calculate_modularities()

| | timeframe | modularity |
|---|---|---|
| 0 | 2020 | 0.0 |
| 1 | 2021 | 0.0 |
| 2 | 2022 | 0.0 |
| 3 | 2023 | 0.0 |
| 4 | 2024 | 0.0 |


The modularity is again calculated with the graph-tool package.
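As background, the standard Newman modularity of a weighted, undirected graph can be written in a few lines of NumPy. This is a generic sketch, not graph-tool's implementation, and the helper name `modularity` is hypothetical. It also shows one situation in which the value is exactly zero, namely when every vertex is assigned to the same community; whether that explains the zeros above cannot be determined from the output alone.

```python
import numpy as np


def modularity(adjacency: np.ndarray, communities: np.ndarray) -> float:
    """Newman modularity Q for a symmetric (optionally weighted) adjacency matrix."""
    two_m = adjacency.sum()              # total edge weight, each edge counted twice
    degrees = adjacency.sum(axis=1)      # weighted degree of every vertex
    same = communities[:, None] == communities[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((adjacency - expected) * same).sum() / two_m)


# Two disconnected edges, each forming its own community.
two_pairs = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
)
q_split = modularity(two_pairs, np.array([0, 0, 1, 1]))

# The same graph with all vertices placed in a single community.
q_single = modularity(two_pairs, np.array([0, 0, 0, 0]))

print(q_split, q_single)
```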

In [18]:
stdc.calculate_aligned_modularities()

| | t1 | t2 | modularity |
|---|---|---|---|
| 0 | 2020 | 2021 | 0.0 |
| 1 | 2021 | 2022 | 0.0 |
| 2 | 2022 | 2023 | 0.0 |
| 3 | 2023 | 2024 | 0.0 |


calculate_aligned_modularities() returns the average of the modularities of two consecutive timeframes.

In [19]:
stdc.calculate_velocities()

| id1 | t1 | t2 | L1_0 | L1_1 | L1_2 |
|---|---|---|---|---|---|
| L1_0 | 2020 | 2021 | 0.0 | 0.033161 | -0.012028 |
| L1_0 | 2021 | 2022 | 0.0 | 0.166514 | 0.054857 |
| L1_0 | 2022 | 2023 | 0.0 | -0.193627 | -0.067898 |
| L1_0 | 2023 | 2024 | 0.0 | -0.004616 | -0.083844 |
| L1_1 | 2020 | 2021 | 0.033161 | 0.0 | -0.23655 |
| L1_1 | 2021 | 2022 | 0.166514 | 0.0 | 0.093094 |
| L1_1 | 2022 | 2023 | -0.193627 | 0.0 | -0.027479 |
| L1_1 | 2023 | 2024 | -0.004616 | 0.0 | 0.002857 |
| L1_2 | 2020 | 2021 | -0.012028 | -0.23655 | 0.0 |
| L1_2 | 2021 | 2022 | 0.054857 | 0.093094 | 0.0 |


Using the function calculate_velocities(), STDC calculates the velocity between two consecutive timeframes, i.e. the change in the distance between each pair of entities from t1 to t2.
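The 2020 to 2021 rows above agree, up to display rounding, with the difference between the 2021 and 2020 distance matrices printed by calculate_positions(). The check below reproduces them from those printed values; it is a consistency check, not the library's code.

```python
import numpy as np

# Pairwise distances as printed by calculate_positions() (rows/columns: L1_0, L1_1, L1_2).
dist_2020 = np.array(
    [
        [0.0, 0.159273, 0.266871],
        [0.159273, 0.0, 0.352768],
        [0.266871, 0.352768, 0.0],
    ]
)
dist_2021 = np.array(
    [
        [0.0, 0.192433, 0.254844],
        [0.192433, 0.0, 0.116217],
        [0.254844, 0.116217, 0.0],
    ]
)

# Velocity over one timeframe step: change in distance from t1 to t2.
velocity_2020_2021 = dist_2021 - dist_2020

print(velocity_2020_2021.round(6))
```

A positive entry means the two entities moved apart between the timeframes, a negative entry means they moved closer together.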

### Statistics

In [20]:
stdc.calculate_basic_ts_stats()

(id1            L1_0                      L1_1                      L1_2  \
                mean       var count      mean       var count      mean   
 t1   t2                                                                   
 2020 2021  0.145570  0.017699     3  0.136782  0.014892     3  0.165117   
 2021 2022  0.185987  0.025954     3  0.146151  0.019208     3  0.148345   
 2022 2023  0.179295  0.024156     3  0.152568  0.018565     3  0.157108   
 2023 2024  0.120964  0.011314     3  0.115424  0.010095     3  0.127714   
 
 id1                        
                 var count  
 t1   t2                    
 2020 2021  0.020621     3  
 2021 2022  0.020075     3  
 2022 2023  0.020119     3  
 2023 2024  0.012302     3  ,
 id1            L1_0                      L1_1                      L1_2  \
                mean       var count      mean       var count      mean   
 t1   t2                                                                   
 2020 2021  0.007044  0.000548    

calculate_basic_ts_stats() outputs dataframes with the mean, variance and count per entity for each pair of consecutive timeframes.
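The first row of the first dataframe above (2020 to 2021) can be reproduced from the calculate_positions() output, assuming the statistics are taken over the pairwise distances averaged across the two timeframes. That assumption is inferred from the matching numbers rather than from the library's documentation.

```python
import pandas as pd

entities = ["L1_0", "L1_1", "L1_2"]

# Pairwise distances as printed by calculate_positions().
dist_2020 = pd.DataFrame(
    [[0.0, 0.159273, 0.266871], [0.159273, 0.0, 0.352768], [0.266871, 0.352768, 0.0]],
    index=entities,
    columns=entities,
)
dist_2021 = pd.DataFrame(
    [[0.0, 0.192433, 0.254844], [0.192433, 0.0, 0.116217], [0.254844, 0.116217, 0.0]],
    index=entities,
    columns=entities,
)

# Average the two consecutive timeframes, then summarise each entity's column.
aligned = (dist_2020 + dist_2021) / 2
stats = aligned.agg(["mean", "var", "count"])  # "var" is the sample variance (ddof=1)

print(stats.round(6))
```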

In [21]:
#stdc.calculate_thermodyn_ts_stats()

### Visualisations

In [22]:
#stdc.plot_center_of_mass_trajectory()

In [24]:
#stdc.plot_reduced_positions_animation()

The function plot_reduced_positions_animation() animates the reduced positions across all timeframes. To help with interpretation, a dictionary of labels can be passed as an argument to track specific entities through the animation.

### Final Words
The STDL library is an efficient tool that simplifies temporal graph analysis. It serves as a foundation for further analyses in the area of network science.

### References
- Peixoto, T. P. (2023). Descriptive vs. Inferential Community Detection in Networks: Pitfalls, Myths and Half-Truths. Elements in the Structure and Dynamics of Complex Networks.
- Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.