## Example Notebook of the Social Thermodynamics Library

### Introduction
In this notebook, we will go through the main functionality of the Social Thermodynamics Library (STDL).
The library analyses graphs that evolve over time: it projects one layer onto another and then calculates the distances between entities of the projected layer.

### Usage

In [5]:
#!conda install --yes --file requirements.txt

In [1]:
from STDL import STDC

In [2]:
stdc = STDC()

Unless data is passed as an argument to the class, STDC generates data randomly.

In [3]:
stdc.raw_data

Unnamed: 0,id1,id2,timestamp
0,L1_1,L2_4,2020-01-01 23:03:57.094705
1,L1_2,L2_6,2020-01-07 01:11:33.244152
2,L1_2,L2_11,2020-01-08 12:50:44.423282
3,L1_1,L2_4,2020-01-10 02:40:00.663417
4,L1_1,L2_8,2020-01-10 18:50:06.493508
...,...,...,...
995,L1_2,L2_7,2024-12-19 21:43:28.636094
996,L1_0,L2_5,2024-12-28 12:10:53.474220
997,L1_2,L2_17,2024-12-29 07:19:30.327448
998,L1_2,L2_13,2024-12-30 14:03:48.482698


We are working with 2 layers: the first layer has 3 unique entities and the second has 20. The timestamps of the interactions between the two layers run from the start of 2020 to the end of 2024.
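How STDC generates its random data is not shown in this notebook. The following is a minimal sketch of how a comparable two-layer interaction table could be produced with NumPy and pandas. The helper name `make_random_interactions`, its parameters, and the uniform sampling are assumptions made for illustration, not STDL's implementation.

```python
import numpy as np
import pandas as pd


def make_random_interactions(n_rows=1000, n_layer1=3, n_layer2=20,
                             start="2020-01-01", end="2025-01-01", seed=0):
    """Hypothetical helper: draw random interactions between two layers."""
    rng = np.random.default_rng(seed)
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    # Uniform random offsets (in seconds) inside [start, end).
    span_seconds = (end_ts - start_ts).total_seconds()
    offsets = rng.uniform(0, span_seconds, size=n_rows)
    df = pd.DataFrame({
        "id1": [f"L1_{i}" for i in rng.integers(0, n_layer1, size=n_rows)],
        "id2": [f"L2_{j}" for j in rng.integers(0, n_layer2, size=n_rows)],
        "timestamp": start_ts + pd.to_timedelta(offsets, unit="s"),
    })
    # Sort chronologically, like the table above.
    return df.sort_values("timestamp").reset_index(drop=True)


raw_data = make_random_interactions()
```

Fixing the seed makes the toy data reproducible, which is convenient when comparing the outputs of later steps.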

In [4]:
stdc.calculate_timeframe()

Unnamed: 0,id1,id2,timestamp,timeframe
0,L1_1,L2_4,2020-01-01 23:03:57.094705,2020
1,L1_2,L2_6,2020-01-07 01:11:33.244152,2020
2,L1_2,L2_11,2020-01-08 12:50:44.423282,2020
3,L1_1,L2_4,2020-01-10 02:40:00.663417,2020
4,L1_1,L2_8,2020-01-10 18:50:06.493508,2020
...,...,...,...,...
995,L1_2,L2_7,2024-12-19 21:43:28.636094,2024
996,L1_0,L2_5,2024-12-28 12:10:53.474220,2024
997,L1_2,L2_17,2024-12-29 07:19:30.327448,2024
998,L1_2,L2_13,2024-12-30 14:03:48.482698,2024


The function calculate_timeframe() assigns the timestamps to user-specified bins. By default, the bins are yearly. This function is usually not called explicitly, since the functions that follow call it internally. It is shown here for demonstration purposes only.

By default, the filter_always_present argument of this function is set to True. This retains only the entities of layer one that have been active in every timeframe.

Take the filter_always_present argument into account when initialising STDC with time_type = 'intrinsic'.
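The two steps described above, yearly binning and keeping only entities that are present in every timeframe, can be sketched in plain pandas. This is an illustration on made-up data, not STDL's internal code. The variable names are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({
    "id1": ["L1_0", "L1_0", "L1_1", "L1_1", "L1_2"],
    "id2": ["L2_3", "L2_5", "L2_3", "L2_4", "L2_1"],
    "timestamp": pd.to_datetime([
        "2020-03-01", "2021-06-15", "2020-07-04", "2021-01-20", "2020-11-30",
    ]),
})

# Yearly bins: every timestamp is mapped to its calendar year.
binned = raw.assign(timeframe=raw["timestamp"].dt.year)

# Keep only layer-one entities that appear in every timeframe.
n_timeframes = binned["timeframe"].nunique()
frames_per_entity = binned.groupby("id1")["timeframe"].nunique()
always_present = frames_per_entity[frames_per_entity == n_timeframes].index
filtered = binned[binned["id1"].isin(always_present)]
```

Here L1_2 is only active in 2020, so it is dropped, while L1_0 and L1_1 are kept.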

In [5]:
stdc.calculate_biadjacency_matrix()

id1,timeframe,L2_0,L2_1,L2_10,L2_11,L2_12,L2_13,L2_14,L2_15,L2_16,L2_17,L2_18,L2_19,L2_2,L2_3,L2_4,L2_5,L2_6,L2_7,L2_8,L2_9
L1_0,2020,2.0,5.0,4.0,3.0,2.0,4.0,3.0,4.0,3.0,6.0,2.0,4.0,2.0,5.0,4.0,1.0,5.0,6.0,2.0,5.0
L1_0,2021,1.0,1.0,2.0,3.0,2.0,3.0,0.0,2.0,3.0,6.0,4.0,4.0,6.0,5.0,3.0,0.0,4.0,4.0,1.0,8.0
L1_0,2022,2.0,3.0,3.0,7.0,3.0,3.0,2.0,1.0,1.0,2.0,6.0,2.0,2.0,3.0,4.0,4.0,2.0,1.0,5.0,4.0
L1_0,2023,2.0,3.0,4.0,7.0,0.0,3.0,1.0,6.0,3.0,1.0,0.0,6.0,3.0,2.0,2.0,1.0,4.0,1.0,5.0,6.0
L1_0,2024,2.0,8.0,6.0,0.0,5.0,2.0,2.0,4.0,3.0,3.0,3.0,3.0,3.0,4.0,4.0,3.0,2.0,3.0,3.0,3.0
L1_1,2020,1.0,3.0,5.0,2.0,2.0,4.0,2.0,1.0,6.0,6.0,3.0,3.0,1.0,5.0,5.0,3.0,4.0,4.0,9.0,4.0
L1_1,2021,3.0,2.0,3.0,6.0,2.0,4.0,5.0,4.0,4.0,2.0,3.0,3.0,1.0,6.0,4.0,0.0,5.0,3.0,4.0,5.0
L1_1,2022,1.0,2.0,2.0,5.0,7.0,1.0,3.0,4.0,4.0,5.0,3.0,5.0,3.0,4.0,4.0,3.0,4.0,2.0,4.0,3.0
L1_1,2023,7.0,6.0,3.0,2.0,0.0,3.0,5.0,3.0,5.0,5.0,1.0,2.0,2.0,1.0,5.0,4.0,3.0,6.0,2.0,0.0
L1_1,2024,5.0,3.0,4.0,6.0,4.0,1.0,6.0,1.0,1.0,4.0,4.0,5.0,3.0,2.0,2.0,1.0,2.0,0.0,4.0,4.0


The function calculate_biadjacency_matrix() outputs a matrix with one row per combination of layer-one entity and timeframe, and one column per layer-two entity. By default, each cell holds the number of interactions between the two entities in that timeframe.
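A count matrix of this shape can be reproduced with `pandas.crosstab`. The sketch below uses a small made-up table and is not necessarily how STDL builds the matrix.

```python
import pandas as pd

binned = pd.DataFrame({
    "id1": ["L1_0", "L1_0", "L1_0", "L1_1", "L1_1"],
    "id2": ["L2_0", "L2_0", "L2_1", "L2_1", "L2_0"],
    "timeframe": [2020, 2020, 2020, 2020, 2021],
})

# Rows: (layer-one entity, timeframe); columns: layer-two entities;
# cells: number of interactions.
biadjacency = pd.crosstab(
    index=[binned["id1"], binned["timeframe"]],
    columns=binned["id2"],
)
```

Note that `crosstab` only creates rows for combinations that occur in the data, so an entity that is inactive in a timeframe has no row for it.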

In [6]:
stdc.calculate_positions()

id1,timeframe,L1_0,L1_1,L1_2
L1_0,2020,0.0,0.136481,0.292848
L1_1,2020,0.136481,0.0,0.280484
L1_2,2020,0.292848,0.280484,0.0
L1_0,2021,0.0,0.197758,0.291148
L1_1,2021,0.197758,0.0,0.13755
L1_2,2021,0.291148,0.13755,0.0
L1_0,2022,0.0,0.15916,0.195775
L1_1,2022,0.15916,0.0,0.19829
L1_2,2022,0.195775,0.19829,0.0
L1_0,2023,0.0,0.381783,0.373517


The function calculate_positions() outputs a (n x timeframes) x n matrix, where n is the number of entities in the projected layer. By default, it contains the cosine distances between these entities. The default comparison is 'relative', which means that distances are calculated between entities within the same timeframe.
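As a worked check, the cosine distance can be computed by hand with NumPy. The two vectors below are the 2020 rows of L1_0 and L1_1 from the biadjacency output above. The helper `cosine_distance_matrix` is a hypothetical name used only for this sketch.

```python
import numpy as np


def cosine_distance_matrix(rows):
    """Pairwise cosine distance (1 - cosine similarity) between row vectors."""
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / norms  # assumes no row is all zeros
    return 1.0 - unit @ unit.T


# Interaction counts in 2020, copied from the biadjacency matrix above.
l1_0_2020 = [2, 5, 4, 3, 2, 4, 3, 4, 3, 6, 2, 4, 2, 5, 4, 1, 5, 6, 2, 5]
l1_1_2020 = [1, 3, 5, 2, 2, 4, 2, 1, 6, 6, 3, 3, 1, 5, 5, 3, 4, 4, 9, 4]

distances = cosine_distance_matrix([l1_0_2020, l1_1_2020])
print(round(distances[0, 1], 6))  # 0.136481
```

The result agrees with the L1_0/L1_1 entry for 2020 in the positions output.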

In [7]:
stdc_r = STDC(dimensions=2)
stdc_r.calculate_reduced_positions()

id1,timeframe,0,1
L1_0,2020,0.141561,0.089104
L1_1,2020,-0.196469,0.017768
L1_2,2020,0.084484,-0.116628
L1_0,2021,0.037908,0.173652
L1_1,2021,-0.123062,0.014191
L1_2,2021,-0.020709,-0.166895
L1_0,2022,0.135694,0.094662
L1_1,2022,-0.239588,0.138476
L1_2,2022,0.191076,-0.127643
L1_0,2023,0.132557,0.081242


The function calculate_reduced_positions() works in the same way as calculate_positions() but additionally applies dimensionality reduction, by default PCA. To calculate the reduced positions, the desired number of dimensions has to be specified when initialising STDC.
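For readers unfamiliar with PCA, the following is a minimal NumPy sketch of the technique itself (centring followed by a singular value decomposition). Which matrix STDC feeds into PCA and which PCA implementation it uses are not stated in this notebook, so the input data and the helper name `pca_reduce` are assumptions.

```python
import numpy as np


def pca_reduce(matrix, dimensions=2):
    """Project rows onto the top principal components (PCA via SVD)."""
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0)  # PCA works on centred columns
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dimensions].T


# Toy input: 4 entities described by 3 features (made-up numbers).
features = [
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [3.0, 1.0, 2.0],
]
reduced = pca_reduce(features, dimensions=2)
```

Each row of `reduced` is the two-dimensional position of one entity. The first column carries at least as much variance as the second.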

In [8]:
stdc_r.calculate_aligned_reduced_positions()

id1,t1,t2,0,1
L1_0,2020,2021,0.089734,0.131378
L1_0,2021,2022,0.086801,0.134157
L1_0,2022,2023,0.134125,0.087952
L1_0,2023,2024,0.148027,0.100611
L1_1,2020,2021,-0.159766,0.01598
L1_1,2021,2022,-0.181325,0.076333
L1_1,2022,2023,-0.204776,0.059793
L1_1,2023,2024,-0.186325,-0.036976
L1_2,2020,2021,0.031888,-0.141762
L1_2,2021,2022,0.085184,-0.147269


calculate_aligned_reduced_positions() calculates, for each entity, the average of its reduced positions in two consecutive timeframes.
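The values in the output above are consistent with averaging each entity's reduced positions over two consecutive timeframes. The sketch below reproduces the first three rows for L1_0 from the reduced positions shown earlier. It illustrates the arithmetic only and is not STDL's code.

```python
import numpy as np

timeframes = [2020, 2021, 2022, 2023]
# Reduced positions of L1_0 per timeframe, copied from the output of
# calculate_reduced_positions().
positions = np.array([
    [0.141561, 0.089104],
    [0.037908, 0.173652],
    [0.135694, 0.094662],
    [0.132557, 0.081242],
])

# Mean of every timeframe with its successor.
aligned = (positions[:-1] + positions[1:]) / 2
pairs = list(zip(timeframes[:-1], timeframes[1:]))
```

For the pair (2020, 2021) this gives approximately (0.089734, 0.131378), matching the first row of the aligned output.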

In [None]:
stdc.calculate_graphs()

STDC uses the graph-tool library to build the graphs.

In [None]:
stdc.calculate_communities()

STDC can also detect communities.
The user can choose between two algorithms:
- Stochastic Block Modelling (SBM) algorithm using graph-tool
- Leiden algorithm using iGraph

In [11]:
stdc.calculate_modularities()

Unnamed: 0,timeframe,modularity
0,2020,0.0
1,2021,0.0
2,2022,0.0
3,2023,-3.352917e-16
4,2024,0.0


The modularity is again calculated with the graph-tool package.
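To make the quantity concrete, here is a small NumPy sketch of the standard Newman modularity for an undirected graph. It does not reproduce graph-tool's implementation, and the function name `modularity` and the toy graph are assumptions for illustration.

```python
import numpy as np


def modularity(adjacency, communities):
    """Newman modularity of an undirected (optionally weighted) graph."""
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(communities)
    degrees = a.sum(axis=1)
    two_m = degrees.sum()  # twice the total edge weight
    # True where two nodes share a community label.
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return ((a - expected) * same).sum() / two_m


# Two triangles joined by a single edge between nodes 2 and 3.
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

q_split = modularity(adjacency, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adjacency, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles gives 5/14, while putting every node into a single community gives 0 by construction. Values such as -3.352917e-16 in the table above are zero up to floating-point rounding.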

In [12]:
stdc.calculate_aligned_modularities()

Unnamed: 0,t1,t2,modularity
0,2020,2021,0.0
1,2021,2022,0.0
2,2022,2023,-1.676459e-16
3,2023,2024,-1.676459e-16


calculate_aligned_modularities() returns the average of the modularities of two consecutive timeframes.

In [13]:
stdc.calculate_velocities()

id1,t1,t2,L1_0,L1_1,L1_2
L1_0,2020,2021,0.0,0.061277,-0.0017
L1_0,2021,2022,0.0,-0.038598,-0.095373
L1_0,2022,2023,0.0,0.222623,0.177742
L1_0,2023,2024,0.0,-0.12552,-0.158749
L1_1,2020,2021,0.061277,0.0,-0.142933
L1_1,2021,2022,-0.038598,0.0,0.06074
L1_1,2022,2023,0.222623,0.0,0.085702
L1_1,2023,2024,-0.12552,0.0,-0.039996
L1_2,2020,2021,-0.0017,-0.142933,0.0
L1_2,2021,2022,-0.095373,0.06074,0.0


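The function calculate_velocities() calculates how much each pairwise distance changes between two consecutive timeframes. The values are signed: a positive velocity means that two entities moved apart, a negative one that they moved closer together.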
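The values in the output above equal the distance at t2 minus the distance at t1. The sketch below checks this for the pair L1_0/L1_1, using the cosine distances from the output of calculate_positions(). It is a worked example of the arithmetic, not STDL's code.

```python
import numpy as np

timeframes = [2020, 2021, 2022, 2023]
# Cosine distance between L1_0 and L1_1 per timeframe, copied from the
# positions output.
distance = np.array([0.136481, 0.197758, 0.159160, 0.381783])

# Distance at t2 minus distance at t1 for each consecutive pair.
velocity = np.diff(distance)
pairs = list(zip(timeframes[:-1], timeframes[1:]))
```

This yields approximately 0.061277, -0.038598 and 0.222623, the first three L1_0/L1_1 entries of the velocities table.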

### Statistics

In [None]:
stdc.calculate_basic_ts_stats()

calculate_basic_ts_stats() outputs a dataframe with the mean, variance and count of the entities for each pair of consecutive timeframes.

In [63]:
#stdc.calculate_thermodyn_ts_stats()

### Visualisations

In [21]:
#stdc.plot_center_of_mass_trajectory()

In [22]:
stdc.plot_reduced_positions_animation()

The function plot_reduced_positions_animation() takes the reduced positions and animates their evolution across all timeframes. To help with interpretation, it accepts a dictionary of labels as an argument, which makes it possible to track specific entities through the animation.

### Final Words
STDL simplifies temporal graph analysis and can serve as a foundation for further analyses in network science.

### References
- Peixoto, T. P. (2023). Descriptive vs. inferential community detection in networks: Pitfalls, myths and half-truths. Elements in the Structure and Dynamics of Complex Networks.
- Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.