# Graph-based Digital Inequality Analysis: TGAT Workflow

## Overview

We aim to explore the use of **Temporal Graph Attention Networks (TGAT)** for predicting changes in clusters of countries within the domain of digital inequality.

- **Objective:** Forecast how clusters of countries evolve over time based on historical data about digital access, usage, and other related indicators.
- **Why TGAT:** TGAT is designed to handle **temporal graph data**, capturing both structural dependencies (edges between nodes) and temporal dynamics (how these relationships change over time). We will try to achieve better performance on this model in comparsion with other's models (baseline, GCN).
- **Dataset:** The dataset consists of country-level indicators over multiple years, represented as a **dynamic graph** where nodes are countries, edges represent relationships between consecutive years, and node features encode digital inequality metrics.
- **Challenge:** We want the model to predict **future cluster changes**, which requires capturing subtle temporal patterns in addition to structural information.

In the following sections, we attempt to implement and train a TGAT model to address this prediction task.

---

### Step 0: Installing needed libraries

In [1]:
!pip install torch_geometric

[0m

### 1. Loading the Graph Data
We start by loading the preprocessed graph data, which represents countries as nodes and their similarity based on digital indicators as edges. The graph was previously created using k-nearest neighbors on the features extracted from multiple digital and socio-economic indicators per country.

The graph is stored as a PyTorch Geometric `Data` object, including:
- `x` — node features matrix (`num_nodes × num_features`)
- `edge_index` — edge list in COO format
- `y` — target labels for clustering or classification

In [None]:
import torch
import pandas as pd

data = torch.load("../digital_inequality_graph.pt", weights_only=False)
print(data)

  import torch_geometric.typing
  import torch_geometric.typing


Data(x=[734, 123], edge_index=[2, 14680])


In [None]:
from sklearn.preprocessing import StandardScaler
import numpy as np

# Read the file with dataset
df = pd.read_csv("../cleaned_final_dataset.csv")

# Do the pivot: each country+year as a row, indicators as columns
pivot_df = df.pivot_table(
    index=["Economy", "Year"],
    columns="Indicator",
    values="Value"
).reset_index()

# === 3. Creating binary masks (has_data for each indicator) ===
mask_df = pivot_df.drop(columns=["Economy", "Year"]).notna().astype(int)
mask_df = mask_df.add_prefix("mask_")

# === 4. Index of digital backwardness ===
pivot_df["digital_backwards_index"] = (
    pivot_df.drop(columns=["Economy", "Year"]).isna().sum(axis=1) /
    pivot_df.drop(columns=["Economy", "Year"]).shape[1]
)

# === 5. Imputation with a "fine" ===
# We take the minimum value (or the 5th percentile) so that NaN turns into "low development"
numeric_part = pivot_df.drop(columns=["Economy", "Year", "digital_backwards_index"])
min_values = numeric_part.min(skipna=True)

imputed = numeric_part.apply(lambda col: col.fillna(min_values[col.name]))

# === 6. Normalization ===
scaler = StandardScaler()
X_imputed = scaler.fit_transform(imputed)

# === 7. Collecting the final signs ===
features = np.concatenate([
    X_imputed,                             # normalized values of indicators
    mask_df.values,                        # binary masks
    pivot_df[["digital_backwards_index"]].values  # index of backwardness
], axis=1)

# === 8. Crreate node_id ===
pivot_df["node_id"] = range(len(pivot_df))



years = pivot_df.sort_values("node_id")["Year"].values
countries = pivot_df.sort_values("node_id")["Economy"].values

print("Years shape:", years.shape)
print("Sample:", years[:20])
print("Unique years:", np.unique(years))

Years shape: (734,)
Sample: [2019 2020 2021 2022 2023 2018 2019 2020 2021 2022 2023 2014 2015 2016
 2017 2018 2019 2020 2021 2022]
Unique years: [2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024]


### 2. Clustering / Target Assignment

For the supervised GCN task, we define a target `y` for each node.
The target is based on clustering countries by their digital profiles. Clusters allow us to identify:

* Lagging regions (low digital development)
* Advanced regions (high digital adoption)
* Transitional regions (shifting between clusters over time)

This target enables the GCN to learn embeddings that reflect similarity in digital adoption patterns.

Then we divide the graph into **parts (subgraphs)** in terms of year.

In [6]:
import torch
from torch_geometric.data import Data
from sklearn.cluster import KMeans

num_clusters = 3  # можно менять
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
clusters = kmeans.fit_predict(data.x.numpy())
data.y = torch.tensor(clusters, dtype=torch.long)  # save in data

unique_years = np.unique(years)
year_to_graph = {}
years_tensor = torch.from_numpy(years)

for yr in unique_years:
    # Nodes of current year
    nodes = torch.where(years_tensor == yr)[0]

    # Mapping old_node_id -> new_local_id
    mapping = {int(n): i for i, n in enumerate(nodes.tolist())}

    # Filter the edges, remain only the edges related to the current year
    mask = [
        mapping.get(int(src), None) is not None and
        mapping.get(int(dst), None) is not None
        for src, dst in data.edge_index.t().tolist()
    ]

    edge_index_year = data.edge_index[:, mask]

    new_edges = []
    for src, dst in edge_index_year.t().tolist():
        new_edges.append([mapping[int(src)], mapping[int(dst)]])

    edge_index_year = torch.tensor(new_edges).t().contiguous()

    # Current year features
    x_year = data.x[nodes]
    # Label
    y_year = data.y[nodes]

    # Create subgraph
    year_to_graph[yr] = Data(
        x=x_year,
        edge_index=edge_index_year,
        y=y_year,
        node_ids=nodes
    )

    print(f"Year {yr}: nodes={x_year.shape[0]}, edges={edge_index_year.shape[1]}")

Year 2014: nodes=60, edges=608
Year 2015: nodes=66, edges=536
Year 2016: nodes=68, edges=350
Year 2017: nodes=74, edges=346
Year 2018: nodes=69, edges=344
Year 2019: nodes=79, edges=394
Year 2020: nodes=69, edges=374
Year 2021: nodes=68, edges=352
Year 2022: nodes=68, edges=382
Year 2023: nodes=68, edges=372
Year 2024: nodes=45, edges=216


### 3. Consolidate Node Features and Labels

In the first step, we build **global arrays** for node features, labels, and timestamps across all years.  

- `global_x`: concatenated node features from all yearly graphs  
- `global_y`: corresponding node labels  
- `global_timestamps`: the year each node belongs to  

We also create a **mapping from local node indices to global indices** (`node_id_map`) to uniquely identify each node across all years. This allows us to track countries and their cluster assignments over time.  

In [7]:
import numpy as np
import torch

# We get a list of all the years
years_sorted = sorted(year_to_graph.keys())

global_x = []
global_y = []
global_timestamps = []
node_id_map = {}  # (year, local_idx) -> global_idx
current_id = 0

for yr in years_sorted:
    g = year_to_graph[yr]
    n = g.x.shape[0]

    for i in range(n):
        node_id_map[(yr, i)] = current_id
        global_x.append(g.x[i].numpy())
        global_y.append(g.y[i].item())
        global_timestamps.append(yr)  # timestamp = year
        current_id += 1

global_x = np.vstack(global_x)
global_y = np.array(global_y)
global_timestamps = np.array(global_timestamps)

print("Total nodes:", len(global_x))

Total nodes: 734


### 4. Build Edge List with Temporal Information

Next, we construct the **temporal edge list**:

1. **Intra-year edges:**
   For each year, we add edges between nodes based on that year's graph. The timestamps for these edges correspond to the year.

2. **Inter-year edges (cross-year events):**
   We connect the same country across consecutive years to capture temporal evolution. This is crucial for predicting changes in cluster membership over time.

We store:

* `src_list` and `dst_list`: source and destination node indices for edges
* `time_list`: the timestamp for each edge

This prepares the graph for the TGAT model, which can leverage both **structural and temporal information**.

In [8]:
src_list = []
dst_list = []
time_list = []

for yr in years_sorted:
    g = year_to_graph[yr]
    ei = g.edge_index.numpy()

    for s, d in zip(ei[0], ei[1]):
        gs = node_id_map[(yr, s)]
        gd = node_id_map[(yr, d)]
        src_list.append(gs)
        dst_list.append(gd)
        time_list.append(yr)

# Crossyear events
for country in pivot_df["Economy"].unique():
    country_nodes = pivot_df[pivot_df["Economy"] == country].sort_values("Year")
    ids = country_nodes["node_id"].values  # local node_id in global_x
    years_country = country_nodes["Year"].values

    for i in range(len(ids) - 1):
        # global indices
        src = node_id_map[(years_country[i], i)]
        dst = node_id_map[(years_country[i+1], i+1)]

        src_list.append(src)
        dst_list.append(dst)
        time_list.append(years_country[i+1])  # timestamp

In [9]:
for country in pivot_df["Economy"].unique():
    nodes = pivot_df[pivot_df["Economy"] == country].sort_values("Year")["node_id"].tolist()
    for i in range(len(nodes)-1):
        src_list.append(nodes[i])
        dst_list.append(nodes[i+1])
        time_list.append(years[nodes[i+1]])

print(src_list)
print(dst_list)
print(time_list)

src = torch.tensor(src_list, dtype=torch.long)
dst = torch.tensor(dst_list, dtype=torch.long)
ts = torch.tensor(time_list, dtype=torch.float)
x = torch.tensor(global_x, dtype=torch.float)
y = torch.tensor(global_y, dtype=torch.long)

[0, 0, 0, 0, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 6, 6, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 20, 20, 20, 20, 20, 20, 20, 21, 22, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 25, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 35, 36, 36, 36, 36, 36, 36, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 40, 40, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 42, 43, 44, 44, 44, 44, 44, 45, 46, 46, 47, 48, 48, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 51, 52, 53, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 55, 56, 57, 57, 58, 58, 58, 58, 58, 58, 58, 4, 17, 50, 20, 7, 45, 40, 13, 32, 17, 49, 29, 50, 10, 38, 42, 

---

### 5. Model Architecture: TGAT

Our TGAT implementation uses **TransformerConv layers** from PyTorch Geometric. Temporal information is encoded with a small **MLP**, which is concatenated to the node features before message passing. The network consists of:

- Two TransformerConv layers with multi-head attention
- Temporal encoding MLP
- Final MLP for classification into cluster labels

This allows the model to learn **both the structure of the graph and temporal evolution** of nodes.


In [73]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv


class TGAT(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()

        self.hidden = hidden_channels

        # Encode time into hidden_channels size
        self.time_mlp = nn.Sequential(
            nn.Linear(1, hidden_channels),
            nn.ReLU(),
            nn.Linear(hidden_channels, hidden_channels)
        )

        # Input to GNN will be: x_dim + hidden_dim
        conv_in = in_channels + hidden_channels

        self.conv1 = TransformerConv(
            conv_in, hidden_channels, heads=2, dropout=0.1, concat=False
        )
        self.conv2 = TransformerConv(
            hidden_channels, hidden_channels, heads=2, dropout=0.1, concat=False
        )

        self.mlp = nn.Sequential(
            nn.Linear(hidden_channels, hidden_channels),
            nn.ReLU(),
            nn.Linear(hidden_channels, out_channels)
        )

    def forward(self, x, edge_index, t):
        src, dst = edge_index

        # Temporal encoding for each edge
        t = t.unsqueeze(-1)                    # [E, 1]
        t_enc = self.time_mlp(t)               # [E, H]

        # Convert edge-level times to node-level
        t_nodes = torch.zeros(x.size(0), self.hidden, device=x.device)
        t_nodes[src] = t_enc
        t_nodes[dst] = t_enc

        # Final node input into TransformerConv
        x_in = torch.cat([x, t_nodes], dim=1)  # [N, in + H]

        # GNN layers
        h = self.conv1(x_in, edge_index)
        h = F.relu(h)
        h = self.conv2(h, edge_index)
        h = F.relu(h)

        out = self.mlp(h)
        return out

---

### 6. Train/Test Split

Edges are split based on timestamps:  

- **Training set:** edges from the first 80% of timestamps  
- **Testing set:** edges from the last 20% of timestamps  

Node labels remain global, and the temporal edge information is preserved in the model input.

In [68]:
edge_index = torch.stack([src, dst], dim=0)

# split по времени
time_threshold = torch.quantile(ts, 0.8)  # 80% train, 20% test

train_edges = (ts <= time_threshold)
test_edges  = (ts > time_threshold)

edge_index_train = edge_index[:, train_edges]
edge_index_test  = edge_index[:, test_edges]

ts_train = ts[train_edges]
ts_test  = ts[test_edges]

# labels просто остаются глобальными


### 7. Model Initialization
We initialize the **TGAT model**, optimizer, and loss function, and move all tensors (`x`, `y`, edge indices, timestamps) to the appropriate device (CPU/GPU).


In [74]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = TGAT(
    in_channels=x.size(1),
    hidden_channels=64,
    out_channels=len(torch.unique(y))
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

x = x.to(device)
y = y.to(device)
edge_index_train = edge_index_train.to(device)
ts_train = ts_train.to(device)
edge_index_test = edge_index_test.to(device)
ts_test = ts_test.to(device)

### 8. Training and Evaluation Loop

We define two functions:

- `train_epoch()`: performs one epoch of training on the temporal graph
- `eval_epoch()`: evaluates the model on the test edges, computing accuracy

The model is trained for 50 epochs, and we monitor **loss** and **test accuracy** periodically to observe performance.

---

By combining temporal and structural information, TGAT is expected to capture **how countries evolve in the digital inequality space**, potentially providing insights into emerging trends and cluster shifts over time.

In [76]:
def train_epoch():
    model.train()
    optimizer.zero_grad()

    out = model(x, edge_index_train, ts_train)
    loss = loss_fn(out, y)
    loss.backward()
    optimizer.step()

    return loss.item()


def eval_epoch():
    model.eval()
    with torch.no_grad():
        out = model(x, edge_index_test, ts_test)
        pred = out.argmax(dim=1)
        acc = (pred == y).float().mean().item()
    return acc

for epoch in range(1, 51):
    loss = train_epoch()
    acc  = eval_epoch()
    if epoch %5 == 0:
        print(f"Epoch {epoch:02d} | Loss: {loss:.4f} | Test Acc: {acc:.4f}")

Epoch 05 | Loss: 1.0057 | Test Acc: 0.6717
Epoch 10 | Loss: 1.0249 | Test Acc: 0.6621
Epoch 15 | Loss: 1.0401 | Test Acc: 0.6662
Epoch 20 | Loss: 1.0007 | Test Acc: 0.6703
Epoch 25 | Loss: 0.9934 | Test Acc: 0.6689
Epoch 30 | Loss: 0.9578 | Test Acc: 0.6689
Epoch 35 | Loss: 0.9586 | Test Acc: 0.7003
Epoch 40 | Loss: 0.9843 | Test Acc: 0.7003
Epoch 45 | Loss: 0.9609 | Test Acc: 0.7016
Epoch 50 | Loss: 0.9427 | Test Acc: 0.6444


## TGAT Training Summary

Training progress:

| Epoch | Loss   | Test Accuracy |
|-------|--------|---------------|
| 05    | 1.0057 | 0.6717        |
| 10    | 1.0249 | 0.6621        |
| 15    | 1.0401 | 0.6662        |
| 20    | 1.0007 | 0.6703        |
| 25    | 0.9934 | 0.6689        |
| 30    | 0.9578 | 0.6689        |
| 35    | 0.9586 | 0.7003        |
| 40    | 0.9843 | 0.7003        |
| 45    | 0.9609 | 0.7016        |
| 50    | 0.9427 | 0.6444        |

### Observations

- We were unable to build a **stable and effective** TGAT model on our dataset.
- Both loss and test accuracy remain low, and training is unstable.
- The TGAT model in this implementation **does not capture the target behavior** of cluster evolution over time.
- **Decision:** We will leave TGAT aside and switch to a more reliable **GCN** architecture for further usage as a main model.

---