# Overview of unsupervised graph embedding methods

We consider several types of embedding algorithms for graphs.

They fall into three main families, depending on whether they embed nodes, edges, or whole graphs:

1. Chapter: Node Embedding Algorithms

- Traditional: PCA, MDS, Laplacian Eigenmaps

- Random Walk-based: DeepWalk, Node2Vec, LINE

- Neural Network-based: GCN, GraphSAGE, GAT

- Matrix Factorization: GraRep, HOPE

- Probabilistic: VGAE, Deep Graph Infomax

- Structural: struc2vec

2. Chapter: Edge Embedding Algorithms

- Operator-based (Hadamard, average, etc.) on node embeddings

- Explicit edge embedding methods

3. Chapter: Whole Graph Embedding Algorithms

- Graph Kernels: Weisfeiler-Lehman, Graphlet

- Neural Methods: Graph2Vec, DGCNN, Readout functions in GNNs
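In the operator-based edge embeddings listed above, an edge (u, v) gets a vector computed element-wise from the two node embeddings. Below is a minimal NumPy sketch of four common operators (average, Hadamard, weighted-L1, weighted-L2, the set used in the node2vec link-prediction experiments). The helper `edge_embedding` and the toy vectors are hypothetical and only illustrate the idea.

```python
import numpy as np

def edge_embedding(z_u, z_v, op="hadamard"):
    """Hypothetical helper: combine node embeddings z_u, z_v into one edge embedding."""
    z_u, z_v = np.asarray(z_u, dtype=float), np.asarray(z_v, dtype=float)
    if op == "average":
        return (z_u + z_v) / 2.0
    if op == "hadamard":
        return z_u * z_v              # element-wise product
    if op == "weighted_l1":
        return np.abs(z_u - z_v)      # element-wise absolute difference
    if op == "weighted_l2":
        return (z_u - z_v) ** 2       # element-wise squared difference
    raise ValueError(f"unknown operator: {op}")

# Toy 4-dimensional node embeddings
z_u = np.array([1.0, 2.0, 0.0, -1.0])
z_v = np.array([3.0, 0.5, 1.0, -1.0])
for op in ["average", "hadamard", "weighted_l1", "weighted_l2"]:
    print(op, edge_embedding(z_u, z_v, op))
```

All four operators are symmetric in u and v, so an undirected edge gets the same vector whichever endpoint comes first.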



In [None]:

#!pip install karateclub


Collecting karateclub
  Downloading karateclub-1.3.3.tar.gz (64 kB)
  Preparing metadata (setup.py) ... done
Collecting numpy<1.23.0 (from karateclub)
  Downloading numpy-1.22.4.zip (11.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting networkx<2.7 (from karateclub)
  Downloading networkx-2.6.3-py3-none-any.whl.metadata (5.0 kB)
Collecting pygsp (from karateclub)
  Downloading PyGSP-0.5.1-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting pandas<=1.3.5 (from karateclub)
  Downloading pandas-1.3.5.tar.gz (4.7 MB)

In [None]:



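from karateclub import Graph2Vec
import networkx as nx

# Example input: a list of networkx graphs whose nodes are labelled 0..n-1
# (karateclub expects consecutive integer node indices)
graphs = [nx.erdos_renyi_graph(30, 0.1, seed=s) for s in range(10)]

# Graph2Vec: Weisfeiler-Lehman subtree features + doc2vec, one vector per graph
model = Graph2Vec(dimensions=64, wl_iterations=2)
model.fit(graphs)
embeddings = model.get_embedding()  # NumPy array of shape [num_graphs, 64]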



Compare Embeddings using different metrics


In [None]:


from sklearn.metrics.pairwise import euclidean_distances

# distance_matrix[i, j] = Euclidean distance between the embeddings of graphs i and j
distance_matrix = euclidean_distances(embeddings)
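Other metrics can be swapped in through scikit-learn's `pairwise_distances`, which takes the metric name as a string. A minimal sketch, assuming a random `emb` matrix as a stand-in for the Graph2Vec output (5 graphs, 64 dimensions):

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Stand-in for model.get_embedding() (random values, for illustration only)
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 64))

# One (num_graphs x num_graphs) distance matrix per metric
distances = {metric: pairwise_distances(emb, metric=metric)
             for metric in ["euclidean", "cosine", "manhattan"]}

for metric, D in distances.items():
    # Closest other graph for each graph: mask the zero self-distance on the diagonal
    D_off = D + np.diag(np.full(len(D), np.inf))
    print(metric, D_off.argmin(axis=1))
```

Cosine distance ignores the length of the embedding vectors and compares only their direction, whereas Euclidean and Manhattan distances are sensitive to scale; which one separates the graphs better is an empirical question.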

# Chapter 3.
## Graph embeddings

## Embedding methods using GNNs


Whole graph embedding algorithms:

- Graph Kernels: Weisfeiler-Lehman, Graphlet

- Neural Methods: Graph2Vec, DGCNN, Readout functions in GNNs
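Graph2Vec and the Weisfeiler-Lehman (WL) subtree kernel share the same relabeling step: each node's label is repeatedly replaced by its own label together with the sorted labels of its neighbours, and a graph is described by how often each label occurs. A minimal pure-Python sketch, assuming graphs given as adjacency dicts (e.g. `{n: list(G[n]) for n in G}` for a NetworkX graph) and node degree as the initial label; the function names are illustrative, and nested tuples stand in for the compressed labels of the original algorithm. NetworkX also provides a related `nx.weisfeiler_lehman_graph_hash` function.

```python
from collections import Counter

def wl_label_counts(adj, iterations=2):
    """Count WL labels of a graph given as {node: [neighbours]} (illustrative helper)."""
    labels = {v: len(nbrs) for v, nbrs in adj.items()}  # initial label = degree
    counts = Counter(labels.values())
    for _ in range(iterations):
        # New label = (own label, sorted multiset of neighbour labels)
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in adj}
        counts.update(labels.values())
    return counts

def wl_kernel(adj_a, adj_b, iterations=2):
    """WL subtree kernel: dot product of the two label-count histograms."""
    ca, cb = wl_label_counts(adj_a, iterations), wl_label_counts(adj_b, iterations)
    return sum(ca[lab] * cb[lab] for lab in ca.keys() & cb.keys())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_kernel(triangle, triangle), wl_kernel(triangle, path), wl_kernel(path, path))  # 27 3 15
```

Graphs that look alike after a few relabeling rounds share many labels and get a large kernel value; the triangle and the path share only the initial degree-2 label.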

**Input graph.**
The code below runs on a graph `G`, which we either load from an external source or generate with one of NetworkX's random-graph generators, e.g.

`G = nx.erdos_renyi_graph(n, p)`

## Methods for comparing embeddings

We have several graphs with N nodes each, and we embed each graph into an m-dimensional space using torch_geometric (PyTorch Geometric).

We then estimate the distances between these graph embeddings with various metrics (Euclidean distance and others) in order to classify the graphs in the m-dimensional space.
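Once every graph is an m-dimensional vector, one simple way to turn distances into a classification is a nearest-neighbour rule. A minimal NumPy sketch of leave-one-out 1-nearest-neighbour accuracy on a Euclidean distance matrix; 1-NN is our illustrative choice, and the two clusters of random vectors are synthetic stand-ins for graph embeddings:

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-nearest-neighbour accuracy using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all embedding vectors
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a graph may not be its own neighbour
    nearest = D.argmin(axis=1)           # index of the closest other graph
    return float(np.mean(y[nearest] == y))

rng = np.random.default_rng(0)
m = 16                                                  # embedding dimension
X = np.vstack([rng.normal(0.0, 1.0, size=(20, m)),      # class 0 embeddings
               rng.normal(3.0, 1.0, size=(20, m))])     # class 1 embeddings
y = np.array([0] * 20 + [1] * 20)
print(loo_1nn_accuracy(X, y))
```

Swapping the `np.linalg.norm` line for another distance matrix (cosine, Manhattan) changes the metric without touching the rest of the rule.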

In [2]:

!pip install torch_geometric



Collecting torch_geometric
  Downloading torch_geometric-2.6.1-py3-none-any.whl.metadata (63 kB)
Downloading torch_geometric-2.6.1-py3-none-any.whl (1.1 MB)
Installing collected packages: torch_geometric
Successfully installed torch_geometric-2.6.1


Convert Graphs and Define Model:

In [4]:


from torch_geometric.data import Data
import torch



In [17]:
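import networkx as nx
from torch_geometric.utils import to_undirected

def nx_to_pyg(G):
    """Convert a networkx graph with nodes 0..N-1 into a PyG Data object."""
    x = torch.ones(G.number_of_nodes(), 1)  # constant node feature: only structure matters
    edge_index = torch.tensor(list(G.edges()), dtype=torch.long).t().contiguous()
    # networkx lists each undirected edge once; GCNConv expects both directions
    edge_index = to_undirected(edge_index, num_nodes=G.number_of_nodes())
    return Data(x=x, edge_index=edge_index)

n = 100  # Number of nodes

p = 0.1  # Probability of edge creation (sparse graph)
G = nx.erdos_renyi_graph(n, p)
pyg_graph1 = nx_to_pyg(G)

p = 0.5  # Probability of edge creation (dense graph)
G = nx.erdos_renyi_graph(n, p)
pyg_graph2 = nx_to_pyg(G)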


# Looping over a set of graphs

We run the simulations in a loop, one graph at a time.
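One way to build the set of graphs for these simulations is to draw several Erdős–Rényi graphs per parameter value with fixed seeds, keeping the parameter index as a class label. A minimal sketch; the grid `p_values`, the number of graphs per class and the seed scheme are assumptions:

```python
import networkx as nx

# Hypothetical simulation grid: one class per edge probability
n = 100
p_values = [0.1, 0.5]
graphs_per_class = 5

graphs, labels = [], []
for label, p in enumerate(p_values):
    for seed in range(graphs_per_class):
        # A fixed seed makes each simulated graph reproducible
        graphs.append(nx.erdos_renyi_graph(n, p, seed=1000 * label + seed))
        labels.append(label)

print(len(graphs), labels)
```

The resulting `graphs` list has nodes labelled 0..n-1, which is the format the conversion to PyG `Data` objects relies on.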

In [None]:
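# Load the PyG graphs into a list and loop over them
pyg_graphs = [pyg_graph1, pyg_graph2]

for graph in pyg_graphs:
    # Extract data
    x = graph.x  # Node features [N, input_dim]
    edge_index = graph.edge_index  # Edges [2, num_edges]
    batch = torch.zeros(x.size(0), dtype=torch.long)  # Single graph in batch: every node maps to graph 0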


# Loading graphs from real-world data

We can download real-world street networks from OpenStreetMap with the osmnx library.

In [12]:
!pip install osmnx

Collecting osmnx
  Downloading osmnx-2.0.1-py3-none-any.whl.metadata (4.9 kB)
Downloading osmnx-2.0.1-py3-none-any.whl (99 kB)
Installing collected packages: osmnx
Successfully installed osmnx-2.0.1


In [13]:

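import networkx as nx
import osmnx as ox
import torch
from torch_geometric.data import Data
from torch_geometric.utils import to_undirected

# Download the street network of Sochi from OpenStreetMap (requires internet access)
G = ox.graph_from_place("Sochi, Russia")
#G = nx.read_graphml("graph_R62145.graphml")
G = nx.MultiGraph(G)  # drop edge direction
G.graph["crs"] = "epsg:4326"

# OSM node IDs are large integers; relabel them 0..N-1 so they can index rows of x
G = nx.convert_node_labels_to_integers(G)

# G.edges() yields (u, v) pairs; iterating G.edges on a MultiGraph would also yield edge keys
edge_index = torch.tensor(list(G.edges()), dtype=torch.long).t().contiguous()
pyg_graph = Data(x=torch.ones(G.number_of_nodes(), 1),
                 edge_index=to_undirected(edge_index, num_nodes=G.number_of_nodes()))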

In [None]:

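from torch_geometric.utils import to_undirected

# Convert networkx graphs (nodes labelled 0..N-1) to PyG Data objects;
# each undirected edge is stored in both directions, as GCNConv expects
pyg_graphs = [Data(x=torch.ones(G.number_of_nodes(), 1),
                   edge_index=to_undirected(
                       torch.tensor(list(G.edges()), dtype=torch.long).t().contiguous(),
                       num_nodes=G.number_of_nodes()))
              for G in graphs]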

# GCN for embeddings

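We can use a graph convolutional network (GCN) to generate the embeddings.
The hidden dimension sets the size m of the embedding space, so different hidden sizes give different coordinate systems.
The embedding encodes topological information about the graph. Note that the model below is not trained, so its embeddings come from randomly initialized weights.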
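To see where the topological information comes from: a GCN layer updates node features as $H' = \mathrm{ReLU}\big(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H W\big)$, where $\hat{A} = A + I$ adds self-loops and $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$; mean pooling over the nodes then gives one vector per graph. This is the normalization `GCNConv` applies by default. A minimal NumPy sketch of one such layer plus mean pooling, with random weights and without the bias term that `GCNConv` also adds; the helper name is illustrative:

```python
import numpy as np

def gcn_graph_embedding(A, X, W):
    """One GCN layer (symmetric normalization, self-loops, no bias) + global mean pooling."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # diagonal of D_hat^{-1/2}
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    H = np.maximum(A_norm @ X @ W, 0.0)             # ReLU
    return H.mean(axis=0, keepdims=True)            # shape (1, hidden_dim)

# 4-cycle with a constant input feature 1.0 per node, hidden_dim = 8
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = np.ones((4, 1))
W = np.random.default_rng(0).normal(size=(1, 8))
print(gcn_graph_embedding(A, X, W).shape)  # (1, 8)
```

With constant input features, each node's output depends only on its own degree and the degrees of its neighbours, i.e. on the graph's structure, which is why `torch.ones` node features are enough here.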

In [6]:

# Define a simple GNN with global mean pooling
from torch_geometric.nn import global_mean_pool, GCNConv
import torch.nn as nn

class GNNEmbedder(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.conv1 = GCNConv(1, hidden_dim)
        self.pool = global_mean_pool

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = self.conv1(x, edge_index).relu()
        return self.pool(x, batch)

model = GNNEmbedder()


# Save and extract embeddings

We can now compute the embeddings and save each one as a tensor (a vector of length hidden_dim).
This output can then be combined with other network properties, such as persistence diagrams.
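A minimal sketch of writing a stacked embedding matrix to disk and reading it back with NumPy's `np.save` / `np.load` (`torch.save` works similarly for tensors). The file name `graph_embeddings.npy` and the random stand-in matrix are assumptions:

```python
import os
import tempfile

import numpy as np

# Stand-in for the stacked embeddings: one 64-dimensional row per graph (random values)
emb_matrix = np.random.default_rng(0).random((3, 64)).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "graph_embeddings.npy")  # hypothetical file name
np.save(path, emb_matrix)   # .npy stores shape and dtype alongside the data
loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (3, 64) float32
```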


In [19]:
# now running the embedding on both graphs
with torch.no_grad():  # inference only, no gradients needed
    graph_emb1 = model(pyg_graph1)  # shape [1, 64]
    graph_emb2 = model(pyg_graph2)  # shape [1, 64]


In [10]:
print(graph_emb1.shape)
print(graph_emb1)

torch.Size([1, 64])
tensor([[0.0000, 0.0000, 0.3854, 0.0000, 0.0496, 0.0000, 0.0000, 0.2420, 0.1587,
         0.2035, 0.2714, 0.0000, 0.0000, 0.0431, 0.0000, 0.0000, 0.0000, 0.0263,
         0.0881, 0.2244, 0.0000, 0.1510, 0.3728, 0.2263, 0.0000, 0.2913, 0.0000,
         0.2768, 0.4002, 0.0137, 0.0000, 0.0740, 0.3970, 0.4024, 0.0000, 0.0000,
         0.0000, 0.3510, 0.2094, 0.0000, 0.0000, 0.3934, 0.0000, 0.0000, 0.1000,
         0.0841, 0.0279, 0.3629, 0.1274, 0.0000, 0.0000, 0.0000, 0.1879, 0.2561,
         0.2662, 0.0000, 0.1142, 0.1643, 0.0000, 0.0000, 0.2277, 0.0815, 0.0000,
         0.2532]])


To apply this to several graphs, we run the model on each graph in a loop and stack the resulting vectors.


In [1]:

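import numpy as np

embeddings = []
for graph in pyg_graphs:  # list of PyG Data objects, one per graph
    # all nodes belong to graph 0, so mean pooling returns one row per graph
    graph.batch = torch.zeros(graph.num_nodes, dtype=torch.long)
    with torch.no_grad():
        emb = model(graph)
    embeddings.append(emb.numpy())

# stack all embeddings into one [num_graphs, hidden_dim] array
embeddings = np.vstack(embeddings)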

# Topological and structural values
We compute standard network measures for each graph: average degree, average clustering coefficient, and diameter.

In [None]:
import networkx as nx
import numpy as np

def compute_network_measures(G):
    measures = {
        "avg_degree": np.mean(list(dict(G.degree()).values())),
        "clustering_coeff": nx.average_clustering(G),
        # nx.diameter fails on disconnected graphs, so 0 is used as a placeholder there
        "diameter": nx.diameter(G) if nx.is_connected(G) else 0,
    }
    return np.array(list(measures.values()))  # Shape: [num_measures]
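`nx.diameter` raises an error on disconnected graphs, which is why the function above falls back to 0. A possible variant (the name `compute_network_measures_lcc` is hypothetical) measures the diameter of the largest connected component instead. It is checked here on two graphs with known values: the 6-cycle (average degree 2, clustering 0, diameter 3) and the complete graph K5 (average degree 4, clustering 1, diameter 1).

```python
import networkx as nx
import numpy as np

def compute_network_measures_lcc(G):
    """Hypothetical variant: [avg degree, avg clustering, diameter of largest component]."""
    largest_cc = G.subgraph(max(nx.connected_components(G), key=len))
    return np.array([
        np.mean([d for _, d in G.degree()]),
        nx.average_clustering(G),
        nx.diameter(largest_cc),
    ])

test_graphs = [nx.cycle_graph(6), nx.complete_graph(5)]
features = np.vstack([compute_network_measures_lcc(G) for G in test_graphs])  # [num_graphs, 3]
print(features)
```

Stacking one row per graph gives a feature matrix that can be concatenated column-wise with the embedding matrix.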

# Classification based on the embeddings and topological features together

We now want to use a classification method whose input combines three parts for each graph:

- the GNN embedding output: a vector of length `hidden_dim` (64 above);

- the ripser output: persistence diagrams for homology dimensions N=0, N=1 and N=2, each an array of (birth, death) pairs;

- the topological features of the graph: the vector of network measures computed above.

We use a hybrid classifier on these combined features to detect and classify our graphs into subclusters.

In [None]:


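import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    """Two-layer MLP on the concatenated [embedding | persistence | network-measure] features."""
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw class scores (logits)

# Usage: normalized_features is a [num_graphs, num_features] array of combined, normalized features
model = HybridClassifier(input_dim=normalized_features.shape[1], hidden_dim=64, num_classes=10)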
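The classifier still has to be trained on labelled graphs. A minimal PyTorch training-loop sketch with cross-entropy loss and the Adam optimizer; the synthetic two-class features, learning rate and number of epochs are assumptions, and the `nn.Sequential` model mirrors the layer layout of `HybridClassifier` so that the block runs on its own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for normalized_features: 2 classes of 20 graphs, 13 features each
features = torch.cat([torch.randn(20, 13) - 1.5, torch.randn(20, 13) + 1.5])
labels = torch.cat([torch.zeros(20, dtype=torch.long), torch.ones(20, dtype=torch.long)])

# Same layout as HybridClassifier: Linear -> ReLU -> Linear
clf = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(clf.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(clf(features), labels)   # logits -> cross-entropy
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    accuracy = (clf(features).argmax(dim=1) == labels).float().mean().item()
print(f"final loss {losses[-1]:.3f}, training accuracy {accuracy:.2f}")
```

This reports accuracy on the training graphs only; in practice the graphs would be split into training and held-out sets before judging the classifier.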