### Autoencoders

An autoencoder uses a neural network to transform an input into a low-dimensional representation of the input, just like we've been doing with node embedding. A _graph autoencoder_ (GAE) uses the graph convolutional neural networks discussed in the last tutorial and specifies an error function to be used to update the hidden layer weights.

The forward pass of a GCN _encodes_ data and forms a node embedding. What is done with the embedding is called _decoding_, and the form of decoding is determined by what goal you are trying to accomplish with the graph neural network (GNN). 

---

#### GAE Encoding and Decoding

A graph autoencoder (GAE) uses convolution to encode an input in a useful embedding. In a GAE, the decoder tries to reconstruct the input. I was trained as an engineer so I think of encoding as data compression. A good compression algorithm does two things:
 - it represents the original object using fewer "bits" than in the original object
 - it can be used to reconstruct the original object, hopefully with little loss of accuracy 
 
The encoder compresses the object, and the decoder decompresses it. Formally speaking, the process of turning the original object into the compressed representation is called __encoding__, and the process of turning the compressed representation into an approximation of the original object is called __decoding__.  The key insight is that the network tries to reconstruct its input from the low-dimensional embedding, as illustrated in the following figure.

<img src="figures/autoencoder.png" alt="General autoencoder architecture" width = "600">

The compressed representation is often called a _latent representation_. One meaning of the word "latent" is "hidden or dormant". The idea is that the compressed representation captures the fundamental structures hidden in the original object while getting rid of the parts of the original object that aren't necessary. It's like saying we've found the small representation hidden by the large representation of the original object. In the graph machine learning context, the output of a graph autoencoder is called a _node embedding_ instead of the _latent representation_. I hope that the reason is fairly obvious: we are using the GCN to find a low-dimensional embedding of the original network structure. 

The difference between the decoded output and the input gives us an error signal that can be used to update the weights used in the GCN. More generally, the error is a measure of the difference between an output and the desired output, and this error is _backpropagated_ through the network to update the weights in the hidden layers.

---


### Graph Autoencoder Pipeline

We'll now discuss the data processing pipeline for a GAE. The goal will be to review the PyTorch Geometric (`torch_geometric`) tools used to implement a GAE at the same time that we mathematically define the error function.
This part of the tutorial relies heavily on the [presentation from Antonio Longa](https://antoniolonga.github.io/Pytorch_geometric_tutorials/posts/post6.html), and the [Jupyter notebook used in the presentation](https://github.com/AntonioLonga/PytorchGeometricTutorial/blob/main/Tutorial6/Tutorial6.ipynb). 


__Put the graph into the pytorch data structure__

These examples will implement the GCN using pytorch. 
We'll demonstrate the steps using the pinwheel graph, first without features and then with features.

In [None]:
#############
## CELL 0  ##
#############

import networkx as nx
import matplotlib.pyplot as plt

pinwheel_graph = nx.Graph()
pinwheel_graph.add_nodes_from([0,1,2,3,4,5,6,7,8,9])
pinwheel_graph.add_edges_from([(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(6,7),(6,8),(7,8),(0,9),(3,9),(6,9)])
plt.figure()
nx.draw_networkx(pinwheel_graph, 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color='y',
                 node_size=800,
                 font_size=14,
                 alpha = 0.7)
_ = plt.axis('off')

We'll be using a graph convolutional neural network to construct our graph autoencoder. Since GCNs need to remember each node's feature when they are blended with their neighbors' features, we need to add self loops. 

Here's a copy of ChatGPT-4o's answer on this topic.

**Question:** When inputting a graph into a pytorch.geometric structure to be used in a graph autoencoder, do I need to add self loops or is that taken care of automatically when I specify the GCN layers?

**Answer:** Yes, you generally need to explicitly add self-loops to your edge_index before passing the graph into the model — unless you’re using a GCN variant that adds them internally.

- PyG’s GCNConv does not automatically add self-loops by default.
- The original GCN formulation from Kipf & Welling (2016) requires self-loops so that a node sees its own features during message passing.

You should use:

`from torch_geometric.utils import add_self_loops
edge_index, _ = add_self_loops(edge_index, num_nodes=data.num_nodes)`

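One caveat on the answer above: in recent PyG releases `GCNConv` has an `add_self_loops` argument that is enabled by default, so check the documentation for the version you have installed. Adding the loops explicitly should be harmless in that case. I'll add them explicitly for the pinwheel network and then use the method above in the example of the karate network below.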

In [None]:
#############
## CELL 1  ##
#############

pinwheel_graph.add_edges_from([(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)])
plt.figure()
nx.draw_networkx(pinwheel_graph, 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color='y',
                 node_size=800,
                 font_size=14,
                 alpha = 0.7)
_ = plt.axis('off')

PyTorch Geometric has a utility that converts a networkx graph into its _Data_ type.

In [None]:
#############
## CELL 2  ##
#############
import numpy as np
from torch_geometric.utils import from_networkx
from torch import Tensor
from torch_geometric.data import Data as PyGData
data: PyGData = from_networkx(pinwheel_graph)      # convert to pytorch data structure
print(f"Pinwheel graph as pytorch data structure is\n\t {data}")

Each data object has an edge_index member, which we access using the standard way of referencing object members.

```
    data.edge_index
```

The edge_index represents _graph connectivity_. This is essentially the edge set of the graph, stored as a tensor with two rows and one column per edge. The first dimension has size two and holds the ordered pair of endpoints of each edge; the size of the second dimension is the number of edges. Each edge is directed, which means that an undirected edge $\{u,v\}$ appears twice in the edge_index: once as $(u,v)$ and again as $(v,u)$.
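
To make the shape concrete, here is a minimal sketch in plain NumPy of how an undirected edge list could be turned into a two-row array with both directions of each edge. It is not the PyG implementation: the helper name `to_edge_index` is made up for illustration, the input is assumed to contain no self loops, and the column order may differ from what `from_networkx` produces.

```python
import numpy as np

def to_edge_index(undirected_edges):
    """Return a 2 x (2 * num_edges) array holding each edge in both directions."""
    forward = [(u, v) for u, v in undirected_edges]
    backward = [(v, u) for u, v in undirected_edges]
    # Rows: source nodes on top, target nodes underneath; one column per directed edge.
    return np.array(forward + backward).T

edge_index = to_edge_index([(0, 1), (0, 2), (1, 2)])
print(edge_index.shape)   # two rows, six directed edges
print(edge_index[:, 0])   # the first directed edge
```

Reading a column top to bottom gives one directed edge, which is the same access pattern as `data.edge_index[:, i]`.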


Let's look at the edge index data.

In [None]:
#############
## CELL 3  ##
#############
for i in range(4):  # look at the first four edges, in order
    print(f"edge is {data.edge_index[:,i]}")

Edges are represented as tensors, which for the purposes of CS 575 can be thought of simply as vectors.


We'll need inputs to the graph autoencoder, which we'll represent using _one-hot encoding_.

In [None]:
#############
## CELL 4  ##
#############
data.x = Tensor(np.eye(len(pinwheel_graph)))  # labels features using one-hot encoding
for node in [0,1,2,3,9]:
    print(f'x values node {node}: \n\t{data.x[node]}')
data.num_node_features = len(data.x)

This encoding simply says that the feature of each node is a row vector that is zero everywhere except in the column corresponding to the node number.

In [None]:
print(data.num_node_features)

__Defining the Encoder__

The graph autoencoder that we'll implement is based on equations (3) and (4) from

```
    Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., & Zhang, C. (2018). Adversarially regularized graph autoencoder for graph embedding. arXiv preprint arXiv:1802.04407.
```

The architecture consists of an encoder with two GCN layers, followed by one decoder layer. 

<img src="figures/graph_autoencoder.png" alt="Graph autoencoder example" width = "500">

The encoder has two hidden layers. For the first layer, $H^{(1)}$ is computed using the matrix multiplications defined above 

$$ H^{(1)} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W$$

and then passing through a rectified linear unit

$$ f_{\rm ReLu}\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W \right) $$

The rectified linear unit sets everything less than zero to zero and anything else to its original value,

$$ f_{\rm ReLu}(x) = \left\{ \begin{array}{rl} 0 & {\rm if\ } x < 0 \\ x & {\rm otherwise}\end{array}\right. $$

The second layer computes the embedding. Recall that a node embedding was a mapping from the set of graph nodes to a $d$-dimensional real-valued vector
$$ f: V\rightarrow \mathbb{R}^d $$ 

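Let ${\mathbf z}_i = f({\mathbf x}_i)$, where ${\mathbf x}_i$ is the one-hot encoding for node $i$. The matrix $Z$, whose $i$-th row is ${\mathbf z}_i$, is computed by applying a second graph convolution to the squashed output of the first layer,

$$ Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} f_{\rm ReLu}\left(H^{(1)}\right) W^{(2)}$$

where $W^{(2)}$ is the weight matrix of the second layer (distinct from the first layer's $W$).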

We don't "squash" the output of the second hidden layer because we want the vector output from that layer.
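
The two encoder layers can be sketched directly in NumPy. This is only an illustration under stated assumptions, not PyG's `GCNConv`: the graph is a single triangle, the weight matrices `W1` and `W2` are random stand-ins for learned weights, and bias terms are left out.

```python
import numpy as np

def normalized_adjacency(A):
    """Compute D~^{-1/2} A~ D~^{-1/2}, where A~ = A + I adds self loops."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # a single triangle
X = np.eye(3)                            # one-hot node features
W1 = rng.normal(size=(3, 4))             # hypothetical first-layer weights
W2 = rng.normal(size=(4, 2))             # hypothetical second-layer weights

A_norm = normalized_adjacency(A)
H1 = A_norm @ X @ W1                     # first layer, before the nonlinearity
Z = A_norm @ relu(H1) @ W2               # second layer: no squashing
print(Z.shape)                           # one 2-dimensional embedding per node
```

In a triangle every node has exactly the same neighborhood, so the three rows of `Z` come out identical. That is the behavior we want from an embedding: structurally identical nodes land in the same place.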

---
---


### Build a GAE for the Pinwheel Network

As in the GCN tutorial, we define the GAE. Let's construct one with two hidden layers and one output layer. Each layer will use a graph convolutional neural network.

In [None]:
#############
## CELL 5  ##
#############
from torch_geometric.nn import GAE
import torch
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class Encoder_With_Three_Layers(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder_With_Three_Layers, self).__init__()
        self.hidden_layer_1 = GCNConv(in_channels, 8 * out_channels)
        self.hidden_layer_2 = GCNConv(8 * out_channels, 2 * out_channels)
        self.output_layer = GCNConv(2 * out_channels, out_channels)
        
    def forward(self, x, edge_index): 
        h1 = self.hidden_layer_1(x, edge_index)
        squashed_h1 = F.relu(h1)    
        h2 = self.hidden_layer_2(squashed_h1, edge_index)
        squashed_h2 = F.relu(h2)    
        z = self.output_layer(squashed_h2, edge_index)
        return z

We can run a _forward pass_ and compute the output for a given input. As in the previous tutorial about GCNs, the first step is instantiating the model, which initializes the weights randomly. The second step is doing all the multiplying and squashing. We'll inspect the output once the pipeline is in place.

__Implementation__

We'll implement this using pytorch geometric. 
The pytorch pipeline for training a neural network has the following stages:
 - _forward_: Run a forward pass of the network, which takes an input and generates the output.
 - _loss_: Compute an error, which is some form of difference between what the network generated and what we wanted to compute. 
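 - _backward_: Propagate the loss back through the network to compute the gradient of the loss with respect to each weight.
 - _optimize_: Tell whatever optimizer is being used (for example, stochastic gradient descent) to update the weights by taking a step against the gradient, which is the direction that decreases the loss.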
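
The four stages can be seen in miniature without any library. The sketch below is a made-up toy problem (one weight, one data point, a hand-derived gradient), not the GAE training loop; PyTorch automates the backward and optimize stages for us.

```python
# Toy problem: learn w so that w * x matches y (the true value is w = 3).
x, y = 2.0, 6.0
w = 0.0                 # initial weight
learning_rate = 0.1

for step in range(50):
    prediction = w * x                    # forward
    loss = (prediction - y) ** 2          # loss
    grad = 2.0 * (prediction - y) * x     # backward: d(loss)/dw, derived by hand
    w -= learning_rate * grad             # optimize: step against the gradient

print(round(w, 4))
```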

Since we are implementing a graph autoencoder, we need to define an encoder and a decoder, both of which will be classes that the pytorch pipeline will use. 


__Encoder__

Let's begin by defining the encoder. This class must define an initializer and what happens on the forward step in the pipeline. The encoder class inherits from the torch.nn.Module parent class, and its initializer function defines the network structure. The basic building block is the GCNConv class, which implements the matrix multiplications 

$$ \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W$$

The forward function says how to use the convolution blocks. Each convolution function has two inputs:
 - the output from the previous layer, which is the feature vector $x$ for the first hidden layer.
 - the adjacency matrix, which is specified as the edge_index

 Let's start with a model that works. We can then do some variations on this model to see when it breaks.

In [None]:
#############
## CELL 6  ##
#############

# parameters
# out_channels is the number of output channels for the final layer
# in_channels is the number of input channels for the first layer
# num_features is the number of features for each node
# in_channels = data.num_node_features
out_channels = 4
num_features = data.num_features

# Instantiate the model
# The model is a GAE (Graph Autoencoder) with three layers
# The Encoder_With_Three_Layers class is used to define the encoder part of the GAE
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_Three_Layers(num_features, out_channels))

# Forward pass
with torch.no_grad():               # Disable gradient tracking for inference
    # Get node embeddings from the model
    # Note: data is a PyG Data object with attributes x (node features) and edge_index (graph connectivity)
    # The model returns a tensor of shape (num_nodes, out_dim)
     # where num_nodes is the number of nodes in the graph and out_dim is the final embedding size
    node_embeddings = model(data.x, data.edge_index)  # Shape: [10, 4]

print(f"The node embeddings has {len(pinwheel_graph)} rows, one for each node in the graph")
print(f"and {out_channels} columns, one for each output channel")
print(f"The node embeddings are:\n{node_embeddings}")
print(f"The node embeddings are of shape {node_embeddings.shape}")

__Node Similarity__

Recall that the goal is to take node $i$, call it $u_i$, and compute a real-valued vector representation of the node, call it ${\mathbf z}_i$. We want the embedding to satisfy the property that two similar nodes end up close to each other in the embedding space. Each ${\mathbf z}_i$ is stored as a row in the `node_embeddings` object, which means that ${\mathbf z}_i\in \mathbb{R}^d$ for the embedding dimension $d=4$. Thus, the embedding of node $0$ is the first row of node_embeddings

In [None]:
#############
## CELL 7  ##
#############

print(f"Node {0}'s embedding is {node_embeddings[0]}")
print(f"Node {1}'s embedding is {node_embeddings[1]}")
print(f"Node {2}'s embedding is {node_embeddings[2]}")

Recall that the process of creating node embeddings is to find a function $f:V\rightarrow\mathbb{R}^d$ such that if two vertices are similar then the distance between their embeddings is small. 

- if ${\rm sim}(u_i, u_j)$ is high then we want ${\mathbf z}_i$ to be near ${\mathbf z}_j$.

For a graph convolutional autoencoder, we need to define what we mean both by _similar_ and by _near_. 

- We'll use adjacency to define _similar_, so two nodes are similar if $A_{ij}=1$.
- We'll use cosine similarity as the metric for _near_, so we want ${\mathbf z}_i{\mathbf z}_j^T$ to be high.

The product ${\mathbf z}_i{\mathbf z}_j^T$ is called the _inner product_, since each embedding vector is a row vector.

Vertices 1 and 2 are similar to each other since they are in the same community and have the same role. Let's look at the inner product.


In [None]:
#############
## CELL 8  ##
#############

print(f"The inner product for nodes 1 and 2 is {torch.dot(node_embeddings[1], node_embeddings[2])}")

Vertices 1 and 4 are not as similar to each other since they are in different communities, so we want their inner product to be lower.

In [None]:
#############
## CELL 9  ##
#############
 
print(f"The inner product for nodes 1 and 4 is {torch.dot(node_embeddings[1], node_embeddings[4])}")

Vertices 1 and 9 are not similar to each other since they are in different communities and have different roles, so we want their inner product to be low.

In [None]:
##############
## CELL 10  ##
##############

print(f"The inner product for nodes 1 and 9 is {torch.dot(node_embeddings[1], node_embeddings[9])}")

The inner product values are probably about the same since the weights in the GCN are random, so that leads us to computing and backpropagating error.


__Sigmoid: Refining Definition of "Near"__

The maximum value of the cosine between two vectors is 1, and the minimum value for the cosine between two vectors is -1. And since we aren't dividing by the length of ${\mathbf z}_i$ and ${\mathbf z}_j$ like we technically have to do if we want the vector product to represent actual cosine, we aren't guaranteed that ${\mathbf z}_i{\mathbf z}_j^T$ will even be between $-1$ and $1$. To fix this, we'll pass this product through the sigmoid function, which means that _near_ is defined as

$$ \sigma({\mathbf z}_i{\mathbf z}_j^T) = \frac{1}{1+e^{-{\mathbf z}_i{\mathbf z}_j^T}}$$

which squashes the values of _near_ so that they are always between $0$ and $1$. In other words, we'll approximate $A_{ij}$ by $\sigma({\mathbf z}_i{\mathbf z}_j^T)$.  The decoder does this computation for us.
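
Here is a small NumPy sketch of the difference between the raw inner product, the true cosine, and the sigmoid-squashed value. The three embedding vectors are invented for illustration and have nothing to do with the pinwheel model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical embeddings: rows are z_0, z_1, z_2.
Z = np.array([[ 2.0,  1.0],
              [ 1.5,  1.0],
              [-2.0, -1.0]])

inner = Z @ Z.T                               # all pairwise inner products z_i z_j^T
norms = np.linalg.norm(Z, axis=1)
cosine = inner / np.outer(norms, norms)       # true cosine, always in [-1, 1]
A_hat = sigmoid(inner)                        # decoder output, always in (0, 1)

print(np.round(inner, 2))
print(np.round(cosine, 2))
print(np.round(A_hat, 2))
```

The inner products fall well outside $[-1, 1]$, while every entry of `A_hat` stays strictly between 0 and 1, which is what lets us compare it with the 0/1 entries of $A$.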

---

#### Learning Pipeline

Let's state what we've done so far clearly and then discuss how to implement this in pytorch.geometric.

- if ${\rm sim}(u_i, u_j)$ is high then $\sigma({\mathbf z}_i{\mathbf z}_j^T)$ should be high (since "near" in vector space means high cosine similarity).

We have to tell three things to pytorch.geometric so that it can learn new weights in the GCN layers.
- how error is defined
- how error is used to update weights
- what has to happen to backpropagate the error to actually update the weights

We'll now discuss each of these pytorch.geometric components, showing code and mathematical definition. Understanding the error requires us to understand what a _decoder_ is, so we'll begin with that definition.

**Decoder**

This tutorial has mentioned the term _decoder_ before, but to better understand how this term applies to graph neural networks  I've copied a portion of Figure 1 from

```
    Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., & Zhang, C. (2018). Adversarially regularized graph autoencoder for graph embedding. arXiv preprint arXiv:1802.04407.
```

<img src="figures/GAE_Architecture.png" alt="Graph autoencoder example" width = "800">

- The leftmost box shows the graph represented as the adjacency matrix $A$ and the feature vectors represented as the matrix $X$. For a PyG data structure called `data`, $A$ and $X$ are stored in `data.edge_index` and `data.x`, respectively.
- The second-to-left box represents the series of graph convolutional layers that transform $A$ and $X$ to the embedding $Z$. 
- I don't know what the middle box is doing and it's not important for our purposes so I won't take time to reread the paper.
- The second-to-right box takes the inner product ${\mathbf z}_i{\mathbf z}_j^T$ for all nodes, which can be done for all node embeddings all at once by just doing the product of the embedding matrix $Z$ with its transpose. The product is then squashed through the sigmoid function.
- The rightmost box shows a new graph and its adjacency matrix, denoted by $A'$. I'm going to denote this new adjacency matrix by $\hat{A}$ because I like using the little hat to denote an approximation of something. The rightmost box is saying that we want to use the embedding stored in $Z$ to produce an approximation of the original adjacency matrix. 

The second-to-right and rightmost boxes _decode_ the embedding, which means that they take the low dimensional representation and try to turn it back into the original adjacency matrix.

Let's look at the model we've built and see if it matches the pieces of the figure.

In [None]:
###############
## Cell 10.5 ##
###############

print(model)

The encoder has the two hidden layers and the one output layer. The decoder maps the embedding to $\hat{A}$. It is called `InnerProductDecoder` because it computes the inner product $ZZ^T$. Notice that we didn't explicitly define the decoder anyplace when we defined our model. Rather, when we constructed the model using 

`model = GAE(Encoder_With_Three_Layers(num_features, out_channels))`

we implicitly defined the decoder, since PyG's `GAE` class falls back to an `InnerProductDecoder` when no decoder is passed in.

Before defining an error function, let's take a look at $ZZ^T$ and at $\sigma(ZZ^T)$ to see how close $\hat{A}$ is to $A$.

In [None]:
#############
## Cell 11 ##
#############

# Compute and inspect product of node embeddings
zzT = node_embeddings@node_embeddings.T
print(f"ZZ^T has {zzT.shape[0]} rows and {zzT.shape[1]} columns")
print(f"ZZ^T = \n{zzT}")
print(f"Upper left corner of ZZ^T = \n{zzT[0:3,0:3]}")  # print first three rows and columns

In [None]:
#############
## Cell 12 ##
#############

import torch

# Pass through sigmoid and inspect
A_pred = torch.sigmoid(zzT)

print(f"A_pred = \n{A_pred}")
print(f"Upper left corner of A_pred = \n{A_pred[0:3,0:3]}")  # print first three rows and columns

Let's compare this to the actual adjacency matrix. $\hat{A}$ shouldn't be very close to $A$ since all we've done is convolve the one-hot feature vector through a couple of graph convolutional layers with random weights.

In [None]:
#############
## Cell 13 ##
#############

from scipy.sparse import csr_matrix
A_sparse: csr_matrix = nx.adjacency_matrix(pinwheel_graph)
A: np.ndarray = A_sparse.toarray()  # convert sparse matrix to a dense numpy array

print(f"A = \n{A}")
print(f"Upper left corner of A = \n{A[0:3,0:3]}")  # print first three rows and columns

Yep, not close. We want to use the difference between the actual adjacency matrix and the estimate of the adjacency matrix to update the weights in the GCNs. In other words, we want to use the difference between the actual adjacency matrix and the _reconstructed_ adjacency matrix as an error and then backpropagate that error to update the weights of the GCN.

**Error = Reconstruction Loss**

Now that we understand how to define the encoder and that the decoder is defined implicitly as $\sigma(ZZ^T)$ we can talk about how to tell pytorch.geometric to use the differences between $A$ and $\hat{A}$ to update the weights. 

When we constructed the model by instantiating a GAE, we inherited an error function.  The error function is called `recon_loss` because we are computing the difference between the matrix $A$ and the matrix reconstructed from the embedding through the inner product, which we denoted by $\hat{A}$.

In [None]:
print(model.recon_loss)

I like the way ChatGPT-4o described reconstruction loss, so I've copied and pasted the discussion here. I'll intersperse Chat's response with a few observations and examples from the pinwheel network. Note that I've corrected some typos in Chat's response and changed the mathematical notation so that it matches what we've been using. Additionally, I've expounded on Chat's response inline without identifying the precise changes. (I don't know if that is good or bad practice.)

**Q: How is `model.recon_loss` defined mathematically?**

**A:**  
In the context of a **Graph Autoencoder (GAE)** in PyTorch Geometric, `model.recon_loss(z, edge_index)` computes the **reconstruction loss** based on how well the model predicts the presence or absence of edges from the embeddings matrix $Z$ which is represented as the argument `z`.

**Mathematical Definition of `recon_loss`**

Given:
- $ Z \in \mathbb{R}^{n \times d} $: matrix of node embeddings
- `edge_index`: observed edges in the graph, which is equivalent to the information stored in the adjacency matrix $A$ but which matches the information in the PyG data structure `data.edge_index`
- $ \hat{A} = \sigma(Z Z^T) $: predicted adjacency probabilities (via inner product decoder)

The _reconstruction loss_ is:

$$
\mathcal{L}_{\text{recon}} = - \sum_{(i, j) \in E} \log \sigma({\mathbf z}_i {\mathbf z}_j^T) - \sum_{(i, j) \notin E} \log(1 - \sigma({\mathbf z}_i {\mathbf z}_j^T))
$$


You should recognize the inner product ${\mathbf z}_i{\mathbf z}_j^T$ and the sigmoid operator $\sigma$.

The $\log$ function might seem weird if you haven't worked with neural networks before. It's beyond the scope of the class to talk about why it is used, but I'll say two things.
- Ask your favorite LLM "in the mathematical definition of pytorch.geometric reconstruction loss, why is the negative log used"
- We're using cosine similarity as the metric for _near_, so for pairs of nodes that share an edge we want ${\mathbf z}_i{\mathbf z}_j^T$ to be high. Since the sigmoid squashes the values to be between 0 and 1, the ideal case has $\sigma({\mathbf z}_i{\mathbf z}_j^T) = 1$. The $\log$ of 1 is zero. The log of anything strictly between 0 and 1 is negative. Notice the minus sign to the left of each summation. When 

$$ 0 < \sigma({\mathbf z}_i{\mathbf z}_j^T) < 1 $$

the negative of $\log(\sigma({\mathbf z}_i{\mathbf z}_j^T))$ is positive, making the loss big. We want to minimize this loss function, which is equivalent to trying to force $\sigma({\mathbf z}_i{\mathbf z}_j^T) = 1$.

Notice how there are two summations:
- Positive examples: real edges in the graph
- Negative examples: randomly sampled node pairs with no edge

For large graphs, there are too many vertex pairs to train on, and for large graphs that are sparse (i.e., those that have few edges relative to the number of vertex pairs) we can swamp what can be learned about real edges with what is learned when there aren't edges. Thus, we don't use all vertex pairs that don't have edges, but rather sample them.

FYI, the function is called _binary cross-entropy loss_. I'm only pointing that out so that those who are familiar with neural networks can make a connection to other loss functions.
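
The loss can be sketched in a few lines of NumPy. This is an illustration rather than PyG's code: the function name `recon_loss_sketch` and the toy embeddings are made up, and I average over the pairs instead of summing, which only changes the scale of the loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recon_loss_sketch(Z, pos_pairs, neg_pairs, eps=1e-15):
    """Binary cross-entropy over positive pairs (edges) and negative pairs (non-edges)."""
    pos = np.array([Z[i] @ Z[j] for i, j in pos_pairs])
    neg = np.array([Z[i] @ Z[j] for i, j in neg_pairs])
    pos_loss = -np.log(sigmoid(pos) + eps).mean()        # wants sigma near 1 on edges
    neg_loss = -np.log(1.0 - sigmoid(neg) + eps).mean()  # wants sigma near 0 on non-edges
    return pos_loss + neg_loss

# Hypothetical embeddings: nodes 0 and 1 point the same way, node 2 the opposite way.
Z = np.array([[ 2.0, 0.0],
              [ 2.0, 0.0],
              [-2.0, 0.0]])

good = recon_loss_sketch(Z, pos_pairs=[(0, 1)], neg_pairs=[(0, 2)])
bad = recon_loss_sketch(Z, pos_pairs=[(0, 2)], neg_pairs=[(0, 1)])
print(good < bad)
```

The loss is small when the embedding agrees with the edge labels (`good`) and large when the labels are swapped (`bad`), which is exactly the signal the optimizer needs.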

Before continuing, let's look at what would happen if we only used positive examples and then what would happen if we only used negative examples. 

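- If we only use _positive examples_ then the easiest way to minimize the loss is to give every node the same embedding, for example by setting all values of $Z$ to 1's. When passed through the inner product and then the sigmoid, every entry of $\hat{A}$ is close to 1. It is guaranteed to predict an edge for every edge in the graph, but it also predicts an edge for every pair that isn't connected.
- If we only use _negative examples_ then the easiest way to minimize the loss is to push ${\mathbf z}_i{\mathbf z}_j^T$ as far below zero as possible for every sampled pair. That drives $\sigma({\mathbf z}_i{\mathbf z}_j^T)$ toward 0, so $1-\sigma({\mathbf z}_i{\mathbf z}_j^T)$ is close to 1 and its log is close to 0, but nothing encourages the real edges to be predicted.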

Thus, using only positive or only negative examples produces lousy estimates.
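
The positive-only failure is easy to reproduce numerically. The sketch below uses a made-up graph with four nodes and two edges, and gives every node the same all-ones embedding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n, d = 4, 3
edges = [(0, 1), (2, 3)]                 # hypothetical graph with only two edges

Z = np.ones((n, d))                      # every node gets the same embedding
A_hat = sigmoid(Z @ Z.T)                 # every entry equals sigmoid(d)

pos_only_loss = -np.mean([np.log(A_hat[i, j]) for i, j in edges])
predicted_edges = int((A_hat > 0.5).sum())

print(round(float(pos_only_loss), 3))    # small, so the optimizer is "happy"
print(predicted_edges)                   # but all n * n pairs are predicted as edges
```

The positive-only loss is already small, yet the reconstruction claims that every pair of nodes is connected. The negative examples are what penalize that answer.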

Let's write a function that returns negative edges. We'll need this function since the PyG Data structure only contains positive edges in `data.edge_index`.

In [None]:
#############
## Cell 14 ##
#############

from typing import Iterable, Tuple, Hashable

def generate_negative_samples(graph_edges: Iterable[Tuple[Hashable, Hashable]], 
                              num_negative_samples: int
                              ) -> Tensor:
    # Code from ChatGPT from prompt "how do i generate negative edge samples in pytorch"
    negative_samples = set()
    nodes = set(node for edge in graph_edges for node in edge)
    
    while len(negative_samples) < num_negative_samples:
        # Randomly sample two nodes
        node1, node2 = torch.randint(0, len(nodes), (2,))
        if (node1.item(), node2.item()) not in graph_edges and (node2.item(), node1.item()) not in graph_edges:
            negative_samples.add((node1.item(), node2.item()))
    # Index tensors should hold 64-bit integers, so build them with dtype=torch.long
    row = torch.tensor([pair[0] for pair in negative_samples], dtype=torch.long)
    col = torch.tensor([pair[1] for pair in negative_samples], dtype=torch.long)
    
    return torch.stack([row,col], dim=0)


Let's call this function on the pinwheel graph. You can confirm that none of the edges it returns are in the graph. The function returns two tensors stacked on top of each other. The elements of the first tensor, row 0, are one end of an edge, and the elements of the second tensor, row 1, are the other.
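
If you'd rather confirm this programmatically than by eye, a small helper along the following lines would do it. The name `all_pairs_are_non_edges` and the triangle example are made up for illustration.

```python
def all_pairs_are_non_edges(sampled_pairs, graph_edges):
    """Return True if no sampled pair is an edge, checking both orientations."""
    existing = {frozenset(edge) for edge in graph_edges}
    return all(frozenset(pair) not in existing for pair in sampled_pairs)

# Hypothetical triangle graph on nodes 0, 1, 2 plus an isolated node 3.
triangle_edges = [(0, 1), (0, 2), (1, 2)]
print(all_pairs_are_non_edges([(0, 3), (3, 1)], triangle_edges))   # True
print(all_pairs_are_non_edges([(2, 0), (3, 1)], triangle_edges))   # False
```

For the pinwheel graph, the sampled pairs can be formed by zipping row 0 with row 1 of the returned samples, with `pinwheel_graph.edges` passed as the second argument.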

In [None]:
#############
## Cell 15 ##
#############

num_neg_samples = data.edge_index.size(1)
negative_samples = generate_negative_samples(pinwheel_graph.edges,num_neg_samples)
negative_samples = torch.Tensor.tolist(negative_samples)
for i in range(num_neg_samples):
    print(f"Edge {i} not in graph is {negative_samples[0][i],negative_samples[1][i]}")

---

__Finishing the pipeline__

We know what the error function should be, so let's tell PyTorch how to use it. We won't discuss the subtleties of the various ways of quickly minimizing error and will instead use a standard optimizer called `Adam`. The Adam optimizer uses the gradients of the loss to update the weights efficiently, following standard machine learning practice. It needs to know the parameters of the model, so we have to tell it what they are. It also needs a learning rate, which controls how large a step it takes each time it tries to reduce the error. 
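
For the curious, here is a rough sketch of what a single Adam update does for one scalar weight. It follows the textbook update rule with commonly used default values for `beta1`, `beta2`, and `eps`; it is not the `torch.optim.Adam` source code, and the quadratic being minimized is a made-up toy problem.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize (w - 3)^2 starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
print(round(w, 3))
```

Because the gradient is divided by a running estimate of its own magnitude, the early steps have a size of roughly `lr` regardless of how large the raw gradient is.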

In [None]:
#############
## Cell 16 ##
#############

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

---

**Training the GAE**

We now create a training function that sets up the learning pipeline. The training function has a lot of possible variations, but we'll just use the bare minimum: encode (which calls the forward function from the encoder), compute loss, propagate the loss backward, and tell the optimizer to update weights. We pass the negative samples in explicitly.

In [None]:
#############
## CELL 17 ##
#############

def train(data: PyGData, 
          model: GAE,
          negative_samples: Tensor,
          optimizer: torch.optim.Optimizer
          ) -> Tuple[Tensor, Tensor]:
    optimizer.zero_grad()  # Clear gradients.
    z = model.encode(data.x, data.edge_index)  # Perform a single forward pass.
    loss = model.recon_loss(z, data.edge_index, negative_samples) # Compute loss using negative sample
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss, z # Return the scalar loss and the node embeddings

Let's look at the structure.
- Clear the gradients in the model so that you start with clean information
- Get the node embedding by calling the encoder
- Compute the loss by calling the reconstruction loss function, passing in the graph edges and some negative examples
- Compute the gradients
- Adjust the weights of the GCNs a tiny amount so that they decrease the loss

We'll return both the loss value and the embedding. Returning the loss allows us to see it decrease over time. Returning the embedding allows us to see how it changes over time.

---

#### Training

Now, we'll be a little sloppy. Rather than training the neural network on a subset of our data and then testing it on a different subset, repeating until the testing shows that we've learned well, we'll just run a bunch of training steps.

Let's increase the size of the embedding. Otherwise, we run into problems.

In [None]:
out_channels = 10

model = GAE(Encoder_With_Three_Layers(num_features, out_channels))
print(model)

In [None]:
#############
## CELL 18 ##
#############

# Get some negative samples
num_neg_samples = data.edge_index.size(1)
negative_samples = generate_negative_samples(pinwheel_graph.edges,num_neg_samples)
negative_samples = negative_samples.long()  # keep it a tensor of 64-bit indices for recon_loss

# Set the number of baby steps we take
epochs = 200

# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
for epoch in range(1, epochs + 1):
    loss, z = train(data, model, negative_samples, optimizer)
    # Every 10 epochs, report the loss and plot the first two embedding
    # dimensions to see how the embedding changes over time.
    if epoch % 10 == 0:
        print(f"Loss in epoch {epoch} is {loss.item()}")
        z = z.detach().numpy()
        plt.scatter(z[:, 0], z[:, 1], s=70)

Notice how the loss decreases and how the embeddings move.

__Did it work?__

Let's see how well it did reconstructing the adjacency matrix. We'll define a function that reconstructs an approximation of the adjacency matrix from the encoding.

In [None]:
#############
## CELL 19 ##
#############

from scipy.special import expit # Sigmoid function that can operate on **numpy objects**
def get_A_hat(model):
    with torch.no_grad():               # Disable gradient tracking for inference
        Z = model.encode(data.x, data.edge_index)
    Z = Z.detach().numpy()
    A_hat = Z@np.transpose(Z)
    A_hat = expit(A_hat)
    return A_hat
print(np.round(get_A_hat(model),1))

Compare to adjacency matrix

In [None]:
#############
## CELL 20 ##
#############
print(nx.linalg.adjacency_matrix(pinwheel_graph).toarray())

If training went well, the two matrices are nearly identical once the entries of $\hat{A}$ are rounded. This means that with just two hidden layers we were able to find an encoding of the network that was essentially "lossless", meaning that we could reconstruct the adjacency matrix from the encoding.
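
Rather than comparing the two printouts by eye, we could count the entries that disagree after thresholding. The helper below is a sketch with a made-up name (`count_mismatches`) and a tiny invented example; the 0.5 threshold is an assumption that matches rounding a sigmoid output.

```python
import numpy as np

def count_mismatches(A_hat, A, threshold=0.5):
    """Count entries where thresholding A_hat disagrees with the 0/1 matrix A."""
    predicted = (np.asarray(A_hat) >= threshold).astype(int)
    return int((predicted != np.asarray(A)).sum())

# Tiny made-up example: one entry of A_hat lands on the wrong side of 0.5.
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
A_hat = np.array([[0.9, 0.8, 0.1],
                  [0.8, 0.9, 0.6],
                  [0.1, 0.2, 0.9]])
print(count_mismatches(A_hat, A))
```

Calling it with the output of `get_A_hat(model)` and the dense pinwheel adjacency matrix would return 0 when the reconstruction is exact.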

We can use the encoding to cluster similar nodes. Let's see how well this works, and then we'll break the model.

__See how well the embedding clustered__

We're using more than two dimensions for the embedding, so we'll use t-SNE (`TSNE` in scikit-learn) to compress it to two dimensions before visualizing.

In [None]:
#############
## CELL 21 ##
#############

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(Z):
    z = TSNE(n_components=2, perplexity = 5).fit_transform(Z)
    plt.scatter(z[:, 0], z[:, 1], s=70)

Z = model.encode(data.x, data.edge_index).detach().numpy()
visualize(Z)


This shows a pretty good visual separation between node embeddings. Let's cluster on the original encoding to see which nodes belong together.

Cluster and show nodes by cluster color. Try 7 clusters first.

In [None]:
#############
## CELL 22 ##
#############

from sklearn.cluster import KMeans
kmeans = KMeans(
    init="random",
    n_clusters=7,
    n_init=10,
    random_state=1234
    )
kmeans.fit(Z)

nx.draw_networkx(pinwheel_graph, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

Now try with four clusters.

In [None]:
#############
## CELL 23 ##
#############

kmeans = KMeans(
    init="random",
    n_clusters=4,
    n_init=10,
    random_state=1234
    )
kmeans.fit(Z)
#print(kmeans.labels_)
plt.figure()
nx.draw_networkx(pinwheel_graph, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')
_ = plt.axis('off')

The encoding produced by the graph autoencoder did a good job of keeping structurally similar nodes together for both 7 clusters and 4 clusters. That's encouraging. 

---

Let's now change our model and see where it breaks. 

__Repeat but with Only an Output Layer__

Let's see what happens when we reconstruct $\hat{A}$ and what happens when we try to partition the network.

In [None]:
class Encoder_With_One_Layer(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder_With_One_Layer, self).__init__()
        self.output_layer = GCNConv(in_channels, out_channels)
    def forward(self, x, edge_index):
        z = self.output_layer(x, edge_index)
        return z

# Delete previous model and instantiate a new one
if 'model' in globals(): 
    del model

model = GAE(Encoder_With_One_Layer(num_features, out_channels))
print(model)


Nothing else needs to change. Let's train the model and inspect $\hat{A}$.

In [None]:
# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    ## Uncomment if you'd like to see how the embedding changes over time.
    if epoch%10==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")
    

In [None]:
print(np.round(get_A_hat(model),1))

$\hat{A}$ isn't as close to $A$.  Why?
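
One thing that changes with depth is how far information travels: each graph convolution layer mixes a node's features with those of its immediate neighbors, so $k$ layers can draw on nodes up to $k$ hops away. The sketch below counts which node pairs are within $k$ hops of each other. It assumes the pinwheel graph has the same edge list as the one used in the node-features section of this notebook, and it is only a hint at an answer, not a full explanation.

```python
import numpy as np

# Pinwheel edges (assumed to match the graph used in this notebook)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (0, 9), (3, 9), (6, 9)]
n = 10
A = np.zeros((n, n), dtype=int)
for u, v in edges:
    A[u, v] = A[v, u] = 1
A_self = A + np.eye(n, dtype=int)   # adjacency with self loops

def reachable_within(A_self, k):
    """Boolean matrix whose (i, j) entry is True if j is within k hops of i."""
    return np.linalg.matrix_power(A_self, k) > 0

for k in (1, 2, 3, 4):
    frac = reachable_within(A_self, k).mean()
    print(f"{k} layer(s): {frac:.0%} of node pairs are within reach")
```

With one layer, a node such as node 1 only hears from its own triangle; it takes several layers before nodes in different arms of the pinwheel can influence each other's embeddings.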


__Try One Hidden Layer and an Output Layer__

In [None]:
class Encoder_With_Two_Layers(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder_With_Two_Layers, self).__init__()
        self.hidden_layer_1 = GCNConv(in_channels, 4 * out_channels)
        self.output_layer = GCNConv(4 * out_channels, out_channels)
        
    def forward(self, x, edge_index):
        h1 = self.hidden_layer_1(x, edge_index)
        squashed_h1 = F.relu(h1)     
        z = self.output_layer(squashed_h1, edge_index)
        return z
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_Two_Layers(num_features, out_channels))
print(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epochs = 200
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    if epoch%10==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")


In [None]:
print(np.round(get_A_hat(model),2))

__Changing the Dimension of the Embedding__

Let's see what happens when we have one layer but try to make the embedding smaller. 

In [None]:
out_channels = 2
num_features = data.num_features
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_One_Layer(num_features, out_channels))
print(model)

# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    ## Uncomment if you'd like to see how the embedding changes over time.
    if epoch%20==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")

In [None]:
print(np.round(get_A_hat(model),2))
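
Part of what a small embedding gives up can be seen from linear algebra. The function `get_A_hat` computes $\sigma(ZZ^T)$, and the matrix $ZZ^T$ inside the sigmoid has rank at most the embedding dimension. The sketch below checks this with a random stand-in for $Z$ (the random values are an assumption for illustration, not a trained embedding). Treat it as a heuristic about how much freedom the decoder has, not as a proof that reconstruction must fail.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 10, 2
Z = rng.normal(size=(num_nodes, dim))   # stand-in for a learned 2-D embedding
logits = Z @ Z.T                        # what the decoder feeds to the sigmoid
rank = np.linalg.matrix_rank(logits)
print(f"logits shape: {logits.shape}, rank: {rank}")
```

A 10 x 10 matrix of scores built from two numbers per node has far fewer degrees of freedom than a general 10 x 10 matrix, so some structure has to be shared across nodes.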

If we are clever we can see a structure. What if we train longer?


In [None]:
out_channels = 2
num_features = data.num_features
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_One_Layer(num_features, out_channels))
print(model)

# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
epochs = 600  # train three times longer than before
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    ## Uncomment if you'd like to see how the embedding changes over time.
    if epoch%20==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")

How'd we do?

In [None]:
print(np.round(get_A_hat(model),2))

What if the embedding has three dimensions?


In [None]:
out_channels = 3
num_features = data.num_features
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_One_Layer(num_features, out_channels))
print(model)

# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    ## Uncomment if you'd like to see how the embedding changes over time.
    if epoch%20==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")

In [None]:
print(np.round(get_A_hat(model),2))


What about three hidden layers?

In [None]:
out_channels = 3
num_features = data.num_features
if 'model' in globals(): 
    del model
model = GAE(Encoder_With_Three_Layers(num_features, out_channels))
print(model)
# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
for epoch in range(1, epochs):
    loss, z = train(data, model, negative_samples, optimizer)
    ## Uncomment if you'd like to see how the embedding changes over time.
    if epoch%10==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")

In [None]:
print(np.round(get_A_hat(model),2))

---
---

### Using Node Features

When we introduced GCNs, we created a set of node features such that nodes in the same community were correlated. Let's see what happens if we apply a GAE when we use node features instead of one-hot encoding. Since node features are correlated, we'd expect performance to improve.

In [None]:
pinwheel_graph: nx.Graph = nx.Graph()
for i in range(0,10):  # give each node a 3-component feature; the nonzero component depends on the node's group
    if i in {1,2,4,5,7,8}: 
        pinwheel_graph.add_node(i, feature = [i+1,0.0,0.0])
    elif i in {0,3,6}: 
        pinwheel_graph.add_node(i, feature = [0.0,i+1,0.0])
    else: 
        pinwheel_graph.add_node(i, feature = [0.0,0.0,i+1])
pinwheel_graph.add_edges_from([(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(6,7),(6,8),(7,8),(0,9),(3,9),(6,9)])
pinwheel_graph.add_edges_from([(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)])
data: PyGData = from_networkx(pinwheel_graph)      # convert to pytorch data structure
print(f"Pinwheel graph as pytorch data structure is\n\t {data}")
print(data.feature)

Don't forget to add self loops.

In [None]:
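from torch_geometric.utils import add_remaining_self_loops
# The graph above already has explicit self loops (added with add_edges_from),
# so only add the ones that are missing to avoid duplicating them.
data.edge_index, _ = add_remaining_self_loops(data.edge_index, num_nodes=data.num_nodes)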

In [None]:
class Encoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder, self).__init__()
        self.hidden_layer_1 = GCNConv(in_channels, 4 * out_channels)
        self.hidden_layer_2 = GCNConv(4 * out_channels, 2 * out_channels)        
        self.output = GCNConv(2 * out_channels, out_channels)
    def forward(self, x, edge_index):
        h1 = self.hidden_layer_1(x, edge_index)
        squashed_h1 = F.relu(h1)
        h2 = self.hidden_layer_2(squashed_h1, edge_index)
        squashed_h2 = F.relu(h2)
        z = self.output(squashed_h2, edge_index)
        return z
# parameters
out_channels = 10
num_features = torch.Tensor.size(data.feature)[1]
print(f"There are {num_features} features")
if 'model' in globals(): 
    del model
model = GAE(Encoder(num_features, out_channels))
print(model)


In [None]:
def train_with_features(data: PyGData, 
          model: GAE,
          negative_samples: Tensor,
          optimizer: torch.optim.Optimizer
          ) -> Tuple[Tensor, Tensor]:
    optimizer.zero_grad()  # Clear gradients.
    z = model.encode(data.feature, data.edge_index)  # Perform a single forward pass.
    loss = model.recon_loss(z, data.edge_index, negative_samples)
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss, z

# The optimizer has to know the model parameters, so we pass the model parameters to it
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the model
# Loop over the number of epochs
# In each epoch, we compute the loss and update the model parameters
# We also compute the node embeddings
epochs = 200
for epoch in range(1, epochs):
    loss, z = train_with_features(data, model, negative_samples, optimizer)
    if epoch%10==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")
        

Inspect the embedding.

In [None]:
with torch.no_grad():               # Disable gradient tracking for inference
    Z = model.encode(data.feature, data.edge_index).detach().numpy()
visualize(Z)

Inspect the approximation to the adjacency matrix. We have to redefine the function since this encoder reads the node features from data.feature instead of data.x.

In [None]:
def get_A_hat_feature(model):
    with torch.no_grad():               # Disable gradient tracking for inference
        Z = model.encode(data.feature, data.edge_index)
    Z = Z.detach().numpy()
    A_hat = Z@np.transpose(Z)
    A_hat = expit(A_hat)
    return A_hat

try:
    A_hat = get_A_hat_feature(model)
    print(np.round(A_hat,1))
except Exception as e:
    print(f"get_A_hat_feature(model) failed. Error: {e}")

Create utility for gathering clusters.

In [None]:
from sklearn.cluster import KMeans
def get_clusters(Z, num_clusters = 2) -> KMeans:
    kmeans = KMeans(
        init="random",
        n_clusters= num_clusters,
        n_init=10,
        random_state=1234
        )
    return kmeans.fit(Z)


Cluster nodes and inspect. First, 7 clusters.

In [None]:
kmeans = get_clusters(Z, num_clusters = 7)
nx.draw_networkx(pinwheel_graph, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

And with fewer clusters

In [None]:
kmeans = get_clusters(Z, num_clusters = 4)
#print(kmeans.labels_)
nx.draw_networkx(pinwheel_graph, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(pinwheel_graph,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

This does well. What happens if we change to a single layer given the new feature set?

In [None]:
class Encoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder, self).__init__()
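        # This is the only layer. Note that it outputs 4 * out_channels values per node,
        # so the embedding has 12 dimensions when out_channels = 3.
        self.hidden_layer_1 = GCNConv(in_channels, 4 * out_channels)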
    def forward(self, x, edge_index):
        z = self.hidden_layer_1(x, edge_index)
        return z
# parameters
out_channels = 3
num_features = torch.Tensor.size(data.feature)[1]
print(f"There are {num_features} features")
if 'model' in globals(): 
    del model
model = GAE(Encoder(num_features, out_channels))
print(model)

In [None]:
epochs = 200
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(1, epochs):
    loss, z = train_with_features(data, model, negative_samples, optimizer)
    if epoch%20==0:
        print(f"Loss in epoch {epoch} is {loss.item()}")   

with torch.no_grad():               # Disable gradient tracking for inference
    Z = model.encode(data.feature, data.edge_index).detach().numpy() 
visualize(Z)


In [None]:
try:
    A_hat = get_A_hat_feature(model)
    print(np.round(A_hat,1))
except Exception as e:
    print(f"get_A_hat_feature(model) failed. Error: {e}")

---

### Graph Autoencoder for the Karate Graph

Let's see what kind of clustering we get on the karate club graph.

In [None]:
from torch_geometric.datasets import KarateClub

dataset = KarateClub()
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')

This dataset only has one graph, so let's use the data from that graph. We'll begin by inspecting the data object.

In [None]:
data: PyGData = dataset[0]
print(data)

Each data object has x, edge_index, and y. Some have other information, such as masks.  
- x contains _node features_. It has one row per node and one column per feature. The Cora dataset, which we'll explore later, has 2708 nodes, each with a feature vector with 1433 components. The karate graph has 34 nodes and 34 features, discussed below.
- y contains _node labels_. If our goal is to classify nodes then it helps to have some of the nodes labeled.
- edge_index has _graph connectivity_. This is essentially the edge set of the graph. It has two rows and one column per edge: each column holds one ordered pair (source node, target node). Each edge is directed, which means that an undirected edge $\{u,v\}$ appears twice in the edge_index: once as $(u,v)$ and again as $(v,u)$.
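
The edge_index layout described above can be pictured with a tiny example. This sketch uses a NumPy array as a stand-in for the PyTorch tensor, and the two-edge path graph is made up for illustration.

```python
import numpy as np

# A path graph 0 - 1 - 2 has two undirected edges
undirected_edges = [(0, 1), (1, 2)]

# Store each undirected edge in both directions
directed = undirected_edges + [(v, u) for (u, v) in undirected_edges]

# Two rows (source, target), one column per directed edge
edge_index = np.array(directed).T
print(edge_index.shape)
print(edge_index)
```

Two undirected edges become four columns, which is why the number of columns in edge_index is twice the number of undirected edges (before any self loops are added).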



Add self loops

In [None]:
from torch_geometric.utils import add_self_loops

data.edge_index, _ = add_self_loops(data.edge_index, num_nodes=data.num_nodes)

We'll use the one-hot encoding found in the karate club data.

In [None]:
for node in [0,1,2,32,33]:
    print(f'Features 0-9 of node {node}: \t{data.x[node][0:10]}')
    #print(f'Features 10-19 of node {node}: \t{data.x[node][10:20]}')
    #print(f'Features 20-29 of node {node}: \t{data.x[node][20:30]}')
    print(f'Features 30-33 of node {node}: \t{data.x[node][30:34]}')
    print('-----------------------')

Let's build a graph autoencoder for this graph. We'll ignore the class labels on the nodes. We'll use those in the next tutorial.

In [None]:
#############
## Cell 11 ##
#############

class Encoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder, self).__init__()
        self.hidden_layer_1 = GCNConv(in_channels, 8 * out_channels)
        self.hidden_layer_2 = GCNConv(8 * out_channels, 4 * out_channels)        
        self.output = GCNConv(4 * out_channels, out_channels)
    def forward(self, x, edge_index):
        h1 = self.hidden_layer_1(x, edge_index)
        h1 = F.relu(h1)
        h2 = self.hidden_layer_2(h1, edge_index)
        h2 = F.relu(h2)
        z = self.output(h2, edge_index)
        return z

In [None]:
# parameters
out_channels = 10
num_features = torch.Tensor.size(data.x)[1]
print(num_features)
if 'model' in globals(): 
    del model
model = GAE(Encoder(num_features, out_channels))
print(model)

In [None]:
# Generate negative samples using existing utility
# Match the number of negative samples to the number of positive samples.

from torch_geometric.utils import negative_sampling

# Assume data.edge_index contains your positive edges
neg_edge_index = negative_sampling(
    edge_index=data.edge_index,       # existing edges
    num_nodes=data.num_nodes,         # number of nodes
    num_neg_samples=data.edge_index.size(1)  # match # of positive samples
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train(data: PyGData, 
          model: GAE,
          negative_samples: Tensor,
          optimizer: torch.optim.Optimizer
          ) -> Tuple[Tensor, Tensor]:
    optimizer.zero_grad()  # Clear gradients.
    z = model.encode(data.x, data.edge_index)  # Perform a single forward pass.
    loss = model.recon_loss(z, data.edge_index, negative_samples)
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss, z

epochs = 1000
for epoch in range(1, epochs):
    loss, z = train(data, model, neg_edge_index, optimizer)
    if epoch%20 == 0:
        print(f"Loss in epoch {epoch} is {loss.item()}")
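
The cell above leans on a library utility to pick negative samples. As a rough picture of the idea, here's a pure-Python sketch that draws node pairs that are not edges. The function name `sample_negative_edges` is hypothetical, and excluding self loops is a choice made in this sketch; the library utility may handle these details differently.

```python
import random

def sample_negative_edges(num_nodes, positive_edges, num_samples, seed=0):
    """Sample ordered node pairs (u, v), u != v, that are not edges in either direction."""
    rng = random.Random(seed)
    existing = set(positive_edges) | {(v, u) for (u, v) in positive_edges}
    # Enumerating all candidates is fine for small graphs like the ones in this notebook
    candidates = [(u, v)
                  for u in range(num_nodes)
                  for v in range(num_nodes)
                  if u != v and (u, v) not in existing]
    return rng.sample(candidates, min(num_samples, len(candidates)))

# Made-up path graph 0 - 1 - 2 - 3
pos = [(0, 1), (1, 2), (2, 3)]
neg = sample_negative_edges(num_nodes=4, positive_edges=pos, num_samples=len(pos))
print(neg)
```

Matching the number of negative samples to the number of positive samples, as the cell above does, keeps the two halves of the loss balanced.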

Inspect the embedding

In [None]:
model.eval()
#with torch.no_grad():
Z = model.encode(data.x, data.edge_index).detach().numpy()
visualize(Z)

In [None]:
kmeans = get_clusters(Z, num_clusters = 2)

G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

And if we look at more clusters?

In [None]:
kmeans = get_clusters(Z, num_clusters = 4)

G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

This is very similar to what we've seen before. What happens if we change to an embedding dimension of 2?

In [None]:
out_channels = 2
num_features = torch.Tensor.size(data.x)[1]
print(num_features)
if 'model' in globals(): del model
model = GAE(Encoder(num_features, out_channels))
print(model)

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epochs = 1000
for epoch in range(1, epochs):
    loss, z = train(data, model, neg_edge_index, optimizer)  # train returns (loss, embedding)
model.eval()
with torch.no_grad():
    Z = model.encode(data.x, data.edge_index).detach().numpy()
visualize(Z)

In [None]:
kmeans = get_clusters(Z, num_clusters = 2)

G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

In [None]:
kmeans = get_clusters(Z, num_clusters = 4)

#print(kmeans.labels_)
G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

What happens with the bare-bones autoencoder?

In [None]:
class Encoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Encoder, self).__init__()
        self.output = GCNConv(in_channels, out_channels)
    def forward(self, x, edge_index):
        z = self.output(x, edge_index)
        return z
out_channels = 3
num_features = torch.Tensor.size(data.x)[1]
print(num_features)
if 'model' in globals(): del model
model = GAE(Encoder(num_features, out_channels))
print(model)

In [None]:
epochs = 1000

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(1, epochs):
    loss, z = train(data, model, neg_edge_index, optimizer)  # train returns (loss, embedding)
model.eval()
with torch.no_grad():
    Z = model.encode(data.x, data.edge_index).detach().numpy()

visualize(Z)

In [None]:
kmeans = get_clusters(Z, num_clusters = 2)

G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

In [None]:
kmeans = get_clusters(Z, num_clusters = 6)

G = nx.karate_club_graph()
nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G,prog="neato"),
                 node_color=kmeans.labels_,
                 node_size=800,
                 cmap='cool',
                 font_size=14,
                 font_color='white')

---
---
