# Graph Auto-Encoder (GAE)

This file is for design purposes: I will first try the GAE with one dataset and one way of representing it, and then move on to other datasets and other representations. It follows https://github.com/pyg-team/pytorch_geometric/blob/master/examples/autoencoder.py as closely as possible.

**Note: I'm working with sparse matrices wherever possible, because they are far more memory-efficient than dense ones. If a step turns out not to support them, I will switch back to dense matrices.**
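To put a number on the memory argument, here is a rough comparison for a one-hot (identity) feature matrix with 2835 nodes, the size of the AIFB+ graph loaded in this notebook. It counts only raw tensor storage (int64 indices plus float32 values for COO, float32 entries for dense) and ignores Python object overhead, so treat it as an estimate.

```python
import torch

n = 2835  # number of nodes in the AIFB+ graph
idx = torch.arange(n)
# sparse COO identity: one stored value per node
sparse_eye = torch.sparse_coo_tensor(torch.stack([idx, idx]), torch.ones(n), (n, n)).coalesce()

# dense storage: n * n float32 entries
dense_bytes = n * n * torch.ones(1).element_size()
# COO storage: a 2 x nnz int64 index tensor plus nnz float32 values
sparse_bytes = (
    sparse_eye.indices().numel() * sparse_eye.indices().element_size()
    + sparse_eye.values().numel() * sparse_eye.values().element_size()
)
print(f"dense: {dense_bytes / 1e6:.1f} MB, sparse COO: {sparse_bytes / 1e3:.1f} kB")
```

This prints `dense: 32.1 MB, sparse COO: 56.7 kB`. The saving depends on sparsity: each COO nonzero costs 2 × 8 + 4 = 20 bytes against 4 bytes per dense entry, so COO only wins while fewer than about 20% of the entries are nonzero.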

## Import packages

```python
# import packages
import torch
import torch_geometric.transforms as trans
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
from torch_geometric.utils import train_test_split_edges  # deprecated in PyG 2.x; RandomLinkSplit is used below instead

# import python files
import reading_data  # local helper module for parsing the .nt data
```

## Read in data

```python
# AIFB+ dataset: filtered literal representation, non-relational, sparse adjacency matrix
adjacency_matrix, mapping_index_to_node, mapping_entity_to_index = reading_data.create_adjacency_matrix_nt(
    "data/aifb/aifb+.nt", literal_representation="filtered", sparse=True
)
number_nodes = adjacency_matrix.size(0)
```

```python
adjacency_matrix.coalesce().indices()
```

```
tensor([[   0,    0,    0,  ..., 2833, 2834, 2834],
        [ 472, 1091, 1195,  ..., 2767,  472,  781]])
```
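`coalesce()` is needed because `.indices()` can only be called on a coalesced sparse tensor. Coalescing sorts the stored entries by (row, column) and sums duplicate entries, so the result has shape `[2, num_edges]` with each (source, target) pair appearing once, which is the `edge_index` layout PyG expects. A tiny hypothetical example with one duplicated entry:

```python
import torch

# hypothetical 3-node adjacency; the entry (0, 1) is stored twice
indices = torch.tensor([[1, 0, 0, 2],
                        [2, 1, 1, 0]])
adj = torch.sparse_coo_tensor(indices, torch.ones(4), (3, 3))

c = adj.coalesce()
edge_index = c.indices()  # sorted, duplicates merged
values = c.values()       # duplicate entries are summed
print(edge_index.tolist())  # [[0, 1, 2], [1, 2, 0]]
print(values.tolist())      # [2.0, 1.0, 1.0]
```

If the `.nt` file yields repeated (subject, object) pairs, for example through different predicates in the non-relational representation, `edge_index` keeps only one copy; the multiplicity survives only in `values()`.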

## Create feature matrix (one-hot)

```python
# one-hot node features: a sparse identity matrix of size number_nodes x number_nodes
feature_matrix = torch.sparse_coo_tensor(
    indices=torch.tensor([list(range(number_nodes)), list(range(number_nodes))]),
    values=torch.ones(number_nodes),
    size=(number_nodes, number_nodes),
)
```

```python
feature_matrix
```

```
tensor(indices=tensor([[   0,    1,    2,  ..., 2832, 2833, 2834],
                       [   0,    1,    2,  ..., 2832, 2833, 2834]]),
       values=tensor([1., 1., 1.,  ..., 1., 1., 1.]),
       size=(2835, 2835), nnz=2835, layout=torch.sparse_coo)
```
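Multiplying a one-hot (identity) feature matrix by a weight matrix simply selects rows of that weight matrix, so with these features the first GCN layer effectively learns a free embedding per node. A small check in plain PyTorch, with hypothetical sizes `n` and `d`:

```python
import torch

torch.manual_seed(0)
n, d = 5, 3  # hypothetical stand-ins for number_nodes and a hidden size
idx = torch.arange(n)
# same construction as feature_matrix, written with torch.arange
one_hot = torch.sparse_coo_tensor(torch.stack([idx, idx]), torch.ones(n), (n, n)).coalesce()

W = torch.randn(n, d)              # a weight matrix as a linear layer would hold
out = torch.sparse.mm(one_hot, W)  # sparse @ dense -> dense (n x d)

is_identity = torch.equal(one_hot.to_dense(), torch.eye(n))
selects_rows = torch.allclose(out, W)  # row i of the product is row i of W
print(is_identity, selects_rows)
```

A consequence of this choice is that the model is transductive: a node that was not present during training has no learned row and therefore no embedding.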

## Create a data object

```python
data_aifb = Data(x=feature_matrix, edge_index=adjacency_matrix.coalesce().indices(), num_nodes=number_nodes)
```

```python
data_aifb
```

```
Data(x=[2835, 2835], edge_index=[2, 20338], num_nodes=2835)
```

```python
data_aifb.is_directed()
```

```
True
```

```python
data_aifb.num_node_features
```

```
2835
```

Split the edges into training, validation, and test sets:

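```python
from torch_geometric.transforms import RandomLinkSplit

# defaults: 10% of the edges for validation, 20% for testing;
# no negative edges are added to the training split (the GAE loss can sample them during training)
transform = RandomLinkSplit(add_negative_train_samples=False)
train_data, val_data, test_data = transform(data_aifb)
```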

```python
train_data
```

```
Data(x=[2835, 2835], edge_index=[2, 14238], num_nodes=2835, edge_label=[14238], edge_label_index=[2, 14238])
```

```python
val_data
```

```
Data(x=[2835, 2835], edge_index=[2, 14238], num_nodes=2835, edge_label=[4066], edge_label_index=[2, 4066])
```

```python
test_data
```

```
Data(x=[2835, 2835], edge_index=[2, 16271], num_nodes=2835, edge_label=[8134], edge_label_index=[2, 8134])
```
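The sizes above follow from `RandomLinkSplit`'s defaults (`num_val=0.1`, `num_test=0.2`, and one sampled negative per positive in the validation and test labels). Reproducing the counts by hand:

```python
num_edges = 20338  # edge_index size of data_aifb

# the fractions are truncated to whole edges
num_val = int(0.1 * num_edges)              # 2033 positive validation edges
num_test = int(0.2 * num_edges)             # 4067 positive test edges
num_train = num_edges - num_val - num_test  # 14238 training edges

# message-passing edges: train and val see the training edges,
# test additionally sees the validation edges
val_edge_index = num_train             # 14238
test_edge_index = num_train + num_val  # 16271

# supervision edges: positives only for train (add_negative_train_samples=False),
# positives plus an equal number of sampled negatives for val and test
train_labels = num_train    # 14238
val_labels = 2 * num_val    # 4066
test_labels = 2 * num_test  # 8134
```

The validation and test negatives are sampled once, when the transform is applied, so they stay fixed across epochs; training negatives still have to be drawn during training.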

Later on, this can be extended, for example by adding node labels and training, validation, and test masks. For the GAE alone, this is not needed.

## Create the GAE

## Run a training loop

## Analysis of results
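One possible way to inspect what the model learned, sketched as an assumption rather than a fixed plan: score every unconnected node pair with the same inner-product decoder the GAE uses and list the highest-scoring pairs as candidate links. Because the decoder is symmetric, the sketch ranks each unordered pair once and treats an edge in either direction as existing. `top_k_new_links` is a hypothetical helper; with real embeddings, the returned indices could presumably be mapped back to entities via `mapping_index_to_node`. The dense n × n score matrix is about 32 MB of float32 for 2835 nodes, which is still manageable.

```python
import torch


def top_k_new_links(z: torch.Tensor, edge_index: torch.Tensor, k: int):
    """Return the k highest-scoring pairs (i < j) that are not already edges.

    Scores are sigmoid(z_i . z_j), i.e. the GAE inner-product decoder.
    k should not exceed the number of candidate pairs, or -inf scores are returned.
    """
    n = z.size(0)
    scores = torch.sigmoid(z @ z.t())  # dense n x n probability matrix
    # exclude the diagonal and lower triangle so each unordered pair appears once
    excluded = torch.ones(n, n, dtype=torch.bool).tril()
    # exclude existing edges in both directions
    excluded[edge_index[0], edge_index[1]] = True
    excluded[edge_index[1], edge_index[0]] = True
    scores = scores.masked_fill(excluded, float("-inf"))
    values, flat = scores.flatten().topk(k)
    rows = torch.div(flat, n, rounding_mode="floor")
    cols = flat % n
    return torch.stack([rows, cols]), values


# toy embeddings: nodes 0/1 and nodes 2/3 point in the same direction
z = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
edge_index = torch.tensor([[0], [1]])  # the edge 0 -> 1 already exists
pairs, probs = top_k_new_links(z, edge_index, k=1)
print(pairs.tolist())  # [[2], [3]]
```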