# Demonstration
Here we demonstrate how these functions can simplify the process of converting a graph into a format amenable to language models.

## Loading Data
Datasets in the [graph kernel](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets#file_format) format can be loaded using the `load_graph_kernel_graph` utility function. 

Future versions will support `node_attributes.txt`, `edge_attributes.txt`, and `graph_attributes.txt`.

In [4]:
from utilities import load_graph_kernel_graph, load_graph_kernel_labels

mappings = {
    "node_labels": {
        0: "C",
        1: "N",
        2: "O",
        3: "F",
        4: "I",
        5: "Cl",
        6: "Br"
    },
    "edge_labels": {
        0: "aromatic",
        1: "single",
        2: "double",
        3: "triple"
    }
}

a = load_graph_kernel_graph(
    '../datasets/MUTAG/', mappings=mappings)

y = load_graph_kernel_labels('../datasets/MUTAG/')

Added node_labels[0]; current_vocab_size = 8
Added edge_labels; current_vocab_size = 13
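
For readers unfamiliar with the graph kernel format, the following is a minimal sketch of how its files relate to one another. It is not the implementation of `load_graph_kernel_graph`; the function name `parse_graph_kernel` and the toy data are hypothetical, and the file contents are passed in as lists of lines so the example runs without a dataset on disk.

```python
from collections import defaultdict


def parse_graph_kernel(adjacency_lines, indicator_lines, node_label_lines):
    """Group graph-kernel-style lines into one record per graph (illustrative only)."""
    # Line i of the graph indicator file gives the graph id of node i (1-based).
    graph_of = {i: int(g) for i, g in enumerate(indicator_lines, start=1)}
    # Line i of the node label file gives the label of node i.
    label_of = {i: int(l) for i, l in enumerate(node_label_lines, start=1)}

    graphs = defaultdict(lambda: {"nodes": {}, "edges": set()})
    for node, graph_id in graph_of.items():
        graphs[graph_id]["nodes"][node] = label_of[node]

    # Each adjacency line is "row, col". Undirected edges are often listed in
    # both directions, so each pair is stored once as (smaller, larger).
    for line in adjacency_lines:
        u, v = (int(x) for x in line.split(","))
        graphs[graph_of[u]]["edges"].add((min(u, v), max(u, v)))
    return dict(graphs)


# Hypothetical toy dataset: a 3-node path (graph 1) and a single edge (graph 2).
toy_A = ["1, 2", "2, 1", "2, 3", "3, 2", "4, 5", "5, 4"]
toy_indicator = ["1", "1", "1", "2", "2"]
toy_node_labels = ["0", "1", "0", "2", "2"]

toy_graphs = parse_graph_kernel(toy_A, toy_indicator, toy_node_labels)
print(sorted(toy_graphs[1]["edges"]))
```

Node ids are global across the dataset, which is why the indicator file is needed to split the nodes into separate graphs.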


## Embeddings
We can call `get_structural_signatures` on a graph to learn structural signatures, and assign them as node attributes.

In [2]:
from module import get_structural_signatures, walk_as_string

In [6]:
G, pca, kmeans = get_structural_signatures(networkXGraph=a['G'], vocab_size=a['vocab_size'])
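
To give a rough idea of what a structural label is, here is a deliberately simplified sketch. The `pca` and `kmeans` return values suggest that `get_structural_signatures` reduces and clusters node features, but this sketch does not reproduce that. As a stand-in, it gives every distinct (degree, neighbour degrees) pattern its own integer id. The function name `toy_structural_labels` is hypothetical, and starting the ids at `vocab_size` is our assumption about why that argument is passed.

```python
def toy_structural_labels(adjacency, vocab_size):
    """Give each node an integer id for its local structure, starting at vocab_size."""
    signature_ids = {}
    labels = {}
    for node in sorted(adjacency):
        # Signature: the node's degree plus the sorted degrees of its neighbours.
        signature = (
            len(adjacency[node]),
            tuple(sorted(len(adjacency[n]) for n in adjacency[node])),
        )
        # A signature seen for the first time gets the next unused id.
        if signature not in signature_ids:
            signature_ids[signature] = vocab_size + len(signature_ids)
        labels[node] = signature_ids[signature]
    return labels


# Path graph 1 - 2 - 3 - 4: the two end nodes share one signature,
# and the two middle nodes share another.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
structure = toy_structural_labels(path, vocab_size=13)
print(structure)
```

Offsetting by the vocabulary size keeps the structural ids from colliding with ids already used for node and edge labels.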

In [7]:
G

<networkx.classes.graph.Graph at 0x1180462b0>

## Random Walks
Now that we've supplemented the graph with structural attributes, we can generate random walks.

In [8]:
walks = walk_as_string(networkXGraph=G, componentLabels=y)

Walk iteration:
('1', '/', '20')
('2', '/', '20')
('3', '/', '20')
('4', '/', '20')
('5', '/', '20')
('6', '/', '20')
('7', '/', '20')
('8', '/', '20')
('9', '/', '20')
('10', '/', '20')
('11', '/', '20')
('12', '/', '20')
('13', '/', '20')
('14', '/', '20')
('15', '/', '20')
('16', '/', '20')
('17', '/', '20')
('18', '/', '20')
('19', '/', '20')
('20', '/', '20')
[ 1  2  3  2  3  4 10  9  8  9 14  9 10  4  3  2  1  6  5  7  8  7  5  4
  3  4 10 11 12 13]
[ 1  2  3  2  1  6  1  6  5  6  1  6  5  4 10  4  3  2  1  6  5  7  8  9
 14  9  8  9 14 13]
[ 1  6  5  7  8  9 10  4 10 11 10 11 10 11 10  9  8  7  8  7  5  7  8  7
  8  7  8  7  5  6]
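
The idea of turning a random walk into a string can be sketched in a few lines. This is not the implementation of `walk_as_string`; the function `toy_walk_as_string`, its parameters, and the token layout (node token, then edge token, then next node token) are assumptions made for illustration.

```python
import random


def toy_walk_as_string(adjacency, node_token, edge_token, start, length, seed=0):
    """Take `length` uniform random steps from `start` and emit space-separated tokens."""
    rng = random.Random(seed)  # seeded so the walk is reproducible
    tokens = [str(node_token[start])]
    current = start
    for _ in range(length):
        # Every node is assumed to have at least one neighbour.
        nxt = rng.choice(adjacency[current])
        # Edges are keyed as (smaller, larger) because the graph is undirected.
        edge = (min(current, nxt), max(current, nxt))
        tokens.append(str(edge_token[edge]))
        tokens.append(str(node_token[nxt]))
        current = nxt
    return " ".join(tokens)


# Hypothetical 3-node path with made-up token ids.
adjacency = {1: [2], 2: [1, 3], 3: [2]}
node_token = {1: 0, 2: 1, 3: 0}
edge_token = {(1, 2): 8, (2, 3): 9}

toy_walk = toy_walk_as_string(adjacency, node_token, edge_token, start=1, length=4)
print(toy_walk)
```

A walk of `length` steps yields `2 * length + 1` tokens in this layout, since each step contributes one edge token and one node token.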


Note that the numbers in a walk string that are not node or edge labels are the labels of the learned structures; we are working on a better way to represent them.

In [10]:
walks.walk[2256]

'0 14 8 0 15 9 0 15 9 0 15 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 15 9 0 15 8 0 16 8 0 15 9 0 15 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14 8 0 14'
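
A walk string like the one above is easier to read when its tokens are grouped. The sketch below assumes that the tokens repeat in groups of three; the output suggests this, but we have not confirmed it against `walk_as_string`. The helper `group_walk_tokens` is hypothetical.

```python
def group_walk_tokens(walk, group_size=3):
    """Split a space-separated walk string into consecutive groups of tokens."""
    tokens = walk.split()
    # The last group may be shorter if the walk ends partway through a group.
    return [
        tuple(tokens[i:i + group_size])
        for i in range(0, len(tokens), group_size)
    ]


# First twelve tokens of the walk shown above.
sample = "0 14 8 0 15 9 0 15 9 0 15 8"
steps = group_walk_tokens(sample)
for step in steps:
    print(step)
```

The full walk above ends with `0 14`, so its final group would contain two tokens rather than three.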