# KGMicrobe Animation
Sometimes, especially when preparing a presentation for a conference or for your colleagues, a good animation can say more than a thousand words.

For this reason, we have prepared a straightforward way to create animations for a number of tasks using GRAPE. Thanks to subsampling, it can be run on graphs of arbitrary size.

In this brief tutorial, we will show how to compute KGMicrobe embeddings using the DeepWalk-based and Walklets-based GloVe models. We will then use a t-SNE decomposition to reduce their dimensionality and render the result as a short video.

The resulting WEBM file can be converted to other formats using one of many services and can be directly incorporated into Google Slides.

## Retrieving KGMicrobe
First, we retrieve KGMicrobe:

In [1]:
from grape.datasets.kghub import KGMicrobe

graph = KGMicrobe()

Then, let's take a look at its graph report:

In [2]:
graph

Since the graph contains disconnected nodes, and such nodes cannot be meaningfully embedded by topology-based embedding methods, we drop them:

In [3]:
%%time
graph = graph.remove_disconnected_nodes()

CPU times: user 1.45 s, sys: 149 ms, total: 1.6 s
Wall time: 121 ms
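
To make the idea of dropping disconnected nodes concrete, here is a minimal pure-Python sketch on a toy edge list. The helper `remove_isolated_nodes` and the toy graph are hypothetical and only illustrate the concept; they are not how GRAPE implements `remove_disconnected_nodes`, and the sketch simply treats a node as disconnected when it appears in no edge at all.

```python
def remove_isolated_nodes(num_nodes, edges):
    """Return the sorted IDs of the nodes that appear in at least one edge."""
    connected = set()
    for src, dst in edges:
        connected.add(src)
        connected.add(dst)
    return [node for node in range(num_nodes) if node in connected]


# Hypothetical toy graph with 6 nodes: nodes 3 and 5 have no edges.
edges = [(0, 1), (1, 2), (2, 4)]
kept = remove_isolated_nodes(6, edges)
print(kept)  # [0, 1, 2, 4]
```

A node without edges never shows up in a random walk started elsewhere, so a topology-based embedding has no signal to place it.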


## Compute the embedding
Next, we compute the embeddings using the DeepWalk and Walklets sampling-based GloVe models. Note that this implementation is data-race aware and uses plain SGD as its optimizer rather than something more elaborate such as Adam or Nadam: this means that the memory footprint is limited to the embedding itself.
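
The memory remark above can be checked with some quick arithmetic. Adam-style optimizers keep two extra values per parameter (the first and second moment estimates), while plain SGD keeps none. The sketch below assumes 32-bit floats and a hypothetical graph size; the function `optimizer_memory_bytes` is illustrative and not part of GRAPE.

```python
def optimizer_memory_bytes(num_nodes, embedding_size,
                           bytes_per_value=4, extra_buffers=0):
    """Memory for one embedding matrix plus per-parameter optimizer buffers.

    extra_buffers=0 models plain SGD; extra_buffers=2 models Adam-style
    optimizers, which store two moment estimates per parameter.
    """
    params = num_nodes * embedding_size
    return params * bytes_per_value * (1 + extra_buffers)


# Hypothetical sizes, assuming float32 values.
num_nodes, embedding_size = 1_000_000, 100
sgd = optimizer_memory_bytes(num_nodes, embedding_size)
adam = optimizer_memory_bytes(num_nodes, embedding_size, extra_buffers=2)
print(f"SGD:  {sgd / 1e6:.0f} MB")   # SGD:  400 MB
print(f"Adam: {adam / 1e6:.0f} MB")  # Adam: 1200 MB
```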

By default, GRAPE provides quite a heavy-duty parametrization. For this relatively limited visualization, we can reduce it to obtain the embedding a bit more quickly.

In [4]:
from grape.embedders import DeepWalkGloVeEnsmallen, WalkletsGloVeEnsmallen

In [5]:
%%time
deepwalk_embedding = DeepWalkGloVeEnsmallen(
    epochs=10,
    iterations=1,
    walk_length=128,
    window_size=5
).fit_transform(graph)

CPU times: user 29min 18s, sys: 2.6 s, total: 29min 21s
Wall time: 1min 14s
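
As a rough intuition for what `walk_length` and `window_size` control in a DeepWalk-style pipeline, the following sketch samples one uniform random walk on a toy graph and counts the node pairs that co-occur within the window. The toy adjacency list and both helper functions are hypothetical; GRAPE's actual sampling and any distance weighting it applies are not covered here.

```python
import random
from collections import Counter


def random_walk(adjacency, start, walk_length, rng):
    """Sample a uniform random walk containing at most walk_length nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:
            break  # Dead end: stop the walk early.
        walk.append(rng.choice(neighbours))
    return walk


def window_cooccurrences(walk, window_size):
    """Count symmetric co-occurrences of nodes within window_size steps."""
    counts = Counter()
    for i, center in enumerate(walk):
        for j in range(i + 1, min(i + window_size + 1, len(walk))):
            counts[(center, walk[j])] += 1
            counts[(walk[j], center)] += 1
    return counts


# Hypothetical toy graph given as an adjacency list.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(42)
walk = random_walk(adjacency, start=0, walk_length=128, rng=rng)
counts = window_cooccurrences(walk, window_size=5)
print(len(walk), sum(counts.values()))
```

A GloVe-style model is then fitted on such co-occurrence counts; longer walks and larger windows yield more pairs per walk at a higher computational cost.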


In [6]:
%%time
walklets_embedding = WalkletsGloVeEnsmallen(
    epochs=10,
    iterations=1,
    walk_length=128,
    window_size=5
).fit_transform(graph)

CPU times: user 48min 25s, sys: 6.11 s, total: 48min 31s
Wall time: 2min 5s


## Visualize the embedding
We are now at the final step: visualizing the embedding of the graph.

In [7]:
from grape import GraphVisualizer

vis = GraphVisualizer(
    graph=graph,
    n_components=4,
    edge_embedding_method="Hadamard",
    rotate=True,
    verbose=True,
    # Since GloVe learns a cosine similarity, the visualization tool would
    # by default dispatch a cosine-distance-based t-SNE. That path uses the
    # scikit-learn implementation, which is very slow. We therefore force the
    # Euclidean distance, which enables the Multicore t-SNE implementation
    # (when available).
    decomposition_kwargs=dict(metric="euclidean")
)
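
For reference, the `"Hadamard"` edge embedding method named above is the element-wise product of the source and destination node embeddings. The sketch below shows it on two hypothetical three-dimensional node vectors; the helper `hadamard_edge_embedding` is illustrative and is not GRAPE's implementation.

```python
def hadamard_edge_embedding(src_vec, dst_vec):
    """Element-wise product of the two node embeddings of an edge."""
    return [s * d for s, d in zip(src_vec, dst_vec)]


# Hypothetical node embeddings.
node_embedding = {"a": [1.0, 2.0, -1.0], "b": [0.5, -2.0, 3.0]}
edge_vec = hadamard_edge_embedding(node_embedding["a"], node_embedding["b"])
print(edge_vec)  # [0.5, -4.0, -3.0]
```

The product is symmetric, so the edge `(a, b)` and the edge `(b, a)` receive the same vector.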

Then we run the t-SNE; this may take a while.

In [8]:
%%time
vis.fit_nodes(deepwalk_embedding)

Performing t-SNE using 24 cores.
Using no_dims = 4, perplexity = 30.000000, and theta = 0.500000
Computing input similarities...
Building tree...
 - point 2004 of 20000
 - point 4000 of 20000
 - point 6001 of 20000
 - point 8002 of 20000
 - point 10000 of 20000
 - point 12000 of 20000
 - point 14000 of 20000
 - point 16000 of 20000
 - point 18000 of 20000
 - point 20000 of 20000
Done in 0.00 seconds (sparsity = 0.006059)!
Learning embedding...
Iteration 51: error is 97.716087 (50 iterations in 7.00 seconds)
Iteration 101: error is 79.317347 (50 iterations in 9.00 seconds)
Iteration 151: error is 71.286876 (50 iterations in 10.00 seconds)
Iteration 201: error is 67.740619 (50 iterations in 9.00 seconds)
Iteration 251: error is 65.720989 (50 iterations in 10.00 seconds)
Iteration 301: error is 2.242787 (50 iterations in 9.00 seconds)
Iteration 351: error is 1.860831 (50 iterations in 10.00 seconds)
Iteration 400: error is 1.618086 (50 iterations in 10.00 seconds)


CPU times: user 9min 6s, sys: 19min 52s, total: 28min 58s
Wall time: 1min 14s


Fitting performed in 74.00 seconds.
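
The log above reports 20000 points, which is consistent with the subsampling mentioned in the introduction. The sketch below shows one simple way such a cap could work, assuming uniform sampling without replacement and a fixed seed; the function `subsample_indices` and the graph size are hypothetical, and GRAPE's actual strategy may differ.

```python
import random


def subsample_indices(num_nodes, max_points, seed=42):
    """Return at most max_points sorted node indices, sampled uniformly."""
    if num_nodes <= max_points:
        return list(range(num_nodes))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_nodes), max_points))


# Hypothetical graph size; the cap matches the number of points in the log.
indices = subsample_indices(num_nodes=500_000, max_points=20_000)
print(len(indices))  # 20000
```

Capping the number of points keeps the t-SNE cost roughly constant regardless of the size of the graph.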


In [9]:
%%time
vis.plot_node_types()


OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'



CPU times: user 9min 48s, sys: 12.5 s, total: 10min
Wall time: 1min 29s
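
To give an intuition for how a rotating animation can be built from a four-dimensional decomposition, the sketch below rotates points in one coordinate plane and projects them onto the first two axes, producing one 2D frame per angle. This is purely illustrative: the choice of rotation plane, the projection, and the function `rotate_and_project` are assumptions, not a description of how GRAPE renders its frames.

```python
import math


def rotate_and_project(points, angle, axis_a=0, axis_b=2):
    """Rotate points in the (axis_a, axis_b) plane, then keep axes 0 and 1."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    frame = []
    for p in points:
        q = list(p)
        q[axis_a] = cos_a * p[axis_a] - sin_a * p[axis_b]
        q[axis_b] = sin_a * p[axis_a] + cos_a * p[axis_b]
        frame.append((q[0], q[1]))
    return frame


# Two hypothetical 4D points and four evenly spaced rotation angles.
points = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 2.0, 0.0)]
num_frames = 4
frames = [
    rotate_and_project(points, 2 * math.pi * k / num_frames)
    for k in range(num_frames)
]
print(len(frames), len(frames[0]))  # 4 2
```

Each frame is a different 2D view of the same points, so information hidden along the third axis becomes visible as the angle changes.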


In [10]:
%%time
vis.fit_nodes(walklets_embedding)

Performing t-SNE using 24 cores.
Using no_dims = 4, perplexity = 30.000000, and theta = 0.500000
Computing input similarities...
Building tree...
 - point 2001 of 20000
 - point 9907 of 20000
 - point 9910 of 20000
 - point 10000 of 20000
 - point 13265 of 20000
 - point 15783 of 20000
 - point 15798 of 20000
 - point 16000 of 20000
 - point 18000 of 20000
 - point 20000 of 20000
Done in 0.00 seconds (sparsity = 0.006526)!
Learning embedding...
Iteration 51: error is 98.916224 (50 iterations in 7.00 seconds)
Iteration 101: error is 82.094338 (50 iterations in 9.00 seconds)
Iteration 151: error is 75.686108 (50 iterations in 9.00 seconds)
Iteration 201: error is 72.951000 (50 iterations in 9.00 seconds)
Iteration 251: error is 71.411492 (50 iterations in 9.00 seconds)
Iteration 301: error is 2.449860 (50 iterations in 9.00 seconds)
Iteration 351: error is 2.044232 (50 iterations in 10.00 seconds)
Iteration 400: error is 1.789491 (50 iterations in 10.00 seconds)


CPU times: user 9min 10s, sys: 19min 14s, total: 28min 24s
Wall time: 1min 13s


Fitting performed in 72.00 seconds.


In [None]:
%%time
vis.plot_node_types()
