# KGCOVID19 Animation
Sometimes, especially when preparing a presentation for a conference or for your colleagues, a good animation can say more than a thousand words.

For this reason, we have prepared a straightforward way to create animations for a number of tasks using GRAPE. Thanks to subsampling, it can be run on graphs of arbitrary size.

In this brief tutorial, we show how to compute a KGCOVID19 embedding with First-order LINE, then use a t-SNE decomposition to reduce its dimensionality and render it as a short video.

The resulting WebM video can be converted to other formats with one of many available services, and can be embedded directly in Google Slides.

## Retrieving KGCOVID19
First, we retrieve KGCOVID19:

In [1]:
from grape.datasets.kghub import KGCOVID19

graph = KGCOVID19()

Then, let's take a look at its graph report:

In [2]:
graph

## Connected holdout
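Since we want to visualize an edge prediction task on this graph, we need to create a connected holdout: a train/test split of the edges in which the training graph preserves the connected components of the original graph.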

In [3]:
%%time
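# Keep 70% of the edges for training, preserving the original connected components
train, test = graph.connected_holdout(train_size=0.7)
# Trade extra memory for faster training by precomputing auxiliary vectors
train.enable()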

CPU times: user 22.6 s, sys: 428 ms, total: 23 s
Wall time: 16.3 s
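
To make the idea concrete, here is a minimal, hypothetical sketch of one common way to build a connected holdout in plain Python: a Kruskal-style pass over shuffled edges extracts a random spanning forest that always stays in the training set, and test edges are drawn only from the remaining edges, so removing them cannot split a component. The names `connected_holdout` and `count_components` are illustrative only; this is not GRAPE's implementation, which may differ in its details.

```python
import random


def connected_holdout(edges, num_nodes, train_size=0.7, seed=42):
    """Hypothetical sketch: split undirected edges so train keeps every component intact."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Edges joining two different components form a random spanning forest.
    forest, others = [], []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((u, v))
        else:
            others.append((u, v))

    # Test edges come only from non-forest edges, so train stays connected.
    n_test = min(len(others), round(len(edges) * (1 - train_size)))
    test = others[:n_test]
    train = forest + others[n_test:]
    return train, test


def count_components(edges, num_nodes):
    """Count connected components with an iterative depth-first search."""
    adjacency = {node: [] for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, components = set(), 0
    for start in range(num_nodes):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


# Toy graph: a 10-node cycle plus three chords.
edges = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7), (3, 8)]
train_edges, test_edges = connected_holdout(edges, num_nodes=10)
```

On this toy graph the spanning forest takes 9 of the 13 edges, and the 4 remaining edges can all go to the test set without disconnecting the training graph.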


## Compute the embedding
Next, we compute the embedding using the First-order LINE method. Note that this implementation is data-race aware and uses plain SGD as its optimizer, rather than adaptive optimizers such as Adam or Nadam, which keep additional per-parameter state. As a result, the memory footprint is limited to the size of the embedding itself.

In [4]:
%%time
from grape.embedders import FirstOrderLINEEnsmallen
embedding = FirstOrderLINEEnsmallen().fit_transform(train)

CPU times: user 55min 34s, sys: 2.31 s, total: 55min 36s
Wall time: 2min 22s
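
The memory argument can be illustrated with a toy, single-threaded sketch of the first-order LINE objective with negative sampling, trained with plain SGD: for each edge it increases σ(x_u · x_v), and for a uniformly sampled node n it decreases σ(x_u · x_n). This is not GRAPE's data-race aware implementation; the function name, the hyperparameters, the one-negative-per-edge scheme, and the two-clique toy graph are assumptions for illustration. The only state kept across updates is the embedding matrix.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def first_order_line_sgd(edges, num_nodes, dim=8, epochs=200, lr=0.05, seed=0):
    """Toy first-order LINE with plain SGD and one uniform negative per edge.

    No optimizer state (e.g. Adam moment buffers) is allocated: memory is
    just the (num_nodes, dim) embedding matrix.
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(num_nodes, dim))
    edges = np.asarray(edges)
    for _ in range(epochs):
        for idx in rng.permutation(len(edges)):
            u, v = edges[idx]
            negative = rng.integers(num_nodes)
            # Positive pair (label 1), then one sampled negative pair (label 0).
            for other, label in ((v, 1.0), (negative, 0.0)):
                if other == u:
                    continue  # skip self-pairs drawn as negatives
                score = sigmoid(emb[u] @ emb[other])
                step = lr * (label - score)  # gradient of the log-likelihood
                grad_u = step * emb[other]   # uses the pre-update emb[other]
                emb[other] += step * emb[u]  # uses the pre-update emb[u]
                emb[u] += grad_u
    return emb


# Two disjoint 5-node cliques: nodes 0-4 and nodes 5-9.
clique_a = [(i, j) for i in range(5) for j in range(i + 1, 5)]
clique_b = [(i + 5, j + 5) for i, j in clique_a]
emb = first_order_line_sgd(clique_a + clique_b, num_nodes=10)
```

On this toy graph, connected pairs should end up with higher scores than pairs that straddle the two cliques, since the latter only ever receive negative updates.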




## Visualize the embedding on the test graph
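Finally, we visualize the embedding on the test graph, using the training graph as support and the Hadamard product of node embeddings as edge embeddings: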

In [9]:
from grape import GraphVisualizer

vis = GraphVisualizer(
    graph=test,
    support=train,
    n_components=4,
    edge_embedding_method="Hadamard",
    rotate=True,
    verbose=True,
    # By default, since LINE learns a cosine similarity, the visualizer would
    # dispatch a cosine-distance t-SNE, which uses the very slow scikit-learn
    # implementation. We therefore force the Euclidean distance, which enables
    # the multicore t-SNE implementation (when available).
    decomposition_kwargs=dict(metric="euclidean")
)
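
With `edge_embedding_method="Hadamard"`, each edge is represented by the element-wise product of its endpoints' node embeddings. The sketch below shows that operation together with a hypothetical subsampling step in the spirit of the one mentioned in the introduction: positive edges are capped at `max_edges`, and an equal number of uniformly random node pairs serve as negatives. The helper names, the `max_edges` cap, and the random stand-in data are assumptions; GRAPE's actual sampling strategy may differ.

```python
import numpy as np


def hadamard_edge_embedding(node_embedding, edges):
    """Edge features as the element-wise (Hadamard) product of endpoint embeddings."""
    edges = np.asarray(edges)
    return node_embedding[edges[:, 0]] * node_embedding[edges[:, 1]]


def sample_edges_for_plot(positive_edges, num_nodes, max_edges=10_000, seed=42):
    """Hypothetical subsampling: cap positives, draw as many random pairs as negatives."""
    rng = np.random.default_rng(seed)
    positive_edges = np.asarray(positive_edges)
    if len(positive_edges) > max_edges:
        keep = rng.choice(len(positive_edges), size=max_edges, replace=False)
        positive_edges = positive_edges[keep]
    # Uniformly random node pairs; on a sparse graph few of them are real edges.
    negative_edges = rng.integers(num_nodes, size=(len(positive_edges), 2))
    return positive_edges, negative_edges


# Random stand-ins for a node embedding and a large set of test edges.
rng = np.random.default_rng(0)
node_embedding = rng.normal(size=(1_000, 100))
test_edges = rng.integers(1_000, size=(50_000, 2))

pos, neg = sample_edges_for_plot(test_edges, num_nodes=1_000)
features = np.vstack([
    hadamard_edge_embedding(node_embedding, pos),
    hadamard_edge_embedding(node_embedding, neg),
])
labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
```

Capping the number of plotted edges is what keeps the decomposition cost independent of the graph size.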

Then we fit t-SNE on the positive and negative edges; this may take a while.

In [10]:
%%time
vis.fit_negative_and_positive_edges(embedding)

Performing t-SNE using 24 cores.
Using no_dims = 4, perplexity = 30.000000, and theta = 0.500000
Computing input similarities...
Building tree...
 - point 2003 of 20000
 - point 4000 of 20000
 - point 6001 of 20000
 - point 8001 of 20000
 - point 10000 of 20000
 - point 14886 of 20000
 - point 15787 of 20000
 - point 16000 of 20000
 - point 18000 of 20000
 - point 20000 of 20000
Done in 1.00 seconds (sparsity = 0.006050)!
Learning embedding...
Iteration 51: error is 102.628107 (50 iterations in 5.00 seconds)
Iteration 101: error is 88.360756 (50 iterations in 7.00 seconds)
Iteration 151: error is 84.718261 (50 iterations in 7.00 seconds)
Iteration 201: error is 83.522217 (50 iterations in 6.00 seconds)
Iteration 251: error is 82.881675 (50 iterations in 7.00 seconds)
Iteration 301: error is 3.060295 (50 iterations in 6.00 seconds)
Iteration 351: error is 2.692082 (50 iterations in 6.00 seconds)
Iteration 400: error is 2.479336 (50 iterations in 7.00 seconds)


CPU times: user 6min 37s, sys: 13min 40s, total: 20min 18s
Wall time: 51.8 s


Fitting performed in 51.00 seconds.


We then render the frames and merge them into the animation:

In [13]:
%%time
vis.plot_positive_and_negative_edges()

OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'


CPU times: user 9min 48s, sys: 12.2 s, total: 10min 1s
Wall time: 1min 37s
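
The frames are rendered and merged by GRAPE itself. As a rough, hypothetical illustration of what a rotating animation involves, the sketch below rotates a 3D point cloud about the vertical axis and projects each frame onto the screen plane with NumPy. It is an assumption, not GRAPE's renderer (which here was configured with `n_components=4`); each returned frame could then be drawn as a scatter plot and written to a video file, for example with OpenCV's `cv2.VideoWriter`.

```python
import numpy as np


def rotation_frames(points, n_frames=36):
    """Rotate a 3D point cloud about the y axis and project each frame to 2D.

    Returns an array of shape (n_frames, n_points, 2): one 2D scatter per frame.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    frames = []
    for theta in angles:
        cos_t, sin_t = np.cos(theta), np.sin(theta)
        # Rotation matrix about the y (vertical) axis.
        rotation = np.array([
            [cos_t, 0.0, sin_t],
            [0.0, 1.0, 0.0],
            [-sin_t, 0.0, cos_t],
        ])
        rotated = points @ rotation.T
        # Orthographic projection: drop the depth (z) coordinate.
        frames.append(rotated[:, :2])
    return np.stack(frames)


points = np.random.default_rng(0).normal(size=(500, 3))
frames = rotation_frames(points)
```

Because the rotation is about the vertical axis, each point keeps its height across frames while its horizontal position sweeps back and forth, which is what produces the turning effect.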
