# English Wikipedia Animation using First- and Second-order LINE
Sometimes, especially when preparing a presentation for a conference or for your colleagues, a good animation can say more than a thousand words.

For this reason, we have prepared a straightforward way to create animations for a number of tasks using GRAPE. Thanks to subsampling, it can be run on graphs of arbitrary size.

In this brief tutorial, we will show how to compute an embedding of the directed English Wikipedia graph using Second-order LINE. We will then use a t-SNE decomposition to reduce its dimensionality and render the result as a short video.

The resulting WebM video can be converted to other formats using one of many available services and can be incorporated directly into Google Slides.
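
As one possible offline alternative to an online service, the conversion can be scripted with `ffmpeg`. The sketch below assumes that `ffmpeg` is installed and available on the `PATH`. The function names and file names are hypothetical, and the codec flags are common choices rather than anything GRAPE prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str, target: str) -> list:
    """Return the ffmpeg argument list converting `source` into an H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the target if it already exists
        "-i", source,            # input video, e.g. the WebM from this tutorial
        "-c:v", "libx264",       # H.264 video codec
        "-pix_fmt", "yuv420p",   # pixel format that most players can decode
        target,
    ]


def convert_video(source: str, target: str) -> None:
    """Run the conversion, failing early when ffmpeg or the input is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on the PATH")
    if not Path(source).exists():
        raise FileNotFoundError(source)
    subprocess.run(build_ffmpeg_command(source, target), check=True)
```

Calling `convert_video("animation.webm", "animation.mp4")` (hypothetical file names) would then produce the MP4. Note that `libx264` with `yuv420p` rejects frames with odd width or height, in which case a scaling filter would be needed.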

## Retrieving English Wikipedia
First, we retrieve English Wikipedia:

In [1]:
%%time
from grape.datasets.wikipedia import WikiEN

english_wikipedia = WikiEN(directed=True)

CPU times: user 2h 8min 49s, sys: 34.4 s, total: 2h 9min 24s
Wall time: 29min 52s


Then, let's take a look at its graph report:

In [2]:
english_wikipedia

## Connected holdout
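Since we want to visualize an edge prediction task on this graph, we need to create a connected holdout, that is, a train/test split of the edges in which the training graph keeps the same connected components as the original graph: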

In [4]:
%%time
train, test = english_wikipedia.connected_holdout(train_size=0.7)
train.enable()

CPU times: user 2h 11min 48s, sys: 4.64 s, total: 2h 11min 53s
Wall time: 8min 17s
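
To make the idea of a connected holdout more concrete, here is a simplified pure-Python sketch. It is not GRAPE's implementation: it treats the edges as undirected, and the names `find` and `connected_holdout_sketch` are hypothetical. The approach shown is to first put a spanning forest into the training set, so that no component is split, and then top up the training set with the remaining edges until the requested fraction is reached.

```python
import random


def find(parent, node):
    """Return the representative of `node`, compressing the path as we go."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def connected_holdout_sketch(edges, train_size, seed=42):
    """Split undirected `edges` so that the training edges keep every
    connected component of the original graph intact."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    parent = {node: node for edge in shuffled for node in edge}
    tree, rest = [], []
    for src, dst in shuffled:
        root_src, root_dst = find(parent, src), find(parent, dst)
        if root_src != root_dst:
            parent[root_src] = root_dst   # this edge joins two components
            tree.append((src, dst))
        else:
            rest.append((src, dst))       # this edge closes a cycle

    # Top up the training set with non-tree edges until the quota is met.
    quota = max(len(tree), round(train_size * len(shuffled)))
    extra = quota - len(tree)
    return tree + rest[:extra], rest[extra:]
```

Only edges that close a cycle can end up in the test set, so removing them never disconnects the training graph.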


## Compute the embedding
Next, we compute the embedding using the Second-order LINE method. Note that this implementation is data-race aware and uses plain SGD as its optimizer rather than something like Adam or Nadam. Since SGD keeps no additional optimizer state, the memory footprint is limited to the embedding itself.

In [5]:
from grape.embedders import SecondOrderLINEEnsmallen

In [6]:
%%time
second_order_embedding = SecondOrderLINEEnsmallen().fit_transform(train)



CPU times: user 5h 46min 54s, sys: 50.6 s, total: 5h 47min 45s
Wall time: 15min 16s
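
To illustrate why plain SGD needs no memory beyond the embedding, here is a minimal single-threaded sketch of a LINE-style update with negative sampling. It is an assumption-laden illustration and not the GRAPE implementation: the names `sgd_step` and `train_sketch`, the hyperparameters, and the uniform negative sampling are all hypothetical choices. Each node has a node vector and a context vector, and each update touches only the two vectors involved.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgd_step(node_emb, context_emb, src, dst, label, learning_rate):
    """Update one (src, dst) pair in place; `label` is 1.0 for an observed
    edge and 0.0 for a sampled negative."""
    center, context = node_emb[src], context_emb[dst]
    score = sum(a * b for a, b in zip(center, context))
    # Gradient of the logistic loss with respect to the dot product,
    # already scaled by the learning rate.
    gradient = (sigmoid(score) - label) * learning_rate
    for i in range(len(center)):
        center_i = center[i]  # keep the old value for the context update
        center[i] -= gradient * context[i]
        context[i] -= gradient * center_i


def train_sketch(edges, n_nodes, dim=8, epochs=200, learning_rate=0.05, seed=0):
    """Train node and context vectors; no optimizer state is allocated."""
    rng = random.Random(seed)
    node_emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_nodes)]
    context_emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for src, dst in edges:
            sgd_step(node_emb, context_emb, src, dst, 1.0, learning_rate)
            # Uniform negative sample; it may occasionally hit a true edge.
            negative = rng.randrange(n_nodes)
            sgd_step(node_emb, context_emb, src, negative, 0.0, learning_rate)
    return node_emb, context_emb
```

An optimizer such as Adam would additionally store running moment estimates for every parameter, which is the extra memory that plain SGD avoids.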


## Visualize the embedding on the test graph
Finally, we set up the visualization of the test graph, using the training graph as support.

In [8]:
%%time
from grape import GraphVisualizer

vis = GraphVisualizer(
    graph=test,
    support=train,
    n_components=4,
    edge_embedding_method="Hadamard",
    rotate=True,
    verbose=True,
    # Since LINE learns a cosine similarity, the visualization tool would
    # automatically dispatch a cosine-distance-based t-SNE. That would use the
    # sklearn implementation, which is very slow. We therefore force the
    # Euclidean distance, and with it the Multicore TSNE implementation
    # (when available).
    decomposition_kwargs=dict(metric="euclidean")
)
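
The `edge_embedding_method="Hadamard"` argument refers to the Hadamard (element-wise) product, a common way to turn two node vectors into a single edge vector. The sketch below shows that operation in pure Python. The function names are hypothetical, and this is only an illustration of the operation, not the code that GRAPE runs internally.

```python
def hadamard_edge_embedding(source_vector, destination_vector):
    """Combine two node vectors into one edge vector, element by element."""
    if len(source_vector) != len(destination_vector):
        raise ValueError("node embeddings must share the same dimension")
    return [s * d for s, d in zip(source_vector, destination_vector)]


def edge_embeddings(node_embedding, edges):
    """Map every (source, destination) pair to its Hadamard edge vector."""
    return [
        hadamard_edge_embedding(node_embedding[src], node_embedding[dst])
        for src, dst in edges
    ]
```

The resulting edge vectors have the same dimensionality as the node vectors, and it is these edge vectors that a decomposition such as t-SNE would then reduce for plotting.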

Then we run t-SNE, which may take a while.

In [9]:
%%time
vis.fit_negative_and_positive_edges(second_order_embedding)

Performing t-SNE using 24 cores.
Using no_dims = 4, perplexity = 30.000000, and theta = 0.500000
Computing input similarities...
Building tree...
 - point 2000 of 20000
 - point 4000 of 20000
 - point 6000 of 20000
 - point 8000 of 20000
 - point 10000 of 20000
 - point 12000 of 20000
 - point 14000 of 20000
 - point 16000 of 20000
 - point 18000 of 20000
 - point 20000 of 20000
Done in 1.00 seconds (sparsity = 0.007598)!
Learning embedding...
Iteration 51: error is 102.865125 (50 iterations in 5.00 seconds)
Iteration 101: error is 102.725834 (50 iterations in 7.00 seconds)
Iteration 151: error is 102.540174 (50 iterations in 8.00 seconds)
Iteration 201: error is 102.513155 (50 iterations in 8.00 seconds)
Iteration 251: error is 102.511385 (50 iterations in 8.00 seconds)
Iteration 301: error is 4.877847 (50 iterations in 5.00 seconds)
Iteration 351: error is 4.600061 (50 iterations in 6.00 seconds)
Iteration 400: error is 4.445273 (50 iterations in 5.00 seconds)


CPU times: user 7min 47s, sys: 13min 20s, total: 21min 8s
Wall time: 57.4 s


Fitting performed in 52.00 seconds.
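
The log above reports 20000 points, far fewer than the number of edges in the graph, which is what makes the approach scale. As a rough illustration of the subsampling idea, the sketch below draws a fixed number of existing (positive) edges and the same number of non-existing (negative) edges. The function name `sample_edges` is hypothetical, nodes are assumed to be numbered from `0` to `n_nodes - 1`, and how GRAPE actually samples and splits its points is not shown in this tutorial.

```python
import random


def sample_edges(edges, n_nodes, n_samples, seed=42):
    """Return up to `n_samples` existing edges and as many non-existing ones."""
    rng = random.Random(seed)
    existing = set(edges)
    positives = rng.sample(sorted(existing), min(n_samples, len(existing)))

    negatives = set()
    max_negatives = n_nodes * n_nodes - len(existing)
    target = min(n_samples, max_negatives)
    # Rejection sampling: cheap when the graph is sparse.
    while len(negatives) < target:
        candidate = (rng.randrange(n_nodes), rng.randrange(n_nodes))
        if candidate not in existing:
            negatives.add(candidate)
    return positives, sorted(negatives)
```

Because the number of sampled edges is fixed, the cost of the decomposition step does not grow with the size of the graph.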


In [10]:
%%time
vis.plot_positive_and_negative_edges()

OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'


CPU times: user 9min 35s, sys: 12.2 s, total: 9min 47s
Wall time: 1min 36s
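
For intuition on how a rotating animation can be obtained from a 4-dimensional decomposition, the sketch below rotates each point a little further in every frame and keeps two coordinates for plotting. This is a generic illustration under our own assumptions: the function names and the choice of rotation planes are hypothetical, and the way GRAPE renders its frames is not described in this tutorial.

```python
import math


def rotate_in_plane(point, axis_a, axis_b, angle):
    """Rotate `point` by `angle` radians in the plane of two coordinate axes."""
    rotated = list(point)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated[axis_a] = cos_a * point[axis_a] - sin_a * point[axis_b]
    rotated[axis_b] = sin_a * point[axis_a] + cos_a * point[axis_b]
    return rotated


def animation_frames(points, n_frames):
    """Yield one 2D projection of the 4D `points` per frame of a full turn."""
    for frame in range(n_frames):
        angle = 2.0 * math.pi * frame / n_frames
        # Rotate in the (0, 2) and (1, 3) planes, then keep the first two axes.
        yield [
            rotate_in_plane(rotate_in_plane(p, 0, 2, angle), 1, 3, angle)[:2]
            for p in points
        ]
```

Each yielded list could be drawn as one scatter plot, and the sequence of plots merged into a video.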
