# Tutorials generator
This Jupyter notebook contains the tooling to generate tutorials for all models available in the GRAPE library.

### Generation of tutorials on node embedding

In [1]:
import nbformat as nbf
from grape import get_available_models_for_node_embedding
from grape.datasets import get_all_available_graphs_dataframe
from grape.utils import format_list
from grape.utils import AbstractEmbeddingModel
from tqdm.auto import tqdm
import subprocess

embedding_models_df = get_available_models_for_node_embedding()
available_graphs = get_all_available_graphs_dataframe(verbose=False)
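
The generation loop in the next cell opens every tutorial with a note whose wording depends on how many libraries implement the model. The snippet below is a minimal, standalone sketch of that selection logic, not the code the notebook runs: `join_names` is a hypothetical stand-in for `format_list` (whose exact output is not shown here), and `LibraryA` and `LibraryB` are placeholder backend names.

```python
def join_names(names):
    """Hypothetical stand-in for a list formatter: 'a', 'a and b', 'a, b and c'."""
    names = list(names)
    if len(names) <= 1:
        return "".join(names)
    return "{} and {}".format(", ".join(names[:-1]), names[-1])


def implementation_note(model_name, library_names):
    """Return the sentence describing how many backends implement a model."""
    library_names = list(library_names)
    if len(library_names) > 1:
        template = (
            "Of note, GRAPE provides multiple implementations of the {model_name} "
            "model, using as backend the {libraries} libraries, "
            "and we will show how to use all of them."
        )
    else:
        template = (
            "Of note, GRAPE provides a single implementation of the {model_name} "
            "model, using as backend the {libraries} library."
        )
    return template.format(
        model_name=model_name,
        libraries=join_names(library_names)
    )


# Placeholder backend names, used only to exercise both branches.
print(implementation_note("DeepWalk SkipGram", ["LibraryA"]))
print(implementation_note("DeepWalk SkipGram", ["LibraryA", "LibraryB"]))
```

Keeping the wording in a small function of this kind would let both generation loops share it instead of duplicating the two templates.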

In [None]:
for model_name, data in tqdm(embedding_models_df.groupby("model_name")):
    
    if data.requires_edge_types.any():
        continue
        
    if data.shape[0] > 1:
        note_on_multiple_implementations = """\
Of note, GRAPE provides multiple implementations of the {model_name} 
model, using as backend the {libraries} libraries, and we will show how to use all of them.""".format(
            model_name=model_name,
            libraries=format_list(data.library_name)
        )
    else:
        note_on_multiple_implementations = """\
Of note, GRAPE provides a single implementation of the {model_name} 
model, using as backend the {libraries} library.""".format(
            model_name=model_name,
            libraries=format_list(data.library_name)
        )
    
    nb = nbf.v4.new_notebook()
    
    title = "Using {model_name} to embed Cora".format(
        model_name=model_name
    )
    
    cells = []
    
    cells.append(nbf.v4.new_markdown_cell("""\
# {title}
In this tutorial, we will see how to use the {model_name}
node embedding method to compute an embedding of the Cora citation graph.
We will retrieve the graph, compute its report, and then get visualizations:
single-plot, complete and animated.

We will show how to get visualizations both using the most immediate
approach, simply using the `GraphVisualizer` object with the embedding method
name, and the slightly longer approach creating the embedding model and
computing the embedding.

{note_on_multiple_implementations}""".format(
        title=title,
        model_name=model_name,
        note_on_multiple_implementations=note_on_multiple_implementations
    )))
        
    cells.append(nbf.v4.new_markdown_cell("""\
## Installing the library
First of all, to install GRAPE just run as usual:

```bash
pip install grape
```"""
    ))
    
    cells.append(nbf.v4.new_code_cell("""!pip install -qU grape"""))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Retrieving the graph
To retrieve the Cora graph from the LINQS laboratory graph repository
we run the following line of code:"""))
    
    cells.append(nbf.v4.new_code_cell("""\
from grape.datasets.linqs import Cora
complete_graph = Cora()"""))

    cells.append(nbf.v4.new_markdown_cell("""\
Do note that we support the retrieval of many other graphs, {number_of_graphs} at the time of writing,
and you just need to import the graph you desire from the `datasets` submodule.

Remember that you can peruse the complete list of available graphs by running:""".format(
        number_of_graphs=available_graphs.shape[0]
    )))
    
    cells.append(nbf.v4.new_code_cell("""\
from grape.datasets import get_all_available_graphs_dataframe
all_available_graphs_dataframe = get_all_available_graphs_dataframe(verbose=False)
all_available_graphs_dataframe"""))
    
    cells.append(nbf.v4.new_markdown_cell("""\
Now, in the retrieved version of the Cora graph we find both nodes representing the
papers and nodes representing words present in the paper abstracts.

In most benchmarks, only the topology represented by the paper nodes is used and therefore
we run a filter that splits the graph into the topology of exclusively the paper nodes and
a dataframe containing the one-hot encoded word vectors."""))
    
    cells.append(nbf.v4.new_code_cell("""\
from grape.datasets.linqs import get_words_data
graph, node_features = get_words_data(complete_graph)"""))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Graph report
Let's proceed to compute the graph report, which includes both the ordinary topological
information, such as the number of edges, nodes, connected components
and singletons, and many topological oddities.

We will see that Cora, even though it is commonly used as a node-label prediction
benchmark in the literature, is filled with topological oddities:
holdouts that put nodes belonging to some of these oddities in the test set may
achieve significantly higher performance than less lucky holdouts, and models that
exploit these oddities may appear to perform extremely well while actually
achieving nothing but overfitting to these graph peculiarities."""
    ))
    
    cells.append(nbf.v4.new_code_cell("""\
graph"""
    ))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Graph visualizations
In GRAPE we support the visualization of a wide variety of graph properties,
which may be visualized through PCA, TSNE or UMAP decompositions, either as a single plot
per property or as a composite plot. Furthermore, single plots may be animated through
a high-dimensional rotation that allows for a more comprehensive view of the embedding
decompositions."""
    ))
    
    cells.append(nbf.v4.new_markdown_cell("""\
### Computing embedding and plotting the node types
We start by showing how to use the `GraphVisualizer` object to compute, in a few lines of code,
an embedding with one of the provided node embedding methods, and to use the TSNE decomposition
to display the node types both in 2D and 4D. The 4D animation is achieved by executing a 4D decomposition,
rotating the decomposition in 4D and then plotting the first 3 dimensions in a 3-dimensional plot."""))
    
    for library_name in data.library_name:
        if data.shape[0] > 1:
            cells.append(nbf.v4.new_markdown_cell("""\
#### Visualization using the {} library""".format(library_name)
            ))
        model_class = AbstractEmbeddingModel.get_model_from_library(
            model_name=model_name,
            library_name=library_name
        )
        model_class_name = model_class.__name__
        cells.append(nbf.v4.new_code_cell("""\
from grape import GraphVisualizer
visualizer = GraphVisualizer(graph)

# You can either provide the model name
visualizer.fit_nodes("{model_name}", library_name="{library_name}")

# Or provide a precomputed embedding (here commented)
#
# visualizer.fit_nodes(numpy_array_with_embedding)
# visualizer.fit_nodes(pandas_dataframe_with_embedding)
#
# or alternatively provide the model to be used:
#
# from grape.embedders import {model_class_name}
# visualizer.fit_nodes({model_class_name}())
#
# And now we can visualize the node types:
visualizer.plot_node_types()""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )
        ))
        cells.append(nbf.v4.new_markdown_cell("""\
For the 4D animation we will be rendering a webm,
but most video formats are supported. Analogously to the bidimensional
image, we run:"""
        ))
        cells.append(nbf.v4.new_code_cell("""\
visualizer = GraphVisualizer(
    graph,
    n_components=4,
    rotate=True
)
visualizer.fit_nodes("{model_name}", library_name="{library_name}")
visualizer.plot_node_types()""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )
        ))
        cells.append(nbf.v4.new_markdown_cell("""\
To obtain instead the complete plot with all of the properties visualized,
you may run the following one-liner:"""
        ))
        cells.append(nbf.v4.new_code_cell("""\
GraphVisualizer(graph).fit_and_plot_all("{model_name}", library_name="{library_name}")""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Embedding the graph
Compute the embedding of the Cora graph using the {model_name} model.""".format(
        model_name=model_name,
    )
    ))
    for library_name in data.library_name:
        if data.shape[0] > 1:
            cells.append(nbf.v4.new_markdown_cell("""\
### Embedding using the {} library implementation""".format(library_name)
            ))
        
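        # Resolve the class for this specific library: reusing the value left
        # over from the visualization loop would always point to the last library.
        model_class_name = AbstractEmbeddingModel.get_model_from_library(
            model_name=model_name,
            library_name=library_name
        ).__name__
        cells.append(nbf.v4.new_code_cell("""\
from grape.embedders import {model_class_name}
embedding = {model_class_name}().fit_transform(graph)""".format(
            model_class_name=model_class_name
        )))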
    
    nb['cells'] = cells
    file_name = '{title}.ipynb'.format(
        title=title.replace(" ", "_")
    )

    with open(file_name, 'w') as f:
        nbf.write(nb, f)
    
    subprocess.run(
        "jupyter nbconvert --to notebook --execute {} --inplace".format(file_name),
        shell=True
    )

  0%|          | 0/54 [00:00<?, ?it/s]

[NbConvertApp] Converting notebook Using_BoostNE_to_embed_Cora.ipynb to notebook
OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'
[NbConvertApp] Writing 1401703 bytes to Using_BoostNE_to_embed_Cora.ipynb
[NbConvertApp] Converting notebook Using_DeepWalk_CBOW_to_embed_Cora.ipynb to notebook
OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'
[NbConvertApp] Writing 1554901 bytes to Using_DeepWalk_CBOW_to_embed_Cora.ipynb
[NbConvertApp] Converting notebook Using_DeepWalk_GloVe_to_embed_Cora.ipynb to notebook
OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'
[NbConvertApp] Writing 1625349 bytes to Using_DeepWalk_GloVe_to_embed_Cora.ipynb
[NbConvertApp] Converting notebook Using_DeepWalk_SkipGram_to_embed_Cora.ipynb to notebook
OpenCV: FFMPEG: tag 0x30387076/'vp80' is not supported with codec id 139 and format 'webm / WebM'
OpenCV: FFMPEG: tag 0x3038
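
The cell above launches `jupyter nbconvert` through a shell string and does not inspect the exit status, so a tutorial that fails to execute would go unnoticed. The snippet below is a minimal sketch of one alternative, assuming the same command-line flags used in the loop: it passes the command as an argument list, so the file name never goes through the shell, and it reports whether the run succeeded. The helper names `notebook_file_name`, `nbconvert_command` and `execute_notebook` are hypothetical and are not part of GRAPE or nbconvert.

```python
import subprocess


def notebook_file_name(title):
    """Derive the notebook file name from its title, as the loop above does."""
    return "{}.ipynb".format(title.replace(" ", "_"))


def nbconvert_command(file_name):
    """Build the nbconvert invocation as an argument list (no shell involved)."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", file_name,
        "--inplace",
    ]


def execute_notebook(file_name):
    """Execute the notebook in place; True when nbconvert exits with status 0."""
    completed = subprocess.run(nbconvert_command(file_name))
    return completed.returncode == 0


# Only build and show the command here; nothing is executed.
file_name = notebook_file_name("Using BoostNE to embed Cora")
print(nbconvert_command(file_name))
```

Inside the loop, the boolean returned by `execute_notebook(file_name)` could be used to collect the names of the tutorials that failed to run.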

In [None]:
for model_name, data in tqdm(embedding_models_df.groupby("model_name")):
    
    if data.shape[0] > 1:
        note_on_multiple_implementations = """\
Of note, GRAPE provides multiple implementations of the {model_name} 
model, using as backend the {libraries} libraries, and we will show how to use all of them.""".format(
            model_name=model_name,
            libraries=format_list(data.library_name)
        )
    else:
        note_on_multiple_implementations = """\
Of note, GRAPE provides a single implementation of the {model_name} 
model, using as backend the {libraries} library.""".format(
            model_name=model_name,
            libraries=format_list(data.library_name)
        )
    
    nb = nbf.v4.new_notebook()
    
    title = "Using {model_name} to embed STRING Species Tree".format(
        model_name=model_name
    )
    
    cells = []
    
    cells.append(nbf.v4.new_markdown_cell("""\
# {title}
In this tutorial, we will see how to use the {model_name}
node embedding method to compute an embedding of the STRING Species Tree.
We will retrieve the graph, compute its report, and then get visualizations:
single-plot, complete and animated.

We will show how to get visualizations both using the most immediate
approach, simply using the `GraphVisualizer` object with the embedding method
name, and the slightly longer approach creating the embedding model and
computing the embedding.

{note_on_multiple_implementations}""".format(
        title=title,
        model_name=model_name,
        note_on_multiple_implementations=note_on_multiple_implementations
    )))
        
    cells.append(nbf.v4.new_markdown_cell("""\
## Installing the library
First of all, to install GRAPE just run as usual:

```bash
pip install grape
```"""
    ))
    
    cells.append(nbf.v4.new_code_cell("""!pip install -qU grape"""))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Retrieving the graph
To retrieve the STRING Species Tree from the STRING repository
we run the following line of code:"""))
    
    cells.append(nbf.v4.new_code_cell("""\
from grape.datasets.string import SpeciesTree
graph = SpeciesTree()"""))

    cells.append(nbf.v4.new_markdown_cell("""\
Do note that we support the retrieval of many other graphs, {number_of_graphs} at the time of writing,
and you just need to import the graph you desire from the `datasets` submodule.

Remember that you can peruse the complete list of available graphs by running:""".format(
        number_of_graphs=available_graphs.shape[0]
    )))
    
    cells.append(nbf.v4.new_code_cell("""\
from grape.datasets import get_all_available_graphs_dataframe
all_available_graphs_dataframe = get_all_available_graphs_dataframe(verbose=False)
all_available_graphs_dataframe"""))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Graph report
Let's proceed to compute the graph report, which includes the ordinary topological
information, such as the number of edges and the number of nodes."""
    ))
    
    cells.append(nbf.v4.new_code_cell("""\
graph"""
    ))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Graph visualizations
In GRAPE we support the visualization of a wide variety of graph properties,
which may be visualized through PCA, TSNE or UMAP decompositions, either as a single plot
per property or as a composite plot. Furthermore, single plots may be animated through
a high-dimensional rotation that allows for a more comprehensive view of the embedding
decompositions."""
    ))
    
    cells.append(nbf.v4.new_markdown_cell("""\
### Computing embedding and plotting the node types
We start by showing how to use the `GraphVisualizer` object to compute, in a few lines of code,
an embedding with one of the provided node embedding methods, and to use the TSNE decomposition
to display the node types both in 2D and 4D. The 4D animation is achieved by executing a 4D decomposition,
rotating the decomposition in 4D and then plotting the first 3 dimensions in a 3-dimensional plot."""))
    
    for library_name in data.library_name:
        if data.shape[0] > 1:
            cells.append(nbf.v4.new_markdown_cell("""\
#### Visualization using the {} library""".format(library_name)
            ))
        model_class = AbstractEmbeddingModel.get_model_from_library(
            model_name=model_name,
            library_name=library_name
        )
        model_class_name = model_class.__name__
        cells.append(nbf.v4.new_code_cell("""\
from grape import GraphVisualizer
visualizer = GraphVisualizer(graph)

# You can either provide the model name
visualizer.fit_nodes("{model_name}", library_name="{library_name}")

# Or provide a precomputed embedding (here commented)
#
# visualizer.fit_nodes(numpy_array_with_embedding)
# visualizer.fit_nodes(pandas_dataframe_with_embedding)
#
# or alternatively provide the model to be used:
#
# from grape.embedders import {model_class_name}
# visualizer.fit_nodes({model_class_name}())
#
# And now we can visualize the node types:
visualizer.plot_node_types()""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )
        ))
        cells.append(nbf.v4.new_markdown_cell("""\
For the 4D animation we will be rendering a webm,
but most video formats are supported. Analogously to the bidimensional
image, we run:"""
        ))
        cells.append(nbf.v4.new_code_cell("""\
visualizer = GraphVisualizer(
    graph,
    n_components=4,
    rotate=True
)
visualizer.fit_nodes("{model_name}", library_name="{library_name}")
visualizer.plot_node_types()""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )
        ))
        cells.append(nbf.v4.new_markdown_cell("""\
To obtain instead the complete plot with all of the properties visualized,
you may run the following one-liner:"""
        ))
        cells.append(nbf.v4.new_code_cell("""\
GraphVisualizer(graph).fit_and_plot_all("{model_name}", library_name="{library_name}")""".format(
            model_name=model_name,
            model_class_name=model_class_name,
            library_name=library_name
        )))
    
    cells.append(nbf.v4.new_markdown_cell("""\
## Embedding the graph
Compute the embedding of the STRING Species Tree using the {model_name} model.""".format(
        model_name=model_name,
    )
    ))
    for library_name in data.library_name:
        if data.shape[0] > 1:
            cells.append(nbf.v4.new_markdown_cell("""\
### Embedding using the {} library implementation""".format(library_name)
            ))
        
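        # Resolve the class for this specific library: reusing the value left
        # over from the visualization loop would always point to the last library.
        model_class_name = AbstractEmbeddingModel.get_model_from_library(
            model_name=model_name,
            library_name=library_name
        ).__name__
        cells.append(nbf.v4.new_code_cell("""\
from grape.embedders import {model_class_name}
embedding = {model_class_name}().fit_transform(graph)""".format(
            model_class_name=model_class_name
        )))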
    
    nb['cells'] = cells
    file_name = '{title}.ipynb'.format(
        title=title.replace(" ", "_")
    )

    with open(file_name, 'w') as f:
        nbf.write(nb, f)
    
    subprocess.run(
        "jupyter nbconvert --to notebook --execute {} --inplace".format(file_name),
        shell=True
    )