# The Everything Bagel GNN
Welcome to the comprehensive tutorial on "The Everything Bagel GNN," a cutting-edge approach to multimodal graph neural networks (GNNs) empowered by the [GRAPE library](https://github.com/AnacletoLAB/grape). In this tutorial, we will embark on a fascinating journey, exploring the vast capabilities of this state-of-the-art GNN architecture implemented in Rust and TensorFlow, with Python bindings.

Graphs are pervasive in various domains, representing intricate relationships and connections between entities. However, traditional GNNs often struggle to effectively capture the richness and complexity of multimodal graphs that incorporate diverse node types, edge types, node embeddings, and node features.

The Everything Bagel GNN offers a comprehensive solution to leverage all available modalities. With this powerful general GNN architecture, you can seamlessly integrate different convolutional kernels, including right Laplacian, left Laplacian, and transposed Laplacian among others, enabling customized graph convolutions tailored to your specific analysis needs. [Learn more about using multiple kernels in GNNs in this paper](https://arxiv.org/pdf/2305.10498.pdf).

But that's just the beginning! We go beyond convolutional kernels and dive into the realm of multimodality. The Everything Bagel GNN enables the fusion of node type, edge type, and node embedding information, or node type, edge type, and node features, providing a holistic view of the graph's intricate relationships and properties. This comprehensive approach empowers you to extract deeper insights and uncover hidden patterns within your graph data.

Another thing that sets the Everything Bagel GNN apart is its integration of subgraph sketching-based edge features. Leveraging GRAPE, you can efficiently incorporate subgraph sketching techniques to capture rich structural information in the form of edge features. Doing so enhances the representation power of your GNN, which is particularly useful for link prediction. [Learn more about graph sketching in the original paper](https://openreview.net/pdf?id=m1oqEOAozQU) and [find the related tutorial here](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Link%20prediction%20models%20using%20subgraph%20sketching%20using%20GRAPE.ipynb).


## Installing all requirements
To get started with the tutorial and use GRAPE, we need to install the necessary requirements. Please run the following command in your terminal or notebook cell to install the required packages:

```bash
pip install grape torch transformers tensorflow silence_tensorflow -qU
```

In [None]:
!pip install grape torch transformers tensorflow silence_tensorflow -qU

We use [Silence tensorflow](https://github.com/LucaCappelletti94/silence_tensorflow) to suppress the verbose and rather unhelpful warnings from TensorFlow.

In [None]:
import silence_tensorflow.auto

I want to only use the CPU for this tutorial, as I do not have CUDA properly configured on this machine.

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

Make sure that the versions of GRAPE and TensorFlow you have installed are compatible with the ones below:

In [None]:
from grape import print_version
print_version()

## Retrieving the data
Throughout this tutorial, we will be working with two knowledge graphs available from KGOBO and KGHub. For both graphs, we are going to use the associated [BioLink](https://biolink.github.io/biolink-model/) metadata as features for the node types and edge types - specifically, we are going to use the pre-computed BERT and DeepWalk embeddings associated with the BioLink descriptions and topology, respectively. Subsequently, we are going to compute the Okapi BM25 weighted SciBERT embeddings of the node descriptions.

### Human Phenotype Ontology
The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities and their related annotations. It provides a structured and comprehensive representation of human phenotypes, facilitating the analysis and interpretation of genetic and genomic data. [Learn more about this data here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2668030/).

HPO is by far the smaller of the two graphs, and we will use it as an example for the most complex version of the Everything Bagel. Specifically, we are going to consider its directionality, and use several graph convolution kernels to properly model it in the neural network.

In [None]:
from grape.datasets.kgobo import HP
hpo = HP(version='2023-01-27', directed=True)

We compute the graph report - please note that directed graphs have a shorter report, and you can get the more extensive one by converting the graph to an undirected graph. This happens because several algorithms become much less efficient when working on directed graphs.

In [None]:
hpo

Since there are singletons in the graph, we drop them. These nodes are most likely deprecated entities from previous versions. Also, since TensorFlow's sparse tensor representation does not support multigraphs, we need to drop the parallel edges.

In [None]:
filtered_hpo = hpo.remove_disconnected_nodes()\
    .remove_parallel_edges()
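
To make the two cleaning steps concrete, here is a minimal pure-Python sketch on a toy edge list. It is not how GRAPE implements them: the helper names `drop_parallel_edges` and `connected_nodes` are hypothetical, edge types are ignored, and a node with only a self-loop is treated as connected, which may differ from GRAPE's handling.

```python
from typing import Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def drop_parallel_edges(edges: List[Edge]) -> List[Edge]:
    """Keep only the first occurrence of every (source, destination) pair."""
    seen: Set[Edge] = set()
    unique: List[Edge] = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            unique.append(edge)
    return unique


def connected_nodes(edges: List[Edge]) -> Set[Hashable]:
    """Return the nodes that appear in at least one edge (the non-singletons)."""
    return {node for edge in edges for node in edge}


# Toy graph: "d" is a singleton and ("a", "b") appears twice.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "b"), ("b", "c")]

clean_edges = drop_parallel_edges(edges)
kept_nodes = [node for node in nodes if node in connected_nodes(clean_edges)]
print(clean_edges, kept_nodes)
```

Note that in a directed graph `("a", "b")` and `("b", "a")` are distinct edges, so the sketch keeps both.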

We re-run the report and observe whether everything looks nominal:

In [None]:
filtered_hpo

### KGCOVID19
KGCOVID19 is a knowledge graph that aggregates and integrates various data sources related to the COVID-19 pandemic. [Learn more about this data here](https://www.cell.com/patterns/fulltext/S2666-3899(20)30203-8?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389920302038%3Fshowall%3Dtrue). KGCOVID19 is much larger than HPO and requires more attention to scalability. To make sure we can work on this graph, we will avoid trainable graph convolutions and instead, [as done in this paper](https://openreview.net/pdf?id=m1oqEOAozQU), pre-compute the convolutions using the left Laplacian. Furthermore, to reduce the dimensionality of these features, we use a [PCA decomposition](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html).

In [None]:
%%time
from grape.datasets.kghub import KGCOVID19
kgcovid19 = KGCOVID19(version="20230102")

We compute the report of the KGCOVID19 graph:

In [None]:
kgcovid19

The graph comes with several topological oddities which we need to clean up before running some graph ML on it. For starters, we remove all singletons and smaller components, keeping only the largest one. Next, we remove parallel edges, as TensorFlow's sparse tensor representation does not support multigraphs. Finally, we remove the dendritic trees, i.e. all portions of the graph that are as sparse as a tree.

In [None]:
%%time
filtered_kgcovid19 = kgcovid19.remove_components(top_k_components=1)\
    .remove_parallel_edges()\
    .remove_dendritic_trees()
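
One way to picture the removal of tree-like portions is to peel degree-one nodes repeatedly until none are left. The sketch below does this on a toy undirected edge list; the function name `peel_tree_like_parts` is hypothetical, and whether GRAPE's `remove_dendritic_trees` follows exactly this procedure is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def peel_tree_like_parts(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Set[Hashable]:
    """Return the nodes that survive after repeatedly removing leaves."""
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # self-loops do not affect the peeling
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    leaves = [node for node, adj in neighbours.items() if len(adj) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(neighbours[leaf]) != 1:
            continue  # already removed in a previous step
        (parent,) = neighbours[leaf]
        neighbours[parent].discard(leaf)
        neighbours[leaf].clear()
        if len(neighbours[parent]) == 1:
            leaves.append(parent)  # the parent has just become a leaf

    return {node for node, adj in neighbours.items() if adj}


# A triangle (a, b, c) with a two-node tail (d, e) hanging from "c".
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
core = peel_tree_like_parts(toy_edges)
print(sorted(core))
```

Only the triangle survives, since the tail carries no cycles.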

We display the report associated to the filtered version of KGCOVID19:

In [None]:
filtered_kgcovid19

## Some helper functions
We are going to use the pre-computed BioLink [topological](https://github.com/LucaCappelletti94/kg-biolink) and [SciBERT](https://github.com/LucaCappelletti94/biolink_embedding) embeddings. As often happens with standards, there is some misalignment between the version used in KGCOVID19 and the one currently available from the BioLink reference, so we need to normalize the two, hence the need for the two following helper functions.

In [None]:
from typing import List
import pandas as pd
from grape import Graph

def get_node_type_features(graph: Graph, df: pd.DataFrame) -> pd.DataFrame:
    """Return the node type features associated to the provided graph from the provided dataframe.
    
    Parameters
    ---------------
    graph: Graph
        The graph whose node types are to be normalized and queried.
    df: pd.DataFrame
        The dataframe from which to get the node type features.
    """
    remap = {
        "rna": "RNAProduct",
        "assay": "procedure"
    }
    df = df.copy()
    df.index = [
        "".join([
            term.lower()
            for term in term.split(" ")
        ])
        for term in df.index
    ]
    df = df.loc[[
        remap.get(
            node_type_name.split(":")[1].lower(),
            node_type_name.split(":")[1].lower()
        ).lower()
        for node_type_name in graph.get_unique_node_type_names()
    ]]
    df.index = graph.get_unique_node_type_names()
    return df

In [None]:
from typing import List
import pandas as pd
from grape import Graph

def get_edge_type_features(graph: Graph, df: pd.DataFrame) -> pd.DataFrame:
    """Return the edge type features associated to the provided graph from the provided dataframe.
    
    Parameters
    ---------------
    graph: Graph
        The graph whose edge types are to be normalized and queried.
    df: pd.DataFrame
        The dataframe from which to get the edge type features.
    """
    remap = {
        "positively_regulates": "increases_amount_or_activity_of",
        "negatively_regulates": "decreases_amount_or_activity_of",
        "positivelyregulates": "increases_amount_or_activity_of",
        "negativelyregulates": "decreases_amount_or_activity_of",
        "inverseof": "opposite_of",
        "subpropertyof": "member_of",
        "affectstransportof": "affects",
        "affects_transport_of": "affects",
        "increases_degradation_of": "decreased_amount_in",
        "affects_localization_of": "affects",
        "negativelyregulateprocesstoprocess": "decreases_amount_or_activity_of",
        "molecularly_interacts_with": "physically_interacts_with",
        "regulateprocesstoprocess": "affects",
    }
    df = df.copy()
    df.index = [
        "_".join([
            term.lower()
            for term in term.split(" ")
        ])
        for term in df.index
    ]
    df = df.loc[[
        remap.get(edge_type_name.split(":")[1].lower(), edge_type_name.split(":")[1].lower()).lower()
        for edge_type_name in graph.get_unique_edge_type_names()
    ]]
    df.index = graph.get_unique_edge_type_names()
    return df
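
The name matching performed by the two helpers can be isolated in a small standalone sketch. The function names `normalize_label` and `normalize_curie` are hypothetical, and the row labels used below (such as `"RNA product"`) are illustrative guesses at what the feature tables contain, not values taken from the actual files.

```python
from typing import Dict, Optional


def normalize_label(label: str, separator: str) -> str:
    """Lowercase a space-separated label and rejoin it with the given separator."""
    return separator.join(term.lower() for term in label.split(" "))


def normalize_curie(curie: str, remap: Optional[Dict[str, str]] = None) -> str:
    """Strip the prefix from a CURIE such as 'biolink:RNA' and apply the remapping."""
    local_name = curie.split(":")[1].lower()
    return (remap or {}).get(local_name, local_name).lower()


# Node types: the table index is joined without a separator.
node_key = normalize_curie("biolink:RNA", {"rna": "RNAProduct"})
node_row = normalize_label("RNA product", separator="")

# Edge types: the table index is joined with underscores.
edge_key = normalize_curie(
    "biolink:molecularly_interacts_with",
    {"molecularly_interacts_with": "physically_interacts_with"},
)
edge_row = normalize_label("physically interacts with", separator="_")

print(node_key, node_row, edge_key, edge_row)
```

When the key derived from the graph and the key derived from the table coincide, `df.loc` can retrieve the matching row; a missing remap entry would instead raise a `KeyError`.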

We retrieve the precomputed BioLink [SciBERT](https://arxiv.org/pdf/1903.10676.pdf) embedding:

In [None]:
%%time
import pandas as pd

biolink_bert = pd.read_csv(
    "https://github.com/LucaCappelletti94/biolink_embedding/raw/main/"
    "biolink_3.4.3_allenai_scibert_scivocab_uncased.csv.gz",
    compression='gzip',
    index_col=[0]
)

biolink_bert.head()

We query the SciBERT node type and edge type features of HPO and KGCOVID19 - we can do this because the two graphs have node types and edge types that adhere to the BioLink standard:

In [None]:
%%time
node_type_bert_embedding_hpo = get_node_type_features(filtered_hpo, biolink_bert)
node_type_bert_embedding_hpo.head()

In [None]:
%%time
node_type_bert_embedding_kgcovid19 = get_node_type_features(filtered_kgcovid19, biolink_bert)
node_type_bert_embedding_kgcovid19.head()

In [None]:
%%time
edge_type_bert_embedding_hpo = get_edge_type_features(filtered_hpo, biolink_bert)
edge_type_bert_embedding_hpo.head()

In [None]:
%%time
edge_type_bert_embedding_kgcovid19 = get_edge_type_features(filtered_kgcovid19, biolink_bert)
edge_type_bert_embedding_kgcovid19.head()

Since BioLink describes the topological relationship between the various BioLink classes, we can [build a graph out of it](https://github.com/LucaCappelletti94/kg-biolink). Since we can build a graph, we can embed it. Since we can embed it, we have, in addition to the textual features, also topological features for all node types and edge types - specifically, we employed [DeepWalk](https://arxiv.org/pdf/1403.6652.pdf) to embed the graph.

As done earlier, we will query the dataframe to get the precomputed DeepWalk topological features for the node types and edge types of the two graphs.

In [None]:
%%time
import pandas as pd

biolink_deepwalk = pd.read_csv(
    "https://github.com/LucaCappelletti94/kg-biolink/raw/main/"
    "kg_biolink_deepwalk_center.csv.gz",
    compression='gzip',
    index_col=[0]
)

biolink_deepwalk.head()

In [None]:
%%time
node_type_deepwalk_embedding_hpo = get_node_type_features(filtered_hpo, biolink_deepwalk)
node_type_deepwalk_embedding_hpo.head()

In [None]:
%%time
node_type_deepwalk_embedding_kgcovid19 = get_node_type_features(filtered_kgcovid19, biolink_deepwalk)
node_type_deepwalk_embedding_kgcovid19.head()

In [None]:
%%time
edge_type_deepwalk_embedding_hpo = get_edge_type_features(filtered_hpo, biolink_deepwalk)
edge_type_deepwalk_embedding_hpo.head()

In [None]:
%%time
edge_type_deepwalk_embedding_kgcovid19 = get_edge_type_features(filtered_kgcovid19, biolink_deepwalk)
edge_type_deepwalk_embedding_kgcovid19.head()

## Okapi BM25 SciBERT node features
We now proceed to compute the [Okapi BM25 SciBERT node features](https://github.com/LucaCappelletti94/pubmed_embedding/blob/main/BM25_weighted_BERT_based_embedding_of_PubMed.pdf), which are analogous to the precomputed BioLink features. Here I will show you step by step how to compute them and, as you will see, it is quite easy using the pipeline I have prepared for you.

First, we make sure that there is indeed textual data in the file associated with the nodes. In the case of both KGCOVID19 and HPO, we have decent descriptions of what the nodes represent - the more extensive, the better. We proceed to run an Okapi BM25 TF-IDF ranking function to weight the single tokens in each of the rows, and we use the weights to compute a weighted average of the pretrained SciBERT embeddings associated with each of the tokens:

In [None]:
%%time
import pandas as pd

kgcovid19_node_path = "/bfd/graphs/kghub/KGCOVID19/20230102/kg-covid-19/merged-kg_nodes.tsv"
pd.read_csv(kgcovid19_node_path, nrows=20, sep="\t").head()

In [None]:
%%time
import pandas as pd

hpo_node_path = "/bfd/graphs/kgobo/HP/2023-01-27/hp_kgx_tsv/hp_kgx_tsv_nodes.tsv"
pd.read_csv(hpo_node_path, nrows=20, sep="\t").head()

In [None]:
%%time
from grape.datasets import get_okapi_tfidf_weighted_textual_embedding

node_bert_embedding_kgcovid19 = pd.DataFrame(
    get_okapi_tfidf_weighted_textual_embedding(
        kgcovid19_node_path,
        pretrained_model_name_or_path="allenai/scibert_scivocab_uncased"
    ),
    # Since there is a row for each of the graph nodes, we need to use the complete
    # graph and not the filtered version of the graph. Afterwards, we query for the
    # subset of nodes which we actually care about.
    index=kgcovid19.get_node_names()
).loc[filtered_kgcovid19.get_node_names()]
node_bert_embedding_kgcovid19.head()

In [None]:
%%time
from grape.datasets import get_okapi_tfidf_weighted_textual_embedding

node_bert_embedding_hpo = pd.DataFrame(
    get_okapi_tfidf_weighted_textual_embedding(
        hpo_node_path,
        pretrained_model_name_or_path="allenai/scibert_scivocab_uncased"
    ),
    # Since there is a row for each of the graph nodes, we need to use the complete
    # graph and not the filtered version of the graph. Afterwards, we query for the
    # subset of nodes which we actually care about.
    index=hpo.get_node_names()
).loc[filtered_hpo.get_node_names()]
node_bert_embedding_hpo.head()
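
The weighting scheme described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not GRAPE's implementation: the function names are hypothetical, the BM25 variant and the constants `k1=1.5` and `b=0.75` are common defaults rather than values confirmed by the library, and random vectors stand in for the SciBERT token embeddings.

```python
import math
from collections import Counter
from typing import Dict, List

import numpy as np


def bm25_weights(
    document: List[str], corpus: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> Dict[str, float]:
    """Return the Okapi BM25 weight of each distinct token of the document."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    weights = {}
    for token, freq in Counter(document).items():
        doc_freq = sum(token in doc for doc in corpus)
        idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        length_norm = k1 * (1.0 - b + b * len(document) / avg_len)
        weights[token] = idf * freq * (k1 + 1.0) / (freq + length_norm)
    return weights


def weighted_document_embedding(
    document: List[str], corpus: List[List[str]], token_vectors: Dict[str, np.ndarray]
) -> np.ndarray:
    """Average the token vectors of a document using BM25 weights."""
    weights = bm25_weights(document, corpus)
    total = sum(weights.values())
    return sum(weights[t] * token_vectors[t] for t in weights) / total


# Two toy "node descriptions", already tokenized.
corpus = [["abnormal", "heart"], ["abnormal", "kidney"]]
rng = np.random.default_rng(0)
# Random stand-ins for pretrained token embeddings.
token_vectors = {t: rng.normal(size=4) for t in ["abnormal", "heart", "kidney"]}

weights = bm25_weights(corpus[0], corpus)
embedding = weighted_document_embedding(corpus[0], corpus, token_vectors)
print(weights, embedding.shape)
```

The token "abnormal" appears in every description and therefore receives a lower weight than "heart", so the document embedding leans towards the more distinctive token.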

### Graph visualization
We use the graph visualization toolkit available from GRAPE to see whether the computed node features are meaningful for the current graph topologies:

In [None]:
from grape import GraphVisualizer

GraphVisualizer(filtered_hpo).fit_and_plot_all(node_bert_embedding_hpo)

In [None]:
from grape import GraphVisualizer

GraphVisualizer(filtered_kgcovid19).fit_and_plot_all(node_bert_embedding_kgcovid19)

### PCA dimensionality reduction
Since the BERT features have a rather large dimension (768) and the KGCOVID19 graph is not a small graph, we reduce the dimensionality of the features using [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) so as to make it feasible to use these features even on more modest hardware. We will reduce the dimensionality of the features down to `50`. This step is not necessary if your hardware allows you to use larger features.


In [None]:
%%time
from sklearn.decomposition import PCA

pca = PCA(n_components=50)
node_bert_embedding_kgcovid19_pca = pd.DataFrame(
    pca.fit_transform(node_bert_embedding_kgcovid19),
    index=node_bert_embedding_kgcovid19.index
)

In [None]:
node_bert_embedding_kgcovid19_pca.head()
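
A quick way to judge how much information a given number of components retains is the fraction of variance they explain. The NumPy sketch below computes it through an SVD on synthetic data; the helper name `pca_explained_variance_ratio` and the toy matrix are illustrative assumptions. A fitted scikit-learn `PCA` object exposes the same quantity through its `explained_variance_ratio_` attribute.

```python
import numpy as np


def pca_explained_variance_ratio(features: np.ndarray, n_components: int) -> np.ndarray:
    """Fraction of the total variance explained by each of the leading components."""
    centered = features - features.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()


rng = np.random.default_rng(42)
# Synthetic stand-in for an embedding matrix: 200 rows and 20 columns,
# generated from 3 latent factors plus a small amount of noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
toy_features = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

retained = pca_explained_variance_ratio(toy_features, n_components=3).sum()
print(f"Variance retained by 3 components: {retained:.4f}")
```

If the retained fraction is low for your data, a larger number of components may be worth the extra memory.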

We visualize the node features again after the PCA procedure, so as to make sure we did not destroy too much information with the dimensionality reduction:

In [None]:
%%time
from grape import GraphVisualizer

GraphVisualizer(filtered_kgcovid19).fit_and_plot_all(node_bert_embedding_kgcovid19_pca)

Since the number of node types and edge types is actually less than the number of target components, we are going to run the PCA on the full set of node type and edge type features, and query them afterwards:

In [None]:
%%time
from sklearn.decomposition import PCA

pca = PCA(n_components=50)

biolink_bert_pca = pd.DataFrame(
    pca.fit_transform(biolink_bert),
    index=biolink_bert.index
)

biolink_deepwalk_pca = pd.DataFrame(
    pca.fit_transform(biolink_deepwalk),
    index=biolink_deepwalk.index
)

node_type_bert_embedding_kgcovid19_pca = get_node_type_features(filtered_kgcovid19, biolink_bert_pca)
node_type_deepwalk_embedding_kgcovid19_pca = get_node_type_features(filtered_kgcovid19, biolink_deepwalk_pca)
edge_type_bert_embedding_kgcovid19_pca = get_edge_type_features(filtered_kgcovid19, biolink_bert_pca)
edge_type_deepwalk_embedding_kgcovid19_pca = get_edge_type_features(filtered_kgcovid19, biolink_deepwalk_pca)

In [None]:
node_type_bert_embedding_kgcovid19_pca.head()

In [None]:
node_type_deepwalk_embedding_kgcovid19_pca.head()

In [None]:
edge_type_bert_embedding_kgcovid19_pca.head()

In [None]:
edge_type_deepwalk_embedding_kgcovid19_pca.head()

### HERE WE START TO USE THE GRAPH TOPOLOGY!
Please pay attention: here we are about to start using the graph topology. For this reason, at this point, we execute the graph holdouts that split the training and test edges. Until now, we did not use the graph topology at any point, so we have not introduced any bias during the previous operations. [We are going to create connected holdouts - you can learn more about graph holdouts in this previous tutorial](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Graph_holdouts_using_GRAPE.ipynb).

In [None]:
%%time
train_kgcovid19, test_kgcovid19 = filtered_kgcovid19.connected_holdout(
    train_size=0.8,
    random_state=4435,
)
train_kgcovid19 = train_kgcovid19.add_selfloops(edge_type_name="biolink:related_to")

In [None]:
%%time
train_hpo, test_hpo = filtered_hpo.connected_holdout(
    train_size=0.8,
    random_state=4435,
)
train_hpo = train_hpo.add_selfloops(edge_type_name="biolink:related_to")
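
The idea behind a connected holdout can be sketched as follows: edges of a spanning forest always go to the training set, so the training graph keeps the same connected components, and only the remaining edges are candidates for the test set. The function `connected_holdout_sketch` below is a hypothetical pure-Python illustration of that idea and makes no claim about how GRAPE implements `connected_holdout`.

```python
import random
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def connected_holdout_sketch(
    edges: List[Edge], train_size: float, seed: int
) -> Tuple[List[Edge], List[Edge]]:
    """Split edges so that the training edges contain a spanning forest."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        """Union-find lookup with path halving."""
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    train: List[Edge] = []
    candidates: List[Edge] = []
    for src, dst in shuffled:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst
            train.append((src, dst))  # needed to keep the graph connected
        else:
            candidates.append((src, dst))  # safe to hold out

    missing = max(0, round(train_size * len(edges)) - len(train))
    train.extend(candidates[:missing])
    return train, candidates[missing:]


# A 6-node cycle with four chords: 10 edges in total.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 2), (1, 3), (2, 4), (3, 5)]
train, test = connected_holdout_sketch(edges, train_size=0.8, seed=4435)
print(len(train), len(test))
```

Every node still appears in the training edges, which is what allows node-level features computed on the training graph to cover the test edges as well.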

### Precomputing KGCOVID19 graph convolution
As mentioned earlier, for the KGCOVID19 graph, instead of using trainable graph convolution layers, we are going to pre-compute `3` convolution steps, [as described in this paper](https://openreview.net/pdf?id=m1oqEOAozQU). We will emulate residual layers by concatenating the input features with the result of each convolution, therefore obtaining a `50*(3+1) = 200` dimensional set of node features. Note that **we are using the training graph only, and not the test graph**.

In [None]:
%%time
from grape.feature_preprocessors import GraphConvolution

conv = GraphConvolution(
    number_of_convolutions=3,
    concatenate_features=True
)

conv_node_bert_embedding_kgcovid19_pca = pd.DataFrame(
    conv.transform(
        support=train_kgcovid19,
        node_features=node_bert_embedding_kgcovid19_pca.values
    )[0],
    index=node_bert_embedding_kgcovid19_pca.index
)

conv_node_bert_embedding_kgcovid19_pca.head()
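
The arithmetic behind the `50*(3+1) = 200` dimensions can be reproduced with a small NumPy sketch. It assumes a row-normalized adjacency matrix (degree inverse times adjacency) as the propagation kernel, which is one reading of "left Laplacian"; the function name `precompute_convolutions` is hypothetical and GRAPE's `GraphConvolution` may normalize differently.

```python
import numpy as np


def precompute_convolutions(
    adjacency: np.ndarray, features: np.ndarray, number_of_convolutions: int
) -> np.ndarray:
    """Propagate features over the graph and concatenate every hop."""
    degrees = adjacency.sum(axis=1, keepdims=True)
    # Row normalization: each node averages the features of its neighbours.
    normalized = adjacency / np.maximum(degrees, 1.0)
    hops = [features]
    for _ in range(number_of_convolutions):
        hops.append(normalized @ hops[-1])
    return np.concatenate(hops, axis=1)


# A 4-node path graph with self-loops, as the training graphs above have.
adjacency = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 50))

convolved = precompute_convolutions(adjacency, features, number_of_convolutions=3)
print(convolved.shape)
```

The first 50 columns are the untouched input features, which is what emulates a residual connection.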

Again, we visualize the node features after the procedure, to see if it has changed them and how:

In [None]:
%%time
from grape import GraphVisualizer

GraphVisualizer(train_kgcovid19).fit_and_plot_all(conv_node_bert_embedding_kgcovid19_pca)

## Subgraph sketching
Subgraph sketching is a technique in graph analysis that captures the local connectivity patterns of subgraphs within a larger graph. By leveraging HyperLogLog counters, subgraph sketching allows for efficient estimation of the size of neighbourhood intersections at different distances. We use it to compute edge embeddings.

[Learn more in its dedicated tutorial](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Link%20prediction%20models%20using%20subgraph%20sketching%20using%20GRAPE.ipynb) and [the original paper presenting it](https://openreview.net/pdf?id=m1oqEOAozQU).


The gist of it is that we will compute the cardinality of the sets highlighted in the following picture using [efficient HyperLogLog counters](https://github.com/LucaCappelletti94/hyperloglog-rs). The vector of cardinalities will be the sketch representing a given tuple of nodes.

![triple overlap](https://github.com/LucaCappelletti94/hyperloglog-rs/blob/main/triple_overlap.png?raw=true)

Do note that we do not precompute the sketching features for all possible edges, as that would never fit in memory, but we compute them on the fly as the model is trained.
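
To make the notion of a sketch concrete, the following pure-Python example computes the cardinalities of the intersections between the hop neighbourhoods of two nodes using exact sets. It is a simplified stand-in: the function names are hypothetical, the exact set of overlaps used by `HyperSketching` may differ, and the real implementation replaces the exact sets with HyperLogLog counters that only estimate these cardinalities.

```python
from typing import Dict, Hashable, List, Set


def hop_neighbourhoods(
    adjacency: Dict[Hashable, Set[Hashable]], node: Hashable, number_of_hops: int
) -> List[Set[Hashable]]:
    """Return the sets of nodes reachable within 1, 2, ..., number_of_hops steps."""
    reached = {node}
    neighbourhoods = []
    for _ in range(number_of_hops):
        reached = reached | {n for r in reached for n in adjacency[r]}
        neighbourhoods.append(set(reached))
    return neighbourhoods


def exact_sketch(
    adjacency: Dict[Hashable, Set[Hashable]],
    src: Hashable,
    dst: Hashable,
    number_of_hops: int,
) -> List[int]:
    """Cardinalities of all pairwise intersections of the hop neighbourhoods."""
    src_hoods = hop_neighbourhoods(adjacency, src, number_of_hops)
    dst_hoods = hop_neighbourhoods(adjacency, dst, number_of_hops)
    return [len(s & d) for s in src_hoods for d in dst_hoods]


# Path graph: a - b - c - d
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
sketch = exact_sketch(adjacency, "a", "d", number_of_hops=2)
print(sketch)  # [0, 1, 1, 2]
```

The vector grows quadratically with the number of hops but not with the size of the graph, which is why it can be recomputed per batch during training.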

As done earlier, we start by displaying these features. Do note that since these are ONLY EDGE FEATURES the visualization will not include any plot that involves the nodes, but only the edges.

In [None]:
from grape.embedders import HyperSketching

GraphVisualizer(train_hpo).fit_and_plot_all(HyperSketching(
    # We execute six hops of intersections.
    number_of_hops=6
))

In [None]:
from grape.embedders import HyperSketching

GraphVisualizer(test_hpo, support=train_hpo).fit_and_plot_all(HyperSketching(
    # We execute six hops of intersections.
    number_of_hops=6
))

In [None]:
from grape.embedders import HyperSketching

GraphVisualizer(train_kgcovid19).fit_and_plot_all(HyperSketching(
    # We execute six hops of intersections.
    number_of_hops=6
))

In [None]:
from grape.embedders import HyperSketching

GraphVisualizer(test_kgcovid19, support=train_kgcovid19).fit_and_plot_all(HyperSketching(
    # We execute six hops of intersections.
    number_of_hops=6
))

## Composing the Bagels
Now that we have all ingredients ready, we can proceed to compose our bagels - in fact we are going to create two distinct models, again following an approach similar to what was described in [this paper](https://openreview.net/pdf?id=m1oqEOAozQU). We are going to create a first model which involves trainable graph convolutions for the smaller HPO graph, and another model that uses the precomputed graph convolutions for the larger KGCOVID19 graph.

![The Everything Bagel GNN](https://github.com/AnacletoLAB/grape/blob/main/images/bagel.png?raw=true)

First things first, we import the two base models from GRAPE.

In [None]:
from grape.edge_prediction import GCNEdgePrediction, GNNEdgePrediction

We start with the more complex of the two - the one including convolutional layers. Since the HPO graph includes the important aspect of directionality, we are going to [follow what is suggested in this paper](https://arxiv.org/pdf/2305.10498.pdf) and include the convolution kernels in both directions.

In [None]:
model = GCNEdgePrediction(
    epochs=10,
    number_of_units_per_graph_convolution_layers = 32,
    number_of_units_per_ffnn_body_layer = 32,
    number_of_units_per_ffnn_head_layer = 16,
    # We use the two aforementioned kernels
    kernels=["Symmetric Normalized Laplacian", "Transposed Symmetric Normalized Laplacian"],
    dropout_rate=0.7,
    # Enable the use of edge metrics as part of the input features, which include:
    # - Adamic Adar
    # - Jaccard Coefficient
    # - Resource allocation index
    # - Preferential attachment
    use_edge_metrics=True,
    # Residual graph convolution layers are disabled here; set this to True to enable them.
    residual_convolutional_layers=False,
    # And the use of a node embedding layer, to allow the network to learn
    # its own representation of the nodes - btw you can reuse this
    # for other tasks if you want.
    use_node_embedding=True,
    # And the use of a node type embedding layer, to allow the network to learn
    # its own representation of the node types - btw you can reuse this
    # for other tasks if you want.
    use_node_type_embedding=True,
    # And the use of an edge type embedding layer, to allow the network to learn
    # its own representation of the edge types - btw you can reuse this
    # for other tasks if you want.
    use_edge_type_embedding=True,
    # To combine the node features into edge representation we use two approaches:
    # concatenation and hadamard product - though more methods are possible and available.
    edge_embedding_methods=["Concatenate", "Hadamard"],
    # We add the names of the node, node type and edge type features, which solely serve
    # to help the visualization of the model and make it a bit clearer.
    node_feature_names = ["SciBERT Nodes"],
    node_type_feature_names = ["SciBERT Node Types", "DeepWalk Node Types"], 
    edge_type_feature_names = ["SciBERT Edge Types", "DeepWalk Edge Types"], 
    verbose=True
)
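
For reference, the four edge metrics listed in the comments above have simple textbook definitions on undirected graphs, sketched below in pure Python. The function name `edge_metrics` is hypothetical, and GRAPE may compute or normalize these quantities differently, especially on directed graphs.

```python
import math
from typing import Dict, Hashable, Set


def edge_metrics(
    adjacency: Dict[Hashable, Set[Hashable]], src: Hashable, dst: Hashable
) -> Dict[str, float]:
    """Compute four classic neighbourhood-based scores for a candidate edge."""
    src_neighbours, dst_neighbours = adjacency[src], adjacency[dst]
    common = src_neighbours & dst_neighbours
    union = src_neighbours | dst_neighbours
    return {
        # Shared neighbours over all neighbours.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared neighbours, discounted by the log of their degree.
        "adamic_adar": sum(
            1.0 / math.log(len(adjacency[n])) for n in common if len(adjacency[n]) > 1
        ),
        # Shared neighbours, discounted by their degree.
        "resource_allocation": sum(1.0 / len(adjacency[n]) for n in common),
        # Product of the two degrees.
        "preferential_attachment": float(len(src_neighbours) * len(dst_neighbours)),
    }


# Toy undirected graph stored as a symmetric adjacency dictionary.
adjacency = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
metrics = edge_metrics(adjacency, "a", "b")
print(metrics)
```

All four scores rely only on the neighbourhoods of the two endpoints, so they must be computed on the support (training) graph to avoid leaking test edges.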

We fit the HyperSketching features for the HPO graph - note we are using the training graph.

In [None]:
from grape.embedders import HyperSketching

hpo_hyper_sketching = HyperSketching(
    number_of_hops=2,
)
hpo_hyper_sketching.fit(train_hpo)

In [None]:
model.compile(
    graph=train_hpo,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_hpo,
    node_features=[node_bert_embedding_hpo],
    node_type_features=[
        node_type_bert_embedding_hpo,
        node_type_deepwalk_embedding_hpo
    ],
    edge_type_features=[
        edge_type_bert_embedding_hpo,
        edge_type_deepwalk_embedding_hpo
    ],
    #edge_features=[hpo_hyper_sketching]
)

In [None]:
model.plot()

In [None]:
model.fit(
    graph=train_hpo,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_hpo,
    node_features=[node_bert_embedding_hpo],
    node_type_features=[
        node_type_bert_embedding_hpo,
        node_type_deepwalk_embedding_hpo
    ],
    edge_type_features=[
        edge_type_bert_embedding_hpo,
        edge_type_deepwalk_embedding_hpo
    ],
    edge_features=[hpo_hyper_sketching]
)

In [None]:
model.predict_proba(
    graph=train_hpo,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_hpo,
    node_features=[node_bert_embedding_hpo],
    node_type_features=[
        node_type_bert_embedding_hpo,
        node_type_deepwalk_embedding_hpo
    ],
    edge_type_features=[
        edge_type_bert_embedding_hpo,
        edge_type_deepwalk_embedding_hpo
    ],
    edge_features=[hpo_hyper_sketching],
    return_predictions_dataframe=True,
)

In [None]:
model.predict_proba(
    graph=test_hpo,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_hpo,
    node_features=[node_bert_embedding_hpo],
    node_type_features=[
        node_type_bert_embedding_hpo,
        node_type_deepwalk_embedding_hpo
    ],
    edge_type_features=[
        edge_type_bert_embedding_hpo,
        edge_type_deepwalk_embedding_hpo
    ],
    edge_features=[hpo_hyper_sketching],
    return_predictions_dataframe=True
)

In [None]:
model = GNNEdgePrediction(
    epochs=10,
    batch_size=2**17,
    number_of_units_per_body_layer=32,
    number_of_units_per_head_layer=16,
    # Enable the use of edge metrics as part of the input features, which include:
    # - Adamic Adar
    # - Jaccard Coefficient
    # - Resource allocation index
    # - Preferential attachment
    use_edge_metrics=True,
    # And the use of a node embedding layer, to allow the network to learn
    # its own representation of the nodes - btw you can reuse this
    # for other tasks if you want.
    use_node_embedding=True,
    # And the use of a node type embedding layer, to allow the network to learn
    # its own representation of the node types - btw you can reuse this
    # for other tasks if you want.
    #use_node_type_embedding=True,
    # And the use of an edge type embedding layer, to allow the network to learn
    # its own representation of the edge types - btw you can reuse this
    # for other tasks if you want.
    use_edge_type_embedding=True,
    # To combine the node features into edge representation we use two approaches:
    # concatenation and hadamard product - though more methods are possible and available.
    edge_embedding_methods=["Concatenate", "Hadamard"],
    # We add the names of the node, node type and edge type features, which solely serve
    # to help the visualization of the model and make it a bit clearer.
    node_feature_names = ["SciBERT Nodes"],
    node_type_feature_names = ["SciBERT Node Types", "DeepWalk Node Types"], 
    edge_type_feature_names = ["SciBERT Edge Types", "DeepWalk Edge Types"], 
    verbose=True
)
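
The two edge embedding methods named in `edge_embedding_methods` turn a pair of node vectors into a single edge vector. The NumPy sketch below shows what concatenation and the Hadamard product do on a toy pair; the function name `edge_embedding` is hypothetical, and joining the outputs of the two methods side by side is an assumption about how the model combines them.

```python
import numpy as np


def edge_embedding(src_vector: np.ndarray, dst_vector: np.ndarray, method: str) -> np.ndarray:
    """Combine the vectors of the two endpoints into an edge representation."""
    if method == "Concatenate":
        # Order-sensitive: (src, dst) and (dst, src) give different vectors.
        return np.concatenate([src_vector, dst_vector])
    if method == "Hadamard":
        # Element-wise product: symmetric in the two endpoints.
        return src_vector * dst_vector
    raise ValueError(f"Unsupported method: {method}")


src_vector = np.array([1.0, 2.0, 3.0])
dst_vector = np.array([4.0, 5.0, 6.0])

combined = np.concatenate([
    edge_embedding(src_vector, dst_vector, method)
    for method in ["Concatenate", "Hadamard"]
])
print(combined)
```

Concatenation preserves the direction of the edge, while the Hadamard product captures feature-wise agreement between the endpoints, so the two views complement each other.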

In [None]:
from grape.embedders import HyperSketching

kgcovid19_hyper_sketching = HyperSketching(
    number_of_hops=3,
    precision=8,
    bits=6
)
kgcovid19_hyper_sketching.fit(train_kgcovid19)

In [None]:
model.compile(
    graph=train_kgcovid19,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_kgcovid19,
    node_features=[conv_node_bert_embedding_kgcovid19_pca],
    node_type_features=[
        node_type_bert_embedding_kgcovid19_pca,
        node_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_type_features=[
        edge_type_bert_embedding_kgcovid19_pca,
        edge_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_features=[kgcovid19_hyper_sketching]
)

In [None]:
model.plot()

In [None]:
model.fit(
    graph=train_kgcovid19,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_kgcovid19,
    node_features=[conv_node_bert_embedding_kgcovid19_pca],
    node_type_features=[
        node_type_bert_embedding_kgcovid19_pca,
        node_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_type_features=[
        edge_type_bert_embedding_kgcovid19_pca,
        edge_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_features=[kgcovid19_hyper_sketching]
)

In [None]:
model.predict_proba(
    graph=train_kgcovid19,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_kgcovid19,
    node_features=[conv_node_bert_embedding_kgcovid19_pca],
    node_type_features=[
        node_type_bert_embedding_kgcovid19_pca,
        node_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_type_features=[
        edge_type_bert_embedding_kgcovid19_pca,
        edge_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_features=[kgcovid19_hyper_sketching]
)

In [None]:
model.predict_proba(
    graph=test_kgcovid19,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=train_kgcovid19,
    node_features=[conv_node_bert_embedding_kgcovid19_pca],
    node_type_features=[
        node_type_bert_embedding_kgcovid19_pca,
        node_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_type_features=[
        edge_type_bert_embedding_kgcovid19_pca,
        edge_type_deepwalk_embedding_kgcovid19_pca
    ],
    edge_features=[kgcovid19_hyper_sketching]
)

## Future directions
In this tutorial we have presented the extremely multi-modal Everything Bagel GNN, though a problem has clearly surfaced - while the model is capable of ingesting many features, it also comes with many free parameters to tune. How can we do that? The typical solution is hyperparameter optimization. We will explore solutions such as Bayesian Optimization using [the Ray library](https://github.com/ray-project/ray) in the next tutorial!