In [4]:
import torch
import torch_geometric

# Graph Machine Learning with Graph Neural Networks (GNNs)

Having explored network science, we are about to dive into Graph Neural Networks (GNNs). The best introduction to GNNs is a long blog post by Sanchez-Lengeling et al. entitled [A Gentle Introduction to Graph Neural Networks](https://distill.pub/2021/gnn-intro/), which the authors have _generously_ licensed under Creative Commons. This lets me use their work to explain how GNNs work while providing source code alongside it, turning your theoretical understanding into a practical one.

## Citation: A Gentle Introduction to Graph Neural Networks

Parts of the content in Part 4 of this course are based upon: `Sanchez-Lengeling, et al., "A Gentle Introduction to Graph Neural Networks", Distill, 2021.` This content is cited inline. Students are encouraged to read this blog post before or after class, and to reference it if they become confused about concepts in their data science and machine learning practice. 

The full list of authors is:

* [Benjamin Sanchez-Lengeling](https://research.google/people/106640/)
* [Emily Reif](https://research.google/people/106150/)
* [Adam Pearce](https://research.google/people/AdamPearce/)
* [Alexander B. Wiltschko](https://www.linkedin.com/in/alex-wiltschko-0a7b7537/)

During the course you will have access to the instructor, who understands GNNs and can elaborate further and answer any questions you may have :)

## Why is there so much talk about Graph Neural Networks?

Knowledge graphs are at the peak of the Gartner hype cycle, and graph neural networks (GNNs) are soon to be high on the ramp because they unlock the potential of enterprise knowledge graphs. Data lakes put data in one place, knowledge graphs link datasets together, and graph neural networks automate business processes using data from across an enterprise.



Most graph databases are fast becoming cloud-based GNN platforms:

* Neo4j → [Neo4j Graph Data Science](https://neo4j.com/product/graph-data-science/)
* TigerGraph → [Machine Learning Workbench](https://www.tigergraph.com/ml-workbench/)
* ArangoDB → [ArangoGraphML](https://www.arangodb.com/arangodb-for-machine-learning/)
* Kumo → [SQL query the future](https://kumo.ai/)


# Notes: Extra Text

Let's wrap our dataset in a `torch_geometric` `Dataset` class.

# PyG: PyTorch Geometric aka `torch_geometric`

## Describing Graphs with PyG `Data` Classes

Entire graphs in PyG are described by `Data` objects. The simple 3-node, 2-edge graph with a single node feature from the [PyG documentation](https://pytorch-geometric.readthedocs.io/en/latest/get_started/introduction.html) is shown below.

Note that we have to define each edge in both directions, so the 2 undirected edges become 4 columns in `edge_index`.

<center><img src="images/3-node-2-edge-pyg-graph.svg" width="300px" /></center>

In [21]:
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
print(data)
data.validate(raise_on_error=True)

Data(x=[3, 1], edge_index=[2, 4])


True

`Data` objects can describe themselves.

In [25]:
data.keys  # a property in older PyG releases; newer releases (2.4+) make it a method: data.keys()

['x', 'edge_index']

In [36]:
print("Describing our happy little Graph :)\n")
print(f"Number of nodes: {data.num_nodes:,}")
print(f"Number of edges: {data.num_edges:,}")
print(f"Number of node features: {data.num_node_features:,}")
print(f"Has isolated nodes: {data.has_isolated_nodes()}")
print(f"Has self loops: {data.has_self_loops()}")
print(f"Is directed: {data.is_directed()}")

Describing our happy little Graph :)

Number of nodes: 3
Number of edges: 4
Number of node features: 1
Has isolated nodes: False
Has self loops: False
Is directed: False


### Directed Graph `Data`

Below we make a directed version by listing each edge in one direction only. The reverse edges are missing, so the adjacency matrix is no longer symmetric across its diagonal.

In [44]:
directed_data = Data(x=x, edge_index=torch.tensor([[1,1],[0,2]]))
print(directed_data)
directed_data.edge_index

Data(x=[3, 1], edge_index=[2, 2])


tensor([[1, 1],
        [0, 2]])

In [42]:
directed_data.is_directed()

True
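
To see the adjacency-matrix view of this directly, here is a small sketch that uses plain `torch` only. The helper `dense_adjacency` is hypothetical, written for this illustration rather than taken from PyG. It builds a dense 0/1 adjacency matrix from an `edge_index` and checks whether the matrix equals its own transpose, which is what being undirected means here:

```python
import torch


def dense_adjacency(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Hypothetical helper: dense 0/1 adjacency matrix from a [2, E] edge_index."""
    adj = torch.zeros(num_nodes, num_nodes, dtype=torch.long)
    # Row index is the source node, column index is the target node.
    adj[edge_index[0], edge_index[1]] = 1
    return adj


# The undirected example: every edge appears in both directions.
undirected_adj = dense_adjacency(
    torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]), num_nodes=3
)
# The directed example: edges 1 -> 0 and 1 -> 2 only.
directed_adj = dense_adjacency(torch.tensor([[1, 1], [0, 2]]), num_nodes=3)

print(undirected_adj)
print(torch.equal(undirected_adj, undirected_adj.T))  # True: symmetric
print(directed_adj)
print(torch.equal(directed_adj, directed_adj.T))  # False: not symmetric
```

In the directed matrix only row 1 has nonzero entries, because node 1 is the source of both edges and nothing points back to it.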