# Machine Learning for Graphs
This notebook contains:
1. A very brief introduction to the library [`NetworkX`](https://networkx.github.io/).
1. An introduction to Graph Neural Networks (GNNs) with [`PyTorch`](https://pytorch.org/).
1. Some basic queries for the graph database [`Neo4j`](https://neo4j.com/).

## Introduction to NetworkX
How do we handle graph data in Python?

In [None]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

print(f'NetworkX version used: {nx.__version__}')

### Load dataset 📞 ☎️

In [None]:
df = pd.read_csv('./log_of_calls.csv')
df

### Convert to NetworkX graph
Getting all the edge and feature attributes into the desired format can take some work 🤬!

In [None]:
from_df = df[[c for c in df.columns if c.startswith('from_')]]
from_df.columns = [c[5:] for c in from_df.columns]
to_df = df[[c for c in df.columns if c.startswith('to_')]]
to_df.columns = [c[3:] for c in to_df.columns]
df_nodes = pd.concat((from_df, to_df), ignore_index=True)
df_nodes = df_nodes.drop(columns='dt')
df_nodes = df_nodes.drop_duplicates(subset='number')
df_nodes = df_nodes.reset_index(drop=True)
df_nodes

In [None]:
G = nx.MultiDiGraph()

G.add_nodes_from(zip(
    df_nodes.number,
    df_nodes.drop(columns='number').to_dict('records')
))
G.nodes(data=True)['403-726-6587']

The nodes are kept in insertion order:

In [None]:
list(G.nodes)[0]

In [None]:
G.add_edges_from(zip(
    df.from_number,
    df.to_number,
    df[['from_dt', 'to_dt']].to_dict('records')
))
list(G.edges(nbunch='403-726-6587', data=True))
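
To see why a `MultiDiGraph` suits a call log, here is a minimal sketch on made-up data (the numbers, names and timestamps are hypothetical and not taken from `log_of_calls.csv`). Repeated calls between the same two numbers are kept as parallel edges instead of overwriting each other:

```python
import networkx as nx

# Hypothetical toy data, for illustration only.
toy_nodes = [("111", {"name": "Ann"}), ("222", {"name": "Ben"})]
toy_calls = [
    ("111", "222", {"from_dt": "2020-01-01T10:00", "to_dt": "2020-01-01T10:05"}),
    ("111", "222", {"from_dt": "2020-01-02T09:00", "to_dt": "2020-01-02T09:01"}),
    ("222", "111", {"from_dt": "2020-01-03T18:30", "to_dt": "2020-01-03T18:45"}),
]

G_toy = nx.MultiDiGraph()
G_toy.add_nodes_from(toy_nodes)   # (node, attribute dict) pairs
G_toy.add_edges_from(toy_calls)   # (source, target, attribute dict) triples

# Parallel edges are kept: two calls from 111 to 222, one call back.
n_calls_111_to_222 = G_toy.number_of_edges("111", "222")
n_calls_total = G_toy.number_of_edges()
print(n_calls_111_to_222, n_calls_total)
```

With a plain `nx.DiGraph`, the second call from `111` to `222` would replace the attributes of the first one.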

### What does the Graph look like? 📊 📈 📉
As mentioned, the answer strongly depends on the spatial layout of the nodes...

In [None]:
plt.figure(figsize=(16,16))
nx.draw_circular(G, width=0.1, node_size=10)

### Some Summary Statistics 📂
Plenty of algorithms are available:

In [None]:
from IPython.display import IFrame  
IFrame('https://networkx.github.io/documentation/networkx-2.4/reference', width=1000, height=650)

In [None]:
nx.average_shortest_path_length(G)
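
One caveat: for directed graphs, `nx.average_shortest_path_length` raises an error unless the graph is strongly connected. The following sketch on a hypothetical toy graph shows one possible fallback, namely ignoring edge directions and using the largest connected component. Whether that is appropriate for the call graph is an assumption that depends on the question being asked.

```python
import networkx as nx

# Hypothetical toy graph: a -> b -> c, with a repeated edge a -> b.
G_toy = nx.MultiDiGraph()
G_toy.add_edges_from([("a", "b"), ("b", "c"), ("a", "b")])

if nx.is_strongly_connected(G_toy):
    avg_len = nx.average_shortest_path_length(G_toy)
else:
    # Fallback (an assumption, not the only option): drop directions and
    # parallel edges, then restrict to the largest connected component.
    U = nx.Graph(G_toy)
    largest = max(nx.connected_components(U), key=len)
    avg_len = nx.average_shortest_path_length(U.subgraph(largest))

print(avg_len)
```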

## Graph Neural Networks (GNNs) with PyTorch (aka Message Passing 📥 📤)
Graph Neural Networks (GNNs) unraveled!

In [None]:
from collections import OrderedDict
from typing import Tuple, List

import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
import torch
from torch.nn import functional as F
from torch import nn
from tqdm.notebook import tqdm 

### Loading the labels 🤷‍♂️ 🤷‍♀️
_(This task is probably politically correct, but the result is quite surprising!)_

In [None]:
le = LabelEncoder()
y = le.fit_transform(df_nodes['gender'].values)
y = torch.from_numpy(y)
y

### Loading the features
I.e. the one-hot encoding of the names:

**Observation:** 
🔎_The name is basically a unique identifier, so how should we be able to learn something (with partially labelled data)?_ 🔎

In [None]:
X = OneHotEncoder().fit_transform(df_nodes['name'].values[:, None]).toarray()
X = torch.from_numpy(X).float()
X.shape
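
The observation above can be checked on a tiny example. The sketch below uses three made-up, distinct names: the one-hot rows are mutually orthogonal, so the features alone say nothing about which nodes are similar.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

names = np.array(["Carol", "Alice", "Bob"])  # hypothetical, all distinct
X_toy = OneHotEncoder().fit_transform(names[:, None]).toarray()

# Each row has a single 1 in its own column, so the Gram matrix is the identity.
gram = X_toy @ X_toy.T
print(X_toy.shape)
print(gram)
```

Any signal a model picks up must therefore come from the graph structure, not from the features.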

### Convert the NetworkX graph into a sparse adjacency matrix
A dense adjacency matrix would require $O(n^2)$ memory for $n$ nodes.

In [None]:
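def from_networkx_to_sparse_tensor(G: nx.Graph) -> torch.Tensor:
    # Ignore edge directions; every NetworkX graph class provides `to_undirected`.
    G = G.to_undirected()
    # Note: `to_scipy_sparse_matrix` was removed in NetworkX 3.0
    # (its successor is `nx.to_scipy_sparse_array`).
    adjacency_matrix = nx.convert_matrix.to_scipy_sparse_matrix(G)
    # Add self-loops so that each node also receives its own message.
    adjacency_matrix += sp.diags(np.ones(len(G.nodes())))
    adjacency_matrix = adjacency_matrix.tocoo()
    row_index = torch.from_numpy(adjacency_matrix.row).to(torch.long)
    col_index = torch.from_numpy(adjacency_matrix.col).to(torch.long)
    # All weights are set to 1, so parallel edges (repeated calls) count only once.
    A = torch.sparse_coo_tensor(
        torch.stack([row_index, col_index], dim=0),
        torch.ones_like(row_index, dtype=torch.float),
        size=adjacency_matrix.shape
    ).coalesce()
    return A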

In [None]:
A = from_networkx_to_sparse_tensor(G)
A
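
To make the memory argument concrete, here is a small sketch with a hypothetical edge list on five nodes. It only uses SciPy and compares the number of stored entries with the size of the dense matrix:

```python
import numpy as np
import scipy.sparse as sp

n = 5
# Hypothetical undirected edges 0-1, 1-2, 3-4, stored once per direction.
rows = np.array([0, 1, 1, 2, 3, 4])
cols = np.array([1, 0, 2, 1, 4, 3])
A_sp = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
A_sp = (A_sp + sp.identity(n)).tocoo()  # add self-loops

stored = A_sp.nnz   # entries kept by the sparse format
dense = n * n       # entries a dense matrix would hold
print(stored, dense)
```

The sparse format grows with the number of edges (plus $n$ self-loops), while the dense matrix always grows with $n^2$.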

### Implementation of a Graph Convolutional Network (GCN)
For the graph convolutional layer we are going to use the following update scheme:

$$H^{(l+1)}=\sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)=\sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$

We use ReLU as the activation function $\sigma$, except in the last layer, where we output the raw logits directly (i.e. no activation at all). $H^{(0)}$ denotes the node features.
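
As a worked example of this update rule, the following NumPy sketch performs one propagation step on a three-node path graph. The dense matrices and the random weights are assumptions for illustration only; the notebook itself works with sparse tensors.

```python
import numpy as np

# Path graph 0 - 1 - 2 with self-loops already added.
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
deg = A.sum(axis=1)                     # node degrees: [2, 3, 2]
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A @ D_inv_sqrt     # symmetric normalization

rng = np.random.default_rng(0)
H0 = np.eye(3)                          # one-hot node features
W0 = rng.normal(size=(3, 2))            # arbitrary weights, for illustration
H1 = np.maximum(A_hat @ H0 @ W0, 0.0)   # ReLU(A_hat H W)
print(A_hat.round(3))
```

Each entry of `A_hat` equals $1/\sqrt{d_i d_j}$ for connected nodes $i$ and $j$, so a message is scaled down by the degrees of both its sender and its receiver.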

In [None]:
class GraphConvolution(nn.Module):
    """
    Graph Convolution Layer: as proposed in [Kipf et al. 2017](https://arxiv.org/abs/1609.02907).
    
    Parameters
    ----------
    in_channels: int
        Dimensionality of input channels/features.
    out_channels: int
        Dimensionality of output channels/features.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, arguments: Tuple[torch.tensor, torch.sparse.FloatTensor]) -> torch.tensor:
        """
        Forward method.
        
        Parameters
        ----------
        arguments: Tuple[torch.tensor, torch.sparse.FloatTensor]
            Tuple of feature matrix `X` and normalized adjacency matrix `A_hat`
            
        Returns
        ---------
        X: torch.tensor
            The result of the message passing step
        """
        X, A_hat = arguments
        X = A_hat @ self.linear(X)
        return X

In the following we stack multiple layers ($l$) (with ReLU activation functions and dropout in between). Before we pass the adjacency matrix to the GCN, we calculate the normalized adjacency matrix: 
$$\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$$

In [None]:
class GCN(nn.Module):
    """
    Graph Convolution Network: as proposed in [Kipf et al. 2017](https://arxiv.org/abs/1609.02907).
    
    Parameters
    ----------
    n_features: int
        Dimensionality of input features.
    n_classes: int
        Number of classes for the semi-supervised node classification.
    hidden_dimensions: List[int]
        Internal number of features. `len(hidden_dimensions)` defines the number of hidden representations.
    activation: nn.Module
        The activation for each layer but the last.
    dropout: float
        The dropout probability.
    """
    
    def __init__(self,
                 n_features: int,
                 n_classes: int,
                 hidden_dimensions: List[int] = [80],
                 activation: nn.Module = nn.ReLU(),
                 dropout: float = 0.5):
        super().__init__()
        self.n_features = n_features
        self.n_classes = n_classes
        self.hidden_dimensions = hidden_dimensions
        self.layers = nn.ModuleList()
        self.layers.extend([
            nn.Sequential(OrderedDict([
                (f'gcn_{idx}', GraphConvolution(in_channels=in_channels,
                                                out_channels=out_channels)),
                (f'activation_{idx}', activation),
                (f'dropout_{idx}', nn.Dropout(p=dropout))
            ]))
            for idx, (in_channels, out_channels)
            in enumerate(zip([n_features] + hidden_dimensions[:-1], hidden_dimensions))
        ])
        self.layers.append(
            nn.Sequential(OrderedDict([
                (f'gcn_{len(hidden_dimensions)}', GraphConvolution(in_channels=hidden_dimensions[-1],
                                                                  out_channels=n_classes))
            ]))
        )
  
    def normalize(self, A: torch.sparse.FloatTensor) -> torch.tensor:
        r"""
        For calculating $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$.
        
        Parameters
        ----------
        A: torch.sparse.FloatTensor
            Sparse adjacency matrix with added self-loops.
            
        Returns
        -------
        A_hat: torch.sparse.FloatTensor
            Normalized message passing matrix
        """
        row, col = A._indices()
        edge_weight = A._values()
        deg = (A @ torch.ones(A.shape[0], 1, device=A.device)).squeeze()
        deg_inv_sqrt = deg.pow(-0.5)
        normalized_edge_weight = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]
        A_hat = torch.sparse.FloatTensor(A._indices(), normalized_edge_weight, A.shape)
        return A_hat

    def forward(self, X: torch.Tensor, A: torch.sparse.FloatTensor) -> torch.tensor:
        """
        Forward method.
        
        Parameters
        ----------
        X: torch.tensor
            Feature matrix `X`
        A: torch.tensor
            adjacency matrix `A` (with self-loops)
            
        Returns
        ---------
        X: torch.tensor
            The result of the last message passing step (i.e. the logits)
        """
        A_hat = self.normalize(A)
        for layer in self.layers:
            X = layer((X, A_hat))
        return X

### Train/Validation/Test split 🎛

In [None]:
def split(labels: np.ndarray,
          train_size: float = 0.1,
          val_size: float = 0.1,
          test_size: float = 0.8,
          random_state: int = 42) -> List[np.ndarray]:
    """Split the arrays or matrices into random train, validation and test subsets.

    Parameters
    ----------
    labels: np.ndarray [n_nodes]
        The class labels
    train_size: float
        Proportion of the dataset included in the train split.
    val_size: float
        Proportion of the dataset included in the validation split.
    test_size: float
        Proportion of the dataset included in the test split.
    random_state: int
        Seed used by the random number generator.

    Returns
    -------
    idx_train: array-like
        The indices of the training nodes
    idx_val: array-like
        The indices of the validation nodes
    idx_test: array-like
        The indices of the test nodes

    """
    idx = np.arange(labels.shape[0])
    idx_train_and_val, idx_test = train_test_split(idx,
                                                   random_state=random_state,
                                                   train_size=(train_size + val_size),
                                                   test_size=test_size,
                                                   stratify=labels)

    idx_train, idx_val = train_test_split(idx_train_and_val,
                                          random_state=random_state,
                                          train_size=(train_size / (train_size + val_size)),
                                          test_size=(val_size / (train_size + val_size)),
                                          stratify=labels[idx_train_and_val])
    
    return idx_train, idx_val, idx_test
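
The second call to `train_test_split` needs fractions relative to the remaining data, not to the full dataset. The sketch below replays the two stages on 100 synthetic, balanced labels (an assumption for illustration) so that the resulting sizes are easy to check:

```python
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array([0] * 50 + [1] * 50)   # synthetic, balanced labels
idx = np.arange(len(labels))
train_size, val_size, test_size = 0.1, 0.1, 0.8

# Stage 1: hold out the test set (80 % of all nodes).
idx_rest, idx_test = train_test_split(
    idx, train_size=train_size + val_size, test_size=test_size,
    stratify=labels, random_state=42)

# Stage 2: fractions are now relative to the remaining 20 %,
# i.e. 0.1 / 0.2 = 0.5 each for train and validation.
idx_train, idx_val = train_test_split(
    idx_rest, train_size=train_size / (train_size + val_size),
    test_size=val_size / (train_size + val_size),
    stratify=labels[idx_rest], random_state=42)

sizes = (len(idx_train), len(idx_val), len(idx_test))
print(sizes)
```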

### The training code... 🎓

In [None]:
def train(model: nn.Module, 
          X: torch.Tensor, 
          A: torch.sparse.FloatTensor, 
          labels: torch.Tensor, 
          idx_train: np.ndarray, 
          idx_val: np.ndarray,
          lr: float = 1e-2,
          weight_decay: float = 5e-4, 
          patience: int = 50, 
          max_epochs: int = 500, 
          display_step: int = 10):
    """
    Train a model with early stopping on the validation loss.
    
    Parameters
    ----------
    model: nn.Module
        Model which we want to train.
    X: torch.Tensor [n, d]
        Dense attribute matrix.
    A: torch.sparse.FloatTensor [n, n]
        Sparse adjacency matrix.
    labels: torch.Tensor [n]
        Ground-truth labels of all nodes.
    idx_train: np.ndarray [?]
        Indices of the training nodes.
    idx_val: np.ndarray [?]
        Indices of the validation nodes.
    lr: float
        Learning rate.
    weight_decay : float
        Weight decay.
    patience: int
        The number of epochs to wait for the validation loss to improve before stopping early.
    max_epochs: int
        Maximum number of epochs for training.
    display_step : int
        How often to print information.
        
    Returns
    -------
    trace_train: list
        A list of values of the train loss during training.
    trace_val: list
        A list of values of the validation loss during training.
    """
    trace_train = []
    trace_val = []
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    best_loss = np.inf
    for it in tqdm(range(max_epochs), desc='Training...'):
        logits = model(X, A)     
        loss_train = F.cross_entropy(logits[idx_train], labels[idx_train])
        loss_val = F.cross_entropy(logits[idx_val], labels[idx_val])

        optimizer.zero_grad()
        loss_train.backward()
        optimizer.step()
        
        trace_train.append(loss_train.detach().item())
        trace_val.append(loss_val.detach().item())

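        if loss_val < best_loss:
            best_loss = loss_val.item()
            best_epoch = it
            # Clone the tensors: `state_dict()` values share memory with the live
            # parameters, so without a copy later updates would overwrite the snapshot.
            best_state = {key: value.detach().cpu().clone() for key, value in model.state_dict().items()}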
        else:
            if it >= best_epoch + patience:
                break

        if display_step > 0 and it % display_step == 0:
            print(f'Epoch {it:4}: loss_train: {loss_train.item():.5f}, loss_val: {loss_val.item():.5f} ')

    # restore the best validation state
    model.load_state_dict(best_state)
    return trace_train, trace_val
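
The patience rule used in the loop can be isolated from the model. The helper `early_stopping_epochs` below is a hypothetical name introduced only for this sketch; it replays the same comparison on a list of made-up validation losses and reports which epoch would be restored and when training would stop.

```python
from typing import List, Tuple


def early_stopping_epochs(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run) for a patience-based stopping rule."""
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember where it happened.
            best_loss, best_epoch = loss, epoch
        elif epoch >= best_epoch + patience:
            # No improvement for `patience` epochs: stop.
            break
    return best_epoch, last_epoch


# Made-up validation losses: the minimum is at epoch 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
best_epoch, last_epoch = early_stopping_epochs(losses, patience=3)
print(best_epoch, last_epoch)
```

With `patience=3`, training stops at epoch 5 and the weights from epoch 2 are the ones worth keeping.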

### 🚧 Putting it all together 🚧

In [None]:
# Convert the number of classes to a plain Python int for `nn.Linear`.
D, C = X.shape[1], int(y.max()) + 1

gcn = GCN(n_features=D, n_classes=C, hidden_dimensions=[64])

gcn

In [None]:
idx_train, idx_val, idx_test = split(y.numpy(), train_size=0.1, val_size=0.1, test_size=0.8)

In [None]:
trace_train, trace_val = train(gcn, X, A, y, idx_train, idx_val)

plt.plot(trace_train, label='train')
plt.plot(trace_val, label='validation')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

In [None]:
gcn.eval()
logits = gcn(X, A)
accuracy = (torch.argmax(logits, dim=-1) == y)[idx_test].float().mean()
print(f'We can predict the gender with an accuracy of {100*accuracy:.2f} % ' 
      f'based on non-informative features due to the graph structure!!!')

## 🗺 Graph Databases (Neo4j)
Handle your graph data professionally!

Switch to this folder: `cd 07_graphs`

Start Neo4j server e.g. via docker (initial user: `neo4j`, pw: `neo4j`):
```bash
docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$PWD/data:/data \
    --volume=$PWD/import:/import \
    --env 'NEO4JLABS_PLUGINS=["graph-data-science"]' \
    neo4j:4.1.1
```

Then connect to the Neo4j browser via `http://localhost:7474/` (the default user and pw are typically both `neo4j`).

### Load Graph

```sql
MATCH (n) DETACH DELETE n;
LOAD CSV WITH HEADERS FROM 'file:///log_of_calls.csv' AS line
MERGE (c1:City { name: line.from_city })
MERGE (p1:Person { name: line.from_name, number: line.from_number, gender: line.from_gender })
MERGE (p1)-[:FROM]->(c1)
MERGE (c2:City { name: line.to_city })
MERGE (p2:Person { name: line.to_name, number: line.to_number, gender: line.to_gender })
MERGE (p2)-[:FROM]->(c2)
CREATE (p1)-[c:Calls {
    from: datetime(line.from_dt),
    to: datetime(line.to_dt),
    duration: duration.between(datetime(line.from_dt), datetime(line.to_dt)).minutes
}]->(p2)
```

### Visualize Graph

For example, to have a look at all persons from `Pattaya`:
```sql
MATCH p=()-[r:FROM]->({ name: 'Pattaya' }) 
RETURN p
```
or equivalently:
```sql
MATCH p=()-[r:FROM]->(c)
WHERE c.name='Pattaya'
RETURN p
```

### Explain
Similarly to SQL, we can execute an `EXPLAIN` query to analyse the execution plan:
```sql
EXPLAIN MATCH p=()-[r:FROM]->(c)
WHERE c.name='Pattaya'
RETURN p
```

### Closeness centrality
```sql
CALL gds.alpha.closeness.stream({
  nodeProjection: 'Person',
  relationshipProjection: 'Calls'
})
YIELD nodeId, centrality
RETURN gds.util.asNode(nodeId).name AS user, centrality
ORDER BY centrality DESC
```
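
For comparison, closeness centrality is also available in NetworkX. The sketch below uses a hypothetical star graph instead of the call data. NetworkX computes $(n-1)$ divided by the sum of shortest-path distances to a node, and the scaling used by the Neo4j procedure may differ, so only the ranking should be compared.

```python
import networkx as nx

# Hypothetical star graph: node 0 in the centre, leaves 1, 2 and 3.
S = nx.star_graph(3)
closeness = nx.closeness_centrality(S)

# Sort nodes by centrality, highest first (like ORDER BY centrality DESC).
ranking = sorted(closeness.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

The centre reaches every other node in one step and therefore gets the highest score.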