# Align all and Compute for Graphs

$\textbf{Lead Author: Anna Calissano}$

Dear learner, 

The aim of this notebook is to introduce Align All and Compute (AAC) as a learning method for graphs. AAC can be used to estimate the Frechet Mean, the Generalized Geodesic Principal Components, and a regression model. In this notebook you will learn how to use each of these learning methods.

In [1]:
import warnings
import random

import networkx as nx

import geomstats.backend as gs

from geomstats.geometry.symmetric_matrices import MatricesMetric as SymmetricMatricesMetric
from geomstats.geometry.stratified.graph_space import (
    GraphSpace,
    GraphSpaceMetric,
)
from geomstats.learning.aac import AAC
from geomstats.learning.conformal_prediction_set import conformal_prediction_set

warnings.filterwarnings("ignore")
gs.random.seed(2020)

INFO: Using numpy backend


Let's start by creating simulated data using `networkx`.

In [2]:
# `nx.to_numpy_matrix` was removed in NetworkX 3.0; `nx.to_numpy_array` is its replacement.
graphset_1 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for _ in range(10)])
graphset_2 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for _ in range(100)])
graphset_3 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=3, p=0.6, directed=True)) for _ in range(1000)])

### A primer on spaces, metrics and aligners

The first step is to create the embedding space and the corresponding metric.

In [None]:
graph_space = GraphSpace(n_nodes=5)

graph_space_metric = GraphSpaceMetric(space=graph_space)

By default, the space comes with a total space (`Matrices`), which in turn comes equipped with a metric (`MatricesMetric`).

In [None]:
graph_space.total_space.metric

(The total space metric can also be accessed from the metric: `graph_space_metric.total_space_metric`.)

The default aligner is 'ID' (identity), which means the graphs are not permuted. To set 'FAQ', do:

In [None]:
graph_space_metric.set_aligner('FAQ')

With the FAQ aligner and the default Frobenius norm, we can align a single graph, or a set of graphs, to a base graph:

In [None]:
graph_permuted = graph_space_metric.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1])

graph_space_metric.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1:3])
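
To make the idea of graph-to-graph alignment concrete, here is a minimal NumPy sketch that does not use geomstats. It searches all node relabelings exhaustively and keeps the one closest to the base graph in Frobenius norm. The helper names `permute` and `align_brute_force` are hypothetical, and the exhaustive search is an illustration that is only feasible for a handful of nodes; it is not how the 'FAQ' aligner works internally.

```python
import itertools

import numpy as np


def permute(adj, perm):
    """Relabel nodes: entry (i, j) of the result is adj[perm[i], perm[j]]."""
    perm = np.asarray(perm)
    return adj[np.ix_(perm, perm)]


def align_brute_force(base, to_permute):
    """Return the relabeling of `to_permute` closest to `base` (Frobenius norm)."""
    best, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        candidate = permute(to_permute, perm)
        dist = np.linalg.norm(base - candidate)  # Frobenius norm for 2-D input
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


rng = np.random.default_rng(0)
base = rng.random((4, 4))                # weighted directed graph on 4 nodes
shuffled = permute(base, [2, 0, 3, 1])   # the same graph with relabeled nodes

aligned, dist = align_brute_force(base, shuffled)
print(dist)  # distance after alignment
```

Because `shuffled` is only a relabeling of `base`, the aligned graph coincides with `base` and the distance after alignment vanishes, even though the two adjacency matrices differ entry by entry.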

To compute the distance we can either call the distance function:

In [None]:
graph_space_metric.dist(graphset_1[0], graphset_1[1])

Or, if the matching has already been run, we can switch to the identity aligner before computing the distance, to avoid computing the matching twice:

In [None]:
graph_space_metric.set_aligner('ID')

graph_space_metric.dist(graphset_1[0], graph_permuted)

Alternatively, you can use the total space metric instead.

In [None]:
graph_space_metric.total_space_metric.dist(graphset_1[0], graph_permuted)

We can change the total space metric by doing:

In [None]:
graph_space_metric.total_space_metric = SymmetricMatricesMetric(n=5, m=5)

Or:

In [None]:
graph_space.total_space.metric = SymmetricMatricesMetric(n=5, m=5)

For the point-to-geodesic aligner, no default is set. In fact, if you try something like `graph_space_metric.align_point_to_geodesic(geodesic, point)`, a (hopefully) meaningful error will be raised, explaining how to set the point-to-geodesic aligner.

In [None]:
graph_space_metric.set_point_to_geodesic_aligner("default", s_min=-1., s_max=1., n_points=10)

In [None]:
init_point, end_point = graph_space.random_point(2)

geodesic = graph_space_metric.geodesic(init_point, end_point)

aligned_init_point = graph_space_metric.align_point_to_geodesic(geodesic, init_point)

graph_space_metric.total_space_metric.dist(init_point, aligned_init_point)
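
One way to picture point-to-geodesic alignment is as a grid search: sample the geodesic at `n_points` values of the parameter between `s_min` and `s_max`, align the point to each sample, and keep the best match. The sketch below follows that idea in plain NumPy with an exhaustive permutation search. It is an assumption about the general strategy, not a description of the geomstats implementation, and the function names are hypothetical.

```python
import itertools

import numpy as np


def align_brute_force(base, to_permute):
    """Closest relabeling of `to_permute` to `base`, by exhaustive search."""
    best, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = to_permute[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


def align_point_to_geodesic(geodesic, point, s_min=-1.0, s_max=1.0, n_points=10):
    """Align `point` to the sampled geodesic point it can get closest to."""
    best, best_dist = None, np.inf
    for s in np.linspace(s_min, s_max, n_points):
        candidate, dist = align_brute_force(geodesic(s), point)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


rng = np.random.default_rng(1)
init_point, end_point = rng.random((2, 3, 3))


def geodesic(s):
    # Straight line between the two adjacency matrices in the total space.
    return (1.0 - s) * init_point + s * end_point


idx = np.array([1, 2, 0])
relabeled = init_point[np.ix_(idx, idx)]  # init_point with scrambled labels

aligned, dist = align_point_to_geodesic(
    geodesic, relabeled, s_min=0.0, s_max=1.0, n_points=5
)
print(dist)
```

Since the grid contains `s = 0`, where the geodesic passes through `init_point`, the scrambled copy is mapped back onto `init_point`.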

This short introduction should be enough to set you up for experimenting with the learning algorithms on graphs.

### Frechet Mean
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we estimate the Frechet Mean using AAC, which iterates two steps:
1. Compute $\hat{X}$ as the arithmetic mean of $\{X_1, \dots, X_k\}, X_i \in X$.
2. Use graph-to-graph alignment to find $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with $\hat{X}$.
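
The two steps above can be sketched in a few lines of NumPy. This is a toy version under stated assumptions: the alignment is an exhaustive search over permutations (the notebook uses 'FAQ'), the first observation serves as the initial guess, and the names `aac_frechet_mean`, `max_iter` and `tol` are hypothetical.

```python
import itertools

import numpy as np


def align_brute_force(base, to_permute):
    """Closest relabeling of `to_permute` to `base`, by exhaustive search."""
    best, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = to_permute[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_frechet_mean(graphs, max_iter=20, tol=1e-10):
    """Alternate 'align all' and 'compute the mean' until the mean stabilizes."""
    mean = graphs[0]  # initial guess: the first observation
    aligned = graphs
    for _ in range(max_iter):
        aligned = np.array([align_brute_force(mean, g) for g in graphs])
        new_mean = aligned.mean(axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean, aligned


rng = np.random.default_rng(2)
base = rng.random((4, 4))
# Six copies of the same weighted graph, each with randomly relabeled nodes.
graphs = np.array(
    [base[np.ix_(p, p)] for p in (rng.permutation(4) for _ in range(6))]
)

mean, aligned = aac_frechet_mean(graphs)
print(np.linalg.norm(aligned - mean, axis=(1, 2)))  # residuals after alignment
```

All six observations represent the same unlabeled graph, so after alignment they collapse onto a single adjacency matrix, which is also the estimated mean.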

Let's instantiate the graph space and the metric, and set the aligner.

In [None]:
graph_space = GraphSpace(n_nodes=5)

graph_space_metric = GraphSpaceMetric(space=graph_space)

graph_space_metric.set_aligner('FAQ')

And now create the estimator, and fit the data.

In [None]:
aac_fm = AAC(estimate='frechet_mean', metric=graph_space_metric)

fm = aac_fm.fit(graphset_2)

fm.estimate_

In [None]:
aac_fm.aligned_X_

### Principal Components
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

We estimate the Generalized Geodesic Principal Components (GGPCA) using AAC. Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we are searching for:
$$\gamma: \mathbb{R}\rightarrow X/T,$$ the generalized geodesic principal component capturing the majority of the variability of the dataset. The AAC for GGPCA works in two steps:

1. finding $\delta: \mathbb{R}\rightarrow X$, the principal component in the set of adjacency matrices $\{X_1, \dots, X_k\}, X_i \in X$
2. finding $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with respect to $\gamma$. The estimation requires a point-to-geodesic alignment, defined in the metric.
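
The following NumPy sketch illustrates the two steps for a single component. It rests on several assumptions: the principal component is obtained from an SVD of the flattened, centered adjacency matrices, the geodesic is the line through the mean along that direction, and the point-to-geodesic alignment is a grid search with exhaustive permutations. The names `first_component`, `align_to_geodesic` and `aac_ggpca` are hypothetical, and the estimator used in the notebook may differ in every one of these choices.

```python
import itertools

import numpy as np


def align_brute_force(base, to_permute):
    """Closest relabeling of `to_permute` to `base`, with its distance."""
    best, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = to_permute[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


def first_component(graphs):
    """Mean and first principal direction of the flattened adjacency matrices."""
    flat = graphs.reshape(len(graphs), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    shape = graphs.shape[1:]
    return mean.reshape(shape), vt[0].reshape(shape)


def align_to_geodesic(mean, direction, graph, grid):
    """Align `graph` to the sampled point of s -> mean + s * direction it fits best."""
    best, best_dist = None, np.inf
    for s in grid:
        candidate, dist = align_brute_force(mean + s * direction, graph)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_ggpca(graphs, grid, n_iter=5):
    """Alternate PCA in the total space and alignment to the current geodesic."""
    aligned = graphs
    for _ in range(n_iter):
        mean, direction = first_component(aligned)
        aligned = np.array(
            [align_to_geodesic(mean, direction, g, grid) for g in graphs]
        )
    mean, direction = first_component(aligned)
    return mean, direction, aligned


rng = np.random.default_rng(3)
graphs = rng.random((20, 3, 3))
mean, direction, aligned = aac_ggpca(graphs, grid=np.linspace(-1.0, 1.0, 5))
print(direction)
```

Each aligned matrix is a relabeling of the corresponding input, so only the node order changes, never the edge weights.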

As before:

In [None]:
graph_space = GraphSpace(n_nodes=5)

graph_space_metric = GraphSpaceMetric(space=graph_space)

graph_space_metric.set_aligner('FAQ')

For GGPCA, we also need to set the point-to-geodesic aligner.

In [None]:
graph_space_metric.set_point_to_geodesic_aligner('default', s_min=0, s_max=2)

Again, create the estimator and fit the data.

In [None]:
aac_ggpca = AAC(estimate='ggpca', metric=graph_space_metric, n_components=2)

aac_ggpca.fit(graphset_3)

## Regression
Reference: Calissano, A., Feragen, A., & Vantini, S. (2022). Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space. Journal of Multivariate Analysis, 190, 104950.

We estimate a graph-valued regression model to predict graphs from scalars or vectors. Given $\{(s_1,[X_1]), \dots, (s_k, [X_k])\}, (s_i,[X_i]) \in \mathbb{R}^p\times X/T$, we are searching for:
$$f: \mathbb{R}^p\rightarrow X/T$$
where $f\in \mathcal{F}(X/T)$ is a generalized geodesic regression model, i.e., the canonical projection onto Graph Space of a regression line $h_\beta : \mathbb{R}^p\rightarrow X$ of the form $$h_\beta(s) = \sum_{j=1}^{p} \beta_j s_j$$ where $s_j$ denotes the $j$-th component of $s$.
The AAC algorithm for regression combines the estimation of $h_\beta$ given $\{X_1, \dots, X_k\}, X_i \in X$, based on
$$\sum_{i=1}^{k} d_X(h_\beta(s_i), X_i)$$
and the search for $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with respect to the prediction along the current regression model:
$$\min_{t\in T}d_X(h_\beta(s_i),t^TX_it)$$
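
As a rough illustration of this alternation, the sketch below fits a linear model on flattened adjacency matrices and then realigns every observation to its own prediction. It assumes ordinary least squares with an intercept (via `np.linalg.lstsq`) and exhaustive permutation search; `fit_line`, `squared_loss` and `aac_regression` are hypothetical names, not part of geomstats.

```python
import itertools

import numpy as np


def align_brute_force(base, to_permute):
    """Closest relabeling of `to_permute` to `base`, by exhaustive search."""
    best, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = to_permute[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def fit_line(s, graphs):
    """Least-squares fit of the flattened adjacency matrices on [1, s]."""
    design = np.column_stack([np.ones(len(s)), s])
    targets = graphs.reshape(len(graphs), -1)
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return design, coef


def squared_loss(s, graphs):
    """Residual sum of squares of the best linear fit to `graphs`."""
    design, coef = fit_line(s, graphs)
    return np.sum((design @ coef - graphs.reshape(len(graphs), -1)) ** 2)


def aac_regression(s, graphs, n_iter=5):
    """Alternate fitting the line and aligning each graph to its prediction."""
    aligned = graphs
    for _ in range(n_iter):
        design, coef = fit_line(s, aligned)
        pred = (design @ coef).reshape(graphs.shape)
        aligned = np.array(
            [align_brute_force(p, g) for p, g in zip(pred, graphs)]
        )
    _, coef = fit_line(s, aligned)
    return coef, aligned


rng = np.random.default_rng(4)
s = np.arange(8, dtype=float)
intercept, slope = rng.random((2, 3, 3))
clean = np.array([intercept + s_i * slope for s_i in s])
# Scramble the node labels of every observation independently.
graphs = np.array(
    [g[np.ix_(p, p)] for g, p in zip(clean, (rng.permutation(3) for _ in s))]
)

plain_loss = squared_loss(s, graphs)      # regression ignoring the labeling issue
coef, aligned = aac_regression(s, graphs)
aac_loss = squared_loss(s, aligned)
print(plain_loss, aac_loss)
```

Both steps can only lower the squared loss, so the fit on the aligned graphs is never worse than the fit on the raw, scrambled adjacency matrices.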

In [None]:
graph_space = GraphSpace(n_nodes=5)

graph_space_metric = GraphSpaceMetric(space=graph_space)
graph_space_metric.set_aligner('FAQ')

In [None]:
s = gs.array([random.randint(0, 10) for _ in range(10)])

In [None]:
aac_reg = AAC(estimate='regression', metric=graph_space_metric)

In [None]:
aac_reg.fit(s, graphset_1)

The fitted coefficients are stored in the following attribute; they can be reshaped into the shape of a graph (an adjacency matrix).

In [None]:
aac_reg.total_space_estimator.coef_

A graph can be predicted using the fitted model, and the corresponding prediction error can be computed:

In [None]:
graph_pred = aac_reg.total_space_estimator.predict(s)

gs.sum(graph_space_metric.dist(graphset_1, graph_pred))

## Conformal Prediction Sets

Reference: Calissano, A., Zeni, G., Fontana, M., Vantini, S. “Conformal Prediction Sets for Populations of Graphs.” Mox report 42, 2021.

Conformal Prediction Sets allow us to compute a prediction set with a given coverage level $\alpha$ for a given prediction model on a space of graphs. They can be applied to obtain a prediction set of graphs around a predicted graph.

In [3]:
space = GraphSpace(graphset_1.shape[1])
metric = GraphSpaceMetric(space)

#### Experiment 1 - Labeled Graphs and Frechet Mean Estimator
To build the sets for labeled graphs, we set the identity as the metric aligner and initialize the conformal prediction set function, giving the fraction of the data on which we train the model and the fraction on which we compute the prediction set.

In [8]:
metric.set_aligner('ID')
predrule = AAC(estimate='frechet_mean', metric=metric)
conf_pred = conformal_prediction_set(space=space, metric=metric, predrule=predrule, alpha=0.05, calibration_size=0.7)

We fit the conformal prediction sets on the graph set. The function returns the extremes of the intervals.

In [6]:
conf_pred.fit(dataset=graphset_1)

(2, 5, 5)
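
For intuition about where a lower and an upper adjacency matrix can come from, here is a generic split-conformal sketch for labeled graphs. It is not the geomstats implementation: the predictor is the entrywise mean, the nonconformity score is the largest absolute entrywise deviation from that mean, and `split_conformal_band` and `calibration_fraction` are hypothetical names.

```python
import math

import numpy as np


def split_conformal_band(graphs, alpha=0.05, calibration_fraction=0.5, seed=0):
    """Return stacked lower and upper bounds of a split-conformal band."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(graphs))
    n_cal = int(round(calibration_fraction * len(graphs)))
    calibration, train = graphs[order[:n_cal]], graphs[order[n_cal:]]

    center = train.mean(axis=0)  # prediction rule: entrywise mean graph
    # Nonconformity score: largest entrywise deviation from the prediction.
    scores = np.sort(np.abs(calibration - center).max(axis=(1, 2)))

    # Conformal quantile; the band is unbounded if there are too few
    # calibration graphs for the requested level.
    rank = math.ceil((1 - alpha) * (n_cal + 1))
    radius = scores[rank - 1] if rank <= n_cal else np.inf
    return np.stack([center - radius, center + radius])


rng = np.random.default_rng(5)
graphs = (rng.random((200, 5, 5)) < 0.6).astype(float)  # labeled binary graphs

band = split_conformal_band(graphs, alpha=0.05)
print(band.shape)
```

In this sketch the first slice of the result is the lower bound and the second the upper bound, one value per edge, which gives an array with the same `(2, 5, 5)` layout as the output above.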

#### Experiment 2 - Unlabeled Graphs and Frechet Mean
To build the sets for unlabeled graphs, we set one of the matchers as the metric aligner and initialize the conformal prediction set function, giving the fraction of the data on which we train the model and the fraction on which we compute the prediction set.

In [12]:
metric.set_aligner('FAQ')
predrule = AAC(estimate='frechet_mean', metric=metric)
conf_pred = conformal_prediction_set(space=space, metric=metric, predrule=predrule, alpha=0.05, calibration_size=0.7)

In [13]:
conf_pred.fit(dataset=graphset_1)

array([[[ 0.00000000e+00,  0.00000000e+00, -4.71401187e+04,
          0.00000000e+00, -4.71397854e+04],
        [-4.71397854e+04,  0.00000000e+00, -4.71401187e+04,
         -4.71397854e+04, -4.71397854e+04],
        [-4.71397854e+04,  1.00000000e+00,  0.00000000e+00,
         -4.71397854e+04, -4.71397854e+04],
        [ 0.00000000e+00,  1.00000000e+00, -4.71397854e+04,
          0.00000000e+00,  1.00000000e+00],
        [ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
          1.00000000e+00,  0.00000000e+00]],

       [[ 0.00000000e+00,  0.00000000e+00,  4.71407854e+04,
          0.00000000e+00,  4.71411187e+04],
        [ 4.71411187e+04,  0.00000000e+00,  4.71407854e+04,
          4.71411187e+04,  4.71411187e+04],
        [ 4.71411187e+04,  1.00000000e+00,  0.00000000e+00,
          4.71411187e+04,  4.71411187e+04],
        [ 0.00000000e+00,  1.00000000e+00,  4.71411187e+04,
          0.00000000e+00,  1.00000000e+00],
        [ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
  