<a href="https://colab.research.google.com/github/AnacletoLAB/grape/blob/main/tutorials/High_performance_graph_algorithms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# High performance graph algorithms
Ensmallen implements a number of high-performance graph algorithms. A considerable portion of them are implementations of algorithms described in the literature by [David Bader](https://davidbader.net/), whom we thank for his contribution to the field of graph algorithms.

See below for the algorithms available in Ensmallen.

Note that all of these algorithms are highly parallel implementations. The benchmarks below are run on Colab, which typically provides virtual machines with a very small number of cores, so on a machine with more cores they will execute much faster.

To install the GraPE library run:

```bash
pip install grape
```

To install only the Ensmallen module, which may be useful when the TensorFlow dependency causes problems, run:

```bash
pip install ensmallen
```

In [1]:
! pip install -q ensmallen

## Retrieving a graph to run the algorithms on
In this tutorial we will run the algorithms on one of the graphs available through the automatic graph retrieval of Ensmallen, namely the [Homo Sapiens graph from STRING](https://string-db.org/cgi/organisms). If you want to load a graph from an edge list, just follow the examples provided in the [add reference to tutorial].

In [2]:
from ensmallen.datasets.string import HomoSapiens

Retrieving and loading the graph

In [3]:
graph = HomoSapiens()

We compute the graph report:

In [4]:
graph

Enable the speedups

In [5]:
graph.enable()

## Random spanning arborescence
The spanning arborescence algorithm computes a set of edges, an [Arborescence](https://en.wikipedia.org/wiki/Arborescence_(graph_theory)), that is spanning, i.e., that covers all the nodes in the graph.

This is an implementation of [A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)](https://davidbader.net/publication/2005-bc/2005-bc.pdf).

In [6]:
%%time
spanning_arborescence_edges = graph.spanning_arborescence()

CPU times: user 132 ms, sys: 211 µs, total: 132 ms
Wall time: 73.5 ms
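
To make the notion of a spanning set of edges concrete, here is a minimal sequential sketch in plain Python. It is not the parallel algorithm from the paper nor Ensmallen's implementation: it simply runs a breadth-first search from every unvisited node of a small toy graph and keeps the edges that discover new nodes. The function name `bfs_spanning_forest` and the toy edge list are assumptions made for illustration only.

```python
from collections import deque


def bfs_spanning_forest(number_of_nodes, edges):
    """Return the tree edges of a BFS spanning forest of an undirected graph.

    Illustrative sequential sketch, not Ensmallen's parallel implementation.
    """
    # Build an adjacency list from the undirected edge list.
    neighbours = [[] for _ in range(number_of_nodes)]
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)

    visited = [False] * number_of_nodes
    tree_edges = []
    # Start a new tree from every node that has not been reached yet,
    # so that disconnected graphs are covered as well.
    for root in range(number_of_nodes):
        if visited[root]:
            continue
        visited[root] = True
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbour in neighbours[node]:
                if not visited[neighbour]:
                    visited[neighbour] = True
                    # Edges are oriented away from the root of their tree.
                    tree_edges.append((node, neighbour))
                    queue.append(neighbour)
    return tree_edges


# Toy graph: a triangle with a tail (0, 1, 2, 3) plus a separate edge (4, 5).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]
toy_tree = bfs_spanning_forest(6, toy_edges)
print(toy_tree)
```

On this toy graph the result has four edges: six nodes minus the two connected components, as expected for a spanning forest.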


## Connected components
The [connected components](https://en.wikipedia.org/wiki/Component_(graph_theory)) of a graph are the maximal sets of nodes in which every node can be reached from every other node through a path of edges.

In [7]:
%%time
(
    connected_component_ids,
    number_of_connected_components,
    minimum_component_size,
    maximum_component_size
) = graph.connected_components()

CPU times: user 240 ms, sys: 123 µs, total: 240 ms
Wall time: 127 ms
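
The four values unpacked above can be illustrated on a toy graph with a small union-find sketch in plain Python. This is only a sequential illustration of what connected components are, not the algorithm Ensmallen uses. The function name `toy_connected_components`, the dense relabelling of the component IDs, and the assumption that the graph has at least one node are choices made for this example.

```python
def toy_connected_components(number_of_nodes, edges):
    """Return (component ids, number of components, min size, max size).

    Illustrative union-find sketch for an undirected graph with at least
    one node; not Ensmallen's implementation.
    """
    parent = list(range(number_of_nodes))

    def find(node):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two components touched by every edge.
    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src

    # Relabel the roots with dense IDs 0, 1, 2, ... and count the sizes.
    dense_ids = {}
    component_ids = []
    sizes = {}
    for node in range(number_of_nodes):
        root = find(node)
        component_id = dense_ids.setdefault(root, len(dense_ids))
        component_ids.append(component_id)
        sizes[component_id] = sizes.get(component_id, 0) + 1

    return component_ids, len(dense_ids), min(sizes.values()), max(sizes.values())


# Toy graph: a path 0-1-2, an edge 3-4 and the isolated node 5.
toy_ids, toy_count, toy_min_size, toy_max_size = toy_connected_components(
    6, [(0, 1), (1, 2), (3, 4)]
)
print(toy_ids, toy_count, toy_min_size, toy_max_size)
```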


## Diameter
The following is an implementation of [On computing the diameter of real-world undirected graphs](https://who.rocq.inria.fr/Laurent.Viennot/road/papers/ifub.pdf).

In [8]:
%%time
diameter = graph.get_diameter(ignore_infinity=True)

CPU times: user 7.61 s, sys: 16.2 ms, total: 7.63 s
Wall time: 3.99 s
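
As a point of comparison, the diameter can also be computed naively by running a breadth-first search from every node and keeping the largest distance found. The sketch below does exactly that in plain Python; it is a baseline for illustration, not the method from the paper, which is designed to avoid running a search from every node. The name `naive_diameter` is hypothetical, and the sketch assumes that ignoring infinity means disregarding the infinite distances between nodes in different components.

```python
from collections import deque


def naive_diameter(number_of_nodes, edges):
    """Return the largest finite shortest-path distance between two nodes.

    Naive baseline: one BFS per node. Distances between nodes in different
    components are infinite and are ignored (assumed meaning of
    ignoring infinity).
    """
    neighbours = [[] for _ in range(number_of_nodes)]
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)

    diameter = 0
    for source in range(number_of_nodes):
        # BFS from `source`; only reachable nodes end up in `distances`.
        distances = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in neighbours[node]:
                if neighbour not in distances:
                    distances[neighbour] = distances[node] + 1
                    queue.append(neighbour)
        # The eccentricity of `source` is its largest finite distance.
        diameter = max(diameter, max(distances.values()))
    return diameter


# Toy graph: a path 0-1-2-3 and a separate edge 4-5.
toy_diameter = naive_diameter(6, [(0, 1), (1, 2), (2, 3), (4, 5)])
print(toy_diameter)
```

This baseline costs one full traversal per node, which is what makes it impractical on graphs the size of the one used in this tutorial.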


Note that most properties that boil down to a single value are stored in a cache structure once computed, so recomputing the diameter after the first call takes significantly less time.

In [9]:
%%time
diameter = graph.get_diameter(ignore_infinity=True)

CPU times: user 13 µs, sys: 1 µs, total: 14 µs
Wall time: 16.5 µs
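
The drop from seconds to microseconds is what one would expect from a compute-once cache. The following is a minimal sketch of that general pattern in plain Python; the class `CachedDiameterGraph` and its attributes are hypothetical and say nothing about how Ensmallen's cache is actually structured.

```python
class CachedDiameterGraph:
    """Hypothetical wrapper that computes a scalar property only once."""

    def __init__(self, compute_diameter):
        # `compute_diameter` stands in for the expensive computation.
        self._compute_diameter = compute_diameter
        self._cached_diameter = None
        self.number_of_computations = 0

    def get_diameter(self):
        # Run the expensive computation on the first call only,
        # then serve the stored value on every later call.
        if self._cached_diameter is None:
            self.number_of_computations += 1
            self._cached_diameter = self._compute_diameter()
        return self._cached_diameter


toy_graph = CachedDiameterGraph(lambda: 3)
first_call = toy_graph.get_diameter()
second_call = toy_graph.get_diameter()
print(first_call, second_call, toy_graph.number_of_computations)
```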


## Clustering coefficient and triangles
This is an implementation of [Faster Clustering Coefficient Using Vertex Covers](https://davidbader.net/publication/2013-g-ba/2013-g-ba.pdf), providing the average clustering coefficient, the total number of triangles and the number of triangles per node.

In [11]:
%%time
graph.get_number_of_triangles()

CPU times: user 6min 10s, sys: 1 s, total: 6min 11s
Wall time: 3min 8s


798817778

In [12]:
%%time
graph.get_number_of_triangles_per_node()

CPU times: user 6min 26s, sys: 1.07 s, total: 6min 27s
Wall time: 3min 16s


array([189134,  27611, 256033, ...,      0,      0,   2014], dtype=uint32)

In [14]:
%%time
graph.get_average_clustering_coefficient()

CPU times: user 6min 27s, sys: 1.1 s, total: 6min 28s
Wall time: 3min 16s


0.19341635847497163

In [16]:
%%time
graph.get_clustering_coefficient_per_node()

CPU times: user 6min 26s, sys: 1.1 s, total: 6min 27s
Wall time: 3min 16s


array([0.18254768, 0.24017919, 0.17728976, ..., 0.        , 0.        ,
       0.59183074])
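
To clarify what the four quantities above measure, here is a naive sketch in plain Python on a toy graph. It checks every pair of neighbours of every node, so it is far slower than the vertex-cover approach from the paper and is not Ensmallen's implementation. The function name `toy_triangles_and_clustering` is hypothetical, and the conventions used here (each triangle counted once in the total, coefficient 0 for nodes with fewer than two neighbours) are assumptions that may differ from the library's.

```python
from itertools import combinations


def toy_triangles_and_clustering(number_of_nodes, edges):
    """Return (total triangles, triangles per node,
    average clustering coefficient, clustering coefficient per node).

    Naive illustration for an undirected graph with at least one node.
    """
    # Neighbour sets, ignoring self-loops and duplicated edges.
    neighbours = [set() for _ in range(number_of_nodes)]
    for src, dst in edges:
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    # A node takes part in one triangle per pair of its neighbours
    # that are themselves connected.
    triangles_per_node = [0] * number_of_nodes
    for node in range(number_of_nodes):
        for first, second in combinations(sorted(neighbours[node]), 2):
            if second in neighbours[first]:
                triangles_per_node[node] += 1

    # Every triangle was seen from each of its three nodes.
    total_triangles = sum(triangles_per_node) // 3

    # Local coefficient: triangles through the node divided by the
    # number of neighbour pairs; 0 when there are no such pairs.
    clustering_per_node = []
    for node in range(number_of_nodes):
        degree = len(neighbours[node])
        possible_pairs = degree * (degree - 1) // 2
        clustering_per_node.append(
            triangles_per_node[node] / possible_pairs if possible_pairs else 0.0
        )

    average_clustering = sum(clustering_per_node) / number_of_nodes
    return total_triangles, triangles_per_node, average_clustering, clustering_per_node


# Toy graph: a triangle 0-1-2, a tail 2-3 and the isolated node 4.
toy_total, toy_per_node, toy_average, toy_coefficients = toy_triangles_and_clustering(
    5, [(0, 1), (0, 2), (1, 2), (2, 3)]
)
print(toy_total, toy_per_node, toy_average, toy_coefficients)
```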