# Billion-scale triangles on multigraphs with 🍇 GRAPE 🍇
In this tutorial, I will show you how to use the [GRAPE library](https://github.com/AnacletoLAB/grape) to count the number of triangles in a graph. We will first compute a vertex cover of the graph using GRAPE, [as extensively covered in this previous tutorial](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Billion-scale%202-approximated%20vertex%20cover%20with%20GRAPE.ipynb), and then use this vertex cover to efficiently count the triangles in the graph by using a parallelized and generalized version of [this awesome algorithm by Oded Green and David Bader](https://davidbader.net/publication/2013-g-ba/2013-g-ba.pdf). This version handles both graphs with self-loops and multigraphs.

We first explore the impact of using different vertex covers on smaller graphs, and then we compute the triangles, both globally and per node, for several larger graphs, including Friendster and ClueWeb09.

The key difference between the algorithms for global triangle counts and triangle counts per node is the use of atomic instructions, specifically [fetch add](https://en.wikipedia.org/wiki/Fetch-and-add). Global triangle counts can simply be computed by adding up the triangle counts of all the nodes in the graph, with no need to attribute each value to a specific node. However, when computing triangle counts per node, several threads may need to update the counter of the same node at the same time, and it is important to ensure that each triangle is counted exactly once. This can be achieved using [atomic instructions](https://en.wikipedia.org/wiki/Fetch-and-add), which allow multiple threads to update a shared value without interfering with each other. By using fetch add, no increment is lost or applied twice, allowing us to accurately compute the triangle count per node.
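As a rough illustration of what fetch add provides, the sketch below emulates an atomic counter in plain Python using a lock. `AtomicCounter` is a hypothetical helper written for this example only and is not part of GRAPE; it merely shows that concurrent increments are all accounted for.

```python
import threading


class AtomicCounter:
    """Hypothetical lock-based emulation of an atomic integer."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, amount: int) -> int:
        """Add `amount` to the counter and return the previous value."""
        with self._lock:
            previous = self._value
            self._value += amount
            return previous

    def load(self) -> int:
        """Return the current value of the counter."""
        with self._lock:
            return self._value


counter = AtomicCounter()


def worker():
    # Each thread performs 10,000 increments on the shared counter
    for _ in range(10_000):
        counter.fetch_add(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.load())  # 40000
```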

I will explain the concept of a triangle, what triangle counting is, and what it is for. We will touch upon basic graph concepts such as self-loops and multigraphs, multisets and multiplicity functions. By the end of the tutorial, you will have a good understanding of how to use [GRAPE](https://github.com/AnacletoLAB/grape) to count the triangles in a graph and apply this knowledge to your projects.

[Remember to ⭐ GRAPE!](https://github.com/AnacletoLAB/grape)

### What is GRAPE?
[🍇🍇 GRAPE 🍇🍇](https://github.com/AnacletoLAB/grape) is a graph processing and embedding library that enables users to easily manipulate and analyze graphs. With [GRAPE](https://github.com/AnacletoLAB/grape), users can efficiently load and preprocess graphs, generate random walks, and apply various node and edge embedding models. Additionally, [GRAPE](https://github.com/AnacletoLAB/grape) provides a fair and reproducible evaluation pipeline for comparing different graph embedding and graph-based prediction methods.

![features in GRAPE](https://github.com/AnacletoLAB/grape/raw/main/images/sequence_diagram.png?raw=true)

## Triangles in graphs
In graph theory, **a triangle is a simple cycle of three vertices**. A triangle is also known as a 3-cycle.

A triangle can be represented by three vertices and the three edges connecting them. For example, in the following graph:

<img src="https://github.com/AnacletoLAB/grape/blob/main/images/triangle.jpg?raw=true" width=200 />

There is one triangle, formed by vertices `1`, `2`, and `3`. The triangle is represented by the three edges connecting these vertices: `(1,2)`, `(2,3)`, and `(3,1)`.

### How many triangles are in this graph?
In this graph we have one triangle, but also each of the nodes has one triangle. So, if we sum all of the triangles of the nodes in the graph, we would get three triangles.

### Why should you care about triangles of each node?
[Triangles](https://en.wikipedia.org/wiki/Triangle_graph) are an important concept in graph theory because they represent a basic unit of connectivity in a graph. Knowing exactly how connected each node is, and more importantly, how connected the various areas of the graph are, is an extremely important analytics tool: it allows us to plan and test different approaches on the different areas of the graph, which may have massively different densities. Areas with high density, i.e. high connectivity, may correspond to areas where model predictions are easier because a lot of information is available. Conversely, areas with very low density may be areas where we should make an effort to find more knowledge. It is likely that a model that performs well in low-density areas is not the same model that performs well in high-density areas.

We discuss an analogous [degree-based Goldilock holdout in this previous tutorial](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Graph_holdouts_using_GRAPE.ipynb).

Triangles also have several applications in various fields, including social network analysis, machine learning, and data mining.

### What is triangle counting?
The triangle count problem is the problem of counting the number of triangles in a graph. It is a subproblem of more general cycle counting problems, such as counting the number of cycles of a given length in a graph.

To count the number of triangles in a graph, one must first identify all of the triangles in the graph. This can be done using various algorithms, such as brute force methods, matrix multiplication-based algorithms, and random sampling-based algorithms. Once all of the triangles in the graph have been identified, the total number of triangles can be counted by simply adding up the number of triangles identified by the algorithm.
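To make the brute-force option concrete, here is a minimal sketch that enumerates the triangles of a small simple graph by testing every triple of nodes. The function name `brute_force_triangles` and the example edge list are assumptions made for this illustration; this cubic-time approach is only viable on tiny graphs and is not how GRAPE counts triangles.

```python
from itertools import combinations


def brute_force_triangles(edges):
    """Return the distinct triangles of a simple undirected graph."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            # Self-loops cannot be part of a triangle
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # A triple is a triangle when all three pairs are connected
    return [
        (a, b, c)
        for a, b, c in combinations(sorted(neighbours), 3)
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]
    ]


# A triangle on nodes 1, 2, 3 plus a dangling edge to node 4
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
triangles = brute_force_triangles(edges)

# Every triangle contributes one to each of its three nodes
per_node = {}
for triangle in triangles:
    for node in triangle:
        per_node[node] = per_node.get(node, 0) + 1

print(triangles)               # [(1, 2, 3)]
print(sum(per_node.values()))  # 3
```

Note how the sum of the per-node counts is three times the number of distinct triangles, as discussed for the single-triangle example.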

<img src="https://github.com/AnacletoLAB/grape/blob/main/images/triangles_in_graph.jpg?raw=true" width=500 />

#### Why should I count triangles?
The triangle count problem has several applications in various fields, including social network analysis, machine learning, and data mining. In these fields, the number of triangles in a graph is often used as a measure of the graph's structure and connectivity. For example, in social network analysis, the number of triangles in a person's social network can be used to measure the person's [clustering coefficient](https://en.wikipedia.org/wiki/Clustering_coefficient), which is a measure of how well connected the person's friends are to each other. In machine learning and data mining, triangle counts can be used to identify patterns and trends in large data sets, and as features for tasks such as node embeddings that generalize across graphs, i.e. that are not specific to a single graph.

We will explore in an upcoming tutorial how we can compute the clustering coefficient of large graphs.
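In the meantime, the relation between per-node triangle counts and the local clustering coefficient can be sketched on a toy graph. The snippet below assumes a simple undirected graph stored as a dictionary of neighbour sets and uses the standard definition $C_v = 2 T_v / (k_v (k_v - 1))$, where $T_v$ is the number of triangles through $v$ and $k_v$ its degree. The function name and the example graph are made up for this illustration and do not reflect GRAPE's API.

```python
from itertools import combinations


def local_clustering_coefficient(adjacency, node):
    """Local clustering coefficient of `node` in a simple undirected graph."""
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        # With fewer than two neighbours no triangle can exist
        return 0.0
    # Triangles through `node` are the connected pairs of its neighbours
    triangles = sum(
        1
        for a, b in combinations(sorted(neighbours), 2)
        if b in adjacency[a]
    )
    return 2 * triangles / (degree * (degree - 1))


# Toy graph: a triangle on nodes 1, 2, 3 with node 4 attached to node 1
adjacency = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2},
    4: {1},
}

for node in adjacency:
    print(node, local_clustering_coefficient(adjacency, node))
```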

### Some topological quirks we need to look out for 
Several real-world graphs have topological peculiarities that are easy to forget while designing graph algorithms. The two that are relevant for triangle counting are self-loops and multigraphs, i.e. graphs where nodes may be connected by multiple edges.

#### Multigraphs
In a multi-graph, nodes may be connected by multiple edges. These graphs can be, for instance, knowledge graphs where nodes are connected by multiple edges representing different types of relationships.

<img src="https://github.com/AnacletoLAB/grape/blob/main/images/multigraph.png?raw=true" width=400 />

##### Multisets
While we are used to thinking of node neighbours as sets, for multi-graphs they may take the shape of [multisets](https://en.wikipedia.org/wiki/Multiset#:~:text=In%20mathematics%2C%20a%20multiset%20(or,that%20element%20in%20the%20multiset.), i.e. sets where one or more elements appear multiple times.

We use the notation $m_{\mathcal{N}(v)}(w): V \to \mathbb{N}$ to define the **multiplicity function**, which returns how many times a given node $w \in V$ appears in the immediate neighbourhood of node $v \in V$, denoted $\mathcal{N}(v)$. In the python pseudocode, we represent the function $m_{\mathcal{N}(v)}(w)$ as `multiplicity(neighbours(v), w)`.

<img src="https://github.com/AnacletoLAB/grape/blob/main/images/set_and_multiset.png?raw=true" width=400 />
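As a small illustration, a neighbourhood multiset can be modelled in Python with `collections.Counter`. The `multiplicity` helper below mirrors the name used in the pseudocode, but its implementation and the example neighbourhood are assumptions made for this sketch.

```python
from collections import Counter


def multiplicity(neighbours: Counter, node) -> int:
    """Return how many times `node` appears in the multiset `neighbours`."""
    # A Counter returns 0 for elements that are not present
    return neighbours[node]


# Hypothetical neighbourhood of a node v in a multi-graph:
# v is connected once to "a", twice to "b" and three times to "c"
neighbours_of_v = Counter(["a", "b", "b", "c", "c", "c"])

print(multiplicity(neighbours_of_v, "b"))  # 2
print(multiplicity(neighbours_of_v, "z"))  # 0
print(sorted(neighbours_of_v))             # ['a', 'b', 'c']
```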

#### Self-loops
The name is rather self-explanatory: self-loops are edges that start and end in the same node. In multi-graphs, nodes may have several self-loops.

<img src="https://github.com/AnacletoLAB/grape/blob/main/images/multigraph_with_self_loops.png?raw=true" width=400 />

### An efficient way to count triangles per node!
We will be using an updated and generalized version of [an efficient vertex-cover-based method to count triangles in a graph, created by Oded Green and David Bader](https://davidbader.net/publication/2013-g-ba/2013-g-ba.pdf). A vertex cover is a set of vertices such that for every edge in the graph, at least one of its endpoints is included in the vertex cover. By exploiting the properties of a vertex cover, it is possible to significantly reduce the number of intersections of adjacency lists that must be performed in order to count the triangles in a graph.

In this updated version, we introduce support for graphs and multi-graphs including self-loops, and provide both the global count of triangles in the graph, and the per node count.

[We have covered in a previous tutorial how we can compute good 2-approximated vertex covers](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Billion-scale%202-approximated%20vertex%20cover%20with%20GRAPE.ipynb).
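For readers who have not gone through that tutorial, the definition can be sketched in a few lines. The snippet below shows a validity check and the textbook 2-approximation that adds both endpoints of every uncovered edge. Both function names are hypothetical, and this is not GRAPE's implementation, which offers several node orderings and insertion schemas.

```python
def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def two_approximated_vertex_cover(edges):
    """Textbook 2-approximation: add both endpoints of each uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # For a self-loop, u == v and a single node is added
            cover.update((u, v))
    return cover


# Toy graph: a triangle on nodes 1, 2, 3 with node 4 attached to node 3
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
cover = two_approximated_vertex_cover(edges)

print(sorted(cover))
print(is_vertex_cover(edges, cover))
```

In this toy graph a minimum vertex cover has two nodes, for instance `{1, 3}`, so the approximated cover can be at most twice that size.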

#### Global count version
The algorithm for counting globally the number of triangles in the graph, in python pseudocode, is the following:

```python
# We initialize the number of
# triangles in the graph to zero
total_triangles = 0

# We can compute a vertex cover using many approaches
# During the last tutorial, I showed a 2-approximation
# of a minimum vertex cover.
# I stress that the following vertex cover does not need
# to be minimal, but the smaller it is the faster the algorithm
# will be.
vertex_cover = compute_vertex_cover()

# We iterate over all nodes in the vertex cover
# Of course, this iteration can be trivially
# parallelized, and if necessary, distributed over
# a computation cluster in the form of a map-reduce (https://en.wikipedia.org/wiki/MapReduce)
for first in vertex_cover:
    # We iterate over all neighbours of the current node
    # (the neighbours are assumed to be sorted)
    for second in neighbours(first):
        # If the second node is larger than the first node,
        # we can stop here and focus on the lower triangular matrix,
        # which avoids dividing by two later down the line.
        if second > first:
            break
        # We need to check to avoid self-loops
        if second == first:
            continue
        # And, we need to check whether this node is in the vertex
        # cover. Note that, by definition, in a triangle at least two
        # nodes are in the vertex cover.
        if second not in vertex_cover:
            continue
        # Otherwise we can continue, and we iterate
        # over the intersection of the first and second
        # node neighbours.
        for third in intersection(
            neighbours(first),
            neighbours(second)
        ):
            # We skip over the first and second
            # nodes when we encounter them, as
            # these would not be triangles but tuples,
            # or self-loops.
            if third == second or third == first:
                continue
               
            node_multiplicity = (
                multiplicity(neighbours(first), third) *
                multiplicity(neighbours(second), third)
            )
                
            # Then, if the third node is also in
            # the vertex cover, we will
            # encounter this triangle again
            if third in vertex_cover:
                total_triangles += node_multiplicity
            else:
                # Otherwise this node won't be met
                # again, and we need to compensate
                total_triangles += 3 * node_multiplicity

# And done!
```

The algorithm iterates over all vertices in the vertex cover and, for each of them, over its neighbours in the adjacency list. If the neighbour is also in the vertex cover, the algorithm computes the intersection of the adjacency lists of the two vertices: every vertex in this intersection closes a triangle. If this third vertex is also in the vertex cover, the same triangle will be found again starting from the other pairs of its vertices, so the algorithm increments the counter by one, weighted by the multiplicity. If the third vertex is not in the vertex cover, the triangle is found only this once, and the algorithm compensates by incrementing the counter by three. With this scheme, each triangle contributes three to the counter in the pseudocode, once per node, which matches the sum of the per-node counts described earlier.
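To experiment with these steps on a toy example, here is a small, single-threaded Python sketch of the global count. It is an illustration under stated assumptions rather than GRAPE's implementation: `build_adjacency` and `count_triangles` are hypothetical helpers, neighbourhoods are stored as `Counter` multisets, and, since the loop visits each distinct neighbour once, the multiplicity of the edge between the first and second node is multiplied in explicitly.

```python
from collections import Counter


def build_adjacency(edges):
    """Build the neighbourhood multisets of an undirected multi-graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, Counter())
        adjacency.setdefault(v, Counter())
        adjacency[u][v] += 1
        if u != v:
            adjacency[v][u] += 1
    return adjacency


def count_triangles(adjacency, vertex_cover):
    """Sum of the per-node triangle counts, computed via a vertex cover."""
    total_triangles = 0
    for first in vertex_cover:
        for second in sorted(adjacency[first]):
            # Stop at the diagonal: this also skips self-loops
            if second >= first:
                break
            if second not in vertex_cover:
                continue
            shared = adjacency[first].keys() & adjacency[second].keys()
            for third in shared:
                if third == first or third == second:
                    continue
                node_multiplicity = (
                    adjacency[first][second]
                    * adjacency[first][third]
                    * adjacency[second][third]
                )
                if third in vertex_cover:
                    # This triangle is met three times overall
                    total_triangles += node_multiplicity
                else:
                    # This triangle is met only once: compensate
                    total_triangles += 3 * node_multiplicity
    return total_triangles


# Triangle 1-2-3 with a doubled edge (1, 2) and a self-loop on node 3
adjacency = build_adjacency([(1, 2), (1, 2), (2, 3), (3, 1), (3, 3)])
print(count_triangles(adjacency, {1, 2, 3}))  # 6
print(count_triangles(adjacency, {1, 3}))     # 6
```

Both vertex covers give the same result: the doubled edge yields two triangles, each contributing three to the total.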

#### Per-node version
The algorithm for counting the number of triangles per node follows the same structure. The main difference from the global version is the use of atomics, i.e. instructions that guarantee that no update is lost when several threads write to the same counter in parallel. Here it is in python pseudocode:

```python
# We create the vector of atomics
total_triangles = [
    Atomic(0)
    for _ in range(number_of_nodes_in_the_graph)
]

# We can compute a vertex cover using many approaches
# During the last tutorial, I showed a 2-approximation
# of a minimum vertex cover.
# I stress that the following vertex cover does not need
# to be minimal, but the smaller it is the faster the algorithm
# will be.
vertex_cover = compute_vertex_cover()

# We iterate over all nodes in the vertex cover
# Of course, this iteration can be trivially
# parallelized, and if necessary, distributed over
# a computation cluster in the form of a map-reduce (https://en.wikipedia.org/wiki/MapReduce)
for first in vertex_cover:
    # We iterate over all neighbours of the current node
    # (the neighbours are assumed to be sorted)
    for second in neighbours(first):
        # If the second node is larger than the first node,
        # we can stop here and focus on the lower triangular matrix,
        # which avoids dividing by two later down the line.
        if second > first:
            break
        # We need to check to avoid self-loops
        if second == first:
            continue
        # And, we need to check whether this node is in the vertex
        # cover. Note that, by definition, in a triangle at least two
        # nodes are in the vertex cover.
        if second not in vertex_cover:
            continue
        # Otherwise we can continue, and we iterate
        # over the intersection of the first and second
        # node neighbours.
        for third in intersection(
            neighbours(first),
            neighbours(second)
        ):
            # We skip over the first and second
            # nodes when we encounter them, as
            # these would not be triangles but tuples,
            # or self-loops.
            if third == second or third == first:
                continue
               
            node_multiplicity = (
                multiplicity(neighbours(first), third) *
                multiplicity(neighbours(second), third)
            )
                
            # Then, if the third node is also in
            # the vertex cover, we will
            # encounter this triangle again
            if third in vertex_cover:
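                # Across the three visits of a triangle fully in
                # the cover, the third node is different each time,
                # so crediting it counts the triangle once per node
                total_triangles[third].fetch_add(node_multiplicity)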
            else:
                # Otherwise this node won't be met
                # again, and we need to compensate
                total_triangles[first].fetch_add(node_multiplicity)
                total_triangles[second].fetch_add(node_multiplicity)
                total_triangles[third].fetch_add(node_multiplicity)

# And done!
```

The description of the global version also holds, for the most part, for the per-node version: instead of a single counter, the increments are applied to the counters of the nodes involved.
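The per-node variant can be tried out on a toy multi-graph with the single-threaded sketch below, which therefore uses plain integers instead of atomics. As before, `build_adjacency` and `triangles_per_node` are hypothetical helpers and not GRAPE's implementation. One design choice of this sketch should be stressed: when all three nodes are in the vertex cover, it credits the third node, because across the three visits of such a triangle the third node is different each time.

```python
from collections import Counter


def build_adjacency(edges):
    """Build the neighbourhood multisets of an undirected multi-graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, Counter())
        adjacency.setdefault(v, Counter())
        adjacency[u][v] += 1
        if u != v:
            adjacency[v][u] += 1
    return adjacency


def triangles_per_node(adjacency, vertex_cover):
    """Number of triangles through each node, computed via a vertex cover."""
    counts = {node: 0 for node in adjacency}
    for first in vertex_cover:
        for second in sorted(adjacency[first]):
            # Stop at the diagonal: this also skips self-loops
            if second >= first:
                break
            if second not in vertex_cover:
                continue
            shared = adjacency[first].keys() & adjacency[second].keys()
            for third in shared:
                if third == first or third == second:
                    continue
                node_multiplicity = (
                    adjacency[first][second]
                    * adjacency[first][third]
                    * adjacency[second][third]
                )
                if third in vertex_cover:
                    # Met three times, each time with a different third node
                    counts[third] += node_multiplicity
                else:
                    # Met only once: credit all three nodes now
                    counts[first] += node_multiplicity
                    counts[second] += node_multiplicity
                    counts[third] += node_multiplicity
    return counts


# Triangle 1-2-3 with a doubled edge (1, 2) and a self-loop on node 3
adjacency = build_adjacency([(1, 2), (1, 2), (2, 3), (3, 1), (3, 3)])
print(triangles_per_node(adjacency, {1, 2, 3}))  # {1: 2, 2: 2, 3: 2}
print(triangles_per_node(adjacency, {1, 3}))     # {1: 2, 2: 2, 3: 2}
```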

## Installing GRAPE
First, we install the GRAPE library from PyPI:

In [1]:
!pip install grape -qU

## Experiments
Welcome to the experiments section of this tutorial! In this section, we will put our knowledge into practice by applying the triangle counting algorithm to several graphs, starting with [a few protein-protein interaction graphs from STRING PPI](https://string-db.org/), and then scaling up to the [KGCOVID19 knowledge graph](https://www.cell.com/patterns/fulltext/S2666-3899(20)30203-8?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389920302038%3Fshowall%3Dtrue), the [Friendster graph](https://networkrepository.com/friendster.php), and the [ClueWeb09 web graph](https://networkrepository.com/web-ClueWeb09.php).

We run these experiments on a machine with 24 threads and 12 cores.

**Do note that, because of the memory limits of my desktop, I restart the Jupyter kernel after running the experiment on each of the large graphs.**

You can estimate the expected computation time on your machine by scaling the timings reported here for 24 threads to the number of threads you have available:

In [1]:
import os

os.cpu_count()

24

Also, this machine has about `128GB` of RAM:

In [2]:
import psutil
    
psutil.virtual_memory().total / 1024**3 # total physical memory in GiB

125.7062873840332

### Evaluating the impact of the vertex cover heuristic
One may employ several different strategies to build a vertex cover. Here, we consider six strategies overall, obtained by combining three node ordering approaches with two insertion schemas: adding either only the source node, or both the source and destination nodes. [Please do check out this tutorial to learn more about the vertex cover algorithm we employ here](https://github.com/AnacletoLAB/grape/blob/main/tutorials/Billion-scale%202-approximated%20vertex%20cover%20with%20GRAPE.ipynb).

The node ordering schemas are:

* Natural (`arbitrary` in the code below): the order in which the nodes are loaded in the graph
* Decreasing degree: the nodes are sorted by decreasing node degree
* Increasing degree: the nodes are sorted by increasing node degree

#### ⚠️⚠️⚠️ Some of these are big graphs! Make sure you have the disk space! ⚠️⚠️⚠️
*This is a warning to ensure that users have sufficient disk space before downloading and using a large graph. It is important to ensure that you have enough space on your hard drive or another storage device to accommodate the graph size, as attempting to download or work with a graph that is too large for your available space can lead to errors and other issues. It is advisable to check your available disk space before downloading or working with a large graph and free up additional space if necessary.*

In [5]:
!du -sh /bfd/graphs/networkrepository/WebClueweb09/

631G	/bfd/graphs/networkrepository/WebClueweb09/


We start by loading the graphs:

In [1]:
%%time
from tqdm.auto import tqdm
from grape.datasets.string import SaccharomycesCerevisiae, HomoSapiens, MusMusculus
from grape.datasets.kghub import KGCOVID19
from grape.datasets.networkrepository import SocFriendster
from grape.datasets.networkrepository import WebClueweb09

graphs = [
    graph_builder(load_nodes=False)
    for graph_builder in tqdm((
        SaccharomycesCerevisiae, HomoSapiens, MusMusculus,
        KGCOVID19,# SocFriendster, WebClueweb09
    ))
]

CPU times: user 41.2 s, sys: 4.25 s, total: 45.4 s
Wall time: 45 s


Let's tabulate some of the main properties of these graphs:

In [23]:
%%time
import pandas as pd

pd.DataFrame([
    {
        "Graph": graph.get_name(),
        "Nodes": graph.get_number_of_nodes(),
        "Edges": graph.get_number_of_directed_edges(),
        "Maximum degree": graph.get_maximum_node_degree()
    }
    for graph in graphs
])

CPU times: user 0 ns, sys: 476 µs, total: 476 µs
Wall time: 484 µs


|   | Graph | Nodes | Edges | Maximum degree |
|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | 6691 | 1988592 | 2729 |
| 1 | HomoSapiens | 19566 | 11938498 | 7507 |
| 2 | MusMusculus | 22048 | 14496358 | 7669 |
| 3 | KGCOVID19 | 574232 | 36501154 | 122238 |
| 4 | SocFriendster | 65608366 | 3612134270 | 5214 |
| 5 | WebClueweb09 | 1684868322 | 15622771654 | 6444720 |


And now let's get started with benchmarking the vertex cover approaches by themselves:

In [32]:
%%time
from time import time
import numpy as np
from tqdm.auto import trange

vertex_cover_benchmarks = []

for approach in tqdm(
    ("arbitrary", "decreasing_node_degree", "increasing_node_degree"),
    leave=False
):
    for insert_only_source in (True, False):
        for graph in graphs:
            start = time()
            cover = graph.get_vertex_cover(
                approach=approach,
                insert_only_source=insert_only_source
            )
            delta = time() - start
            vertex_cover_benchmarks.append({
                "Graph": graph.get_name(),
                "Approach": "{approach}{only_source}".format(
                    approach=approach,
                    only_source = " (Only source)" if insert_only_source else ""
                ),
                "Degree": max([
                    graph.get_node_degree_from_node_id(node_id)
                    for node_id, in_cover in enumerate(cover)
                    if in_cover
                ]),
                "Size": np.sum(cover),
                "Time": delta,
            })

vertex_cover_benchmarks = pd.DataFrame(vertex_cover_benchmarks)
vertex_cover_benchmarks

CPU times: user 39min 2s, sys: 1min 56s, total: 40min 59s
Wall time: 32min 11s


|   | Graph | Approach | Degree | Size | Time (s) |
|---|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | arbitrary (Only source) | 2729 | 6259 | 0.001069 |
| 1 | HomoSapiens | arbitrary (Only source) | 7507 | 19220 | 0.004805 |
| 2 | MusMusculus | arbitrary (Only source) | 7669 | 20984 | 0.005657 |
| 3 | KGCOVID19 | arbitrary (Only source) | 122238 | 206326 | 0.015471 |
| 4 | SocFriendster | arbitrary (Only source) | 5214 | 35466082 | 8.325412 |
| 5 | WebClueweb09 | arbitrary (Only source) | 6444720 | 537894933 | 63.443028 |
| 6 | SaccharomycesCerevisiae | arbitrary | 2729 | 6240 | 0.077439 |
| 7 | HomoSapiens | arbitrary | 7507 | 19200 | 0.002452 |
| 8 | MusMusculus | arbitrary | 7669 | 20756 | 0.002645 |
| 9 | KGCOVID19 | arbitrary | 122238 | 217552 | 0.012586 |


In [33]:
vertex_cover_benchmarks.to_csv("vertex_cover_benchmarks_with_degree.csv")

In [11]:
vertex_cover_benchmarks.groupby(["Graph", "Approach"]).first()[["Size"]]

| Graph | Approach | Size |
|---|---|---|
| HomoSapiens | arbitrary | 19200 |
| HomoSapiens | arbitrary (Only source) | 19220 |
| HomoSapiens | decreasing_node_degree | 18904 |
| HomoSapiens | decreasing_node_degree (Only source) | 18475 |
| HomoSapiens | increasing_node_degree | 19384 |
| HomoSapiens | increasing_node_degree (Only source) | 19384 |
| KGCOVID19 | arbitrary | 217552 |
| KGCOVID19 | arbitrary (Only source) | 206326 |
| KGCOVID19 | decreasing_node_degree | 190520 |
| KGCOVID19 | decreasing_node_degree (Only source) | 180150 |


In [9]:
vertex_cover_benchmarks.groupby(["Graph", "Approach"]).agg(["mean", "std"])["Time"]

| Graph | Approach | Time mean (s) | Time std (s) |
|---|---|---|---|
| HomoSapiens | arbitrary | 0.002573 | 0.000163 |
| HomoSapiens | arbitrary (Only source) | 0.028659 | 0.052949 |
| HomoSapiens | decreasing_node_degree | 0.00342 | 0.000343 |
| HomoSapiens | decreasing_node_degree (Only source) | 0.002373 | 0.000142 |
| HomoSapiens | increasing_node_degree | 0.002965 | 0.000372 |
| HomoSapiens | increasing_node_degree (Only source) | 0.001332 | 0.000105 |
| KGCOVID19 | arbitrary | 0.012551 | 0.001117 |
| KGCOVID19 | arbitrary (Only source) | 0.085708 | 0.156793 |
| KGCOVID19 | decreasing_node_degree | 0.042259 | 0.000408 |
| KGCOVID19 | decreasing_node_degree (Only source) | 0.046375 | 0.002786 |


Let's explore the impact of the considered vertex cover approaches on counting triangles:

In [2]:
from tqdm.auto import tqdm
from time import time
import os
from typing import List
from grape import Graph
import pandas as pd

def experiment(graphs: List[Graph]):
    triangles_times = []

    for (approach, insert_only_source) in tqdm(
        (("arbitrary", False), ("decreasing_node_degree", True), ("increasing_node_degree", True)),
        leave=False,
        desc="Approaches"
    ):
        for graph in tqdm(graphs, leave=False, desc="Graphs"):
            global_start = time()
            number_of_triangles = graph.get_number_of_triangles(
                approach=approach,
                insert_only_source=insert_only_source
            )
            global_delta = time() - global_start
            per_node_start = time()
            _ = graph.get_number_of_triangles_per_node(
                approach=approach,
                insert_only_source=insert_only_source
            )
            per_node_delta = time() - per_node_start
            triangles_times.append({
                "Graph": graph.get_name(),
                "Approach": approach,
                "Global time": global_delta,
                "Triangles": number_of_triangles,
                "Per-node time": per_node_delta,
                "Threads": os.environ["RAYON_NUM_THREADS"]
            })

    triangles_times = pd.DataFrame(triangles_times)
    return triangles_times
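The cells that follow assert a specific value of the `RAYON_NUM_THREADS` environment variable. One way to set it from Python is sketched below, under the assumption that GRAPE's backend picks up the variable when its Rayon thread pool is first initialized, which is Rayon's default behaviour. To be safe, the variable is set before `grape` is imported, in a freshly restarted kernel. The value `"1"` is just the choice for the single-thread run.

```python
import os

# Set the number of threads BEFORE importing grape, so that the
# thread pool is created with the desired size (assumption: the
# variable is read when the pool is first initialized).
os.environ["RAYON_NUM_THREADS"] = "1"

# Only after this point one would run `import grape`
# and load the graphs.
print(os.environ["RAYON_NUM_THREADS"])  # 1
```

Alternatively, the variable can be exported in the shell before launching Jupyter.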

We start by using a single thread.

In [3]:
%%time
assert os.environ["RAYON_NUM_THREADS"] == "1"

triangle_times = experiment(graphs)
triangle_times

CPU times: user 16min 36s, sys: 1.22 s, total: 16min 38s
Wall time: 16min 37s


|   | Graph | Approach | Global time (s) | Triangles | Per-node time (s) | Threads |
|---|---|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | arbitrary | 4.211038 | 48834553 | 3.863275 | 1 |
| 1 | HomoSapiens | arbitrary | 45.747744 | 399408889 | 41.8122 | 1 |
| 2 | MusMusculus | arbitrary | 68.39175 | 713495427 | 62.750611 | 1 |
| 3 | KGCOVID19 | arbitrary | 54.885351 | 402950936 | 51.553095 | 1 |
| 4 | SaccharomycesCerevisiae | decreasing_node_degree | 4.199256 | 48834553 | 3.849533 | 1 |
| 5 | HomoSapiens | decreasing_node_degree | 45.781654 | 399408889 | 41.9282 | 1 |
| 6 | MusMusculus | decreasing_node_degree | 68.105666 | 713495427 | 62.890942 | 1 |
| 7 | KGCOVID19 | decreasing_node_degree | 63.495424 | 402950936 | 54.527481 | 1 |
| 8 | SaccharomycesCerevisiae | increasing_node_degree | 4.202889 | 48834553 | 3.85655 | 1 |
| 9 | HomoSapiens | increasing_node_degree | 45.684477 | 399408889 | 41.777634 | 1 |


We run the same code, but now we employ six threads, half of the available number of cores:

In [3]:
%%time
assert os.environ["RAYON_NUM_THREADS"] == "6"

triangle_times = experiment(graphs)
triangle_times

CPU times: user 16min 34s, sys: 1.38 s, total: 16min 35s
Wall time: 2min 47s


|   | Graph | Approach | Global time (s) | Triangles | Per-node time (s) | Threads |
|---|---|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | arbitrary | 0.7059 | 48834553 | 0.647278 | 6 |
| 1 | HomoSapiens | arbitrary | 7.640224 | 399408889 | 6.995766 | 6 |
| 2 | MusMusculus | arbitrary | 11.36686 | 713495427 | 10.471254 | 6 |
| 3 | KGCOVID19 | arbitrary | 9.392555 | 402950936 | 8.740777 | 6 |
| 4 | SaccharomycesCerevisiae | decreasing_node_degree | 0.704237 | 48834553 | 0.646049 | 6 |
| 5 | HomoSapiens | decreasing_node_degree | 7.635043 | 399408889 | 6.987294 | 6 |
| 6 | MusMusculus | decreasing_node_degree | 11.352611 | 713495427 | 10.45227 | 6 |
| 7 | KGCOVID19 | decreasing_node_degree | 10.154157 | 402950936 | 9.384674 | 6 |
| 8 | SaccharomycesCerevisiae | increasing_node_degree | 0.703822 | 48834553 | 0.647169 | 6 |
| 9 | HomoSapiens | increasing_node_degree | 7.618852 | 399408889 | 7.004814 | 6 |


Now we run the same experiment with all of the available cores on this machine, which are 12.

In [5]:
%%time
assert os.environ["RAYON_NUM_THREADS"] == "12"

triangle_times = experiment(graphs)
triangle_times

CPU times: user 16min 51s, sys: 1.12 s, total: 16min 52s
Wall time: 1min 24s


|   | Graph | Approach | Global time (s) | Triangles | Per-node time (s) | Threads |
|---|---|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | arbitrary | 0.35553 | 48834553 | 0.327627 | 12 |
| 1 | HomoSapiens | arbitrary | 3.863208 | 399408889 | 3.555809 | 12 |
| 2 | MusMusculus | arbitrary | 5.75946 | 713495427 | 5.290155 | 12 |
| 3 | KGCOVID19 | arbitrary | 4.849544 | 402950936 | 4.413471 | 12 |
| 4 | SaccharomycesCerevisiae | decreasing_node_degree | 0.35578 | 48834553 | 0.331338 | 12 |
| 5 | HomoSapiens | decreasing_node_degree | 3.90134 | 399408889 | 3.566692 | 12 |
| 6 | MusMusculus | decreasing_node_degree | 5.741678 | 713495427 | 5.29499 | 12 |
| 7 | KGCOVID19 | decreasing_node_degree | 5.163635 | 402950936 | 4.807692 | 12 |
| 8 | SaccharomycesCerevisiae | increasing_node_degree | 0.353358 | 48834553 | 0.328253 | 12 |
| 9 | HomoSapiens | increasing_node_degree | 3.850049 | 399408889 | 3.56634 | 12 |


Next, we will evaluate how the performance changes when introducing hyper-threading, using two threads per core for a total of 24.

In [3]:
%%time
assert os.environ["RAYON_NUM_THREADS"] == "24"

triangle_times = experiment(graphs)
triangle_times

CPU times: user 21min 20s, sys: 2.33 s, total: 21min 22s
Wall time: 54.7 s


|   | Graph | Approach | Global time (s) | Triangles | Per-node time (s) | Threads |
|---|---|---|---|---|---|---|
| 0 | SaccharomycesCerevisiae | arbitrary | 0.216855 | 48834553 | 0.201116 | 24 |
| 1 | HomoSapiens | arbitrary | 2.388485 | 399408889 | 2.191812 | 24 |
| 2 | MusMusculus | arbitrary | 3.585369 | 713495427 | 3.361303 | 24 |
| 3 | KGCOVID19 | arbitrary | 3.365396 | 402950936 | 3.101996 | 24 |
| 4 | SaccharomycesCerevisiae | decreasing_node_degree | 0.217885 | 48834553 | 0.207023 | 24 |
| 5 | HomoSapiens | decreasing_node_degree | 2.396374 | 399408889 | 2.192528 | 24 |
| 6 | MusMusculus | decreasing_node_degree | 3.567286 | 713495427 | 3.343544 | 24 |
| 7 | KGCOVID19 | decreasing_node_degree | 3.644732 | 402950936 | 3.393732 | 24 |
| 8 | SaccharomycesCerevisiae | increasing_node_degree | 0.221537 | 48834553 | 0.203044 | 24 |
| 9 | HomoSapiens | increasing_node_degree | 2.375963 | 399408889 | 2.202439 | 24 |
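To compare the four runs, the timings can be turned into speedups and parallel efficiencies. The sketch below uses the global-count timings reported in the tables above for HomoSapiens with the `arbitrary` approach. The variable names are made up for this example.

```python
# Global-count timings in seconds for HomoSapiens ("arbitrary" approach),
# copied from the tables above and keyed by number of threads
timings = {1: 45.747744, 6: 7.640224, 12: 3.863208, 24: 2.388485}

baseline = timings[1]

# Speedup: how many times faster than the single-thread run
speedups = {
    threads: baseline / seconds
    for threads, seconds in timings.items()
}

# Efficiency: speedup divided by the number of threads used
efficiencies = {
    threads: speedup / threads
    for threads, speedup in speedups.items()
}

for threads in timings:
    print(
        f"{threads:>2} threads: "
        f"speedup {speedups[threads]:.2f}x, "
        f"efficiency {efficiencies[threads]:.0%}"
    )
```

For this graph the scaling is almost linear up to the 12 physical cores, while the 24-thread run with hyper-threading is still faster but with an efficiency close to 80%.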


## Conclusions

In this tutorial, we learned how to use the [GRAPE](https://github.com/AnacletoLAB/grape) library to compute the exact number of triangles in large graphs. We discussed what a triangle is and why counting triangles can be useful. We also illustrated an algorithm for counting triangles using an approximated vertex cover.

I hope you now have a better grasp on computing triangles and how to use GRAPE to compute them for your projects. Do feel free to reach out with any questions or feedback, as I always look for ways to improve this tutorial.

[And remember to ⭐ GRAPE!](https://github.com/AnacletoLAB/grape)