# Topological Data Analysis with Python and the Gudhi Library 

# Introduction to simplex trees 

**Authors**: F. Chazal and B. Michel

TDA typically aims at extracting topological signatures from a point cloud in $\mathbb R^d$ or in a general metric space. By studying the topology of a point cloud, we actually mean studying the topology of unions of balls centered at its points (offsets). However, non-discrete sets such as offsets, as well as continuous mathematical shapes like curves, surfaces and more generally manifolds, cannot easily be encoded as finite discrete structures. [Simplicial complexes](https://en.wikipedia.org/wiki/Simplicial_complex) are therefore used in computational geometry to approximate such shapes.

A simplicial complex is a set of [simplices](https://en.wikipedia.org/wiki/Simplex). It can be seen as a higher dimensional generalization of a graph. It is a mathematical object that is both topological and combinatorial, which makes it particularly useful for TDA. Here is an example of a simplicial complex:

![title](Images/Pers14.PNG)
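
To make the combinatorial side concrete, here is a minimal sketch in plain Python (it does not use Gudhi). It assumes that a simplex is given as a list of integer vertex labels, and it checks the defining property that a simplicial complex contains every face of each of its simplices. The helper names `faces` and `is_simplicial_complex` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

def faces(simplex):
    """Return all non-empty proper faces of a simplex (an iterable of vertices)."""
    vertices = sorted(simplex)
    return {
        frozenset(face)
        for k in range(1, len(vertices))
        for face in combinations(vertices, k)
    }

def is_simplicial_complex(simplices):
    """Check that every face of every simplex also belongs to the collection."""
    complex_ = {frozenset(s) for s in simplices}
    return all(faces(s) <= complex_ for s in complex_)

# A filled triangle with all its edges and vertices: closed under taking faces.
triangle = [[0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
# The same triangle without the edge [0, 2]: not a simplicial complex.
broken = [[0], [1], [2], [0, 1], [1, 2], [0, 1, 2]]

print(is_simplicial_complex(triangle))  # True
print(is_simplicial_complex(broken))    # False
```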
 
A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal K$. It can be seen as ordering the simplices included in the complex. Indeed, simplicial complexes often come with a specific order, as for [Vietoris-Rips complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [Cech complexes](https://en.wikipedia.org/wiki/%C4%8Cech_complex) and [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex).
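
The idea of a filtration as an increasing sequence of sub-complexes can be sketched in plain Python, without Gudhi. The filtration values below are made up for illustration, and `subcomplex_at` is a hypothetical helper, not a library function. Each simplex is assumed to be stored as a tuple of vertex labels together with the value at which it appears.

```python
def subcomplex_at(filtration, t):
    """Return the simplices whose filtration value is at most t."""
    return {simplex for simplex, value in filtration.items() if value <= t}

# Hypothetical filtered complex: a triangle that is built up step by step.
filtration = {
    (0,): 0.0, (1,): 0.0, (2,): 0.0,
    (0, 1): 0.5, (1, 2): 0.5,
    (0, 2): 1.0,
    (0, 1, 2): 2.0,
}

thresholds = [0.0, 0.5, 1.0, 2.0]
steps = [subcomplex_at(filtration, t) for t in thresholds]

for t, step in zip(thresholds, steps):
    print(t, len(step))  # number of simplices present at threshold t

# Each sub-complex is contained in the next one.
print(all(a <= b for a, b in zip(steps, steps[1:])))  # True
```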

In [1]:
from IPython.display import Image
from os import chdir
import numpy as np
import gudhi as gd
import matplotlib.pyplot as plt

In Gudhi, filtered simplicial complexes are encoded through a data structure called a simplex tree.
![CSexemple](http://gudhi.gforge.inria.fr/python/latest/_images/Simplex_tree_representation.png)

This notebook illustrates the use of simplex trees to represent simplicial complexes built from data points.

See the [Python Gudhi documentation](http://gudhi.gforge.inria.fr/python/latest/simplex_tree_ref.html#) for more details on simplex trees.

### My first simplex tree

Let's create our first simplicial complex, represented by a simplex tree:

In [2]:
st = gd.SimplexTree()

The `st` object has class `SimplexTree`. For now, `st` is an empty simplex tree.

The `SimplexTree` class has several useful methods for the practice of TDA. For instance, there are methods to define new types of simplicial complexes from existing ones.

The `insert()` method can be used to insert simplices in the simplex tree. In the simplex tree:

- vertices (0-dimensional simplices) are represented with integers, 
- edges (1-dimensional simplices) are represented with a length-2 list of integers (corresponding to the two vertices involved in the edge),
- triangles (2-dimensional simplices) are represented with a length-3 list of integers (corresponding to the three vertices involved in the triangle),
- etc.

For example, the following piece of code inserts three edges into the simplex tree:

In [22]:
st.insert([0, 1])
st.insert([1, 2])
st.insert([3, 1])

True

When the simplex is successfully inserted into the simplex tree, the `insert()` method returns `True`, as you can see from the execution of the above code. Conversely, if the simplex is already in the simplex tree, the `insert()` method returns `False`:

In [23]:
st.insert([3, 1])

False

We obtain the list of all the simplices in the simplex tree with the `get_filtration()` method:

In [5]:
st_gen = st.get_filtration() 

The output `st_gen` is a generator, so we can iterate over its elements. Each element is a tuple that contains a simplex and its **filtration value**.

In [6]:
for splx in st_gen:
    print(splx)

([0], 0.0)
([1], 0.0)
([0, 1], 0.0)
([2], 0.0)
([1, 2], 0.0)
([3], 0.0)
([1, 3], 0.0)


Intuitively, the filtration value of a simplex in a filtered complex acts as a *time stamp* corresponding to "when" the simplex appears in the filtration. By default, the `insert()` method assigns a filtration value equal to 0.

Notice that inserting an edge automatically inserts its vertices (if they were not already in the complex) in order to satisfy the **inclusion property** of a filtered complex: any simplex with filtration value $t$ must have all its faces in the filtered complex, with filtration values smaller than or equal to $t$.
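
The behaviour described above (missing faces are added automatically, and the return value tells whether the simplex was new) can be mimicked with a toy model in plain Python. This is only a sketch under simplifying assumptions, not Gudhi's implementation: the complex is stored in an ordinary dictionary, the function `insert` below is a hypothetical stand-in, and faces that are already present simply keep their current value.

```python
from itertools import combinations

def insert(filtration, simplex, value=0.0):
    """Toy insertion: add `simplex` and any missing faces with the same value.

    Returns True if the simplex was not yet present, False otherwise.
    Faces that are already present keep their current value.
    """
    simplex = tuple(sorted(simplex))
    if simplex in filtration:
        return False
    for k in range(1, len(simplex) + 1):
        for face in combinations(simplex, k):
            filtration.setdefault(face, value)
    return True

toy_complex = {}
print(insert(toy_complex, [0, 1]))  # True
print(insert(toy_complex, [3, 1]))  # True
print(insert(toy_complex, [3, 1]))  # False: already present

# The vertices were added along with the edges.
print(sorted(toy_complex, key=lambda s: (len(s), s)))
# [(0,), (1,), (3,), (0, 1), (1, 3)]
```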

### Simplex tree description

The dimension of a simplicial complex is the largest dimension of the simplices in it. It can be retrieved with the simplex tree `dimension()` method:

In [7]:
st.dimension()

1

It is possible to compute the number of vertices in a simplex tree via the `num_vertices()` method:

In [10]:
st.num_vertices()

4

The number of simplices in the simplex tree is given by the `num_simplices()` method:

In [9]:
st.num_simplices()

7

The [$d$-skeleton](https://en.wikipedia.org/wiki/N-skeleton) -- which is the union of all simplices of dimensions smaller than or equal to $d$ -- can also be computed with the `get_skeleton()` method. This method takes as argument the dimension of the desired skeleton. To retrieve the underlying graph (the 1-skeleton) of a simplex tree, we can therefore call:

In [18]:
print(st.get_skeleton(1))

[([0, 1], 0.0), ([0], 0.0), ([1, 2], 0.0), ([1, 3], 0.0), ([1], 0.0), ([2], 0.0), ([3], 0.0)]
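
The notions of dimension and $d$-skeleton only depend on the number of vertices of each simplex, which the following plain-Python sketch makes explicit. It does not use Gudhi; the functions `dimension` and `skeleton` are hypothetical helpers written for this illustration, and simplices are assumed to be tuples of vertex labels.

```python
def dimension(simplices):
    """Largest simplex dimension (a k-simplex has k + 1 vertices)."""
    return max(len(s) - 1 for s in simplices)

def skeleton(simplices, d):
    """Keep only the simplices of dimension at most d."""
    return [s for s in simplices if len(s) - 1 <= d]

# A filled triangle with all its faces.
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

print(dimension(simplices))    # 2
print(skeleton(simplices, 1))  # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(skeleton(simplices, 0))  # [(0,), (1,), (2,)]
```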


One can also check whether a simplex is already in the filtration. This is achieved with the `find()` method:

In [19]:
st.find([2, 4])

False

### Filtration values

We can insert simplices at a given filtration value. For example, the following piece of code will insert three triangles in the simplex tree at three different filtration values:

In [24]:
st.insert([0, 1, 2], filtration=0.1)
st.insert([1, 2, 3], filtration=0.2)
st.insert([0, 1, 3], filtration=0.4)
st_gen = st.get_filtration()

for splx in st_gen:
    print(splx)

([0], 0.0)
([1], 0.0)
([0, 1], 0.0)
([2], 0.0)
([1, 2], 0.0)
([3], 0.0)
([1, 3], 0.0)
([0, 2], 0.1)
([0, 1, 2], 0.1)
([2, 3], 0.2)
([1, 2, 3], 0.2)
([0, 3], 0.4)
([0, 1, 3], 0.4)


As you can see, when we add a new simplex with a given filtration value, all its faces that were not already in the complex are added with the same filtration value: here the edge `[0, 3]` was not part of the tree before including the triangle `[0, 1, 3]` and is thus inserted with the filtration value of the inserted triangle. On the other hand, the faces that were already part of the tree keep their filtration values here, since those values were already smaller than that of the inserted simplex. One can modify the filtration value of any simplex included in the tree with the `assign_filtration()` method:

In [26]:
st.assign_filtration([3], filtration=0.8)
st_gen = st.get_filtration()
for splx in st_gen:
    print(splx)   

([0], 0.0)
([1], 0.0)
([0, 1], 0.0)
([2], 0.0)
([1, 2], 0.0)
([1, 3], 0.0)
([0, 2], 0.1)
([0, 1, 2], 0.1)
([2, 3], 0.2)
([1, 2, 3], 0.2)
([0, 3], 0.4)
([0, 1, 3], 0.4)
([3], 0.8)


Notice that the vertex `[3]` has been moved to the end of the filtration because it now has the highest filtration value. However, this simplex tree is no longer a valid filtered simplicial complex, because the filtration value of the vertex `[3]` is higher than the filtration value of the edge `[2, 3]`. We can use the `make_filtration_non_decreasing()` method to solve the problem:

In [27]:
st.make_filtration_non_decreasing()
st_gen = st.get_filtration()
for splx in st_gen:
    print(splx)  

([0], 0.0)
([1], 0.0)
([0, 1], 0.0)
([2], 0.0)
([1, 2], 0.0)
([0, 2], 0.1)
([0, 1, 2], 0.1)
([3], 0.8)
([0, 3], 0.8)
([1, 3], 0.8)
([0, 1, 3], 0.8)
([2, 3], 0.8)
([1, 2, 3], 0.8)
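
The effect of this repair can be reproduced with a short plain-Python sketch: each simplex has its value raised to at least the largest value among its faces. This is an illustration under stated assumptions, not Gudhi's implementation; `facets`, `is_non_decreasing` and `make_non_decreasing` are hypothetical helpers, and the input dictionary copies the filtration values printed after the `assign_filtration()` call above.

```python
from itertools import combinations

def facets(simplex):
    """Return the codimension-1 faces of a simplex given as a sorted tuple."""
    if len(simplex) == 1:
        return []  # a vertex has no non-empty proper face
    return list(combinations(simplex, len(simplex) - 1))

def is_non_decreasing(filtration):
    """Check that no simplex has a smaller value than one of its facets."""
    return all(filtration[f] <= value
               for simplex, value in filtration.items()
               for f in facets(simplex))

def make_non_decreasing(filtration):
    """Raise each value to at least the largest value among the simplex's facets."""
    fixed = dict(filtration)
    # Visit simplices by increasing dimension so that facets are final first.
    for simplex in sorted(fixed, key=len):
        for f in facets(simplex):
            fixed[simplex] = max(fixed[simplex], fixed[f])
    return fixed

# Values after the vertex [3] was assigned the filtration value 0.8.
before = {
    (0,): 0.0, (1,): 0.0, (2,): 0.0, (3,): 0.8,
    (0, 1): 0.0, (1, 2): 0.0, (1, 3): 0.0,
    (0, 2): 0.1, (2, 3): 0.2, (0, 3): 0.4,
    (0, 1, 2): 0.1, (1, 2, 3): 0.2, (0, 1, 3): 0.4,
}

print(is_non_decreasing(before))  # False
after = make_non_decreasing(before)
print(is_non_decreasing(after))   # True
print(after[(1, 3)], after[(2, 3)], after[(0, 1, 3)])  # 0.8 0.8 0.8
print(after[(0, 1, 2)])  # 0.1
```

Checking only the facets is enough here, because the condition then propagates to lower-dimensional faces by transitivity.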


Finally, it is worth mentioning the `filtration()` method, which returns the filtration value of a given simplex in the filtration:

In [29]:
st.filtration([2, 3])

0.8