Copyright 2014 IST Austria
Jan Reininghaus (Email: jan.reininghaus@gmail.com)
Ulrich Bauer, Michael Kerber
This C++ software package computes persistent homology following the algorithm proposed in [A]. Besides supporting parallel execution on a single machine, DIPHA may also be run on a cluster of several machines using MPI. For an introduction to persistent homology, see the textbook [B].
There are three types of input that are currently supported by DIPHA:
-
d-dimensional gray-scale image data. The data is internally interpreted as a weighted cubical cell complex and its lower-star filtration is used for the subsequent persistence computation. This approach is described in detail in [C].
-
distance matrix data. The data is internally interpreted as a Vietoris–Rips complex of as many points as there are columns in the given matrix. The distance between any two points is then defined by the input data. To allow for large data analysis, DIPHA supports sparse distance matrices which only encode the finite distances to save space.
-
weighted regular cell complexes in (co-)boundary matrix form. This (fallback) input type allows the computation of persistent homology of e.g. Alpha complexes, Vietoris–Rips complex approximations, or Witness complexes.
The output produced by DIPHA consists of the persistence diagram. Each point in the diagram is defined by the dimension of the homological feature that it represents and the corresponding birth- and death value.
Input and output is realized using binary files whose format is specified below. DIPHA includes MATLAB functions to create the input files and visualize the output.
To achieve good performance DIPHA supports dualized computation as described in [D], makes use of the optimization introduced in [E] and employs an efficient data structure developed in the PHAT project as described in [F].
Prerequisites:
-
A modern C++ compiler like GCC 4.7, Clang 3.3, or Visual Studio 2015. DIPHA uses C++11, so older C++ compilers are not supported. In particular, Visual Studio 2013 is not supported.
-
An
MPI
implementation like Open MPI, MPICH2, or MS-MPI. Other MPI implementations may also work, but are untested. -
the cross-platform, open-source build system CMake, version 2.8 or later
To compile DIPHA:
-
use CMake to create a build environment.
-
compile DIPHA using your favorite C++ compiler.
To run DIPHA using a single process:
dipha [options] input_filename output_filename
where the available options
are:
-
--benchmark
: prints some profiling information. -
--dual
: runs the dualized version of the algorithm, which dramatically improves the running time in some cases. -
--upper_dim D
: restricts the computation to the firstD
dimensions of the input complex. This option is mandatory when dealing with distance matrix data.
To run DIPHA with N processes using MPI
:
mpiexec -n N dipha [options] input_filename output_filename
All file formats in DIPHA are in little-endian binary format. Integral values are stored using the binary representation of 64bit signed integers, while floating point values are stored in IEEE 754 double-precision binary floating-point format.
The first symbol in every DIPHA file is the magic number 8067171840. The second symbol is the integer encoding the actual file type. See dipha/file_types.h
for a list of file types and their associated integral identifier. The rest of the symbols in the file depend on the file type:
-
d-dimensional gray-scale image data (
IMAGE_DATA
):- number of data values
n
- dimension
d
- lattice resolution:
g(1) ... g(d)
- floating point data values in x-fastest order:
v(1) ... v(n)
Example: a gray-scale FullHD frame would have
n = 1920 * 1080
,d = 2
,g(1) = 1920
,g(2) = 1080
, andv(1) ... v(n)
, would be the gray-scale values in scan-line order. A simple MATLAB file writer is included in DIPHA (matlab/save_image_data.m
). - number of data values
-
weighted regular cell complexes in (co-)boundary matrix form (
WEIGHTED_BOUNDARY_MATRIX
):- 0 if the file contains the boundaries of the cells, 1 if the file contains the coboundaries
- total number of cells
n
- dimension of the cell complex
d
- dimensions of the cells:
dim(1) ... dim(n)
- floating point values of the cells:
value(1) ... value(n)
- offsets of (co-)boundary entries of the cells:
offset(1) ... offset(n)
. Each offset represents the first index of the (co-)boundary in the flattened (co-)boundary matrix. - total number of non-zero entries
m
of the (co-)boundary matrix - entries of the flattened (co-)boundary matrix:
entry(1) ... entry(m)
To convert a file from boundary format (required for primal computation) to coboundary format (required for dual computation), you may use the included dualize
utility.
-
distance matrix data (
DISTANCE_MATRIX
)- number of input points
n
- floating point values for the distances in row order:
d(1,1) ... d(1,n) d(2,1) ... d(n,n)
A simple MATLAB file writer is included in DIPHA (
matlab/save_distance_matrix.m
). - number of input points
-
sparse distance matrix data (
SPARSE_DISTANCE_MATRIX
)- number of input points
n
- number of connections of each point:
c(1) ... c(n)
- indices of connected vertices:
i(1,1) ... i(1,c(1)) i(2,1) ... i(n,c(n))
- distance values of connected vertices
d(1,1) ... d(1,c(1) d(2,1) ... d(n,c(n))
- number of input points
To create a sparse distance matrix DIPHA provides a full_to_sparse_distance_matrix
utility.
-
persistence diagram (
PERSISTENCE_DIAGRAM
):- number of points in the diagram
p
- dimension, birth, death triples:
dim(1) birth(1) death(1) ... dim(p) birth(p) death(p)
. Dimensions are stored as integers, while birth and death values are stored as floating point numbers.
If
dim(k)
is negative, then the corresponding triple represents an essential class of dimension-dim(k) - 1
. A simple MATLAB function to plot persistence diagrams is included in DIPHA (matlab/plot_persistence_diagram.m
). - number of points in the diagram
To save space, DIPHA does not come with any example data files. However, there are several MATLAB functions (matlab/create_*.m
) that may be used to create synthetic examples for benchmark purposes. For example, to create a 3-dimensional image data set with 32^3 voxels whose values are given by random numbers you can execute the command
matlab -nojvm -nodisplay -nosplash -r "addpath('../matlab'); create_noise_image_data(3, 32); exit"
in the dipha/examples
folder.
To visualize persistence diagrams with a large number of points it is recommended to use the provided MATLAB function that draws a density estimate of the diagram (matlab/plot_persistence_diagram_density.m
).
A) U.Bauer, M.Kerber, J.Reininghaus: Distributed computation of persistent homology. Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), 2014.
B) H.Edelsbrunner, J.Harer: Computational Topology, An Introduction. American Mathematical Society, 2010, ISBN 0-8218-4925-5.
C) H.Wagner, C.Chen, E.Vucini: Efficient Computation of Persistent Homology for Cubical Data. Topological Methods in Data Analysis and Visualization II, 2012.
D) V.de Silva, D.Morozov, M.Vejdemo-Johansson: Dualities in persistent (co)homology. Inverse Problems 27, 2011.
E) C.Chen, M.Kerber: Persistent Homology Computation With a Twist. 27th European Workshop on Computational Geometry, 2011.
F) U.Bauer, M.Kerber, J.Reininghaus, H.Wagner: PHAT – Persistent Homology Algorithms Toolbox. Mathematical Software – ICMS 2014, Lecture Notes in Computer Science Volume 8592, 2014, pp 137-143