Enhanced Matching Pursuit Implementation (empi)
Author: Piotr Różański email@example.com © 2015–2022
IMPORTANT: The official stable release 1.0 is not yet available, and the current version of the master codebase should not be considered stable. Coming soon!
What is empi?
empi is an implementation of the Matching Pursuit algorithm (Mallat, Zhang 1993) with optimal dictionaries (Kuś, Różański, Durka 2013), supporting both Gabor atoms and atoms with non-Gaussian envelopes (Różański 2020). It is a highly optimized multi-threaded implementation written in C++, with GPU support, designed as a faster replacement for MP5. The previous version shared most of its input/output specification with MP5 and could be used as the MP decomposition tool in SVAROG; support for the current version is underway.
The goal is to provide an optimal decomposition of the input signal as a linear combination of functions from a predefined set (dictionary) consisting mainly of oscillating atoms. By combining the optimal dictionary construction with a detailed analysis of the maximum error within a single iteration, empi can simulate a continuous dictionary, relieving the user from having to define a particular dictionary structure. More importantly, this completely eliminates the statistical bias caused by the dictionary structure, an important effect that was previously addressed by, e.g., the introduction of stochastic dictionaries.
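For readers new to Matching Pursuit, the greedy loop at its core can be sketched as below. This is a minimal toy version of plain MP (Mallat, Zhang 1993) over a small finite dictionary of unit-norm vectors; it does not reproduce empi's optimal dictionaries, parameter optimization or continuous-dictionary simulation, and the function names are hypothetical. The stopping rule mirrors the -i / -r semantics described further below.

```python
from typing import List, Sequence, Tuple


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(
    signal: Sequence[float],
    atoms: Sequence[Sequence[float]],
    max_iter: int,
    residual_fraction: float,
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Toy greedy MP over a finite dictionary of unit-norm atoms (hypothetical helper).

    Stops after max_iter iterations or once the residual energy drops to
    residual_fraction of the initial signal energy, whichever comes first.
    Returns the chosen (atom index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    initial_energy = inner(signal, signal)
    chosen: List[Tuple[int, float]] = []
    for _ in range(max_iter):
        if inner(residual, residual) <= residual_fraction * initial_energy:
            break
        # Select the atom with the largest |<residual, atom>|.
        index, coef = max(
            ((i, inner(residual, atom)) for i, atom in enumerate(atoms)),
            key=lambda pair: abs(pair[1]),
        )
        chosen.append((index, coef))
        # Subtract the projection; for a unit-norm atom the residual energy drops by coef**2.
        residual = [r - coef * a for r, a in zip(residual, atoms[index])]
    return chosen, residual
```

With the three standard basis vectors as a dictionary, the signal [3, 0, 4] is explained by the atom with coefficient 4 first and the atom with coefficient 3 second, after which the residual is zero.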
There are two modes of CPU parallelization, which can also be combined. First, a number of independent workers can be started, each processing a separate subset of segments and/or channels. Second, each worker can be started with a number of concurrent CPU threads. In either case, one additional thread, separate from all worker threads, writes the decomposition results from all workers to the output file in the proper order.
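The worker/writer arrangement described above can be illustrated with the following sketch. It is not empi's C++ code: it is a minimal Python model, assuming a hypothetical round-robin split of segments among workers, that shows how a single writer thread can emit results in segment order even though workers finish out of order. Python threads are used only to show the ordering logic, not to suggest comparable performance.

```python
import queue
import threading
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_pipeline(segments: Sequence[T], n_workers: int, process: Callable[[T], R]) -> List[R]:
    """Process segments with independent workers; one writer collects results in order."""
    results: queue.Queue = queue.Queue()
    output: List[R] = []

    def worker(worker_id: int) -> None:
        # Hypothetical split: worker k takes segments k, k + n_workers, ...
        for idx in range(worker_id, len(segments), n_workers):
            results.put((idx, process(segments[idx])))

    def writer() -> None:
        pending: Dict[int, R] = {}
        next_idx = 0
        while next_idx < len(segments):
            idx, result = results.get()
            pending[idx] = result
            # Flush every result whose predecessors have all been written.
            while next_idx in pending:
                output.append(pending.pop(next_idx))
                next_idx += 1

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    workers = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    writer_thread.join()
    return output
```

Buffering out-of-order results in a dictionary keeps the writer simple: it only ever appends the next expected segment, so the output order does not depend on which worker finishes first.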
If GPU devices are used in addition to the CPU, each worker is started with one additional thread (corresponding to a separate CUDA stream) for each GPU device. The performance gain from enabling GPU devices, especially those with good double-precision floating-point capabilities, is significant.
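Following the two paragraphs above literally, the number of decomposition-related threads can be estimated from the --cpu-workers, --cpu-threads and --gpu-id settings. The helper below is a hypothetical back-of-the-envelope sketch based only on that description; it ignores the main thread and any threads created internally by libraries.

```python
def thread_count(cpu_workers: int, cpu_threads: int, gpu_devices: int) -> int:
    """Estimate threads implied by the description (hypothetical helper)."""
    # Each worker runs its CPU threads plus one extra thread (CUDA stream) per GPU.
    per_worker = cpu_threads + gpu_devices
    # A single additional thread writes results from all workers to the output file.
    return cpu_workers * per_worker + 1
```

For example, --cpu-workers 2 --cpu-threads 6 with one GPU in --gpu-id gives 2 × (6 + 1) + 1 = 15 threads.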
How to get empi?
You can compile empi from source or download precompiled binaries from the “Releases” tab; both are available on the project's GitHub page. If you decide to use the precompiled binaries, you can skip the “Compilation” section altogether. However, since the provided binaries are built to be as compatible as possible, they may not take full advantage of your specific architecture. To achieve maximum performance and/or use the GPU in calculations, compiling empi from source is recommended.
To compile empi, the CMake build system is required. The only external library requirement is version 3 of the FFTW library; both the library and its development headers must be installed. Under Ubuntu, the package “libfftw3-dev” does the trick. macOS and other Linux distributions may have packages with slightly different names. Under Windows, follow the FFTW installation instructions.
You will also need a modern C++ compiler with support for the C++17 standard, as well as CMake version 3.12 or later.
Since the project uses CMake, the proper way of compiling it depends on your environment. Generally speaking, you can use cmake-gui to generate build files for your specific configuration.
The easiest way is to run
cmake .
or, if you need to build standalone binaries (note that this disables some platform-specific optimizations),
cmake -DSTANDALONE=1 .
in the directory where you cloned the repository (an out-of-source build is also possible, if you prefer), and then build with
make
If successful, the binary file “empi” will appear. It can be installed to a system directory (e.g. /usr/local/bin) by calling
sudo make install
How to use empi?
A single invocation of empi will
- read a single binary signal file (or its part),
- decompose it as a linear combination of well-defined structures, and
- save the results as either an SQLite database (default) or a JSON file.
The demo directory includes Python and Matlab/Octave scripts demonstrating how to access data in the resulting SQLite decomposition file.
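If you want to explore an output file before turning to the demo scripts, Python's standard sqlite3 module is enough to list what it contains. The sketch below deliberately assumes nothing about empi's schema: it only enumerates tables and their column names through sqlite_master and PRAGMA table_info. The helper names are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Dict, List


def describe_tables(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each table name to its column names, without assuming any schema."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns: Dict[str, List[str]] = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns[table] = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
    return columns


def describe_file(path: str) -> Dict[str, List[str]]:
    """Open an existing SQLite file, describe its tables, then close the connection."""
    with closing(sqlite3.connect(path)) as conn:
        return describe_tables(conn)
```

Calling describe_file on an existing decomposition file prints the table layout you can then query directly; note that sqlite3.connect creates an empty database if the path does not exist.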
empi needs to be run with at least two command-line arguments: a path to the input file and a path to the output file. Running it with the --help flag lists all possible flags and arguments:
Enhanced Matching Pursuit Implementation (empi)
Usage: empi [OPTIONS] input_file output_file

Positionals:
  input_file TEXT REQUIRED    Path to the input signal file or input configuration file
  output_file TEXT REQUIRED   Path for the output file unless configuration file is used

Options:
  -h,--help                   Print this help message and exit
  -c INT=1                    Number of channels in the input signal
  -f FLOAT                    Sampling frequency of the input signal in hertz (default: 1 Hz)
  -i INT                      Maximum number of iterations (default: no limit)
  -o TEXT                     Parameter optimization mode: none|local|global (default: global)
  -r FLOAT=0.01               Energy of the residual as a fraction of the total signal energy
  --channels TEXT             Range of channels to process, e.g. 1-3,5,8-9 (default: all)
  --cpu-threads UINT=6        Number of CPU threads for each worker
  --cpu-workers UINT=1        Number of independent CPU workers to run
  --delta                     Include delta-type atoms
  --energy-error FLOAT=0.05   Epsilon-squared parameter corresponding to the dictionary size
  --gpu-id TEXT               Comma-separated ID list of GPU device(s) to use (default: none)
  --input64                   Read input data as double-precision (64-bit) floating point values (default: read as 32-bit values)
  --mmp1 Excludes: --mmp3     Use multi-variate decomposition with constant phase across channels
  --mmp3 Excludes: --mmp1     Use multi-variate decomposition with variable phase across channels
  --opt-max-iter INT=10000    Maximum number of iterations for local parameter optimization
  --opt-target FLOAT=1e-05    Target accuracy (relative to the initial dictionary size) for local parameter optimization
  --residual-log-dir TEXT     Directory in which residual energy log files should be created (default: none)
  --segment-size INT          Number of samples in each segment (default: all samples)
  --segments TEXT Needs: --segment-size
                              Range of signal segments, e.g. 1-100,201-300 (default: all)
  --gabor                     Include atoms with Gaussian envelope (not needed if any other --gabor-* option is given)
  --gabor-freq-max FLOAT      Maximum frequency (in hertz) for Gaussian envelope (default: auto)
  --gabor-scale-min FLOAT     Minimum scale (in seconds) for Gaussian envelope (default: auto)
  --gabor-scale-max FLOAT     Maximum scale (in seconds) for Gaussian envelope (default: auto)
  --gabor-half-width FLOAT=3  Half-width of the Gaussian envelope function
There are two required positional arguments:
- input_file is the absolute path (or the path relative to the current directory) of the input file.
The input file should consist of 32-bit floating-point values (or 64-bit values if the --input64 flag is given) in the byte order of the current machine (no byte-order conversion is performed). For multichannel signals, the samples for all channels at t=0 come first, then the samples for all channels at t=Δt, and so forth. In other words, the signal should be written in column-major order (rows = channels, columns = samples).
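The layout described above can be produced with Python's standard array module, which stores values in the machine's native byte order. The sketch below is a minimal, hypothetical helper (encode_samples and write_input_file are not part of empi); it assumes the usual platform sizes of 4 bytes for the "f" typecode and 8 bytes for "d".

```python
from array import array
from typing import Sequence


def encode_samples(channels: Sequence[Sequence[float]], use_float64: bool = False) -> bytes:
    """Interleave channels (all channels at t=0, then at t=Δt, ...) in native byte order."""
    n_samples = len(channels[0]) if channels else 0
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    # 'f' is a C float (32-bit on common platforms), 'd' a C double (64-bit).
    typecode = "d" if use_float64 else "f"
    data = array(typecode, (ch[t] for t in range(n_samples) for ch in channels))
    return data.tobytes()  # machine byte order, no conversion


def write_input_file(path: str, channels: Sequence[Sequence[float]], use_float64: bool = False) -> None:
    """Write the interleaved samples to a binary file suitable as empi input."""
    with open(path, "wb") as f:
        f.write(encode_samples(channels, use_float64))
```

A two-channel file written this way and sampled at, say, 256 Hz could then be decomposed with a command such as empi -c 2 -f 256 signal.bin decomposition.db (the file names are placeholders); add --input64 if use_float64=True was used.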
- output_file is the absolute path (or the path relative to the current directory) for the output file.
If the path ends in .json, a JSON-formatted text file will be created. Otherwise, an SQLite database file will be created.
The optional parameters are described below:
Properties of the input file
-c represents the total number of channels in the input file, the default corresponding to a single-channel signal (as with -c 1).
-f represents the signal's sampling frequency in hertz, the default being 1 Hz.
--channels specifies the subset of channels (between 1 and the value of -c) that should be read from the signal and decomposed. These can be specified as a single channel 1, as an interval 1-5, or as a list 1-3,5,8-9. If not given, all channels will be processed.
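Both --channels and --segments accept the same range syntax. As an illustration of how such a specification expands, here is a minimal parser sketch; parse_ranges is a hypothetical helper, and empi's own parser may differ in details such as whitespace handling or validation of reversed intervals.

```python
from typing import List


def parse_ranges(spec: str) -> List[int]:
    """Expand a range specification such as "1-3,5,8-9" into [1, 2, 3, 5, 8, 9]."""
    result: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid interval: {part}")
            result.extend(range(lo, hi + 1))  # intervals are inclusive on both ends
        else:
            result.append(int(part))
    return result
```

Intervals are treated as inclusive, so 1-100,201-300,400 selects 201 items.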
--input64 specifies that the input signal file consists of 64-bit floating-point values instead of the default 32-bit values.
--segment-size specifies the size of each signal segment (in samples); segments will be processed independently, and their decompositions will be written to the same output file. If this parameter is absent, the entire signal will be processed as a single segment.
--segments is only valid with --segment-size, and it specifies a list of segment (epoch) numbers, starting from 1, to be processed. These can be passed as a single segment 1, as an interval 1-100, or as a list 1-100,201-300,400. If not given, all segments (the entire signal) will be processed.
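Combining the file layout with -c, --input64 and --segment-size, one can work out where a given segment lies in the input file. The sketch below is a hypothetical helper that assumes segments are consecutive and non-overlapping; how empi treats a trailing partial segment is not covered here.

```python
from typing import Tuple


def segment_byte_range(
    segment: int, segment_size: int, n_channels: int, use_float64: bool = False
) -> Tuple[int, int]:
    """Byte offsets [start, end) of a 1-based segment in an interleaved input file.

    Assumes consecutive, non-overlapping segments of segment_size samples each.
    """
    if segment < 1 or segment_size < 1 or n_channels < 1:
        raise ValueError("segment, segment_size and n_channels must be positive")
    bytes_per_value = 8 if use_float64 else 4
    bytes_per_time_point = n_channels * bytes_per_value  # all channels at one t
    start = (segment - 1) * segment_size * bytes_per_time_point
    return start, start + segment_size * bytes_per_time_point
```

For example, with -c 2 and --segment-size 1000 (32-bit input), segment 3 occupies bytes 16000 to 24000, because each time point takes 2 × 4 = 8 bytes.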
The decomposition will iterate until the iteration limit (-i) or the residual energy threshold (-r) is reached, whichever comes first.
-i specifies an upper limit for the number of iterations, and therefore the maximum number of atoms in the resulting decomposition.
-r is the fraction of the initial signal energy that can be left unexplained by the decomposition. For example, specifying -r 0.01 corresponds to performing the decomposition until the energy of the residual falls below 1% of the initial energy of the signal.
--mmp1 specifies a constant-phase multi-variate decomposition, while --mmp3 specifies a variable-phase variant. If neither is given, each channel is processed separately. A detailed description of multi-variate decomposition modes can be found in the literature, e.g. Kuś, Różański, Durka 2013.
Structure of the dictionary
--energy-error specifies the ε² parameter in optimal dictionary construction. Typical values are close to 0. A smaller value allows a more precise decomposition, but also requires more time and RAM. When simulating a continuous dictionary (the -o option), this parameter does not affect the results, only the time and memory consumption. The default value of 0.05 is approximately optimal for simulating continuous dictionaries.
--gabor-scale-min specifies the minimum scale of Gabor atoms in the dictionary. If not specified, the minimum scale is taken as the shortest scale permitted by the dictionary construction for the given value of ε².
--gabor-scale-max specifies the maximum scale of Gabor atoms in the dictionary. If not specified, it defaults to the length of the signal segment.
--gabor-freq-max specifies the maximum frequency (in hertz) of Gabor atoms in the dictionary. If not specified, it defaults to the Nyquist frequency.
Setting --gabor-freq-max is not mandatory, but it can be used to further reduce the computational time.
These three options have their counterparts for other envelope functions as well.
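The two defaults that follow directly from the signal parameters can be computed as below. This hypothetical sketch assumes that "the length of the signal segment" means segment_size / sampling_frequency seconds; the automatic minimum scale depends on ε² and is not reproduced here.

```python
from typing import Tuple


def default_gabor_bounds(segment_size: int, sampling_frequency: float) -> Tuple[float, float]:
    """Return (scale_max in seconds, freq_max in hertz) per the defaults described above."""
    if segment_size < 1 or sampling_frequency <= 0:
        raise ValueError("segment_size and sampling_frequency must be positive")
    scale_max = segment_size / sampling_frequency  # segment length in seconds (assumed N / fs)
    freq_max = sampling_frequency / 2.0  # Nyquist frequency
    return scale_max, freq_max
```

For instance, a 2048-sample segment recorded at 256 Hz would give a maximum scale of 8 s and a maximum frequency of 128 Hz.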
--cpu-workers defines the number of independent workers, each of which analyses a different subset of signal segments.
--cpu-threads defines the number of CPU computation threads per worker.
--gpu-id (only if compiled with GPU support) defines the comma-separated list of GPU devices that should assist in the decomposition.
empi is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
empi is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with empi (file “LICENCE”); if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.