Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


Enhanced Matching Pursuit Implementation (empi)
Author: Piotr Różański ⓒ 2015–2022

IMPORTANT: Neither the official stable release of 1.0 is yet available, nor the current version of the master codebase should be considered stable. Coming soon!

What is empi?

empi is an implementation of Matching Pursuit algorithm (Mallat, Zhang 1993) with optimal dictionaries (Kuś, Różański, Durka 2013) supporting both Gabor atoms as well as atoms related to non-Gaussian envelopes (Różański 2020). It is a highly-optimized multi-threaded version written in C++, with GPU support, designed as a faster replacement for MP5. Prior version shared most of the input/output specification with MP5, and could be used as MP decomposition tool in SVAROG. Support for the current version is underway.

The goal is to provide an optimal decomposition of the input signal as a linear combination of functions from predefined set (dictionary) consisting mainly of oscillating atoms. By combining the optimal dictionary construction with a detailed analysis of the maximum error within a single iteration, it can be used to simulate a continuous dictionary, therefore relieving the user from the necessity of defining the particular structure for the dictionary. What is even more important, it therefore completely eliminates the statistical bias caused by the dictionary structure—an important effect which has been earlier dealt with by e.g. introduction of stochastic dictionaries.

There are two modes of CPU parallelization, which can also be used together. First, a number of independent workers can be started, and each worker will process a separate subset of segments and/or channels. Second, each worker can be started with a number of concurrent CPU threads. Either way, in addition to all workers' threads, an additional single thread will be active and responsible for writing the decomposition results from all workers, in proper order, to the output file.

If the GPU devices are used in addition to CPU, each worker is started with one additional thread (corresponding to a separate CUDA stream) for each GPU device. The performance gain from enabling GPU devices, especially those with good double-precision floating point capabilities, is significant.

How to get empi?

You can compile empi from source, or download the precompiled versions from the “Releases” tab. Both are available on project's GitHub. If you decide to use the precompiled binaries, you can skip the “Compilation” section altogether. However, since the purpose of the provided binaries is to be as compatible as possible, they may not take full advantage of your specific architecture. To achieve maximal performance and/or use GPU in calculations, compiling empi from source is recommended.



To compile empi, CMake build system is required. The only external library requirement is the FFTW library in version 3. Both library and the development headers must be installed to compile empi. Under Ubuntu, package “libfftw3-dev” does the trick. MacOS and other Linux distributions may have packages of slightly different names. Under Windows, follow the FFTW installation instructions.

Also, you will need a modern C++ compiler with support for C++17 standard, as well as CMake version 3.12 or later.


This project uses CMake, so the proper way of compilation depends on your environment. Generally speaking, you can use CMake-gui to generate build files for your specific configuration.

Unix-family OS

The easiest way is to run

cmake .

or, if you need to build standalone binaries (however, it will disable some platform-specific optimizations),

cmake -DSTANDALONE=1 .

followed by

make empi

in the directory where you cloned your repository; you can also do an out-of-source build, if you prefer. If successful, binary file “empi” shall appear. It can be installed to system directory (e.g. /usr/local/bin) by calling

sudo make install

How to use empi?

Single invocation of empi will

  • read a single binary signal file (or its part),
  • decompose it as a linear combination of well-defined structures, and
  • save the results as either SQLite (default) or JSON

Directory demo includes Python and Matlab/Octave scripts demonstrating how to access data from the resulting SQLite decomposition file.

empi needs to be run with at least two command-line argument: a path to the input file and a path to the output file. It can be run with --help flag to list all possible flags and arguments:

Enhanced Matching Pursuit Implementation (empi)
Usage: empi [OPTIONS] input_file output_file

  input_file TEXT REQUIRED    Path to the input signal file or input configuration file
  output_file TEXT REQUIRED   Path for the output file unless configuration file is used

  -h,--help                   Print this help message and exit
  -c INT=1                    Number of channels in the input signal
  -f FLOAT                    Sampling frequency of the input signal in hertz (default: 1 Hz)
  -i INT                      Maximum number of iterations (default: no limit)
  -o TEXT                     Parameter optimization mode: none|local|global (default: global)
  -r FLOAT=0.01               Energy of the residual as a fraction of the total signal energy
  --channels TEXT             Range of channels to process, e.g. 1-3,5,8-9 (default: all)
  --cpu-threads UINT=6        Number of CPU threads for each worker
  --cpu-workers UINT=1        Number of independent CPU workers to run
  --delta                     Include delta-type atoms
  --energy-error FLOAT=0.05   Epsilon-squared parameter corresponding to the dictionary size
  --gpu-id TEXT               Comma-separated ID list of GPU device(s) to use (default: none)
  --input64                   Read input data as double-precision (64-bit) floating point values (default: read as 32-bit values)
  --mmp1 Excludes: --mmp3     Use multi-variate decomposition with constant phase across channels
  --mmp3 Excludes: --mmp1     Use multi-variate decomposition with variable phase across channels
  --opt-max-iter INT=10000    Maximum number of iterations for local parameter optimization
  --opt-target FLOAT=1e-05    Target accuracy (relative to the initial dictionary size) for local parameter optimization
  --residual-log-dir TEXT     Directory in which residual energy log files should be created (default: none)
  --segment-size INT          Number of samples in each segment (default: all samples)
  --segments TEXT Needs: --segment-size
                              Range of signal segments, e.g. 1-100,201-300 (default: all)
  --gabor                     Include atoms with Gaussian envelope (not needed if any other --gabor-* option is given)
  --gabor-freq-max FLOAT      Maximum frequency (in hertz) for Gaussian envelope (default: auto)
  --gabor-scale-min FLOAT     Minimum scale (in seconds) for Gaussian envelope (default: auto)
  --gabor-scale-max FLOAT     Maximum scale (in seconds) for Gaussian envelope (default: auto)
  --gabor-half-width FLOAT=3  Half-width of the Gaussian envelope function

Command-line options

There are two required positional arguments:

  1. input_file is the full (or relative to the current directory) path to the input file. The input file should consist of 32-bit (or 64-bit if the --input64 flag is given) floating-point values in the byte order of the current machine (no byte-order conversion is performed). For multichannel signals, first come the samples for all channels at t=0, then for all channels at t=Δt, and so forth. In other words, the signal should be written in column-major order (rows = channels, columns = samples).
  2. output_file is the full (or relative to the current directory) path for the output file. If the path ends in .json, JSON-formatted text file will be created. Otherwise, SQLite database file will be created.

The optional parameters are described below:

Properties of the input file

  • -c represents the number of all channels in the input file, the default corresponding to a single-channel signal (as with -c 1).
  • -f represents the signal's sampling frequency in hertz, the default being 1 Hz.
  • --channels allows to specify the subset of channels (between 1 and the value of -c) that should be read from the signal and decomposed. These can be specified as a single channel 1, as an interval 1-5, as a list 3,4 or mixed: 1-2,5,8-10.
  • --input64 assumes the input signal file consists of 64-bit floating point values, as opposed to the 32-bit as default.
  • --segment-size specifies the size of each signal segment (in samples); segments will be processed independently, and their decomposition will be written to the same output file. If this parameter is absent, the entire signal will be processed as a single segment.
  • --segments is only valid with --segment-size, and it specifies a list of epoch numbers (starting from 1) to be processed. These can be passed as a single epoch 1, as an interval 1-100, as a list 1,2,3 or mixed: 1-100,201-300,400. If not given, all epochs (the entire signal) will be processed.

Decomposition process

The decomposition will iterate until -i iterations or -r residual energy is reached, whichever comes first.

  • -i specifies an upper limit for the number of iterations, and therefore, a maximal number of atoms in the resulting decomposition.
  • -r is the percent of the residual energy that can be left un-explained by decomposition. For example, specifying -r 0.01 corresponds to performing the decomposition until the energy of the residual falls below 1% of the initial energy of the signal.
  • --mmp1 specifies a constant-phase multi-variate decomposition, while --mmp3 specifies a variable-phase variant. If neither is given, each channel is processed separately. Detailed description of multi-variate decomposition modes can be found in the literature, e.g. Kuś, Różański, Durka 2013.

Structure of the dictionary

  • --energy-error specifies the ε² parameter in optimal dictionary construction. Usually the values will be close to 0. Smaller value will allow for a more precise decomposition, but it will also engage more time and RAM. When simulating the continuous dictionary (-o option), this parameter does not affect the results, only time and memory consumption. The default value of 0.05 is approximately optimal for simulating continuous dictionaries.

  • --gabor-scale-min specifies the minimum scale of Gabor atoms in the dictionary. If not specified, the minimum scale is taken as the shortest scale permitted by the dictionary construction for the given value of ε².

  • --gabor-scale-max specifies the maximum scale of Gabor atoms in the dictionary. If not specified, it defaults to the length of the signal segment.

  • --gabor-freq-max specifies the maximum frequency (in hertz) of Gabor atoms in the dictionary. If not specified, defaults to Nyquist frequency.

Specifying --gabor-scale-min, --gabor-scale-max and --gabor-freq-max is not mandatory, but it allows to additionally reduce the computational time. These three options have their counterparts for other envelope functions as well (e.g. --tri-*).

Parallelization scheme

  • --cpu-workers defines the number of independent workers, where each worker will independently analyse different subset of signal segments.
  • --cpu-threads defines the number of CPU computation threads per each worker.
  • --gpu-id (only if compiled with GPU support) defines the comma-separated list of GPU devices that should assist in the decomposition.


empi is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

empi is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with empi (file “LICENCE”); if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA


Enhanced Matching Pursuit Implementation







No packages published