Large scale sparse blind source separation using blocks
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
Pictures
ResultsDedale
.DS_Store
.gitattributes
.gitignore
BSS_Utils.py
Prox_SparsePositive.py
README.md
bGMCA_block.py
demo.py
launch_bGMCA.py
pyStarlet.py
sample_nmr_spectra.py
sparse2d.so

README.md

Implementation of large scale learning methods

This package comprehends the functions implemented in the context of the workpackage 3,4 milestone 3 (implementation of large scale learning methods) of the DEDALE project. For more explanations about the theoretical aspects of the algorithms, please have a look on the related technical report.

The software implements the block GMCA algorithm (bGMCA), whose goal is to perform sparse Blind Source Separation (BSS) in the large scale regime, i.e. when a large number of sources (typically 50) needs to be estimated.

Dependencies

launch_bGMCA.py

Allows for a simple launch of the bGMCA algorithm, e.g. to reproduce the results of the technical report.

bGMCA_block.py

Contains the algorithm in itself, divided into the warm-up (GMCA with blocks) and refinement stage (PALM with blocks). For each stage, a special function is implemented for the case of a non-negativity constraint in the direct domain associated with a sparsity enforced in the wavelet domain.

BSS_Utils.py

Contains some utilitary functions to perform BSS, such as the computation of the mixing criterion or the generation of simulated data.

Prox_SparsePositive.py

Computes the proximal operator for non-negativity in the direct domain and sparsity in the wavelet domain. Note that in this file, the PATH_WT variable has to be changed.

pyStarlet.py

Transformations into the wavelet domain.

sample_nmr_spectra.py

This module provides synthetic NMR spectra.

Data

3 ways of creating simulated data to test the algorithm are proposed in the launch_bGMCA.py file: using Bernouilli-Gaussian sources, generelized Gaussian sources or realistic 1H NMR sources. It is however possible to try on your own data, for instance in adding an option dataType == 4 in the launch_bGMCA.py file. Doing so, you must load:

  • X : the data matrix
  • A0 : the true mixing matrix
  • S0 : the true sources
  • N : the noise matrix

If you have only access to X, you can directly launch the bGMCA function (warm-up stage) followed by the PALM_NMF_MainBlock function (refinement stage).

Exemple: a relatively simple large scale BSS problem

import launch_bGMCA as lgc


N_MC = 1
numExp = 2
ptot = 0.1
cd = 1
dataType = 1
n_sources = [20]

folderInitSave='ResultsDedale/' 
optBatch = 0
verPos = 0
blocksize = [1,2,3,4,5,7,10,13,16,18,20]

C_gmca,S,A = lgc.launch_GMCA(n_sources=n_sources,blocksize=blocksize,numExp=numExp,N_MC=N_MC,dataType=dataType,sauv=2,optBlock=optBatch,folderInitSave=folderInitSave,ptot=ptot,verPos=verPos,cd=cd)

Imports

Only launch_bGMCA.py is imported, since it encompasses all the necessary files.

Data generation parameters

More specifically, it comprehends some random data generation, which can be parametriezd as inputs of the function:

N_MC = 1

Only one Monte Carlo experiment is performed

numExp = 2

Used for setting the random seed for the data generation

ptot = 0.1

Proportion of non-zero coefficients

cd = 1

Condition number of the mixing matrix

dataType = 1

The data is created a Bernouilli Gaussian distribution: a proportion of ptot coefficients is taken non-zeros and have a Gaussian amplitude.

n_sources = [20]

Number of sources. 20 sources is already a quite high number of sources. Take 50+ sources for really large-scale examples.

Parametrization of the algorithm itself

folderInitSave='ResultsDedale/‘

Folder to write the results. It must contain 2 subfolders « data » and « matricesAS ».

optBatch = 0

Corresponds to random choices for the blocks.

verPos = 0

Corresponds to a simple sparsity constraint in the direct domain

blocksize = [1,2,3,4,5,7,10,13,16,18,20]

Block sizes to be tested

Launch of the function

C_gmca,S,A = lgc.launch_GMCA(n_sources=n_sources,blocksize=blocksize,numExp=numExp,N_MC=N_MC,dataType=dataType,sauv=2,optBlock=optBatch,folderInitSave=folderInitSave,ptot=ptot,verPos=verPos,cd=cd)

The function outputs the mixing matrix criterion for each block size. The A and S matrices for the last block size are also available (the matrices for all the block sizes are saved in folderInitSave)

Output

Example: a more difficult case

The next examples deals with non-negative matrix factorization. The A and S matrices are assumed to be sparse in a transformed domain, while non-negative in the direct domain.

N_MC = 1
numExp = 2
folderInitSave='ResultsDedale/'
optBatch = 0

dataType = 3
verPos = 1
J=2

blocksize = [1,2,3,4,5,7,10,13,16,18,20]
n_sources = [20]

C_gmca,S,A = lgc.launch_GMCA(n_sources=n_sources,blocksize=blocksize,numExp=numExp,N_MC=N_MC,dataType=dataType,sauv=2,optBlock=optBatch,folderInitSave=folderInitSave,verPos=verPos,J=J)

Changes in the code

dataType = 3

The data comes now from saved realistic sources. All the matrices are non-negative.

verPos = 1

Enforces the non-negativity and sparsity constraint.

J = 2

Number of wavelet scales.

Output

Required packages

The software requires the following Python packages:

  • numpy
  • scipy.io
  • scipy.linalg
  • matplotlib.pyplot
  • copy
  • time