A CUDA implementation of non-negative matrix factorization for GPUs.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
cuda
H.bin
LICENSE
README.md
W.bin
X.bin
matrix_export.m
nmf_ismir_2009.pdf

README.md

nmf-cuda

A CUDA implementation of non-negative matrix factorization for GPUs.

A description of the implementation is available in [1] below. If you use this in your work, please cite [1].

[1] E. Battenberg and D. Wessel, “Accelerating non-negative matrix factorization for audio source separation on multi-core and many-core architectures,” in International Society for Music Information Retrieval Conference (ISMIR 2009), 2009.

Bibtex entry:

@inproceedings{battenberg2009accelerating,
    title={Accelerating nonnegative matrix factorization for audio source separation on multi-core and many-core architectures},
    author={Battenberg, E. and Wessel, D.},
    booktitle={10th International Society for Music Information Retrieval Conference (ISMIR 2009)},
    year={2009}
}

Implementation Details:

Iterative NMF on Cuda: X = W*H

  • multiplicative updates
  • divergence cost function

nmf.cu contains an example usage of the update_div function.

The function read_matrix reads binary floats in from files and stores them in a matrix struct

  • values need to be stored in column-major order, and the first two values in the file are integer dimensions of the matrix
  • an example of how to create the binary files (from e.g. Matlab data) is contained in the matlab script matrix_export.m
  • data can also just be stored in a column-major float array using the matrix struct and proper assignment of the dim values

The function update_div is where the work is done.

void update_div(
        matrix W, matrix H, 
        matrix X,
        float CONVERGE_THRESH,
        int max_iter,
        double t[10],
        int verbose);
  • W, H are initial values for the factor matrices
  • X is the target matrix to be decomposed
  • CONVERGE_THRESH is the convergence threshold (expressed as a ratio of cost function change to cost function value)
  • max_iter is the maximum number of iterations
  • double t[10] is a pointer to a double array of at least size 10 that will contain individual timing results for different computational pieces. (set this to NULL for normal use)
  • verbose set to 1 if you want more text output, 0 otherwise