A CUDA implementation of non-negative matrix factorization for GPUs.
Cuda C Matlab
Switch branches/tags
Nothing to show
Latest commit fa8c54c Jul 3, 2014 @ebattenberg Update README.md
Permalink
Failed to load latest commit information.
cuda
H.bin
README.md
W.bin
X.bin
matrix_export.m
nmf_ismir_2009.pdf

README.md

nmf-cuda

A CUDA implementation of non-negative matrix factorization for GPUs.

A description of the implementation is available in [1] below. If you use this in your work, please cite [1].

[1] E. Battenberg and D. Wessel, “Accelerating non-negative matrix factorization for audio source separation on multi-core and many-core architectures,” in International Society for Music Information Retrieval Conference (ISMIR 2009), 2009.

Bibtex entry: ''' @inproceedings{battenberg2009accelerating, title={Accelerating nonnegative matrix factorization for audio source separation on multi-core and many-core architectures}, author={Battenberg, E. and Wessel, D.}, booktitle={10th International Society for Music Information Retrieval Conference (ISMIR 2009)}, year={2009} } '''

#Implementation Details:

Iterative NMF on Cuda: X = W*H
multiplicative updates divergence cost function

nmf.cu contains an example usage of the update_div function.

The function read_matrix reads binary floats in from files and stores them in a matrix struct

  • values need to be stored in column-major order, and the first two values in the file are integer dimensions of the matrix
  • an example of how to create the binary files (from e.g. Matlab data) is contained in the matlab script matrix_export.m
  • data can also just be stored in a column-major float array using the matrix struct and proper assignment of the dim values

The function update_div is where the work is done.

void update_div(
        matrix W, matrix H, 
        matrix X,
        float CONVERGE_THRESH,
        int max_iter,
        double t[10],
        int verbose);
  • W, H are initial values for the factor matrices
  • X is the target matrix to be decomposed
  • CONVERGE_THRESH is the convergence threshold (expressed as a ratio of cost function change to cost function value)
  • max_iter is the maximum number of iterations
  • double t[10] is a pointer to a double array of at least size 10 that will contain individual timing results for different computational pieces. (set this to NULL for normal use)
  • verbose set to 1 if you want more text output, 0 otherwise