Skip to content
Gabriel A. Devenyi edited this page Feb 18, 2021 · 4 revisions

The following page describes how to run an NMF in R. NMF is a technique that decomposes an input matrix (m x n) into a component matrix (we call this W and its dimensions are m x k) and a weight matrix (we call this H and its dimensions are k x n). k is the number of components (i.e. factorization rank) and we select it based on certain criteria (e.g. accuracy, stability). Of note, the multiplication of the component and weight matrices reconstructs the input matrix in a manner that minimizes reconstruction error.

It deserves mention that there are many variants of NMF. Below, the non-smooth NMF algorithm is used but this can be changed depending on what you want to accomplish.

This wiki is largely based on the steps at: https://cran.r-project.org/web/packages/NMF/vignettes/NMF-vignette.pdf and https://cran.r-project.org/web/packages/NMF/NMF.pdf

The citation is: Gaujoux, R., Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010) doi:10.1186/1471-2105-11-367

Preparing your data for NMF run

#load library
library(NMF)

#load in your data
data <- read.csv('/path/to/your/data.csv')

#if you want to Z-score
data<-scale(data)

#if input matrix is subjects (rows) by features (columns), transpose it
transposed_data<-t(data)

Importantly, if your matrix has negative values, you need to add by the lowest negative value. For example, if min(transposed_data) outputs -3.1, I add 3.100001 to the matrix. On that note, instead of z-scoring, which will result in negative values, you can: 1) use the natural log or 2) divide by standard deviation.

NMF run

#NMF
nmf_nndsvd <- nmf(transposed_data, 3, 'nsNMF', nrun=500, seed='nndsvd', .options='t')
#note: 3 here is number of components (k). One method to select k is described below. 
#note: nsNMF refers to non-smooth NMF algorithm. Type nmfAlgorithm() to list available algorithms.
#note: nrun can be lowered to speed up your run
#note: nndsvd seed refers to nonnegative double singular value decomposition. It is used here to enforce sparsity. Type nmfSeed() to list available options.

#retrieve fitted model
fit(nmf_nndsvd)

#estimate target matrix
fittedresult_nndsvd<-fitted(nmf_nndsvd)
dim(fittedresult_nndsvd)

#estimate smoothing parameters
smoothing(fit(nmf_nndsvd))

#quality and performance measures
summary(nmf_nndsvd)
#additional quality and performance measures
summary(nmf_nndsvd, target=transposed_data)

#get matrix W; features by factors
w_nndsvd<-basis(nmf_nndsvd)
dim(w_nndsvd)

#get matrix H; factors by subjects
h_nndsvd<-coef(nmf_nndsvd)
dim(h_nndsvd)

Finding component number (k) to use

#run NMF for k=2:10
estim.r <- nmf(transposed_data, 2:10, 'ns', nrun=100, seed='nndsvd')
plot(estim.r)

#randomize your data
V.random <- randomize(transposed_data)

#run NMF for k=2:10 on randomized data
estim.r.random <- nmf(V.random, 2:10, 'ns', nrun=100, seed='nndsvd')
#can plot(estim.r, estim.r.random) to visualize
#can decrease the number of nruns to 10 to speed this up

In estim.r[["measures"]][["rss"]], you will find the RSS, which can be used to calculate the RSS change between components. Graphing the RSS change with increasing components helps to select how many components to use for the actual NMF run.

Visualization

library(heatmaply)

#z-score the h matrix and cluster using Ward's method
#note: dx_group (e.g. patient, control) only used for visualization
a_nndsvd<-h_nndsvd
a_nndsvd[1,] = scale(a_nndsvd[1,])
a_nndsvd[2,] = scale(a_nndsvd[2,])
a_nndsvd[3,] = scale(a_nndsvd[3,])
heatmaply(a_nndsvd, dendrogram="column", colorbar_len = 0.4, colorbar_yanchor = "bottom", k_col=4, 
          fontsize_row = 16,
          hclust_method = "ward.D2", 
          ColSideColors = dx_group,
          plot_method = "plotly", key.title = 'Z-Score')%>%
  colorbar(tickfont = list(size = 15), titlefont = list(size = 15), which = 1, x=1.05, y=0.88)%>%
  colorbar(tickfont = list(size = 15), titlefont = list(size = 15), which = 2, tickmode = "auto", x=1.05)

#z-score the w matrix and cluster using Ward's method
b_nndsvd<-w_nndsvd
b_nndsvd[,1] = scale(b_nndsvd[,1])
b_nndsvd[,2] = scale(b_nndsvd[,2])
b_nndsvd[,3] = scale(b_nndsvd[,3])
heatmaply(b_nndsvd, dendrogram="row", hclust_method = "ward.D2", cexRow = 0.5, cexCol = 3)

Extra stuff

#comparing algorithms
res.multi.method <-nmf(transposed_data, 3, list('brunet', 'lee', 'ns'), seed='nndsvd', .options='t')
compare(res.multi.method)

#ploting residuals
nmf_nndsvd <- nmf(transposed_data, 3, .options='t')
plot(nmf_nndsvd)
Clone this wiki locally