WON-PARAFAC is a variant of parallel factor analysis (PARAFAC), a tensor factorization method. WON-PARAFAC imposes the following three constraints on standard PARAFAC:
- Weighting scheme
  - For balanced integration of multiple data types.
- Orthogonality constraint
  - To reduce overlap between factors (originally applied to the gene mode). This also introduces extra sparsity on that mode.
- Non-negativity
  - To induce a sparse, parts-based representation.
A multiplicative update rule was used to derive the algorithm, as in the original NMF implementation from Lee & Seung (Nature, 1999). The code requires tensor toolbox version 2.6 (by Tamara Kolda), freely available for non-commercial use upon registration.
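For readers unfamiliar with multiplicative updates, the sketch below shows the standard Lee & Seung rule for plain matrix NMF with a Frobenius-norm loss. It is not the WON-PARAFAC update itself (which also handles the weighting and orthogonality terms); the matrix sizes, number of factors, and iteration count are arbitrary choices made for illustration.

```matlab
% Standard Lee & Seung multiplicative updates for NMF (V ~ W*H), Frobenius loss.
% Illustration only: this is NOT the WON-PARAFAC update rule.
rng(0);                          % fixed seed for reproducibility
V = rand(30, 20);                % synthetic non-negative data matrix
k = 5;                           % number of factors (arbitrary)
W = rand(30, k);                 % non-negative initial factors
H = rand(k, 20);
err = zeros(1, 100);
for it = 1:100
    H = H .* (W' * V) ./ (W' * W * H + eps);    % update H; entries stay non-negative
    W = W .* (V * H') ./ (W * (H * H') + eps);  % update W; entries stay non-negative
    err(it) = norm(V - W * H, 'fro');           % reconstruction error after this step
end
```

Because each update only multiplies by non-negative ratios, the factors never leave the non-negative orthant, which is why this style of update suits a non-negativity constraint.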
To run the code, Tensor Toolbox must be on the MATLAB path; add it with the `addpath` command.
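A minimal sketch of the path setup, assuming the toolbox was unpacked into a folder named `tensor_toolbox_2.6` under the current directory. That folder name is a placeholder, not something this repository prescribes; replace it with your own install location.

```matlab
% Hypothetical install location; change to wherever Tensor Toolbox 2.6 was unpacked.
tt_dir = fullfile(pwd, 'tensor_toolbox_2.6');
if exist(tt_dir, 'dir')
    addpath(tt_dir);             % put the toolbox on the MATLAB path
end
% ktensor is a Tensor Toolbox class; which() returns empty if it cannot be found.
if isempty(which('ktensor'))
    warning('Tensor Toolbox not found on the MATLAB path.');
end
```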
Demo data are provided: pan-cancer multi-omics data produced in the GDSC1000 project (Sanger). Load the data with:
load Demo.mat
This loads a variable `X`, a 3-way tensor (1815 genes by 935 cell lines by 5 data types).
The 5 data types are:
- positive gene expression levels (non-negative continuous; GE(+))
- absolute value of negative gene expression levels (non-negative continuous; GE(-))
- mutation (binary; MT)
- copy number gain (binary; CN(+))
- copy number loss (binary; CN(-))
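The split of gene expression into GE(+) and GE(-) keeps every slice non-negative. The sketch below shows one way such a split can be computed on a small synthetic matrix. The values are made up, and how the demo data were normalized before splitting is not described here.

```matlab
% Synthetic signed expression values (genes x cell lines); not real data.
ge = [ 1.2 -0.5  0.0;
      -2.0  0.3  0.8];
ge_pos = max(ge, 0);       % GE(+): positive part, zero elsewhere
ge_neg = max(-ge, 0);      % GE(-): absolute value of the negative part, zero elsewhere
% Both parts are non-negative and together recover the original signal.
recovered = ge_pos - ge_neg;
```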
The gene names for `X` are listed in `genenames`, which is loaded together with `X`.
`Demo.m` runs WON-PARAFAC on 100 randomly selected genes by default, varying the number of factors and the strength of the orthogonality constraint on the gene factor matrix:
- Number of factors: 10, 20, 30, ..., 200
- Strength of orthogonality constraint: 0 (no constraint), 0.2, 0.5, 1
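The settings listed above can be laid out as a parameter grid. The sketch below only builds the random gene subset and the grid; it does not call the factorization, and the variable names are illustrative, not those used in `Demo.m`.

```matlab
rng(1);                                          % fixed seed so the subset is repeatable
n_genes_total = 1815;                            % gene count of X, as stated above
gene_idx = sort(randperm(n_genes_total, 100));   % 100 random genes, no repeats
n_factors = 10:10:200;                           % number of factors to try
ortho_strength = [0 0.2 0.5 1];                  % 0 means no orthogonality constraint
[F, S] = ndgrid(n_factors, ortho_strength);
settings = [F(:), S(:)];                         % one row per (factors, strength) pair
fprintf('%d parameter settings\n', size(settings, 1));
```

A sweep would then loop over the rows of `settings` and run the factorization on the selected genes once per row.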
Finally, a plot is generated showing how well WON-PARAFAC reconstructs the input tensor (see below for an example).
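Reconstruction quality for a factor model is often summarized as a relative error. The sketch below uses synthetic data and assumes a relative Frobenius-norm error; the metric that `Demo.m` actually plots is not stated here. It uses plain arrays, not Tensor Toolbox objects, and all sizes are arbitrary.

```matlab
rng(2);
I = 6; J = 5; K = 3; R = 2;                       % small synthetic sizes
A = rand(I, R); B = rand(J, R); C = rand(K, R);   % non-negative factor matrices
% Rebuild the tensor from its factors: Xhat(i,j,k) = sum_r A(i,r)*B(j,r)*C(k,r)
Xhat = zeros(I, J, K);
for r = 1:R
    Xhat = Xhat + reshape(kron(C(:, r), kron(B(:, r), A(:, r))), [I J K]);
end
Xobs = Xhat + 0.01 * rand(I, J, K);               % "observed" tensor: model plus small noise
rel_err = norm(Xobs(:) - Xhat(:)) / norm(Xobs(:)); % relative Frobenius-norm error
```

A value near 0 means the factors reproduce the tensor closely; comparing this value across settings is one way to read a reconstruction plot.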