Implementations of CX, PCA, and NMF factorizations in Spark and MPI


A collection of code for computing truncated PCAs (Spark and C/MPI),
Nonnegative Matrix Factorization (Spark and C/MPI), and randomized CX (Spark),
collated from the separate code bases used to produce the experimental results
in "Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in
Spark and C+MPI Using Three Case Studies" by Alex Gittens, Aditya Devarakonda,
Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, et al.
(technical report available at

This code was originally written to time and produce output for specific
scientific problems. As a result, some places contain code that is not relevant
to general-purpose use, and other places lack code that general-purpose use
would need (e.g., saving the C and X factors of the CX decomposition). These
issues are being worked on.
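A CX decomposition approximates a matrix A by the product C X. Here C consists of actual columns of A, and X = C^+ A. The following is a single-node NumPy sketch of the common leverage-score-sampling construction, included only for illustration. It is not the repository's Spark implementation. The function name `randomized_cx` and its parameters (`k`, `c`, `n_oversample`, `n_power_iter`, `seed`) are hypothetical.

The sketch works in three steps:

1. Approximate the top-k right singular vectors with a randomized range finder.
2. Score each column by its rank-k leverage.
3. Sample columns in proportion to those scores.

```python
import numpy as np


def randomized_cx(A, k, c, n_oversample=10, n_power_iter=2, seed=0):
    """Hypothetical sketch of a leverage-score CX decomposition, A ~= C @ X.

    k: target rank used to compute the column leverage scores
    c: number of distinct columns of A kept in C
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Randomized range finder on A^T: Y spans (approximately) the dominant
    # right singular subspace of A. Power iterations sharpen the estimate.
    Y = A.T @ rng.standard_normal((m, k + n_oversample))
    for _ in range(n_power_iter):
        Y = A.T @ (A @ Y)
    Q, _ = np.linalg.qr(Y)  # n x (k + n_oversample), orthonormal columns

    # Small SVD of A projected onto that subspace; lift back to get V_k.
    _, _, Wt = np.linalg.svd(A @ Q, full_matrices=False)
    V_k = Q @ Wt[:k].T  # n x k approximate top-k right singular vectors

    # Rank-k leverage score of column j = squared norm of row j of V_k.
    lev = np.sum(V_k ** 2, axis=1)
    probs = lev / lev.sum()

    # Simplification: draw c distinct columns. Analyses in the literature
    # usually sample with replacement and rescale the chosen columns.
    cols = rng.choice(n, size=c, replace=False, p=probs)
    C = A[:, cols]

    # X = pinv(C) @ A, computed as a minimum-norm least-squares solve.
    X = np.linalg.lstsq(C, A, rcond=None)[0]
    return C, X, cols
```

Consider an exactly rank-5 matrix, such as `A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))`. Calling `randomized_cx(A, k=5, c=8)` should reproduce A as `C @ X` up to round-off, because eight generic columns of such a matrix span its column space. Storing `cols` alongside X is one way to keep C without copying the selected columns.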

In particular, note that this code was written to compile on the NERSC Cori
system, so some of the build procedures will need to be adapted for your
system.

Authors of the code:
  Alex Gittens (corresponding author:
  Aditya Devarakonda
  Jey Kottalam