Linear Algebra Cheat Sheet

Cloud Tang edited this page Aug 7, 2015 · 5 revisions

Core Concepts

Like Breeze, Marlin matrices default to column-major ordering; like NumPy, indexing is 0-based.
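As a small illustration of what column-major ordering with 0-based indexing means, the sketch below addresses a flat buffer by hand. It is plain Scala and does not use Marlin; the object name ColumnMajorSketch and its members are hypothetical, and the layout shown is the general convention rather than a description of Marlin's internal storage.

```scala
object ColumnMajorSketch {
  // Offset of element (i, j) in a column-major buffer, using 0-based indices.
  def offset(i: Int, j: Int, rows: Int): Int = i + j * rows

  // The 2 x 3 matrix
  //   1 2 3
  //   4 5 6
  // stored column by column.
  val rows = 2
  val data: Array[Double] = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)

  // Element lookup through the column-major offset.
  def at(i: Int, j: Int): Double = data(offset(i, j, rows))
}
```

With this layout, consecutive elements of one column are adjacent in memory, so ColumnMajorSketch.at(0, 2) reads the first entry of the third column.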

Creation

sc is the created SparkContext, rows is the number of rows of the matrix, columns is the number of columns of the matrix, and path is the path in HDFS.

| Operation | API |
| --- | --- |
| Zeroed matrix | MTUtils.zerosDenVecMatrix(sc, rows, columns) |
| Ones matrix | MTUtils.onesDenVecMatrix(sc, rows, columns) |
| Random elements (from 0 to 1) matrix | MTUtils.randomDenVecMatrix(sc, rows, columns) |
| Matrix creation from two-dimensional array | MTUtils.arrayToMatrix(sc, array, partitions) |
| DenseVecMatrix creation from text file | MTUtils.loadMatrixFile(sc, path) |
| DenseVecMatrix creation from text files in a directory | MTUtils.loadMatrixFiles(sc, path) |
| BlockMatrix creation from text file | MTUtils.loadBlockMatrixFile(sc, path) |
| DenseVecMatrix creation from sequence file | MTUtils.loadMatrixSeqFile(sc, path) |

By default, randomDenVecMatrix generates elements uniformly distributed from 0 to 1. To generate elements in a different range, use the code below:

import edu.nju.pasalab.marlin.utils._
// Draw elements uniformly from -1.0 to 1.0 instead of the default 0 to 1
val generator = new UniformGenerator(-1.0, 1.0)
val rows, cols = 10000
val mat = MTUtils.randomDenVecMatrix(sc, rows, cols, generator)

Besides the uniform distribution, PoissonGenerator(mean: Double) and StandardNormalGenerator(mean: Double, variance: Double) are also supported.

When loading a matrix from files, each line of the original file can look like 2:1, 0.0, 4 9.1, where the number before the colon is the row index. Calling MTUtils.loadMatrixFile on such a line generates the pair (2L, DenseVector(1.0,0.0,4.0,9.1)) in the RDD.
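To make the line format concrete, here is a minimal sketch of how such a line could be split into a row index and its values. It is plain Scala and is not Marlin's own parser; the names MatrixLineSketch and parseLine are hypothetical, and accepting both commas and whitespace as separators is an assumption based on the sample line above.

```scala
object MatrixLineSketch {
  // Parses "rowIndex:v1, v2 v3 ..." into (rowIndex, values).
  // Assumption: values are separated by commas and/or whitespace.
  def parseLine(line: String): (Long, Array[Double]) = {
    val parts = line.split(":", 2)
    require(parts.length == 2, s"expected 'index:values', got: $line")
    val values = parts(1).trim
      .split("[,\\s]+")   // split on runs of commas and whitespace
      .filter(_.nonEmpty) // drop the empty token produced by an empty row
      .map(_.toDouble)
    (parts(0).trim.toLong, values)
  }
}
```

For the sample line, MatrixLineSketch.parseLine("2:1, 0.0, 4 9.1") yields row index 2 with the four values 1.0, 0.0, 4.0 and 9.1, matching the pair described above.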

Slicing Matrices

Currently, these APIs are only supported on the DenseVecMatrix type. The parameters startRow, endRow, startCol and endCol are all inclusive bounds.

| Operation | API |
| --- | --- |
| Extract rows of matrix | sliceByRow(startRow, endRow) |
| Extract columns of matrix | sliceByColumn(startCol, endCol) |
| Extract a sub-matrix | getSubMatrix(startRow, endRow, startCol, endCol) |
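Since the end bounds are inclusive, a slice from row 0 to row 1 contains two rows, not one. The sketch below mimics that behaviour on a local two-dimensional array. It is plain Scala, not Marlin code; the object InclusiveSliceSketch is hypothetical and only borrows the method names from the table above to show the semantics.

```scala
object InclusiveSliceSketch {
  type Matrix = Array[Array[Double]]

  // Rows startRow..endRow, both ends included.
  def sliceByRow(m: Matrix, startRow: Int, endRow: Int): Matrix =
    m.slice(startRow, endRow + 1)

  // Columns startCol..endCol, both ends included.
  def sliceByColumn(m: Matrix, startCol: Int, endCol: Int): Matrix =
    m.map(_.slice(startCol, endCol + 1))

  // Row slice followed by column slice.
  def getSubMatrix(m: Matrix, startRow: Int, endRow: Int,
                   startCol: Int, endCol: Int): Matrix =
    sliceByColumn(sliceByRow(m, startRow, endRow), startCol, endCol)
}
```

Note the `+ 1` when delegating to Scala's slice, whose end index is exclusive; with inclusive bounds, sliceByRow(m, 1, 1) returns exactly one row.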

Operations

| Operation | API |
| --- | --- |
| Matrix-matrix addition | add(other: DistributedMatrix) |
| Matrix-matrix subtraction | subtract(other: DistributedMatrix) |
| Matrix-matrix multiplication | multiply(other: DistributedMatrix, cores: Int) |
| Element-wise addition | add(b: Double) |
| Element-wise subtraction | subtract(b: Double) / substactBy(b: Double) |
| Element-wise multiplication | multiply(b: Double) |
| Element-wise division | divide(b: Double) / divideBy(b: Double) |
| Transpose of matrix | transpose() |
| LU decomposition of matrix | luDecompose() |
| Cholesky decomposition of matrix | choleskyDecompose() |
| Inverse of matrix | inverse() |

Storage of matrix

| Operation | API |
| --- | --- |
| Save the matrix to HDFS in text format | saveToFileSystem(path) |
| Save the matrix to HDFS in SequenceFile format | saveSequenceFile(path) |

If the original matrix is a DenseVecMatrix, calling saveToFileSystem(path) saves it in the DenseVecMatrix text format. The output directory holds the resulting files, and the lines in them may appear in no particular order. You can then call MTUtils.loadMatrixFiles(sc, path) to create a matrix from these files.

If the original matrix is a BlockMatrix, calling saveToFileSystem(path, "blockmatrix") saves it in the BlockMatrix text format. Alternatively, calling saveToFileSystem(path) first converts the BlockMatrix to a DenseVecMatrix and stores the result in the DenseVecMatrix text format.
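The unordered lines in the DenseVecMatrix text output need not be a problem if, as in the 2:1, 0.0, 4 9.1 example earlier, every line carries its own row index. The sketch below shows the idea on already-parsed (index, values) pairs. It is plain Scala with hypothetical names (RowOrderSketch, restoreOrder) and is not how Marlin itself reassembles a matrix; that the saved files use the index-prefixed format is an assumption drawn from the loading example.

```scala
object RowOrderSketch {
  // Rebuilds row order from (rowIndex, values) pairs that arrive in any order.
  def restoreOrder(rows: Seq[(Long, Array[Double])]): Seq[Array[Double]] =
    rows.sortBy(_._1).map(_._2)
}
```

Because the index travels with the data, the same set of lines describes the same matrix no matter how the files or lines are arranged.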

Others

| Operation | API |
| --- | --- |
| Print the matrix to the screen (convenient for developers) | print() |