Matrix multiply algorithm

The algorithm we use in the two large distributed matrices multiplication has three aproaches.

The first one refers to the HAMA 0.1, here you can find more about the algorithm.

Basically, the algorithm is to split a large matrix into smaller ones, and after multiply these submatrices, combine them together.

It's MapReduce version interpretation can be seen below:

We collect the blocks to 'collectionTable' firstly using map/reduce. Rows are named as c(i, j) with sequential number ((N^2 * i) + ((j * N) + k) to avoid duplicated records. CollectionTable:

                        matrix A         matrix B
------------------------+-------------------------------             
block(0, 0)-0               block(0, 0)      block(0, 0)
block(0, 0)-1               block(0, 1)      block(1, 0)
block(0, 0)-2               block(0, 2)      block(2, 0)
...         N               ...
block(N-1, n-1)-(N^3-1)     block(N-1, N-1)  block(N-1, N-1)

Each row has a two sub matrices of a(i, k) and b(k, j) so that minimized data movement and network cost. Blocking jobs:

Collect the blocks to 'collectionTable' from A and B.

- A map task receives a row n as a key, and vector of each row as its value
- emit (blockID, sub-vector) pairs
- Reduce task merges block structures based on the information of blockID

Multiplication job:

- A map task receives a blockID n as a key, and two sub-matrices of A and B as its value
- Multiply two sub-matrices: a[i][j] * b[j][k]
- Reduce task computes sum of blocks 
- c[i][k] += multiplied blocks

We transform these two steps into Spark.

We refer to the CARMA algorithm, and redesign another algorithm.
The last one is based on the idea that broadcasts one of the matrices, and split another matrix into blocks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matrix multiply algorithm

Clone this wiki locally