Skip to content

Matrix multiply algorithm

Myasuka edited this page Oct 19, 2014 · 2 revisions

The algorithm we use in the two large distributed matrices multiplication has three aproaches.

  • The first one refers to the HAMA 0.1, here you can find more about the algorithm.

Basically, the algorithm is to split a large matrix into smaller ones, and after multiply these submatrices, combine them together.

It's MapReduce version interpretation can be seen below:

We collect the blocks to 'collectionTable' firstly using map/reduce. Rows are named as c(i, j) with sequential number ((N^2 * i) + ((j * N) + k) to avoid duplicated records. CollectionTable:

                        matrix A         matrix B
------------------------+-------------------------------             
block(0, 0)-0               block(0, 0)      block(0, 0)
block(0, 0)-1               block(0, 1)      block(1, 0)
block(0, 0)-2               block(0, 2)      block(2, 0)
...         N               ...
block(N-1, n-1)-(N^3-1)     block(N-1, N-1)  block(N-1, N-1)

Each row has a two sub matrices of a(i, k) and b(k, j) so that minimized data movement and network cost. Blocking jobs:

Collect the blocks to 'collectionTable' from A and B.

- A map task receives a row n as a key, and vector of each row as its value
- emit (blockID, sub-vector) pairs
- Reduce task merges block structures based on the information of blockID

Multiplication job:

- A map task receives a blockID n as a key, and two sub-matrices of A and B as its value
- Multiply two sub-matrices: a[i][j] * b[j][k]
- Reduce task computes sum of blocks 
- c[i][k] += multiplied blocks

We transform these two steps into Spark.

  • We refer to the CARMA algorithm, and redesign another algorithm.
  • The last one is based on the idea that broadcasts one of the matrices, and split another matrix into blocks.