# Biostat M280: Matrix Multiplication By Looping

##### Dr. Hua Zhou, Jan 18, 2016

This example shows the effect of the looping order on computational efficiency.

The following `Julia` function computes the matrix product `C = A * B` with three nested loops, run in the order named by the `order` argument.

In [1]:
function matmul_by_loop!(A::Matrix{Float64}, B::Matrix{Float64}, 
    C::Matrix{Float64}, order::ASCIIString)
    
    # A is m-by-p, B is p-by-n, C is m-by-n
    m = size(A, 1)
    p = size(A, 2)
    n = size(B, 2)

    # zero out C before accumulating
    fill!(C, 0.0)
    
    if order == "jki"
        for j = 1:n
            for k = 1:p
                for i = 1:m
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end

    if order == "kji"
        for k = 1:p
            for j = 1:n
                for i = 1:m
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end
    
    if order == "ikj"
        for i = 1:m
            for k = 1:p
                for j = 1:n
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end

    if order == "kij"
        for k = 1:p
            for i = 1:m
                for j = 1:n
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end
    
    if order == "ijk"
        for i = 1:m
            for j = 1:n
                for k = 1:p
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end
    
    if order == "jik"
        for j = 1:n
            for i = 1:m
                for k = 1:p
                    C[i, j] += A[i, k] * B[k, j]
                end
            end
        end
    end
    
end

matmul_by_loop! (generic function with 1 method)
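
Because the test matrices used in this example are square, they cannot reveal a mix-up between row and column dimensions. A minimal sketch of a check on rectangular inputs, assuming Julia 1.x; `jki_matmul!` is a hypothetical stand-alone helper written for this illustration (it mirrors the `"jki"` branch) and is not part of the original notebook:

```julia
# Hypothetical helper: C = A * B with the "jki" loop order.
# A is m-by-p, B is p-by-n, C is m-by-n.
function jki_matmul!(C::Matrix{Float64}, A::Matrix{Float64}, B::Matrix{Float64})
    m, p = size(A)
    n = size(B, 2)
    # fail early on incompatible shapes
    size(B, 1) == p || throw(DimensionMismatch("inner dimensions differ"))
    size(C) == (m, n) || throw(DimensionMismatch("C has the wrong shape"))
    fill!(C, 0.0)
    for j = 1:n, k = 1:p, i = 1:m   # j is outermost, i is innermost
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

# Rectangular test data (sizes chosen arbitrarily for illustration)
A_small = rand(3, 4)
B_small = rand(4, 2)
C_small = zeros(3, 2)

jki_matmul!(C_small, A_small, B_small)
isapprox(C_small, A_small * B_small)   # compare against the built-in product
```

Using three different sizes means any loop bound attached to the wrong dimension triggers a bounds error or a wrong result instead of passing silently.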

Generate data.

In [2]:
n = 1000; m = 1000; p = 1000
A = rand(n, m)
B = rand(m, p)
C = zeros(n, p);

Now time the matrix multiplication under each looping order. Note that the first timing also includes just-in-time compilation of `matmul_by_loop!`.

In [3]:
@elapsed matmul_by_loop!(A, B, C, "jki")

2.161883603

In [4]:
@elapsed matmul_by_loop!(A, B, C, "kji")

2.180031129

In [5]:
@elapsed matmul_by_loop!(A, B, C, "ikj")

15.215429012

In [6]:
@elapsed matmul_by_loop!(A, B, C, "kij")

14.882370612

In [7]:
@elapsed matmul_by_loop!(A, B, C, "ijk")

6.601579471

In [8]:
@elapsed matmul_by_loop!(A, B, C, "jik")

7.274852647
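
These timings are consistent with Julia storing matrices in column-major order, where the entries of one column sit next to each other in memory. With `i` innermost (`jki`, `kji`), both `C[i, j]` and `A[i, k]` are accessed with stride 1. With `j` innermost (`ikj`, `kij`), both `C[i, j]` and `B[k, j]` jump a whole column per step. With `k` innermost (`ijk`, `jik`), only `B[k, j]` is contiguous. A small sketch of the layout, assuming Julia 1.x (the matrix `M` is made up for illustration):

```julia
# 2-by-3 matrix filled column by column with 1.0, ..., 6.0
M = reshape(collect(1.0:6.0), 2, 3)

strides(M)              # (1, 2): step 1 down a column, step 2 across a row
vec(M)                  # memory order: column 1, then column 2, then column 3
LinearIndices(M)[1, 2]  # 3: entry (1, 2) is the third element in memory
```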

How much time does BLAS take? For `Float64` matrices, `A * B` dispatches to the BLAS matrix-multiplication routine `gemm`.

In [9]:
@elapsed C = A * B

0.347075744
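
For a closer like-for-like comparison with the in-place loops, the BLAS product can also write into a preallocated output. A sketch assuming Julia 1.x, where `mul!` from the `LinearAlgebra` standard library does this (the Julia 0.4 session shown here predates `mul!`); the sizes and the names `X`, `Y`, `Z`, `t_blas` are chosen for illustration only:

```julia
using LinearAlgebra

X = rand(200, 300)
Y = rand(300, 100)
Z = zeros(200, 100)              # preallocated output, 200-by-100

mul!(Z, X, Y)                    # warm-up call, so compilation is not timed
t_blas = @elapsed mul!(Z, X, Y)  # time only the in-place product
isapprox(Z, X * Y)               # same result as the allocating product
```

Timing a second call rather than the first keeps one-time compilation out of the measurement.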

Show system information.

In [10]:
versioninfo()

Julia Version 0.4.2
Commit bb73f34 (2015-12-06 21:47 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
