# Lots of loops

This notebook illustrates the different ways in which loops for matrix-matrix multiplication can be ordered.  Let's start by creating some matrices.

In [1]:
import numpy as np

m = 4
n = 3
k = 5

C = np.matrix( np.random.random( (m, n) ) )
print( 'C = ' )
print( C )

Cold = np.matrix( np.zeros( (m,n ) ) )   # one way to create an m x n matrix (overwritten by the copy on the next line)
Cold = np.matrix( np.copy( C ) )         # a "hard" copy of C: Cold does not change when C is later updated
    
A = np.matrix( np.random.random( (m, k) ) )
print( 'A = ' )
print( A )

B = np.matrix( np.random.random( (k, n) ) )
print( 'B = ' )
print( B )

C = 
[[0.57468849 0.67261082 0.73075518]
 [0.308928   0.89245792 0.28350157]
 [0.06804404 0.82660112 0.21237071]
 [0.17217679 0.96991592 0.87078941]]
A = 
[[0.85605356 0.86250952 0.41825529 0.79668813 0.6314444 ]
 [0.58459581 0.80742077 0.6037568  0.10054359 0.84053884]
 [0.87065621 0.83705063 0.50636025 0.11389862 0.11020125]
 [0.39378956 0.74848032 0.95491181 0.27831634 0.52578672]]
B = 
[[0.1455835  0.12005868 0.63432576]
 [0.3541427  0.56089455 0.77962699]
 [0.14295384 0.48354263 0.52182377]
 [0.05482756 0.47874102 0.33193327]
 [0.42959977 0.32409858 0.64980319]]


## The basic algorithm

Given $ A \in \mathbb{R}^{m \times k} $, $ B \in \mathbb{R}^{k \times n} $, and $ C \in \mathbb{R}^{m \times n} $, we will consider $ C := A B + C $.

Now, recall that the $ i,j $ element of $ A B $ is computed as the dot product of the $ i $th row of $ A $ with the $ j $th column of $ B $:

$\sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j}$

where $ \alpha_{i,p} $, $ \beta_{p,j} $, and $ \gamma_{i,j} $ denote the elements of $ A $, $ B $, and $ C $, respectively. Here, by adding to $ C $, we get

$ \gamma_{i,j} := \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j} + \gamma_{i,j}.$

Now, we have to loop over all elements of $ C $.  The code, without the FLAMEpy API, becomes

In [2]:
def MMmult_lots_of_loops( A, B, C ):

    m, n = np.shape( C )
    m, k = np.shape( A )
    
    # i,j,p
    for i in range( m ):                     
        for j in range( n ):                    
            for p in range( k ):                    
                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
                
    # i,p,j
#    for i in range( m ):                     
#        for p in range( k ):                    
#            for j in range( n ):                    
#                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
                
    # j,i,p
#    for j in range( n ):                     
#        for i in range( m ):                    
#            for p in range( k ):                    
#                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]

    # j,p,i
#    for j in range( n ):                     
#        for p in range( k ):                    
#            for i in range( m ):                    
#                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]

    # p,i,j
#    for p in range( k ):                     
#        for i in range( m ):                    
#            for j in range( n ):                    
#                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]

    # p,j,i
#    for p in range( k ):                     
#        for j in range( n ):                    
#            for i in range( m ):                    
#                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]

In [3]:
C = np.matrix( np.copy( Cold ) )             # restore C

MMmult_lots_of_loops( A, B, C )

print( 'C - ( Cold + A * B )' )
print( C - ( Cold + A * B ) )

C - ( Cold + A * B )
[[0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 2.22044605e-16]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00]]
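
In the i,j,p ordering, the innermost loop over `p` accumulates exactly the dot product of row $ i $ of $ A $ with column $ j $ of $ B $. The following is a minimal sketch of that observation, not part of the original routine. It assumes plain NumPy arrays rather than `np.matrix`, and the names `mmult_dot`, `A_demo`, `B_demo`, `C_start`, and `C_demo` are made up for this illustration.

```python
import numpy as np

def mmult_dot(A, B, C):
    """Update C := A B + C in place, one dot product per element of C."""
    m, k = A.shape
    n = B.shape[1]
    for i in range(m):
        for j in range(n):
            # The dot product of row i of A with column j of B
            # takes the place of the explicit loop over p.
            C[i, j] = np.dot(A[i, :], B[:, j]) + C[i, j]

# Hypothetical test data, kept separate from the notebook's A, B, and C.
rng = np.random.default_rng(0)
A_demo = rng.random((4, 5))
B_demo = rng.random((5, 3))
C_start = rng.random((4, 3))

C_demo = C_start.copy()
mmult_dot(A_demo, B_demo, C_demo)

# Largest deviation from the reference result; expected to be at rounding level.
print(np.max(np.abs(C_demo - (C_start + A_demo @ B_demo))))
```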


Now, go back and systematically move the loops around, so that in the end you have tried all six orders of the loops: three choices for the first (outermost) loop, two choices for the second loop, and one choice for the third loop, for a total of $ 3! $ (3 factorial) choices. Check that you get the right answer regardless of the order.

(We suggest you just change the box in which the routine is defined and comment out variations that you've already tested.  Be careful with indentation.)
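
If you would rather check all six orderings in one go, the sketch below generates them with `itertools.permutations`. It is only one possible way to automate the exercise. It assumes plain NumPy arrays, and the names `mmult_loop_order`, `A_demo`, `B_demo`, `C_start`, and `max_errors` are invented for this example.

```python
import itertools
import numpy as np

def mmult_loop_order(A, B, C, order):
    """Update C := A B + C in place, nesting the loops as given by `order`.

    `order` is a sequence of the three letters 'i', 'j', 'p', listed from
    the outermost loop to the innermost loop.
    """
    m, k = A.shape
    n = B.shape[1]
    extent = {'i': m, 'j': n, 'p': k}
    outer, middle, inner = order
    for a in range(extent[outer]):
        for b in range(extent[middle]):
            for c in range(extent[inner]):
                # Map the loop counters back to the indices i, j, p.
                idx = {outer: a, middle: b, inner: c}
                i, j, p = idx['i'], idx['j'], idx['p']
                C[i, j] = A[i, p] * B[p, j] + C[i, j]

# Hypothetical test data, kept separate from the notebook's A, B, and C.
rng = np.random.default_rng(0)
A_demo = rng.random((4, 5))
B_demo = rng.random((5, 3))
C_start = rng.random((4, 3))

max_errors = {}
for order in itertools.permutations('ijp'):
    C_demo = C_start.copy()                      # restore C before each run
    mmult_loop_order(A_demo, B_demo, C_demo, order)
    name = ''.join(order)
    max_errors[name] = np.max(np.abs(C_demo - (C_start + A_demo @ B_demo)))
    print(name, max_errors[name])
```

Every ordering performs the same $ m n k $ updates; only the sequence in which they happen changes, so the results can differ at most by rounding.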

## Why $ C := A B + C $ rather than $ C := A B $?

Notice that we could have written a routine to compute $ C := A B $ instead, given below.

In [4]:
def MMmult_C_eq_AB( A, B, C ):

    m, n = np.shape( C )
    m, k = np.shape( A )
    
    for i in range( m ):                     
        for j in range( n ):   
            C[ i,j ] = 0.0
            for p in range( k ):                    
                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]

In [5]:
C = np.matrix( np.copy( Cold ) )             # restore C

MMmult_C_eq_AB( A, B, C )

print( 'C - ( A * B )' )
print( C - ( A * B ) )

C - ( A * B )
[[ 1.11022302e-16  0.00000000e+00  4.44089210e-16]
 [ 0.00000000e+00  0.00000000e+00  2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -2.22044605e-16]]


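Now, start changing the order of the loops.  You will notice that it is not quite as simple: the statement `C[ i,j ] = 0.0` must execute exactly once per element, before any term is added to that element, and that placement is only straightforward when the $ p $ loop is innermost.  But, if you have a routine for computing $ C := A B + C $, you can always initialize $ C = 0 $ (the zero matrix) and then use that routine to compute $ C := A B $: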

In [6]:
C = np.matrix( np.zeros( np.shape( C ) ) )          # initialize C = 0 

MMmult_lots_of_loops( A, B, C )

print( 'C - ( A * B )' )
print( C - ( A * B ) )

C - ( A * B )
[[ 1.11022302e-16  0.00000000e+00  4.44089210e-16]
 [ 0.00000000e+00  0.00000000e+00  2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00 -2.22044605e-16]]
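
To see concretely what goes wrong when the loops of the $ C := A B $ routine are reordered naively, consider the sketch below. It moves the `p` loop outermost while leaving the zeroing next to the update, and compares that with zeroing all of $ C $ up front. It assumes plain NumPy arrays, and the function and variable names are made up for this illustration.

```python
import numpy as np

def c_eq_ab_pij_naive(A, B, C):
    """Naive p,i,j reordering of C := A B: the zeroing stays next to the update."""
    m, k = A.shape
    n = B.shape[1]
    for p in range(k):
        for i in range(m):
            for j in range(n):
                C[i, j] = 0.0   # wipes out the terms accumulated for earlier p
                C[i, j] = A[i, p] * B[p, j] + C[i, j]

def c_eq_ab_pij_fixed(A, B, C):
    """p,i,j ordering of C := A B with all of C zeroed before the loops."""
    m, k = A.shape
    n = B.shape[1]
    C[:, :] = 0.0               # initialize C = 0 once, up front
    for p in range(k):
        for i in range(m):
            for j in range(n):
                C[i, j] = A[i, p] * B[p, j] + C[i, j]

# Hypothetical test data, kept separate from the notebook's A, B, and C.
rng = np.random.default_rng(0)
A_demo = rng.random((4, 5))
B_demo = rng.random((5, 3))

C_naive = rng.random((4, 3))
c_eq_ab_pij_naive(A_demo, B_demo, C_naive)

C_fixed = rng.random((4, 3))
c_eq_ab_pij_fixed(A_demo, B_demo, C_fixed)

print('naive version matches A B:', np.allclose(C_naive, A_demo @ B_demo))
print('fixed version matches A B:', np.allclose(C_fixed, A_demo @ B_demo))
```

In the naive version only the last term survives, so the result is the outer product of the last column of `A_demo` with the last row of `B_demo` rather than the full product.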
