# Homework Batch I: Matrix Multiplication
## Davide Basso - SM3500450

In the following functions I used the class Matrix, which we've implemented during lessons and it can be found within the **matrix.py** file.

In [10]:
from __future__ import annotations
from numbers import Number
from typing import List, Tuple
from random import random, seed
from sys import stdout
from timeit import timeit
import math
from matrix import *

1) implement the strassen matrix mult function to multiply two $2^n \times 2^n$ matrices by using the Strassen's algorithm;

The implementation is pretty straightforward by following the original formulation that we saw during classes:

In [11]:
def strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:
    assert A.num_of_cols == B.num_of_rows , "Incompatible matrices"

    # Base case
    if max(A.num_of_rows, B.num_of_cols, A.num_of_cols) < 32:
        return gauss_matrix_mult(A, B)
    
    # Recursive step
    A11, A12, A21, A22 = get_matrix_quadrants(A)
    B11, B12, B21, B22 = get_matrix_quadrants(B)

    # First batch of sums has cost Theta(n^2)
    S1 = B12 - B22
    S2 = A11 + A12
    S3 = A21 + A22
    S4 = B21 - B11
    S5 = A11 + A22
    S6 = B11 + B22 
    S7 = A12 - A22 
    S8 = B21 + B22
    S9 = A11 - A21
    S10 = B11 + B12 

    # Recursive calls
    P1 = strassen_matrix_mult(A11, S1)
    P2 = strassen_matrix_mult(S2, B22)
    P3 = strassen_matrix_mult(S3, B11)
    P4 = strassen_matrix_mult(A22, S4)
    P5 = strassen_matrix_mult(S5, S6)
    P6 = strassen_matrix_mult(S7, S8)
    P7 = strassen_matrix_mult(S9, S10)

    # Second batch of sums has cost Theta(n^2)
    C11 = P5 + P4 - P2 + P6
    C12 = P1 + P2
    C21 = P3 + P4
    C22 = P5 + P1 - P3 - P7

    # Build resulting matrix
    result_matrix = Matrix([[0 for x in range(B.num_of_cols)] for y in range(A.num_of_rows)],
                            clone_matrix=False)
    
    # Copying Cij into the resulting matrix
    result_matrix.assign_submatrix(0, 0, C11)
    result_matrix.assign_submatrix(0, result_matrix.num_of_cols//2, C12)
    result_matrix.assign_submatrix(result_matrix.num_of_rows//2, 0, C21)
    result_matrix.assign_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2, C22)
    
    return result_matrix


2) generalize strassen matrix mult to deal with any kind of matrix pair that can be multiplied (possibly also non-square matrices) and prove that the asymptotic complexity does not change;

We know that Strassen's algorithm can work only with square matrices in which the size of the columns or row is equal to $2^n$. 
A more general formulation of this algorithm can be achieved thanks to a technique called Padding.
In fact, padding with zeros the matrix given as input if this one is not square or has as number of rows or column a number which is not equal to $2^n$ makes us able to recall and use the trivial Strassen algorithm implementation after having padded those matrices to a $N \times N$ matrix, where $N$ will be the nearest power of two of the biggest size among rows and columns.

In [12]:
# Some auxiliary functions used in the implementation of the following methods

# Find the next power of 2 of a given number n
def next_power_of_two(n):
    return int(math.pow(2, math.ceil(math.log(n)/math.log(2))))

# Check if number n is a power of 2
def is_power_of_two(n):
    return ((n & (n-1) == 0) and n != 0)

# Check if number n is even
def is_even(n):
    return (n%2 == 0)

In [13]:
def generalized_pow_strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:
    assert A.num_of_cols == B.num_of_rows , "Incompatible matrices"

    # find which dimension is the largest
    biggest = max(A.num_of_rows, A.num_of_cols, B.num_of_cols)

    # base case
    if(A.num_of_cols==A.num_of_rows and B.num_of_cols==B.num_of_rows) and (is_power_of_two(A.num_of_rows)):
        return strassen_matrix_mult(A,B)

    # padding of matrices if needed
    else:
        A_padded = Matrix([[0 for y in range(next_power_of_two(biggest))] 
                            for x in range(next_power_of_two(biggest))], clone_matrix=False)
        A_padded.assign_submatrix(0,0,A)

        B_padded = Matrix([[0 for y in range(next_power_of_two(biggest))] 
                            for x in range(next_power_of_two(biggest))], clone_matrix=False)
        B_padded.assign_submatrix(0,0,B)
    
    return strassen_matrix_mult(A_padded, B_padded).submatrix(0, A.num_of_rows, 0, B.num_of_cols)
    

Another approach consists in adding just a row or a column if the number of those is equal to an odd number. This could also lead to a smaller overhead w.r.t previous method since we don't have to allocate such bigger matrix than the original ones. Here below the implementation:


In [14]:
def generalized_even_strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:
    assert A.num_of_cols == B.num_of_rows , "Incompatible matrices"

    if(A.num_of_cols==A.num_of_rows and B.num_of_cols==B.num_of_rows) and (is_power_of_two(A.num_of_rows)):
        return strassen_matrix_mult(A,B)

    if max(A.num_of_rows, B.num_of_cols, A.num_of_cols) < 32:
        return gauss_matrix_mult(A, B)
    
    # next even padding
    A_even_row = A.num_of_rows + A.num_of_rows%2
    A_even_col = A.num_of_cols + A.num_of_cols%2
    B_even_row = B.num_of_rows + B.num_of_rows%2
    B_even_col = B.num_of_cols + B.num_of_cols%2
    
    A_padded = Matrix([[0 for y in range(A_even_col)] for x in range(A_even_row)], clone_matrix=False)
    A_padded.assign_submatrix(0,0,A)
    B_padded = Matrix([[0 for y in range(B_even_col)] for x in range(B_even_row)], clone_matrix=False)
    B_padded.assign_submatrix(0,0,B)
    
    # Recursive step
    A11, A12, A21, A22 = get_matrix_quadrants(A_padded)
    B11, B12, B21, B22 = get_matrix_quadrants(B_padded)

    # First batch of sums has cost Theta(n^2)
    S1 = B12 - B22
    S2 = A11 + A12
    S3 = A21 + A22
    S4 = B21 - B11
    S5 = A11 + A22
    S6 = B11 + B22 
    S7 = A12 - A22 
    S8 = B21 + B22
    S9 = A11 - A21
    S10 = B11 + B12 

    # Recursive calls
    P1 = generalized_even_strassen_matrix_mult(A11, S1)
    P2 = generalized_even_strassen_matrix_mult(S2, B22)
    P3 = generalized_even_strassen_matrix_mult(S3, B11)
    P4 = generalized_even_strassen_matrix_mult(A22, S4)
    P5 = generalized_even_strassen_matrix_mult(S5, S6)
    P6 = generalized_even_strassen_matrix_mult(S7, S8)
    P7 = generalized_even_strassen_matrix_mult(S9, S10)

    # Second batch of sums has cost Theta(n^2)
    C11 = P5 + P4 - P2 + P6
    C12 = P1 + P2
    C21 = P3 + P4
    C22 = P5 + P1 - P3 - P7

    # Build resulting matrix
    result_matrix = Matrix([[0 for y in range(B_even_col)] for x in range(A_even_row)],
                            clone_matrix=False)
    
    # Copying Cij into the resulting matrix
    result_matrix.assign_submatrix(0, 0, C11)
    result_matrix.assign_submatrix(0, result_matrix.num_of_cols//2, C12)
    result_matrix.assign_submatrix(result_matrix.num_of_rows//2, 0, C21)
    result_matrix.assign_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2, C22)
    
    return result_matrix.submatrix(0,A.num_of_rows,0,B.num_of_cols)


Checking if the results are compatible by taking the difference between the two resulting matrices, using both methods implemented before:

In [6]:
n=[59,31,10,38,91,234,387]
m=[37,61,42,35,73,212,315]

for i,j in zip(n,m):
    A = Matrix([[random() for k in range(j)]for l in range(i)])
    B = Matrix([[random() for k in range(i)]for l in range(j)])

    C = gauss_matrix_mult(A,B)
    D = generalized_even_strassen_matrix_mult(A,B)
    E = generalized_pow_strassen_matrix_mult(A,B)
    diff_even = C-D
    diff_pow = C-E
    sum1 = 0
    sum2 = 0

    for z in range(i):
        for y in range(i):
            sum1 = sum1 + diff_even[z][y]
            sum2 = sum2 + diff_pow[z][y]

    print(f'The difference between the two matrices using next power of 2 method is = {sum1}')
    print(f'The difference between the two matrices using next even number method is = {sum2}')

The difference between the two matrices using next power of 2 method is = 0
The difference between the two matrices using next even number method is = 0
The difference between the two matrices using next power of 2 method is = 0
The difference between the two matrices using next even number method is = 0
The difference between the two matrices using next power of 2 method is = 0
The difference between the two matrices using next even number method is = 0
The difference between the two matrices using next power of 2 method is = 0
The difference between the two matrices using next even number method is = 0
The difference between the two matrices using next power of 2 method is = 0
The difference between the two matrices using next even number method is = 0


It's important to notice that for both methods, with padding we are not affecting asymptotic complexity since in the worst case we will enlarge the number of columns or rows up to $N$ which is $<2n$ (for the first method e.g $n=1025$ will become $N=2048$). So knowing that time complexity for Strassen algorithm is $O(n^{log_{2}7})$ we can prove that this actually is left unchanged:
$$ O(N^{log_{2}7}) < O(2n^{log_{2}7}) = O(7n^{log_{2}7}) \in O(n^{log_{2}7}) $$


Finally we can make a comparison between timings using the usual Gauss method, the padding to next even number and to next power of 2:

In [12]:
seed(0)

for i in range(7):
    size = 3**i
    stdout.write(f'{size}') 
    A = Matrix([[random() for x in range(size)] for y in range(size)])
    B = Matrix([[random() for x in range(size)] for y in range(size)])

    for funct in ['gauss_matrix_mult','generalized_even_strassen_matrix_mult','generalized_pow_strassen_matrix_mult']:
        T = timeit(f'{funct}(A,B)', globals=locals(), number=1)
        stdout.write('\t{:.3f}'.format(T))
        stdout.flush()
    stdout.write('\n')

1	0.000	0.000	0.000
3	0.000	0.000	0.000
9	0.000	0.000	0.003
27	0.007	0.009	0.017
81	0.231	0.227	0.804
243	5.182	4.493	5.359
729	166.004	110.730	288.919


Results show that the latter method is strongly affected by the instantiation of a much larger matrix w.r.t the original one, so even if the complexity is the same it is not convenient to use this kind of approach. On the other hand we can observe that the generalized method using padding to the next even number is very efficient and can lead to strong benefits.

3) improve the implementation of the function by reducing the number of auxiliary matrices and test the effects on the execution time;

The idea I followed in order to reduce the number of auxiliary matrices was to store within the original final matrix $C$ the matrices given as result of the partials multiplications. Moreover to enhance memory usage efficiency I opted, instead of assigning to each one of the matrices $P_i$ a different space in memory, to compute them in the same matrix M.
To do so I also used an additional method *add_submatrix* which can be found directly matrix.py file.

In [6]:
def opt_strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:

    # Base case
    if max(A.num_of_rows, B.num_of_cols, A.num_of_cols) < 32:
        return gauss_matrix_mult(A, B)
    
    # Recursive step
    A11, A12, A21, A22 = get_matrix_quadrants(A)
    B11, B12, B21, B22 = get_matrix_quadrants(B)

    # Build resulting matrix
    result_matrix = Matrix([[0 for x in range(B.num_of_cols)] for y in range(A.num_of_rows)],
                            clone_matrix=False)

    # Recursive calls and putting already in the correct position of the result matrix 
    # the matrix obtained by the multiplication
    
    M = strassen_matrix_mult(A11, B12 - B22)
    result_matrix.add_submatrix(0, result_matrix.num_of_cols//2,M)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2,M)

    M = strassen_matrix_mult(A11 + A12, B22)
    result_matrix.add_submatrix(0, 0, -1*M)
    result_matrix.add_submatrix(0, result_matrix.num_of_cols//2, M)

    M = strassen_matrix_mult(A21 + A22, B11)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, 0, M)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2, -1*M)

    M = strassen_matrix_mult(A22, B21 - B11)
    result_matrix.add_submatrix(0, 0, M)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, 0, M)

    M = strassen_matrix_mult(A11 + A22,  B11 + B22)
    result_matrix.add_submatrix(0, 0, M)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2, M)

    M = strassen_matrix_mult(A12 - A22, B21 + B22)
    result_matrix.add_submatrix(0, 0, M)


    M = strassen_matrix_mult(A11 - A21,  B11 + B12)
    result_matrix.add_submatrix(result_matrix.num_of_rows//2, result_matrix.num_of_cols//2, -1*M)
    
    return result_matrix


Here the some testings in order to check result's correctness and effects on execution time:

In [18]:
for i in range(9):
    size = 2**i
    A = Matrix([[random() for k in range(size)]for l in range(size)])
    B = Matrix([[random() for k in range(size)]for l in range(size)])

    C = gauss_matrix_mult(A,B)
    D = opt_strassen_matrix_mult(A,B)
    diff = C-D
    sum = 0
    for x in range(i):
        for y in range(i):
            sum = sum + diff[x][y]

    print(f'The difference between the two matrices using optimized method is = {sum}')

The difference between the two matrices using optimized method is = 0
The difference between the two matrices using optimized method is = 0.0
The difference between the two matrices using optimized method is = 0.0
The difference between the two matrices using optimized method is = 0.0
The difference between the two matrices using optimized method is = 0.0
The difference between the two matrices using optimized method is = 3.552713678800501e-15
The difference between the two matrices using optimized method is = 1.1901590823981678e-13
The difference between the two matrices using optimized method is = -1.4566126083082054e-13
The difference between the two matrices using optimized method is = -5.044853423896711e-13


In [15]:
seed(0)

for i in range(11):
    size = 2**i
    stdout.write(f'{size}') 
    A = Matrix([[random() for x in range(size)] for y in range(size)])
    B = Matrix([[random() for x in range(size)] for y in range(size)])

    for funct in ['gauss_matrix_mult','strassen_matrix_mult','opt_strassen_matrix_mult']:
        T = timeit(f'{funct}(A,B)', globals=locals(), number=1)
        stdout.write('\t{:.3f}'.format(T))
        stdout.flush()
    stdout.write('\n')

1	0.000	0.000	0.000
2	0.000	0.000	0.000
4	0.000	0.000	0.000
8	0.000	0.000	0.000
16	0.003	0.002	0.002
32	0.011	0.016	0.014
64	0.096	0.106	0.123
128	0.766	0.743	0.723
256	6.089	5.409	5.429
512	54.996	43.639	42.039
1024	515.849	277.411	275.665


We can see that with this approach the algorithm actually performs better expecially for larger sizes of the matrix.

4) answer to the following question: how much is the minimum auxiliary
space required to evaluate the Strassen's algorithm? Motivate the answer.

The minimum auxiliary space required to evaluate Strassen's algorithm without any kind of modifications is:

$\begin{aligned}
M(n) &= 7M(\frac{n}{2})+ \Theta(n^2), \space n>1 \\
M(n) &= 1, \space n = 1
\end{aligned}$

This is because at each iteration we need to allocate memory for the 10 partial sums (that follows $\Theta(n^2)$ complexity) and for the 7 matrices in which we store the partial multiplications.
As we can notice the formulation is the same as the one for arithmetic complexity of Strassen's algorithm, so the final result is that the minimum axiliary space required is $O(n^{log_{2}7})$.

The previous result is not valid for the optimized algorithm; since we eliminate instantianion of $P_i$ matrices and use only one of matrix $M$ we'll have:

$\begin{aligned}
M(n) &= M(\frac{n}{2})+ \Theta(n^2), \space n>1 \\
M(n) &= 1, \space n = 1
\end{aligned}$

Following the computations $M(n) = M(n/2) + n^2 = M(n/4) + (n/2)^2 + n^2 = \space ... \space = n^2(1+\frac{1}{4}+\frac{1}{4^2}+...+\frac{1}{4^{log_{2}n}}) <= 2n^2$, so we have that the complexity is $O(n^2)$.