## Task1

### The difficulties of the CSR format

CSR-based code for the sparse matrix-vector product (u = Av, where A is the sparse matrix, v is the input vector and u is the output vector) is difficult to optimize on superscalar architectures because of two drawbacks:

First, the access locality of vector v is not maintained because of the indirect addressing.
Second, fine-grained parallelism is not exploited because the number of iterations of the inner loop is small and varies from row to row.
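
Both drawbacks can be seen in a plain CSR loop. The following is only a minimal sketch: the function name `csr_spmv_reference` and the small 3 × 3 matrix are illustrative assumptions, not part of the task.

```python
import numpy as np

def csr_spmv_reference(data, indices, indptr, v):
    """Plain CSR product u = A v (illustrative helper, hypothetical name)."""
    n_rows = len(indptr) - 1
    u = np.zeros(n_rows)
    for i in range(n_rows):
        # The inner loop length indptr[i + 1] - indptr[i] differs from row to row.
        for k in range(indptr[i], indptr[i + 1]):
            # v is read through indices[k]: indirect, non-contiguous access.
            u[i] += data[k] * v[indices[k]]
    return u

# Assumed example matrix: [[1, 0, 2], [0, 0, 3], [4, 5, 6]]
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])
row_lengths = np.diff(indptr)  # inner-loop trip count of each row
u = csr_spmv_reference(data, indices, indptr, np.ones(3))
```

Here the rows have 2, 1 and 3 stored entries, so the inner loop runs a different number of times for each row, and each read of `v` goes through `indices`.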


### How Ellpack and Ellpack-R solve these difficulties

Ellpack format:

This format stores the sparse matrix in two arrays: a float array A[ ] that holds the entries and an integer array J[ ] that holds the column index of every entry. Both arrays have dimension at least N × Max_nzr, where N is the number of rows and Max_nzr is the maximum number of non-zeros per row, taken over all rows. Every row of the compressed arrays A[ ] and J[ ] therefore has the same length, because shorter rows are padded with zeros.
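
As an illustration of this layout, the sketch below builds A and J for a small matrix. The 4 × 4 matrix `M` is an assumed example, and using column index 0 in padded slots is one possible convention, chosen here so that the padding always points at a valid position.

```python
import numpy as np

# Hypothetical 4x4 example matrix (not taken from the task).
M = np.array([
    [1, 0, 2, 0],
    [0, 3, 0, 0],
    [4, 5, 6, 0],
    [0, 0, 0, 7],
])

N = M.shape[0]
max_nzr = int((M != 0).sum(axis=1).max())  # longest row has 3 non-zeros

A = np.zeros((N, max_nzr), dtype=M.dtype)   # values, padded with zeros
J = np.zeros((N, max_nzr), dtype=np.int64)  # column indices, padded with zeros
for i in range(N):
    cols = np.nonzero(M[i])[0]       # columns of the non-zeros in row i
    A[i, :len(cols)] = M[i, cols]
    J[i, :len(cols)] = cols

print(A)
print(J)
```

Every row of `A` and `J` has `max_nzr` slots, so rows with fewer non-zeros carry padding that costs memory and multiplications by zero.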

On a GPU, if every element i of vector u is computed by the thread with index x = i and the arrays are stored in column-major order, ELLPACK-based SpMV can perform better thanks to coalesced global memory access and the absence of synchronization between thread blocks.
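
The role of column-major storage can be checked with a little index arithmetic. The sketch below is plain NumPy rather than GPU code, and the sizes `N = 4` and `max_nzr = 3` are assumptions made for illustration. It lists the flat positions that threads 0 to N-1 would read when they all process their k-th entry at the same time.

```python
import numpy as np

N, max_nzr = 4, 3
# Label each ELLPACK slot (i, k) with a distinct value so its position is traceable.
A = np.arange(N * max_nzr).reshape(N, max_nzr)

flat_col_major = A.flatten(order="F")  # column-major layout
flat_row_major = A.flatten(order="C")  # row-major layout, for comparison

k = 1  # every thread i is at its k-th entry
col_major_addresses = [k * N + i for i in range(N)]        # consecutive positions
row_major_addresses = [i * max_nzr + k for i in range(N)]  # positions max_nzr apart

print(col_major_addresses)
print(row_major_addresses)
```

In column-major order, neighbouring threads touch neighbouring positions, which is the access pattern that the hardware can coalesce. In row-major order the same reads are `max_nzr` positions apart.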


Ellpack-R format:

ELLPACK-R consists of two arrays, A[ ] (float) and J[ ] (integer), of dimension N × Max_nzr, plus an additional integer array rl[ ] of dimension N (the number of rows) that stores the actual length of every row, that is, the number of entries excluding the padded zeros.
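
A minimal sketch of how rl[ ] can bound the work done per row is shown below. The arrays and the function name `ellpack_r_spmv` are hypothetical and serve only as an illustration. The data correspond to the assumed matrix [[1, 0, 2, 0], [0, 3, 0, 0], [4, 5, 6, 0], [0, 0, 0, 7]].

```python
import numpy as np

# Hypothetical ELLPACK-R data for the 4x4 matrix named above.
A = np.array([[1, 2, 0], [3, 0, 0], [4, 5, 6], [7, 0, 0]], dtype=np.float64)
J = np.array([[0, 2, 0], [1, 0, 0], [0, 1, 2], [3, 0, 0]])
rl = np.array([2, 1, 3, 1])  # actual number of entries in each row

def ellpack_r_spmv(A, J, rl, v):
    """u = A v, visiting only the first rl[i] slots of row i (illustrative helper)."""
    u = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        for k in range(rl[i]):  # the padded slots are never touched
            u[i] += A[i, k] * v[J[i, k]]
    return u

v = np.array([1.0, 2.0, 3.0, 4.0])
u = ellpack_r_spmv(A, J, rl, v)
print(u)
```

Compared with plain ELLPACK, the inner loop stops after `rl[i]` entries instead of always running `Max_nzr` times, so no multiplications are spent on padding.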

The ELLR-T algorithms for computing SpMV on GPUs take advantage of: (1) coalesced and aligned global memory access; (2) homogeneous computing within warps; (3) less useless computation and less load imbalance among the threads of a warp; (4) high occupancy.

## Task2

In [1]:
from scipy.sparse.linalg import LinearOperator
import numpy as np

In [2]:
mat = np.array([
    [1, 0, 0, 2, 0, 7],
    [3, 4, 0, 5, 0, 7],
    [6, 0, 7, 8, 9, 4],
    [0, 0, 10, 11, 0, 5],
    [0, 0, 0, 7, 9, 6],
    [7, 6, 4, 0, 5, 7]
])
print(mat)

[[ 1  0  0  2  0  7]
 [ 3  4  0  5  0  7]
 [ 6  0  7  8  9  4]
 [ 0  0 10 11  0  5]
 [ 0  0  0  7  9  6]
 [ 7  6  4  0  5  7]]


In [3]:
from scipy.sparse import csr_matrix
csr_mat = csr_matrix(mat)

In [4]:
print(csr_mat.data)
print(csr_mat.indices)
print(csr_mat.indptr)
print(csr_mat.shape)

[ 1  2  7  3  4  5  7  6  7  8  9  4 10 11  5  7  9  6  7  6  4  5  7]
[0 3 5 0 1 3 5 0 2 3 4 5 2 3 5 3 4 5 0 1 2 4 5]
[ 0  3  7 12 15 18 23]
(6, 6)


In [7]:
import scipy
from scipy.sparse.linalg import LinearOperator
import numpy as np
import numba


class EllpackMatrix:
    
    def __init__(self, mtr):
        # mtr is expected to be a SciPy CSR matrix.
        self.shape = mtr.shape

        data = mtr.data
        indices = mtr.indices
        indptr = mtr.indptr

        # rl[i] is the number of stored entries in row i.
        self.rl = np.diff(indptr)
        n_rows = self.shape[0]
        max_nzr = int(self.rl.max()) if n_rows > 0 else 0

        # A holds the values and J the column indices; rows shorter than
        # max_nzr are padded with zeros.
        self.A = np.zeros((n_rows, max_nzr), dtype=data.dtype)
        self.J = np.zeros((n_rows, max_nzr), dtype=np.int64)

        for i in range(n_rows):
            start, end = indptr[i], indptr[i + 1]
            self.A[i, :self.rl[i]] = data[start:end]
            self.J[i, :self.rl[i]] = indices[start:end]

    def __matmul__(self, other):
        # Delegate to LinearOperator so that input conversion and shape
        # checks are handled by SciPy.
        operator = LinearOperator(self.shape, matvec=self.matvec)
        return operator @ other

    def matvec(self, v):
        result = np.zeros(self.shape[0])

        for row_ind in range(self.shape[0]):
            # Only the first rl[row_ind] slots hold real entries; skip the padding.
            for col_ind in range(self.rl[row_ind]):
                col = self.J[row_ind, col_ind]
                result[row_ind] += self.A[row_ind, col_ind] * v[col]

        return result

In [8]:
my_sparse_mat = EllpackMatrix(csr_mat)
x = [1,1,1,1,1,1]
y = my_sparse_mat @ x
print(y)

[10. 19. 34. 26. 22. 29.]


In [126]:
import numba

@numba.jit(nopython=True, parallel=True)
def csr_matvec(data, indices, indptr, shape, x):
    """Evaluates the matrix-vector product with a CSR matrix."""
    # Get the rows and columns
    
    m, n = shape
    
    y = np.zeros(m, dtype=np.float64)
        
    for row_index in numba.prange(m):
        col_start = indptr[row_index]
        col_end = indptr[row_index + 1]
        for col_index in range(col_start, col_end):
            y[row_index] += data[col_index] * x[indices[col_index]]
            
    return y
    

In [141]:
N = 1000
density = 0.5
matrixformat = 'csr'
csr_mat = scipy.sparse.rand(N, N, density=density, format=matrixformat)
my_sparse_mat = EllpackMatrix(csr_mat)
v = np.random.randn(csr_mat.shape[1])
y_ellr = my_sparse_mat @ v

# Compare with the SciPy sparse matrix multiplication
y_csr = csr_mat @ v
#y_csr2 = csr_matvec(csr_mat.data, csr_mat.indices, csr_mat.indptr, csr_mat.shape, v)

In [142]:
rel_error = np.linalg.norm(y_csr - y_ellr, np.inf) / np.linalg.norm(y_csr, np.inf)
print(f"Error: {round(rel_error, 2)}.")

Error: 0.0.
