<a href="https://colab.research.google.com/github/AndrewZhang76/gnn_with_spmm/blob/main/inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 10-414/714: Deep Learning Systems - Final Project


## **Sparse Matrix Multiplication on Graph Neural Network**

**By Andrew Zhang, Jinkai Qiu & Yimei Wu**


---

In this project, we are going to implement **sparse matrix** class supported in Needle, **forward and backward pass of sparse matrix multiplication**, and its **application on Graph Neural Network(GNN)**.

In this notebook, we are going to show how to **define sparse matrix**, perform **sparse matrix multiplication** and compare it with normal dense matrix multiplication. In addition, we will also show how it can be used in **GNN training.**



## 1. Clone Required Repo and Install Required Packages

In [1]:
# Code to set up the assignment
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/10714
!mkdir -p final_proj
%cd /content/drive/MyDrive/10714/final_proj
!git clone https://github.com/AndrewZhang76/gnn_with_spmm.git
!pip3 install pybind11

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/10714
/content/drive/MyDrive/10714/final_proj
fatal: destination path 'gnn_with_spmm' already exists and is not an empty directory.


## 2. Build

In [2]:
%cd /content/drive/MyDrive/10714/final_proj/gnn_with_spmm/
!make

/content/drive/MyDrive/10714/final_proj/gnn_with_spmm
-- Found pybind11: /usr/local/lib/python3.10/dist-packages/pybind11/include (found version "2.13.6")
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to

[0m
-- CUDA_FOUND: TRUE
-- Found cuda, building cuda backend
Mon Dec  9 09:09:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.

In [3]:
%set_env PYTHONPATH ./python
%set_env NEEDLE_BACKEND nd

env: PYTHONPATH=./python
env: NEEDLE_BACKEND=nd


## 3. Sparse Matrix Definiation.
In this project, we defined a new way to represent a sparse matrix(a type of matrix that contains a significant number of zero elements compared to the total number of elements in the matrix.) - **COO (Coordinate) format**. \
The COO (Coordinate) format is a representation of a sparse matrix that stores only the nonzero elements along with their row and column indices. It is efficient in terms of memory usage for sparse matrices because it avoids storing zero values.\
### Key Components:


1.   Values (data): A list or array of the nonzero elements in the matrix.
2.   Row indices (row): A list or array specifying the row index for each nonzero element.
3. Column indices (col): A list or array specifying the column index for each nonzero element.

###Properties:
1. Flexible: Allows easy manipulation, such as matrix construction from nonzero entries.
2. Duplicates: COO format allows duplicate entries. To obtain the actual matrix, these duplicates need to be summed.
3. Conversion: Often converted to other formats like CSR (Compressed Sparse Row) or CSC (Compressed Sparse Column) for efficient matrix operations.






#### First, we are going to randomly generate a sparse matrix in normal NDArray format. It is a 10 * 10 matrix with 10 non-zero elements.

In [4]:
%cd /content/drive/MyDrive/10714/final_proj/gnn_with_spmm/python/needle
import numpy as np
from backend_ndarray.ndarray import *

np.random.seed(0)
device = cuda()
# Dimensions of the matrix
rows, cols = 10, 10
nonzero_elements = 10

# Initialize a sparse matrix with all zeros
matrix = np.zeros((rows, cols))

# Randomly generate indices for nonzero elements
row_indices = np.random.choice(rows, nonzero_elements, replace=True)
col_indices = np.random.choice(cols, nonzero_elements, replace=True)

# Generate random values for the nonzero elements
values = np.random.random(nonzero_elements)

# Populate the matrix
for r, c, v in zip(row_indices, col_indices, values):
    matrix[r, c] = v

orig_matrix = NDArray(matrix, device=device)
orig_matrix

/content/drive/MyDrive/10714/final_proj/gnn_with_spmm/python/needle


NDArray([[0.         0.         0.         0.         0.         0.
  0.07103606 0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.7991586  0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.87001216 0.0202184  0.        ]
 [0.         0.46147937 0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.9786183  0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.83261985 0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.   

#### Then, we are going to transform it to a sparse matrix.

In [5]:
sparse_matrix = orig_matrix.to_sparse()
sparse_matrix

SparseMatrix(nnz=8, shape=(10, 10),
  data=[0.07103606 0.7991586  0.87001216 0.0202184  0.46147937 0.9786183
 0.83261985 0.77815676],
  row_indices=[0 2 3 3 4 5 7 9],
  col_indices=[6 8 7 8 1 7 1 6])

As you can see, the sparse matrix contains three length 10 array, `data`, `row_indices` and `col_indices`. The `data` represents the values of the all the non-zero elements inside the matrix, while `row_indices` and `col_indices` represents the row and column index of each non-zero element's index.

We can also switch the sparse matrix back to dense matrix by calling `to_dense()` function.



In [6]:
dense_matrix = sparse_matrix.to_dense()
dense_matrix

NDArray([[0.         0.         0.         0.         0.         0.
  0.07103606 0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.7991586  0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.87001216 0.0202184  0.        ]
 [0.         0.46147937 0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.9786183  0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.83261985 0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.   

## 4. Sparse Matrix Multiplication
### COO Format Matrix Multiplication

Matrix multiplication in the **COO (Coordinate)** format involves multiplying two sparse matrices that are represented in the COO format. Since the COO format only stores nonzero elements along with their row and column indices, performing multiplication requires processing each nonzero element and its corresponding coordinates.

#### Step 1: Represent the Matrices in COO Format
Assume two matrices A and B, both stored in COO format. They are represented by the following components:

- **Matrix A**:
  - `A_data`: Nonzero values in matrix A.
  - `A_row`: Row indices of the nonzero values in A.
  - `A_col`: Column indices of the nonzero values in A.

- **Matrix B**:
  - `B_data`: Nonzero values in matrix B.
  - `B_row`: Row indices of the nonzero values in B.
  - `B_col`: Column indices of the nonzero values in B.

Let matrix A be of size m × n and matrix B be of size n × p. The resulting matrix C will be of size m × p.

#### Step 2: Initialize the Resultant Matrix
The resulting matrix C will also be sparse and initially contain only zeros. In COO format, the result will have:

- `C_data`: Nonzero values of the resulting matrix.
- `C_row`: Row indices of the nonzero values in C.
- `C_col`: Column indices of the nonzero values in C.

#### Step 3: Compute Nonzero Elements of the Result
To calculate C = A × B, follow these steps for each nonzero element of A:

1. **Find the corresponding element in matrix B**: For each nonzero element A_ij in A, find the corresponding column indices of B. The row index in B must match the column index in A to compute the dot product.

2. **Perform the multiplication**: For each pair of nonzero elements A_ij and B_jk, multiply them together and add to the corresponding entry in C:
   
   ```
   C_ik = C_ik + A_ij × B_jk
   ```

3. **Store the result**: If C_ik is nonzero after the above addition, store the result in the COO format:
   - Add the value of C_ik to `C_data`.
   - Add the row index i to `C_row`.
   - Add the column index k to `C_col`.

#### Step 4: Handle Sparse Properties
Since the result matrix C is also sparse, ensure that only nonzero values are stored. If the sum of C_ik is zero, it should not be stored in the COO format.
#### Define two sparse matrices.


In [22]:
m, n, p = 50, 50, 50
device = cuda()
nnz = 100
# Initialize a sparse matrix with all zeros
matrix1 = np.zeros((m, n))
matrix2 = np.zeros((n, p))

# Randomly generate indices for nonzero elements
row_indices_1 = np.random.choice(m, nnz, replace=True)
col_indices_1 = np.random.choice(n, nnz, replace=True)
row_indices_2 = np.random.choice(n, nnz, replace=True)
col_indices_2 = np.random.choice(p, nnz, replace=True)

# Generate random values for the nonzero elements
values_1 = np.random.random(nnz)
values_2 = np.random.random(nnz)

# Populate the matrix
for r, c, v in zip(row_indices_1, col_indices_2, values_1):
    matrix1[r, c] = v
for r, c, v in zip(row_indices_2, col_indices_2, values_2):
    matrix2[r, c] = v

dense_matrix1 = NDArray(matrix1, device=device)
dense_matrix2 = NDArray(matrix2, device=device)
dense_matrix1, dense_matrix2

(NDArray([[0.        0.        0.        ... 0.        0.        0.       ]
  [0.        0.        0.        ... 0.        0.        0.       ]
  [0.        0.        0.        ... 0.        0.5361775 0.       ]
  ...
  [0.        0.        0.        ... 0.        0.        0.       ]
  [0.        0.        0.        ... 0.        0.        0.       ]
  [0.        0.        0.        ... 0.        0.        0.       ]], device=cuda()),
 NDArray([[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]], device=cuda()))

In [23]:
# Convert dense matrices to sparse matrices.
sparse_matrix1 = dense_matrix1.to_sparse()
sparse_matrix2 = dense_matrix2.to_sparse()
sparse_matrix1, sparse_matrix2

(SparseMatrix(nnz=98, shape=(50, 50),
   data=[0.01203622 0.9589827  0.9591666  0.9564057  0.86948854 0.69962204
  0.23262699 0.09961493 0.90398395 0.6455702  0.9894098  0.9292914
  0.6658591  0.7832344  0.49030533 0.59098417 0.51730853 0.39902532
  0.5361775  0.6813925  0.6360611  0.92496693 0.06807408 0.2531912
  0.9040444  0.24141861 0.94530153 0.16053882 0.42403224 0.1154843
  0.45860395 0.75677866 0.23274413 0.251941   0.4012595  0.3267009
  0.8286569  0.9518745  0.54380596 0.8149665  0.43040243 0.34851936
  0.990345   0.12886056 0.15941447 0.04600731 0.61848027 0.16295442
  0.03307459 0.4090541  0.28705153 0.5546878  0.4569114  0.25868407
  0.3567069  0.42879573 0.45722345 0.0163285  0.35536885 0.51001686
  0.01560606 0.69002503 0.45813882 0.9088437  0.72416764 0.2775961
  0.8820414  0.8155238  0.42408898 0.63876176 0.3277204  0.24002028
  0.66250455 0.57754296 0.6144647  0.03330463 0.85772264 0.36054555
  0.0653042  0.39843425 0.98549145 0.62889844 0.1871309  0.5757512
  0.84903

#### Matrix Multiplication between dense matrices.

In [24]:
import time
start = time.time()
correct_result = dense_matrix1 @ dense_matrix2
end = time.time()
print(f"Time taken for dense-dense matrix multipliction: {(end - start)*1000} ms")
correct_result

Time taken for dense-dense matrix multipliction: 0.18453598022460938 ms


NDArray([[0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.10966156 0.         0.         ... 0.         0.49193588 0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.03218639]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]], device=cuda())

#### Matrix Multiplication between sparse matrices.

In [25]:
start = time.time()
result = sparse_matrix1 @ sparse_matrix2
result = result.to_dense()
end = time.time()
print(f"Time taken for sparse-sparse matrix multipliction: {(end - start)*1000} ms")
print(correct_result - result != 0)
print(f"Result Correction: \n{np.allclose(correct_result.numpy(), result.numpy())}")

Time taken for sparse-sparse matrix multipliction: 0.7064342498779297 ms
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Result Correction: 
True


In [20]:
start = time.time()
result = sparse_matrix1 @ dense_matrix2
result = result
end = time.time()
print(f"Time taken for sparse-dense matrix multipliction: {(end - start)*1000} ms")
print(f"Result Correction: \n{np.allclose(correct_result.numpy(), result.numpy())}")

Time taken for sparse-dense matrix multipliction: 0.3457069396972656 ms
Result Correction: 
True


In [21]:
start = time.time()
result = dense_matrix1 @ sparse_matrix2
result = result
end = time.time()
print(f"Time taken for dense-sparse matrix multipliction: {(end - start)*1000} ms")
print(f"Result Correction: \n{np.allclose(correct_result.numpy(), result.numpy())}")

Time taken for dense-sparse matrix multipliction: 0.4584789276123047 ms
Result Correction: 
True


In [13]:

!python -m backend_ndarray.ndarray

Time for dense @ sparse: 0.10132789611816406 ms
Time for sparse @ dense: 0.12445449829101562 ms
Time for sparse @ sparse: 23.067712783813477 ms
Total time: 23.293495178222656 ms
dense @ sparse: True
sparse @ dense: True
sparse @ sparse: True
