# QuantumSim - Computational Improvements

Author: Wouter Pennings<br>
Date: March, 2025

This notebook proposes changes to the modeling approach of QuantumSim.

This notebook focuses on two main problems:
1. Memory limitations
2. Bad performance

Overview of the proposed changes to QuantumSim to lessen these problems:
- Sparse matrices and algorithm implementations
- Native code (JIT compilation)
- Hardware (GPU) acceleration
- Caching/memoization
- Lazy matrix generation
- Some small additional improvements
  - E.g. changing the statevector to a row-based vector.

At the end of the notebook, the output and performance of this version are compared to the current QuantumSim implementation.

The full implementation of QuantumSim with these proposed improvements can be found at [quantumsim_performante.py](quantumsim_performante.py).

The implementations of the proposed changes in this notebook are not necessarily optimal; they serve as a proof of concept that this approach alleviates the problems. The QuantumSim version discussed in this notebook is based on the minimal one found in [QuantumSimIntroduction.ipynb](QuantumSimIntroduction.ipynb).
 
## Inherent quantum computer simulation problem

Each additional qubit doubles the number of amplitudes needed to describe the state of a quantum computer/circuit: $n$ qubits require $2^n$ complex amplitudes. This exponential growth of the state space is what allows quantum computers to outperform classical computers on certain problems. However, it is also the reason why they are difficult to simulate, because each additional qubit doubles the memory and the work required of the simulating computer.
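
To make the doubling concrete, here is a minimal sketch using only NumPy. The helper name `zero_state` is hypothetical and not part of QuantumSim; it builds the state $|0\dots0\rangle$ by repeated Kronecker products and prints the vector length for a few qubit counts.

```python
import numpy as np

def zero_state(n_qubits):
    """Return the |0...0> statevector for n_qubits as a dense complex array."""
    ket0 = np.array([1.0, 0.0], dtype=np.complex128)
    state = np.array([1.0], dtype=np.complex128)
    for _ in range(n_qubits):
        state = np.kron(state, ket0)  # each extra qubit doubles the length
    return state

for n in range(1, 6):
    state = zero_state(n)
    print(n, "qubits ->", state.size, "amplitudes,", state.nbytes, "bytes")
```

The statevector itself only grows as $2^n$; the unitary matrices that act on it grow faster, which is the subject of the next section.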

## Memory limitations

On a regular 16 GB system there is currently a ceiling of 14 qubits. The unitary matrices (operations) have a space complexity of $O(4^n)$, where $n$ is the number of qubits, meaning that every additional qubit quadruples the size of the unitary matrices.

Number of elements in a unitary matrix of a 14-qubit circuit:

$(2^{14})^2 = 268,435,456\ elements$

Every element uses 128 bits (16 bytes), as it is a complex number whose real and imaginary parts are both 64-bit floats.

$268,435,456 * 16 = 4,294,967,296\ bytes$

Repeating the calculation for a 15-qubit system shows that a single unitary matrix already takes up all the memory of a 16 GB system, leaving nothing for the statevector, the operating system or anything else. These are the elements and bytes for a 15-qubit system:

$(2^{15})^2 = 1,073,741,824\ elements$

$1,073,741,824 * 16 = 17,179,869,184\ bytes$
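
The arithmetic above can be reproduced for any qubit count with a few lines of Python. This is a minimal sketch; the function name `dense_unitary_bytes` is hypothetical and only illustrates the $O(4^n)$ growth of a dense unitary, assuming 16 bytes per complex element.

```python
def dense_unitary_bytes(n_qubits, bytes_per_element=16):
    """Memory needed for one dense 2^n x 2^n matrix of complex128 elements."""
    dim = 2 ** n_qubits
    return dim * dim * bytes_per_element

for n in (10, 12, 14, 15, 16):
    size = dense_unitary_bytes(n)
    print(f"{n:2d} qubits: {size:>18,d} bytes = {size / 2**30:8.2f} GiB")
```

For 14 qubits this gives 4 GiB and for 15 qubits 16 GiB, matching the numbers above.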

In [3]:
import numpy as np
import math
import cmath
import time
import scipy.sparse as sparse
import cupy
import cupyx.scipy.sparse as cupysparse
from numba import njit

In [None]:
@njit
def coo_spmv_row(rowIdx, colIdx, values, v):
    """
    Performs sparse matrix-vector (row based) multiplication using COO format.

    Parameters:
    - rowIdx (numpy array of int): Row indices of nonzero elements.
    - colIdx (numpy array of int): Column indices of nonzero elements.
    - values (numpy array): Nonzero values of the matrix.
    - v (numpy array): Dense vector for multiplication.

    Returns:
    - numpy array: Result vector y = A * v (A is assumed to be square, so y has the length of v)
    """
    out = np.zeros(len(v), dtype=values.dtype)  # Initialize output vector
    nnz = len(values)  # Number of nonzero elements

    for i in range(nnz):  # Iterate over nonzero elements
        out[rowIdx[i]] += values[i] * v[colIdx[i]]

    return out
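
As a sanity check of the accumulation loop, the sketch below repeats it in plain Python (without Numba) and compares the result to a dense matrix-vector product. The name `coo_spmv_reference` and the test matrix are assumptions made for illustration; only NumPy is used.

```python
import numpy as np

def coo_spmv_reference(row_idx, col_idx, values, v):
    """Plain-Python version of the COO loop (no Numba), for checking only."""
    out = np.zeros(len(v), dtype=np.result_type(values, v))
    for i in range(len(values)):
        out[row_idx[i]] += values[i] * v[col_idx[i]]
    return out

# Hadamard gate as a small dense building block
H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
# Two-qubit operator H (x) I: only 8 of its 16 entries are nonzero
A = np.kron(H, np.eye(2, dtype=np.complex128))

row_idx, col_idx = np.nonzero(A)   # COO coordinates of the nonzero entries
values = A[row_idx, col_idx]       # matching nonzero values

v = np.array([1, 0, 0, 0], dtype=np.complex128)  # |00>
result = coo_spmv_reference(row_idx, col_idx, values, v)

# The sparse loop must agree with the dense product
assert np.allclose(result, A @ v)
print(result)
```

The loop touches only the nonzero entries, so its cost scales with the number of nonzeros rather than with the full $4^n$ elements of the matrix.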

In [None]:
def coo_kron(A, B):
    """Kronecker product of two scipy.sparse COO matrices, returned in COO format."""
    output_shape = (A.shape[0] * B.shape[0], A.shape[1] * B.shape[1])

    if A.nnz == 0 or B.nnz == 0:
        # Kronecker product is the zero matrix
        return sparse.coo_matrix(output_shape)

    # Expand entries of A into blocks
    # int32 indices cover up to 31 qubits; switch to int64 beyond that
    row = np.asarray(A.row, dtype=np.int32).repeat(B.nnz)
    col = np.asarray(A.col, dtype=np.int32).repeat(B.nnz)
    data = A.data.repeat(B.nnz)

    row *= B.shape[0]
    col *= B.shape[1]

    # Increment block indices
    row = row.reshape(-1, B.nnz)
    row += B.row
    row = row.reshape(-1)

    col = col.reshape(-1, B.nnz)
    col += B.col
    col = col.reshape(-1)

    # Compute block entries
    data = data.reshape(-1, B.nnz) * B.data
    data = data.reshape(-1)

    return sparse.coo_matrix((data, (row, col)), shape=output_shape)
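
The index arithmetic behind this Kronecker product can be shown on a tiny example. The sketch below uses only NumPy and two Pauli matrices chosen for illustration. It pairs every nonzero of `A` with every nonzero of `B` through broadcasting, which is equivalent to the repeat/reshape steps above, and checks the result against `np.kron`.

```python
import numpy as np

A = np.array([[0, 1], [1, 0]], dtype=np.complex128)   # Pauli-X
B = np.array([[1, 0], [0, -1]], dtype=np.complex128)  # Pauli-Z

# COO triplets (row, col, value) of both inputs
a_row, a_col = np.nonzero(A)
b_row, b_col = np.nonzero(B)
a_val, b_val = A[a_row, a_col], B[b_row, b_col]

# Entry (i, j) of A and entry (k, l) of B land at
# row i * B.shape[0] + k and column j * B.shape[1] + l
row = (a_row[:, None] * B.shape[0] + b_row[None, :]).ravel()
col = (a_col[:, None] * B.shape[1] + b_col[None, :]).ravel()
data = (a_val[:, None] * b_val[None, :]).ravel()

# Scatter the triplets into a dense matrix to compare with np.kron
K = np.zeros((A.shape[0] * B.shape[0], A.shape[1] * B.shape[1]), dtype=np.complex128)
K[row, col] = data

assert np.array_equal(K, np.kron(A, B))
```

The result has `A.nnz * B.nnz` nonzeros (here 4), so the sparse product never materializes the zero entries of the full matrix.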

In [None]:
def coo_kron_gpu(A: cupysparse.coo_matrix, B: cupysparse.coo_matrix):
    """Kronecker product of two CuPy COO matrices on the GPU, returned in COO format."""
    out_shape = (A.shape[0] * B.shape[0], A.shape[1] * B.shape[1])

    if A.nnz == 0 or B.nnz == 0:
        # Kronecker product is the zero matrix
        return cupysparse.coo_matrix(out_shape)

    # Expand entries of A into blocks
    row = A.row.astype(cupy.int32, copy=True) * B.shape[0]
    row = row.repeat(B.nnz)
    col = A.col.astype(cupy.int32, copy=True) * B.shape[1]
    col = col.repeat(B.nnz)
    data = A.data.repeat(B.nnz)

    # Increment block indices
    row = row.reshape(-1, B.nnz)
    row += B.row
    row = row.ravel()

    col = col.reshape(-1, B.nnz)
    col += B.col
    col = col.ravel()

    # Compute block entries
    data = data.reshape(-1, B.nnz) * B.data
    data = data.ravel()

    # The result is already in COO format, so no format conversion is needed
    return cupysparse.coo_matrix((data, (row, col)), shape=out_shape)