In [None]:
# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<img src="./images/nvmath_head_panel@0.5x.png" alt="nvmath-python" />

# Getting Started with nvmath-python: Direct Sparse Solver

## Overview

This notebook illustrates **nvmath-python**'s *direct sparse solver* (DSS), used for solving large linear systems where most coefficients are zeros. Backed by [NVIDIA's cuDSS library](https://developer.nvidia.com/cudss), the `nvmath.sparse.advanced` module is designed for solving linear equations of the form $ A \cdot X = B $, where:
* $A$, the *left-hand side* (LHS), is a known sparse matrix in [CSR format](https://docs.nvidia.com/nvpl/latest/sparse/storage_format/sparse_matrix.html#compressed-sparse-row-csr). All major types of matrices are supported, such as *general*, *symmetric*, *Hermitian*, *symmetric positive definite* (SPD), and *Hermitian positive definite* (HPD).
* $B$, the *right-hand side* (RHS), is a known dense vector or matrix.
* $X$ is the unknown solution provided by the direct sparse solver.

**Learning Objectives:**
* Understand when and why to use sparse direct solvers for large linear systems
* Download real-world sparse matrices from the [SuiteSparse Matrix Collection](https://sparse.tamu.edu/)
* Apply `nvmath.sparse.advanced.direct_solver` to solve sparse linear systems
* Work with sparse matrices in CSR (Compressed Sparse Row) format using CuPy
* Configure solver options for optimal performance using hybrid CPU-GPU execution
* Validate the accuracy of numerical solutions using residual norms

---
## Introduction

Many real-world problems in scientific computing, engineering, and data science involve solving large systems of linear equations. When these systems have coefficient matrices where most elements are zero (sparse matrices), specialized algorithms become essential for efficient computation. 

A *sparse direct solver* uses factorization techniques (such as LU or Cholesky decomposition) specifically optimized for sparse matrices to compute solutions that are exact up to floating-point round-off. Unlike iterative methods, which refine an approximation until a convergence tolerance is met, direct solvers deliver accurate results in a predictable number of operations, making them ideal for applications requiring high precision.
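To make the factor-then-solve idea concrete, here is a minimal pure-Python sketch of dense LU factorization with partial pivoting. It is an illustration only: the helper names `lu_factor` and `lu_solve` are made up for this example, and the code works on small dense lists rather than on sparse matrices. Sparse direct solvers additionally reorder the matrix to limit fill-in, which this toy version does not attempt.

```python
def lu_factor(a):
    """Factor a square matrix (list of lists) as P*A = L*U with partial pivoting.

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix,
    and the row permutation as a list of original row indices.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b using the factors returned by lu_factor."""
    n = len(lu)
    # Forward substitution: L*y = P*b.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution: U*x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# A small tridiagonal system whose solution is a vector of ones.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
lu, perm = lu_factor(A)          # factor once
x1 = lu_solve(lu, perm, [3.0, 2.0, 3.0])
x2 = lu_solve(lu, perm, [4.0, -1.0, 0.0])  # reuse the factors for a second RHS
print(x1, x2)
```

Factorization is the expensive step. Once the factors exist, each additional right-hand side costs only two triangular solves.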

**nvmath-python**'s sparse solver module (`nvmath.sparse.advanced`) provides GPU-accelerated direct solving capabilities through NVIDIA's [cuDSS library](https://developer.nvidia.com/cudss). The solver supports:
- CSR (Compressed Sparse Row) format for the $A$ matrix representation
- Various matrix types: general, symmetric, Hermitian, SPD, and HPD matrices
- GPU-only and hybrid CPU-GPU execution spaces for optimal performance on different problem sizes
- Interoperability with Python's scientific computing and AI ecosystems (SciPy, CuPy, PyTorch)

This notebook demonstrates how to use **nvmath-python** to solve sparse linear systems using real-world data from the [SuiteSparse Matrix Collection](https://sparse.tamu.edu/), a widely-used repository of sparse matrices from diverse application domains.

**Prerequisites:** To use this notebook, you will need:
- A computer equipped with an NVIDIA GPU
- Basic understanding of linear algebra and matrix operations
- Familiarity with sparse matrix representations (CSR format)
- Understanding of linear system solving concepts


---
## Setup

This notebook requires the following Python libraries:
- `nvmath`: NVIDIA's mathematical library for Python (with sparse solver support)
- `cupy`: For GPU array operations and sparse matrix handling
- `scipy`: For CPU sparse matrix operations and Matrix Market file reading
- `ssgetpy`: For downloading matrices from the SuiteSparse Matrix Collection
- `cuda-pathfinder`: For locating installed NVIDIA shared and header-only libraries

If you completed previous notebooks, the only additional library you will need to install is:

```bash
pip install ssgetpy
```

For detailed installation instructions, please refer to the [nvmath-python documentation](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#install-nvmath-python).


---
## Downloading Sparse Matrices from SuiteSparse

The [SuiteSparse Matrix Collection](https://sparse.tamu.edu/) (formerly known as the University of Florida Sparse Matrix Collection) is a comprehensive repository containing thousands of sparse matrices from real-world applications spanning fields such as:
- Structural engineering and finite element analysis
- Circuit simulation
- Computational fluid dynamics
- Graph theory and social networks
- Optimization problems

The `ssgetpy` library provides a convenient Python interface for browsing and downloading matrices from this collection. Let's start by searching for a suitable sparse matrix:


In [None]:
import ssgetpy

mtx_obj = ssgetpy.search()[2] # Take the third entry (index 2) of the default search results
mtx_obj # Display the matrix object


### Loading the Matrix into Memory

Once we've identified a suitable matrix, we can download it and load it into memory using SciPy's Matrix Market reader. The Matrix Market format (`.mtx` files) is a standard format for storing sparse matrices, and most matrices in the SuiteSparse Collection are available in this format.

The matrix will initially be loaded as a COO (Coordinate) format sparse matrix, which stores the non-zero elements as `(row, column, value)` triplets:


In [None]:
import os
import scipy.io

path, _ = mtx_obj.download(extract=True) # Download selected matrix and unpack it
mtx_file = os.path.join(path, f"{os.path.basename(path)}.mtx") # The archive unpacks to <name>/<name>.mtx
spm_coo = scipy.io.mmread(mtx_file) # Read the unpacked Matrix Market file into COO matrix
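To show what a Matrix Market coordinate file contains, the sketch below parses a tiny hand-written example into COO triplets using only the standard library. The function `read_coordinate_mtx` and the sample text are hypothetical and cover only `real general` coordinate files. `scipy.io.mmread` handles many more variants (symmetric storage, complex and pattern matrices, dense arrays), so this is a teaching aid rather than a replacement.

```python
import io

# A 3x3 matrix with 4 stored entries. Indices in the file are 1-based.
MTX_TEXT = """%%MatrixMarket matrix coordinate real general
% comment lines start with a percent sign
3 3 4
1 1 4.0
2 2 5.0
3 1 -1.0
3 3 6.0
"""


def read_coordinate_mtx(stream):
    """Parse a 'coordinate real general' Matrix Market stream into COO lists."""
    banner = stream.readline().lower().split()
    if banner != ["%%matrixmarket", "matrix", "coordinate", "real", "general"]:
        raise ValueError("only 'coordinate real general' files are supported here")

    # Skip comment and blank lines, then read the size line: rows, columns, entries.
    line = stream.readline()
    while line.startswith("%") or not line.strip():
        line = stream.readline()
    n_rows, n_cols, nnz = (int(tok) for tok in line.split())

    rows, cols, vals = [], [], []
    for _ in range(nnz):
        i, j, v = stream.readline().split()
        rows.append(int(i) - 1)  # convert to 0-based indexing
        cols.append(int(j) - 1)
        vals.append(float(v))
    return (n_rows, n_cols), rows, cols, vals


shape, rows, cols, vals = read_coordinate_mtx(io.StringIO(MTX_TEXT))
for r, c, v in zip(rows, cols, vals):
    print(f"({r}, {c}) -> {v}")
```

Each data line of the file maps directly to one `(row, column, value)` triplet, which is why the reader naturally produces a COO matrix.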

---
## Solving Sparse Linear Systems with nvmath-python

Now that we have our sparse matrix loaded, we can use **nvmath-python**'s direct solver to solve the linear system $ A \cdot x = b $. The process involves:

1. **Converting to CSR format**: The solver expects matrices in CSR (Compressed Sparse Row) format, which is more efficient for matrix-vector operations than COO format.
2. **Transferring to GPU**: We'll use CuPy to create GPU-resident sparse matrices and vectors.
3. **Defining the right-hand side**: For this example, we'll use a simple vector of ones.
4. **Solving the system**: Call `nvmath.sparse.advanced.direct_solver` to compute the solution.
5. **Validating the result**: Check the accuracy by computing the residual $ \|A \cdot x - b\| $.
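Step 1 can be illustrated without a GPU. The following pure-Python sketch converts COO triplets into the three CSR arrays (values, column indices, row pointers) and uses them for a matrix-vector product. The helper names `coo_to_csr` and `csr_matvec` are made up for this example. Unlike library converters, this sketch assumes there are no duplicate entries and does not sort column indices within a row.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (data, indices, indptr)."""
    # Count the entries in each row, then turn the counts into row offsets.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]

    # Scatter each entry into the next free slot of its row.
    next_slot = indptr[:-1]  # slicing makes a copy
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Compute y = A*x, visiting only the stored entries of each row."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


# The matrix [[4, 0, 0], [0, 5, 0], [-1, 0, 6]] given as unordered COO triplets.
rows = [2, 0, 1, 2]
cols = [0, 0, 1, 2]
vals = [-1.0, 4.0, 5.0, 6.0]

data, indices, indptr = coo_to_csr(3, rows, cols, vals)
y = csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0])
print(data, indices, indptr, y)
```

The row-pointer array gives direct access to the entries of any row, which is what makes row-oriented operations cheaper in CSR than in COO.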

Let's implement this workflow:


In [None]:
import nvmath
import cupy as cp
import cupyx as cpx

# Transfer sparse matrix to GPU using CuPy
# CuPy can directly convert from SciPy sparse matrices
a = cpx.scipy.sparse.csr_matrix(spm_coo)

# Create the right-hand side vector (a dense vector of ones, one entry per matrix row)
b = cp.ones(a.shape[0], dtype=a.dtype)

# Solve the linear system A * x = b using nvmath-python's direct solver
x = nvmath.sparse.advanced.direct_solver(a, b)

# Validate the solution by computing the residual norm ||A * x - b||
# A small residual indicates an accurate solution
residual = cp.linalg.norm(a @ x - b)
print("Solution computed successfully!")
print(f"Residual L2 norm ||A*x - b|| = {residual:.2e}")
print(f"Matrix size: {a.shape[0]} × {a.shape[1]}")
print(f"Number of non-zero elements: {a.nnz}")

**Understanding the Residual:**

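The residual norm $ \|A \cdot x - b\| $ measures how well the computed solution $ x $ satisfies the linear system. A residual close to machine precision indicates that the solver found an accurate solution, provided the problem is well-conditioned; for an ill-conditioned matrix, a small residual does not by itself guarantee a small error in $ x $. Because this is an absolute norm, its size also depends on the scale of $ A $ and $ b $; dividing by $ \|b\| $ gives a scale-independent relative residual. For double-precision floating-point arithmetic, relative residuals on the order of $ 10^{-10} $ to $ 10^{-15} $ are typical for well-conditioned problems.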
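The effect of scaling on the residual can be seen in a small pure-Python sketch. It compares the absolute residual $ \|A \cdot x - b\| $ with the relative residual $ \|A \cdot x - b\| / \|b\| $ for a deliberately rounded solution, before and after multiplying the whole system by $ 10^6 $. The helper names are hypothetical, and the 2×2 system is chosen only for illustration.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def norm2(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def residuals(a, x, b):
    """Return (absolute, relative) residual norms for A*x = b."""
    r = [axi - bi for axi, bi in zip(matvec(a, x), b)]
    abs_res = norm2(r)
    return abs_res, abs_res / norm2(b)


a = [[4.0, -1.0],
     [-1.0, 3.0]]
b = [1.0, 2.0]
x_approx = [0.4545, 0.8182]  # the exact solution [5/11, 9/11], rounded to 4 digits

abs_res, rel_res = residuals(a, x_approx, b)

# Scale the whole system: the solution is unchanged, but A and b grow by 1e6.
scale = 1.0e6
a_scaled = [[scale * v for v in row] for row in a]
b_scaled = [scale * v for v in b]
abs_scaled, rel_scaled = residuals(a_scaled, x_approx, b_scaled)

print(f"original: abs = {abs_res:.2e}, rel = {rel_res:.2e}")
print(f"scaled:   abs = {abs_scaled:.2e}, rel = {rel_scaled:.2e}")
```

The absolute residual grows with the scaling factor while the relative residual stays the same. With CuPy arrays, the same quantity could be written as `cp.linalg.norm(a @ x - b) / cp.linalg.norm(b)`.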

**Note on Performance Warning:**

You may see a warning about "`No multithreading interface library was specified`." This indicates that the solver is not using multi-threading for CPU operations during the planning phase. While this doesn't affect the GPU computation performance significantly, it can slow down the initial setup. We'll address this in the next section by configuring hybrid execution.


---
## Configuring Hybrid CPU-GPU Execution

The **nvmath-python** sparse solver supports *hybrid execution*, which intelligently distributes work between the CPU and GPU for optimal performance. During the solving process, certain operations (like symbolic factorization and pivoting) may benefit from CPU execution, while the bulk of numerical computations are performed on the GPU.

To enable hybrid execution with multi-threading support, we need to:
1. **Create a cuDSS handle**: This is a low-level handle to the cuDSS library.
2. **Load a suitable cuDSS multithreading interface library**: We will use a pre-built GNU OpenMP interface library shipped with cuDSS.
3. **Configure solver options**: Pass these settings to the solver via `DirectSolverOptions`.

Let's configure and run the solver with hybrid execution:


In [None]:
from cuda.pathfinder import load_nvidia_dynamic_lib
import os

# Locate the cuDSS library and find the OpenMP multithreading layer
loaded_dl = load_nvidia_dynamic_lib("cudss")
gomp_lib_path = os.path.join(os.path.dirname(loaded_dl.abs_path), "libcudss_mtlayer_gomp.so.0")

# Create a cuDSS handle for low-level library access
h = nvmath.bindings.cudss.create()

# Configure solver options with the handle and multithreading library
o = nvmath.sparse.advanced.DirectSolverOptions(
    handle=h,
    multithreading_lib=gomp_lib_path
)

# Solve the linear system with hybrid CPU-GPU execution
x = nvmath.sparse.advanced.direct_solver(a, b, options=o, execution="hybrid")

# Validate the solution
residual = cp.linalg.norm(a @ x - b)
print("Hybrid execution completed successfully!")
print(f"Residual L2 norm ||A*x - b|| = {residual:.2e}")

# Clean up: destroy the cuDSS handle
nvmath.bindings.cudss.destroy(h)

**Key Observations:**

- With hybrid execution configured, you should no longer see the multithreading warning.
- The residual remains very small, confirming the solution accuracy is maintained.
- Hybrid execution can provide performance benefits, especially during the planning and symbolic factorization phases.
- The `execution` parameter accepts `"cuda"` (GPU-only) or `"hybrid"` (CPU-GPU).

## Exercise: Implement a sparse solver for multiple RHS using nvmath-python

**nvmath-python** supports batched operands in the sparse solver. We distinguish explicit batching, where the samples of a batch are a sequence of matrices (or vectors for the RHS), and implicit batching, where the samples are inferred from 3D or higher-dimensional tensors for the LHS and RHS. The batching for the LHS and RHS can be independent - the LHS can be batched explicitly while the RHS can be batched implicitly and *vice-versa*. Each sample in an explicitly batched system can be of different size (number of equations), resulting in a flexible user interface.

In this exercise, you will implement implicit batching for multiple right-hand sides, represented as a matrix. For simplicity, you will use a randomly generated RHS. To check correctness, you will compute the L2 norm of the residual.
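The exercise asks for a column-major RHS, so it helps to recall what that layout means. The pure-Python sketch below flattens a small matrix with two right-hand sides in both row-major ("C") and column-major ("F") order. The `flatten` helper is hypothetical and exists only to show the memory order.

```python
def flatten(matrix, order):
    """Return the entries of a list-of-lists matrix in memory order.

    order="C": row-major, the entries of each row are adjacent.
    order="F": column-major, the entries of each column are adjacent.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if order == "C":
        return [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]
    if order == "F":
        return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]
    raise ValueError("order must be 'C' or 'F'")


# Three equations, two right-hand sides: column 0 is b0, column 1 is b1.
rhs = [[1, 10],
       [2, 20],
       [3, 30]]

row_major = flatten(rhs, "C")
col_major = flatten(rhs, "F")
print("row-major:   ", row_major)
print("column-major:", col_major)
```

In column-major order, each right-hand side occupies one contiguous block of memory. With CuPy, an existing array can be converted with `cp.asfortranarray`, or created in this layout by passing `order="F"` to constructors such as `cp.empty`.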

### 1. Load matrix object

In [None]:
import ssgetpy

mtx_obj = ssgetpy.search()[2] # Take the third entry (index 2) of the default search results
mtx_obj # Display the matrix object

### 2. Download Matrix Market file as COO matrix

In [None]:
import os
import scipy.io

path, _ = mtx_obj.download(extract=True) # Download selected matrix and unpack it
mtx_file = os.path.join(path, f"{os.path.basename(path)}.mtx") # The archive unpacks to <name>/<name>.mtx
spm_coo = scipy.io.mmread(mtx_file) # Read the unpacked Matrix Market file into COO matrix

### 3. Transfer to GPU and run the solver

In [None]:
import nvmath
import cupy as cp
import cupyx as cpx
from cuda.pathfinder import load_nvidia_dynamic_lib
import os


# Locate the cuDSS library and find the OpenMP multithreading layer
loaded_dl = load_nvidia_dynamic_lib("cudss")
gomp_lib_path = os.path.join(os.path.dirname(loaded_dl.abs_path), "libcudss_mtlayer_gomp.so.0")

# Create a cuDSS handle for low-level library access
h = nvmath.bindings.cudss.create()

# Configure solver options with the handle and multithreading library
o = nvmath.sparse.advanced.DirectSolverOptions(
    handle=h,
    multithreading_lib=gomp_lib_path
)


# Transfer sparse matrix to GPU using CuPy
# CuPy can directly convert from SciPy sparse matrices
# TODO: Convert COO matrix to CSR format (more efficient for the solver)

# Create the right-hand side matrix for implicit batching. Note that the RHS must be a column-major array
# TODO: Create a random RHS matrix with as many rows as the matrix has rows (equations) and one column per right-hand side

# Solve the linear system with GPU-only execution
# TODO: Solve the linear system with GPU-only execution. Do not forget to pass the options to the solver.

# Validate the solution
# TODO: Compute the L2-norm residual

# Clean up: destroy the cuDSS handle
# TODO: Destroy the cuDSS handle

---
## Conclusion

In this notebook, we explored **nvmath-python**'s sparse direct solver capabilities for solving large linear systems with sparse matrices. We demonstrated how to work with real-world sparse matrices from the SuiteSparse Matrix Collection and leverage GPU acceleration for efficient computation.

**Key Takeaways:**

- **Sparse direct solvers** are essential for efficiently solving linear systems where the coefficient matrix is sparse (mostly zeros).
- **nvmath-python** provides a high-level interface to NVIDIA's cuDSS library through `nvmath.sparse.advanced.direct_solver`.
- The solver seamlessly integrates with Python's scientific computing ecosystem, accepting **SciPy**, **CuPy**, and **PyTorch** sparse matrices.
- **CSR (Compressed Sparse Row)** format is the primary sparse matrix format for the solver, offering efficient storage and computation.
- **Hybrid CPU-GPU execution** can optimize performance by distributing work intelligently between CPU and GPU.
- Solution accuracy can be validated by computing the **residual norm** $ \|A \cdot x - b\| $, which should be close to machine precision for well-conditioned problems.
- The solver supports various matrix types including general, symmetric, Hermitian, SPD, and HPD matrices.

**Practical Applications:**

Sparse direct solvers are widely used in:
- Finite element analysis and structural engineering
- Circuit simulation and electronic design automation
- Computational fluid dynamics
- Optimization problems with sparse constraints
- Graph algorithms and network analysis
- Image processing and computer graphics

**Next Steps:**

- Explore other nvmath-python tutorials:
  - [01_kernel_fusion.ipynb](01_kernel_fusion.ipynb) - Learn about kernel fusion optimization
  - [02_mem_exec_spaces.ipynb](02_mem_exec_spaces.ipynb) - Understanding memory and execution spaces
  - [03_stateful_api.ipynb](03_stateful_api.ipynb) - Stateful APIs and autotuning
  - [04_callbacks.ipynb](04_callbacks.ipynb) - FFT callbacks
  - [05_device_api.ipynb](05_device_api.ipynb) - Device APIs
- Try solving different types of sparse systems from the SuiteSparse Collection
- Experiment with different execution modes (`"cuda"`, `"hybrid"`) and measure performance
- Explore advanced solver options for specialized matrix types (symmetric, positive definite, etc.)

---
## References

- NVIDIA nvmath-python documentation, "Sparse Solver", https://docs.nvidia.com/cuda/nvmath-python/latest/. Accessed: November 4, 2025.
- NVIDIA cuDSS library, "Direct Sparse Solver", https://developer.nvidia.com/cudss. Accessed: November 4, 2025.
- Davis, Timothy A., et al., "The SuiteSparse Matrix Collection", ACM Transactions on Mathematical Software, 45(2), 1-25, 2019.
- Saad, Yousef, "Iterative Methods for Sparse Linear Systems", 2nd Edition, SIAM, 2003.
- Davis, Timothy A., "Direct Methods for Sparse Linear Systems", SIAM, 2006.
- Amestoy, Patrick R., et al., "A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling", SIAM Journal on Matrix Analysis and Applications, 23(1), 15-41, 2001.
- Matrix Market format specification, https://math.nist.gov/MatrixMarket/formats.html. Accessed: November 4, 2025.
