sparseconverter

Format detection, identifiers and converter matrix for a range of numerical array formats (backends) in Python, focusing on sparse arrays.

Usage

Basic usage:

import numpy as np
import sparseconverter as spc

a1 = np.array([
    (1, 0, 3),
    (0, 0, 6)
])

# array conversion
a2 = spc.for_backend(a1, spc.SPARSE_GCXS)

# format determination
print("a1 is", spc.get_backend(a1), "and a2 is", spc.get_backend(a2))

a1 is numpy and a2 is sparse.GCXS

See the examples/ directory for more.

Description

This library helps to implement algorithms that support a wide range of array formats as input, output or for internal calculations. All dense and sparse array libraries already support format detection, creation and export from and to various formats, but with different APIs, different sets of formats and different sets of supported features, such as dtypes, shapes and device classes.

This project creates a unified API for all conversions between the supported formats and takes care of details such as reshaping, dtype conversion, and using an efficient intermediate format for multi-step conversions.
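The idea of routing a multi-step conversion through an intermediate format can be illustrated with a minimal, pure-Python sketch. It is a conceptual toy and not the implementation used by sparseconverter: the format names, the helper functions and `lookup_converter()` are hypothetical, and the choice of a COO-like layout as the intermediate format is an assumption made for illustration.

```python
# Conceptual sketch only: all names below are hypothetical and are not
# part of the sparseconverter API.

DENSE, COO, DOK = "dense", "coo", "dok"
INTERMEDIATE = COO  # assumed hub format for multi-step conversions


def dense_to_coo(rows):
    """Nested lists -> (shape, [(row, col, value), ...]), skipping zeros."""
    shape = (len(rows), len(rows[0]) if rows else 0)
    entries = [
        (r, c, v)
        for r, row in enumerate(rows)
        for c, v in enumerate(row)
        if v != 0
    ]
    return shape, entries


def coo_to_dense(coo):
    shape, entries = coo
    rows = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, v in entries:
        rows[r][c] += v  # duplicate coordinates are summed
    return rows


def coo_to_dok(coo):
    shape, entries = coo
    data = {}
    for r, c, v in entries:
        data[(r, c)] = data.get((r, c), 0) + v
    return shape, data


def dok_to_coo(dok):
    shape, data = dok
    return shape, [(r, c, v) for (r, c), v in sorted(data.items())]


# Only some format pairs have a direct converter.
DIRECT = {
    (DENSE, COO): dense_to_coo,
    (COO, DENSE): coo_to_dense,
    (COO, DOK): coo_to_dok,
    (DOK, COO): dok_to_coo,
}


def lookup_converter(source, target):
    """Return a callable that converts from source to target."""
    if source == target:
        return lambda arr: arr
    if (source, target) in DIRECT:
        return DIRECT[(source, target)]
    # No direct path: chain two converters through the intermediate format.
    first = DIRECT[(source, INTERMEDIATE)]
    second = DIRECT[(INTERMEDIATE, target)]
    return lambda arr: second(first(arr))


a1 = [[1, 0, 3], [0, 0, 6]]
a2 = lookup_converter(DENSE, DOK)(a1)  # dense -> coo -> dok
print(a2)
```

In this sketch the caller only ever asks for a source and a target; whether the conversion is direct or takes two steps stays hidden behind the lookup.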

Features

Supports Python 3.7 - (at least) 3.12
Defines constants for format identifiers
Various sets to group formats into categories:
- Dense vs sparse
- CPU vs CuPy-based
- nD vs 2D backends
Efficiently detect format of arrays, including support for subclasses
Get converter function for a pair of formats
Convert to a target format
Find most efficient conversion pair for a range of possible inputs and/or outputs

That way it can help to implement format-specific optimized versions of an algorithm, to specify which formats are supported by a specific routine, to adapt to availability of CuPy on a target machine, and to perform efficient conversion to supported formats as needed.
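As a rough illustration of how a routine could declare its supported formats and then pick the cheapest conversion, the following pure-Python sketch selects the lowest-cost source/target pair from a cost table. The cost values, the format labels and `pick_cheapest_pair()` are hypothetical and chosen for illustration; they are not taken from sparseconverter.

```python
import itertools

# Hypothetical relative conversion costs between three formats.
# These numbers are made up for illustration.
COST = {
    ("numpy", "numpy"): 0.0,
    ("numpy", "scipy_csr"): 2.0,
    ("numpy", "sparse_coo"): 3.0,
    ("scipy_csr", "numpy"): 1.5,
    ("scipy_csr", "scipy_csr"): 0.0,
    ("scipy_csr", "sparse_coo"): 1.0,
    ("sparse_coo", "numpy"): 1.5,
    ("sparse_coo", "scipy_csr"): 1.2,
    ("sparse_coo", "sparse_coo"): 0.0,
}


def pick_cheapest_pair(sources, targets):
    """Return the (source, target) pair with the lowest cost.

    Ties are resolved in favour of the pair that comes first when
    iterating over sources, then targets.
    """
    return min(itertools.product(sources, targets), key=lambda pair: COST[pair])


# A data source that can deliver two formats, and a routine that accepts one.
available = ("numpy", "sparse_coo")
supported = ("scipy_csr",)
best = pick_cheapest_pair(available, supported)
print(best)

# If a format is both available and supported, no conversion is needed.
no_op = pick_cheapest_pair(("numpy", "scipy_csr"), ("scipy_csr", "sparse_coo"))
print(no_op)
```

With the made-up costs above, the first call selects the conversion from `sparse_coo` to `scipy_csr`, and the second call selects the zero-cost pair where source and target are identical.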

Supported array formats

numpy.ndarray
numpy.matrix -- to support the results of aggregation operations on scipy.sparse matrices
cupy.ndarray
sparse.COO
sparse.GCXS
sparse.DOK
scipy.sparse.coo_matrix
scipy.sparse.csr_matrix
scipy.sparse.csc_matrix
cupyx.scipy.sparse.coo_matrix
cupyx.scipy.sparse.csr_matrix
cupyx.scipy.sparse.csc_matrix

Still TODO

PyTorch arrays
SciPy sparse arrays as opposed to SciPy sparse matrices.
More detailed cost metric based on more real-world use cases and parameters.

Changelog

0.4.0 (in development)

Better error message in case of unknown array type: #37

0.3.4

Support for Python 3.12 #26
Packaging update: Tests for conda-forge #27

0.3.3

Perform feature checks lazily #15

0.3.2

Detection and workaround for pydata/sparse#602.
Detection and workaround for cupy/cupy#7713.
Test with duplicates and scrambled indices.
Test correctness of basic array operations.

0.3.1

Include version constraint for sparse.

0.3.0

Introduce conversion_cost() to obtain a value roughly proportional to the conversion cost between two backends.

0.2.0

Introduce result_type() to find the smallest NumPy dtype that accommodates all parameters. Parameters may be any valid argument to numpy.result_type(...) as well as backend specifiers.
Support cupyx.scipy.sparse.csr_matrix with dtype=bool.

0.1.1

Initial release

Known issues

conda install -c conda-forge cupy on Python 3.7 and Windows 11 may install cudatoolkit 10.1 and cupy 8.3, a combination that has sporadically produced invalid data structures for cupyx.scipy.sparse.csc_matrix for unknown reasons. This doesn't happen with current versions. Running the benchmark function benchmark_conversions() can help to debug such issues, since it performs all pairwise conversions and checks them for correctness.

Notes

This project is developed primarily for sparse data support in LiberTEM. For that reason it includes the backend CUDA, which denotes a NumPy array that is intended for processing on a CUDA device.
