Math::SparseMatrix


License: Artistic-2.0

Raku package for sparse matrix algorithms:

  • Implements (some of) the algorithms described (and spelled out in FORTRAN) in the book "Sparse Matrix Technology" by S. Pissanetzky, [SP1].

  • Provides a convenient interface for accessing sparse matrix elements, rows, columns, and sub-matrices.


Motivation

Sparse Matrix Algebra (SMA) is a "must have" for many computational workflows. Here is a (non-exhaustive) list given in the order of my preferences:

  • Recommendation Systems (RS)
    • I make recommenders often during Exploratory Data Analysis (EDA).
    • For me, RS are "first-order regression."
    • I also specialize in the making of RS.
    • I implemented a Raku recommender without SMA, "ML::StreamsBlendingRecommender", [AAp1], but it is too slow for "serious" datasets.
      • Still useful; see [AAv1].
  • Latent Semantic Analysis (LSA)
    • LSA is one of my favorite Unsupervised Machine Learning (ML) workflows.
    • That means that this SMA package should have algorithms facilitating the programming of:
      • Singular Value Decomposition (SVD)
      • Non-Negative Matrix Factorization (NNMF)
  • Graphs
    • There is a natural (representation) connection between sparse matrices and graphs.
    • Many graph algorithms can leverage (fast) SMA.
    • So far (2024-09-25) the algorithms in "Graph", [AAp2], do not use SMA, and that is feature- and speed-limiting.
  • Optimization
    • For large-scale optimization problems, using SMA is a must, since their constraints are given with sparse matrices.
  • Partial Differential Equations (PDE) solving

Usage examples

Here is a simple sparse matrix in Compressed Sparse Row (CSR) format:

use Math::SparseMatrix;
use Math::SparseMatrix::Utilities;

my $nrow = 5;
my $ncol = 8;
my $density = 0.2;
my $tol = 0.01;
my $type = 'CSR';

my $matrix1 = generate-random-sparse-matrix($nrow, $ncol, :$density, :$tol, :$type):!decorated;
say $matrix1;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))

Here it is "pretty printed":

$matrix1.print;
# .    .    .    .    .    .    .    .   
# .    .    .    .    .    .    .    .   
# .    .    0.35 0.85 .    .    0.52 .   
# .    .    .    .    .    0.92 .    .   
# .    .    0.43 .    .    0.07 .    .

Here all of the elements are multiplied by 10:

my $matrix2 = $matrix1.multiply(10); 

$matrix2.print;
# .   .   .   .   .   .   .   .  
# .   .   .   .   .   .   .   .  
# .   .   3.5 8.5 .   .   5.2 .  
# .   .   .   .   .   9.2 .   .  
# .   .   4.3 .   .   0.7 .   .

Here is the dot-product of the original matrix with its transpose:

my $matrix3 = $matrix1.dot($matrix1.transpose); 

$matrix3.print;
# .                   .                   .                   .                   .                  
# .                   .                   .                   .                   .                  
# .                   .                   1.1154              .                   0.1505             
# .                   .                   .                   0.8464              0.06440000000000001
# .                   .                   0.1505              0.06440000000000001 0.18979999999999997
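
Individual elements and rows can also be retrieved with accessor methods. Here is a sketch using the value-at and row-at methods; the zero-based (row, column) signature assumed here should be checked against the package documentation:

say $matrix1.value-at(2, 3);
# 0.85 (assuming zero-based indices)

# Assumption: row-at gives a row as a one-row sparse matrix:
$matrix1.row-at(2).print;
# .    .    0.35 0.85 .    .    0.52 .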

Special features

Here are a few features that other SMA packages typically do not provide.

Named rows and columns

It is very convenient to have named rows and columns that are respected (or preserved) by the typical SMA operations.

Here is an example:

my $smat = Math::SparseMatrix.new($matrix1, row-names => 'a' .. 'e', column-names => 'A' .. 'H'); 

$smat.print;
# –––––––––––––––––––––––––––––––––––––––––––
#     A    B    C    D    E    F    G    H   
# ––┼––––––––––––––––––––––––––––––––––––––––
# a │ .    .    .    .    .    .    .    .   
# b │ .    .    .    .    .    .    .    .   
# c │ .    .    0.35 0.85 .    .    0.52 .   
# d │ .    .    .    .    .    0.92 .    .   
# e │ .    .    0.43 .    .    0.07 .    .

Here is the dot-product of that matrix with its transpose:

my $smat2 = $smat.dot($smat.transpose); 

$smat2.round(0.02).print;
# ––––––––––––––––––––––––––––
#     a    b    c    d    e   
# ––┼–––––––––––––––––––––––––
# a │ .    .    .    .    .   
# b │ .    .    .    .    .   
# c │ .    .    1.12 .    0.16
# d │ .    .    .    0.84 0.06
# e │ .    .    0.16 0.06 0.18
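
Those names can also be used with the index operators mentioned in the Acknowledgements section below. This is a sketch, assuming that a single name selects the corresponding row and that the [; ] form selects a single element:

# Assumption: indexing with a row name gives that row:
$smat['c'].print;

# Assumption: postcircumfix:<[; ]> takes row and column names:
say $smat['c'; 'D'];
# 0.85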

Implicit value

The sparse matrices can have an implicit value that is different from 0. For example, adding a number to a sparse matrix produces another sparse matrix with a different implicit value:

my $matrix4 = $matrix1.add(10);
say $matrix4;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))
say $matrix4.implicit-value;
# 10

Here is the pretty print:

$matrix4.print(:iv);
# 10    10    10    10    10    10    10    10   
# 10    10    10    10    10    10    10    10   
# 10    10    10.35 10.85 10    10    10.52 10   
# 10    10    10    10    10    10.92 10    10   
# 10    10    10.43 10    10    10.07 10    10

Remark: Currently, the implicit values are ignored in dot.


Design

General

  • There should be a "main" class, Math::SpareMatrix that:
    • Provides the SMA functionalities
    • Delegates to concrete sparse matrix classes that are based on different representation formats
    • Can have named rows, columns, and dimensions
    • Gives access to sparse matrix elements, rows, columns, and sub-matrices
  • The default or "main" core sparse matrix class should use Compressed Sparse Row (CSR) format.
  • Also, a class using Dictionary Of Keys (DOK) format should be provided.
  • The core sparse matrix classes do not have named rows, columns, or dimensions.
  • Ideally, a class using NativeCall should be implemented at some point.
    • It looks like this is "a must", since the CSR and DOK classes are fairly slow.
    • Both "plain C" and macOS Accelerate implementations should be made.
  • The most important operation is the Matrix-Vector Dot Product.
    • The current design is to use one-row or one-column matrices for the vectors; see the sketch after this list.
    • Dense vectors are (of course) also supported.
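
Here is a sketch of that design: the "vector" below is a one-column sparse matrix generated in the same way as in the usage examples above, and the density is an arbitrary choice:

# A one-column sparse matrix plays the role of a column vector:
my $vec = generate-random-sparse-matrix(8, 1, density => 0.5, :$tol, :$type):!decorated;

# The matrix-vector dot product is then just a matrix-matrix one:
say $matrix1.dot($vec);
# A 5 x 1 one-column sparse matrix is expected as the result.

# Assumption: a dense vector can be passed as a plain array:
say $matrix1.dot((1 .. 8).Array);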

Object-Oriented Programming (OOP) architecture

Here is a corresponding diagram:

classDiagram
    class Abstract["Math::SparseMatrix::Abstract"] {
        <<abstract>>
        +value-at()
        +row-at()
        +column-at()
        +row-slice()
        +AT-POS()
        +print()
        +transpose()
        +add()
        +multiply()
        +dot()
    }

    class CSRStruct {
        <<C struct>>
    }
    
    class NativeCSR["Math::SparseMatrix::CSR::Native"] {
        $row_ptr
        $col_index
        @values
        nrow
        ncol
        implicit_value
    }

    class NativeAdapter["Math::SparseMatrix::NativeAdapter"] {
        +row-ptr
        +col-index
        +values
        +nrow
        +ncol
        +implicit-value
    }    

    class CSR["Math::SparseMatrix::CSR"] {
        @row-ptr
        @col-index
        @values
        nrow
        ncol
        implicit-value
    }
    
    class DOK["Math::SparseMatrix::DOK"] {
        %adjacency-map
        nrow
        ncol
        implicit-value
    }
    
    class SparseMatrix["Math::SparseMatrix"] {
        Abstract core-matrix
        +AT-POS()
        +print()
        +transpose()
        +add()
        +multiply()
        +dot()
    }
    
    CSR --> Abstract : implements
    DOK --> Abstract : implements
    NativeAdapter --> Abstract : implements
    SparseMatrix --> Abstract : hides the actual component class
    SparseMatrix *--> Abstract
    NativeAdapter *--> NativeCSR
    NativeCSR -- CSRStruct : represents
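
The delegation above means that the "main" class holds a core matrix object and forwards the operations to it. A minimal sketch, assuming core-matrix is a public accessor as shown in the diagram:

# The named-rows object from the usage examples wraps a CSR core:
say $smat.core-matrix.^name;
# Math::SparseMatrix::CSR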

Implementation details

  • Again, the most important operation is Matrix-Vector Dot Product.
    • It has to be as fast as possible.
    • There are two Dot Product implementations for CSR:
      • Direct
      • Symbolic-&-numeric
    • (Currently) the direct one is 20-50% faster.
  • It seems to be a good idea to provide symbolic (or sparse-matrix-element-pattern) methods for some operations.
    • For example:
      • add-pattern / add
      • dot-pattern / dot-numeric
  • It is important to have access methods / operators.
    • All three are used in the accessor implementation: AT-POS, postcircumfix:<[ ]>, and postcircumfix:<[; ]>; see the sketch after this list.
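
Here is a sketch of those accessors over the matrix from the usage examples; the zero-based positional semantics are an assumption:

# Assumption: AT-POS / postcircumfix:<[ ]> gives a row by index:
$matrix1[2].print;
# .    .    0.35 0.85 .    .    0.52 .

# Assumption: postcircumfix:<[; ]> gives a single element:
say $matrix1[2; 3];
# 0.85 (assuming zero-based indices)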

Performance

  • Performance of the CSR and DOK sparse matrices is not good: between 40 and 150 times slower than Wolfram Language.
    • (Using the same matrices, of course.)
  • It is somewhat surprising that DOK is faster than CSR.
    • (Comparing the pure-Raku implementations.)
  • NativeCall-based implementations, [AAp3], are ≈ 100 times faster.
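
A minimal timing sketch for such comparisons (the dimensions and density are arbitrary choices):

my $big = generate-random-sparse-matrix(1000, 1000, density => 0.01, tol => 0.01, type => 'CSR'):!decorated;

# Time the matrix-matrix dot product:
my $start = now;
my $prod = $big.dot($big.transpose);
say "Dot product time: {now - $start} seconds";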

Acknowledgements

Thanks to @lizmat and @tony-o for helping to figure out the proper use of postcircumfix:<[ ]> and postcircumfix:<[; ]> in order to have the named rows and columns functionalities.


References

Books

[SP1] Sergio Pissanetzky, Sparse Matrix Technology, Academic Press, 1984. ISBN-10: 0125575807; ISBN-13: 978-0125575805.

Packages

[AAp1] Anton Antonov, ML::StreamsBlendingRecommender Raku package, (2021-2024), GitHub/antononcube.

[AAp2] Anton Antonov, Graph Raku package, (2024), GitHub/antononcube.

[AAp3] Anton Antonov, Math::SparseMatrix::Native Raku package, (2024), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, "TRC 2022 Implementation of ML algorithms in Raku", (2022), YouTube/antononcube.
