# Python interface to the SuiteSparse Matrix Collection

This notebook walks through some features of `ssgetpy`, a package that provides a search and download interface for the [SuiteSparse](https://suitesparse.com) Matrix Collection.

The simplest way to install `ssgetpy` is via:
```
pip install ssgetpy
```

This installs both the `ssgetpy` Python module and the `ssgetpy` command-line script.


This notebook covers only the library interface of `ssgetpy`. For more information on the command-line script, run:
```
$ ssgetpy --help
```

Before proceeding with the rest of this notebook, install `ssgetpy` into your environment. If you are running this notebook on Binder, `ssgetpy` is already installed. If you are running it in Google Colaboratory, the following cell installs `ssgetpy`:

In [1]:
ipy = get_ipython()
if 'google.colab' in str(ipy):
  import sys
  ipy.run_cell('!{sys.executable} -m pip install ssgetpy')

First, import `ssgetpy`:

In [2]:
import ssgetpy

## Basic query interface

The primary interface to `ssgetpy` is `ssgetpy.search`. Calling `search` without any arguments returns the first 10 matrices in the collection:

In [3]:
ssgetpy.search()

| Id | Group | Name | Rows | Cols | NNZ | DType | 2D/3D Discretization? | SPD? | Pattern Symmetry | Numerical Symmetry | Kind |
|----|-------|------|------|------|-----|-------|-----------------------|------|------------------|--------------------|------|
| 1 | HB | 1138_bus | 1138 | 1138 | 4054 | real | No | Yes | 1.0 | 1.0 | power network problem |
| 2 | HB | 494_bus | 494 | 494 | 1666 | real | No | Yes | 1.0 | 1.0 | power network problem |
| 3 | HB | 662_bus | 662 | 662 | 2474 | real | No | Yes | 1.0 | 1.0 | power network problem |
| 4 | HB | 685_bus | 685 | 685 | 3249 | real | No | Yes | 1.0 | 1.0 | power network problem |
| 5 | HB | abb313 | 313 | 176 | 1557 | binary | No | No | 0.0 | 0.0 | least squares problem |
| 6 | HB | arc130 | 130 | 130 | 1037 | real | Yes | No | 0.76 | 0.0 | materials problem |
| 7 | HB | ash219 | 219 | 85 | 438 | binary | No | No | 0.0 | 0.0 | least squares problem |
| 8 | HB | ash292 | 292 | 292 | 2208 | binary | No | No | 1.0 | 1.0 | least squares problem |
| 9 | HB | ash331 | 331 | 104 | 662 | binary | No | No | 0.0 | 0.0 | least squares problem |
| 10 | HB | ash608 | 608 | 188 | 1216 | binary | No | No | 0.0 | 0.0 | least squares problem |


Notice that the search result has minimal Jupyter integration: it is rendered as a table of metadata together with a spy plot showing where the nonzeros are located. Click a group or name link to open the corresponding page of the SuiteSparse Matrix Collection website, which has much more information about that group or matrix.
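Jupyter renders any object that defines a `_repr_html_` method as HTML; this is IPython's standard rich-display hook. The sketch below shows how a result row with group and name links could be rendered that way. The `MatrixRow` class and its `base_url` placeholder are hypothetical illustrations, not `ssgetpy`'s actual implementation or the collection's real URL scheme.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class MatrixRow:
    """Hypothetical record mirroring a few columns of the search result table."""
    id: int
    group: str
    name: str
    rows: int
    cols: int
    nnz: int
    # Placeholder link prefix (an assumption); substitute the collection's real base URL.
    base_url: str = "https://example.org/collection"

    def _repr_html_(self) -> str:
        # Jupyter calls this method automatically when the object is displayed.
        base, group, name = escape(self.base_url), escape(self.group), escape(self.name)
        group_link = f'<a href="{base}/{group}">{group}</a>'
        name_link = f'<a href="{base}/{group}/{name}">{name}</a>'
        cells = [str(self.id), group_link, name_link,
                 str(self.rows), str(self.cols), str(self.nnz)]
        return "<table><tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr></table>"


row = MatrixRow(1, "HB", "1138_bus", 1138, 1138, 4054)
print(row._repr_html_())
```

In a notebook, evaluating `row` on its own in a cell would display the HTML table instead of the plain `repr`.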

### Query filters

You can add more filters via keyword arguments as follows:

|Argument | Description | Type | Default | Notes |
|---------|-------------|------|---------|-------| 
|`rowbounds` | Number of rows | `tuple`: `(min_value, max_value)` | `(None, None)`| `min_value` or `max_value` can be `None` which implies "don't care" |
|`colbounds` | Number of columns | `tuple`: `(min_value, max_value)` | `(None, None)` | |
|`nzbounds`  | Number of non-zeros | `tuple`: `(min_value, max_value)` | `(None, None)`| |
|`isspd`     | SPD? | `bool` or `None` | `None` | `None` implies "don't care" |
|`is2d3d` | 2D/3D Discretization? | `bool` or `None` | `None` | |
| `dtype` | Non-zero data type | `str` (`'real'`, `'complex'` or `'binary'`) or `None` | `None` | `None` implies "don't care" |
| `group` | Matrix group | `str` or `None` | `None` | Supports partial matches; `None` implies "don't care" |
| `kind` | Problem domain | `str` or `None` | `None` | Supports partial matches; `None` implies "don't care" |
| `limit` | Max number of results | `int` | 10 | |

> Note that numerical and pattern symmetry filters are not yet supported.
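The `(min_value, max_value)` convention in the table can be mimicked locally to check what a query should return. In this sketch the bounds are assumed to be inclusive, matching the $1000 \leq \text{NNZ} \leq 10000$ example that follows; `within`, `local_filter` and `Record` are hypothetical helpers, since `ssgetpy` applies its filters to its own copy of the collection index rather than to a Python list.

```python
from typing import List, NamedTuple, Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]


class Record(NamedTuple):
    id: int
    nnz: int
    isspd: bool


def within(value: int, bounds: Bounds) -> bool:
    """Inclusive range check; a None endpoint means "don't care"."""
    lo, hi = bounds
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def local_filter(records: List[Record], nzbounds: Bounds = (None, None),
                 isspd: Optional[bool] = None, limit: int = 10) -> List[Record]:
    """Apply nzbounds/isspd/limit to an in-memory list (hypothetical helper)."""
    hits = [r for r in records
            if within(r.nnz, nzbounds) and (isspd is None or r.isspd == isspd)]
    return hits[:limit]


# Id, NNZ and SPD? values copied from the first ten results shown above.
first_ten = [
    Record(1, 4054, True), Record(2, 1666, True), Record(3, 2474, True),
    Record(4, 3249, True), Record(5, 1557, False), Record(6, 1037, False),
    Record(7, 438, False), Record(8, 2208, False), Record(9, 662, False),
    Record(10, 1216, False),
]
selected = [r.id for r in local_filter(first_ten, nzbounds=(1000, 10000),
                                       isspd=False, limit=5)]
print(selected)  # [5, 6, 8, 10]
```

Only four of the first ten matrices pass these filters, so a real query with `limit=5` has to look further into the collection for its fifth match.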

As an example of using these filters, the following query returns five non-SPD matrices with $1000 \leq \text{NNZ} \leq 10000$:

In [4]:
ssgetpy.search(nzbounds=(1000,10000), isspd=False, limit=5)

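| Id | Group | Name | Rows | Cols | NNZ | DType | 2D/3D Discretization? | SPD? | Pattern Symmetry | Numerical Symmetry | Kind |
|----|-------|------|------|------|-----|-------|-----------------------|------|------------------|--------------------|------|
| 5 | HB | abb313 | 313 | 176 | 1557 | binary | No | No | 0.0 | 0.0 | least squares problem |
| 6 | HB | arc130 | 130 | 130 | 1037 | real | Yes | No | 0.76 | 0.0 | materials problem |
| 8 | HB | ash292 | 292 | 292 | 2208 | binary | No | No | 1.0 | 1.0 | least squares problem |
| 10 | HB | ash608 | 608 | 188 | 1216 | binary | No | No | 0.0 | 0.0 | least squares problem |
| 12 | HB | ash958 | 958 | 292 | 1916 | binary | No | No | 0.0 | 0.0 | least squares problem |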


## Working with search results
The result of a search query is a collection of `Matrix` objects. The collection can be sliced using the same syntax as for vanilla Python `list`s as shown below:

In [5]:
result = ssgetpy.search(kind='structural', nzbounds=(1000,10000))
result[:4]

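| Id | Group | Name | Rows | Cols | NNZ | DType | 2D/3D Discretization? | SPD? | Pattern Symmetry | Numerical Symmetry | Kind |
|----|-------|------|------|------|-----|-------|-----------------------|------|------------------|--------------------|------|
| 24 | HB | bcsstk02 | 66 | 66 | 4356 | real | Yes | Yes | 1.0 | 1.0 | structural problem |
| 26 | HB | bcsstk04 | 132 | 132 | 3648 | real | Yes | Yes | 1.0 | 1.0 | structural problem |
| 27 | HB | bcsstk05 | 153 | 153 | 2423 | real | Yes | Yes | 1.0 | 1.0 | structural problem |
| 28 | HB | bcsstk06 | 420 | 420 | 7860 | real | Yes | Yes | 1.0 | 1.0 | structural problem |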


Individual elements of the collection can be accessed by indexing:

In [6]:
small_matrix = result[0]
small_matrix

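| Id | Group | Name | Rows | Cols | NNZ | DType | 2D/3D Discretization? | SPD? | Pattern Symmetry | Numerical Symmetry | Kind |
|----|-------|------|------|------|-----|-------|-----------------------|------|------------------|--------------------|------|
| 24 | HB | bcsstk02 | 66 | 66 | 4356 | real | Yes | Yes | 1.0 | 1.0 | structural problem |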


In [7]:
small_matrix.nnz

4356
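The `Rows`, `Cols` and `NNZ` metadata already support quick sanity checks. The density $\text{NNZ}/(\text{Rows}\times\text{Cols})$ of `bcsstk02` is $4356/(66\times 66) = 1$, so every entry of this matrix is counted as a nonzero. Below is a minimal sketch with a hypothetical `density` helper, fed with the numbers from the results table; on a `Matrix` object you would presumably pass its row, column and nonzero counts, although only the `nnz` attribute is confirmed by this notebook.

```python
def density(rows: int, cols: int, nnz: int) -> float:
    """Fraction of entries that are stored nonzeros (hypothetical helper)."""
    return nnz / (rows * cols)


print(density(66, 66, 4356))    # bcsstk02 -> 1.0
print(density(132, 132, 3648))  # bcsstk04, roughly 0.21
```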

We can download a matrix locally using the `download` method:

In [8]:
small_matrix.download()

('C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02.tar.gz',
 'C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02.tar.gz')

The `download` method supports the following arguments:

|Argument| Description | Data type | Default value | Notes|
|--------|-------------|-----------|---------------|------|
|`format`| Sparse matrix storage format | One of (`'MM', 'RB', 'MAT'`) | `MM` | `MM` is Matrix Market, `RB` is Rutherford-Boeing and `MAT` is MATLAB MAT-file format|
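|`destpath` | Destination directory | `str` | `~/.ssgetpy` on Unix, `%APPDATA%\ssgetpy` on Windows | With the default destination, the matrix is saved as `os.path.join(destpath, format, group_name, matrix_name + extension)`, where `extension` is `.tar.gz` for `MM` and `RB` and `.mat` for `MAT`. With an explicit `destpath`, the last cell of this notebook shows the file saved directly as `os.path.join(destpath, matrix_name + extension)` |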
|`extract` | Extract TGZ archive? | `bool` | `False` | Only applicable to `MM` and `RB` formats |
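The default file layout from the table can be reconstructed with `os.path.join`; it matches the path printed by the first `download()` call above (`...\ssgetpy\MM\HB\bcsstk02.tar.gz`). This sketch covers only the default-directory case: `default_local_path` is a hypothetical helper and `root` stands in for the default download directory.

```python
import os

# File extensions per storage format, as listed in the table above.
EXTENSIONS = {"MM": ".tar.gz", "RB": ".tar.gz", "MAT": ".mat"}


def default_local_path(root: str, fmt: str, group: str, name: str) -> str:
    """Reconstruct where a matrix lands under the default download directory (sketch)."""
    if fmt not in EXTENSIONS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(EXTENSIONS)}")
    return os.path.join(root, fmt, group, name + EXTENSIONS[fmt])


print(default_local_path(os.path.expanduser("~/.ssgetpy"), "MM", "HB", "bcsstk02"))
```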

The return value is a two-element `tuple`: the first element is the local path of the matrix (the extracted path when `extract=True`, otherwise the downloaded file itself) and the second is the path of the downloaded file.

Note that `download` does not download the file again if it already exists at the destination:

In [9]:
small_matrix.download()

('C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02.tar.gz',
 'C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02.tar.gz')
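Reusing an existing file amounts to an existence check before fetching. A minimal sketch, where `fetch_if_missing` and `fake_fetch` are hypothetical names and the fetch step is a stub rather than a real HTTP download. Note that a pure existence check like this one would also reuse a partially written file.

```python
import os
import tempfile
from typing import Callable


def fetch_if_missing(path: str, fetch: Callable[[str], None]) -> bool:
    """Call fetch(path) only if nothing exists at path yet.

    Returns True when a fetch happened, False when the existing file was reused.
    """
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fetch(path)
    return True


def fake_fetch(path: str) -> None:
    # Stand-in for a real HTTP download: writes a placeholder file.
    with open(path, "wb") as fh:
        fh.write(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "MM", "HB", "bcsstk02.tar.gz")
    first = fetch_if_missing(target, fake_fetch)   # True: file was "downloaded"
    second = fetch_if_missing(target, fake_fetch)  # False: existing copy reused
    print(first, second)
```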

In [10]:
small_matrix.download(extract=True)

('C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02',
 'C:\\Users\\drdar\\AppData\\Roaming\\ssgetpy\\MM\\HB\\bcsstk02.tar.gz')
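Conceptually, `extract=True` unpacks the downloaded `.tar.gz` next to it, which is why the first path above loses the `.tar.gz` suffix. The sketch below uses the standard-library `tarfile` module. `extract_archive` and `find_mtx_files` are hypothetical helpers, and the archive layout (a top-level folder named after the matrix, holding the `.mtx` file) is an assumption consistent with the path printed above, not something this notebook shows.

```python
import os
import tarfile
import tempfile


def extract_archive(archive_path: str) -> str:
    """Extract a .tar.gz into its own directory and return the extracted folder.

    Assumes the archive holds a top-level folder named after the matrix
    (for example bcsstk02/bcsstk02.mtx).
    """
    suffix = ".tar.gz"
    if not archive_path.endswith(suffix):
        raise ValueError(f"expected a {suffix} archive: {archive_path}")
    dest_dir = os.path.dirname(archive_path) or "."
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: reject absolute paths, escaping links, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return archive_path[: -len(suffix)]


def find_mtx_files(directory: str) -> list:
    """List the Matrix Market files in an extracted folder."""
    return sorted(f for f in os.listdir(directory) if f.endswith(".mtx"))


# Demo: build a tiny archive with the assumed layout, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src", "bcsstk02")
    os.makedirs(src)
    with open(os.path.join(src, "bcsstk02.mtx"), "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate real symmetric\n")
    archive = os.path.join(tmp, "bcsstk02.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="bcsstk02")
    extracted = extract_archive(archive)
    extracted_name = os.path.basename(extracted)
    mtx_files = find_mtx_files(extracted)
    print(extracted_name, mtx_files)
```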

Finally, `download` also works directly on the output of `search`, so you don't have to download one matrix at a time. For example, to download the first five matrices from the previous query:

In [11]:
result[:5].download()
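Because a slice of the search result still has a `download` method, the result type evidently keeps its own class when sliced. One way to get that behaviour from a `list` subclass is sketched below. `ResultList` and `FakeMatrix` are hypothetical, and the real `ssgetpy` collection class may be implemented differently.

```python
class ResultList(list):
    """Hypothetical list subclass whose slices keep the batch download method."""

    def __getitem__(self, index):
        item = super().__getitem__(index)
        # Plain list slicing returns a list; re-wrap it so batch methods survive.
        return ResultList(item) if isinstance(index, slice) else item

    def download(self, **kwargs) -> list:
        # Delegate to each element's own download method.
        return [m.download(**kwargs) for m in self]


class FakeMatrix:
    def __init__(self, name: str):
        self.name = name

    def download(self, **kwargs) -> str:
        # Stand-in for a real download: just reports what would be fetched.
        return f"would download {self.name} with {kwargs}"


results = ResultList(FakeMatrix(n) for n in ["bcsstk02", "bcsstk04", "bcsstk05"])
print(results[:2].download(format="MAT"))
```

Overriding `__getitem__` keeps indexing behaviour identical to `list` for single elements while making slices return the richer type.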


In [1]:
import ssgetpy

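To retrieve every matching matrix rather than only the first 10, raise `limit`. For example, to get all structural matrices with at most 10,000 nonzeros:

In [4]:
result = ssgetpy.search(kind='structural', nzbounds=(0, 10000), limit=10000)
len(result)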


88

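The next cell downloads `bcsstk02` (the second result) in MATLAB format into the current working directory. `extract` is omitted because it only applies to the `MM` and `RB` formats:

In [10]:
import os

small_matrix = result[1]
small_matrix.download(format="MAT", destpath=os.getcwd())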


('/home/qxj/AutoSparse/suitsparce/bcsstk02.mat',
 '/home/qxj/AutoSparse/suitsparce/bcsstk02.mat')