# 2 - Building and using matrices

Before we dive into it, let's think about what we need to actually build a matrix. What specific data would you need? What don't you need?

## Exercise

Please think about the minimal set of information you would need to build a *sparse matrix* using [scipy.sparse.coo_matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html) (sparse matrices store only non-zero values). Then, create this information as NumPy arrays and use them to build a sparse matrix.

Here is the matrix you should build:

$$\begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}$$

## Hint

You will need three NumPy arrays: one for the data, one for the row indices, and one for the column indices.

## Solution

In [1]:
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3])
rows = np.array([0, 1, 1])
cols = np.array([1, 0, 1])

matrix = sparse.coo_matrix((data, (rows, cols)), (2, 2))
matrix.toarray()

array([[0, 1],
       [2, 3]])
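To revisit the question of what we *don't* need, here is a minimal sketch using the same three arrays. It assumes nothing beyond NumPy and SciPy; the variable names `explicit` and `inferred` are only illustrative. The zero entry is never stored, and the shape can be left out if the largest row and column indices happen to describe the full matrix.

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3])
rows = np.array([0, 1, 1])
cols = np.array([1, 0, 1])

# Shape given explicitly, as in the solution above
explicit = sparse.coo_matrix((data, (rows, cols)), shape=(2, 2))

# Shape omitted: SciPy infers it from the largest row and column index
inferred = sparse.coo_matrix((data, (rows, cols)))

# Only the three non-zero values are stored; the zero at (0, 0) is implicit
print(explicit.nnz)
print(inferred.shape)
```

Relying on the inferred shape is fragile: if the last row or column of the intended matrix held only zeros, the inferred matrix would be too small, so passing the shape explicitly is usually the safer choice.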

## `bw_processing`

We can run into difficulties when we want to store this data. The library `bw_processing` helps us create data packages, which can store this matrix-building data on a variety of file systems. You can read the [`bw_processing` README](https://github.com/brightway-lca/bw_processing) for more information, and see the [PyFilesystem2 docs](https://docs.pyfilesystem.org/en/latest/) for more on the file systems that can be used.

Let's define this same matrix in `bw_processing`.

Matrices are by definition two-dimensional, so to build one we will always need to specify the row and column indices of the data. We combine these two arrays into a single NumPy [structured array](https://numpy.org/doc/stable/user/basics.rec.html), which uses the labels `row` and `col`.

In [2]:
import bw_processing as bwp
import numpy as np

indices_array = np.array([(0, 1), (1, 0), (1, 1)], dtype=bwp.INDICES_DTYPE)
indices_array

array([(0, 1), (1, 0), (1, 1)], dtype=[('row', '<i4'), ('col', '<i4')])

In [3]:
indices_array['row']

array([0, 1, 1], dtype=int32)

In [4]:
bwp.INDICES_DTYPE

[('row', numpy.int32), ('col', numpy.int32)]

The data array is the same as before:

In [5]:
data_array = np.array([1, 2, 3])
data_array

array([1, 2, 3])

This is all we need to create a data package:

In [24]:
dp = bwp.create_datapackage()

dp.add_persistent_vector(
    matrix="some name",
    data_array=data_array,
    name="some name",
    indices_array=indices_array,
)
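To connect this back to the exercise, the sketch below shows that the two arrays we just stored hold the same information `coo_matrix` needed. It is an illustration only, not how the libraries work internally: `indices_dtype` is a local stand-in copied from the `bwp.INDICES_DTYPE` output above so the example runs without `bw_processing`, and the indices are treated directly as matrix positions, which is a simplifying assumption.

```python
import numpy as np
from scipy import sparse

# Local stand-in for bwp.INDICES_DTYPE (assumption: copied from the output above)
indices_dtype = [("row", np.int32), ("col", np.int32)]

indices_array = np.array([(0, 1), (1, 0), (1, 1)], dtype=indices_dtype)
data_array = np.array([1, 2, 3])

# The structured array's fields supply the row and column coordinates
matrix = sparse.coo_matrix(
    (data_array, (indices_array["row"], indices_array["col"])),
    shape=(2, 2),
)
print(matrix.toarray())
```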

But before this gets too abstract, let's do the same for our example system:

<img src='images/simple-graph.png' width='400'>

Here we will need three data packages, one for each matrix. Our basic matrix equation is:

$h = CB \cdot \operatorname{diag}(A^{-1}f)$

where **A** is the technosphere matrix, **B** is the biosphere matrix, **C** is the characterization matrix, **f** is the demand vector, and **h** is the characterized inventory.
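To make the equation concrete, here is a small worked sketch with dense NumPy arrays. All numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not the values of our example system. The demand vector `f` asks for one unit of the second product, and the intermediate variable `s` (the supply vector) is a name introduced here for illustration.

```python
import numpy as np

# Hypothetical values, for illustration only
A = np.array([[1.0, -0.5],
              [0.0, 1.0]])   # technosphere: 2 products x 2 processes
B = np.array([[2.0, 3.0]])   # biosphere: 1 flow x 2 processes
C = np.array([[10.0]])       # characterization: 1 x 1 here
f = np.array([0.0, 1.0])     # demand: one unit of the second product

# Solve A s = f instead of forming the inverse explicitly
s = np.linalg.solve(A, f)

# Characterized inventory, broken down by flow (rows) and process (columns)
h = C @ B @ np.diag(s)

# Summing all entries gives the total score
score = h.sum()
print(s, h, score)
```

Solving the linear system rather than inverting **A** gives the same result and is the usual choice for numerical stability and speed, especially once the matrices are large and sparse.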

Do nodes go in the matrices? Do edges?

In [None]:
(we build our datapackages here)

## `matrix_utils`

A datapackage is just a package... of data. Not a matrix. Let's build a matrix from one using `matrix_utils`!

In [27]:
import matrix_utils as mu

In [31]:
mapped_matrix = mu.MappedMatrix(packages=[technosphere], matrix="technosphere_matrix")
mapped_matrix.matrix.toarray()

array([[0., 1.],
       [2., 3.]])

In [33]:
mapped_matrix.packages

{<bw_processing.datapackage.FilteredDatapackage at 0x11a408be0>: [<matrix_utils.resource_group.ResourceGroup at 0x11a4088b0>]}

Why is this matrix mapped?

We can now use `bw2calc` just as before.

In [1]:
(create an LCA with our datapackages)

Why did we never have to switch to our class project?

Can we fix one big modelling error in our bike inventory?