In [1]:
import numpy as np
import gprob as gp

Regular random variables in gprob, such as those produced by `normal`, are always stored in a form that can represent the most general multivariate distribution, even when the actual distribution is a trivial product of independent components.

For stacks of mutually independent multivariate distributions, there is a dedicated representation called sparse normal variables. Such variables are created with the `iid` function, for example:

In [2]:
gp.iid(gp.normal(), 3)

SparseNormal(mean=[0. 0. 0.], var=[1. 1. 1.], iaxes=(0,))

Here `iaxes` ("independence axes") are the array axes along which the variables at different indices are mutually independent.
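In covariance terms, independence along axis 0 means a block-diagonal structure. The following NumPy-only sketch (not gprob code) builds the full covariance of `n` independent copies of a `d`-dimensional normal whose covariance `s` is chosen arbitrarily for illustration. All blocks that couple different copies are zero, so storing only the `n` diagonal blocks loses nothing.

```python
import numpy as np

# Covariance of a single d-dimensional sub-distribution (arbitrary example).
s = np.array([[2.0, 0.5],
              [0.5, 1.0]])
d = s.shape[0]
n = 3  # number of independent copies stacked along axis 0

# Covariance of the (n, d) stack flattened in C order: element (i, k) maps to
# flat index i * d + k, and cov[(i, k), (j, l)] = delta_ij * s[k, l],
# which is exactly a block-diagonal Kronecker product.
full_cov = np.kron(np.eye(n), s)

# Blocks coupling different copies i != j are all zero...
print(np.abs(full_cov[0:d, d:2 * d]).max())  # 0.0
# ...so the n diagonal blocks, shape (n, d, d), carry all the information.
blocks = np.stack([full_cov[i * d:(i + 1) * d, i * d:(i + 1) * d] for i in range(n)])
print(blocks.shape)  # (3, 2, 2)
```

The dense matrix has `(n * d)**2` entries, while the diagonal blocks need only `n * d**2`.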

To see the benefit, compare the memory footprints of the same product distribution stored as a regular (dense) array and as a sparse array:

In [3]:
import sys

def getsizeofnormal(x):
    """Estimates the memory footprint of a normal variable in bytes."""
    x_ = x * np.ones(x.shape)  # Converts all views to new arrays.
    return sys.getsizeof(x_.a) + sys.getsizeof(x_.b) + sys.getsizeof(x_.lat)

sz = 1000

x = gp.normal(size=sz)
y = gp.iid(gp.normal(), sz)

print(f"Dense: {getsizeofnormal(x) / 2**20:.3f} MB")
print(f"Sparse: {getsizeofnormal(y) / 2**10:0.3} kB")

Dense: 7.672 MB
Sparse: 16.1 kB

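The gap grows quadratically with size. Here is a rough sketch of why, under an assumption about the storage layout that the printout above does not itself confirm. Suppose a dense variable of size `sz` carries a float64 coefficient for every pair of latent variable and element. Its coefficients alone then take `8 * sz**2` bytes, about 7.63 MB for `sz = 1000`. An iid stack, by contrast, needs only one coefficient per element. The NumPy arrays below merely stand in for that assumed layout.

```python
import numpy as np

# ASSUMPTION: a dense variable stores an (sz, sz) float64 coefficient matrix
# (one row per latent variable), while a sparse iid stack stores a single row
# of sz coefficients. These arrays only mimic that assumed layout.
for sz in (1000, 2000):
    dense_coeffs = np.zeros((sz, sz))
    sparse_coeffs = np.zeros((1, sz))
    print(f"sz={sz}: dense {dense_coeffs.nbytes / 2**20:.3f} MB, "
          f"sparse {sparse_coeffs.nbytes / 2**10:.2f} kB")
```

Doubling `sz` quadruples the dense figure but only doubles the sparse one.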

Sparse variables can be used in gprob functions in the same way as dense ones. They are subject to several limitations, however:

* When sparse variables are combined in arithmetic operations, stacking, or concatenation, all operands and the result must have identical independence axes, and concatenation cannot be performed along an independence axis. There is currently no way to create a dense random variable from a sparse one.
* When sparse variables are indexed, the independence axes must be taken whole, using full slices (`:`) or ellipses (`...`).
* Vector operations such as `@` or `einsum` cannot contract independence axes.
* Reshaping cannot affect independence axes.

There are other subtleties, but the rules above are the main ones to be aware of. Below, they are illustrated with two variables, `x` and `y`, both of shape (10, 10) and both composed of sub-distributions that are independent along axis 0.

In [4]:
x = gp.iid(gp.normal(size=10), 10)
y = gp.iid(gp.normal(size=10), 10)

In [5]:
x + y  # This is permitted.

try:
    x + y.T  # But this is not, because the independence axes 
             # of x and y.T are different.
except ValueError as e:
    print(str(e))

Incompatible locations of the independence axes of the operands: (0,), (1,). Combining sparse normal variables requires them to have the same numbers of independence axes at the same positions in the shape and in the same order.
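The error message reports independence axes `(0,)` for `x` and `(1,)` for `y.T`. The bookkeeping behind this can be sketched in plain Python. `permuted_iaxes` is a hypothetical helper written for this illustration, not part of gprob, and it assumes that a transposition simply carries each independence axis to its new position.

```python
def permuted_iaxes(iaxes, axes):
    """Hypothetical helper: positions of the independence axes after a
    transposition. `axes` follows NumPy's convention for np.transpose:
    output axis k is input axis axes[k]."""
    return tuple(axes.index(ax) for ax in iaxes)


x_iaxes = (0,)
# For a 2D array, .T corresponds to the permutation (1, 0).
yT_iaxes = permuted_iaxes((0,), (1, 0))
print(x_iaxes, yT_iaxes)  # (0,) (1,)

# The operands are compatible only if the positions (and order) match.
print(x_iaxes == yT_iaxes)  # False, which is why x + y.T raises above
```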


In [6]:
# These operations are permitted.
gp.stack([x, y])
gp.concatenate([x, y], axis=1)

# But these are not.
try:
    gp.stack([x, y.T])
except ValueError as e:
    print(str(e))

try:
    gp.concatenate([x, y], axis=0)
except ValueError as e:
    print(str(e))

Incompatible locations of the independence axes of the operands: (0,), (1,). Combining sparse normal variables requires them to have the same numbers of independence axes at the same positions in the shape and in the same order.
Concatenation along independence axes is not allowed.
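One way to see why concatenation along an independence axis cannot yield a sparse variable in general is sketched below. It is a NumPy-only illustration of the underlying math, not of gprob's internals or its stated rationale. Take `x` with iid standard normal elements and `y = 2 * x`. Each of them is independent along axis 0, but concatenating them along that axis puts `x[i]` and `y[i]` at different indices of the same axis, so the result is no longer independent along it.

```python
import numpy as np

n = 4
k = 2.0

# Elements of x are iid standard normals: cov(x) = I.
cov_xx = np.eye(n)
# y = k * x, so cov(x, y) = k * I and cov(y) = k**2 * I.
cov_xy = k * np.eye(n)
cov_yy = k**2 * np.eye(n)

# Covariance of z = concatenate([x, y]) along axis 0.
cov_zz = np.block([[cov_xx, cov_xy],
                   [cov_xy.T, cov_yy]])

# Entries such as cov(z[0], z[n]) = cov(x[0], y[0]) lie off the diagonal
# and are nonzero, so z is not independent along its axis 0.
off_diag = cov_zz - np.diag(np.diag(cov_zz))
print(np.count_nonzero(off_diag))  # 8
```

Concatenating along axis 1 instead keeps `x[i]` and `y[i]` within the same row `i`, so different rows remain independent.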


In [7]:
c = np.ones((10, 10))

x @ c  # This is permitted, because the matrix multiplication 
       # contracts a regular axis.

try:
    c @ x  # But this is not, because the matrix multiplication would contract 
           # the independence axis.
except ValueError as e:
    print(str(e))

Matrix multiplication contracting over independence axes is not supported. Axis 0 of operand 2 is contracted.
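The restriction follows from how covariances transform under linear maps, Cov(M v) = M Cov(v) Mᵀ. The plain NumPy sketch below (not gprob code) models `x` as a 10 by 10 array whose rows are mutually independent, each with identity covariance. It then compares the covariance between output rows 0 and 1 for `x @ c` and `c @ x`. Contracting the dense axis leaves the rows independent, while contracting the independence axis mixes all rows together.

```python
import numpy as np

n, d = 10, 10                          # x has shape (n, d), independent along axis 0
cov_x = np.kron(np.eye(n), np.eye(d))  # rows iid, each with identity covariance

# With row-major flattening, vec(x @ c) = kron(I_n, c.T) @ vec(x)
# and vec(c @ x) = kron(c, I_d) @ vec(x).
c_right = np.ones((d, d))
c_left = np.ones((n, n))
m_right = np.kron(np.eye(n), c_right.T)
m_left = np.kron(c_left, np.eye(d))

# Cov(M vec(x)) = M Cov(vec(x)) M^T
cov_right = m_right @ cov_x @ m_right.T
cov_left = m_left @ cov_x @ m_left.T

# Covariance block between output rows 0 and 1.
cross_right = cov_right.reshape(n, d, n, d)[0, :, 1, :]
cross_left = cov_left.reshape(n, d, n, d)[0, :, 1, :]
print(np.abs(cross_right).max())  # 0.0: rows of x @ c stay independent
print(np.abs(cross_left).max())   # 10.0: rows of c @ x become correlated
```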


In [8]:
x.reshape((10, 2, 5))  # Reshaping is permitted for dense axes.

try:
    x.reshape((2, 5, 10))  # But not for sparse axes.
except ValueError as e:
    print(str(e))

try:
    x.reshape((100,))  # Nor when they are merged with dense axes.
except ValueError as e:
    print(str(e))

Reshaping that affects independence axes is not supported. Axis 0 is affected by the requested shape transformation (10, 10) -> (2, 5, 10).
Reshaping that affects independence axes is not supported. Axis 0 is affected by the requested shape transformation (10, 10) -> (100,).
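In NumPy terms, the permitted reshape keeps each index along axis 0 attached to exactly the same set of elements, while the two rejected ones split axis 0 into two axes or merge it with axis 1. The plain NumPy sketch below tracks elements by labeling them with their flat indices. It assumes C-order reshaping, NumPy's default; that gprob follows the same order is an assumption here.

```python
import numpy as np

# Label each element of a 10 by 10 array with its flat index.
a = np.arange(100).reshape(10, 10)

# (10, 10) -> (10, 2, 5): every a2[i] holds exactly the elements of row a[i],
# so axis 0 keeps its meaning.
a2 = a.reshape(10, 2, 5)
print(all(np.array_equal(a2[i].ravel(), a[i]) for i in range(10)))  # True

# (10, 10) -> (2, 5, 10): the original axis 0 is split into two new axes;
# old row 7 now sits at index (1, 2), since 7 = 1 * 5 + 2.
a3 = a.reshape(2, 5, 10)
print(np.array_equal(a3[1, 2], a[7]))  # True

# (10, 10) -> (100,): axis 0 is merged with axis 1 into a single axis.
a4 = a.reshape(100)
print(np.array_equal(a4[10:20], a[1]))  # True
```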


Another distinction between regular and sparse variables is the shape of their covariance. When covariances between sparse variables are calculated, only the diagonal elements along the independence axes are returned, because all elements at off-diagonal indices along those axes are zero. Compare, for example, the covariance between the sparse variables below

In [9]:
x = gp.iid(gp.normal(), 4)
y = gp.iid(gp.normal(), 4) + np.array([1, 2, 3, 4]) * x

gp.cov(x, y)

array([1., 2., 3., 4.])

with the covariance between two regular variables with the same correlation properties

In [10]:
x = gp.normal(size=4)
y = gp.normal(size=4) + np.array([1, 2, 3, 4]) * x

gp.cov(x, y)

array([[1., 0., 0., 0.],
       [0., 2., 0., 0.],
       [0., 0., 3., 0.],
       [0., 0., 0., 4.]])
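Both results describe the same numbers: the dense matrix is the sparse result expanded with `np.diag`. The plain NumPy sketch below derives it by writing `x` and `y` as linear maps of eight independent standard normal latent variables: four behind `x` and four behind the `gp.normal` term added to `y`. This gives cov(x, y) = A_x A_yᵀ. The coefficient matrices `a_x` and `a_y` are introduced only for this illustration and are not meant to describe gprob's internal storage.

```python
import numpy as np

k = np.array([1.0, 2.0, 3.0, 4.0])
n = k.size

# Latent vector xi = (xi_x, xi_z) of 2n iid standard normals.
# x = xi_x, and y = xi_z + k * x.
a_x = np.hstack([np.eye(n), np.zeros((n, n))])
a_y = np.hstack([np.diag(k), np.eye(n)])

dense_cov = a_x @ a_y.T               # full (n, n) matrix, as for regular variables
sparse_cov = np.diagonal(dense_cov)   # only the diagonal, as for sparse variables

print(sparse_cov)  # [1. 2. 3. 4.]
```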