SPFlow is an open-source functional-oriented Python package for Probabilistic Circuits (PCs) with ready-to-use implementations for Sum-Product Networks (SPNs). PCs are a class of powerful deep probabilistic models - expressible as directed acyclic graphs - that allow for tractable querying. This library provides routines for creating, learning, manipulating and interacting with PCs and is highly extensible and customizable.

Creating Models:

This section shows how to create simple Sum-Product Networks from sum and product layers.

Scopes

Scope objects represent scopes of features, denoted by indices. A scope consists of two parts.
The query of a scope lists the features that the scope represents.
The evidence of a scope lists any features that the query features are conditioned on (empty for non-conditional scopes).
Scopes are generally only specified explicitly for leaf distributions and are inferred recursively for nodes higher in the graph structure.

In [1]:
from spflow.meta.data import Scope

scope = Scope([0,4,3], [1,2]) # scope over indices 0,4,3 conditional on indices 1,2
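
The recursive scope inference described above can be illustrated without SPFlow. The following is a minimal sketch using plain Python sets; `product_scope` and `sum_scope` are hypothetical helper names chosen for illustration and are not part of the SPFlow API. It assumes the usual SPN conventions that the children of a product node have disjoint scopes and the children of a sum node share the same scope.

```python
def product_scope(child_scopes):
    """Scope of a product node: union of pairwise disjoint child scopes."""
    result = set()
    for scope in child_scopes:
        scope = set(scope)
        if result & scope:
            raise ValueError("product children must have disjoint scopes")
        result |= scope
    return result


def sum_scope(child_scopes):
    """Scope of a sum node: all children must share one scope."""
    first = set(child_scopes[0])
    if any(set(scope) != first for scope in child_scopes[1:]):
        raise ValueError("sum children must have identical scopes")
    return first


# Leaves carry explicit scopes; inner nodes derive theirs from their children.
leaf_a, leaf_b = {0, 1}, {2}
prod = product_scope([leaf_a, leaf_b])
root = sum_scope([prod, {0, 1, 2}])
print(sorted(root))  # [0, 1, 2]
```

Only the query part of a scope is modelled here; evidence features are left out to keep the sketch short.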


Modules

Models in SPFlow are built from modules, each of which can represent one or more nodes in the form of a layer.
A layer is defined by its event shape (out_features, out_channels, num_repetitions). The number of output features and
the number of output channels are mandatory. The number of output features is derived implicitly from the scope of the module,
whereas the number of output channels either has to be passed explicitly or can be derived from the module parameters.
The number of repetitions is optional and can therefore be None.
In general, each module expects a single input module. Exceptions are discussed later in the notebook.

Leaf Layer

The leaf layer is the lowest layer in the SPN and can be created for different distributions.
The event shape can either be derived from given parameters or be passed explicitly as an initialization parameter.
For example, a leaf layer with a Normal distribution can be initialized via

- init parameters: the number of output channels is passed explicitly, and a random mean and standard deviation are generated, or
- distribution parameters: if the mean and standard deviation are given, the event shape is derived from the shape of the parameters.

In [2]:
import torch
from spflow.modules.leaf import Normal

out_feature = 2
out_channels = 4
scope_normal = Scope([0,1])

mean = torch.randn((out_feature,out_channels))
std = torch.rand((out_feature,out_channels))

normal_layer = Normal(scope=scope_normal, mean=mean, std=std) # init via parameters

normal_layer2 = Normal(scope=scope_normal, out_channels=out_channels)
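
To make the event shape of a leaf layer more concrete, here is a minimal pure-Python sketch of what a Normal leaf layer conceptually computes for one data row: one log-density per (feature, channel) pair. The helpers `normal_log_pdf` and `leaf_log_likelihood` are hypothetical names for illustration; SPFlow's actual implementation operates on tensors and may differ in detail.

```python
import math


def normal_log_pdf(x, mean, std):
    """Log-density of a univariate Normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def leaf_log_likelihood(row, means, stds):
    """Evaluate one data row; returns a [feature][channel] grid of log-densities."""
    return [
        [normal_log_pdf(x, m, s) for m, s in zip(means[f], stds[f])]
        for f, x in enumerate(row)
    ]


# 2 features, 4 channels: parameters have shape (out_features, out_channels).
means = [[0.0, 1.0, -1.0, 2.0], [0.5, 0.0, 1.5, -0.5]]
stds = [[1.0, 1.0, 2.0, 0.5], [1.0, 2.0, 1.0, 1.0]]
out = leaf_log_likelihood([0.0, 0.5], means, stds)
print(len(out), len(out[0]))  # 2 4
```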


The following distributions are implemented as leaf modules: Bernoulli, Binomial, Categorical, Exponential, Gamma,
Geometric, Hypergeometric, LogNormal, NegativeBinomial, Normal, Poisson and Uniform.

Sum layer

The sum layer computes a weighted sum over the output of its input module.
Like the leaf layer, the sum layer can be initialized either by explicitly passing the number of output channels (out_channels) or
by passing the weight matrix, from which the module derives the event shape.

In [3]:
from spflow.modules import Sum

sum_out_channels = 5
weights = torch.rand((out_feature, out_channels, sum_out_channels))
weights /= weights.sum(dim=1, keepdim=True)  # normalize over the input-channel axis

sum_layer = Sum(inputs=normal_layer, weights=weights)

sum_layer2 = Sum(inputs=normal_layer, out_channels=5)
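
The weighted sum is typically evaluated in log-space for numerical stability. The following pure-Python sketch shows the idea for a weight layout of (features, input channels, output channels), normalized over the input-channel axis as in the cell above. The function names `logsumexp` and `sum_layer` are hypothetical and only illustrate the computation; this is not SPFlow's implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sum_layer(log_probs, weights):
    """log_probs: [feature][in_channel]; weights: [feature][in_channel][out_channel]."""
    out = []
    for f, channel_lls in enumerate(log_probs):
        n_out = len(weights[f][0])
        out.append([
            logsumexp([math.log(weights[f][i][o]) + ll for i, ll in enumerate(channel_lls)])
            for o in range(n_out)
        ])
    return out


# One feature, two input channels, one output channel.
log_probs = [[math.log(0.2), math.log(0.6)]]
weights = [[[0.25], [0.75]]]  # sums to 1 over the input-channel axis
result = sum_layer(log_probs, weights)
print(math.exp(result[0][0]))  # approximately 0.25 * 0.2 + 0.75 * 0.6 = 0.5
```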

Product layer

The product layer computes the product over all input features.

In [4]:
from spflow.modules import Product

product_layer = Product(inputs=normal_layer)
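
In log-space, a product over features becomes a sum over the feature axis, computed separately for each channel. The sketch below illustrates this with the leaf log-likelihoods of the first instance from the cached outputs printed in the Dispatch section of this notebook; `product_layer_sketch` is a hypothetical helper name, not an SPFlow function.

```python
def product_layer_sketch(log_probs):
    """log_probs: [feature][channel] -> [1][channel], summing logs over features."""
    n_channels = len(log_probs[0])
    return [[sum(row[c] for row in log_probs) for c in range(n_channels)]]


# Leaf outputs for one instance: 2 features x 4 channels.
leaf_out = [
    [-28.2992, -1.3575, -1.1106, -11.2995],
    [-5.0287, -3.8192, -2.4297, -30.9220],
]
prod_out = product_layer_sketch(leaf_out)
print([round(v, 4) for v in prod_out[0]])
```

Up to rounding of the printed values, the result matches the Product output for that instance in the Dispatch section (one feature, four channels).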

Now we can create a model by stacking sum and product layers at will with a leaf layer as lowest layer:

In [5]:
out_channels = 4
scope_normal = Scope([0,1])

normal_layer = Normal(scope=scope_normal, out_channels=out_channels)
product_layer = Product(inputs=normal_layer)
spn = Sum(inputs=product_layer, out_channels=1)


Inference

For inference, SPFlow offers the computation of the log-likelihoods of data for models. Inference takes a two-dimensional data set and returns a two-dimensional array containing the (log-)likelihoods, where each row corresponds to an input data instance.
The number of columns corresponds to the number of outputs of the module inference is performed on.
Missing data (i.e., NaN values) is implicitly marginalized over. This means that a completely missing data instance (all NaN row) outputs a likelihood of 1 (and corresponding log-likelihood of 0).


In [6]:
from spflow import log_likelihood

num_instances = 10
num_feature = 2

data = torch.randn((num_instances,num_feature))
ll = log_likelihood(spn, data)
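
The implicit marginalization of missing values can be sketched in a few lines: a leaf contributes a likelihood of 1 (log-likelihood 0) for a NaN input, so a fully missing row ends up with log-likelihood 0. This is a minimal illustration for independent Normal features; `marginalizing_log_pdf` is a hypothetical helper, not part of SPFlow.

```python
import math


def normal_log_pdf(x, mean, std):
    """Log-density of a univariate Normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def marginalizing_log_pdf(x, mean, std):
    """Treat NaN as 'missing': integrating the density out gives log(1) = 0."""
    if math.isnan(x):
        return 0.0
    return normal_log_pdf(x, mean, std)


nan = float("nan")
all_missing = sum(marginalizing_log_pdf(x, 0.0, 1.0) for x in [nan, nan])
partly_missing = sum(marginalizing_log_pdf(x, 0.0, 1.0) for x in [nan, 0.3])
print(all_missing)  # 0.0
```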


Sampling

SPFlow can also sample data from models, possibly in the presence of evidence.
Generating a sample from a model can be done as follows:

In [7]:
from spflow import sample

samples = sample(spn)

# Drawing multiple samples at once works similarly: pass the number of samples to generate.
samples = sample(spn, 100)


Sampling with evidence:
In the case of evidence, a partially filled data set is passed to sample instead, with missing entries marked as NaN.
The routine fills the data tensor in-place, taking the specified evidence into account.
Keeping track of the return value should not be necessary in this case, since the input data set is modified in-place.
However, it is good practice to (re-)assign it nonetheless.
Note that sample(model) and sample(model, n) are simply convenient aliases that create an empty data set of appropriate shape to fill with generated values.


In [8]:
# one feature has given values
evidence = torch.randn((num_instances,1))

nan_tensor = torch.full((num_instances,1), fill_value=torch.nan)
data = torch.concatenate((nan_tensor,evidence), dim=1)

data = sample(spn, data)
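
To illustrate what "taking evidence into account" means, the sketch below samples from a small mixture of factorized Normals: the observed features reweight the mixture components, and only the NaN entries are filled in-place. This is a simplified stand-in for a single sum node over product nodes; `sample_with_evidence` and its parameter layout are assumptions made for this example, not SPFlow's sampling routine.

```python
import math
import random


def normal_log_pdf(x, mean, std):
    """Log-density of a univariate Normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def sample_with_evidence(row, weights, means, stds, rng):
    """Fill NaN entries of `row` in place for a mixture of factorized Normals.

    means[k][f] and stds[k][f] are the parameters of component k for feature f.
    """
    # Posterior over components given the observed (non-NaN) features.
    log_post = []
    for k, w in enumerate(weights):
        ll = math.log(w)
        for f, x in enumerate(row):
            if not math.isnan(x):
                ll += normal_log_pdf(x, means[k][f], stds[k][f])
        log_post.append(ll)
    top = max(log_post)
    post = [math.exp(v - top) for v in log_post]  # unnormalized is fine for choices()
    # Pick a component, then sample only the missing features from it.
    k = rng.choices(range(len(weights)), weights=post)[0]
    for f, x in enumerate(row):
        if math.isnan(x):
            row[f] = rng.gauss(means[k][f], stds[k][f])
    return row


rng = random.Random(0)
row = [float("nan"), 2.0]  # feature 0 missing, feature 1 given as evidence
sample_with_evidence(row, [0.5, 0.5], [[-3.0, -2.0], [3.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]], rng)
print(row)
```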

Training models:
Both Expectation Maximization (EM) and gradient descent are implemented in this library.

Expectation Maximization:
Each module has an EM method, which allows EM to be applied to any architecture.
The required maximum-likelihood estimation (MLE) methods are also implemented for each of the provided distributions.

In [9]:
from spflow.learn import expectation_maximization

num_instances = 10
num_feature = 2
out_channels = 4
scope_normal = Scope([0,1])

normal_layer = Normal(scope=scope_normal, out_channels=out_channels)
product_layer = Product(inputs=normal_layer)
spn = Sum(inputs=product_layer, out_channels=1)

data = torch.randn((num_instances,num_feature))
expectation_maximization(module=spn, data=data)

tensor([-9.8831, -2.7096, -2.7096])
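
For readers unfamiliar with EM, the following pure-Python sketch performs a single EM step on a one-dimensional Gaussian mixture, which corresponds conceptually to one sum node over Normal leaves. The E-step computes responsibilities and the M-step applies weighted maximum-likelihood updates. The standard deviations are kept fixed for brevity, and `em_step` and `mixture_log_likelihood` are hypothetical helpers, not SPFlow functions.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def mixture_log_likelihood(data, weights, means, stds):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)))
        for x in data
    )


def em_step(data, weights, means, stds):
    """One EM update of weights and means (stds kept fixed for brevity)."""
    k = len(weights)
    resp_sum = [0.0] * k
    weighted_x = [0.0] * k
    for x in data:
        # E-step: responsibility of each component for x.
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
        total = sum(joint)
        for j in range(k):
            r = joint[j] / total
            resp_sum[j] += r
            weighted_x[j] += r * x
    # M-step: weighted maximum-likelihood estimates.
    new_weights = [r / len(data) for r in resp_sum]
    new_means = [wx / r for wx, r in zip(weighted_x, resp_sum)]
    return new_weights, new_means


data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
before = mixture_log_likelihood(data, weights, means, stds)
weights, means = em_step(data, weights, means, stds)
after = mixture_log_likelihood(data, weights, means, stds)
print(before <= after)  # True: an EM step does not decrease the log-likelihood
```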

Gradient descent:
To train your model with gradient descent, SPFlow provides the template function train_gradient_descent, whose hyperparameters can be set as needed.

In [10]:
from spflow.learn import train_gradient_descent
from torch.utils.data import DataLoader, TensorDataset

num_instances = 10
num_feature = 2
out_channels = 4
scope_normal = Scope([0,1])

normal_layer = Normal(scope=scope_normal, out_channels=out_channels)
product_layer = Product(inputs=normal_layer)
spn = Sum(inputs=product_layer, out_channels=1)

data = torch.randn((num_instances,num_feature))
dataset = TensorDataset(data)
dataloader = DataLoader(dataset, batch_size=10)

train_gradient_descent(spn, dataloader)
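
Conceptually, gradient-based training minimizes the negative log-likelihood of the data. The sketch below shows this for the simplest possible case, fitting the mean of a single Normal distribution with fixed standard deviation in pure Python. It is an illustration of the principle only; the learning rate, step count and the helper `nll_gradient` are assumptions made for this example and say nothing about the defaults of train_gradient_descent.

```python
def nll_gradient(data, mean, std=1.0):
    """Gradient of the average negative log-likelihood w.r.t. the mean."""
    return sum(-(x - mean) / std**2 for x in data) / len(data)


data = [1.0, 2.0, 3.0]
mean = 0.0           # initial parameter value
learning_rate = 0.1  # assumed value for illustration

for _ in range(200):
    mean -= learning_rate * nll_gradient(data, mean)

print(round(mean, 3))  # 2.0, the sample mean (the maximum-likelihood estimate)
```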



Advanced layers

In addition to the basic sum and product layers, this library implements more advanced layers:
the elementwise sum layer, the outer product layer and the elementwise product layer.

All of these layers expect either a list of modules or a Split module as input.

A Split module takes a single module as input and splits its output for further use.
In general, a Split module lets you define the number of splits and the dimension to split along (dim 1 = feature dim, dim 2 = channel dim).
Two specific types of Split modules are already implemented:
The first is the SplitHalves module.
It splits the module down the middle, e.g. [0,1,2,3,4,5] -> [0,1,2],[3,4,5].
The second is the SplitAlternate module.
It splits the module in an alternating fashion, e.g. [0,1,2,3,4,5] -> [0,2,4],[1,3,5].

A Split module can be created like this:

In [11]:
from spflow.modules.ops.split_halves import SplitHalves
from spflow.modules.ops.split_alternate import SplitAlternate

scope_normal = Scope([0,1,2,3,4,5])
out_channels = 4

normal_layer = Normal(scope=scope_normal, out_channels=out_channels)

split_halves = SplitHalves(inputs=normal_layer, dim=1, num_splits=2)
print(split_halves.feature_to_scope)

split_alternate = SplitAlternate(inputs=normal_layer, dim=1, num_splits=2)
print(split_alternate.feature_to_scope)

[[0, 1, 2], [3, 4, 5]]
[[0, 2, 4], [1, 3, 5]]


Elementwise Sum

The elementwise sum module computes the elementwise sum over the output channels of its input modules.
Therefore, the input modules must have the same number of output channels, or their numbers of output channels must be broadcastable.

In [12]:
from spflow.modules.elementwise_sum import ElementwiseSum

scope_normal = Scope([0,1,2,3,4,5])
leaf_out_channels = 4
sum_out_channels = 2

normal_layer_1 = Normal(scope=scope_normal, out_channels=leaf_out_channels)
normal_layer_2 = Normal(scope=scope_normal, out_channels=leaf_out_channels)

elementwise_sum = ElementwiseSum(inputs=[normal_layer_1,normal_layer_2], out_channels=sum_out_channels)


Elementwise Product

In contrast to the default product layer, which multiplies its input along the feature dimension,
the elementwise product layer computes the elementwise product over the output channels.
The input has to be either a list of inputs with disjoint scopes or a Split module.

Example:
input 1 with 3 out_channel(OC) [OC_1_0,OC_1_1,OC_1_2]
input 2 with 3 out_channel(OC) [OC_2_0,OC_2_1,OC_2_2]
result: [OC_1_0 * OC_2_0,OC_1_1 * OC_2_1,OC_1_2 * OC_2_2]

In [13]:
from spflow.modules.elementwise_product import ElementwiseProduct
scope_normal_1 = Scope([0,1,2])
scope_normal_2 = Scope([3,4,5])
leaf_out_channels = 3

normal_layer_1 = Normal(scope=scope_normal_1, out_channels=leaf_out_channels)
normal_layer_2 = Normal(scope=scope_normal_2, out_channels=leaf_out_channels)

elementwise_prod = ElementwiseProduct(inputs=[normal_layer_1, normal_layer_2])
print("num_features: ", elementwise_prod.out_features)
print("num_channel: ", elementwise_prod.out_channels)



num_features:  3
num_channel:  3
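
The channel pairing of the elementwise product can be sketched in log-space, where multiplication becomes addition. The helper below is hypothetical and works on plain lists of per-channel log-likelihoods for a single feature; broadcasting is not covered.

```python
def elementwise_product_sketch(log_a, log_b):
    """Channel i of the output pairs channel i of both inputs (log-space: add)."""
    if len(log_a) != len(log_b):
        raise ValueError("inputs must have the same number of channels")
    return [a + b for a, b in zip(log_a, log_b)]


input_1 = [-1.0, -2.0, -3.0]  # log-likelihoods of OC_1_0, OC_1_1, OC_1_2
input_2 = [-0.5, -0.5, -0.5]  # log-likelihoods of OC_2_0, OC_2_1, OC_2_2
print(elementwise_product_sketch(input_1, input_2))  # [-1.5, -2.5, -3.5]
```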


Outer Product

This product layer computes the outer product over the output channels.
Again, the input has to be either a list of inputs with disjoint scopes or a Split module.

Example:
input 1 with 3 out_channel(OC) [OC_1_0,OC_1_1,OC_1_2]
input 2 with 3 out_channel(OC) [OC_2_0,OC_2_1,OC_2_2]
result: [OC_1_0 * OC_2_0, OC_1_0 * OC_2_1, OC_1_0 * OC_2_2]
        [OC_1_1 * OC_2_0, OC_1_1 * OC_2_1, OC_1_1 * OC_2_2]
        [OC_1_2 * OC_2_0, OC_1_2 * OC_2_1, OC_1_2 * OC_2_2]



In [14]:
from spflow.modules.outer_product import OuterProduct
scope_normal_1 = Scope([0,1,2])
scope_normal_2 = Scope([3,4,5])
leaf_out_channels = 3

normal_layer_1 = Normal(scope=scope_normal_1, out_channels=leaf_out_channels)
normal_layer_2 = Normal(scope=scope_normal_2, out_channels=leaf_out_channels)

outer_prod = OuterProduct(inputs=[normal_layer_1, normal_layer_2])
print("num_features: ", outer_prod.out_features)
print("num_channel: ", outer_prod.out_channels)

num_features:  3
num_channel:  9
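
The growth from 3 to 9 output channels follows from pairing every channel of the first input with every channel of the second. Here is a minimal log-space sketch using itertools.product; `outer_product_sketch` is a hypothetical helper, and the row-major ordering of the resulting channels is an assumption for illustration.

```python
from itertools import product


def outer_product_sketch(log_a, log_b):
    """Pair every channel of input 1 with every channel of input 2 (log-space: add)."""
    return [a + b for a, b in product(log_a, log_b)]


input_1 = [-1.0, -2.0, -3.0]
input_2 = [-0.5, -1.5, -2.5]
out = outer_product_sketch(input_1, input_2)
print(len(out))  # 9
print(out[:3])   # [-1.5, -2.5, -3.5]: OC_1_0 combined with OC_2_0, OC_2_1, OC_2_2
```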


Sampling Context

The SamplingContext class controls the sampling process and is passed to the sampling routine. It is mostly used internally, although users can use it manually if needed.
It consists of three parts:
1. channel index: A tensor that indicates, for each instance, the channel from which each feature is to be sampled.
It therefore has the shape (batch_size, out_features) with integer values in [0, out_channels - 1].
2. mask: The mask is applied to the samples and indicates which samples should be considered. It has the same shape as the channel index.
3. repetition index: If the model has a repetition dimension, the repetition index indicates the repetition from which each instance is to be sampled.
If the model has no repetition dimension, the index is None.

In the example below, the first instance samples feature 0 from channel 0, feature 1 from channel 1, and so on.
The second instance uses channel 0 for all features.
In addition, the repetition index specifies that the first instance is sampled from repetition 0 and the second instance from repetition 1.

In [15]:
from spflow.meta.dispatch import SamplingContext

scope = Scope([0,1,2,3,4,5])
module = Normal(scope=scope, out_channels=6, num_repetitions=3)

# Setup sampling context
n_samples = 2
data = torch.full((n_samples, 6), torch.nan)
channel_index = torch.tensor([[0,1,2,3,4,5],[0,0,0,0,0,0]], dtype=torch.int64)
mask = torch.full((n_samples, 6), True, dtype=torch.bool)

repetition_index = torch.tensor([0,1], dtype=torch.int64)

sampling_ctx = SamplingContext(channel_index=channel_index, mask=mask, repetition_index=repetition_index)

# Sample
samples = sample(module, data, sampling_ctx=sampling_ctx)
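
How the channel index and the mask interact can be sketched as a gather operation: for each instance and feature, the channel index picks one of the per-channel candidates, and the mask decides whether that entry is written at all. The helper `select_channels` is hypothetical, works on nested lists, and leaves out the repetition index for brevity.

```python
import math


def select_channels(per_channel, channel_index, mask, fill=float("nan")):
    """per_channel[f][c] holds a candidate value for feature f from channel c."""
    out = []
    for idx_row, mask_row in zip(channel_index, mask):
        out.append([
            per_channel[f][c] if keep else fill
            for f, (c, keep) in enumerate(zip(idx_row, mask_row))
        ])
    return out


# Candidate values encode their origin: 10 * feature + channel.
per_channel = [[10 * f + c for c in range(3)] for f in range(3)]
channel_index = [[0, 1, 2], [0, 0, 0]]
mask = [[True, True, True], [True, False, True]]

selected = select_channels(per_channel, channel_index, mask)
print(selected[0])  # [0, 11, 22]
```

In the second instance, the masked-out feature stays NaN while the other two features take their values from channel 0.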



Dispatch:

Internally, SPFlow uses dispatch to call the correct implementation based on the specified module classes.
All dispatched functions accept a dispatch_ctx keyword argument, which takes a DispatchContext instance.
Among other things, the dispatch context takes care of memoization. In most cases, it is created automatically. However, there are a few scenarios in which users might want to work with the dispatch context directly. For example, for routines that use memoization, such as (log-)likelihood, the dispatch context stores the outputs of all modules:



In [16]:
from spflow.meta.dispatch import DispatchContext
from spflow import log_likelihood

# create dispatch context
dispatch_ctx = DispatchContext()

# compute log likelihoods
log_likelihoods = log_likelihood(spn, data, dispatch_ctx=dispatch_ctx)

# inspect cached log-likelihood outputs
print(dispatch_ctx.cache['log_likelihood'])


{Normal(
  D=2, C=4, R=None
  (distribution): Normal()
): tensor([[[ -28.2992,   -1.3575,   -1.1106,  -11.2995],
         [  -5.0287,   -3.8192,   -2.4297,  -30.9220]],

        [[-118.7968,    0.9387,   -0.8218,   -8.4038],
         [  -2.6555,  -79.7616,  -22.5136,   -2.4236]]],
       grad_fn=<IndexPutBackward0>), Product(
  D=1, C=4, R=None
  (inputs): Normal(
    D=2, C=4, R=None
    (distribution): Normal()
  )
): tensor([[[ -33.3279,   -5.1767,   -3.5402,  -42.2216]],

        [[-121.4522,  -78.8230,  -23.3354,  -10.8274]]],
       grad_fn=<SumBackward1>), Sum(
  D=1, C=1, R=None, weights=(1, 4, 1)
  (inputs): Product(
    D=1, C=4, R=None
    (inputs): Normal(
      D=2, C=4, R=None
      (distribution): Normal()
    )
  )
): tensor([[[ -3.9496]],

        [[-12.3731]]], grad_fn=<ViewBackward0>)}
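
The memoization idea behind such a cache can be sketched with a plain dictionary keyed by node: a node that is shared by several parents is evaluated only once per cache. The `Node` class and `evaluate` function below are hypothetical stand-ins for illustration and do not mirror SPFlow's DispatchContext API.

```python
class Node:
    """Tiny stand-in for a module; `value` is its own log-space contribution."""

    def __init__(self, name, children=(), value=0.0):
        self.name = name
        self.children = list(children)
        self.value = value
        self.calls = 0  # how often this node was actually evaluated


def evaluate(node, cache):
    """Return the node's output, computing each node at most once per cache."""
    if node in cache:
        return cache[node]
    node.calls += 1
    result = node.value + sum(evaluate(child, cache) for child in node.children)
    cache[node] = result
    return result


shared = Node("leaf", value=-1.5)
left = Node("left", [shared])
right = Node("right", [shared])
root = Node("root", [left, right])

cache = {}
print(evaluate(root, cache))  # -3.0
print(shared.calls)           # 1: the shared leaf was computed only once
```

After the call, the cache holds one entry per node, similar in spirit to the per-module outputs printed above.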


Implemented structures:

RAT-SPN:

The RAT-SPN is a model structure based on ... . It builds a deep network structure by recursively partitioning the features (variables) into random subsets and alternating between sum and product layers.
The following hyperparameters define the structure of the model:
- leaf_modules: A list of leaf modules that define the base distributions and their corresponding scopes,
- n_root_nodes: The number of root nodes (for classification, the number of classes),
- n_region_nodes: The number of sum nodes, i.e. the number of output channels of each sum layer,
- num_repetitions: The number of repetitions,
- depth: The depth of the network,
- outer_product: True to use the outer product layer, False to use the elementwise product layer,
- split_halves: True to use the SplitHalves module, False to use the SplitAlternate module,
- num_splits: The number of splits for each Split module (default: 2).

The RatSPN module can be used just like any other module.
For classification, the module already provides an implementation of the posterior.

In [17]:
from spflow.modules.rat.rat_spn import RatSPN

depth = 3
n_region_nodes = 10
num_leaves = 10
num_repetitions = 5
n_root_nodes = 1
num_feature = 128

scope = Scope(list(range(0, num_feature)))

leaf_layer = Normal(scope=scope, out_channels=5)

model = RatSPN(
    leaf_modules=[leaf_layer],
    n_root_nodes=n_root_nodes,
    n_region_nodes=n_region_nodes,
    num_repetitions=num_repetitions,
    depth=depth,
    outer_product=True,
    split_halves=True,
)
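
The recursive random partitioning of features that underlies this structure can be sketched independently of SPFlow. The following pure-Python example shuffles the scope and splits it recursively until the requested depth is reached; `random_partition_tree` and `leaf_scopes` are hypothetical helpers, and the exact splitting strategy used by RatSPN may differ.

```python
import random


def random_partition_tree(scope, depth, rng, num_splits=2):
    """Recursively split `scope` into random parts; leaves are sorted index lists."""
    if depth == 0 or len(scope) < num_splits:
        return sorted(scope)
    shuffled = list(scope)
    rng.shuffle(shuffled)
    parts = [shuffled[i::num_splits] for i in range(num_splits)]
    return [random_partition_tree(part, depth - 1, rng, num_splits) for part in parts]


def leaf_scopes(tree):
    """Collect the leaf scopes of the nested partition tree."""
    if isinstance(tree[0], int):
        return [tree]
    return [leaf for sub in tree for leaf in leaf_scopes(sub)]


rng = random.Random(0)
tree = random_partition_tree(range(8), depth=2, rng=rng)
leaves = leaf_scopes(tree)
print(leaves)  # four disjoint leaf scopes with two features each
```

Each repetition of a RAT-SPN can be thought of as one such random partition, which is why several repetitions cover different feature groupings.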

It is also possible to use different base distributions for different parts of the scope.

In [18]:
from spflow.modules.leaf import Binomial

depth = 3
n_region_nodes = 10
num_leaves = 10
num_repetitions = 5
n_root_nodes = 1
num_feature = 128

scope_1 = Scope(list(range(0, num_feature//2)))
scope_2 = Scope(list(range(num_feature//2, num_feature)))

leaf_layer_1 = Normal(scope=scope_1, out_channels=5)
leaf_layer_2 = Binomial(scope=scope_2, out_channels=5, n=torch.tensor(2))

model = RatSPN(
    leaf_modules=[leaf_layer_1,leaf_layer_2],
    n_root_nodes=n_root_nodes,
    n_region_nodes=n_region_nodes,
    num_repetitions=num_repetitions,
    depth=depth,
    outer_product=True,
    split_halves=True,
)


Another structure that is already implemented is LearnSPN, based on ... .
In contrast to the RAT-SPN, this approach learns the SPN architecture from the given data.
The following hyperparameters define the structure of the model:
- leaf_modules: A list of leaf modules that define the base distributions and their corresponding scopes,
- out_channels: The number of output channels of each sum layer,
- min_features_slice: The number of features at which partitioning is no longer performed,
- min_instances_slice: The number of instances at which clustering is no longer performed,
- clustering_method: The clustering method (default: kmeans),
- partitioning_method: The partitioning method (default: rdc).

In [19]:
from sklearn.datasets import make_moons
from spflow.learn.learn_spn import learn_spn

# example on the make_moons dataset
torch.manual_seed(0)
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)

scope = Scope(list(range(2)))
normal_layer = Normal(scope=scope, out_channels=4)

spn = learn_spn(
    torch.tensor(X, dtype=torch.float32),
    leaf_modules=normal_layer,
    out_channels=1,
    min_instances_slice=70,
    min_features_slice = 2
)



used 6 iterations (0.0152s) to cluster 1000 items into 2 clusters
used 5 iterations (0.0s) to cluster 497 items into 2 clusters
used 2 iterations (0.0s) to cluster 276 items into 2 clusters
used 6 iterations (0.0041s) to cluster 101 items into 2 clusters
used 3 iterations (0.0s) to cluster 175 items into 2 clusters
used 6 iterations (0.0096s) to cluster 101 items into 2 clusters


used 8 iterations (0.0065s) to cluster 221 items into 2 clusters
used 8 iterations (0.0065s) to cluster 128 items into 2 clusters
used 7 iterations (0.0046s) to cluster 105 items into 2 clusters
used 4 iterations (0.01s) to cluster 93 items into 2 clusters
used 6 iterations (0.0015s) to cluster 503 items into 2 clusters
used 6 iterations (0.0067s) to cluster 208 items into 2 clusters
used 6 iterations (0.0071s) to cluster 97 items into 2 clusters
used 4 iterations (0.0s) to cluster 111 items into 2 clusters
used 5 iterations (0.0s) to cluster 295 items into 2 clusters
used 5 iterations (0.009s) to cluster 184 items into 2 clusters
used 4 iterations (0.0s) to cluster 75 items into 2 clusters
used 2 iterations (0.0046s) to cluster 109 items into 2 clusters
used 6 iterations (0.0s) to cluster 111 items into 2 clusters
