Sum-product networks (SPNs) are a deep architecture for modeling joint distributions that supports tractable exact integration. Bayesian networks (BNs) allow for direct modeling of independencies and relationships between variables, but integrating them (say, to normalize) is generally intractable. In this thesis, we propose an approach for compiling BNs into SPNs, that is, capturing the distribution of one model in the other to leverage the advantages of both frameworks. By doing so we achieve tractable integration in an SPN that simultaneously captures a BN's relationships and independencies.
This work investigates and motivates a methodology for compiling Bayesian networks with the structure X → Y into SPNs. We then present the compilation of complex yet tractable linear Gaussian probabilistic graphical models (LGPGMs) into SPNs, providing a comparative analysis of the resulting representations. Following these results, we present a generalized algorithm and framework for compiling any BN whose continuous variables are bounded to a finite interval into an SPN.
Compiling (translating) a Bayesian network into an SPN is essentially asking the question: how can we choose the function
The faithful Bayesian network
If we restrict ourselves to one-dimensional leaves¹, it should be clear that we cannot construct
because our grammar restricts us from multiplying dependent terms (because of the partition requirement). The right-hand side of the product will always depend on both
Have we failed? No, we just have to lower our expectations. We have proven that we cannot exactly translate even a simple, two-node Bayesian network into an SPN; we will have to make do with an approximation.
One helpful takeaway from the previous section is that we cannot capture covariances with the product node. Since we refuse to offload all the work onto the leaf nodes, we will have to use sum nodes to induce correlation between our variables.
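To see that sum nodes can do this, here is a minimal sketch in plain NumPy (independent of any library in this thesis): an equal-weight mixture of two products of independent unit Gaussians, centered at (−2, −2) and (2, 2). Within each component the two variables are independent, yet the mixture as a whole has covariance m² = 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two mixture components, each a *product* of independent unit Gaussians,
# centered at (-m, -m) and (+m, +m), with equal weight 1/2.
m = 2.0
comp = rng.integers(0, 2, size=100_000)          # which component each sample comes from
centers = np.where(comp[:, None] == 0, -m, m)    # (-m, -m) or (+m, +m)
samples = centers + rng.standard_normal((100_000, 2))

# Analytically: Cov(X, Y) = E[XY] - E[X]E[Y] = m^2 = 4 and Var(X) = 1 + m^2 = 5,
# even though X and Y are independent *within* each component.
print(np.cov(samples, rowvar=False))
```

The correlation here comes entirely from the sum node's mixing, which is exactly the mechanism the translation procedure relies on.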
Let's motivate our final chosen approach by starting off with the following observation, where $\{I_1, \ldots, I_k\}$ is a finite partition of the domain of $X$:

$$p(x, y) = p(x)\,p(y \mid x) \approx \sum_{i=1}^{k} P(X \in I_i)\; p(x \mid X \in I_i)\; p(y \mid X \in I_i),$$

where each conditional $p(\,\cdot \mid X \in I_i)$ is obtained by integrating over the cell $I_i$. Since any $x$ lies in exactly one cell, only one term of the sum is active at a given point, and within that cell $X$ and $Y$ are treated as independent.
The above expression is fully compatible with the SPN framework, as it is a finite mixture of products of independent leaf nodes. As we increase the cardinality of our partition, we get better point-wise precision. The only assumption used is that a finite partition of the domain of X exists. This highlights another limitation of the translation procedure: the resulting SPN can never induce a distribution with infinite support.
The slopyform distribution
Defining an LGPGM is pretty intuitive.
from lgpgm import noise
A = "A" @ noise # noise is standard Gaussian noise (mean 0, variance 1), drawn fresh at each use. The @ operator assigns a name to the variable.
B = "B" @ (A + 1.2*noise + 1) # They can be composed with other variables, and more noise can be added.
C = "C" @ (A + noise + 4)
D = "D" @ (B + 0.3*C + noise - 5)
K = "K" @ (noise - 6)
J = "J" @ (0.5 * K - 3 + 0.2*noise)
A & K # Here, we join the two pgms, which have no connecting edges between them.
display(A.get_graph(detailed=False))
display(A.get_graph(detailed=True))
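For reference, the joint distribution implied by a linear Gaussian model is available in closed form. The sketch below (plain NumPy; it assumes the structural equations above can be read off literally, with each occurrence of `noise` a fresh standard normal) computes the covariance via Σ = (I − W)⁻¹ D (I − W)⁻ᵀ, and should agree with the covariance `A.get_Σ()` reports below.

```python
import numpy as np

# Structural equations read off the model above (noise terms i.i.d. N(0, 1)):
#   A = eA;  B = A + 1.2 eB + 1;  C = A + eC + 4
#   D = B + 0.3 C + eD - 5;  K = eK - 6;  J = 0.5 K + 0.2 eJ - 3
names = ["A", "B", "C", "D", "K", "J"]
W = np.zeros((6, 6))             # W[i, j] = coefficient of names[j] in names[i]
W[1, 0] = 1.0                    # B <- A
W[2, 0] = 1.0                    # C <- A
W[3, 1], W[3, 2] = 1.0, 0.3      # D <- B, C
W[5, 4] = 0.5                    # J <- K
noise_scale = np.array([1.0, 1.2, 1.0, 1.0, 1.0, 0.2])

# X = W X + b + S eps  =>  Cov(X) = (I - W)^-1 diag(s^2) (I - W)^-T
M = np.linalg.inv(np.eye(6) - W)
Sigma = M @ np.diag(noise_scale**2) @ M.T
print(np.round(Sigma, 2))
```

Note the zero block between {A, B, C, D} and {K, J}: joining two PGMs with no connecting edges leaves them uncorrelated.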
We can construct an SPN approximating this multivariate normal with the following code.
import spnhelp
spn = spnhelp.lgpgm_to_spn(A, crit=spnhelp.CRIT_bounded_deviation, crit_param=0.1, sloped=False) # crit_param (the error bound) can be lowered to get a better approximation
spnhelp.plot_marginals(spn, A)
print(get_number_of_nodes(spn), "nodes, with depth of", get_depth(spn))
Now, let's try and sample from it to see how well it captures the covariance:
import numpy as np
import pandas as pd

scope = A.get_scope(across_factors=True)
print(K.get_Σ()) # printing the true covariance, queried via K
print(A.get_Σ()) # printing the true covariance, queried via A (the same joint after the join)
samples = spnhelp.sample_from_spn(spn, 10000) # sampling from the spn
cov = np.cov(samples, rowvar=False).round(1) # computing covariance of the samples
print(pd.DataFrame(cov, index=scope, columns=scope))
The covariance is pretty close to the true covariance. It would be closer with a lower error bound, but this is just a toy example, and the SPN grows exponentially in size as the bound decreases. The mean is captured perfectly (not shown here).
One can approximate Gaussians by a mixture of uniforms or slopyforms, each with disjoint support. A slopyform is a uniform-like distribution whose density is a linear function of the input. The library allows one to specify error bounds that the approximation must fall within; two kinds are shown below.
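A minimal sketch of a slopyform density (an illustrative stand-in, not the library's actual implementation): it is linear on its support [a, b], determined by its endpoint densities, and normalized exactly when the trapezoid under the line has area 1.

```python
import numpy as np

def slopyform_pdf(x, a, b, fa, fb):
    """Density that is linear on [a, b], going from f(a) = fa to f(b) = fb.

    Normalized exactly when the trapezoid area (fa + fb) * (b - a) / 2 is 1.
    An illustrative sketch, not the thesis library's implementation.
    """
    x = np.asarray(x, dtype=float)
    t = (x - a) / (b - a)                 # relative position within the support
    dens = fa + (fb - fa) * t             # linear interpolation of the density
    return np.where((x >= a) & (x <= b), dens, 0.0)

# A slopyform on [0, 2] rising from density 0.1 to density 0.9:
a, b, fa, fb = 0.0, 2.0, 0.1, 0.9
assert np.isclose((fa + fb) * (b - a) / 2, 1.0)   # normalization check

xs = np.linspace(-1.0, 3.0, 4001)
mass = slopyform_pdf(xs, a, b, fa, fb).sum() * (xs[1] - xs[0])
print(mass)   # numerically close to 1
```

Because the density can slope within each cell, a mixture of slopyforms can track a Gaussian's curvature with fewer components than a mixture of flat uniforms.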
Multiplying two of these approximations together gives us our first simple SPN of a factorized model, using the following Bayesian network:
A = "A" @ noise
B = "B" @ noise
A & B
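A hand-rolled sketch of this factorized model (assuming standard-normal marginals; `approx_pdf` is a hypothetical helper, not part of the library): each marginal is a mixture of 40 disjoint uniforms, and the joint density is a single product node over the two marginals.

```python
import numpy as np
from math import erf

def gauss_cdf(x):
    return 0.5 * (1 + erf(x / np.sqrt(2)))

# Approximate N(0, 1) by a mixture of disjoint uniforms: component i has
# weight P(X in I_i) and constant density w_i / |I_i| on its cell.
edges = np.linspace(-4.0, 4.0, 41)
w = np.diff([gauss_cdf(e) for e in edges])

def approx_pdf(x):
    i = np.clip(np.searchsorted(edges, x) - 1, 0, len(w) - 1)
    inside = (x >= edges[0]) & (x <= edges[-1])
    return np.where(inside, w[i] / np.diff(edges)[i], 0.0)

# The factorized "SPN": one product node over the two independent marginals.
joint = lambda x, y: approx_pdf(x) * approx_pdf(y)

true = np.exp(-(0.3**2 + 1.1**2) / 2) / (2 * np.pi)
print(float(joint(0.3, 1.1)), "vs true", true)
```

Since A and B are independent, no sum node is needed at the root; the product of the two marginal approximations is already a valid SPN for the joint.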
The last three are constructed with the slopyform approximation, which can be seen to give much better results.
Let's introduce a dependence between the two variables in the following way; we can see that more components might be needed:
A = "A" @ noise
B = "B" @ (0.5*A + noise)
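To see how the number of components matters, here is a toy measurement (plain NumPy, not the library): approximate A ~ N(0, 1), B = 0.5·A + ε by a k-cell mixture in which A is uniform within each cell and B only sees the cell midpoint. The estimated covariance should approach the true Cov(A, B) = 0.5 as k grows, and is badly off for small k.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
cdf = lambda x: 0.5 * (1 + erf(x / 2**0.5))

def mixture_cov(k, n=200_000):
    # k-cell approximation: within a cell, A is uniform and B ~ N(0.5 * mid, 1),
    # independent of A, so all correlation comes from the sum node's mixing.
    edges = np.linspace(-4.0, 4.0, k + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    w = np.diff([cdf(e) for e in edges])
    w = w / w.sum()
    cells = rng.choice(k, size=n, p=w)
    a = rng.uniform(edges[cells], edges[cells + 1])
    b = 0.5 * mids[cells] + rng.standard_normal(n)
    return np.cov(a, b)[0, 1]          # true model has Cov(A, B) = 0.5

for k in (2, 4, 16, 64):
    print(k, "cells ->", round(mixture_cov(k), 3))
```

With 2 cells the covariance estimate is off by a factor of several; by 64 cells it is within sampling noise of 0.5, illustrating why dependent variables demand finer partitions.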
For more examples of compiling LGPGMs and other interesting Bayesian networks, see the included report (thesis.pdf).
Footnotes
1. Why not just use a single all-encompassing multivariate leaf? You could indeed make one big leaf that captures the entire joint distribution, thereby offloading all the integration problems onto the leaf. In the hope of generalizing to any continuous joint distribution, we refrain from doing this.
2. We call a distribution interesting when there exists some pair of correlated variables. If all variables are independent, the Bayesian network is just a bunch of disjoint nodes, the factorization is uninteresting, and the first SPN that comes to mind is trivial: $M(x_{1:n}) = \prod_{i=1}^n \mathrm{pdf}_{x_i}(x_i)$.