ProbabilisticCircuits.jl offers various parameter and structure learning algorithms for PCs. In this example, we demonstrate how to construct a particular PC structure, the Hidden Chow-Liu Tree (HCLT), and use it to learn a state-of-the-art generative model on MNIST.

We start by importing ProbabilisticCircuits.jl and other required packages:

In [1]:
using ProbabilisticCircuits
using MLDatasets
using CUDA

┌ Info: Precompiling ProbabilisticCircuits [2396afbe-23d7-11ea-1e05-f1aa98e17a44]
└ @ Base loading.jl:1423


We first load the MNIST dataset from MLDatasets.jl and move it to the GPU:

In [2]:
mnist_train_cpu = collect(transpose(reshape(MNIST.traintensor(UInt8), 28*28, :)))
mnist_test_cpu = collect(transpose(reshape(MNIST.testtensor(UInt8), 28*28, :)))
mnist_train_gpu = cu(mnist_train_cpu)
mnist_test_gpu = cu(mnist_test_cpu)
println("Dataset summary:\n - Number of training examples: $(size(mnist_train_cpu, 1))\n - Number of test examples: $(size(mnist_test_cpu, 1))\n - Number of features: $(size(mnist_train_cpu, 2))")

Dataset summary:
 - Number of training examples: 60000
 - Number of test examples: 10000
 - Number of features: 784
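
The reshape-and-transpose step above can be illustrated on a toy tensor. This is a minimal sketch that assumes the image tensor is laid out as (width, height, sample); the names `toy`, `flat`, and `samples` are made up for illustration and are not part of either package.

```julia
# Toy stand-in for an image tensor: three 2×2 "images" stored as (width, height, sample).
toy = reshape(UInt8.(1:12), 2, 2, 3)

# Flatten each image into one column of 4 pixels: a (features, samples) matrix.
flat = reshape(toy, 2 * 2, :)

# Transpose so that rows are samples and columns are features;
# `collect` turns the lazy transpose wrapper into a dense matrix.
samples = collect(transpose(flat))

println(size(samples))   # rows = samples, columns = features
println(samples[1, :])   # the four pixels of the first toy image
```

With the real data, the same steps turn the 60000 training images into a 60000×784 matrix, which matches the dataset summary printed above.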


We move on to generate the HCLT structure. `hclt` constructs a smooth and structured-decomposable PC whose structure depends on the input samples. Specifically, it computes the pairwise mutual information (MI) between the MNIST features (i.e., pixels) and uses the pairwise MI matrix to determine the PC structure, so that highly correlated features are placed "closer" together in the PC to facilitate learning. In the following, `bits` is the number of low-order bits dropped from each pixel value to speed up the pairwise MI computation, and `latents` specifies the size of the generated HCLT. The truncated data is used only to build the structure; parameter learning later uses the full 8-bit pixel values.

In [3]:
bits = 4
latents = 32
println("Generating HCLT structure with $latents latents... ");
trunc_train = cu(mnist_train_cpu .÷ 2^bits)
@time pc = hclt(trunc_train, latents; num_cats = 256, pseudocount = 0.1, input_type = Categorical)
init_parameters(pc; perturbation = 0.4)
println("Number of free parameters: $(num_parameters(pc))")

Generating HCLT structure with 32 latents... 
 24.339145 seconds (81.65 M allocations: 6.668 GiB, 5.43% gc time, 49.14% compilation time)
Number of free parameters: 6980767
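
To make the truncation and the pairwise MI statistic concrete, here is a minimal CPU sketch on toy data. It is not the implementation used inside `hclt`; the helper `mutual_information` and all toy arrays are hypothetical names introduced for illustration. With `bits = 4`, the 256 pixel levels collapse to 16, so each pairwise joint histogram has 16×16 cells instead of 256×256.

```julia
# Dropping the low-order bits, as in `mnist_train_cpu .÷ 2^bits`.
bits = 4
pixels = UInt8[0, 15, 16, 200, 255]
truncated = pixels .÷ 2^bits   # values now lie in 0:15

# Hypothetical helper: empirical mutual information (in nats) between two
# discrete columns whose values lie in 0:(num_cats - 1).
function mutual_information(x, y, num_cats)
    joint = zeros(Float64, num_cats, num_cats)
    for (a, b) in zip(x, y)
        joint[a + 1, b + 1] += 1.0   # shift values to 1-based indices
    end
    joint ./= length(x)              # counts -> joint probabilities
    px = sum(joint; dims = 2)        # marginal distribution of x
    py = sum(joint; dims = 1)        # marginal distribution of y
    mi = 0.0
    for a in 1:num_cats, b in 1:num_cats
        p = joint[a, b]
        if p > 0
            mi += p * log(p / (px[a] * py[b]))
        end
    end
    return mi
end

x       = [0, 0, 1, 1, 0, 0, 1, 1]
y_copy  = copy(x)                    # fully determined by x
y_indep = [0, 1, 0, 1, 0, 1, 0, 1]   # empirically independent of x

mi_copy  = mutual_information(x, y_copy, 2)    # log(2) nats for a fair binary variable
mi_indep = mutual_information(x, y_indep, 2)   # zero for independent columns
```

Pairs of pixels with high MI, like `x` and `y_copy` here, are the ones a Chow-Liu style construction tries to keep adjacent.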


To facilitate efficient parameter learning on GPUs, we first convert `pc` into an equivalent GPU-friendly low-level representation termed a bits-circuit:

In [4]:
print("Moving pc to GPU... ")
CUDA.@time bpc = CuBitsProbCircuit(pc);

Moving pc to GPU...   1.961610 seconds (17.83 M CPU allocations: 1.218 GiB, 7.18% gc time) (7 GPU allocations: 76.784 MiB, 0.00% memmgmt time)


We are now ready to train the parameters of the PC. This is done by calling the high-level API `mini_batch_em`:

In [5]:
num_epochs        = 100
batch_size        = 512
pseudocount       = 0.1
param_inertia     = 0.2 # equivalent to 1 - (mini-batch step size)
param_inertia_end = 0.9 # if specified, param_inertia is annealed linearly toward this value during training

@time mini_batch_em(bpc, mnist_train_gpu, num_epochs; batch_size, pseudocount, 
                    param_inertia, param_inertia_end);

Mini-batch EM iter 1; train LL -892.6227
Mini-batch EM iter 2; train LL -824.05615
Mini-batch EM iter 3; train LL -817.87933
Mini-batch EM iter 4; train LL -814.094
Mini-batch EM iter 5; train LL -810.924
Mini-batch EM iter 6; train LL -808.42163
Mini-batch EM iter 7; train LL -806.13763
Mini-batch EM iter 8; train LL -804.1805
Mini-batch EM iter 9; train LL -802.2172
Mini-batch EM iter 10; train LL -800.28815
Mini-batch EM iter 11; train LL -798.7425
Mini-batch EM iter 12; train LL -797.2427
Mini-batch EM iter 13; train LL -795.4564
Mini-batch EM iter 14; train LL -794.0235
Mini-batch EM iter 15; train LL -792.53564
Mini-batch EM iter 16; train LL -791.11176
Mini-batch EM iter 17; train LL -789.676
Mini-batch EM iter 18; train LL -788.25336
Mini-batch EM iter 19; train LL -786.91315
Mini-batch EM iter 20; train LL -785.5561
Mini-batch EM iter 21; train LL -784.1423
Mini-batch EM iter 22; train LL -782.6588
Mini-batch EM iter 23; train LL -781.3976
Mini-batch EM iter 24; train LL -780.
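
The role of `param_inertia` and its annealing can be sketched as follows. This is an illustration under stated assumptions, not the code inside `mini_batch_em`: the helpers `annealed_inertia` and `blend` are hypothetical, and the library's exact schedule (for example, whether the inertia changes per epoch or per batch) may differ.

```julia
# Hypothetical helper: linearly interpolate the inertia from `start` to `stop`.
function annealed_inertia(start, stop, epoch, num_epochs)
    num_epochs == 1 && return start
    return start + (stop - start) * (epoch - 1) / (num_epochs - 1)
end

# Hypothetical helper: keep a fraction `inertia` of the old parameters and
# move the remaining fraction toward the estimate from the current mini-batch.
blend(old, estimate, inertia) = inertia .* old .+ (1 - inertia) .* estimate

old      = [0.5, 0.5]   # current weights of a toy two-way sum node
estimate = [1.0, 0.0]   # what the current mini-batch alone would suggest

# Early in training the inertia is low, so the update follows the mini-batch closely.
early = blend(old, estimate, annealed_inertia(0.2, 0.9, 1, 100))
# Late in training the inertia is high, so the parameters change only slightly.
late  = blend(old, estimate, annealed_inertia(0.2, 0.9, 100, 100))
```

Under this reading, a small inertia gives large, noisy steps that make fast initial progress, while a large inertia near the end averages over many mini-batches and stabilizes the parameters.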

Now we evaluate the trained PC:

In [6]:
train_ll = loglikelihood(bpc, mnist_train_gpu; batch_size)
test_ll = loglikelihood(bpc, mnist_test_gpu; batch_size)
println("Train_ll: $(train_ll)\nTest LL: $(test_ll)")

Train_ll: -663.66864
Test LL: -672.6025
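
Generative models on images are often compared in bits per dimension (bpd) rather than raw log-likelihood. The conversion below is a sketch that assumes the values printed above are average per-example log-likelihoods in nats; `bits_per_dim` is a hypothetical helper, not a library function.

```julia
# Hypothetical helper: convert an average log-likelihood in nats to bits per dimension.
bits_per_dim(ll_nats, num_features) = -ll_nats / (num_features * log(2))

num_features = 28 * 28   # 784 pixels per MNIST image
train_bpd = bits_per_dim(-663.66864, num_features)
test_bpd  = bits_per_dim(-672.6025, num_features)
println("Train bpd: $(round(train_bpd; digits = 3))\nTest bpd: $(round(test_bpd; digits = 3))")
```

Under that assumption, the numbers above correspond to roughly 1.22 bpd on the training set and 1.24 bpd on the test set.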


Finally, we copy the learned parameters back from the bits-circuit `bpc` to the original PC `pc`:

In [7]:
update_parameters(bpc)