# Learning a Circuit

TODO: explain what are we going to do (high level)

In [1]:
import time
import numpy as np
import matplotlib.pyplot as plt

## Load MNIST Dataset

TODO: stress we can use any library to load data sets, and everything will work

Load the training and test splits of MNIST, and preprocess them by flattening the tensor images.

In [2]:
from torchvision import transforms, datasets
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (255 * x.view(-1)).long())
])
data_train = datasets.MNIST('datasets', train=True, download=True, transform=transform)
data_test = datasets.MNIST('datasets', train=False, download=True, transform=transform)
num_variables = data_train[0][0].shape[0]
height, width = 28, 28
print(f"Number of variables: {num_variables}")

Number of variables: 784
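To make the preprocessing step concrete, the following is a minimal pure-Python sketch of what the transform above does to a single image: it rescales intensities from [0, 1] to the 0-255 range, truncates them to integers (as `.long()` does), and flattens the result. The helper `flatten_and_quantize` and the 2x3 toy image are hypothetical and only serve as an illustration.

```python
def flatten_and_quantize(image):
    """Flatten a 2D image with values in [0, 1] into a list of integers in 0..255.

    Truncation via int() mirrors the `.long()` cast in the transform above.
    """
    return [int(255 * pixel) for row in image for pixel in row]


# A hypothetical 2x3 "image" with intensities in [0, 1]
toy_image = [
    [0.0, 0.5, 1.0],
    [0.25, 0.75, 0.0],
]

flat = flatten_and_quantize(toy_image)
print(f"Number of variables: {len(flat)}")
print(flat)
```

Each entry of the flattened list plays the role of one variable of the circuit, which is why a 28x28 image yields 784 variables.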


## Constructing the Symbolic Circuit Representation

TODO: refactor this, explain symbolic circuits as an intermediate representation before compilation (link section)

We now construct the symbolic circuit representation. Note that this circuit representation is _not_ executable, i.e., you cannot learn it or do inference with it. It will be compiled later, by choosing a backend such as torch.

To do so, we use a circuit template for image data, which constructs the symbolic layers for us. Note that we choose the parameterization at the symbolic level. That is, we guarantee non-negative sum weights by passing them through a softmax function (`sum_weight_param='softmax'`). Moreover, we choose categorical input layers (`input_layer='categorical'`) to model the distribution of pixel values in the 0-255 range. Any option not passed explicitly, such as the weight initialization, is left to the template's defaults.
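As a rough illustration of why such a reparameterization yields valid weights, the sketch below maps unconstrained real parameters to non-negative, normalized weights with a softmax, and to log-probabilities with a log softmax. It uses only the standard library; the function names and the example logits are hypothetical and are not part of cirkit.

```python
import math


def softmax(logits):
    """Map unconstrained parameters to non-negative weights that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Same mapping, but returning log-probabilities directly."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


logits = [-1.0, 0.0, 2.5]  # hypothetical unconstrained parameters
weights = softmax(logits)
log_probs = log_softmax(logits)
print(f"Weights: {weights}")
print(f"Sum of weights: {sum(weights):.6f}")
```

Because the optimizer only ever sees the unconstrained parameters, no projection or clipping step is needed during training.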

In [3]:
from cirkit.templates import circuit_template

In [4]:
symbolic_circuit = circuit_template.image_data(
    (1, 28, 28),
    input_layer='categorical',
    num_input_units=32,
    sum_product_layer='cp',
    num_sum_units=32,
    sum_weight_param='softmax'
)

TODO: discuss structural properties

We can retrieve some information about the circuit and its structural properties as follows.

In [5]:
print(f'Smooth: {symbolic_circuit.is_smooth}')
print(f'Decomposable: {symbolic_circuit.is_decomposable}')
print(f'Number of variables: {symbolic_circuit.num_variables}')
print(f'Number of channels per variable: {symbolic_circuit.num_channels}')

Smooth: True
Decomposable: True
Number of variables: 784
Number of channels per variable: 1
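For intuition about the two properties printed above: a circuit is smooth if the inputs of every sum unit depend on the same set of variables (their scope), and decomposable if the inputs of every product unit have pairwise disjoint scopes. The sketch below checks both conditions on a hand-written toy structure. The representation (each unit as a list of input scopes) and the function names are hypothetical and unrelated to how cirkit stores circuits.

```python
def is_smooth(sum_units):
    """Smooth: all inputs of each sum unit have the same scope."""
    return all(len({frozenset(scope) for scope in inputs}) == 1 for inputs in sum_units)


def is_decomposable(product_units):
    """Decomposable: inputs of each product unit have pairwise disjoint scopes."""
    for inputs in product_units:
        seen = set()
        for scope in inputs:
            if seen & set(scope):
                return False  # a variable appears in two inputs of the same product
            seen |= set(scope)
    return True


# Toy circuit over variables {0, 1}: two products of single-variable inputs,
# mixed by one sum unit whose inputs both have scope {0, 1}.
product_units = [[{0}, {1}], [{0}, {1}]]
sum_units = [[{0, 1}, {0, 1}]]

print(f"Smooth: {is_smooth(sum_units)}")
print(f"Decomposable: {is_decomposable(product_units)}")
```

Together, these two properties are what make quantities such as the partition function computable in a single pass over the circuit.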


## Compiling the Symbolic Circuit

TODO: explain compilation procedure, we choose the torch backend

We are ready to compile the symbolic circuit constructed above into an executable one that we can learn and do inference with. To do so, we have to choose a compilation backend. In this case, we choose torch as the backend.

In [6]:
import random
import numpy as np
import torch
random.seed(42)
np.random.seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # Use the GPU if available, otherwise the CPU
torch.manual_seed(42)
torch.cuda.manual_seed(42)

Compilation is governed by a circuit pipeline context, which specifies the backend to be used as well as optional compilation flags, e.g., whether to fold the circuit or which inference semiring to use. Here we do not instantiate a context explicitly: we simply call `compile` on the symbolic circuit, which falls back to the default settings.

In [7]:
from cirkit.pipeline import compile

In [13]:
%%time
circuit = compile(symbolic_circuit)

# Print some statistics
print(f"Number of layers: {len(list(symbolic_circuit.layers))}")
num_parameters = sum(p.numel() for p in circuit.parameters() if p.requires_grad)
print(f"Number of learnable parameters: {num_parameters}")

Number of layers: 2097
Number of learnable parameters: 7491712
CPU times: user 600 μs, sys: 0 ns, total: 600 μs
Wall time: 493 μs
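The inference semiring mentioned above determines how sums and products are evaluated. As a hedged illustration, the sketch below assumes a log-space semiring, a common choice for probabilistic circuits in which products become additions and sums become log-sum-exp operations. Whether this matches the default used by the compiled circuit is an assumption, and the function names are hypothetical.

```python
import math


def log_sum_unit(log_inputs, weights):
    """Weighted sum in log space: log(sum_i w_i * exp(l_i)), computed stably."""
    terms = [math.log(w) + l for w, l in zip(weights, log_inputs)]
    m = max(terms)  # factor out the largest term to avoid underflow
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_product_unit(log_inputs):
    """Product in log space is a plain sum of the log inputs."""
    return sum(log_inputs)


# Very small probabilities, as they arise when multiplying 784 pixel densities
log_inputs = [-1000.0, -1001.0]
weights = [0.3, 0.7]

log_out = log_sum_unit(log_inputs, weights)
naive = weights[0] * math.exp(log_inputs[0]) + weights[1] * math.exp(log_inputs[1])
print(f"Log-space result: {log_out:.3f}")
print(f"Naive result in linear space: {naive}")
```

The naive computation underflows to zero, while the log-space version keeps a usable value, which is why circuits over many variables are typically evaluated this way.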


## Training and Testing

TODO: refactor this comment, stress the user can choose any optimizer

We are now ready to learn the parameters and do inference. First, we wrap our data into PyTorch data loaders by specifying the batch size. Then, we initialize any PyTorch optimizer, e.g., Adam in this case.

In [11]:
from torch import optim
from torch.utils.data import DataLoader
train_dataloader = DataLoader(data_train, shuffle=True, batch_size=256, drop_last=True, num_workers=4)
test_dataloader = DataLoader(data_test, shuffle=False, batch_size=256, num_workers=4)
optimizer = optim.Adam(circuit.parameters(), lr=0.01)

In [12]:
# Move circuit to device
circuit = circuit.to(device)


In [10]:
start_time = time.perf_counter()
num_epochs = 3
step_idx = 0
running_loss = 0.0
for epoch_idx in range(num_epochs):
    for i, (batch, _) in enumerate(train_dataloader):
        batch = batch.to(device).unsqueeze(dim=1)   # Add a channel dimension
        log_likelihoods = circuit(batch)            # Compute the log output of the circuit
        loss = -torch.mean(log_likelihoods)         # The loss is the negative average log-likelihood
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        running_loss += loss.detach() * len(batch)
        step_idx += 1
        if step_idx % 100 == 0:
            print(f"Step {step_idx}: Average NLL: {running_loss / (100 * len(batch)):.3f}")
            running_loss = 0.0
end_time = time.perf_counter()
print(f"Training time: {end_time - start_time:.1f} seconds")


We then evaluate our model on test data by computing the average log-likelihood and bits per dimension.

In [None]:
circuit.eval()

with torch.no_grad():
    test_lls = 0.0
    for batch, _ in test_dataloader:
        batch = batch.to(device).unsqueeze(dim=1)   # Add a channel dimension
        # As in the training loop, the circuit output is taken to be the log-likelihood.
        # This assumes the circuit is normalized (categorical inputs, softmax sum weights),
        # so that its log partition function is zero.
        lls = circuit(batch)
        test_lls += lls.sum().item()
    average_ll = test_lls / len(data_test)
    bpd = -average_ll / (num_variables * np.log(2.0))
    print(f"Average test LL: {average_ll:.3f}")
    print(f"Bits per dimension: {bpd:.3f}")
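As a sanity check on the bits-per-dimension formula used above, the sketch below applies it to a uniform distribution over the 256 possible values of each pixel, which should give exactly 8 bits per dimension. The helper name `bits_per_dimension` is hypothetical; the formula is the same as in the evaluation cell.

```python
import math


def bits_per_dimension(average_ll, num_variables):
    """Convert an average log-likelihood in nats into bits per dimension."""
    return -average_ll / (num_variables * math.log(2.0))


num_pixels = 784
# Log-likelihood of any image under independent uniform pixels over 256 values
uniform_ll = num_pixels * math.log(1.0 / 256.0)
uniform_bpd = bits_per_dimension(uniform_ll, num_pixels)
print(f"Uniform baseline LL: {uniform_ll:.3f}")
print(f"Uniform baseline BPD: {uniform_bpd:.3f}")
```

Any trained model should score below this baseline, so it is a quick way to spot a sign error or a wrong normalization in the evaluation code.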

TODO: show people we can do marginals, use integrate in cirkit.pipeline