# Conditional Circuits

Let's assume we want to parameterize a circuit by means of a neural network, i.e., build and learn a _conditional circuit_. We can do so in cirkit in three steps:
1. we instantiate the symbolic circuit we want to parameterize;
2. we call a functional that takes the symbolic circuit and returns another one that contains the additional information for the parameterization we want;
3. we compile the conditional symbolic circuit, after first registering the parameterization with the compiler.
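Before turning to cirkit, the idea can be sketched in plain PyTorch on a toy circuit. It has a single sum unit over two Bernoulli input units, and its mixture weights are produced separately for each batch element by a gating function of a conditioning input `z`. The names `gate_function` and `conditional_mixture_loglik` and the Bernoulli probabilities are hypothetical choices made for illustration. This is not how cirkit implements conditional circuits internally.

```python
import torch

def gate_function(z: torch.Tensor) -> torch.Tensor:
    # Hypothetical gating function: maps each conditioning vector z[b]
    # to the 2 weights of a single sum unit (one set of weights per batch element)
    logits = torch.stack([z.mean(dim=-1), -z.mean(dim=-1)], dim=-1)  # (B, 2)
    return torch.softmax(logits, dim=-1)                              # each row sums to 1

def conditional_mixture_loglik(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # Two fixed Bernoulli input units over one binary variable x (hypothetical values)
    probs = torch.tensor([0.9, 0.2])                                  # p(x = 1) per input unit
    comp_lik = torch.where(x.unsqueeze(-1) == 1, probs, 1.0 - probs)  # (B, 2)
    weights = gate_function(z)                                        # (B, 2)
    # Sum unit: log sum_k w_k(z) * p_k(x), using the weights of each batch element
    return torch.log((weights * comp_lik).sum(dim=-1))                # (B,)

x = torch.tensor([1, 0, 1])
z = torch.randn(3, 8)
print(conditional_mixture_loglik(x, z))
```

Learning such a model amounts to training the gating function, for example a neural network, rather than a fixed set of sum weights.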

We start by instantiating a symbolic circuit on MNIST images.

In [1]:
import sys
sys.path.insert(0, "../")

In [2]:
from cirkit.templates import data_modalities, utils

symbolic_circuit = data_modalities.image_data(
    (1, 28, 28),                 # The shape of MNIST image, i.e., (num_channels, image_height, image_width)
    region_graph='quad-tree-4',  # Select the structure of the circuit to follow the QuadTree-4 region graph
    input_layer='categorical',   # Use Categorical distributions for the pixel values (0-255) as input layers
    num_input_units=64,          # Each input layer consists of 64 Categorical input units
    sum_product_layer='cp',      # Use CP sum-product layers, i.e., alternate dense layers with Hadamard product layers
    num_sum_units=64,            # Each dense sum layer consists of 64 sum units
)

Note that we did not specify any parameterization for the sum layer parameters and the logits of the Categorical input layers.

Then, we call the functional `cirkit.symbolic.functional.condition_circuit` to obtain another symbolic circuit that stores the additional information on how we want to parameterize it. Here we group all the sum layers of the circuit under the name `"sum-layers"`.

In [3]:
import cirkit.symbolic.functional as SF

# Group all the sum layers of the circuit under the name "sum-layers"
parametrization_map = {
    "sum-layers": list(symbolic_circuit.sum_layers)
}

# The gating functions themselves are registered later, at compile time
symbolic_conditional_circuit = SF.condition_circuit(
    symbolic_circuit,                          # The symbolic circuit we want to parameterize
    gate_functions=parametrization_map         # Group names mapped to the layers they parameterize
)

The conditional symbolic circuit also stores, in its `gate_function_specs` attribute, the shapes of the tensors that the gating functions must return.

In [4]:
# The conditional circuit stores the shape specification of the tensors each gating function must return
gf_specs = symbolic_conditional_circuit.gate_function_specs
gf_specs

{'sum-layers.weight.0': (1048, 64, 64), 'sum-layers.weight.1': (1, 1, 64)}
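As a quick worked example, the specifications above can be turned into the output shape expected for a batch of conditioning inputs and the number of parameters each gating function produces per sample. Prepending a batch dimension to each spec is an assumption taken from the test cell below, which builds the shape as `(3, *spec)`.

```python
import math

# Shape specification copied from the output above
gf_specs = {'sum-layers.weight.0': (1048, 64, 64), 'sum-layers.weight.1': (1, 1, 64)}
batch_size = 10

for name, shape in gf_specs.items():
    expected = (batch_size, *shape)   # assumed layout: (batch, *spec)
    print(name, expected, math.prod(shape), "parameters per sample")
# sum-layers.weight.0 (10, 1048, 64, 64) 4292608 parameters per sample
# sum-layers.weight.1 (10, 1, 1, 64) 64 parameters per sample
```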

Before compiling our conditional circuit, we define the gating functions. A gating function can be any arbitrary function, as long as it outputs a tensor whose shape is compatible with the corresponding entry of the gating function specifications.

Note that the gating function is responsible for providing valid parameters. For instance, we have to make sure that the weights of each sum unit sum to $1$.
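Both requirements, the shape and the normalization, can be checked on a candidate output before registering a gating function. The helper `check_gate_output` below is hypothetical and not part of cirkit. It assumes the output has shape `(batch_size, *spec)` and that the last axis indexes the inputs of each sum unit, which matches the `softmax(..., dim=-1)` and `sum(dim=-1)` used in the next cell.

```python
import torch

def check_gate_output(output: torch.Tensor, spec: tuple, batch_size: int, atol: float = 1e-5) -> None:
    # Hypothetical helper: raises ValueError if `output` is not a valid batch of sum weights
    expected = (batch_size, *spec)
    if tuple(output.shape) != expected:
        raise ValueError(f"expected shape {expected}, got {tuple(output.shape)}")
    if (output < 0).any():
        raise ValueError("sum weights must be non-negative")
    sums = output.sum(dim=-1)
    if not torch.allclose(sums, torch.ones_like(sums), atol=atol):
        raise ValueError("sum weights must sum to 1 along the last axis")

spec = (1, 1, 64)                                   # spec of 'sum-layers.weight.1' above
raw = torch.randn(3, *spec)
check_gate_output(torch.softmax(raw, dim=-1), spec, batch_size=3)  # softmax output passes
try:
    check_gate_output(raw, spec, batch_size=3)      # raw Gaussian samples are not valid weights
except ValueError as err:
    print("rejected:", err)
```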

Let's first parameterize the sum layers of the circuit by randomly sampling their weights and normalizing them with a softmax. To do so, we define a gating function that takes as input an external tensor, say `z`, and outputs tensors whose shapes are compatible with the specifications.

In [5]:
import torch

def random_sum_weights(shape, z: torch.Tensor):
    # compute the mean and standard deviation of each element in the batch, over its features
    mean, stddev = torch.mean(z, dim=-1), torch.std(z, dim=-1)
    # compute weights by randomly sampling
    samples = torch.randn(*shape)
    weight = mean.view(-1, 1, 1, 1) + stddev.view(-1, 1, 1, 1) * samples
    # normalize weights using softmax
    return torch.softmax(weight, dim=-1)

# test that the function outputs proper weights
weights = random_sum_weights(
    (3, *symbolic_conditional_circuit.gate_function_specs["sum-layers.weight.0"]), 
    torch.randn(3, 256)
)

print("Weights shape:", weights.shape)
print("Weights are normalized:", weights.sum(dim=-1).allclose(torch.tensor([1.0])))

Weights shape: torch.Size([3, 1048, 64, 64])
Weights are normalized: True
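Random weights only illustrate the mechanics. To actually learn a conditional circuit, the gating function would typically be a neural network. Below is a minimal sketch of such a gating function for the small `(1, 1, 64)` group. The class `NeuralSumWeights` and its MLP architecture are hypothetical. We also assume that any callable with the same `(shape, z)` signature as `random_sum_weights` can be registered, although this notebook only shows plain functions.

```python
import math
import torch
from torch import nn

class NeuralSumWeights(nn.Module):
    """Hypothetical neural gating function (not part of cirkit): a small MLP that
    maps each conditioning vector z[b] to sum weights of shape `spec`."""

    def __init__(self, in_features: int, spec: tuple, hidden: int = 32):
        super().__init__()
        self.spec = tuple(spec)
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, math.prod(self.spec)),
        )

    def forward(self, shape, z: torch.Tensor) -> torch.Tensor:
        # `shape` mirrors the signature of random_sum_weights(shape, z); we only
        # check that its trailing dimensions match the spec the MLP was built for
        assert tuple(shape)[-len(self.spec):] == self.spec
        logits = self.net(z).view(z.shape[0], *self.spec)
        # softmax over the last axis, as in random_sum_weights, so weights sum to 1
        return torch.softmax(logits, dim=-1)

spec = (1, 1, 64)                         # spec of 'sum-layers.weight.1'
gate = NeuralSumWeights(in_features=256, spec=spec)
nn_weights = gate((3, *spec), torch.randn(3, 256))
print("Weights shape:", nn_weights.shape)  # torch.Size([3, 1, 1, 64])

# Unlike random_sum_weights, the output is differentiable w.r.t. the MLP parameters
nn_weights.log().sum().backward()
print("Gradient available:", gate.net[0].weight.grad is not None)
```

For the `(1048, 64, 64)` group, a dense output layer like this one would need 4,292,608 outputs per sample, so a cheaper output head would likely be required there.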


We can now register the gating functions with the compilation pipeline, which compiles the conditional circuit, keeps track of which function to call for each group of parameters, and executes them efficiently.
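The bookkeeping described above can be illustrated with a toy registry. Each name maps to a callable, and at evaluation time each callable receives its shape spec together with its own keyword arguments, mirroring the `gate_function_kwargs` dictionary used later in this notebook. `GateFunctionRegistry` is a hypothetical stand-in for illustration only. It is not the implementation of cirkit's `PipelineContext`.

```python
from typing import Any, Callable, Dict

class GateFunctionRegistry:
    # Hypothetical registry: gate function names mapped to callables
    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def add(self, name: str, fn: Callable[..., Any]) -> None:
        if name in self._functions:
            raise KeyError(f"gate function {name!r} already registered")
        self._functions[name] = fn

    def evaluate(self, specs: Dict[str, tuple], kwargs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
        # Every spec entry must have a registered function
        missing = set(specs) - set(self._functions)
        if missing:
            raise KeyError(f"no gate function registered for {sorted(missing)}")
        # Call each function with its shape spec and its own keyword arguments
        return {name: self._functions[name](specs[name], **kwargs.get(name, {})) for name in specs}

registry = GateFunctionRegistry()
registry.add("sum-layers.weight.1", lambda shape, scale: [scale] * shape[-1])
params = registry.evaluate({"sum-layers.weight.1": (1, 1, 4)}, {"sum-layers.weight.1": {"scale": 0.25}})
print(params)  # {'sum-layers.weight.1': [0.25, 0.25, 0.25, 0.25]}
```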

In [13]:
from cirkit.pipeline import PipelineContext

# Initialize a pipeline compilation context
# Let's try _without_ folding first
ctx = PipelineContext(semiring="lse-sum", backend='torch', fold=False, optimize=False)

# Register a gating function for each entry of the specifications
ctx.add_gate_function("sum-layers.weight.0", random_sum_weights)
ctx.add_gate_function("sum-layers.weight.1", random_sum_weights)

# Finally, we compile the conditional circuit
circuit = ctx.compile(symbolic_conditional_circuit)

We then evaluate the conditional circuit, passing the keyword arguments of each gating function through `gate_function_kwargs`.

In [15]:
x = torch.randint(256, size=(10, 784))  # The circuit input
z = torch.randn(size=(10, 127))  # A dummy conditioning input for the gating functions

# Evaluate the circuit on some input
# Note that we also pass the conditioning input z to each gating function
circuit(x, gate_function_kwargs={'sum-layers.weight.0': {'z': z}, 'sum-layers.weight.1': {'z': z}})

tensor([[[-4357.7739]],

        [[-4355.7095]],

        [[-4356.8057]],

        [[-4352.9370]],

        [[-4356.2651]],

        [[-4353.3125]],

        [[-4371.3247]],

        [[-4357.3862]],

        [[-4361.4160]],

        [[-4363.3970]]], grad_fn=<TransposeBackward0>)
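Since every gating function here receives the same `z`, the keyword dictionary typed out in the call above can also be built from the keys of the specifications. This is a small convenience sketch. The resulting dictionary is the same one passed above and could be used as `circuit(x, gate_function_kwargs=gate_function_kwargs)`.

```python
import torch

# Shape specification as printed earlier by `symbolic_conditional_circuit.gate_function_specs`
gf_specs = {'sum-layers.weight.0': (1048, 64, 64), 'sum-layers.weight.1': (1, 1, 64)}
z = torch.randn(size=(10, 127))

# One entry per gating function, all sharing the same conditioning tensor z
gate_function_kwargs = {name: {'z': z} for name in gf_specs}
print(sorted(gate_function_kwargs))  # ['sum-layers.weight.0', 'sum-layers.weight.1']
```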

The above parameterization is robust to changes in the compilation flags, e.g., enabling folding and layer optimizations.

In [16]:
# Enable folding and layer optimizations
ctx = PipelineContext(semiring="lse-sum", backend='torch', fold=True, optimize=True)

ctx.add_gate_function("sum-layers.weight.0", random_sum_weights)
ctx.add_gate_function("sum-layers.weight.1", random_sum_weights)

circuit = ctx.compile(symbolic_conditional_circuit)

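We evaluate it exactly as before. The values differ from the previous run, as expected, since `random_sum_weights` samples new random weights at every call.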

In [18]:
circuit(x, gate_function_kwargs={'sum-layers.weight.0': {'z': z}, 'sum-layers.weight.1': {'z': z}})

tensor([[[-4361.3394]],

        [[-4360.3687]],

        [[-4360.0630]],

        [[-4359.3193]],

        [[-4358.3257]],

        [[-4350.2729]],

        [[-4351.1704]],

        [[-4351.4927]],

        [[-4362.0557]],

        [[-4356.7437]]], grad_fn=<TransposeBackward0>)
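One way to see why folding does not interfere with per-sample weights is the following sketch. In general, folding evaluates many layers of the same shape with one batched tensor operation instead of a Python loop over layers. The exact tensor layout cirkit uses for folded layers is not shown here. The dimensions `B`, `F`, `K_out` and `K_in` are hypothetical, and the inputs are kept in linear space for simplicity.

```python
import torch

B, F, K_out, K_in = 4, 3, 5, 6                                    # batch, layers, output/input units
weights = torch.softmax(torch.randn(B, F, K_out, K_in), dim=-1)  # per-sample weights from a gating function
inputs = torch.rand(B, F, K_in)                                  # inputs to each of the F sum layers

# Unfolded: evaluate each layer separately, then stack the results
unfolded = torch.stack(
    [torch.einsum('boi,bi->bo', weights[:, f], inputs[:, f]) for f in range(F)], dim=1
)

# Folded: a single batched contraction over all layers at once
folded = torch.einsum('bfoi,bfi->bfo', weights, inputs)

print(torch.allclose(unfolded, folded))  # True
```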

The conditional parameterization is batch-dependent: each element in the batch gets its own, independent parameterization. To see this, let's change the gating function for the sum layers so that it sets all the sum weights to zero for the first element of the batch only. Intuitively, the circuit should then produce a *strange* likelihood for the first element and behave regularly on the others.

In [26]:
def random_sum_weights_zero_first_sample(shape, z: torch.Tensor):
    # compute the mean and standard deviation of each element in the batch, over its features
    mean, stddev = torch.mean(z, dim=-1), torch.std(z, dim=-1)
    # compute weights by randomly sampling
    samples = torch.randn(*shape)
    weight = mean.view(-1, 1, 1, 1) + stddev.view(-1, 1, 1, 1) * samples
    # normalize weights using softmax
    weight = torch.softmax(weight, dim=-1)
    
    # set first element in batch to 0
    weight[0] = 0.0

    return weight

# register the new gate function and compile the circuit
ctx = PipelineContext(semiring="lse-sum", backend='torch', fold=True, optimize=True)
ctx.add_gate_function("sum-layers.weight.0", random_sum_weights_zero_first_sample)
ctx.add_gate_function("sum-layers.weight.1", random_sum_weights_zero_first_sample)
circuit = ctx.compile(symbolic_conditional_circuit)

# run the circuit on the same dummy inputs
circuit(x, gate_function_kwargs={'sum-layers.weight.0': {'z': z}, 'sum-layers.weight.1': {'z': z}})

tensor([[[      -inf]],

        [[-4359.3535]],

        [[-4359.1943]],

        [[-4360.5337]],

        [[-4359.6562]],

        [[-4358.2544]],

        [[-4357.5913]],

        [[-4352.2979]],

        [[-4360.2314]],

        [[-4361.7720]]], grad_fn=<TransposeBackward0>)

Indeed, the log-likelihood of the first element in the batch is $-\infty$ (i.e., its likelihood is zero), while the other elements are unaffected.
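The $-\infty$ can be traced through a small computation. Mathematically, in the log-sum-exp semiring a sum unit with weights $w_k$ and input log-values $a_k$ computes $\log \sum_k w_k \exp(a_k)$, and a product unit adds the log-values of its inputs. With all-zero weights every term is $\log 0 = -\infty$, so the sum unit outputs $-\infty$, and adding anything finite in the product units keeps it at $-\infty$ up to the root. The numbers below are made up for illustration. We do not claim that cirkit evaluates its layers with these exact operations.

```python
import torch

a = torch.tensor([-3.0, -1.5, -2.0])        # log-values of the inputs to one sum unit
w = torch.zeros(3)                          # all-zero weights, as for the first batch element
w_ok = torch.softmax(torch.zeros(3), dim=0)  # uniform weights, for comparison

# Sum unit in log space: log sum_k w_k * exp(a_k) = logsumexp(log w_k + a_k)
dead = torch.logsumexp(torch.log(w) + a, dim=0)
alive = torch.logsumexp(torch.log(w_ok) + a, dim=0)

# Product units add log-values, so -inf propagates towards the root
root = dead + torch.tensor(-4000.0)

print(dead, alive, root)  # dead and root are -inf, alive is finite
```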