# How-To: Add an Input Layer

This notebook shows how to add a custom input layer. (To add sum or product layers, please open an issue to discuss it first.)

- For users: The code may be added anywhere in your project; just make sure you have the proper imports.
- For developers: Please refer to the file linked before each code block to decide where each piece of code belongs.

A new layer requires a symbolic definition and an implementation for each backend it should support.

In this notebook, we illustrate the process with `MyPolynomialLayer` and its `torch` backend implementation, a replica of the library's `PolynomialLayer`.

## Symbolic

In the symbolic part, we will have to:
- Add the definition of the layer;
- Decide the operators supported by this layer;
  - Identify the parameter operations required by the operators.

None of the above involves actual tensors, only configs and shapes.

We can add as many operators as the layer supports, but for illustration we only implement multiplication here.

Operators that the layer does not support can simply be left out; they will be handled properly.

### Layer

The layer definition should include any configs the layer needs, along with any parameters it holds. Each parameter can be taken from an optionally provided parameter or built by an optionally provided factory; if neither is given, it falls back to the default, a new parameter with normal initialization (the initialization may be changed by passing additional args to `_make_param`).
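The fallback order described above can be sketched in plain Python. This is a hypothetical, simplified stand-in for `_make_param`, not cirkit's implementation: the name `make_param_sketch`, the precedence of an explicit parameter over a factory, and the use of `random.gauss` for the normal initialization are all assumptions, and "parameters" are just nested lists.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-ins: a "parameter" is a nested list of floats and a
# "factory" is any callable that builds one from a (rows, cols) shape.
Param = list[list[float]]
Factory = Callable[[tuple[int, int]], Param]


def normal_init(shape: tuple[int, int]) -> Param:
    """Default fallback (assumed): a fresh parameter with standard-normal entries."""
    rows, cols = shape
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_param_sketch(
    param: Optional[Param], factory: Optional[Factory], shape: tuple[int, int]
) -> Param:
    """Hypothetical sketch: given parameter, else factory, else normal init."""
    if param is not None:
        return param  # 1. an explicitly provided parameter is used as-is
    if factory is not None:
        return factory(shape)  # 2. otherwise the factory builds it from the shape
    return normal_init(shape)  # 3. otherwise fall back to normal initialization


def zeros(shape: tuple[int, int]) -> Param:
    """An example factory producing an all-zero parameter."""
    return [[0.0] * shape[1] for _ in range(shape[0])]


print(make_param_sketch(None, zeros, (2, 3)))  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

In the real layer, the resulting shape is then checked against `_coeff_shape`, as in the code cell below.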

The basic set of methods to define for a layer consists of:
- `__init__`: A must in most cases. Defines how to instantiate this layer.
- `_{param}_shape`: One for each parameter, if any. Specifies the shape of the parameter.
- `config`: Must-have if `__init__` accepts any args other than `scope`, `num_output_units`, `num_channels` and the params. Any configs of the layer should be appended to `super().config`.
- `params`: Must-have if the layer has any parameters. Includes all params in a dict.

[cirkit/symbolic/layers.py](../cirkit/symbolic/layers.py)

In [1]:
from typing import Any

from cirkit.symbolic.layers import InputLayer
from cirkit.symbolic.parameters import Parameter, ParameterFactory
from cirkit.utils.scope import Scope


class MyPolynomialLayer(InputLayer):
    def __init__(
        self,
        scope: Scope,
        num_output_units: int,
        num_channels: int,
        *,
        degree: int,
        coeff: Parameter | None = None,
        coeff_factory: ParameterFactory | None = None,
    ):
        if len(scope) != 1:
            raise ValueError("The Polynomial layer encodes a univariate distribution")
        if num_channels != 1:
            raise ValueError("The Polynomial layer encodes a univariate distribution")
        super().__init__(scope, num_output_units, num_channels)
        self.degree = degree
        coeff = self._make_param(coeff, coeff_factory, self._coeff_shape)
        if coeff.shape != self._coeff_shape:
            raise ValueError(f"Expected parameter shape {self._coeff_shape}, found {coeff.shape}")
        self.coeff = coeff

    @property
    def _coeff_shape(self) -> tuple[int, ...]:
        return self.num_output_units, self.degree + 1

    @property
    def config(self) -> dict[str, Any]:
        return {**super().config, "degree": self.degree}

    @property
    def params(self) -> dict[str, Parameter]:
        return {"coeff": self.coeff}

### Parameter Operation

After deciding which operator(s) we want to support, we must define the parameter operations the operator(s) need(s).

Since we are only looking at multiplication here, and the layer only has one parameter `coeff`, we only need to define one parameter operation.

As multiplication is a binary operator, we can inherit from `BinaryParameterOp` to make the best use of existing infrastructure. Alternatively, a more general `ParameterOp` class may be inherited.

The minimum definition should include the `shape` property, which defines the output shape of this parameter operation.

Optionally, `__init__` can be redefined to customize instantiation, in which case `config` should include any additional args of `__init__`.

[cirkit/symbolic/parameters.py](../cirkit/symbolic/parameters.py)

In [2]:
from cirkit.symbolic.parameters import BinaryParameterOp


class MyPolynomialProduct(BinaryParameterOp):
    @property
    def shape(self) -> tuple[int, ...]:
        return (
            self.in_shapes[0][0] * self.in_shapes[1][0],  # dim Ko
            self.in_shapes[0][1] + self.in_shapes[1][1] - 1,  # dim deg+1
        )

    # -------- unnecessary in this case, directly use inherited --------

    # def __init__(self, in_shape1: tuple[int, ...], in_shape2: tuple[int, ...]):
    #     super().__init__(in_shape1, in_shape2)

    # @property
    # def config(self) -> dict[str, Any]:
    #     return {}
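To see where the `shape` formula comes from, the following pure-Python sketch multiplies every pair of polynomials from two layers. The helper names `product_shape`, `poly_mul` and `pairwise_product` are hypothetical. Coefficients are stored lowest degree first (as the Horner loop in `polyval` further below implies), so multiplying a degree-d1 and a degree-d2 polynomial gives (d1 + 1) + (d2 + 1) - 1 coefficients, and taking all pairs of units gives Ko1 * Ko2 output units.

```python
def product_shape(shape1: tuple[int, int], shape2: tuple[int, int]) -> tuple[int, int]:
    """Same arithmetic as MyPolynomialProduct.shape: (Ko1 * Ko2, dp1_1 + dp1_2 - 1)."""
    return shape1[0] * shape2[0], shape1[1] + shape2[1] - 1


def poly_mul(a: list[float], b: list[float]) -> list[float]:
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj  # x^i * x^j contributes to x^(i + j)
    return out


def pairwise_product(coeff1: list[list[float]], coeff2: list[list[float]]) -> list[list[float]]:
    """All pairwise products; output unit k1 * Ko2 + k2 holds unit k1 times unit k2."""
    return [poly_mul(p, q) for p in coeff1 for q in coeff2]


coeff1 = [[1.0, 2.0], [0.0, 1.0]]  # Ko1 = 2 units of degree 1
coeff2 = [[1.0, 0.0, 3.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]  # Ko2 = 3 units of degree 2
prod = pairwise_product(coeff1, coeff2)
print(product_shape((2, 2), (3, 3)))  # (6, 4)
print((len(prod), len(prod[0])))  # (6, 4)
```

The flattened unit order (k1 outer, k2 inner) matches the `torch.flatten` over the `(K1, K2)` axes in the `torch` implementation later in this notebook.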

### Layer Operator

Once the layer and the parameter operation have been defined, we can define how an operator acts on the layer by writing a rule function and registering it in the rules registry.

To share the underlying parameters with the operand layers, the new parameter should be built from `param.ref()` rather than from `param` itself.

Finally, the resulting new layer (or layers, if needed) should be wrapped in a `CircuitBlock` and returned.

[cirkit/symbolic/operators.py](../cirkit/symbolic/operators.py)

In [None]:
from cirkit.symbolic.circuit import CircuitBlock
from cirkit.symbolic.layers import LayerOperator
from cirkit.symbolic.operators import DEFAULT_OPERATOR_RULES
from cirkit.symbolic.parameters import Parameter


def multiply_mypolynomial_layers(sl1: MyPolynomialLayer, sl2: MyPolynomialLayer) -> CircuitBlock:
    if sl1.scope != sl2.scope:
        raise ValueError(
            f"Expected Polynomial layers to have the same scope,"
            f" but found '{sl1.scope}' and '{sl2.scope}'"
        )
    if sl1.num_channels != sl2.num_channels:
        raise ValueError(
            f"Expected Polynomial layers to have the same number of channels,"
            f" but found '{sl1.num_channels}' and '{sl2.num_channels}'"
        )

    coeff = Parameter.from_binary(
        MyPolynomialProduct(sl1.coeff.shape, sl2.coeff.shape),
        sl1.coeff.ref(),
        sl2.coeff.ref(),
    )

    sl = MyPolynomialLayer(
        sl1.scope,
        sl1.num_output_units * sl2.num_output_units,
        num_channels=sl1.num_channels,
        degree=sl1.degree + sl2.degree,
        coeff=coeff,
    )
    return CircuitBlock.from_layer(sl)


DEFAULT_OPERATOR_RULES[LayerOperator.MULTIPLICATION].append(multiply_mypolynomial_layers)
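A registry keyed by operator and holding a list of rules suggests a dispatch step that tries the candidates for a given operator. The sketch below is a hypothetical model of that idea, not cirkit's actual dispatcher: `Op`, `PolyLayer`, `RULES` and `apply_operator` are made-up names, and matching rules by their argument annotations, as well as raising `NotImplementedError` when nothing matches, are assumptions.

```python
import inspect
from enum import Enum, auto
from typing import Callable, get_type_hints


class Op(Enum):  # hypothetical stand-in for LayerOperator
    MULTIPLICATION = auto()


class PolyLayer:  # hypothetical stand-in for a symbolic polynomial layer
    def __init__(self, degree: int):
        self.degree = degree


RULES: dict[Op, list[Callable]] = {Op.MULTIPLICATION: []}


def multiply_poly(a: PolyLayer, b: PolyLayer) -> PolyLayer:
    return PolyLayer(a.degree + b.degree)  # degrees add under multiplication


RULES[Op.MULTIPLICATION].append(multiply_poly)


def apply_operator(op: Op, *layers):
    """Try registered rules in order; use the first whose annotations match the args."""
    for rule in RULES[op]:
        hints = get_type_hints(rule)
        names = list(inspect.signature(rule).parameters)
        if len(names) == len(layers) and all(
            isinstance(layer, hints.get(name, object)) for name, layer in zip(names, layers)
        ):
            return rule(*layers)
    kinds = [type(layer).__name__ for layer in layers]
    raise NotImplementedError(f"No {op.name} rule for {kinds}")


print(apply_operator(Op.MULTIPLICATION, PolyLayer(2), PolyLayer(3)).degree)  # 5
```

Under this model, leaving out an unsupported operator simply means no rule matches, which is one way the "handled properly" behaviour mentioned earlier could work.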

## Implementation with Backend

In the backend implementation, we will have to:
- Implement the actual computation for the layer and operator(s);
- Specify the rules that map the symbolic layer/operator(s) to the implementation above.

Everything provided in the symbolic part should have a corresponding implementation in the backend, although it is the rules that actually determine whether and how the symbolic representation is translated.

### `torch` Implementation - Layer

The layer takes the actual `Tensor`s for its parameters and input, and computes the output `Tensor` in its `forward` method, following the usual `torch` practice.

The basic set of methods to implement for a layer consists of:
- `__init__`: A must in most cases. Defines how to instantiate this layer. Note that `num_folds` is not specified here but handled automatically in the pipeline, while `num_variables` is implicitly provided as `scope_idx.shape[-1]`.
- `_valid_{param}_shape`: Not required but recommended. Checks whether the parameter has the correct shape and number of folds (the example below uses a single `_valid_parameters_shape`).
- `fold_settings`: A must in most cases. Contains the settings (such as shapes) that decide which layers can be folded together: layers with identical settings can be stacked. Any extra setting that may affect folding (e.g. from non-default args, as in `config`) should be appended to `super().fold_settings`, without duplicating what it already contains.
- `config`: Must-have if `__init__` accepts any args other than `scope_idx`, `num_output_units`, `num_channels`, `semiring` and the params. Should append any configs of the layer to `super().config`.
- `params`: Must-have if the layer has any parameters. Includes all params in a dict.
- `forward`: Must-have in all cases. Defines the actual computation of this layer. It must follow the protocol:
  - The input is the value that the circuit receives, sliced to the corresponding scope, with shape `(fold, channel, batch, variable)`. Note that, for simplicity, layers always accept exactly one batch dimension, while multiple batch dimensions are handled at the circuit level;
  - The output is the value in the space defined by the specified semiring, with shape `(fold, batch, output_unit)`.
- `integrate`: Layers that cannot be integrated, like this one, can simply raise an error. TODO: document why this method is needed and what its protocol is.
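As a shape-level illustration of the `forward` protocol, here is a NumPy port of the commented-out `polyval` shown in the next code cell. The name `polyval_sketch` is hypothetical and NumPy stands in for `torch`; it maps an input of shape `(fold, channel, batch, variable)` to an output of shape `(fold, batch, output_unit)` using Horner's method. The values are in the sum-product semiring; the real layer then converts them with `semiring.map_from`.

```python
import numpy as np


def polyval_sketch(coeff: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy version of polyval, following the forward protocol.

    coeff: (F, Ko, deg+1), lowest degree first.
    x:     (F, C=1, B, D=1), the layer input.
    Returns (F, B, Ko).
    """
    x = x[:, 0]  # drop the channel axis: (F, C=1, B, D=1) -> (F, B, 1)
    y = np.zeros((*x.shape[:-1], coeff.shape[-2]))  # (F, B, Ko)
    # Iterate over the degree axis from the highest degree down; each a_n is (F, Ko).
    for a_n in coeff[..., ::-1].transpose(2, 0, 1):
        y = a_n[:, None, :] + x * y  # Horner step y = a_n + x * y, broadcast over B
    return y


coeff = np.array([[[1.0, 0.0, 2.0]]])  # F=1, Ko=1: p(x) = 1 + 2x^2
x = np.array([[[[0.0], [1.0], [3.0]]]])  # F=1, C=1, B=3, D=1
print(polyval_sketch(coeff, x)[0, :, 0])  # [ 1.  3. 19.]
```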

[cirkit/backend/torch/layers/input.py](../cirkit/backend/torch/layers/input.py)

In [4]:
from torch import Tensor

from cirkit.backend.torch.layers.input import TorchInputLayer, polyval
from cirkit.backend.torch.parameters.parameter import TorchParameter
from cirkit.backend.torch.semiring import Semiring, SumProductSemiring

# For reference, this is the implementation of the imported polyval.

# def polyval(coeff: Tensor, x: Tensor) -> Tensor:
#     """Evaluate polynomial given coefficients and point, with the shape for PolynomialLayer.

#     Args:
#         coeff (Tensor): The coefficients of the polynomial, shape (F, Ko, deg+1).
#         x (Tensor): The point of the variable, shape (F, H, B, Ki), where H=Ki=1.

#     Returns:
#         Tensor: The value of the polynomial, shape (F, B, Ko).
#     """
#     x = x.squeeze(dim=1)  # shape (F, H=1, B, Ki=1) -> (F, B, 1).
#     y = x.new_zeros(*x.shape[:-1], coeff.shape[-2])  # shape (F, B, Ko).

#     for a_n in reversed(coeff.unbind(dim=2)):  # Reverse iterator of the degree axis, shape (F, Ko).
#         # a_n shape (F, Ko) -> (F, 1, Ko).
#         y = torch.addcmul(a_n.unsqueeze(dim=1), x, y)  # y = a_n + x * y, by Horner's method.
#     return y  # shape (F, B, Ko).


class TorchMyPolynomialLayer(TorchInputLayer):
    def __init__(
        self,
        scope_idx: Tensor,
        num_output_units: int,
        *,
        num_channels: int = 1,
        degree: int,
        coeff: TorchParameter,
        semiring: Semiring | None = None,
    ) -> None:
        num_variables = scope_idx.shape[-1]
        if num_variables != 1:
            raise ValueError("The Polynomial layer encodes a univariate distribution")
        if num_channels != 1:
            raise ValueError("The Polynomial layer encodes a univariate distribution")
        super().__init__(
            scope_idx,
            num_output_units,
            num_channels=num_channels,
            semiring=semiring,
        )
        self.degree = degree
        if not self._valid_parameters_shape(coeff):
            raise ValueError("The number of folds and shape of 'coeff' must match the layer's")
        self.coeff = coeff

    def _valid_parameters_shape(self, p: TorchParameter) -> bool:
        if p.num_folds != self.num_folds:
            return False
        return p.shape == (self.num_output_units, self.degree + 1)

    @property
    def fold_settings(self) -> tuple[Any, ...]:
        return *super().fold_settings, self.degree + 1

    @property
    def config(self) -> dict[str, Any]:
        return {**super().config, "degree": self.degree}

    @property
    def params(self) -> dict[str, TorchParameter]:
        return {"coeff": self.coeff}

    def forward(self, x: Tensor) -> Tensor:
        """Run forward pass.

        Args:
            x (Tensor): The input to this layer, shape (F, H=C, B, Ki=D).

        Returns:
            Tensor: The output of this layer, shape (F, B, Ko).
        """
        coeff = self.coeff()  # shape (F, Ko, dp1)
        return self.semiring.map_from(polyval(coeff, x), SumProductSemiring)

    def integrate(self) -> Tensor:
        raise TypeError("Cannot integrate a PolynomialLayer")
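To illustrate how `fold_settings` is meant to be used, here is a hypothetical grouping step, not cirkit's actual folding code: layers whose settings tuples compare equal land in the same group and could be stacked along the fold dimension. `LayerStub` and `group_foldable` are made-up names, and the base settings (class name and unit count) are an assumption standing in for `super().fold_settings`. Appending `degree + 1`, as `TorchMyPolynomialLayer` does, keeps layers of different degrees apart, since their `coeff` tensors have different shapes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStub:  # hypothetical stand-in for a torch input layer
    name: str
    num_output_units: int
    degree: int

    @property
    def fold_settings(self) -> tuple:
        # Assumed base settings, plus degree + 1 as in TorchMyPolynomialLayer.
        return (type(self).__name__, self.num_output_units, self.degree + 1)


def group_foldable(layers: list[LayerStub]) -> dict[tuple, list[str]]:
    """Group layer names by fold settings; each group could become one folded layer."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for layer in layers:
        groups[layer.fold_settings].append(layer.name)
    return dict(groups)


layers = [LayerStub("a", 4, 2), LayerStub("b", 4, 2), LayerStub("c", 4, 3)]
print(group_foldable(layers))  # {('LayerStub', 4, 3): ['a', 'b'], ('LayerStub', 4, 4): ['c']}
```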

### `torch` Implementation - Operator

As in the symbolic part, the `torch` backend provides `TorchBinaryParameterOp` for easier implementation, and the more general `TorchParameterOp` for more customization.

A minimal implementation only needs the `shape` of the output parameter and a `forward` method that transforms the input parameter(s) into the output.

Optionally, `__init__` may be redefined to customize instantiation, in which case `config` should contain the additional args and `fold_settings` any additional shapes.

[cirkit/backend/torch/parameters/nodes.py](../cirkit/backend/torch/parameters/nodes.py)

In [5]:
import torch

from cirkit.backend.torch.parameters.nodes import TorchBinaryParameterOp


class TorchMyPolynomialProduct(TorchBinaryParameterOp):
    @property
    def shape(self) -> tuple[int, ...]:
        return (
            self.in_shapes[0][0] * self.in_shapes[1][0],  # dim K
            self.in_shapes[0][1] + self.in_shapes[1][1] - 1,  # dim dp1
        )

    def forward(self, coeff1: Tensor, coeff2: Tensor) -> Tensor:
        if coeff1.is_complex() or coeff2.is_complex():
            fft = torch.fft.fft
            ifft = torch.fft.ifft
        else:
            fft = torch.fft.rfft
            ifft = torch.fft.irfft

        degp1 = coeff1.shape[-1] + coeff2.shape[-1] - 1  # deg1p1 + deg2p1 - 1 = (deg1 + deg2) + 1.

        spec1 = fft(coeff1, n=degp1, dim=-1)  # shape (F, K1, dp1).
        spec2 = fft(coeff2, n=degp1, dim=-1)  # shape (F, K2, dp1).

        # shape (F, K1, 1, dp1), (F, 1, K2, dp1) -> (F, K1, K2, dp1) -> (F, K1*K2, dp1).
        spec = torch.flatten(
            spec1.unsqueeze(dim=2) * spec2.unsqueeze(dim=1), start_dim=1, end_dim=2
        )

        return ifft(spec, n=degp1, dim=-1)  # shape (F, K1*K2, dp1).

    # -------- unnecessary in this case, directly use inherited --------

    # def __init__(
    #     self, in_shape1: tuple[int, ...], in_shape2: tuple[int, ...], *, num_folds: int = 1
    # ) -> None:
    #     super().__init__(in_shape1, in_shape2, num_folds=num_folds)

    # @property
    # def fold_settings(self) -> tuple[Any, ...]:
    #     return super().fold_settings

    # @property
    # def config(self) -> dict[str, Any]:
    #     return {}
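The `forward` above relies on the fact that multiplying polynomials is convolving their coefficient sequences: zero-padding both spectra to `n = deg1 + deg2 + 1` makes the FFT's circular convolution equal the linear one, so a pointwise product in the frequency domain yields the product polynomial. The sketch below is a NumPy port for real coefficients (the name `fft_poly_product` is hypothetical; `numpy.fft.rfft`/`irfft` stand in for `torch.fft.rfft`/`irfft`), which can be checked against a direct `np.convolve`.

```python
import numpy as np


def fft_poly_product(coeff1: np.ndarray, coeff2: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy port of TorchMyPolynomialProduct.forward (real case).

    coeff1: (F, K1, dp1_1), coeff2: (F, K2, dp1_2) -> (F, K1*K2, dp1_1 + dp1_2 - 1).
    """
    n = coeff1.shape[-1] + coeff2.shape[-1] - 1  # length of the full linear convolution
    spec1 = np.fft.rfft(coeff1, n=n, axis=-1)  # zero-padded to n, so no wrap-around
    spec2 = np.fft.rfft(coeff2, n=n, axis=-1)
    spec = spec1[:, :, None, :] * spec2[:, None, :, :]  # (F, K1, K2, n // 2 + 1)
    spec = spec.reshape(spec.shape[0], -1, spec.shape[-1])  # (F, K1*K2, n // 2 + 1)
    return np.fft.irfft(spec, n=n, axis=-1)  # (F, K1*K2, n)


c1 = np.array([[[1.0, 2.0]]])  # 1 + 2x
c2 = np.array([[[1.0, 0.0, 3.0]]])  # 1 + 3x^2
print(np.round(fft_poly_product(c1, c2), 6))  # [[[1. 2. 3. 6.]]]
```

The FFT route costs O(n log n) per pair instead of O(n^2) for a direct convolution; for very small degrees the direct approach may be just as fast.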

### Rules

Now we need to register the mappings from the symbolic layer and parameter operation to their `torch` counterparts. In most cases, this is straightforward.

Note that each backend has its own registry instead of one large dict for everything.

[cirkit/backend/torch/rules/layers.py](../cirkit/backend/torch/rules/layers.py)

In [6]:
from cirkit.backend.torch.compiler import TorchCompiler
from cirkit.backend.torch.rules.layers import DEFAULT_LAYER_COMPILATION_RULES


def compile_polynomial_layer(
    compiler: TorchCompiler, sl: MyPolynomialLayer
) -> TorchMyPolynomialLayer:
    coeff = compiler.compile_parameter(sl.coeff)
    return TorchMyPolynomialLayer(
        torch.tensor(tuple(sl.scope)),
        sl.num_output_units,
        num_channels=sl.num_channels,
        degree=sl.degree,
        coeff=coeff,
        semiring=compiler.semiring,
    )


DEFAULT_LAYER_COMPILATION_RULES.update({MyPolynomialLayer: compile_polynomial_layer})

[cirkit/backend/torch/rules/parameters.py](../cirkit/backend/torch/rules/parameters.py)

In [None]:
from cirkit.backend.torch.rules.parameters import DEFAULT_PARAMETER_COMPILATION_RULES


def compile_polynomial_product(
    compiler: TorchCompiler, p: MyPolynomialProduct
) -> TorchMyPolynomialProduct:
    return TorchMyPolynomialProduct(*p.in_shapes)


DEFAULT_PARAMETER_COMPILATION_RULES.update({MyPolynomialProduct: compile_polynomial_product})
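Both registries above map a symbolic class to a function taking `(compiler, symbolic_node)`. The sketch below models how such a type-keyed registry might be consulted during compilation. It is hypothetical: `SymPoly`, `TorchPolyStub`, `CompilerStub` and `COMPILATION_RULES` are made-up names, and looking up the node's exact type (rather than walking its base classes) is an assumption, not a description of `TorchCompiler`.

```python
from typing import Any, Callable


class SymPoly:  # hypothetical symbolic layer
    def __init__(self, degree: int):
        self.degree = degree


class TorchPolyStub:  # hypothetical backend layer
    def __init__(self, degree: int):
        self.degree = degree


class CompilerStub:
    """Hypothetical compiler: looks up a rule by the node's exact type."""

    def __init__(self, rules: dict[type, Callable[["CompilerStub", Any], Any]]):
        self.rules = rules

    def compile(self, node: Any) -> Any:
        rule = self.rules.get(type(node))
        if rule is None:
            raise NotImplementedError(f"No compilation rule for {type(node).__name__}")
        return rule(self, node)  # rules receive the compiler first, as above


def compile_sym_poly(compiler: CompilerStub, sl: SymPoly) -> TorchPolyStub:
    return TorchPolyStub(sl.degree)


COMPILATION_RULES: dict[type, Callable[[CompilerStub, Any], Any]] = {}
COMPILATION_RULES.update({SymPoly: compile_sym_poly})

print(CompilerStub(COMPILATION_RULES).compile(SymPoly(3)).degree)  # 3
```

Passing the compiler into each rule is what lets a layer rule call back into the compiler, as `compile_polynomial_layer` does with `compiler.compile_parameter`.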