# Example: Optimizing over trained graph neural networks

This notebook gives examples where OMLT is used to optimize over trained graph neural networks (GNNs). We follow these steps:

1.) A general definition of GNNs is provided. Any GNN that fits this definition can be transformed into a Dense NN and then exported to OMLT.

2.) We give an example showing how to transform a GNN into a Dense NN. For simplicity, we skip the training process and use random parameters.

3.) OMLT is used to generate a mixed-integer encoding of the trained GNN. 

4.) We consider two cases: one with a fixed graph structure and one with a non-fixed graph structure. In each case, the output of the GNN is minimized.


## Library Setup

This notebook assumes you have a working PyTorch environment to define a Dense NN. This Dense NN is then formulated in Pyomo using OMLT which therefore requires working Pyomo and OMLT installations.

The required Python libraries used in this notebook are as follows:

- `numpy`: used for transformation of parameters

- `torch`: the machine learning library we use to define our Dense NN

- `pyomo`: the algebraic modeling language for Python, used to define the optimization model passed to the solver

- `onnx`: used to express trained neural network models

- `omlt`: the package this notebook demonstrates. OMLT can formulate machine learning models (such as neural networks) within Pyomo

**NOTE:** This notebook also assumes you have a working MIP solver executable to solve optimization problems in Pyomo. The open-source solver CBC is called by default. 


## Definition of GNNs

We define a GNN with $L$ layers as follows:

   \begin{equation*}
		\begin{aligned}
			GNN:\underbrace{\mathbb R^{d_0}\otimes\cdots\otimes\mathbb R^{d_0}}_{n\ \rm{times}}\to\underbrace{\mathbb R^{d_L}\otimes\cdots\otimes\mathbb R^{d_L}}_{n\ \rm{times}}
		\end{aligned}
	\end{equation*}
    
where $V$ is the set of nodes of the input graph, $n=|V|$ is the number of nodes. 

Let $\mathbf{x}_v^{(0)} \in \mathbb{R}^{d_0}$ be the input features for node $v$. Then, the $l$-th layer ($l=1,2,\dots,L$) is defined by:
	\begin{equation*}
		\begin{aligned}
			\mathbf{x}_v^{(l)}=\sigma\left(\sum\limits_{u\in\mathcal N(v)\cup\{v\}}\mathbf{w}_{u\to v}^{(l)}\mathbf{x}_u^{(l-1)}+\mathbf{b}_{v}^{(l)}\right),~\forall v\in V
		\end{aligned}
	\end{equation*}
where $\mathcal N(v)$ is the set of all neighbors of $v$, and $\sigma$ can be the identity or any activation function.

*Dimensionality:* $\mathbf{x}_u^{(l-1)}\in\mathbb R^{d_{l-1}}, \mathbf{x}_v^{(l)},\mathbf{b}_v^{(l)}\in\mathbb R^{d_l}, \mathbf{w}_{u\to v}^{(l)}\in\mathbb R^{d_l\times d_{l-1}}$.

Stack $\{\mathbf{x}_v^{(l)}\}_{v\in V}$ as a vector $\mathbf{X}^{(l)}\in \mathbb R^{nd_l}$. The previous definition can then be rewritten as:
    \begin{equation*}
        \begin{aligned}
            \mathbf{X}^{(l)}=\sigma\left(\mathbf{W}^{(l)}\mathbf{X}^{(l-1)}+\mathbf{B}^{(l)}\right)
        \end{aligned}
    \end{equation*}
where $\mathbf{W}^{(l)}\in\mathbb R^{nd_l\times nd_{l-1}}$ is a sparse matrix with nonzero sub-matrices $\{\mathbf{w}_{u\to v}^{(l)}\}_{v\in V,u\in\mathcal N(v)\cup\{v\}}$ and $\mathbf{B}^{(l)}\in\mathbb R^{nd_l}$ is the stack of $\{\mathbf{b}_v^{(l)}\}_{v\in V}$.

If the input graph structure is fixed, then weights ($\mathbf{w}_{u\to v}^{(l)}$), bias ($\mathbf{b}_{v}^{(l)}$), and links between layers (determined by $\mathcal N(v)$) are all fixed after the GNN is trained. In this case, the second definition is equivalent to a dense layer. It suffices to define a Dense NN with weights $\mathbf{W}^{(l)}$ and bias $\mathbf{B}^{(l)}$. 
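
The equivalence between the node-wise definition and the stacked form can be checked numerically. The sketch below is illustrative only: the 3-node path graph, the dimensions, and the random per-edge weights are assumptions, and $\sigma$ is taken to be the identity. It places each $\mathbf{w}_{u\to v}$ in block row $v$, block column $u$ of $\mathbf{W}$ and compares both evaluations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 3, 2, 3

# closed neighborhoods N(v) ∪ {v} of a 3-node path graph (illustrative choice)
closed_nbrs = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}

# hypothetical per-edge weights w_{u->v}, per-node biases b_v and inputs x_v
w = {(u, v): rng.standard_normal((d_out, d_in))
     for v in closed_nbrs for u in closed_nbrs[v]}
b = {v: rng.standard_normal(d_out) for v in closed_nbrs}
x = {v: rng.standard_normal(d_in) for v in closed_nbrs}

# node-wise definition with sigma = identity
x_next = {v: sum(w[u, v] @ x[u] for u in closed_nbrs[v]) + b[v]
          for v in closed_nbrs}

# stacked form: block (v, u) of W holds w_{u->v}; all other blocks stay zero
W = np.zeros((n * d_out, n * d_in))
for (u, v), w_uv in w.items():
    W[v * d_out:(v + 1) * d_out, u * d_in:(u + 1) * d_in] = w_uv
B = np.concatenate([b[v] for v in range(n)])
X = np.concatenate([x[v] for v in range(n)])
X_next = W @ X + B

stacked_matches = np.allclose(
    X_next, np.concatenate([x_next[v] for v in range(n)])
)
print(stacked_matches)
```

The zero blocks of `W` correspond to pairs of nodes that are not connected, which is why $\mathbf{W}^{(l)}$ is sparse.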


## Formulating Trained GNNs with OMLT: Fixed Graph Structure


### Import Requisite Packages 

In [12]:
# parameters manipulation
import numpy as np
import tempfile

# pytorch for defining Dense NN
import torch 
import torch.nn as nn

# pyomo for optimization
import pyomo.environ as pyo

# omlt for interfacing our neural network with pyomo
from omlt import OmltBlock
from omlt.neuralnet import ReluBigMFormulation
from omlt.neuralnet.network_definition import gnn_layer_definition
from omlt.io.onnx import write_onnx_model_with_bounds, load_onnx_neural_network_with_bounds

### Construct a GNN with Random Parameters

We use a simple GNN as an example, which consists of a GraphSAGE layer, an add pooling layer, and a dense layer with a single output. The GraphSAGE layer has 2 input features and 3 output features per node. 

The GraphSAGE layer is defined by:
    \begin{equation*}
       \mathbf{x}_v^{(l)}=\sigma\left(\mathbf{w_1}^{(l)}\mathbf{x}_v^{(l-1)}+\mathbf{w_2}^{(l)}\sum\limits_{u\in\mathcal N(v)}\mathbf{x}_u^{(l-1)}+\mathbf{b}^{(l)}\right)
    \end{equation*}
where a sum aggregation is used.

For the fixed graph structure, assume a path graph with $N=3$ nodes, i.e., the adjacency matrix (with ones on the diagonal for the self-contributions) $A=\begin{pmatrix}1 & 1 & 0\\1 & 1 & 1\\ 0 & 1 & 1\end{pmatrix}$.

In [13]:
# graph structure
# number of nodes
N = 3
# adjacency matrix
A = np.array([[1,1,0],[1,1,1],[0,1,1]])

# in/out features
in_features = 2
out_features = 3

# architecture of GNN
# sage: in_features to out_features for each node, with ReLU as activation
# add_pool: read out, sum out_features of each node
# dense: out_features to 1
gnn_layers = ['sage', 'add_pool', 'dense']
activations = [True, False, False]

# randomly generate GNN parameters from (-1,1)
# in practice, these parameters should be extracted from the trained GNN
np.random.seed(123)
sage_w1 = 2.* np.random.rand(out_features, in_features) -1.
sage_w2 = 2.* np.random.rand(out_features, in_features) -1.
sage_b = 2. * np.random.rand(out_features) - 1.

dense_w = 2.* np.random.rand(1, out_features) - 1.
dense_b = 2.* np.random.rand(1) - 1.

### Transforming a GNN into a Dense NN

The GraphSAGE layer could be rewritten as a dense layer with parameters:

   \begin{equation*}
        \mathbf{W}=\begin{pmatrix}
            \mathbf{w_1} & \mathbf{w_2} & \mathbf{0} \\
            \mathbf{w_2} & \mathbf{w_1} & \mathbf{w_2} \\
            \mathbf{0} & \mathbf{w_2} & \mathbf{w_1} \\
        \end{pmatrix},
        \mathbf{B}=\begin{pmatrix}
        \mathbf{b}\\\mathbf{b}\\\mathbf{b}
        \end{pmatrix}
    \end{equation*}
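
As a quick check of this block structure, the sketch below compares a node-wise evaluation of the GraphSAGE equation (with $\mathbf{w_1}$ acting on the node itself and $\mathbf{w_2}$ on the neighbor sum) against the stacked dense form. It uses its own random values rather than the notebook's parameters, builds $\mathbf{W}$ with `np.kron` as a compact alternative to explicit loops, and omits $\sigma$ because an elementwise activation does not affect the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d_in, d_out = 3, 2, 3
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])  # path graph with self-loops

# illustrative random parameters and node features (one row per node)
w1 = rng.uniform(-1, 1, (d_out, d_in))
w2 = rng.uniform(-1, 1, (d_out, d_in))
b = rng.uniform(-1, 1, d_out)
X = rng.uniform(-1, 1, (N, d_in))

def sage_nodewise(X):
    """Pre-activation GraphSAGE output, computed one node at a time."""
    out = np.zeros((N, d_out))
    for v in range(N):
        # neighbors of v, excluding v itself
        mask = (A[v] == 1) & (np.arange(N) != v)
        nbr_sum = X[mask].sum(axis=0)
        out[v] = w1 @ X[v] + w2 @ nbr_sum + b
    return out

# block matrix: w1 on the diagonal blocks, w2 on blocks of connected pairs
W = np.kron(np.eye(N), w1) + np.kron(A - np.eye(N), w2)
B = np.tile(b, N)

dense_out = W @ X.reshape(-1) + B
sage_matches = np.allclose(dense_out, sage_nodewise(X).reshape(-1))
print(W.shape, sage_matches)
```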
    
It is straightforward to rewrite the add pooling layer as a dense layer. See the following code for details.

The mapping between the GNN and the Dense NN is shown below in the format "layer (in_channel, out_channel)":

\begin{equation*}
    \begin{aligned}
        \text{GraphSAGE(2, 3)}   &\Rightarrow \text{dense(6, 9)}\\
        \text{add pooling(9, 3)}  &\Rightarrow \text{dense(9, 3)}\\
        \text{dense(3, 1)}      &\Rightarrow \text{dense(3, 1)}
    \end{aligned}
\end{equation*}
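
The add pooling row of this mapping can be illustrated with a small sketch. The pooling matrix below consists of $N$ identity blocks placed side by side, so multiplying it by the stacked node features sums each feature over all nodes. The feature values are arbitrary and chosen only for illustration.

```python
import numpy as np

N, d = 3, 3  # number of nodes, features per node after the GraphSAGE layer

# pooling matrix: N copies of the d x d identity placed side by side
P = np.kron(np.ones((1, N)), np.eye(d))  # shape (d, N*d) = (3, 9)

# arbitrary node features, one row per node
H = np.arange(N * d, dtype=float).reshape(N, d)

# dense form of add pooling applied to the stacked features
pooled = P @ H.reshape(-1)
print(pooled)          # same values as H.sum(axis=0)
print(H.sum(axis=0))
```

With zero bias, this is a dense layer of size (9, 3), matching the second row of the mapping.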


In [14]:
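# transform a sage layer to dense layer
# N is the number of nodes, A is the adjacency matrix
# w1,w2,b are parameters in a sage layer
# NOTE: as written, w2 fills the diagonal blocks (self term) and w1 fills the
# off-diagonal blocks (neighbor term), i.e. the labels w1 and w2 are swapped
# relative to the block matrix W in the text
def SAGE_to_Dense(N, A, w1, w2, b):
    out_channel, in_channel = w1.shape
    weight = np.zeros((N*out_channel, N*in_channel))
    bias = np.zeros(N*out_channel)
    for u in range(N):
        for v in range(N):
            if u == v:
                # diagonal block: contribution of a node to itself
                weight[u*out_channel:(u+1)*out_channel, v*in_channel:(v+1)*in_channel] = w2
            else:
                # off-diagonal block: nonzero only if A[u,v] == 1
                weight[u*out_channel:(u+1)*out_channel, v*in_channel:(v+1)*in_channel] = w1 * A[u,v]
        bias[u*out_channel:(u+1)*out_channel] = b
    return weight, bias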

params = []
channels = []
channels.append(N * in_features)

for layer in gnn_layers:
    if layer == 'sage':
        params.append(SAGE_to_Dense(N,A,sage_w1,sage_w2,sage_b))
        channels.append(sage_w1.shape[0] * N)
    elif layer == 'dense':
        params.append((dense_w,dense_b))
        # output size is the number of rows of the dense weight matrix
        channels.append(dense_w.shape[0])
    elif layer == 'add_pool':
        channels.append(channels[-1] // N)
        w = np.zeros((channels[-1],channels[-2]))
        for i in range(channels[-1]):
            for j in range(N):
                w[i, i+j*channels[-1]] = 1.
        b = np.zeros(channels[-1])
        params.append((w,b))

class PyTorchModel(nn.Module):
    def __init__(self, L, params, activations):
        super().__init__()
        layers = []
        for l in range(L):
            layers.append(nn.Linear(params[l][0].shape[1], params[l][0].shape[0]))
            layers[-1].weight = nn.Parameter(torch.tensor(params[l][0], dtype=torch.float64))
            layers[-1].bias = nn.Parameter(torch.tensor(params[l][1], dtype=torch.float64))
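            if activations[l]:
                # NOTE: this registers ReLU as an attribute of the Linear module;
                # nn.Linear.forward does not call it, so no activation is applied
                layers[-1].relu = nn.ReLU()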
        self.layer = nn.Sequential(*layers)
        
    def forward(self, x):
        x = self.layer(x)
        return x

model_dense = PyTorchModel(len(channels)-1, params, activations)
print(model_dense)
# for param in model_dense.parameters():
#     print(param)

PyTorchModel(
  (layer): Sequential(
    (0): Linear(
      in_features=6, out_features=9, bias=True
      (relu): ReLU()
    )
    (1): Linear(in_features=9, out_features=3, bias=True)
    (2): Linear(in_features=3, out_features=1, bias=True)
  )
)
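
In the printed model above, `ReLU` appears nested inside the first `Linear` module. A module attached as an attribute in this way is registered but is not called by `nn.Linear.forward`, which agrees with the layer listing further below reporting `linear` for every layer. If the ReLU is meant to be applied, one option is to add it as a separate entry of the `nn.Sequential`. The helper `build_sequential` below is a hypothetical sketch of that variant and is not part of the notebook; the outputs printed in this notebook were produced with the original class and would not carry over to it.

```python
import numpy as np
import torch
import torch.nn as nn

def build_sequential(params, activations):
    """Hypothetical helper: one nn.Linear per (weight, bias) pair,
    followed by a separate nn.ReLU module where requested."""
    modules = []
    for (weight, bias), use_relu in zip(params, activations):
        linear = nn.Linear(weight.shape[1], weight.shape[0])
        linear.weight = nn.Parameter(torch.tensor(weight, dtype=torch.float64))
        linear.bias = nn.Parameter(torch.tensor(bias, dtype=torch.float64))
        modules.append(linear)
        if use_relu:
            # a separate module in the Sequential, so it runs in forward
            modules.append(nn.ReLU())
    return nn.Sequential(*modules)

# tiny demo: a single neuron computing relu(x0 - x1)
demo_params = [(np.array([[1.0, -1.0]]), np.array([0.0]))]
demo = build_sequential(demo_params, [True])
y = demo(torch.tensor([0.0, 1.0], dtype=torch.float64))
print(demo)
print(y)
```

For the demo input the linear part evaluates to $-1$, so a nonnegative output confirms that the activation is applied.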


### Build a MIP Formulation and Solve the Optimization Problem

We can now export the PyTorch model as an ONNX model and use `load_onnx_neural_network_with_bounds` to load it into OMLT.

In [15]:
dummy_input = torch.zeros(channels[0], dtype=torch.float64)
dummy_input.requires_grad=True
input_bounds = [(-1., 1.) for _ in range(channels[0])]

with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:
    #export neural network to ONNX
    torch.onnx.export(
        model_dense,
        dummy_input,
        f,
        input_names=['input'],
        output_names=['output'],
    )
    #write ONNX model and its bounds using OMLT
    write_onnx_model_with_bounds(f.name, None, input_bounds)
    #load the network definition from the ONNX model
    network_definition = load_onnx_neural_network_with_bounds(f.name)

As a sanity check before creating the optimization model, we can print the properties of the neural network layers from `network_definition`. This allows us to check input/output sizes, as well as activation functions.

In [16]:
for layer_id, layer in enumerate(network_definition.layers):
    print(f"{layer_id}\t{layer}\t{layer.activation}")

0	InputLayer(input_size=[6], output_size=[6])	linear
1	DenseLayer(input_size=[6], output_size=[9])	linear
2	DenseLayer(input_size=[9], output_size=[3])	linear
3	DenseLayer(input_size=[3], output_size=[1])	linear


Finally, we can load `network_definition` as a full-space `ReluBigMFormulation` object.

In [17]:
formulation = ReluBigMFormulation(network_definition)

We now encode the Dense NN in a Pyomo model from the `ReluBigMFormulation` object.

In [18]:
# create pyomo model
m = pyo.ConcreteModel()

# create an OMLT block for the neural network and build its formulation
m.nn = OmltBlock()
m.nn.build_formulation(formulation)

Next, we define the objective function as the single output of the Dense NN and solve the minimization problem using a mixed integer solver.

In [19]:
m.obj = pyo.Objective(expr=(m.nn.outputs[0]))
pyo.SolverFactory('cbc').solve(m, tee=True)

Welcome to the CBC MILP Solver 
Version: 2.10.5 
Build Date: Dec  8 2020 

command line - /rds/general/user/sz421/home/anaconda3/envs/OMLT/bin/cbc -printingOptions all -import /var/tmp/pbs.7796016.pbs/tmpazbklmrp.pyomo.lp -stat=1 -solve -solu /var/tmp/pbs.7796016.pbs/tmpazbklmrp.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
Presolve 1 (-39) rows, 7 (-39) columns and 7 (-114) elements
Statistics for presolved model


Problem has 1 rows, 7 columns (7 with objective) and 7 elements
There are 7 singletons with objective 
Column breakdown:
0 of type 0.0->inf, 0 of type 0.0->up, 0 of type lo->inf, 
7 of type lo->up, 0 of type free, 0 of type fixed, 
0 of type -inf->0.0, 0 of type -inf->up, 0 of type 0.0->1.0 
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0, 
1 of type E other, 0 of type G 0.0, 0 of type G 1.0, 
0 of type G other, 0 of type L 0.0, 0 of type L 1.0, 
0 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other, 
0 

{'Problem': [{'Name': 'unknown', 'Lower bound': -1.719054146, 'Upper bound': -1.719054146, 'Number of objectives': 1, 'Number of constraints': 40, 'Number of variables': 46, 'Number of nonzeros': 7, 'Sense': 'minimize'}], 'Solver': [{'Status': 'ok', 'User time': -1.0, 'System time': 0.0, 'Wallclock time': 0.0, 'Termination condition': 'optimal', 'Termination message': 'Model was solved to optimality (subject to tolerances), and an optimal solution is available.', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': None, 'Number of created subproblems': None}, 'Black box': {'Number of iterations': 1}}, 'Error rc': 0, 'Time': 0.033590078353881836}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

## Formulating Trained GNNs with OMLT: Non-fixed Graph Structure

When the input graph structure is not fixed, the elements of the adjacency matrix $A$ are decision variables. In this case, $\mathcal N(v)$ is no longer given. Additionally, $\mathbf{w}_{u\to v}^{(l)},\mathbf{b}_v^{(l)}$ may depend on the graph structure, which would make them variables as well.

Assume that $\mathbf{w}_{u\to v}^{(l)},\mathbf{b}_v^{(l)}$ are fixed. Then we can derive a big-M formulation to handle GNN layers with non-fixed graph structure.

Observe that the existence of edge $u\to v$ determines the contribution link from $\mathbf{x}_u^{(l-1)}$ to $\mathbf{x}_v^{(l)}$. Adding binary variables $A_{u,v}$ for all $u,v\in V$, we can formulate GNNs in a bilinear way:
\begin{equation*}
    \begin{aligned}
        \mathbf{x}_v^{(l)}=\sigma\left(\sum\limits_{u\in V}A_{u,v}\mathbf{w}_{u\to v}^{(l)}\mathbf{x}_u^{(l-1)}+\mathbf{b}_{v}^{(l)}\right), \forall v\in V
    \end{aligned}
\end{equation*}

This bilinear formulation involves quadratic constraints. Instead of using binary variables to directly control the existence of contributions between nodes, we introduce auxiliary variables $\mathbf{\bar x}_{u\to v}^{(l-1)}$ to represent the contribution from node $u$ to node $v$ in the $l$-th layer:
\begin{equation*}
    \begin{aligned}
        \mathbf{x}_v^{(l)}=\sigma\left(\sum\limits_{u\in V}\mathbf{w}_{u\to v}^{(l)}\mathbf{\bar x}_{u\to v}^{(l-1)}+\mathbf{b}_{v}^{(l)}\right), \forall v\in V
    \end{aligned}
\end{equation*}
where
\begin{equation*}
    \begin{aligned}
        \mathbf{\bar x}_{u\to v}^{(l-1)}=\begin{cases}
            0, & A_{u,v}=0\\
            \mathbf{x}_u^{(l-1)}, & A_{u,v}=1
        \end{cases}
    \end{aligned}
\end{equation*}
Assuming that each feature is bounded, the definition of $\mathbf{\bar x}_{u\to v}^{(l-1)}$ can be reformulated using big-M constraints:
\begin{equation*}
    \begin{aligned}
        \mathbf{x}_{u}^{(l-1)}-\mathbf{M}_{u}^{(l-1)}(1-A_{u,v})\le &\mathbf{\bar x}_{u\to v}^{(l-1)}\le \mathbf{x}_{u}^{(l-1)}+\mathbf{M}_{u}^{(l-1)}(1-A_{u,v})\\
        -\mathbf{M}_{u}^{(l-1)}A_{u,v}\le &\mathbf{\bar x}_{u\to v}^{(l-1)}\le \mathbf{M}_u^{(l-1)}A_{u,v}
    \end{aligned}
\end{equation*}
where $|\mathbf{x}_u^{(l-1)}|\le \mathbf{M}_u^{(l-1)}, A_{u,v}\in\{0,1\}$. By adding extra continuous variables and constraints, and by using the bounds on all features, the big-M formulation replaces the bilinear constraints with linear constraints.
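
To see why these four inequalities reproduce the case definition of $\mathbf{\bar x}_{u\to v}^{(l-1)}$, the sketch below computes, for a single scalar feature, the interval of values of $\bar x$ that they allow. The function name `bigm_interval` and the sample values are illustrative assumptions. For any $|x|\le M$, the interval collapses to $\{x\}$ when $A_{u,v}=1$ and to $\{0\}$ when $A_{u,v}=0$.

```python
def bigm_interval(x, M, a):
    """Interval [lower, upper] allowed for x_bar by the big-M inequalities
    for one scalar feature, given x with |x| <= M and binary a."""
    lower = max(x - M * (1 - a), -M * a)
    upper = min(x + M * (1 - a), M * a)
    return lower, upper

M = 1.0
samples = (-1.0, -0.3, 0.0, 0.7, 1.0)

# edge present: x_bar is forced to equal x
edge_on = {x: bigm_interval(x, M, 1) for x in samples}
# edge absent: x_bar is forced to zero
edge_off = {x: bigm_interval(x, M, 0) for x in samples}

for x in samples:
    print(x, edge_on[x], edge_off[x])
```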



### Transforming a GNN with Non-fixed Graph Structure into a Dense NN

Since the graph structure is unknown, all $\mathbf{w}_{u\to v}^{(l)},\mathbf{b}_v^{(l)}$ should be provided. We reuse the previous example but this time the parameters in the Dense NN become:

\begin{equation*}
        \mathbf{W}=\begin{pmatrix}
            \mathbf{w_1} & \mathbf{w_2} & \mathbf{w_2} \\
            \mathbf{w_2} & \mathbf{w_1} & \mathbf{w_2} \\
            \mathbf{w_2} & \mathbf{w_2} & \mathbf{w_1} \\
        \end{pmatrix},
        \mathbf{B}=\begin{pmatrix}
        \mathbf{b}\\\mathbf{b}\\\mathbf{b}
        \end{pmatrix}
    \end{equation*}
    
Repeat all the steps up to building the formulation for the Dense NN.

In [20]:
# graph structure
# number of nodes
N = 3
# adjacency matrix
A = np.array([[1,1,1],[1,1,1],[1,1,1]])

# in/out features
in_features = 2
out_features = 3

# architecture of GNN
# sage: in_features to out_features for each node, with ReLU as activation
# add_pool: read out, sum out_features of each node
# dense: out_features to 1
gnn_layers = ['sage', 'add_pool', 'dense']
activations = [True, False, False]

# randomly generate GNN parameters from (-1,1)
# in practice, these parameters should be extracted from the trained GNN
np.random.seed(123)
sage_w1 = 2.* np.random.rand(out_features, in_features) -1.
sage_w2 = 2.* np.random.rand(out_features, in_features) -1.
sage_b = 2. * np.random.rand(out_features) - 1.

dense_w = 2.* np.random.rand(1, out_features) - 1.
dense_b = 2.* np.random.rand(1) - 1.

params = []
channels = []
channels.append(N * in_features)

for layer in gnn_layers:
    if layer == 'sage':
        params.append(SAGE_to_Dense(N,A,sage_w1,sage_w2,sage_b))
        channels.append(sage_w1.shape[0] * N)
    elif layer == 'dense':
        params.append((dense_w,dense_b))
        # output size is the number of rows of the dense weight matrix
        channels.append(dense_w.shape[0])
    elif layer == 'add_pool':
        channels.append(channels[-1] // N)
        w = np.zeros((channels[-1],channels[-2]))
        for i in range(channels[-1]):
            for j in range(N):
                w[i, i+j*channels[-1]] = 1.
        b = np.zeros(channels[-1])
        params.append((w,b))
        
model_dense = PyTorchModel(len(channels)-1, params, activations)
# print(model_dense)

# for param in model_dense.parameters():
#     print(param)

dummy_input = torch.zeros(channels[0], dtype=torch.float64)
dummy_input.requires_grad=True
input_bounds = [(-1., 1.) for _ in range(channels[0])]

with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:
    #export neural network to ONNX
    torch.onnx.export(
        model_dense,
        dummy_input,
        f,
        input_names=['input'],
        output_names=['output'],
    )
    #write ONNX model and its bounds using OMLT
    write_onnx_model_with_bounds(f.name, None, input_bounds)
    #load the network definition from the ONNX model
    network_definition = load_onnx_neural_network_with_bounds(f.name)

### Build a MIP Formulation and Solve the Optimization Problem

Note that all types of layers are now represented as dense layers in OMLT. Use `gnn_layer_definition` to retrieve the GNN layers. The number of nodes `N` and the list of indices of the GNN layers `gnn_layers` must be provided here.

In [21]:
# replace dense layers with GNN layers
gnn_net = gnn_layer_definition(network_definition, N=N, gnn_layers=[1])
    
for layer_id, layer in enumerate(gnn_net.layers):
    print(f"{layer_id}\t{layer}\t{layer.activation}")
    
formulation = ReluBigMFormulation(gnn_net)

0	InputLayer(input_size=[6], output_size=[6])	linear
1	GNNLayer(input_size=[6], output_size=[9])	linear
2	DenseLayer(input_size=[9], output_size=[3])	linear
3	DenseLayer(input_size=[3], output_size=[1])	linear


Before building the formulation for GNN layers, one needs to define binary variables for the adjacency matrix $A$, which `build_formulation` requires in order to encode GNN layers. 

Here we fix the diagonal elements of $A$ to $1$ to guarantee the self-contribution of each node. However, one can fix different elements of $A$ depending on the problem, for example fixing most elements of $A$ and optimizing over only a subset of edges. In the extreme case, fixing all elements is equivalent to the case with a fixed graph structure.

In [22]:
# create pyomo model
m = pyo.ConcreteModel()

# create an OMLT block for the neural network and build its formulation
m.nn = OmltBlock()

# initialize graph information
m.nn.A = pyo.Var(
    pyo.Set(initialize=range(N)), pyo.Set(initialize=range(N)), within=pyo.Binary
)
# usually, the contribution from node v to itself exists
for i in range(N):
    m.nn.A[i, i].fix(1)

m.nn.build_formulation(formulation)

m.obj = pyo.Objective(expr=(m.nn.outputs[0]))
pyo.SolverFactory("cbc").solve(m, tee=True)

Welcome to the CBC MILP Solver 
Version: 2.10.5 
Build Date: Dec  8 2020 

command line - /rds/general/user/sz421/home/anaconda3/envs/OMLT/bin/cbc -printingOptions all -import /var/tmp/pbs.7796016.pbs/tmp4_rnxuex.pyomo.lp -stat=1 -solve -solu /var/tmp/pbs.7796016.pbs/tmp4_rnxuex.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
Presolve 63 (-49) rows, 33 (-37) columns and 185 (-104) elements
Statistics for presolved model
Original problem has 6 integers (6 of which binary)
Presolved problem has 6 integers (6 of which binary)
==== 16 zero objective 8 different
3 variables have objective of -0.649096
1 variables have objective of -0.635017
4 variables have objective of -0.268763
16 variables have objective of 0
2 variables have objective of 0.312969
1 variables have objective of 0.475991
2 variables have objective of 0.481896
4 variables have objective of 0.533943
==== absolute objective values 8 different
16 variables have objective of 0
4 variables h

{'Problem': [{'Name': 'unknown', 'Lower bound': -2.55499393, 'Upper bound': -2.55499393, 'Number of objectives': 1, 'Number of constraints': 63, 'Number of variables': 33, 'Number of binary variables': 6, 'Number of integer variables': 6, 'Number of nonzeros': 17, 'Sense': 'minimize'}], 'Solver': [{'Status': 'ok', 'User time': -1.0, 'System time': 0.0, 'Wallclock time': 0.0, 'Termination condition': 'optimal', 'Termination message': 'Model was solved to optimality (subject to tolerances), and an optimal solution is available.', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': 0, 'Number of created subproblems': 0}, 'Black box': {'Number of iterations': 0}}, 'Error rc': 0, 'Time': 0.03782343864440918}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

## Conclusion

For cases with a fixed graph structure, one needs to transform the trained GNN into a Dense NN before using OMLT. After the transformation step, optimizing over a trained GNN is equivalent to optimizing over the corresponding Dense NN. No extra action is needed when using OMLT to encode the Dense NN.

For cases with a non-fixed graph structure, the following actions are required:

- providing all $\mathbf{w}_{u\to v}^{(l)},\mathbf{b}_v^{(l)}$ in the transformation step, since any of them could be used.
- using `gnn_layer_definition` to retrieve GNN layers after loading the ONNX model. The number of nodes in the graph `N` and the list of indices of the GNN layers `gnn_layers` must be provided here.
- defining binary variables for the adjacency matrix $A$ before calling `build_formulation`, since these variables are used to formulate GNN layers.