***
***
# **1. Introduction**
***
***

This document summarizes the results of Neural Horizon MPC (*NH-MPC*) implemented with acados. Acados is an open-source framework for fast embedded optimal control. It utilizes CasADi expressions, which leads to shorter instruction sequences and faster code. In addition, acados offers many real-time capable solvers for quadratic programming (*QP*) as well as nonlinear programming (*NLP*). In short, NH-MPC uses a neural network to approximate the tail of the prediction horizon, so that only the first part of the horizon has to be optimized as in a classical MPC formulation. In the experiments, the influence of pruning the neural networks used for NH-MPCs is also evaluated. For each subtask there are **notebooks** to create custom [datasets](Dataset_generation.ipynb), [FFNNs](Network_generation.ipynb), [Acados MPCs](Multi_AMPC.ipynb), [NH-MPCs](Multi_NH.ipynb) or even [NH-MPCs with pruned FFNNs](Multi_NH_prun.ipynb). There is also a [tutorial notebook](Tutorial_NH_AMPC.ipynb) that provides an example of the entire workflow to create an NH-MPC with acados.

***
***
# **2. Fundamentals**
***
***

## 2.1 Model Predictive Control
***

Model Predictive Control (*MPC*) is a control scheme that repeatedly solves a discrete-time optimization problem subject to constraints. The open-loop MPC optimizes the cost over a specified horizon, which is the "look into the future". In the closed loop, however, only the first optimized input is applied to the plant, which leads to the next state, from which the optimization is repeated. The open-loop MPC formulation with a horizon of $M$ is shown below:


<a id="eq-mpc"></a>

$$
\begin{equation}
    \begin{aligned}
        \{ x_k^* \}_0^M, \{ u_k^* \}_0^{M-1} = & \operatorname*{argmin}_{\{ x_k \}_0^M, \{ u_k \}_0^{M-1}} L \left( \{ x_k \}_0^{M-1}, \{ u_k \}_0^{M-1} \right) + V(x_M) \quad \quad \quad \quad \quad \\
        \text{s.t.} \quad & x_{k+1} = f(x_k, u_k) && \forall k \in [0, \dots, M-1] \\
        & x_k \in \boldsymbol{\mathcal{X}_k} && \forall k \in [0, \dots, M] \\
        & u_k \in \boldsymbol{\mathcal{U}_k} && \forall k \in [0, \dots, M-1] \\
        & x_0 = x_{init}.
    \end{aligned}
    \tag{1}
\end{equation}
$$
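The receding-horizon principle can be illustrated with a minimal sketch. It assumes a scalar linear system without constraints, for which the open-loop problem in eq. 1 reduces to a backward Riccati recursion. The function names and all numerical values are hypothetical and are not taken from the experiments, which solve a constrained nonlinear problem with acados or CasADi instead.

```python
def first_mpc_input(a, b, q, r, p_terminal, x0, horizon):
    """Return u_0^* of the unconstrained scalar LQ problem over `horizon` steps."""
    p = p_terminal  # terminal cost V(x_M) = p_terminal * x_M**2
    gain = 0.0
    for _ in range(horizon):  # backward Riccati recursion, k = M-1, ..., 0
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return -gain * x0  # after the loop, `gain` belongs to stage k = 0


def closed_loop(a, b, q, r, p_terminal, x_init, horizon, n_steps):
    """Apply only the first optimized input at every step (receding horizon)."""
    x = x_init
    states = [x]
    for _ in range(n_steps):
        u = first_mpc_input(a, b, q, r, p_terminal, x, horizon)
        x = a * x + b * u  # plant update, here identical to the prediction model
        states.append(x)
    return states


# Illustrative unstable system x_{k+1} = 1.2 x_k + u_k with horizon M = 8
states = closed_loop(a=1.2, b=1.0, q=1.0, r=0.1, p_terminal=1.0,
                     x_init=1.0, horizon=8, n_steps=20)
```

Although the open-loop problem is solved over the full horizon at every step, only `u` for the current step ever reaches the plant.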


## 2.2 Acados and CasADi
***

[Verschueren, 2021](#verschueren2021) presents the acados open-source framework for fast embedded optimal control. Acados is implemented in *C* to ensure high performance in optimal control, while optimal control algorithms can still be designed quickly in high-level programming languages such as *Python*. The framework includes various QP solvers, such as *HPIPM*, *OSQP*, *DAQP* and *qpOASES*. Furthermore, acados can be used for MPC as well as MHE problems. It utilizes *BLASFEO* for high-performance linear algebra operations and is compatible with *CasADi* expressions, enabling faster code and shorter instruction sequences. For nonlinear programs (*NLPs*), Gauss-Newton SQP and Exact-Hessian SQP are provided, as well as Real-Time Iteration (*RTI*) and Advanced-Step Real-Time Iteration (*AS-RTI*) algorithms.

CasADi is an open-source tool for nonlinear optimization and dynamic simulation presented by [Andersson, 2019](#andersson2019). It helps to solve complex optimization problems efficiently by providing flexible interfaces, for example in *Python*, for their formulation and solution. At its core, CasADi uses a symbolic framework for algorithmic differentiation in forward and reverse mode, and its expressions can be exported as stand-alone C code.

## 2.3 Neural Networks
***

Neural networks can be used to approximate nonlinear functions and thus to predict trajectories. They exploit the nonlinearities of the activation functions, which are applied at the output of every neuron. In a feed forward neural network (*FFNN*), each neuron is connected to every neuron in the previous and the next layer, while each layer can have multiple neurons. An FFNN consists of an input layer, $L$ hidden layers (one or more) and an output layer. The information flows in one direction only and there are no feedback loops. All connections from the previous layer to a neuron $j$ in the current layer $l$ have associated weights $\boldsymbol{w}_j^{(l)}$, by which the outputs of the previous layer's neurons $\boldsymbol{a}^{(l-1)}$ are multiplied. In addition, each neuron has a bias $b_j^{(l)}$, which is added to this term. The result is the input of the activation function $\sigma$, which leads to the general FFNN formulation:

<a id="eq-neural-networks"></a>

$$
\begin{equation}
    \begin{gathered}
        a_j^{(l)} = \sigma \left( \boldsymbol{a}^{(l-1)} \cdot \boldsymbol{w}_j^{(l)} + b_j^{(l)} \right), \quad \quad \forall l = 1,2, \dots , L
        \\
        \boldsymbol{a}^{(0)} = \boldsymbol{x}, \quad \quad y_j = \boldsymbol{a}^{(L)} \cdot \boldsymbol{w}_j^{(L+1)}
    \end{gathered}
    \tag{2}
\end{equation}
$$

Here, $\boldsymbol{w}_j^{(l)}$ and $b_j^{(l)}$ are the trainable parameters, which can be denoted as $\theta = \{ \boldsymbol{w}_j^{(l)}, b_j^{(l)} \} \in \mathbb{R}^h$. An FFNN can therefore be described as $\boldsymbol{y} = \mathrm{FFNN}(\boldsymbol{x}; \theta): \, \mathbb{R}^n \rightarrow \mathbb{R}^m$.
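The forward pass of eq. 2 can be written out in a few lines. The following is a minimal sketch in plain Python, not the PyTorch or CasADi implementation of the repository. The choice of `tanh` as activation function, the function name and the example weights are assumptions made for illustration only.

```python
import math


def ffnn_forward(x, hidden_layers, output_weights, activation=math.tanh):
    """Evaluate eq. 2 for one input vector.

    hidden_layers is a list of (weights, biases) per hidden layer, where
    weights[j] is the weight vector w_j of neuron j and biases[j] its bias b_j.
    """
    a = list(x)  # a^(0) = x
    for weights, biases in hidden_layers:
        a = [activation(sum(a_i * w_i for a_i, w_i in zip(a, w_j)) + b_j)
             for w_j, b_j in zip(weights, biases)]
    # Linear output layer without bias, as written in eq. 2
    return [sum(a_i * w_i for a_i, w_i in zip(a, w_j)) for w_j in output_weights]


# One hidden layer with two neurons and a single output (illustrative values)
hidden = [([[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0])]
y = ffnn_forward([1.0, -1.0], hidden, output_weights=[[1.0, 2.0]])
```

For this input the first hidden neuron sees a pre-activation of zero, so the output only depends on the second neuron.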

## 2.4 Pruning
***

Pruning neural networks refers to removing weights, biases or whole neurons. There are several different pruning techniques, and the three major categories are global unstructured, local unstructured and local structured pruning. Global pruning means that, for example, the smallest weights are selected across the whole network. Local pruning, on the other hand, refers to selecting, for example, the smallest weights within each layer separately. Structured pruning is defined as pruning entire neurons, including all corresponding parameters. In unstructured pruning, in contrast, the parameters are pruned individually.
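The difference between global and local unstructured pruning can be seen in a small sketch. It assumes magnitude-based selection (the smallest absolute weights are removed) and represents each layer as a flat list of weights. The function names are hypothetical and do not refer to the repository's code.

```python
def magnitude_mask_global(layers, amount):
    """Mask (0 = pruned) the `amount` fraction of smallest weights over all layers."""
    magnitudes = sorted(abs(w) for layer in layers for w in layer)
    n_prune = int(amount * len(magnitudes))
    # Ties at the threshold are pruned as well, which is sufficient for a sketch.
    threshold = magnitudes[n_prune - 1] if n_prune > 0 else -1.0
    return [[0 if abs(w) <= threshold else 1 for w in layer] for layer in layers]


def magnitude_mask_local(layers, amount):
    """Mask the `amount` fraction of smallest weights within every layer separately."""
    return [magnitude_mask_global([layer], amount)[0] for layer in layers]


layers = [[0.1, -0.2, 0.9, 1.0], [2.0, -3.0, 4.0, 5.0]]
global_mask = magnitude_mask_global(layers, amount=0.5)
local_mask = magnitude_mask_local(layers, amount=0.5)
```

With these values, global pruning removes the entire first layer because all of its weights are smaller than those of the second layer, while local pruning removes half of each layer.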

### 2.4.1 Finetuning

In addition, [Han, 2015](#han2015) presents finetuning, which means that neural networks are retrained after pruning to achieve better accuracy. This can also be done iteratively with different pruning amount schedules to achieve even better accuracy and compression. Finetuning is applied as follows:

 1. Randomly initialize a neural network with parameters $\theta_0$.
 1. Train for $j$ epochs, resulting in parameters $\theta_j$.
 1. Prune $\rho\%$ of $\theta_j$, resulting in mask $m$.
 1. Apply mask to network with parameters $\theta_j$ with $m\odot\theta_j$.
 1. Retrain the pruned network.

Steps 3 to 5 can be applied iteratively to find an even better network in terms of compression and accuracy.

### 2.4.2 Lottery-Ticket-Hypothesis

The Lottery-Ticket-Hypothesis was introduced by [Frankle, 2019](#frankle2019). The idea is that every randomly initialized network contains a subnetwork that can achieve the accuracy of the original trained network when trained separately. The algorithm includes the following steps:
 1. Randomly initialize a neural network with parameters $\theta_0$.
 1. Train for $j$ epochs, resulting in parameters $\theta_j$.
 1. Prune $\rho\%$ of $\theta_j$, resulting in mask $m$.
 1. Apply mask to initial network with parameters $\theta_0$ with $m\odot\theta_0$.
 1. Train the pruned initial network.

Iterative pruning can be achieved by repeating steps 3 to 5. In each of the $n$ iterations, $\rho^{\frac{1}{n}}\%$ of the parameters that survived the previous iteration are pruned.
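The two procedures in sections 2.4.1 and 2.4.2 differ only in step 4, namely in which parameters the mask is applied to. The sketch below makes this explicit with a single flag. It is an illustration under strong assumptions: `toy_train` is a stand-in for real training (a few masked gradient steps on a quadratic loss), the parameters are a flat list, and all names and values are hypothetical.

```python
def prune_smallest(theta, mask, amount):
    """Return a new mask that removes `amount` of the still surviving parameters."""
    alive = sorted((abs(t), i) for i, (t, m) in enumerate(zip(theta, mask)) if m)
    new_mask = list(mask)
    for _, i in alive[:int(amount * len(alive))]:
        new_mask[i] = 0
    return new_mask


def prune_and_retrain(theta_0, train, amount, rounds, rewind_to_init):
    """rewind_to_init=True follows the lottery-ticket steps, False the finetuning steps."""
    mask = [1] * len(theta_0)
    theta = train(theta_0, mask)                      # step 2
    for _ in range(rounds):
        mask = prune_smallest(theta, mask, amount)    # step 3
        start = theta_0 if rewind_to_init else theta  # step 4: theta_0 or theta_j
        theta = train([t * m for t, m in zip(start, mask)], mask)  # step 5
    return theta, mask


def toy_train(theta, mask, target=(0.05, -2.0, 0.4, 1.5), lr=0.5, epochs=3):
    """Stand-in for training: masked gradient steps on sum((theta - target)^2) / 2."""
    theta = list(theta)
    for _ in range(epochs):
        theta = [m * (t - lr * (t - g)) for t, g, m in zip(theta, target, mask)]
    return theta


theta_0 = [1.0, 1.0, 1.0, 1.0]
theta_lt, mask_lt = prune_and_retrain(theta_0, toy_train, 0.5, 1, rewind_to_init=True)
theta_ft, mask_ft = prune_and_retrain(theta_0, toy_train, 0.5, 1, rewind_to_init=False)
```

Both variants end up with the same mask here, but the finetuned parameters continue from the trained values, whereas the lottery-ticket variant restarts training from the masked initialization.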

***
***
# **3. Experimental setup**
***
***

This chapter describes the experimental setup: the model used, the Neural Horizon MPC, the datasets and the neural network setup. Furthermore, it explains how the Neural Horizon MPC is implemented with acados and CasADi and what the workflow for creating a solver looks like.

## 3.1 Inverted Pendulum on a cart
***

The selected problem is an inverted pendulum on a cart. In continuous time, the model is given by

<a id="eq-inverted-pendulum"></a>

$$
\begin{equation}
    \begin{gathered}
        \dot{x}_{cart} = v , 
        \quad \quad 
        \dot{v} = \frac{\mu_1(\theta, \omega)\cos(\theta) + F + gm\cos(\theta)\sin(\theta)}{\mu_2(\theta)} \\
        \dot{\theta} = \omega , 
        \quad \quad 
        \dot{\omega} = \frac{\mu_1(\theta, \omega)\cos(\theta) +F\cos(\theta)}{l\mu_2(\theta)}
    \end{gathered}
    \tag{3}
\end{equation}
$$

with $\mu_1(\theta, \omega) = -lm\sin(\theta)\omega^2$ and $\mu_2(\theta) = M + m(1-\cos^2(\theta))$. The cart position is denoted as $x_{cart}$, the velocity as $v$, the angle of the pendulum as $\theta$ and the angular velocity as $\omega$. A state $x$ is defined as $\begin{bmatrix} x_{cart} & \theta & v & \omega \end{bmatrix}$. Furthermore, the model is discretized with $\Delta t = 20 \, \text{ms}$ using the fourth-order Runge-Kutta method. The model parameters are given in [tab. *1*](#tab-model-parameters).

<a id="tab-model-parameters"></a>

<center>
    <strong>Table 1:</strong> Parameters of model

| Parameter             | Value                                 |
| :---                  | ---:                                  |
| $M$                   | $$1\,\text{kg}$$                      |
| $m$                   | $$0.1\,\text{kg}$$                   |
| $g$                   | $$9.81\,\frac{\text{m}}{\text{s}^2}$$ |
| $l$                   | $$0.8\,\text{m}$$                     |
</center>
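The fourth-order Runge-Kutta discretization mentioned above can be sketched as a single integration step with a constant input over $\Delta t$. To keep the example self-contained, a double integrator is used as a stand-in for the right-hand side; in the experiments this role is played by the pendulum dynamics of eq. 3. The function names are hypothetical.

```python
def rk4_step(f, x, u, dt):
    """One explicit RK4 step for x' = f(x, u) with the input held constant over dt."""
    k1 = f(x, u)
    k2 = f([x_i + 0.5 * dt * k_i for x_i, k_i in zip(x, k1)], u)
    k3 = f([x_i + 0.5 * dt * k_i for x_i, k_i in zip(x, k2)], u)
    k4 = f([x_i + dt * k_i for x_i, k_i in zip(x, k3)], u)
    return [x_i + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x_i, a, b, c, d in zip(x, k1, k2, k3, k4)]


def double_integrator(x, u):
    """Stand-in dynamics: position' = velocity, velocity' = u."""
    return [x[1], u]


DT = 0.02  # 20 ms, as in the text
x_next = rk4_step(double_integrator, [0.0, 0.0], 1.0, DT)
```

For the double integrator the step reproduces the exact solution, that is a position of $0.5 \, u \, \Delta t^2$ and a velocity of $u \, \Delta t$.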

## 3.2 Neural Horizon MPC
***

This chapter gives a brief introduction to Neural Horizon MPC (*NH-MPC*), introduced by [Alsmeier, 2024](#alsmeier2024). The main idea is to replace part of the MPC's horizon with a feed forward neural network (*FFNN*). For this part of the horizon, the optimization algorithm only has to account for the FFNN's output, whereas otherwise it would have to optimize every state and input individually. The formulation of the NH-MPC is shown below:

<a id="eq-nh-mpc"></a>

$$
\begin{equation}
    \begin{aligned}
        \{ x_k^* \}_0^M, \{ u_k^* \}_0^{M-1} = & \operatorname*{argmin}_{\{ x_k \}_0^M, \{ u_k \}_0^{M-1}} L \left( \{ x_k \}_0^{M-1}, \{ u_k \}_0^{M-1} \right) + V(x_M) + \tilde{L} \left( \{ \tilde{x}_k \}_{M+1}^{N-1} \right) + V(\tilde{x}_N) \\
        \text{s.t.} \quad & x_{k+1} = f(x_k, u_k) && \forall k \in [0, \dots, M-1] \\
        & x_k \in \boldsymbol{\mathcal{X}_k} && \forall k \in [0, \dots, M] \\
        & u_k \in \boldsymbol{\mathcal{U}_k} && \forall k \in [0, \dots, M-1] \\
        & x_0 = x_{init} \\
        & \{ \tilde{x}_k \}_{M+1}^N = \mathrm{FFNN}(x_M) \\
        & \tilde{x}_k \in \boldsymbol{\mathcal{\tilde{X}}_k} && \forall k \in [M+1, \dots, N]
    \end{aligned}
    \tag{4}
\end{equation}
$$

Here, the Neural Horizon is $N$, while the tail of the horizon with length $N-M$ is approximated by an FFNN. Furthermore, a quadratic stage cost $L \left( \{ x_k \}_0^{M-1}, \{ u_k \}_0^{M-1} \right) = \sum_{k=0}^{M-1} \left( ||x_k||_Q^2 + ||u_k||_R^2 \right)$ is used, as well as a quadratic terminal cost $V(x_M) = ||x_M||_P^2$. Note that $||\xi_k||_W^2 = \xi_k^\top W \xi_k$ denotes the squared weighted $l^2$-norm. As seen in [eq. *4*](#eq-nh-mpc), the network takes the *last state* $x_M$ of the MPC as input and returns the *predicted states* $\{ \tilde{x}_k \}_{M+1}^N$ as outputs. The selected parameters for the MPC are found in [tab. *2*](#tab-mpc-parameters). The MPC horizon and the Neural Horizon are selected to be $M = 8$ and $N \in \{ 20, 25, 30, 35, 40, 45, 50, 60, 70 \}$, respectively.

<a id="tab-mpc-parameters"></a>

<center>
    <strong>Table 2:</strong> Parameters for MPC

| Parameter             | Value                                                                                                     |
| :---                  | ---:                                                                                                      |
| $Q$                   | $$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 10^{-5} & 0 \\ 0 & 0 & 0 & 10^{-5} \end{bmatrix}$$    |
| $R$                   | $$10^{-5}$$                                                                                                 |
| $P$                   | $$Q$$                                                                                                       |
| $Q_{NN}$              | $$Q$$                                                                                                       |
| $x_{bnd}$             | $$\begin{bmatrix} 2 & 6\pi & 10 & 10 \end{bmatrix}$$                                                        |
| $u_{bnd}$             | $$80$$                                                                                                      |
| $x_{init}$            | $$\begin{bmatrix} 0 & \pi & 0 & 0 \end{bmatrix}$$                                                           |
| $\Delta t$            | $$0.02 \text{ s}$$                                                                                          |
| $T_{sim}$             | $$3 \text{ s}$$                                                                                             |
</center>

## 3.3 CasADi NH-AMPC implementation
***

The CasADi implementations are realized in the files [inverted_pendulum.py](src/inverted_pendulum.py) and [mpc_classes.py](src/mpc_classes.py), where CasADi is used to generate an MPC or an NH-MPC.

## 3.4 Acados NH-MPC implementation
***

The file [inverted_pendulum_acados.py](src/inverted_pendulum_acados.py) implements the acados model as well as its discretization. For this, the same approach as in [Alsmeier, 2024](#alsmeier2024) is used. The file [mpc_classes_acados.py](src/mpc_classes_acados.py) contains the acados MPC (*AMPC*) and the NH-AMPC implementations. The latter uses 
$V(x_M) + \tilde{L} \left( \{ \tilde{x}_k \}_{M+1}^{N-1} \right) + V(\tilde{x}_N)$
 of [eq. *4*](#eq-nh-mpc) as a nonlinear least-squares terminal cost, which can be represented in matrix form so that acados can use it:

<a id="eq-acados-terminal-state-weight"></a>

$$
\begin{equation}
    \begin{gathered}
        \bar{V}^e \left( \bar{x}^e \right) = ||\bar{x}^e||_{\bar{W}^e}^2
        \text{ ,} 
        \\
        \bar{x}^e = 
        \begin{bmatrix} 
            x_M & \tilde{x}_{M+1} & \dots & \tilde{x}_{N}
        \end{bmatrix}
        \text{ ,} 
        \quad \quad \quad
        \bar{W}^e = 
        \begin{bmatrix} 
            Q &  &  & \\
            & Q_{NN} &  & \\
            &  & \ddots & \\
            &  &  & Q_{NN} \\
        \end{bmatrix} 
    \end{gathered}
    \tag{5}
\end{equation}
$$

Here, $Q_{NN}$ denotes the selected weight for each of the FFNN-predicted states $\{ \tilde{x}_k \}_{M+1}^N$ and $Q$ is the weight for the state $x_M$, resulting in the weight matrix $\bar{W}^e$ in [eq. *5*](#eq-acados-terminal-state-weight) for the terminal cost $\bar{V}^e$ of the AMPC. Note that the weight of the predicted terminal state $\tilde{x}_N$ is also selected to be $Q_{NN}$ for simplicity and that the terminal state for acados $\bar{x}^e$ stacks all states from $x_{M}$ to $\tilde{x}_{N}$. Furthermore, it can be seen in [eq. *4*](#eq-nh-mpc) that $\{ \tilde{x}_k \}_{M+1}^N$ depends on $x_{M}$. However, evaluating the terminal cost $\bar{V}^e$ can be computationally expensive, since the terminal state $\bar{x}^e$ has the shape $\mathbb{R}^{h \times 1}$ and $\bar{W}^e$ has the shape $\mathbb{R}^{h \times h}$, with $h = (N-M+1) \cdot n_{x}$.
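Because $\bar{W}^e$ is block diagonal, the stacked cost of eq. 5 equals the sum of the individual weighted norms. The sketch below evaluates it block by block and computes the stacked dimension $h$ for $M = 8$ and $N = 30$. It is a plain Python illustration with made-up state values, not the way acados evaluates the cost internally, and the function names are hypothetical.

```python
def weighted_sq_norm(x, W):
    """||x||_W^2 = x^T W x for a dense matrix W given as nested lists."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


def stacked_terminal_cost(x_M, x_tilde, Q, Q_nn):
    """Evaluate eq. 5 block by block instead of forming the dense h x h matrix."""
    return weighted_sq_norm(x_M, Q) + sum(weighted_sq_norm(x_k, Q_nn) for x_k in x_tilde)


def diag(values):
    """Build a diagonal matrix as nested lists."""
    return [[v if i == j else 0.0 for j in range(len(values))]
            for i, v in enumerate(values)]


M, N, n_x = 8, 30, 4
Q = diag([1.0, 1.0, 1e-5, 1e-5])        # tab. 2, with Q_NN = Q
x_M = [0.1, 0.2, 1.0, -1.0]             # illustrative values
x_tilde = [[0.05, 0.1, 0.0, 0.0]] * (N - M)
h = (N - M + 1) * n_x                   # size of the stacked terminal state
cost = stacked_terminal_cost(x_M, x_tilde, Q, Q)
```

For this setup the stacked state has 92 entries, so the dense weight matrix would have $92 \times 92$ entries of which only the diagonal blocks are non-zero.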

## 3.5 Dataset setup
***

The two datasets consist of open-loop MPC trajectories of the problem, generated with an MPC implemented in acados that uses the solver settings in [tab. *3*](#tab-dataset-setup). Real-time iteration (*RTI*) was iterated until convergence to solve the NLP, which essentially corresponds to an SQP solver. RTI was used because the SQP solver does not work as intended as soon as NaNs appear in the QP results. Note that all failed MPC runs are discarded and not used for the datasets. The datasets are set up with a horizon of $D=70$ and $30000$ datapoints. Furthermore, all initial states of the datasets are random values within the state bounds. In this case, the bounds are tightened to a subset of the MPC state bounds $\boldsymbol{\mathcal{X}}$:
$$\tilde{x}_{bnd} = \begin{bmatrix} 0.75 & 0.25 & 0.25 & 0.25\end{bmatrix} \odot x_{bnd}$$

The datapoints used as features and labels are denoted as $\Phi_{\gamma}(\cdot)\mid_{x_l}$ and $\Phi_{\gamma}(\cdot)\mid_{x_{l+1},\dots, x_{l-M+N}}$, respectively. The starting point $l$ is selected such that $l \in \left[0, D+M-N \right]$. A suitable choice can lead to a better FFNN accuracy. In these experiments, however, only $l=8$ is used.
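The sampling of initial states and the feature and label selection can be sketched as follows. The sketch assumes symmetric bounds $\pm \tilde{x}_{bnd}$ and uniform sampling, neither of which is stated explicitly above, and it uses a dummy trajectory instead of a solved MPC. The function names are hypothetical and do not refer to `data_generation_acados.py`.

```python
import math
import random

X_BND = [2.0, 6.0 * math.pi, 10.0, 10.0]   # x_bnd from tab. 2
SCALE = [0.75, 0.25, 0.25, 0.25]           # tightening factors from the text


def sample_initial_state(rng):
    """Draw x_init uniformly within the tightened bounds (symmetric bounds assumed)."""
    return [rng.uniform(-s * b, s * b) for s, b in zip(SCALE, X_BND)]


def split_trajectory(traj, l, M, N):
    """Return the feature x_l and the labels x_{l+1}, ..., x_{l-M+N} of one trajectory."""
    last = l - M + N
    if l < 0 or last >= len(traj):
        raise ValueError("l must lie in [0, D + M - N]")
    return traj[l], traj[l + 1:last + 1]


# Dummy open-loop trajectory x_0, ..., x_D; every state just stores its index k.
D, M, N = 70, 8, 30
traj = [[float(k)] * 4 for k in range(D + 1)]
feature, labels = split_trajectory(traj, l=8, M=M, N=N)
x_init = sample_initial_state(random.Random(0))
```

With $l = 8$, $M = 8$ and $N = 30$, the feature is $x_8$ and the labels are the $N - M = 22$ states $x_9, \dots, x_{30}$.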


<a id="tab-dataset-setup"></a>

<center>
    <strong>Table 3:</strong> Dataset solver setup

| Key                   | Value                     |
| :---                  | ---:                      |
| qp_solver             | 'FULL_CONDENSING_HPIPM'   |
| integrator_type       | 'DISCRETE'                |
| nlp_solver_type       | 'SQP_RTI'                 |
| use_iter_rti_impl     | True                      |
| use_initial_guesses   | True                      |
| rti_tol               | $10^{-6}$                 |
</center>

The file [data_generation.py](src/data_generation.py) implements the dataset creation with CasADi MPCs, which use the IPOPT solver.

With acados, however, datasets can be generated faster. The file [data_generation_acados.py](src/data_generation_acados.py) can be used for this purpose. Alternatively, the notebook [Dataset_generation.ipynb](Dataset_generation.ipynb) can be used for data generation with predefined, working solver options.

## 3.6 FFNN setup
***

For the FFNNs, the number of hidden layers is selected to be $L=3$. The number of input neurons is the size of a state 
$$n_{input}=4$$ 
and the neurons per hidden layer are set such that they are the same for each hidden layer, namely 
$$n_{hidden} \in \{12, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384\}.$$ 
In addition, the number of output neurons is 
$$n_{output} = (N - M) \in \{ 12, 17, 22, 27, 32, 37, 42, 52, 62 \},$$
with $M = 8$. 
As previously mentioned, the features and labels of the datasets used for the FFNN are $\Phi_{\gamma}(\cdot)\mid_{x_8}$ and $\Phi_{\gamma}(\cdot)\mid_{x_9,\dots, x_{N}}$ respectively.
The files [neural_network.py](src/neural_network.py) and [neural_horizon.py](src/neural_horizon.py) implement the FFNN in PyTorch as well as the CasADi version of the FFNN.

## 3.7 Workflow summary
***

The workflow for creating your own NH-AMPC can be summarized as follows:
1. Generate two datasets with the notebook [Dataset_generation.ipynb](Dataset_generation.ipynb) via acados. (You can use a larger horizon than for the NH-MPC.)

1. Generate an FFNN using the notebook [Network_generation.ipynb](Network_generation.ipynb) (optionally create a pruned FFNN by using the notebook [Pruned_Network_generation.ipynb](Pruned_Network_generation.ipynb)).

1. Create a solver by using the notebook [Multi_NH.ipynb](Multi_NH.ipynb) (or [Multi_NH_prun.ipynb](Multi_NH_prun.ipynb) for pruned FFNNs) <br>
Note that the model is still the *inverted pendulum on a cart*. To change the model, the function *generate_AMPC_trajs* in the file [generate_trajs.py](src/generate_trajs.py) can be used as a blueprint.

1. Inspect the trajectories using the notebook [Show_results.ipynb](Show_results.ipynb).

***
***
# **4. Results**
***
***

The goal of this chapter is to provide insights that clarify which setups are useful for the NH-AMPCs in the selected case. The main goal here is to preserve the cost while reducing the solving time.

## 4.1 Dataset results
***

[Fig. *1*](#fig-dataset) shows the train (*v5*) and test (*v6*) datasets used in the experiments. It can be seen that the trajectories first expand and then converge to the reference point, especially in $x_{cart}$. It is easy to see that both datasets are similar in their trajectories and the corresponding distributions. For simplicity of the FFNN, the reference state is selected to be $x_{ref} = \boldsymbol{0}$. 

<a id="fig-dataset"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/datasets_70M_30000steps.png" alt="Datasets" title="Datasets" />
    <figcaption style="text-align: center;">
        <b>Figure 1:</b> 200 samples of the datasets with $M=70$
    </figcaption>
</div>

## 4.2 FFNNs $R^2$-scores
***

Each network is trained on the same training dataset and also tested on the same test dataset. The specifications of the datasets are given in [ch. Dataset setup](#35-dataset-setup). After creating these datasets, the FFNNs specified in [ch. FFNN-Setup](#36-FFNN-setup) are trained and tested. Looking at the test $R^2$-scores in [fig. *2*](#fig-network-r2-scores), it becomes clear that the larger FFNNs above $128$ hidden neurons are overfitting. This is especially the case for a larger number of outputs, like $N_{NN} = 52$ or $N_{NN} = 62$. Therefore, in the subsequent chapters, only the numbers of hidden neurons $n_{hidden} \in \{12, 16, 24, 32, 48, 64, 96\}$ are taken into account.

<a id="fig-network-r2-scores"></a>

<div style="text-align: center;">
    <img src="Results_more_TrainedNNs/PNGs/R2_scores.png" alt="R2-scores" title="R2-scores" />
    <figcaption style="text-align: center;">
        <b>Figure 2:</b> Test $R^2$-scores for <i> 10 </i> FFNNs each with $n_{hidden} \in \{12, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384\}$
    </figcaption>
</div>
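For reference, the $R^2$-score used above compares the squared prediction error with the variance of the labels. The sketch below shows the standard definition for a single output with made-up numbers. How the repository aggregates the score over multiple outputs is not stated here, so this is only an illustration of the metric itself.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot for one output."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative labels and predictions
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
score = r2_score(y_true, y_pred)
```

A score of $1$ corresponds to a perfect prediction, while predicting the mean of the labels yields a score of $0$. A test score that drops while the network size grows is the overfitting pattern described above.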

## 4.3 AMPC compared to CMPC and NH-CMPC
***

[Fig. *3*](#fig-cmpc-nhcmpc-ampc) illustrates the results for the CasADi MPC (*CMPC*), the NH-CMPC and the acados MPC (*AMPC*). Solving the MPC is considerably faster with acados, approximately $100 \times$. However, acados needs time to generate the C code, which is not included in this comparison; only the solving times are included. Note that the NH-CMPCs use the same trained FFNNs as the NH-AMPCs with the same setup. Hence, the results differ in terms of cost from those in [Alsmeier, 2024](#alsmeier2024), because the datasets are generated with acados. This means that all FFNNs are trained on data generated by AMPCs and no longer by CMPCs. The solver selected for all acados MPCs, including the NH-AMPCs, is the AS-RTI-D solver with full condensing HPIPM, as shown in [tab. *4*](#tab-acados-solver-setup). This setting is taken from [Frey, 2024](#frey2024), where it provides the best results regarding relative suboptimality. Finally, the cost of the AMPCs is slightly lower at $103.44$, whereas the CMPCs with *IPOPT* achieve a cost of $105.23$.

<a id="fig-cmpc-nhcmpc-ampc"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/CMPC_NH_CMPC_AMPC_results.png" alt="CMPC, NH-CMPC, AMPC results" title="CMPC, NH-CMPC, AMPC results" />
    <figcaption style="text-align: center;"><b>Figure 3:</b> Comparison of CMPC 30M, NH-CMPC 8M 30N and AMPC 30M</figcaption>
</div>


<a id="tab-acados-solver-setup"></a>

<center>

**Table 4:** AMPC solver setup

| Key                   | Value                     |
| :---                  | ---:                      |
| qp_solver             | 'FULL_CONDENSING_HPIPM'   |
| integrator_type       | 'DISCRETE'                |
| nlp_solver_type       | 'SQP_RTI'                 |
| as_rti_iter           | $3$                       |
| as_rti_level          | $3$                       |
| nlp_solver_tol_stat   | $10^{-6}$                 |
| nlp_solver_max_iter   | $3$                       |
</center>

## 4.4 NH-AMPC compared to AMPC
***

In this chapter, the results of multiple NH-AMPC setups are presented. Each setup is executed $10$ times with differently initialized and trained FFNNs, resulting in $10$ unique NH-AMPCs. [Fig. *4*](#fig-ampc-nhampc-sc-time) gives an overview of the cost over the solving time of the different tested setups. The test setup includes every possible combination of

<a id="eq-nh-setups"></a>

$$
\begin{equation}
    \begin{aligned}
        M & = 8 \\
        N_{NN} & \in \{ 12, 17, 22, 27, 32, 42, 52, 62 \} \\ 
        n_{hidden} & \in \{ 12, 16, 24, 32, 48, 64, 96 \}
        .
    \end{aligned}
    \notag
\end{equation}
$$

Note that $N_{NN} = N - M$. Furthermore, the reference of the AMPC without Neural Horizon is shown as the dotted black line, where the time is averaged. Its cost is the same for all $10$ runs. In addition, the costs are clipped to $150$, as the figure would otherwise not be legible.
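The test grid described above can be enumerated in a few lines, which also gives the number of controllers that have to be built. This is a small sketch with hypothetical dictionary keys; it does not reproduce the configuration format of the notebooks.

```python
from itertools import product

M = 8
N_NN_VALUES = [12, 17, 22, 27, 32, 42, 52, 62]
N_HIDDEN_VALUES = [12, 16, 24, 32, 48, 64, 96]
N_SEEDS = 10  # differently initialized and trained FFNNs per setup

# Every combination of tail length and hidden layer width, with N = M + N_NN
setups = [{"M": M, "N": M + n_nn, "N_NN": n_nn, "n_hidden": n_hidden}
          for n_nn, n_hidden in product(N_NN_VALUES, N_HIDDEN_VALUES)]
n_controllers = len(setups) * N_SEEDS
```

The grid therefore contains $8 \cdot 7 = 56$ setups and, with $10$ networks each, $560$ NH-AMPCs.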
 
<a id="fig-ampc-nhampc-sc-time"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/scatter_cost_time_NH.png" alt="NH-AMPCs cost over solving time" title="NH-AMPCs cost over solving time" />
    <figcaption style="text-align: center;"><b>Figure 4:</b> NH-AMPCs cost over solving time with different horizons <br> and different number of neurons per hidden layer</figcaption>
</div>

It is clear that most of the NH-AMPCs with FFNNs that have $n_{hidden} = 96$ are not useful, because they have both worse solving times and worse cost than the AMPC. However, all NH-AMPCs in the bottom left corner are worth a more detailed look.

### 4.4.1 Cost

In [fig. *5*](#fig-ampc-nhampc-bp-cost-nh), where $n_{hidden}=22$, the cost over $N_{NN}$ is shown in boxplots. It can be seen that only the settings $N_{NN} \in \{ 17, 22 \}$ are worth further consideration in terms of cost. [Fig. *6*](#fig-nhampc-hm-cost) gives further information about the influence of $N_{NN}$ and $n_{hidden}$ on the median cost. Note that in [fig. *6*](#fig-nhampc-hm-cost), the median costs are truncated if they are higher than $110$ in order to have a better resolution with the colorbar. Remarkably, there is just a small region where the median costs are good. Especially on the right side of the heatmap with higher $N_{NN}$ the median cost increases and is higher than $110$.

<a id="fig-ampc-nhampc-bp-cost-nh"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_cost_NH.png" alt="NH-AMPCs cost over neural horizon" title="NH-AMPCs cost over neural horizon" />
    <figcaption style="text-align: center;"><b>Figure 5:</b> NH-AMPCs cost over Neural Horizons for $n_{hidden} = 22$</figcaption>
</div>

<a id="fig-nhampc-hm-cost"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/heatmap_cost.png" alt="NH-AMPCs cost heatmap" title="NH-AMPCs cost heatmap" />
    <figcaption style="text-align: center;"><b>Figure 6:</b> Heatmap of NH-AMPCs cost.</figcaption>
</div>

### 4.4.2 Solving time

Looking at [fig. *7*](#fig-ampc-nhampc-bp-time-neuron), it can be seen that the solving time rises superlinearly with the number of neurons per hidden layer $n_{hidden}$. This is no surprise given the growth of the number of parameters needed for the calculations. Neglecting the biases, the number of parameters is calculated with 

<a id="eq-number-of-parameters"></a>

$$
\begin{equation}
    \begin{split}
        n_P &= n_{input} \cdot n_{hidden} + (L - 1) \cdot n_{hidden}^2 + n_{hidden} \cdot n_{output} \\
            &= 4 \cdot n_{hidden} + (3 - 1) \cdot n_{hidden}^2 + n_{hidden} \cdot N_{NN} \\ 
            &= n_{hidden} \cdot (2 \cdot n_{hidden} + N_{NN} + 4)
        ,
    \end{split}
    \tag{6}
\end{equation}
$$

leading to a quadratic growth of the number of parameters, and with it of the solving time, of $\mathcal{O}(n_{hidden}^2)$. In contrast, the number of parameters and therefore the solving time grows only linearly with the number of state predictions $N_{NN}$, i.e. with $\mathcal{O}(N_{NN})$. This can also be observed in [fig. *8*](#fig-ampc-nhampc-bp-time-nh), and in [fig. *9*](#fig-nhampc-hm-time) it becomes even clearer that $n_{hidden}$ has a larger effect on the solving time than $N_{NN}$. Here, the mean solving time is clipped at the mean solving time of the AMPCs. Therefore, almost all setups with $n_{hidden} = 96$ are at the maximum, because they are slower than the standard AMPCs.
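The two growth rates can be checked numerically with the first line of the parameter count, which sums the weights of the input layer, the hidden-to-hidden layers and the output layer. The sketch below uses a hypothetical helper name and compares doubling $n_{hidden}$ with doubling $N_{NN}$.

```python
def n_weights(n_hidden, n_output, n_input=4, n_layers=3):
    """Number of weights (biases neglected) of an FFNN with equally wide hidden layers."""
    return (n_input * n_hidden
            + (n_layers - 1) * n_hidden ** 2
            + n_hidden * n_output)


base = n_weights(n_hidden=32, n_output=17)
wider = n_weights(n_hidden=64, n_output=17)    # doubled hidden layer width
longer = n_weights(n_hidden=32, n_output=34)   # doubled number of outputs
ratio_wider = wider / base
ratio_longer = longer / base
```

Doubling the hidden layer width multiplies the number of weights by about $3.5$ in this example, whereas doubling the number of outputs only increases it by $20\,\%$.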

<a id="fig-ampc-nhampc-bp-time-neuron"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_time_neurons.png" alt="NH-AMPC solving time over neurons" title="NH-AMPC solving time over neurons" />
    <figcaption style="text-align: center;"><b>Figure 7:</b> NH-AMPCs mean solving time over number of neurons per hidden layer</figcaption>
</div>


<a id="fig-ampc-nhampc-bp-time-nh"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_time_NH.png" alt="NH-AMPC solving time over neural horizon" title="NH-AMPC solving time over neural horizon" />
    <figcaption style="text-align: center;"><b>Figure 8:</b> NH-AMPCs mean solving time over neural horizon</figcaption>
</div>

<a id="fig-nhampc-hm-time"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/heatmap_time.png" alt="NH-AMPCs solving time heatmap" title="NH-AMPCs solving time heatmap" />
    <figcaption style="text-align: center;"><b>Figure 9:</b> Heatmap of NH-AMPCs mean solving time.</figcaption>
</div>

### 4.4.3 Filtered results

Regarding the previously mentioned cost-optimal settings for $N_{NN}$, [fig. *10*](#fig-cost-17n) and [fig. *11*](#fig-cost-22n) show filtered results with $N_{NN} = 17$ and $N_{NN} = 22$, respectively. As can be seen in [fig. *10*](#fig-cost-17n), the cost is very low for $n_{hidden} \in \{ 24, 32, 48, 64 \}$ with $N_{NN} = 17$. Therefore, it seemed beneficial to prune the best-performing FFNNs, reducing the number of hidden neurons $n_{hidden} = 64 \rightarrow \{ 16, 24, 32, 48 \}$ for comparison purposes. Here, $\rightarrow$ indicates that the number of neurons per hidden layer on its left is pruned down to the number of hidden neurons on its right. The same is done for $N_{NN} = 22$, where $n_{hidden} \in \{ 32, 48, 64 \}$ performed well. In [ch. *Pruned FFNNs on NH-AMPCs*](#45-pruned-ffnns-on-nh-ampc), the neuron pruning $n_{hidden} = 64 \rightarrow \{ 16, 24, 32, 48 \}$ is used in both cases.

<a id="fig-cost-17n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_cost_neurons_17N.png" alt="NH-AMPC cost 17N" title="NH-AMPC cost 17N" />
    <figcaption style="text-align: center;"><b>Figure 10:</b> NH-AMPCs cost with $N_{NN}=17$</figcaption>
</div>

<a id="fig-cost-22n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_cost_neurons_22N.png" alt="NH-AMPC cost 22N" title="NH-AMPC cost 22N" />
    <figcaption style="text-align: center;"><b>Figure 11:</b> NH-AMPCs cost with $N_{NN}=22$</figcaption>
</div>

[Fig. *12*](#fig-nhampc-results) visualizes the controlled trajectories, where a trend is visible for specific settings, especially for smaller hidden layer sizes. Keep in mind that the results are only displayed for $N_{NN} = 22$, i.e. $N = 30$.

<a id="fig-nhampc-results"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/AMPC_NH_AMPC_results_0_22N.png" alt="AMPC and NH-AMPC results" title="AMPC and NH-AMPC results" />
    <figcaption style="text-align: center;"><b>Figure 12:</b> Comparison of AMPC 30M and NH-AMPC 8M 30N with different hidden layers and 10 differently seeded networks each</figcaption>
</div>

## 4.5 Pruned FFNNs on NH-AMPC
***

The following chapter deals with pruning the FFNNs in a local structured way to reduce the neurons to the desired number. The notation for the local structured neuron pruning is $n_{hidden, \, original} \rightarrow n_{hidden, \, pruned}$, where the first parameter denotes the original network's hidden layer size and the second parameter, after the arrow, denotes the pruned network's hidden layer size. Of the pruning techniques described in [ch. *Pruning*](#24-pruning), only the [Lottery-Ticket-Hypothesis](#242-lottery-ticket-hypothesis) is used.
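A local structured pruning step for one hidden layer can be sketched as follows. The criterion used here, keeping the neurons with the largest $l^2$-norm of their incoming weights, is an assumption for illustration; the criterion used in the repository is not stated above. Removing a neuron deletes its incoming weights and bias as well as the corresponding column in the weights of the next layer. The function name is hypothetical.

```python
import math


def prune_neurons(W_in, b, W_out, n_keep):
    """Keep the n_keep neurons of one layer with the largest incoming-weight norm.

    W_in[j] holds the incoming weights of neuron j, b[j] its bias and
    W_out[i][j] the weight from neuron j to neuron i of the next layer.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in W_in]
    ranked = sorted(range(len(W_in)), key=lambda j: norms[j], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original neuron order
    return ([W_in[j] for j in keep],
            [b[j] for j in keep],
            [[row[j] for j in keep] for row in W_out],
            keep)


# Illustrative layer with four neurons pruned down to two (4 -> 2)
W_in = [[0.1, 0.1], [2.0, 0.0], [0.0, -3.0], [0.2, 0.1]]
b = [0.0, 1.0, 2.0, 3.0]
W_out = [[1.0, 2.0, 3.0, 4.0]]
W_in_p, b_p, W_out_p, kept = prune_neurons(W_in, b, W_out, n_keep=2)
```

Applied to every hidden layer with `n_keep` set to the target size, this yields a network with the same shape as a directly trained smaller network.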

### 4.5.1 Pruned $N_{NN} = 17$ FFNNs applied to NH-AMPCs

As mentioned in the last chapter, the selected pruning settings are $n_{hidden} = 64 \rightarrow \{ 16, 24, 32, 48 \}$ in the case of $N_{NN} = 17$. In the following boxplots, the NH-AMPC results of the originally trained networks are always compared to those of the networks pruned from $n_{hidden} = 64$. The cost over the hidden layer size is visualized in [fig. *13*](#fig-nhampc-bp-pruned-cost-17n). Here, the difference is marginal. $n_{hidden} = 64 \rightarrow \{ 32, 48 \}$ seems to be slightly better, but the difference is barely noticeable. However, the $R^2$-score, illustrated in [fig. *14*](#fig-nhampc-bp-pruned-r2-17n), is better for all of the pruned networks, even though they have exactly the same size as the originally trained ones. For the average solving time in [fig. *15*](#fig-nhampc-bp-pruned-time-17n), there is almost no difference.

<a id="fig-nhampc-bp-pruned-cost-17n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_cost_17N.png" alt="NH-AMPC cost over neurons" title="NH-AMPC cost over neurons" />
    <figcaption style="text-align: center;"><b>Figure 13:</b> NH-AMPCs cost over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-r2-17n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_r2_17N.png" alt="NH-AMPC R2-score over neurons" title="NH-AMPC R2-score over neurons" />
    <figcaption style="text-align: center;"><b>Figure 14:</b> NH-AMPCs $R^2$-score over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-time-17n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_time_17N.png" alt="NH-AMPC solving time over neurons" title="NH-AMPC solving time over neurons" />
    <figcaption style="text-align: center;"><b>Figure 15:</b> NH-AMPCs solving time over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>

### 4.5.2 Pruned $N_{NN} = 22$ FFNNs applied to NH-AMPCs

The case $N_{NN} = 22$ is now considered with exactly the same pruning scheme. In terms of cost, no clear trend emerges, except perhaps that FFNNs pruned to $n_{hidden} = 32$ perform slightly better, as visible in [fig. *16*](#fig-nhampc-bp-pruned-cost-22n). Here too, the $R^2$-score of the pruned networks is better, as illustrated in [fig. *17*](#fig-nhampc-bp-pruned-r2-22n).

<a id="fig-nhampc-bp-pruned-cost-22n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_cost_22N.png" alt="NH-AMPC cost over neurons" title="NH-AMPC cost over neurons" />
    <figcaption style="text-align: center;"><b>Figure 16:</b> NH-AMPCs cost over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-r2-22n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_r2_22N.png" alt="NH-AMPC R2-score over neurons" title="NH-AMPC R2-score over neurons" />
    <figcaption style="text-align: center;"><b>Figure 17:</b> NH-AMPCs $R^2$-score over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-time-22n"></a>

<div style="text-align: center;">
    <img src="Results/PNGs/boxplot_prun_time_22N.png" alt="NH-AMPC solving time over neurons" title="NH-AMPC solving time over neurons" />
    <figcaption style="text-align: center;"><b>Figure 18:</b> NH-AMPCs solving time over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>

## 4.6 Refined results with pruned FFNNs on NH-AMPC
***

The goal of this chapter is to show refined results for a smaller setup space, but with more FFNNs tested per setup, and their effect on the Neural Horizon.

### 4.6.1 Pruned $N_{NN} = 17$ FFNNs applied to NH-AMPCs

As shown in [fig. *19*](#fig-nhampc-bp-pruned-cost-17n-refined), the cost for the pruned case with $n_{hidden} = 24$ has less variance and is overall closer to the optimal cost. The $R^2$-score in [fig. *20*](#fig-nhampc-bp-pruned-r2-17n-refined) makes it even clearer that it is related to the cost of the Neural Horizon. However, the mean solving time does not seem to drop. The median of the mean time is even marginally higher, as can be seen in [fig. *21*](#fig-nhampc-bp-pruned-time-17n-refined).
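The quantities read off the boxplots (median and spread over the differently seeded networks) can be computed as in the following sketch. The helper `box_stats` and the synthetic numbers are assumptions for illustration only and are not measured results.

```python
import numpy as np

def box_stats(values):
    """Median, quartiles and interquartile range, as drawn in a boxplot."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return {"median": float(med), "q1": float(q1), "q3": float(q3), "iqr": float(q3 - q1)}

# Synthetic per-network mean solving times (arbitrary units), NOT measured data.
rng = np.random.default_rng(1)
unpruned_times = rng.normal(loc=1.00, scale=0.05, size=50)
pruned_times = rng.normal(loc=1.02, scale=0.05, size=50)

for name, times in [("unpruned", unpruned_times), ("pruned", pruned_times)]:
    print(name, box_stats(times))
```

Comparing the `median` and `iqr` entries of two setups gives the same information as comparing the center lines and box heights in the figures.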

<a id="fig-nhampc-bp-pruned-cost-17n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_cost_17N.png" alt="NH-AMPC cost over neurons" title="NH-AMPC cost over neurons" />
    <figcaption style="text-align: center;"><b>Figure 19:</b> NH-AMPCs cost over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-r2-17n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_r2_17N.png" alt="NH-AMPC R2-score over neurons" title="NH-AMPC R2-score over neurons" />
    <figcaption style="text-align: center;"><b>Figure 20:</b> NH-AMPCs $R^2$-score over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-time-17n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_time_17N.png" alt="NH-AMPC solving time over neurons" title="NH-AMPC solving time over neurons" />
    <figcaption style="text-align: center;"><b>Figure 21:</b> NH-AMPCs solving time over number of neurons per hidden layer for $N_{NN}=17$</figcaption>
</div>

### 4.6.2 Pruned $N_{NN} = 22$ FFNNs applied to NH-AMPCs

[Fig. *22*](#fig-nhampc-bp-pruned-cost-22n-refined) shows less variance in the cost for the observed setups $n_{hidden} \in \{24, 32\}$. For $n_{hidden} = 24$, the pruned networks even have a better median, with a difference of $\approx 0.5$. The $R^2$-score in [fig. *23*](#fig-nhampc-bp-pruned-r2-22n-refined) shows the same behaviour as in the previous $R^2$-score plots: the pruned networks achieve better $R^2$-scores. The mean solving time shown in [fig. *24*](#fig-nhampc-bp-pruned-time-22n-refined), however, does not decrease with pruning, and the behaviour here is unclear. For $n_{hidden} = 24$, the pruned NH-AMPCs need more time to solve the problem than the unpruned versions, whereas for $n_{hidden} = 32$ it is the other way around.

<a id="fig-nhampc-bp-pruned-cost-22n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_cost_22N.png" alt="NH-AMPC cost over neurons" title="NH-AMPC cost over neurons" />
    <figcaption style="text-align: center;"><b>Figure 22:</b> NH-AMPCs cost over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-r2-22n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_r2_22N.png" alt="NH-AMPC R2-score over neurons" title="NH-AMPC R2-score over neurons" />
    <figcaption style="text-align: center;"><b>Figure 23:</b> NH-AMPCs $R^2$-score over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>


<a id="fig-nhampc-bp-pruned-time-22n-refined"></a>

<div style="text-align: center;">
    <img src="Refined_Results/PNGs/boxplot_prun_time_22N.png" alt="NH-AMPC solving time over neurons" title="NH-AMPC solving time over neurons" />
    <figcaption style="text-align: center;"><b>Figure 24:</b> NH-AMPCs solving time over number of neurons per hidden layer for $N_{NN}=22$</figcaption>
</div>

***
***
# **5. Conclusion**
***
***

The neural horizon algorithm combined with acados is extremely fast. However, the catch is that the neural network used for solving the problem needs to be more accurate to achieve a good cost. Especially for higher output spaces $N_{NN}$, the cost performance drops dramatically and the network is not able to generalize well, which leads to a poor solving performance for the NH-AMPC.

***
***
# **References**
***
***

<!-- Neural Horizon -->
<a id="alsmeier2024"></a>

__[Alsmeier, H. (2024). *Neural Horizon Model Predictive Control - Increasing Computational Efficiency with Neural Networks*. Publisher.](https://arxiv.org/pdf/2408.09781)__

<!-- Acados -->
<a id="verschueren2021"></a>

__[Verschueren, R. (2021). *acados - a modular open-source framework for fast embedded optimal control*. Springer.](https://cdn.syscop.de/publications/Verschueren2021.pdf)__

<!-- CasADi -->
<a id="andersson2019"></a>

__[Andersson, J. (2019). *CasADi - A software framework for nonlinear optimization and optimal control*. Springer.](https://optimization-online.org/wp-content/uploads/2018/01/6420.pdf)__

<!-- AS-RTI -->
<a id="frey2024"></a>

__[Frey, J. (2024). *Advanced-Step Real-time Iterations with Four Levels - New Error Bounds and Fast Implementation in acados*. IEEE.](https://ieeexplore.ieee.org/abstract/document/10552826)__

<!-- LTH -->
<a id="frankle2019"></a>

__[Frankle, J. (2019). *The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks*. ICLR.](https://arxiv.org/abs/1803.03635)__

<!-- Finetuning -->
<a id="han2015"></a>

__[Han, S. (2015). *Learning both Weights and Connections for Efficient Neural Networks*. NIPS.](https://proceedings.neurips.cc/paper/2015/file/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf)__