# <span style="color:teal"> Introduction to surrogate modelling in the geosciences </span>

#### Marc Bocquet¹ [marc.bocquet@enpc.fr](mailto:marc.bocquet@enpc.fr) and Alban Farchi¹ [alban.farchi@enpc.fr](mailto:alban.farchi@enpc.fr)
##### (1) CEREA, École des Ponts and EdF R&D, IPSL, Île-de-France, France

During this session, we will apply standard machine learning methods to learn the dynamics of the Lorenz 1996 model. The objective is to get a preview of how machine learning can be applied to geoscientific models, using a low-order model for which testing is quick.

### <span style="color:blue"> Import all modules (the visualisation functions are provided by `toolbox`)</span>

In [None]:
import numpy as np
import tensorflow as tf
import toolbox
from tqdm.auto import trange

## <span style="color:green"> I. The Lorenz 1996 model </span>

The Lorenz 1996 model (L96, [Lorenz and Emanuel 1998](https://journals.ametsoc.org/view/journals/atsc/55/3/1520-0469_1998_055_0399_osfswo_2.0.co_2.xml)) is a low-order chaotic model commonly used in data assimilation to assess the performance of new algorithms. It represents the evolution of an unspecified scalar meteorological quantity (perhaps vorticity or temperature) along a latitude circle.

The model **dynamics** is driven by the following set of ordinary differential equations (ODEs):
$$
    \forall n \in [1, N_{x}], \quad \frac{\mathrm{d}x_{n}}{\mathrm{d}t} =
    (x_{n+1}-x_{n-2})x_{n-1}-x_{n}+F,
$$
where the indices are periodic: $x_{-1}=x_{N_{x}-1}$, $x_{0}=x_{N_{x}}$, and $x_{1}=x_{N_{x}+1}$, and where the system size $N_{x}$ can take arbitrary values.

In the standard configuration, $N_{x}=40$ and the forcing coefficient is $F=8$. The ODEs are integrated using a fourth-order Runge-Kutta scheme with a time step of $0.05$ model time units (MTU). The resulting dynamics is **chaotic**, with an error doubling time of about $0.42$ MTU (the corresponding Lyapunov time is hence about $0.61$ MTU). For comparison, $0.05$ MTU represents six hours of real time and corresponds to an average autocorrelation of about $0.967$. Finally, the model variability (spatial average of the time standard deviation per variable) is $3.64$.
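The Lyapunov time quoted above follows from the doubling time: if errors grow like $e^{\lambda t}$, they double after $T_{2} = \ln 2/\lambda$, so the Lyapunov time $1/\lambda$ equals $T_{2}/\ln 2$. A short sketch of this arithmetic, where the variable names are ours (not from `toolbox`) and the conversion of six hours per $0.05$ MTU is the one stated above:

```python
import math

doubling_time = 0.42  # MTU, error doubling time quoted in the text
dt = 0.05             # MTU, one integration step (six hours of real time)

# errors grow like exp(lambda * t), so doubling after T2 means lambda = ln(2) / T2
lyap_exponent = math.log(2) / doubling_time  # about 1.65 per MTU
lyap_time = 1.0 / lyap_exponent              # about 0.61 MTU

# the same time scale expressed in integration steps and in days
lyap_steps = lyap_time / dt                  # about 12 steps
lyap_days = lyap_steps * 6 / 24              # about 3 days
```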

In this series of experiments, we will try to emulate the dynamics of the L96 model using artificial neural networks (NN).
1. We start by running the **true model** to build a training dataset.
2. We build and **train neural networks** using this dataset.
3. We evaluate the **forecast skill** of the surrogate models (the NNs).

## <span style="color:green"> II. The true model dynamics </span>

Before building the training dataset, let us illustrate the model dynamics.

<span style="color:red"> Exercise </span>
- <span style="color:blue"> II.1. </span> Implement the `tendency()` method of the `Lorenz1996Model` class.
  This method should compute the model tendencies. You may use the
  [`roll()`](https://numpy.org/doc/stable/reference/generated/numpy.roll.html)
  function of `numpy`.
- <span style="color:blue"> II.1. </span> Implement the `forward()` method of the `Lorenz1996Model` class.
  This method should compute an integration step forward in time.
  The Runge-Kutta scheme is explained in the method's docstring.
  A simple, straightforward implementation with six statements is more
  than enough for the present set of experiments.
- <span style="color:blue"> II.2. </span> Implement the true model integration in the `perform_true_model_integration()` function.
  A simple implementation with a `for-loop` should do the job.
- <span style="color:blue"> II.2. </span> Describe the model evolution in the first few time steps and in the long term.

### <span style="color:blue"> II.1. Defining the true model </span>

In the following cells, we define the true Lorenz 1996 model using standard values: 
- the number of variables $N_{x}$ is set to `Nx=40`;
- the forcing coefficient $F$ is set to `F=8`;
- the integration time step is set to `dt=0.05`.

In [None]:
class Lorenz1996Model:
    """Implementation of the Lorenz 1996 model.
    
    Use the `tendency()` method to compute the model tendencies (i.e., dx/dt)
    and use the `forward()` method to apply an integration step forward in time,
    using a fourth order Runge--Kutta scheme.
    
    Attributes
    ----------
    Nx : int
        The number of variables in the model.
    F : float
        The model forcing.
    dt : float
        The model integration time step.
    """

    def __init__(self, Nx, F, dt):
        """Initialise the model."""
        self.Nx = Nx
        self.F = F
        self.dt = dt

    def tendency(self, x):
        """Compute the model tendencies dx/dt.
        
        The tendencies are computed by batch using
        `numpy` vectorisation.
        
        Parameters
        ----------
        x : np.ndarray, shape (..., Nx)
            Batch of input states.
            
        Returns
        -------
        dx_dt : np.ndarray, shape (..., Nx)
            Model tendencies computed at the input states.
        """
        # TODO: implement it!
        return

    def forward(self, x):
        """Apply an integration step forward in time.
        
        This method uses a fourth-order Runge--Kutta scheme:
        k1 <- dx/dt at x
        k2 <- dx/dt at x + dt/2*k1
        k3 <- dx/dt at x + dt/2*k2
        k4 <- dx/dt at x + dt*k3
        k <- (k1 + 2*k2 + 2*k3 + k4)/6
        x <- x + dt*k
        
        Parameters
        ----------
        x : np.ndarray, shape (..., Nx)
            Batch of input states.
            
        Returns
        -------
        integrated_x : np.ndarray, shape (..., Nx)
            The integrated states after one step.
        """
        # TODO: implement it!
        return
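For reference, here is one possible vectorised sketch of the two methods, written as standalone functions (`l96_tendency` and `rk4_step` are hypothetical names, and `F` and `dt` are passed as arguments instead of being read from `self`). It relies on `np.roll` along the last axis, which implements the periodic indices of the L96 equations and handles batches of shape `(..., Nx)`:

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """dx/dt of the L96 model for a batch of states of shape (..., Nx)."""
    # np.roll(x, s, axis=-1)[..., n] == x[..., n - s], with periodic wrap-around
    x_p1 = np.roll(x, -1, axis=-1)  # x_{n+1}
    x_m1 = np.roll(x, 1, axis=-1)   # x_{n-1}
    x_m2 = np.roll(x, 2, axis=-1)   # x_{n-2}
    return (x_p1 - x_m2) * x_m1 - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step, following the docstring of forward()."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
```

Inside the class, the same bodies would read `self.F` and `self.dt`, and `forward()` would call `self.tendency()`. A quick sanity check is that the uniform state $x_{n}=F$ is a fixed point, since its tendency is $(F-F)F-F+F=0$.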

In [None]:
# create model
true_model = Lorenz1996Model(Nx=40, dt=0.05, F=8)

# save some statistics about the model
true_model.model_var = 3.64
true_model.doubling_time = 0.42
true_model.lyap_time = 0.61

### <span style="color:blue"> II.2. Short model integration </span>

In the following cells, we perform a rather short model integration, in order to illustrate the model dynamics. The initial condition is a random field.

In [None]:
def perform_true_model_integration(Nt, Ne=1, seed=None):
    """Perform an integration in time using the true model.
    
    The initial state is a batch of random fields.
    
    Parameters
    ----------
    Nt : int
        The number of integration steps to perform.
    Ne : int
        The batch size.
    seed : int or None
        The random seed for the initialisation.
        
    Returns
    -------
    xt : np.ndarray, shape (Nt+1, Ne, Nx)
        The integrated batch of trajectories.
    """
    # define rng
    rng = np.random.default_rng(seed=seed)

    # allocate memory
    xt = np.zeros((Nt+1, Ne, true_model.Nx))

    # initialisation
    xt[0] = rng.normal(loc=3, scale=1, size=(Ne, true_model.Nx))
    
    # TODO: implement the model integration for Nt steps
    
    # return the trajectory
    return xt
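The missing integration is a plain time loop that fills the pre-allocated array one step at a time. A minimal sketch, assuming a hypothetical helper `integrate()` that takes any one-step function (in the notebook, `true_model.forward` would play that role):

```python
import numpy as np

def integrate(step, x0, Nt):
    """Apply `step` Nt times starting from x0 and store every state.

    Returns an array of shape (Nt+1,) + x0.shape, with the initial state first.
    """
    x0 = np.asarray(x0, dtype=float)
    xt = np.zeros((Nt + 1,) + x0.shape)
    xt[0] = x0
    for t in range(Nt):
        # each new state is obtained from the previous one only
        xt[t + 1] = step(xt[t])
    return xt
```

Inside `perform_true_model_integration()`, the loop body would presumably be `xt[t+1] = true_model.forward(xt[t])`, possibly iterating over `trange(Nt)` to get a progress bar.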

In [None]:
# short model integration for visualisation purpose
xt_plot = perform_true_model_integration(Nt=400, Ne=1, seed=314)[:, 0]

In [None]:
# plot the trajectory
toolbox.plot_l96_traj(
    xt_plot, 
    true_model,
    linewidth=18,
)

## <span style="color:green"> III. Prepare the dataset </span>

In this section, we prepare the dataset for the entire set of experiments.

<span style="color:red"> Exercise </span>
- <span style="color:blue"> III.2. </span> Implement the `extract_input_output()` function, in which the 
  neural network input and output are extracted from a given
  trajectory. Use `numpy` slicing for this.

### <span style="color:blue"> III.1. A long model integration for the training data</span>

We now use a true model trajectory to make the **training dataset**. This trajectory starts from a random field (different from the one used for the plotting trajectory), and we discard the first $100$ time steps to get rid of the spin-up process.

In [None]:
# long model integration for the training data
xt_train = perform_true_model_integration(Nt=10_000+100, Ne=1, seed=315)[:, 0]

# discard the spin-up process
xt_train = xt_train[100:]

### <span style="color:blue"> III.2. Preprocess the training data </span>

The training dataset is made of input/output pairs, where the input is the state at a given time, and the output is the state at the following time.

In [None]:
def extract_input_output(xt):
    # TODO: extract x (input)
    # TODO: extract y (output)
    # return input/output
    return (x, y)
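Since consecutive states form the input/output pairs, the extraction reduces to two shifted slices of the trajectory. One possible implementation, a sketch assuming that time is the first axis of the trajectory (as it is for `xt_train`):

```python
import numpy as np

def extract_input_output(xt):
    """Split a trajectory of shape (Nt+1, Nx) into one-step input/output pairs."""
    x = xt[:-1]  # states at times t_0, ..., t_{Nt-1}
    y = xt[1:]   # states one step later, at times t_1, ..., t_Nt
    return (x, y)
```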

In [None]:
# extract input/output from the training data
x_train, y_train = extract_input_output(xt_train)

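We compute the normalisation statistics using the training data only. Note that `mean()` and `std()` are called without an `axis` argument, so each statistic is a single scalar computed over all time steps and all variables.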

In [None]:
# compute input/output mean/std
x_mean = x_train.mean()
y_mean = y_train.mean()
x_std = x_train.std()
y_std = y_train.std()

# define normalisation/denormalisation functions
def normalise_x(x):
    return (x - x_mean)/x_std
def normalise_y(y):
    return (y - y_mean)/y_std
def denormalise_x(x_norm):
    return x_norm*x_std + x_mean
def denormalise_y(y_norm):
    return y_norm*y_std + y_mean

Finally, the training data is normalised. 

In [None]:
# normalise the training data
x_train_norm = normalise_x(x_train)
y_train_norm = normalise_y(y_train)

### <span style="color:blue"> III.3. Shorter model integrations for the validation and testing data</span>

We repeat the same process to make the **validation** and **testing** data. In this case, the trajectories start from two other random fields (and we still discard the spin-up process) and can be somewhat shorter, but the normalisation must be the same as for the training data.

In [None]:
# short model integration for the validation data
xt_valid = perform_true_model_integration(Nt=1_000+100, Ne=1, seed=316)[:, 0]

# discard the spin-up process
xt_valid = xt_valid[100:]

# extract input/output from the validation data
x_valid, y_valid = extract_input_output(xt_valid)

# normalise the validation data
x_valid_norm = normalise_x(x_valid)
y_valid_norm = normalise_y(y_valid)

In [None]:
# short model integration for the testing data
xt_test = perform_true_model_integration(Nt=1_000+100, Ne=1, seed=317)[:, 0]

# discard the spin-up process
xt_test = xt_test[100:]

# extract input/output from the testing data
x_test, y_test = extract_input_output(xt_test)

# normalise the testing data
x_test_norm = normalise_x(x_test)
y_test_norm = normalise_y(y_test)

### <span style="color:blue"> III.4. An ensemble model integration for the forecast skill data</span>

In order to assess the forecast skill of the surrogate model, we will use a different test dataset, in which we record an ensemble of **trajectories** (instead of an ensemble of input/output pairs). This will allow us to measure the accuracy of the forecast for longer integration times.

In [None]:
# ensemble integration for the forecast skill data
xt_fs = perform_true_model_integration(Nt=400+100, Ne=512, seed=318)

# discard the spin-up process
xt_fs = xt_fs[100:]

## <span style="color:green"> IV. The baseline model: persistence </span>

In this first test series, we use **persistence** as the surrogate model. This will provide baselines for our later results. Persistence is the model with no time evolution: the forecast at any lead time is simply the initial state.

<span style="color:red"> Exercise </span>
- <span style="color:blue"> IV.3 & IV.4. </span> Comment on the evolution of the forecast errors.

### <span style="color:blue"> IV.1. Evaluate the model</span>

We evaluate the model using two metrics: the test mean-squared error (MSE) and the forecast skill.

The MSE is the loss function that we will use later to train our surrogate models. The test MSE measures the accuracy of the surrogate model over one iteration, i.e. exactly what it has been trained to do, but on unseen data (the "test data" here). Therefore, the test MSE can be used to validate the learning/training process. NB: the absolute value of the test MSE of persistence is not that important per se (because the input and output data have been normalised), but it will be useful to normalise the test MSE of our trained NNs.

The forecast skill measures the accuracy of the surrogate models over more than one iteration, averaged over an ensemble of unseen trajectories (the "forecast skill data" here). This is typically the kind of task we are interested in. However, one must keep in mind that the surrogate models are (in general) not trained to do that, which means that there is no guarantee that they will perform well.
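For clarity, the two metrics can be written as follows, consistently with the code cells below. The notation is ours: $\tilde{x}_{s}$ and $\tilde{y}_{s}$ are the normalised input and output of test pair $s$, $N_{s}$ is the number of test pairs, $\mathcal{M}$ is the surrogate model, and $x^{\mathrm{t}}_{k,e}$ and $x^{\mathrm{s}}_{k,e}$ are the true and surrogate states at time step $k$ for ensemble member $e$:

$$
\mathrm{MSE}_{\mathrm{test}} = \frac{1}{N_{s}N_{x}}\sum_{s=1}^{N_{s}}\sum_{n=1}^{N_{x}}\left(\tilde{y}_{s,n}-\mathcal{M}(\tilde{x}_{s})_{n}\right)^{2},
\qquad
\mathrm{FS}_{k,e} = \sqrt{\frac{1}{N_{x}}\sum_{n=1}^{N_{x}}\left(x^{\mathrm{t}}_{k,e,n}-x^{\mathrm{s}}_{k,e,n}\right)^{2}}.
$$

For persistence, $\mathcal{M}(\tilde{x}) = \tilde{x}$ and $x^{\mathrm{s}}_{k,e} = x^{\mathrm{t}}_{0,e}$, which is what `test_mse_baseline` and `fs_baseline` compute. Note that the test MSE is computed on normalised data, whereas the forecast skill is computed in physical units.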

In [None]:
# compute test MSE
test_mse_baseline = np.mean(np.square(y_test_norm - x_test_norm))

In [None]:
# compute forecast skill
fs_baseline = np.sqrt(np.mean(np.square(xt_fs-xt_fs[0]), axis=2))

### <span style="color:blue"> IV.2. Print the test MSE</span>

In the following cell, we show the value of the test MSE.

In [None]:
# print test MSE
print('-'*100)
print(f'test mse of persistence = {test_mse_baseline}')
print('-'*100)

### <span style="color:blue"> IV.3. Show an example of surrogate model integration</span>

In the following cell, we show one example of model integration.

In [None]:
# compare the true and surrogate model integration for one trajectory
toolbox.plot_l96_compare_traj(
    xt_fs[:, 0],
    np.broadcast_to(xt_fs[0, 0], shape=xt_fs[:, 0].shape),
    true_model,
    linewidth=18,
)

### <span style="color:blue"> IV.4. Show the forecast skill</span>

In the following cell, we plot the average forecast skill, normalised by the model variability. The shaded area delimits the 90% range of the ensemble (between the 5th and 95th percentiles).
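We do not know exactly how `toolbox.plot_l96_forecast_skill` computes the curves it draws. The sketch below shows one plausible way (an assumption, not the toolbox's code; `summarise_forecast_skill` is a hypothetical name) to turn a forecast-skill array of shape `(Nt, Ne)`, such as `fs_baseline`, into a normalised mean curve and a 5th/95th percentile band with NumPy:

```python
import numpy as np

def summarise_forecast_skill(fs, model_var, dt, p1=5, p2=95):
    """Mean curve and percentile band of a forecast-skill array of shape (Nt, Ne)."""
    fs_norm = fs / model_var                              # errors in units of model variability
    mean = fs_norm.mean(axis=1)                           # ensemble-average curve
    low, high = np.percentile(fs_norm, [p1, p2], axis=1)  # bounds of the shaded band
    time = dt * np.arange(fs.shape[0])                    # lead time in MTU
    return time, mean, low, high
```

For instance, `summarise_forecast_skill(fs_baseline, true_model.model_var, true_model.dt)` would return the lead times in MTU together with the three curves.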

In [None]:
# plot the forecast skill
toolbox.plot_l96_forecast_skill(
    dict(
        persistence=fs_baseline,
    ),
    true_model,
    p1=5,
    p2=95,
    xmax=4,
    linewidth=1000,
)

## <span style="color:green"> V. A dense neural network as surrogate model </span>

In this second test series, we train and evaluate a dense NN (sequential NN with only dense layers). 

<span style="color:red"> Exercise </span>
- <span style="color:blue"> V.1. </span> Implement the `make_dense_network()` function, in which a 
  sequential neural network is created. The neural network should
  take as input the current state and return the forecasted state.
- <span style="color:blue"> V.1. </span> Comment on the number of parameters of the built network.
- <span style="color:blue"> V.3. </span> Implement the `compute_trajectories()` function, in which we 
  use a surrogate model to predict the trajectories, from which the
  forecast skill is then computed. Use the `predict()` method of
  `tf.keras.Model` inside a `for-loop`, and remember that the network
  was trained on normalised data.
- <span style="color:blue"> V.4. & V.5 </span> Comment on the evolution of the training and validation MSE, as well as on the test MSE.
- <span style="color:blue"> V.6. & V.7 </span> Comment on the evolution of the forecast errors.

### <span style="color:blue"> V.1. Build the model</span>

In the following cells, we build the surrogate model, using the [sequential API of tensorflow](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential).

For this example, we use $4$ hidden layers and $128$ nodes per layer.

In [None]:
def make_dense_network(seed, num_layers, num_nodes, activation):
    """Build a sequential neural network using dense layers.
    
    Parameters
    ----------
    seed : int
        The random seed.
    num_layers : int
        The number of hidden layers.
    num_nodes : int
        The number of nodes per hidden layer.
    activation : str
        The activation function for the hidden layers.
        
    Returns
    -------
    network : tf.keras.Sequential
    """
    # set seed
    tf.keras.utils.set_random_seed(seed=seed)
    # TODO: create a sequential network
    network = ...
    # TODO: add the input layer
    # TODO: add the hidden layers
    # TODO: add the output layer
    # compile the neural network
    network.compile(loss='mse', optimizer='adam')
    # print short summary
    network.summary()
    # return the network
    return network
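A possible sketch of the dense network, assuming the `tf.keras` Sequential API as used elsewhere in this notebook. The function name `make_dense_network_sketch` and the `Nx=40` default (instead of `true_model.Nx`) are our own choices, and the summary is not printed to keep it short. Each `Dense` layer with $n_{\mathrm{in}}$ inputs and $n_{\mathrm{out}}$ outputs has $(n_{\mathrm{in}}+1)\,n_{\mathrm{out}}$ weights and biases, which gives the count in the final comment:

```python
import numpy as np
import tensorflow as tf

def make_dense_network_sketch(seed, num_layers, num_nodes, activation, Nx=40):
    """Sequential dense network mapping a state of shape (Nx,) to the next state."""
    tf.keras.utils.set_random_seed(seed)
    network = tf.keras.Sequential()
    # input layer: one L96 state per sample
    network.add(tf.keras.Input(shape=(Nx,)))
    # hidden layers
    for _ in range(num_layers):
        network.add(tf.keras.layers.Dense(num_nodes, activation=activation))
    # output layer: linear, so that the normalised outputs can be negative
    network.add(tf.keras.layers.Dense(Nx))
    network.compile(loss='mse', optimizer='adam')
    return network

net = make_dense_network_sketch(seed=319, num_layers=4, num_nodes=128, activation='relu')
# (40+1)*128 + 3*(128+1)*128 + (128+1)*40 = 59944 trainable parameters
print(net.count_params())
```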

In [None]:
# construct the dense neural network
dense_network = make_dense_network(seed=319, num_layers=4, num_nodes=128, activation='relu')

### <span style="color:blue"> V.2. Train the model</span>

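In the following cells, we train the model for at most $256$ epochs. We use an `EarlyStopping` callback to end the training when the validation loss has not improved for `patience` consecutive epochs; with `restore_best_weights=True`, the weights from the epoch with the lowest validation loss are restored. This should limit overfitting.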
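The patience mechanism can be summarised by a simplified model of `tf.keras.callbacks.EarlyStopping` with `mode='min'` and the default `min_delta=0`. This is our own sketch (the real callback has more options, such as `baseline`), and `early_stopping_epochs` is a hypothetical name. Given a list of validation losses, it returns the epoch at which training stops and the epoch whose weights `restore_best_weights=True` brings back:

```python
def early_stopping_epochs(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # strict improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            # no improvement: stop once `patience` such epochs have accumulated
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # the loss kept improving often enough: training runs until the last epoch
    return len(val_losses) - 1, best_epoch
```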

In [None]:
def train_network(seed, num_epochs, description, patience, network):
    """Train a neural network.
    
    Parameters
    ----------
    seed : int
        The random seed.
    num_epochs : int
        The number of epochs.
    description : str
        The progress bar description.
    patience : int
        The patience for EarlyStopping.
    network : tf.keras.Model
        The network to train.
    
    Returns
    -------
    history : dict
        The training history.
    """
    # set random seed
    tf.keras.utils.set_random_seed(seed=seed)
    # tqdm callback
    tqdm_callback = toolbox.TQDMCallback(description)
    # early stopping callback
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=patience,
        verbose=0,
        restore_best_weights=True,
    )
    # train the ML model
    fit = network.fit(
        x_train_norm, 
        y_train_norm,
        verbose=0,
        epochs=num_epochs, 
        validation_data=(x_valid_norm, y_valid_norm),
        callbacks=[tqdm_callback, early_stopping_callback],
    )
    # return training history
    return fit.history

In [None]:
# train the network
fit_dense = train_network(
    seed=320, 
    num_epochs=256, 
    description='DNN training', 
    patience=16, 
    network=dense_network,
)

### <span style="color:blue"> V.3. Evaluate the model</span>

In [None]:
def compute_trajectories(network):
    """Compute the forecast skill trajectories.
    
    Parameters
    ----------
    network : tf.keras.Model
        The model to evaluate.
        
    Returns
    -------
    xt : np.ndarray, shape (Nt, Ne, Nx)
        The trajectories.
    """
    # allocate memory
    (Nt, Ne, Nx) = xt_fs.shape
    xt = np.zeros((Nt, Ne, Nx))
    
    # initialisation
    xt[0] = xt_fs[0]
    
    # TODO: implement the neural network integration
        
    return xt
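The key subtlety in `compute_trajectories()` is that the network was trained on normalised data while `xt_fs` is in physical units, so each step must normalise the input and denormalise the output. A minimal framework-agnostic sketch, where `roll_out` is a hypothetical helper and `predict` is any function mapping a batch of normalised states to normalised next states:

```python
import numpy as np

def roll_out(predict, x0, Nt, normalise_x, denormalise_y):
    """Iterate a one-step surrogate from x0 (physical units), returning Nt states."""
    xt = np.zeros((Nt,) + x0.shape)
    xt[0] = x0
    for t in range(Nt - 1):
        # physical -> normalised, one surrogate step, normalised -> physical
        xt[t + 1] = denormalise_y(predict(normalise_x(xt[t])))
    return xt
```

In the notebook, one could presumably call `roll_out(lambda z: network.predict(z, verbose=0), xt_fs[0], xt_fs.shape[0], normalise_x, denormalise_y)`. Passing the whole ensemble at once means that the only Python loop is over time steps.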

In [None]:
# compute test MSE
test_mse_dense = dense_network.evaluate(x_test_norm, y_test_norm, verbose=0, batch_size=x_test_norm.shape[0])

In [None]:
# compute forecast skill
xt_dense = compute_trajectories(dense_network)
fs_dense = np.sqrt(np.mean(np.square(xt_fs-xt_dense), axis=2))

### <span style="color:blue"> V.4. Show the training history</span>

In the following cell we plot the training history, that is, the evolution of the training MSE (the `loss`) and the validation MSE (the `val_loss`) as a function of the number of epochs.

In [None]:
# plot the learning history
toolbox.plot_learning_curve(
    fit_dense['loss'],
    fit_dense['val_loss'],
    title='DNN training',
    linewidth=1000,
)

### <span style="color:blue"> V.5. Print the test MSE</span>

In [None]:
# show test MSE
print('-'*100)
print(f'test mse of persistence  = {test_mse_baseline}')
print(f'test mse of DNN          = {test_mse_dense}')
print()
print(f'relative test mse of DNN = {test_mse_dense/test_mse_baseline}')
print('-'*100)

### <span style="color:blue"> V.6. Show an example of surrogate model integration</span>

In [None]:
# compare the true and surrogate model integration for one trajectory
toolbox.plot_l96_compare_traj(
    xt_fs[:, 0],
    xt_dense[:, 0],
    true_model,
    linewidth=18,
)

### <span style="color:blue"> V.7. Show the forecast skill</span>

In [None]:
# plot the forecast skill
toolbox.plot_l96_forecast_skill(
    dict(
        persistence=fs_baseline,
        DNN=fs_dense,
    ),
    true_model,
    p1=5,
    p2=95,
    xmax=4,
    linewidth=1000,
)

## <span style="color:green"> VI. A convolutional neural network as surrogate model </span>

In this third test series, we train and evaluate a convolutional NN (sequential NN whose hidden layers are all convolutional). 

<span style="color:red"> Exercise </span>
- <span style="color:blue"> VI.1. </span> Implement the `make_convolutional_network()` function, in which a 
  sequential neural network with convolutional layers is created. Do not forget to add periodic padding layers
  where needed.
- <span style="color:blue"> VI.1. </span> Comment on the number of parameters of the built network.
- <span style="color:blue"> VI.4. & VI.5 </span> Comment on the evolution of the training and validation MSE, as well as on the test MSE.
- <span style="color:blue"> VI.6. & VI.7 </span> Comment on the evolution of the forecast errors.

### <span style="color:blue"> VI.1. Build the model</span>

In the following cells, we build the surrogate model.

For this example, we use $4$ hidden layers and $8$ convolutional filters per layer. The kernel size is set to $5$.

In [None]:
def make_convolutional_network(seed, num_layers, num_filters, kernel_size, activation):
    """Build a sequential neural network with convolutional layers.
    
    Parameters
    ----------
    seed : int
        The random seed.
    num_layers : int
        The number of hidden layers.
    num_filters : int
        The number of convolution filters per hidden layer.
    kernel_size : int
        The convolution kernel size for the hidden layer.
    activation : str
        The activation function for the hidden layers.
        
    Returns
    -------
    network : tf.keras.Sequential
    """
    # set seed
    tf.keras.utils.set_random_seed(seed=seed)
    # reshape layers
    reshape_input = tf.keras.layers.Reshape((true_model.Nx, 1))
    reshape_output = tf.keras.layers.Reshape((true_model.Nx,))
    # padding layer
    border = kernel_size//2
    def apply_padding(x):
        x_left = x[..., -border:, :]
        x_right = x[..., :border, :]
        return tf.concat([x_left, x, x_right], axis=-2)
    padding_layer = tf.keras.layers.Lambda(apply_padding)   
    # TODO: create a sequential network
    network = ...
    # TODO: add the input layer
    # TODO: add the reshape_input layer
    # TODO: add the hidden layers
    # TODO: add the output layer
    # TODO: add the reshape_output layer
    # compile the neural network
    network.compile(loss='mse', optimizer='adam')
    # print short summary
    network.summary()
    # return the network
    return network
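To see why periodic padding is needed, recall that `Conv1D` uses `padding='valid'` by default and computes a cross-correlation, which shortens the signal by `kernel_size - 1` points. Padding each side with `kernel_size//2` wrapped-around values, as `apply_padding` does, restores the periodic geometry of the latitude circle. The NumPy sketch below (our own helper names, working on arrays without the channel axis) checks that padding followed by a valid correlation equals a periodic correlation:

```python
import numpy as np

def periodic_pad(x, border):
    """Periodic padding on the last axis (apply_padding does the same on axis -2)."""
    if border == 0:
        # x[..., -0:] would select the whole array, not an empty slice
        return x
    return np.concatenate([x[..., -border:], x, x[..., :border]], axis=-1)

def valid_correlate(xp, w):
    """'valid' cross-correlation with kernel w, as Conv1D computes by default."""
    K = len(w)
    N = xp.shape[-1] - K + 1
    return sum(w[k] * xp[..., k:k + N] for k in range(K))

def circular_correlate(x, w):
    """Reference periodic cross-correlation built from np.roll."""
    border = len(w) // 2
    # np.roll(x, -s, axis=-1)[..., n] == x[..., (n + s) % Nx]
    return sum(w[k] * np.roll(x, -(k - border), axis=-1) for k in range(len(w)))
```

One caveat visible in the sketch: with `border = 0` (kernel size 1), the slice `x[..., -0:]` selects the whole array, so a padding function written like `apply_padding` should not be placed before kernel-size-1 layers, which do not need padding anyway.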

In [None]:
# construct the conv neural network
conv_network = make_convolutional_network(seed=321, num_layers=4, num_filters=8, kernel_size=5, activation='relu')

### <span style="color:blue"> VI.2. Train the model</span>

In [None]:
# train the network
fit_conv = train_network(
    seed=322, 
    num_epochs=256, 
    description='CNN training', 
    patience=8, 
    network=conv_network,
)

### <span style="color:blue"> VI.3. Evaluate the model</span>

In [None]:
# compute test MSE
test_mse_conv = conv_network.evaluate(x_test_norm, y_test_norm, verbose=0, batch_size=x_test_norm.shape[0])

In [None]:
# compute forecast skill
xt_conv = compute_trajectories(conv_network)
fs_conv = np.sqrt(np.mean(np.square(xt_fs-xt_conv), axis=2))

### <span style="color:blue"> VI.4. Show the training history</span>

In [None]:
# plot the learning history
toolbox.plot_learning_curve(
    fit_conv['loss'],
    fit_conv['val_loss'],
    title='CNN training',
    linewidth=1000,
)

### <span style="color:blue"> VI.5. Print the test MSE</span>

In [None]:
# print test MSE
print('-'*100)
print(f'test mse of persistence  = {test_mse_baseline}')
print(f'test mse of DNN          = {test_mse_dense}')
print(f'test mse of CNN          = {test_mse_conv}')
print()
print(f'relative test mse of DNN = {test_mse_dense/test_mse_baseline}')
print(f'relative test mse of CNN = {test_mse_conv/test_mse_baseline}')
print('-'*100)

### <span style="color:blue"> VI.6. Show an example of surrogate model integration</span>

In [None]:
# compare the true and surrogate model integration for one trajectory
toolbox.plot_l96_compare_traj(
    xt_fs[:, 0],
    xt_conv[:, 0],
    true_model,
    linewidth=18,
)

### <span style="color:blue"> VI.7. Show the forecast skill</span>

In [None]:
# plot the forecast skill
toolbox.plot_l96_forecast_skill(
    dict(
        persistence=fs_baseline,
        DNN=fs_dense,
        CNN=fs_conv,
    ),
    true_model,
    p1=5,
    p2=95,
    xmax=10,
    linewidth=1000,
)

## <span style="color:green"> VII. A smart neural network as surrogate model </span>

In this fourth and last test series, we train and evaluate a smart NN. This NN uses a sparse convolutional architecture with controlled (quadratic) nonlinearity to reproduce the **model tendencies**, combined with a Runge-Kutta integration scheme to **emulate the dynamics**. In order to implement this NN, we use both the [functional API](https://www.tensorflow.org/guide/keras/functional) (for the model tendency) and the [subclassing API](https://www.tensorflow.org/guide/keras/custom_layers_and_models) (for the integration scheme) of tensorflow.

In this case, with well-chosen parameters, it is possible to reproduce the true dynamics up to machine precision: the model is said to be identifiable.
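The identifiability claim can be illustrated without training. The L96 nonlinearity $x_{n-1}(x_{n+1}-x_{n-2})$ is a product $ab$, and $ab = \left[(a+b)^{2}-(a-b)^{2}\right]/4$ is a combination of squares of linear combinations of neighbouring variables, which is exactly what the tendency network can build. The sketch below mirrors the tendency network in NumPy (an assumption on our part: Keras stores the first kernel with shape `(kernel_size, 1, num_filters)`, flattened here to `(kernel_size, num_filters)`) and hand-picks weights with only 3 filters that reproduce the true tendencies up to rounding error:

```python
import numpy as np

def l96_tendency(x, F=8.0):
    return (np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1)) * np.roll(x, 1, axis=-1) - x + F

def smart_tendency(x, W1, b1, W2, b2):
    """NumPy mirror of the tendency network: pad, conv, square, concat, 1x1 conv."""
    K, Nx = W1.shape[0], x.shape[-1]
    border = K // 2
    xp = np.concatenate([x[..., -border:], x, x[..., :border]], axis=-1)
    windows = np.stack([xp[..., k:k + Nx] for k in range(K)], axis=-1)  # (..., Nx, K)
    x1 = windows @ W1 + b1                        # (..., Nx, num_filters)
    x3 = np.concatenate([x1, x1 * x1], axis=-1)   # linear and quadratic channels
    return x3 @ W2 + b2                           # (..., Nx)

# hand-picked weights: 3 filters, kernel size 5 (offsets -2, -1, 0, +1, +2)
F = 8.0
W1 = np.zeros((5, 3))
b1 = np.zeros(3)
W1[[0, 1, 3], 0] = [-1.0, 1.0, 1.0]  # a + b, with a = x_{n-1} and b = x_{n+1} - x_{n-2}
W1[[0, 1, 3], 1] = [1.0, 1.0, -1.0]  # a - b
W1[2, 2] = 1.0                       # x_n
# output = -x_n + ((a+b)^2 - (a-b)^2)/4 + F
W2 = np.array([0.0, 0.0, -1.0, 0.25, -0.25, 0.0])
b2 = F
```

This only shows that an exact solution exists within the architecture (the notebook uses 6 filters, which leaves extra freedom); it does not guarantee that training finds it, and it ignores the normalisation applied to the training data.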

<span style="color:red"> Exercise </span>
- <span style="color:blue"> VII.1. </span> Comment on the number of parameters of the built network.
- <span style="color:blue"> VII.4. & VII.5 </span> Comment on the evolution of the training and validation MSE, as well as on the test MSE.
- <span style="color:blue"> VII.6. & VII.7 </span> Comment on the evolution of the forecast errors.

### <span style="color:blue"> VII.1. Build the model</span>

In [None]:
class SmartNetwork(tf.keras.Model):
    """Smart neural network for the Lorenz 1996 model.
    
    Attributes
    ----------
    dt : float
        The integration time step.
    tendency : tf.keras.Model
        The network to compute the tendencies.
    """
    
    def __init__(self, num_filters, kernel_size, dt=0.05, **kwargs):
        """Initialise the smart network.
        
        Parameters
        ----------
        num_filters : int
            Number of filters to use in the convolutional layer.
        kernel_size : int
            Size of the convolutional kernel.
        dt : float
            Integration time step.
        kwargs : dict
            Additional parameters forwarded to `tf.keras.Model.__init__()`.
        """
        super().__init__(**kwargs)
        self.dt = dt
        
        # reshape layers
        reshape_input = tf.keras.layers.Reshape((true_model.Nx, 1))
        reshape_output = tf.keras.layers.Reshape((true_model.Nx,))
        
        # padding layer
        border = kernel_size//2
        def apply_padding(x):
            x_left = x[..., -border:, :]
            x_right = x[..., :border, :]
            return tf.concat([x_left, x, x_right], axis=-2)
        padding_layer = tf.keras.layers.Lambda(apply_padding)
        
        # convolutional layers
        conv_layer_1 = tf.keras.layers.Conv1D(num_filters, kernel_size)
        conv_layer_2 = tf.keras.layers.Conv1D(1, 1)
        
        # network for the model tendencies
        x_in = tf.keras.Input(shape=(true_model.Nx,))
        # reshape the input to be able to use convolutional layers
        x = reshape_input(x_in)
        # apply convolution with periodic padding
        x = padding_layer(x)
        x1 = conv_layer_1(x)
        # construct non-linear terms
        x2 = x1 * x1
        # concatenate linear and non-linear terms
        x3 = tf.concat([x1, x2], axis=-1)
        # combine all channels into one
        # there is no actual convolution here 
        # because the kernel_size is one for this layer
        x_out = conv_layer_2(x3)
        # reshape the output after the convolutional layers
        x_out = reshape_output(x_out)
        # pack everything into a tf.keras.Model
        self.tendency = tf.keras.Model(inputs=x_in, outputs=x_out)
    
    @tf.function
    def call(self, x):
        """Apply the network."""
        dx_dt_0 = self.tendency(x)
        dx_dt_1 = self.tendency(x+0.5*self.dt*dx_dt_0)
        dx_dt_2 = self.tendency(x+0.5*self.dt*dx_dt_1)
        dx_dt_3 = self.tendency(x+self.dt*dx_dt_2)
        dx_dt = (dx_dt_0 + 2*dx_dt_1 + 2*dx_dt_2 + dx_dt_3)/6
        return x + self.dt*dx_dt
    
def make_smart_network(seed, num_filters, kernel_size):
    """Build the smart neural network.
    
    Parameters
    ----------
    seed : int
        The random seed.
    num_filters : int
        The number of filters.
    kernel_size : int
        The convolution kernel size.
        
    Returns
    -------
    network : SmartNetwork
        The smart network.
    """
    # set seed
    tf.keras.utils.set_random_seed(seed=seed)
    # create the network
    network = SmartNetwork(
        num_filters=num_filters, 
        kernel_size=kernel_size, 
        dt=true_model.dt,
    )
    # compile the neural network
    network.compile(loss='mse', optimizer='adam')
    # print short summary
    network.tendency.summary()
    # return the network
    return network
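The parameter count of the tendency network can be checked by hand: a `Conv1D` layer has `kernel_size * in_channels * filters` weights plus `filters` biases, while the reshape and padding layers have none. A small sketch (the helper name `conv1d_params` is ours) for the configuration used here:

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv1D layer: one kernel per filter plus one bias each."""
    return kernel_size * in_channels * filters + filters

num_filters, kernel_size = 6, 5
conv_1 = conv1d_params(kernel_size, 1, num_filters)  # 5*1*6 + 6 = 36
conv_2 = conv1d_params(1, 2 * num_filters, 1)        # 1*12*1 + 1 = 13
total = conv_1 + conv_2                              # 49
```

This total should match the one printed by `network.tendency.summary()`, and it is orders of magnitude smaller than the parameter count of the dense network built in section V.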

In [None]:
# construct the smart neural network
smart_network = make_smart_network(seed=323, num_filters=6, kernel_size=5)

### <span style="color:blue"> VII.2. Train the model</span>

In [None]:
# train the network
fit_smart = train_network(
    seed=324, 
    num_epochs=128, 
    description='smart NN training', 
    patience=8, 
    network=smart_network,
)

### <span style="color:blue"> VII.3. Evaluate the model</span>

In [None]:
# compute test MSE
test_mse_smart = smart_network.evaluate(x_test_norm, y_test_norm, verbose=0, batch_size=x_test_norm.shape[0])

In [None]:
# compute forecast skill
xt_smart = compute_trajectories(smart_network)
fs_smart = np.sqrt(np.mean(np.square(xt_fs-xt_smart), axis=2))

### <span style="color:blue"> VII.4. Show the training history</span>

In [None]:
# plot the learning history
toolbox.plot_learning_curve(
    fit_smart['loss'],
    fit_smart['val_loss'],
    title='Smart network training',
    linewidth=1000,
)

### <span style="color:blue"> VII.5. Print the test MSE</span>

In [None]:
# print test MSE
print('-'*100)
print(f'test mse of persistence        = {test_mse_baseline}')
print(f'test mse of DNN                = {test_mse_dense}')
print(f'test mse of CNN                = {test_mse_conv}')
print(f'test mse of smart net          = {test_mse_smart}')
print()
print(f'relative test mse of DNN       = {test_mse_dense/test_mse_baseline}')
print(f'relative test mse of CNN       = {test_mse_conv/test_mse_baseline}')
print(f'relative test mse of smart net = {test_mse_smart/test_mse_baseline}')
print('-'*100)

### <span style="color:blue"> VII.6. Show an example of surrogate model integration</span>

In [None]:
# compare the true and surrogate model integration for one trajectory
toolbox.plot_l96_compare_traj(
    xt_fs[:, 0],
    xt_smart[:, 0],
    true_model,
    linewidth=18,
)

### <span style="color:blue"> VII.7. Show the forecast skill</span>

In [None]:
# plot the forecast skill
toolbox.plot_l96_forecast_skill(
    dict(
        persistence=fs_baseline,
        DNN=fs_dense,
        CNN=fs_conv,
        smart=fs_smart,
    ),
    true_model,
    p1=5,
    p2=95,
    xmax=20,
    linewidth=1000,
)