# Train DNN

This file constructs a DNN and trains it to approximate the solution of the NR solver for the state equations.

### Details on the implementation:
Some grid states are fixed a priori and excluded from the DNN approximation. These are:
- End-of-line temperatures of active edges
- Nodal temperatures of nodes whose only inflow comes from an active edge
- Pressures at one node on the supply side and one node on the return side, which are fixed at given pressure levels

This reduces the number of "latent" outputs of the DNN relative to the number of grid states.
The mapping of the latent outputs to the corresponding entries of the full grid-state vector is encoded in a binary
mapping matrix in the state-equations object (`SE.mask_matrix_full`).
For the DNN, we combine this matrix with output scaling factors derived from the estimated output variance into a scaling layer for the outputs.
This allows us to compute all losses on the non-normalised values.
Similarly, we encode the input scaling as an additional layer of the DNN. This allows the MCMC algorithm to run
directly in the space of heat-exchanger powers.
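To make the role of the mask matrix concrete, here is a small NumPy sketch of an output scaling layer. It assumes a hypothetical formulation `x_full = offset + scale * (M @ z)`, where `M` is the binary (n_full × n_latent) mask and `z` the latent DNN output. The actual arithmetic inside `NN.MyScalingLayer` is not shown in this notebook and may differ. The function name `upscale` and the toy numbers are illustrative only.

```python
import numpy as np

def upscale(z_latent, mask, offset, scale):
    """Map normalised latent outputs to the full grid-state vector.

    Hypothetical formulation (an assumption, not necessarily what
    NN.MyScalingLayer does): x_full = offset + scale * (mask @ z).
    z_latent: (batch, n_latent); mask: (n_full, n_latent).
    """
    return offset + scale * (z_latent @ mask.T)

# toy example: 4 full states, state index 2 is fixed a priori (all-zero row)
mask = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0],
                 [0, 0, 1]], dtype=float)
offset = np.array([70.0, 1.5, 3.0, 50.0])   # e.g. prior mean, incl. the fixed value
scale = np.array([2.0, 0.1, 1.0, 1.5])      # per-state output scaling factors
z = np.array([[0.5, -1.0, 2.0]])            # one latent (normalised) prediction
x_full = upscale(z, mask, offset, scale)
```

In this formulation, rows of the mask that contain only zeros belong to states fixed a priori, so those states keep their offset value regardless of the DNN output.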

Additional details of the DNN:
- Loss: weighted MSE, $L = \frac{1}{N} \sum_{i} \left(\lambda_{i} (x_i - \hat{x}_i)\right)^2$, with $\lambda_{i} = 500$ if $x_i$ is a mass flow and $\lambda_{i} = 1$ otherwise.
- Early stopping: monitor the loss on the validation set and stop training once it has not improved for 20 epochs (patience of 20).
- Monitor the MAPE for each type of state variable individually: $T$, $\dot{m}$, $p$, $T^{\mathrm{end}}$.
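The loss and the per-type MAPE can be sketched in NumPy as follows. This is a simplified illustration, not the code of `NN.LossWeightedMSE` or the `NN.MetricMAPE_*` classes. It assumes the full state vector is ordered `[T (n_nodes), mf (n_edges), p (n_nodes), T_end (n_edges)]`, following the `tf.concat` in the scaling cell. The helpers `group_slices`, `weighted_mse` and `mape_per_group` are hypothetical names. Unlike `keras.metrics.MeanAbsolutePercentageError`, this sketch does not guard against zero targets.

```python
import numpy as np

def group_slices(n_nodes, n_edges):
    # assumed ordering: [T (n_nodes), mf (n_edges), p (n_nodes), T_end (n_edges)]
    sizes = {"T": n_nodes, "mf": n_edges, "p": n_nodes, "T_end": n_edges}
    slices, start = {}, 0
    for name, size in sizes.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices

def weighted_mse(x_true, x_pred, n_nodes, n_edges, lambdas):
    # L = 1/N * sum_i (lambda_i * (x_i - x_hat_i))^2, averaged over batch and states
    lam = np.ones(x_true.shape[-1])
    for name, sl in group_slices(n_nodes, n_edges).items():
        lam[sl] = lambdas[name]
    return np.mean((lam * (x_true - x_pred)) ** 2)

def mape_per_group(x_true, x_pred, n_nodes, n_edges):
    # mean absolute percentage error, reported separately for each state type
    return {name: 100.0 * np.mean(np.abs((x_true[:, sl] - x_pred[:, sl]) / x_true[:, sl]))
            for name, sl in group_slices(n_nodes, n_edges).items()}

# toy grid: 2 nodes, 1 edge -> 6 states
lambdas = {"T": 1.0, "mf": 500.0, "p": 1.0, "T_end": 1.0}
x_true = np.array([[80.0, 60.0, 2.0, 5.0, 4.0, 70.0]])
x_pred = np.array([[79.0, 60.0, 2.1, 5.0, 4.0, 70.0]])
loss = weighted_mse(x_true, x_pred, n_nodes=2, n_edges=1, lambdas=lambdas)
mape = mape_per_group(x_true, x_pred, n_nodes=2, n_edges=1)
```

In the formula above, $\lambda_i$ sits inside the square. The effective weight on a squared mass-flow error is therefore $500^2$, which is why the small mass-flow error of 0.1 dominates the toy loss.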

### Settings:

In [21]:
testcase='loop'      # either 'loop' or 'tree'
save_model = False    # whether to save the final DNN
n_train = 50_000     # number of training samples
n_epochs = 5       # maximum number of training epochs

### Imports

In [22]:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

import libdnnmcmc.se_NN_lib as NN
from libdnnmcmc.utility import import_training_data, load_scenario
tfd = tfp.distributions

### Load Setup:
Import the test-case-specific settings:
- SE: State equations object, bundling all state equations and supporting information.
- _: Virtual measurement, used to model the measurement during MCMC sampling, not needed here.
- d_prior_dist: prior distribution over the heat exchanger powers.
- data_file: file path to the training data

In [23]:
SE, _, d_prior_dist, data_file = load_scenario(testcase)



### Load training data

In [24]:
n_train_val = int(n_train / 0.8)    # during training 20% of the data-points are used as validation for early stopping
x_data, y_data = import_training_data(data_file, n_train_val)
x_train = x_data[:n_train, :]
y_train = y_data[:n_train, :]
x_val = x_data[n_train:, :]
y_val = y_data[n_train:, :]

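### Encode data scaling as NN layers

- mask_matrix: encodes the mapping between the latent output space of the DNN and the full grid-state space.
- Input scaling: min-max scaling with the minimal power set to 0 and the maximal power taken from the training data.
- Output scaling: estimate the output standard deviation with the linearised model, i.e. by propagating the prior covariance of the demands through the demand Jacobian of the state equations.
- Encode the input and output scaling as fixed (non-trainable) layers.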
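The two scalings described above can be sketched in NumPy as follows. This is a simplified, hypothetical re-implementation for illustration only. The notebook itself obtains the Jacobian from `SE.evaluate_state_equations('demand jacobian')` and the prior covariance from `d_prior_dist.covariance()`. The helper names `input_scale_factors` and `linearised_output_std` and all toy numbers are made up.

```python
import numpy as np

def input_scale_factors(x_train):
    # min-max scaling with the minimum fixed at 0: x_scaled = x / max(x)
    return 1.0 / np.max(x_train, axis=0)

def linearised_output_std(jac, prior_cov):
    # first-order propagation of the demand prior through the state equations:
    # Cov[x] ~ J Cov[d] J^T, with J the Jacobian of the states w.r.t. the demands
    state_cov = jac @ prior_cov @ jac.T
    return np.sqrt(np.diag(state_cov))

# toy data: 3 training samples of 2 demands
x_train = np.array([[1.0, 4.0], [2.0, 8.0], [0.5, 2.0]])
scale_in = input_scale_factors(x_train)
x_scaled = x_train * scale_in            # every column now has its maximum at 1

# toy linearisation: 3 states, 2 demands
jac = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
prior_cov = np.diag([1.0, 4.0])
std_out = linearised_output_std(jac, prior_cov)
```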

In [25]:
mask_matrix = SE.mask_matrix_full
n_demands = SE.n_demands
n_nodes = SE.n_nodes
n_edges = SE.n_edges

# linear scaling parameters for inputs - scale to 0-1
input_scale = tf.constant(1 / np.max(x_train, axis=0), dtype=tf.float64)
input_offset = tf.zeros_like(input_scale)
layer_input_scale = NN.MyScalingLayer(offset=tf.expand_dims(input_offset, axis=-1),
                                      scaling=input_scale,
                                      # mapping_matrix=tf.sparse.eye(n_demands),
                                      name='downscaled')

# linear scaling parameters outputs - scale to zero mean ~ unit variance based on linearisation model
prior_state_mean = tf.concat([SE.T, SE.mf, SE.p, SE.T_end], axis=0)
jac = SE.evaluate_state_equations('demand jacobian')
prior_state_cov = jac @ d_prior_dist.covariance() @ tf.transpose(jac)
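# note: two nested square roots give the 4th root of the variance (sqrt of the std), not the std itself
output_scale = tf.math.sqrt(tf.math.sqrt(tf.linalg.diag_part(prior_state_cov)))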
layer_output_scale = NN.MyScalingLayer(offset=prior_state_mean,
                                       scaling=output_scale,
                                       mapping_matrix=SE.mask_matrix_full,
                                       name='upscaled')

## Build the DNN

In [26]:
''' DNN-Model: '''
n_latent_outputs = tf.shape(mask_matrix)[1]

inputs = keras.Input(shape=(n_demands,))
x = layer_input_scale(inputs)
x = layers.Dense(100, activation='relu')(x)
x = layers.Dense(250, activation='relu')(x)
x = layers.Dense(250, activation='relu')(x)
x = layers.Dense(n_latent_outputs, activation='linear', name='unscaled')(x)
pred_states = layer_output_scale(x)
model = keras.Model(inputs=inputs, outputs=[pred_states], name='DtoS_model')
model.summary()

Model: "DtoS_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 4)]               0         
                                                                 
 downscaled (MyScalingLayer)  (None, 4)                8         
                                                                 
 dense_6 (Dense)             (None, 100)               500       
                                                                 
 dense_7 (Dense)             (None, 250)               25250     
                                                                 
 dense_8 (Dense)             (None, 250)               62750     
                                                                 
 unscaled (Dense)            (None, 70)                17570     
                                                                 
 upscaled (MyScalingLayer)   (None, 82)                1

The non-trainable parameters are those of the layers _downscaled_ and _upscaled_.<br />
The number of parameters in _downscaled_ is 2 times the number of demands.<br />
The number of parameters in _upscaled_ is 2 times the number of grid-state variables that are not fixed a priori.<br />
The number of trainable parameters in the other layers follows from the chosen layer sizes.
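The layer sizes fully determine the trainable parameter counts. A `Dense` layer with `n_in` inputs and `n_out` units has an (n_in × n_out) weight matrix plus `n_out` biases. The following is a quick arithmetic check against the summary of the loop test case (4 demands, 70 latent outputs). The helper `dense_params` is a hypothetical name.

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per unit
    return (n_in + 1) * n_out

# n_demands, hidden layer widths, n_latent_outputs (loop test case)
layer_sizes = [4, 100, 250, 250, 70]
dense_counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
n_trainable = sum(dense_counts)

# downscaled layer: one offset and one scaling factor per demand
n_downscaled = 2 * layer_sizes[0]
print(dense_counts, n_trainable, n_downscaled)
```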

In [27]:
''' Loss functions: '''
# Weighted Mean squared error:
L_WMSE = NN.LossWeightedMSE(n_nodes, n_edges, lambda_T=1, lambda_mf=5.e2, lambda_p=1, lambda_Tend=1)

model.compile(
    loss=L_WMSE,
    optimizer=keras.optimizers.Adam(),
    metrics={'upscaled': [keras.metrics.MeanAbsolutePercentageError(),
                          NN.MetricMAPE_T(n_nodes, n_edges),
                          NN.MetricMAPE_mf(n_nodes, n_edges),
                          NN.MetricMAPE_p(n_nodes, n_edges),
                          NN.MetricMAPE_Tend(n_nodes, n_edges)
                          ]})

early_stopping_callback = EarlyStopping(
    monitor='val_loss',
    patience=20,
    mode='min',
    restore_best_weights=True
)
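The stopping rule configured above can be illustrated with a small pure-Python sketch. It is a simplified stand-in for Keras' `EarlyStopping`, assuming `min_delta=0` and no `baseline`, and not its actual implementation. `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return (last_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs. If training stops early and restore_best_weights=True,
    the weights from `best_epoch` are restored.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:            # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:                      # no improvement: count epochs without progress
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# patience of 3: stops at (0-based) epoch 4, best weights come from epoch 1
print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], patience=3))
```

With the settings at the top of this notebook (`n_epochs = 5`, `patience=20`), the stopping criterion can never trigger, so training always runs for the full number of epochs.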

## Train the DNN

In [28]:
history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    batch_size=32, epochs=n_epochs,
                    callbacks=[early_stopping_callback])


In [None]:
# plot the errors during training
NN.plot_history(history)

## Save the model:

In [None]:
if save_model:
    path = f'./models/model_paper_{testcase}'
    model.save(path)