# Pretraining Neural Network

This notebook is the main program: it calls the sub-functions and explains what each step of the code does. Its purpose is to create a neural network model and train it by fitting it to the given data. The trained model then predicts the target parameter chosen during training from the data given as input.

More specifically, the example case in our project predicts the shrinkage of plywood sheets. We used a TensorFlow Keras regression model for the task.

### Importing the Neural network program

This line imports the `Neural_Network.py` file, which contains the following functions:
- `get_database_data`
- `create_and_fit`
- `model_stat`

In [1]:
from Pretraining.Neural_Network import *

**With the `help` function you can see how to use each function and get more detailed information on the parameters you can change:**

```python
# Example code:
help(get_database_data)
```

#### Getting data with the `get_database_data` function

The next line calls the function `get_database_data`, which fetches the data from the database and creates the dataframe named `df_features`. This function also calls a couple of additional programs: `ssh_connector.py` and `chuncker.py`.

The database connection requires a correct connection configuration, which can be changed in the `variables.env` file.

For more information about these programs and the `get_database_data` parameters, use the `help` function.

In [2]:
df_features = get_database_data(env_path="./variables.env", table_name="Preprocessed3", ssh=True, columns_to_drop=['peelFile','m_dThickness','traindevtest','dryFile'])

In [3]:
# Content of the dataframe:
df_features

| | m_uWidth | m_uLength | B1MoistureAvg | B1TemperatureAvg | B1DensityAvg | B1KnotWidthSum | B1KnotCount | B1DecayWidthSum | B1DecayCount | B1AllOtherDefectWidthSum | ... | B9DensityAvg | B9KnotWidthSum | B9KnotCount | B9DecayWidthSum | B9DecayCount | B9AllOtherDefectWidthSum | B9AllOtherDefectCount | dryMoisturePercentage | dryWidth | dryShrinkage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1750.23265 | 1599.76152 | 113.903722 | 33.013574 | 997.511787 | 125.39330 | 11 | 26.9537 | 2 | 0.00000 | ... | 1028.177020 | 157.62055 | 19 | 39.25865 | 2 | 16.99255 | 2 | 8.063503 | 1633.089727 | 0.933070 |
| 1 | 1750.23265 | 1599.18000 | 100.958000 | 34.138437 | 960.649786 | 0.00000 | 0 | 0.0000 | 0 | 21.09420 | ... | 956.783924 | 12.89090 | 1 | 0.00000 | 0 | 7.03140 | 1 | 1.762412 | 1609.685322 | 0.919698 |
| 2 | 1750.23265 | 1601.50608 | 87.856679 | 35.469453 | 826.694515 | 21.68015 | 2 | 0.0000 | 0 | 0.00000 | ... | 831.718097 | 49.21980 | 3 | 0.00000 | 0 | 16.40660 | 4 | 0.619572 | 1614.120061 | 0.922232 |
| 3 | 1750.23265 | 1600.34304 | 119.887972 | 35.878984 | 946.037900 | 22.26610 | 3 | 0.0000 | 0 | 0.00000 | ... | 962.829528 | 15.82065 | 3 | 0.00000 | 0 | 450.00960 | 1 | 0.682885 | 1620.398997 | 0.925819 |
| 4 | 1750.23265 | 1600.92456 | 87.329798 | 36.486844 | 848.591452 | 35.15700 | 5 | 0.0000 | 0 | 0.00000 | ... | 849.948758 | 40.43055 | 5 | 0.00000 | 0 | 0.00000 | 0 | 0.649544 | 1601.162675 | 0.914828 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 132 | 1750.23265 | 1607.32128 | 84.348092 | 27.585250 | 824.974558 | 60.93880 | 3 | 0.0000 | 0 | 26.36775 | ... | 844.168201 | 44.53220 | 4 | 0.00000 | 0 | 0.00000 | 0 | 1.397261 | 1606.290275 | 0.917758 |
| 133 | 1750.23265 | 1600.92456 | 91.102849 | 28.084896 | 866.985357 | 65.04045 | 10 | 0.0000 | 0 | 0.00000 | ... | 886.793473 | 87.30655 | 9 | 68.55615 | 2 | 0.00000 | 0 | 1.308987 | 1628.156770 | 0.930252 |
| 134 | 1750.23265 | 1601.50608 | 96.572480 | 36.125411 | 955.628195 | 56.25120 | 7 | 0.0000 | 0 | 0.00000 | ... | 952.507358 | 67.97020 | 9 | 0.00000 | 0 | 0.00000 | 0 | 4.549444 | 1602.033861 | 0.915326 |
| 135 | 1750.23265 | 1601.50608 | 89.173291 | 35.613887 | 914.581487 | 96.68175 | 11 | 0.0000 | 0 | 0.00000 | ... | 918.198549 | 79.10325 | 9 | 94.92390 | 7 | 0.00000 | 0 | 4.294441 | 1594.211502 | 0.910857 |


### Parameter setting selection

**Use the parameters in the next cell to adjust the following:**

- **Data related parameters:** Select the target parameter, define the size of the test set and select the normalization type.
- **Model creation parameters:** Define the number of input nodes, hidden layer nodes and output nodes. Set the activation function (typically `relu`) and the dropout rate (e.g. `None`, 0.2 or 0.5), which helps prevent overfitting.
- **Model fitting parameters:** Define the number of training epochs, the [loss function](https://keras.io/api/losses/regression_losses/), the [optimizer](https://keras.io/api/optimizers/#available-optimizers), the learning rate (`lr`) and the [metrics](https://keras.io/api/metrics/) reported in the epoch results, and decide whether the predicted and true values are shown after each epoch.
- **Model saving parameters:** Define whether the model is saved, where it is saved and in which format (SavedModel folder or .h5 file). The saved model is the one used later for further training with a larger dataset, so after saving it you need to move it to the server that runs that training through the web user interface.
- **Model stats saving parameters:** Define whether the training statistics of the model are saved and where the .csv file is written.
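
The two `normalization` options can be illustrated without any library. The sketch below assumes that `'standard'` means z-score scaling (subtract the mean and divide by the population standard deviation, as scikit-learn's `StandardScaler` does) and that `'minmax'` means rescaling to the range 0 to 1. The source does not show how `Neural_Network.py` implements them, and the helper names are hypothetical. The sample values are the first five `B1MoistureAvg` entries of the dataframe shown earlier.

```python
from statistics import fmean, pstdev

def standard_scale(values):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    # A constant column (such as m_uWidth in the rows shown) has zero spread;
    # fall back to 1.0 so the division does not fail.
    std = pstdev(values) or 1.0
    return [(v - mean) / std for v in values]

def minmax_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # same guard for constant columns
    return [(v - low) / span for v in values]

moisture = [113.903722, 100.958000, 87.856679, 119.887972, 87.329798]
print(standard_scale(moisture))
print(minmax_scale(moisture))
```

Standard scaling keeps outliers visible as large positive or negative values, while min-max scaling squeezes every feature into the same fixed range.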


In [5]:
# Data related parameters:
df_target='dryShrinkage'
test_size=0.2
normalization='standard' # 'standard' or 'minmax'

# Model creation parameters:
inputs = 84 # df_features.shape[1]
hidden_sizes = [256,128,64,32,16]
outputs = 1
activation = 'relu'
dropout = 0.2 # None or float

# Model fitting parameters:
EPOCHS = 5
loss = 'mse'
optimizer = tf.keras.optimizers.Adam
lr = 0.001
metrics = ['mae', 'mse', 'mape'] # Optional custom functions: distance_from_true, precentage_from_true
with_predictions = False

# Model saving parameters:
save_model = False
path='./models/SavedModel'

# Model stats saving parameters:
save_to_csv=True
csv_path='./logs/'


# Do Not Touch This One: This variable stores the previous parameter settings. The order of the parameters is critical.
parameters=(df_features, df_target, test_size, normalization, inputs, hidden_sizes, outputs, activation, EPOCHS, loss, optimizer, lr, metrics, dropout, with_predictions, save_model, path, csv_path)
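
Why the order matters: the tuple is passed on with `*parameters`, which hands its items to the function strictly by position. The sketch below uses a hypothetical two-parameter function, not the real `create_and_fit`, to show how swapping two items silently changes their meaning.

```python
def describe(test_size, lr):
    """Hypothetical stand-in for a function that takes positional settings."""
    return f"test_size={test_size}, lr={lr}"

settings = (0.2, 0.001)      # same order as the function signature
print(describe(*settings))   # test_size=0.2, lr=0.001

swapped = (0.001, 0.2)       # items swapped by mistake
print(describe(*swapped))    # test_size=0.001, lr=0.2 (no error is raised)

# Keyword unpacking from a dict does not depend on order.
named = {"lr": 0.001, "test_size": 0.2}
print(describe(**named))     # test_size=0.2, lr=0.001
```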

### Creating the model and training the neural network

The next line calls the function `create_and_fit`, which creates the model and fits it to the data. It returns `model`, `history`, `train`, `test`, `model_name`, `y_train`, `y_test`, `train_norm` and `test_norm`. For more information, use the `help` function.

`%%time` measures how long the cell, here the `create_and_fit` call, takes to run.

In [None]:
%%time
model, history, train, test, model_name, y_train, y_test, train_norm, test_norm = create_and_fit(*parameters)

Train set size: (109, 84) / Train sets target size: (109,)
Test set size: (28, 84) / Test sets target size: (28,) 

Scaling done. Next is model creation

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 256)               21760     
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 128)               32896     
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_2 (Dense)             (None, 64)                8256      
                                                                 
 dropout_2 (Dropout)         (None
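
The figures in this output can be checked by hand. A fully connected (`Dense`) layer with n inputs and m units has n·m weights plus m biases, and dropout layers have no trainable parameters. The split sizes are reproduced under the assumption that the test set is rounded up and the remaining rows go to training, as scikit-learn's `train_test_split` does; the source does not show which splitting routine `create_and_fit` uses. The total of 137 rows is the sum of the two set sizes printed above.

```python
import math

def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

inputs = 84
hidden_sizes = [256, 128, 64, 32, 16]
outputs = 1

# Pair each layer's input width with its output width.
widths = [inputs] + hidden_sizes + [outputs]
counts = [dense_params(n_in, n_out) for n_in, n_out in zip(widths, widths[1:])]
print(counts[:3])  # [21760, 32896, 8256], matching the summary rows shown

# Train/test split sizes for 137 rows with test_size = 0.2.
n_rows, test_size = 137, 0.2
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test
print(n_train, n_test)  # 109 28
```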

### Saving and creating model_stats csv

The next line calls the function `model_stat` which creates the dataframe that summarizes the essential information of the model structure and the training parameters used during the training of the neural network. It also saves the model statistics dataframe into a .csv file if so configured in the parameter settings.

In [8]:
stats = model_stat(model, history, df_features, df_target, test_size, normalization, inputs, hidden_sizes, outputs, activation, EPOCHS, loss, optimizer, lr, metrics, dropout, train, test, csv_path, save_to_csv, model_name)

In [9]:
# Content of the model_stat:
stats

| | 0 |
|---|---|
| Model structure | `<keras.engine.sequential.Sequential object at ...` |
| Inputs | 84 |
| Hidden layers | [256, 128, 64, 32, 16] |
| Output | 1 |
| Activation | relu |
| Optimizer | `<class 'keras.optimizer_v2.adam.Adam'>` |
| Learning rate | 0.001 |
| Normalization | standard |
| Loss fuction | mse |
| Epochs | 5 |


### Log file of the epoch results

The next line reads the .csv file created as a result of the `create_and_fit` function.

In [11]:
logs = pd.read_csv('./logs/2021-12-09 11:40:25.497703-log.csv')
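
The file name above contains the timestamp of one particular run, so it changes with every training. A minimal sketch for picking up the newest log instead, assuming every log sits in `./logs/` and its name ends in `-log.csv` as in the example; the helper name `latest_log` is hypothetical. Because the names start with a year-month-day timestamp, sorting them alphabetically also sorts them by time.

```python
from pathlib import Path

def latest_log(log_dir="./logs"):
    """Return the newest '*-log.csv' file in log_dir, or None if there is none."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    candidates = sorted(directory.glob("*-log.csv"))
    return candidates[-1] if candidates else None

newest = latest_log()
print(newest)  # None when the folder is missing or holds no log files
```

The returned path can be passed to `pd.read_csv` in place of the hard-coded file name.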

In [12]:
# Content of the training epoch results log file
logs

| | epoch | loss | mae | mape | mse | val_loss | val_mae | val_mape | val_mse |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.040649 | 0.988321 | 107.26075 | 1.040649 | 0.516075 | 0.712334 | 76.695084 | 0.516075 |
| 1 | 1 | 0.533641 | 0.66986 | 72.59552 | 0.533641 | 0.207295 | 0.424754 | 45.74892 | 0.207295 |
| 2 | 2 | 0.282733 | 0.468134 | 50.719929 | 0.282733 | 0.079149 | 0.233447 | 25.081427 | 0.079149 |
| 3 | 3 | 0.26795 | 0.408378 | 44.357079 | 0.26795 | 0.055446 | 0.192614 | 20.659218 | 0.055446 |
| 4 | 4 | 0.257489 | 0.383784 | 41.797688 | 0.257489 | 0.067559 | 0.224276 | 24.105291 | 0.067559 |
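
One way to read this log is to look for the epoch with the lowest validation loss. The sketch below does that with the standard `csv` module on the five rows shown above (only the `epoch`, `loss` and `val_loss` columns are copied). It illustrates how to read the table and is not part of the original program.

```python
import csv
import io

# Values copied from the epoch log shown above.
log_text = """epoch,loss,val_loss
0,1.040649,0.516075
1,0.533641,0.207295
2,0.282733,0.079149
3,0.26795,0.055446
4,0.257489,0.067559
"""

rows = list(csv.DictReader(io.StringIO(log_text)))
best = min(rows, key=lambda row: float(row["val_loss"]))
print(best["epoch"], best["val_loss"])  # 3 0.055446
```

In these five epochs the validation loss is lowest at epoch 3 and slightly higher at epoch 4, while the training loss keeps falling.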
