## Sprint 10 - updated version

After receiving feedback, we applied the suggested fixes to the Sprint 10 deep learning part:

- Fit two separate scalers: one over the entire input data and one over the entire output data.
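A minimal stdlib sketch of what the two fitted scalers do, assuming scikit-learn's default `MinMaxScaler` range of [0, 1]. Each column is mapped with x' = (x - min) / (max - min), and the output scaler's inverse maps model predictions back to the original Kinect units. The helpers `fit_minmax`, `transform` and `inverse_transform` are hypothetical; the notebook itself uses `sklearn.preprocessing.MinMaxScaler`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fit_minmax(rows: Matrix) -> Tuple[List[float], List[float]]:
    """Return per-column (min, max), like MinMaxScaler.data_min_ / data_max_."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform(rows: Matrix, mins: List[float], maxs: List[float]) -> Matrix:
    """Scale every column to [0, 1]; a constant column is mapped to 0."""
    return [
        [(x - lo) / (hi - lo) if hi != lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def inverse_transform(rows: Matrix, mins: List[float], maxs: List[float]) -> Matrix:
    """Map scaled values back to the original units of the fitted data."""
    return [[x * (hi - lo) + lo for x, lo, hi in zip(row, mins, maxs)] for row in rows]


# Toy values: inputs in pixels, outputs in Kinect coordinates
inputs = [[273.0, 1009.9], [272.4, 1008.5], [254.2, 1024.8]]
outputs = [[0.0131, 0.7677], [0.0135, 0.7619], [0.0132, 0.7652]]

# Two separate scalers: one fitted on the inputs, one on the outputs
in_min, in_max = fit_minmax(inputs)
out_min, out_max = fit_minmax(outputs)
scaled_inputs = transform(inputs, in_min, in_max)
scaled_outputs = transform(outputs, out_min, out_max)

# A model trained on scaled data predicts scaled outputs; the output scaler
# turns them back into coordinates in the original units
restored = inverse_transform(scaled_outputs, out_min, out_max)
```

Keeping the scalers separate means the output scaler's min/max depend only on the Kinect data, so its inverse is exactly the mapping needed for predictions.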


After applying these changes, we ran the experiments again. The following parameters and options were tested:

* batch sizes: 16, 32, 64, 128
* epochs: 100, 500, 1000
* activations: 'relu', 'selu', 'tanh'
* architecture (neurons in each hidden layer): 64; 128; 256; (64, 32); (128, 64)
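The tested options form a full grid. A small sketch, assuming every combination was trained once with the learning rate fixed at 1e-4, enumerates it with `itertools.product`. The 4 × 3 × 3 × 5 = 180 configurations come out in the same order as the rows of the Adam results table (batch size outermost, then epochs, activation and architecture).

```python
import itertools

# Hyperparameter grid from the sprint notes; single-layer architectures are 1-tuples
batch_sizes = [16, 32, 64, 128]
epochs = [100, 500, 1000]
activations = ["relu", "selu", "tanh"]
architectures = [(64,), (128,), (256,), (64, 32), (128, 64)]

# Every combination of the four options, batch size varying slowest
grid = [
    {"batch_size": b, "epochs": e, "activation": a, "hidden_layers": h}
    for b, e, a, h in itertools.product(batch_sizes, epochs, activations, architectures)
]

for config in grid[:3]:
    print(config)
```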

# Machine learning research section

## Adam optimizer

The best result from the automated tests with the Adam optimizer was a model with 2 hidden layers (128 neurons in the first and 64 in the second), the ReLU activation function and a learning rate of 1e-4, trained with a batch size of 128 for 1000 epochs. It reached a mean absolute error (MAE) of 0.0397 and a mean squared error (MSE) of 0.0031.



## Stochastic gradient descent optimizer

The best result from the automated tests with the SGD optimizer was a model with 2 hidden layers (128 neurons in the first and 64 in the second), the tanh activation function and a learning rate of 1e-4, trained with a batch size of 32 for 1000 epochs. It reached an MAE of 0.0398 and an MSE of 0.0032.






## Nesterov Adam optimizer

The best result from the automated tests with the Nadam optimizer was a model with 2 hidden layers (64 neurons in the first and 32 in the second), the ReLU activation function and a learning rate of 1e-4, trained with a batch size of 128 for 100 epochs. It reached an MAE of 0.0422 and an MSE of 0.0035.




Based on the automated tests, the best architecture and parameters for the model appear to be:
- Adam optimizer (*Adam(learning_rate = 1e-4)*)
- trained with a batch size of 128 for 1000 epochs
- 2 hidden layers with 128 and 64 neurons and the ReLU activation function

The best SGD configuration (MAE 0.0398, MSE 0.0032) is almost as good, so Adam's advantage is marginal.

```python
import glob

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle
```

```python
# Constants: 13 PoseNet keypoints (x, y) as inputs, 13 Kinect joints (x, y) as outputs
num_of_inputs = 26
num_of_outputs = 26
```

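```python
# Load and concatenate all merged CSV files in the working directory
# (sorted so the file order, and therefore the shuffled order, is reproducible)
files = sorted(glob.glob('*_merged.csv'))
data = pd.concat([pd.read_csv(f) for f in files], axis=0, ignore_index=True)
data.shape
```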

(23987, 96)

```python
data.head()
```

```text
Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Frame,Pose,Pose_Score,Nose_score,Nose_X_Coord,Nose_Y_Coord,LeftEye_score,LeftEye_X_Coord,LeftEye_Y_Coord,RightEye_score,RightEye_X_Coord,RightEye_Y_Coord,LeftEar_score,LeftEar_X_Coord,LeftEar_Y_Coord,RightEar_score,RightEar_X_Coord,RightEar_Y_Coord,LeftShoulder_score,LeftShoulder_X_Coord,LeftShoulder_Y_Coord,RightShoulder_score,RightShoulder_X_Coord,RightShoulder_Y_Coord,LeftElbow_score,LeftElbow_X_Coord,LeftElbow_Y_Coord,RightElbow_score,RightElbow_X_Coord,RightElbow_Y_Coord,LeftWrist_score,LeftWrist_X_Coord,LeftWrist_Y_Coord,RightWrist_score,RightWrist_X_Coord,RightWrist_Y_Coord,LeftHip_score,LeftHip_X_Coord,...,FrameNo,head_x,head_y,head_z,left_shoulder_x,left_shoulder_y,left_shoulder_z,left_elbow_x,left_elbow_y,left_elbow_z,right_shoulder_x,right_shoulder_y,right_shoulder_z,right_elbow_x,right_elbow_y,right_elbow_z,left_hand_x,left_hand_y,left_hand_z,right_hand_x,right_hand_y,right_hand_z,left_hip_x,left_hip_y,left_hip_z,right_hip_x,right_hip_y,right_hip_z,left_knee_x,left_knee_y,left_knee_z,right_knee_x,right_knee_y,right_knee_z,left_foot_x,left_foot_y,left_foot_z,right_foot_x,right_foot_y,right_foot_z
0,0,26,27,0,0.655831,0.993477,273.007834,1009.915685,0.988301,254.240474,1024.839597,0.981556,255.661543,1000.727941,0.731087,271.439195,1041.418666,0.385952,276.141888,976.865513,0.779818,338.016955,1079.338461,0.567354,349.80008,962.226168,0.575574,240.126619,1137.927764,0.560392,243.300596,901.982512,0.13135,105.446721,1139.219205,0.517373,103.555565,874.089284,0.231592,587.716132,...,27,0.013125,0.76769,0.016975,-0.1344,0.55792,0.026116,-0.23058,0.80156,-0.039267,0.15317,0.54221,0.019385,0.25349,0.78255,-0.034044,-0.25871,1.01,-0.098156,0.27352,1.0008,-0.10205,-0.064062,0.048522,-0.03546,0.083012,0.045385,-0.040149,-0.11567,-0.3627,-0.049812,0.11445,-0.38705,-0.032298,-0.12462,-0.7339,-0.049147,0.11816,-0.73437,-0.05849
1,1,27,28,0,0.613827,0.993368,272.870526,1010.081804,0.988391,254.171676,1024.759945,0.982182,255.689709,1000.377692,0.734645,270.953928,1041.859123,0.380509,276.037587,976.845164,0.777726,338.242424,1079.14292,0.554936,350.733095,961.744201,0.572586,240.081118,1137.49557,0.543105,242.8832,901.567692,0.136428,104.504332,1139.397962,0.504939,103.515364,874.223371,0.206954,589.144867,...,28,0.013139,0.76703,0.016671,-0.13439,0.55792,0.02587,-0.23022,0.80156,-0.039252,0.15321,0.54231,0.019179,0.25362,0.78132,-0.034038,-0.25727,1.01,-0.098092,0.27432,1.0008,-0.10202,-0.063768,0.048384,-0.035447,0.083212,0.045353,-0.04014,-0.1152,-0.36424,-0.051272,0.11472,-0.38644,-0.03354,-0.12461,-0.73518,-0.049845,0.11794,-0.73534,-0.058871
2,2,28,29,0,0.616458,0.992321,272.446694,1009.098878,0.988987,253.733421,1023.635307,0.982231,255.291728,1000.09243,0.757978,270.887374,1040.81426,0.395599,274.858078,977.857769,0.767602,337.309908,1079.511514,0.573403,350.596995,962.638706,0.586219,238.813969,1138.983025,0.544667,243.702899,902.284827,0.03923,93.306244,1137.526624,0.456074,101.548887,875.156868,0.224446,588.449288,...,29,0.013177,0.76616,0.015813,-0.13397,0.55757,0.025667,-0.22984,0.80156,-0.039234,0.15334,0.54155,0.018876,0.25476,0.78061,-0.034037,-0.25589,1.0091,-0.098032,0.27542,1.0008,-0.10197,-0.063421,0.047696,-0.035423,0.083461,0.044645,-0.040129,-0.11481,-0.36576,-0.053177,0.11521,-0.38635,-0.035453,-0.12471,-0.73625,-0.051154,0.11778,-0.73722,-0.059684
3,3,29,30,0,0.626826,0.99237,272.878498,1008.953404,0.988711,254.000736,1023.68853,0.981702,255.377566,999.837624,0.774546,270.621712,1040.393927,0.391449,273.927825,978.350099,0.768187,336.862481,1079.437516,0.562235,350.707741,962.317466,0.566379,241.677129,1137.541196,0.561406,243.707463,901.252746,0.106802,102.554181,1137.392365,0.471692,101.467682,873.330147,0.237681,589.192264,...,30,0.013234,0.76524,0.014532,-0.13355,0.55695,0.025575,-0.22928,0.80127,-0.039421,0.15351,0.54063,0.018614,0.2561,0.77736,-0.033977,-0.25448,1.0076,-0.097969,0.27725,1.0002,-0.10189,-0.062726,0.046813,-0.034977,0.084082,0.043731,-0.040102,-0.11462,-0.36651,-0.056456,0.11606,-0.38678,-0.039129,-0.12476,-0.73736,-0.053214,0.11782,-0.73823,-0.060639
4,4,30,31,0,0.656187,0.992199,272.552117,1008.530946,0.989474,253.512955,1023.707535,0.982186,255.120958,999.857743,0.786832,270.719317,1040.153683,0.376655,274.050199,977.895391,0.771492,337.537906,1079.643322,0.555928,350.230158,962.086911,0.592517,240.537102,1137.865899,0.563793,243.040456,901.359304,0.551228,114.411065,1135.57596,0.511387,114.932842,867.126963,0.219999,589.392992,...,31,0.013457,0.76193,0.012078,-0.13312,0.55481,0.025396,-0.22732,0.79641,-0.040943,0.15394,0.53958,0.018403,0.25797,0.77305,-0.033895,-0.25208,1.0054,-0.10071,0.28009,0.99628,-0.10214,-0.062056,0.04516,-0.033615,0.085121,0.042273,-0.039195,-0.11417,-0.36754,-0.064164,0.1194,-0.38693,-0.046703,-0.12462,-0.73756,-0.056391,0.11798,-0.73823,-0.061967
```


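```python
# DATA PREPROCESSING
# Drop index ("Unnamed"), frame, pose, eye/ear, z-coordinate and score columns,
# keeping only the x/y coordinates of the remaining keypoints and joints
data = data.loc[:, ~data.columns.str.contains('^Unnamed|^Frame|^Pose|Eye|Ear|_z|_score|_Score')]
# Randomize the row order
shuffled_data = shuffle(data, random_state=42)
# Split into inputs (first num_of_inputs columns: PoseNet x/y) and labels (remaining Kinect x/y)
raw_features = shuffled_data.iloc[:, 0:num_of_inputs]
raw_labels = shuffled_data.drop(raw_features.columns, axis=1)
# Scale features and labels with two separate scalers, each fitted on the entire data
# (fitting before the split means the scalers also see validation and test rows)
feature_scaler = MinMaxScaler()
label_scaler = MinMaxScaler()
features = feature_scaler.fit_transform(raw_features)
labels = label_scaler.fit_transform(raw_labels)
```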
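To check which columns survive the filter, the stdlib sketch below applies the same pattern with `re.search`, which matches how pandas' `str.contains` treats a regex pattern. The column list is rebuilt from the header printed by `data.head()`; the PoseNet names hidden by `...` are an assumption (the standard 17-keypoint layout), so this list is not guaranteed to match the real file column for column.

```python
import re

# Same pattern as the preprocessing cell
DROP_PATTERN = '^Unnamed|^Frame|^Pose|Eye|Ear|_z|_score|_Score'

# PoseNet keypoints; names after LeftHip are hidden by "..." in the printed
# header and are ASSUMED to follow the standard 17-keypoint PoseNet layout
posenet = ["Nose", "LeftEye", "RightEye", "LeftEar", "RightEar",
           "LeftShoulder", "RightShoulder", "LeftElbow", "RightElbow",
           "LeftWrist", "RightWrist", "LeftHip", "RightHip",
           "LeftKnee", "RightKnee", "LeftAnkle", "RightAnkle"]
# Kinect joints as listed in the printed header
kinect = ["head", "left_shoulder", "left_elbow", "right_shoulder", "right_elbow",
          "left_hand", "right_hand", "left_hip", "right_hip",
          "left_knee", "right_knee", "left_foot", "right_foot"]

columns = ["Unnamed: 0.2", "Unnamed: 0", "Unnamed: 0.1", "Frame", "Pose", "Pose_Score"]
for name in posenet:
    columns += [f"{name}_score", f"{name}_X_Coord", f"{name}_Y_Coord"]
columns.append("FrameNo")
for name in kinect:
    columns += [f"{name}_x", f"{name}_y", f"{name}_z"]

# Keep a column only if the pattern matches nowhere in its name
kept = [c for c in columns if not re.search(DROP_PATTERN, c)]
inputs, labels = kept[:26], kept[26:]
print(len(inputs), len(labels))  # → 26 26
```

Dropping the eyes and ears leaves 13 PoseNet keypoints, and dropping `_z` leaves 13 Kinect joints, which is where the two constants of 26 come from.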

```python
# Split data into train (70%), validation (~20%) and test (~10%) sets
X_train, X_rest, y_train, y_rest = train_test_split(features, labels, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.333, random_state=42)
```

```python
print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
```

(16790, 26)
(4800, 26)
(2397, 26)
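The printed shapes follow from how scikit-learn rounds a float `test_size`: the test part gets ceil(test_size × n) rows and the train part gets the rest. A stdlib sketch of that arithmetic (a simplified re-implementation of the rounding, not scikit-learn's code) reproduces the three shapes from the 23,987 rows:

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """(n_train, n_test) for a float test_size, mirroring train_test_split's rounding."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 23987
n_train, n_rest = split_sizes(n_total, 0.3)   # first split: train vs. the rest
n_val, n_test = split_sizes(n_rest, 0.333)    # second split: validation vs. test
print(n_train, n_val, n_test)  # → 16790 4800 2397
print(f"{n_train / n_total:.1%} / {n_val / n_total:.1%} / {n_test / n_total:.1%}")
```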


```python
# 2 hidden layers (128 and 64 neurons, ReLU), each followed by batch normalization,
# and a linear output layer with one unit per label
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(num_of_inputs,)))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(num_of_outputs))
```
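As a size check, the parameter count of this model can be worked out by hand: a `Dense` layer has inputs × units weights plus one bias per unit, and each `BatchNormalization` layer holds four values per feature (trainable gamma and beta, non-trainable moving mean and variance). The plain-Python sketch below does that arithmetic; assuming Keras' default layer settings, `model.summary()` should report the same totals.

```python
def dense_params(n_in: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_units + n_units


def batchnorm_params(n_features: int) -> tuple:
    """(trainable, non_trainable): gamma/beta are trained, moving mean/variance are not."""
    return 2 * n_features, 2 * n_features


# (layer name, trainable params, non-trainable params) for the 26 -> 128 -> 64 -> 26 model
layers = [
    ("dense_128", dense_params(26, 128), 0),
    ("batchnorm_128", *batchnorm_params(128)),
    ("dense_64", dense_params(128, 64), 0),
    ("batchnorm_64", *batchnorm_params(64)),
    ("dense_out", dense_params(64, 26), 0),
]
trainable = sum(t for _, t, _ in layers)
non_trainable = sum(n for _, _, n in layers)
print(f"total={trainable + non_trainable} trainable={trainable} non-trainable={non_trainable}")
```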

```python
# MAE as the training loss, MSE tracked as an additional metric
model.compile(loss='mean_absolute_error',
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=['mse'])
```
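For reference, the loss and the metric reduce to simple averages over all predicted coordinates. A plain-Python sketch with hypothetical helper names and made-up scaled values; for equally sized samples, Keras' per-sample mean followed by a batch mean gives the same numbers.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over every coordinate of every sample."""
    diffs = [abs(t - p) for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)]
    return sum(diffs) / len(diffs)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2; penalizes large errors more than MAE does."""
    diffs = [(t - p) ** 2 for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)]
    return sum(diffs) / len(diffs)


# Two samples with two scaled coordinates each (illustrative values)
y_true = [[0.50, 0.20], [0.10, 0.90]]
y_pred = [[0.40, 0.20], [0.10, 0.60]]
print(mean_absolute_error(y_true, y_pred), mean_squared_error(y_true, y_pred))
```

Because the labels are scaled to [0, 1], errors are below 1 and the MSE values in the tables are consistently smaller than the MAE values.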

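```python
# Stop training once val_loss has not improved for 6 consecutive epochs,
# so epochs=1000 is only an upper bound on the training length
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=6)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=1000, verbose=0, batch_size=128, callbacks=[es])
```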

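A simplified sketch of the stopping rule applied by `EarlyStopping(monitor='val_loss', patience=6)`, assuming the default `min_delta=0` and a lower-is-better monitor. It illustrates the rule only and is not Keras' implementation. With the default `restore_best_weights=False`, the model keeps the weights of the last epoch rather than those of the best one.

```python
import math
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch after which training stops, or None if it never does."""
    best = math.inf
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # strict improvement (min_delta = 0)
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


history = [0.10, 0.08, 0.07, 0.071, 0.072, 0.07, 0.073, 0.074, 0.075, 0.06]
print(early_stop_epoch(history, patience=6))  # → 8
```

In this toy history, training stops before the better value at epoch 9 is ever seen, which is the trade-off a small patience makes.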

```python
# Evaluate on the held-out test set; returns [loss (MAE), MSE]
model.evaluate(X_test, y_test, batch_size=128, verbose=2)
```

19/19 - 0s - loss: 0.0493 - mse: 0.0047


[0.04928087443113327, 0.0047427089884877205]

# Software development section

## PoseNet to Kinect Pipeline

1. Designed the pipeline using the same software architecture as the Kinect pipeline.

ML pipeline description for the `Kinect` and `PoseNet to Kinect` models:

A deep learning pipeline that simplifies model training, testing and deployment for inference. The pipeline consists of the following modules:
* Data creator - Reads the data from the given folder and processes it into separate datasets for training and testing the model.
* Model - A wrapper around the model architecture that simplifies model compilation, saving and serving.
* Trainer - Defines the training pipeline. Combines the Model and the Data creator with the configured hyperparameters to start training. Includes an early stopping callback that stops the training session if the loss has not decreased over the last N epochs. Also includes a TensorBoard callback for interactive visualization of the optimization process and for saving logs.
* Estimator - Serves the model and runs inference on a given set of data.
* Config - A multi-level (nested) dictionary for project configuration.
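The Config module is described only as a multi-level dictionary. One common way to build such a configuration is a nested dict with attribute access; the sketch below is a hypothetical `AttrDict` with made-up keys, not the project's actual config.

```python
from typing import Any


class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes; nested dicts are wrapped too."""

    def __init__(self, data: dict):
        super().__init__({k: AttrDict(v) if isinstance(v, dict) else v for k, v in data.items()})

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


# Hypothetical keys, loosely based on the best Adam configuration reported above
config = AttrDict({
    "data": {"pattern": "*_merged.csv", "test_size": 0.3},
    "model": {"hidden_layers": [128, 64], "activation": "relu"},
    "trainer": {"learning_rate": 1e-4, "batch_size": 128, "epochs": 1000, "patience": 6},
})
print(config.model.hidden_layers, config.trainer.learning_rate)
```

Raising `AttributeError` for missing keys keeps `hasattr` and `getattr(..., default)` working as expected.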

2. Extended the pipeline architecture for both models:
* Implemented scalers for the model's input and output data, and created a save/load pipeline for the scalers, used by the data creator and the estimator.
* Added a test split option
* Fixed the dataset being partly fixed
* Added separate configs for each model
3. Code cleanup
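The notes do not say how the scalers are persisted. As one possible approach, sketched under that assumption, the data creator could store each fitted scaler's per-column minimum and maximum (the information scikit-learn's `MinMaxScaler` keeps in `data_min_` and `data_max_`) in a JSON file, and the estimator could reload them to scale new inputs. File and function names are hypothetical; pickling the fitted scikit-learn objects, for example with `joblib`, is the more common alternative.

```python
import json
import os
import tempfile


def save_scaler(path: str, data_min: list, data_max: list) -> None:
    """Persist the per-column min/max of a fitted min-max scaler."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"data_min": data_min, "data_max": data_max}, f)


def load_scaler(path: str) -> tuple:
    """Load the per-column (min, max) written by save_scaler."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    return params["data_min"], params["data_max"]


def scale(row: list, data_min: list, data_max: list) -> list:
    """Apply the stored min-max scaling to one sample (constant columns map to 0)."""
    return [(x - lo) / (hi - lo) if hi != lo else 0.0 for x, lo, hi in zip(row, data_min, data_max)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feature_scaler.json")  # hypothetical file name
    save_scaler(path, [254.2, 1008.5], [273.0, 1024.8])  # e.g. written by the data creator
    mins, maxs = load_scaler(path)                         # e.g. read back by the estimator
    print(scale([263.6, 1016.65], mins, maxs))
```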

### Adam optimizer results

- Before Feedback:  

All tests were run while varying the following parameters:

- batch sizes: 16, 32, 64
- epochs: 20, 100, 500, 1000
- activation functions: ReLU, SELU, tanh
- learning rate for optimizer: varies between 3e-4 and 3e-6

Each parameter combination was tested 5 times, and the test results were averaged.

The following neural network architectures were tested:
1. 1 hidden layer with 32 neurons, best result: mae= 0.052521, mse= 0.005415, batch size= 64, epochs= 100, activation= relu
2. 1 hidden layer with 64 neurons, best result: mae= 0.055399, mse= 0.005902, batch size= 64, epochs= 100, activation= relu
3. 1 hidden layer with 128 neurons, best result: mae= 0.06783, mse= 0.007983, batch size= 64, epochs= 100, activation= relu
4. 2 hidden layers with 64 and 32 neurons, best result: mae= 0.050266, mse= 0.00486, batch size= 64, epochs= 100, activation= relu
5. **2 hidden layers with 128 and 64 neurons, best result: mae= 0.049150, mse= 0.004465, batch size= 64, epochs= 100, activation= relu**  

- After Feedback:  
Adam(learning_rate = 1e-4)


| Neurons | Activation function | Epochs | Batch size | Results on test set|
|---------|---------------------|--------|------------|--------------------|
64, None | relu | 100 | 16 |  mae (loss): 0.0563, mse: 0.0059 |
128, None | relu | 100 | 16 |  mae (loss): 0.0850, mse: 0.0140 |
256, None | relu | 100 | 16 |  mae (loss): 0.0790, mse: 0.0100 |
64, 32 | relu | 100 | 16 |  mae (loss): 0.0476, mse: 0.0047 |
128, 64 | relu | 100 | 16 |  mae (loss): 0.0466, mse: 0.0042 |
64, None | selu | 100 | 16 |  mae (loss): 0.0595, mse: 0.0068 |
128, None | selu | 100 | 16 |  mae (loss): 0.0788, mse: 0.0104 |
256, None | selu | 100 | 16 |  mae (loss): 0.1132, mse: 0.0187 |
64, 32 | selu | 100 | 16 |  mae (loss): 0.0516, mse: 0.0050 |
128, 64 | selu | 100 | 16 |  mae (loss): 0.0521, mse: 0.0050 |
64, None | tanh | 100 | 16 |  mae (loss): 0.0653, mse: 0.0073 |
128, None | tanh | 100 | 16 |  mae (loss): 0.0704, mse: 0.0091 |
256, None | tanh | 100 | 16 |  mae (loss): 0.1757, mse: 0.0466 |
64, 32 | tanh | 100 | 16 |  mae (loss): 0.0547, mse: 0.0077 |
128, 64 | tanh | 100 | 16 |  mae (loss): 0.0714, mse: 0.0087 |
64, None | relu | 500 | 16 |  mae (loss): 0.0527, mse: 0.0052 |
128, None |  relu | 500 | 16 |  mae (loss): 0.0656, mse: 0.0115 |
256, None | relu | 500 | 16 |  mae (loss): 0.0963, mse: 0.0140 |
64, 32 | relu | 500 | 16 |  mae (loss): 0.0494, mse: 0.0047 |
128, 64 | relu | 500 | 16 |  mae (loss): 0.0417, mse: 0.0033 |
64, None | selu | 500 | 16 |  mae (loss): 0.0682, mse: 0.0078 |
128, None | selu | 500 | 16 |  mae (loss): 0.0670, mse: 0.0076 |
256, None | selu | 500 | 16 |  mae (loss): 0.1296, mse: 0.0246 |
64, 32 | selu | 500 | 16 |  mae (loss): 0.0484, mse: 0.0045 |
128, 64 | selu | 500 | 16 |  mae (loss): 0.0578, mse: 0.0062 |
64, None | tanh | 500 | 16 |  mae (loss): 0.0676, mse: 0.0099 |
128, None | tanh | 500 | 16 |  mae (loss): 0.0817, mse: 0.0116 |
256, None | tanh | 500 | 16 |  mae (loss): 0.1668, mse: 0.0413 |
64, 32 | tanh | 500 | 16 |  mae (loss): 0.0507, mse: 0.0063 |
128, 64 | tanh | 500 | 16 |  mae (loss): 0.0600, mse: 0.0071 |
64, None | relu | 1000 | 16 |  mae (loss): 0.0567, mse: 0.0062 |
128, None | relu | 1000 | 16 |  mae (loss): 0.0704, mse: 0.0118 |
256, None | relu | 1000 | 16 |  mae (loss): 0.1019, mse: 0.0179 |
64, 32 | relu | 1000 | 16 |  mae (loss): 0.0493, mse: 0.0047 |
128, 64 | relu | 1000 | 16 |  mae (loss): 0.0543, mse: 0.0061 |
64, None | selu | 1000 | 16 |  mae (loss): 0.0733, mse: 0.0088 |
128, None | selu | 1000 | 16 |  mae (loss): 0.0830, mse: 0.0112 |
256, None | selu | 1000 | 16 |  mae (loss): 0.1399, mse: 0.0275 |
64, 32 | selu | 1000 | 16 |  mae (loss): 0.0500, mse: 0.0051 |
128, 64 | selu | 1000 | 16 |  mae (loss): 0.0535, mse: 0.0054 |
64, None | tanh | 1000 | 16 |  mae (loss): 0.0724, mse: 0.0119 |
128, None | tanh | 1000 | 16 |  mae (loss): 0.0989, mse: 0.0167 |
256, None | tanh | 1000 | 16 |  mae (loss): 0.1301, mse: 0.0275 |
64, 32 | tanh | 1000 | 16 |  mae (loss): 0.0521, mse: 0.0052 |
128, 64 | tanh | 1000 | 16 |  mae (loss): 0.0687, mse: 0.0087 |
64, None | relu | 100 | 32 |  mae (loss): 0.0586, mse: 0.0086 |
128, None | relu | 100 | 32 |  mae (loss): 0.0606, mse: 0.0075 |
256, None | relu | 100 | 32 |  mae (loss): 0.0804, mse: 0.0111 |
64, 32 | relu | 100 | 32 |  mae (loss): 0.0486, mse: 0.0046 |
128, 64 | relu | 100 | 32 |  mae (loss): 0.0490, mse: 0.0045 |
64, None | selu | 100 | 32 |  mae (loss): 0.0768, mse: 0.0097 |
128, None | selu | 100 | 32 |  mae (loss): 0.0898, mse: 0.0125 |
256, None | selu | 100 | 32 |  mae (loss): 0.1003, mse: 0.0157 |
64, 32 | selu | 100 | 32 |  mae (loss): 0.0526, mse: 0.0053 |
128, 64 | selu | 100 | 32 |  mae (loss): 0.0595, mse: 0.0060 |
64, None | tanh | 100 | 32 |  mae (loss): 0.0697, mse: 0.0088 |
128, None | tanh | 100 | 32 |  mae (loss): 0.0669, mse: 0.0079 |
256, None | tanh | 100 | 32 |  mae (loss): 0.0670, mse: 0.0081 |
64, 32 | tanh | 100 | 32 |  mae (loss): 0.0683, mse: 0.0080 |
128, 64 | tanh | 100 | 32 |  mae (loss): 0.0704, mse: 0.0088 |
64, None | relu | 500 | 32 |  mae (loss): 0.0534, mse: 0.0086 |
128, None | relu | 500 | 32 |  mae (loss): 0.0588, mse: 0.0078 |
256, None | relu | 500 | 32 |  mae (loss): 0.0846, mse: 0.0126 |
64, 32 | relu | 500 | 32 |  mae (loss): 0.0511, mse: 0.0051 |
128, 64 | relu | 500 | 32 |  mae (loss): 0.0452, mse: 0.0038 |
64, None | selu | 500 | 32 |  mae (loss): 0.0790, mse: 0.0102 |
128, None | selu | 500 | 32 |  mae (loss): 0.0841, mse: 0.0119 |
256, None | selu | 500 | 32 |  mae (loss): 0.1100, mse: 0.0191 |
64, 32 | selu | 500 | 32 |  mae (loss): 0.0497, mse: 0.0049 |
128, 64 | selu | 500 | 32 |  mae (loss): 0.0549, mse: 0.0055 |
64, None | tanh | 500 | 32 |  mae (loss): 0.0677, mse: 0.0085 |
128, None | tanh | 500 | 32 |  mae (loss): 0.0732, mse: 0.0091 |
256, None | tanh | 500 | 32 |  mae (loss): 0.0774, mse: 0.0104 |
64, 32 | tanh | 500 | 32 |  mae (loss): 0.0660, mse: 0.0086 |
128, 64 | tanh | 500 | 32 |  mae (loss): 0.0685, mse: 0.0080 |
64, None | relu | 1000 | 32 |  mae (loss): 0.0620, mse: 0.0090 |
128, None | relu | 1000 | 32 |  mae (loss): 0.0617, mse: 0.0070 |
256, None | relu | 1000 | 32 |  mae (loss): 0.1011, mse: 0.0160 |
64, 32 | relu | 1000 | 32 |  mae (loss): 0.0527, mse: 0.0057 |
128, 64 | relu | 1000 | 32 |  mae (loss): 0.0434, mse: 0.0035 |
64, None | selu | 1000 | 32 |  mae (loss): 0.0668, mse: 0.0076 |
128, None | selu | 1000 | 32 |  mae (loss): 0.0802, mse: 0.0105 |
256, None | selu | 1000 | 32 |  mae (loss): 0.0974, mse: 0.0150 |
64, 32 | selu | 1000 | 32 |  mae (loss): 0.0533, mse: 0.0053 |
128, 64 | selu | 1000 | 32 |  mae (loss): 0.0638, mse: 0.0068 |
64, None | tanh | 1000 | 32 |  mae (loss): 0.0640, mse: 0.0071 |
128, None | tanh | 1000 | 32 |  mae (loss): 0.0700, mse: 0.0084 |
256, None | tanh | 1000 | 32 |  mae (loss): 0.0971, mse: 0.0149 |
64, 32 | tanh | 1000 | 32 |  mae (loss): 0.0591, mse: 0.0063 |
128, 64 | tanh | 1000 | 32 |  mae (loss): 0.0765, mse: 0.0103 |
64, None | relu | 100 | 64 |  mae (loss): 0.0472, mse: 0.0045 |
128, None | relu | 100 | 64 |  mae (loss): 0.0590, mse: 0.0064 |
256, None | relu | 100 | 64 |  mae (loss): 0.0775, mse: 0.0095 |
64, 32 | relu | 100 | 64 |  mae (loss): 0.0496, mse: 0.0048 |
128, 64 | relu | 100 | 64 |  mae (loss): 0.0429, mse: 0.0037 |
64, None | selu | 100 | 64 |  mae (loss): 0.0590, mse: 0.0065 |
128, None | selu | 100 | 64 |  mae (loss): 0.0973, mse: 0.0145 |
256, None | selu | 100 | 64 |  mae (loss): 0.0862, mse: 0.0114 |
64, 32 | selu | 100 | 64 |  mae (loss): 0.0485, mse: 0.0048 |
128, 64 | selu | 100 | 64 |  mae (loss): 0.0531, mse: 0.0051 |
64, None | tanh | 100 | 64 |  mae (loss): 0.0475, mse: 0.0044 |
128, None | tanh | 100 | 64 |  mae (loss): 0.0808, mse: 0.0101 |
256, None | tanh | 100 | 64 |  mae (loss): 0.0959, mse: 0.0135 |
64, 32 | tanh | 100 | 64 |  mae (loss): 0.0669, mse: 0.0076 |
128, 64 | tanh | 100 | 64 |  mae (loss): 0.0640, mse: 0.0071 |
64, None | relu | 500 | 64 |  mae (loss): 0.0478, mse: 0.0045 |
128, None | relu | 500 | 64 |  mae (loss): 0.0562, mse: 0.0058 |
256, None | relu | 500 | 64 |  mae (loss): 0.0829, mse: 0.0111 |
64, 32 | relu | 500 | 64 |  mae (loss): 0.0468, mse: 0.0044 |
128, 64 | relu | 500 | 64 |  mae (loss): 0.0456, mse: 0.0040 |
64, None | selu | 500 | 64 |  mae (loss): 0.0484, mse: 0.0046 |
128, None | selu | 500 | 64 |  mae (loss): 0.0816, mse: 0.0104 |
256, None | selu | 500 | 64 |  mae (loss): 0.1330, mse: 0.0253 |
64, 32 | selu | 500 | 64 |  mae (loss): 0.0455, mse: 0.0041 |
128, 64 | selu | 500 | 64 |  mae (loss): 0.0596, mse: 0.0061 |
64, None | tanh | 500 | 64 |  mae (loss): 0.0564, mse: 0.0060 |
128, None | tanh | 500 | 64 |  mae (loss): 0.0708, mse: 0.0082 |
256, None | tanh | 500 | 64 |  mae (loss): 0.0782, mse: 0.0110 |
64, 32 | tanh | 500 | 64 |  mae (loss): 0.0567, mse: 0.0060 |
128, 64 | tanh | 500 | 64 |  mae (loss): 0.0669, mse: 0.0077 |
64, None | relu | 1000 | 64 |  mae (loss): 0.0518, mse: 0.0050 |
128, None | relu | 1000 | 64 |  mae (loss): 0.0494, mse: 0.0047 |
256, None | relu | 1000 | 64 |  mae (loss): 0.0664, mse: 0.0074 |
64, 32 | relu | 1000 | 64 |  mae (loss): 0.0450, mse: 0.0040 |
128, 64 | relu | 1000 | 64 |  mae (loss): 0.0413, mse: 0.0034 |
64, None | selu | 1000 | 64 |  mae (loss): 0.0478, mse: 0.0045 |
128, None | selu | 1000 | 64 |  mae (loss): 0.0844, mse: 0.0110 |
256, None | selu | 1000 | 64 |  mae (loss): 0.0791, mse: 0.0099 |
64, 32 | selu | 1000 | 64 |  mae (loss): 0.0488, mse: 0.0047 |
128, 64 | selu | 1000 | 64 |  mae (loss): 0.0550, mse: 0.0054 |
64, None | tanh | 1000 | 64 |  mae (loss): 0.0486, mse: 0.0047 |
128, None | tanh | 1000 | 64 |  mae (loss): 0.0750, mse: 0.0088 |
256, None | tanh | 1000 | 64 |  mae (loss): 0.0844, mse: 0.0113 |
64, 32 | tanh | 1000 | 64 |  mae (loss): 0.0606, mse: 0.0066 |
128, 64 | tanh | 1000 | 64 |  mae (loss): 0.0638, mse: 0.0072 |
64, None | relu | 100 | 128 |  mae (loss): 0.0510, mse: 0.0050 |
128, None | relu | 100 | 128 |  mae (loss): 0.0504, mse: 0.0048 |
256, None | relu | 100 | 128 |  mae (loss): 0.0493, mse: 0.0046 |
64, 32 | relu | 100 | 128 |  mae (loss): 0.0449, mse: 0.0041 |
128, 64 | relu | 100 | 128 |  mae (loss): 0.0411, mse: 0.0034 |
64, None | selu | 100 | 128 |  mae (loss): 0.0479, mse: 0.0045 |
128, None | selu | 100 | 128 |  mae (loss): 0.0499, mse: 0.0047 |
256, None | selu | 100 | 128 |  mae (loss): 0.0618, mse: 0.0067 |
64, 32 | selu | 100 | 128 |  mae (loss): 0.0450, mse: 0.0039 |
128, 64 | selu | 100 | 128 |  mae (loss): 0.0442, mse: 0.0038 |
64, None | tanh | 100 | 128 |  mae (loss): 0.0487, mse: 0.0045 |
128, None | tanh | 100 | 128 |  mae (loss): 0.0580, mse: 0.0060 |
256, None | tanh | 100 | 128 |  mae (loss): 0.0904, mse: 0.0126 |
64, 32 | tanh | 100 | 128 |  mae (loss): 0.0496, mse: 0.0046 |
128, 64 | tanh | 100 | 128 |  mae (loss): 0.0689, mse: 0.0077 |
64, None | relu | 500 | 128 |  mae (loss): 0.0501, mse: 0.0049 |
128, None | relu | 500 | 128 |  mae (loss): 0.0461, mse: 0.0042 |
256, None | relu | 500 | 128 |  mae (loss): 0.0513, mse: 0.0048 |
64, 32 | relu | 500 | 128 |  mae (loss): 0.0423, mse: 0.0037 |
128, 64 | relu | 500 | 128 |  mae (loss): 0.0443, mse: 0.0038 |
64, None | selu | 500 | 128 |  mae (loss): 0.0485, mse: 0.0047 |
128, None | selu | 500 | 128 |  mae (loss): 0.0514, mse: 0.0049 |
256, None | selu | 500 | 128 |  mae (loss): 0.0640, mse: 0.0070 |
64, 32 | selu | 500 | 128 |  mae (loss): 0.0467, mse: 0.0042 |
128, 64 | selu | 500 | 128 |  mae (loss): 0.0466, mse: 0.0041 |
64, None | tanh | 500 | 128 |  mae (loss): 0.0499, mse: 0.0047 |
128, None | tanh | 500 | 128 |  mae (loss): 0.0591, mse: 0.0062 |
256, None | tanh | 500 | 128 |  mae (loss): 0.0637, mse: 0.0070 |
64, 32 | tanh | 500 | 128 |  mae (loss): 0.0487, mse: 0.0044 |
128, 64 | tanh | 500 | 128 |  mae (loss): 0.0558, mse: 0.0055 |
64, None | relu | 1000 | 128 |  mae (loss): 0.0487, mse: 0.0046 |
128, None | relu | 1000 | 128 |  mae (loss): 0.0462, mse: 0.0042 |
256, None | relu | 1000 | 128 |  mae (loss): 0.0562, mse: 0.0055 |
64, 32 | relu | 1000 | 128 |  mae (loss): 0.0432, mse: 0.0037 |
128, 64 | relu | 1000 | 128 |  mae (loss): 0.0397, mse: 0.0031 |
64, None | selu | 1000 | 128 |  mae (loss): 0.0503, mse: 0.0048 |
128, None | selu | 1000 | 128 |  mae (loss): 0.0524, mse: 0.0051 |
256, None | selu | 1000 | 128 |  mae (loss): 0.0823, mse: 0.0105 |
64, 32 | selu | 1000 | 128 |  mae (loss): 0.0487, mse: 0.0047 |
128, 64 | selu | 1000 | 128 |  mae (loss): 0.0499, mse: 0.0047 |
64, None | tanh | 1000 | 128 |  mae (loss): 0.0530, mse: 0.0051 |
128, None | tanh | 1000 | 128 |  mae (loss): 0.0562, mse: 0.0057 |
256, None | tanh | 1000 | 128 |  mae (loss): 0.0885, mse: 0.0118 |
64, 32 | tanh | 1000 | 128 |  mae (loss): 0.0478, mse: 0.0044 |
128, 64 | tanh | 1000 | 128 |  mae (loss): 0.0509, mse: 0.0047 |


Best result: neurons=128, 64, activation=relu, batch_size=128, epochs=1000, MAE=0.0397, MSE=0.0031
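Picking the best configuration from the table amounts to taking the row with the lowest test MAE. A small sketch, assuming the row format used in the table above; only a few rows are copied in for illustration.

```python
import re

# A few rows copied from the Adam results table above
rows = """\
128, 64 | relu | 500 | 16 |  mae (loss): 0.0417, mse: 0.0033 |
128, 64 | relu | 1000 | 64 |  mae (loss): 0.0413, mse: 0.0034 |
128, 64 | relu | 100 | 128 |  mae (loss): 0.0411, mse: 0.0034 |
128, 64 | relu | 1000 | 128 |  mae (loss): 0.0397, mse: 0.0031 |
64, 32 | relu | 500 | 128 |  mae (loss): 0.0423, mse: 0.0037 |"""

RESULT = re.compile(r"mae \(loss\): ([\d.]+), mse: ([\d.]+)")


def parse_row(line: str) -> dict:
    """Split one table row into its five cells and extract the numeric results."""
    neurons, activation, epochs, batch_size, result = [c.strip() for c in line.strip(" |").split("|")]
    mae, mse = map(float, RESULT.search(result).groups())
    return {"neurons": neurons, "activation": activation, "epochs": int(epochs),
            "batch_size": int(batch_size), "mae": mae, "mse": mse}


results = [parse_row(line) for line in rows.splitlines()]
best = min(results, key=lambda r: r["mae"])
print(best)
```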

### SGD optimizer results

- Before Feedback:  
SGD(lr=3e-4, decay=1e-7, momentum=0.9, nesterov=True)

| Activation function | Epochs | Batch size | Results on test set |
|---------------------|--------|------------|---------------------|
| relu                 | 100        | 16             | mse: 0.0066 - mae: 0.0559              | 
| relu                 | 100        | 32             | mse: 0.0087 - mae: 0.0649              | 
| relu                 | 100        | 64             | mse: 0.0108 - mae: 0.0726              | 
| relu                 | 100        | 128            | mse: 0.0118 - mae: 0.0757              |
| **relu**             | **1000**   | **16**         | **mse: 0.0038 - mae: 0.0417**          | 
| relu                 | 1000       | 32             | mse: 0.0043 - mae: 0.0444              | 
| relu                 | 1000       | 64             | mse: 0.0052 - mae: 0.0490              | 
| relu                 | 1000       | 128            | mse: 0.0064 - mae: 0.0545              |
| tanh                 | 100        | 16             | mse: 0.0078 - mae: 0.0619              | 
| tanh                 | 100        | 32             | mse: 0.0095 - mae: 0.0670              | 
| tanh                 | 100        | 64             | mse: 0.0112 - mae: 0.0719              | 
| tanh                 | 100        | 128            | mse: 0.0127 - mae: 0.0745              |
| tanh                 | 1000       | 16             | mse: 0.0042 - mae: 0.0453              | 
| tanh                 | 1000       | 32             | mse: 0.0047 - mae: 0.0475              | 
| tanh                 | 1000       | 64             | mse: 0.0055 - mae: 0.0513              | 
| tanh                 | 1000       | 128            | mse: 0.0071 - mae: 0.0593              | 
| selu                 | 100        | 16             | mse: 0.0061 - mae: 0.0539              |
| selu                 | 100        | 32             | mse: 0.0076 - mae: 0.0606              | 
| selu                 | 100        | 64             | mse: 0.0094 - mae: 0.0658              | 
| selu                 | 100        | 128            | mse: 0.0112 - mae: 0.0711              | 
| selu                 | 1000       | 16             | mse: 0.0041 - mae: 0.0442              |
| selu                 | 1000       | 32             | mse: 0.0042 - mae: 0.0447              | 
| selu                 | 1000       | 64             | mse: 0.0047 - mae: 0.0470              | 
| selu                 | 1000       | 128            | mse: 0.0058 - mae: 0.0521              | 


- After Feedback:  
SGD(lr=1e-4, decay=1e-7, momentum=0.9, nesterov=True)
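To read the SGD configuration: TensorFlow's documentation describes the `nesterov=True` update as `velocity = momentum * velocity - learning_rate * g` followed by `w = w + momentum * velocity - learning_rate * g`. The legacy `decay` argument shrinks the learning rate to `lr / (1 + decay * iterations)`; details may differ between TensorFlow versions. The scalar plain-Python sketch below applies both rules to the toy loss f(w) = w², using a larger learning rate than the experiments so convergence is visible within a few hundred steps.

```python
def sgd_nesterov(grad, w0: float, lr: float, momentum: float, decay: float, steps: int) -> float:
    """Minimize a 1-D function with Nesterov-momentum SGD and time-based lr decay."""
    w, velocity = w0, 0.0
    for iteration in range(steps):
        lr_t = lr / (1.0 + decay * iteration)  # legacy Keras-style time-based decay
        g = grad(w)
        velocity = momentum * velocity - lr_t * g
        w = w + momentum * velocity - lr_t * g  # Nesterov look-ahead step
    return w


# Toy loss f(w) = w**2 with gradient 2w; the minimum is at w = 0
w_final = sgd_nesterov(lambda w: 2.0 * w, w0=1.0, lr=0.05, momentum=0.9, decay=1e-7, steps=200)
print(w_final)
```

With `decay=1e-7`, the learning rate drops by only about 0.1% over 10,000 updates, so in these experiments the decay term has a very small effect compared to the base learning rate.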

| Neurons | Activation function | Epochs | Batch size | Results on test set|
|---------|---------------------|--------|------------|--------------------|
64, None | relu | 100 | 16 |  mae (loss): 0.0634, mse: 0.0096 |
128, None | relu | 100 | 16 |  mae (loss): 0.0544, mse: 0.0097 |
256, None | relu | 100 | 16 |  mae (loss): 0.0489, mse: 0.0062 |
64, 32 | relu | 100 | 16 |  mae (loss): 0.0625, mse: 0.0088 |
128, 64 | relu | 100 | 16 |  mae (loss): 0.0549, mse: 0.0071 |
64, None | selu | 100 | 16 |  mae (loss): 0.0564, mse: 0.0065 |
128, None | selu | 100 | 16 |  mae (loss): 0.0522, mse: 0.0060 |
256, None | selu | 100 | 16 |  mae (loss): 0.0511, mse: 0.0054 |
64, 32 | selu | 100 | 16 |  mae (loss): 0.0554, mse: 0.0061 |
128, 64 | selu | 100 | 16 |  mae (loss): 0.0521, mse: 0.0060 |
64, None | tanh | 100 | 16 |  mae (loss): 0.0595, mse: 0.0111 |
128, None | tanh | 100 | 16 |  mae (loss): 0.0538, mse: 0.0063 |
256, None | tanh | 100 | 16 |  mae (loss): 0.0515, mse: 0.0053 |
64, 32 | tanh | 100 | 16 |  mae (loss): 0.0628, mse: 0.0086 |
128, 64 | tanh | 100 | 16 |  mae (loss): 0.0548, mse: 0.0063 |
64, None | relu | 500 | 16 |  mae (loss): 0.0490, mse: 0.0060 |
128, None | relu | 500 | 16 |  mae (loss): 0.0474, mse: 0.0057 |
256, None | relu | 500 | 16 |  mae (loss): 0.0466, mse: 0.0062 |
64, 32 | relu | 500 | 16 |  mae (loss): 0.0491, mse: 0.0055 |
128, 64 | relu | 500 | 16 |  mae (loss): 0.0431, mse: 0.0044 |
64, None | selu | 500 | 16 |  mae (loss): 0.0472, mse: 0.0046 |
128, None | selu | 500 | 16 |  mae (loss): 0.0453, mse: 0.0044 |
256, None | selu | 500 | 16 |  mae (loss): 0.0499, mse: 0.0051 |
64, 32 | selu | 500 | 16 |  mae (loss): 0.0467, mse: 0.0045 |
128, 64 | selu | 500 | 16 |  mae (loss): 0.0429, mse: 0.0040 |
64, None | tanh | 500 | 16 |  mae (loss): 0.0463, mse: 0.0044 |
128, None | tanh | 500 | 16 |  mae (loss): 0.0467, mse: 0.0045 |
256, None | tanh | 500 | 16 |  mae (loss): 0.0495, mse: 0.0049 |
64, 32 | tanh | 500 | 16 |  mae (loss): 0.0454, mse: 0.0044 |
128, 64 | tanh | 500 | 16 |  mae (loss): 0.0425, mse: 0.0037 |
64, None | relu | 1000 | 16 |  mae (loss): 0.0466, mse: 0.0055 |
128, None | relu | 1000 | 16 |  mae (loss): 0.0460, mse: 0.0060 |
256, None | relu | 1000 | 16 |  mae (loss): 0.0445, mse: 0.0051 |
64, 32 | relu | 1000 | 16 |  mae (loss): 0.0443, mse: 0.0041 |
128, 64 | relu | 1000 | 16 |  mae (loss): 0.0428, mse: 0.0041 |
64, None | selu | 1000 | 16 |  mae (loss): 0.0445, mse: 0.0042 |
128, None | selu | 1000 | 16 |  mae (loss): 0.0443, mse: 0.0042 |
256, None | selu | 1000 | 16 |  mae (loss): 0.0475, mse: 0.0052 |
64, 32 | selu | 1000 | 16 |  mae (loss): 0.0426, mse: 0.0039 |
128, 64 | selu | 1000 | 16 |  mae (loss): 0.0399, mse: 0.0033 |
64, None | tanh | 1000 | 16 |  mae (loss): 0.0453, mse: 0.0045 |
128, None | tanh | 1000 | 16 |  mae (loss): 0.0461, mse: 0.0044 |
256, None | tanh | 1000 | 16 |  mae (loss): 0.0500, mse: 0.0051 |
64, 32 | tanh | 1000 | 16 |  mae (loss): 0.0436, mse: 0.0040 |
128, 64 | tanh | 1000 | 16 |  mae (loss): 0.0426, mse: 0.0037 |
64, None | relu | 100 | 32 |  mae (loss): 0.0678, mse: 0.0122 |
128, None | relu | 100 | 32 |  mae (loss): 0.0591, mse: 0.0079 |
256, None | relu | 100 | 32 |  mae (loss): 0.0498, mse: 0.0051 |
64, 32 | relu | 100 | 32 |  mae (loss): 0.0714, mse: 0.0111 |
128, 64 | relu | 100 | 32 |  mae (loss): 0.0616, mse: 0.0085 |
64, None | selu | 100 | 32 |  mae (loss): 0.0630, mse: 0.0079 |
128, None | selu | 100 | 32 |  mae (loss): 0.0552, mse: 0.0064 |
256, None | selu | 100 | 32 |  mae (loss): 0.0513, mse: 0.0056 |
64, 32 | selu | 100 | 32 |  mae (loss): 0.0624, mse: 0.0076 |
128, 64 | selu | 100 | 32 |  mae (loss): 0.0574, mse: 0.0065 |
64, None | tanh | 100 | 32 |  mae (loss): 0.0713, mse: 0.0159 |
128, None | tanh | 100 | 32 |  mae (loss): 0.0551, mse: 0.0061 |
256, None | tanh | 100 | 32 |  mae (loss): 0.0523, mse: 0.0054 |
64, 32 | tanh | 100 | 32 |  mae (loss): 0.0658, mse: 0.0085 |
128, 64 | tanh | 100 | 32 |  mae (loss): 0.0585, mse: 0.0068 |
64, None | relu | 500 | 32 |  mae (loss): 0.0510, mse: 0.0052 |
128, None | relu | 500 | 32 |  mae (loss): 0.0469, mse: 0.0054 |
256, None | relu | 500 | 32 |  mae (loss): 0.0430, mse: 0.0040 |
64, 32 | relu | 500 | 32 |  mae (loss): 0.0505, mse: 0.0057 |
128, 64 | relu | 500 | 32 |  mae (loss): 0.0457, mse: 0.0045 |
64, None | selu | 500 | 32 |  mae (loss): 0.0477, mse: 0.0046 |
128, None | selu | 500 | 32 |  mae (loss): 0.0452, mse: 0.0042 |
256, None | selu | 500 | 32 |  mae (loss): 0.0452, mse: 0.0043 |
64, 32 | selu | 500 | 32 |  mae (loss): 0.0472, mse: 0.0046 |
128, 64 | selu | 500 | 32 |  mae (loss): 0.0449, mse: 0.0042 |
64, None | tanh | 500 | 32 |  mae (loss): 0.0480, mse: 0.0048 |
128, None | tanh | 500 | 32 |  mae (loss): 0.0466, mse: 0.0044 |
256, None | tanh | 500 | 32 |  mae (loss): 0.0492, mse: 0.0049 |
64, 32 | tanh | 500 | 32 |  mae (loss): 0.0483, mse: 0.0047 |
128, 64 | tanh | 500 | 32 |  mae (loss): 0.0432, mse: 0.0038 |
64, None | relu | 1000 | 32 |  mae (loss): 0.0479, mse: 0.0046 |
128, None | relu | 1000 | 32 |  mae (loss): 0.0471, mse: 0.0053 |
256, None | relu | 1000 | 32 |  mae (loss): 0.0428, mse: 0.0039 |
64, 32 | relu | 1000 | 32 |  mae (loss): 0.0478, mse: 0.0050 |
128, 64 | relu | 1000 | 32 |  mae (loss): 0.0420, mse: 0.0037 |
64, None | selu | 1000 | 32 |  mae (loss): 0.0454, mse: 0.0043 |
128, None | selu | 1000 | 32 |  mae (loss): 0.0434, mse: 0.0041 |
256, None | selu | 1000 | 32 |  mae (loss): 0.0442, mse: 0.0040 |
64, 32 | selu | 1000 | 32 |  mae (loss): 0.0426, mse: 0.0038 |
128, 64 | selu | 1000 | 32 |  mae (loss): 0.0403, mse: 0.0035 |
64, None | tanh | 1000 | 32 |  mae (loss): 0.0450, mse: 0.0042 |
128, None | tanh | 1000 | 32 |  mae (loss): 0.0451, mse: 0.0042 |
256, None | tanh | 1000 | 32 |  mae (loss): 0.0481, mse: 0.0047 |
64, 32 | tanh | 1000 | 32 |  mae (loss): 0.0450, mse: 0.0050 |
128, 64 | tanh | 1000 | 32 |  mae (loss): 0.0398, mse: 0.0032 |
64, None | relu | 100 | 64 |  mae (loss): 0.0788, mse: 0.0136 |
128, None | relu | 100 | 64 |  mae (loss): 0.0662, mse: 0.0096 |
256, None | relu | 100 | 64 |  mae (loss): 0.0563, mse: 0.0064 |
64, 32 | relu | 100 | 64 |  mae (loss): 0.0763, mse: 0.0123 |
128, 64 | relu | 100 | 64 |  mae (loss): 0.0739, mse: 0.0113 |
64, None | selu | 100 | 64 |  mae (loss): 0.0755, mse: 0.0113 |
128, None | selu | 100 | 64 |  mae (loss): 0.0642, mse: 0.0086 |
256, None | selu | 100 | 64 |  mae (loss): 0.0542, mse: 0.0060 |
64, 32 | selu | 100 | 64 |  mae (loss): 0.0775, mse: 0.0119 |
128, 64 | selu | 100 | 64 |  mae (loss): 0.0665, mse: 0.0084 |
64, None | tanh | 100 | 64 |  mae (loss): 0.0792, mse: 0.0128 |
128, None | tanh | 100 | 64 |  mae (loss): 0.0707, mse: 0.0103 |
256, None | tanh | 100 | 64 |  mae (loss): 0.0552, mse: 0.0059 |
64, 32 | tanh | 100 | 64 |  mae (loss): 0.0729, mse: 0.0106 |
128, 64 | tanh | 100 | 64 |  mae (loss): 0.0658, mse: 0.0088 |
64, None | relu | 500 | 64 |  mae (loss): 0.0581, mse: 0.0066 |
128, None | relu | 500 | 64 |  mae (loss): 0.0475, mse: 0.0047 |
256, None | relu | 500 | 64 |  mae (loss): 0.0442, mse: 0.0042 |
64, 32 | relu | 500 | 64 |  mae (loss): 0.0565, mse: 0.0065 |
128, 64 | relu | 500 | 64 |  mae (loss): 0.0479, mse: 0.0046 |
64, None | selu | 500 | 64 |  mae (loss): 0.0501, mse: 0.0051 |
128, None | selu | 500 | 64 |  mae (loss): 0.0467, mse: 0.0047 |
256, None | selu | 500 | 64 |  mae (loss): 0.0434, mse: 0.0039 |
64, 32 | selu | 500 | 64 |  mae (loss): 0.0516, mse: 0.0054 |
128, 64 | selu | 500 | 64 |  mae (loss): 0.0457, mse: 0.0043 |
64, None | tanh | 500 | 64 |  mae (loss): 0.0522, mse: 0.0065 |
128, None | tanh | 500 | 64 |  mae (loss): 0.0473, mse: 0.0045 |
256, None | tanh | 500 | 64 |  mae (loss): 0.0476, mse: 0.0047 |
64, 32 | tanh | 500 | 64 |  mae (loss): 0.0522, mse: 0.0056 |
128, 64 | tanh | 500 | 64 |  mae (loss): 0.0466, mse: 0.0044 |
64, None | relu | 1000 | 64 |  mae (loss): 0.0522, mse: 0.0054 |
128, None | relu | 1000 | 64 |  mae (loss): 0.0443, mse: 0.0044 |
256, None | relu | 1000 | 64 |  mae (loss): 0.0416, mse: 0.0037 |
64, 32 | relu | 1000 | 64 |  mae (loss): 0.0495, mse: 0.0051 |
128, 64 | relu | 1000 | 64 |  mae (loss): 0.0420, mse: 0.0037 |
64, None | selu | 1000 | 64 |  mae (loss): 0.0468, mse: 0.0045 |
128, None | selu | 1000 | 64 |  mae (loss): 0.0442, mse: 0.0041 |
256, None | selu | 1000 | 64 |  mae (loss): 0.0434, mse: 0.0039 |
64, 32 | selu | 1000 | 64 |  mae (loss): 0.0464, mse: 0.0044 |
128, 64 | selu | 1000 | 64 |  mae (loss): 0.0422, mse: 0.0037 |
64, None | tanh | 1000 | 64 |  mae (loss): 0.0467, mse: 0.0044 |
128, None | tanh | 1000 | 64 |  mae (loss): 0.0458, mse: 0.0043 |
256, None | tanh | 1000 | 64 |  mae (loss): 0.0458, mse: 0.0042 |
64, 32 | tanh | 1000 | 64 |  mae (loss): 0.0477, mse: 0.0051 |
128, 64 | tanh | 1000 | 64 |  mae (loss): 0.0424, mse: 0.0037 |
64, None | relu | 100 | 128 |  mae (loss): 0.5362, mse: 0.4794 |
128, None | relu | 100 | 128 |  mae (loss): 0.0860, mse: 0.0189 |
256, None | relu | 100 | 128 |  mae (loss): 0.0727, mse: 0.0107 |
64, 32 | relu | 100 | 128 |  mae (loss): 0.6299, mse: 0.7711 |
128, 64 | relu | 100 | 128 |  mae (loss): 0.5846, mse: 0.7124 |
64, None | selu | 100 | 128 |  mae (loss): 0.1056, mse: 0.0221 |
128, None | selu | 100 | 128 |  mae (loss): 0.5491, mse: 0.5668 |
256, None | selu | 100 | 128 |  mae (loss): 0.5081, mse: 0.4727 |
64, 32 | selu | 100 | 128 |  mae (loss): 0.1076, mse: 0.0236 |
128, 64 | selu | 100 | 128 |  mae (loss): 0.0849, mse: 0.0151 |
64, None | tanh | 100 | 128 |  mae (loss): 0.6166, mse: 0.7044 |
128, None | tanh | 100 | 128 |  mae (loss): 0.0877, mse: 0.0154 |
256, None | tanh | 100 | 128 |  mae (loss): 0.0678, mse: 0.0091 |
64, 32 | tanh | 100 | 128 |  mae (loss): 0.0846, mse: 0.0137 |
128, 64 | tanh | 100 | 128 |  mae (loss): 0.5282, mse: 0.4117 |
64, None | relu | 500 | 128 |  mae (loss): 0.0635, mse: 0.0080 |
128, None | relu | 500 | 128 |  mae (loss): 0.0565, mse: 0.0064 |
256, None | relu | 500 | 128 |  mae (loss): 0.0471, mse: 0.0046 |
64, 32 | relu | 500 | 128 |  mae (loss): 0.6312, mse: 0.8005 |
128, 64 | relu | 500 | 128 |  mae (loss): 0.5723, mse: 0.7352 |
64, None | selu | 500 | 128 |  mae (loss): 0.0589, mse: 0.0067 |
128, None | selu | 500 | 128 |  mae (loss): 0.0518, mse: 0.0054 |
256, None | selu | 500 | 128 |  mae (loss): 0.0455, mse: 0.0042 |
64, 32 | selu | 500 | 128 |  mae (loss): 0.5830, mse: 0.5813 |
128, 64 | selu | 500 | 128 |  mae (loss): 0.0513, mse: 0.0052 |
64, None | tanh | 500 | 128 |  mae (loss): 0.5893, mse: 0.6127 |
128, None | tanh | 500 | 128 |  mae (loss): 0.5595, mse: 0.6199 |
256, None | tanh | 500 | 128 |  mae (loss): 0.5004, mse: 0.4036 |
64, 32 | tanh | 500 | 128 |  mae (loss): 0.6168, mse: 0.5923 |
128, 64 | tanh | 500 | 128 |  mae (loss): 0.0521, mse: 0.0055 |
64, None | relu | 1000 | 128 |  mae (loss): 0.5693, mse: 0.5552 |
128, None | relu | 1000 | 128 |  mae (loss): 0.0504, mse: 0.0052 |
256, None | relu | 1000 | 128 |  mae (loss): 0.0432, mse: 0.0040 |
64, 32 | relu | 1000 | 128 |  mae (loss): 0.6587, mse: 0.8470 |
128, 64 | relu | 1000 | 128 |  mae (loss): 0.6132, mse: 0.7916 |
64, None | selu | 1000 | 128 |  mae (loss): 0.5706, mse: 0.6322 |
128, None | selu | 1000 | 128 |  mae (loss): 0.0460, mse: 0.0044 |
256, None | selu | 1000 | 128 |  mae (loss): 0.0441, mse: 0.0040 |
64, 32 | selu | 1000 | 128 |  mae (loss): 0.0503, mse: 0.0051 |
128, 64 | selu | 1000 | 128 |  mae (loss): 0.0449, mse: 0.0041 |
64, None | tanh | 1000 | 128 |  mae (loss): 0.5727, mse: 0.6228 |
128, None | tanh | 1000 | 128 |  mae (loss): 0.5409, mse: 0.5551 |
256, None | tanh | 1000 | 128 |  mae (loss): 0.0471, mse: 0.0045 |
64, 32 | tanh | 1000 | 128 |  mae (loss): 0.6182, mse: 0.6194 |
128, 64 | tanh | 1000 | 128 |  mae (loss): 0.5612, mse: 0.4772 |


Best result: neurons=128, 64, activation=tanh, batch_size=32, epochs=1000, MAE=0.0398, MSE=0.0032
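The "Best result" line is the row with the lowest test MAE (MAE is also the training loss in these runs). A minimal sketch of that selection, assuming the rows are kept as plain strings in the table's `neurons | activation | epochs | batch size | metrics` layout. The helper `parse_row` is hypothetical, and only four rows from the batch-size-32, 1000-epoch block are included for brevity:

```python
import re

# A few rows copied verbatim from the SGD table (batch size 32, 1000 epochs).
ROWS = [
    "128, 64 | selu | 1000 | 32 |  mae (loss): 0.0403, mse: 0.0035 |",
    "64, None | tanh | 1000 | 32 |  mae (loss): 0.0450, mse: 0.0042 |",
    "128, 64 | tanh | 1000 | 32 |  mae (loss): 0.0398, mse: 0.0032 |",
    "128, 64 | relu | 1000 | 32 |  mae (loss): 0.0420, mse: 0.0037 |",
]

# Pulls the two numbers out of the "mae (loss): X, mse: Y" cell.
METRICS = re.compile(r"mae \(loss\): ([0-9.]+), mse: ([0-9.]+)")


def parse_row(row):
    """Hypothetical helper: split one table row into a dict of typed values."""
    neurons, activation, epochs, batch_size, metrics = [
        cell.strip() for cell in row.strip().strip("|").split("|")
    ]
    mae, mse = (float(x) for x in METRICS.search(metrics).groups())
    return {
        "neurons": neurons,
        "activation": activation,
        "epochs": int(epochs),
        "batch_size": int(batch_size),
        "mae": mae,
        "mse": mse,
    }


results = [parse_row(r) for r in ROWS]
# The run with the lowest test MAE is reported as the best one.
best = min(results, key=lambda r: r["mae"])
print(best)
```

Feeding every row of the table into `results` would reproduce the selection over the whole grid.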

### Nesterov Adam (Nadam) optimizer results

- Before Feedback:  
Nadam(lr=3e-4, beta_1=0.9, beta_2=0.999)


| Activation function  | Epochs     | Batch size   | Results on test set       |
|----------------------|------------|--------------|---------------------------|
| relu                 | 100        | 16             | mse: 0.0035 - mae: 0.0409              | 
| relu                 | 100        | 32             | mse: 0.0033 - mae: 0.0394              | 
| relu                 | 100        | 64             | mse: 0.0035 - mae: 0.0405              | 
| relu                 | 100        | 128            | mse: 0.0041 - mae: 0.0442              |
| **relu**             | **1000**   | **16**         | **mse: 0.0032 - mae: 0.0393**          | 
| relu                 | 1000       | 32             | mse: 0.0033 - mae: 0.0396              | 
| relu                 | 1000       | 64             | mse: 0.0036 - mae: 0.0412              | 
| relu                 | 1000       | 128            | mse: 0.0033 - mae: 0.0397              |
| tanh                 | 100        | 16             | mse: 0.0045 - mae: 0.0464              | 
| tanh                 | 100        | 32             | mse: 0.0040 - mae: 0.0438              | 
| tanh                 | 100        | 64             | mse: 0.0042 - mae: 0.0450              | 
| tanh                 | 100        | 128            | mse: 0.0047 - mae: 0.0483              |
| tanh                 | 1000       | 16             | mse: 0.0044 - mae: 0.0459              | 
| tanh                 | 1000       | 32             | mse: 0.0041 - mae: 0.0445              | 
| tanh                 | 1000       | 64             | mse: 0.0041 - mae: 0.0447              | 
| tanh                 | 1000       | 128            | mse: 0.0048 - mae: 0.0484              | 
| selu                 | 100        | 16             | mse: 0.0034 - mae: 0.0400              |
| selu                 | 100        | 32             | mse: 0.0039 - mae: 0.0427              | 
| selu                 | 100        | 64             | mse: 0.0041 - mae: 0.0443              | 
| selu                 | 100        | 128            | mse: 0.0047 - mae: 0.0478              | 
| selu                 | 1000       | 16             | mse: 0.0041 - mae: 0.0443              |
| selu                 | 1000       | 32             | mse: 0.0039 - mae: 0.0430              | 
| selu                 | 1000       | 64             | mse: 0.0041 - mae: 0.0444              | 
| selu                 | 1000       | 128            | mse: 0.0046 - mae: 0.0473              | 

- Before Feedback with BatchNormalization()  
Nadam(lr=3e-4, beta_1=0.9, beta_2=0.999)

| Activation function  | Epochs     | Batch size   | Results on test set       |
|----------------------|------------|--------------|---------------------------|
| relu                 | 100        | 16             | mse: 0.0074 - mae: 0.0679              | 
| relu                 | 100        | 32             | mse: 0.0050 - mae: 0.0526              | 
| relu                 | 100        | 64             | mse: 0.0060 - mae: 0.0597              | 
| **relu**                 | **100**        | **128**            | **mse: 0.0041 - mae: 0.0472**              |
| tanh                 | 100        | 16             | mse: 0.0054 - mae: 0.0537              |
| tanh                 | 100        | 32             | mse: 0.0071 - mae: 0.0663              | 
| tanh                 | 100        | 64             | mse: 0.0059 - mae: 0.0584              | 
| tanh                 | 100        | 128            | mse: 0.0061 - mae: 0.0596              |
| selu                 | 100        | 16             | mse: 0.0064 - mae: 0.0614              |
| selu                 | 100        | 32             | mse: 0.0050 - mae: 0.0534              | 
| selu                 | 100        | 64             | mse: 0.0129 - mae: 0.0910              | 
| selu                 | 100        | 128            | mse: 0.0103 - mae: 0.0823              | 
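Reading the two before-feedback tables side by side, every 100-epoch configuration has a higher test MAE once `BatchNormalization()` is added. A small sketch of that comparison, with the MAE values copied from the two tables; the dictionary names are illustrative only:

```python
# Test MAE at 100 epochs, keyed by (activation, batch size),
# copied from the "Before Feedback" table.
mae_plain = {
    ("relu", 16): 0.0409, ("relu", 32): 0.0394, ("relu", 64): 0.0405, ("relu", 128): 0.0442,
    ("tanh", 16): 0.0464, ("tanh", 32): 0.0438, ("tanh", 64): 0.0450, ("tanh", 128): 0.0483,
    ("selu", 16): 0.0400, ("selu", 32): 0.0427, ("selu", 64): 0.0443, ("selu", 128): 0.0478,
}
# The same configurations from the "Before Feedback with BatchNormalization()" table.
mae_batchnorm = {
    ("relu", 16): 0.0679, ("relu", 32): 0.0526, ("relu", 64): 0.0597, ("relu", 128): 0.0472,
    ("tanh", 16): 0.0537, ("tanh", 32): 0.0663, ("tanh", 64): 0.0584, ("tanh", 128): 0.0596,
    ("selu", 16): 0.0614, ("selu", 32): 0.0534, ("selu", 64): 0.0910, ("selu", 128): 0.0823,
}

# A positive difference means the BatchNormalization run had a worse test MAE.
diffs = {cfg: round(mae_batchnorm[cfg] - mae_plain[cfg], 4) for cfg in mae_plain}
for cfg, d in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(cfg, f"{d:+.4f}")
```

The gap is smallest for ReLU with batch size 128 (the bolded BatchNormalization row) and largest for SELU with batch size 64.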


- After Feedback:  
Nadam(lr=1e-4, beta_1=0.9, beta_2=0.999)

| Neurons | Activation function | Epochs | Batch size | Results on test set|
|---------|---------------------|--------|------------|--------------------|
64, None | relu | 100 | 16 |  mae (loss): 0.0555, mse: 0.0054 |
128, None | relu | 100 | 16 |  mae (loss): 0.0685, mse: 0.0082 |
256, None | relu | 100 | 16 |  mae (loss): 0.0861, mse: 0.0126 |
64, 32 | relu | 100 | 16 |  mae (loss): 0.0605, mse: 0.0067 |
128, 64 | relu | 100 | 16 |  mae (loss): 0.0538, mse: 0.0052 |
64, None | selu | 100 | 16 |  mae (loss): 0.0866, mse: 0.0119 |
128, None | selu | 100 | 16 |  mae (loss): 0.0691, mse: 0.0080 |
256, None | selu | 100 | 16 |  mae (loss): 0.0897, mse: 0.0131 |
64, 32 | selu | 100 | 16 |  mae (loss): 0.0564, mse: 0.0059 |
128, 64 | selu | 100 | 16 |  mae (loss): 0.0490, mse: 0.0043 |
64, None | tanh | 100 | 16 |  mae (loss): 0.0557, mse: 0.0061 |
128, None | tanh | 100 | 16 |  mae (loss): 0.0809, mse: 0.0110 |
256, None | tanh | 100 | 16 |  mae (loss): 0.0752, mse: 0.0104 |
64, 32 | tanh | 100 | 16 |  mae (loss): 0.0823, mse: 0.0120 |
128, 64 | tanh | 100 | 16 |  mae (loss): 0.0750, mse: 0.0097 |
64, None | relu | 500 | 16 |  mae (loss): 0.0653, mse: 0.0089 |
128, None | relu | 500 | 16 |  mae (loss): 0.0967, mse: 0.0150 |
256, None | relu | 500 | 16 |  mae (loss): 0.0697, mse: 0.0086 |
64, 32 | relu | 500 | 16 |  mae (loss): 0.0582, mse: 0.0063 |
128, 64 | relu | 500 | 16 |  mae (loss): 0.0489, mse: 0.0047 |
64, None | selu | 500 | 16 |  mae (loss): 0.0901, mse: 0.0132 |
128, None | selu | 500 | 16 |  mae (loss): 0.0701, mse: 0.0084 |
256, None | selu | 500 | 16 |  mae (loss): 0.0831, mse: 0.0115 |
64, 32 | selu | 500 | 16 |  mae (loss): 0.0740, mse: 0.0095 |
128, 64 | selu | 500 | 16 |  mae (loss): 0.0729, mse: 0.0089 |
64, None | tanh | 500 | 16 |  mae (loss): 0.0725, mse: 0.0091 |
128, None | tanh | 500 | 16 |  mae (loss): 0.0824, mse: 0.0115 |
256, None | tanh | 500 | 16 |  mae (loss): 0.0962, mse: 0.0157 |
64, 32 | tanh | 500 | 16 |  mae (loss): 0.0587, mse: 0.0063 |
128, 64 | tanh | 500 | 16 |  mae (loss): 0.0813, mse: 0.0112 |
64, None | relu | 1000 | 16 |  mae (loss): 0.0611, mse: 0.0065 |
128, None | relu | 1000 | 16 |  mae (loss): 0.0913, mse: 0.0138 |
256, None | relu | 1000 | 16 |  mae (loss): 0.0935, mse: 0.0152 |
64, 32 | relu | 1000 | 16 |  mae (loss): 0.0514, mse: 0.0052 |
128, 64 | relu | 1000 | 16 |  mae (loss): 0.0635, mse: 0.0071 |
64, None | selu | 1000 | 16 |  mae (loss): 0.1038, mse: 0.0176 |
128, None | selu | 1000 | 16 |  mae (loss): 0.0871, mse: 0.0121 |
256, None | selu | 1000 | 16 |  mae (loss): 0.0784, mse: 0.0097 |
64, 32 | selu | 1000 | 16 |  mae (loss): 0.0638, mse: 0.0069 |
128, 64 | selu | 1000 | 16 |  mae (loss): 0.0634, mse: 0.0066 |
64, None | tanh | 1000 | 16 |  mae (loss): 0.0945, mse: 0.0148 |
128, None | tanh | 1000 | 16 |  mae (loss): 0.0785, mse: 0.0106 |
256, None | tanh | 1000 | 16 |  mae (loss): 0.1204, mse: 0.0244 |
64, 32 | tanh | 1000 | 16 |  mae (loss): 0.0613, mse: 0.0078 |
128, 64 | tanh | 1000 | 16 |  mae (loss): 0.0743, mse: 0.0089 |
64, None | relu | 100 | 32 |  mae (loss): 0.0526, mse: 0.0063 |
128, None | relu | 100 | 32 |  mae (loss): 0.0528, mse: 0.0052 |
256, None | relu | 100 | 32 |  mae (loss): 0.0865, mse: 0.0118 |
64, 32 | relu | 100 | 32 |  mae (loss): 0.0456, mse: 0.0042 |
128, 64 | relu | 100 | 32 |  mae (loss): 0.0455, mse: 0.0037 |
64, None | selu | 100 | 32 |  mae (loss): 0.0701, mse: 0.0082 |
128, None | selu | 100 | 32 |  mae (loss): 0.0616, mse: 0.0066 |
256, None | selu | 100 | 32 |  mae (loss): 0.1230, mse: 0.0251 |
64, 32 | selu | 100 | 32 |  mae (loss): 0.0443, mse: 0.0036 |
128, 64 | selu | 100 | 32 |  mae (loss): 0.0551, mse: 0.0053 |
64, None | tanh | 100 | 32 |  mae (loss): 0.0574, mse: 0.0059 |
128, None | tanh | 100 | 32 |  mae (loss): 0.0583, mse: 0.0059 |
256, None | tanh | 100 | 32 |  mae (loss): 0.0803, mse: 0.0111 |
64, 32 | tanh | 100 | 32 |  mae (loss): 0.0644, mse: 0.0073 |
128, 64 | tanh | 100 | 32 |  mae (loss): 0.0522, mse: 0.0048 |
64, None | relu | 500 | 32 |  mae (loss): 0.0509, mse: 0.0054 |
128, None | relu | 500 | 32 |  mae (loss): 0.0540, mse: 0.0054 |
256, None | relu | 500 | 32 |  mae (loss): 0.0700, mse: 0.0079 |
64, 32 | relu | 500 | 32 |  mae (loss): 0.0592, mse: 0.0060 |
128, 64 | relu | 500 | 32 |  mae (loss): 0.0457, mse: 0.0040 |
64, None | selu | 500 | 32 |  mae (loss): 0.0797, mse: 0.0102 |
128, None | selu | 500 | 32 |  mae (loss): 0.0596, mse: 0.0060 |
256, None | selu | 500 | 32 |  mae (loss): 0.0899, mse: 0.0128 |
64, 32 | selu | 500 | 32 |  mae (loss): 0.0492, mse: 0.0044 |
128, 64 | selu | 500 | 32 |  mae (loss): 0.0551, mse: 0.0052 |
64, None | tanh | 500 | 32 |  mae (loss): 0.0566, mse: 0.0063 |
128, None | tanh | 500 | 32 |  mae (loss): 0.0650, mse: 0.0071 |
256, None | tanh | 500 | 32 |  mae (loss): 0.1004, mse: 0.0169 |
64, 32 | tanh | 500 | 32 |  mae (loss): 0.0500, mse: 0.0045 |
128, 64 | tanh | 500 | 32 |  mae (loss): 0.0735, mse: 0.0095 |
64, None | relu | 1000 | 32 |  mae (loss): 0.0575, mse: 0.0066 |
128, None | relu | 1000 | 32 |  mae (loss): 0.0687, mse: 0.0080 |
256, None | relu | 1000 | 32 |  mae (loss): 0.0662, mse: 0.0073 |
64, 32 | relu | 1000 | 32 |  mae (loss): 0.0514, mse: 0.0049 |
128, 64 | relu | 1000 | 32 |  mae (loss): 0.0439, mse: 0.0036 |
64, None | selu | 1000 | 32 |  mae (loss): 0.0691, mse: 0.0080 |
128, None | selu | 1000 | 32 |  mae (loss): 0.0701, mse: 0.0076 |
256, None | selu | 1000 | 32 |  mae (loss): 0.1051, mse: 0.0167 |
64, 32 | selu | 1000 | 32 |  mae (loss): 0.0590, mse: 0.0061 |
128, 64 | selu | 1000 | 32 |  mae (loss): 0.0613, mse: 0.0063 |
64, None | tanh | 1000 | 32 |  mae (loss): 0.0661, mse: 0.0072 |
128, None | tanh | 1000 | 32 |  mae (loss): 0.0656, mse: 0.0075 |
256, None | tanh | 1000 | 32 |  mae (loss): 0.0959, mse: 0.0148 |
64, 32 | tanh | 1000 | 32 |  mae (loss): 0.0548, mse: 0.0055 |
128, 64 | tanh | 1000 | 32 |  mae (loss): 0.0603, mse: 0.0060 |
64, None | relu | 100 | 64 |  mae (loss): 0.0582, mse: 0.0061 |
128, None | relu | 100 | 64 |  mae (loss): 0.0576, mse: 0.0057 |
256, None | relu | 100 | 64 |  mae (loss): 0.0542, mse: 0.0051 |
64, 32 | relu | 100 | 64 |  mae (loss): 0.0424, mse: 0.0036 |
128, 64 | relu | 100 | 64 |  mae (loss): 0.0509, mse: 0.0048 |
64, None | selu | 100 | 64 |  mae (loss): 0.0614, mse: 0.0064 |
128, None | selu | 100 | 64 |  mae (loss): 0.0685, mse: 0.0076 |
256, None | selu | 100 | 64 |  mae (loss): 0.0801, mse: 0.0100 |
64, 32 | selu | 100 | 64 |  mae (loss): 0.0459, mse: 0.0040 |
128, 64 | selu | 100 | 64 |  mae (loss): 0.0614, mse: 0.0063 |
64, None | tanh | 100 | 64 |  mae (loss): 0.0741, mse: 0.0091 |
128, None | tanh | 100 | 64 |  mae (loss): 0.0623, mse: 0.0063 |
256, None | tanh | 100 | 64 |  mae (loss): 0.1688, mse: 0.0380 |
64, 32 | tanh | 100 | 64 |  mae (loss): 0.0625, mse: 0.0065 |
128, 64 | tanh | 100 | 64 |  mae (loss): 0.0846, mse: 0.0112 |
64, None | relu | 500 | 64 |  mae (loss): 0.0598, mse: 0.0062 |
128, None | relu | 500 | 64 |  mae (loss): 0.0549, mse: 0.0054 |
256, None | relu | 500 | 64 |  mae (loss): 0.0595, mse: 0.0058 |
64, 32 | relu | 500 | 64 |  mae (loss): 0.0436, mse: 0.0037 |
128, 64 | relu | 500 | 64 |  mae (loss): 0.0518, mse: 0.0048 |
64, None | selu | 500 | 64 |  mae (loss): 0.0575, mse: 0.0058 |
128, None | selu | 500 | 64 |  mae (loss): 0.0769, mse: 0.0094 |
256, None | selu | 500 | 64 |  mae (loss): 0.0658, mse: 0.0075 |
64, 32 | selu | 500 | 64 |  mae (loss): 0.0497, mse: 0.0046 |
128, 64 | selu | 500 | 64 |  mae (loss): 0.0588, mse: 0.0060 |
64, None | tanh | 500 | 64 |  mae (loss): 0.0513, mse: 0.0049 |
128, None | tanh | 500 | 64 |  mae (loss): 0.1056, mse: 0.0166 |
256, None | tanh | 500 | 64 |  mae (loss): 0.0700, mse: 0.0079 |
64, 32 | tanh | 500 | 64 |  mae (loss): 0.0548, mse: 0.0053 |
128, 64 | tanh | 500 | 64 |  mae (loss): 0.0724, mse: 0.0087 |
64, None | relu | 1000 | 64 |  mae (loss): 0.0518, mse: 0.0050 |
128, None | relu | 1000 | 64 |  mae (loss): 0.0471, mse: 0.0043 |
256, None | relu | 1000 | 64 |  mae (loss): 0.0679, mse: 0.0077 |
64, 32 | relu | 1000 | 64 |  mae (loss): 0.0530, mse: 0.0063 |
128, 64 | relu | 1000 | 64 |  mae (loss): 0.0476, mse: 0.0043 |
64, None | selu | 1000 | 64 |  mae (loss): 0.0565, mse: 0.0055 |
128, None | selu | 1000 | 64 |  mae (loss): 0.0725, mse: 0.0089 |
256, None | selu | 1000 | 64 |  mae (loss): 0.0794, mse: 0.0108 |
64, 32 | selu | 1000 | 64 |  mae (loss): 0.0483, mse: 0.0043 |
128, 64 | selu | 1000 | 64 |  mae (loss): 0.0614, mse: 0.0061 |
64, None | tanh | 1000 | 64 |  mae (loss): 0.0524, mse: 0.0050 |
128, None | tanh | 1000 | 64 |  mae (loss): 0.0718, mse: 0.0082 |
256, None | tanh | 1000 | 64 |  mae (loss): 0.0701, mse: 0.0089 |
64, 32 | tanh | 1000 | 64 |  mae (loss): 0.0493, mse: 0.0044 |
128, 64 | tanh | 1000 | 64 |  mae (loss): 0.0667, mse: 0.0073 |
64, None | relu | 100 | 128 |  mae (loss): 0.0513, mse: 0.0049 |
128, None | relu | 100 | 128 |  mae (loss): 0.0554, mse: 0.0055 |
256, None | relu | 100 | 128 |  mae (loss): 0.0823, mse: 0.0102 |
64, 32 | relu | 100 | 128 |  mae (loss): 0.0422, mse: 0.0035 |
128, 64 | relu | 100 | 128 |  mae (loss): 0.0440, mse: 0.0036 |
64, None | selu | 100 | 128 |  mae (loss): 0.0556, mse: 0.0055 |
128, None | selu | 100 | 128 |  mae (loss): 0.0767, mse: 0.0097 |
256, None | selu | 100 | 128 |  mae (loss): 0.1208, mse: 0.0215 |
64, 32 | selu | 100 | 128 |  mae (loss): 0.0509, mse: 0.0047 |
128, 64 | selu | 100 | 128 |  mae (loss): 0.0446, mse: 0.0036 |
64, None | tanh | 100 | 128 |  mae (loss): 0.0534, mse: 0.0051 |
128, None | tanh | 100 | 128 |  mae (loss): 0.0577, mse: 0.0056 |
256, None | tanh | 100 | 128 |  mae (loss): 0.2153, mse: 0.0647 |
64, 32 | tanh | 100 | 128 |  mae (loss): 0.0511, mse: 0.0045 |
128, 64 | tanh | 100 | 128 |  mae (loss): 0.0667, mse: 0.0077 |
64, None | relu | 500 | 128 |  mae (loss): 0.0522, mse: 0.0052 |
128, None | relu | 500 | 128 |  mae (loss): 0.0487, mse: 0.0045 |
256, None | relu | 500 | 128 |  mae (loss): 0.0994, mse: 0.0151 |
64, 32 | relu | 500 | 128 |  mae (loss): 0.0431, mse: 0.0037 |
128, 64 | relu | 500 | 128 |  mae (loss): 0.0476, mse: 0.0042 |
64, None | selu | 500 | 128 |  mae (loss): 0.0628, mse: 0.0068 |
128, None | selu | 500 | 128 |  mae (loss): 0.0992, mse: 0.0148 |
256, None | selu | 500 | 128 |  mae (loss): 0.1198, mse: 0.0215 |
64, 32 | selu | 500 | 128 |  mae (loss): 0.0609, mse: 0.0063 |
128, 64 | selu | 500 | 128 |  mae (loss): 0.0534, mse: 0.0050 |
64, None | tanh | 500 | 128 |  mae (loss): 0.0478, mse: 0.0043 |
128, None | tanh | 500 | 128 |  mae (loss): 0.0680, mse: 0.0079 |
256, None | tanh | 500 | 128 |  mae (loss): 0.1204, mse: 0.0228 |
64, 32 | tanh | 500 | 128 |  mae (loss): 0.0629, mse: 0.0066 |
128, 64 | tanh | 500 | 128 |  mae (loss): 0.0648, mse: 0.0072 |
64, None | relu | 1000 | 128 |  mae (loss): 0.0542, mse: 0.0054 |
128, None | relu | 1000 | 128 |  mae (loss): 0.0512, mse: 0.0050 |
256, None | relu | 1000 | 128 |  mae (loss): 0.0640, mse: 0.0068 |
64, 32 | relu | 1000 | 128 |  mae (loss): 0.0441, mse: 0.0039 |
128, 64 | relu | 1000 | 128 |  mae (loss): 0.0491, mse: 0.0046 |
64, None | selu | 1000 | 128 |  mae (loss): 0.0534, mse: 0.0051 |
128, None | selu | 1000 | 128 |  mae (loss): 0.0876, mse: 0.0120 |
256, None | selu | 1000 | 128 |  mae (loss): 0.0907, mse: 0.0126 |
64, 32 | selu | 1000 | 128 |  mae (loss): 0.0493, mse: 0.0044 |
128, 64 | selu | 1000 | 128 |  mae (loss): 0.0523, mse: 0.0049 |
64, None | tanh | 1000 | 128 |  mae (loss): 0.0535, mse: 0.0053 |
128, None | tanh | 1000 | 128 |  mae (loss): 0.0773, mse: 0.0097 |
256, None | tanh | 1000 | 128 |  mae (loss): 0.2118, mse: 0.0652 |
64, 32 | tanh | 1000 | 128 |  mae (loss): 0.0541, mse: 0.0050 |
128, 64 | tanh | 1000 | 128 |  mae (loss): 0.0552, mse: 0.0053 |


Best result: neurons=64, 32, activation=relu, batch_size=128, epochs=100, MAE=0.0422, MSE=0.0035
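The after-feedback Nadam table has 180 rows: 5 architectures × 3 activations × 3 epoch counts × 4 batch sizes. A minimal sketch of how that grid can be enumerated in the same order as the table, assuming `None` marks a missing second hidden layer as in the Neurons column. Building and fitting the Keras model for each configuration is deliberately left out:

```python
from itertools import product

# Hyperparameter values that appear in the after-feedback table.
architectures = [(64, None), (128, None), (256, None), (64, 32), (128, 64)]
activations = ["relu", "selu", "tanh"]
epochs_list = [100, 500, 1000]
batch_sizes = [16, 32, 64, 128]

# product() varies its last argument fastest, so this matches the table order:
# batch size outermost, then epochs, then activation, then architecture.
grid = [
    {"neurons": arch, "activation": act, "epochs": ep, "batch_size": bs}
    for bs, ep, act, arch in product(batch_sizes, epochs_list, activations, architectures)
]

print(len(grid))  # 180
print(grid[0])    # the configuration of the first table row
```

Each entry in `grid` corresponds to one row of the table, so the results can be zipped back onto these dictionaries before picking the best run.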