# MLP with Keras

The problem we tackle here is attempting to learn a non-linear function. 

## Imports

In [58]:
import numpy as np
import random
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from tensorflow import keras


## The data

X is a bunch of four-length vectors. Y is a bunch of two length vectors. 
y0 = (x0+x1)^2 and y1 = (x2+x3)^2

In [15]:
X = np.array([[random.random() for i in range(4)] for j in range(500)])

In [16]:
X

array([[0.59594063, 0.18556316, 0.98459029, 0.79396285],
       [0.7664722 , 0.15209217, 0.80891076, 0.46117383],
       [0.40745343, 0.5930682 , 0.70656438, 0.65307099],
       ...,
       [0.55620659, 0.07214845, 0.60118177, 0.08206728],
       [0.74428055, 0.19686998, 0.92920736, 0.34143322],
       [0.5666348 , 0.51091642, 0.44214454, 0.69800628]])

In [21]:
Y = np.array([[(X[i][0] + X[i][1])**2, (X[i][3] + X[1][3])**2] for i in range(500)])

In [22]:
Y

array([[6.10748164e-01, 1.57536807e+00],
       [8.43760507e-01, 8.50725207e-01],
       [1.00104353e+00, 1.24154152e+00],
       [4.33554744e-02, 8.46942867e-01],
       [8.07141028e-01, 2.26697632e-01],
       [1.56070618e+00, 4.17417574e-01],
       [2.19060107e+00, 6.42453741e-01],
       [6.45733980e-02, 5.77371450e-01],
       [2.26178461e+00, 1.42864230e+00],
       [5.67227569e-01, 1.31460874e+00],
       [1.98853322e+00, 1.47256761e+00],
       [1.66936377e+00, 3.77076680e-01],
       [4.30181165e-01, 9.95627951e-01],
       [1.09670280e+00, 1.94694727e+00],
       [1.44366413e+00, 1.49808601e+00],
       [2.75961615e+00, 7.07789651e-01],
       [1.45258279e+00, 3.28280314e-01],
       [9.33385473e-01, 2.19194300e-01],
       [2.09827077e-01, 7.14133118e-01],
       [1.53650856e-01, 1.91989670e+00],
       [6.30870894e-01, 4.99070683e-01],
       [1.67625385e+00, 6.43199442e-01],
       [1.36201368e+00, 1.15269526e+00],
       [1.94919482e-01, 4.15097155e-01],
       [2.378240

## Input

The above can be given directly as input. We will train the whole thing as one batch. 

Before we write our own MLP, we will test if the data converges using the MLPRegressor model provided in scikit-learn

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state = 1)

In [28]:
regr = MLPRegressor(random_state = 1, max_iter = 500, verbose = True).fit(X_train, y_train)

Iteration 1, loss = 0.98878929
Iteration 2, loss = 0.93196232
Iteration 3, loss = 0.87759817
Iteration 4, loss = 0.82392072
Iteration 5, loss = 0.77410509
Iteration 6, loss = 0.72550599
Iteration 7, loss = 0.67952670
Iteration 8, loss = 0.63518745
Iteration 9, loss = 0.59256862
Iteration 10, loss = 0.55219221
Iteration 11, loss = 0.51486829
Iteration 12, loss = 0.47821968
Iteration 13, loss = 0.44353076
Iteration 14, loss = 0.41137487
Iteration 15, loss = 0.38159616
Iteration 16, loss = 0.35362554
Iteration 17, loss = 0.32721202
Iteration 18, loss = 0.30357403
Iteration 19, loss = 0.28062569
Iteration 20, loss = 0.26099770
Iteration 21, loss = 0.24329345
Iteration 22, loss = 0.22736430
Iteration 23, loss = 0.21316705
Iteration 24, loss = 0.20120390
Iteration 25, loss = 0.19081073
Iteration 26, loss = 0.18181496
Iteration 27, loss = 0.17446240
Iteration 28, loss = 0.16867304
Iteration 29, loss = 0.16348780
Iteration 30, loss = 0.15913208
Iteration 31, loss = 0.15553468
Iteration 32, los

In [30]:
regr.predict(X_test)

array([[ 1.17029031,  0.88218779],
       [ 1.98621892,  0.17452881],
       [ 1.56236689,  0.59444916],
       [ 1.51412392,  1.59829281],
       [ 2.5552409 ,  1.72616195],
       [ 1.12566115,  0.19879228],
       [ 1.04986135,  1.70534047],
       [ 0.7696764 ,  0.61150714],
       [ 1.48588212,  1.44314823],
       [ 1.72586205,  1.52619437],
       [ 1.12247561,  1.2586208 ],
       [ 0.85922021,  1.36888576],
       [ 1.77997968,  0.07213514],
       [ 1.14144572,  0.36040755],
       [ 1.75022051,  1.10761443],
       [ 2.86613349,  1.82720056],
       [ 1.4580316 ,  1.18949924],
       [ 0.42203753,  0.85263025],
       [ 2.18726855,  0.44167762],
       [ 0.90716034,  1.19061872],
       [ 0.62129312,  1.03948622],
       [ 0.75628139,  0.08595717],
       [ 1.36216928,  1.23193792],
       [ 2.28361756,  1.60943488],
       [ 1.67663479,  0.74662744],
       [ 1.06238118,  1.92035957],
       [ 0.31665615,  0.92848061],
       [ 1.25974955,  0.71631187],
       [ 1.72278198,

In [32]:
y_test

array([[1.06302696e+00, 8.15276414e-01],
       [2.01994608e+00, 2.73669442e-01],
       [1.44617401e+00, 5.59786508e-01],
       [1.45191376e+00, 1.59655466e+00],
       [2.97926623e+00, 1.79445872e+00],
       [9.66473996e-01, 2.88984729e-01],
       [9.02952767e-01, 1.72139423e+00],
       [6.35325515e-01, 5.63394574e-01],
       [1.37280280e+00, 1.39032869e+00],
       [1.63178411e+00, 1.49752984e+00],
       [1.00004917e+00, 1.18462685e+00],
       [6.99479992e-01, 1.30712245e+00],
       [1.77489087e+00, 2.25199201e-01],
       [9.82401514e-01, 3.80281102e-01],
       [1.72881895e+00, 1.03141845e+00],
       [3.56574719e+00, 1.94871248e+00],
       [1.30396862e+00, 1.10751669e+00],
       [3.78870407e-01, 7.64414722e-01],
       [2.28573567e+00, 4.55581011e-01],
       [7.75927600e-01, 1.10173463e+00],
       [5.50274715e-01, 9.42464874e-01],
       [6.38158040e-01, 2.27753263e-01],
       [1.28518433e+00, 1.17816077e+00],
       [2.45843228e+00, 1.67056516e+00],
       [1.554419

In [33]:
regr.score(X_test, y_test)

0.9654985538080119

In [35]:
regr.n_layers_

3

### Comments
So, it is clear that neural networks are used for regression as well, and they are quite good at it.  
I will be now trying to replicate exactly what this does using keras. 

Don't forget the existence of scikit-learn, and always see if you can do the easy things and test with scikit-learn. 

## Model

DAGs of layers are called models. There are two ways to define models,
use pre-existing layers and stack them together (the functional API)
or define your own. 


In [36]:
# We define a model and add one Dense layer to it. 

In [41]:
inputs = keras.Input(shape = (4,), dtype="float32")

In [42]:
hidden = keras.layers.Dense(30, activation = "relu")(inputs)

In [43]:
outputs = keras.layers.Dense(2, activation = "relu")(hidden)

In [46]:
model = keras.Model(inputs=inputs, outputs=outputs, name = "MLP")

In [47]:
model.summary()

Model: "MLP"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 4)]               0         
_________________________________________________________________
dense_1 (Dense)              (None, 30)                150       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 62        
Total params: 212
Trainable params: 212
Non-trainable params: 0
_________________________________________________________________


In [50]:
optimizer = keras.optimizers.Adam()

In [51]:
loss = keras.losses.MeanSquaredError()

In [52]:
model.compile(optimizer = optimizer, loss = loss, metrics = ['accuracy', 'mse'])

## Training all together

In [53]:
model.fit(X, Y, epochs = 500, verbose = 2)

Epoch 1/500
16/16 - 0s - loss: 1.4796 - accuracy: 0.4400 - mse: 1.4796
Epoch 2/500
16/16 - 0s - loss: 1.3118 - accuracy: 0.4720 - mse: 1.3118
Epoch 3/500
16/16 - 0s - loss: 1.0751 - accuracy: 0.4780 - mse: 1.0751
Epoch 4/500
16/16 - 0s - loss: 0.7766 - accuracy: 0.4600 - mse: 0.7766
Epoch 5/500
16/16 - 0s - loss: 0.5344 - accuracy: 0.5280 - mse: 0.5344
Epoch 6/500
16/16 - 0s - loss: 0.3862 - accuracy: 0.6180 - mse: 0.3862
Epoch 7/500
16/16 - 0s - loss: 0.3112 - accuracy: 0.6280 - mse: 0.3112
Epoch 8/500
16/16 - 0s - loss: 0.2742 - accuracy: 0.6680 - mse: 0.2742
Epoch 9/500
16/16 - 0s - loss: 0.2466 - accuracy: 0.7220 - mse: 0.2466
Epoch 10/500
16/16 - 0s - loss: 0.2231 - accuracy: 0.7560 - mse: 0.2231
Epoch 11/500
16/16 - 0s - loss: 0.2016 - accuracy: 0.7820 - mse: 0.2016
Epoch 12/500
16/16 - 0s - loss: 0.1822 - accuracy: 0.8360 - mse: 0.1822
Epoch 13/500
16/16 - 0s - loss: 0.1643 - accuracy: 0.8580 - mse: 0.1643
Epoch 14/500
16/16 - 0s - loss: 0.1484 - accuracy: 0.8700 - mse: 0.1484
E

Epoch 115/500
16/16 - 0s - loss: 0.0045 - accuracy: 0.9860 - mse: 0.0045
Epoch 116/500
16/16 - 0s - loss: 0.0045 - accuracy: 0.9860 - mse: 0.0045
Epoch 117/500
16/16 - 0s - loss: 0.0044 - accuracy: 0.9840 - mse: 0.0044
Epoch 118/500
16/16 - 0s - loss: 0.0043 - accuracy: 0.9840 - mse: 0.0043
Epoch 119/500
16/16 - 0s - loss: 0.0043 - accuracy: 0.9840 - mse: 0.0043
Epoch 120/500
16/16 - 0s - loss: 0.0043 - accuracy: 0.9860 - mse: 0.0043
Epoch 121/500
16/16 - 0s - loss: 0.0042 - accuracy: 0.9860 - mse: 0.0042
Epoch 122/500
16/16 - 0s - loss: 0.0041 - accuracy: 0.9840 - mse: 0.0041
Epoch 123/500
16/16 - 0s - loss: 0.0041 - accuracy: 0.9840 - mse: 0.0041
Epoch 124/500
16/16 - 0s - loss: 0.0040 - accuracy: 0.9840 - mse: 0.0040
Epoch 125/500
16/16 - 0s - loss: 0.0040 - accuracy: 0.9860 - mse: 0.0040
Epoch 126/500
16/16 - 0s - loss: 0.0039 - accuracy: 0.9880 - mse: 0.0039
Epoch 127/500
16/16 - 0s - loss: 0.0039 - accuracy: 0.9820 - mse: 0.0039
Epoch 128/500
16/16 - 0s - loss: 0.0038 - accuracy:

Epoch 228/500
16/16 - 0s - loss: 9.7832e-04 - accuracy: 0.9900 - mse: 9.7832e-04
Epoch 229/500
16/16 - 0s - loss: 9.5942e-04 - accuracy: 0.9860 - mse: 9.5942e-04
Epoch 230/500
16/16 - 0s - loss: 9.7182e-04 - accuracy: 0.9880 - mse: 9.7182e-04
Epoch 231/500
16/16 - 0s - loss: 9.4276e-04 - accuracy: 0.9920 - mse: 9.4276e-04
Epoch 232/500
16/16 - 0s - loss: 9.3436e-04 - accuracy: 0.9920 - mse: 9.3436e-04
Epoch 233/500
16/16 - 0s - loss: 9.1143e-04 - accuracy: 0.9900 - mse: 9.1143e-04
Epoch 234/500
16/16 - 0s - loss: 9.1261e-04 - accuracy: 0.9900 - mse: 9.1261e-04
Epoch 235/500
16/16 - 0s - loss: 8.9456e-04 - accuracy: 0.9860 - mse: 8.9456e-04
Epoch 236/500
16/16 - 0s - loss: 8.8134e-04 - accuracy: 0.9940 - mse: 8.8134e-04
Epoch 237/500
16/16 - 0s - loss: 8.8466e-04 - accuracy: 0.9900 - mse: 8.8466e-04
Epoch 238/500
16/16 - 0s - loss: 8.6626e-04 - accuracy: 0.9900 - mse: 8.6626e-04
Epoch 239/500
16/16 - 0s - loss: 8.4590e-04 - accuracy: 0.9900 - mse: 8.4590e-04
Epoch 240/500
16/16 - 0s - l

16/16 - 0s - loss: 4.0116e-04 - accuracy: 0.9920 - mse: 4.0116e-04
Epoch 330/500
16/16 - 0s - loss: 3.9442e-04 - accuracy: 0.9940 - mse: 3.9442e-04
Epoch 331/500
16/16 - 0s - loss: 3.8651e-04 - accuracy: 0.9920 - mse: 3.8651e-04
Epoch 332/500
16/16 - 0s - loss: 3.8950e-04 - accuracy: 0.9960 - mse: 3.8950e-04
Epoch 333/500
16/16 - 0s - loss: 3.8383e-04 - accuracy: 0.9920 - mse: 3.8383e-04
Epoch 334/500
16/16 - 0s - loss: 3.8424e-04 - accuracy: 0.9940 - mse: 3.8424e-04
Epoch 335/500
16/16 - 0s - loss: 3.8311e-04 - accuracy: 0.9960 - mse: 3.8311e-04
Epoch 336/500
16/16 - 0s - loss: 3.7643e-04 - accuracy: 0.9960 - mse: 3.7643e-04
Epoch 337/500
16/16 - 0s - loss: 3.8815e-04 - accuracy: 0.9920 - mse: 3.8815e-04
Epoch 338/500
16/16 - 0s - loss: 3.7398e-04 - accuracy: 0.9960 - mse: 3.7398e-04
Epoch 339/500
16/16 - 0s - loss: 3.8475e-04 - accuracy: 0.9900 - mse: 3.8475e-04
Epoch 340/500
16/16 - 0s - loss: 3.8737e-04 - accuracy: 0.9920 - mse: 3.8737e-04
Epoch 341/500
16/16 - 0s - loss: 3.7692e-0

Epoch 431/500
16/16 - 0s - loss: 2.8920e-04 - accuracy: 0.9920 - mse: 2.8920e-04
Epoch 432/500
16/16 - 0s - loss: 2.6987e-04 - accuracy: 0.9940 - mse: 2.6987e-04
Epoch 433/500
16/16 - 0s - loss: 2.8397e-04 - accuracy: 0.9880 - mse: 2.8397e-04
Epoch 434/500
16/16 - 0s - loss: 2.6477e-04 - accuracy: 0.9920 - mse: 2.6477e-04
Epoch 435/500
16/16 - 0s - loss: 2.6951e-04 - accuracy: 0.9960 - mse: 2.6951e-04
Epoch 436/500
16/16 - 0s - loss: 2.7483e-04 - accuracy: 0.9940 - mse: 2.7483e-04
Epoch 437/500
16/16 - 0s - loss: 2.6215e-04 - accuracy: 0.9900 - mse: 2.6215e-04
Epoch 438/500
16/16 - 0s - loss: 2.6481e-04 - accuracy: 0.9920 - mse: 2.6481e-04
Epoch 439/500
16/16 - 0s - loss: 2.6786e-04 - accuracy: 0.9900 - mse: 2.6786e-04
Epoch 440/500
16/16 - 0s - loss: 2.5659e-04 - accuracy: 0.9940 - mse: 2.5659e-04
Epoch 441/500
16/16 - 0s - loss: 2.6135e-04 - accuracy: 0.9920 - mse: 2.6135e-04
Epoch 442/500
16/16 - 0s - loss: 2.6489e-04 - accuracy: 0.9960 - mse: 2.6489e-04
Epoch 443/500
16/16 - 0s - l

<tensorflow.python.keras.callbacks.History at 0x7f6f4b42b250>

## Testing

In [54]:
model.predict(X_test)

array([[1.0440782 , 0.8258589 ],
       [2.0093324 , 0.2710364 ],
       [1.4420075 , 0.582639  ],
       [1.4604161 , 1.6246014 ],
       [2.990481  , 1.8063687 ],
       [0.9604102 , 0.29500544],
       [0.8944757 , 1.7381955 ],
       [0.6221804 , 0.5738717 ],
       [1.3737028 , 1.401436  ],
       [1.6201532 , 1.5246603 ],
       [0.9864702 , 1.1976547 ],
       [0.68879664, 1.3201724 ],
       [1.7574049 , 0.21459852],
       [0.9748007 , 0.3924694 ],
       [1.7243252 , 1.0090148 ],
       [3.5236568 , 1.9423163 ],
       [1.286489  , 1.109611  ],
       [0.3748245 , 0.7475952 ],
       [2.282292  , 0.47905767],
       [0.78004366, 1.1142153 ],
       [0.544019  , 0.91665095],
       [0.6088043 , 0.21514542],
       [1.2721297 , 1.1807095 ],
       [2.4624035 , 1.6721895 ],
       [1.5490291 , 0.6912559 ],
       [0.92091626, 2.0596905 ],
       [0.33110917, 0.82759726],
       [1.1883221 , 0.6704893 ],
       [1.5958216 , 1.1578085 ],
       [2.1816232 , 0.21118222],
       [0.

In [55]:
y_test

array([[1.06302696e+00, 8.15276414e-01],
       [2.01994608e+00, 2.73669442e-01],
       [1.44617401e+00, 5.59786508e-01],
       [1.45191376e+00, 1.59655466e+00],
       [2.97926623e+00, 1.79445872e+00],
       [9.66473996e-01, 2.88984729e-01],
       [9.02952767e-01, 1.72139423e+00],
       [6.35325515e-01, 5.63394574e-01],
       [1.37280280e+00, 1.39032869e+00],
       [1.63178411e+00, 1.49752984e+00],
       [1.00004917e+00, 1.18462685e+00],
       [6.99479992e-01, 1.30712245e+00],
       [1.77489087e+00, 2.25199201e-01],
       [9.82401514e-01, 3.80281102e-01],
       [1.72881895e+00, 1.03141845e+00],
       [3.56574719e+00, 1.94871248e+00],
       [1.30396862e+00, 1.10751669e+00],
       [3.78870407e-01, 7.64414722e-01],
       [2.28573567e+00, 4.55581011e-01],
       [7.75927600e-01, 1.10173463e+00],
       [5.50274715e-01, 9.42464874e-01],
       [6.38158040e-01, 2.27753263e-01],
       [1.28518433e+00, 1.17816077e+00],
       [2.45843228e+00, 1.67056516e+00],
       [1.554419

In [60]:
r2_score(y_test, model.predict(X_test))

0.9995040401233306

## This looks good!