This is the solution example for the exercises given in [PredictWeight.ipynb](PredictWeight.ipynb)

# First exercise: Data normalization using StandardScaler

Data loading works the same way as before:

In [None]:
import numpy as np
import matplotlib.pyplot as pp
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1" #disable Tensorflow GPU usage, a simple example like this runs faster on CPU
import tensorflow as tf
from tensorflow import keras  
import pandas as pd

dataframe=pd.read_csv("https://raw.githubusercontent.com/PerttuHamalainen/MediaAI/master/Code/Datasets/weight-height.csv")
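data=np.array(dataframe)
data=data[:,1:] #drop the first column, keeping only height and weight
data=data.astype(np.float64) #np.float has been removed from recent NumPy versions
data[:,0]*=2.54 #inches to centimeters
data[:,1]*=0.45359237 #pounds to kilograms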
pp.scatter(data[:,0],data[:,1],marker=".")
pp.title("Relation of height and weight")
pp.xlabel("Height (centimeters)")
pp.ylabel("Weight (kilograms)")
pp.show()

Now, to optimize the data for neural networks, we scale it to zero mean and unit standard deviation using the StandardScaler.

In [None]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaler.fit(data)
scaled=scaler.transform(data)
pp.scatter(scaled[:,0],scaled[:,1],marker=".")
pp.title("Normalized data")
pp.show()
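
For reference, the scaling can also be reproduced by hand. The following is a minimal sketch on synthetic data (the `demo` array is made up for illustration and is not the height-weight dataset). It assumes the scaler's default behaviour of subtracting each column's mean and dividing by the population standard deviation.

```python
import numpy as np

# Synthetic two-column data standing in for (height, weight); the values are made up.
rng = np.random.default_rng(0)
demo = np.column_stack([
    rng.normal(170.0, 10.0, size=1000),
    rng.normal(70.0, 15.0, size=1000),
])

# Column-wise standardization: subtract the mean, divide by the standard deviation.
# np.std defaults to the population standard deviation (ddof=0).
mean = demo.mean(axis=0)
std = demo.std(axis=0)
demo_scaled = (demo - mean) / std

print("Column means after scaling:", demo_scaled.mean(axis=0))
print("Column standard deviations after scaling:", demo_scaled.std(axis=0))
```

The printed means should be zero and the standard deviations one, up to floating-point rounding.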

With this data, training the neural network will be much faster. We will first train a single-neuron network, i.e., a simple linear model. You should see that the model is pretty good already after 5 epochs, even though we use a smaller learning rate of 0.01, which allows the model to fit more accurately.

In [None]:
#keras.Sequential makes it easy to compose a neural network model out of layers
model = keras.Sequential()

#Add a 1-neuron layer with linear activation, taking one input value. 
#The input_shape=(1,) defines that there's only a single input value, but batch size is yet unknown.
#Note that this notation is a bit misleading, as the batch data index dimension is really the first one and not the second one.  
#Fortunately, the input_shape needs to only be specified for the first layer
model.add(keras.layers.Dense(1,input_shape=(1,)))

#Make the model ready for optimization using Adam optimizer (the usual reasonable first guess).
#The loss parameter defines the loss function that optimization tries to minimize, in this case
#the mean squared error between the network outputs and actual data values.
#The learning_rate parameter is the "learning rate". Here we use 0.01, whereas many complex networks
#require 0.001 or even 0.0001. A lower learning rate makes training more stable but also slower.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),loss="mean_squared_error")

#Define our training inputs and outputs. Our network takes in height (column 0 in the data) and outputs weight (column 1).
trainingInputs=scaled[:,0]
trainingOutputs=scaled[:,1]

#Reshape the tensors: This is needed because Tensorflow and Keras models expect to get data in batches, as specified above.
trainingInputs=np.reshape(trainingInputs,[trainingInputs.shape[0],1])
trainingOutputs=np.reshape(trainingOutputs,[trainingOutputs.shape[0],1])

#Fit (train) the model. Epochs defines how many times the network will see all data during the training.
model.fit(trainingInputs,trainingOutputs,verbose=1,epochs=5)

#Plot the data and predictions given by the network
pp.scatter(trainingInputs[:,0],trainingOutputs[:,0],marker=".")
pp.title("Relation of height and weight (normalized)")
pp.xlabel("Height (normalized)")
pp.ylabel("Weight (normalized)")
predictions=model.predict(trainingInputs)
#NOTE: predictions has the same shape as trainingOutputs, i.e., [10000,1]
#scatter() expects 1-dimensional x and y arrays; thus, we need to use the [:,0] indexing.
pp.scatter(trainingInputs[:,0],predictions[:,0])
pp.show()
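
Because a single linear neuron computes y = w*x + b, minimizing the mean squared error has a closed-form least-squares solution that the trained weights can be compared against. The sketch below uses synthetic, standardized data (made up for illustration) and `np.polyfit` as the reference. For standardized inputs and outputs, the optimal slope equals the Pearson correlation and the optimal intercept is zero.

```python
import numpy as np

# Synthetic noisy linear relation; the coefficients are made up for illustration.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = 0.9 * x + rng.normal(scale=0.4, size=2000)

# Standardize both variables, mirroring what the StandardScaler does per column.
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

# Closed-form least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
correlation = np.corrcoef(x, y)[0, 1]

print("Least-squares slope:", slope)
print("Least-squares intercept:", intercept)
print("Pearson correlation:", correlation)
```

With the Keras model above, `model.layers[0].get_weights()` returns the learned weight and bias, which should approach the corresponding least-squares values for the real data as training converges.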

Note that when using this model for predicting real, non-scaled values, we must use the scaler. There are a few ways to go about it. If you want to stick to the transform() and inverse_transform() functions, it can be a bit cumbersome, as one always needs to think about what tensor shapes are fed into networks and functions and what shapes are received as output:

In [None]:
#We want to predict weight from this height:
height=200
#To get normalized height, we must compose a [1,2] shaped tensor for the scaler, i.e., 
#1 pair of heights and weights. Height is the first value and the second value can be anything (we use 0).
normalizedHeight=scaler.transform([[height,0]])
#The scaler returns a tensor of similar shape, let's get the actual height value out of it
normalizedHeight=normalizedHeight[0,0]
#Now, to get a normalized weight, we can feed this to the network.
#Again, the returned value is a batch, so we take the first element
normalizedWeight=model.predict(np.array([[normalizedHeight]]))[0,0]
#Finally, we can use the scaler to get the unnormalized weight
weight=scaler.inverse_transform([[normalizedHeight,normalizedWeight]])
#The scaler returns a tensor of similar shape, let's get the actual weight value out of it
weight=weight[0,1]
print("The predicted weight for a person whose height is",height,"cm is",weight,"kg")

If you feel more comfortable doing the calculations yourself, you can directly access the scaler's mean and variance. The standard deviation equals the square root of the variance.

In [None]:
normalizedHeight=(200-scaler.mean_[0])/np.sqrt(scaler.var_[0])
normalizedWeight=model.predict(np.array([[normalizedHeight]]))[0,0]
weight=normalizedWeight*np.sqrt(scaler.var_[1])+scaler.mean_[1]
print("The predicted weight for a person whose height is 200 cm is",weight,"kg")
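
As a small aside, StandardScaler also stores the per-column standard deviation directly in its `scale_` attribute, so taking the square root of `var_` is optional. The following is a minimal sketch on made-up data: the `demo` array and the `normalized_weight` value are hypothetical placeholders, not outputs of the model above. It checks that the manual calculations agree with transform() and inverse_transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic (height, weight) pairs; the values are made up for illustration.
rng = np.random.default_rng(2)
demo = np.column_stack([
    rng.normal(170.0, 10.0, size=500),
    rng.normal(70.0, 15.0, size=500),
])
demo_scaler = StandardScaler().fit(demo)

# scale_ holds the per-column standard deviation, i.e., sqrt(var_).
print(np.allclose(demo_scaler.scale_, np.sqrt(demo_scaler.var_)))

# Normalize a height by hand and compare with transform().
height = 200.0
by_hand = (height - demo_scaler.mean_[0]) / demo_scaler.scale_[0]
via_transform = demo_scaler.transform([[height, 0.0]])[0, 0]

# Unnormalize a hypothetical normalized weight by hand and compare with inverse_transform().
normalized_weight = 1.5
weight_by_hand = normalized_weight * demo_scaler.scale_[1] + demo_scaler.mean_[1]
weight_via_inverse = demo_scaler.inverse_transform([[0.0, normalized_weight]])[0, 1]

print(by_hand, via_transform)
print(weight_by_hand, weight_via_inverse)
```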

# Second exercise: Using a more complex neural network

To see things in more detail and to demonstrate overfitting, we will only use the first 50 data points. Overfitting can be mitigated by using a larger dataset, a simpler model, or a regularization technique such as dropout.

Here, overfitting should manifest as a nonlinear model, even though the data is from a (noisy) linear relation.

To get a better feel for overfitting, you can try different data, layer, and neuron counts.

In [None]:
model = keras.Sequential()
#add a layer with 4 ReLU neurons
model.add(keras.layers.Dense(4,activation="relu",input_shape=(1,)))
#add another layer with 4 ReLU neurons
model.add(keras.layers.Dense(4,activation="relu"))
#NOTE: we don't need to specify input_shape for others than the first layer. Keras can deduce it automatically.
#add the output layer (1 neuron because only 1 predicted value)
model.add(keras.layers.Dense(1))
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),loss="mean_squared_error")

#Define our training inputs and outputs. Our network takes in height (column 0 in the data) and outputs weight (column 1).
trainingInputs=scaled[:50,0]
trainingOutputs=scaled[:50,1]

#Reshape the tensors: This is needed because Tensorflow and Keras models expect to get data in batches, as specified above.
trainingInputs=np.reshape(trainingInputs,[trainingInputs.shape[0],1])
trainingOutputs=np.reshape(trainingOutputs,[trainingOutputs.shape[0],1])

#Fit the model. Epochs defines how many times the network will see all data during the training.
model.fit(trainingInputs,trainingOutputs,verbose=2,epochs=100)

#Scatterplot both the data and the predictions
pp.scatter(trainingInputs,trainingOutputs)
predictions=model.predict(trainingInputs)
pp.scatter(trainingInputs,predictions)
pp.show()
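
One common way to quantify overfitting, rather than judging it from the plot alone, is to compare the error on the training points with the error on held-out points. The sketch below illustrates the idea with polynomial fits standing in for the neural network, so that it runs with NumPy alone. This is a swapped-in technique, and the synthetic data and polynomial degrees are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic noisy linear relation; the values are made up for illustration.
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.9 * x + rng.normal(scale=0.4, size=100)

# Train on the first 50 points and hold out the rest for validation.
x_train, y_train = x[:50], y[:50]
x_val, y_val = x[50:], y[50:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# A simple (linear) model and a more flexible (degree 6) model.
simple = np.polyfit(x_train, y_train, deg=1)
flexible = np.polyfit(x_train, y_train, deg=6)

simple_train, simple_val = mse(simple, x_train, y_train), mse(simple, x_val, y_val)
flexible_train, flexible_val = mse(flexible, x_train, y_train), mse(flexible, x_val, y_val)

print("Linear model:   train MSE", simple_train, "validation MSE", simple_val)
print("Flexible model: train MSE", flexible_train, "validation MSE", flexible_val)
```

The flexible model can never have a higher training error than the linear one, since it contains the linear model as a special case. A validation error that is clearly worse than the training error is the typical sign of overfitting. With Keras, the same comparison can be made by passing held-out data through the `validation_data` argument of `model.fit`.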