<img src="assets/logo.png" width="800">

Made by **Balázs Nagy** and **Márk Domokos**

[<img src="assets/open_button.png">](https://colab.research.google.com/github/Fortuz/edu_Adaptive/blob/main/practices/L08%20-%20Regularization_solved.ipynb)

# Lab 08 - Regularisation of Linear Regression and Bias-Variance

### Water flow

In the first part of the exercise, we implement a linear regression that predicts the amount of water flowing out of a dam from the change in the water level. In the second part, we look at how to debug learning algorithms and at the errors caused by high bias and high variance.

### 1: Import packages

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
import scipy.optimize as op

# Keras imports for building our neural network
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import regularizers
from keras import optimizers


from numpy.random import seed
# Since this notebook contains random initialization, it is advisable to fix the seed so that the results are reproducible.
# Note: with a TensorFlow backend, the weight initializers use the framework's own random generator, so its seed may need to be fixed as well.
seed(1)

### 2: Load data

The data will be loaded from a publicly available file. An alternative would be to upload the data file directly to the Google Colab file system.

In [None]:
!wget https://github.com/Fortuz/edu_Adaptive/raw/main/practices/assets/Lab08/Lab8data.mat
!wget https://github.com/Fortuz/edu_Adaptive/raw/main/practices/assets/Lab08/w_final.txt

Load the data. The file is in MATLAB `.mat` format, so we read it with `loadmat` from SciPy, which returns the stored variables as NumPy arrays.

In [None]:
data = loadmat("Lab8data.mat")                         
X_train = data["X"]                                    
Y_train = data["y"]
X_val   = data["Xval"]
Y_val   = data["yval"]
X_test  = data["Xtest"]
Y_test  = data["ytest"]

del data
m,n = X_train.shape
print('Shape of X:', X_train.shape)
print('Shape of Y:', Y_train.shape)

### 3: Visualization

Let's visualise the data set to understand it better.

In [None]:
plt.plot(X_train,Y_train,'x') 
plt.title('Training data')
plt.xlabel('Change in water level (X)')
plt.ylabel('Water flowing out of the dam (Y)')
plt.show()

A neural network with a single layer and a single linear neuron is the simplest limiting case of deep learning, and it is equivalent to linear regression. With this in mind, let's return to the regularisation (penalty) technique introduced in lab L04, this time using a neural network package.

### Linear regression with regularization

We will create a network corresponding to a linear regression: it has 1 input variable and consists of 1 layer containing 1 neuron. The layer adds the bias term automatically, and we add a penalty term to the cost. Two commonly used penalties are L1 and L2.

#### L2 regularisation (Ridge regression)
$ C(w)=\frac{1}{2m}\sum_{i=1}^m(h_w(x^i)-y^i)^2+ \color{red}{\lambda\sum_{j=1}^nw_j^2} $

This technique can be used to prevent overfitting. $\lambda$ should be chosen carefully: setting it too high can result in underfitting.
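As a minimal sketch, the L2-penalized cost above can be evaluated directly with NumPy. The function name `ridge_cost`, the toy data, and the choice to leave the bias `b` out of the penalty are assumptions made for illustration only. Note that Keras' built-in `MSE` loss averages the squared errors without the factor 1/2, so the values reported during training differ from this formula by that constant in the data term.

```python
import numpy as np

def ridge_cost(X, y, w, b, lam):
    """Cost C(w) = 1/(2m) * sum((h - y)^2) + lam * sum(w_j^2)."""
    m = X.shape[0]
    h = X @ w + b                               # linear hypothesis h_w(x)
    data_term = np.sum((h - y) ** 2) / (2 * m)  # fit to the data
    penalty = lam * np.sum(w ** 2)              # bias b is not penalized (assumption)
    return data_term + penalty

# Toy data (hypothetical): y = 2x exactly
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.array([2.0])

cost_no_penalty = ridge_cost(X, y, w, 0.0, lam=0.0)    # perfect fit, no penalty
cost_with_penalty = ridge_cost(X, y, w, 0.0, lam=0.5)  # only the penalty remains
print(cost_no_penalty, cost_with_penalty)
```

With a perfect fit the data term vanishes, so the second value consists of the penalty alone; this shows how a large $\lambda$ can dominate the cost even when the predictions are good.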

#### L1 regularisation (Lasso regression)
$ C(w)=\frac{1}{2m}\sum_{i=1}^m(h_w(x^i)-y^i)^2+ \color{red}{\lambda\sum_{j=1}^n|w_j|} $

This technique can also reduce certain coefficients to exactly 0, so it is useful for selecting input parameters (feature selection).
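One way to see why the L1 penalty produces exact zeros is to compare how each penalty shrinks a single weight. The sketch below works in a simplified setting, not the full regression problem: for each weight independently it minimizes $\frac{1}{2}(w - w_0)^2$ plus the penalty, where both minimizers have a closed form. The function names and the sample weights are illustrative assumptions.

```python
import numpy as np

def shrink_l1(w0, lam):
    # Minimizer of 0.5*(w - w0)^2 + lam*|w|  (soft-thresholding)
    return np.sign(w0) * np.maximum(np.abs(w0) - lam, 0.0)

def shrink_l2(w0, lam):
    # Minimizer of 0.5*(w - w0)^2 + lam*w^2  (proportional shrinkage)
    return w0 / (1.0 + 2.0 * lam)

w0 = np.array([3.0, -0.4, 0.1, -2.0])   # hypothetical unpenalized weights
lam = 0.5

w_l1 = shrink_l1(w0, lam)
w_l2 = shrink_l2(w0, lam)
print("L1:", w_l1)   # weights with |w0| <= lam become exactly 0
print("L2:", w_l2)   # all weights shrink but stay non-zero
```

In this setting L1 subtracts a fixed amount from every weight and clips at zero, while L2 only rescales the weights, which is why L2 alone does not remove input parameters.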

Let's examine the following network configuration.

In [None]:
# Underfit Case
Lambda = 0
lr_rate = 0.005
epoch = 100

# building a linear stack of layers with the sequential model
model = Sequential()
model.add(Dense(1, input_shape=(1,), use_bias=True, kernel_regularizer=regularizers.l2(Lambda)))

# compiling the sequential model (recent Keras versions expect `learning_rate` instead of `lr`)
optim = optimizers.Adam(learning_rate=lr_rate)
model.compile(loss='MSE', optimizer=optim)

# training the model and saving metrics in history
history = model.fit(X_train, Y_train, epochs=epoch, validation_data=(X_val, Y_val), verbose = 2)

Let's do the prediction.

In [None]:
################### CODE HERE ######################## 
# Implement the prediction step.
# Use the predict() function of the model



######################################################

Plot the result.

In [None]:
plt.plot(X_train,Y_train,'x')   # plot the training data
plt.plot(X_test,Y_pred,'x', color='red')   # plot the predictions on the test data
plt.title('Training data')
plt.xlabel('Change in water level (X)')
plt.ylabel('Water flowing out of the dam (Y)')
plt.legend(['Training data', 'Prediction (on Test data)'])
plt.show()

As expected from our simple model design, we obtained a linear estimate that does not adequately capture our data. Using the built-in metrics, we can examine how the cost function evolves on the training and validation data during training. To do this, we plot the corresponding metrics.

In [None]:
# plotting the metrics
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss functions')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Training loss', 'Validation loss'])

In the graph, we can observe that both the cost function of the training data and the cost function of the validation data remain high, higher than one would expect from a well-functioning predictor. This is the signature of underfitting (high bias).

<img src="assets/Lab08/Pics/L08_HighBias.png" width="600">

So let's change our model to get a better result.

### 4: Poly Feature

To get a better result, we need more input parameters. One possible way to generate more parameters is to take the polynomial powers of the available parameter and feed them into the neural network as additional inputs.

Write a function that takes the matrix of samples as input and extends it with the appropriate columns up to the desired exponent: $x \rightarrow x, x^2, x^3, \dots, x^p$

In [None]:
def polyFeatures(X,p=9):
    ################### CODE HERE ######################## 
    # Implement the feature extension for the input data




            
    ######################################################    
    return X_poly
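One possible way to build the polynomial columns is NumPy broadcasting. This is only a sketch under the assumption that the input has a single feature column; the name `poly_features_sketch` and the demo values are hypothetical and are not the lab's reference solution.

```python
import numpy as np

def poly_features_sketch(X, p=9):
    """Return an (m, p) matrix whose k-th column is X**(k+1)."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)   # assumes a single input feature
    powers = np.arange(1, p + 1)                    # exponents 1..p
    return X ** powers                              # broadcasting: (m,1) ** (p,) -> (m,p)

X_demo = np.array([[2.0], [-1.0]])
X_demo_poly = poly_features_sketch(X_demo, p=3)
print(X_demo_poly)
```

Each row of the result holds the powers of one sample, so the first column is always the original input.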

Do not forget to scale the values, otherwise the high-exponent terms would heavily distort the learning. Write a normalization function that takes a sample matrix as input and returns the normalized matrix together with the per-column means and standard deviations as vectors.

In [None]:
def featureNormalize(X):
    ################### CODE HERE ######################## 
    # Normalize the features of the input data.
    # Each feature has its own normalization values.

    


    
    
    ######################################################    
    return X_norm,avg,std
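A column-wise standardization could look like the sketch below. The name `feature_normalize_sketch` and the `ddof` parameter are assumptions added for illustration: the text here does not state whether the lab's expected values use the population (`ddof=0`) or the sample (`ddof=1`) standard deviation, so the sketch leaves that choice open.

```python
import numpy as np

def feature_normalize_sketch(X, ddof=0):
    """Standardize each column; also return the column means and standard deviations."""
    avg = X.mean(axis=0)
    std = X.std(axis=0, ddof=ddof)   # ddof=0: population std, ddof=1: sample std
    X_norm = (X - avg) / std         # assumes no column is constant (std > 0)
    return X_norm, avg, std

X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])
X_demo_norm, avg, std = feature_normalize_sketch(X_demo)
print(X_demo_norm.mean(axis=0), X_demo_norm.std(axis=0))
```

After normalization every column has mean 0 and, with the default `ddof=0`, standard deviation 1, so a high power such as $x^9$ no longer dominates the other inputs.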

Perform both the polynomial variable enrichment and the normalization. 

$\color{red}{Warning!}$

Be aware that during training we only want to use information from the training data set. This means that the test and validation data must also be normalised using the mean and standard deviation calculated from the training data. Otherwise, we would leak information from those sets into the algorithm, which could distort the learning result. This way the validation and test data sets remain truly independent, and the algorithm works only with information obtained from the training data.

In [None]:
order=9

# Training data
X_train_p = polyFeatures(X_train, order)                                # polynomial features
X_train_pn, mu, sigma = featureNormalize(X_train_p)                     # feature normalization

################### CODE HERE ########################  
# Perform the feature extension and normalization for the remaining data sets as well

# Validation data


# Test data


###################################################### 

print("""Expected Normalized Training Example for order=3 (approx.):
[-0.362 -0.755  0.182 ]""")
print('Normalized Training Example 1:\n',X_train_pn[0,:])
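The warning above can be illustrated with a small sketch that uses synthetic data instead of the lab's data set. All array names and the random data are assumptions; the point is only that the statistics are computed once on the training set and then reused unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(50, 3))   # synthetic "training" features
X_va = rng.normal(loc=5.0, scale=2.0, size=(20, 3))   # synthetic "validation" features

# Statistics come from the training set only
mu_tr = X_tr.mean(axis=0)
sigma_tr = X_tr.std(axis=0)

X_tr_n = (X_tr - mu_tr) / sigma_tr
X_va_n = (X_va - mu_tr) / sigma_tr    # reuse the training statistics: no leakage

print("train mean:", X_tr_n.mean(axis=0))
print("val mean:  ", X_va_n.mean(axis=0))
```

The normalized validation columns are close to, but not exactly, zero-mean. That is expected: they were scaled with statistics the validation data did not contribute to.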

Set the parameters.

In [None]:
# Just Right Case
Lambda = 0
lr_rate = 0.01
epoch = 100
optim = optimizers.SGD(learning_rate=lr_rate)

Since we have increased the number of variables, we need to modify our model. The input layer should fit the size of our increased input matrix, but we still only want one output.

In [None]:
# Building a linear stack of layers with the sequential model
model2 = Sequential()

################### CODE HERE ######################## 
# Build a 2 layer Neural Network matching the input shape of the data and producing 1 output.
# Use BIAS and L2 regularization.




###################################################### 

Once the architecture of the model is defined, we compile the model, which fixes the cost function and the optimizer. Again, there are several optimizers to choose from, such as SGD or Adam.

In [None]:
# Compiling the sequential model
# optim = optimizers.SGD(learning_rate=lr_rate)
optim = optimizers.Adam(learning_rate=lr_rate)
model2.compile(loss='MSE', optimizer=optim)

# Training the model and saving metrics in history
history2 = model2.fit(X_train_pn, Y_train, epochs=epoch, validation_data=(X_val_pn, Y_val), verbose = 2)

After training the model, we make predictions on our test data.

In [None]:
################### CODE HERE ########################
# Calculate the predictions using the recently trained model.
 


###################################################### 

Let's plot the result of the prediction.

In [None]:
plt.plot(X_test,Y_pred2,'x', color="red") 
plt.plot(X_test,Y_test,'x', color="green") 
plt.title('Prediction on test data')
plt.xlabel('Change in water level (X)')
plt.ylabel('Water flowing out of the dam (Y)')
plt.legend(['Prediction', 'Test'])
plt.show()

You can see that the extended model fits our data much better. The improvement is also reflected in the metrics.

In [None]:
# plotting the metrics
fig = plt.figure()
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('Loss functions')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Train loss', 'Validation loss'])

Both the cost function of our training data and the cost function of our validation data are significantly lower than in our first attempt.

<img src="assets/Lab08/Pics/L08_HighBias.png" width="600">

You can extend your model further and tune the hyperparameters, but be careful not to fall into the trap of overfitting, as in the model below.

In the example below, we choose too many input parameters and train the model on them.

In [None]:
#Overfit
Lambda = 0
lr_rate = 0.001
epoch = 100
order=12
optim = optimizers.Adam(learning_rate=lr_rate)

In [None]:
# Train
X_train_p = polyFeatures(X_train, order)                                # polynomial features
X_train_pn, mu, sigma = featureNormalize(X_train_p)                     # feature normalization

# Validation
X_val_p = polyFeatures(X_val, order)
X_val_pn = (X_val_p-mu)/sigma

# Test
X_test_p = polyFeatures(X_test, order)
X_test_pn = (X_test_p-mu)/sigma

In [None]:
# Building a linear stack of layers with the sequential model
model3 = Sequential()
model3.add(Dense(order, input_shape=(order,), use_bias=True, kernel_regularizer=regularizers.l2(Lambda)))
model3.add(Dense(1))

In [None]:
# compiling the sequential model
optim = optimizers.SGD(learning_rate=lr_rate)
model3.compile(loss='MSE', optimizer=optim)

# training the model and saving metrics in history
history3 = model3.fit(X_train_pn, Y_train, epochs=epoch, validation_data=(X_val_pn, Y_val), verbose = 2)

In [None]:
Y_pred3 = model3.predict(X_test_pn)

In [None]:
plt.plot(X_test,Y_pred3,'x', color="red") 
plt.plot(X_test,Y_test,'x', color="green") 
plt.title('Prediction on test data')
plt.xlabel('Change in water level (X)')
plt.ylabel('Water flowing out of the dam (Y)')
plt.legend(['Prediction', 'Test'])
plt.show()

In [None]:
# plotting the metrics
fig = plt.figure()
plt.plot(history3.history['loss'])
plt.plot(history3.history['val_loss'])
plt.title('Loss functions')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Train loss', 'Validation loss'])

The model fits the training samples too closely. After about the 60th epoch the cost on the training samples keeps decreasing, but the cost on the validation samples starts to deteriorate. As a result, the training cost ends up well below the expected level, while the validation and test costs end up well above it. The two curves open up like a pair of scissors, leaving a large gap between them.

<img src="assets/Lab08/Pics/L08_HighVariance.png" width="600">
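The point where the validation curve turns upward can be located numerically. The sketch below uses hypothetical loss values shaped like an overfitting run; the arrays are made up for illustration and are not the output of the model above.

```python
import numpy as np

# Hypothetical loss curves shaped like an overfitting run
train_loss = np.array([5.0, 3.0, 2.0, 1.5, 1.2, 1.0, 0.9, 0.8])
val_loss   = np.array([5.5, 3.6, 2.8, 2.5, 2.6, 2.9, 3.3, 3.8])

best_epoch = int(np.argmin(val_loss))   # epoch with the lowest validation loss
gap = val_loss - train_loss             # gap between the two curves per epoch

print("best epoch (0-based):", best_epoch)
print("gap at best epoch:", gap[best_epoch])
print("gap at last epoch:", gap[-1])
```

For a real run, the lists `history3.history['loss']` and `history3.history['val_loss']` could be used in place of the two arrays. A gap that keeps growing after the best epoch is the "open scissors" pattern described above.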

### 5: Summary and Tips

Underfitting (High Bias problem):
- Add more input parameters to the model (either independent or polynomial).
- Reduce the $\lambda$ parameter, i.e. penalize less.
- Increase the number of epochs.
- Increase $\alpha$ (learning rate).

Overfitting (High Variance problem):
- Use fewer input parameters.
- Increase the $\lambda$ parameter, i.e. penalize more.
- Use more training samples.
- Reduce $\alpha$ (learning rate).
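The effect of $\lambda$ in the tips above can be explored with a small sweep. The sketch below is an illustration under stated assumptions: it uses synthetic data (a noisy sine) instead of the lab's data set, and it replaces the Keras model with a closed-form ridge regression so that it runs with NumPy alone. The helper names `fit_ridge`, `mse` and `make_data` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of 1/(2m)*||Xw + b - y||^2 + lam*||w||^2 (b unpenalized)."""
    m, n = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                 # centering handles the bias term
    w = np.linalg.solve(Xc.T @ Xc + 2 * m * lam * np.eye(n), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def mse(X, y, w, b):
    return np.mean((X @ w + b - y) ** 2)

rng = np.random.default_rng(0)

def make_data(m):
    # Synthetic data (assumption): noisy sine with polynomial features x, x^2, ..., x^8
    x = rng.uniform(-1, 1, size=m)
    y = np.sin(3 * x) + 0.2 * rng.normal(size=m)
    return np.vander(x, 9, increasing=True)[:, 1:], y

X_tr, y_tr = make_data(12)
X_va, y_va = make_data(50)

lambdas = [0.0, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
train_err, val_err = [], []
for lam in lambdas:
    w, b = fit_ridge(X_tr, y_tr, lam)
    train_err.append(mse(X_tr, y_tr, w, b))
    val_err.append(mse(X_va, y_va, w, b))
    print(f"lambda={lam:<7} train MSE={train_err[-1]:.4f}  val MSE={val_err[-1]:.4f}")
```

The training error can only grow as $\lambda$ increases, because a stronger penalty restricts the fit. The validation error is the quantity to watch when choosing $\lambda$: it is typically high at both extremes and lowest somewhere in between.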

<div style="text-align: right">This lab exercise uses elements from Andrew Ng's Machine Learning course.</div>