# This is a Modified Copy of the Basic Example

This notebook is a modified copy of the basic example of a single-layer and multilayer perceptron for classification and regression.

It lets you test different scenarios by changing the global variables at the beginning of the script.

**TO SIMPLIFY, ONLY CLASSIFICATION PROBLEMS ARE TESTED**

**This script has been tested with the following package versions:**
- pandas 1.3.3
- sklearn 0.24.0
- keras 2.2.4 + tensorflow 1.14.0 / keras 2.6.0 + tensorflow 2.6.0

# This is a list of things that you can test:
- Check that the results can be very different:
  - For different runs with the same parameters (non-linear networks)
  - By changing the value of several data parameters:
    - TRAIN\_SIZE\_PCT\_SPLIT: 0.70,0.50,0.20,0.05,0.01 (ionosphere, MLP-100-50, SGD, LR = 0.01)
    - SCALE\_INPUTS\_FUNCTION (hepatitis, linear or non-linear networks)
  - By changing the value of several critical training parameters (ionosphere, MLP-100-50):
    - TRAINING\_ALGORITHM: SGD, RMSprop, Adam
    - LEARNING\_RATE: 0.1,0.01,0.001,0.0001
    - MOMENTUM\_RATE: 0.80, 0.00 (SGD, LR = 0.001)
    - FACTIVATION\_HIDDEN1 and FACTIVATION\_HIDDEN2: tanh, relu (SGD, LR = 0.001)
    - FACTIVATION\_OUTPUT: softmax, linear
- Underfitting and overfitting:
  - Underfitting:
    - Train a linear model and compare with a non-linear one (xor)
    - Train a non-linear model with a small LR and few epochs (ionosphere - MLP-100-50, SGD, LR <= 0.0001, EPOCHS = 200)
  - Overfitting:
    - Train a non-linear model with a large LR and few epochs (ionosphere - MLP-100-50, SGD, LR >= 0.01, EPOCHS = 200)
    - Train a non-linear model with a small LR and many epochs (ionosphere - MLP-100-50, SGD, LR <= 0.0001, EPOCHS = 5000)
    - Train a non-linear model with many hidden layers (sonar - MLP-20 vs MLP-20-20-20-20-20, SGD, LR = 0.002, EPOCHS = 200)
    - Curse of Dimensionality: small value for TRAIN\_SIZE\_PCT\_SPLIT (ionosphere - MLP-100-50, SGD, LR = 0.01, EPOCHS = 200)
- The effect of noise in the data:
  - Add random inputs and labels: NRANDOM_EXAMPLES != 0
  - Add random features: NRANDOM_FEATURES = 1000,10000 (ionosphere - MLP-100-50, SGD, LR = 0.001, EPOCHS = 200)
  - Add noise to the inputs: ADD_NOISE_INPUTS = True
  - Shuffle the labels: SHUFFLE\_LABELS = True
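
The XOR experiment in the underfitting list can be previewed without Keras. The following is a minimal sketch, assuming that `Data/xor.inputs` holds the classical four-point XOR problem (the file contents are not shown here, so this is an assumption). It compares the best least-squares linear model with a hidden layer of two threshold units whose weights are chosen by hand for illustration, not obtained by training.

```python
import numpy

# Classical XOR problem (assumed to match the contents of Data/xor.*)
X = numpy.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = numpy.array([0, 1, 1, 0], dtype=float)

# Best linear model in the least-squares sense: y ~ X w + b
A = numpy.hstack([X, numpy.ones((4, 1))])      # append a bias column
coef, *_ = numpy.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef                         # cannot separate the classes

# One hidden layer with two threshold units (hand-picked weights)
def step(z):
    return (z > 0).astype(float)

h1 = step(X.sum(axis=1) - 0.5)   # fires if at least one input is 1 (OR)
h2 = step(X.sum(axis=1) - 1.5)   # fires only if both inputs are 1 (AND)
nonlinear_pred = h1 - h2         # XOR = OR and not AND

print("Linear predictions:    ", numpy.round(linear_pred, 3))
print("Non-linear predictions:", nonlinear_pred)
```

The linear model gives the same output for all four points, which is the underfitting that a single-layer network is expected to show on this dataset.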


## Global variables for the script

In [None]:
### First we set the values of several global variables
MULTILAYER_PERCEPTRON = False     ### If False, it is a Single-layer Perceptron

NRANDOM_EXAMPLES      = 0         ### Number of random examples to add
NRANDOM_FEATURES      = 0         ### Number of random features to add
ADD_NOISE_INPUTS      = False     ### If True, add standard Gaussian noise to the inputs
SHUFFLE_LABELS        = False     ### If True, shuffle the labels (without shuffling the inputs)
SCALE_INPUTS_FUNCTION = "M0SD1"   ### "M0SD1": Mean 0 StdDev 1 / "MinMax": Values in an interval / None

TRAIN_SIZE_PCT_SPLIT  = 0.70      ### Fraction of data used for training (it must be in (0,1))

NHIDDEN1 = 100;  FACTIVATION_HIDDEN1 = 'tanh'
NHIDDEN2 = 50;   FACTIVATION_HIDDEN2 = 'tanh'
NHIDDEN3 = 0;    FACTIVATION_HIDDEN3 = 'tanh'
NHIDDEN4 = 0;    FACTIVATION_HIDDEN4 = 'tanh'
NHIDDEN5 = 0;    FACTIVATION_HIDDEN5 = 'tanh'
FACTIVATION_OUTPUT = 'softmax'    ### only for CLASSIFICATION

TRAINING_ALGORITHM = "SGD"        ### "SGD", "RMSprop", "Adam"

LEARNING_RATE      = 0.01         ### (almost) ALL training algorithms have a learning rate
MOMENTUM_RATE      = 0.80         ### Maybe not needed in some training algorithms

BATCHSIZE          = 20           ### Mini-batch size
NEPOCHS            = 200          ### Number of training iterations

inputsFileName = 'Data/ionosphere.inputs'
labelsFileName = 'Data/ionosphere.labels'
#inputsFileName = 'Data/hepatitis.inputs'
#labelsFileName = 'Data/hepatitis.labels'
#inputsFileName = 'Data/sonar.inputs'
#labelsFileName = 'Data/sonar.labels'
#inputsFileName = 'Data/xor.inputs'
#labelsFileName = 'Data/xor.labels'

## General imports

In [None]:
import DataFunctions                ### Functions for data management
import numpy                        ### Library for numerical computations
import keras, tensorflow, sklearn   ### Libraries for constructing and training the models
import matplotlib.pyplot as plt     ### Library for plotting

In [None]:
print(keras.__version__)
print(tensorflow.__version__)

## Load inputs and labels, change them (if required) and preprocess them

In [None]:
### Now we read inputs and labels
x, y = DataFunctions.loadDatasetsFromFiles (inputsFileName, labelsFileName)
y    = y.ravel()   ### sklearn prefers shape (N,) to (N,1)

### Maybe we want to add random examples
if NRANDOM_EXAMPLES != 0:
    xx = numpy.random.rand(NRANDOM_EXAMPLES,x.shape[1])
    x  = numpy.vstack([x, xx])
    yy = numpy.random.randint(numpy.min(y),numpy.max(y)+1,NRANDOM_EXAMPLES)
    y  = numpy.hstack([y, yy])

### Maybe we want to add random features
if NRANDOM_FEATURES != 0:
    xx = numpy.random.rand(x.shape[0],NRANDOM_FEATURES)
    x  = numpy.hstack([x, xx])

### Maybe we want to add noise to the data
if ADD_NOISE_INPUTS:
    x += numpy.random.randn(x.shape[0],x.shape[1])

### Maybe we want to shuffle the labels
if SHUFFLE_LABELS:
    numpy.random.shuffle(y)   ### the random module is not imported; numpy shuffles the array in place

nExamples = x.shape[0]
nFeatures = x.shape[1]
nClasses  = len(numpy.unique(y))                   ### only for CLASSIFICATION

### Convert labels to a 1-of-C (one-hot) scheme
## For neural networks, it is easier to output yes/no than (for example) an integer with the predicted class
y1C = DataFunctions.convertLabels_1ofC_Scheme (y)  ### only for CLASSIFICATION
    
### Scale inputs
if   SCALE_INPUTS_FUNCTION == "M0SD1":
    x, Scaler = DataFunctions.scaleDataMean0Dev1Scaler (x)
elif SCALE_INPUTS_FUNCTION == "MinMax":
    x, Scaler = DataFunctions.scaleDataMinMaxScaler (x, FeatureRange=(-1,+1))
#print("First 3 rows of x:"); print(x[0:3,])
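
`DataFunctions` is a helper module whose source is not shown here. As a rough illustration of what the 1-of-C conversion and the mean-0/std-1 scaling do, here is a sketch with two hypothetical helpers, `one_hot` and `standardize`. They are assumptions made for this example, not the actual implementation in `DataFunctions`.

```python
import numpy

def one_hot(labels):
    """Return an (N, C) 0/1 matrix with one column per distinct label."""
    labels = numpy.asarray(labels).ravel()
    classes, indices = numpy.unique(labels, return_inverse=True)
    return numpy.eye(len(classes))[indices]

def standardize(data):
    """Scale every column to mean 0 and standard deviation 1."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0          # constant columns are only centred
    return (data - mean) / std

labels = numpy.array([1, 0, 1, 2])
data = numpy.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]])
print(one_hot(labels))
print(standardize(data))
```

Note that the columns of the one-hot matrix follow the sorted distinct labels found in the input, so the result depends on which classes are present.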

## Split data and labels into training and test data

In [None]:
### Split data into training (to construct the model) and test (to estimate the generalization)
from sklearn import model_selection
x_train, x_test, y_train, y_test = \
  model_selection.train_test_split (x, y, train_size=TRAIN_SIZE_PCT_SPLIT, shuffle=True, stratify=y)
y1C_train = DataFunctions.convertLabels_1ofC_Scheme (y_train)  ### only for CLASSIFICATION
y1C_test  = DataFunctions.convertLabels_1ofC_Scheme (y_test)   ### only for CLASSIFICATION
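
To see what `stratify=y` does, here is a rough sketch of a stratified split written with numpy only. The helper `stratified_split` is hypothetical and does not reproduce the exact algorithm of `train_test_split`. It only shows the idea of splitting each class separately so that both subsets keep similar class proportions.

```python
import numpy

def stratified_split(labels, train_fraction, rng):
    """Return (train_idx, test_idx) keeping class proportions roughly equal."""
    train_idx, test_idx = [], []
    for cls in numpy.unique(labels):
        idx = rng.permutation(numpy.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return numpy.array(train_idx), numpy.array(test_idx)

rng = numpy.random.default_rng(0)
labels = numpy.array([0] * 60 + [1] * 40)      # made-up labels: 60% / 40%
train_idx, test_idx = stratified_split(labels, 0.70, rng)
print("train class counts:", numpy.bincount(labels[train_idx]))
print("test class counts: ", numpy.bincount(labels[test_idx]))
```

Keeping every class in both subsets also matters for the two separate one-hot conversions: if a class were missing from the test labels, a converter that infers the classes from its input could produce fewer columns than the network has outputs.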

## Create the architecture (Type of Problem + Model Representation)

In [None]:
###
### https://keras.io/api/models/sequential/#sequential-class
###

### First we indicate that it is a sequential model
myNetwork = keras.Sequential()

### We need to indicate the input dimension in the first layer
inputDimension  = nFeatures
outputDimension = nClasses    ### only for CLASSIFICATION

if MULTILAYER_PERCEPTRON:

    ### Now we add the hidden layers
    myNetwork.add ( keras.layers.Dense (NHIDDEN1, activation=FACTIVATION_HIDDEN1, input_dim=inputDimension) )
    #myNetwork.add ( keras.layers.Dense (NHIDDEN1, activation=FACTIVATION_HIDDEN1, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01), input_dim=nFeatures) )
    if NHIDDEN2 != 0:
        myNetwork.add ( keras.layers.Dense (NHIDDEN2, activation=FACTIVATION_HIDDEN2) )
        #myNetwork.add ( keras.layers.Dense (NHIDDEN2, activation=FACTIVATION_HIDDEN2, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
    if NHIDDEN3 != 0:
        myNetwork.add ( keras.layers.Dense (NHIDDEN3, activation=FACTIVATION_HIDDEN3) )
        #myNetwork.add ( keras.layers.Dense (NHIDDEN3, activation=FACTIVATION_HIDDEN3, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
    if NHIDDEN4 != 0:
        myNetwork.add ( keras.layers.Dense (NHIDDEN4, activation=FACTIVATION_HIDDEN4) )
        #myNetwork.add ( keras.layers.Dense (NHIDDEN4, activation=FACTIVATION_HIDDEN4, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
    if NHIDDEN5 != 0:
        myNetwork.add ( keras.layers.Dense (NHIDDEN5, activation=FACTIVATION_HIDDEN5) )
        #myNetwork.add ( keras.layers.Dense (NHIDDEN5, activation=FACTIVATION_HIDDEN5, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )

    ### And finally we add the output layer
    myNetwork.add ( keras.layers.Dense (outputDimension, activation=FACTIVATION_OUTPUT) )

else:

    ### We only have an output layer
    myNetwork.add ( keras.layers.Dense (outputDimension, activation=FACTIVATION_OUTPUT, input_dim=inputDimension) )

### Print statistics
myNetwork.summary()   ### summary() prints the table itself and returns None

### Now we create a keras model
myInput = keras.layers.Input (shape=(nFeatures,))
myModel = keras.models.Model (inputs=myInput, outputs=myNetwork(myInput))
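
To make the architecture more concrete, the following numpy sketch computes the forward pass of a dense network with tanh hidden layers and a softmax output, and counts its trainable parameters. It is an illustration only, not how Keras implements `Dense`. The layer sizes assume 34 input features and 2 classes for the MLP-100-50 configuration, and the helpers `init_layers`, `softmax` and `forward` are hypothetical names.

```python
import numpy

def init_layers(sizes, rng):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), numpy.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract the max for numerical stability
    e = numpy.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, layers):
    """tanh in the hidden layers, softmax in the output layer."""
    for w, b in layers[:-1]:
        x = numpy.tanh(x @ w + b)
    w, b = layers[-1]
    return softmax(x @ w + b)

rng = numpy.random.default_rng(0)
sizes = [34, 100, 50, 2]      # assumed: 34 features, two hidden layers, 2 classes
layers = init_layers(sizes, rng)
n_params = sum(w.size + b.size for w, b in layers)
probs = forward(rng.normal(size=(5, 34)), layers)
print("trainable parameters:", n_params)      # 3500 + 5050 + 102 = 8652
print("output shape:", probs.shape)
```

With `sizes = [34, 2]` the loop over hidden layers is skipped and the same code describes the single-layer case, where the output is a softmax of a linear function of the inputs.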

## Select the variables for training: loss function, training algorithm, ...

In [None]:
### Loss function
## Usual loss functions: 'categorical_crossentropy' 'binary_crossentropy' 'mean_squared_error', etc
lossFunction = ['categorical_crossentropy']   ### only for CLASSIFICATION (for one-hot labels, use categorical_crossentropy)

#print(keras.__version__)
if keras.__version__ == "2.2.4":
    optimizers = keras.optimizers
else:
    optimizers = tensorflow.keras.optimizers

### Training algorithm
## Every training algorithm will have its own parameters
if   TRAINING_ALGORITHM == "SGD":
    trainAlgorithm = optimizers.SGD (lr=LEARNING_RATE, momentum=MOMENTUM_RATE)  # There are more parameters
elif TRAINING_ALGORITHM == "RMSprop":
    trainAlgorithm = optimizers.RMSprop (lr=LEARNING_RATE)                      # There are more parameters
elif TRAINING_ALGORITHM == "Adam":
    trainAlgorithm = optimizers.Adam (lr=LEARNING_RATE)                         # There are more parameters

### Metrics to monitor
## Keras can monitor several metrics during training
showMetrics = ['categorical_accuracy', 'mean_squared_error']   ### only for CLASSIFICATION (for one-hot labels, use categorical_accuracy)

### Compile the model with all the elements (this is the standard way to work in keras)
myModel.compile (loss=lossFunction, optimizer=trainAlgorithm, metrics=showMetrics)
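
As a reminder of what the loss and the optimizer compute, here is a small numpy sketch of the categorical cross-entropy and of one SGD step with classical momentum. The function names are hypothetical and the code is only meant to show the formulas, under the assumption that the momentum update has the form `velocity = momentum * velocity - lr * gradient`. It is not the Keras implementation.

```python
import numpy

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over examples of -sum_c y_true[c] * log(y_prob[c])."""
    y_prob = numpy.clip(y_prob, eps, 1.0 - eps)     # avoid log(0)
    return float(-numpy.mean(numpy.sum(y_true * numpy.log(y_prob), axis=1)))

def sgd_momentum_step(w, velocity, grad, lr, momentum):
    """Classical momentum: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

y_true = numpy.array([[1.0, 0.0], [0.0, 1.0]])      # one-hot labels
y_prob = numpy.array([[0.9, 0.1], [0.2, 0.8]])      # made-up network outputs
loss = categorical_crossentropy(y_true, y_prob)
print("loss:", loss)

w, v = numpy.array([1.0]), numpy.array([0.0])
for _ in range(2):                                   # constant gradient of 1.0
    w, v = sgd_momentum_step(w, v, numpy.array([1.0]), lr=0.01, momentum=0.80)
print("weight after two steps:", w)
```

With a constant gradient the second step is larger than the first one (0.018 versus 0.010), which is why changing MOMENTUM\_RATE from 0.80 to 0.00 changes the effective step size even when LEARNING\_RATE stays the same.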

## Train the model with the training data

In [None]:
###
### This method has many parameters:
###   https://keras.io/api/models/model_training_apis/#fit-method
###

validationData = (x_test,y1C_test)  ### We could also use the validation_split parameter
fitData = myModel.fit \
  (x_train, y1C_train, validation_data=validationData, batch_size=BATCHSIZE, epochs=NEPOCHS)
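
When comparing learning rates and numbers of epochs, it helps to know how many weight updates a run performs. The sketch below is a back-of-the-envelope calculation, assuming 245 training examples (roughly 70% of a 351-example dataset; the actual size depends on the data file). The helper `updates_per_run` is a hypothetical name.

```python
import math

def updates_per_run(n_train, batch_size, n_epochs):
    """One weight update per mini-batch; the last batch may be smaller."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    return batches_per_epoch, batches_per_epoch * n_epochs

batches, total = updates_per_run(n_train=245, batch_size=20, n_epochs=200)
print("mini-batches per epoch:", batches)
print("weight updates in the whole run:", total)
```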


## Evaluate the model on the training and test data at the end of the training phase

In [None]:
scoresTrain = myModel.evaluate (x_train,y1C_train)
scoresTest  = myModel.evaluate (x_test,y1C_test)
print("Loss function and Accuracy in the training set: %.8f  %7.3f%%" % (scoresTrain[0], 100*scoresTrain[1])) 
print("Loss function and Accuracy in the test set:     %.8f  %7.3f%%" % (scoresTest[0],  100*scoresTest[1]))
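
The accuracy reported above counts how often the largest network output matches the one-hot label. The following sketch shows that computation, plus a confusion matrix, on made-up outputs. In the script, the outputs could be obtained with `myModel.predict(x_test)`. The helper names are hypothetical.

```python
import numpy

def categorical_accuracy(y_true_1c, y_prob):
    """Fraction of rows whose largest output matches the one-hot label."""
    return float(numpy.mean(y_true_1c.argmax(axis=1) == y_prob.argmax(axis=1)))

def confusion_matrix(y_true_1c, y_prob):
    """Rows index the true class, columns the predicted class."""
    n_classes = y_true_1c.shape[1]
    matrix = numpy.zeros((n_classes, n_classes), dtype=int)
    numpy.add.at(matrix, (y_true_1c.argmax(axis=1), y_prob.argmax(axis=1)), 1)
    return matrix

y_true_1c = numpy.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_prob = numpy.array([[0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.1, 0.9]])
print("accuracy:", categorical_accuracy(y_true_1c, y_prob))
print(confusion_matrix(y_true_1c, y_prob))
```

The confusion matrix shows which class the errors come from, which a single accuracy value hides when the classes are unbalanced.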

## Now we can plot the training history

In [None]:
lossTrain = fitData.history["loss"]
lossValid = fitData.history["val_loss"]

epochsPlot = range(1,len(lossTrain)+1)

plt.plot(epochsPlot,lossTrain,label='Training Loss')
plt.plot(epochsPlot,lossValid,label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
accuracyTrain = fitData.history["categorical_accuracy"]
accuracyValid = fitData.history["val_categorical_accuracy"]

epochsPlot = range(1,len(accuracyTrain)+1)

plt.plot(epochsPlot,accuracyTrain,label='Training Accuracy')
plt.plot(epochsPlot,accuracyValid,label='Validation Accuracy')

plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
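
The curves can also be summarized numerically. The sketch below looks for the epoch with the lowest validation loss and for the final gap between validation and training loss. The loss values are made up for illustration, and the helper `summarize_history` is hypothetical. In the script, the lists would come from `fitData.history`.

```python
import numpy

def summarize_history(train_loss, valid_loss):
    """Locate the validation minimum and the final train/validation gap."""
    valid_loss = numpy.asarray(valid_loss)
    best_epoch = int(valid_loss.argmin()) + 1          # epochs are 1-based
    final_gap = float(valid_loss[-1] - train_loss[-1])
    return best_epoch, final_gap

# Made-up curves: training loss keeps falling, validation loss turns upwards
train_loss = [0.70, 0.50, 0.35, 0.25, 0.18, 0.12]
valid_loss = [0.72, 0.55, 0.45, 0.43, 0.47, 0.55]
best_epoch, final_gap = summarize_history(train_loss, valid_loss)
print("best epoch:", best_epoch, " final gap: %.2f" % final_gap)
```

A best epoch well before the last one, together with a growing gap, is the typical signature of overfitting. Two curves that are still falling together at the last epoch suggest underfitting or too few epochs.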