# This is a Modified Copy of the Basic Example with Most Cells Merged

This notebook is a modified copy of the basic example of a single-layer and multilayer perceptron for classification and regression.

It lets you test different scenarios easily by changing a few global parameters.

**TO SIMPLIFY, ONLY CLASSIFICATION PROBLEMS ARE TESTED**

**TO SIMPLIFY, MOST CELLS ARE MERGED**

**This script has been tested with the following package versions:**
- pandas 1.3.3
- sklearn 0.24.0
- keras 2.2.4 + tensorflow 1.14.0 / keras 2.6.0 + tensorflow 2.6.0

**If needed, you can create and activate a conda environment as follows:**
- conda create --name masternn python=3.9
- conda activate masternn
- conda install jupyter matplotlib pandas
- pip install scikit-learn keras==2.6.0 tensorflow==2.6.0
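
To compare the active environment with the versions listed above, a small check along these lines may help. This is only a sketch: the helper name `installed_versions` is hypothetical, and the package names are the PyPI distribution names (`scikit-learn` rather than `sklearn`).

```python
from importlib import metadata

# Distribution names as published on PyPI (note: scikit-learn, not sklearn)
PACKAGES = ["pandas", "scikit-learn", "keras", "tensorflow"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print("%-15s %s" % (name, version or "not installed"))
```

`importlib.metadata` is part of the standard library from Python 3.8, so it works in the Python 3.9 environment created above.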

In [None]:
import DataFunctions                ### Functions for data management
import numpy                        ### Library for numerical computations
import keras, tensorflow, sklearn   ### Libraries for constructing and training the models
import matplotlib.pyplot as plt     ### Library for plotting
import copy                         ### Allows copy and deepcopy
print(keras.__version__)
print(tensorflow.__version__)

In [None]:

### Main function

def mainFunction(params):

    ### First we read inputs and labels
    x, y = DataFunctions.loadDatasetsFromFiles (params.INPUTSFILENAME, params.LABELSFILENAME)
    y    = y.ravel()   ### sklearn prefers shape (N,) to (N,1)

    ### Maybe we want to add random features
    if params.NRANDOM_FEATURES != 0:
        xx = numpy.random.rand(x.shape[0],params.NRANDOM_FEATURES)
        x  = numpy.hstack([x, xx])

    ### Maybe we want to add random examples
    if params.NRANDOM_EXAMPLES != 0:
        xx = numpy.random.rand(params.NRANDOM_EXAMPLES,x.shape[1])
        x  = numpy.vstack([x, xx])
        yy = numpy.random.randint(numpy.min(y),numpy.max(y)+1,params.NRANDOM_EXAMPLES)
        y  = numpy.hstack([y, yy])

    ### Maybe we want to add noise to the data
    if params.ADD_NOISE_INPUTS:
        x += numpy.random.randn(x.shape[0],x.shape[1])

    ### Maybe we want to shuffle the labels
    if params.SHUFFLE_LABELS:
        numpy.random.shuffle(y)   ### in-place shuffle (the standard 'random' module is not imported)

    nExamples = x.shape[0]
    nFeatures = x.shape[1]
    nClasses  = len(numpy.unique(y))                   ### only for CLASSIFICATION

    ### Convert labels to a 1-of-C (one-hot) scheme
    ## For neural networks, it is easier to output yes/no than (for example) an integer with the predicted class
    y1C = DataFunctions.convertLabels_1ofC_Scheme (y)  ### only for CLASSIFICATION
    
    ### Scale inputs
    if   params.SCALE_INPUTS_FUNCTION == "M0SD1":
        x, Scaler = DataFunctions.scaleDataMean0Dev1Scaler (x)
    elif params.SCALE_INPUTS_FUNCTION == "MinMax":
        x, Scaler = DataFunctions.scaleDataMinMaxScaler (x, FeatureRange=(-1,+1))
    #print("First 3 rows of x:"); print(x[0:3,])

    ### Split data into training (to construct the model) and test (to estimate the generalization)
    from sklearn import model_selection
    x_train, x_test, y_train, y_test = \
      model_selection.train_test_split (x, y, train_size=params.TRAIN_SIZE_PCT_SPLIT, shuffle=True, stratify=y)
    y1C_train = DataFunctions.convertLabels_1ofC_Scheme (y_train)  ### only for CLASSIFICATION
    y1C_test  = DataFunctions.convertLabels_1ofC_Scheme (y_test)   ### only for CLASSIFICATION

    ###
    ### https://keras.io/api/models/sequential/#sequential-class
    ###

    ### First we indicate that it is a sequential model
    myNetwork = keras.Sequential()

    ### We need to indicate the input dimension in the first layer
    inputDimension  = nFeatures
    outputDimension = nClasses    ### only for CLASSIFICATION

    if params.MULTILAYER_PERCEPTRON:

        ### Now we add the hidden layers
        myNetwork.add ( keras.layers.Dense (params.NHIDDEN1, activation=params.FACTIVATION_HIDDEN1, input_dim=inputDimension) )
        #myNetwork.add ( keras.layers.Dense (params.NHIDDEN1, activation=params.FACTIVATION_HIDDEN1, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01), input_dim=nFeatures) )
        if params.NHIDDEN2 != 0:
            myNetwork.add ( keras.layers.Dense (params.NHIDDEN2, activation=params.FACTIVATION_HIDDEN2) )
            #myNetwork.add ( keras.layers.Dense (params.NHIDDEN2, activation=params.FACTIVATION_HIDDEN2, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
        if params.NHIDDEN3 != 0:
            myNetwork.add ( keras.layers.Dense (params.NHIDDEN3, activation=params.FACTIVATION_HIDDEN3) )
            #myNetwork.add ( keras.layers.Dense (params.NHIDDEN3, activation=params.FACTIVATION_HIDDEN3, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
        if params.NHIDDEN4 != 0:
            myNetwork.add ( keras.layers.Dense (params.NHIDDEN4, activation=params.FACTIVATION_HIDDEN4) )
            #myNetwork.add ( keras.layers.Dense (params.NHIDDEN4, activation=params.FACTIVATION_HIDDEN4, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )
        if params.NHIDDEN5 != 0:
            myNetwork.add ( keras.layers.Dense (params.NHIDDEN5, activation=params.FACTIVATION_HIDDEN5) )
            #myNetwork.add ( keras.layers.Dense (params.NHIDDEN5, activation=params.FACTIVATION_HIDDEN5, kernel_regularizer=keras.regularizers.l2(0.01), bias_regularizer=keras.regularizers.l2(0.01)) )

        ### And finally we add the output layer
        myNetwork.add ( keras.layers.Dense (outputDimension, activation=params.FACTIVATION_OUTPUT) )

    else:

        ### We only have an output layer
        myNetwork.add ( keras.layers.Dense (outputDimension, activation=params.FACTIVATION_OUTPUT, input_dim=inputDimension) )

    ### Print a summary of the network (summary() prints by itself and returns None)
    myNetwork.summary()

    ### Now we create a keras model
    myInput = keras.layers.Input (shape=(nFeatures,))
    myModel = keras.models.Model (inputs=myInput, outputs=myNetwork(myInput))

    ### Loss function
    ## Usual loss functions: 'categorical_crossentropy' 'binary_crossentropy' 'mean_squared_error', etc
    lossFunction = ['categorical_crossentropy']   ### only for CLASSIFICATION (for one-hot labels, use categorical_crossentropy)

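    ### Choose the optimizers module according to the keras version
    ### (compare numerically: as strings, "2.10.0" would sort before "2.3.0")
    kerasVersion = tuple(int(v) for v in keras.__version__.split(".")[:2])
    if kerasVersion < (2, 3):
        optimizers = keras.optimizers
    else:
        optimizers = tensorflow.keras.optimizers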

    ### Training algorithm
    ## Every training algorithm will have its own parameters
    if   params.TRAINING_ALGORITHM == "SGD":
        trainAlgorithm = optimizers.SGD (lr=params.LEARNING_RATE, momentum=params.MOMENTUM_RATE)  # There are more parameters
    elif params.TRAINING_ALGORITHM == "RMSprop":
        trainAlgorithm = optimizers.RMSprop (lr=params.LEARNING_RATE)                             # There are more parameters
    elif params.TRAINING_ALGORITHM == "Adam":
        trainAlgorithm = optimizers.Adam (lr=params.LEARNING_RATE)                                # There are more parameters
    else:
        raise ValueError("Unknown TRAINING_ALGORITHM: %s" % params.TRAINING_ALGORITHM)

    ### Metrics to monitor
    ## Keras allows monitoring several metrics during training
    showMetrics = ['categorical_accuracy', 'categorical_crossentropy', 'mean_squared_error']

    ### Compile the model with all the elements (this is the standard way to work in keras)
    myModel.compile (loss=lossFunction, optimizer=trainAlgorithm, metrics=showMetrics)

    ###
    ### This method has many parameters:
    ###   https://keras.io/api/models/model_training_apis/#fit-method
    ###

    validationData = (x_test,y1C_test)  ### We could also use the validation_split parameter
    fitData = myModel.fit \
      (x_train, y1C_train, validation_data=validationData, batch_size=params.BATCHSIZE, epochs=params.NEPOCHS)

    return myModel, fitData, x_train, y1C_train, x_test, y1C_test
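
`mainFunction` relies on `DataFunctions.convertLabels_1ofC_Scheme`, whose source is not shown in this notebook. As a rough illustration of what a 1-of-C (one-hot) encoding looks like, here is a minimal NumPy sketch. The function name `one_hot` is hypothetical, and the actual helper may differ in its details (for example, in how it orders the classes).

```python
import numpy

def one_hot(labels):
    """Encode a 1-D array of class labels as an (N, C) matrix of 0s and 1s."""
    labels = numpy.asarray(labels).ravel()
    # classes: sorted unique labels; indices: position of each label in classes
    classes, indices = numpy.unique(labels, return_inverse=True)
    encoded = numpy.zeros((labels.shape[0], classes.shape[0]))
    encoded[numpy.arange(labels.shape[0]), indices] = 1.0
    return encoded, classes

y = numpy.array([0, 1, 1, 0, 2])
y1C, classes = one_hot(y)
print(y1C)

# The original labels are recovered with an argmax over the columns
print(classes[numpy.argmax(y1C, axis=1)])
```

Each row has a single 1 in the column of its class, which is the target format expected by `categorical_crossentropy` with a softmax output layer.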


In [None]:

### This class is only used to set default parameters

class defaultParameters(object):

    def __init__(self):

        self.MULTILAYER_PERCEPTRON = True      ### If False, it is a Single-layer Perceptron

        self.NRANDOM_EXAMPLES      = 0         ### Number of random examples to add
        self.NRANDOM_FEATURES      = 0         ### Number of random features to add
        self.ADD_NOISE_INPUTS      = False     ### If True, add standard Gaussian noise to the inputs
        self.SHUFFLE_LABELS        = False     ### If True, shuffle the labels (without shuffling the inputs)
        self.SCALE_INPUTS_FUNCTION = "M0SD1"   ### "M0SD1": Mean 0 StdDev 1 / "MinMax": Values in an interval / None

        self.TRAIN_SIZE_PCT_SPLIT  = 0.70      ### Fraction of data used for training (it must be in (0,1); sklearn rejects 1.0)

        self.NHIDDEN1 = 100;    self.FACTIVATION_HIDDEN1 = 'tanh'
        self.NHIDDEN2 = 50;     self.FACTIVATION_HIDDEN2 = 'tanh'
        self.NHIDDEN3 = 0;      self.FACTIVATION_HIDDEN3 = 'tanh'
        self.NHIDDEN4 = 0;      self.FACTIVATION_HIDDEN4 = 'tanh'
        self.NHIDDEN5 = 0;      self.FACTIVATION_HIDDEN5 = 'tanh'
        self.FACTIVATION_OUTPUT = 'softmax'    ### only for CLASSIFICATION

        self.TRAINING_ALGORITHM = "SGD"        ### "SGD", "RMSprop", "Adam"

        self.LEARNING_RATE      = 0.001        ### (almost) ALL training algorithms have a learning rate
        self.MOMENTUM_RATE      = 0.80         ### Maybe not needed in some training algorithms

        self.BATCHSIZE          = 20           ### Mini-batch size
        self.NEPOCHS            = 200          ### Number of training epochs (full passes over the training set)

        self.INPUTSFILENAME = 'Data/ionosphere.inputs'
        self.LABELSFILENAME = 'Data/ionosphere.labels'
        #self.INPUTSFILENAME = 'Data/hepatitis.inputs'
        #self.LABELSFILENAME = 'Data/hepatitis.labels'
        #self.INPUTSFILENAME = 'Data/sonar.inputs'
        #self.LABELSFILENAME = 'Data/sonar.labels'
        #self.INPUTSFILENAME = 'Data/xor.inputs'
        #self.LABELSFILENAME = 'Data/xor.labels'


In [None]:
def showResults1(myModel, fitData, x_train, y1C_train, x_test, y1C_test):
    
    ### Training history
    lossTrain = fitData.history["loss"]
    lossValid = fitData.history["val_loss"]

    ### Plot
    plt.figure(figsize=(7,5))
    epochsPlot = range(1,len(lossTrain)+1)
    plt.plot(epochsPlot,lossTrain,label='Training Loss')
    plt.plot(epochsPlot,lossValid,label='Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

    ### Loss function and accuracy
    scoresTrain = myModel.evaluate (x_train,y1C_train)
    scoresTest  = myModel.evaluate (x_test,y1C_test)
    print("Loss function value and Accuracy training set: %.5f  %7.3f%%" % (scoresTrain[0], 100*scoresTrain[1])) 
    print("Loss function value and Accuracy test set:     %.5f  %7.3f%%" % (scoresTest[0],  100*scoresTest[1]))

def showResults2(myModel1, fitData1, x_train1, y1C_train1, x_test1, y1C_test1,
                 myModel2, fitData2, x_train2, y1C_train2, x_test2, y1C_test2):

    ### Loss function and accuracy 1
    scoresTrain1 = myModel1.evaluate (x_train1,y1C_train1)
    scoresTest1  = myModel1.evaluate (x_test1,y1C_test1)

    ### Training history 1
    lossTrain1 = fitData1.history["loss"]
    lossValid1 = fitData1.history["val_loss"]

    ### Loss function and accuracy 2
    scoresTrain2 = myModel2.evaluate (x_train2,y1C_train2)
    scoresTest2  = myModel2.evaluate (x_test2,y1C_test2)
    
    ### Training history 2
    lossTrain2 = fitData2.history["loss"]
    lossValid2 = fitData2.history["val_loss"]

    ### Plot
    plt.figure(figsize=(15,5))
    
    ### Plot 1
    plt.subplot(1,2,1)
    epochsPlot1 = range(1,len(lossTrain1)+1)
    plt.plot(epochsPlot1,lossTrain1,label='Training Loss 1')
    plt.plot(epochsPlot1,lossValid1,label='Validation Loss 1')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    #plt.show()
    
    ### Plot 2
    plt.subplot(1,2,2)
    epochsPlot2 = range(1,len(lossTrain2)+1)
    plt.plot(epochsPlot2,lossTrain2,label='Training Loss 2')
    plt.plot(epochsPlot2,lossValid2,label='Validation Loss 2')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    plt.show()
    
    ### Print accuracies
    print("   Loss function value and Accuracy")
    print()
    print("        Training set 1: %.5f  %7.3f%%                      Training set 2: %.5f  %7.3f%%" %
          (scoresTrain1[0], 100*scoresTrain1[1], scoresTrain2[0], 100*scoresTrain2[1]))
    print("        Test set 1:     %.5f  %7.3f%%                      Test set 2:     %.5f  %7.3f%%" %
          (scoresTest1[0],  100*scoresTest1[1], scoresTest2[0],  100*scoresTest2[1]))
    
    

## First run with the default parameters

In [None]:
### Set the values of the default parameters
params1 = defaultParameters()

### Run main function
myModel1, fitData1, x_train1, y1C_train1, x_test1, y1C_test1 = mainFunction(params1)

In [None]:
### Plot results
showResults1(myModel1, fitData1, x_train1, y1C_train1, x_test1, y1C_test1)

## Now we change the default parameters and compare
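
The comparison works by copying the parameter object and changing one attribute of the copy. A shallow copy is enough as long as every attribute is a plain number, string or boolean, as in `defaultParameters`. The sketch below uses a hypothetical stand-in class, `Params`, with an extra list attribute (not present in the original class) to show when `copy.deepcopy` would be needed instead.

```python
import copy

class Params:
    """Hypothetical stand-in for defaultParameters, reduced to two fields."""
    def __init__(self):
        self.LEARNING_RATE = 0.001
        self.HIDDEN_SIZES = [100, 50]   # mutable attribute, for illustration only

base = Params()

shallow = copy.copy(base)
shallow.LEARNING_RATE = 0.1        # rebinding an attribute: base is unaffected
shallow.HIDDEN_SIZES.append(25)    # in-place change of a shared list: base sees it

deep = copy.deepcopy(base)
deep.HIDDEN_SIZES.append(10)       # the deep copy owns its list: base is unaffected

print(base.LEARNING_RATE, base.HIDDEN_SIZES)
print(deep.HIDDEN_SIZES)
```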

In [None]:
### Copy the parameters of the first run and change the learning rate
params2 = copy.copy(params1)
params2.LEARNING_RATE = 0.1

### Run main function
myModel2, fitData2, x_train2, y1C_train2, x_test2, y1C_test2 = mainFunction(params2)

In [None]:
showResults2(myModel1, fitData1, x_train1, y1C_train1, x_test1, y1C_test1,
             myModel2, fitData2, x_train2, y1C_train2, x_test2, y1C_test2)

# This is a list of things that you can test:
- Check that the results can be very different:
  - For different runs with the same parameters (non-linear networks)
  - By changing the value of several data parameters:
    - TRAIN\_SIZE\_PCT\_SPLIT: 0.70,0.20,0.01 (ionosphere, MLP-100-50, SGD, LR = 0.001)
    - SCALE\_INPUTS\_FUNCTION (hepatitis, linear or non-linear networks)
  - By changing the value of several critical training parameters (ionosphere, MLP-100-50):
    - TRAINING\_ALGORITHM: SGD, RMSprop, Adam
    - LEARNING\_RATE: 0.1,0.01,0.001,0.0001
    - MOMENTUM\_RATE: 0.80, 0.00 (SGD, LR = 0.001)
    - FACTIVATION\_HIDDEN1 and FACTIVATION\_HIDDEN2: tanh, relu (SGD, LR = 0.001)
    - FACTIVATION\_OUTPUT: softmax, linear
- Underfitting and overfitting:
  - Underfitting:
    - Train a linear model and compare with a non-linear one (xor)
    - Train a non-linear model with a small LR and few epochs (ionosphere - MLP-100-50, SGD, LR <= 0.0001, EPOCHS = 200)
  - Overfitting:
    - Train a non-linear model with a large LR and few epochs (ionosphere - MLP-100-50, SGD, LR >= 0.01, EPOCHS = 200)
    - Train a non-linear model with a small LR and many epochs (ionosphere - MLP-100-50, SGD, LR <= 0.0001, EPOCHS = 5000)
    - Train a non-linear model with many hidden layers (sonar - MLP-20 vs MLP-20-20-20-20-20, SGD, LR = 0.002, EPOCHS = 200)
    - Curse of Dimensionality: small value for TRAIN\_SIZE\_PCT\_SPLIT (ionosphere - MLP-100-50, SGD, LR = 0.01, EPOCHS = 200)
- The effect of the noise in the data:
  - Add random inputs and labels: NRANDOM\_EXAMPLES = 1000 (ionosphere - MLP-100-50, SGD, LR = 0.001, EPOCHS = 200)
  - Add random features: NRANDOM\_FEATURES = 1000,10000 (ionosphere - MLP-100-50, SGD, LR = 0.001, EPOCHS = 200)
  - Add noise to the inputs: ADD\_NOISE\_INPUTS = True
  - Shuffle the labels: SHUFFLE\_LABELS = True
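
Many of the tests above amount to trying several values of one or more parameters. One possible way to generate the variants is sketched below. The helper `parameter_grid` and the class `DemoParams` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import copy
import itertools

def parameter_grid(baseParams, **choices):
    """Yield (settings, params) pairs, one per combination of the given values."""
    names = list(choices)
    for values in itertools.product(*(choices[name] for name in names)):
        params = copy.copy(baseParams)          # leave the base object untouched
        settings = dict(zip(names, values))
        for name, value in settings.items():
            if not hasattr(params, name):       # catch misspelled parameter names
                raise AttributeError("Unknown parameter: " + name)
            setattr(params, name, value)
        yield settings, params

class DemoParams:
    """Hypothetical stand-in for defaultParameters."""
    def __init__(self):
        self.TRAINING_ALGORITHM = "SGD"
        self.LEARNING_RATE = 0.001

grid = list(parameter_grid(DemoParams(),
                           TRAINING_ALGORITHM=["SGD", "Adam"],
                           LEARNING_RATE=[0.1, 0.01, 0.001]))
for settings, params in grid:
    print(settings)
```

In the notebook, `DemoParams()` would be replaced by `defaultParameters()` and `mainFunction(params)` would be called inside the loop. Since results can differ between runs with the same parameters, it is probably worth repeating each setting a few times before drawing conclusions.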
