# Training Multilayer Perceptrons

The aim of this tutorial is to learn how to train multilayer perceptrons in Julia. To do so, we will make use of the `Flux` library, whose documentation can be consulted at https://fluxml.ai/Flux.jl/.

If the package has not been installed previously, it has to be installed on the system by executing the following commands:

In [1]:
using Pkg; Pkg.add("Flux")

    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
   Installed GPUArraysCore ──────── v0.1.2
   Installed IRTools ────────────── v0.4.6
   Installed RealDot ────────────── v0.1.0
   Installed Adapt ──────────────── v3.4.0
   Installed IrrationalConstants ── v0.1.1
   Installed DiffRules ──────────── v1.11.1
   Installed Functors ───────────── v0.3.0
   Installed ShowCases ──────────── v0.1.0
   Installed DiffResults ────────── v1.1.0
   Installed ProgressLogging ────── v0.1.4
   Installed Flux ───────────────── v0.13.6
   Installed SpecialFunctions ───── v2.1.7
   Installed IfElse ─────────────── v0.1.1
   Installed NNlib ──────────────── v0.8.9

Flux is a library which provides a set of functions to create neural networks with an arbitrary number of layers. This is a library designed to develop Deep Learning projects, whose ANNs usually have a large number of layers of different types, for example, convolutional or maxpooling layers.  In these exercises, only multilayer perceptrons will be developed, with fully-connected (dense) layers, with a maximum of two hidden layers. 

In order to implement an ANN in Julia, there is a function called `Chain`. This function receives as parameters the layers that the network will have (excluding the input layer, which does not perform any processing), which can be of different types. Therefore, it is a function with a variable number of parameters.

Depending on the type of layer desired, there are different functions to create each one of them. A couple of examples are the functions `Conv`, which creates convolutional layers, and `MaxPool`, which creates max-pooling layers. These layers are used in more advanced models that will be seen in other subjects of the programme and that are beyond the syllabus of this subject.  In this subject, as only multilayer perceptrons will be covered, the layers will always be fully connected and created with the function `Dense`. This function accepts as parameters the number of inputs, the number of outputs, and the transfer function of the neurons in the layer. In this sense, two different cases can be distinguished when creating ANNs, depending on the problem to be solved:

- ***Regression problems***. In this type of problem, the output layer usually has a linear transfer function, while the hidden layers have a non-linear transfer function.  In the following examples, a sigmoidal transfer function is used as the transfer function in the hidden layers; for further information on other supported transfer functions, please refer to the library documentation.  

  In the first example, an ANN is implemented with 10 inputs, a hidden layer with 5 neurons, and an output layer of 1 neuron.

In [2]:
using Flux;

ann = Chain(    
    Dense(10, 5, σ),    
    Dense(5, 1, identity) )

Chain(
  Dense(10 => 5, σ),                    # 55 parameters
  Dense(5 => 1),                        # 6 parameters
)                   # Total: 4 arrays, 61 parameters, 500 bytes.

  The second example builds an ANN with 15 inputs, two hidden layers with 12 and 5 neurons, and an output layer with 2 neurons.

In [3]:
ann = Chain(    
    Dense(15, 12, σ),    
    Dense(12, 5, σ),    
    Dense(5, 2, identity) )

Chain(
  Dense(15 => 12, σ),                   # 192 parameters
  Dense(12 => 5, σ),                    # 65 parameters
  Dense(5 => 2),                        # 12 parameters
)                   # Total: 6 arrays, 269 parameters, 1.426 KiB.

**Warning:** Be aware that the sizes of consecutive layers have to match: the number of outputs of each layer must be equal to the number of inputs of the next one.

### Question
What would happen if all the layers of the ANN have a linear transfer function?

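`If every layer uses a linear transfer function, the whole ANN collapses into a single linear mapping of its inputs, because a composition of linear functions is itself linear. The network would then only be able to represent linear relationships (i.e. straight lines or hyperplanes), no matter how many layers it has.`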
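The answer above can be checked numerically. The following is a minimal sketch in plain Julia (no Flux), where the weight matrices `W1`, `W2` and biases `b1`, `b2` are random values chosen only for illustration: two stacked layers with identity transfer functions produce exactly the same outputs as a single layer with weights `W2 * W1` and bias `W2 * b1 .+ b2`.

```julia
using Random

Random.seed!(0)  # fixed seed so the check is reproducible

# Two "layers" with identity (linear) transfer functions: 4 inputs -> 3 hidden -> 2 outputs.
W1, b1 = randn(3, 4), randn(3)
W2, b2 = randn(2, 3), randn(2)
twoLayers(x) = W2 * (W1 * x .+ b1) .+ b2

# The same mapping collapsed into a single linear layer.
W, b = W2 * W1, W2 * b1 .+ b2
oneLayer(x) = W * x .+ b

x = randn(4, 10)                          # 10 patterns, one per column
sameOutput = twoLayers(x) ≈ oneLayer(x)   # both networks compute the same function
```

This is why at least the hidden layers need a non-linear transfer function such as `σ`.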

- ***Classification problems***.  In this case, as explained in theory class, two different situations are considered, depending on whether there are two classes or more than two classes (not belonging to any class is treated as one more class):
1. When there are only two classes, they are usually referred to as positives and negatives. In this case, the desired outputs will be either 1 or 0 and, therefore, a single output neuron is used.  This neuron is expected to return values between 0 and 1, which are interpreted as the degree of certainty the system has that the output is positive. To ensure that it returns a value bounded between 0 and 1, a sigmoidal function is applied in the output layer. In the following example, an ANN is defined with a hidden layer and an output neuron; the sigmoid function has also been used in the hidden layer, but this could be changed.

In [4]:
ann = Chain(    
    Dense(8, 4, σ),    
    Dense(4, 1, σ) )

Chain(
  Dense(8 => 4, σ),                     # 36 parameters
  Dense(4 => 1, σ),                     # 5 parameters
)                   # Total: 4 arrays, 41 parameters, 420 bytes.

2. With more than two classes, you have one output neuron per class. The desired output of a pattern is 1 for the neuron of the class it belongs to, and 0 for the rest. This kind of encoding is known as one-hot-encoding.  In this way, the output of a neuron for a pattern can be interpreted as the degree of certainty that the pattern belongs to the class corresponding to that neuron.  Unlike the previous case, a sigmoidal transfer function is not applied to the outputs of each neuron in the output layer to bound the output between 0 and 1, but no function (identity function) is applied.  Instead, a `softmax` function is applied to the outputs of all the neurons, which takes unbounded numeric values and returns numeric values between 0 and 1 such that the sum of all values is 1.  Even though this function does not constitute a layer of neurons, sometimes this last `softmax` function is considered as an additional layer, and in fact, from the point of view of programming with the `Flux` library, it is effectively performed as if it were a last layer, as can be seen in this example:

In [5]:
ann = Chain(    
    Dense(9, 5, σ),    
    Dense(5, 3, identity),    
    softmax )

Chain(
  Dense(9 => 5, σ),                     # 50 parameters
  Dense(5 => 3),                        # 18 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 68 parameters, 528 bytes.
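To make the behaviour of `softmax` concrete, here is a hand-written sketch of a column-wise softmax in plain Julia. The name `manualSoftmax` is hypothetical and this is not the Flux/NNlib implementation; it only illustrates that each column of unbounded values is mapped to values between 0 and 1 that add up to 1.

```julia
# Column-wise softmax written by hand (hypothetical helper, for illustration only).
function manualSoftmax(z::AbstractMatrix{<:Real})
    e = exp.(z .- maximum(z, dims=1))  # subtracting the column maximum avoids overflow
    return e ./ sum(e, dims=1)         # each column now adds up to 1
end

# Unbounded outputs of 3 output neurons (rows) for 2 patterns (columns).
z = [2.0 -1.0;
     1.0  0.0;
     0.1  3.0]

p = manualSoftmax(z)
columnSums = sum(p, dims=1)
```

The largest value in each column of `z` keeps the largest probability in `p`, so the predicted class does not change; only the scale does.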

Another way to use the `Chain` function is through successive calls, adding layers to an already created network.  To do this, the splat operator (`...`) is used when passing the arguments to the function. In the following example, a similar ANN is built by first creating an empty ANN and then adding layers to it one at a time:

In [6]:
ann = Chain(); 
ann = Chain(ann...,  Dense(4, 8, σ) );
ann = Chain(ann...,  Dense(8, 3, identity) ); 
ann = Chain(ann...,  softmax )

Chain(
  Dense(4 => 8, σ),                     # 40 parameters
  Dense(8 => 3),                        # 27 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 67 parameters, 524 bytes.

This variable, obtained in either form, can be used as a function.  For example, the matrix inputs created in the previous tutorial can be taken and passed to the ANN resulting in the outputs of the network by simply writing the following code:

In [7]:
import Pkg; Pkg.add("DelimitedFiles");
using DelimitedFiles 

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.8/Project.toml`
  No Changes to `~/.julia/environments/v1.8/Manifest.toml`


In [71]:
dataset = readdlm("iris.data",',');

# Split inputs and transform into floats
inputs = dataset[:,1:4]
inputs = convert(Array{Float32,2}, inputs); 

# Split outputs and one hot encode
targets = dataset[:,5]
function encode_categories(targets)
    if (length(unique(targets)) > 2)
        cats = unique(targets) .== permutedims(targets)
        return cats'
    else
        cats = targets .== unique(targets)[1]
        return cats
    end
end

targets = encode_categories(targets)

150×3 adjoint(::BitMatrix) with eltype Bool:
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 ⋮     
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1

In [9]:
outputs = ann(inputs')

3×150 Matrix{Float32}:
 0.149642   0.148868   0.15154    …  0.20533    0.209915   0.206341
 0.767079   0.767524   0.763899      0.722156   0.71636    0.719564
 0.0832786  0.0836083  0.0845608     0.0725139  0.0737245  0.0740947

***Warning***: due to the formulation used in the world of ANNs, the input and output matrices for the ANN have each pattern in each column, not in each row. Therefore, each row of the input matrix will represent one of the input attributes, while in the output matrix, each row will correspond to one of the desired outputs of the ANN. Consequently, a transpose of the matrices created in the previous practice is required.
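The reason for this convention can be seen with a small sketch in plain Julia. The data, weights and biases below are random, hypothetical values: a dense layer with 4 inputs and 3 neurons stores a 3×4 weight matrix, so the product with the input matrix is only defined when the inputs have 4 rows, that is, one pattern per column.

```julia
# Hypothetical toy data: 5 patterns in rows, 4 attributes in columns.
inputsByRows = rand(Float32, 5, 4)

# A dense layer with 4 inputs and 3 neurons holds a 3×4 weight matrix and 3 biases
# (random values here, for illustration only).
W = rand(Float32, 3, 4)
b = rand(Float32, 3)

# W * X is only defined when X has 4 rows, so the patterns must be in columns.
inputsByColumns = inputsByRows'          # 4×5
layerOutputs = W * inputsByColumns .+ b  # 3×5: one column of outputs per pattern
```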

Thus, even if the ANN has not been trained yet, it may be interesting to make a call such as `ann(inputs')` to verify that the ANN has been created correctly. Once this has been verified, it is time to train it.

To train an ANN, following the workflow described in the theory class, the patterns are presented to the network, then the output is compared with the desired output, and finally a loss value is calculated.  This loss value will be used to modify the weights of the connections and the biases. Therefore, a key point is to define this loss function, which will be different for regression and classification problems. The Flux library includes the `Losses` module with a large number of loss functions used to train ANNs. In this subject only the most common ones will be used and, therefore, the first step is to load this module with:

In [10]:
using Flux.Losses

For any kind of problem, the usage of the `loss` functions is the same. The first argument is the outputs of the model, the second argument is the desired outputs (in both cases with a pattern in each column), and the third argument is the optional keyword `agg`, which indicates how to aggregate the loss values for each pattern. If no value is specified for this keyword, by default an average of all loss values for all patterns will be performed.
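As an illustration of the `agg` keyword, the sketch below calls `Flux.Losses.mse` on a tiny, made-up pair of output and target matrices (the variable names are hypothetical). It compares the default aggregation with `agg = sum` and `agg = identity`, and checks the default against a hand-written mean squared error.

```julia
using Flux
using Statistics

# Toy outputs and targets: 1 output neuron (row) × 3 patterns (columns).
modelOutputs   = Float32[0.9 0.2 0.4]
desiredOutputs = Float32[1.0 0.0 1.0]

meanLoss   = Flux.Losses.mse(modelOutputs, desiredOutputs)                   # default: average
sumLoss    = Flux.Losses.mse(modelOutputs, desiredOutputs; agg = sum)        # sum of squared errors
perElement = Flux.Losses.mse(modelOutputs, desiredOutputs; agg = identity)   # no aggregation

# The default result matches a hand-written mean squared error.
manualLoss = mean(abs2, modelOutputs .- desiredOutputs)
```

Here the squared errors are 0.01, 0.04 and 0.36, so the summed loss is 0.41 and the averaged loss is about 0.137.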

* For a regression problem, the most commonly used loss function is the **Mean Square Error** (MSE) between the model outputs and the desired targets.  This MSE function is already defined in the Losses module of Flux, although it is very simple to define. It can be used as follows, where `x` are the model inputs and `y` are the targets:

In [11]:
loss(x, y) = Losses.mse(ann(x), y)

loss (generic function with 1 method)

  Other loss functions that may be of interest for regression problems are `Flux.Losses.mae` or `Flux.Losses.msle`.

* For a classification problem the loss function is different. As was pointed out during the theory sessions, the binary cross-entropy function is the common choice for a 2-class problem (only one output neuron).

In [12]:
loss(x, y) = Losses.binarycrossentropy(ann(x), y)

loss (generic function with 1 method)

  In contrast, the cross-entropy function is mainly used for problems with more than 2 classes (one output neuron per class).

In [13]:
loss(x, y) = Losses.crossentropy(ann(x), y)

loss (generic function with 1 method)
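For reference, the two classification losses can be written as follows, where $N$ is the number of patterns, $C$ the number of classes, $y$ the desired outputs and $\hat{y}$ the outputs of the ANN. This is the usual textbook formulation averaged over the patterns; it is assumed here that the library follows it, apart from a small constant that implementations typically add inside the logarithm for numerical stability.

$$
\text{BCE} = -\frac{1}{N}\sum_{n=1}^{N}\left[\,y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n)\,\right]
$$

$$
\text{CE} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} y_{c,n}\,\log \hat{y}_{c,n}
$$

With one-hot encoded targets, only the term of the true class survives in the inner sum. For example, if the ANN assigns a probability of 0.7 to the correct class of a pattern, that pattern contributes $-\ln 0.7 \approx 0.357$ to the loss.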

Although each of these functions could be declared in a different branch of an `if` statement, declaring the same function conditionally in this way is problematic in Julia. For this reason, the two declarations can be merged into one with the following line of code:

In [14]:
loss(x, y) = (size(y,1) == 1) ? Losses.binarycrossentropy(ann(x),y) : Losses.crossentropy(ann(x),y)

loss (generic function with 1 method)

### Question
As can be seen, the first thing that is checked is the number of rows of the desired output matrix (`y`). Why is the number of rows checked and not the number of columns?

`Because the output matrix y is transposed: each output of the classification problem is a row and each pattern is a column. Therefore, to find out whether this is a binary classification problem we need to check that the number of rows is 1 (a single output that takes the value 0 or 1), whereas in a multiclass problem the number of rows is greater than one (one-hot encoded outputs such as 001, 010, 100...). The number of columns only indicates how many patterns there are.`

It is important to remember that in both `x` (inputs) and `y` (targets) each pattern must be in a column, contrary to the usual practice. For this reason, as will be shown below, the matrices of inputs and desired outputs will be transposed.

These functions use the variable `ann`, which is used as a function as it was previously described.  Therefore, it needs to be defined within the environment in which the function is defined.

Once the loss function has been defined, it is necessary to indicate the optimizer to be used during training. The optimizer is nothing more than a specific implementation of one of the alternative backpropagation algorithms. Flux has a large number of those implementations, from the classical one based on gradient descent (`Descent`), optionally with momentum (`Momentum`), to more advanced ones: `ADAM`, `RADAM`, `AdaMax`, `ADAGrad`, `ADADelta`, `AMSGrad`, `RMSProp`, etc.  Nowadays, possibly the most widely used optimizer is ADAM, for which the learning rate has to be indicated. This value is usually small; more information can be found in the documentation of the library.
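To see why the learning rate is usually small, the following sketch applies plain gradient descent (not ADAM) to a one-dimensional toy loss, L(w) = (w - 3)^2. The function `descend` and the values used are hypothetical and only meant as an illustration: a small learning rate approaches the minimum at w = 3, whereas a large one makes every step overshoot and the weight diverges.

```julia
# Plain gradient descent on L(w) = (w - 3)^2, whose gradient is 2(w - 3).
# `descend` is a hypothetical helper written for this illustration.
function descend(w::Real, learningRate::Real, steps::Int)
    for _ in 1:steps
        w -= learningRate * 2 * (w - 3)   # move against the gradient
    end
    return w
end

wSmall = descend(0.0, 0.1, 100)   # small learning rate: ends very close to 3
wLarge = descend(0.0, 2.0, 10)    # large learning rate: moves further away at each step
```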

Thus, training **an epoch** of the previous ANN can be done with the `train!` function as follows, where `learningRate` is a variable defined beforehand:

In [15]:
learningRate = 0.01  # small value, within the usual range of 0.001 to 0.1

Flux.train!(loss, Flux.params(ann), [(inputs', targets')], ADAM(learningRate))

"Done"

**Important**. It is worth mentioning that in Julia, by convention, when the name of a function ends with `!` (bang), it is understood that it modifies the contents of one or more of its arguments, which have therefore been passed by reference.

That is the case of the function `train!`, which has four arguments:

1. The `loss` function, which has been previously defined
2. Weights and bias of the ANN. This can be achieved, as shown in the example, with the function `params`. 
3. A set of patterns, inputs and targets. As you can see in the example, an array with only one element is being passed. This element is a tuple with two elements: the arrays of inputs and desired outputs.  This way of passing the patterns, which may seem cumbersome, has its motivation, since when the set of patterns is very large, calculating the modifications to the weights with all the patterns can be very costly.  For this reason, the patterns are usually divided into batches so that each update is done with only one of these batches. If this were done, the array passed as a parameter, instead of having one tuple, would have several, one per batch.  However, in the exercises to be carried out in this subject, this will not be done, and all the patterns will be passed together. 
  * **Important**: As indicated above, these matrices of inputs and targets have each pattern in a column, contrary to what is usual.  For this reason, the input and target matrices passed as parameters are transposed, i.e. instead of passing `inputs` and `targets`, `inputs'` and `targets'` are passed. If the input and/or target matrices already had a pattern in each column, there would be no need to transpose the corresponding matrix.
  * **Important**: The matrices given in this parameter are used for training the ANN. Therefore, they have to be completely different from those used for testing.
4. Optimizer.  In this example, it is an ADAM with a learning rate equal to  `learningRate`, which usually takes values between 0.001 and 0.1. A common value is 0.01, although you should try different values until you find one that gives good results for the specific problem.
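Although batches will not be used in these exercises, the structure described in the third argument can be sketched as follows. The matrices are random, hypothetical data with one pattern per column, and `batchSize` is an arbitrary value; the result is an array with one `(inputs, targets)` tuple per batch, which is the shape of the third argument described above.

```julia
# Hypothetical data: 4 attributes × 10 patterns and 3 classes × 10 patterns (patterns in columns).
batchInputs  = rand(Float32, 4, 10)
batchTargets = rand(Bool, 3, 10)

batchSize = 4

# One tuple (inputs, targets) per batch; the last batch may be smaller.
batches = [(batchInputs[:, idx], batchTargets[:, idx])
           for idx in Iterators.partition(1:size(batchInputs, 2), batchSize)]

# Passing all patterns together, as done in this tutorial, is the special case of a single batch.
singleBatch = [(batchInputs, batchTargets)]
```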

In this way, only one training cycle (epoch) is performed.  Therefore, to train an ANN it is necessary to create a loop that executes this function as long as no stopping criterion is met. Some of the most common criteria are:

* The loss in training is good enough.
* The number of training cycles has reached a predefined maximum. 
* The change in training error is less than a predefined value. 
* etc.
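The criteria above can be combined in a single loop. The sketch below is written in plain Julia so that it runs without Flux: `trainWithStopping` is a hypothetical helper that receives two closures, one that runs a training cycle and one that returns the current training loss, and it is demonstrated on a toy problem (fitting `y = w*x` by gradient descent) rather than on an ANN.

```julia
# Generic training loop with the three stopping criteria listed above.
# `trainEpoch!` runs one training cycle and `currentLoss` returns the training loss;
# both are hypothetical closures supplied by the caller.
function trainWithStopping(trainEpoch!::Function, currentLoss::Function;
                           maxEpochs::Int=1000, minLoss::Real=0.0, minLossChange::Real=0.0)
    losses = [currentLoss()]                  # loss before training (cycle 0)
    while length(losses) - 1 < maxEpochs      # maximum number of training cycles
        trainEpoch!()
        push!(losses, currentLoss())
        losses[end] <= minLoss && break                              # loss is good enough
        abs(losses[end-1] - losses[end]) < minLossChange && break    # loss barely changes
    end
    return losses
end

# Toy usage without Flux: fit y = w*x by gradient descent on the mean squared error.
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = Ref(0.0)
toyLoss() = sum(abs2, w[] .* x .- y) / length(x)
toyEpoch!() = (w[] -= 0.05 * 2 * sum((w[] .* x .- y) .* x) / length(x))

losses = trainWithStopping(toyEpoch!, toyLoss; maxEpochs=200, minLoss=1e-6)
```

With Flux, `trainEpoch!` would wrap the `Flux.train!` call shown earlier and `currentLoss` would be something like `() -> loss(inputs', targets')`. Returning the whole vector of losses makes it easy to plot the evolution of the training afterwards.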

#### Question
Could a stopping criterion similar to the first one be defined, but using the test error? Why?

`No. A stopping criterion such as "the loss in test is good enough" would let the test set take part in the training process: the network would be tuned (overfitted) to those particular patterns, so the test error would no longer measure its generalisation capability.`

This way, it is possible to train an ANN so that the error or loss in the training set is minimised.   However, the training of ANNs is not deterministic, but has a random component, which is the random initialisation of the weights.  When this happens, to minimise the random component, the ANN is created and trained several times and the results are averaged.  If you want to train several ANNs, it will be necessary to nest two loops, where the outer loop will iterate through the different networks, and the inner loop will execute the different training cycles of each network.

#### Question
Where in the code (outside both loops, inside the first loop or inside the second loop) will it be necessary to put the call to the Chain function to create each network? Why shouldn't it be put elsewhere?

`It would be inside the first loop in order to initialize the networks with different weights each time, as the second one is devoted to training each network. It cannot be called outside both loops, as that would only create a network once.`

Through this process, the weights and biases, starting from initially random values, will take on different values until one of the stop criteria is met. 

In this sense, it is necessary to bear in mind that the weights of the connections coming from inputs with a very high absolute value will take on a low absolute value; on the other hand, the weights of the connections coming from inputs with a low absolute value will take on a high absolute value.  In this way, the ANN is able to combine inputs that are in very different ranges, converting them to values of similar scale. In other words, the ANN is able to "learn" the relationship between the different scales on which the inputs move.

When dealing with classification problems, the value of the loss is often not easy to interpret.  For classification problems there are different metrics that will be the focus of later sessions.  For now, to assess the quality of the outputs of the ANN, we will only use the classification accuracy, defined as the ratio of well-classified patterns (number of well-classified patterns divided by the total number of patterns). Two different cases can be identified:
1. When there are two classes, the ANN has a single output neuron. As described above, a sigmoidal function, which returns a value between 0 and 1, is typically used as a transfer function. By simply applying a threshold (usually 0.5), the pattern can be classified as "positive" or "negative" depending on whether the output is greater or less than the threshold respectively.
2. When there are more than two classes, applying the softmax function will result in different values of certainty or probability of belonging to each class.  Therefore, a pattern is classified as the class with the highest probability.
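Both cases can be sketched in a few lines of plain Julia. The function names `binaryAccuracy` and `multiclassAccuracy` and the example matrices are hypothetical; as in the rest of this tutorial, outputs and targets are assumed to have one pattern per column.

```julia
using Statistics

# Binary case: a single output row, compared against a threshold.
binaryAccuracy(outputs::AbstractMatrix{<:Real}, targets::AbstractMatrix{Bool}; threshold=0.5) =
    mean((outputs .>= threshold) .== targets)

# Multiclass case: each pattern is assigned to the class (row) with the highest output.
function multiclassAccuracy(outputs::AbstractMatrix{<:Real}, targets::AbstractMatrix{Bool})
    predicted = [argmax(col) for col in eachcol(outputs)]
    actual    = [argmax(col) for col in eachcol(targets)]
    return mean(predicted .== actual)
end

# Toy examples: 4 patterns with one output, and 3 patterns with 3 classes.
binOutputs = [0.9 0.3 0.6 0.2]
binTargets = [true false false false]
binAcc = binaryAccuracy(binOutputs, binTargets)          # 3 of 4 patterns are correct

multiOutputs = [0.7 0.1 0.2;
                0.2 0.8 0.3;
                0.1 0.1 0.5]
multiTargets = [true  false false;
                false true  true;
                false false false]
multiAcc = multiclassAccuracy(multiOutputs, multiTargets)  # 2 of 3 patterns are correct
```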

**Important**: This process of normalisation, although it is applied here to ANNs, is common to the rest of the Machine Learning techniques. Therefore, the code to be developed regarding normalisation will also be used in the other models.

The following paragraphs describe the exercises to be developed, together with the signatures of the functions that have to be implemented.

1. Develop a function called oneHotEncoding, containing the code developed in the previous session regarding the encoding of a categorical input or output.  That is, to receive a vector of values and encode it as explained in the previous tutorial.  This function will receive two parameters, called `feature` and `classes`, both of type `AbstractArray{<:Any,1}`, that is, they are vectors containing any type of value. The first one has the values of that attribute or desired output for each pattern, and the second one has the values of the categories. This function should perform the following tasks:
  * When the number of values in the class vector is equal to 2, the attribute vector is compared with one of the two classes by broadcasting the `==` operator to generate a vector of Boolean values.   This vector is then transformed into a two-dimensional matrix of one column and returned. To do the latter, see the reshape function.
  * When the number of classes is greater than 2, first a matrix of boolean values is created (of type Array{Bool,2} or BitArray{2}) with as many rows as patterns and as many columns as categories (one column per category). Subsequently, iterate over each column/category, and assign the values of that column as the result of comparing the vector `feature` with the corresponding category by performing a broadcast as in the previous point.

In [22]:
function oneHotEncoding(feature::AbstractArray{<:Any,1}, classes::AbstractArray{<:Any,1})
    # feature: value of a categorical attribute or desired output for each pattern
    # classes: the categories that `feature` can take (e.g. the three Iris species)
    classes = unique(classes)
    numClasses = length(classes)

    if numClasses == 2
        # Two classes: a single Boolean column (true for the first class) is enough
        oneHot = reshape(feature .== classes[1], :, 1)
    else
        # More than two classes: one Boolean column per class
        oneHot = Array{Bool,2}(undef, length(feature), numClasses)
        for numClass in 1:numClasses
            oneHot[:, numClass] .= (feature .== classes[numClass])
        end
    end
    return oneHot
end

oneHotEncoding (generic function with 2 methods)
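As a side note, the multiclass branch can also be expressed with a single broadcast instead of a loop over the columns. The sketch below uses a small, made-up vector of labels; it is an alternative formulation for illustration, not the version requested in the exercise.

```julia
# Made-up categorical feature with three different values.
feature = ["setosa", "virginica", "setosa", "versicolor"]
classes = unique(feature)   # categories in order of first appearance

# Comparing a column of patterns against a row of classes gives one column per class.
oneHot = feature .== permutedims(classes)   # 4×3 Boolean matrix
```

Each row contains exactly one `true` value, in the column of the class that the pattern belongs to.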

2. Overload this function called oneHotEncoding in these two ways: 
  * One that receives a single parameter called `feature`, of type AbstractArray{<:Any,1}, which takes the categories and makes a call to the previous function. The unique function can be used to extract the categories. Develop this function without explicitly declaring it using the word function.

In [24]:
# One-line definition (without the `function` keyword): the categories are taken from the data
oneHotEncoding(feature::AbstractArray{<:Any,1}) = oneHotEncoding(feature, unique(feature))

oneHotEncoding (generic function with 2 methods)

  * Another function that receives a single parameter called `feature`, of type AbstractArray{Bool,1}, and therefore it will be a vector that contains for each pattern only two possibilities. As there are two categories, the output array must have a single column, which has the same elements. A simple way to do this function is to use the reshape function.

In [None]:
# A Boolean vector already encodes two classes: reshape it into a one-column matrix
oneHotEncoding(feature::AbstractArray{Bool,1}) = reshape(feature, :, 1)

3. Develop a set of functions to normalise the data using the code developed in the previous practice.  For this, develop the following functions: 
  * Two functions called `calculateMinMaxNormalizationParameters` and `calculateZeroMeanNormalizationParameters` that receive a parameter of type `AbstractArray{<:Real,2}` and return a tuple with two values, each of them being a matrix with one row with the minimum and maximum values for each column (first function) or means and standard deviations for each column (second function).

In [72]:
using Statistics;

# function defined in Unit1
function stats(outputs)
    minimum = mapslices(Statistics.minimum, outputs; dims=1)[1]
    maximum = mapslices(Statistics.maximum, outputs; dims=1)[1]
    mean = mapslices(Statistics.mean, outputs; dims=1)[1]
    std = mapslices(Statistics.std, outputs; dims=1)[1]
    return [minimum, maximum, mean, std]
end

function calculateMinMaxNormalizationParameters(dataset::AbstractArray{<:Real,2})
    # dataset: real matrix where each row is a pattern and each column is an attribute
    # Returns a tuple of two one-row matrices: the minimum and the maximum of each column
    return minimum(dataset, dims=1), maximum(dataset, dims=1)
end

calculateMinMaxNormalizationParameters(inputs)

(Float32[4.3 2.0 1.0 0.1], Float32[7.9 4.4 6.9 2.5])


In [156]:
function calculateZeroMeanNormalizationParameters(dataset::AbstractArray{<:Real,2})
    # dataset: real matrix where each row is a pattern and each column is an attribute
    # Returns a tuple of two one-row matrices: the mean and the standard deviation of each column
    return mean(dataset, dims=1), std(dataset, dims=1)
end

  * Develop four functions to normalise between maximum and minimum. 
    * The first one, called `normalizeMinMax!`, receives two parameters, an array of values to normalise (of type `AbstractArray{<:Real,2}`) and the normalisation parameters (of type `NTuple{2, AbstractArray{<:Real,2}}`), executes the code of the previous tutorial that normalises between maximum and minimum, and returns the same array with the normalised data. Subsequently, make another function with the same name (overloaded) but with a single parameter, the data matrix, which calculates the normalisation parameters with the function developed in the previous point and calls the first `normalizeMinMax!` function. These two functions end in `!` because the array of values passed as a parameter is modified.
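For reference, the min-max normalisation of the value $x_{ij}$ (pattern $i$, attribute $j$) can be written as follows, where $\min_j$ and $\max_j$ are the minimum and maximum of column $j$:

$$
x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j}
$$

As a worked example with the first attribute of the Iris dataset, whose minimum is 4.3 and maximum is 7.9, a value of 5.1 becomes $(5.1 - 4.3)/(7.9 - 4.3) = 0.8/3.6 \approx 0.222$. When translating the formula into code, the numerator needs its own parentheses: `x - min / (max - min)` performs the division first and would give $5.1 - 4.3/3.6 \approx 3.906$, which is outside the expected range of 0 to 1.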

In [56]:
function normalizeMinMax!(dataset::AbstractArray{<:Real,2},
                          normalizationParameters::NTuple{2, AbstractArray{<:Real,2}})
    # x_scaled = (x - min) / (max - min), computed column by column
    minValues, maxValues = normalizationParameters
    for j in axes(dataset, 2)
        cmin, cmax = minValues[j], maxValues[j]
        for i in axes(dataset, 1)
            dataset[i,j] = (dataset[i,j] - cmin) / (cmax - cmin)
        end
    end
    return dataset
end

In [68]:
function normalizeMinMax!(dataset::AbstractArray{<:Real,2})
    # Compute the parameters from the data itself and delegate to the two-argument method
    return normalizeMinMax!(dataset, calculateMinMaxNormalizationParameters(dataset))
end
  Make two other functions with the name normalizeMinMax that do the same, but do not modify the data matrix, i.e. create a new one, modify it and return it.  To do this, see the copy function.  These two functions should make calls to the previous functions.

In [73]:
function normalizeMinMax(dataset::AbstractArray{<:Real,2},
                         normalizationParameters::NTuple{2, AbstractArray{<:Real,2}})
    # Work on a copy so that the original matrix is left untouched
    normalized = copy(dataset)
    normalizeMinMax!(normalized, normalizationParameters)
    return normalized
end

In [74]:
function normalizeMinMax( dataset::AbstractArray{<:Real,2})
    # Compute the per-column minima and maxima, then reuse the method above
    return normalizeMinMax(dataset, calculateMinMaxNormalizationParameters(dataset))
end

normalizeMinMax (generic function with 2 methods)
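The relationship between an in-place function and its non-mutating counterpart can be sketched with broadcasting instead of explicit loops. This is only a minimal illustration: the names `rescale01!` and `rescale01` are hypothetical and are not part of the tutorial code, and the sketch assumes a floating-point matrix with attributes in columns.

```julia
# In-place version (hypothetical name): overwrites `data` with its rescaled values
function rescale01!(data::AbstractArray{<:AbstractFloat,2})
    mins = minimum(data, dims=1)    # 1×N matrix with the minimum of each column
    maxs = maximum(data, dims=1)    # 1×N matrix with the maximum of each column
    data .= (data .- mins) ./ (maxs .- mins)
    return data
end

# Non-mutating version: works on a copy and delegates to the in-place function
rescale01(data::AbstractArray{<:AbstractFloat,2}) = rescale01!(copy(data))

original = [1.0 10.0; 2.0 20.0; 3.0 30.0]
scaled = rescale01(original)    # `original` keeps its values
```

Note the parentheses around `data .- mins`: without them, the division would be evaluated before the subtraction.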

- Develop four similar functions for the case of performing a 0-mean normalisation, whose names are `normalizeZeroMean!` and  `normalizeZeroMean`.

In [None]:
function normalizeZeroMean!(dataset::AbstractArray{<:Real,2},      
                        normalizationParameters::NTuple{2, AbstractArray{<:Real,2}}) 
    # x_scaled = (x - mean(x)) / std(x), computed per column
    means, stds = normalizationParameters
    for i in axes(dataset, 1)
        for j in axes(dataset, 2)
            cmean, cstd = means[j], stds[j]
            dataset[i,j] = (dataset[i,j] - cmean) / cstd
        end
    end
end
    

In [None]:
function normalizeZeroMean!(dataset::AbstractArray{<:Real,2})
    # Compute the per-column means and standard deviations, then reuse the method above
    normalizeZeroMean!(dataset, calculateZeroMeanNormalizationParameters(dataset))
end

In [None]:
function normalizeZeroMean( dataset::AbstractArray{<:Real,2},      
                            normalizationParameters::NTuple{2, AbstractArray{<:Real,2}})
    # x_scaled = (x - mean(x)) / std(x), computed per column
    means, stds = normalizationParameters
    out = zeros(size(dataset, 1), size(dataset, 2))
    for i in axes(dataset, 1)
        for j in axes(dataset, 2)
            cmean, cstd = means[j], stds[j]
            out[i,j] = (dataset[i,j] - cmean) / cstd
        end
    end
    return out
end
                     

In [None]:
function normalizeZeroMean( dataset::AbstractArray{<:Real,2}) 
    # Compute the per-column means and standard deviations, then reuse the method above
    return normalizeZeroMean(dataset, calculateZeroMeanNormalizationParameters(dataset))
end
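The same zero-mean normalisation can also be written without loops. The following is a minimal sketch using the `Statistics` standard library; the names `zeroMeanParameters` and `zeroMeanSketch` are hypothetical and only serve this illustration.

```julia
using Statistics

# Hypothetical helper: per-column mean and standard deviation as 1×N matrices
zeroMeanParameters(data::AbstractArray{<:Real,2}) = (mean(data, dims=1), std(data, dims=1))

# Non-mutating normalisation: broadcasting builds a new matrix
function zeroMeanSketch(data::AbstractArray{<:Real,2})
    means, stds = zeroMeanParameters(data)
    return (data .- means) ./ stds
end

data = [1.0 10.0; 2.0 20.0; 3.0 30.0]
normalized = zeroMeanSketch(data)    # each column now has mean 0 and standard deviation 1
```

A column whose values are all equal has a standard deviation of 0, so this sketch would produce `NaN` for it; such columns need separate handling.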

4. Develop a function called `classifyOutputs`, which receives a parameter called `outputs` of type `AbstractArray{<:Real,2}`. It contains the outputs of a model (not necessarily an ANN) with a single sample/pattern in each row, and the function converts it to an array of boolean values indicating the class to which each pattern is classified (with more than one column, each row has exactly one value set to true).  To do this, first look at the number of columns of the outputs matrix, and then do the following: 

   * If you have one column, compare the matrix with a threshold value by broadcasting the `>=` operator to generate a one-column matrix of boolean values to be returned. To do this, the function needs to receive the threshold value as an optional parameter, with a default value of 0.5. 

   * If you have more than one column, you must create an array of boolean values of the same size and, for each row, set the column with the largest value to true. This can be done without writing any loops. 

  The aim of this exercise, together with the next one, is to develop skills in vector programming. Below are the steps that could be taken to write the code for the second scenario, when there is more than one column (assuming that each pattern is in a row):
  
   * First, for each row, find out which column holds the maximum output value for that pattern. This can be done with the `findmax` function, which returns a tuple with two values: the maximum in each row or column, and the coordinates in the matrix at which that maximum was found, which is what we are really interested in. With the keyword `dims` you can indicate whether you want to search for the maximums in the rows or in the columns.  The line of code would be as follows:
     ```julia
     (_,indicesMaxEachInstance) = findmax(outputs, dims=2);
     ```
   * Once you have it, you can create a boolean array of the same dimensionality as the output array, where each value indicates the membership of that pattern in the corresponding class. This matrix is initialised to false, so it can be easily created with the function `falses`. Such as:
     ```julia
     outputs = falses(size(outputs));
     ```
   * Finally, the positions holding the largest value of each row, which are collected in the `indicesMaxEachInstance` variable created earlier, are set to true in this array.  This can be done by indexing the `outputs` array as follows:
     ```julia
     outputs[indicesMaxEachInstance] .= true;
     ```

In [75]:
function classifyOutputs(outputs::AbstractArray{<:Real,2}; 
                        threshold::Real=0.5)
    if size(outputs, 2) == 1
        # Single column: binary classification by thresholding
        return outputs .>= threshold
    else
        # Several columns: mark only the column that holds the maximum of each row
        (_, indicesMaxEachInstance) = findmax(outputs, dims=2)
        classified = falses(size(outputs))
        classified[indicesMaxEachInstance] .= true
        return classified
    end
end

classifyOutputs (generic function with 1 method)
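To see what the two branches compute, the steps described above can be run by hand on small matrices. The values below are made up purely for illustration.

```julia
# More than one column: one row per pattern, one column per class
outputs = [0.1 0.7 0.2;
           0.8 0.1 0.1]
(_, indicesMaxEachInstance) = findmax(outputs, dims=2)   # position of the maximum of each row
classified = falses(size(outputs))
classified[indicesMaxEachInstance] .= true
# classified == Bool[0 1 0; 1 0 0]

# A single column: broadcast the comparison with the threshold
singleColumn = reshape([0.2, 0.5, 0.9], :, 1)
binary = singleColumn .>= 0.5
# binary is a 3×1 matrix holding false, true, true
```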

5.  Develop a function called `accuracy` that, given a matrix of desired outputs (`targets`) and a matrix of outputs emitted by a model (not necessarily an ANN), calculates the accuracy in a classification problem.  Both matrices should have a number of rows equal to the number of patterns, i.e. each pattern is placed in a row. Develop this function in such a way that it works for the case of having 2 classes (one output neuron) as well as more than two classes (one output neuron per class).

    To develop this function, four functions with the same name must be implemented:
    
    * In the first one, `targets` and `outputs` are of type `AbstractArray{Bool,1}`, i.e. vectors of boolean values.  The accuracy will simply be the average value of the comparison of both vectors.

In [None]:
function accuracy(outputs::AbstractArray{Bool,1}, targets::AbstractArray{Bool,1}) 
    #TODO
end

  * In the second of the functions, `targets` and `outputs` are of type `AbstractArray{Bool,2}`, i.e. two-dimensional arrays of Boolean values.  In this case, it will be necessary to examine the number of columns.  If they have only one column, a call to the above function will be made taking the first column of targets and outputs as vectors. If the number of columns is greater than 2, it will be necessary to compare both matrices to see in which rows the values do not match.
  
### Question
 What would happen if the number of columns is equal to 2? (put your answer into a comment in the code below)
 

In [None]:
function accuracy(outputs::AbstractArray{Bool,2}, targets::AbstractArray{Bool,2}) 
    #TODO
end

  * In the third of the functions, `targets` will be of type `AbstractArray{Bool,1}` (a vector of boolean values) while `outputs` will be of type `AbstractArray{<:Real,1}`, i.e. real outputs that have not yet been interpreted as "positive"/"negative" class membership values.  In this case, the function accepts an optional `threshold` parameter with a default value of 0.5, applies the threshold to the outputs vector to obtain boolean values, and calls the first `accuracy` function.

In [None]:
function accuracy(outputs::AbstractArray{<:Real,1}, targets::AbstractArray{Bool,1};      
                threshold::Real=0.5)
    #TODO
end

  * Finally, in the last of the functions, `targets` will be of type `AbstractArray{Bool,2}` (an array of boolean values) while `outputs` will be of type `AbstractArray{<:Real,2}`, i.e. real outputs that have not yet been interpreted as values belonging to N classes (N: number of columns). In this case, it is again necessary to distinguish whether you have 1 or more than 2 columns. In the first case, a call to `accuracy` should be made with the vectors corresponding to the first column of outputs and targets. In the second case, a call should be made to the function `classifyOutputs` to convert outputs into a variable of type `AbstractArray{Bool,2}`, and then a call to `accuracy` should be made.

In [None]:
function accuracy(outputs::AbstractArray{<:Real,2}, targets::AbstractArray{Bool,2};
                threshold::Real=0.5)
    #TODO
end

As in the previous exercise, since the aim of this one is to develop the students' vector programming skills, the main steps are pointed out below, considering that we want to calculate the accuracy for more than 2 classes, that the samples are in the rows of the matrices, and that both the outputs and targets matrices are of type `AbstractArray{Bool,2}`.

  * Once the outputs variable has the desired form (array of boolean values), it is compared with the targets array as follows: 
    ```julia
    classComparison = targets .== outputs
    ```
  * In this new array, for each pattern, when the class matches, all elements of that row will be true. On the other hand, when the class does not match, at least one element in that row will be false. Therefore, one way to know whether a pattern is correctly classified is to check whether all elements in its row are true.  This can be checked with the function `all`, which receives an array and returns true if all elements are true, but also accepts the keyword `dims`, with which it applies this same function across the given dimension. To do this on rows, you would write:
    ```julia
    correctClassifications = all(classComparison, dims=2)
    ```
  * Finally, the only thing left to do is to average this matrix. Remember that you can operate with boolean values in the same way as with real values; in this case they will be treated as 0 or 1. 
    ```julia
    accuracy = mean(correctClassifications)
    ```
  * These last steps could have been performed by looking, instead of at the matches between the two arrays, at the patterns where there are no matches.  For that you can use the function `any`, which receives an array of boolean values and returns true if there is any value equal to true, and also accepts the keyword `dims`. This would calculate the error rate instead of the accuracy, but the accuracy can be calculated from the error rate by simply subtracting it from 1. The code would look like:
    ```julia
    classComparison = targets .!= outputs 
    incorrectClassifications = any(classComparison, dims=2)
    accuracy = 1 - mean(incorrectClassifications)
    ```
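Putting the four descriptions together, one possible arrangement of the `accuracy` methods is sketched below. It is not the only valid solution. The helper `classifyRows` is a hypothetical stand-in for `classifyOutputs`, included only so that the block runs on its own, and this sketch applies the row-wise comparison to any matrix with more than one column.

```julia
using Statistics

# Hypothetical stand-in for classifyOutputs (multi-column case only)
function classifyRows(outputs::AbstractArray{<:Real,2})
    (_, indicesMax) = findmax(outputs, dims=2)
    classified = falses(size(outputs))
    classified[indicesMax] .= true
    return classified
end

# 1. Boolean vectors: fraction of positions where both vectors agree
accuracy(outputs::AbstractArray{Bool,1}, targets::AbstractArray{Bool,1}) =
    mean(outputs .== targets)

# 2. Boolean matrices: one column reduces to case 1, otherwise whole rows must match
function accuracy(outputs::AbstractArray{Bool,2}, targets::AbstractArray{Bool,2})
    if size(targets, 2) == 1
        return accuracy(outputs[:,1], targets[:,1])
    end
    return mean(all(targets .== outputs, dims=2))
end

# 3. Real vector: apply the threshold, then reuse case 1
accuracy(outputs::AbstractArray{<:Real,1}, targets::AbstractArray{Bool,1};
         threshold::Real=0.5) = accuracy(outputs .>= threshold, targets)

# 4. Real matrix: one column reduces to case 3, otherwise classify and reuse case 2
function accuracy(outputs::AbstractArray{<:Real,2}, targets::AbstractArray{Bool,2};
                  threshold::Real=0.5)
    if size(targets, 2) == 1
        return accuracy(outputs[:,1], targets[:,1]; threshold=threshold)
    end
    return accuracy(classifyRows(outputs), targets)
end

# Made-up example: the third pattern is assigned to class 3 but belongs to class 1
modelOutputs = [0.1 0.7 0.2; 0.8 0.1 0.1; 0.3 0.3 0.4]
desired = Bool[0 1 0; 1 0 0; 1 0 0]
acc = accuracy(modelOutputs, desired)    # 2 of 3 patterns are correct
```

Because `Bool` is a subtype of `Real`, the boolean methods are the more specific ones, so Julia picks them automatically when both arguments are boolean.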

6. Develop a function to create ANNs to solve classification problems.  This function must receive the topology (number of hidden layers and neurons in each and, optionally, the activation function of each hidden layer), the number of input neurons and the number of output neurons.
   **Important** It is worth pointing out that the transfer function of the output layer is not given by the user but by the problem itself (regression/classification).  Similarly, the number of neurons in the input and output layers is given by the problem to be solved.
   A straightforward way to create this ANN is to receive the topology as a parameter called `topology` of type `AbstractArray{<:Int,1}`, which contains the number of neurons in each hidden layer (empty for networks without hidden layers), and create the ANN as follows:
   * Create an empty ANN
   ```julia
            ann = Chain();
   ```
   * Create a variable called numInputsLayer, initialized to the number of inputs of the ANN.
   * If there are hidden layers, i.e. if the `topology` vector is not empty, iterate through this vector (the value of the loop variable will be equal to the number of neurons in each layer) and in each iteration create a hidden layer where the number of inputs is equal to the value of the `numInputsLayer` variable and the number of outputs is equal to the current value of the loop variable. If the transfer function of each hidden layer has not been specified, use the same transfer function σ in all hidden layers. After this, update the value of `numInputsLayer` to the number of outputs of the layer just created.
   ```julia
            for numOutputsLayer = topology      
                ann = Chain(ann..., Dense(numInputsLayer, numOutputsLayer, σ) );      
                numInputsLayer = numOutputsLayer; 
            end;
   ```
   If these lines are written at the top level of a script, outside any function, running the code will give several warnings and the code will not work properly, because the assignments inside the loop are treated as new local variables. This is automatically corrected by using this code inside a function.  To use it in the main body, you should write `global ann, numInputsLayer;` at the beginning of the loop.
   * Finally, add the final layer, with the number of neurons and transfer function appropriate to the number of classes as described above, adding the `softmax` function if there are more than two output classes. 
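The remark about `global` is not specific to Flux. A minimal sketch with a plain counter (the variable name `runningTotal` is made up for this illustration) shows the pattern that is needed when such a loop is placed at the top level of a script:

```julia
runningTotal = 0
for value in [1, 2, 3]
    # Without this declaration, a script run non-interactively would treat
    # `runningTotal` as a new local variable inside the loop and fail.
    global runningTotal
    runningTotal += value
end
# runningTotal == 6
```

Inside a function, or in an interactive session such as a notebook, the declaration is not required.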


In [None]:
function buildClassANN(numInputs::Int, topology::AbstractArray{<:Int,1}, numOutputs::Int;
                    transferFunctions::AbstractArray{<:Function,1}=fill(σ, length(topology))) 
    #TODO                                                  
end
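The steps listed above could be assembled roughly as follows. This is a sketch under some assumptions rather than a reference solution: it uses the name `buildClassANNSketch` to keep it apart from the exercise, it relies on the Flux 0.13 API installed at the beginning of this tutorial, and it assumes that a single output neuron means a two-class problem (sigmoid output) while any other number of outputs uses `softmax`.

```julia
using Flux

function buildClassANNSketch(numInputs::Int, topology::AbstractArray{<:Int,1}, numOutputs::Int;
        transferFunctions::AbstractArray{<:Function,1}=fill(σ, length(topology)))
    ann = Chain()
    numInputsLayer = numInputs
    # One hidden layer per entry of the topology, each with its own transfer function
    for (numOutputsLayer, transferFunction) in zip(topology, transferFunctions)
        ann = Chain(ann..., Dense(numInputsLayer, numOutputsLayer, transferFunction))
        numInputsLayer = numOutputsLayer
    end
    if numOutputs == 1
        # Assumption: one output neuron, two classes, sigmoid output
        ann = Chain(ann..., Dense(numInputsLayer, 1, σ))
    else
        # Assumption: one neuron per class, linear layer followed by softmax
        ann = Chain(ann..., Dense(numInputsLayer, numOutputs, identity), softmax)
    end
    return ann
end

# 4 inputs, two hidden layers with 5 and 3 neurons, 3 classes
ann = buildClassANNSketch(4, [5, 3], 3)
classProbabilities = ann(rand(Float32, 4, 10))   # 10 patterns, one per column
```

With `softmax` as the last element of the chain, each column of the result sums to 1 and can be read as class membership scores.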

7. Develop a function that creates an ANN to perform classification (via a call to the above function) and trains it.   To do this, this function should implement a loop in which the ANN is trained with the training set passed as a parameter until one of the stopping criteria passed as parameters is met.  The function should return the trained ANN.  The arguments should be:
  * **topology**, of type `AbstractArray{<:Int,1}` with the topology (hidden layers) of the ANN. 
  * **dataset**, of type `Tuple{AbstractArray{<:Real,2},  AbstractArray{Bool,2}}`, with the matrix of inputs and targets.  From these matrices, the number of input and output neurons necessary to call the previous function can be obtained by means of the `size` function.
  * Optional parameters controlling other aspects of the algorithm such as the stopping criteria of the algorithm, with default values:
    * **maxEpochs**, of type Int, with a default value of 1000
    * **minLoss**, of type Real, with a default value of 0
    * **learningRate**, of type Real, with a default value of 0.01.
    
  It is important to highlight that the set of patterns used here will have each pattern in a row, while the Flux library expects matrices in which each pattern is in a column.  This is not a problem: the matrices only need to be transposed on certain occasions, such as when passing them to the function that trains the ANN for one cycle, when calculating the loss value, or when taking the output matrices obtained by running the ANN.
  
  Bear in mind that a call to the `train!` function trains the ANN for one cycle only. Therefore, it is necessary, first, to create the ANN by calling the above function, and then to run a loop that calls the `train!` function only once in each iteration.  That is, if you want to train for `n` cycles, this loop will have to perform `n` iterations.
  
  The output of this function should be at least the trained ANN and a vector with the loss values for each training cycle.

In [77]:
function trainClassANN(topology::AbstractArray{<:Int,1},      
                    dataset::Tuple{AbstractArray{<:Real,2}, AbstractArray{Bool,2}};
                    transferFunctions::AbstractArray{<:Function,1}=fill(σ, length(topology)),
                    maxEpochs::Int=1000, minLoss::Real=0.0, learningRate::Real=0.01) 
    
    # Patterns are in rows, so the layer sizes come from the number of columns
    inputs, targets = dataset
    ann = buildClassANN(size(inputs,2), topology, size(targets,2);
                        transferFunctions=transferFunctions)

    # Binary cross-entropy for a single output neuron, cross-entropy otherwise
    loss(x,y) = (size(y,1) == 1) ? Flux.Losses.binarycrossentropy(ann(x),y) : Flux.Losses.crossentropy(ann(x),y)

    # Flux expects one pattern per column, so transpose once
    x, y = inputs', targets'

    # Create the optimiser once so that its internal state is kept between epochs
    opt = ADAM(learningRate)

    # Loss before training (epoch 0); it is only used for the stopping criterion
    losses = Float64[]
    currentEpoch = 0
    currentLoss = loss(x, y)

    # Train until either stopping criterion is met
    while currentEpoch < maxEpochs && currentLoss > minLoss
        Flux.train!(loss, Flux.params(ann), [(x, y)], opt)
        currentEpoch += 1
        currentLoss = loss(x, y)
        push!(losses, currentLoss)
        acc = accuracy(ann(x)', targets)
        println("epoch: ", currentEpoch, "; loss: ", currentLoss, "; accuracy: ", acc)
    end

    return (ann, losses)
end
    

trainClassANN (generic function with 1 method)

8. Occasionally, when the classification problem is of two classes, instead of having the desired outputs as a single column matrix, it will be as a vector.  To deal with these cases, a function of the same name as above and accepting the same arguments is requested, except that the second argument, dataset, is of type `Tuple{AbstractArray{<:Real,2}, AbstractArray{Bool,1}}`. This function should only convert the desired output vector into an array with one column and call the above function.  To do this conversion, see the reshape function.

In [81]:
function trainClassANN(topology::AbstractArray{<:Int,1},      
                    (inputs, targets)::Tuple{AbstractArray{<:Real,2}, AbstractArray{Bool,1}};      
                    transferFunctions::AbstractArray{<:Function,1}=fill(σ, length(topology)),      
                    maxEpochs::Int=1000, minLoss::Real=0.0, learningRate::Real=0.01)
    # Turn the target vector into a one-column matrix and reuse the method above
    targetsMatrix = reshape(targets, :, 1)
    return trainClassANN(topology, (inputs, targetsMatrix);
                         transferFunctions=transferFunctions, maxEpochs=maxEpochs,
                         minLoss=minLoss, learningRate=learningRate)
end

trainClassANN (generic function with 2 methods)

#### Warning
Remember to integrate these functions with the code developed in the previous tutorial in a separate file.

Once the training function returns a trained ANN, it can be run on different sets by passing it a matrix of inputs (with the patterns in columns, i.e. transposed), and it will return the outputs for that set (again, the patterns will be in columns, so the outputs of the ANN will have to be transposed).  These outputs, along with the desired outputs for that set of inputs, can be passed to the `accuracy` function to calculate the accuracy on that set. 
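The whole cycle (train one epoch at a time, then run the network on transposed inputs and transpose the outputs back) can be sketched on a toy problem. Everything in this block is an assumption made for illustration: the data (a logical OR), the fixed topology, the number of epochs and the learning rate are made up, and the calls follow the Flux 0.13 API installed at the beginning of this tutorial.

```julia
using Flux, Statistics

# Toy data with one pattern per row: the target is the logical OR of the two inputs
inputs  = Float32[0 0; 0 1; 1 0; 1 1]
targets = reshape([false, true, true, true], :, 1)

ann = Chain(Dense(2, 3, σ), Dense(3, 1, σ))
loss(x, y) = Flux.Losses.binarycrossentropy(ann(x), y)
opt = ADAM(0.1)

losses = Float32[]
for epoch in 1:200
    # Each call to train! performs a single training cycle over the given batch
    Flux.train!(loss, Flux.params(ann), [(inputs', targets')], opt)
    push!(losses, loss(inputs', targets'))
end

# Run the trained ANN: transpose the inputs, then transpose the outputs back to rows
outputs = ann(inputs')'
acc = mean((outputs .>= 0.5) .== targets)
```

Since the initial weights are random, the loss values and the final accuracy will differ from run to run.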

Run multiple times to train different networks with different architectures - which one will give the highest accuracy on the training set? Also test different learning rate values.

`Answer here`

Repeat the experiments performed above with the unstandardised data, and compare the results with those obtained with the standardised data. Are there important differences when using standardised data? Which one is the best topology?

`Answer here`

# Julia Notes

Like any other language, Julia allows the creation of functions. There are several ways to create functions; the most common are these two:

1. When the operation to be performed is simple, the function can be declared on one line without the need for the reserved word function.  This is the case for operations that can be performed in one or a few lines of code.  In the latter case, parentheses can be used to enclose the actions to be performed, and `;` to separate them.  The value to be returned by the function will be the last thing to be evaluated, although the reserved word `return` can also be used.  Here are a couple of examples of this way of declaring functions:
    ```julia
         using Statistics   # provides mean
         add(x::Float32, y::Float32) = x+y; 
         mse(outputs::Array{Float32,1}, targets::Array{Float32,1}) = mean((targets.-outputs).^2);
         avgGreaterThan0(values::Array{Float32,1}) = mean(values[values.>0]); 
         avgGreaterThan0(values::Array{Float32,1}) = ( positives=values.>0; mean(values[positives]); )   
    ```
2. In many other cases, a function may perform many complex operations that are impractical to write on a single line.  In this case, you can declare the function with the reserved word `function`, and return the result with the reserved word `return`.  As is common in programming languages, evaluating `return` immediately exits the function.  If `return` is not used, the result returned by function will be the last one evaluated.  The example above is shown below: 
   ```julia
         function avgGreaterThan0(values::Array{Float32,1})    
            positives=values.>0;    
            return mean(values[positives]); 
         end;  
    ```
  When passing parameters, it is not mandatory, but recommended, to indicate the type of the parameters.  If this is not done, they are assumed to be of type `Any`.  However, a common practice in Julia is to overload functions, i.e. to define functions with the same name but different parameters or of different types.  When calling a function, the correct function will be executed, if any of the ones defined match the parameters passed. This allows different behaviours to be defined. When a function call is made, Julia checks that there exists in memory a definition with that name in which the types of the passed arguments match the defined ones and, if it exists, it will execute it with those parameters.
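As a small illustration of how Julia selects a method from the types of the arguments, consider the following sketch. The function name `describe` is made up for this example.

```julia
describe(x::Real) = "a real number"
describe(x::AbstractArray{<:Real,1}) = "a vector of real numbers"
describe(x) = "something else"    # no annotation: the argument is of type Any

describe(3.5)          # "a real number"
describe([1, 2, 3])    # "a vector of real numbers"
describe("hello")      # "something else"
```

If the untyped fallback were not defined, the last call would raise a `MethodError`, because no definition would match a `String` argument.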
  
  In the examples above, the parameters have been defined as `Float32` or `Array{Float32,1}`, which means that the functions are defined for those specific types. Therefore, if calls are made with parameters of type `Float64` or `Array{Float64,1}`, no suitable function definition will be found and an error will be raised.  To solve this, it is worth applying the subtyping properties seen in the previous tutorial to define the most generic types that the function can work with. For example, instead of using `Float32`, you can use `AbstractFloat`, `Real` or `Number` (be aware that the latter includes complex numbers). Instead of using `Array{Float32,1}`, you could use `Array{<:AbstractFloat,1}`, `Array{<:Real,1}` or `Array{<:Number,1}`. Additionally, keep in mind that there are other types that behave like arrays but are not of type `Array`. An important example that will be used in this subject arises when transposing arrays, which creates an object that is not of type `Array` but of type `LinearAlgebra.Adjoint`, yet can be used as if it were an array because its operations are defined as such.  Both `Array` and `LinearAlgebra.Adjoint` are subtypes of `AbstractArray`, as can be seen in the following example: 

In [None]:
m = [1. 2.; 3. 4.]; 

In [None]:
isa(m, Array)

In [None]:
isa(m, AbstractArray)

In [None]:
typeof(m') 

In [None]:
isa(m', Array)

In [None]:
isa(m', AbstractArray)

Since both types, `Array` and `LinearAlgebra.Adjoint` are subtypes of `AbstractArray`, to allow an argument to be of one type or the other, this will be the type to be used in the function arguments. Therefore, the above functions will be as follows: 

   ```julia
        add(x::Real, y::Real) = x+y;
        mse(outputs::AbstractArray{<:Real,1},targets::AbstractArray{<:Real,1})=mean((targets.-outputs).^2)
        avgGreaterThan0(values::AbstractArray{<:Real,1}) = mean(values[values.>0]); 
        avgGreaterThan0(values::AbstractArray{<:Real,1})=(positives=values.>0; mean(values[positives]);) 
        
        function avgGreaterThan0(values::AbstractArray{<:Real,1})    
            positives=values.>0;    
            return mean(values[positives]); 
        end;  
   ```
When an argument is a vector or array of boolean values, as is the case of the desired outputs in a classification problem, the type to use will be `AbstractArray{Bool,N}`, where N is the dimensionality of the array.  As mentioned in the previous tutorial, this type is a supertype of both `Array{Bool,N}` and `BitArray{N}`. 
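The difference between the two concrete boolean types can be checked directly. In this sketch the variable names and the helper `countTrue` are made up for illustration.

```julia
fromLiteral    = [true, false, true]        # written out explicitly: Array{Bool,1}
fromComparison = [0.2, 0.7, 0.9] .>= 0.5    # result of a broadcast comparison: BitArray{1}

isa(fromLiteral, Array{Bool,1})       # true
isa(fromComparison, Array{Bool,1})    # false
isa(fromComparison, BitArray{1})      # true

# Annotating with the abstract type lets the same function accept both
countTrue(values::AbstractArray{Bool,1}) = sum(values)
countTrue(fromLiteral)       # 2
countTrue(fromComparison)    # 2
```

This is why the signatures in this tutorial use `AbstractArray{Bool,N}`: broadcast comparisons such as `outputs .>= threshold` return a `BitArray`, which a parameter typed as `Array{Bool,N}` would reject.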

An interesting feature of Julia functions is that they can return more than one value. This is done by using the type `Tuple{...}`, which designates tuples where each element has a certain type, for example `Tuple{Float32,Float32}`, `Tuple{Array{Float32,2}, Int64}` or `Tuple{Float32, Tuple{Int64, Int64}}`. In general, if you want to return more than one element, you return a tuple with the elements you want to return, for example:

   ```julia
        function avgGreaterThan0(values::AbstractArray{<:Real,1})    
            positives=values.>0;    
            return (positives, mean(values[positives])) 
        end; 

        (positives, average) = avgGreaterThan0([1.2 , -1.3 , 5.5 , -3.8 , -2.1])
   ```

When creating tuples where all elements have the same type, a simplified way to define the type is to use `NTuple`, specifying the number of elements and the type. For example:

In [None]:
Tuple{Float64,Float64}==NTuple{2,Float64}

As mentioned earlier, the names of functions that modify the values of one of their arguments are, by convention, terminated with `!`.
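The standard library follows this convention, as the pair `sort` and `sort!` shows in the short sketch below (the variable names are made up for illustration):

```julia
v = [3, 1, 2]
w = sort(v)                      # returns a sorted copy
untouched = (v == [3, 1, 2])     # true: v has not been modified
sort!(v)                         # sorts v in place; v is now [1, 2, 3]
```

The functions `normalizeMinMax!` and `normalizeZeroMean!` in this tutorial follow the same naming scheme, with `normalizeMinMax` and `normalizeZeroMean` as their non-mutating counterparts.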