# The ScikitLearn.jl library

The Scikit-learn library is an open source machine learning library developed for the Python programming language, the first version of which dates back to 2010. It implements a large number of machine learning models, related to tasks such as classification, regression, clustering or dimensionality reduction. These models include Support Vector Machines (SVM), decision trees, random forests, or k-means. It is currently one of the most widely used libraries in the field of machine learning, due to the large number of functionalities it offers as well as its ease of use, since it provides a uniform interface for training and using models. The documentation for this library is available at https://scikit-learn.org/stable/.

For Julia, the ScikitLearn.jl library implements this interface and the algorithms contained in the scikit-learn library, supporting both Julia's own models and those of the scikit-learn library. The latter is done by means of the PyCall.jl library, which allows code written in Python to be executed from Julia in a transparent way for the user, who only needs to have ScikitLearn.jl installed. Documentation for this library can be found at https://scikitlearnjl.readthedocs.io/en/latest/.

As mentioned above, this library provides a uniform interface for training different models. This is reflected in the fact that the names of the functions for creating and training models will be the same regardless of the models to be developed. In the assignments of this course, in addition to ANNs, the following models available in the scikit-learn library will be used:

- Support Vector Machines (SVM)
- Decision trees
- kNN

In order to use these models, it is first necessary to import the library with `using ScikitLearn`, which must be previously installed with:

```Julia
import Pkg;
Pkg.add("ScikitLearn")
```

The scikit-learn library offers more than 100 different types of models. To import the models to be used, you can use the `@sk_import` macro. The following lines import, respectively, the three models mentioned above that will be used in the assignments of this course:

```Julia
@sk_import svm: SVC
@sk_import tree: DecisionTreeClassifier
@sk_import neighbors: KNeighborsClassifier
```

When training a model, the first step is to generate it. This is done with a different function for each model. This function receives as parameters the model's own parameters. Below are 3 examples, one for each type of model that will be used in these course assignments:

```Julia
model = SVC(kernel="rbf", degree=3, gamma=2, C=1);
model = DecisionTreeClassifier(max_depth=4, random_state=1);
model = KNeighborsClassifier(3);
```

An explanation of the parameters accepted by each of these functions can be found in the library documentation. In the particular case of decision trees, as can be seen, one of these parameters is called `random_state`. This parameter controls the randomness in a particular part of the tree construction process, namely in the selection of features to split a node of the tree. The Scikit-Learn library uses a random number generator in this part, which is updated with each call, so that different calls to this function (together with its subsequent calls to the `fit!` function) to train the model will result in different models. To control the randomness of this process and make it deterministic, it is best to give it an integer value as shown in the example. Thus, the creation of a decision tree with a set of desired inputs and outputs and a given set of hyperparameters is a deterministic process. In general, it is more advisable to be able to control the randomness of the whole model development process (cross-validation, etc.) by means of a random seed that is set at the beginning of the whole process.
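The idea of fixing a random seed at the beginning of the whole process can be illustrated with a minimal sketch. It only uses Julia's standard `Random` module, and the seed value is an arbitrary choice for illustration. Note that this seeds Julia's own generator (used, for instance, when partitioning the data), whereas the scikit-learn models run in Python and are controlled through their `random_state` parameter.

```Julia
using Random

# Seeding the global random number generator makes later random draws repeatable
Random.seed!(1234)
firstDraw = rand(3)

Random.seed!(1234)      # same seed again
secondDraw = rand(3)

# Both sequences are identical because they start from the same seed
println(firstDraw == secondDraw)
```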

Once created, any of these models can be adjusted with the `fit!` function.

### Question

What does the fact that the name of this function ends in bang (!) indicate?

`The exclamation mark (!) at the end of a function name signifies that the function changes the object it operates on directly, rather than creating a new version of that object.`

`It means that the model received as parameter is modified inside the function`

`For example, the fit! function updates a machine learning model in place, modifying the original model instead of producing a new one. This approach can improve memory usage and performance because it avoids duplicating data unnecessarily.`
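The same convention can be seen with functions from Julia's standard library. The sketch below contrasts `sort`, which returns a new vector, with `sort!`, which modifies its argument; the vector `numbers` is just an illustrative example.

```Julia
numbers = [3, 1, 2]

sortedCopy = sort(numbers)  # returns a new sorted vector; `numbers` is unchanged
println(numbers)            # still in the original order

sort!(numbers)              # sorts in place, modifying `numbers` itself
println(numbers)            # now sorted
```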


Contrary to the Flux library, where it was necessary to write the ANN training loop, in this library the loop is already implemented, and it is called automatically when the `fit!` function is executed. Therefore, it is not necessary to write the code for the training loop.

### Question

As in the case of ANNs, a loop is necessary for training several models. Where in the code (inside or outside the loop) will you need to create the model? Which models will need to be trained several times and which ones only once? Why?

`The model has to be created inside the loop, to have several models.`

`Deterministic models, such as decision trees, k-nearest neighbors (KNN), and support vector machines (SVM), are generally trained only once since their training processes produce the same results given the same inputs and parameters. In contrast, non-deterministic models like artificial neural networks (ANNs) can show variability in their results due to factors such as random weight initialization. ANNs often require multiple training iterations with different initializations or data subsets. To improve reliability, the results from these various training runs can be combined, typically by calculating statistical measures like the mean, which offers a more consistent representation of the model's performance or standard deviation, which represent the variability across the trained models.`

An example of the use of the `fit!` function can be seen in the following line:

```Julia
fit!(model, trainingInputs, trainingTargets);
```

As can be seen, the first argument of this function is the model, the second is a matrix of inputs, and the third is a vector of desired outputs. It is important to realise that the desired outputs are not a matrix, as in the case of ANNs, but a vector in which each element is the label associated with the corresponding pattern; the labels can be of any type: integer, string, etc. The main reason for this is that some models do not accept desired outputs in one-hot encoding.

An important issue to consider is the layout of the data. As shown in previous assignments, to train an ANN the patterns must be arranged in columns, with each row being an attribute. Outside the world of ANNs, and therefore with the rest of the techniques used in this course, the patterns are usually assumed to be arranged in rows, so each column of the input matrix corresponds to an attribute, which is a much more intuitive layout.
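As a small sketch of the two layouts, the following code converts a matrix with one pattern per column into one with one pattern per row. The values and variable names are made up for illustration.

```Julia
# 4 patterns with 2 attributes each, stored ANN-style: one pattern per column
inputsByColumns = [1.0 2.0 3.0 4.0;
                   5.0 6.0 7.0 8.0]

# Rearranged so that each pattern is a row, the layout assumed by the other models
inputsByRows = permutedims(inputsByColumns)

println(size(inputsByColumns))  # attributes × patterns
println(size(inputsByRows))     # patterns × attributes
```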

### Question

Which condition must the matrix of inputs and the vector of desired outputs passed as an argument to this function fulfil?

`The main condition is that the data must be correctly formatted. If the input matrix has dimensions (m, n), where m is the number of samples and n is the number of features or input variables, then the output must be a vector of length m, providing one output value for each input sample.`
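This condition can be checked before training. The sketch below uses a few made-up patterns and labels (both are assumptions for illustration only) and verifies that there is one label per row of the input matrix.

```Julia
# Illustrative data: 3 patterns × 2 attributes, and one label per pattern
exampleInputs  = [5.1 3.5; 4.9 3.0; 6.2 3.4]
exampleTargets = ["Iris-setosa", "Iris-setosa", "Iris-virginica"]

# The number of rows of the input matrix must match the length of the target vector
@assert size(exampleInputs, 1) == length(exampleTargets)

# The targets are a one-dimensional vector, not a matrix
println(ndims(exampleTargets))
```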


Finally, once the model has been trained, it can be used to make predictions. This is done by means of the `predict` function. An example of its use is shown below:

```Julia
testOutputs = predict(model, testInputs);
```

The model being used is an in-memory structure with different fields, and it can be very useful to look up the contents of these fields. To see which fields each model has, you can write the following:

```Julia
println(keys(model));
```

Depending on the type of model, there will be different fields. For example, for a kNN, the following fields, among others, could be consulted:

```Julia
model.n_neighbors
model.metric
model.weights
```

For an SVM, some other interesting fields could be the following:

```Julia
model.C
model.support_vectors_
model.support_
```

In the case of an SVM, a particularly interesting function is `decision_function`, which returns the distances to the hyperplane of the passed patterns. This is useful, for example, to implement a "one-against-all" strategy to perform multi-class classification. An example of the use of this function is shown below:

```Julia
distances = decision_function(model, inputs);
```

### Question

In the case of using decision trees or kNN, a corresponding function is not necessary to perform the "one-against-all" strategy, why?

`This is because, in contrast to algorithms like ANN, decision trees and k-nearest neighbors can naturally address multi-class classification problems.`

`These models can directly perform a multiclass classification. The one-against all strategy is used with binary classification models (for example, the SVM, or ANN with sigmoid activation function).`

However, the SVM implementation in the Scikit-Learn library already allows multi-class classification, so it is not necessary to use a "one-against-all" strategy for these cases.

Finally, it should be noted that these models usually receive pre-processed inputs and outputs, with the most common pre-processing being the normalisation already described in a previous assignment. Therefore, the developed normalisation functions should also be used on the data to be used by these models.

In this assignment, you are asked to develop a function called `modelCrossValidation`, based on the functions developed in previous assignments, that allows validating models in the selected classification problem using the three techniques described here.

This function should perform cross-validation and use the metrics deemed most appropriate for the specific problem. This cross-validation can be done by modifying the code developed in the previous assignment.

This function must receive the following parameters:

- Algorithm to be trained, among the 4 used in this course, together with its parameters. The most important parameters to specify for each technique are:
    </br>
    
    - ANN
        - Architecture (number of hidden layers and number of neurons in each hidden layer) and transfer function in each layer. In "shallow" networks such as those used in this course, the transfer function has less impact, so a standard one, such as `tansig` or `logsig`, can be used.
        - Learning rate
        - Ratio of patterns used for validation
        - Number of consecutive iterations without improving the validation loss to stop the process
        - Number of times each ANN is trained.
        
        ### Question
        
        Why should a linear transfer function not be used for neurons in the hidden layers?
        
        `Using a linear transfer function in the hidden layers limits the network’s expressive power. When a linear function is used in these layers, the network’s output remains a linear transformation of the input, regardless of the depth of the network. As a result, it’s incapable of capturing non-linear patterns in the data.`
        
        `Instead, non-linear transfer functions like ReLU or sigmoid are preferred, as they enable the network to represent complex, non-linear relationships, thus enhancing its learning capability.`
        
        ### Question
        
        The other models do not have the number of times to train them as a parameter. Why? If you train several times, which statistical properties will the results of these trainings have?

        `Unlike neural networks, the other models in question are deterministic: given the same data and hyperparameters (and a fixed random_state in the case of decision trees), training always produces the same model, so nothing is gained by repeating it.`
        
        `Training these deterministic models multiple times on the same dataset with the same parameters will yield identical results each time. This consistency means the results have a standard deviation of 0. Thus, during cross-validation, these models only need to be trained once per fold, unlike neural networks, which may require multiple runs because of their random weight initialization.`
    </br>  
    
    - SVM
        - Kernel (and kernel-specific parameters)
        - C
        
    </br>  
    - Decision trees
        - Maximum tree depth
        
    </br>  
    - kNN
        - k (number of neighbours to be considered)

    </br>        
- Already standardised input and desired outputs matrices.
    </br>  

    - As stated above, the desired outputs must be indicated as a vector where each element is the label corresponding to each pattern (therefore, of type `Array{Any,1}`). In the case of ANN training, the desired outputs shall be encoded as done in previous assignments.
    
    ### Question
    
    Has it been necessary to standardise the desired outputs? Why?
    
    `It’s not necessary to standardize the target outputs in a classification problem. The target outputs, which are categorical labels in classification tasks, do not require scaling, as they are not treated as continuous values by the model. In classification, the outputs indicate class membership rather than numerical values that need to be on a common scale. For instance, one-hot encoding is typically used to represent each class with values in the range [0, 1]. This encoding provides a straightforward representation that doesn’t require further standardization. However, in regression tasks, where output values can vary widely, standardizing the outputs can sometimes help the model, especially if the output range is large.`
    
    </br>  
    - As previously described, in the case of using techniques such as SVM, decision trees or kNN, the one-hot-encoding configuration will not be used. In these cases, the `confusionMatrix` function developed in a previous assignment will be used to calculate the metrics, which accepts as input two vectors (outputs and desired outputs) of type `Array{Any,1}`.
    
    </br>  
- Cross-validation indices. It is important to note that, as in the previous assignment, the partitioning of the patterns into folds needs to be done outside this function, because this allows the same partitioning to be used when training other models. In this way, cross-validation is performed with the same data and the same partitions for all models.
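To illustrate why the partition is computed once and outside the function, here is a minimal sketch of a fold-assignment helper. The name `kfoldIndices` is hypothetical and is not the `crossvalidation` function developed in previous assignments; in particular, this simplified version does not stratify by class.

```Julia
using Random

# Hypothetical helper: assigns each of N patterns to one of k folds,
# with fold sizes differing by at most one pattern (no stratification)
function kfoldIndices(N::Int, k::Int)
    indices = repeat(1:k, ceil(Int, N / k))[1:N]
    return shuffle!(indices)
end

Random.seed!(888)
foldIndices = kfoldIndices(20, 5)

# The same vector would then be passed to every model being compared,
# so that all of them are evaluated on identical partitions
println(foldIndices)
```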

Since most of the code will be the same, do not develop 4 different functions, one for each model, but only one function. Inside it, when generating the model in each fold, and depending on the model, the following changes should be made:

- If the model is an ANN, the desired outputs shall be encoded by means of the code developed in previous assignments. As this model is non-deterministic, it will be necessary to write a new loop to train several ANNs, splitting the training data into training and validation (if a validation set is used) and calling the function defined in previous assignments to create and train an ANN.

- If the model is not an ANN, the code that trains the model shall be developed. This code shall be the same for each of the remaining 3 types of models (SVM, decision trees, and kNN), with the line where the model is created being the only difference.

In turn, this function should return, at least, the values for the selected metrics. Once this function has been developed, the experimental part of the assignment begins. The objective is to determine which model with a specific combination of hyperparameters offers the best results, for which the above function will be run for each of the 4 types of models, and for each model it will be run with different values in its hyperparameters.

- The results obtained should be documented in the report to be produced, for which it will be useful to show the results in tabular and/or graphical form.

- When it comes to displaying a confusion matrix in the report, an important question is which one to show given that a lot of trainings have been performed. The cross-validation technique does not generate a final model, but allows comparing different algorithms and configurations to choose the model or parameter configuration that returns the best results. Once chosen, it is necessary to train a "final" model from scratch by using all the patterns as the training set, that is, without separating patterns for testing. In this way, the performance of this model and configuration is expected to be slightly higher than that obtained through cross-validation, since more patterns have been used to train it. This is the final model that would be used in production, and from which a confusion matrix can be obtained.
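When documenting the results, the values obtained in each fold are usually summarised by their mean and standard deviation. The sketch below shows this with made-up per-fold accuracies (they are not results from this assignment), and also illustrates why repeating the training of a deterministic model adds no information.

```Julia
using Statistics

# Hypothetical per-fold accuracies, as a 5-fold cross-validation might produce
foldAccuracies = [0.93, 0.87, 1.00, 0.93, 0.93]

# Mean and standard deviation summarise the result for one model configuration
println("Accuracy: ", round(mean(foldAccuracies), digits=3),
        " ± ", round(std(foldAccuracies), digits=3))

# A deterministic model retrained on the same fold returns the same value every time,
# so the spread across those repetitions is zero
repeatedRuns = fill(0.75, 10)
println(std(repeatedRuns))
```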

In [30]:
import Pkg
Pkg.add("Flux")
Pkg.add("ScikitLearn")
Pkg.add("OptimizationOptimisers")
using ScikitLearn

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`


In [31]:
using CSV
using DataFrames
using Random
using Statistics
using Flux

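# NOTE: the UCI `iris.data` file normally has no header row. By default `CSV.read` takes the
# first line as column names, which is likely why 149 patterns (not 150) are loaded here.
# Passing `header=false` would keep that first pattern.
iris_df = CSV.read("./data/iris/iris.data", DataFrame)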

inputs = Matrix(iris_df[:,1:4])
targets = iris_df[:, 5]

149-element PooledArrays.PooledVector{String15, UInt32, Vector{UInt32}}:
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 ⋮
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"

In [32]:
using Random
include("functionsLibrary.jl")

Random.seed!(888);
sizeDataset = size(inputs, 1)
println(sizeDataset)
(trainIndex, testIndex) = holdOut(sizeDataset, 0.2)

trainInputs         = convert(Array{Real}, inputs[trainIndex, :])
trainTargets        = targets[trainIndex]   # targets is a vector, so plain indexing keeps it a vector

testInputs          = convert(Array{Real}, inputs[testIndex, :])
testTargets         = targets[testIndex]    # vector of labels, one per test pattern

trainingDataset     = (trainInputs, trainTargets);
testDataset         = (testInputs, testTargets);

149


In [33]:
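inputsNormalized            = normalizeMinMax(convert(Array{Real}, inputs));
trainInputsNormalized       = normalizeMinMax(convert(Array{Real}, trainInputs));
# NOTE: here the test set is normalized with its own minimum and maximum; ideally it would
# be normalized with the parameters computed on the training set
testInputsNormalized        = normalizeMinMax(convert(Array{Real}, testInputs));

# Fold indices are computed once, outside modelCrossValidation, and reused for all models
XValVector          = crossvalidation(trainTargets, 10);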

In [34]:
@sk_import svm: SVC;
@sk_import tree: DecisionTreeClassifier;
@sk_import neighbors: KNeighborsClassifier;



In [35]:
function modelCrossValidation(modelType::Symbol,
    modelHyperparameters::Dict,
    inputs::AbstractArray{<:Real,2},
    targets::AbstractArray{<:Any,1},
    crossValidationIndices::Array{Int64,1})

    @assert (length(crossValidationIndices) == size(targets, 1) == size(inputs, 1))

    if modelType == :ANN
        # ANNs need one-hot encoded targets and are trained with the function from previous assignments
        topologyANN             = modelHyperparameters["topology"]
        transferFunctionsANN    = modelHyperparameters["transferFunctions"]
        maxEpochsANN            = modelHyperparameters["maxEpochs"]
        maxEpochsValANN         = modelHyperparameters["maxEpochsVal"]
        learningRateANN         = modelHyperparameters["learningRate"]
        # NOTE: read from the dictionary but not forwarded to trainClassANN below
        validationRatioANN      = modelHyperparameters["validationRatio"]

        targetsANN = oneHotEncoding(targets)
        trainingDatasetANN = (inputs, targetsANN)

        return trainClassANN(topologyANN, trainingDatasetANN, crossValidationIndices, transferFunctions=transferFunctionsANN, maxEpochs=maxEpochsANN, maxEpochsVal=maxEpochsValANN,
                        learningRate=learningRateANN)

    elseif modelType == :kNN
        k = modelHyperparameters["k"]
        model = KNeighborsClassifier(k);

    elseif modelType == :SVM
        # Extracting data from dictionary
        kernelSVM               = modelHyperparameters["kernel"]
        degreeSVM               = modelHyperparameters["degree"]
        gammaSVM                = modelHyperparameters["gamma"]
        cSVM                    = modelHyperparameters["c"]
        # Model
        model = SVC(kernel=kernelSVM, degree=degreeSVM, gamma=gammaSVM, C=cSVM);

    elseif modelType == :tree
        # Extracting data from dictionary
        treeDepth  = modelHyperparameters["depth"]
        # Model (random_state fixed so that training is deterministic)
        model = DecisionTreeClassifier(max_depth=treeDepth, random_state=1);

    else
        error("Unknown model type: $modelType")
    end

    numFolds = maximum(crossValidationIndices);

    # One value per fold for each metric
    accuracyVector      = zeros(Float64, numFolds)
    errorRateVector     = zeros(Float64, numFolds)
    recallVector        = zeros(Float64, numFolds)
    specificityVector   = zeros(Float64, numFolds)
    PPVVector           = zeros(Float64, numFolds)
    NPVVector           = zeros(Float64, numFolds)
    FscoreVector        = zeros(Float64, numFolds)

    # Overwritten in every fold, so the returned matrix is that of the last fold
    globalCMatrix = zeros(Float64, 0, 0)

    for numFold in 1:numFolds
        # Patterns are arranged in rows: select the rows of each subset by fold
        trainingInputs  = inputs[crossValidationIndices .!= numFold, :];
        valInputs       = inputs[crossValidationIndices .== numFold, :];

        # Targets are indexed as vectors (not N×1 matrices), which is what fit! expects
        trainingTargets = targets[crossValidationIndices .!= numFold];
        valTargets      = targets[crossValidationIndices .== numFold];

        # fit! retrains the model from scratch on this fold's training set
        fit!(model, trainingInputs, trainingTargets)

        valOutputs = predict(model, valInputs)

        (accuracyVal, errorRateVal, sensivityVal, specificityVal,
        PPVVal, NPVVal, FscoreVal, globalCMatrix) = confusionMatrix(valOutputs, valTargets)

        accuracyVector[numFold]     = accuracyVal
        errorRateVector[numFold]    = errorRateVal
        recallVector[numFold]       = sensivityVal
        specificityVector[numFold]  = specificityVal
        PPVVector[numFold]          = PPVVal
        NPVVector[numFold]          = NPVVal
        FscoreVector[numFold]       = FscoreVal
    end;

    # Average each metric over the folds
    accuracyMean    = mean(accuracyVector)
    errorRateMean   = mean(errorRateVector)
    recallMean      = mean(recallVector)
    specificityMean = mean(specificityVector)
    PPVMean         = mean(PPVVector)
    NPVMean         = mean(NPVVector)
    FscoreMean      = mean(FscoreVector)

    return ([accuracyMean, errorRateMean, recallMean, specificityMean, PPVMean, NPVMean, FscoreMean], globalCMatrix)

end

modelCrossValidation (generic function with 1 method)

In [36]:
include("functionsLibrary.jl")


trainClassANN (generic function with 4 methods)

##### TEST FUNCTION

In [37]:
ANNdict1     = Dict("topology" => [26,8], "transferFunctions" => [σ, σ], "maxEpochs" => 1000, "maxEpochsVal" => 20, "learningRate" => 0.01, "validationRatio" => 0.3)
KNNdict1     = Dict("k" => 5)
# NOTE: in scikit-learn's SVC, `degree` only affects the "poly" kernel; with "rbf" it is ignored
SVMdict1     = Dict("kernel" => "rbf", "degree" =>3, "gamma" => 2, "c" => 1)
treedict1    = Dict("depth" => 4)

ANNdict2     = Dict("topology" => [10,10], "transferFunctions" => [σ, σ], "maxEpochs" => 1000, "maxEpochsVal" => 20, "learningRate" => 0.01, "validationRatio" => 0.3)
KNNdict2     = Dict("k" => 7)
SVMdict2     = Dict("kernel" => "rbf", "degree" =>4, "gamma" => 2, "c" => 1)
treedict2    = Dict("depth" => 6)

ANNdict3     = Dict("topology" => [4,3], "transferFunctions" => [σ, σ], "maxEpochs" => 1000, "maxEpochsVal" => 20, "learningRate" => 0.01, "validationRatio" => 0.3)
KNNdict3     = Dict("k" => 3)
SVMdict3     = Dict("kernel" => "rbf", "degree" =>2, "gamma" => 2, "c" => 1)
treedict3    = Dict("depth" => 3)

modelANN1 = modelCrossValidation(:ANN, ANNdict1, trainInputsNormalized, trainTargets, XValVector)
modelkNN1 = modelCrossValidation(:kNN, KNNdict1, trainInputsNormalized, trainTargets, XValVector);
modelSVM1 = modelCrossValidation(:SVM, SVMdict1, trainInputsNormalized, trainTargets, XValVector);
modelTree1 = modelCrossValidation(:tree, treedict1, trainInputsNormalized, trainTargets, XValVector);

modelANN2 = modelCrossValidation(:ANN, ANNdict2, trainInputsNormalized, trainTargets, XValVector)
modelkNN2 = modelCrossValidation(:kNN, KNNdict2, trainInputsNormalized, trainTargets, XValVector);
modelSVM2 = modelCrossValidation(:SVM, SVMdict2, trainInputsNormalized, trainTargets, XValVector);
modelTree2 = modelCrossValidation(:tree, treedict2, trainInputsNormalized, trainTargets, XValVector);

modelANN3 = modelCrossValidation(:ANN, ANNdict3, trainInputsNormalized, trainTargets, XValVector)
modelkNN3 = modelCrossValidation(:kNN, KNNdict3, trainInputsNormalized, trainTargets, XValVector);
modelSVM3 = modelCrossValidation(:SVM, SVMdict3, trainInputsNormalized, trainTargets, XValVector);
modelTree3 = modelCrossValidation(:tree, treedict3, trainInputsNormalized, trainTargets, XValVector);

Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.107021995>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.14986682>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13398531>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.16314691>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.10768156>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.1391544>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.15184768>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13486607>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss



Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13551979>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13784263>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.14962652>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.109253414>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13443087>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.108466215>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.13275594>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.094320476>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingL



Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.07938973>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.07868914>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.07190178>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.06475118>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.07258343>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.07004556>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.065400474>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLoss 0.04804859>minLoss 0.0
numEpochsValidation 1<maxEpochsVal 20
Stop criteria:
numEpoch 1000<maxEpochs 1000
trainingLos



##### COMPARE MODELS

In [38]:

function modelComparison(modelANN::Tuple{Vector{Float64}, Vector{Float64}}, modelkNN::Tuple{Vector{Float64}, Matrix{Float64}},
    modelSVM::Tuple{Vector{Float64}, Matrix{Float64}}, modelTree::Tuple{Vector{Float64}, Matrix{Float64}})

    # Metric names, in the same order as the vector returned by modelCrossValidation
    metricNames = ["Accuracy", "Error rate", "Recall", "Specificity", "PPV", "NPV", "Fscore"]
    separator = "---------------------------------------------------------------------"

    println("                   >>>   MODEL COMPARISON   <<< ")
    println(" ")
    println("                       ANN         kNN         SVM         TREE    ")
    println(separator)
    for (index, name) in enumerate(metricNames)
        # One row per metric: label padded to a fixed width, then the four rounded values
        println(rpad(name * ":", 22),
            round(modelANN[1][index], digits=3), "       ",
            round(modelkNN[1][index], digits=3), "        ",
            round(modelSVM[1][index], digits=3), "        ",
            round(modelTree[1][index], digits=3))
        println(separator)
    end
    println(" ")
end
   

modelComparison (generic function with 1 method)
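
`modelComparison` prints each value with `round(..., digits=3)`, so a value such as `1.0` takes fewer characters than `0.922` and the columns drift out of alignment. As a minimal sketch (not part of the original notebook), the rows could be built in a loop with fixed-width formatting from the `Printf` standard library. The function name `comparisonRows` and the demo values are hypothetical; the metric order is assumed to match the index variables defined in `modelComparison`.

```Julia
using Printf

# Build one table row per metric; `results` pairs a model label with its metric vector.
function comparisonRows(results::Vector{Pair{String, Vector{Float64}}})
    # Assumed to follow the same order as the index variables in modelComparison.
    metricNames = ["Accuracy", "Error rate", "Recall", "Specificity", "PPV", "NPV", "Fscore"]
    rows = String[]
    for (i, name) in enumerate(metricNames)
        # Fixed width of 12 characters and 3 decimals keeps the columns aligned.
        cells = join([@sprintf("%-12.3f", metrics[i]) for (_, metrics) in results])
        push!(rows, @sprintf("%-22s", name * ":") * cells)
    end
    return rows
end

# Made-up metric vectors, for illustration only.
demo = ["ANN" => [0.9, 0.1, 0.8, 0.95, 0.85, 0.92, 0.82],
        "kNN" => [0.95, 0.05, 0.9, 0.97, 0.93, 0.96, 0.91]]
foreach(println, comparisonRows(demo))
```

With this layout, adding a fifth model only requires one more pair in the input vector rather than editing every `println` call.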

In [39]:
modelComparison(modelANN1, modelkNN1, modelSVM1, modelTree1)
modelComparison(modelANN2, modelkNN2, modelSVM2, modelTree2)
modelComparison(modelANN3, modelkNN3, modelSVM3, modelTree3)

                   >>>   MODEL COMPARISON   <<< 
 
                       ANN         kNN         SVM         TREE    
---------------------------------------------------------------------
Accuracy:             0.922       0.946        0.966        0.932
---------------------------------------------------------------------
Error rate:           0.078       0.054        0.034        0.068
---------------------------------------------------------------------
Recall:               1.0       0.942        0.942        0.917
---------------------------------------------------------------------
Specificity::         1.0       0.988        0.988        0.962
---------------------------------------------------------------------
PPV::                 1.0       0.967        0.983        0.925
---------------------------------------------------------------------
NPV::                 1.0       0.976        0.978        0.965
---------------------------------------------------------------------
Fsc

In [40]:
function trainAnyModel(modelType::Symbol,
    modelHyperparameters::Dict,
    inputs::AbstractArray{<:Real,2},
    targets::AbstractArray{<:Any,1})

    # One target per input pattern (patterns are stored in rows)
    @assert (size(targets, 1) == size(inputs, 1))

    if modelType == :ANN
        topologyANN             = modelHyperparameters["topology"]
        transferFunctionsANN    = modelHyperparameters["transferFunctions"] # What if there is an empty field?
        maxEpochsANN            = modelHyperparameters["maxEpochs"]
        maxEpochsValANN         = modelHyperparameters["maxEpochsVal"]
        learningRateANN         = modelHyperparameters["learningRate"]
        validationRatioANN      = modelHyperparameters["validationRatio"] # Read but not used in the call below

        targetsANN = oneHotEncoding(targets)

        trainingDatasetANN = (inputs, targetsANN)

        # trainClassANN returns a tuple; its first element is the trained network
        model = trainClassANN(topologyANN, trainingDatasetANN, transferFunctions=transferFunctionsANN, maxEpochs=maxEpochsANN, maxEpochsVal=maxEpochsValANN,
                        learningRate=learningRateANN)[1]

        # The ANN is already trained, so return before the scikit-learn fit! call
        return model

    elseif modelType == :kNN
        k = modelHyperparameters["k"]
        model = KNeighborsClassifier(n_neighbors=k);

    elseif modelType == :SVM
        kernelSVM               = modelHyperparameters["kernel"]
        degreeSVM               = modelHyperparameters["degree"]
        gammaSVM                = modelHyperparameters["gamma"]
        cSVM                    = modelHyperparameters["c"]
        # Model
        model = SVC(kernel=kernelSVM, degree=degreeSVM, gamma=gammaSVM, C=cSVM);

    elseif modelType == :tree
        treeDepth  = modelHyperparameters["depth"]
        model = DecisionTreeClassifier(max_depth=treeDepth, random_state=1);

    else
        # Without this branch, `model` would be undefined when fit! is reached
        error("Unknown model type: $modelType")
    end

    # The scikit-learn models share the same training call
    fit!(model, inputs, targets)

    return model

end

trainAnyModel (generic function with 1 method)
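
The comment in `trainAnyModel` asks what happens if a field is missing: indexing a `Dict` with an absent key throws a `KeyError`. One possible way to handle this, sketched below under stated assumptions, is to use `get` with a fallback value and `haskey` for keys that must be present. The helper name `readSVMHyperparameters` and the chosen fallback values are illustrative assumptions, not part of the original notebook.

```Julia
# Read the SVM hyperparameters, falling back to a default when a key is missing.
# The fallback values are illustrative assumptions.
function readSVMHyperparameters(modelHyperparameters::Dict)
    kernel = get(modelHyperparameters, "kernel", "rbf")
    degree = get(modelHyperparameters, "degree", 3)
    gamma  = get(modelHyperparameters, "gamma", "scale")
    # Here the regularisation constant is treated as mandatory, so fail loudly.
    haskey(modelHyperparameters, "c") || error("Missing hyperparameter \"c\"")
    return (kernel=kernel, degree=degree, gamma=gamma, C=modelHyperparameters["c"])
end

params = readSVMHyperparameters(Dict("kernel" => "poly", "c" => 1.0))
println(params)
```

Whether a missing key should silently take a default or raise an error is a design choice; failing loudly is usually safer for parameters that strongly affect the results.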

The comparison output above shows that the second SVM configuration achieves the best metrics. Therefore, the final model is trained with that configuration.

In [41]:

bestModel = trainAnyModel(:SVM, SVMdict2, trainInputsNormalized, trainTargets)


In [42]:
testOutputs = predict(bestModel, testInputsNormalized)
printConfusionMatrix(testOutputs, testTargets[:,1])

 
Accuracy : 0.967
Error rate: 0.033
Recall: 1.0
Specificity: 1.0
Precision: 1.0
Negative Predictive value: 1.0
F1 Score: 1.0
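
Accuracy and error rate can also be recomputed directly from the predicted and true labels as a quick sanity check. The following is a minimal sketch with made-up label vectors; `labelAccuracy` is a hypothetical helper and is not the `printConfusionMatrix` function used above.

```Julia
using Statistics

# Fraction of predictions that match the true labels.
labelAccuracy(outputs::AbstractVector, targets::AbstractVector) = mean(outputs .== targets)

# Made-up labels, for illustration only.
predicted = ["a", "b", "a", "c", "b"]
actual    = ["a", "b", "c", "c", "b"]

acc = labelAccuracy(predicted, actual)
println("Accuracy: ", acc, "  Error rate: ", 1 - acc)
```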


### Learn Julia

In this assignment, the parameters to pass depend on the model. The simplest way to handle this is to store them in a dictionary (type `Dict` in Julia), which works much like a Python dictionary. For example, the parameters of an SVM could be defined as follows:

```Julia
parameters = Dict("kernel" => "rbf", "kernelDegree" => 3, "kernelGamma" => 2, "C" => 1);
```

Another way of defining such a variable could be the following:

```Julia
parameters = Dict();

parameters["kernel"] = "rbf";
parameters["kernelDegree"] = 3;
parameters["kernelGamma"] = 2;
parameters["C"] = 1;
```

Once inside the function to be developed, the model parameters can be used to create the model object as follows:

```Julia
model = SVC(kernel=parameters["kernel"], 
    degree=parameters["kernelDegree"], 
    gamma=parameters["kernelGamma"], 
    C=parameters["C"]);
```

In the same way, something similar could be done for decision trees and kNN.

Another Julia type that may be useful for this assignment is `Symbol`. A symbol is written as a colon (`:`) followed by a name. In this assignment, symbols can be used to indicate which model to train, for example `:ANN`, `:SVM`, `:DecisionTree` or `:kNN`.
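
As a minimal sketch of how a `Symbol` and a `Dict` can work together, the function below maps each model symbol to the dictionary keys it expects and checks that they are present. The helper names `requiredKeys` and `hasAllKeys`, and the key names used for `:ANN`, `:DecisionTree` and `:kNN`, are hypothetical; only the SVM keys come from the example above.

```Julia
# Map a model Symbol to the hyperparameter keys it expects.
# Only the SVM keys come from the text; the others are hypothetical.
function requiredKeys(modelType::Symbol)
    if modelType == :SVM
        return ["kernel", "kernelDegree", "kernelGamma", "C"]
    elseif modelType == :DecisionTree
        return ["maxDepth"]
    elseif modelType == :kNN
        return ["numNeighbors"]
    elseif modelType == :ANN
        return ["topology"]
    else
        error("Unknown model type: $modelType")
    end
end

# Check that a parameter dictionary contains every key the model needs.
hasAllKeys(modelType::Symbol, parameters::Dict) =
    all(key -> haskey(parameters, key), requiredKeys(modelType))

parameters = Dict("kernel" => "rbf", "kernelDegree" => 3, "kernelGamma" => 2, "C" => 1)
println(hasAllKeys(:SVM, parameters))   # all four SVM keys are present
println(hasAllKeys(:kNN, parameters))   # "numNeighbors" is missing
println(Symbol("SVM") == :SVM)          # a Symbol can also be built from a String
```

Comparing symbols with `==` is cheap and avoids typos that string comparisons can hide, which is why they suit this kind of model selector.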