# The ScikitLearn.jl library

The Scikit-learn library is an open source machine learning library developed for the Python programming language, the first version of which dates back to 2010. It implements a large number of machine learning models, related to tasks such as classification, regression, clustering or dimensionality reduction. These models include Support Vector Machines (SVM), decision trees, random forests, or k-means. It is currently one of the most widely used libraries in the field of machine learning, due to the large number of functionalities it offers as well as its ease of use, since it provides a uniform interface for training and using models. The documentation for this library is available at https://scikit-learn.org/stable/.

For Julia, the ScikitLearn.jl library implements this interface and the algorithms contained in the scikit-learn library, supporting both Julia's own models and those of the scikit-learn library. The latter is done by means of the PyCall.jl library, which allows code written in Python to be executed from Julia in a transparent way for the user, who only needs to have ScikitLearn.jl installed. Documentation for this library can be found at https://scikitlearnjl.readthedocs.io/en/latest/.

As mentioned above, this library provides a uniform interface for training different models. This is reflected in the fact that the names of the functions for creating and training models will be the same regardless of the models to be developed. In the assignments of this course, in addition to ANNs, the following models available in the scikit-learn library will be used:

- Support Vector Machines (SVM)
- Decision trees
- kNN

To use these models, first import the library with `using ScikitLearn`. The package must already be installed, which can be done with:

```Julia
import Pkg;
Pkg.add("ScikitLearn")
```

The scikit-learn library offers more than 100 different types of models. To import the models to be used, you can use `@sk_import`. The following lines import the 3 models mentioned above, which will be used in the assignments of this course:

```Julia
@sk_import svm: SVC
@sk_import tree: DecisionTreeClassifier
@sk_import neighbors: KNeighborsClassifier
```

When training a model, the first step is to generate it. This is done with a different function for each model. This function receives as parameters the model's own parameters. Below are 3 examples, one for each type of model that will be used in these course assignments:

```Julia
model = SVC(kernel="rbf", degree=3, gamma=2, C=1);
model = DecisionTreeClassifier(max_depth=4, random_state=1);
model = KNeighborsClassifier(3);
```

An explanation of the parameters accepted by each of these functions can be found in the library documentation. In the particular case of decision trees, as can be seen, one of these parameters is called `random_state`. This parameter controls the randomness in a particular part of the tree construction process, namely in the selection of features to split a node of the tree. The Scikit-Learn library uses a random number generator in this part, which is updated with each call, so that different calls to this function (together with its subsequent calls to the `fit!` function) to train the model will result in different models. To control the randomness of this process and make it deterministic, it is best to give it an integer value as shown in the example. Thus, the creation of a decision tree with a set of desired inputs and outputs and a given set of hyperparameters is a deterministic process. In general, it is more advisable to be able to control the randomness of the whole model development process (cross-validation, etc.) by means of a random seed that is set at the beginning of the whole process.
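As a minimal sketch of what "setting a random seed at the beginning" means on the Julia side, the following uses only the standard `Random` library. Note the assumption: this seed controls Julia's own generator (used, for example, when shuffling patterns for cross-validation), while scikit-learn models are still controlled through their own `random_state` parameter.

```Julia
using Random

# Fixing the seed makes the sequence of random numbers repeatable
Random.seed!(1)
firstRun = rand(5)

# Resetting the same seed restarts the same sequence
Random.seed!(1)
secondRun = rand(5)

# Both runs produce exactly the same values
@assert firstRun == secondRun
```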

Once created, any of these models can be adjusted with the `fit!` function.

### Question

What does the fact that the name of this function ends in bang (!) indicate?

`Answer here` fit!() takes an unfitted model and a set of data, performs model fitting using that data, and modifies the model object itself to reflect the fitted parameters.
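
The same naming convention can be seen in Julia's standard library. The short sketch below, which uses only built-in functions, contrasts `sort` (returns a new vector) with `sort!` (modifies its argument):

```Julia
numbers = [3, 1, 2]

# sort returns a new sorted vector and leaves `numbers` untouched
sortedCopy = sort(numbers)
@assert numbers == [3, 1, 2]

# sort! sorts `numbers` in place, as the bang suggests
sort!(numbers)
@assert numbers == [1, 2, 3]
```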

Contrary to the Flux library, where it was necessary to write the ANN training loop, in this library the loop is already implemented, and it is called automatically when the `fit!` function is executed. Therefore, it is not necessary to write the code for the training loop.

### Question

As in the case of ANNs, a loop is necessary for training several models. Where in the code (inside or outside the loop) will you need to create the model? Which models will need to be trained several times and which ones only once? Why?

`Answer here`

An example of the use of this function can be seen in the following line:

```Julia
fit!(model, trainingInputs, trainingTargets);
```

As can be seen, the first argument of this function is the model, the second is a matrix of inputs, and the third is a vector of desired outputs. Note that the desired outputs are not passed as a matrix, as in the case of ANNs, but as a vector in which each element is the label of the corresponding pattern. The labels can be of any type: integer, string, etc. The main reason for this is that some models do not accept desired outputs in one-hot encoding.

An important issue to consider is the layout of the data. As shown in previous assignments, to train an ANN the patterns must be arranged in columns, with each row being an attribute. Outside the world of ANNs, and therefore with the rest of the techniques used in this course, the patterns are usually assumed to be arranged in rows, so each column of the input matrix corresponds to an attribute, which is a much more intuitive layout.

### Question

Which condition must the matrix of inputs and the vector of desired outputs passed as an argument to this function fulfil?

`Answer here` The number of rows in the matrix of inputs must be the same as the number of instances in the desired outputs vector.
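
A small sketch of the layout change and of this condition, using random data and made-up labels (the variable names are illustrative, not taken from previous assignments):

```Julia
# 4 attributes x 6 patterns: one pattern per column, as used for the ANNs
inputsByColumns = rand(Float32, 4, 6)
targets = ["a", "b", "a", "c", "b", "c"]

# One pattern per row, the layout assumed by the rest of the models
inputsByRows = permutedims(inputsByColumns)
@assert size(inputsByRows) == (6, 4)

# There must be exactly one label per pattern (row)
@assert size(inputsByRows, 1) == length(targets)
```

`permutedims` is used here because it returns a plain `Matrix`, whereas the `'` operator returns a lazy wrapper around the original data.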

Finally, once the model has been trained, it can be used to make predictions. This is done by means of the `predict` function. An example of its use is shown below:

```Julia
testOutputs = predict(model, testInputs);
```

The model being used is an in-memory structure with different fields, and it can be very useful to look up the contents of these fields. To see which fields each model has, you can write the following:

```Julia
println(keys(model));
```

Depending on the type of model, there will be different fields. For example, for a kNN, the following fields, among others, could be consulted:

```Julia
model.n_neighbors
model.metric
model.weights
```

For an SVM, some other interesting fields could be the following:

```Julia
model.C
model.support_vectors_
model.support_
```

In the case of an SVM, a particularly interesting function is `decision_function`, which returns the distances to the hyperplane of the passed patterns. This is useful, for example, to implement a "one-against-all" strategy to perform multi-class classification. An example of the use of this function is shown below:

```Julia
distances = decision_function(model, inputs);
```
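
To illustrate how these distances could be combined in a "one-against-all" strategy, the sketch below assumes that one binary SVM has been trained per class and that the outputs of `decision_function` have been gathered into a matrix with one row per pattern and one column per class. The distance values and class names are hypothetical, and a larger distance is assumed to mean "more clearly on the side of this class".

```Julia
# Hypothetical distances: one row per pattern, one column per "class vs rest" SVM
distances = [ 1.2 -0.3 -0.8;
             -0.5  0.7 -0.1;
             -0.9 -0.2  0.4]
classes = ["setosa", "versicolor", "virginica"]

# For each pattern (row), pick the class whose SVM gives the largest distance
winners = [argmax(distances[i, :]) for i in 1:size(distances, 1)]
predicted = classes[winners]

@assert predicted == ["setosa", "versicolor", "virginica"]
```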

### Question

In the case of using decision trees or kNN, a corresponding function is not necessary to perform the "one-against-all" strategy, why?

`Answer here` 

Decision trees can support multi-class classification without needing a "one-against-all" strategy. Each leaf node in a decision tree corresponds to a specific class, and the decision path from the root to a leaf determines the predicted class.

kNN can naturally handle multiclass classification by considering the neighbors of the query point and assigning the class that appears most frequently among those neighbors.

However, the SVM implementation in the Scikit-Learn library already allows multi-class classification, so it is not necessary to use a "one-against-all" strategy for these cases.

Finally, it should be noted that these models usually receive pre-processed inputs and outputs, with the most common pre-processing being the normalisation already described in a previous assignment. Therefore, the developed normalisation functions should also be used on the data to be used by these models.

In this assignment, you are asked to develop a function called ```modelCrossValidation```, based on the functions developed in previous assignments, that validates models on the selected classification problem using the three techniques described here.

This function should perform cross-validation and use the metrics deemed most appropriate for the specific problem. This cross-validation can be done by modifying the code developed in the previous assignment.

This function must receive the following parameters:

- Algorithm to be trained, among the 4 used in this course, together with its parameters. The most important parameters to specify for each technique are:
    </br>
    
    - ANN
        - Architecture (number of hidden layers and number of neurons in each hidden layer) and transfer function in each layer. In "shallow" networks such as those used in this course, the transfer function has less impact, so a standard one, such as `tansig` or `logsig`, can be used.
        - Learning rate
        - Ratio of patterns used for validation
        - Number of consecutive iterations without improving the validation loss to stop the process
        - Number of times each ANN is trained.
        
        ### Question
        
        Why should a linear transfer function not be used for neurons in the hidden layers?
        
        ```Answer here``` A linear transfer function should not be used for neurons in the hidden layers because a composition of linear functions is itself linear: a neural network with only linear transfer functions is equivalent to a single-layer network, and it cannot capture the complexity needed for many real-world problems. For example, if the dataset has a non-linear relationship, an ANN with linear transfer functions will not be able to model it.
        
        ### Question
        
        The other models do not have the number of times to train them as a parameter. Why? If you train several times, which statistical properties will the results of these trainings have?
        
         ```Answer here``` Because these algorithms typically have deterministic training procedures. Once trained on a specific dataset, the model's parameters are fixed, and retraining it on the same data multiple times would yield the same result.
    </br>  
    
    - SVM
        - Kernel (and kernel-specific parameters)
        - C
        
    </br>  
    - Decision trees
        - Maximum tree depth
        
    </br>  
    - kNN
        - k (number of neighbours to be considered)

    </br>        
- Already standardised input and desired outputs matrices.
    </br>  

    - As stated above, the desired outputs must be indicated as a vector where each element is the label corresponding to each pattern (therefore, of type `Array{Any,1}`). In the case of ANN training, the desired outputs shall be encoded as done in previous assignments.
    
    ### Question
    
    Has it been necessary to standardise the desired outputs? Why?
    
    ```Answer here``` Standardisation or normalisation of the desired outputs is generally not necessary in a classification problem. However, if an ANN is used, the desired outputs must be encoded with a technique such as one-hot encoding. If one of the other three models is used, they must be given as a vector where each element is the label corresponding to each pattern.
    
    </br>  
    - As previously described, in the case of using techniques such as SVM, decision trees or kNN, the one-hot-encoding configuration will not be used. In these cases, the `confusionMatrix` function developed in a previous assignment will be used to calculate the metrics, which accepts as input two vectors (outputs and desired outputs) of type `Array{Any,1}`.
    
    </br>  
- Cross-validation indices. It is important to note that, as in the previous assignment, the partitioning of the patterns into folds needs to be done outside this function, because this allows the same partitioning to be used when training other models. In this way, cross-validation is performed with the same data and the same partitions in all classes.
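
As a rough sketch of what "partitioning outside the function" could look like, the following builds a vector of fold indices once so that it can be passed unchanged to every call. The helper `foldIndices` is hypothetical and deliberately simple: unlike the `crossvalidation` function from the previous assignment, it does not take the classes into account.

```Julia
using Random

# Hypothetical helper: assigns each of N patterns to one of k folds at random
function foldIndices(N::Int, k::Int)
    # 1,2,...,k,1,2,... repeated and truncated to N elements
    indices = repeat(1:k, ceil(Int, N / k))[1:N]
    return shuffle!(indices)
end

Random.seed!(1)
indices = foldIndices(150, 10)

# Boolean masks for the first fold: test patterns and training patterns
testMask = indices .== 1
trainMask = indices .!= 1
@assert count(testMask) + count(trainMask) == 150
```

Because `indices` is created once, outside the validation function, every model can be evaluated on exactly the same partition.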

Since most of the code will be the same, do not develop 4 different functions, one for each model, but only one function. Inside it, at the time of generating the model in each fold, and depending on the model, the following changes should be made:

- If the model is an ANN, the desired outputs shall be encoded by means of the code developed in previous assignments. As this model is non-deterministic, it will be necessary to make a new loop to train several ANNs, splitting the training data into training and validation (if a validation set is used) and calling the function defined in previous assignments to create and train an ANN.

- If the model is not an ANN, the code that trains the model shall be developed. This code shall be the same for each of the remaining 3 types of models (SVM, decision trees, and kNN), with the line where the model is created being the only difference.

In turn, this function should return, at least, the values for the selected metrics. Once this function has been developed, the experimental part of the assignment begins. The objective is to determine which model with a specific combination of hyperparameters offers the best results, for which the above function will be run for each of the 4 types of models, and for each model it will be run with different values in its hyperparameters.
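
When comparing configurations, the per-fold values returned by the function are usually summarised by their mean and standard deviation. A minimal sketch, using the standard `Statistics` library and made-up accuracy values for illustration only:

```Julia
using Statistics

# Hypothetical per-fold accuracies, one value per fold
foldAccuracies = [0.93, 1.0, 0.87, 1.0, 0.93]

meanAccuracy = mean(foldAccuracies)
stdAccuracy = std(foldAccuracies)

println("Accuracy: ", round(meanAccuracy, digits=3), " ± ", round(stdAccuracy, digits=3))
```

Reporting the standard deviation alongside the mean helps judge whether the difference between two configurations is larger than the variation between folds.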

- The results obtained should be documented in the report to be produced, for which it will be useful to show the results in tabular and/or graphical form.

- When it comes to displaying a confusion matrix in the report, an important question is which one to show given that a lot of trainings have been performed. The cross-validation technique does not generate a final model, but allows comparing different algorithms and configurations to choose the model or parameter configuration that returns the best results. Once chosen, it is necessary to train a "final" model from scratch by using all the patterns as the training set, that is, without separating patterns for testing. In this way, the performance of this model and configuration is expected to be slightly higher than that obtained through cross-validation, since more patterns have been used to train it. This is the final model that would be used in production, and from which a confusion matrix can be obtained.

In [1]:
import Pkg;
Pkg.add("ScikitLearn")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`


In [2]:
using ScikitLearn
@sk_import neural_network: MLPClassifier
@sk_import svm: SVC
@sk_import tree: DecisionTreeClassifier
@sk_import neighbors: KNeighborsClassifier


[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mRunning `conda install -y -c anaconda conda` in root environment


Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/martin/.julia/conda/3/x86_64

  added / updated specs:
    - conda


The following packages will be UPDATED:

  ca-certificates    conda-forge::ca-certificates-2023.7.2~ --> anaconda::ca-certificates-2023.08.22-h06a4308_0 

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            conda-forge/noarch::certifi-2023.7.22~ --> anaconda/linux-64::certifi-2023.7.22-py310h06a4308_0 



Downloading and Extracting Packages

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... 



  current version: 23.3.1
  latest version: 23.10.0

Please update conda by running

    $ conda update -n base -c conda-forge conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.10.0




done


[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mRunning `conda install -y -c conda-forge 'libstdcxx-ng>=3.4,<13.0'` in root environment


Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/martin/.julia/conda/3/x86_64

  added / updated specs:
    - libstdcxx-ng[version='>=3.4,<13.0']


The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    anaconda::ca-certificates-2023.08.22-~ --> conda-forge::ca-certificates-2023.7.22-hbcca054_0 
  certifi            anaconda/linux-64::certifi-2023.7.22-~ --> conda-forge/noarch::certifi-2023.7.22-pyhd8ed1ab_0 



Downloading and Extracting Packages

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... 



  current version: 23.3.1
  latest version: 23.10.0

Please update conda by running

    $ conda update -n base -c conda-forge conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.10.0




done


PyObject <class 'sklearn.neighbors._classification.KNeighborsClassifier'>

In [3]:
using NBInclude
#@nbinclude("/home/martin/Escritorio/ML1/MIA_ML1/Unit 2 - Multilayer Perceptron_SOLVED.ipynb")
@nbinclude("/home/martin/Escritorio/ML1/MIA_ML1/Unit 5 - Cross-validation.ipynb")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`



[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`



In [14]:
function modelCrossValidation(modelType::Symbol,
        modelHyperparameters::Dict,
        inputs::AbstractArray{<:Real,2},
        targets::AbstractArray{Any,1},
        crossValidationIndices::Array{Int64,1})

    
    #Check that the number of patterns (rows) is equal to the number of targets
    @assert size(inputs,1)==length(targets) 

    #Check that the model type is valid
    @assert modelType in (:ann, :svm, :decision_tree, :knn)

    #Get the number of folds
    numFolds=maximum(crossValidationIndices)
    
    #Check that the crossvalidation indices are valid
    @assert all(1 .<= crossValidationIndices .<= numFolds)    

    #Define the variables that will be returned
    accuracy_list=[]
    confusion_matrix_list=[]
    
    #If the model is not SVM, we need to one-hot encode the targets
    if modelType!= :svm
        targets=oneHotEncoding(targets)
    end
    
    #For each fold, train the model and test it
    for i = 1:numFolds
        #Get the training and test inputs
        trainInputs = inputs[crossValidationIndices.!=i,:]
        testInputs = inputs[crossValidationIndices.==i,:]

        #Get the training and test targets
        if modelType!= :svm
            trainTargets = targets[crossValidationIndices.!=i,:]
            testTargets = targets[crossValidationIndices.==i,:]            

            testTargets=BitMatrix(testTargets')
        else
            trainTargets = targets[crossValidationIndices.!=i]
            testTargets = targets[crossValidationIndices.==i]
        end


        ################
        #Train the model
        ################


        #Train ann model
        if modelType == :ann
            #Note: scikit-learn only uses validation_fraction when early_stopping=true, which is not set here
            @assert haskey(modelHyperparameters, "hidden_layer_sizes") && haskey(modelHyperparameters, "activation") && haskey(modelHyperparameters, "learning_rate_init") && haskey(modelHyperparameters, "validation_fraction") && haskey(modelHyperparameters, "max_iter")
            model = MLPClassifier(hidden_layer_sizes=modelHyperparameters["hidden_layer_sizes"],
                                  activation=modelHyperparameters["activation"],
                                  learning_rate_init=modelHyperparameters["learning_rate_init"],
                                  validation_fraction=modelHyperparameters["validation_fraction"],
                                  max_iter=modelHyperparameters["max_iter"])
            model.fit(trainInputs, trainTargets)
            predictedTargets = model.predict(testInputs)

        #Train svm model
        elseif modelType == :svm
            @assert haskey(modelHyperparameters, "kernel") && haskey(modelHyperparameters, "C") && haskey(modelHyperparameters, "gamma") && haskey(modelHyperparameters, "degree")
            model = SVC(kernel=modelHyperparameters["kernel"], C=modelHyperparameters["C"], gamma=modelHyperparameters["gamma"], degree=modelHyperparameters["degree"])
            model.fit(trainInputs, trainTargets)
            predictedTargets = model.predict(testInputs)
            predictedTargets=oneHotEncoding(predictedTargets)
            testTargets=oneHotEncoding(testTargets)
            testTargets=testTargets'

        #Train decision tree model
        elseif modelType == :decision_tree
            @assert haskey(modelHyperparameters, "max_depth") && haskey(modelHyperparameters, "criterion") && haskey(modelHyperparameters, "splitter")
            model = DecisionTreeClassifier(max_depth=modelHyperparameters["max_depth"],criterion=modelHyperparameters["criterion"],splitter=modelHyperparameters["splitter"])
            model.fit(trainInputs, trainTargets)
            predictedTargets = model.predict(testInputs)
        
        #Train knn model
        elseif modelType == :knn
            @assert haskey(modelHyperparameters, "n_neighbors") && haskey(modelHyperparameters, "weights")
            model = KNeighborsClassifier(n_neighbors=modelHyperparameters["n_neighbors"],weights=modelHyperparameters["weights"])
            model.fit(trainInputs, trainTargets)
            predictedTargets = model.predict(testInputs)
            
        end

        #Check that the number of predicted targets is equal to the number of test targets
        @assert length(predictedTargets)==length(testTargets)
        #Calculate the metrics and the confusion matrix (only the accuracy and the confusion matrix are kept below)
        accuracy1, error_rate, sensitivity, specificity, positive_predictive_value, negative_predictive_value, f_score, confusion_matrix=confusionMatrix(predictedTargets,testTargets',weighted=false)
        #Add the accuracy and confusion matrix to the lists
        push!(accuracy_list,accuracy1)
        push!(confusion_matrix_list,confusion_matrix)

    end
    #Return the accuracy and confusion matrix lists
    return accuracy_list, confusion_matrix_list
end


modelCrossValidation (generic function with 1 method)

In [15]:
# readdlm comes from DelimitedFiles and mean from Statistics
using DelimitedFiles, Statistics
# Load the dataset
dataset = readdlm("../iris/iris.data",',');
# Prepare the data
inputs = convert(Array{Float32,2}, dataset[:,1:4]);
targets=dataset[:,5];
# Normalize the inputs
inputs = normalizeMinMax!(inputs);

# Check that the inputs are normalized
@assert(all(minimum(inputs, dims=1) .== 0));
@assert(all(maximum(inputs, dims=1) .== 1));

# Call the crossvalidation function
vector=crossvalidation(targets,10)

println("Crossvalidation vector: ", vector);
println(targets);
println(typeof(targets));
println(size(targets))
# Define the hyperparameters
ANNhyperparameters=Dict("hidden_layer_sizes"=>(10,3),"activation"=>"relu","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>2000);
SVMhyperparameters=Dict("kernel"=>"rbf","degree"=>3,"gamma"=>2,"C"=>1);
DThyperparameters=Dict("max_depth"=>3,"criterion"=>"gini","splitter"=>"best");
KNNhyperparameters=Dict("n_neighbors"=>3,"weights"=>"uniform");


# Call the modelCrossValidation function
accuracyANN,confusionMatrixANN = modelCrossValidation(:ann, ANNhyperparameters, inputs, targets, vector);
accuracySVM,confusionMatrixSVM = modelCrossValidation(:svm, SVMhyperparameters, inputs, targets, vector);
accuracyDT,confusionMatrixDT = modelCrossValidation(:decision_tree, DThyperparameters, inputs, targets, vector);
accuracyKNN,confusionMatrixKNN = modelCrossValidation(:knn, KNNhyperparameters, inputs, targets, vector);


# Print the results
println("ANN accuracy: ", mean(accuracyANN)," ANN confusion matrix: ", mean(confusionMatrixANN));
println("SVM accuracy: ", mean(accuracySVM)," SVM confusion matrix: ", mean(confusionMatrixSVM));
println("Decision tree accuracy: ", mean(accuracyDT)," Decision tree confusion matrix: ", mean(confusionMatrixDT));
println("KNN accuracy: ", mean(accuracyKNN)," KNN confusion matrix: ", mean(confusionMatrixKNN));


encoded_targets = oneHotEncoding(targets)
println(encoded_targets)
println(size(encoded_targets))



------------------------------------------------------------------------------------------------------------------------------------------

## Experimentation

In [6]:
###################
##ANN experiments##
###################

# Define the hyperparameters
ANNhyperparameters1=Dict("hidden_layer_sizes"=>(10,3),"activation"=>"relu","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters2=Dict("hidden_layer_sizes"=>(4,3),"activation"=>"relu","learning_rate_init"=>0.001,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters3=Dict("hidden_layer_sizes"=>(20,3),"activation"=>"relu","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters4=Dict("hidden_layer_sizes"=>(6,10,3),"activation"=>"relu","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters5=Dict("hidden_layer_sizes"=>(20,10,3),"activation"=>"logistic","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters6=Dict("hidden_layer_sizes"=>(20,10,3),"activation"=>"tanh","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters7=Dict("hidden_layer_sizes"=>(10,3),"activation"=>"tanh","learning_rate_init"=>0.01,"validation_fraction"=>0.1,"max_iter"=>5000);
ANNhyperparameters8=Dict("hidden_layer_sizes"=>(10,3),"activation"=>"tanh","learning_rate_init"=>0.001,"validation_fraction"=>0.1,"max_iter"=>5000);


# Call the modelCrossValidation function
accuracyANN1,confusionMatrixANN1 = modelCrossValidation(:ann, ANNhyperparameters1, inputs, targets, vector);
accuracyANN2,confusionMatrixANN2 = modelCrossValidation(:ann, ANNhyperparameters2, inputs, targets, vector);
accuracyANN3,confusionMatrixANN3 = modelCrossValidation(:ann, ANNhyperparameters3, inputs, targets, vector);
accuracyANN4,confusionMatrixANN4 = modelCrossValidation(:ann, ANNhyperparameters4, inputs, targets, vector);
accuracyANN5,confusionMatrixANN5 = modelCrossValidation(:ann, ANNhyperparameters5, inputs, targets, vector);
accuracyANN6,confusionMatrixANN6 = modelCrossValidation(:ann, ANNhyperparameters6, inputs, targets, vector);
accuracyANN7,confusionMatrixANN7 = modelCrossValidation(:ann, ANNhyperparameters7, inputs, targets, vector);
accuracyANN8,confusionMatrixANN8 = modelCrossValidation(:ann, ANNhyperparameters8, inputs, targets, vector);


# Print the results
println("ANN accuracy: ", mean(accuracyANN1)," ANN confusion matrix: ", mean(confusionMatrixANN1));
println("ANN accuracy: ", mean(accuracyANN2)," ANN confusion matrix: ", mean(confusionMatrixANN2));
println("ANN accuracy: ", mean(accuracyANN3)," ANN confusion matrix: ", mean(confusionMatrixANN3));
println("ANN accuracy: ", mean(accuracyANN4)," ANN confusion matrix: ", mean(confusionMatrixANN4));
println("ANN accuracy: ", mean(accuracyANN5)," ANN confusion matrix: ", mean(confusionMatrixANN5));
println("ANN accuracy: ", mean(accuracyANN6)," ANN confusion matrix: ", mean(confusionMatrixANN6));
println("ANN accuracy: ", mean(accuracyANN7)," ANN confusion matrix: ", mean(confusionMatrixANN7));
println("ANN accuracy: ", mean(accuracyANN8)," ANN confusion matrix: ", mean(confusionMatrixANN8));


ANN accuracy: 0.6933333333333334 ANN confusion matrix: [5.0 3.4 1.1; 0.0 1.5 0.0; 0.0 0.1 3.9]
ANN accuracy: 0.6266666666666667 ANN confusion matrix: [4.3 2.4 1.1; 0.7 1.4 0.2; 0.0 1.2 3.7]
ANN accuracy: 0.8066666666666666 ANN confusion matrix: [5.0 0.8 1.1; 0.0 4.0 0.8; 0.0 0.2 3.1]
ANN accuracy: 0.7 ANN confusion matrix: [5.0 2.5 1.0; 0.0 2.3 0.8; 0.0 0.2 3.2]
ANN accuracy: 0.5266666666666666 ANN confusion matrix: [5.0 2.8 3.1; 0.0 1.1 0.1; 0.0 1.1 1.8]
ANN accuracy: 0.9533333333333334 ANN confusion matrix: [5.0 0.0 0.0; 0.0 4.6 0.3; 0.0 0.4 4.7]
ANN accuracy: 0.9733333333333334 ANN confusion matrix: [5.0 0.0 0.0; 0.0 4.7 0.1; 0.0 0.3 4.9]
ANN accuracy: 0.9133333333333333 ANN confusion matrix: [5.0 0.1 0.0; 0.0 4.7 1.0; 0.0 0.2 4.0]


Judging by the results, the best option seems to be the last model, where the activation was changed from "relu" to "tanh". We also observed that the results get worse when the number of layers and neurons is very high; in our experiments, the best results were obtained with the topology (10,3). We also modified the learning rate, but the change was small and hardly affected the final result.
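As a quick sanity check on the printed values, the accuracy can be recovered from the averaged confusion matrix, since correctly classified patterns lie on its diagonal. The sketch below copies the matrix printed for the last ANN configuration and assumes that every fold has the same number of test patterns, so that the mean of the per-fold accuracies equals the accuracy of the averaged matrix.

```Julia
using LinearAlgebra  # provides tr (sum of the diagonal)

# Averaged confusion matrix printed for the last ANN configuration above
confusionMatrix = [5.0 0.1 0.0; 0.0 4.7 1.0; 0.0 0.2 4.0];

# Correctly classified patterns lie on the diagonal
accuracy = tr(confusionMatrix) / sum(confusionMatrix);
println("Accuracy from the confusion matrix: ", round(accuracy, digits=4));
```

The value agrees with the accuracy printed on the same line of the output, which suggests that both metrics were averaged over the same folds.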

In [7]:
###################
##SVM experiments##
###################

# Define the hyperparameters
SVMhyperparameters1=Dict("kernel"=>"rbf","degree"=>3,"gamma"=>2,"C"=>1);
SVMhyperparameters2=Dict("kernel"=>"poly","degree"=>3,"gamma"=>2,"C"=>1);
SVMhyperparameters3=Dict("kernel"=>"sigmoid","degree"=>3,"gamma"=>2,"C"=>1);
SVMhyperparameters4=Dict("kernel"=>"linear","degree"=>3,"gamma"=>2,"C"=>1);
SVMhyperparameters5=Dict("kernel"=>"rbf","degree"=>5,"gamma"=>2,"C"=>1);
SVMhyperparameters6=Dict("kernel"=>"poly","degree"=>5,"gamma"=>2,"C"=>1);
SVMhyperparameters7=Dict("kernel"=>"linear","degree"=>5,"gamma"=>2,"C"=>1);
SVMhyperparameters8=Dict("kernel"=>"rbf","degree"=>5,"gamma"=>"scale","C"=>0.1);
SVMhyperparameters9=Dict("kernel"=>"poly","degree"=>5,"gamma"=>"scale","C"=>0.1);
SVMhyperparameters10=Dict("kernel"=>"linear","degree"=>5,"gamma"=>2,"C"=>0.1);


# Call the modelCrossValidation function
accuracySVM1,confusionMatrixSVM1 = modelCrossValidation(:svm, SVMhyperparameters1, inputs, targets, vector);
accuracySVM2,confusionMatrixSVM2 = modelCrossValidation(:svm, SVMhyperparameters2, inputs, targets, vector);
accuracySVM3,confusionMatrixSVM3 = modelCrossValidation(:svm, SVMhyperparameters3, inputs, targets, vector);
accuracySVM4,confusionMatrixSVM4 = modelCrossValidation(:svm, SVMhyperparameters4, inputs, targets, vector);
accuracySVM5,confusionMatrixSVM5 = modelCrossValidation(:svm, SVMhyperparameters5, inputs, targets, vector);
accuracySVM6,confusionMatrixSVM6 = modelCrossValidation(:svm, SVMhyperparameters6, inputs, targets, vector);
accuracySVM7,confusionMatrixSVM7 = modelCrossValidation(:svm, SVMhyperparameters7, inputs, targets, vector);
accuracySVM8,confusionMatrixSVM8 = modelCrossValidation(:svm, SVMhyperparameters8, inputs, targets, vector);
accuracySVM9,confusionMatrixSVM9 = modelCrossValidation(:svm, SVMhyperparameters9, inputs, targets, vector);
accuracySVM10,confusionMatrixSVM10 = modelCrossValidation(:svm, SVMhyperparameters10, inputs, targets, vector);


# Print the results
println("SVM accuracy: ", mean(accuracySVM1)," SVM confusion matrix: ", mean(confusionMatrixSVM1));
println("SVM accuracy: ", mean(accuracySVM2)," SVM confusion matrix: ", mean(confusionMatrixSVM2));
println("SVM accuracy: ", mean(accuracySVM3)," SVM confusion matrix: ", mean(confusionMatrixSVM3));
println("SVM accuracy: ", mean(accuracySVM4)," SVM confusion matrix: ", mean(confusionMatrixSVM4));
println("SVM accuracy: ", mean(accuracySVM5)," SVM confusion matrix: ", mean(confusionMatrixSVM5));
println("SVM accuracy: ", mean(accuracySVM6)," SVM confusion matrix: ", mean(confusionMatrixSVM6));
println("SVM accuracy: ", mean(accuracySVM7)," SVM confusion matrix: ", mean(confusionMatrixSVM7));
println("SVM accuracy: ", mean(accuracySVM8)," SVM confusion matrix: ", mean(confusionMatrixSVM8));
println("SVM accuracy: ", mean(accuracySVM9)," SVM confusion matrix: ", mean(confusionMatrixSVM9));
println("SVM accuracy: ", mean(accuracySVM10)," SVM confusion matrix: ", mean(confusionMatrixSVM10));


LoadError: AssertionError: length(predictedTargets) == length(testTargets)

Judging by the results, the best kernels are "rbf" and "poly". We also changed the values of gamma, C and degree, but the final results were very similar. In this case, the best model is the first one.

In [None]:
##################
##DT experiments##
##################


# Define the hyperparameters
DThyperparameters1=Dict("max_depth"=>3,"criterion"=>"gini","splitter"=>"best");
DThyperparameters2=Dict("max_depth"=>5,"criterion"=>"entropy","splitter"=>"best");
DThyperparameters3=Dict("max_depth"=>7,"criterion"=>"log_loss","splitter"=>"best");
DThyperparameters4=Dict("max_depth"=>3,"criterion"=>"gini","splitter"=>"random");
DThyperparameters5=Dict("max_depth"=>5,"criterion"=>"entropy","splitter"=>"random");
DThyperparameters6=Dict("max_depth"=>7,"criterion"=>"log_loss","splitter"=>"random");


# Call the modelCrossValidation function
accuracyDT1,confusionMatrixDT1 = modelCrossValidation(:decision_tree, DThyperparameters1, inputs, targets, vector);
accuracyDT2,confusionMatrixDT2 = modelCrossValidation(:decision_tree, DThyperparameters2, inputs, targets, vector);
accuracyDT3,confusionMatrixDT3 = modelCrossValidation(:decision_tree, DThyperparameters3, inputs, targets, vector);
accuracyDT4,confusionMatrixDT4 = modelCrossValidation(:decision_tree, DThyperparameters4, inputs, targets, vector);
accuracyDT5,confusionMatrixDT5 = modelCrossValidation(:decision_tree, DThyperparameters5, inputs, targets, vector);
accuracyDT6,confusionMatrixDT6 = modelCrossValidation(:decision_tree, DThyperparameters6, inputs, targets, vector);


# Print the results
println("DT accuracy: ", mean(accuracyDT1)," DT confusion matrix: ", mean(confusionMatrixDT1));
println("DT accuracy: ", mean(accuracyDT2)," DT confusion matrix: ", mean(confusionMatrixDT2));
println("DT accuracy: ", mean(accuracyDT3)," DT confusion matrix: ", mean(confusionMatrixDT3));
println("DT accuracy: ", mean(accuracyDT4)," DT confusion matrix: ", mean(confusionMatrixDT4));
println("DT accuracy: ", mean(accuracyDT5)," DT confusion matrix: ", mean(confusionMatrixDT5));
println("DT accuracy: ", mean(accuracyDT6)," DT confusion matrix: ", mean(confusionMatrixDT6));

DT accuracy: 0.9533333333333334 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.6 0.2; 0.0 0.3 4.7]
DT accuracy: 0.9466666666666667 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.5 0.3; 0.0 0.5 4.7]
DT accuracy: 0.9333333333333333 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.5 0.5; 0.0 0.5 4.5]
DT accuracy: 0.9400000000000001 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.8 0.7; 0.0 0.2 4.3]
DT accuracy: 0.9266666666666667 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.3 0.4; 0.0 0.6 4.6]
DT accuracy: 0.9266666666666667 DT confusion matrix: [5.0 0.0 0.0; 0.0 4.5 0.5; 0.0 0.4 4.4]


Comparing the accuracies obtained in these experiments, the DT models perform well on the given problem. The three criteria yield similar accuracies, with "gini" slightly ahead. As for the splitter, the model generally works better with splitter="best", although there are runs in which splitter="random" performs better.

Overall, the parameters tried in these experiments make little difference to the accuracy on the given problem, but the configuration that seems to perform best is the first one, with "max_depth=3", "criterion=gini" and "splitter=best".
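The comparison by splitter can be made explicit with a few lines of Julia. This is only a sketch that reuses the mean accuracies printed above for this particular run; the vectors `accuracies` and `splitters` are names introduced here for illustration and are not part of the assignment code.

```Julia
using Statistics  # provides mean

# Mean accuracies copied from the output above, in the order DT1..DT6
accuracies = [0.9533333333333334, 0.9466666666666667, 0.9333333333333333,
              0.9400000000000001, 0.9266666666666667, 0.9266666666666667];
splitters = ["best", "best", "best", "random", "random", "random"];

# Average the accuracies of the experiments that share the same splitter
for splitter in unique(splitters)
    println("splitter=", splitter, " mean accuracy: ",
        round(mean(accuracies[splitters .== splitter]), digits=4));
end

# Index of the experiment with the highest mean accuracy
bestExperiment = argmax(accuracies);
println("Best experiment: DT", bestExperiment);
```

Note that in these experiments the criterion and the maximum depth change together, so their individual effects cannot be separated from this output alone.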


In [None]:
###################
##KNN experiments##
###################

# Define the hyperparameters
KNNhyperparameters1=Dict("n_neighbors"=>3,"weights"=>"uniform");
KNNhyperparameters2=Dict("n_neighbors"=>5,"weights"=>"uniform");
KNNhyperparameters3=Dict("n_neighbors"=>7,"weights"=>"uniform");
KNNhyperparameters4=Dict("n_neighbors"=>3,"weights"=>"distance");
KNNhyperparameters5=Dict("n_neighbors"=>5,"weights"=>"distance");
KNNhyperparameters6=Dict("n_neighbors"=>7,"weights"=>"distance");


# Call the modelCrossValidation function
accuracyKNN1,confusionMatrixKNN1 = modelCrossValidation(:knn, KNNhyperparameters1, inputs, targets, vector);
accuracyKNN2,confusionMatrixKNN2 = modelCrossValidation(:knn, KNNhyperparameters2, inputs, targets, vector);
accuracyKNN3,confusionMatrixKNN3 = modelCrossValidation(:knn, KNNhyperparameters3, inputs, targets, vector);
accuracyKNN4,confusionMatrixKNN4 = modelCrossValidation(:knn, KNNhyperparameters4, inputs, targets, vector);
accuracyKNN5,confusionMatrixKNN5 = modelCrossValidation(:knn, KNNhyperparameters5, inputs, targets, vector);
accuracyKNN6,confusionMatrixKNN6 = modelCrossValidation(:knn, KNNhyperparameters6, inputs, targets, vector);


# Print the results
println("KNN accuracy: ", mean(accuracyKNN1)," KNN confusion matrix: ", mean(confusionMatrixKNN1));
println("KNN accuracy: ", mean(accuracyKNN2)," KNN confusion matrix: ", mean(confusionMatrixKNN2));
println("KNN accuracy: ", mean(accuracyKNN3)," KNN confusion matrix: ", mean(confusionMatrixKNN3));
println("KNN accuracy: ", mean(accuracyKNN4)," KNN confusion matrix: ", mean(confusionMatrixKNN4));
println("KNN accuracy: ", mean(accuracyKNN5)," KNN confusion matrix: ", mean(confusionMatrixKNN5));
println("KNN accuracy: ", mean(accuracyKNN6)," KNN confusion matrix: ", mean(confusionMatrixKNN6));


KNN accuracy: 0.9533333333333334 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.7 0.4; 0.0 0.3 4.6]
KNN accuracy: 0.9533333333333334 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.7 0.4; 0.0 0.3 4.6]
KNN accuracy: 0.9733333333333334 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.9 0.3; 0.0 0.1 4.7]
KNN accuracy: 0.9533333333333334 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.7 0.4; 0.0 0.3 4.6]
KNN accuracy: 0.9533333333333334 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.7 0.4; 0.0 0.3 4.6]
KNN accuracy: 0.9666666666666668 KNN confusion matrix: [5.0 0.0 0.0; 0.0 4.8 0.3; 0.0 0.2 4.7]


Comparing the results of these experiments, kNN proves to be one of the best-fitting models for the given problem, as shown by its accuracies, which are all above 0.95. This model has few parameters to tune, so all the experiments give similar results. At first sight, the model works better with weights="uniform", but the difference is minimal. The results also seem to improve as "n_neighbors" increases.
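Since the kNN experiments only combine two parameters, the six configurations could also be generated instead of being written one by one. The following is a minimal sketch; the vector name `KNNhyperparameters` is an assumption made here, and the call to `modelCrossValidation` is left as a comment because that function is defined elsewhere in the assignment.

```Julia
# Build the six kNN configurations with a comprehension.
# The first `for` is the outer loop, so the order matches KNNhyperparameters1..6 above.
KNNhyperparameters = [Dict("n_neighbors" => k, "weights" => w)
                      for w in ("uniform", "distance") for k in (3, 5, 7)];

for (i, parameters) in enumerate(KNNhyperparameters)
    println("Experiment ", i, ": ", parameters);
    # Each configuration would then be passed to the function of the assignment, e.g.
    # modelCrossValidation(:knn, parameters, inputs, targets, vector)
end
```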

### Final Conclusion
Considering all the experiments performed, we can conclude that the best-performing models for the given problem are SVMs and kNN, both with accuracies between 0.95 and 0.98.


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Learn Julia

In this assignment, it is necessary to pass parameters that depend on the model. The simplest way to do this is to create a dictionary (in Julia, the type is `Dict`), which works much like a Python dictionary. For example, to specify the parameters of an SVM, you could create a variable as follows:

```Julia
parameters = Dict("kernel" => "rbf", "degree" => 3, "gamma" => 2, "C" => 1);
```

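Another way of defining such a variable is to create an empty dictionary and add the entries one by one. The key names are arbitrary, as long as the code that reads the dictionary uses the same ones: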

```Julia
parameters = Dict();

parameters["kernel"] = "rbf";
parameters["kernelDegree"] = 3;
parameters["kernelGamma"] = 2;
parameters["C"] = 1;
```
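When reading values back, it can be useful to check whether a key is present or to fall back on a default. The sketch below uses the standard `haskey` and `get` functions; the fallback values `"linear"` and `3` are arbitrary choices made for illustration.

```Julia
parameters = Dict("kernel" => "rbf", "C" => 1);

# haskey checks whether a key exists before reading it
println(haskey(parameters, "kernel"));

# get returns the stored value, or the fallback given as third argument if the key is missing
kernel = get(parameters, "kernel", "linear");
degree = get(parameters, "degree", 3);   # "degree" was not set, so the fallback is used
println("kernel: ", kernel, ", degree: ", degree);
```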

Once inside the function to be developed, the model parameters can be used to create the model object as follows:

```Julia
model = SVC(kernel=parameters["kernel"], 
    degree=parameters["kernelDegree"], 
    gamma=parameters["kernelGamma"], 
    C=parameters["C"]);
```

In the same way, something similar could be done for decision trees and kNN.
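When the dictionary keys coincide with the names of the keyword arguments, the dictionary can also be passed in one go by converting its keys to `Symbol` and splatting it. The sketch below illustrates this with `describeTree`, a hypothetical stand-in function defined here so that the example runs without any library; it is not a real model constructor.

```Julia
# Hypothetical stand-in for a model constructor; it only echoes its keyword arguments
describeTree(; max_depth, criterion, splitter) =
    "max_depth=$max_depth, criterion=$criterion, splitter=$splitter";

parameters = Dict("max_depth" => 3, "criterion" => "gini", "splitter" => "best");

# Keyword arguments need Symbol keys, so convert the String keys first
keywordArguments = Dict(Symbol(key) => value for (key, value) in parameters);

# The trailing ... splats the dictionary entries as keyword arguments
description = describeTree(; keywordArguments...);
println(description);
```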

Another Julia type that may be useful for this assignment is `Symbol`. A symbol is created simply by writing a name after a colon (":"). In this assignment, you can use it to indicate which model you want to train, for example `:ANN`, `:SVM`, `:DecisionTree` or `:kNN`.
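A minimal sketch of how a function might branch on such a symbol is shown below. The function `modelName` is a hypothetical helper written for this example; in the assignment, each branch would create and train the corresponding model instead of returning a string.

```Julia
# Hypothetical helper: map the Symbol that selects the model to a readable name
function modelName(modelType::Symbol)
    if modelType == :ANN
        return "Artificial neural network";
    elseif modelType == :SVM
        return "Support Vector Machine";
    elseif modelType == :DecisionTree
        return "Decision tree";
    elseif modelType == :kNN
        return "k-nearest neighbours";
    else
        error("Unknown model type: ", modelType);
    end
end

println(typeof(:SVM));            # the type of any symbol is Symbol
println(modelName(:kNN));
println(:SVM == Symbol("SVM"));   # a symbol can also be built from a String
```

Symbols are compared by name, so the selection is case-sensitive: `:knn` and `:kNN` are different values.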