# Metrics

We have used simple functions in the previous assignments, such as MSE in regression problems or accuracy in classification problems, to quantify the goodness-of-fit of the model. While in regression problems the functions are based on an error calculated in one way or another (mean error, mean square error, etc.), in classification problems other types of metrics can be derived depending on the nature of the problem being solved. Many of these metrics, at least those that will be used in practice, are based on the calculation of the confusion matrix.

A confusion matrix is a square matrix, with as many rows and columns as classes, showing the distribution of patterns in classes, and the classification performed by the model. Usually the rows show how the model has performed the classification, and the columns show the actual classification values, although this may vary depending on the source consulted.

The simplest case corresponds to 2 classes, where one is considered "negative" and the other "positive". A two-class confusion matrix would be as follows:

This confusion matrix contains 4 values, which can be divided according to two criteria:

- according to the output of the model: positive or negative.
- depending on whether the model is wrong or not: true or false. 

Thus, these 4 values are called true negatives (TN), false positives (FP), false negatives (FN) and true positives (TP). For example, false negatives would be the number of patterns that the system has classified as negative, and has been wrong because they were actually positive.

From this confusion matrix, different metrics can be calculated. Depending on the problem you are working on, it will be more interesting to follow one or the other. Some of the most commonly used metrics are:

- **Accuracy**. Ratio of patterns in which the prediction is correct. Calculated as $$\frac{TN+TP}{TN+TP+FN+FP}$$
- **Error rate**. Ratio of patterns in which the prediction is wrong. Calculated as $$\frac{FP+FN}{TN+TP+FN+FP}$$
- **Sensitivity** or **recall**. Indicates the probability that a positive classification result is obtained for a positive case. It is calculated as $$\frac{TP}{FN+TP}$$
    - In a medical test, the test sensitivity represents the probability that a sick (positive) subject will have a positive test result.
- **Specificity**. Indicates the probability that a negative classification result is obtained for a negative case. It is calculated as $$\frac{TN}{FP+TN}$$
    - The specificity of a test represents the probability that a healthy (negative) subject will have a negative test result.
- **Precision** or **positive predictive value**. Ratio of patterns classified as positive that are actually positive. Calculated as $$\frac{TP}{TP+FP}$$
- **Negative predictive value**. Ratio of patterns classified as negative that are actually negative. Calculated as $$\frac{TN}{TN+FN}$$
- **F-score**, **F1-score** or **F-measure**. It is defined as the harmonic mean of precision and recall. Calculated as $$\frac{2 \cdot precision \cdot recall}{precision+recall}$$
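As a quick worked example of the formulas above, the following sketch computes each metric from a hypothetical set of counts. The numbers are made up for illustration and do not come from any dataset used in this assignment.

```julia
# Hypothetical counts for a two-class problem (illustrative values only)
TN, FP, FN, TP = 85, 5, 3, 7
total = TN + FP + FN + TP            # 100 patterns

acc         = (TN + TP) / total      # 0.92
errorRate   = (FP + FN) / total      # 0.08
sensitivity = TP / (FN + TP)         # 7/10 = 0.7
specificity = TN / (FP + TN)         # 85/90 ≈ 0.944
ppv         = TP / (TP + FP)         # 7/12 ≈ 0.583
npv         = TN / (TN + FN)         # 85/88 ≈ 0.966

# Harmonic mean of precision (ppv) and recall (sensitivity)
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)   # 14/22 ≈ 0.636
```

Note that accuracy and error rate always add up to 1, since every pattern is either correctly or incorrectly classified.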

It is worth clarifying that these metrics, as well as others seen in theory class (ROC curve, Kappa index) are used to assess already trained classifiers, not to perform the training process. To be trained, each model has its own function to quantify the error or goodness of fit, such as the cross-entropy function in the case of neural networks.
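The difference between a training loss and an assessment metric can be illustrated with a small sketch. The probabilities below are invented, and the helper names `binaryCrossEntropy` and `thresholdedAccuracy` are hypothetical: two models obtain the same accuracy, while the cross-entropy still distinguishes the more confident one.

```julia
using Statistics

# Hypothetical targets and predicted probabilities from two models (illustrative values)
targets   = [true, false, true, false]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant  = [0.6, 0.4, 0.55, 0.45]

# Binary cross-entropy: the kind of loss that is minimised during training
binaryCrossEntropy(p, t) = -mean(t .* log.(p) .+ (1 .- t) .* log.(1 .- p))

# Accuracy after thresholding at 0.5: the kind of metric used to assess a trained model
thresholdedAccuracy(p, t) = mean((p .>= 0.5) .== t)

println("Accuracy:      ", thresholdedAccuracy(confident, targets), " vs ", thresholdedAccuracy(hesitant, targets))
println("Cross-entropy: ", binaryCrossEntropy(confident, targets), " vs ", binaryCrossEntropy(hesitant, targets))
```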

The accuracy is probably the most commonly used value, as it indicates the success rate of the classifier in a simple way. However, depending on the problem you are working with, it may not be the most appropriate metric. For example, in a mass population-based test for a disease where it is known that most people do not have the disease, a model that classifies everyone as negative (healthy) will have a very high accuracy, even though the model does not actually do anything.
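The screening scenario can be reproduced with a few lines. This is a minimal sketch with an invented population (1000 people, 10 of them sick) and a "model" that always answers negative:

```julia
# Hypothetical screening: 1000 people, of whom only 10 are actually sick
targets = vcat(trues(10), falses(990))
# A "model" that labels everyone as healthy (negative)
outputs = falses(1000)

acc = sum(outputs .== targets) / length(targets)    # 990/1000 = 0.99

TP = sum(outputs .& targets)                         # 0: no sick subject is detected
FN = sum(.!outputs .& targets)                       # 10: every sick subject is missed
sensitivity = TP / (TP + FN)                         # 0/10 = 0.0

println("accuracy = ", acc, ", sensitivity = ", sensitivity)
```

The accuracy is 99% even though not a single sick subject is found, which is exactly what the sensitivity of 0 reveals.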

For this reason, it is necessary to assess which metric or metrics are the most suitable for each kind of problem. In many problems where the different classes are of equal importance, the accuracy value may be enough. However, in other problems, it may be of more interest to evaluate the situations in which a positive response is or should be produced by the model, as it could indicate something critical, such as detecting a disease or raising some kind of alarm. For this reason, sensitivity and positive predictive value are often taken into account in addition to accuracy. There is a more extensive discussion of this in the theory notes, but a possible informal guide might be the following:

- If you want to minimise the number of positives incorrectly classified as negative (e.g. maximise the number of correctly diagnosed sick subjects, or maximise the number of alarms given for risky situations), the indicated metric is sensitivity (recall).
- If you want to minimise the number of samples incorrectly classified as positive (false positives, e.g. healthy subjects diagnosed as sick, or situations where an alarm was raised but should not have been), the indicated metric is the positive predictive value (precision).
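The two guidelines above usually pull in opposite directions. The following sketch, using invented scores and a hypothetical helper `recallAndPrecision`, shows how lowering the decision threshold favours sensitivity while raising it favours precision:

```julia
# Hypothetical model scores and true labels (illustrative values only)
scores  = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
targets = [true, true, false, true, true, false, false, false]

# Sensitivity (recall) and precision obtained after thresholding the scores.
# No guard against empty denominators: both thresholds used below produce
# at least one positive prediction and the targets contain positives.
function recallAndPrecision(scores, targets, threshold)
    outputs = scores .>= threshold
    TP = sum(outputs .& targets)
    FP = sum(outputs .& .!targets)
    FN = sum(.!outputs .& targets)
    return TP / (TP + FN), TP / (TP + FP)
end

lowThreshold  = recallAndPrecision(scores, targets, 0.4)   # (1.0, 0.8): no positive is missed
highThreshold = recallAndPrecision(scores, targets, 0.7)   # (0.5, 1.0): no false alarm is raised
```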

Therefore, the most appropriate metric depends entirely on the specific problem and on the relative cost of each kind of error. When both false positives and false negatives matter, the F-score, which combines precision and recall, is a metric that may be more useful than accuracy.

Another issue to be considered is data imbalance. Accuracy is a metric that gives a "global" view, which can be misleading when the distribution of patterns among classes is unbalanced. In these cases, F-score is a better metric. Having unbalanced datasets is very common, which provides an extra argument for using F-score rather than accuracy.
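A small numerical sketch of this effect, with invented counts for an unbalanced problem (only 5% of the patterns are positive):

```julia
# Hypothetical unbalanced problem: 1000 patterns, only 50 of them positive
TP, FN = 20, 30      # the 50 real positives
FP, TN = 20, 930     # the 950 real negatives

acc  = (TN + TP) / (TN + TP + FN + FP)   # 950/1000 = 0.95
ppv  = TP / (TP + FP)                     # 20/40 = 0.5  (precision)
sens = TP / (TP + FN)                     # 20/50 = 0.4  (recall)
f1   = 2 * ppv * sens / (ppv + sens)      # 4/9 ≈ 0.444
```

The accuracy looks excellent because the many true negatives dominate it, whereas the F-score reflects that most positives are missed and half of the positive predictions are wrong.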

Finally, if you have more than two classes, it is possible to build a confusion matrix in a similar way by having one row and column per class. For example:


In these cases, it is no longer possible to speak of positive or negative patterns, since there are more than two classes, nor to take a single value for sensitivity or positive predictive value. However, this confusion matrix can offer very interesting information for understanding how the model works, since it shows which classes the model separates most easily and which ones it tends to confuse.
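One possible way to build such a matrix without loops, assuming that both the targets and the model outputs are one-hot encoded boolean matrices (one row per pattern, one column per class), is a matrix product. The data below are invented for illustration, and the orientation chosen here (rows for the real class, columns for the predicted class) is only one of the two conventions.

```julia
# One-hot encoded targets and outputs: one row per pattern, one column per class
# (hypothetical 3-class data with 6 patterns)
targets = Bool[1 0 0; 1 0 0; 0 1 0; 0 1 0; 0 0 1; 0 0 1]
outputs = Bool[1 0 0; 0 1 0; 0 1 0; 0 1 0; 0 0 1; 1 0 0]

# Entry [i, j] counts the patterns of real class i that were classified as class j.
# Transposing the result gives the opposite convention (rows = model output).
confusion = Int.(targets)' * Int.(outputs)

# The diagonal holds the correct classifications
acc = sum(confusion[i, i] for i in 1:size(confusion, 1)) / sum(confusion)

display(confusion)
```

Here one pattern of class 1 is confused with class 2 and one pattern of class 3 with class 1, so the matrix is `[1 1 0; 0 2 0; 1 0 1]` and 4 of the 6 patterns are correct.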

### Question
If the pattern set has been divided into training and test subsets, which subset should be used to calculate the confusion matrix?

`We would use the test set`

In this assignment, you are asked to:

1. Develop a function called `confusionMatrix` which takes two vectors of equal length (the number of patterns), the first one containing the outputs obtained by a model (`outputs`) and the second with the desired outputs (`targets`), both of type `AbstractArray{Bool,1}`. This function should return:
    - Accuracy
    - Error rate
    - Sensitivity
    - Specificity
    - Positive predictive value
    - Negative predictive value
    - F-score
    - Confusion matrix, as an object of type `Array{Int64,2}` with two rows and two columns
    
    As this function is being fed with boolean-valued vectors, it will be applicable to problems with two classes (positive and negative cases).

    It is necessary to consider some particular situations when calculating the required classification metrics:

    - If every pattern is a true negative, neither the sensitivity nor the positive predictive value can be calculated. In this case the system works correctly, so these two metrics will be 1.
    - Similarly, if every pattern is a true positive, neither the specificity nor the negative predictive value can be obtained, so both metrics have to be manually set to 1.
    - If neither of these two cases has occurred and there is still a metric that cannot be calculated, it will take the value of 0. For example, if the sensitivity cannot be calculated, it means that there was no pattern with a positive desired output.
    - If both sensitivity and positive predictive value are equal to 0, the value of the F-score cannot be obtained, and thus it will be 0.
        
    Do not use loops inside the developed function.
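The special cases listed above exist because the plain formulas divide by zero when a class is absent. The following minimal sketch shows the problem and one possible guard; `ratioOrDefault` is a hypothetical helper, not part of the interface required by the assignment.

```julia
# Every pattern is a true negative: there are no positives at all
outputs = falses(5)
targets = falses(5)

TP = sum(outputs .& targets)       # 0
FN = sum(.!outputs .& targets)     # 0
naive = TP / (TP + FN)             # 0/0 evaluates to NaN in Julia

# Hypothetical helper: return `default` when the denominator is zero
ratioOrDefault(num, den, default) = den == 0 ? float(default) : num / den

sensitivity = ratioOrDefault(TP, TP + FN, 1)   # 1.0, as the first rule requires
```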


In [1]:
using Statistics
using DelimitedFiles


function stats(outputs)
    minimum = mapslices(Statistics.minimum, outputs; dims=1)[1]
    maximum = mapslices(Statistics.maximum, outputs; dims=1)[1]
    mean = mapslices(Statistics.mean, outputs; dims=1)[1]
    std = mapslices(Statistics.std, outputs; dims=1)[1]
    return [minimum, maximum, mean, std]
end

function calculateMinMaxNormalizationParameters(dataset::AbstractArray{<:Real,2})
    # Takes a real matrix (the dataset): each row is a sample and each column is an attribute.
    # Returns a 2-tuple of column matrices (one row per attribute) holding the
    # minimum and the maximum of each attribute, respectively.
    # Works for any number of attributes instead of a hard-coded 4.
    mins = permutedims(minimum(dataset, dims=1))
    maxs = permutedims(maximum(dataset, dims=1))
    return mins, maxs
end

function normalizeMinMax(dataset::AbstractArray{<:Real,2})
    # x scaled = (x - min(x)) / (max(x) - min(x)), applied column by column
    mins, maxs = calculateMinMaxNormalizationParameters(dataset)
    out = zeros(size(dataset, 1), size(dataset, 2))
    for i in axes(dataset, 1)
        for j in axes(dataset, 2)
            cmin, cmax = mins[j], maxs[j]
            # Parentheses matter: subtract the minimum before dividing by the range
            out[i,j] = (dataset[i,j] - cmin) / (cmax - cmin)
        end
    end
    
    return out
end
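# Assumed helper (called below but not defined in this cell):
# per-column mean and standard deviation of the dataset.
function calculateZeroMeanNormalizationParameters(dataset::AbstractArray{<:Real,2})
    return mean(dataset, dims=1), std(dataset, dims=1)
end

function normalizeZeroMean(dataset::AbstractArray{<:Real,2})
    # x scaled = (x - mean(x)) / std(x), applied column by column
    means, stds = calculateZeroMeanNormalizationParameters(dataset)
    out = zeros(size(dataset, 1), size(dataset, 2))
    for i in axes(dataset, 1)
        for j in axes(dataset, 2)
            cmean, cstd = means[j], stds[j]
            # Parentheses matter: subtract the mean before dividing
            out[i,j] = (dataset[i,j] - cmean) / cstd
        end
    end
    return out
end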

function classifyOutputs(outputs::AbstractArray{<:Real,2}; 
                        threshold::Real=0.5)
    if size(outputs, 2) == 1
        # Single output column: apply the threshold directly
        bool_outputs = outputs .>= threshold
    else
        # Several columns: mark as true only the highest output of each row
        (_,indicesMaxEachInstance) = findmax(outputs, dims=2);
        bool_outputs = falses(size(outputs));
        bool_outputs[indicesMaxEachInstance] .= true
    end
    return bool_outputs
end

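# Assumed from a previous assignment (called below but not defined in this cell)
accuracy(outputs::AbstractArray{Bool,1}, targets::AbstractArray{Bool,1}) = mean(outputs .== targets)

function accuracy(outputs::AbstractArray{Bool,2}, targets::AbstractArray{Bool,2}) 
    if (size(targets,2)==1)
        return accuracy(outputs[:,1], targets[:,1])
    else
        # A pattern is correct only if every class column matches
        classComparison = targets .== outputs
        correctClassifications = all(classComparison, dims=2)
        return mean(correctClassifications)
    end
end

function accuracy(outputs::AbstractArray{<:Real,2}, targets::AbstractArray{Bool,2}, threshold::Real=0.5)
    if (size(targets,2)==1)
        # Threshold the single output column before comparing
        return accuracy(outputs[:,1] .>= threshold, targets[:,1])
    else
        classified_outputs=classifyOutputs(outputs; threshold=threshold)
        return accuracy(classified_outputs, targets)
    end
end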


function encode_categories(targets)
    if (length(unique(targets)) > 2)
        cats = unique(targets) .== permutedims(targets)
        return cats'
    else
        cats = targets .== unique(targets)[1]
        return cats
    end
end

encode_categories (generic function with 1 method)

In [2]:
function confusionMatrix(outputs::AbstractArray{Bool,1}, targets::AbstractArray{Bool,1})
    @assert length(outputs) == length(targets) "outputs and targets must have the same length"
    
    tp = sum(outputs .& targets)      # true outputs that are true on target
    fp = sum(outputs .& .!targets)    # true outputs that are false on target
    tn = sum(.!outputs .& .!targets)  # false outputs that are false on target
    fn = sum(.!outputs .& targets)    # false outputs that are true on target
    
    # Rows are the real classes and columns the predicted ones (negative first)
    conf_matrix = [tn fp; fn tp]
    n = length(targets)
    
    accu = (tn+tp)/n
    erra = (fp+fn)/n
    # Special cases: 1 when every pattern is a true negative (or a true positive),
    # 0 when the denominator is zero for any other reason
    reca = (tn==n) ? 1.0 : (fn+tp==0 ? 0.0 : tp/(fn+tp))
    spec = (tp==n) ? 1.0 : (fp+tn==0 ? 0.0 : tn/(fp+tn))
    prec = (tn==n) ? 1.0 : (tp+fp==0 ? 0.0 : tp/(tp+fp))
    npre = (tp==n) ? 1.0 : (tn+fn==0 ? 0.0 : tn/(tn+fn))
    
    # The F-score is 0 when both recall and precision are 0
    f1 = (reca==0 && prec==0) ? 0.0 : 2*prec*reca/(prec+reca)
    
    return accu,erra,reca,spec,prec,npre,f1,conf_matrix
end
    
outputs = [true, false, false, false, false, true, false, false, false, false]
targets = [true, false, false, false, false, true, false, false, false, false]

confusionMatrix(outputs, targets) 

(1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, [8 0; 0 2])

2. Many models (e.g. artificial neural networks) do not return a categorical output for each pattern, but the probability that it is "positive". For this reason, you are asked to develop a function with the same name as the previous one, whose first parameter is not a vector of boolean values but a vector of real values (of type `AbstractArray{<:Real}`). It also receives an optional third parameter, a threshold with a default value, which is used to convert the real values into boolean ones. The previous function is then called on the result, so the same values are returned.

In [13]:
function confusionMatrix(outputs::AbstractArray{<:Real,1},targets::AbstractArray{Bool,1}; threshold::Real=0.5)
    # Same comparison (>=) as in classifyOutputs, so both functions agree at the threshold
    outputs_boolean = outputs .>= threshold
    return confusionMatrix(outputs_boolean, targets)
end

outputs = [0.6, 0.3, 0.3, 0.3, 0.3, 0.6, 0.3, 0.3, 0.3, 0.3]
targets = [true, false, false, false, false, true, false, false, false, false]

confusionMatrix(outputs, targets) 

(1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, [8 0; 0 2])

3. Develop two functions with the same name, `printConfusionMatrix`, that receive the model outputs and the desired outputs, call the previous functions and display the results obtained, including the confusion matrix. One of these functions shall receive a vector of model classifications (`outputs`) of type `AbstractArray{Bool,1}`, while for the other one this parameter shall be a vector of real values (of type `AbstractArray{<:Real}`). These functions will make calls to the previous functions.

In [10]:
function printConfusionMatrix(outputs::AbstractArray{Bool,1},targets::AbstractArray{Bool,1})
    accu,erra,reca,spec,prec,npre,f1,cm = confusionMatrix(outputs, targets)
    println("Accuracy = ",accu)
    println("Error rate = ",erra)
    println("Recall = ",reca)
    println("Specificity = ",spec)
    println("Precision = ",prec)
    println("Negative predictive value = ",npre)
    println("F-Score = ",f1)
    
    println("")
    
    # Rows are the real classes and columns the predictions (negative first)
    println("            Prediction")
    println("      +-----------------+")
    println("      |      ",cm[1,1]," |      ",cm[1,2]," |")
    println("Real  +-----------------+")
    println("      |      ",cm[2,1]," |      ",cm[2,2]," |")
    println("      +-----------------+")
end

outputs = [true, false, false, false, false, true, false, false, false, false]
targets = [true, false, false, false, false, true, false, false, false, false]

printConfusionMatrix(outputs, targets) 

Accuracy = 1.0
Error rate = 0.0
Recall = 1.0
Specificity = 1.0
Precision = 1.0
Negative predictive value = 1.0
F-Score = 1.0

            Prediction
      +-----------------+
      |      8 |      0 |
Real  +-----------------+
      |      0 |      2 |
      +-----------------+


In [12]:
function printConfusionMatrix(outputs::AbstractArray{<:Real,1},targets::AbstractArray{Bool,1}; threshold::Real=0.5)
    # threshold is a keyword argument of confusionMatrix, so it must be passed by name
    accu,erra,reca,spec,prec,npre,f1,cm = confusionMatrix(outputs, targets; threshold=threshold)
    
    println("Accuracy = ",accu)
    println("Error rate = ",erra)
    println("Recall = ",reca)
    println("Specificity = ",spec)
    println("Precision = ",prec)
    println("Negative predictive value = ",npre)
    println("F-Score = ",f1)
    
    println("")
    
    # Rows are the real classes and columns the predictions (negative first)
    println("            Prediction")
    println("      +-----------------+")
    println("      |      ",cm[1,1]," |      ",cm[1,2]," |")
    println("Real  +-----------------+")
    println("      |      ",cm[2,1]," |      ",cm[2,2]," |")
    println("      +-----------------+")

end

outputs = [0.6, 0.3, 0.3, 0.3, 0.3, 0.6, 0.3, 0.3, 0.3, 0.3]
targets = [true, false, false, false, false, true, false, false, false, false]

printConfusionMatrix(outputs, targets, threshold=0.5)

Accuracy = 1.0
Error rate = 0.0
Recall = 1.0
Specificity = 1.0
Precision = 1.0
Negative predictive value = 1.0
F-Score = 1.0

            Prediction
      +-----------------+
      |      8 |      0 |
Real  +-----------------+
      |      0 |      2 |
      +-----------------+