# ${\large \textbf{Homework two: Rock or Mine}}$.

In this notebook we'll work with the dataset in the file sonar.mines, which can be found at Connectionist Bench (Sonar, Mines vs. Rocks) Data Set. It contains $111$ patterns obtained by bouncing sonar signals off a metal cylinder at various angles spanning $90$ degrees, and $97$ patterns obtained by bouncing sonar signals off cylindrically shaped rocks at angles spanning $180$ degrees. In short, the dataset contains $208$ samples, each described by a feature vector of $60$ numbers. Each number represents the energy within a particular frequency band of the sonar return, normalized to take values between $0.0$ and $1.0$.

The label associated with each record is "R" if the object is a rock and "M" if it is a mine (metal cylinder). We'll analyse the data to obtain the most representative examples by removing outliers, then split the data and perform 5-fold cross-validation to select hyperparameters by F1-score, and finally run some of the most usual machine learning methods for binary classification: SVM, decision trees, K-nearest neighbors, logistic regression and also linear regression.


# ${\normalsize{Implementation}}$.

Read sonar's data.

In [136]:
import Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("StatsBase")
Pkg.add("Random")
Pkg.add("Plots")
Pkg.add("Distances")
Pkg.add("Distributions")
Pkg.add("MultivariateStats")
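Pkg.add("Metrics")
Pkg.add("EvalMetrics")   # used below for precision and recall
Pkg.add("ScikitLearn")   # used below for the classifiers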

In [137]:
using CSV
using DataFrames
using Random
using Plots

In [298]:
Path = "C:\\dataSonar.csv";
dataSonar = CSV.read(Path,DataFrame, header=0);

In order to run some methods and analyses on the dataset, we'll first replace the "M" label with the numeric label 1 and the "R" label with the numeric label 0.

In [299]:
# Converting char labels to numeric labels
using DataFrames
n = nrow(dataSonar)

dataSonar."Column61".=replace.(dataSonar."Column61", "M" => 1)
dataSonar."Column61".=replace.(dataSonar."Column61", "R" => 0)
dataSonar."Column61"=parse.(Float64, dataSonar."Column61");

DSMat=Matrix(dataSonar);


# Data descriptive analysis

First we use the Mahalanobis distance to remove outliers, which we define as the datapoints that lie further from the mean than $99\%$ of datapoints would. To do this we assume that the squared Mahalanobis distance follows a chi-squared distribution with 61 degrees of freedom, one for each column of the data matrix (the 60 features plus the numeric label).
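
For reference, the criterion can be written out as follows. The symbols $\bar{x}$ (sample mean of the rows) and $S$ (sample covariance) are notational assumptions; in the cell further down they correspond to `meanDS` and to the inverse of `Q`.

$$D^2(x) = (x - \bar{x})^\top S^{-1} (x - \bar{x}), \qquad x \text{ is flagged as an outlier if } D^2(x) \ge \chi^2_{61,\,0.99} \approx 89.59 .$$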

In [228]:
using Distributions

quantile.(Chisq(61), 0.99)

89.59134449068706

In [229]:
using Distances

meanDS=mean(DSMat, dims=1);
meanDS=vec(meanDS);

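#From the chi-squared distribution of Mahalanobis distance from mean
Q=inv(cov(DSMat,corrected=true, dims=1));

# Flag the rows to keep first, then filter once: deleting rows of DSout inside the
# loop would shift the remaining row indices, so later deletions would hit the wrong rows.
keep=trues(nrow(dataSonar));
for i in 1:nrow(dataSonar);
    rowDS=DSMat[i,:];
    m=mahalanobis(rowDS, meanDS, Q)^2;
    if m>=89.59;
        keep[i]=false;
    end;
end;
DSout=dataSonar[keep,:];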

MDSo=Matrix(DSout); #matrix of sonar's data without outliers
MDSoX=MDSo[:,1:60];

└ @ Distances C:\Users\Ramos\.julia\packages\Distances\6E33b\src\mahalanobis.jl:26

# Partitioning the data and 5-fold Cross Validation.

We'll split our data set, taking $80\%$ for training our models and $20\%$ for testing them. Since we are not dealing with a lot of data (167 examples after outlier removal) and there are 60 input features, using a smaller share for training would leave the learning algorithms with too little information. Furthermore, we'll perform 5-fold cross-validation on our training set for the selection of hyperparameters, in order to increase the reliability of the hyperparameter tests carried out on the validation sets.

This means that our training set will be partitioned into 5 pairwise disjoint subsets. For each subset, we'll train our models on its complement with an array of hyperparameters and then use the subset as a validation set, to evaluate the performance of the available hyperparameters and choose the best ones by the F1-score criterion.
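
The partitioning described above can be sketched in a few lines using only Julia's standard library. This is a minimal illustration and not the code used in this notebook: `kfold_indices` is a hypothetical helper, the fixed seed is an assumption added for reproducibility, and 133 is the training-set size from the run shown here.

```julia
using Random

# Split `idx` into `k` pairwise disjoint folds whose sizes differ by at most one.
# `kfold_indices` is a hypothetical helper, not part of StatsBase.
function kfold_indices(idx::AbstractVector{<:Integer}, k::Integer; rng=Random.default_rng())
    shuffled = shuffle(rng, collect(idx))
    n = length(shuffled)
    base, extra = divrem(n, k)          # the first `extra` folds get one more element
    folds = Vector{Vector{Int}}()
    start = 1
    for f in 1:k
        len = base + (f <= extra ? 1 : 0)
        push!(folds, shuffled[start:start+len-1])
        start += len
    end
    return folds
end

rng = MersenneTwister(42)               # fixed seed (an assumption) for reproducibility
train_idx = collect(1:133)              # stand-in for the 133 training row indices
folds = kfold_indices(train_idx, 5; rng=rng)

# For fold i, validate on folds[i] and train on the remaining indices.
cv_train = [setdiff(train_idx, folds[i]) for i in 1:5]
println(length.(folds))                 # prints [27, 27, 27, 26, 26]
```

Computing the fold sizes from the data, rather than hard-coding them, keeps the partition valid if the number of rows left after outlier removal changes.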

In [258]:
#Partitioning of data
using StatsBase

n = length(MDSo[:,1]);

ptrain = 0.8;  # % training data
ntrain = convert(Integer, floor(ptrain*n));   # No. of examples in training data
trainps = StatsBase.sample(1:n, ntrain, replace = false);
Mtrain = MDSo[trainps,:]; #training data


notrain=setdiff(1:n, trainps);
ntest=n-length(trainps)
testps= StatsBase.sample(notrain, ntest, replace = false);
Mtest = MDSo[testps, :]; #testing data

#5-fold cross-validation partitioning of training data

val1ps=StatsBase.sample(trainps, 26, replace = false); 
avail=setdiff(trainps, val1ps)
val2ps=StatsBase.sample(avail, 26, replace = false);
CVhsets=[MDSo[val1ps,:],MDSo[val2ps,:]];
CVhind=[val1ps, val2ps];
avail=setdiff(avail, val2ps);


for i in 3:5
    valips=StatsBase.sample(avail, 27, replace = false);
    push!(CVhsets, MDSo[valips,:]);
    push!(CVhind, valips);
    avail=setdiff(avail,valips);
end

In [259]:
nlabeltrain = countmap(Mtrain[:,61])
println("Mines in training set: ", nlabeltrain[1])
println("Rocks in training set: ", nlabeltrain[0])

Mines in training set: 69
Rocks in training set: 64


In [260]:
nlabeltest = countmap(Mtest[:,61])
println("Mines in test: ", nlabeltest[1])
println("Rocks in test: ", nlabeltest[0])

Mines in test: 17
Rocks in test: 17


Let's examine the imbalance of the data

In [261]:
mine=dataSonar[dataSonar[:,61].==1, :];
rock=dataSonar[dataSonar[:,61].==0, :];
println("Percentage of majority class: ", round(max(nrow(mine),nrow(rock))/2.08, digits= 3))
println("Percentage of minority class: ", round(min(nrow(mine),nrow(rock))/2.08, digits=3))

Percentage of majority class: 53.365
Percentage of minority class: 46.635


Since the data is only very slightly imbalanced, we won't apply resampling methods such as oversampling or undersampling; the only resampling we use is cross-validation.

I'll choose F1 as the performance metric for our algorithms, since there's no strong class imbalance and we'd also like to penalise too many false positives or too many false negatives even when accuracy is still high. Now that we have all $5$ of our partitions into validation and training sets, we can start choosing our hyperparameters.
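
The F1-score combines precision $P$ and recall $R$ as $F_1 = 2PR/(P+R)$, which equals $2TP/(2TP+FP+FN)$ in terms of true positives, false positives and false negatives. A minimal sketch in plain Julia follows. `f1score` is a hypothetical helper (the cells in this notebook rely on EvalMetrics instead), and returning 0 when the score is undefined is an assumed convention.

```julia
# Hypothetical helper: F1 for 0/1 label vectors, with 1 (mine) as the positive class.
function f1score(pred::AbstractVector{<:Integer}, real::AbstractVector{<:Integer})
    tp = count((pred .== 1) .& (real .== 1))   # predicted mine, actually mine
    fp = count((pred .== 1) .& (real .== 0))   # predicted mine, actually rock
    fn = count((pred .== 0) .& (real .== 1))   # predicted rock, actually mine
    denom = 2tp + fp + fn
    return denom == 0 ? 0.0 : 2tp / denom      # assumed convention: 0 when undefined
end

real = [1, 1, 1, 0, 0, 1]
pred = [1, 0, 1, 0, 1, 1]
println(f1score(pred, real))   # tp = 3, fp = 1, fn = 1, so this prints 0.75
```

Because the formula is symmetric in FP and FN, swapping the two arguments leaves the score unchanged.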

# Choosing Hyperparameters

For each method, we'll obtain a model for each number in a set of hyperparameters, and then with the validation set we'll choose the best hyperparameter for each model.

We'll first fit a PCA on each cross-validation training set before running each learning algorithm over its range of hyperparameters, and then apply the transformation obtained from that PCA to the corresponding validation set before testing the models to choose the best hyperparameter.
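
The key point is that the validation data is centred and projected using quantities estimated from the training data only. A hand-rolled sketch using the standard library illustrates this. It is not the MultivariateStats implementation used in this notebook: `fit_pca` and `apply_pca` are hypothetical helpers, and the data is synthetic.

```julia
using LinearAlgebra, Statistics, Random

# Hand-rolled PCA sketch: rows are observations, columns are features.
function fit_pca(X::AbstractMatrix, d::Integer)
    mu = mean(X, dims=1)                  # 1×p row of training means
    F = svd(X .- mu)                      # principal directions are the columns of F.V
    return (mean = mu, proj = F.V[:, 1:d])
end

# Centre with the *training* mean, then project onto the training directions.
apply_pca(model, X::AbstractMatrix) = (X .- model.mean) * model.proj

rng = MersenneTwister(1)
Xtrain = rand(rng, 40, 6)                 # synthetic stand-ins for the sonar features
Xval = rand(rng, 10, 6)

model = fit_pca(Xtrain, 2)
Ztrain = apply_pca(model, Xtrain)
Zval = apply_pca(model, Xval)             # no statistics are re-estimated on Xval
println(size(Ztrain), " ", size(Zval))    # prints (40, 2) (10, 2)
```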

In [264]:
using MultivariateStats

Cross_val_part=[]
PCAtrain=[];
PCAval=[];
for i in 1:5
    CVtraini=setdiff(trainps,CVhind[i]); #Cross-validation training indices when holding out the i-th subset
    PCAm=MultivariateStats.fit(PCA, MDSo[CVtraini,1:60]', maxoutdim=8);

    PCAt=MultivariateStats.predict(PCAm, MDSo[CVtraini,1:60]')';
    PCAv=MultivariateStats.predict(PCAm, MDSo[CVhind[i],1:60]')';
    push!(PCAtrain, PCAt);
    push!(PCAval, PCAv);
    PCAt=hcat(PCAt, MDSo[CVtraini,61]); #append the labels as column 9
    PCAv=hcat(PCAv, MDSo[CVhind[i],61]);
    push!(Cross_val_part, [PCAt,PCAv]);
end









Now we can start the selection process for our hyperparameters, applying 5-fold cross-validation.

## Linear regression 

In [265]:
using ScikitLearn
@sk_import linear_model: LinearRegression;

linregm = LinearRegression() # initialize the model
param=linregm.get_params() # inspect the hyperparameters




Dict{Any, Any} with 5 entries:
  "copy_X"        => true
  "fit_intercept" => true
  "n_jobs"        => nothing
  "normalize"     => "deprecated"
  "positive"      => false

Since this method has no hyperparameters that are usually tuned, we'll skip the selection step for it.

## Logistic regression 

In [266]:
using ScikitLearn
@sk_import linear_model: LogisticRegression;

plg=LogisticRegression().get_params() # inspect the hyperparameters



Dict{Any, Any} with 15 entries:
  "penalty"           => "l2"
  "fit_intercept"     => true
  "C"                 => 1.0
  "multi_class"       => "auto"
  "dual"              => false
  "n_jobs"            => nothing
  "max_iter"          => 100
  "random_state"      => nothing
  "l1_ratio"          => nothing
  "solver"            => "lbfgs"
  "class_weight"      => nothing
  "tol"               => 0.0001
  "verbose"           => 0
  "warm_start"        => false
  "intercept_scaling" => 1

We'll set up a grid for the parameters "C" and "penalty" and look for the combination for which the model gets the best F1-score, using the features provided by the PCA fitted on each training set.
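
The grid search itself follows one pattern for every model: score each combination, keep the best. A generic sketch is given below under stated assumptions: `grid_search` and `toy_score` are hypothetical names, and the toy scoring function merely stands in for the 5-fold average F1-score computed in the cells of this notebook.

```julia
# Hypothetical helper: evaluate `score` on every combination of the value
# lists in `grid` and return the best combination together with its score.
function grid_search(score, grid::NamedTuple)
    best_score, best_params = -Inf, nothing
    for combo in Iterators.product(values(grid)...)
        params = NamedTuple{keys(grid)}(combo)
        s = score(params)
        if s > best_score                 # strict '>' keeps the first of any ties
            best_score, best_params = s, params
        end
    end
    return best_params, best_score
end

# Toy scoring function standing in for the 5-fold average F1-score.
toy_score(p) = (p.penalty == "l2" ? 0.7 : 0.6) - abs(log10(p.C)) / 10

grid = (C = [100, 10, 1.0, 0.1, 0.01], penalty = ["l2", "none"])
best_params, best_score = grid_search(toy_score, grid)
println(best_params, " ", best_score)
```

Passing the scoring function as an argument would let the same loop serve logistic regression, SVM, KNN and decision trees, at the cost of one extra level of indirection compared with the explicit nested loops.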

In [267]:
using EvalMetrics

c_values = [100, 10, 1.0, 0.1, 0.01];
penal_v = ["l2", "none"];

maxlr = [0];

for j in 1:length(penal_v)
    for i in 1:length(c_values)
        f1_avscore = 0;

        for k in 1:5 #Cross Validation
            logregm = LogisticRegression(penalty=penal_v[j], C=c_values[i]);
            ScikitLearn.fit!(logregm, Cross_val_part[k][1][:,1:8], Cross_val_part[k][1][:,9]);

            pred = ScikitLearn.predict(logregm, (Cross_val_part[k][2])[:,1:8]);
            pred = round.(Int, pred);
            real = round.(Int, Cross_val_part[k][2][:,9]);

            precision = EvalMetrics.precision(pred, real);
            recall = EvalMetrics.recall(pred, real);
            f1_score = 2*precision*recall/(precision+recall)
            f1_avscore = f1_avscore + f1_score/length(PCAtrain)
        end

        # keep the best average F1-score and the parameters that produced it
        if f1_avscore > maxlr[1]
            maxlr = [f1_avscore, c_values[i], penal_v[j]];
        end
    end
end
println(maxlr)

println("Max average f1 score: $(maxlr[1]) \n Given by parameters C=$(maxlr[2]) and penalty=$(maxlr[3])")

Any[0.7027741935483871, 1.0, "l2"]
Max average f1 score: 0.7027741935483871 
 Given by parameters C=1.0 and penalty=l2




## Support Vector Machine:

In [268]:
using ScikitLearn;

@sk_import svm: SVC

psvm=SVC().get_params() #observe hyperparameters




Dict{Any, Any} with 15 entries:
  "C"                       => 1.0
  "shrinking"               => true
  "decision_function_shape" => "ovr"
  "max_iter"                => -1
  "random_state"            => nothing
  "class_weight"            => nothing
  "tol"                     => 0.001
  "verbose"                 => false
  "coef0"                   => 0.0
  "kernel"                  => "rbf"
  "probability"             => false
  "cache_size"              => 200
  "gamma"                   => "scale"
  "degree"                  => 3
  "break_ties"              => false

The hyperparameters we'll choose for the support vector machine using 5-fold cross-validation are "kernel", "C", and "gamma".

In [269]:

c_values = [1, 10, 100, 1000];
kern = ["linear", "poly", "rbf"];
gamma = [0.001, 0.0001]
maxsvm = [0];

for i in 1:length(c_values)
    for j in 1:length(kern)
        for t in 1:length(gamma)
            f1_avscore = 0;

            for k in 1:5 #Cross Validation
                svmm = SVC(C=c_values[i], kernel=kern[j], gamma=gamma[t]);
                ScikitLearn.fit!(svmm, Cross_val_part[k][1][:,1:8], Cross_val_part[k][1][:,9]);

                pred = ScikitLearn.predict(svmm, (Cross_val_part[k][2])[:,1:8]);
                pred = round.(Int, pred);
                real = round.(Int, Cross_val_part[k][2][:,9]);

                precision = EvalMetrics.precision(pred, real);
                recall = EvalMetrics.recall(pred, real);
                f1_score = 2*precision*recall/(precision+recall)
                f1_avscore = f1_avscore + f1_score/length(PCAtrain)
            end

            # keep the best average F1-score and the parameters that produced it
            if f1_avscore > maxsvm[1]
                maxsvm = [f1_avscore, c_values[i], kern[j], gamma[t]];
            end
        end
    end
end
println(maxsvm)

println("Max average f1 score: $(maxsvm[1]) \n Given by parameters C=$(maxsvm[2]), kernel=$(maxsvm[3]), gamma=$(maxsvm[4])")

Any[0.7200473801560758, 10, "linear", 0.001]
Max average f1 score: 0.7200473801560758 
 Given by parameters C=10, kernel=linear, gamma=0.001


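From which we choose the parameters C=10, kernel="linear", and gamma=0.001. Note that scikit-learn's SVC only uses gamma with the "rbf", "poly" and "sigmoid" kernels, so with the linear kernel its value has no effect.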

## K-nearest neighbors

In [270]:
@sk_import neighbors: KNeighborsClassifier

pknn=KNeighborsClassifier().get_params() #observe hyperparameters



Dict{Any, Any} with 8 entries:
  "leaf_size"     => 30
  "n_jobs"        => nothing
  "n_neighbors"   => 5
  "metric"        => "minkowski"
  "weights"       => "uniform"
  "algorithm"     => "auto"
  "metric_params" => nothing
  "p"             => 2

For this method we'll select from a range of values for the hyperparameters "n_neighbors", "weights", and "metric".

In [271]:
n_neig = [3,5,7,9,11,13,15];
weigh = ["uniform","distance"];
metrics = ["minkowski","euclidean","manhattan"]
maxkn = [0];

for i in 1:length(n_neig)
    for j in 1:length(weigh)
        for t in 1:length(metrics)
            f1_avscore = 0;

            for k in 1:5 #Cross Validation
                knnm = KNeighborsClassifier(n_neighbors=n_neig[i], weights=weigh[j], metric=metrics[t]);
                ScikitLearn.fit!(knnm, Cross_val_part[k][1][:,1:8], Cross_val_part[k][1][:,9]);

                pred = ScikitLearn.predict(knnm, (Cross_val_part[k][2])[:,1:8]);
                pred = round.(Int, pred);
                real = round.(Int, Cross_val_part[k][2][:,9]);

                precision = EvalMetrics.precision(pred, real);
                recall = EvalMetrics.recall(pred, real);
                f1_score = 2*precision*recall/(precision+recall)
                f1_avscore = f1_avscore + f1_score/5
            end

            # keep the best average F1-score and the parameters that produced it
            if f1_avscore > maxkn[1]
                maxkn = [f1_avscore, n_neig[i], weigh[j], metrics[t]];
            end
        end
    end
end
println(maxkn)

println("Max average f1 score: $(maxkn[1]) \nGiven by parameters n-neighbors=$(maxkn[2]), weights=$(maxkn[3]), metrics=$(maxkn[4])")


Any[0.774840549649987, 15, "distance", "minkowski"]
Max average f1 score: 0.774840549649987 
Given by parameters n-neighbors=15, weights=distance, metrics=minkowski


## Decision trees:

In [272]:
@sk_import tree: DecisionTreeClassifier

pdct=DecisionTreeClassifier().get_params() #observe hyperparameters



Dict{Any, Any} with 12 entries:
  "criterion"                => "gini"
  "min_weight_fraction_leaf" => 0.0
  "ccp_alpha"                => 0.0
  "min_impurity_decrease"    => 0.0
  "max_depth"                => nothing
  "random_state"             => nothing
  "splitter"                 => "best"
  "class_weight"             => nothing
  "min_samples_split"        => 2
  "max_leaf_nodes"           => nothing
  "min_samples_leaf"         => 1
  "max_features"             => nothing

In [278]:
Criter = ["gini", "entropy"]     # Criterion
maxdt = [0]

for i in 1:length(Criter)
    f1_avscore = 0;

    for k in 1:5 #Cross Validation
        dtm = DecisionTreeClassifier(criterion=Criter[i]);
        ScikitLearn.fit!(dtm, Cross_val_part[k][1][:,1:8], Cross_val_part[k][1][:,9]);

        pred = ScikitLearn.predict(dtm, (Cross_val_part[k][2])[:,1:8]);
        pred = round.(Int, pred);
        real = round.(Int, Cross_val_part[k][2][:,9]);

        precision = EvalMetrics.precision(pred, real);
        recall = EvalMetrics.recall(pred, real);
        f1_score = 2*precision*recall/(precision+recall);
        f1_avscore = f1_avscore + f1_score/5;
    end

    # keep the best average F1-score and the criterion that produced it
    if f1_avscore > maxdt[1]
        maxdt = [f1_avscore, Criter[i]];
    end
end

println(maxdt)

println("Max average f1 score: $(maxdt[1]) \nGiven by parameters Criterion=$(maxdt[2])")

Any[0.7124861090378332, "gini"]
Max average f1 score: 0.7124861090378332 
Given by parameters Criterion=gini


# Training and testing of models with chosen hyperparameters.

Now that we have chosen our hyperparameters, we'll perform a PCA on our training set, train the different models with the chosen hyperparameters, apply the transformation given by that PCA to our testing set, and finally apply each model to the testing set, choosing the best one by the F1-score criterion.

In [279]:
using MultivariateStats


PCAmt=MultivariateStats.fit(PCA, Mtrain[:,1:60]', maxoutdim=8);
PCAtr=MultivariateStats.predict(PCAmt, Mtrain[:,1:60]')';
PCAte=MultivariateStats.predict(PCAmt,Mtest[:,1:60]')';
PCAtr=hcat(PCAtr,Mtrain[:,61]);
testval=  round.(Int, Mtest[:,61]); 


## Linear regression 

In [290]:
lin_reg_m = LinearRegression() #model
ScikitLearn.fit!(lin_reg_m, PCAtr[:,1:8], PCAtr[:,9]) #training model
linreg_pred = ScikitLearn.predict(lin_reg_m, PCAte[:,1:8]) #prediction
lnrpred=Int.(linreg_pred .>= 0.5); # threshold the regression output at 0.5: 1 = mine, 0 = rock
        
        
precisionlr=EvalMetrics.precision(lnrpred,testval);
recalllr=EvalMetrics.recall(lnrpred,testval);
f1_score_lr=2*precisionlr*recalllr/(precisionlr+recalllr)
f1_score_lr=round(f1_score_lr,digits=3)


println("F1-score for linear regression method is: ", f1_score_lr)

F1-score for linear regression method is: 0.812


## Logistic regression

In [291]:
logreg_m = LogisticRegression(C= maxlr[2], penalty= maxlr[3]) # model
ScikitLearn.fit!(logreg_m, PCAtr[:,1:8], PCAtr[:,9]) # training of the model
logreg_pred = ScikitLearn.predict(logreg_m, PCAte) # Prediction

logreg_pred=round.(Int, logreg_pred);   
        
        
precisionlg=EvalMetrics.precision(logreg_pred,testval);
recalllg=EvalMetrics.recall(logreg_pred,testval);
f1_score_lg=2*precisionlg*recalllg/(precisionlg+recalllg)
f1_score_lg=round(f1_score_lg,digits=3)



println("F1-score for logistic regression method is: ", f1_score_lg)

F1-score for logistic regression method is: 0.824


## SVM

In [292]:
svm_m = SVC(C= maxsvm[2], kernel= maxsvm[3], gamma=maxsvm[4]) #model
ScikitLearn.fit!(svm_m, PCAtr[:,1:8], PCAtr[:,9]) # training of the model
svm_pred = ScikitLearn.predict(svm_m, PCAte) # Prediction

svm_pred=round.(Int, svm_pred);   
        
        
precisionsvm=EvalMetrics.precision(svm_pred,testval);
recallsvm=EvalMetrics.recall(svm_pred,testval);
f1_score_svm=2*precisionsvm*recallsvm/(precisionsvm+recallsvm)
f1_score_svm=round(f1_score_svm,digits=3)


println("F1-score for support vector machine method is: ", f1_score_svm)

F1-score for support vector machine method is: 0.914


## KNN

In [293]:
knn_m = KNeighborsClassifier(n_neighbors=maxkn[2], weights=maxkn[3], metric=maxkn[4])
ScikitLearn.fit!(knn_m, PCAtr[:,1:8], PCAtr[:,9]) # training of the model
kn_pred = ScikitLearn.predict(knn_m, PCAte) #  Prediction

kn_pred=round.(Int, kn_pred);   
        
        
precisionkn=EvalMetrics.precision(kn_pred,testval);
recallkn=EvalMetrics.recall(kn_pred,testval);
f1_score_kn=2*precisionkn*recallkn/(precisionkn+recallkn)
f1_score_kn=round(f1_score_kn,digits=3)



println("F1-score for K-nearest neighbors method is: ", f1_score_kn)

F1-score for K-nearest neighbors method is: 0.8


## Decision trees

In [294]:
dt_m = DecisionTreeClassifier(criterion=maxdt[2]);
ScikitLearn.fit!(dt_m, PCAtr[:,1:8], PCAtr[:,9]) # training of the model
dt_pred = ScikitLearn.predict(dt_m, PCAte) #  Prediction

dt_pred=round.(Int, dt_pred);   
        
        
precisiondt=EvalMetrics.precision(dt_pred,testval);
recalldt=EvalMetrics.recall(dt_pred,testval);
f1_score_dt=2*precisiondt*recalldt/(precisiondt+recalldt)
f1_score_dt=round(f1_score_dt,digits=3)



println("F1-score for decision tree method is: ", f1_score_dt)

F1-score for decision tree method is: 0.706


# Results and Conclusions:

We summarise the resulting performances of the classification models as follows:

In [295]:
Results = DataFrame(
    Models = ["Linear Regression", "Logistic Regression", "SVM", "KNN", "Decision tree"], 
    F1_score = [f1_score_lr, f1_score_lg, f1_score_svm, f1_score_kn, f1_score_dt],
    )

| Row | Models (String) | F1_score (Float64) |
|---|---|---|
| 1 | Linear Regression | 0.812 |
| 2 | Logistic Regression | 0.824 |
| 3 | SVM | 0.914 |
| 4 | KNN | 0.8 |
| 5 | Decision tree | 0.706 |


Thus we can finally conclude that the best model according to this criterion is the Support Vector Machine, set with the following hyperparameters.

In [297]:
println("C= ", maxsvm[2], "\nKernel= ",maxsvm[3], "\nGamma= ", maxsvm[4])

C= 10
Kernel= linear
Gamma= 0.001


From the work in this notebook I draw some conclusions on the practices that allowed our models to perform reasonably well, and on some that weren't applied here but could improve the models' performance and the reliability of its measurement.
First, adequate pre-treatment of the data is necessary for obtaining better model performance. This includes handling outliers and applying dimensionality reduction where necessary, among others. In this notebook we used the Mahalanobis distance to remove outliers and applied PCA to obtain more relevant features. The outlier handling could perhaps be improved by replacing outliers with the mean or median, as determined by a deeper analysis of the data. The PCA reduced the $60$ features to $8$ components, which we expect helped our models perform better. The way in which we implemented it (fitting it only on the training sets and then applying the resulting transformation to the validation and testing sets, both when choosing hyperparameters and when comparing the methods with fixed hyperparameters) keeps the measured performance of our models from being biased towards the testing data, since the PCA didn't depend on any testing data.

There are other practices, such as k-fold cross-validation, that can be carried out to obtain more realistic measurements of model performance and to decrease the chance that our models work well on the testing set only because of the way we happened to partition the data. This technique can be applied either to the training and validation sets, to obtain better choices of hyperparameters, or to the training and testing sets, to obtain more trustworthy measurements of the final models. Since we applied 5-fold cross-validation in the step where we chose our hyperparameters, we can say with more confidence that the resulting performances are trustworthy.

Finally, beyond k-fold cross-validation there are other techniques, such as boosting, that could be used to improve our models and the reliability of their measured performance.
