This notebook loads the mean MEG data and uses a random forest to classify patients as MCI or control. It is a simple benchmark against which to measure the performance of other strategies. I selected the random forest as the benchmarking model because it is simple to set up and has few hyperparameters to optimize (mtry being the most important one).

It is also a chance to get comfortable with the mlr package's syntax, since until now I have always used caret.

In [1]:
library(tidyverse)
library(mlr)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.2.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: ParamHelpers
“replacing previous import ‘BBmisc::isFALSE’ by ‘backports::isFALSE’ when loading ‘ParamHelpers’”

In [17]:
mean_data <- readRDS("/home/rstudio/data/mean.rds")
mean_data$class <- factor(mean_data$class)

In [18]:
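# Project the standardized features (everything except id and class) onto their principal components
mean_pca <- as.data.frame(predict(prcomp(select(mean_data, -id, -class), scale. = TRUE)))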

In [20]:
mean_pca$class <- mean_data$class
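
As a rough illustration of what the PCA step above produces, the following sketch runs the same `prcomp` / `predict` pattern on a small made-up data frame. The `toy` data and its column names are hypothetical stand-ins for the MEG features; only base R is assumed.

```r
# Toy numeric data standing in for the MEG features (hypothetical values)
set.seed(42)
toy <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
toy$d <- toy$a + rnorm(20, sd = 0.1)  # a column strongly correlated with `a`

# Center and scale each column, then compute the principal components
pca <- prcomp(toy, scale. = TRUE)

# predict() without newdata returns the scores of the training rows, i.e. pca$x
scores <- as.data.frame(predict(pca))
all.equal(as.matrix(scores), pca$x, check.attributes = FALSE)

# Proportion of variance explained by each component, and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(var_explained)
```

The cumulative proportion is one way to decide how many components to keep; the notebook itself keeps all of them.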

In [21]:
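# Note: despite the task id, this passes mean_data (the raw tibble, id column included), not mean_pca
task = makeClassifTask(id = "rf_mean_meg_pca", data = mean_data, target = "class")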

“Provided data is not a pure data.frame but from class tbl_df, hence it will be converted.”

Here, I create a randomForest learner and set the resampling strategy to leave-one-out cross-validation.

In [22]:
lrn = makeLearner("classif.randomForest")
rdesc = makeResampleDesc(method = "LOO")
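
To make the leave-one-out idea concrete, here is a minimal base-R sketch of the loop that a "LOO" resampling performs conceptually. The data are made up, and the nearest-mean classifier (`predict_nearest_mean`, a hypothetical helper) is only a stand-in for the random forest; this is not how mlr implements it internally.

```r
# Toy binary data (hypothetical): one feature, two classes
set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 2))
y <- factor(rep(c("1", "2"), each = 10))

# Stand-in classifier: assign the class whose training mean is closest
predict_nearest_mean <- function(x_train, y_train, x_new) {
  centers <- tapply(x_train, y_train, mean)
  names(centers)[which.min(abs(centers - x_new))]
}

# Leave-one-out: hold out each observation once and train on the rest
errors <- vapply(seq_along(x), function(i) {
  pred <- predict_nearest_mean(x[-i], y[-i], x[i])
  as.numeric(pred != as.character(y[i]))
}, numeric(1))

errors        # each entry is 0 (correct) or 1 (misclassified)
mean(errors)  # aggregated misclassification error
```

Because each test set holds a single observation, the per-iteration error can only be 0 or 1, and the aggregate is the mean over all iterations.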

Run the resampling, which trains and evaluates the model once per held-out observation:

In [23]:
r = resample(learner = lrn, task = task, resampling = rdesc)

Resampling: LOO
Measures:             mmce      
[Resample] iter 1:    0.0000000 
[Resample] iter 2:    1.0000000 
[Resample] iter 3:    1.0000000 
[Resample] iter 4:    0.0000000 
[Resample] iter 5:    1.0000000 
[Resample] iter 6:    1.0000000 
[Resample] iter 7:    0.0000000 
[Resample] iter 8:    0.0000000 
[Resample] iter 9:    0.0000000 
[Resample] iter 10:   0.0000000 
[Resample] iter 11:   0.0000000 
[Resample] iter 12:   1.0000000 
[Resample] iter 13:   0.0000000 
[Resample] iter 14:   0.0000000 
[Resample] iter 15:   0.0000000 
[Resample] iter 16:   0.0000000 
[Resample] iter 17:   1.0000000 
[Resample] iter 18:   0.0000000 
[Resample] iter 19:   0.0000000 
[Resample] iter 20:   0.0000000 
[Resample] iter 21:   0.0000000 
[Resample] iter 22:   1.0000000 
[Resample] iter 23:   0.0000000 
[Resample] iter 24:   0.0000000 
[Resample] iter 25:   1.0000000 
[Resample] iter 26:   0.0000000 
[Resample] iter 27:   0.0000000 
[Resample] iter 28:   0.0000000 
[Resample] iter 29:   1.000

The mean MMCE is 0.3560606, i.e. 47 of the 132 observations are misclassified. Now we look at the confusion matrix (class 2 is MCI):

In [45]:
print("Confusion matrix:")
confusion_matrix <- calculateConfusionMatrix(r$pred)
print(confusion_matrix)
print("Accuracy:")
# Share of observations whose predicted class matches the true class
mean(r$pred$data$truth == r$pred$data$response)
# Transposed counts: rows = predicted, columns = true
t(confusion_matrix$result)

[1] "Confusion matrix:"
        predicted
true      1  2 -err.-
  1      25 29     29
  2      18 60     18
  -err.- 18 29     47


ERROR: Error in accuracy(confusion_matrix(result)): could not find function "accuracy"
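
The error above comes from calling `accuracy`, which is not defined in this session. One possible alternative is to let mlr compute the measures itself. The sketch below assumes that the `mlr` and `rpart` packages are installed, and it uses mlr's bundled `sonar.task` with a decision tree and 5-fold cross-validation as stand-ins for the MEG task, the random forest, and LOO, so the numbers it prints say nothing about the MEG data.

```r
library(mlr)

set.seed(1)
# Stand-ins: a bundled binary task, a fast learner, and 5-fold CV (all assumptions)
lrn_demo <- makeLearner("classif.rpart")
rdesc_demo <- makeResampleDesc("CV", iters = 5)

# Ask resample() to track accuracy alongside the misclassification error
r_demo <- resample(lrn_demo, sonar.task, rdesc_demo,
                   measures = list(mmce, acc), show.info = FALSE)

# Measures aggregated over the folds
r_demo$aggr

# The same measures recomputed from the pooled predictions
perf <- performance(r_demo$pred, measures = list(mmce, acc))
perf
```

With the objects from this notebook, the analogous call would be `performance(r$pred, measures = list(mmce, acc))`.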


In [44]:
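# Class 2 (MCI) is the positive class; rows of $result are true labels, columns are predictions
TP <- confusion_matrix$result[2,2]  # true MCI, predicted MCI
FP <- confusion_matrix$result[1,2]  # true control, predicted MCI
TN <- confusion_matrix$result[1,1]  # true control, predicted control
FN <- confusion_matrix$result[2,1]  # true MCI, predicted control

precision <- TP / (TP + FP)
recall <- TP / (TP + FN)
f1 <- 2 * precision * recall / (precision + recall)
f1
precision
recall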
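
Since the output of this cell is not shown, the following sketch recomputes the same quantities from the four counts in the confusion matrix printed earlier (25, 29, 18, 60), using base R only. The `baseline` value, the accuracy of always predicting the majority class, is an addition for comparison and not something the notebook computes.

```r
# Counts copied from the printed confusion matrix (rows = true, columns = predicted)
cm <- matrix(c(25, 29,
               18, 60),
             nrow = 2, byrow = TRUE,
             dimnames = list(true = c("1", "2"), predicted = c("1", "2")))

n <- sum(cm)                     # 132 observations
accuracy <- sum(diag(cm)) / n    # 85 / 132
mmce <- 1 - accuracy             # 47 / 132, matches the reported 0.3560606

# Class 2 (MCI) as the positive class
TP <- cm["2", "2"]
FP <- cm["1", "2"]
FN <- cm["2", "1"]
precision <- TP / (TP + FP)      # 60 / 89
recall <- TP / (TP + FN)         # 60 / 78
f1 <- 2 * precision * recall / (precision + recall)

# Accuracy of always predicting the larger class (an added comparison point)
baseline <- max(rowSums(cm)) / n # 78 / 132

round(c(accuracy = accuracy, mmce = mmce, precision = precision,
        recall = recall, f1 = f1, baseline = baseline), 4)
# accuracy 0.6439, mmce 0.3561, precision 0.6742, recall 0.7692, f1 0.7186, baseline 0.5909
```

On these counts the random forest's accuracy (about 0.64) is only modestly above the majority-class baseline (about 0.59), which is worth keeping in mind when using it as a benchmark.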