### Accuracy Statistics

This section uses the `validation_RF` shapefile processed in QGIS.

In [1]:
library(raster)
library(plyr)
setwd("~/RDemo/capstone/Sentinel2data")
img.classified <- raster("RF_classification2.tif")
shp.train <- shapefile("C://Users/Roxana/Documents/RDemo/capstone/QGISprocessed/training_data2.shp")
shp.valid <- shapefile("validation_RF")

Loading required package: sp


In [2]:
table(shp.valid$validclass)


 1  2  3  4  5  6  7  8 
 6 14  5 10 16  9  4  7 

We generate two factor vectors to compare in the confusion matrix:
1. **reference**: class labels assigned manually in QGIS
2. **predicted**: class labels that resulted from the automatic RF (or SVM) classification
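One detail worth keeping in mind when building the two vectors: `as.factor()` derives its levels from the values that actually occur, so a class that is never predicted would produce a non-square table and make `diag()` misleading. The following is a minimal sketch on made-up labels (`ref.toy`, `pred.toy`, and `classes` are illustrative names, not part of the capstone data) showing how shared levels avoid this:

```r
# Hypothetical labels: class 3 never occurs in the prediction
ref.toy  <- c(1, 2, 3, 3, 2)
pred.toy <- c(1, 2, 2, 1, 2)

# as.factor() takes its levels from the data, so this table is 2 x 3
dim.loose <- dim(table(as.factor(pred.toy), as.factor(ref.toy)))
dim.loose

# Shared levels keep the table square, which diag() relies on
classes <- sort(union(ref.toy, pred.toy))
ref.f   <- factor(ref.toy,  levels = classes)
pred.f  <- factor(pred.toy, levels = classes)
dim.square <- dim(table("pred" = pred.f, "ref" = ref.f))
dim.square
```

In this notebook all eight classes occur in both vectors, so the table is square either way; the explicit `levels` argument only matters when a class is missing on one side.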

In [4]:
reference <- as.factor(shp.valid$validclass)
reference # there are NA values, but we will ignore them later on

In [5]:
predicted <- as.factor(extract(img.classified, shp.valid))
predicted

In [6]:
accmat <- table("pred" = predicted, "ref" = reference)
accmat # contingency table of the counts at each combination of factor levels

    ref
pred  1  2  3  4  5  6  7  8
   1  4  0  1  0  5  0  0  0
   2  0 11  0  0  0  0  0  0
   3  0  0  3  2  0  0  0  0
   4  0  0  0  7  0  0  0  0
   5  0  0  0  0 11  1  0  0
   6  0  3  0  0  0  5  2  0
   7  2  0  0  1  0  1  1  0
   8  0  0  1  0  0  2  1  7

The numbers are counts of validation pixels. All pixels that have an NA value in either reference or predicted are ignored here. This output already shows whether and where there are misclassifications in our map: all pixels on the diagonal are correctly classified; all pixels off the diagonal are not.
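To illustrate how `table()` treats NA values, here is a small sketch on invented vectors (the values and the names `ref.na`, `pred.na`, `accmat.toy`, and `accmat.toy.na` are assumptions for demonstration only). By default, any pair containing an NA is dropped; `useNA = "ifany"` makes the dropped points visible:

```r
# Toy vectors (hypothetical values, not the capstone data)
ref.na  <- factor(c(1, 1, 2, NA, 2, 3), levels = 1:3)
pred.na <- factor(c(1, 2, 2, 3, NA, 3), levels = 1:3)

# Default behaviour: pairs containing an NA are silently dropped
accmat.toy <- table("pred" = pred.na, "ref" = ref.na)
accmat.toy
sum(accmat.toy)      # only 4 of the 6 points are counted

# useNA = "ifany" adds NA rows/columns so the dropped points show up
accmat.toy.na <- table("pred" = pred.na, "ref" = ref.na, useNA = "ifany")
accmat.toy.na
sum(accmat.toy.na)   # all 6 points are counted
```

Comparing `sum(accmat)` with `length(reference)` is a quick way to see how many validation points were lost to NA values.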

**User’s accuracies:**

In [8]:
UA <- diag(accmat) / rowSums(accmat) * 100
round(UA, 2)

**Producer’s accuracies:**

In [10]:
PA <- diag(accmat) / colSums(accmat) * 100
round(PA,2) 

**Overall accuracy:**

In [12]:
OA <- sum(diag(accmat)) / sum(accmat) * 100
round(OA,4)
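The three measures can be checked without the raster data by rebuilding the contingency table from the counts printed earlier. This is a sketch for verification only; the names `m`, `UA.check`, `PA.check`, and `OA.check` are illustrative and not part of the original workflow:

```r
# Counts copied from the contingency table printed above
# (rows = predicted class, columns = reference class)
counts <- c(4,  0, 1, 0,  5, 0, 0, 0,
            0, 11, 0, 0,  0, 0, 0, 0,
            0,  0, 3, 2,  0, 0, 0, 0,
            0,  0, 0, 7,  0, 0, 0, 0,
            0,  0, 0, 0, 11, 1, 0, 0,
            0,  3, 0, 0,  0, 5, 2, 0,
            2,  0, 0, 1,  0, 1, 1, 0,
            0,  0, 1, 0,  0, 2, 1, 7)
m <- matrix(counts, nrow = 8, byrow = TRUE,
            dimnames = list(pred = as.character(1:8),
                            ref  = as.character(1:8)))

correct  <- diag(m)                       # correctly classified per class
UA.check <- correct / rowSums(m) * 100    # share of each predicted class that is right
PA.check <- correct / colSums(m) * 100    # share of each reference class that was found
OA.check <- sum(correct) / sum(m) * 100   # 49 of 71 points

round(UA.check, 2)
round(PA.check, 2)
round(OA.check, 2)
```

User's accuracy divides by row sums (everything predicted as a class), while producer's accuracy divides by column sums (everything that truly belongs to a class), which is why the two can differ sharply for the same class.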

**Confusion matrix:**

In [13]:
# Add row/column sums, then append producer's (PA) and user's (UA) accuracies
accmat.ext <- addmargins(accmat)
accmat.ext <- rbind(accmat.ext, "PA" = c(PA, NA))      # PA is column-wise: extra row
accmat.ext <- cbind(accmat.ext, "UA" = c(UA, NA, OA))  # UA is row-wise: extra column; OA in the corner
class.names <- levels(as.factor(shp.train$classes))
rownames(accmat.ext) <- c(class.names, "Sum", "PA")
colnames(accmat.ext) <- c(class.names, "Sum", "UA")
accmat.ext <- round(accmat.ext, digits = 1)
dimnames(accmat.ext) <- list("Prediction" = rownames(accmat.ext),
                             "Reference" = colnames(accmat.ext))
class(accmat.ext) <- "table"  # print NA cells as blanks
accmat.ext

          Reference
Prediction     b     c     f     g   ind    rh    rl     w   Sum    UA
       b     4.0   0.0   1.0   0.0   5.0   0.0   0.0   0.0  10.0  40.0
       c     0.0  11.0   0.0   0.0   0.0   0.0   0.0   0.0  11.0 100.0
       f     0.0   0.0   3.0   2.0   0.0   0.0   0.0   0.0   5.0  60.0
       g     0.0   0.0   0.0   7.0   0.0   0.0   0.0   0.0   7.0 100.0
       ind   0.0   0.0   0.0   0.0  11.0   1.0   0.0   0.0  12.0  91.7
       rh    0.0   3.0   0.0   0.0   0.0   5.0   2.0   0.0  10.0  50.0
       rl    2.0   0.0   0.0   1.0   0.0   1.0   1.0   0.0   5.0  20.0
       w     0.0   0.0   1.0   0.0   0.0   2.0   1.0   7.0  11.0  63.6
       Sum   6.0  14.0   5.0  10.0  16.0   9.0   4.0   7.0  71.0      
       PA   66.7  78.6  60.0  70.0  68.8  55.6  25.0 100.0        69.0

### Significance Test
We can also check whether the result could be purely coincidental, i.e., whether a random classification could have led to the same result, using a binomial test. Let x be the total number of correctly classified validation points and n the total number of validation points in our confusion matrix. Because `p` is not specified in the call, `binom.test` tests against its default success probability of 0.5:

In [17]:
sign <- binom.test(x = sum(diag(accmat)),
                   n = sum(accmat),
                   alternative = c("two.sided"),
                   conf.level = 0.95
)
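Since the call above leaves `p` at its default of 0.5, it asks whether the overall accuracy differs from 50%. If the intended null hypothesis is instead a random assignment to eight equally likely classes, one option is to test against 1/8 with a one-sided alternative. The following is only a sketch under that assumption; `chance` and `test.chance` are illustrative names, and `x` and `n` are taken from the confusion matrix shown earlier:

```r
x <- 49   # correctly classified points, i.e. sum(diag(accmat))
n <- 71   # all validation points,       i.e. sum(accmat)

# Assumption: chance level = 1 / number of classes (8 classes here)
chance <- 1 / 8

# One-sided test: is the overall accuracy greater than the chance level?
test.chance <- binom.test(x = x, n = n, p = chance,
                          alternative = "greater",
                          conf.level = 0.95)
test.chance$p.value
test.chance$estimate   # observed overall accuracy as a proportion
```

Note that with `alternative = "greater"` the reported confidence interval is one-sided, so it is not directly comparable to the two-sided interval used in the rest of this section. Equal class probabilities are also a simplification, because the validation classes are not equally frequent.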

**p-value:**

In [18]:
pvalue <- sign$p.value
pvalue

**Confidence interval at alpha = 0.05:**

In [16]:
conf_int_95 <- sign$conf.int[1:2]
round(conf_int_95,4)
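For readers wondering where this interval comes from: `binom.test` reports the Clopper-Pearson (exact) interval, which can be written in terms of beta quantiles. The sketch below recomputes it by hand from the counts in the confusion matrix; `lower`, `upper`, and `ci` are illustrative names:

```r
x <- 49       # correctly classified validation points
n <- 71       # total validation points
alpha <- 0.05

# Clopper-Pearson limits expressed as beta quantiles
lower <- qbeta(alpha / 2,     x,     n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
round(c(lower, upper), 4)

# Should agree with the interval reported by binom.test()
ci <- binom.test(x, n, conf.level = 1 - alpha)$conf.int
all.equal(c(lower, upper), as.numeric(ci))
```

The interval depends only on `x`, `n`, and the confidence level, not on the null value `p`, so it stays the same whichever hypothesis is tested.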

### Conclusion
The *p-value* is lower than 0.05, so the overall accuracy of the classified map is *statistically significant* at the 5% level. If the classification were repeated under the same conditions, the **overall accuracy** can be expected, with 95% confidence, to lie in the range of **56.92% to 79.46%**.
There is therefore room for more model tuning, better resampling, and more validation points to further improve model performance.