## About
This notebook calculates producer's and user's accuracies by class, along with their confidence intervals.
We follow the notation and calculations in Olofsson et al. 

The required inputs are:
1. A CSV of the assessed points with two columns: map_class and ref_class. map_class is the class assigned to the point in the map; ref_class is the "ground truth" class of the point. 

2. A CSV with the number of pixels per class in the map.

In [1]:
import pandas as pd
import numpy as np
import os

from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

### Load Data

In [2]:
# load validation points
df = pd.read_csv('model_AE5FP_validation_pts_map_and_ref_classes.csv')
df = df.rename({'AE5FP_class':'map_class'},axis=1)
df.head()

Unnamed: 0,lon,lat,naip_id,pts_crs,map_class,ref_class,which_raster
0,-120.465232,34.462486,ca_m_3412037_nw_10_060_20200607,EPSG:4326,1,1,0
1,-119.997217,34.459656,ca_m_3412040_ne_10_060_20200522,EPSG:4326,1,1,0
2,-119.96981,34.438493,ca_m_3411933_nw_11_060_20200522,EPSG:4326,3,3,2
3,-120.469599,34.465779,ca_m_3412037_nw_10_060_20200607,EPSG:4326,1,1,0
4,-120.266262,34.471258,ca_m_3412038_ne_10_060_20200522,EPSG:4326,2,2,0


In [3]:
# load counts of pixels per class in map
pix_counts_by_raster = pd.read_csv('model_AE5FP_map_pixel_counts.csv')
pix_counts_by_raster

Unnamed: 0,n_nonice_2020,n_ice_2020,n_ground_2020,n_water_2020,raster
0,36271293,5382187,111150412,62968690,modelAE5_FP_2020_merged_crs26910_S_2020
1,1122203,30004,1891593,2893071,modelAE5_FP_2020_merged_crs26910_W_2020
2,89669636,1123921,62587031,69125241,modelAE5_FP_2020_merged_crs26911_2020


In [4]:
pix_counts = pd.DataFrame(data ={'n_other_veg': [sum(pix_counts_by_raster.n_nonice_2020)],
                'n_iceplant': [sum(pix_counts_by_raster.n_ice_2020)],
                'n_ground': [sum(pix_counts_by_raster.n_ground_2020)],
                'n_water': [sum(pix_counts_by_raster.n_water_2020)]})
pix_counts = pix_counts.to_numpy()[0]
pix_counts

array([127063132,   6536112, 175629036, 134987002])

### Confusion Matrix

Here we create a confusion matrix $n$ such that 

$n_{i,j}$ = number of points predicted as $i$, known to be $j$, 

which is equivalent to

$n_{i,j}$ = number of points that have map class as $i$ and reference class $j$.

In [5]:
# counts by reference class
print('Points in each reference class')
print(np.unique(df.ref_class, return_counts=True), '\n')

# counts by map class: these should match the counts given by the stratified sample design
print('Points in each map class')
print(np.unique(df.map_class, return_counts=True))

Points in each reference class
(array([0, 1, 2, 3]), array([236, 139, 119, 100])) 

Points in each map class
(array([0, 1, 2, 3]), array([200, 199, 110,  85]))


In [7]:
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
# using confusion_matrix directly we get a matrix C such that
# C_{i,j} = known to be i, predicted as  j 
# The notation in the paper is 
# n_{i,j} = predicted as i, known to be j 
# so we need to take the transpose

n = confusion_matrix(df.ref_class, df.map_class, labels=range(0,4)).T
n

array([[170,   1,  20,   9],
       [ 51, 137,  11,   0],
       [ 15,   1,  85,   9],
       [  0,   0,   3,  82]])
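
As a cross-check of the orientation (rows are map classes, columns are reference classes), here is a minimal numpy-only sketch. The toy labels and the name `n_toy` are hypothetical and are not part of the validation data.

```python
import numpy as np

# Hypothetical toy labels for three classes (0, 1, 2); not the notebook's data.
map_class = np.array([0, 0, 1, 1, 1, 2])
ref_class = np.array([0, 1, 1, 1, 2, 2])

q = 3
n_toy = np.zeros((q, q), dtype=int)
# Row index = map class, column index = reference class.
np.add.at(n_toy, (map_class, ref_class), 1)

print(n_toy)
# Row sums: number of points per map class (the sampling strata).
print(n_toy.sum(axis=1))
# Column sums: number of points per reference class.
print(n_toy.sum(axis=0))
```

With this orientation the row sums should match the per-stratum sample sizes of the design, which is the same check made on the real data above.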

### Notation
Throughout the following, let $p_{ij}$ be the (true) fraction of the map that has map class $i$ and reference class $j$. 

### User's Accuracy
The user's accuracy of class $i$ is the fraction of the area mapped as class $i$ that has reference class $i$ (Olofsson et al. eq. 2):
$$U_i = \frac{p_{ii}}{p_{i\cdot}}.$$
This is equivalent to the precision of class $i$. For example, when there are two classes (positive and negative), the user's accuracy of the positive class is the same as the precision of the positive class (TP/(TP + FP)).

To estimate the $U_i$'s from the sample points we use
$$\hat{U}_i = \frac{\hat{p}_{ii}}{\hat{p}_{i\cdot}},$$
where the $\hat{p}_{ij}$ are the estimates of the $p_{ij}$ from the sample. 
For stratified random sampling in which the sampling strata correspond to the map classes we have that
$$\hat{p}_{ij} = W_i \frac{n_{ij}}{n_{i\cdot}},$$
where
- $W_i$ is the fraction of the map's area classified as class $i$,
- $n_{ij}$ is the number of sample points with map class $i$ and reference class $j$ (entries in the confusion matrix), and
- $n_{i\cdot}$ is the number of sample points with map class $i$ (row sums in the confusion matrix).

Notice that the estimated user's accuracy simplifies to
$$\hat{U}_i = \frac{n_{ii}}{n_{i\cdot}},$$
which is the formula implemented in the code.
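
The simplification follows because the strata are the map classes, so the row total of the estimated proportions equals the stratum weight. A short derivation using only the definitions above, with $q$ the number of classes:
$$\hat{p}_{i\cdot} = \sum_{j=1}^q \hat{p}_{ij} = \frac{W_i}{n_{i\cdot}} \sum_{j=1}^q n_{ij} = W_i,
\qquad \text{so} \qquad
\hat{U}_i = \frac{\hat{p}_{ii}}{\hat{p}_{i\cdot}} = \frac{W_i \, n_{ii}/n_{i\cdot}}{W_i} = \frac{n_{ii}}{n_{i\cdot}}.$$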

For the user's accuracy of map class $i$, the estimated variance is (Olofsson et al. eq. 6):
$$\hat{V}(\hat{U}_i) = \frac{\hat{U}_i (1-\hat{U}_i)}{n_{i\cdot}-1}.$$

**NOTE:** We calculate the user's accuracies first since these are needed to calculate the approximate variance of the overall accuracy. 

### Variance, Standard Error & Confidence Intervals
Recall that the square root of the estimated variance results in the standard error of the estimator. For example, in the case of the estimated overall accuracy of the map $\hat{O}$ we have that $\hat{S}(\hat{O}) = \sqrt{\hat{V}(\hat{O})}$ (see Olofsson et al. eq. 5). 

The standard error is also used to get confidence intervals for the estimated statistic: the 95% confidence interval is estimated as $\hat{O} \pm 1.96 \hat{S}(\hat{O}) = \hat{O} \pm 1.96\  \sqrt{\hat{V}(\hat{O})}$.
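
To make the recipe concrete, here is a minimal sketch that turns an estimated variance into a 95% confidence interval. The helper name `ci_half_width` and the input values are illustrative assumptions, not part of the notebook's pipeline.

```python
import math

def ci_half_width(variance, z=1.96):
    """Half-width of a normal-approximation confidence interval."""
    return z * math.sqrt(variance)

# Illustrative values only: an estimated accuracy of 0.85 from 200 sample
# points in one stratum, using the variance formula for a proportion.
estimate = 0.85
n_points = 200
variance = estimate * (1 - estimate) / (n_points - 1)

half_width = ci_half_width(variance)
lower, upper = estimate - half_width, estimate + half_width
print(f"{estimate:.3f} +/- {half_width:.3f}  ->  [{lower:.3f}, {upper:.3f}]")
```

Keeping `z` as a parameter makes it easy to switch to another confidence level, for example 1.645 for 90%.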

In [19]:
n_classes = 4

# -------------------------------------
# points in sample that had class i in map (predicted as i, any true class j)
# these will also be used in overall accuracy and producer's accuracies
n_idot = [sum(n[i,:]) for i in range(n_classes)]

# -------------------------------------
# estimated user's accuracy (precision for each class: TP/(TP+FP))
U_hat = [n[i,i] / n_idot[i] for i in range(n_classes)]

# estimated variance of the user's accuracy (Olofsson et al. eq. 6)
var_U_hat = [U_hat[i] * (1-U_hat[i])/(n_idot[i]-1) for i in range(n_classes)]

# -------------------------------------
print("user's accuracies:", [x*100 for x in U_hat])
# half-width of the 95% confidence interval (z = 1.96), in percentage points
print("user's accuracies confidence interval:", np.round(1.96*np.sqrt(var_U_hat)*100, 2))

user's accuracies: [85.0, 68.84422110552764, 77.27272727272727, 96.47058823529412]
user's accuracies confidence interval: [4.96 6.45 7.87 3.95]


### Overall Accuracy

Let $O$ be the (true) overall accuracy of the map, and $\hat{O}$ its estimate from the sample. Then, following Olofsson et al., section 4.3, we have that 
$$\hat{O} = \sum_{i=1}^q \hat{p}_{ii},$$
where $q$ is the number of classes in the map and $\hat{p}_{ii}$ is the estimate of $p_{ii}$, the (true) fraction of the area in the map that was classified as class $i$ and has reference class $i$. 

For overall accuracy, the estimated variance is (Olofsson et al. eq 5):
$$\hat{V}(\hat{O}) = \sum_{i=1}^q \frac{W_i^2 \hat{U}_i (1-\hat{U}_i)}{n_{i\cdot}-1}.$$

In [27]:
# total number of pixels in the map
total_pix = sum(pix_counts)

# list with the fractions of area in map mapped as each class
W = [pix_counts[i]/ total_pix for i in range(n_classes)]      

# -------------------------------------
# overall accuracy
O_hat = sum([W[i]*n[i,i]/n_idot[i] for i in range(n_classes)])
print('overall accuracy:', O_hat*100)

# -------------------------------------
var_O_hat = sum([ W[i]**2 * U_hat[i] * (1-U_hat[i])/(n_idot[i]-1) for i in range(n_classes)])

# half-width of the 95% confidence interval (z = 1.96), in percentage points
print('overall accuracy confidence interval:', round(1.96*np.sqrt(var_O_hat)*100, 2), '\n')

overall accuracy: 85.19281389053452
overall accuracy confidence interval: 3.62 
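
The same estimate can be written in vectorized form. The sketch below is one possible arrangement, not the notebook's own code: the function name `overall_accuracy` is hypothetical, and the confusion matrix and pixel counts are copied from the cells above so that the block runs on its own.

```python
import numpy as np

# Confusion matrix (rows = map class, columns = reference class) and
# pixel counts per map class, copied from the cells above.
n = np.array([[170,   1,  20,   9],
              [ 51, 137,  11,   0],
              [ 15,   1,  85,   9],
              [  0,   0,   3,  82]])
pix_counts = np.array([127063132, 6536112, 175629036, 134987002])

def overall_accuracy(n, pix_counts, z=1.96):
    """Return (O_hat, CI half-width) for a sample stratified by map class."""
    n_idot = n.sum(axis=1)                  # sample points per map class
    W = pix_counts / pix_counts.sum()       # area fraction per map class
    U_hat = np.diag(n) / n_idot             # user's accuracy per class
    O_hat = np.sum(W * U_hat)
    var_O_hat = np.sum(W**2 * U_hat * (1 - U_hat) / (n_idot - 1))
    return O_hat, z * np.sqrt(var_O_hat)

O_hat, half_width = overall_accuracy(n, pix_counts)
print(f"overall accuracy: {100 * O_hat:.2f} +/- {100 * half_width:.2f}")
```

Because $\hat{p}_{ii} = W_i \hat{U}_i$, the overall accuracy is an area-weighted mean of the user's accuracies, which is why a small class such as class 1 has little influence on it.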



### Producer's Accuracy
The producer's accuracy of class $j$ is the fraction of the (true) area with reference class $j$ that is actually mapped as class $j$ (Olofsson et al. eq. 3):
$$P_j = \frac{p_{jj}}{p_{\cdot j}}.$$
This is equivalent to the sensitivity (recall) of class $j$. For example, when there are two classes (positive and negative), the producer's accuracy of the positive class is the same as the sensitivity of the positive class (TP/(TP + FN)).

To estimate the $P_j$'s from the sample points we use
$$\hat{P}_j = \frac{\hat{p}_{jj}}{\hat{p}_{\cdot j}},$$
where the $\hat{p}_{ij}$ are as before.

For the producer's accuracy of class $j$ the estimated variance is given by (Olofsson et al. eq 7):

$$\hat{V}(\hat{P}_j) = 
\frac{1}{\hat{N}_{\cdot j}^2} 
\left( 
\frac{N_{j \cdot}^2 (1 - \hat{P}_j)^2 \hat{U}_j (1-\hat{U}_j)}{n_{j \cdot} -1}  
+
\hat{P}_j^2
\sum_{i\neq j}^q 
\frac{N_{i\cdot}^2}{n_{i \cdot} - 1} 
\frac{n_{ij}}{n_{i \cdot}} 
\left( 1 - \frac{n_{ij}}{n_{i \cdot}} \right)
\right),$$
where
- $N_{j \cdot}$ is the number of pixels in the map with map class $j$,
- $n_{j\cdot}$ is the number of sample points with map class $j$, and
- $\hat{N}_{\cdot j} = \sum_{i=1}^q N_{i\cdot}\frac{n_{ij}}{n_{i\cdot}}$ is the estimated number of pixels with reference class $j$.



In [51]:
# estimated area proportion per reference class (equation 9);
# these are reused in the area estimates
p_hat_dotj = []
for j in range(n_classes):
    # list of p_hat_ij with fixed j
    p_hat_ij = [ W[i]*n[i,j]/n_idot[i] for i in range(n_classes) ]
    p_hat_dotj.append(sum(p_hat_ij))

# estimated producer's accuracy (sensitivity for each class: TP/(TP+FN))
P_hat = [ (W[j]*n[j,j]/n_idot[j]) / p_hat_dotj[j] for j in range(n_classes)]
# -------------------------------------
print("producer's accuracies:", [x*100 for x in P_hat])

# -------------------------------------
# -------------------------------------
# VARIANCE
# notice N_jdot is pix_counts[j]
N_hat_cdotj = []
for j in range(n_classes):
    summands = [ pix_counts[i] * n[i,j]/n_idot[i] for i in range(n_classes)]
    N_hat_cdotj.append(sum(summands))

# -------------------------------------
summand1 = [ (pix_counts[j]**2) * ((1-P_hat[j])**2) * U_hat[j] * (1-U_hat[j]) / (n_idot[j] - 1) 
            for j in range(n_classes)]

# -------------------------------------
summand2 = []
for j in range(n_classes):
    inner = []
    for i in range(n_classes):
        if i!=j:
            inner.append( (pix_counts[i]**2) /(n_idot[i]-1) * (n[i,j])/(n_idot[i]) * ( 1 - n[i,j]/n_idot[i]) ) 
    summand2.append((P_hat[j]**2) * sum(inner))

# -------------------------------------
var_P_hat = [1/(N_hat_cdotj[j]**2) *  (summand1[j] + summand2[j]) for j in range(n_classes)]

# -------------------------------------
# -------------------------------------
# half-width of the 95% confidence interval (z = 1.96), in percentage points
print("producer's accuracies confidence interval:", np.round(1.96*np.sqrt(var_P_hat)*100, 2), '\n')

producer's accuracies: [80.82402844924152, 66.84417997754406, 88.3865773243483, 86.63598116981504]
producer's accuracies confidence interval: [ 6.91 33.51  4.45  5.64] 
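
The nested loops above can also be expressed with array operations. The following is a sketch under the same formulas (eq. 3 and eq. 7); the function name `producers_accuracy` is hypothetical, and the inputs are copied from the earlier cells so the block is self-contained.

```python
import numpy as np

n = np.array([[170,   1,  20,   9],
              [ 51, 137,  11,   0],
              [ 15,   1,  85,   9],
              [  0,   0,   3,  82]])
pix_counts = np.array([127063132, 6536112, 175629036, 134987002])

def producers_accuracy(n, pix_counts):
    """Return (P_hat, var_P_hat), one entry per reference class j."""
    n = np.asarray(n, dtype=float)
    N = np.asarray(pix_counts, dtype=float)   # N_{i.}: pixels per map class
    n_idot = n.sum(axis=1)                    # n_{i.}: points per map class
    frac = n / n_idot[:, None]                # n_ij / n_{i.}
    N_hat_dotj = N @ frac                     # estimated pixels per reference class
    U_hat = np.diag(frac)                     # user's accuracies
    P_hat = N * U_hat / N_hat_dotj
    # First term of eq. 7 (stratum j itself).
    term1 = N**2 * (1 - P_hat)**2 * U_hat * (1 - U_hat) / (n_idot - 1)
    # Second term: contributions of the other strata (i != j).
    contrib = (N**2 / (n_idot - 1))[:, None] * frac * (1 - frac)
    np.fill_diagonal(contrib, 0.0)
    term2 = P_hat**2 * contrib.sum(axis=0)
    return P_hat, (term1 + term2) / N_hat_dotj**2

P_hat, var_P_hat = producers_accuracy(n, pix_counts)
print(np.round(100 * P_hat, 2))
print(np.round(100 * 1.96 * np.sqrt(var_P_hat), 2))
```

The wide interval for class 1 comes almost entirely from the second term: a single omitted point in a very large stratum ($N_{i\cdot}$ in the hundreds of millions) carries a large weight relative to the small mapped area of class 1.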



### Area Estimates

For stratified random sampling when the map classes are the strata, we have that an estimator of the proportion of area of class $j$ is (Olofsson et al. eq. 9):
$$ \hat{p}_{\cdot j} = \sum_{i=1}^q W_i \frac{n_{ij}}{n_{i\cdot}}.$$

For this estimator of area proportion per class, the standard error is estimated by (Olofsson et al. eq 10):
$$S(\hat{p}_{\cdot j}) =  
\sqrt{
\sum_{i=1}^q W_i^2 \frac{ \frac{n_{ij}}{n_{i\cdot}} \left(1 -  \frac{n_{ij}}{n_{i\cdot}} \right)}{n_{i \cdot}-1}
}.$$

The estimated area of class $j$ is
$$\hat{A}_j = A \times \hat{p}_{\cdot j},$$
where $A$ is the total area of the map. 
The standard error for the area is given by (Olofsson et al. eq 11):
$$ S(\hat{A}_j) = A \times S(\hat{p}_{\cdot j}).$$

In [74]:
# PERCENTAGE OF AREA ESTIMATION
# the area estimators were already calculated above, since they are used in the producer's accuracy
print("percentage of area per class: \n", [x*100 for x in p_hat_dotj])

# -------------------------------------
# STD ERROR
SE_p_hat_dotj = []
for j in range(n_classes):
    summands = [ (W[i]**2) * (n[i,j]/n_idot[i]) * (1 -  (n[i,j]/n_idot[i]))/ (n_idot[i]-1) 
                for i in range(n_classes)]
    SE_p_hat_dotj.append(np.sqrt(sum(summands)))
    
print("confidence interval for percentage area per class:\n", [x*1.96*100 for x in SE_p_hat_dotj])


percentage of area per class: 
 [30.081846706908088, 1.5154090662249942, 34.565492267897106, 33.8372519589698]
confidence interval for percentage area per class:
 [2.917186667668769, 0.7641169000097102, 3.54075127941016, 2.5010852129609216]


In [79]:
map_area = total_pix * 0.25 #in m^2, assuming a resolution of 0.5m per pixel side

approx_area_per_class = [map_area * p_hat_dotj[i] for i in range(n_classes)]
print("approx area per class (m^2): \n", approx_area_per_class)

SE_area_per_class = [map_area * SE_p_hat_dotj[i] for i in range(n_classes)]
print("confidence interval for area per class (m^2):\n", [x*1.96 for x in SE_area_per_class])

approx area per class (m^2): 
 [33407040.04497487, 1682919.664246231, 38386299.738131836, 37577561.052647054]
confidence interval for area per class (m^2):
 [3239647.2455628067, 848581.0105469481, 3932139.5701876124, 2777550.6829536646]
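
Since every area figure depends on the unit conversion, here is a minimal sketch of going from a pixel count to m^2, hectares and km^2. It assumes square pixels of 0.5 m per side, as in the cell above; `pixels_to_area` is a hypothetical helper, and the total is the sum of the pixel counts loaded earlier.

```python
def pixels_to_area(n_pixels, pixel_side_m=0.5):
    """Convert a pixel count to area in m^2, hectares and km^2."""
    area_m2 = n_pixels * pixel_side_m**2
    return {
        "m2": area_m2,
        "ha": area_m2 / 100**2,     # 1 ha  = 100 m x 100 m
        "km2": area_m2 / 1000**2,   # 1 km^2 = 1000 m x 1000 m
    }

# Sum of the per-class pixel counts (127063132 + 6536112 + 175629036 + 134987002).
total_pix = 444215282
areas = pixels_to_area(total_pix)
print(areas)
```

Dividing m^2 by `100**2` gives hectares, while km^2 requires dividing by `1000**2`, so the two differ by a factor of 100.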


In [80]:
map_area = total_pix * 0.25 / (100**2) # in hectares (1 ha = 100 m x 100 m), assuming a resolution of 0.5m per pixel side

approx_area_per_class = [map_area * p_hat_dotj[i] for i in range(n_classes)]
print("approx area per class (ha): \n", approx_area_per_class)

SE_area_per_class = [map_area * SE_p_hat_dotj[i] for i in range(n_classes)]
print("confidence interval for area per class (ha):\n", [x*1.96 for x in SE_area_per_class])

approx area per class (ha): 
 [3340.704004497487, 168.2919664246231, 3838.6299738131834, 3757.7561052647056]
confidence interval for area per class (ha):
 [323.96472455628066, 84.85810105469481, 393.2139570187612, 277.75506829536647]


CHECK THESE PAPERS:

https://www.sciencedirect.com/science/article/pii/S0034425712004191?casa_token=VRVZgQNuCnoAAAAA:lpifuEHGRTIQIamPd7BaXJVxE5j8LBiyAGX5kTLRz1RCgU_5Uj34g_8lsRKrCz8iGNlYoabJ

https://www.sciencedirect.com/science/article/pii/S0034425706004068?casa_token=34fHx5SX2vsAAAAA:_0hDu9LAlVO6JGqeV0yZWmHZ99uW-yoh2QhdTGt4QDr6FZgE9deZQM-xAVH9biVSNJFfc4SV

In [8]:
n

array([[170,   1,  20,   9],
       [ 51, 137,  11,   0],
       [ 15,   1,  85,   9],
       [  0,   0,   3,  82]])

In [11]:
# scale every count by 4 (same proportions, larger sample) before re-running the calculations
n = n*4
n

array([[680,   4,  80,  36],
       [204, 548,  44,   0],
       [ 60,   4, 340,  36],
       [  0,   0,  12, 328]])

In [19]:
n_classes = 4

# -------------------------------------
# points in sample that had class i in map (predicted as i, any true class j)
# these will also be used in overall accuracy and producer's accuracies
n_idot = [sum(n[i,:]) for i in range(n_classes)]

# -------------------------------------
# USER'S ACCURACY

# estimated user's accuracy (precision for each class: TP/(TP+FP))
U_hat = [n[i,i] / n_idot[i] for i in range(n_classes)]

var_U_hat = [U_hat[i] * (1-U_hat[i])/(n_idot[i]-1) for i in range(n_classes)]

# -------------------------------------
print("user's accuracies:", [x*100 for x in U_hat])
# half-width of the 95% confidence interval (z = 1.96), in percentage points
print("user's accuracies confidence interval:", np.round(1.96*np.sqrt(var_U_hat)*100, 2))

# -------------------------------------
# -------------------------------------

# OVERALL ACCURACY
# total number of pixels in the map
total_pix = sum(pix_counts)

# list with the fractions of area in map mapped as each class
W = [pix_counts[i]/ total_pix for i in range(n_classes)]      

# -------------------------------------
# overall accuracy
O_hat = sum([W[i]*n[i,i]/n_idot[i] for i in range(n_classes)])
print('overall accuracy:', O_hat*100)

# -------------------------------------
var_O_hat = sum([ W[i]**2 * U_hat[i] * (1-U_hat[i])/(n_idot[i]-1) for i in range(n_classes)])

# half-width of the 95% confidence interval (z = 1.96), in percentage points
print('overall accuracy confidence interval:', round(1.96*np.sqrt(var_O_hat)*100, 2), '\n')
# -------------------------------------
# -------------------------------------

# PRODUCER'S ACCURACY
# estimated area proportion per reference class (equation 9)
p_hat_dotj = []
for j in range(n_classes):
    # list of p_hat_ij with fixed j
    p_hat_ij = [ W[i]*n[i,j]/n_idot[i] for i in range(n_classes) ]
    p_hat_dotj.append(sum(p_hat_ij))

# estimated producer's accuracy (sensitivity for each class: TP/(TP+FN))
P_hat = [ (W[j]*n[j,j]/n_idot[j]) / p_hat_dotj[j] for j in range(n_classes)]
# -------------------------------------
print("producer's accuracies:", [x*100 for x in P_hat])

# -------------------------------------
# -------------------------------------
# VARIANCE
# notice N_jdot is pix_counts[j]
N_hat_cdotj = []
for j in range(n_classes):
    summands = [ pix_counts[i] * n[i,j]/n_idot[i] for i in range(n_classes)]
    N_hat_cdotj.append(sum(summands))

# -------------------------------------
summand1 = [ (pix_counts[j]**2) * ((1-P_hat[j])**2) * U_hat[j] * (1-U_hat[j]) / (n_idot[j] - 1) 
            for j in range(n_classes)]

# -------------------------------------
summand2 = []
for j in range(n_classes):
    inner = []
    for i in range(n_classes):
        if i!=j:
            inner.append( (pix_counts[i]**2) /(n_idot[i]-1) * (n[i,j])/(n_idot[i]) * ( 1 - n[i,j]/n_idot[i]) ) 
    summand2.append((P_hat[j]**2) * sum(inner))

# -------------------------------------
var_P_hat = [1/(N_hat_cdotj[j]**2) *  (summand1[j] + summand2[j]) for j in range(n_classes)]

# -------------------------------------
# -------------------------------------
# half-width of the 95% confidence interval (z = 1.96), in percentage points
print("producer's accuracies confidence interval:", np.round(1.96*np.sqrt(var_P_hat)*100, 2), '\n')


# PERCENTAGE OF AREA ESTIMATION
# the area estimators were already calculated above, since they are used in the producer's accuracy
print("percentage of area per class: \n", [x*100 for x in p_hat_dotj])

# -------------------------------------
# STD ERROR
SE_p_hat_dotj = []
for j in range(n_classes):
    summands = [ (W[i]**2) * (n[i,j]/n_idot[i]) * (1 -  (n[i,j]/n_idot[i]))/ (n_idot[i]-1) 
                for i in range(n_classes)]
    SE_p_hat_dotj.append(np.sqrt(sum(summands)))
    
print("confidence interval for percentage area per class:\n", [x*1.96*100 for x in SE_p_hat_dotj])

map_area = total_pix * 0.25 / (100**2) # in hectares (1 ha = 100 m x 100 m), assuming a resolution of 0.5m per pixel side

approx_area_per_class = [map_area * p_hat_dotj[i] for i in range(n_classes)]
print("approx area per class (ha): \n", approx_area_per_class)

SE_area_per_class = [map_area * SE_p_hat_dotj[i] for i in range(n_classes)]
print("confidence interval for area per class (ha):\n", [x*1.96 for x in SE_area_per_class])

user's accuracies: [85.0, 68.84422110552764, 77.27272727272727, 96.47058823529412]
user's accuracies confidence interval: [2.48 3.22 3.92 1.96]
overall accuracy: 85.19281389053452
overall accuracy confidence interval: 1.81 

producer's accuracies: [80.82402844924152, 66.84417997754406, 88.3865773243483, 86.63598116981504]
producer's accuracies confidence interval: [ 3.44 16.7   2.22  2.81] 

percentage of area per class: 
 [30.081846706908088, 1.5154090662249942, 34.565492267897106, 33.8372519589698]
confidence interval for percentage area per class:
 [1.4541362047632633, 0.38083924454307405, 1.7644213604024164, 1.2461812512443673]
approx area per class (ha): 
 [3340.704004497487, 168.2919664246231, 3838.6299738131834, 3757.7561052647056]
confidence interval for area per class (ha):
 [161.4873810663307, 42.29365310284215, 195.94573304449574, 138.39318898615738]
