# Entropy Summary Function Experiment (Notebook 2)

This notebook is the second part of the experiment presented in the paper *On the stability of persistent entropy and new summary functions for TDA*. We use images from the [misc database](http://sipi.usc.edu/database/database.php?volume=misc), except for one synthetic image, which had a trivial barcode. The full experiment consists of the following steps:
* Convert the images to grayscale.
* Add Gaussian, Poisson and salt-and-pepper noise.
* Compute the persistence diagrams and barcodes of these images using the lower-star filtration.
* Summarize the diagrams using the Betti curve, the NES function and persistence silhouettes.
* Compare the results obtained with the three kinds of curves.
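The first two steps are not shown in this notebook. As a rough illustration only, they could look like the sketch below. The helper names (`to_gray`, `clamp`, `add_gaussian`, `add_poisson`, `add_salt_pepper`), the BT.601 grayscale weights and the noise parameters (`sd`, `density`) are all hypothetical assumptions, not the code or values used in the experiment.

```r
# Hypothetical sketch of pipeline steps 1-2; parameter values are
# illustrative assumptions, not those used in the experiment.

# Grayscale conversion with ITU-R BT.601 luma weights (an assumption).
to_gray <- function(r, g, b) 0.299 * r + 0.587 * g + 0.114 * b

# Clamp intensities to the 8-bit range [0, 255], keeping the matrix shape.
clamp <- function(img) {
  img[] <- pmin(pmax(img, 0), 255)
  img
}

# Additive zero-mean Gaussian noise with standard deviation `sd`.
add_gaussian <- function(img, sd = 10) {
  clamp(img + rnorm(length(img), mean = 0, sd = sd))
}

# Poisson (shot) noise: each pixel is replaced by a Poisson draw
# whose mean is the original intensity.
add_poisson <- function(img) {
  noisy <- img
  noisy[] <- rpois(length(img), lambda = img)
  clamp(noisy)
}

# Salt-and-pepper noise: a fraction `density` of the pixels is set to
# 0 (pepper) or 255 (salt) with equal probability.
add_salt_pepper <- function(img, density = 0.05) {
  hit <- runif(length(img)) < density
  salt <- runif(length(img)) < 0.5
  img[hit & salt] <- 255
  img[hit & !salt] <- 0
  img
}

# Usage on a small constant test image.
set.seed(1)
img <- matrix(128, nrow = 4, ncol = 4)
g <- add_gaussian(img)
p <- add_poisson(img)
s <- add_salt_pepper(img, density = 0.5)
```

Clamping keeps every noisy image in the same intensity range as the original, so the lower-star filtrations of clean and noisy images live on the same scale.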

In this notebook we compare how Betti curves, NES functions and silhouettes respond to noise, and how well each of them discriminates between images.

## Robustness to noise

The Betti curves, NES functions and silhouettes were computed in script2.R and saved in script2.RData.
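Since script2.R is not reproduced here, the sketch below only illustrates how the three summaries can be evaluated on a grid of filtration values from a barcode. It is not the code from script2.R. The function names, the two-column (birth, death) barcode layout and the grid `grid` are hypothetical. The entropy summary follows the form S(t) = -sum (l_i / L) log(l_i / L), summed over the bars alive at t, where l_i is the length of bar i and L the total length of all bars; any normalization that turns it into the NES function used in the experiment is not reproduced. The silhouette uses weights (death - birth)^p, so p = 1 and p = 2 would correspond to the Sil_p1 and Sil_p2 columns below.

```r
# Hypothetical sketch: summaries of a barcode stored as a two-column
# matrix (birth, death). Assumes finite bars of positive length.

# Betti curve: number of bars alive at each grid value t.
betti_curve <- function(bars, grid) {
  sapply(grid, function(t) sum(bars[, 1] <= t & t < bars[, 2]))
}

# Entropy summary: sum of -(l_i / L) log(l_i / L) over the bars alive at t.
entropy_summary <- function(bars, grid) {
  len <- bars[, 2] - bars[, 1]
  q <- len / sum(len)
  h <- -q * log(q)  # entropy contribution of each bar
  sapply(grid, function(t) sum(h[bars[, 1] <= t & t < bars[, 2]]))
}

# Power-weighted silhouette: weighted mean of the tent functions
# max(0, min(t - birth, death - t)) with weights (death - birth)^p.
silhouette_curve <- function(bars, grid, p = 1) {
  w <- (bars[, 2] - bars[, 1])^p
  sapply(grid, function(t) {
    tent <- pmax(0, pmin(t - bars[, 1], bars[, 2] - t))
    sum(w * tent) / sum(w)
  })
}

# Usage: two overlapping bars [0, 2) and [1, 3).
bars <- matrix(c(0, 2,
                 1, 3), ncol = 2, byrow = TRUE)
grid <- seq(0, 4, by = 0.5)
betti_curve(bars, grid)  # 1 1 2 2 1 1 0 0 0
```

Sampling all curves on a common grid is what makes it possible to compare them with an L1 distance later on.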

In [1]:
```r
load("script2.RData")
```

We compute the L1 distance (the L1 norm of the difference) between the curve of each original image and the curve of its noisy version. We store these distances for each type of curve and noise and average them over all images.

In [2]:
```r
# Number of images; folders[1] holds the original images and
# folders[2..] the noisy versions (Gauss, Poisson, s&p).
n <- length(nes_list[[1]])
n_noise <- length(folders) - 1

# Average L1 perturbation for each noise type
nes_avg <- numeric(n_noise)
betti_avg <- numeric(n_noise)
sil_avg <- numeric(n_noise)
sil2_avg <- numeric(n_noise)

# L1 distance between original and noisy curves:
# one row per image, one column per noise type
nes_perturbation <- matrix(0, nrow = n, ncol = n_noise)
betti_perturbation <- matrix(0, nrow = n, ncol = n_noise)
sil_perturbation <- matrix(0, nrow = n, ncol = n_noise)
sil2_perturbation <- matrix(0, nrow = n, ncol = n_noise)

for (i in seq(2, length(folders))) {
  # Use k as the image index so that n is not overwritten
  for (k in seq_len(n)) {
    nes_perturbation[k, i - 1] <- NormaL1_new(nes_list[[1]][[k]] - nes_list[[i]][[k]], tseq)
    betti_perturbation[k, i - 1] <- NormaL1_new(betti_list[[1]][[k]] - betti_list[[i]][[k]], tseq)
    sil_perturbation[k, i - 1] <- NormaL1_new(sil_list[[1]][[k]] - sil_list[[i]][[k]], tseq)
    sil2_perturbation[k, i - 1] <- NormaL1_new(sil2_list[[1]][[k]] - sil2_list[[i]][[k]], tseq)
  }
  nes_avg[i - 1] <- mean(nes_perturbation[, i - 1])
  betti_avg[i - 1] <- mean(betti_perturbation[, i - 1])
  sil_avg[i - 1] <- mean(sil_perturbation[, i - 1])
  sil2_avg[i - 1] <- mean(sil2_perturbation[, i - 1])
}
```
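NormaL1_new is loaded from script2.RData and its definition is not shown. A plausible stand-in, assuming the curves are sampled on the grid tseq, is a trapezoidal approximation of the integral of the absolute value. The function `l1_norm` below is hypothetical and may differ from NormaL1_new (for example, a Riemann sum would give slightly different values for step functions such as Betti curves).

```r
# Hypothetical stand-in for NormaL1_new: approximates the L1 norm of a
# function sampled at the points `grid` with the trapezoidal rule.
l1_norm <- function(values, grid) {
  a <- abs(values)
  sum(diff(grid) * (a[-1] + a[-length(a)]) / 2)
}

# Usage: L1 distance between two curves sampled on the same grid.
grid <- c(0, 1, 2)
f <- c(1, 2, 1)
g <- c(0, 1, 1)
l1_norm(f - g, grid)
```

Taking the absolute value before integrating is what makes this a norm of the difference, so the distance is symmetric in the two curves.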



As can be seen in the table below, NES is more robust than Betti curves to Gaussian and Poisson noise, and silhouettes are the most robust of all. For salt-and-pepper noise the order is reversed: Betti curves are the most robust, followed by NES and then silhouettes. This last fact is expected, since persistence diagrams computed with the lower-star filtration are unstable (with respect to the bottleneck distance) under salt-and-pepper noise, while only small changes are produced if we just count the bars.
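To illustrate why salt-and-pepper noise affects lower-star diagrams, the sketch below computes 0-dimensional persistence of the lower-star filtration of a small image with union-find and the elder rule. It is a hypothetical helper (`lower_star_h0`), assumes 4-connectivity between pixels and only handles dimension 0, so it is not necessarily how the diagrams in this experiment were computed. On a smooth gradient image, a single dark pixel becomes a new local minimum: it adds a finite bar and moves the birth of the essential bar from 120 to 0.

```r
# Hypothetical sketch: 0-dimensional persistence of the lower-star
# filtration of a grayscale image (4-connectivity assumed), using
# union-find and the elder rule. Returns a (birth, death) matrix.
lower_star_h0 <- function(img) {
  nr <- nrow(img)
  nc <- ncol(img)
  parent <- rep(NA_integer_, length(img))  # NA: pixel not yet added
  birth <- numeric(length(img))            # birth value of each root
  find <- function(x) {                    # root of x's component
    while (parent[x] != x) x <- parent[x]
    x
  }
  bars <- matrix(numeric(0), ncol = 2)
  # Add pixels in increasing order of intensity (ties in index order).
  for (k in order(img)) {
    parent[k] <- k
    birth[k] <- img[k]
    row <- (k - 1) %% nr + 1
    col <- (k - 1) %/% nr + 1
    nbrs <- c(if (row > 1) k - 1, if (row < nr) k + 1,
              if (col > 1) k - nr, if (col < nc) k + nr)
    for (m in nbrs) {
      if (is.na(parent[m])) next           # neighbour not added yet
      a <- find(k)
      b <- find(m)
      if (a == b) next
      # Elder rule: the younger component dies at the current value.
      young <- if (birth[a] >= birth[b]) a else b
      old <- if (young == a) b else a
      if (img[k] > birth[young]) bars <- rbind(bars, c(birth[young], img[k]))
      parent[young] <- old
    }
  }
  # The oldest component never dies (essential bar).
  rbind(bars, c(min(img), Inf))
}

# Usage: a gradient image with and without a single "pepper" pixel.
clean <- outer(1:5, 1:5, function(i, j) 100 + 10 * (i + j))
noisy <- clean
noisy[4, 4] <- 0
lower_star_h0(clean)  # only the essential bar (120, Inf)
lower_star_h0(noisy)  # finite bar (120, 170) plus essential bar (0, Inf)
```

A single corrupted pixel thus changes a bar endpoint by 120 intensity levels, while the number of bars changes by only one, which is consistent with Betti curves being the least affected by this kind of noise.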

In [3]:
```r
# Average L1 perturbation per summary (columns) and noise type (rows)
x <- data.frame("Betti" = betti_avg, "NES" = nes_avg, "Sil_p1" = sil_avg, "Sil_p2" = sil2_avg)
row.names(x) <- c("Gauss", "Poisson", "s&p")
x
```

| Noise | Betti | NES | Sil_p1 | Sil_p2 |
|---|---|---|---|---|
| Gauss | 0.1988986 | 0.1149179 | 0.0387186 | 0.02449915 |
| Poisson | 0.2445259 | 0.1991411 | 0.155366 | 0.08634215 |
| s&p | 0.1397105 | 0.2978631 | 0.4281472 | 0.45781185 |


## Discriminative power

It is also expected that lower sensitivity to noise has a cost: the more robust a summary is to noise, the less able it is to distinguish different images. In the following cell we compute the matrix of pairwise L1 distances between the curves of the original (noise-free) images.

In [4]:
```r
# Pairwise L1 distances between the curves of the original (noise-free)
# images. Returns the full distance matrix and the vector of its
# upper-triangular entries (i < j), in the order the loops visit them.
pairwise_l1 <- function(curves) {
  n <- length(curves)
  dm <- matrix(0, nrow = n, ncol = n)
  dv <- c()
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      dm[i, j] <- NormaL1_new(curves[[i]] - curves[[j]], tseq)
      if (i < j) {
        dv <- c(dv, dm[i, j])
      }
    }
  }
  list(matrix = dm, values = dv)
}

nes_d <- pairwise_l1(nes_list[[1]])
dmnes <- nes_d$matrix
dvnes <- nes_d$values

betti_d <- pairwise_l1(betti_list[[1]])
dmbetti <- betti_d$matrix
dvbetti <- betti_d$values

sil_d <- pairwise_l1(sil_list[[1]])
dmsil <- sil_d$matrix
dvsil <- sil_d$values

sil2_d <- pairwise_l1(sil2_list[[1]])
dmsil2 <- sil2_d$matrix
dvsil2 <- sil2_d$values
```

In this case, Betti curves give the largest average distance between images, NES the second largest and silhouettes the smallest, so silhouettes are the least discriminative.

In [5]:
```r
# Mean pairwise distance between original images for each summary
results <- data.frame("Betti" = dvbetti, "NES" = dvnes, "Sil_p1" = dvsil, "Sil_p2" = dvsil2)
colMeans(results)
```

## Conclusions

For Gaussian and Poisson noise, the NES function is more robust than Betti curves but less robust than silhouettes (for salt-and-pepper noise this order is reversed). On the other hand, it discriminates between images less than Betti curves do but more than silhouettes. It therefore offers a balance between these two summaries. Nevertheless, its main interest is that it may produce curves quite different from Betti curves and silhouettes and, hence, distinguish images that they cannot. In any case, these curves are meant to complement each other, for example in classification tasks.