The package nevada (NEtwork-VAlued Data Analysis) is an R package for the statistical analysis of network-valued data. In this setting, a sample is made of statistical units that are networks themselves. The package provides a set of matrix representations for networks so that network-valued data can be transformed into matrix-valued data. Subsequently, a number of distances between matrices is provided as well to quantify how far two networks are from each other and several test statistics are proposed for testing equality in distribution between samples of networks using exact permutation testing procedures. The permutation scheme is carried out by the flipr package which also provides a number of test statistics based on inter-point distances that play nicely with network-valued data. The implementation is largely made in C++ and the matrix of inter- and intra-sample distances is pre-computed, which alleviates the computational burden often associated with permutation tests.
You can install the latest stable version of nevada on CRAN with:
install.packages("nevada")
Or you can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("astamm/nevada")
In this first example, we compare two populations of networks generated according to two different models (Watts-Strogatz and Barabasi), using the adjacency matrix representation of networks, the Frobenius distance to compare single networks and the combination of Student-like and Fisher-like statistics based on inter-point distances to summarize information and perform the permutation test.
set.seed(123)
n <- 10L
x <- nevada::nvd(
model = "smallworld",
n = n,
model_params = list(dim = 1L, nei = 4L, p = 0.15)
)
y <- nevada::nvd(
model = "pa",
n = n,
model_params = list(power = 1L, m = NULL, directed = FALSE)
)
By default the nvd()
constructor generates networks with 25 nodes. One
can wonder whether there is a difference between the distributions that
generated these two samples (which there is given the models that we
used). The test2_global()
function provides an answer to this
question:
t1_global <- nevada::test2_global(x, y, seed = 1234)
t1_global$pvalue
[1] 0.0009962984
The p-value is very small, leading to the conclusion that we should reject the null hypothesis of equal distributions.
Although this is a fake example, we could create a partition to try to localize differences along this partition:
partition <- as.integer(c(1:5, each = 5))
The test2_local()
function provides an answer to this question:
t1_local <- nevada::test2_local(x, y, partition, seed = 1234)
t1_local
$intra
# A tibble: 5 × 3
E pvalue truncated
<chr> <dbl> <lgl>
1 P1 0.175 TRUE
2 P2 0.175 TRUE
3 P3 0.175 TRUE
4 P4 0.0859 TRUE
5 P5 0.00299 FALSE
$inter
# A tibble: 10 × 4
E1 E2 pvalue truncated
<chr> <chr> <dbl> <lgl>
1 P1 P2 0.175 TRUE
2 P1 P3 0.175 TRUE
3 P1 P4 0.0859 TRUE
4 P1 P5 0.0240 FALSE
5 P2 P3 0.175 TRUE
6 P2 P4 0.00499 FALSE
7 P2 P5 0.000996 FALSE
8 P3 P4 0.0140 FALSE
9 P3 P5 0.00200 FALSE
10 P4 P5 0.0420 FALSE
In this second example, we compare two populations of networks generated according to the same model (Watts-Strogatz), using the adjacency matrix representation of networks, the Frobenius distance to compare single networks and the combination of Student-like and Fisher-like statistics based on inter-point distances to summarize information and perform the permutation test.
n <- 10L
x <- nevada::nvd("smallworld", n)
y <- nevada::nvd("smallworld", n)
One can wonder whether there is a difference between the distributions
that generated these two samples (which there is given the models that
we used). The test2_global()
function provides an answer to this
question:
t2 <- nevada::test2_global(x, y, seed = 1234)
t2$pvalue
[1] 0.9999973
The p-value is larger than 5% or even 10%, leading us to failing to reject the null hypothesis of equal distributions at these significance thresholds.