lolaR is a package built to estimate the curvature in latent distance network models as described in Hoff, Raftery, and Handcock (2002) as well as test whether a network could have been generated by a latent space of constant curvature. We estimate the latent distances on a subset of the network using cliques. We then exploit the fact that triangles allow for identification of the latent space provided we have a midpoint. See Wilkins-Reeves and McCormick (2022) for additional details
You can install the development version of lolaR from GitHub with:
# install.packages("devtools")
devtools::install_github("SteveJWR/lolaR")
We highlight this application with two collaboration networks in Physics article co-authorship from Leskovec, Kleinberg, and Faloutsos (2007) and available at Leskovec and Krevl (2014). We consider the problem of testing whether a latent space model has constant curvature. To do so we first consider two collaboration networks, one for Astrophysics and one for Condensed Matter Physics.
library(lolaR)
library(ggplot2)
library(igraph)
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
library(Matrix)
load("data/Astro_adjacency_matrix.rda")
load("data/CondMat_adjacency_matrix.rda")
# Adjacency Matrix Names
#G.astro
#G.cm
We next plot the largest connected components from the first 2000 indices.
g.sub.astro <- igraph::graph_from_adjacency_matrix(G.astro[1:2000,1:2000], mode = "undirected")
g.sub.cm <- igraph::graph_from_adjacency_matrix(G.cm[1:2000,1:2000], mode = "undirected")
connected.idx.astro <- which(components(g.sub.astro, mode = "strong")$membership == 4)
connected.idx.cm <- which(components(g.sub.cm, mode = "strong")$membership == 35)
g.sub.astro <- igraph::graph_from_adjacency_matrix(G.astro[connected.idx.astro,connected.idx.astro], mode = "undirected")
g.sub.cm <- igraph::graph_from_adjacency_matrix(G.cm[connected.idx.cm,connected.idx.cm], mode = "undirected")
V(g.sub.astro)$name = NA
V(g.sub.cm)$name = NA
plot(g.sub.astro, vertex.size=4)
plot(g.sub.cm, vertex.size=4)
We first find a list of cliques of size $ 19 $ and $ 12 $ respectively. This is in order to have a moderately large ($ 40-60 $) number of rows/columns in our distance matrix.
Since these networks are relatively sparse, we can compute the exact set of maximal cliques and partition them into non-overlapping sets. See the package for an approximate method of finding cliques.
cliques.astro = CliqueSearch(G.astro, min_clique_size = 19)
cliques.cm = CliqueSearch(G.cm, min_clique_size = 12)
Next using the set of cliques, we can estimate a distance matrix under the latent distance model without first choosing an embedding space.
D.astro <- EstimateD(G.astro, cliques.astro, verbose = T)
#> Num Steps: 0 Likelihood Stopping Criteria: 1 Num Steps: 1 Likelihood Stopping Criteria: 0.0125474 Num Steps: 2 Likelihood Stopping Criteria: 0.0024412 Num Steps: 3 Likelihood Stopping Criteria: 0.0002361 Num Steps: 4 Likelihood Stopping Criteria: 3.9e-05 Num Steps: 5 Likelihood Stopping Criteria: 4.63e-05 Num Steps: 6 Likelihood Stopping Criteria: 2.32e-05 Num Steps: 7 Likelihood Stopping Criteria: 1.1e-06 Num Steps: 8 Likelihood Stopping Criteria: 0
D.cliques <- EstimateD(G.cm, cliques.cm, verbose = T)
#> Num Steps: 0 Likelihood Stopping Criteria: 1 Num Steps: 1 Likelihood Stopping Criteria: 0.0002213 Num Steps: 2 Likelihood Stopping Criteria: 3.2e-06 Num Steps: 3 Likelihood Stopping Criteria: 0
We first can search for the best midpoint and estimate the latent curvature for each model.
kappa.astro <- EstimateCurvature(D.astro, verbose = T, d.yz.min = 1, d.yz.max = 4.5)
#> [1] "Midpoints: "
#> y z m dym dzm dyz
#> [1,] 2 3 1 7.176377 8.198623 4.331071
#> [2,] 5 6 4 7.282860 3.714406 8.441608
#> [3,] 8 9 7 6.813139 5.169662 6.178127
kappa.cm <- EstimateCurvature(D.cm, verbose = T, d.yz.min = 1, d.yz.max = 4.5)
#> [1] "Midpoints: "
#> y z m dym dzm dyz
#> [1,] 2 3 1 9.311635 7.071450 5.901554
#> [2,] 5 6 4 6.649070 8.279867 9.012998
#> [3,] 8 9 7 13.867194 17.240381 5.619337
From these distance matrices we are able to search for indices which
approximately form a set $ y,z,m $ where $ m $ is the midpoint of $
y $ and $ z
We would like to test the hypothesis $$ H_0: \kappa(r) = \kappa \text{ for all } r \in {1,2,\dots, R} $$ Where $ (r) $ corresponds to the curvature at the corresponding point $ r $.
test.astro <- ConstantCurvatureTest(D.astro, num.midpoints = 3, d.yz.min = 1, d.yz.max = 4.5)
test.cm <- ConstantCurvatureTest(D.cm, num.midpoints = 3, d.yz.min = 1, d.yz.max = 4.5)
med.vec.astro <- rep(0,3)
med.vec.cm <- rep(0,3)
for(k in seq(3)){
med.vec.astro[k] = SoftThreshold(median(test.astro$estimates[test.astro$loc == k]), 10)
med.vec.cm[k] = SoftThreshold(median(test.cm$estimates[test.cm$loc == k]),10)
}
library(ggplot2)
test.astro$scaled_estimates = SoftThreshold(test.astro$estimates, 10)
dat.astro <- data.frame(matrix(c(test.astro$loc, test.astro$scaled_estimates), ncol = 2))
names(dat.astro) = c("loc", "scaled_estimates")
ggplot(dat.astro, aes(y = scaled_estimates, x = loc)) +
geom_jitter() +
geom_vline(xintercept = 0.5, col = "red") +
geom_vline(xintercept = 1.5, col = "red") +
geom_vline(xintercept = 2.5, col = "red") +
geom_vline(xintercept = 3.5, col = "red") +
geom_segment(aes(x=0.5,xend=1.5,y=med.vec.astro[1],yend=med.vec.astro[1]), col = "blue") +
geom_segment(aes(x=1.5,xend=2.5,y=med.vec.astro[2],yend=med.vec.astro[2]), col = "blue") +
geom_segment(aes(x=2.5,xend=3.5,y=med.vec.astro[3],yend=med.vec.astro[3]), col = "blue") +
ggtitle("Curvature Estimates Within Astrophysics Network") +
ylab("Trimmed Curvature") +
xlab("Midpoint Set")
test.cm$scaled_estimates = SoftThreshold(test.cm$estimates, 10)
dat.cm <- data.frame(matrix(c(test.cm$loc, test.cm$scaled_estimates), ncol = 2))
names(dat.cm) = c("loc", "scaled_estimates")
ggplot(dat.cm, aes(y = scaled_estimates, x = loc)) +
geom_jitter() +
geom_vline(xintercept = 0.5, col = "red") +
geom_vline(xintercept = 1.5, col = "red") +
geom_vline(xintercept = 2.5, col = "red") +
geom_vline(xintercept = 3.5, col = "red") +
geom_segment(aes(x=0.5,xend=1.5,y=med.vec.cm[1],yend=med.vec.cm[1]), col = "blue") +
geom_segment(aes(x=1.5,xend=2.5,y=med.vec.cm[2],yend=med.vec.cm[2]), col = "blue") +
geom_segment(aes(x=2.5,xend=3.5,y=med.vec.cm[3],yend=med.vec.cm[3]), col = "blue") +
ggtitle("Curvature Estimates Within CMP Network") +
ylab("Trimmed Curvature") +
xlab("Midpoint Set")
Hoff, Peter D, Adrian E Raftery, and Mark S Handcock. 2002. “Latent Space Approaches to Social Network Analysis.” Journal of the American Statistical Association 97 (460): 1090–98.
Leskovec, Jure, Jon Kleinberg, and Christos Faloutsos. 2007. “Graph evolution.” ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (1). https://doi.org/10.1145/1217299.1217301.
Leskovec, Jure, and Andrej Krevl. 2014. “SNAP Datasets: Stanford Large Network Dataset Collection.” http://snap.stanford.edu/data.
Wilkins-Reeves, Steven, and Tyler McCormick. 2022. “Asymptotically Normal Estimation of Local Latent Network Curvature.” arXiv Preprint arXiv:2211.11673.