Skip to content

SteveJWR/lolaR

Repository files navigation

lolaR

lolaR is a package built to estimate the curvature in latent distance network models as described in Hoff, Raftery, and Handcock (2002) as well as test whether a network could have been generated by a latent space of constant curvature. We estimate the latent distances on a subset of the network using cliques. We then exploit the fact that triangles allow for identification of the latent space provided we have a midpoint. See Wilkins-Reeves and McCormick (2022) for additional details

Installation

You can install the development version of lolaR from GitHub with:

# install.packages("devtools")
devtools::install_github("SteveJWR/lolaR")

Example

We highlight this application with two collaboration networks in Physics article co-authorship from Leskovec, Kleinberg, and Faloutsos (2007) and available at Leskovec and Krevl (2014). We consider the problem of testing whether a latent space model has constant curvature. To do so we first consider two collaboration networks, one for Astrophysics and one for Condensed Matter Physics.

library(lolaR)
library(ggplot2)
library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(Matrix)

load("data/Astro_adjacency_matrix.rda")
load("data/CondMat_adjacency_matrix.rda")

# Adjacency Matrix Names
#G.astro
#G.cm 

We next plot the largest connected components from the first 2000 indices.

g.sub.astro <- igraph::graph_from_adjacency_matrix(G.astro[1:2000,1:2000], mode = "undirected")
g.sub.cm <- igraph::graph_from_adjacency_matrix(G.cm[1:2000,1:2000], mode = "undirected")


connected.idx.astro  <- which(components(g.sub.astro, mode = "strong")$membership == 4)
connected.idx.cm  <- which(components(g.sub.cm, mode = "strong")$membership == 35)


g.sub.astro <- igraph::graph_from_adjacency_matrix(G.astro[connected.idx.astro,connected.idx.astro], mode = "undirected")
g.sub.cm <- igraph::graph_from_adjacency_matrix(G.cm[connected.idx.cm,connected.idx.cm], mode = "undirected")

V(g.sub.astro)$name = NA
V(g.sub.cm)$name = NA
plot(g.sub.astro, vertex.size=4)

plot(g.sub.cm, vertex.size=4)

We first find a list of cliques of size $ 19 $ and $ 12 $ respectively. This is in order to have a moderately large ($ 40-60 $) number of rows/columns in our distance matrix.

Since these networks are relatively sparse, we can compute the exact set of maximal cliques and partition them into non-overlapping sets. See the package for an approximate method of finding cliques.

cliques.astro = CliqueSearch(G.astro, min_clique_size = 19)
cliques.cm = CliqueSearch(G.cm, min_clique_size = 12)

Next using the set of cliques, we can estimate a distance matrix under the latent distance model without first choosing an embedding space.

D.astro <- EstimateD(G.astro, cliques.astro, verbose = T)
#> Num Steps: 0 Likelihood Stopping Criteria: 1 Num Steps: 1 Likelihood Stopping Criteria: 0.0125474 Num Steps: 2 Likelihood Stopping Criteria: 0.0024412 Num Steps: 3 Likelihood Stopping Criteria: 0.0002361 Num Steps: 4 Likelihood Stopping Criteria: 3.9e-05 Num Steps: 5 Likelihood Stopping Criteria: 4.63e-05 Num Steps: 6 Likelihood Stopping Criteria: 2.32e-05 Num Steps: 7 Likelihood Stopping Criteria: 1.1e-06 Num Steps: 8 Likelihood Stopping Criteria: 0 
D.cliques <- EstimateD(G.cm, cliques.cm, verbose = T)
#> Num Steps: 0 Likelihood Stopping Criteria: 1 Num Steps: 1 Likelihood Stopping Criteria: 0.0002213 Num Steps: 2 Likelihood Stopping Criteria: 3.2e-06 Num Steps: 3 Likelihood Stopping Criteria: 0 

We first can search for the best midpoint and estimate the latent curvature for each model.

kappa.astro <- EstimateCurvature(D.astro, verbose = T, d.yz.min = 1, d.yz.max = 4.5)
#> [1] "Midpoints: "
#>      y z m      dym      dzm      dyz
#> [1,] 2 3 1 7.176377 8.198623 4.331071
#> [2,] 5 6 4 7.282860 3.714406 8.441608
#> [3,] 8 9 7 6.813139 5.169662 6.178127
kappa.cm <- EstimateCurvature(D.cm, verbose = T, d.yz.min = 1, d.yz.max = 4.5)
#> [1] "Midpoints: "
#>      y z m       dym       dzm      dyz
#> [1,] 2 3 1  9.311635  7.071450 5.901554
#> [2,] 5 6 4  6.649070  8.279867 9.012998
#> [3,] 8 9 7 13.867194 17.240381 5.619337

From these distance matrices we are able to search for indices which approximately form a set $ y,z,m $ where $ m $ is the midpoint of $ y $ and $ z $, i.e. ($ d_{yz} = 2d_{ym} = 2d_{zm} $). After finding a set of $ R $ non-overlapping indices such approximately satisfying this midpoint equation $ {y{(r)},z{(r)},m^{(r)}}_{r = 1}^R $.

We would like to test the hypothesis $$ H_0: \kappa(r) = \kappa \text{ for all } r \in {1,2,\dots, R} $$ Where $ (r) $ corresponds to the curvature at the corresponding point $ r $.

test.astro <- ConstantCurvatureTest(D.astro, num.midpoints = 3, d.yz.min = 1, d.yz.max = 4.5)
test.cm <- ConstantCurvatureTest(D.cm, num.midpoints = 3, d.yz.min = 1, d.yz.max = 4.5)
med.vec.astro <- rep(0,3)
med.vec.cm <- rep(0,3)

for(k in seq(3)){
  med.vec.astro[k] = SoftThreshold(median(test.astro$estimates[test.astro$loc == k]), 10)
  med.vec.cm[k] = SoftThreshold(median(test.cm$estimates[test.cm$loc == k]),10)
}
library(ggplot2)


test.astro$scaled_estimates = SoftThreshold(test.astro$estimates, 10)
dat.astro <- data.frame(matrix(c(test.astro$loc, test.astro$scaled_estimates), ncol = 2))

names(dat.astro) = c("loc", "scaled_estimates")
ggplot(dat.astro, aes(y = scaled_estimates, x = loc)) +
  geom_jitter() +
  geom_vline(xintercept = 0.5, col = "red") +
  geom_vline(xintercept = 1.5, col = "red") +
  geom_vline(xintercept = 2.5, col = "red") +
  geom_vline(xintercept = 3.5, col = "red") +
  geom_segment(aes(x=0.5,xend=1.5,y=med.vec.astro[1],yend=med.vec.astro[1]), col = "blue") +
  geom_segment(aes(x=1.5,xend=2.5,y=med.vec.astro[2],yend=med.vec.astro[2]), col = "blue") +
  geom_segment(aes(x=2.5,xend=3.5,y=med.vec.astro[3],yend=med.vec.astro[3]), col = "blue") +
  ggtitle("Curvature Estimates Within Astrophysics Network") +
  ylab("Trimmed Curvature") +
  xlab("Midpoint Set")

test.cm$scaled_estimates = SoftThreshold(test.cm$estimates, 10)
dat.cm <- data.frame(matrix(c(test.cm$loc, test.cm$scaled_estimates), ncol = 2))

names(dat.cm) = c("loc", "scaled_estimates")
ggplot(dat.cm, aes(y = scaled_estimates, x = loc)) +
  geom_jitter() +
  geom_vline(xintercept = 0.5, col = "red") +
  geom_vline(xintercept = 1.5, col = "red") +
  geom_vline(xintercept = 2.5, col = "red") +
  geom_vline(xintercept = 3.5, col = "red") +
  geom_segment(aes(x=0.5,xend=1.5,y=med.vec.cm[1],yend=med.vec.cm[1]), col = "blue") +
  geom_segment(aes(x=1.5,xend=2.5,y=med.vec.cm[2],yend=med.vec.cm[2]), col = "blue") +
  geom_segment(aes(x=2.5,xend=3.5,y=med.vec.cm[3],yend=med.vec.cm[3]), col = "blue") +
  ggtitle("Curvature Estimates Within CMP Network") +
  ylab("Trimmed Curvature") +
  xlab("Midpoint Set")

References

Hoff, Peter D, Adrian E Raftery, and Mark S Handcock. 2002. “Latent Space Approaches to Social Network Analysis.” Journal of the American Statistical Association 97 (460): 1090–98.

Leskovec, Jure, Jon Kleinberg, and Christos Faloutsos. 2007. “Graph evolution.” ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (1). https://doi.org/10.1145/1217299.1217301.

Leskovec, Jure, and Andrej Krevl. 2014. “SNAP Datasets: Stanford Large Network Dataset Collection.” http://snap.stanford.edu/data.

Wilkins-Reeves, Steven, and Tyler McCormick. 2022. “Asymptotically Normal Estimation of Local Latent Network Curvature.” arXiv Preprint arXiv:2211.11673.

About

R package for estimating the LOcal LAtent space curvature under the latent distance model as well as tests for constant curvature.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published