Recover block memberships with dcsbm? #35

Closed
eapower opened this issue Aug 14, 2023 · 4 comments

@eapower

eapower commented Aug 14, 2023

Hi! Thanks for your work on the package!
I was hoping to use the directed_dcsbm function to generate simple degree-corrected SBMs. I've run into some issues surrounding the block memberships.

At the very least, I would like to be able to recover each node's membership as we move from the factor model to the sampled network. I know that the factor model given as input has parameters z_in and z_out, but because the sampled graphs (I'm using sample_igraph) don't necessarily have the same number of nodes, those vectors can't be mapped directly onto the nodes of a sampled graph.

I'm also specifying a vector of block probabilities, but their ordering doesn't seem to carry through to the factor model, and I'm not sure what that implies for how the block matrix is used. Here is code demonstrating that, although the block sizes roughly agree with what's given to pi_in or pi_out, the blocks don't appear in the same order in z_in or z_out:

set.seed(32)

bm <- as.matrix(cbind(
  c(.3, .005, .005, .005, .005),
  c(.002, .3, .005, .005, .005),
  c(.002, .01, .3, .005, .005),
  c(.002, .01, .005, .2, .005),
  c(.002, .005, .005, .005, .2)
))

pi <- c(5, 50, 20, 25, 100)

sbm <- fastRG::directed_dcsbm(n = 200, 
                              B = bm,
                              pi_in = pi,
                              pi_out = pi,
                              expected_out_degree = 3,
                              allow_self_loops = FALSE,
                              sort_nodes = TRUE)

net <- fastRG::sample_igraph(sbm)

## the order of the blocks as given for the block probabilities doesn't align with the order of the block memberships in the factor model:
pi
table(sbm$z_in)

All of this means that I'm unsure of each node's block membership, and whether the memberships align correctly with the block matrix given as input. (In a perfect world, the igraph object that's created would have the block memberships as vertex attributes.)

Ideally, I'd also be able to specify that z_in == z_out. Even if I specify both pi_in and pi_out, the memberships don't align, as table(sbm$z_in, sbm$z_out) makes clear.

Let me know if these issues are unclear! Thanks again!

@alexpghayes
Collaborator

We reorder pi_in and pi_out so that the blocks are arranged from smallest to largest, at https://github.com/RoheLab/fastRG/blob/main/R/directed_dcsbm.R#L440.
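If you need to translate the sorted block labels back to the order in which you supplied pi, a small base-R sketch like the one below may help. It assumes the reordering is equivalent to `order(pi)` (smallest block first, ties left in their original order); `to_original_block()` is a hypothetical helper, not part of fastRG.

```r
# Assumption: fastRG's relabelling is equivalent to order(pi), i.e. sorted
# block k corresponds to the user-supplied block ord[k].
pi <- c(5, 50, 20, 25, 100)

ord <- order(pi)
ord
#> [1] 1 3 4 2 5

# Hypothetical helper (not a fastRG function): map sorted block labels,
# given as integers or as a factor such as latent$z, back to the
# positions of the blocks in the original pi vector.
to_original_block <- function(z_sorted, pi) {
  ord <- order(pi)
  ord[as.integer(z_sorted)]
}

to_original_block(1:5, pi)
#> [1] 1 3 4 2 5
```

With this pi, for example, sorted block 4 (expected size 50) is the second block you supplied. Note that `as.integer()` on a factor uses the level codes, so this only works if the factor levels are in sorted-block order.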

It sounds like you may also want the undirected version of the sbm, in which case you should use dcsbm() rather than directed_dcsbm().

If you would like to combine the latent information in your sbm object with the sampled graph in net, you should be able to do that as follows:

set.seed(32)

bm <- as.matrix(cbind(
  c(.3, .005, .005, .005, .005),
  c(.002, .3, .005, .005, .005),
  c(.002, .01, .3, .005, .005),
  c(.002, .01, .005, .2, .005),
  c(.002, .005, .005, .005, .2)
))

pi <- c(5, 50, 20, 25, 100)

sbm <- fastRG::directed_dcsbm(
  n = 200,
  B = bm,
  pi_in = pi,
  pi_out = pi,
  expected_out_degree = 3,
  allow_self_loops = FALSE,
  sort_nodes = TRUE
)
#> Generating random degree heterogeneity parameters `theta_in` and `theta_out` from LogNormal(2, 1) distributions. This distribution may change in the future. Explicitly set `theta_in` and `theta_out` for reproducible results.

net <- fastRG::sample_igraph(sbm)

## the order of the blocks as given for the block probabilities doesn't align with the order of the block memberships in the factor model:
pi
#> [1]   5  50  20  25 100
table(sbm$z_in)
#> 
#>   1   2   3   4   5 
#>   3  25  24  47 101

net |>
  igraph::set_vertex_attr("in_block", value = sbm$z_in)
#> Error in i_set_vertex_attr(graph = graph, name = name, value = value, : Length of new attribute value must be 1 or 156, the number of target vertices, not 200

Created on 2023-08-14 with reprex v2.0.2

However, it looks like there is an issue creating the igraph object here. I'll fix this and ping again here once things are working as expected.
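One plausible reading of the length mismatch above (156 vertices vs 200 nodes) is that nodes without any sampled edge were dropped when the graph was built from the edge list. We have not checked this against fastRG's source; the base-R sketch below only illustrates the general effect with a made-up random edge list, and shows that indexing by node id over all n nodes keeps a length-n membership vector aligned.

```r
# Hypothetical illustration only: a random directed edge list over n nodes,
# standing in for a sparse sampled graph. It does not call fastRG.
set.seed(1)
n <- 200
edges <- cbind(
  from = sample.int(n, 300, replace = TRUE),
  to   = sample.int(n, 300, replace = TRUE)
)

# nodes that appear in at least one edge; for a sparse graph this is
# typically fewer than n, so isolated nodes vanish if the vertex set is
# taken from the edges alone
n_seen <- length(unique(c(edges)))

# counting over all n node ids keeps isolated nodes (with degree 0), so a
# length-n membership vector such as z still lines up position by position
out_degree <- tabulate(edges[, "from"], nbins = n)
in_degree  <- tabulate(edges[, "to"], nbins = n)

c(nodes_in_edges = n_seen, nodes_total = length(out_degree))
```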

alexpghayes added a commit that referenced this issue Aug 14, 2023
@alexpghayes
Collaborator

Fixed. You'll need to update to the dev version with

remotes::install_github("RoheLab/fastRG")

Then I think you'll want something like the following:

set.seed(32)

bm <- as.matrix(cbind(
  c(.3, .005, .005, .005, .005),
  c(.002, .3, .005, .005, .005),
  c(.002, .01, .3, .005, .005),
  c(.002, .01, .005, .2, .005),
  c(.002, .005, .005, .005, .2)
))

pi <- c(5, 50, 20, 25, 100)

latent <- fastRG::dcsbm(
  n = 200,
  B = bm,
  pi = pi,
  expected_degree = 3,
  allow_self_loops = FALSE,
  sort_nodes = TRUE,
  poisson_edges = FALSE     # my guess is that you want this! read the documentation for this argument carefully!
)
#> Generating random degree heterogeneity parameters `theta` from a LogNormal(2, 1) distribution. This distribution may change in the future. Explicitly set `theta` for reproducible results.


ig <- fastRG::sample_igraph(latent)

# node order in the `latent` and `ig` objects will match up :)
ig_with_block <- ig |>
  igraph::set_vertex_attr("block", value = latent$z)

igraph::V(ig_with_block)$block
#>   [1] block1 block1 block1 block1 block1 block1 block1 block1 block2 block2
#>  [11] block2 block2 block2 block2 block2 block2 block2 block2 block2 block2
#>  [21] block2 block2 block2 block2 block2 block2 block2 block2 block2 block2
#>  [31] block2 block2 block3 block3 block3 block3 block3 block3 block3 block3
#>  [41] block3 block3 block3 block3 block3 block3 block3 block3 block3 block3
#>  [51] block3 block3 block4 block4 block4 block4 block4 block4 block4 block4
#>  [61] block4 block4 block4 block4 block4 block4 block4 block4 block4 block4
#>  [71] block4 block4 block4 block4 block4 block4 block4 block4 block4 block4
#>  [81] block4 block4 block4 block4 block4 block4 block4 block4 block4 block4
#>  [91] block4 block4 block4 block4 block4 block4 block4 block4 block4 block4
#> [101] block4 block4 block4 block4 block4 block4 block4 block4 block4 block4
#> [111] block4 block4 block4 block5 block5 block5 block5 block5 block5 block5
#> [121] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [131] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [141] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [151] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [161] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [171] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [181] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> [191] block5 block5 block5 block5 block5 block5 block5 block5 block5 block5
#> Levels: block1 block2 block3 block4 block5

Again, note that the blocks are ordered by block size.

Created on 2023-08-14 with reprex v2.0.2
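Some background on `poisson_edges = FALSE` and the "Poisson DCSBM" comment further down: in a Poisson SBM each A[i, j] is a Poisson count, so multiple edges between a pair are possible. If counts are collapsed to 0/1, the edge probability is 1 - exp(-lambda), and hitting a target probability p needs lambda = -log(1 - p). Our understanding is that fastRG's non-Poisson option relies on a back-transform of this kind; treat that as an assumption and check the `poisson_edges` documentation. The helper names and values below are hypothetical and purely illustrative.

```r
# Hypothetical helpers (not fastRG functions) relating a Poisson edge
# intensity lambda to the probability that at least one edge forms.
poisson_rate_for_prob <- function(p) -log1p(-p)          # lambda = -log(1 - p)
prob_for_poisson_rate <- function(lambda) -expm1(-lambda) # p = 1 - exp(-lambda)

p <- c(0.002, 0.005, 0.2, 0.3)   # illustrative target edge probabilities
lambda <- poisson_rate_for_prob(p)

# collapsing Poisson counts to 0/1 recovers the target probabilities
all.equal(prob_for_poisson_rate(lambda), p)
all.equal(1 - dpois(0, lambda), p)

# for small p the two scales nearly agree; for larger p they diverge
round(rbind(p, lambda), 4)
```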

@eapower
Author

eapower commented Aug 15, 2023

Thanks for the quick and helpful reply! Just to be clear: imagine you have two blocks of the same size, but very different entries in the block matrix: how could you be sure that you're associating the right block with the right entries in the block matrix? For example, if you run the above with set.seed(34), blocks 2 and 3 differ in size by only one node (see table(latent$z)), so given that blocks are sorted by size, I can't be sure which block matrix entries they really align with.

Also, with set.seed(32) as original, table(latent$z) implies that blocks aren't quite sorted by size?

(Finally, I do actually want a directed network to result (and the block matrix given wasn't symmetric), so is there any chance of setting z_in == z_out?)

@alexpghayes alexpghayes reopened this Aug 15, 2023
@alexpghayes
Collaborator

imagine you have two blocks of the same size, but very different entries in the block matrix: how could you be sure that you're associating the right block with the right entries in the block matrix?

When we sort pi vectors, we also reorder the rows and columns of B to align with sorted pi. See https://github.com/RoheLab/fastRG/blob/main/R/directed_dcsbm.R#L433, for example. I will add some documentation about this.
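To make the alignment concrete: the pairing between blocks and entries of B is fixed by pi, not by the realized block sizes, so two blocks that happen to end up with the same number of nodes are not ambiguous. A base-R sketch, assuming the sort is equivalent to `order(pi)` with the same permutation applied to both the rows and the columns of B:

```r
bm <- as.matrix(cbind(
  c(.3, .005, .005, .005, .005),
  c(.002, .3, .005, .005, .005),
  c(.002, .01, .3, .005, .005),
  c(.002, .01, .005, .2, .005),
  c(.002, .005, .005, .005, .2)
))
pi <- c(5, 50, 20, 25, 100)

# assumption: ord[k] is the original index of the block that becomes sorted block k
ord <- order(pi)          # 1 3 4 2 5
pi_sorted <- pi[ord]      # 5 20 25 50 100
B_sorted <- bm[ord, ord]  # permute rows AND columns with the same ord

# sorted block 2 is original block 3 (pi = 20) and sorted block 4 is
# original block 2 (pi = 50), so their connection probabilities move with them
B_sorted[2, 4] == bm[3, 2]
B_sorted[4, 2] == bm[2, 3]
diag(B_sorted)
```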

Also, with set.seed(32) as original, table(latent$z) implies that blocks aren't quite sorted by size?

Right, they're sorted by expected size.
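A quick base-R illustration of the difference between expected and realized block sizes. It assumes memberships are drawn independently with probabilities proportional to the sorted pi, which mirrors the idea but is not fastRG's actual code.

```r
set.seed(32)
n <- 200
pi_sorted <- sort(c(5, 50, 20, 25, 100))            # 5 20 25 50 100
expected <- n * pi_sorted / sum(pi_sorted)          # expected block sizes
z <- sample(seq_along(pi_sorted), n, replace = TRUE, prob = pi_sorted)
realized <- tabulate(z, nbins = length(pi_sorted))  # realized block sizes

# expected sizes are sorted by construction; realized sizes need not be
rbind(expected, realized)
!is.unsorted(expected)
```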

(Finally, I do actually want a directed network to result (and the block matrix given wasn't symmetric), so is there any chance of setting z_in == z_out?)

Yes, although it's a little hacky.

set.seed(32)

bm <- as.matrix(cbind(
  c(.3, .005, .005, .005, .005),
  c(.002, .3, .005, .005, .005),
  c(.002, .01, .3, .005, .005),
  c(.002, .01, .005, .2, .005),
  c(.002, .005, .005, .005, .2)
))

pi <- c(5, 50, 20, 25, 100)

# note: this is a Poisson DCSBM, rather than a Bernoulli DCSBM
latent <- fastRG::directed_dcsbm(
  n = 200,
  B = bm,
  pi_in = pi,
  pi_out = pi,
  expected_out_degree = 3,
  allow_self_loops = FALSE,
  sort_nodes = TRUE
)
#> Generating random degree heterogeneity parameters `theta_in` and `theta_out` from LogNormal(2, 1) distributions. This distribution may change in the future. Explicitly set `theta_in` and `theta_out` for reproducible results.

# for sampling to work as expected, all you need is this, which forces
# blocks and degree-correction parameters to match across incoming and
# outgoing blocks
latent$Y <- latent$X

# fix meta-data
latent$theta_out <- latent$theta_in
latent$z_out <- latent$z_in
latent$pi_out <- latent$pi_in

ig <- fastRG::sample_igraph(latent)

# node order in the `latent` and `ig` objects will match up :)
ig_with_block <- ig |>
  igraph::set_vertex_attr("block", value = latent$z_in)

igraph::V(ig_with_block)$block
#>   [1] 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3
#>  [38] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5
#> [112] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> [149] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> [186] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> Levels: 1 2 3 4 5

Created on 2023-08-15 with reprex v2.0.2
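Why `latent$Y <- latent$X` is enough: assuming the expected adjacency matrix has the factor form X B Y^T, with X = diag(theta) Z for a one-hot membership matrix Z, reusing X for Y gives every node a single block and a single degree parameter, while a non-symmetric B still produces directed edges. A toy base-R sketch under that assumption (names and sizes are illustrative; it does not call fastRG):

```r
set.seed(32)
B <- matrix(c(0.3, 0.005, 0.002, 0.3), nrow = 2)   # B[1, 2] != B[2, 1]
z <- c(1, 1, 1, 2, 2, 2)                            # block of each node
theta <- rlnorm(length(z), meanlog = 2, sdlog = 1)  # degree parameters

Z <- diag(2)[z, ]  # n x 2 one-hot membership matrix
X <- theta * Z     # row i scaled by theta[i]
Y <- X             # the "hack": reuse X, so one block and one theta per node

EA <- X %*% B %*% t(Y)  # expected edge counts under the assumed factor form

# entry (i, j) is theta_i * theta_j * B[z_i, z_j] ...
all.equal(EA[1, 4], theta[1] * theta[4] * B[z[1], z[4]])
# ... and EA stays non-symmetric because B is, so edges remain directed
isSymmetric(EA)
```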

alexpghayes added a commit that referenced this issue Aug 16, 2023
alexpghayes added a commit that referenced this issue Aug 16, 2023
- See discussion in #35
- Also flip incoming and outgoing blocks, such that X now contains info about outgoing blocks and Y now contains info about incoming blocks, as you would expect if A[i, j] encodes an edge from node i to node j
- Update NEWS accordingly