Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There's an issue with saving and/or loading a simplified tree sequence (carrying slendr metadata) #136

Closed
bodkan opened this issue May 4, 2023 · 1 comment

Comments

@bodkan
Copy link
Owner

bodkan commented May 4, 2023

Let's encode the following trivial model:

model <-
  population("pop", time = 1, N = 100) %>%
  compile_model(generation_time = 1, simulation_length = 1000, direction = "forward")

Simulate a tree sequence from it, save that tree sequence to disk:

ts <- slim(model, sequence_length = 1000, recombination_rate = 0)
ts_save(ts, "/tmp/big.trees")

So far so good.

  • Loading the tree sequence (plus model data) works:
> ts_load("/tmp/big.trees", model)

╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │     200║
╟───────────────┼────────╢
║Total Size     │95.3 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤════════╤════════════╗
║Table      │Rows│Size    │Has Metadata║
╠═══════════╪════╪════════╪════════════╣
║Edges      │ 366│11.4 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Individuals│ 256│26.8 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Migrations │   0│ 8 Bytes│          No║
╟───────────┼────┼────────┼────────────╢
║Mutations  │   0│ 1.2 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Nodes      │ 367│14.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Populations│   1│ 2.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Provenances│   1│33.9 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Sites      │   0│16 Bytes│          No║
╚═══════════╧════╧════════╧════════════╝
  • Simplifying the tree sequence and then loading it also works:
> ts %>% ts_simplify() %>% ts_save("/tmp/small1.trees")
> ts_load("/tmp/small1.trees", model)
╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │     200║
╟───────────────┼────────╢
║Total Size     │95.6 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤════════╤════════════╗
║Table      │Rows│Size    │Has Metadata║
╠═══════════╪════╪════════╪════════════╣
║Edges      │ 365│11.4 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Individuals│ 255│26.7 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Migrations │   0│ 8 Bytes│          No║
╟───────────┼────┼────────┼────────────╢
║Mutations  │   0│ 1.2 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Nodes      │ 366│14.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Populations│   1│ 2.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Provenances│   2│34.5 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Sites      │   0│16 Bytes│          No║
╚═══════════╧════╧════════╧════════════╝
  • Simplifying the tree sequence down to a specific subset of individuals and then loading it doesn't work. Boo.
> ts %>% ts_simplify(simplify_to = c("pop_1", "pop_2", "pop_3")) %>% ts_save("/tmp/small2.trees")
> ts_load("/tmp/small2.trees", model)
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 3) to match `..2` (size 100).
Run `rlang::last_trace()` to see where the error occurred.

The error suggests an issue with some join-like operation on some tree sequence table, although I'd think the tree sequence itself should carry every piece of slendr-relevant metadata along side the standard tskit data? Well, clearly not.

  • Interestingly, loading the small tree sequence without the slendr metadata works without problems. So something is clearly messed up with slendr-specific information not being passed through when a simplified tree sequence is saved.
> ts_load("/tmp/small2.trees")
╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │       6║
╟───────────────┼────────╢
║Total Size     │44.5 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤═════════╤════════════╗
║Table      │Rows│Size     │Has Metadata║
╠═══════════╪════╪═════════╪════════════╣
║Edges      │  10│328 Bytes│          No║
╟───────────┼────┼─────────┼────────────╢
║Individuals│   8│  2.6 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Migrations │   0│  8 Bytes│          No║
╟───────────┼────┼─────────┼────────────╢
║Mutations  │   0│  1.2 KiB│          No║
╟───────────┼────┼─────────┼────────────╢
║Nodes      │  11│  1.1 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Populations│   1│  2.3 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Provenances│   2│ 34.5 KiB│          No║
╟───────────┼────┼─────────┼────────────╢
║Sites      │   0│ 16 Bytes│          No║
╚═══════════╧════╧═════════╧════════════╝
@bodkan
Copy link
Owner Author

bodkan commented May 4, 2023

Yeah, the population had 100 individuals to start with. Simple simplification does not change the number of "sampled individuals" (still 100). But the subsetting of individuals changes the number of sampled individuals (in the code above down to 3), which matches the error from the join operation:

! Can't recycle ..1 (size 3) to match ..2 (size 100).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant