Skip to content

Commit

Permalink
Merge pull request #121 from bodkan/spatial-nonspatial-mixed-models
Browse files Browse the repository at this point in the history
Allow for the possibility to have non-spatial populations in an otherwise spatial model
  • Loading branch information
bodkan committed Feb 2, 2023
2 parents cd505a4 + 3724ca7 commit 57aca52
Show file tree
Hide file tree
Showing 46 changed files with 641 additions and 306 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -117,8 +117,8 @@ jobs:
run: |
mkdir ~/R_LIBS
echo "R_LIBS_USER=~/R_LIBS" >> ~/.Renviron
R -e 'install.packages("reticulate", repos = "http://cran.rstudio.com")'
R -e 'deps <- c("msprime==1.2.0", "tskit==0.5.2", "pyslim==1.0"); PYTHON_ENV <- paste(gsub("==", "-", deps), collapse = "_"); reticulate::conda_create(envname = PYTHON_ENV); reticulate::use_condaenv(PYTHON_ENV, required = TRUE); reticulate::conda_install(envname = PYTHON_ENV, packages = c(deps, "pandas"), pip = TRUE)'
R -e 'install.packages("reticulate", repos = "http://cran.rstudio.com"); reticulate::install_miniconda()'
R -e 'deps <- c("msprime==1.2.0", "tskit==0.5.4", "pyslim==1.0.1"); PYTHON_ENV <- paste0("Python-3.11_", paste(gsub("==", "-", deps), collapse = "_")); reticulate::conda_create(envname = PYTHON_ENV, python_version = '3.11'); reticulate::use_condaenv(PYTHON_ENV, required = TRUE); reticulate::conda_install(envname = PYTHON_ENV, packages = c(deps, "pandas"), pip = TRUE)'
- uses: r-lib/actions/setup-pandoc@v2

Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/test-coverage.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -92,8 +92,8 @@ jobs:
run: |
mkdir ~/R_LIBS
echo "R_LIBS_USER=~/R_LIBS" >> ~/.Renviron
R -e 'install.packages("reticulate", repos = "http://cran.rstudio.com")'
R -e 'deps <- c("msprime==1.2.0", "tskit==0.5.2", "pyslim==1.0"); PYTHON_ENV <- paste(gsub("==", "-", deps), collapse = "_"); reticulate::conda_create(envname = PYTHON_ENV); reticulate::use_condaenv(PYTHON_ENV, required = TRUE); reticulate::conda_install(envname = PYTHON_ENV, packages = c(deps, "pandas"), pip = TRUE)'
R -e 'install.packages("reticulate", repos = "http://cran.rstudio.com"); reticulate::install_miniconda()'
R -e 'deps <- c("msprime==1.2.0", "tskit==0.5.4", "pyslim==1.0.1"); PYTHON_ENV <- paste0("Python-3.11_", paste(gsub("==", "-", deps), collapse = "_")); reticulate::conda_create(envname = PYTHON_ENV, python_version = '3.11'); reticulate::use_condaenv(PYTHON_ENV, required = TRUE); reticulate::conda_install(envname = PYTHON_ENV, packages = c(deps, "pandas"), pip = TRUE)'
- uses: r-lib/actions/setup-r@v2
with:
Expand Down
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: slendr
Title: A Simulation Framework for Spatiotemporal Population Genetics
Version: 0.4.0.9000
Version: 0.5.0
Authors@R:
person(given = "Martin",
family = "Petr",
Expand Down
6 changes: 5 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# slendr (development version)
# slendr 0.5.0

- **<u>Minor breaking change!</u> Python environments of _slendr_ are no longer automatically activated upon calling `library(slendr)`! Using the coalescent _msprime_ back end and _slendr_'s tree-sequence functions now requires making an explicit call to a new function `init_env()` after `library(slendr)` is executed.** (PR [#102](https://github.com/bodkan/slendr/pull/118))

Expand All @@ -12,8 +12,12 @@ Splitting the Python virtual environment activation step into its own `init_env(

So, to recap: `library(slendr)` no longer activates _slendr_'s isolated Python virtual environment. In order to simulate tree sequences and analyse them using its interface to _tskit_, it is necessary to call `init_env()`. This function performs the same Python-activation steps that `library(slendr)` used to call automagically in earlier _slendr_ versions. No other change to your scripts is necessary.

- Related to the previous point: _slendr_ now requires Python 3.11, msprime 1.2.0, tskit 0.5.4, and pyslim 1.0.1, to keep up with recent releases of its Python dependencies. Again, this presents no hassle to the user, and the only thing required is re-running `setup_env()`. (PR [#112](https://github.com/bodkan/slendr/pull/121)).

- When a named list is provided as a `sample_sets =` argument to a oneway statistic function, the names are used in a `set` column of the resulting data frame even if only single samples were used. ([#2a6781](https://github.com/bodkan/slendr/commit/2a6781))

- It is now possible to have non-spatial populations in an otherwise spatial model. Of course, when plotting such models on a map, only spatial components of the model will be plotted and _slendr_ will give a warning. To be absolutely sure that users intends to do this, _slendr_ will also give a warning when running `compile_model()` on models like this. Please consider this option experimental for the time-being as it is hard to predict which edge cases might break because of this (all unit tests and documentation tests are passing though). Feedback is more than welcome. (PR [#112](https://github.com/bodkan/slendr/pull/121)).

- It is now possible to label groups of samples in _slendr_'s _tskit_ interface functions which should make data frames with statistics results more readable. As an example, running `ts_f3(ts, A = c("p1_1", "p1_2", "p1_3"), B = c("p2_1", "p2_3"), C = c("p3_1", "p3_2", "p3_"))` resulted in a following data-frame output:

```
Expand Down
41 changes: 29 additions & 12 deletions R/compilation.R
Original file line number Diff line number Diff line change
Expand Up @@ -65,13 +65,29 @@ compile_model <- function(populations, generation_time, path = NULL, resolution
if (is.null(slim_script))
slim_script <- system.file("scripts", "script.slim", package = "slendr")

map <- get_map(populations[[1]])
if (!is.null(map) && is.null(resolution))
# get values of all map attributes across populations
maps <- lapply(populations, get_map) %>% Filter(Negate(is.null), .)

if (length(unique(maps)) > 1)
stop("Multiple spatial maps detected across populations but only a single\n",
"world map is allowed for spatial models.", call. = FALSE)
else if (length(unique(maps)) == 1)
map <- maps[[1]]
else
map <- NULL

if (!is.null(map) && length(maps) != length(populations))
warning("Model containing a mix of spatial and non-spatial populations will be compiled.\n",
"Although this is definitely supported, make sure this is really what you want.",
call. = FALSE)

if (!is.null(map) && is.null(resolution) && serialize)
stop("A map resolution must be specified for spatial models", call. = FALSE)

if (!serialize && is.null(path) && inherits(map, "slendr_map")) {
stop("Spatial models must be serialized to disk for SLiM to simulate data from",
call. = FALSE)
warning("Spatial models must be serialized to disk for SLiM to simulate data from.\n",
"Compiled like this, your model can only be simulated with msprime.",
call. = FALSE)
}
if (serialize && is.null(path))
path <- tempfile()
Expand Down Expand Up @@ -120,9 +136,6 @@ compile_model <- function(populations, generation_time, path = NULL, resolution

if (is.data.frame(gene_flow)) gene_flow <- list(gene_flow)

if (length(unique(sapply(populations, has_map))) != 1)
stop("Populations must be either all spatial or non-spatial, but not both", call. = FALSE)

# make sure all populations share the same direction of time
time_dir <- setdiff(unique(sapply(populations, time_direction)), "unknown")

Expand Down Expand Up @@ -157,7 +170,7 @@ setting `direction = 'backward'.`", call. = FALSE)
admix_table <- compile_geneflows(gene_flow, split_table, generation_time, time_dir, end_time)
resize_table <- compile_resizes(populations, generation_time, time_dir, end_time, split_table)

if (inherits(map, "slendr_map")) {
if (serialize && inherits(map, "slendr_map")) {
if (!is.null(competition)) check_resolution(map, competition)
if (!is.null(mating)) check_resolution(map, mating)
if (!is.null(dispersal)) check_resolution(map, dispersal)
Expand Down Expand Up @@ -915,12 +928,13 @@ compile_maps <- function(populations, split_table, resolution, generation_time,
map_table <- map_table[!duplicated(map_table[, c("pop", "time_gen")]), ]
map_table$path <- seq_len(nrow(map_table)) %>% paste0(., ".png")

# maps of ancestral populations have to be set in the first generation,
# regardless of the specified split time
# maps of ancestral populations (that is, those that are spatial) have to
# be set in the first generation, regardless of the specified split time
ancestral_pops <- split_table[split_table$parent == "__pop_is_ancestor", ]$pop
ancestral_maps <- purrr::map(ancestral_pops, ~ which(map_table$pop == .x)) %>%
purrr::map_int(~ .x[1])
map_table[ancestral_maps, ]$time_gen <- 1
if (any(!is.na(ancestral_maps)))
map_table[ancestral_maps[!is.na(ancestral_maps)], ]$time_gen <- 1

map_table
}
Expand Down Expand Up @@ -1044,7 +1058,10 @@ compile_dispersals <- function(populations, generation_time, direction,

# Render population boundaries to black-and-white spatial maps
render <- function(pops, resolution) {
raster_list <- lapply(pops, function(pop) {
# process only populations which have a map
spatial_pops <- Filter(has_map, pops)

raster_list <- lapply(spatial_pops, function(pop) {
# iterate over temporal maps for the current population
snapshots <- lapply(unique(pop$time), function(t) {
snapshot <- pop[pop$time == t, ]
Expand Down
44 changes: 29 additions & 15 deletions R/interface.R
Original file line number Diff line number Diff line change
Expand Up @@ -55,16 +55,22 @@ population <- function(name, time, N, parent = NULL, map = FALSE,
if (!is.character(name) || length(name) != 1)
stop("A population name must be a character scalar value", call. = FALSE)

# is this the first population defined in the model?
if (is.null(parent)) {
if (!is.logical(map) && !inherits(map, "slendr_map"))
stop("Ancestral population must specify its 'map'", call. = FALSE)
# if this population splits from a parental population, check that the parent
# really exists and that the split time make sense given the time of appearance
# of the parental population
if (!is.null(parent)) {
if (!inherits(parent, "slendr_pop"))
stop("Only slendr_pop objects can represent parental populations", call. = FALSE)
else
map <- map
} else {
check_split_time(time, parent)
map <- attr(parent, "map")
check_split_time(time, parent)
}

if (!is.logical(map) && !inherits(map, "slendr_map"))
stop("A simulation landscape must be an object of the class slendr_map", call. = FALSE)

if (!is.null(parent) && is.logical(map) && map == FALSE)
map <- attr(parent, "map")

time <- as.integer(time)
if (time < 1) stop("Split time must be a non-negative integer number", call. = FALSE)
N <- as.integer(N)
Expand Down Expand Up @@ -570,7 +576,15 @@ gene_flow <- function(from, to, rate, start, end, overlap = TRUE) {
if ((has_map(from) && !has_map(to)) || (!has_map(from) && has_map(to)))
stop("Both or neither populations must be spatial", call. = FALSE)

# make sure both participating populations are present at the start of the
# gene flow event (`check_present_time()` is reused from the sampling functionality)
# check_present_time(start, from, offset = 0)
# check_present_time(end, from, offset = 0)
# check_present_time(start, to, offset = 0)
# check_present_time(end, to, offset = 0)

# make sure the population is not removed during the the admixture period
# (note that this will only be relevant for SLiM simulations at this moment)
check_removal_time(start, from)
check_removal_time(end, from)
check_removal_time(start, to)
Expand Down Expand Up @@ -1327,23 +1341,23 @@ setup_env <- function(quiet = FALSE, agree = FALSE, pip = NULL) {
if (!dir.exists(reticulate::miniconda_path()))
reticulate::install_miniconda()

reticulate::conda_create(envname = PYTHON_ENV)
reticulate::use_condaenv(PYTHON_ENV, required = TRUE)

# parse the Python env name back to the list of dependencies
# (the environment is defined in .onAttach(), and this makes sure the
# dependencies are defined all in one place)
version_deps <- PYTHON_ENV %>% gsub("-", "==", .) %>% strsplit("_") %>% .[[1]]
other_deps <- "pandas"
deps <- c(version_deps, other_deps)
versions <- PYTHON_ENV %>% gsub("-", "==", .) %>% strsplit("_") %>% .[[1]]
python_version <- gsub("Python==", "", versions[1])
package_versions <- c(versions[-1], "pandas")

reticulate::conda_create(envname = PYTHON_ENV, python_version = python_version)
reticulate::use_condaenv(PYTHON_ENV, required = TRUE)

# msprime/tskit conda dependency is broken on M1 Mac architecture, fallback
# to pip in cases like this (otherwise use conda to avoid any potential
# compilation issues such as missing libgsl)
if (is.null(pip))
pip <- all(Sys.info()[c("sysname", "machine")] == c("Darwin", "arm64"))

reticulate::conda_install(envname = PYTHON_ENV, packages = deps, pip = pip)
reticulate::conda_install(envname = PYTHON_ENV, packages = package_versions, pip = pip)

if (!quiet) {
message("======================================================================")
Expand Down
22 changes: 10 additions & 12 deletions R/tree-sequences.R
Original file line number Diff line number Diff line change
Expand Up @@ -2258,7 +2258,7 @@ get_pyslim_table_data <- function(ts, simplify_to = NULL) {
# add numeric node IDs to each individual
combined <-
dplyr::bind_rows(sampled, not_sampled) %>%
dplyr::right_join(nodes, by = "ind_id") %>%
dplyr::right_join(nodes, by = "ind_id", multiple = "all") %>% # required after dplyr v1.1.0
dplyr::mutate(time = ifelse(is.na(ind_id), time.y, time.x),
time_tskit = time_tskit.y,
sampled = ifelse(is.na(ind_id), FALSE, sampled)) %>%
Expand Down Expand Up @@ -2361,7 +2361,7 @@ get_tskit_table_data <- function(ts, simplify_to = NULL) {

# add numeric node IDs to each individual
combined <- dplyr::select(individuals, -time, -pop_id) %>%
dplyr::right_join(nodes, by = "ind_id") %>%
dplyr::right_join(nodes, by = "ind_id", multiple = "all") %>% # required after dplyr v1.1.0
dplyr::rename(pop = pop.y, time_tskit = time_tskit.y)

# for tree sequences which have some individuals/nodes marked as 'sampled', mark the
Expand Down Expand Up @@ -2422,7 +2422,10 @@ get_annotated_edges <- function(x) {
dplyr::select(parent_pop = pop,
parent_node_id = node_id,
parent_time = time, parent_location = location)
parent_nodes <- dplyr::left_join(parent_nodes, edges, by = stats::setNames("parent", join1)) %>%
parent_nodes <- dplyr::left_join(
parent_nodes, edges,
by = stats::setNames("parent", join1), multiple = "all"
) %>% # multiple = "all" required after dplyr v1.1.0
dplyr::arrange(!!join1)

# take the `parent_nodes` able above and do another join operation, this time
Expand All @@ -2438,17 +2441,12 @@ get_annotated_edges <- function(x) {
dplyr::select(child_pop = pop,
child_node_id = node_id,
child_time = time, child_location = location)
edge_nodes <- dplyr::inner_join(edge_nodes, parent_nodes, by = stats::setNames("child", join2)) %>%
edge_nodes <- dplyr::inner_join(
edge_nodes, parent_nodes,
by = stats::setNames("child", join2)
) %>%
dplyr::arrange(!!join2)

# data %>%
# dplyr::filter(phylo_id %in% edges$child) %>%
# dplyr::select(child_pop = pop,
# child_phylo_id = phylo_id, child_node_id = node_id,
# child_time = time, child_location = location) %>%
# dplyr::inner_join(parent_nodes, by = c("child_phylo_id" = "child")) %>%
# dplyr::arrange(child_phylo_id)

# transforming individual child/parent location columns (type POINT) into a
# line (type LINESTRING)
if (spatial) {
Expand Down
12 changes: 11 additions & 1 deletion R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -213,12 +213,22 @@ get_geneflows <- function(model, time) {
after_op <- `<=`
}

geneflows <- subset(model$geneflow, before_op(tstart_orig, time) & after_op(tend_orig, time))
if (!is.null(time))
geneflows <- subset(model$geneflow, before_op(tstart_orig, time) & after_op(tend_orig, time))
else
geneflows <- model$geneflow

geneflows$from <- factor(geneflows$from, levels = pop_names)
geneflows$to <- factor(geneflows$to, levels = pop_names)

migr_coords <- lapply(seq_len(nrow(geneflows)), function(row_i) {

# if time point was not provided, simply take the midpoint of the current
# gene-flow event
if (is.null(time)) {
time <- geneflows[row_i, c("tstart_orig", "tend_orig")] %>% as.numeric() %>% mean()
}

from <- model$populations[pop_names == geneflows[row_i, ]$from][[1]] %>%
.[before_op(.$time, time), ] %>%
.[nrow(.), ]
Expand Down
11 changes: 8 additions & 3 deletions R/visualization.R
Original file line number Diff line number Diff line change
Expand Up @@ -45,8 +45,8 @@ plot_map <- function(..., time = NULL, gene_flow = FALSE,
stop("'slendr_region' and 'slendr_pops' object cannot be plotted at once", call. = FALSE)

if (gene_flow & (is.null(time) | !inherits(args[[1]], "slendr_model")))
stop("Migrations can be visualized only when a time point and a 'slendr_model'
objects are specified", call. = FALSE)
warning("All gene-flow event will be visualized at once. If you wish to visualize\n",
"gene flows at a particular point in time, use the `time` argument.", call. = FALSE)

# a single model object was provided
if (length(args) == 1 & inherits(args[[1]], "slendr_model")) {
Expand Down Expand Up @@ -146,7 +146,12 @@ objects are specified", call. = FALSE)
pop_maps <- pops
}

if (intersect) pop_maps <- lapply(pop_maps, intersect_features)
if (intersect) {
intersected_maps <- pop_maps %>% Filter(has_map, .) %>% lapply(intersect_features)
if (length(intersected_maps) != length(pop_maps))
warning("Non-spatial populations in your model won't be visualized", call. = FALSE)
pop_maps <- intersected_maps
}

pop_maps <- do.call(rbind, pop_maps)
pop_maps$pop <- factor(pop_maps$pop, levels = pop_names)
Expand Down
28 changes: 14 additions & 14 deletions R/zzz.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,10 @@ msp <- NULL
# define slendr's required Python dependencies and compose an environment name
# that will be used specifically for them
PYTHON_ENV <-
c("msprime==1.2.0", "tskit==0.5.2", "pyslim==1.0") %>%
c("msprime==1.2.0", "tskit==0.5.4", "pyslim==1.0.1") %>%
gsub("==", "-", .) %>%
paste(collapse = "_")
paste(collapse = "_") %>%
paste0("Python-3.11_", .)

.onAttach <- function(libname, pkgname) {
# check for presence of the slim binary in user's PATH and display
Expand Down Expand Up @@ -43,25 +44,24 @@ PYTHON_ENV <-
if (!getOption("slendr.custom_env")) {
version <- strsplit(PYTHON_ENV, "_")[[1]] %>% gsub(".*-", "", .)
packageStartupMessage(
sprintf(paste0("A slendr Python environment with the necessary versions of msprime (%s),\n",
"tskit (%s), and pyslim (%s) has not been found.\n"), version[1], version[2], version[3]),
"\nYou can setup a pre-configured environment with all of slendr's Python\n",
"dependencies automatically by running the function setup_env()."
sprintf(paste0("A slendr Python (%s) environment with the necessary versions of\n",
"msprime (%s), tskit (%s), and pyslim (%s) has not been found.\n"),
version[1], version[2], version[3], version[4]),
"\nYou can setup a pre-configured environment with all of slendr's Python\n",
"dependencies automatically by running the function setup_env()."
)
}
}

packageStartupMessage(
"=======================================================================\n",
"NOTE: Due to frequent issues with some user's Python setups, slendr no\n",
"longer activates its Python environment automatically upon calling\n",
"library(slendr).\n\n",
"NOTE: Due to Python setup issues on some systems which have been\n",
"causing trouble particularly for novice users, calling library(slendr)\n",
"no longer activates slendr's Python environment automatically.\n\n",
"In order to use slendr's msprime back end or its tree-sequence\n",
"functionality, users now must activate slendr's Python environments\n",
"manually by calling init_env().\n\n",
"This inconvenience is a compromise in order to help novice users avoid\n",
"having to debug very technical, low-level Python-specific issues. This\n",
"note will be removed in a future version of slendr.",
"functionality, users must now activate slendr's Python environment\n",
"manually by executing init_env() after calling library(slendr).\n\n",
"(This note will be removed in the next major version of slendr.)",
"\n======================================================================="
)
}
Expand Down
Loading

0 comments on commit 57aca52

Please sign in to comment.