Add support for SLiM simulations on Windows #149

bodkan · 2024-01-18T11:49:08Z

This is inspired by the changes suggested by @GKresearch here.

I manageds to setup Windows 11 in a virtual machine on my Mac. It's not a great development experience but it's something!

The goal is to check if we can support the default means of installation described in Section 2.3.1 of the SLiM manual. Conda-installed SLiM should work as well, except that doesn't support SLiMgui which makes it not the best default target. That said, Conda users will be able specify the path to their slim binary directly via the slim_path= argument of the slim() function.

Speaking about paths -- I still haven't figured out how path management is supposed to work in Windows shell. In fact, it appears that the SLiM installation on Windows described in the SLiM manual is using a non-native non-Windows shell? That's why I'm considering making specification of path to a SLiM binary to be used (be it CLI SLiM or SLiMgui) mandatory on Windows.

Hey @GKresearch, if you're still interested in helping, it would be great if you could test the new version of slendr with Windows support on your computer, after I manage to put everything together.

Thanks for pushing me to do this and sorry again for the huge delay!

Hi Martin, Here is my edit to try and get the slim command working in Windows. The primary changes are with the slim command line request with specific changes to the pathing. I hope this pull request works appropriately, let me know if there are any issues. Couple of notes: I'm on a Windows 11 machine. My slim is held in a conda environment and was downloaded through conda-forge. I call the slim path directly with the "slim_path" option in your slim() R command. I used you setup_env() to get access to the necessary python modules. Other than that, these changes were sufficient to allow me to work through the tutorial that you have on the slendr website. I mention this briefly in notes in the code itself, but I did not test anything regarding the slim gui, this was command line only. Sincerely, GK

bhaller · 2024-01-18T14:00:17Z

In fact, it appears that the SLiM installation on Windows described in the SLiM manual is using a non-native non-Windows shell?

The main way for people to install on Windows nowadays, I think, is the pacman installer, I think? I don't know anything about Windows, but I think that installation route is pretty standard there...?

bodkan · 2024-01-18T14:09:44Z

The main way for people to install on Windows nowadays, I think, is the pacman installer, I think? I don't know anything about Windows, but I think that installation route is pretty standard there...?

Yeah, that's what I understood as well (looks to me that pacman is something like Homebrew on macOS but ¯\(ツ)/¯ I haven't used Windows in 15 years).

What I meant is that I it looked to me that CLI programs installed via pacman appear to be using some sort of bash ported for Windows but not the actual Windows "Command Prompt" (I think that's how it used to be called) or the "newer" Powershell. This is what the colored terminal screenshots in SLiM manual are from.

So this wasn't actually about the installation itself but about running those programs. In any case, it's all good -- this wasn't meant to be a criticism of any kind. Or it wasn't meant to express surprise or anything. Again, I don't use Windows, and have no prior expectations about any of this. :)

bodkan · 2024-01-18T14:15:52Z

Ah, hold on. I can run slim.exe through the good old Windows System Prompt that I remember from my teenage years. :) What threw me off is that the shell spawned by msys2 is a bash shell (not the Windows prompt), and same was true for git, etc. So I assumed something more special is going on with the compilation of programs ported from unix to Windows.

Also, when I said

In fact, it appears that the SLiM installation on Windows described in the SLiM manual is using a non-native non-Windows shell?

I actually meant pacman / msys2 compiled programs in general. But the above reads as I'm talking specifically about SLiM, which wasn't my intention, sorry.

bhaller · 2024-01-18T14:26:15Z

No worries, just checking that you were using pacman not some other thing like the WSL installation procedure, which (I think?) is more fringe.

GKresearch · 2024-01-18T17:23:22Z

Hi @bodkan,

Thank you for looking into this! Yeah, I'm happy to try running the new version when it is ready!

benjeffery · 2024-02-01T15:01:52Z

@bodkan Hi! I assume you're using the latest tskit as some time ago we increased the possible lengths of ragged columns such as metadata to a 64bit int. It may be that on windows the type may still be 32 bit. What length of metadata column causes this issue? You can tell by with len(ts.tables.metadata).

bhaller · 2024-02-01T15:05:31Z

@benjeffery Might it be a good idea to add a bit of code in tskit, at startup or some convenient spot like that, that just checks sizeof(X) for various types and prints an error and exits if they are not what you expect/need? Trivial to do, and would catch a problem like this on an unusual platform that doesn't use the C type sizes you expect...?

bodkan · 2024-02-01T15:32:32Z

Hello @benjeffery, thanks for jumping in.

I assume you're using the latest tskit [...]

I release a new slendr version whenever a new version of tskit or msprime or pyslim is released, so currently it includes tskit 0.5.6.

What length of metadata column causes this issue? You can tell by with len(ts.tables.metadata).

I'll try to get that as soon as I manage to isolate a broken simulation run in a fully reproducible way. I don't have Windows access and debugging this over GitHub actions or inside a super slow virtualized Windows machine has been what I imagine using punchcards in the 70s must've been like... The fact that this doesn't happen (for any given simulation) in every run has made investigating this even more challenging.

I'll be in touch as I get more info on this, thank you.

benjeffery · 2024-02-02T23:37:59Z

@benjeffery Might it be a good idea to add a bit of code in tskit, at startup or some convenient spot like that, that just checks sizeof(X) for various types and prints an error and exits if they are not what you expect/need? Trivial to do, and would catch a problem like this on an unusual platform that doesn't use the C type sizes you expect...?

I haven't had time to dig into this yet - but am surprised we don't have a test for it. Hopefully we can replicate in CI.

bodkan · 2024-02-03T23:18:07Z

So, I have to say I’m not yet 100% convinced it’s really a tskit-on-Windows issue rather than reticulate-on-Windows issue. In fact, so far I'm leaning towards the latter.

I did finally managed to create a tiny test case which produces a tree sequence which reproducibly causes the crash every time. This will hopefully make it possible to really dig deep to determine just what is it that’s causing the problem! From a casual glance I managed yesterday evening before I had to run home it’s something… utterly bizarre.

bodkan · 2024-02-05T12:27:51Z

OK, I don't remember dealing with such an obscure weird issue.

@rdinnager Would you mind testing one more thing for me? I managed to create a tree sequence which always breaks on my Windows virtual machine but I'd like to make sure it also happens on a native Windows system. Interestingly, this happens regardless of whether the tree sequence is simulated with SLiM or msprime.

I attach two tree sequences, output_slim.trees and output_msprime.trees (output_trees.zip). Accessing the metadata element of a ts tree-sequence object crashes for both (so it must be something that they have in common!). Could it be the seed maybe? Have to do more digging. Already getting to an always breaking simulation run was quite a struggle.

First, to eliminate tskit in pure Python from the list of suspects, can you check that this runs without any problems?

# restart R to get a clean environment (and no slendr)

# use Python installed by slendr (but this doesn't really have an effect)
library(reticulate)
use_condaenv("Python-3.12_msprime-1.3.0_tskit-0.5.6_pyslim-1.0.4_tspop-0.0.2") 

repl_python() # open a Python REPL inside R

# paste the following into the Python REPL
import tskit
ts = tskit.load("output_msprime.trees")

ts.metadata          # this works without any issues
ts.metadata["SLiM"]  # this works without any issues

Then, let's run the same code as above but this time in the reticulate interface to Python (this works by embedding a Python process in an R session, otherwise there's no difference vs running pure Python code):

# restart R to get a clean environment (and no slendr)

# use Python installed by slendr (but this doesn't really have an effect)
library(reticulate)
use_condaenv("Python-3.12_msprime-1.3.0_tskit-0.5.6_pyslim-1.0.4_tspop-0.0.2") 

# load a tree sequence directly using the Python/R reticulate interface
tskit <- import("tskit")
ts <- tskit$load("output_msprime.trees")

ts$metadata # this works
ts$metadata # but this breaks!
ts$metadata # this works
ts$metadata # but this breaks! WTF?!?!
ts$metadata # this works
ts$metadata # but this breaks! WTF?!?!
ts$metadata # this works
ts$metadata # but this breaks! WTF?!?!
ts$metadata # this works
ts$metadata # but this breaks! WTF?!?!
ts$metadata # this works
ts$metadata # but this breaks! WTF?!?!
# ... rinse and repeat

So accessing the metadata breaks but only on every other access?!?! What the hell? This must be some weird, super low-level memory problem related to the Python embedding / caching the metadata method call. But I'm totally out of my depth.

@rdinnager Could you please check that you observe the same behavior? (Note that in my VM it happens for both the SLiM and msprime tree sequence). At least then I'll know I'm not chasing ghosts but that the attached tree sequences break also for you in this way. Thank you very much.

Again, for completeness, here's the error in full:

> ts$metadata # but this breaks! WTF?!?!
Error in py_get_attr_impl(x, name, silent) :
  SystemError: <functools._lru_cache_wrapper object at 0x000001D8D7D5BE20> returned a result with an exception set
Run `reticulate::py_last_error()` for details.
> reticulate::py_last_error()

── Python Exception Message ────────────────────────────────────────────
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
  File "C:\Users\mp\AppData\Local\R-MINI~1\envs\PYTHON~1.2\Lib\site-packages\tskit\trees.py", line 4421, in metadata
return self.metadata_schema.decode_row(self._ll_tree_sequence.get_metadata())
^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mp\AppData\Local\R-MINI~1\envs\PYTHON~1.2\Lib\site-packages\tskit\trees.py", line 4428, in metadata_schema
return metadata_module.parse_metadata_schema(
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    SystemError: <functools._lru_cache_wrapper object at 0x000001D8D7D5BE20> returned a result with an exception set

  ── R Traceback ─────────────────────────────────────────────────────────
  ▆
  1. ├─ts$metadata
  2. └─reticulate:::`$.python.builtin.object`(ts, metadata)
  3.   └─reticulate:::py_get_attr_or_item(x, name, TRUE)
  4.     └─reticulate::py_get_attr(x, name)
  5.       └─reticulate:::py_get_attr_impl(x, name, silent)

Here's a slendr script which generates a broken SLiM and msprime tree sequence. No need to run this, both tree sequences are attached to this message:

# simulate a "broken" tree sequence ---------------------------------------

devtools::load_all() # must be in a checked out slim-on-windows branch of slendr
init_env()

SEED <- 4106790818

p <- population(name = "pop1", N = 700, time = 1)
model <- compile_model(populations = p, generation_time = 30, simulation_length = 1000)

output_slim <- "output_slim.trees"
output_msprime <- "output_msprime.trees"

slim(model, sequence_length = 1e6, recombination_rate = 0, random_seed = SEED, output = output_slim, load = FALSE)
msprime(model, sequence_length = 1e6, recombination_rate = 0, random_seed = SEED, output = output_msprime, load = FALSE)

bodkan · 2024-02-05T12:51:03Z

@bodkan wrote:
Accessing the metadata element of a ts tree-sequence object crashes for both SLiM and msprime tree sequences (so it must be something that they have in common!). Could it be the seed maybe? Have to do more digging. Already getting to an always breaking simulation run was quite a struggle.

Holy crap. It is the seed?!?

devtools::load_all()
init_env(quiet = TRUE)

This always runs:

MAX_INT <-  2147483647 # https://en.wikipedia.org/wiki/2,147,483,647#In_computing
SEED <- MAX_INT

p <- population(name = "pop1", N = 100, time = 1)
model <- compile_model(populations = p, generation_time = 30, simulation_length = 1000)

while (TRUE) { # infinite loop, careful
  cat("*")
  ts <- msprime(model, sequence_length = 1e3, recombination_rate = 0, random_seed = SEED)
}

This always crashes:

MAX_INT <-  2147483647
SEED <- MAX_INT + 1       # <==== NOTE THE +1!

p <- population(name = "pop1", N = 100, time = 1)
model <- compile_model(populations = p, generation_time = 30, simulation_length = 1000)

while (TRUE) { # infinite loop except on Windows
  cat("*")
  ts <- msprime(model, sequence_length = 1e3, recombination_rate = 0, random_seed = SEED)
}

This is the weirdest, craziest bug I've ever encountered, ever. How hilarious that all this frustrating debugging and digging boils down to... a huge-ass number being a little bigger than allowed!

I still have no idea why ts.metadata crashes in reticulated Python on Windows only once every second access (seriously, WTF?!). If it's int overflow, I would expect it to happen every time? But I really have no idea. Clearly this has something to do with the caching of the metadata access method (the primary error I was getting was related to method caching).

@rdinnager, I think you can ignore my request for help in the message above. I'll adjust the range of integers that slendr picks a random seed from and also put a constraint on the range of user-provided random seeds -- for both slim() and msprime() functions.

Then I'll run the unit tests again and see what happens. 🤞

bodkan · 2024-02-05T13:03:56Z

Oh! It doesn't look like slendr is generating a random seed in case the user doesn't provide any. If no random seed is explicitly specified, then slendr lets SLiM or msprime pick their own.

This is interesting because if SLiM tends to provide a wider range of integers as random seeds than msprime, it would explain why it's been only slim()-based unit tests failing this whole time, never msprime unit tests -- which is what threw me off originally as I assumed it's something specific to SLiM tree sequences. Fun!

Given that the error happens someplace slendr has no control over (low-level interface between R/embedded-Python), maybe a compromise (at least intermediate one) is for slendr to actually generate a random seed of its own, restricted to be less than .Machine$integer.max.

bhaller · 2024-02-05T16:01:36Z

Wow, craziness. Nice debugging!

This reverts commit 9d97733.

bodkan · 2024-02-06T11:39:34Z

OK, so all the slendr unit tests are passing across macOS / Linux / Windows with no exceptions. I also verified that all the vignettes (which contain much more ambitious simulations runs) are building on all three systems.

It really was the random seed being bigger than a 32-bit integer causing the intermittent crashes on Windows during parsing of the metadata (in which slendr saves dictionary with some custom metadata, including the seed used). Some very obscure interaction at the level of R-Python interface that's impossible for me to explain because it would require digging into that interface at a C level. I'm sure it would be fun to figure out but I can't put more time into this.

As far as slendr is concerned, this is fixed by making sure that random seeds are less than 2,147,483,647 which makes Windows happy in the current state of things.

Once this is merged, I will put out a new slendr version to release the Windows support to the wild. I'm really happy to have this in because porting the entire unit test suite to Windows required changes to make the code more robust overall.

I'm pretty sure there are still smaller issues here and there but discovering those would require using slendr and it's tskit interface on Windows for more serious work. Not something I can manage on a slow-oh-my-god-why-is-it-so-slow virtualized Windows system with a hilariously low screen resolution (I can't even get a proper fullscreen RStudio view! :)). Those bugs can be taken care of with individual GitHub issues.

What a journey this has been. Thank you everyone for helping out!

GKresearch and others added 3 commits July 26, 2023 14:51

Merge branch 'main' into slim-on-windows

f91dbc5

Add a few changes to the submitted PR

d1e5e5a

bodkan mentioned this pull request Jan 18, 2024

Update engines.R #142

Closed

bodkan and others added 21 commits January 18, 2024 18:58

On Windows, install Python deps from conda

caa8faa

tspop is not on conda, install it through pip

ca589fc

Fix Python env name on GitHub Actions

608ad2a

Remove empty file

0901a68

Take care of backward slashes in paths on Windows

7bc6dcf

Remove the requirement of setting slim_path

237b9ae

Update vignette

3031f39

Normalize output t.s. path in all circumstances

93f9fee

Create slendr Python env even on Windows GitHub Actions

4c14b30

Invert exe condition

085283f

Run SLiM-based unit tests only where SLiM is present

c7d7074

Normalize paths to locations as well

b70b34e

Install SLiM on Windows G.A. server

9d3c615

Specify msys2 shell

72a2a3b

Update msys2 rule

2da0d98

Update msys2 rule

922cd83

Update msys2 rule

5a155e2

Specify .Renviron with path to SLiM on Windows

c57a63a

Fix broken path to .Renviron

fe40649

Fix escape character

2ee4cf3

Install pandoc on Windows

fdb9c49

bodkan added 4 commits February 5, 2024 15:38

If a seed is missing, generate one on slendr's side

ad7d9a0

Update relevant manpages

5c8cb46

Remove skip_if(slim_is_missing()) in unit tests

929bd36

Re-enable running vignettes of Windows

048f229

bodkan added 12 commits February 5, 2024 17:47

Re-render the website

59efbaa

Search for slim or slim.exe in the vignettes 05 and 08"

cc70e1f

Use platform-independent paths in vignette number 10

7d3e598

Install bash using msys2 to render vignette number 8

c685425

Revert "Bump up SLiM version requirement to 4.1"

6d213f2

This reverts commit 9d97733.

Update installation instructions

2627ac1

Hide logging info

7e9eeb9

Try specifying bash engine path manually

85c1ece

Render vignette number 5 only when ADMIXTOOLS is installed

82d75b1

Normalize all output paths in vignette number 8

3c7face

Bump up SLiM version requirement to 4.1

9d97733

Set correct bash path

89f5a1f

bodkan merged commit e9bfaad into main Feb 6, 2024
6 of 8 checks passed

bodkan deleted the slim-on-windows branch February 6, 2024 11:40

bodkan restored the slim-on-windows branch March 21, 2024 13:47

bodkan deleted the slim-on-windows branch March 21, 2024 13:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for SLiM simulations on Windows #149

Add support for SLiM simulations on Windows #149

bodkan commented Jan 18, 2024

bhaller commented Jan 18, 2024

bodkan commented Jan 18, 2024

bodkan commented Jan 18, 2024 •

edited

bhaller commented Jan 18, 2024

GKresearch commented Jan 18, 2024

benjeffery commented Feb 1, 2024

bhaller commented Feb 1, 2024

bodkan commented Feb 1, 2024

benjeffery commented Feb 2, 2024

bodkan commented Feb 3, 2024

bodkan commented Feb 5, 2024 •

edited

bodkan commented Feb 5, 2024

bodkan commented Feb 5, 2024 •

edited

bhaller commented Feb 5, 2024

bodkan commented Feb 6, 2024 •

edited

Add support for SLiM simulations on Windows #149

Add support for SLiM simulations on Windows #149

Conversation

bodkan commented Jan 18, 2024

bhaller commented Jan 18, 2024

bodkan commented Jan 18, 2024

bodkan commented Jan 18, 2024 • edited

bhaller commented Jan 18, 2024

GKresearch commented Jan 18, 2024

benjeffery commented Feb 1, 2024

bhaller commented Feb 1, 2024

bodkan commented Feb 1, 2024

benjeffery commented Feb 2, 2024

bodkan commented Feb 3, 2024

bodkan commented Feb 5, 2024 • edited

bodkan commented Feb 5, 2024

bodkan commented Feb 5, 2024 • edited

bhaller commented Feb 5, 2024

bodkan commented Feb 6, 2024 • edited

bodkan commented Jan 18, 2024 •

edited

bodkan commented Feb 5, 2024 •

edited

bodkan commented Feb 5, 2024 •

edited

bodkan commented Feb 6, 2024 •

edited