Changing the way tree-sequence outputs are handled by default #100

bodkan · 2022-07-20T11:10:19Z

The CRAN submission process has progressed smoothly so far. slendr got lots of great feedback, especially with regard to documentation, which is now much nicer and more robust than before.

So far so good.

Unfortunately, the CRAN folks pointed out that the current way of handling tree-sequence output files in slendr is incompatible with CRAN policies. In fact, the way the outputs have been handled is explicitly illegal as far as CRAN is concerned.

The problem lies in how tree-sequence outputs have been stored automatically so far. Briefly, for convenience, unless a full path to a tree-sequence output was provided to the slim() or msprime() interface functions, outputs were saved by default to the user-defined model directory together with the other slendr files.

Example:

# … imagine code defining population dynamics here …

model <- compile_model(path = <…>, …)

# unless an output path is explicitly specified, the following automatically stores the tree sequence in
# the path specified above together with the rest of the model files
slim(model, sequence_length = 1e6, recombination_rate = 1e-8)

# this allows a tree sequence to be loaded simply by passing the model object, without having to deal with file paths explicitly
ts <- ts_load(model) %>% ts_recapitate(...) %>% ts_simplify()

# … proceed with doing fun tree-sequence things …

This is all beautiful, except for the fact that CRAN considers it illegal to write anything anywhere other than the R session's temporary directory (usually somewhere under /tmp/, or its equivalent on other operating systems). And no, it does not matter that the model directory is often in /tmp/ itself, or that the interface has been designed with the model object and its associated directory as the central component of the simulation, where tree sequences are put by default. Upon reflection, although this seemed too strict at first (there is plenty of frustrated discussion of this online), it is an entirely reasonable position to take from CRAN's perspective.
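For reference, the CRAN-sanctioned place to write is the directory returned by base R's `tempdir()`. Below is a minimal base-R sketch (no slendr calls involved) of obtaining a unique file path there and cleaning it up afterwards; the `slendr_` prefix and the placeholder file contents are purely illustrative.

```r
# Base R only: build a unique file path inside the R session's temporary
# directory, the only place CRAN allows packages to write to.
output_path <- tempfile(pattern = "slendr_", fileext = ".trees")

# The path lives under tempdir() (e.g. /tmp/Rtmp... on Linux,
# /var/folders/... on macOS)
print(output_path)

# Writing here is CRAN-compliant; a real simulation would write a tree
# sequence instead of this placeholder text
writeLines("placeholder", output_path)
file.exists(output_path)

# Clean up after ourselves
unlink(output_path)
```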

This has forced me to change the way tree-sequence outputs are handled. Not in a dramatic way, but still in a way that's backwards-incompatible with how I've been doing things since the very beginning. I'm glad I discovered this before submitting the paper; doing this during the revisions would be a huge annoyance for… basically everybody.

Here's the new behavior, which hopefully adheres to CRAN's super strict (and, in hindsight, extremely reasonable) policies:

# simulation function automatically loads a tree-sequence file (which is by default saved
# to a temporary location)
ts <- slim(model, sequence_length = 1e6, recombination_rate = 1e-8)

# … proceed with doing fun tree-sequence things on the generated `ts` object …

Alternatively, one can also do:

# save the tree sequence to a custom location
output_path <- "path/to/my/treesequence.trees"
ts <- slim(model, sequence_length = 1e6, recombination_rate = 1e-8, output = output_path)

# load and process the output basically in the same way as before
ts <- ts_load(file = output_path, model) %>% ts_recapitate(...) %>% ts_simplify()

Or even this (which is closest to the way things have been done in the past):

# save the tree sequence to a custom location but don't automatically load it afterwards
slim(model, sequence_length = 1e6, recombination_rate = 1e-8, output = "path/to/my/treesequence.trees", load = FALSE)

# load and process the output basically in the same way as before
ts <- ts_load(file = "path/to/my/treesequence.trees", model) %>% ts_recapitate(...) %>% ts_simplify()
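To make the new design concrete without depending on slendr, SLiM or msprime, here is a toy base-R sketch of the same pattern. The function `run_sim()` is hypothetical and only stands in for `slim()`/`msprime()`; plain text files and `readLines()` stand in for tree sequences and `ts_load()`. Returning the output path invisibly when `load = FALSE` is an assumption of this sketch, not a description of what slendr actually returns.

```r
# Hypothetical stand-in for a simulation function such as slim()/msprime():
# - `output` defaults to a fresh file in the session's temporary directory
# - the result is loaded and returned unless load = FALSE
run_sim <- function(n, output = tempfile(fileext = ".txt"), load = TRUE) {
  # "simulate" by writing n lines to the output file
  writeLines(as.character(seq_len(n)), output)
  if (load) {
    # stands in for ts_load(): read the file back and return its contents
    readLines(output)
  } else {
    # assumption of this sketch: hand back the path for loading later
    invisible(output)
  }
}

# default: nothing is written outside tempdir(), result returned directly
res <- run_sim(3)

# custom location, loaded later by the caller (kept inside tempdir() here
# only so that the toy example stays self-contained)
path <- file.path(tempdir(), "my_output.txt")
run_sim(3, output = path, load = FALSE)
res2 <- readLines(path)
```

Defaulting `output` to a `tempfile()` path is what keeps the default code path CRAN-compliant: nothing lands outside the session's temporary directory unless the user explicitly asks for it.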

Pinging people who have been involved in slendr development or have been using slendr for their work: @bhaller, @petrelharp, @mkiravn, @FerRacimo, @MoiColl, @dangliu, @awohns, @jaurbanChicago, @archgen, @emmaprantoni. Once the new version is merged into main, your code might need some changes if you update at some point.

Once the automated GitHub Actions tests pass, I will merge the PR and resubmit slendr to CRAN. I will ping you here when that's done, and you'll then be able to get the latest version of slendr with devtools::install_github("bodkan/slendr").

For whatever reason, msprime occasionally gives this error:

Traceback (most recent call last):
  File "/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpjyLc4w/filee337593377b2/script.py", line 198, in <module>
    ts = msprime.sim_ancestry(
  File "/Users/mp/Library/r-miniconda-arm64/envs/msprime-1.1.1_tskit-0.4.1_pyslim-0.700/lib/python3.8/site-packages/msprime/ancestry.py", line 1207, in sim_ancestry
    sim = _parse_sim_ancestry(
  File "/Users/mp/Library/r-miniconda-arm64/envs/msprime-1.1.1_tskit-0.4.1_pyslim-0.700/lib/python3.8/site-packages/msprime/ancestry.py", line 1008, in _parse_sim_ancestry
    random_generator = _msprime.RandomGenerator(random_seed)
ValueError: seeds must be greater than 0 and less than 2^32
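The error message states the constraint directly: msprime accepts only seeds in the range 1 to 2^32 - 1. Assuming the offending seed is drawn on the R side before being handed to the Python script (the later commit "Remove redundant (and sometimes failing) random seed generation" hints at this, but the exact cause isn't stated here), a minimal base-R sketch of drawing and checking a seed that always satisfies the constraint could look like this. Both helpers are hypothetical, not slendr functions.

```r
# msprime requires 0 < seed < 2^32, i.e. an integer in [1, 2^32 - 1].
# runif() never returns its endpoints, so floor() of a draw from
# (1, 2^32 - 1) always lands in [1, 2^32 - 2]. Doubles represent all of
# these integers exactly.
draw_msprime_seed <- function() {
  floor(runif(1, min = 1, max = 2^32 - 1))
}

# Check a candidate seed against msprime's stated range before using it
is_valid_msprime_seed <- function(seed) {
  is.numeric(seed) && length(seed) == 1 && seed == floor(seed) &&
    seed > 0 && seed < 2^32
}

set.seed(42)
seeds <- replicate(1000, draw_msprime_seed())
all(vapply(seeds, is_valid_msprime_seed, logical(1)))
```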

dangliu · 2022-07-20T11:39:39Z

Hi Martin,

Thank you for the information. Regarding the recombination map stuff you told me about last time, I thought about modifying the --recombination-rate argument or adding a new argument like --recombination-map, but I am not sure about it and not super familiar with these things. So I will still wait for you on this, no hurry. Good luck with the CRAN things!

Best,
Dang

On Wed, Jul 20, 2022 at 1:10 PM, Martin Petr <***@***.***> wrote:

…

bodkan · 2022-07-20T12:10:23Z

Just for completeness: this is purely a UI change; nothing has changed in terms of the simulation itself.

In terms of reproducibility, the same random seed will result in the same tree sequence output even in the latest version.

petrelharp · 2022-07-20T18:56:20Z

This makes sense, and it looks like you have a good solution.

bodkan · 2022-07-21T12:23:46Z

Checks passed. Merging the PR now and resubmitting to CRAN. 🤞

bodkan added 30 commits July 18, 2022 15:25

Rename sampling= to samples= · 9d18e9d
Automatically return tree sequence from slim() and msprime() · b1e1458
Process methods argument · fae5f44
Adapt examples to the new tree sequence handling · 7358ee5
Write an informative message upon loading a t.s. · 50adbe5
Write debugging output only when --debug is given · bca5da8
Port vignettes to the new output handling method · 9310884
Check for the present of the locations file · 9051605
Remove read_example · 2b7c12b
Add function for printing tree sequence summary · 2da8e54
Update tree-sequence.R examples and logs · 3a5d5c7
Reset Makefile restoring · d636eec
Update docs · 493313c
Modify output tree sequence handling · 03642c4
Adapt test-ts.R to the new output method · 3c51ec6
Adapt test-trees.R to the new output method · 0246572
Adapt test-ts-ancestors-descendants.R to the new output method · d7501e4
Adapt test-simulation-runs.R to the new output method · 09749af
Adapt test-sampling.R to the new output method · a781131
Adapt test-pure-slim-vs-slendr.R to the new output method · 55f33dc
Adapt test-pure-msprime-vs-slendr.R to the new output method · dde4120
Adapt test-metadata.R to the new output method · ccfe84c
Adapt tests of resizes to the new output method · 25f6756
Adapt test-msprime.R to the new output method · 5ec9453
Adapt test-msprime-geneflow.R to the new output method · 86ab7c7
Fix incorrect model · af30907
Adapt test-interaction-changes.R to new tree-sequence path handling · 981dc3a
Move library call to the beginning of the Rmd · a3fb60f
Save locations to a customized path · 9425fc4
Adapt test-time-direction.R to new tree-sequence path handling · b6800fc

bodkan added 12 commits July 19, 2022 18:41

Update paper vignette · cad70e3
Update website · 0fd381f
Fix broken README outputs · f097e29
Update tests for Linux · 27e1d40
Update website · 18aaad2
Update Makefile · b85f492
Make default t.s. loading optional, add tests · 221de8e
Update cran-comments · 805d89d
Update documentation · c6d19d1
Update examples · 3c435ad
Update Makefile · 75fc94a

bodkan added 6 commits July 20, 2022 15:54

Satisfy a couple of more CRAN warnings · ba9a245
Update cran-comments.md · 252e6df
Add a deprecation message · adf4e0d
Update news · 37c996a
Update NEWS · 587bd16
Remove redundant (and sometimes failing) random seed generation · adbc898

bodkan added 4 commits July 20, 2022 21:33

Expand Python loading message · 5af3acb
Update website · cbe6ed4
Update msprime code in the built-in example · 80f5640
Temporarilly turn off off Windows G.A. due to upstream bugs · b2bafb6

bodkan merged commit 7411b68 into main Jul 21, 2022

bodkan deleted the changing-default-outputs branch August 20, 2022 15:53

