Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing the way tree-sequence outputs are handled by default #100

Merged
merged 52 commits into from
Jul 21, 2022

Conversation

bodkan
Copy link
Owner

@bodkan bodkan commented Jul 20, 2022

The CRAN submission process has progressed smoothly so far. slendr got lots of great feedback, especially with regards to documentation which is now much nicer and more robust than before.

So far so good.

Unfortunately, CRAN folks pointed out that the current way handling of tree-sequence output files in slendr is incompatible with CRAN policies. In fact, the way the outputs have been handled is explicitly illegal as far as CRAN is concerned.

The problem is with how storing tree sequence outputs has been automatically handled so far. Briefly, for convenience reasons, unless a full path to a tree-sequence output was provided to slim() or msprime() interface functions, outputs were by default saved to a user-defined model directory together with other slendr files.

Example:

# … imagine code defining population dynamics here …

model <- compile_model(path = <…>, …)

# unless an output path is explicitly specified, the following automatically stores the tree sequence in
# the path specified above together with the rest of the model files
slim(model, sequence_length = 1e6, recombination_rate = 1e-8)

# this allows a tree sequence to be loaded simply by without having to explicitly deal with file paths
ts <- ts_load(model) %>% ts_recapitate(...) %>% ts_simplify()

# … proceed with doing fun tree-sequence things …

This is all beautiful, except the fact that CRAN considers it illegal to write anything anywhere unless it is into /tmp/ (or equivalent on different OS). And no, it does not matter that the model directory is often in a /tmp/ itself, or that the interface has been designed with the model object and its associated directory being the central component of the simulation, where tree sequences are put by default. Upon reflection, although this seems to be too strict at first (plenty of Google issues related to this), it is entirely reasonable position to take from the perspective of CRAN.

This has forced me to change the way tree sequence outputs are handled. Not in a dramatic way, but still in a way that’s backwards incompatible with the way I’ve been doing things since the very beginning. I’m glad I discovered this before submitting the paper, doing this during the revisions would be a huge annoyance for… basically everybody.

Here's the new behavior that hopefully adheres to CRAN's super strict -- and, in hindsight, extremely reasonable -- policies:

# simulation function automatically loads a tree-sequence file (which is by default saved
# to a temporary location)
ts <- slim(model, sequence_length = 1e6, recombination_rate = 1e-8)

# … proceed with doing fun tree-sequence things on the generated `ts` object …

Alternatively, one can also do:

# save the tree sequence to a custom location
output_path <- "path/to/my/treesequence.trees"
ts <- slim(model, sequence_length = 1e6, recombination_rate = 1e-8, output = output_path)

# load and process the output basically in the same way as before
ts <- ts_load(file = output_path, model) %>% ts_recapitate(...) %>% ts_simplify()

Or even this (which is closest to the way things have been done in the past):

# save the tree sequence to a custom location but don't automatically load it afterwards
slim(model, sequence_length = 1e6, recombination_rate = 1e-8, output = "path/to/my/treesequence.trees", load = FALSE)

# load and process the output basically in the same way as before
ts <- ts_load(file = "path/to/my/treesequence.trees", model) %>% ts_recapitate(...) %>% ts_simplify()

Pinging people who have been involved in slendr development or have been using slendr for their work: @bhaller, @petrelharp, @mkiravn, @FerRacimo, @MoiColl, @dangliu, @awohns, @jaurbanChicago, @archgen, @emmaprantoni. Once the new version is merged to the main, your code might need some changes in case you update at some point.

Once the automated GitHub Actions tests pass, I will merge the PR and resubmit slendr to CRAN. I will ping you here, then you'll be able to get the latest version of slendr with devtools::install_github("bodkan/slendr").

bodkan added 30 commits July 18, 2022 15:25
bodkan added 12 commits July 19, 2022 18:41
For whatever reason msprime occassionally gives this error:

Traceback (most recent call last):
  File "/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpjyLc4w/filee337593377b2/script.py", line 198, in <module>
    ts = msprime.sim_ancestry(
  File "/Users/mp/Library/r-miniconda-arm64/envs/msprime-1.1.1_tskit-0.4.1_pyslim-0.700/lib/python3.8/site-packages/msprime/ancestry.py", line 1207, in sim_ancestry
    sim = _parse_sim_ancestry(
  File "/Users/mp/Library/r-miniconda-arm64/envs/msprime-1.1.1_tskit-0.4.1_pyslim-0.700/lib/python3.8/site-packages/msprime/ancestry.py", line 1008, in _parse_sim_ancestry
    random_generator = _msprime.RandomGenerator(random_seed)
ValueError: seeds must be greater than 0 and less than 2^32
@dangliu
Copy link

dangliu commented Jul 20, 2022 via email

@bodkan
Copy link
Owner Author

bodkan commented Jul 20, 2022

Just for completeness - this is purely an UI change, nothing has changed in terms of simulation itself.

In terms of reproducibility, the same random seed will result in the same tree sequence output even in the latest version.

@petrelharp
Copy link
Collaborator

This makes sense, and looks like you have a good solutoin.

@bodkan
Copy link
Owner Author

bodkan commented Jul 21, 2022

Checks passed. Merging the PR now and resubmitting to CRAN. 🤞

@bodkan bodkan merged commit 7411b68 into main Jul 21, 2022
@bodkan bodkan deleted the changing-default-outputs branch August 20, 2022 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants