Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the plot_model layout algorithm #135

Merged
merged 17 commits into from
Apr 25, 2023
Merged

Change the plot_model layout algorithm #135

merged 17 commits into from
Apr 25, 2023

Conversation

bodkan
Copy link
Owner

@bodkan bodkan commented Apr 24, 2023

As soon as more complex demographic models were implemented using slendr, a clear and very annoying issue popped up.

Consider the following random model:

o <- population("outgroup", time = 1, N = 1000)
pop_ancA <- population("pop_ancA", time = 100, parent = o, N = 3000, remove = 200)
popA1 <- population("popA1", time = 200, parent = pop_ancA, N = 400)
popA2 <- population("popA2", time = 200, parent = pop_ancA, N = 10000)
pop_ancB <- population("pop_ancB", time = 500, parent = o, N = 7000, remove = 800)
popB1 <- population("popB1", time = 800, parent = pop_ancB, N = 200)
popB2 <- population("popB2", time = 800, parent = pop_ancB, N = 2000)
popX <- population("popX", time = 801, parent = popB1, N = 1500)
popY <- population("popY", time = 802, parent = popX, N = 8500)
pop_ancC <- population("pop_ancC", time = 700, parent = o, N = 2000, remove = 1000)
popC1 <- population("popC1", time = 901, parent = pop_ancC, N = 550)
popC2 <- population("popC2", time = 901, parent = pop_ancC, N = 7500)

model <- compile_model(
  list(o, pop_ancA, popA1, popA2, pop_ancB, popB1, popB2, popX, popY, pop_ancC, popC1, popC2),
  generation_time = 1, simulation_length = 1000
)

When this is visualized, the following silliness pops up:

plot_model(model)

image

The issue is especially prominent if many split and gene flow events happen close in time.

In my defense, when plot_model() was written, I only used fairly compact models for testing slendr. With more complex models being programmed these days -- either imported via demes->slendr importer written by @IsabelMarleen, models developed and tested as part of the work on our new ABC package, or randomized models -- this is getting more and more annoying.

This PR is an attempt to fix this using a relatively straightforward solution while keeping most of the plot_model code intact (that took a lot of ggplot2 black magic and I'd rather not touch that unless I absolutely have to).

The main idea is to solve most of the most apparent issues which are hindering the role of plot_model() for debugging and inspection of models, not more (a graph can always be made prettier but that's not the goal here).

@bodkan
Copy link
Owner Author

bodkan commented Apr 24, 2023

The issue comes down to the ordering of populations along the x-axis. The current slendr plot_model algorithm simply sorts populations according to the split time, and arranges the population polygons in sort of a cascading layout going from left to right (picture above).

My first idea was to switch this ordering algorithm to one which orders population not according to the split time, but according to the in-order traversal principle. Implementation is here.

When ran on the example model above, it seems to be working OK?

plot_model(model)

image

plot_model(model, sizes = FALSE)

image

@bodkan
Copy link
Owner Author

bodkan commented Apr 24, 2023

A couple of more tests using models that @IsabelMarleen implemented for her thesis. All of those were completely unreadable with the old plot_model() but look quite nice now. (Except for the gene flow arrows which... I'm not entirely sure how to fix for extremely complex migration models and probably won't be tackling here.)

  • Elephant demographic history:

image

Previous version of this was next to useless:

image

  • Gutenkunst et al. model imported into slendr via Isabel's new slendr importer function from the Demes spec:

image

  • Jacobs et al. model imported from the Demes spec:

image

@bodkan
Copy link
Owner Author

bodkan commented Apr 24, 2023

Hmm, another test -- what if there are multiple ancestral populations (i.e. populations without a parental population?).

o1 <- population("outgroup1", time = 1, N = 5000)
pop_ancA <- population("pop_ancA", time = 100, parent = o1, N = 3000, remove = 200)
popA1 <- population("popA1", time = 200, parent = pop_ancA, N = 400)
popA2 <- population("popA2", time = 200, parent = pop_ancA, N = 10000)
o2 <- population("outgroup2", time = 1, N = 7000, remove = 800)
popB1 <- population("popB1", time = 800, parent = o2, N = 200)
popB2 <- population("popB2", time = 800, parent = o2, N = 2000)
popX <- population("popX", time = 801, parent = popB1, N = 1500)
popY <- population("popY", time = 802, parent = popX, N = 8500)
pop_ancC <- population("pop_ancC", time = 700, parent = o1, N = 2000, remove = 1000)
popC1 <- population("popC1", time = 901, parent = pop_ancC, N = 550)
popC2 <- population("popC2", time = 901, parent = pop_ancC, N = 7500)

model <- compile_model(
  list(o1, o2, pop_ancA, popA1, popA2, popB1, popB2, popX, popY, pop_ancC, popC1, popC2),
  generation_time = 1, simulation_length = 1000
)

plot_model(model)

Old plot_model():

image

New plot_model() from this PR:

image

Definitely more useful.

I am kind of bothered by the overlapping population labels which can also obscure the splitting lineages (observe the purple "popB1" overlapping with darkish blue "popX", and also obscuring where is the split happening from-to and when).

I wonder if labels of populations which are not themselves parents of something could be plotted on the bottom? The overlap will always be there, unfortunately, but at least the split of lineages will be visible?

@bodkan
Copy link
Owner Author

bodkan commented Apr 24, 2023

It was also appropriate to fix #131 here.

@bodkan
Copy link
Owner Author

bodkan commented Apr 24, 2023

I am kind of bothered by the overlapping population labels which can also obscure the splitting lineages (observe the purple "popB1" overlapping with darkish blue "popX", and also obscuring where is the split happening from-to and when).

I wonder if labels of populations which are not themselves parents of something could be plotted on the bottom? The overlap will always be there, unfortunately, but at least the split of lineages will be visible?

OK, I ended up placing labels of populations which:

  • are not ancestral
  • do not have more than one daughter population

along the bottom of the output.

image

I think this looks much nicer than the first draft version:

In particular in the model which has many split events happening quickly in sequence, the splits themselves are no longer obscured by the labels (image below). Not super clear but much clearer. Maybe distributing colors of lineages not in a gradient with colors closer the closer the populations are along the x-axis would help? Although significant jumps in color wouldn't look that nice. I think I'll keep things this way, for now.

image

I have a feeling something might be still off with plotting on a log10-scale y-axis. I will think about that tomorrow.

That said, I think we're there, @IsabelMarleen! I will ping you again once I merge the PR (tomorrow, I think). You'll then be able to update figures in your thesis.

@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2023

Codecov Report

Merging #135 (03444e5) into main (109732f) will increase coverage by 0.10%.
The diff coverage is 98.18%.

@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   83.77%   83.88%   +0.10%     
==========================================
  Files           6        6              
  Lines        3076     3103      +27     
==========================================
+ Hits         2577     2603      +26     
- Misses        499      500       +1     
Impacted Files Coverage Δ
R/visualization.R 64.12% <98.18%> (+2.37%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@bodkan
Copy link
Owner Author

bodkan commented Apr 25, 2023

A few minor updates and fixes (in particular, y-axis log10 time scale turned out to be broken for forward-time models). I also changed the color of the "population split lines" to match the population that is diverging from its parent population, rather than the other way around.

Screenshot 2023-04-25 at 13 46 54

I could imagine tweaking and aligning pixels here and there but... I think two days was more than enough time to spend on this, so I will merge the PR as soon as all CI tests pass.

I will likely decorate this with a v0.6.0 version too as there have been reasonably useful fixes and changes implemented since the last version that would be nice to have on CRAN for everyone out there.

@bodkan bodkan merged commit e28aa00 into main Apr 25, 2023
@bodkan bodkan deleted the fixing-plot-model branch April 25, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants