Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simulation_type_knockdown adds more cells to KO #26

Closed
HectorRDB opened this issue Jan 27, 2021 · 11 comments
Closed

simulation_type_knockdown adds more cells to KO #26

HectorRDB opened this issue Jan 27, 2021 · 11 comments

Comments

@HectorRDB
Copy link

Hi,
I have been following the knockout vignette from the devel branch and I noticed a weird behavior
If I run, with no effect (multiplier is 1)

  backbone <- backbone_bifurcating()
  model_common <-
    initialise_model(
      backbone = backbone,
      num_cells = 100,
      num_tfs = nrow(backbone$module_info),
      num_targets = 250,
      num_hks = 250,
      simulation_params = simulation_default(
        census_interval = 10, 
        ssa_algorithm = ssa_etl(tau = 300 / 3600),
        experiment_params = simulation_type_wild_type(num_simulations = 2)
      )
    ) %>%
    generate_tf_network() %>%
    generate_feature_network() %>% 
    generate_kinetics() %>%
    generate_gold_standard()
  
  model_wt <- model_common %>%
    generate_cells()
  b3_genes <- model_common$feature_info %>% filter(module_id == "B3") %>% pull(feature_id)
  model_ko <- model_common
  model_ko$simulation_params$experiment_params <- simulation_type_knockdown(
    num_simulations = 2,
    timepoint = 0, 
    genes = b3_genes,
    num_genes = length(b3_genes),
    multiplier = 1
  )
  model_ko <- model_ko %>%
    generate_cells()
  model_comb <-
    combine_models(list(WT = model_wt, KO = model_ko)) %>% 
    generate_experiment()

I don't get identical datasets:

table(table(str_detect(model_comb$simulations$meta$from, "WT")))

FALSE  TRUE 
  186   184 
ggplot(df, aes(x = time, fill = str_detect(from, "WT"))) + geom_density()
ggplot(df, aes(x = sim_time, fill = str_detect(from, "WT"))) + geom_density(alpha = .5)

image

image

I am missing something?
Best

@rcannood
Copy link
Member

Hey Hector!

What do you expect to happen? That you get identical cell profiles as output? Or that you get the same number of cells from both WT and KO?

I assume you set the num_simulations to 2 for the purpose of the reproducible example. If not, you should probably increase the number of simulations.

Do the sections below answer your question?

Robrecht

Identical cell profiles

If you're expecting identical cell profiles: The simulations are still random. For instance, if I run the code above and visualise the model with plot_gold_mappings(model_comb), I get the following plot.

Code
library(tidyverse)
library(dyngen)

set.seed(1)
backbone <- backbone_bifurcating()
model_common <-
  initialise_model(
    backbone = backbone,
    num_cells = 100,
    num_tfs = nrow(backbone$module_info),
    num_targets = 250,
    num_hks = 250,
    simulation_params = simulation_default(
      census_interval = 10, 
      ssa_algorithm = ssa_etl(tau = 300 / 3600),
      experiment_params = simulation_type_wild_type(num_simulations = 2)
    )
  ) %>%
  generate_tf_network() %>%
  generate_feature_network() %>% 
  generate_kinetics() %>%
  generate_gold_standard()

model_wt <- model_common %>%
  generate_cells()
b3_genes <- model_common$feature_info %>% filter(module_id == "B3") %>% pull(feature_id)
model_ko <- model_common
model_ko$simulation_params$experiment_params <- simulation_type_knockdown(
  num_simulations = 2,
  timepoint = 0, 
  genes = b3_genes,
  num_genes = length(b3_genes),
  multiplier = 1
)
model_ko <- model_ko %>%
  generate_cells()
model_comb <-
  combine_models(list(WT = model_wt, KO = model_ko)) %>% 
  generate_experiment()

plot_gold_mappings(model_comb)

Rplot

The first frame is the gold standard for both WT and KO, so there's no difference between them. The second and third frames are the two WT simulations, whereas the fourth and fifth are the KO simulations. With this specific seed, one WT simulation went to the end state on the left and one went to the right, whereas both KO simulations went to the right. If you increase the number of simulations, on average you will find the same kinds of cell profiles, but they will still be different, of course.

Same cell numbers

If you're simply expecting the same number of cells from KO as from WT, I've just pushed a commit to dyngen@devel that allows you to run the cell sampling experiment before combining the models. However, this will yield twice the number of cells specified by the num_cells parameter.

Example
set.seed(1)
backbone <- backbone_bifurcating()
model_common <-
  initialise_model(
    backbone = backbone,
    num_cells = 50,
    num_tfs = nrow(backbone$module_info),
    num_targets = 250,
    num_hks = 250,
    simulation_params = simulation_default(
      census_interval = 10, 
      ssa_algorithm = ssa_etl(tau = 300 / 3600),
      experiment_params = simulation_type_wild_type(num_simulations = 2)
    )
  ) %>%
  generate_tf_network() %>%
  generate_feature_network() %>% 
  generate_kinetics() %>%
  generate_gold_standard()

model_wt <- model_common %>%
  generate_cells() %>% 
  generate_experiment()

b3_genes <- model_common$feature_info %>% filter(module_id == "B3") %>% pull(feature_id)
model_ko <- model_common
model_ko$simulation_params$experiment_params <- simulation_type_knockdown(
  num_simulations = 2,
  timepoint = 0, 
  genes = b3_genes,
  num_genes = length(b3_genes),
  multiplier = 1
)
model_ko <- model_ko %>%
  generate_cells() %>% 
  generate_experiment()
model_comb <-
  combine_models(list(WT = model_wt, KO = model_ko))

model_comb$experiment$cell_info$model %>% table

@HectorRDB
Copy link
Author

HectorRDB commented Jan 27, 2021

Hey Robrecht,
Thanks for the quick answer. Sorry, I should have been clearer on my first issue.

  • Yes indeed, I set 2 here so the code can run faster, I set it to higher values in practice.
  • I indeed have two issues but i think they are related:

The KO experiment always has more cells. Indeed, here, each category of model_comb$simulations$meta$simulation_i will have exactly 92 cells (if its' in WT) and 93 (if it's in KO). Moreover, this number seem to be a function of the census_interval interval argument, not num_cells.

This "extra" cell in KO always has a sim_time variable of 0, i.e. it's not at all random: the second plot I showed is identical if I set n_simulations to 100 or even 500. This means that the two datasets are not fully comparable.
I've done checks comparing the two distributions of sim_time (and time since I am unsure which of those two variables should reflect the truth) when multiplier is 1 and the distribution of p-values is very much skewed toward zero.

I might be misunderstanding the various outputs from the simulation but this all seems weird to me.
Thanks for the help

@HectorRDB
Copy link
Author

HectorRDB commented Jan 28, 2021

Ok, I think I found the issue and it was on my end...
I set the seed before all simulations, which explain why the time distribution was unmoving. Just changing the seed fixes the issue. This solves the imbalance issue.
I still have the question of the number of cells. It seems to be a function of the census_interval interval argument, not num_cells.
Thanks

@rcannood
Copy link
Member

Good to hear you solved the issue about the time distribution.

With regard to the number of cells: the number of samples that are being generated throughout the simulation is equal to:

total_simtime <- simtime_from_backbone(model$backbone)
sample_interval <- model$simulation_params$census_interval
num_simulations <- nrow(model$simulation_params$experiment_params)

num_samples <- floor(total_simtime / sample_interval) * num_simulations

If num_samples < model$numbers$num_cells, then there will be only num_samples cells, otherwise the full amount will be sampled. Maybe I should let dyngen show a warning when this occurs. If it does, simply decrease the census interval a bit.

Can you confirm that this was indeed the case?

Robrecht

@HectorRDB
Copy link
Author

Thanks!! So with the parameters from above, I get num_samples is 138, which is higher than model$numbers$num_cells (100).

I then generate the wild type ( model_wt <- model_common %>% generate_cells() ) but this object has 178 cells, not 100 or even 138

nrow(model_wt$simulations$dimred)
[1] 178

Am I missing something?

rcannood added a commit that referenced this issue Jun 21, 2021
…backbone segments does not have any simulations steps (#26).
@rcannood
Copy link
Member

Hey @HectorRDB,

Sorry for the delay, I finally got around to taking a look at this problem.

I think dyngen was actually generating fewer cells than expected (in the first code example, we expect model$simulations$meta to have 200 rows since generate_experiment() is being called twice).

I solved a bug fix and pushed the fix to devel. I'll publish the fix on CRAN asap.

Does this solve your problem?

Robrecht

@rcannood
Copy link
Member

Also, nrow(model_wt$simulations$dimred) should be much greater than num_cells (ideally, >×10). If it isn't, then the census_interval needs to be decreased or the number of simulations should be increased (See ?simulation_default).

@rcannood rcannood mentioned this issue Jun 21, 2021
@HectorRDB
Copy link
Author

Hi @rcannood
I won't have time in the next few days to look at it but I'll keep you updated when I get around to it. Thanks for checking it out though.
best

@rcannood
Copy link
Member

Hey @HectorRDB,

Do you still want to get back to this issue or shall I just close it?

Kind regards,
Robrecht

@HectorRDB
Copy link
Author

You can close it, I took a quick look and it indeed seem to work. Thanks a lot for the help

@rcannood
Copy link
Member

rcannood commented Dec 1, 2021

👍

@rcannood rcannood closed this as completed Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants