-
Notifications
You must be signed in to change notification settings - Fork 6
/
simulating_batch_effects.Rmd
140 lines (119 loc) · 4.25 KB
/
simulating_batch_effects.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "Advanced: Simulating batch effects"
output: rmarkdown::html_vignette
vignette: >
%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{Advanced: Simulating batch effects}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
<!-- github markdown built using
rmarkdown::render("vignettes/simulating_batch_effects.Rmd", output_format = rmarkdown::github_document())
-->
An easy way of simulating batch effects is by performing multiple dyngen runs, but with different kinetics.
dyngen samples the GRN kinetics randomly from a set of predefined distributions (See `?generate_kinetics`).
## Create underlying gene regulatory network
First, create the 'common' part of the dyngen simulation as follows. This will
ensure that the gene regulatory network-part of the simulation is exactly the same.
```{r init, message=FALSE, fig.width = 10, fig.height = 6}
library(tidyverse)
library(dyngen)
set.seed(1)
backbone <- backbone_bifurcating()
config <-
initialise_model(
backbone = backbone,
num_cells = 1000,
num_tfs = nrow(backbone$module_info),
num_targets = 250,
num_hks = 250,
simulation_params = simulation_default(
census_interval = 10,
ssa_algorithm = ssa_etl(tau = 300 / 3600),
experiment_params = simulation_type_wild_type(num_simulations = 100)
)
)
```
```{r init2, include=TRUE}
# the simulation is being sped up because rendering all vignettes with one core
# for pkgdown can otherwise take a very long time
set.seed(1)
config <-
initialise_model(
backbone = backbone,
num_cells = 1000,
num_tfs = nrow(backbone$module_info),
num_targets = 50,
num_hks = 50,
verbose = interactive(),
download_cache_dir = tools::R_user_dir("dyngen", "data"),
simulation_params = simulation_default(
census_interval = 5,
ssa_algorithm = ssa_etl(tau = .01),
experiment_params = simulation_type_wild_type(num_simulations = 10)
)
)
```
```{r model, message=FALSE, fig.width = 10, fig.height = 6}
model_common <-
config %>%
generate_tf_network() %>%
generate_feature_network()
```
## Simulate multiple times with different kinetics
Now you can run the simulation multiple times.
Note that for each separate run, the `generate_kinetics()` step ensures that the
thermodynamic parameters will be different (yet drawn from the same distribution).
Some values for the kinetics of the main transcription factors are not changed,
to ensure that the desired dynamic process is obtained.
```{r simulation}
model_a <- model_common %>%
generate_kinetics() %>%
generate_gold_standard() %>%
generate_cells()
model_b <- model_common %>%
generate_kinetics() %>%
generate_gold_standard() %>%
generate_cells()
```
The differences if kinetics parameters can be visualised as follows.
```{r compare_kinetics}
params_a <-
dyngen:::.kinetics_extract_parameters_as_df(model_a$feature_info, model_a$feature_network) %>%
select(id, group = param, model_a = value)
params_b <-
dyngen:::.kinetics_extract_parameters_as_df(model_b$feature_info, model_b$feature_network) %>%
select(id, group = param, model_b = value)
params_ab <-
inner_join(params_a, params_b, by = c("id", "group"))
ggplot(params_ab) +
geom_point(aes(model_a, model_b, colour = group)) +
facet_wrap(~group, scales = "free") +
theme_bw()
```
## Combine outputs and visualise
Finally, combine the simulations as follows.
```{r combine}
model_ab <-
combine_models(list(left = model_a, right = model_b)) %>%
generate_experiment()
```
Show a dimensionality reduction.
```{r plot, fig.width = 10, fig.height = 6}
plot_simulations(model_ab)
plot_gold_mappings(model_ab, do_facet = FALSE)
```
Visualise the dataset using dyno.
```{r dyno, fig.width = 10, fig.height = 6}
library(dyno)
dataset <- as_dyno(model_ab)
plot_dimred(dataset)
plot_heatmap(dataset, features_oi = 50)
```
## Additional thoughts
It's definitely possible to think of different ways of simulating batch effects.
For instance, you could implement a different strategy for changing the kinetics parameters
between runs. You could also change the gene regulatory network itself, e.g. by changing the
effect of some of the edges, or changing the targets of certain regulators.
If you come up with something, feel free to let us know on GitHub!