Permalink
Newer
Older
100644 508 lines (359 sloc) 21.7 KB
Mar 15, 2016 @cheeyeelim Changes for revision round.
1 - Brief introduction
2 - Installation
3 - Input data format
4 - Output format
5 - Useful functions in BTR
6 - Example workflows
7 - Inferring model without an initial model
8 - Full workflow
9 - Initial setup
10 - Data preparation
11 - Run model training
12 - Inferring model with an initial model
13 - Full workflow
14 - Initial setup
15 - Data preparation
16 - Run model training
17 - Extending model with more genes
18 - Full workflow
19 - Initial setup
20 - Data preparation
21 - Add extra genes to the initial Boolean model
22 - Estimate initial state for the extra genes
23 - Run model training
Dec 3, 2015 @cheeyeelim Final update
24
25 <!---
26 rmarkdown::pdf_document:
27 toc: true
28 number_sections: true
29 rmarkdown::md_document:
30 variant: markdown_github
31 toc: true
32 -->
Nov 29, 2015 @cheeyeelim Generate readme
33 Brief introduction
34 ==================
35
Dec 11, 2015 @cheeyeelim Fix names
36 `BTR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users.
Nov 29, 2015 @cheeyeelim Generate readme
37
Dec 11, 2015 @cheeyeelim Fix names
38 Running `BTR` is straightforward. However, note that depending on the (1) size of single-cell expression data and (2) complexity of Boolean model, `BTR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BTR`. This can be easily achieved by using `doParallel` package, as illustrated in the example.
Nov 29, 2015 @cheeyeelim Generate readme
39
40 Note that the examples presented in this vignette are different from the results presented in our paper. The examples presented here have been simplified to speed up the processing time.
41
42 Installation
43 ============
44
Dec 11, 2015 @cheeyeelim Fix names
45 `BTR` can be installed from CRAN.
Nov 29, 2015 @cheeyeelim Generate readme
46
47 ``` r
Dec 11, 2015 @cheeyeelim Fix names
48 install.packages('BTR')
Nov 29, 2015 @cheeyeelim Generate readme
49 ```
50
Dec 3, 2015 @cheeyeelim Final update
51 Or from Github for the latest version. To install from Gitbub, you will require the `devtools` package.
Nov 29, 2015 @cheeyeelim Generate readme
52
53 ``` r
Dec 3, 2015 @cheeyeelim Final update
54 install.packages('devtools')
Dec 11, 2015 @cheeyeelim Fix names
55 devtools::install_github("cheeyeelim/BTR")
Nov 29, 2015 @cheeyeelim Generate readme
56 ```
57
58 Also install `doParallel` package if you intend to use parallel processing.
59
60 Input data format
61 =================
62
63 Depending on the analysis, only 3 types of data will ever be needed. The format of the data required is discussed below.
64
65 1. Expression data. A matrix with genes on the columns, and cells on the row.
66
67 The expression data should be preprocessed as in any standard sequencing data processing pipelines, which includes quality control filtering and normalisation.
68
Dec 3, 2015 @cheeyeelim Final update
69 Use `initialise_raw_data` to convert expression data into a suitable format for model inference. It is recommended to use `initialise_raw_data` before subsetting the expression data for preferred cell types.
Nov 29, 2015 @cheeyeelim Generate readme
70
71 ``` r
72 data(wilson_raw_data)
Dec 3, 2015 @cheeyeelim Final update
73 round(wilson_raw_data[1:5, 1:5], 4)
Nov 29, 2015 @cheeyeelim Generate readme
74 ```
75
76 | | bptf| cbfa2t3h| csf1r| dnmt3a| eif2b1|
77 |-----------|--------:|---------:|--------:|--------:|-------:|
78 | lmpp\_002 | 1.0261| 2.3944| 2.6847| 1.6636| 2.0203|
79 | lmpp\_003 | 2.6496| 1.7800| 1.6821| 1.5941| 2.7736|
80 | lmpp\_004 | 10.3080| 0.5889| 4.2653| -0.5565| 0.0026|
81 | lmpp\_007 | 0.5419| 1.8631| 10.8468| 0.1757| 1.0873|
82 | lmpp\_008 | 0.9209| 2.6637| 2.8549| 2.1965| 2.3663|
83
Dec 3, 2015 @cheeyeelim Final update
84 ``` r
85 edata = initialise_raw_data(wilson_raw_data, max_expr='low') #max_expr='low' because this is qPCR data.
86 ```
87
Nov 29, 2015 @cheeyeelim Generate readme
88 1. Initial Boolean model. A data frame with two columns, targets and update functions.
89
90 Note that if an update function contains both activation and inhibition genes, they must be expressed with a separate clause containing only activation genes, and a separate clause containing only inhibition genes. (See the update functions of Gata1 and Gata2 for examples)
91
92 Use `initialise_model` to convert the input Boolean model into a BoolModel object.
93
94 ``` r
95 data(krum_bmodel)
96 head(krum_bmodel)
97 ```
98
99 | targets | factors |
100 |:--------|:-----------------------------------|
101 | gata2 | gata2 & ! ((gata1 & fog1) | sfpi1) |
102 | gata1 | (gata1 | gata2 | fli1) & ! sfpi1 |
103 | fog1 | gata1 |
104 | eklf | gata1 & ! fli1 |
105 | fli1 | gata1 & ! eklf |
106 | scl | gata1 & ! sfpi1 |
107
Dec 3, 2015 @cheeyeelim Final update
108 ``` r
109 bmodel = initialise_model(krum_bmodel)
110 ```
111
Nov 29, 2015 @cheeyeelim Generate readme
112 1. Initial state.
113
114 A single row data frame with genes as the columns. The expression state of each gene must be in binarised form, i.e. 0s and 1s.
115
116 Note that all the genes that are present in the initial Boolean model must also be present here.
117
118 ``` r
119 data(krum_istate)
120 head(krum_istate)
121 ```
122
123 | | cjun| cebpa| fli1| gata1| gata2| eklf| sfpi1| gfi1| scl| egrnab| fog1|
124 |----------------|-----:|------:|-----:|------:|------:|-----:|------:|-----:|----:|-------:|-----:|
125 | initial\_state | 0| 1| 0| 0| 1| 0| 1| 0| 0| 0| 0|
126
127 Output format
128 =============
129
Dec 11, 2015 @cheeyeelim Fix names
130 BTR supports several output formats for Boolean models, as shown below.
Nov 29, 2015 @cheeyeelim Generate readme
131
132 - `outgraph_model` - Outputs a Boolean model in a tab-delimited file with each line being an edge (i.e. gene interaction). This function also outputs a node attribute file, which can be used to distinguish gene and AND nodes in a graph plotting software. This format is readable by both Cytoscape and Gephi.
133 - `outgenysis_model` - Outputs a Boolean model in a space-delimited file with each line being an edge (i.e. gene interaction). This format is readable by genYsis (used for steady state analysis).
134 - `writeBM` - Outputs a Boolean model in a comma-delimited file similar in format to the input file format (i.e. two columns: genes and update functions).
135
Dec 11, 2015 @cheeyeelim Fix names
136 BTR can also output a state transition graph.
Nov 29, 2015 @cheeyeelim Generate readme
137
138 - `outstate_graph` - Outputs a state space of a Boolean model simulated with an initial state. This format is readable by both Cytoscape and Gephi.
139
Dec 11, 2015 @cheeyeelim Fix names
140 Useful functions in BTR
Mar 15, 2016 @cheeyeelim Changes for revision round.
141 =======================
Nov 29, 2015 @cheeyeelim Generate readme
142
Dec 11, 2015 @cheeyeelim Fix names
143 Besides training Boolean models, BTR can be used for simulating a Boolean model asynchronously and calculate the score of a Boolean model with respect to a data.
Nov 29, 2015 @cheeyeelim Generate readme
144
Dec 11, 2015 @cheeyeelim Fix names
145 - `model_train` - Core function in `BTR` that performs Boolean model inference.
Nov 29, 2015 @cheeyeelim Generate readme
146 - `simulate_model` - Simulate a Boolean model asynchronously using an initial state, and return its state space.
147 - `calc_mscore` - Calculate a distance score for a Boolean model with respect to an expression data.
148 - `model_dist` - Calculate the number of genes in the update functions that differ between two Boolean models.
149 - `model_setdiff` - Show the genes in the update functions that differ between two Boolean models.
150
151 Example workflows
152 =================
153
154 Three example workflows will be discussed in this vignette: (1) Inferring model without an initial model, (2) Inferring model with an initial model, (3) Extending model with more genes. The two workflows are largely similar, which only differ in the data preparation step.
155
156 Inferring model without an initial model
157 ----------------------------------------
158
159 This workflow is intended for use on inferring a Boolean model without an initial model.
160
Dec 11, 2015 @cheeyeelim Fix names
161 When no initial model is used, BTR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data.
Nov 29, 2015 @cheeyeelim Generate readme
162
163 ### Full workflow
164
165 Full workflow is included here for easy referencing. Each step is discussed in further details below.
166
167 ``` r
168 set.seed(0) #use to ensure reproducibility. remove in actual use.
169
170 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
171 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
172
173 # If intending to use parallel processing, uncomment the following lines.
174 # library(doParallel) num_core = 4 #specify the number of cores to be used.
175 # doParallel::registerDoParallel(cores=num_core)
176
177 # (2) Load data.
178 data(wilson_raw_data) #load a data frame of expression data.
Mar 15, 2016 @cheeyeelim Changes for revision round.
179 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
180
181 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
182 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
183 rownames(cdata))
184 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
185
186 # (4) Filter genes.
187 gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1") #select genes to be included.
Dec 3, 2015 @cheeyeelim Final update
188 fcdata = fcdata[, gene_ind]
Nov 29, 2015 @cheeyeelim Generate readme
189
190 # (5) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
191 final_model = model_train(cdata = fcdata, max_varperrule = 3, verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
192
193 # (6) Visualise the Boolean model generated.
194 plotBM(final_model)
195 ```
196
197 ### Initial setup
198
Dec 11, 2015 @cheeyeelim Fix names
199 The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
Nov 29, 2015 @cheeyeelim Generate readme
200
201 ``` r
Dec 3, 2015 @cheeyeelim Final update
202 set.seed(0) #use to ensure reproducibility. remove in actual use.
Nov 29, 2015 @cheeyeelim Generate readme
203
Dec 3, 2015 @cheeyeelim Final update
204 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
205 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
206
Dec 3, 2015 @cheeyeelim Final update
207 # If intending to use parallel processing, uncomment the following lines.
208 # library(doParallel) num_core = 4 #specify the number of cores to be used.
209 # doParallel::registerDoParallel(cores=num_core)
Nov 29, 2015 @cheeyeelim Generate readme
210 ```
211
212 ### Data preparation
213
214 Only the expression data is needed for inferring a Boolean model without an initial model.
215
216 To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`.
217
Dec 3, 2015 @cheeyeelim Final update
218 `initialise_raw_data` is used to preprocess the data.
219
Nov 29, 2015 @cheeyeelim Generate readme
220 ``` r
Dec 3, 2015 @cheeyeelim Final update
221 # (2) Load data.
222 data(wilson_raw_data) #load a data frame of expression data.
Mar 15, 2016 @cheeyeelim Changes for revision round.
223 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
224 ```
225
Dec 3, 2015 @cheeyeelim Final update
226 Once data is loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete.
Nov 29, 2015 @cheeyeelim Generate readme
227
228 ``` r
229 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
230 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
231 rownames(cdata))
232 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
233
234 # (4) Filter genes.
235 gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1") #select genes to be included.
Dec 3, 2015 @cheeyeelim Final update
236 fcdata = fcdata[, gene_ind]
Nov 29, 2015 @cheeyeelim Generate readme
237 ```
238
239 ### Run model training
240
241 To reconstruct a Boolean model from an expression data, run `model_train`.
242
243 In this example, `model_train` takes a few seconds to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above.
244
245 You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise quickly using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi.
246
247 ``` r
Dec 3, 2015 @cheeyeelim Final update
248 # (5) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
249 final_model = model_train(cdata = fcdata, max_varperrule = 3, verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
250
Dec 3, 2015 @cheeyeelim Final update
251 # (6) Visualise the Boolean model generated.
Nov 29, 2015 @cheeyeelim Generate readme
252 plotBM(final_model)
253 ```
254
Mar 15, 2016 @cheeyeelim Changes for revision round.
255 ![](vignettes/btr_files/figure-markdown_github/unnamed-chunk-17-1.png)<!-- -->
Nov 29, 2015 @cheeyeelim Generate readme
256
257 Inferring model with an initial model
258 -------------------------------------
259
260 This workflow is intended for use on inferring a Boolean model with an initial model.
261
262 When an initial model is used, note that only genes that are both present in the initial model and expression data will be used for reconstructing gene interactions. Any genes in the initial model that do not have corresponding expression values in the data will keep their original gene interactions as specified in the initial model without any modifications.
263
264 ### Full workflow
265
266 Full workflow is included here for easy referencing. Each step is discussed in further details below.
267
268 ``` r
269 set.seed(0) #use to ensure reproducibility. remove in actual use.
270
271 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
272 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
273
274 # If intending to use parallel processing, uncomment the following lines.
275 # library(doParallel) num_core = 4 #specify the number of cores to be used.
276 # doParallel::registerDoParallel(cores=num_core)
277
278 # (2) Load data.
279 data(krum_bmodel) #load a data frame of Boolean model.
280 data(krum_istate) #load a data frame of initial state.
281 data(wilson_raw_data) #load a data frame of expression data.
282
283 bmodel = initialise_model(krum_bmodel)
284 istate = krum_istate
Mar 15, 2016 @cheeyeelim Changes for revision round.
285 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
286
287 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
288 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
289 rownames(cdata))
290 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
291
292 # (4) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
293 final_model = model_train(cdata = fcdata, bmodel = bmodel, istate = istate,
294 max_varperrule = 3, verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
295
296 # (5) Visualise the Boolean model generated.
297 plotBM(final_model)
298 ```
299
300 ### Initial setup
301
Dec 11, 2015 @cheeyeelim Fix names
302 The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
Nov 29, 2015 @cheeyeelim Generate readme
303
304 ``` r
Dec 3, 2015 @cheeyeelim Final update
305 set.seed(0) #use to ensure reproducibility. remove in actual use.
Nov 29, 2015 @cheeyeelim Generate readme
306
Dec 3, 2015 @cheeyeelim Final update
307 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
308 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
309
Dec 3, 2015 @cheeyeelim Final update
310 # If intending to use parallel processing, uncomment the following lines.
311 # library(doParallel) num_core = 4 #specify the number of cores to be used.
312 # doParallel::registerDoParallel(cores=num_core)
Nov 29, 2015 @cheeyeelim Generate readme
313 ```
314
315 ### Data preparation
316
317 3 pieces of data are needed to infer a Boolean model with an initial model: an expression data, an initial Boolean model and an initial state.
318
319 To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`.
320
Dec 3, 2015 @cheeyeelim Final update
321 `initialise_model` converts the data frame containing the Boolean model into a BoolModel object. `initialise_raw_data` is used to preprocess the data.
Nov 29, 2015 @cheeyeelim Generate readme
322
323 ``` r
Dec 3, 2015 @cheeyeelim Final update
324 # (2) Load data. (2) Load data.
325 data(krum_bmodel) #load a data frame of Boolean model.
326 data(krum_istate) #load a data frame of initial state.
327 data(wilson_raw_data) #load a data frame of expression data.
Nov 29, 2015 @cheeyeelim Generate readme
328
329 bmodel = initialise_model(krum_bmodel)
330 istate = krum_istate
Mar 15, 2016 @cheeyeelim Changes for revision round.
331 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
332 ```
333
Dec 3, 2015 @cheeyeelim Final update
334 Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete. In this example, genes are not filtered as all genes that are present in both expression data and Boolean model are used automatically.
Nov 29, 2015 @cheeyeelim Generate readme
335
336 ``` r
337 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
338 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
339 rownames(cdata))
340 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
341 ```
342
343 ### Run model training
344
345 To reconstruct a Boolean model from an expression data, run `model_train`.
346
Dec 3, 2015 @cheeyeelim Final update
347 In this example, `model_train` takes one or two minutes to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above.
Nov 29, 2015 @cheeyeelim Generate readme
348
349 You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi.
350
351 ``` r
Dec 3, 2015 @cheeyeelim Final update
352 # (4) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
353 final_model = model_train(cdata = fcdata, bmodel = bmodel, istate = istate,
354 max_varperrule = 3, verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
355
Dec 3, 2015 @cheeyeelim Final update
356 # (5) Visualise the Boolean model generated.
Nov 29, 2015 @cheeyeelim Generate readme
357 plotBM(final_model)
358 ```
359
Mar 15, 2016 @cheeyeelim Changes for revision round.
360 ![](vignettes/btr_files/figure-markdown_github/unnamed-chunk-23-1.png)<!-- -->
Nov 29, 2015 @cheeyeelim Generate readme
361
362 Extending model with more genes
363 -------------------------------
364
365 This workflow is intended for use on extending an initial Boolean model with additional genes.
366
367 When an initial model is used, note that only genes that are both present in the initial model and expression data will be used for reconstructing gene interactions. Any genes in the initial model that do not have corresponding expression values in the data will keep their original gene interactions as specified in the initial model without any modifications.
368
369 ### Full workflow
370
371 Full workflow is included here for easy referencing. Each step is discussed in further details below.
372
373 *Note that this example takes a few minutes to run on a single core. The use of parallel processing is recommended.*
374
375 ``` r
376 set.seed(0) #use to ensure reproducibility. remove in actual use.
377
378 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
379 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
380
381 # If intending to use parallel processing, uncomment the following lines.
382 # library(doParallel) num_core = 4 #specify the number of cores to be used.
383 # doParallel::registerDoParallel(cores=num_core)
384
385 # (2) Load data.
386 data(krum_bmodel) #load a data frame of Boolean model.
387 data(krum_istate) #load a data frame of initial state.
388 data(wilson_raw_data) #load a data frame of expression data.
389
390 bmodel = initialise_model(krum_bmodel)
391 istate = krum_istate
Mar 15, 2016 @cheeyeelim Changes for revision round.
392 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
393
394 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
395 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
396 rownames(cdata))
397 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
398
399 # (4) Adding extra genes to the initial Boolean model. extra_genes =
400 # setdiff(colnames(wilson_raw_data), bmodel@target) #to view available genes
Dec 3, 2015 @cheeyeelim Final update
401 # to be added. print(extra_genes) #to view available genes to be added.
Nov 29, 2015 @cheeyeelim Generate readme
402 add_gene = "ldb1" #genes to be added: ldb1
403 grown_bmodel = grow_bmodel(add_gene, bmodel)
404
Dec 3, 2015 @cheeyeelim Final update
405 # (5) Estimating initial state for the extra genes. (estimating from CMPs)
406 tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene])
Nov 29, 2015 @cheeyeelim Generate readme
407 tmp_istate = matrix(round(tmp_istate), nrow = 1)
408 colnames(tmp_istate) = add_gene
409 grown_istate = cbind(istate, tmp_istate)
410 grown_istate = initialise_data(grown_istate)
411
412 # (6) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
413 final_model = model_train(cdata = fcdata, bmodel = grown_bmodel, istate = grown_istate,
414 verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
415
416 # (7) Visualise the Boolean model generated.
417 plotBM(final_model)
418 ```
419
420 ### Initial setup
421
Dec 11, 2015 @cheeyeelim Fix names
422 The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
Nov 29, 2015 @cheeyeelim Generate readme
423
424 ``` r
Dec 3, 2015 @cheeyeelim Final update
425 set.seed(0) #use to ensure reproducibility. remove in actual use.
Nov 29, 2015 @cheeyeelim Generate readme
426
Dec 3, 2015 @cheeyeelim Final update
427 # (1) Setup paths and environment.
Dec 11, 2015 @cheeyeelim Fix names
428 library(BTR)
Nov 29, 2015 @cheeyeelim Generate readme
429
Dec 3, 2015 @cheeyeelim Final update
430 # If intending to use parallel processing, uncomment the following lines.
431 # library(doParallel) num_core = 4 #specify the number of cores to be used.
432 # doParallel::registerDoParallel(cores=num_core)
Nov 29, 2015 @cheeyeelim Generate readme
433 ```
434
435 ### Data preparation
436
437 3 pieces of data are needed to infer a Boolean model with an initial model: an expression data, an initial Boolean model and an initial state.
438
439 To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`.
440
Dec 3, 2015 @cheeyeelim Final update
441 `initialise_model` converts the data frame containing the Boolean model into a BoolModel object. `initialise_raw_data` is used to preprocess the data.
Nov 29, 2015 @cheeyeelim Generate readme
442
443 ``` r
Dec 3, 2015 @cheeyeelim Final update
444 # (2) Load data.
445 data(krum_bmodel) #load a data frame of Boolean model.
446 data(krum_istate) #load a data frame of initial state.
447 data(wilson_raw_data) #load a data frame of expression data.
Nov 29, 2015 @cheeyeelim Generate readme
448
449 bmodel = initialise_model(krum_bmodel)
450 istate = krum_istate
Mar 15, 2016 @cheeyeelim Changes for revision round.
451 cdata = initialise_raw_data(wilson_raw_data, max_expr = "low")
Nov 29, 2015 @cheeyeelim Generate readme
452 ```
453
Dec 3, 2015 @cheeyeelim Final update
454 Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete. In this example, genes are not filtered as all genes that are present in both expression data and Boolean model are used automatically.
Nov 29, 2015 @cheeyeelim Generate readme
455
456 ``` r
457 # (3) Filter cell types.
Dec 3, 2015 @cheeyeelim Final update
458 cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep",
459 rownames(cdata))
460 fcdata = cdata[cell_ind, ] #select only relevant cells.
Nov 29, 2015 @cheeyeelim Generate readme
461 ```
462
463 ### Add extra genes to the initial Boolean model
464
465 Extra genes can be added to the initial model using `grow_bmodel`. The function will add extra genes into the initial model with empty update functions.
466
467 ``` r
Dec 3, 2015 @cheeyeelim Final update
468 # (4) Adding extra genes to the initial Boolean model. extra_genes =
469 # setdiff(colnames(wilson_raw_data), bmodel@target) print(extra_genes) #to
470 # view available genes to be added.
471 add_gene = "ldb1" #genes to be added: ldb1
Nov 29, 2015 @cheeyeelim Generate readme
472 grown_bmodel = grow_bmodel(add_gene, bmodel)
473 ```
474
475 ### Estimate initial state for the extra genes
476
477 Initial state needs to be modify to include the initial expression of the extra genes. The initial state of the extra genes can be set manually, or it can be estimated from the data if the data contain multiple cell types with known relationships. In this example, CMPs are known to be at developmental upstream of erythro-myeloid differentiation, therefore initial state can be estimated by taking the average expression of the extra genes in CMPs.
478
479 ``` r
Dec 3, 2015 @cheeyeelim Final update
480 # (5) Estimating initial state for the extra genes. (estimating from CMPs)
481 tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene])
482 tmp_istate = matrix(round(tmp_istate), nrow = 1)
Nov 29, 2015 @cheeyeelim Generate readme
483 colnames(tmp_istate) = add_gene
484 grown_istate = cbind(istate, tmp_istate)
485 grown_istate = initialise_data(grown_istate)
486 ```
487
488 ### Run model training
489
490 To reconstruct a Boolean model from an expression data, run `model_train`.
491
492 In this example, `model_train` takes a few minutes to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above.
493
494 You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi.
495
496 *Note that this example takes a long time to run. The use of parallel processing is recommended.*
497
498 ``` r
Dec 3, 2015 @cheeyeelim Final update
499 # (6) Inferring Boolean model.
Mar 15, 2016 @cheeyeelim Changes for revision round.
500 final_model = model_train(cdata = fcdata, bmodel = grown_bmodel, istate = grown_istate,
501 verbose = T)
Nov 29, 2015 @cheeyeelim Generate readme
502
Dec 3, 2015 @cheeyeelim Final update
503 # (7) Visualise the Boolean model generated.
Nov 29, 2015 @cheeyeelim Generate readme
504 plotBM(final_model)
505 ```
506
Mar 15, 2016 @cheeyeelim Changes for revision round.
507 ![](vignettes/btr_files/figure-markdown_github/unnamed-chunk-31-1.png)<!-- -->