diff --git a/DESCRIPTION b/DESCRIPTION
index 33ba5b1..295842b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,4 +1,4 @@
-Package: BoolTraineR
+Package: BTR
 Type: Package
 Title: Tools For Training and Analysing Asynchronous Boolean Models
 Version: 1.1.3
diff --git a/NAMESPACE b/NAMESPACE
index 64b9c48..4620035 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,4 +1,4 @@
-# Generated by roxygen2: do not edit by hand
+# Generated by roxygen2 (4.1.1): do not edit by hand

 export(BoolModel)
 export(amat_to_bm)
@@ -36,4 +36,4 @@ import(methods)
 import(parallel)
 importFrom(Rcpp,evalCpp)
 importFrom(Rcpp,sourceCpp)
-useDynLib(BoolTraineR)
+useDynLib(BTR)
diff --git a/R/RcppExports.R b/R/RcppExports.R
index ae6d217..0624ff9 100644
--- a/R/RcppExports.R
+++ b/R/RcppExports.R
@@ -9,7 +9,7 @@
 #' @param inf_mat matrix. It should be adjacency matrix of inferred network.
 #' @param true_mat matrix. It should be adjacency matrix of true network.
 rcpp_validate <- function(inf_mat, true_mat) {
-    .Call('BoolTraineR_rcpp_validate', PACKAGE = 'BoolTraineR', inf_mat, true_mat)
+    .Call('BTR_rcpp_validate', PACKAGE = 'BTR', inf_mat, true_mat)
 }

 #' @title Simulate a Boolean model.
@@ -21,6 +21,6 @@ rcpp_validate <- function(inf_mat, true_mat) {
 #' @param fstate data frame. It must have been initialised by initialise_data(), and has gene names as column names. Must contain only 1 row.
 #' @param verbose logical. Indicates whether to output progress.
 rcpp_simulate <- function(bmodel, fstate, verbose = FALSE) {
-    .Call('BoolTraineR_rcpp_simulate', PACKAGE = 'BoolTraineR', bmodel, fstate, verbose)
+    .Call('BTR_rcpp_simulate', PACKAGE = 'BTR', bmodel, fstate, verbose)
 }

diff --git a/R/btr.R b/R/btr.R
new file mode 100644
index 0000000..9b155d0
--- /dev/null
+++ b/R/btr.R
@@ -0,0 +1,19 @@
+#' @title BTR: A package for studying asynchronous Boolean models
+#'
+#' @description
+#' This package contains tools for Boolean model manipulation, as well as the search for the best Boolean model.
+#'
+#' @docType package
+#' @name BTR
+NULL
+
+## All the Roxygen codes below are for generating the correct NAMESPACE file.
+#' @import methods
+#' @import parallel
+#' @import foreach
+#' @import doParallel
+NULL
+
+#' @useDynLib BTR
+#' @importFrom Rcpp sourceCpp evalCpp
+NULL
\ No newline at end of file
diff --git a/README.md b/README.md
index f5ca867..ea54168 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 - [Installation](#installation)
 - [Input data format](#input-data-format)
 - [Output format](#output-format)
-- [Useful functions in BoolTraineR](#useful-functions-in-booltrainer)
+- [Useful functions in BTR](#useful-functions-in-btr)
 - [Example workflows](#example-workflows)
 - [Inferring model without an initial model](#inferring-model-without-an-initial-model)
 - [Full workflow](#full-workflow)
@@ -33,26 +33,26 @@
 Brief introduction
 ==================

-`BoolTraineR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users.
+`BTR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users.

-Running `BoolTraineR` is straightforward. However, note that depending on the (1) size of single-cell expression data and (2) complexity of Boolean model, `BoolTraineR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BoolTraineR`. This can be easily achieved by using `doParallel` package, as illustrated in the example.
+Running `BTR` is straightforward. However, depending on (1) the size of the single-cell expression data and (2) the complexity of the Boolean model, `BTR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BTR`, which can be enabled by registering a parallel backend with the `doParallel` package, as illustrated in the example.

 Note that the examples presented in this vignette are different from the results presented in our paper. The examples presented here have been simplified to speed up the processing time.

 Installation
 ============

-`BoolTraineR` can be installed from CRAN.
+`BTR` can be installed from CRAN.

 ``` r
-install.packages('BoolTraineR')
+install.packages('BTR')
 ```

 Or from Github for the latest version. To install from Gitbub, you will require the `devtools` package.

 ``` r
 install.packages('devtools')
-devtools::install_github("cheeyeelim/booltrainer")
+devtools::install_github("cheeyeelim/BTR")
 ```

 Also install `doParallel` package if you intend to use parallel processing.
@@ -127,22 +127,22 @@ head(krum_istate)
 ```
 Output format
 =============
-BoolTraineR supports several output formats for Boolean models, as shown below.
+BTR supports several output formats for Boolean models, as shown below.

 - `outgraph_model` - Outputs a Boolean model in a tab-delimited file with each line being an edge (i.e. gene interaction). This function also outputs a node attribute file, which can be used to distinguish gene and AND nodes in a graph plotting software. This format is readable by both Cytoscape and Gephi.
 - `outgenysis_model` - Outputs a Boolean model in a space-delimited file with each line being an edge (i.e. gene interaction). This format is readable by genYsis (used for steady state analysis).
 - `writeBM` - Outputs a Boolean model in a comma-delimited file similar in format to the input file format (i.e. two columns: genes and update functions).

-BoolTraineR can also output a state transition graph.
+BTR can also output a state transition graph.

 - `outstate_graph` - Outputs a state space of a Boolean model simulated with an initial state. This format is readable by both Cytoscape and Gephi.

-Useful functions in BoolTraineR
+Useful functions in BTR
 ===============================

-Besides training Boolean models, BoolTraineR can be used for simulating a Boolean model asynchronously and calculate the score of a Boolean model with respect to a data.
+Besides training Boolean models, BTR can also be used to simulate a Boolean model asynchronously and to calculate the score of a Boolean model with respect to expression data.

-- `model_train` - Core function in `BoolTraineR` that performs Boolean model inference.
+- `model_train` - Core function in `BTR` that performs Boolean model inference.
 - `simulate_model` - Simulate a Boolean model asynchronously using an initial state, and return its state space.
 - `calc_mscore` - Calculate a distance score for a Boolean model with respect to an expression data.
 - `model_dist` - Calculate the number of genes in the update functions that differ between two Boolean models.
@@ -158,7 +158,7 @@ Inferring model without an initial model

 This workflow is intended for use on inferring a Boolean model without an initial model.

-When no initial model is used, BoolTraineR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data.
+When no initial model is used, BTR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data.

 ### Full workflow

@@ -168,7 +168,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
@@ -201,13 +201,13 @@ plotBM(final_model)

 ### Initial setup

-The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
+The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.

 ``` r
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
@@ -279,7 +279,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
@@ -312,13 +312,13 @@ plotBM(final_model)

 ### Initial setup

-The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
+The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.

 ``` r
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
@@ -392,7 +392,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
@@ -438,13 +438,13 @@ plotBM(final_model)

 ### Initial setup

-The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.
+The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package.

 ``` r
 set.seed(0) #use to ensure reproducibility. remove in actual use.

 # (1) Setup paths and environment.
-library(BoolTraineR)
+library(BTR)
 # If intending to use parallel processing, uncomment the following lines.
 # library(doParallel)
 num_core = 4 #specify the number of cores to be used.
diff --git a/man/BTR.Rd b/man/BTR.Rd
new file mode 100644
index 0000000..d51fad2
--- /dev/null
+++ b/man/BTR.Rd
@@ -0,0 +1,11 @@
+% Generated by roxygen2 (4.1.1): do not edit by hand
+% Please edit documentation in R/btr.R
+\docType{package}
+\name{BTR}
+\alias{BTR}
+\alias{BTR-package}
+\title{BTR: A package for studying asynchronous Boolean models}
+\description{
+This package contains tools for Boolean model manipulation, as well as the search for the best Boolean model.
+}
+
diff --git a/man/BoolModel-class.Rd b/man/BoolModel-class.Rd
index dce6959..f709c8b 100644
--- a/man/BoolModel-class.Rd
+++ b/man/BoolModel-class.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/boolmodel_class.R
 \docType{class}
 \name{BoolModel-class}
diff --git a/man/amat_to_bm.Rd b/man/amat_to_bm.Rd
index e718a6d..1f40e2c 100644
--- a/man/amat_to_bm.Rd
+++ b/man/amat_to_bm.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{amat_to_bm}
 \alias{amat_to_bm}
diff --git a/man/bm_to_amat.Rd b/man/bm_to_amat.Rd
index 64726ee..9518b00 100644
--- a/man/bm_to_amat.Rd
+++ b/man/bm_to_amat.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{bm_to_amat}
 \alias{bm_to_amat}
diff --git a/man/bm_to_df.Rd b/man/bm_to_df.Rd
index 92bf2fe..f01f9a7 100644
--- a/man/bm_to_df.Rd
+++ b/man/bm_to_df.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{bm_to_df}
 \alias{bm_to_df}
diff --git a/man/bon_bmodel.Rd b/man/bon_bmodel.Rd
index e87c3ef..0fa5960 100644
--- a/man/bon_bmodel.Rd
+++ b/man/bon_bmodel.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{bon_bmodel}
 \alias{bon_bmodel}
 \title{HSC Boolean Model from Bonzanni et al.}
-\format{A data frame with 11 rows and 2 columns.
+\format{A data frame with 11 rows and 2 columns.
 Rows: each row consists of 1 gene and its associated Boolean rule.
 Column 1: target gene
diff --git a/man/bon_istate.Rd b/man/bon_istate.Rd
index e4e2f26..75e26f2 100644
--- a/man/bon_istate.Rd
+++ b/man/bon_istate.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{bon_istate}
 \alias{bon_istate}
 \title{Initial state from Bonzanni et al.}
-\format{A data frame with 1 row and 11 columns.
+\format{A data frame with 1 row and 11 columns.
 Rows: each row consists of 1 set of Boolean state.
 Columns: each column is for 1 gene/variable.}
diff --git a/man/calc_mscore.Rd b/man/calc_mscore.Rd
index fef9140..a1224c5 100644
--- a/man/calc_mscore.Rd
+++ b/man/calc_mscore.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/score_calculation.R
 \name{calc_mscore}
 \alias{calc_mscore}
diff --git a/man/calc_roc.Rd b/man/calc_roc.Rd
index bd11a3c..e89015f 100644
--- a/man/calc_roc.Rd
+++ b/man/calc_roc.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/score_calculation.R
 \name{calc_roc}
 \alias{calc_roc}
diff --git a/man/check_and.Rd b/man/check_and.Rd
index 2d14d1a..d638251 100644
--- a/man/check_and.Rd
+++ b/man/check_and.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{check_and}
 \alias{check_and}
diff --git a/man/compress_bmodel.Rd b/man/compress_bmodel.Rd
index a59b23f..a1c8f2d 100644
--- a/man/compress_bmodel.Rd
+++ b/man/compress_bmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/compression.R
 \name{compress_bmodel}
 \alias{compress_bmodel}
@@ -14,7 +14,7 @@ compress_bmodel(bmodel, encoding, max_varperrule)
 \item{max_varperrule}{integer. Maximum number of terms per rule (combining both act and inh rule). Note that this number must not be smaller than number of variables. Default to 6.}
 }
 \description{
-This function compresses S4 BoolModel object by representing variables using numbers, and also only the act rules and inh rules are kept.
+This function compresses S4 BoolModel object by representing variables using numbers, and also only the act rules and inh rules are kept.
 Return a list of 3 vectors, corresponding to act rules and inh rules.
 }

diff --git a/man/decompress_bmodel.Rd b/man/decompress_bmodel.Rd
index cdd462d..e846067 100644
--- a/man/decompress_bmodel.Rd
+++ b/man/decompress_bmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/compression.R
 \name{decompress_bmodel}
 \alias{decompress_bmodel}
@@ -16,7 +16,7 @@ decompress_bmodel(x, encoding, gene = NULL, format = "bmodel")
 \item{format}{character. Specifies which format to return. Possible values: 'bmodel', 'df', 'amat', 'simp_df'. Default to 'bmodel'.}
 }
 \description{
-This function decompresses the bmodel compressed by compress_bmodel().
+This function decompresses the bmodel compressed by compress_bmodel().
 Return a S4 BoolModel object.
 }

diff --git a/man/decreate_boolmodel.Rd b/man/decreate_boolmodel.Rd
index f5e7602..6b3688d 100644
--- a/man/decreate_boolmodel.Rd
+++ b/man/decreate_boolmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/simulation.R
 \name{decreate_boolmodel}
 \alias{decreate_boolmodel}
diff --git a/man/df_to_bm.Rd b/man/df_to_bm.Rd
index 58e4305..04923a2 100644
--- a/man/df_to_bm.Rd
+++ b/man/df_to_bm.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{df_to_bm}
 \alias{df_to_bm}
diff --git a/man/emodel1.Rd b/man/emodel1.Rd
index f51cca0..b51d416 100644
--- a/man/emodel1.Rd
+++ b/man/emodel1.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{emodel1}
diff --git a/man/emodel2.Rd b/man/emodel2.Rd
index 0d3f435..7c26d86 100644
--- a/man/emodel2.Rd
+++ b/man/emodel2.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{emodel2}
diff --git a/man/emodel3.Rd b/man/emodel3.Rd
index ad434d4..a68d0ff 100644
--- a/man/emodel3.Rd
+++ b/man/emodel3.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{emodel3}
diff --git a/man/eval_bool.Rd b/man/eval_bool.Rd
index 1ef8ba4..d38c7ea 100644
--- a/man/eval_bool.Rd
+++ b/man/eval_bool.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/simulation.R
 \name{eval_bool}
 \alias{eval_bool}
diff --git a/man/extract_term.Rd b/man/extract_term.Rd
index ae8b98c..5e372ba 100644
--- a/man/extract_term.Rd
+++ b/man/extract_term.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{extract_term}
 \alias{extract_term}
diff --git a/man/filter_dflist.Rd b/man/filter_dflist.Rd
index 6264b88..aa85f85 100644
--- a/man/filter_dflist.Rd
+++ b/man/filter_dflist.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{filter_dflist}
 \alias{filter_dflist}
diff --git a/man/gen_one_rmodel.Rd b/man/gen_one_rmodel.Rd
index 32cb837..5cdbfea 100644
--- a/man/gen_one_rmodel.Rd
+++ b/man/gen_one_rmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/rand_model.R
 \name{gen_one_rmodel}
 \alias{gen_one_rmodel}
diff --git a/man/gen_singlerule.Rd b/man/gen_singlerule.Rd
index f3b021a..2968654 100644
--- a/man/gen_singlerule.Rd
+++ b/man/gen_singlerule.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/rand_model.R
 \name{gen_singlerule}
 \alias{gen_singlerule}
diff --git a/man/gen_two_rmodel.Rd b/man/gen_two_rmodel.Rd
index 567c245..9ccfb22 100644
--- a/man/gen_two_rmodel.Rd
+++ b/man/gen_two_rmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/rand_model.R
 \name{gen_two_rmodel}
 \alias{gen_two_rmodel}
@@ -21,7 +21,7 @@ gen_two_rmodel(var, steps, mvar = length(var), and_bool = F,
 \item{self_loop}{logical. Indicates whether to allow self_loop. Default to F.}
 }
 \description{
-This function generates a random Boolean model, then get another random Boolean model that is a specified number of steps apart by adding and/or removing genes.
+This function generates a random Boolean model, then get another random Boolean model that is a specified number of steps apart by adding and/or removing genes.
 Returns a list of two S4 BoolModel objects.
 }
 \details{
diff --git a/man/gen_two_rmodel_dag.Rd b/man/gen_two_rmodel_dag.Rd
index 6e1b31b..d62f2e1 100644
--- a/man/gen_two_rmodel_dag.Rd
+++ b/man/gen_two_rmodel_dag.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/rand_model.R
 \name{gen_two_rmodel_dag}
 \alias{gen_two_rmodel_dag}
@@ -19,7 +19,7 @@ gen_two_rmodel_dag(var, steps, mvar = length(var), in_amat = NULL,
 \item{acyclic}{logical. Whether to restrict the model to being acyclic or not. Defaults to TRUE.}
 }
 \description{
-This function generates a random DAG Boolean model, then get another random DAG Boolean model that is a specified number of steps apart by adding and/or removing genes.
+This function generates a random DAG Boolean model, then get another random DAG Boolean model that is a specified number of steps apart by adding and/or removing genes.
 Difficult to generate completely directed graph with a specified number of steps apart.
 }

diff --git a/man/get_encodings.Rd b/man/get_encodings.Rd
index 9a02613..f09f216 100644
--- a/man/get_encodings.Rd
+++ b/man/get_encodings.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/compression.R
 \name{get_encodings}
 \alias{get_encodings}
diff --git a/man/grow_bmodel.Rd b/man/grow_bmodel.Rd
index f1d41ac..2599274 100644
--- a/man/grow_bmodel.Rd
+++ b/man/grow_bmodel.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/model_modification.R
 \name{grow_bmodel}
 \alias{grow_bmodel}
diff --git a/man/initialise_data.Rd b/man/initialise_data.Rd
index 871034e..b0e3ceb 100644
--- a/man/initialise_data.Rd
+++ b/man/initialise_data.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/initialisation.R
 \name{initialise_data}
 \alias{initialise_data}
diff --git a/man/initialise_model.Rd b/man/initialise_model.Rd
index 3a1a799..f4bd251 100644
--- a/man/initialise_model.Rd
+++ b/man/initialise_model.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/initialisation.R
 \name{initialise_model}
 \alias{initialise_model}
diff --git a/man/initialise_raw_data.Rd b/man/initialise_raw_data.Rd
index 3aacc58..2c0c957 100644
--- a/man/initialise_raw_data.Rd
+++ b/man/initialise_raw_data.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/initialisation.R
 \name{initialise_raw_data}
 \alias{initialise_raw_data}
diff --git a/man/krum_bmodel.Rd b/man/krum_bmodel.Rd
index 78a5d67..579807b 100644
--- a/man/krum_bmodel.Rd
+++ b/man/krum_bmodel.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{krum_bmodel}
 \alias{krum_bmodel}
 \title{Myeloid Boolean Model from Krumsiek et al.}
-\format{A data frame with 11 rows and 2 columns.
+\format{A data frame with 11 rows and 2 columns.
 Rows: each row consists of 1 gene and its associated Boolean rule.
 Column 1: target gene
diff --git a/man/krum_istate.Rd b/man/krum_istate.Rd
index a97c9a9..dcd8f2d 100644
--- a/man/krum_istate.Rd
+++ b/man/krum_istate.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{krum_istate}
 \alias{krum_istate}
 \title{Initial state from Krumsiek et al.}
-\format{A data frame with 1 row and 11 columns.
+\format{A data frame with 1 row and 11 columns.
 Rows: each row consists of 1 set of Boolean state.
 Columns: each column is for 1 gene/variable.}
diff --git a/man/m_score.Rd b/man/m_score.Rd
index 21f2ed8..010bd58 100644
--- a/man/m_score.Rd
+++ b/man/m_score.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/score_calculation.R
 \name{m_score}
 \alias{m_score}
diff --git a/man/man_dist.Rd b/man/man_dist.Rd
index 1e5be3a..f2be266 100644
--- a/man/man_dist.Rd
+++ b/man/man_dist.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/score_calculation.R
 \name{man_dist}
 \alias{man_dist}
diff --git a/man/match_term.Rd b/man/match_term.Rd
index b3b3573..4751b42 100644
--- a/man/match_term.Rd
+++ b/man/match_term.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{match_term}
 \alias{match_term}
diff --git a/man/minmod_internal.Rd b/man/minmod_internal.Rd
index e1750da..4eb7b88 100644
--- a/man/minmod_internal.Rd
+++ b/man/minmod_internal.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/model_modification.R
 \name{minmod_internal}
 \alias{minmod_internal}
diff --git a/man/minmod_model.Rd b/man/minmod_model.Rd
index e7c5b86..13ea234 100644
--- a/man/minmod_model.Rd
+++ b/man/minmod_model.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/model_modification.R
 \name{minmod_model}
 \alias{minmod_model}
diff --git a/man/model_consensus.Rd b/man/model_consensus.Rd
index 739614f..e77c7c1 100644
--- a/man/model_consensus.Rd
+++ b/man/model_consensus.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/search.R
 \name{model_consensus}
 \alias{model_consensus}
diff --git a/man/model_dist.Rd b/man/model_dist.Rd
index e915fb2..f785b0f 100644
--- a/man/model_dist.Rd
+++ b/man/model_dist.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{model_dist}
 \alias{model_dist}
diff --git a/man/model_setdiff.Rd b/man/model_setdiff.Rd
index e895aab..2fa5646 100644
--- a/man/model_setdiff.Rd
+++ b/man/model_setdiff.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{model_setdiff}
 \alias{model_setdiff}
diff --git a/man/model_train.Rd b/man/model_train.Rd
index 490637c..c9a0315 100644
--- a/man/model_train.Rd
+++ b/man/model_train.Rd
@@ -1,13 +1,12 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/search.R
 \name{model_train}
 \alias{model_train}
 \title{Training Model}
 \usage{
 model_train(cdata, ddata = NULL, bmodel = NULL, istate = NULL,
-  max_varperrule = 6, preprocess = T, max_expr = "high", and_bool = T,
-  self_loop = F, con_thre = 0.3, tol = 1e-06, verbose = F,
-  detailed_output = F)
+  max_varperrule = 6, and_bool = T, self_loop = F, con_thre = 0.3,
+  tol = 1e-06, verbose = F, detailed_output = F)
 }
 \arguments{
 \item{cdata}{data frame of expression data. Should have state(row) x gene(column).}
@@ -20,10 +19,6 @@ model_train(cdata, ddata = NULL, bmodel = NULL, istate = NULL,

 \item{max_varperrule}{integer. Maximum number of terms per rule (combining both act and inh rule). Note that this number must be higher than number of genes. Defaults to 6.}

-\item{preprocess}{logical. Whether to preprocess expression data. Default to T.}
-
-\item{max_expr}{character. Only use when preprocess==T. Specify whether max expression value is the lowest (as in qPCR), or the highest (as in RNAseq and microarray). Option: 'low', 'high'. Default to 'high'.}
-
 \item{and_bool}{logical. Whether to consider AND terms. IF bmodel is not NULL, defaults to whether AND interaction is included in bmodel. If bmodel is NULL, then defaults to TRUE.}

 \item{self_loop}{logical. Whether to allow self_loop in random starting model. Only used if is.null(bmodel). Default to F.}
diff --git a/man/outgenysis_model.Rd b/man/outgenysis_model.Rd
index 4f5179a..8c0a9cc 100644
--- a/man/outgenysis_model.Rd
+++ b/man/outgenysis_model.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/output_format.R
 \name{outgenysis_model}
 \alias{outgenysis_model}
diff --git a/man/outgraph_model.Rd b/man/outgraph_model.Rd
index 436dcac..c3edd9d 100644
--- a/man/outgraph_model.Rd
+++ b/man/outgraph_model.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/output_format.R
 \name{outgraph_model}
 \alias{outgraph_model}
diff --git a/man/outstate_graph.Rd b/man/outstate_graph.Rd
index 86cf208..0ec40fe 100644
--- a/man/outstate_graph.Rd
+++ b/man/outstate_graph.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/output_format.R
 \name{outstate_graph}
 \alias{outstate_graph}
diff --git a/man/plotBM.Rd b/man/plotBM.Rd
index d03c437..8e7d538 100644
--- a/man/plotBM.Rd
+++ b/man/plotBM.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{plotBM}
 \alias{plotBM}
diff --git a/man/printBM.Rd b/man/printBM.Rd
index 8f8115b..2d8b0ad 100644
--- a/man/printBM.Rd
+++ b/man/printBM.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/methods.R
 \name{printBM}
 \alias{printBM}
diff --git a/man/rcpp_simulate.Rd b/man/rcpp_simulate.Rd
index c4db52f..4805cf5 100644
--- a/man/rcpp_simulate.Rd
+++ b/man/rcpp_simulate.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/RcppExports.R
 \name{rcpp_simulate}
 \alias{rcpp_simulate}
diff --git a/man/rcpp_validate.Rd b/man/rcpp_validate.Rd
index 0aaeff8..6cd1548 100644
--- a/man/rcpp_validate.Rd
+++ b/man/rcpp_validate.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/RcppExports.R
 \name{rcpp_validate}
 \alias{rcpp_validate}
diff --git a/man/simulate_model.Rd b/man/simulate_model.Rd
index 2b9650c..952490f 100644
--- a/man/simulate_model.Rd
+++ b/man/simulate_model.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/simulation.R
 \name{simulate_model}
 \alias{simulate_model}
diff --git a/man/unique_raw_data.Rd b/man/unique_raw_data.Rd
index 79207b1..0cd4620 100644
--- a/man/unique_raw_data.Rd
+++ b/man/unique_raw_data.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/initialisation.R
 \name{unique_raw_data}
 \alias{unique_raw_data}
diff --git a/man/validate_adjmat.Rd b/man/validate_adjmat.Rd
index 0e17ed1..159e7b4 100644
--- a/man/validate_adjmat.Rd
+++ b/man/validate_adjmat.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/score_calculation.R
 \name{validate_adjmat}
 \alias{validate_adjmat}
diff --git a/man/vcat.Rd b/man/vcat.Rd
index b33a080..2274b5d 100644
--- a/man/vcat.Rd
+++ b/man/vcat.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{vcat}
 \alias{vcat}
diff --git a/man/which.random.min.Rd b/man/which.random.min.Rd
index 2f75a73..1ca082e 100644
--- a/man/which.random.min.Rd
+++ b/man/which.random.min.Rd
@@ -1,4 +1,4 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/general.R
 \name{which.random.min}
 \alias{which.random.min}
diff --git a/man/wilson_raw_data.Rd b/man/wilson_raw_data.Rd
index ec91f7b..f3e2fa5 100644
--- a/man/wilson_raw_data.Rd
+++ b/man/wilson_raw_data.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{wilson_raw_data}
 \alias{wilson_raw_data}
 \title{Raw single cell qRT-PCR expression data from Wilson et al.}
-\format{A data frame with 1626 rows and 44 columns.
+\format{A data frame with 1626 rows and 44 columns.
 Rows: each row consists of raw expression values from 1 cell.
 Columns: each column is for 1 gene/variable.}
diff --git a/man/wilson_raw_rnaseq.Rd b/man/wilson_raw_rnaseq.Rd
index c582799..3c94553 100644
--- a/man/wilson_raw_rnaseq.Rd
+++ b/man/wilson_raw_rnaseq.Rd
@@ -1,10 +1,10 @@
-% Generated by roxygen2: do not edit by hand
+% Generated by roxygen2 (4.1.1): do not edit by hand
 % Please edit documentation in R/data_desc.R
 \docType{data}
 \name{wilson_raw_rnaseq}
 \alias{wilson_raw_rnaseq}
 \title{Raw single cell RNAseq expression data from Wilson et al.}
-\format{A data frame with 96 rows and 38498 columns.
+\format{A data frame with 96 rows and 38498 columns.
 Rows: each row consists of raw expression values from 1 cell.
Columns: each column is for 1 gene/variable.} diff --git a/man/writeBM.Rd b/man/writeBM.Rd index 5536855..1fef686 100644 --- a/man/writeBM.Rd +++ b/man/writeBM.Rd @@ -1,4 +1,4 @@ -% Generated by roxygen2: do not edit by hand +% Generated by roxygen2 (4.1.1): do not edit by hand % Please edit documentation in R/methods.R \name{writeBM} \alias{writeBM} diff --git a/src/RcppExports.cpp b/src/RcppExports.cpp index 7e0bd3a..ef92ec5 100644 --- a/src/RcppExports.cpp +++ b/src/RcppExports.cpp @@ -7,7 +7,7 @@ using namespace Rcpp; // rcpp_validate Rcpp::NumericVector rcpp_validate(Rcpp::NumericMatrix inf_mat, Rcpp::NumericMatrix true_mat); -RcppExport SEXP BoolTraineR_rcpp_validate(SEXP inf_matSEXP, SEXP true_matSEXP) { +RcppExport SEXP BTR_rcpp_validate(SEXP inf_matSEXP, SEXP true_matSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; @@ -19,7 +19,7 @@ END_RCPP } // rcpp_simulate Rcpp::List rcpp_simulate(Rcpp::List bmodel, Rcpp::LogicalVector fstate, bool verbose); -RcppExport SEXP BoolTraineR_rcpp_simulate(SEXP bmodelSEXP, SEXP fstateSEXP, SEXP verboseSEXP) { +RcppExport SEXP BTR_rcpp_simulate(SEXP bmodelSEXP, SEXP fstateSEXP, SEXP verboseSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; diff --git a/vignettes/booltrainer.html b/vignettes/booltrainer.html deleted file mode 100644 index f77238a..0000000 --- a/vignettes/booltrainer.html +++ /dev/null @@ -1,604 +0,0 @@ - - - - - - - - - - - - - - -Using BoolTraineR to reconstruct asynchronous Boolean models - - - - - - - - - - - - - - - - - - -
- -
- - -
-

1 Brief introduction

-

BoolTraineR is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette is a tutorial demonstrating example workflows that users can adapt to their own use cases.

-

Running BoolTraineR is straightforward. However, depending on (1) the size of the single-cell expression data and (2) the complexity of the Boolean model, BoolTraineR may take a long time to complete. In such cases, use its built-in parallel processing capability, which can be enabled with the doParallel package as illustrated in the examples.

-

Note that the examples in this vignette differ from the results presented in our paper; they have been simplified to reduce processing time.

-
-
-

2 Installation

-

BoolTraineR can be installed from CRAN.

-
install.packages('BoolTraineR')
-

Alternatively, install the latest version from GitHub, which requires the devtools package.

-
install.packages('devtools')
-devtools::install_github("cheeyeelim/booltrainer")
-

Also install the doParallel package if you intend to use parallel processing.

-
-
-

3 Input data format

-

Depending on the analysis, at most three types of input data are needed. Their required formats are described below.

-
    -
  1. Expression data. A matrix with genes as columns and cells as rows.
-

The expression data should be preprocessed as in any standard sequencing data processing pipeline, including quality control filtering and normalisation.

-

Use initialise_raw_data to convert expression data into a format suitable for model inference. It is recommended to run initialise_raw_data on the full expression data before subsetting it to the cell types of interest.

-
data(wilson_raw_data)
-round(wilson_raw_data[1:5, 1:5], 4)
|           | bptf    | cbfa2t3h | csf1r   | dnmt3a  | eif2b1 |
|-----------|--------:|---------:|--------:|--------:|-------:|
| lmpp_002  |  1.0261 |   2.3944 |  2.6847 |  1.6636 | 2.0203 |
| lmpp_003  |  2.6496 |   1.7800 |  1.6821 |  1.5941 | 2.7736 |
| lmpp_004  | 10.3080 |   0.5889 |  4.2653 | -0.5565 | 0.0026 |
| lmpp_007  |  0.5419 |   1.8631 | 10.8468 |  0.1757 | 1.0873 |
| lmpp_008  |  0.9209 |   2.6637 |  2.8549 |  2.1965 | 2.3663 |
-
edata = initialise_raw_data(wilson_raw_data, max_expr='low') #max_expr='low' because this is qPCR data.
-
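Before calling initialise_raw_data on your own data, it may help to confirm that the table has the orientation described above. The following is a minimal base-R sketch. check_expr_layout and the toy matrix are hypothetical illustrations, not part of BoolTraineR. The transposition check only works if you pass a few gene names that you know were measured.

```r
# Hypothetical helper (not part of BoolTraineR): sanity-check that an
# expression table has cells as rows and genes as columns before it is
# passed to initialise_raw_data().
check_expr_layout <- function(expr, known_genes = character(0)) {
  problems <- character(0)
  cells <- rownames(expr)
  genes <- colnames(expr)
  if (is.null(genes) || anyDuplicated(genes) > 0)
    problems <- c(problems, "gene names (column names) missing or duplicated")
  if (is.null(cells) || anyDuplicated(cells) > 0)
    problems <- c(problems, "cell names (row names) missing or duplicated")
  values <- as.matrix(expr)
  if (!is.numeric(values)) {
    problems <- c(problems, "non-numeric values present")
  } else if (anyNA(values)) {
    problems <- c(problems, "missing values present")
  }
  # If known genes appear as row names rather than column names,
  # the table is probably transposed.
  if (length(known_genes) > 0 && !any(known_genes %in% genes) &&
      any(known_genes %in% cells))
    problems <- c(problems, "table looks transposed; try t(expr)")
  problems
}

# Toy table in the same layout as wilson_raw_data (cells x genes).
toy <- matrix(c(1.0261, 2.6496, 2.3944, 1.7800), nrow = 2,
              dimnames = list(c("lmpp_002", "lmpp_003"), c("bptf", "cbfa2t3h")))
check_expr_layout(toy, known_genes = "bptf")     # character(0): no problems
check_expr_layout(t(toy), known_genes = "bptf")  # flags a transposed table
```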
    -
  2. Initial Boolean model. A data frame with two columns: targets and their update functions.
-

Note that if an update function contains both activating and inhibiting genes, it must be written as one clause containing only the activating genes and a separate clause containing only the inhibiting genes (see the update functions of gata1 and gata2 for examples).

-

Use initialise_model to convert the input Boolean model into a BoolModel object.

-
data(krum_bmodel)
-head(krum_bmodel)
| targets | factors                              |
|:--------|:-------------------------------------|
| gata2   | gata2 & ! ((gata1 & fog1) \| sfpi1)  |
| gata1   | (gata1 \| gata2 \| fli1) & ! sfpi1   |
| fog1    | gata1                                |
| eklf    | gata1 & ! fli1                       |
| fli1    | gata1 & ! eklf                       |
| scl     | gata1 & ! sfpi1                      |
-
bmodel = initialise_model(krum_bmodel)
-
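The update functions are written with R's Boolean operators (&, | and !). To show how such a rule reads, the base-R sketch below evaluates one rule against a binary state. eval_rule is a hypothetical helper, and this is not necessarily how BoolTraineR evaluates rules internally.

```r
# Hypothetical helper (not part of BoolTraineR): evaluate one update
# function against a named vector of 0/1 gene states.
eval_rule <- function(rule, state) {
  expr <- parse(text = rule)[[1]]
  genes <- all.vars(expr)  # gene names used in the rule
  absent <- setdiff(genes, names(state))
  if (length(absent) > 0)
    stop("state lacks genes: ", paste(absent, collapse = ", "))
  # Map 1/0 to TRUE/FALSE and evaluate in an isolated environment.
  vals <- state[genes] == 1
  env <- list2env(as.list(vals), parent = baseenv())
  isTRUE(eval(expr, envir = env))
}

state <- c(gata1 = 1, gata2 = 1, fog1 = 0, fli1 = 0, sfpi1 = 0, eklf = 0)
eval_rule("gata2 & ! ((gata1 & fog1) | sfpi1)", state)  # TRUE
eval_rule("gata1 & ! fli1", state)                      # TRUE
# With the inhibitor sfpi1 switched on, the gata1 rule evaluates to FALSE.
eval_rule("(gata1 | gata2 | fli1) & ! sfpi1", replace(state, "sfpi1", 1))
```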
    -
  3. Initial state.
-

A single-row data frame with genes as columns. The expression state of each gene must be binarised, i.e. 0 or 1.

-

Note that all the genes that are present in the initial Boolean model must also be present here.

-
data(krum_istate)
-head(krum_istate)
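|               | cjun | cebpa | fli1 | gata1 | gata2 | eklf | sfpi1 | gfi1 | scl | egrnab | fog1 |
|---------------|-----:|------:|-----:|------:|------:|-----:|------:|-----:|----:|-------:|-----:|
| initial_state |    0 |     1 |    0 |     0 |     1 |    0 |     1 |    0 |   0 |      0 |    0 |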
-
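One way to catch formatting problems in an initial state is to check it against the model before training. The sketch below uses only base R and assumes the model is still in its two-column data frame form (targets and factors, as in krum_bmodel). check_istate and the toy inputs are hypothetical.

```r
# Hypothetical helper (not part of BoolTraineR): check that an initial state
# has one row, holds only 0/1 values, and covers every gene in the model.
check_istate <- function(istate, model_df) {
  rules <- as.character(model_df$factors)
  rule_genes <- unlist(lapply(rules, function(r) all.vars(parse(text = r)[[1]])))
  model_genes <- unique(c(as.character(model_df$targets), rule_genes))
  problems <- character(0)
  if (nrow(istate) != 1)
    problems <- c(problems, "initial state must have exactly one row")
  values <- unlist(istate[1, ])
  if (!all(values %in% c(0, 1)))
    problems <- c(problems, "initial state must contain only 0s and 1s")
  absent <- setdiff(model_genes, colnames(istate))
  if (length(absent) > 0)
    problems <- c(problems, paste("genes missing from initial state:",
                                  paste(absent, collapse = ", ")))
  problems
}

toy_model <- data.frame(targets = c("gata1", "fog1", "eklf"),
                        factors = c("(gata1 | gata2 | fli1) & ! sfpi1",
                                    "gata1", "gata1 & ! fli1"),
                        stringsAsFactors = FALSE)
toy_istate <- data.frame(gata1 = 0, gata2 = 1, fli1 = 0, sfpi1 = 1, fog1 = 0,
                         eklf = 0, row.names = "initial_state")
check_istate(toy_istate, toy_model)        # character(0)
check_istate(toy_istate[, -2], toy_model)  # reports gata2 as missing
```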
-
-

4 Output format

-

BoolTraineR supports several output formats for Boolean models, as shown below.

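  - outgraph_model - Outputs a Boolean model in a tab-delimited file with each line being an edge (i.e. a gene interaction). This function also outputs a node attribute file, which can be used to distinguish gene nodes from AND nodes in graph plotting software. This format is readable by both Cytoscape and Gephi.
  - outgenysis_model - Outputs a Boolean model in a space-delimited file with each line being an edge (i.e. a gene interaction). This format is readable by genYsis (used for steady state analysis).
  - writeBM - Outputs a Boolean model in a comma-delimited file similar in format to the input file format (i.e. two columns: genes and update functions).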

BoolTraineR can also output a state transition graph.

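  - outstate_graph - Outputs the state space of a Boolean model simulated from an initial state. This format is readable by both Cytoscape and Gephi.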
-
-

5 Useful functions in BoolTraineR

-

Besides training Boolean models, BoolTraineR can simulate a Boolean model asynchronously and calculate the score of a Boolean model with respect to expression data.

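  - model_train - Core function in BoolTraineR that performs Boolean model inference.
  - simulate_model - Simulates a Boolean model asynchronously from an initial state and returns its state space.
  - calc_mscore - Calculates a distance score for a Boolean model with respect to expression data.
  - model_dist - Calculates the number of genes in the update functions that differ between two Boolean models.
  - model_setdiff - Shows the genes in the update functions that differ between two Boolean models.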
-
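To make asynchronous simulation concrete, the base-R sketch below enumerates the states reachable from an initial state when each transition updates exactly one gene. It only illustrates that assumption. async_state_space is hypothetical and may differ from how simulate_model builds its state space.

```r
# Illustrative only (not BoolTraineR's implementation): breadth-first
# enumeration of states reachable from `init` under asynchronous updating,
# where each transition applies the update function of a single gene.
async_state_space <- function(rules, init) {
  exprs <- lapply(rules, function(r) parse(text = r)[[1]])
  key <- function(s) paste(s, collapse = "")
  seen <- list()
  seen[[key(init)]] <- init
  edges <- character(0)
  queue <- list(init)
  while (length(queue) > 0) {
    s <- queue[[1]]
    queue <- queue[-1]
    env <- list2env(as.list(s == 1), parent = baseenv())
    for (g in names(rules)) {
      nxt <- s
      nxt[g] <- as.numeric(isTRUE(eval(exprs[[g]], envir = env)))
      if (nxt[g] == s[g]) next  # this update leaves the state unchanged
      edges <- c(edges, paste(key(s), "->", key(nxt)))
      if (is.null(seen[[key(nxt)]])) {  # first visit: queue for expansion
        seen[[key(nxt)]] <- nxt
        queue <- c(queue, list(nxt))
      }
    }
  }
  list(states = names(seen), edges = edges)
}

# Toy mutual-inhibition switch: each gene is on only if the other is off.
res <- async_state_space(c(a = "! b", b = "! a"), c(a = 0, b = 0))
res$states  # "00" "10" "01"
res$edges   # "00 -> 10" "00 -> 01"
```

Starting from 00, either gene can switch on first, so the state space branches into two steady states, 10 and 01.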
-
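The idea behind model_dist can be sketched by comparing, target by target, the sets of genes that appear in each update function. rule_gene_diff is a hypothetical base-R helper that takes models as named character vectors of rules. It ignores whether a gene acts as an activator or an inhibitor, and it may not count exactly as model_dist does.

```r
# Hypothetical helper (not part of BoolTraineR): per target, count the genes
# that appear in the update function of one model but not the other.
rule_gene_diff <- function(rules_a, rules_b) {
  targets <- union(names(rules_a), names(rules_b))
  genes_in <- function(rules, tg) {
    rule <- unname(rules[tg])
    if (is.na(rule) || !nzchar(rule)) return(character(0))
    all.vars(parse(text = rule)[[1]])
  }
  vapply(targets, function(tg) {
    a <- genes_in(rules_a, tg)
    b <- genes_in(rules_b, tg)
    length(union(setdiff(a, b), setdiff(b, a)))  # symmetric difference size
  }, integer(1))
}

model_1 <- c(eklf = "gata1 & ! fli1", fli1 = "gata1 & ! eklf")
model_2 <- c(eklf = "gata1 & ! fli1", fli1 = "(gata1 | gata2) & ! eklf")
diffs <- rule_gene_diff(model_1, model_2)
diffs       # eklf = 0, fli1 = 1 (gata2 added)
sum(diffs)  # 1
```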

6 Example workflows

-

Three example workflows are discussed in this vignette: (1) inferring a model without an initial model, (2) inferring a model with an initial model, and (3) extending a model with more genes. The three workflows are largely similar and differ mainly in the data preparation step.

-
-

6.1 Inferring model without an initial model

-

This workflow infers a Boolean model without an initial model.

-

When no initial model is used, BoolTraineR reconstructs gene interactions among a user-specified list of genes. If the expression data contain few genes (e.g. qPCR data), it is also possible to use all of them.

-
-

6.1.1 Full workflow

-

The full workflow is included here for easy reference. Each step is discussed in further detail below.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-# (2) Load data.
-data(wilson_raw_data)  #load a data frame of expression data.
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-
-# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-# (4) Filter genes.
-gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1")  #select genes to be included.
-fcdata = fcdata[, gene_ind]
-fddata = fddata[, gene_ind]
-
-# (5) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, max_varperrule = 4, 
-    verbose = T)
-
-# (6) Visualise the Boolean model generated.
-plotBM(final_model)
-
-
-

6.1.2 Initial setup

-

The first step is to load the BoolTraineR package. If you intend to use parallel processing, also load the doParallel package and specify the number of cores to use with registerDoParallel from the doParallel package.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-
-

6.1.3 Data preparation

-

Only the expression data is needed for inferring a Boolean model without an initial model.

-

To load your own data into R, use read.table or read.csv. This example uses the example data included with the package, which is loaded with data.

-

initialise_raw_data is used to preprocess the data.

-
# (2) Load data.
-data(wilson_raw_data)  #load a data frame of expression data.
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-

Once data is loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete.

-
# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-# (4) Filter genes.
-gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1")  #select genes to be included.
-fcdata = fcdata[, gene_ind]
-fddata = fddata[, gene_ind]
-
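The three grepl calls above can also be written as a single regular expression. This sketch assumes, as in the lmpp_002 row names shown earlier, that each cell name starts with its cell type followed by an underscore. The anchored pattern then avoids matching the letters cmp, gmp or mep elsewhere in a name. The cell names below are hypothetical.

```r
# Hypothetical row names in the style of the example data.
cell_names <- c("lmpp_002", "cmp_010", "gmp_004", "mep_120", "hsc_001")
# One anchored pattern replaces three separate grepl() calls.
cell_ind <- grepl("^(cmp|gmp|mep)_", cell_names)
cell_names[cell_ind]  # "cmp_010" "gmp_004" "mep_120"
```

With real data, the same pattern would be applied to rownames(cdata).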
-
-

6.1.4 Run model training

-

To reconstruct a Boolean model from expression data, run model_train.

-

In this example, model_train takes a few seconds to complete on a single core. If this step takes a very long time, consider using the parallel processing option described above.

-

You will receive a BoolModel object at the end of the model training process. It can be visualised quickly using plotBM, which is based on the igraph package. For easier manipulation, export the Boolean model using outgraph_model and display it with Cytoscape or Gephi.

-
# (5) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, max_varperrule = 4, 
-    verbose = T)
-
-# (6) Visualise the Boolean model generated.
-plotBM(final_model)
-

-
-
-
-

6.2 Inferring model with an initial model

-

This workflow infers a Boolean model starting from an initial model.

-

When an initial model is used, only genes present in both the initial model and the expression data are used to reconstruct gene interactions. Genes in the initial model without corresponding expression values keep their original interactions, as specified in the initial model, without modification.

-
-

6.2.1 Full workflow

-

The full workflow is included here for easy reference. Each step is discussed in further detail below.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-# (2) Load data.
-data(krum_bmodel)  #load a data frame of Boolean model.
-data(krum_istate)  #load a data frame of initial state.
-data(wilson_raw_data)  #load a data frame of expression data.
-
-bmodel = initialise_model(krum_bmodel)
-istate = krum_istate
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-
-# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-# (4) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = bmodel, istate = istate, 
-    max_varperrule = 4, verbose = T)
-
-# (5) Visualise the Boolean model generated.
-plotBM(final_model)
-
-
-

6.2.2 Initial setup

-

The first step is to load the BoolTraineR package. If you intend to use parallel processing, also load the doParallel package and specify the number of cores to use with registerDoParallel from the doParallel package.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-
-

6.2.3 Data preparation

-

Three pieces of data are needed to infer a Boolean model with an initial model: expression data, an initial Boolean model and an initial state.

-

To load your own data into R, use read.table or read.csv. This example uses the example data included with the package, which is loaded with data.

-

initialise_model converts the data frame containing the Boolean model into a BoolModel object. initialise_raw_data is used to preprocess the data.

-
# (2) Load data.
-data(krum_bmodel)  #load a data frame of Boolean model.
-data(krum_istate)  #load a data frame of initial state.
-data(wilson_raw_data)  #load a data frame of expression data.
-
-bmodel = initialise_model(krum_bmodel)
-istate = krum_istate
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-

Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes if the computation takes too long. In this example, genes are not filtered, because all genes present in both the expression data and the Boolean model are used automatically.

-
# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-
-

6.2.4 Run model training

-

To reconstruct a Boolean model from expression data, run model_train.

-

In this example, model_train takes one or two minutes to complete on a single core. If this step takes a very long time, consider using the parallel processing option described above.

-

You will receive a BoolModel object at the end of the model training process. It can be visualised using plotBM, which is based on the igraph package. For easier manipulation, export the Boolean model using outgraph_model and display it with Cytoscape or Gephi.

-
# (4) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = bmodel, istate = istate, 
-    max_varperrule = 4, verbose = T)
-
-# (5) Visualise the Boolean model generated.
-plotBM(final_model)
-

-
-
-
-

6.3 Extending model with more genes

-

This workflow extends an initial Boolean model with additional genes.

-

When an initial model is used, only genes present in both the initial model and the expression data are used to reconstruct gene interactions. Genes in the initial model without corresponding expression values keep their original interactions, as specified in the initial model, without modification.

-
-

6.3.1 Full workflow

-

The full workflow is included here for easy reference. Each step is discussed in further detail below.

-

Note that this example takes a few minutes to run on a single core, so parallel processing is recommended.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-# (2) Load data.
-data(krum_bmodel)  #load a data frame of Boolean model.
-data(krum_istate)  #load a data frame of initial state.
-data(wilson_raw_data)  #load a data frame of expression data.
-
-bmodel = initialise_model(krum_bmodel)
-istate = krum_istate
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-
-# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-# (4) Adding extra genes to the initial Boolean model.
-# extra_genes = setdiff(colnames(wilson_raw_data), bmodel@target)
-# print(extra_genes)  #to view available genes to be added.
-add_gene = "ldb1"  #genes to be added: ldb1
-grown_bmodel = grow_bmodel(add_gene, bmodel)
-
-# (5) Estimating initial state for the extra genes. (estimating from CMPs)
-tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene])
-tmp_istate = matrix(round(tmp_istate), nrow = 1)
-colnames(tmp_istate) = add_gene
-grown_istate = cbind(istate, tmp_istate)
-grown_istate = initialise_data(grown_istate)
-
-# (6) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = grown_bmodel, 
-    istate = grown_istate, verbose = T)
-
-# (7) Visualise the Boolean model generated.
-plotBM(final_model)
-
-
-

6.3.2 Initial setup

-

The first step is to load the BoolTraineR package. If you intend to use parallel processing, also load the doParallel package and specify the number of cores to use with registerDoParallel from the doParallel package.

-
set.seed(0)  #use to ensure reproducibility. remove in actual use.
-
-# (1) Setup paths and environment.
-library(BoolTraineR)
-
-# If intending to use parallel processing, uncomment the following lines.
-# library(doParallel)
-# num_core = 4  #specify the number of cores to be used.
-# doParallel::registerDoParallel(cores=num_core)
-
-
-

6.3.3 Data preparation

-

Three pieces of data are needed to infer a Boolean model with an initial model: expression data, an initial Boolean model and an initial state.

-

To load your own data into R, use read.table or read.csv. This example uses the example data included with the package, which is loaded with data.

-

initialise_model converts the data frame containing the Boolean model into a BoolModel object. initialise_raw_data is used to preprocess the data.

-
# (2) Load data.
-data(krum_bmodel)  #load a data frame of Boolean model.
-data(krum_istate)  #load a data frame of initial state.
-data(wilson_raw_data)  #load a data frame of expression data.
-
-bmodel = initialise_model(krum_bmodel)
-istate = krum_istate
-tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low")
-cdata = tmp_data[[1]]  #continuous data
-ddata = tmp_data[[2]]  #discretised data
-

Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes if the computation takes too long. In this example, genes are not filtered, because all genes present in both the expression data and the Boolean model are used automatically.

-
# (3) Filter cell types.
-cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", 
-    rownames(cdata))
-fcdata = cdata[cell_ind, ]  #select only relevant cells.
-fddata = ddata[cell_ind, ]
-
-
-

6.3.4 Add extra genes to the initial Boolean model

-

Extra genes can be added to the initial model using grow_bmodel, which adds them to the model with empty update functions.

-
# (4) Adding extra genes to the initial Boolean model.
-# extra_genes = setdiff(colnames(wilson_raw_data), bmodel@target)
-# print(extra_genes)  #to view available genes to be added.
-add_gene = "ldb1"  #genes to be added: ldb1
-grown_bmodel = grow_bmodel(add_gene, bmodel)
-
-
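Before calling grow_bmodel, it may be worth guarding against adding a gene that is unmeasured or already in the model. The following is a minimal base-R sketch. The gene lists are hypothetical stand-ins for colnames(wilson_raw_data) and the model's targets.

```r
# Hypothetical stand-ins for the measured genes and the model targets.
measured <- c("gata1", "gata2", "fli1", "ldb1", "lyl1", "sfpi1")
targets  <- c("gata1", "gata2", "fli1", "sfpi1")

candidates <- setdiff(measured, targets)  # measured genes not yet modelled
add_gene <- "ldb1"
# Stop early if the chosen gene is unmeasured or already in the model.
stopifnot(all(add_gene %in% candidates))
candidates  # "ldb1" "lyl1"
```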
-

6.3.5 Estimate initial state for the extra genes

-

The initial state must be modified to include the initial expression of the extra genes. It can be set manually, or estimated from the data if the data contain multiple cell types with known developmental relationships. In this example, CMPs are known to lie upstream of erythro-myeloid differentiation, so the initial state of the extra genes is estimated from their average expression in CMPs.

-
# (5) Estimating initial state for the extra genes. (estimating from CMPs)
-tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene])
-tmp_istate = matrix(round(tmp_istate), nrow = 1)
-colnames(tmp_istate) = add_gene
-grown_istate = cbind(istate, tmp_istate)
-grown_istate = initialise_data(grown_istate)
-
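The code above uses mean, which returns a single value and so works for one extra gene. If add_gene lists several genes, mean would not return one value per gene. The sketch below shows a per-gene variant using colMeans on a hypothetical toy matrix. It assumes the values are scaled between 0 and 1, as rounding the average in the code above implies for cdata; this is an assumption, not something stated here.

```r
# Toy continuous data standing in for cdata (values assumed scaled to 0-1).
cdata_toy <- matrix(c(0.9, 0.8, 0.1, 0.2,
                      0.3, 0.1, 0.2, 0.9),
                    nrow = 4,
                    dimnames = list(c("cmp_1", "cmp_2", "gmp_1", "mep_1"),
                                    c("ldb1", "lyl1")))
add_gene <- c("ldb1", "lyl1")
cmp_rows <- grepl("^cmp", rownames(cdata_toy))
# colMeans gives one average per gene; rounding binarises each one.
tmp_istate <- round(colMeans(cdata_toy[cmp_rows, add_gene, drop = FALSE]))
tmp_istate <- matrix(tmp_istate, nrow = 1, dimnames = list(NULL, add_gene))
tmp_istate  # ldb1 = 1, lyl1 = 0
```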
-
-

6.3.6 Run model training

-

To reconstruct a Boolean model from expression data, run model_train.

-

In this example, model_train takes a few minutes to complete on a single core. If this step takes a very long time, consider using the parallel processing option described above.

-

You will receive a BoolModel object at the end of the model training process. It can be visualised using plotBM, which is based on the igraph package. For easier manipulation, export the Boolean model using outgraph_model and display it with Cytoscape or Gephi.

-

Note that this example takes a long time to run. The use of parallel processing is recommended.

-
# (6) Inferring Boolean model.
-final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = grown_bmodel, 
-    istate = grown_istate, verbose = T)
-
-# (7) Visualise the Boolean model generated.
-plotBM(final_model)
-

-
-
-
- - - - - - - - diff --git a/vignettes/booltrainer.md b/vignettes/booltrainer.md deleted file mode 100644 index 6fe73b5..0000000 --- a/vignettes/booltrainer.md +++ /dev/null @@ -1,529 +0,0 @@ -- [Brief introduction](#brief-introduction) -- [Installation](#installation) -- [Input data format](#input-data-format) -- [Output format](#output-format) -- [Useful functions in BoolTraineR](#useful-functions-in-booltrainer) -- [Example workflows](#example-workflows) - - [Inferring model without an initial model](#inferring-model-without-an-initial-model) - - [Full workflow](#full-workflow) - - [Initial setup](#initial-setup) - - [Data preparation](#data-preparation) - - [Run model training](#run-model-training) - - [Inferring model with an initial model](#inferring-model-with-an-initial-model) - - [Full workflow](#full-workflow-1) - - [Initial setup](#initial-setup-1) - - [Data preparation](#data-preparation-1) - - [Run model training](#run-model-training-1) - - [Extending model with more genes](#extending-model-with-more-genes) - - [Full workflow](#full-workflow-2) - - [Initial setup](#initial-setup-2) - - [Data preparation](#data-preparation-2) - - [Add extra genes to the initial Boolean model](#add-extra-genes-to-the-initial-boolean-model) - - [Estimate initial state for the extra genes](#estimate-initial-state-for-the-extra-genes) - - [Run model training](#run-model-training-2) - - -Brief introduction -================== - -`BoolTraineR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users. - -Running `BoolTraineR` is straightforward. 
However, note that depending on the (1) size of single-cell expression data and (2) complexity of Boolean model, `BoolTraineR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BoolTraineR`. This can be easily achieved by using `doParallel` package, as illustrated in the example. - -Note that the examples presented in this vignette are different from the results presented in our paper. The examples presented here have been simplified to speed up the processing time. - -Installation -============ - -`BoolTraineR` can be installed from CRAN. - -``` r -install.packages('BoolTraineR') -``` - -Or from Github for the latest version. To install from Gitbub, you will require the `devtools` package. - -``` r -install.packages('devtools') -devtools::install_github("cheeyeelim/booltrainer") -``` - -Also install `doParallel` package if you intend to use parallel processing. - -Input data format -================= - -Depending on the analysis, only 3 types of data will ever be needed. The format of the data required is discussed below. - -1. Expression data. A matrix with genes on the columns, and cells on the row. - -The expression data should be preprocessed as in any standard sequencing data processing pipelines, which includes quality control filtering and normalisation. - -Use `initialise_raw_data` to convert expression data into a suitable format for model inference. It is recommended to use `initialise_raw_data` before subsetting the expression data for preferred cell types. 
- -``` r -data(wilson_raw_data) -round(wilson_raw_data[1:5, 1:5], 4) -``` - -| | bptf| cbfa2t3h| csf1r| dnmt3a| eif2b1| -|-----------|--------:|---------:|--------:|--------:|-------:| -| lmpp\_002 | 1.0261| 2.3944| 2.6847| 1.6636| 2.0203| -| lmpp\_003 | 2.6496| 1.7800| 1.6821| 1.5941| 2.7736| -| lmpp\_004 | 10.3080| 0.5889| 4.2653| -0.5565| 0.0026| -| lmpp\_007 | 0.5419| 1.8631| 10.8468| 0.1757| 1.0873| -| lmpp\_008 | 0.9209| 2.6637| 2.8549| 2.1965| 2.3663| - -``` r -edata = initialise_raw_data(wilson_raw_data, max_expr='low') #max_expr='low' because this is qPCR data. -``` - -1. Initial Boolean model. A data frame with two columns, targets and update functions. - -Note that if an update function contains both activation and inhibition genes, they must be expressed with a separate clause containing only activation genes, and a separate clause containing only inhibition genes. (See the update functions of Gata1 and Gata2 for examples) - -Use `initialise_model` to convert the input Boolean model into a BoolModel object. - -``` r -data(krum_bmodel) -head(krum_bmodel) -``` - -| targets | factors | -|:--------|:-----------------------------------| -| gata2 | gata2 & ! ((gata1 & fog1) | sfpi1) | -| gata1 | (gata1 | gata2 | fli1) & ! sfpi1 | -| fog1 | gata1 | -| eklf | gata1 & ! fli1 | -| fli1 | gata1 & ! eklf | -| scl | gata1 & ! sfpi1 | - -``` r -bmodel = initialise_model(krum_bmodel) -``` - -1. Initial state. - -A single row data frame with genes as the columns. The expression state of each gene must be in binarised form, i.e. 0s and 1s. - -Note that all the genes that are present in the initial Boolean model must also be present here. 
- -``` r -data(krum_istate) -head(krum_istate) -``` - -| | cjun| cebpa| fli1| gata1| gata2| eklf| sfpi1| gfi1| scl| egrnab| fog1| -|----------------|-----:|------:|-----:|------:|------:|-----:|------:|-----:|----:|-------:|-----:| -| initial\_state | 0| 1| 0| 0| 1| 0| 1| 0| 0| 0| 0| - -Output format -============= - -BoolTraineR supports several output formats for Boolean models, as shown below. - -- `outgraph_model` - Outputs a Boolean model in a tab-delimited file with each line being an edge (i.e. gene interaction). This function also outputs a node attribute file, which can be used to distinguish gene and AND nodes in a graph plotting software. This format is readable by both Cytoscape and Gephi. -- `outgenysis_model` - Outputs a Boolean model in a space-delimited file with each line being an edge (i.e. gene interaction). This format is readable by genYsis (used for steady state analysis). -- `writeBM` - Outputs a Boolean model in a comma-delimited file similar in format to the input file format (i.e. two columns: genes and update functions). - -BoolTraineR can also output a state transition graph. - -- `outstate_graph` - Outputs a state space of a Boolean model simulated with an initial state. This format is readable by both Cytoscape and Gephi. - -Useful functions in BoolTraineR -=============================== - -Besides training Boolean models, BoolTraineR can be used for simulating a Boolean model asynchronously and calculate the score of a Boolean model with respect to a data. - -- `model_train` - Core function in `BoolTraineR` that performs Boolean model inference. -- `simulate_model` - Simulate a Boolean model asynchronously using an initial state, and return its state space. -- `calc_mscore` - Calculate a distance score for a Boolean model with respect to an expression data. -- `model_dist` - Calculate the number of genes in the update functions that differ between two Boolean models. 
-- `model_setdiff` - Show the genes in the update functions that differ between two Boolean models. - -Example workflows -================= - -Three example workflows will be discussed in this vignette: (1) Inferring model without an initial model, (2) Inferring model with an initial model, (3) Extending model with more genes. The two workflows are largely similar, which only differ in the data preparation step. - -Inferring model without an initial model ----------------------------------------- - -This workflow is intended for use on inferring a Boolean model without an initial model. - -When no initial model is used, BoolTraineR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data. - -### Full workflow - -Full workflow is included here for easy referencing. Each step is discussed in further details below. - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) - -# (2) Load data. -data(wilson_raw_data) #load a data frame of expression data. -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data - -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] - -# (4) Filter genes. -gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1") #select genes to be included. -fcdata = fcdata[, gene_ind] -fddata = fddata[, gene_ind] - -# (5) Inferring Boolean model. 
-final_model = model_train(cdata = fcdata, ddata = fddata, max_varperrule = 4, - verbose = T) - -# (6) Visualise the Boolean model generated. -plotBM(final_model) -``` - -### Initial setup - -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) -``` - -### Data preparation - -Only the expression data is needed for inferring a Boolean model without an initial model. - -To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`. - -`initialise_raw_data` is used to preprocess the data. - -``` r -# (2) Load data. -data(wilson_raw_data) #load a data frame of expression data. -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data -``` - -Once data is loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete. - -``` r -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] - -# (4) Filter genes. -gene_ind = c("fli1", "gata1", "gata2", "gfi1", "scl", "sfpi1") #select genes to be included. 
-fcdata = fcdata[, gene_ind] -fddata = fddata[, gene_ind] -``` - -### Run model training - -To reconstruct a Boolean model from an expression data, run `model_train`. - -In this example, `model_train` takes a few seconds to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above. - -You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise quickly using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi. - -``` r -# (5) Inferring Boolean model. -final_model = model_train(cdata = fcdata, ddata = fddata, max_varperrule = 4, - verbose = T) - -# (6) Visualise the Boolean model generated. -plotBM(final_model) -``` - -![](vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-17-1.png) - -Inferring model with an initial model -------------------------------------- - -This workflow is intended for use on inferring a Boolean model with an initial model. - -When an initial model is used, note that only genes that are both present in the initial model and expression data will be used for reconstructing gene interactions. Any genes in the initial model that do not have corresponding expression values in the data will keep their original gene interactions as specified in the initial model without any modifications. - -### Full workflow - -Full workflow is included here for easy referencing. Each step is discussed in further details below. - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) - -# (2) Load data. 
-data(krum_bmodel) #load a data frame of Boolean model. -data(krum_istate) #load a data frame of initial state. -data(wilson_raw_data) #load a data frame of expression data. - -bmodel = initialise_model(krum_bmodel) -istate = krum_istate -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data - -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] - -# (4) Inferring Boolean model. -final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = bmodel, istate = istate, - max_varperrule = 4, verbose = T) - -# (5) Visualise the Boolean model generated. -plotBM(final_model) -``` - -### Initial setup - -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) -``` - -### Data preparation - -3 pieces of data are needed to infer a Boolean model with an initial model: an expression data, an initial Boolean model and an initial state. - -To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`. - -`initialise_model` converts the data frame containing the Boolean model into a BoolModel object. `initialise_raw_data` is used to preprocess the data. - -``` r -# (2) Load data. 
-data(krum_bmodel) #load a data frame of Boolean model. -data(krum_istate) #load a data frame of initial state. -data(wilson_raw_data) #load a data frame of expression data. - -bmodel = initialise_model(krum_bmodel) -istate = krum_istate -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data -``` - -Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete. In this example, genes are not filtered as all genes that are present in both expression data and Boolean model are used automatically. - -``` r -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] -``` - -### Run model training - -To reconstruct a Boolean model from an expression data, run `model_train`. - -In this example, `model_train` takes one or two minutes to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above. - -You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi. - -``` r -# (4) Inferring Boolean model. -final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = bmodel, istate = istate, - max_varperrule = 4, verbose = T) - -# (5) Visualise the Boolean model generated.
-plotBM(final_model) -``` - -![](vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-23-1.png) - -Extending model with more genes -------------------------------- - -This workflow is intended for use on extending an initial Boolean model with additional genes. - -When an initial model is used, note that only genes that are both present in the initial model and expression data will be used for reconstructing gene interactions. Any genes in the initial model that do not have corresponding expression values in the data will keep their original gene interactions as specified in the initial model without any modifications. - -### Full workflow - -Full workflow is included here for easy referencing. Each step is discussed in further details below. - -*Note that this example takes a few minutes to run on a single core. The use of parallel processing is recommended.* - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) - -# (2) Load data. -data(krum_bmodel) #load a data frame of Boolean model. -data(krum_istate) #load a data frame of initial state. -data(wilson_raw_data) #load a data frame of expression data. - -bmodel = initialise_model(krum_bmodel) -istate = krum_istate -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data - -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] - -# (4) Adding extra genes to the initial Boolean model. 
extra_genes = -# setdiff(colnames(wilson_raw_data), bmodel@target) #to view available genes -# to be added. print(extra_genes) #to view available genes to be added. -add_gene = "ldb1" #genes to be added: ldb1 -grown_bmodel = grow_bmodel(add_gene, bmodel) - -# (5) Estimating initial state for the extra genes. (estimating from CMPs) -tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene]) -tmp_istate = matrix(round(tmp_istate), nrow = 1) -colnames(tmp_istate) = add_gene -grown_istate = cbind(istate, tmp_istate) -grown_istate = initialise_data(grown_istate) - -# (6) Inferring Boolean model. -final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = grown_bmodel, - istate = grown_istate, verbose = T) - -# (7) Visualise the Boolean model generated. -plotBM(final_model) -``` - -### Initial setup - -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. - -``` r -set.seed(0) #use to ensure reproducibility. remove in actual use. - -# (1) Setup paths and environment. -library(BoolTraineR) - -# If intending to use parallel processing, uncomment the following lines. -# library(doParallel) num_core = 4 #specify the number of cores to be used. -# doParallel::registerDoParallel(cores=num_core) -``` - -### Data preparation - -3 pieces of data are needed to infer a Boolean model with an initial model: an expression data, an initial Boolean model and an initial state. - -To load the data into R, use `read.table` or `read.csv`. In this example, we are using the example data included with the package, so we are accessing it by using `data`. - -`initialise_model` converts the data frame containing the Boolean model into a BoolModel object. `initialise_raw_data` is used to preprocess the data. - -``` r -# (2) Load data. 
-data(krum_bmodel) #load a data frame of Boolean model. -data(krum_istate) #load a data frame of initial state. -data(wilson_raw_data) #load a data frame of expression data. - -bmodel = initialise_model(krum_bmodel) -istate = krum_istate -tmp_data = initialise_raw_data(wilson_raw_data, max_expr = "low") -cdata = tmp_data[[1]] #continuous data -ddata = tmp_data[[2]] #discretised data -``` - -Once data are loaded and preprocessed, filter the cell types or genes to be included in the analysis if needed. It is advisable to reduce the number of genes to be included if the computation takes too long to complete. In this example, genes are not filtered as all genes that are present in both expression data and Boolean model are used automatically. - -``` r -# (3) Filter cell types. -cell_ind = grepl("cmp", rownames(cdata)) | grepl("gmp", rownames(cdata)) | grepl("mep", - rownames(cdata)) -fcdata = cdata[cell_ind, ] #select only relevant cells. -fddata = ddata[cell_ind, ] -``` - -### Add extra genes to the initial Boolean model - -Extra genes can be added to the initial model using `grow_bmodel`. The function will add extra genes into the initial model with empty update functions. - -``` r -# (4) Adding extra genes to the initial Boolean model. extra_genes = -# setdiff(colnames(wilson_raw_data), bmodel@target) print(extra_genes) #to -# view available genes to be added. -add_gene = "ldb1" #genes to be added: ldb1 -grown_bmodel = grow_bmodel(add_gene, bmodel) -``` - -### Estimate initial state for the extra genes - -Initial state needs to be modify to include the initial expression of the extra genes. The initial state of the extra genes can be set manually, or it can be estimated from the data if the data contain multiple cell types with known relationships. In this example, CMPs are known to be at developmental upstream of erythro-myeloid differentiation, therefore initial state can be estimated by taking the average expression of the extra genes in CMPs. 
- -``` r -# (5) Estimating initial state for the extra genes. (estimating from CMPs) -tmp_istate = mean(cdata[grepl("cmp", rownames(cdata)), add_gene]) -tmp_istate = matrix(round(tmp_istate), nrow = 1) -colnames(tmp_istate) = add_gene -grown_istate = cbind(istate, tmp_istate) -grown_istate = initialise_data(grown_istate) -``` - -### Run model training - -To reconstruct a Boolean model from an expression data, run `model_train`. - -In this example, `model_train` takes a few minutes to be completed on a single core. If this steps take a very long time to complete, do consider using the parallel processing option as described above. - -You will receive a BoolModel object at the end of the model training process. The BoolModel object can be visualise using `plotBM`, which is based on `igraph` package. For easier manipulation, output the Boolean model using `outgraph_model` and display it with Cytoscape or Gephi. - -*Note that this example takes a long time to run. The use of parallel processing is recommended.* - -``` r -# (6) Inferring Boolean model. -final_model = model_train(cdata = fcdata, ddata = fddata, bmodel = grown_bmodel, - istate = grown_istate, verbose = T) - -# (7) Visualise the Boolean model generated. 
-plotBM(final_model) -``` - -![](vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-31-1.png) diff --git a/vignettes/booltrainer.pdf b/vignettes/booltrainer.pdf deleted file mode 100644 index 4fadf43..0000000 Binary files a/vignettes/booltrainer.pdf and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-15-1.png b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-15-1.png deleted file mode 100644 index fe40a1b..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-15-1.png and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-17-1.png b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-17-1.png deleted file mode 100644 index 35b3cda..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-17-1.png and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-21-1.png b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-21-1.png deleted file mode 100644 index fe40a1b..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-21-1.png and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-23-1.png b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-23-1.png deleted file mode 100644 index 35b3cda..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-23-1.png and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-29-1.png b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-29-1.png deleted file mode 100644 index fe40a1b..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-29-1.png and /dev/null differ diff --git a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-31-1.png 
b/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-31-1.png deleted file mode 100644 index 35b3cda..0000000 Binary files a/vignettes/booltrainer_files/figure-markdown_github/unnamed-chunk-31-1.png and /dev/null differ diff --git a/vignettes/booltrainer.Rmd b/vignettes/btr.Rmd similarity index 88% rename from vignettes/booltrainer.Rmd rename to vignettes/btr.Rmd index b982c81..aa5f705 100644 --- a/vignettes/booltrainer.Rmd +++ b/vignettes/btr.Rmd @@ -1,13 +1,19 @@ --- -title: "Using BoolTraineR to reconstruct asynchronous Boolean models" +title: "Using BTR to reconstruct asynchronous Boolean models" author: "Chee Yee Lim" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true number_sections: true + rmarkdown::pdf_document: + toc: true + number_sections: true + rmarkdown::md_document: + variant: markdown_github + toc: true vignette: > - %\VignetteIndexEntry{Using BoolTraineR to reconstruct asynchronous Boolean models} + %\VignetteIndexEntry{Using BTR to reconstruct asynchronous Boolean models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- @@ -22,30 +28,30 @@ vignette: > --> ```{r, echo=FALSE} -library('BoolTraineR') +library('BTR') ``` # Brief introduction -`BoolTraineR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users. +`BTR` is a model learning algorithm for reconstructing and training asynchronous Boolean models using single-cell expression data. Refer to the paper for more details on the concepts behind the algorithm. This vignette serves as a tutorial to demonstrate example workflows that can be adapted to individual cases experienced by users. -Running `BoolTraineR` is straightforward. 
However, note that depending on the (1) size of single-cell expression data and (2) complexity of Boolean model, `BoolTraineR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BoolTraineR`. This can be easily achieved by using `doParallel` package, as illustrated in the example. +Running `BTR` is straightforward. However, note that depending on the (1) size of single-cell expression data and (2) complexity of Boolean model, `BTR` may take a long time to complete the computation. In such cases, it is advisable to use the built-in parallel processing capability of `BTR`. This can be easily achieved by using `doParallel` package, as illustrated in the example. Note that the examples presented in this vignette are different from the results presented in our paper. The examples presented here have been simplified to speed up the processing time. # Installation -`BoolTraineR` can be installed from CRAN. +`BTR` can be installed from CRAN. ```{r, eval = FALSE} -install.packages('BoolTraineR') +install.packages('BTR') ``` Or from Github for the latest version. To install from Gitbub, you will require the `devtools` package. ```{r, eval = FALSE} install.packages('devtools') -devtools::install_github("cheeyeelim/booltrainer") +devtools::install_github("cheeyeelim/BTR") ``` Also install `doParallel` package if you intend to use parallel processing. @@ -109,21 +115,21 @@ knitr::kable(head(krum_istate)) # Output format -BoolTraineR supports several output formats for Boolean models, as shown below. +BTR supports several output formats for Boolean models, as shown below. * `outgraph_model` - Outputs a Boolean model in a tab-delimited file with each line being an edge (i.e. gene interaction). This function also outputs a node attribute file, which can be used to distinguish gene and AND nodes in a graph plotting software. This format is readable by both Cytoscape and Gephi. 
* `outgenysis_model` - Outputs a Boolean model in a space-delimited file with each line being an edge (i.e. gene interaction). This format is readable by genYsis (used for steady state analysis). * `writeBM` - Outputs a Boolean model in a comma-delimited file similar in format to the input file format (i.e. two columns: genes and update functions). -BoolTraineR can also output a state transition graph. +BTR can also output a state transition graph. * `outstate_graph` - Outputs a state space of a Boolean model simulated with an initial state. This format is readable by both Cytoscape and Gephi. -# Useful functions in BoolTraineR +# Useful functions in BTR -Besides training Boolean models, BoolTraineR can be used for simulating a Boolean model asynchronously and calculate the score of a Boolean model with respect to a data. +Besides training Boolean models, BTR can be used for simulating a Boolean model asynchronously and calculate the score of a Boolean model with respect to a data. -* `model_train` - Core function in `BoolTraineR` that performs Boolean model inference. +* `model_train` - Core function in `BTR` that performs Boolean model inference. * `simulate_model` - Simulate a Boolean model asynchronously using an initial state, and return its state space. * `calc_mscore` - Calculate a distance score for a Boolean model with respect to an expression data. * `model_dist` - Calculate the number of genes in the update functions that differ between two Boolean models. @@ -137,7 +143,7 @@ Three example workflows will be discussed in this vignette: (1) Inferring model This workflow is intended for use on inferring a Boolean model without an initial model. -When no initial model is used, BoolTraineR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data. 
+When no initial model is used, BTR will reconstruct gene interactions from a list of user-specified genes. If the number of genes in the expression data is low (e.g. in qPCR), it is also possible to use all the genes in the expression data. ### Full workflow @@ -147,7 +153,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel) @@ -179,13 +185,13 @@ plotBM(final_model) ### Initial setup -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. +The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. ```{r, tidy=TRUE} set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel) @@ -257,7 +263,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel) @@ -289,13 +295,13 @@ plotBM(final_model) ### Initial setup -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. 
Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. +The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. ```{r, tidy=TRUE} set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel) @@ -370,7 +376,7 @@ Full workflow is included here for easy referencing. Each step is discussed in f set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel) @@ -415,13 +421,13 @@ plotBM(final_model) ### Initial setup -The first step is to load the `BoolTraineR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. +The first step is to load the `BTR` package. If you are intending to use parallel processing, you will also need to load the `doParallel` package. Then specify how many cores you intend to use using `registerDoParallel` from the `doParallel` package. ```{r, tidy=TRUE} set.seed(0) #use to ensure reproducibility. remove in actual use. #(1) Setup paths and environment. -library(BoolTraineR) +library(BTR) #If intending to use parallel processing, uncomment the following lines. #library(doParallel)