This repository contains the simulator used in the work "NestedBD: Bayesian Inference of Phylogenetic Trees From Single-Cell Copy Number Profile Data Under a Birth-Death Model". The simulator is based on https://github.com/compbiofan/SingleCellCNABenchmark, with several modifications.
For the purpose of the study, we set X=8, W=0, m=20M and e=100M. The value of c under each simulation setting is described in the Methods section.
python main.par.py -r $dir -n $n -X $X -t $ref -W $W -C $C -m $m -e $e -amp $amp -SP $SP
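As a rough illustration, the command above can be assembled from Python with the study values (X=8, W=0, m=20M, e=100M) filled in. This is only a sketch: `build_command` is a hypothetical helper that is not part of the repository, the values chosen for `n`, `C`, `amp`, `SP` and the reference path are placeholders, and writing 20M and 100M as the plain integers 20000000 and 100000000 is an assumption about how the script expects sizes to be passed.

```python
import shlex


def build_command(out_dir, n_cells, ref_fasta, X=8, W=0, C=0.0,
                  m=20_000_000, e=100_000_000, amp=0, SP=1):
    """Return the argument list for main.par.py (hypothetical helper).

    Only the flags shown in the command line above are used. The defaults for
    X, W, m and e follow the study settings; all other values are placeholders.
    """
    return [
        "python", "main.par.py",
        "-r", out_dir,
        "-n", str(n_cells),
        "-X", str(X),
        "-t", ref_fasta,
        "-W", str(W),
        "-C", str(C),
        "-m", str(m),      # assumption: 20M written out as 20000000
        "-e", str(e),      # assumption: 100M written out as 100000000
        "-amp", str(amp),
        "-SP", str(SP),
    ]


# Placeholder output folder, cell count and reference path.
cmd = build_command("large_dataset/", 20, "/path/to/ref.fasta")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the placeholder values are replaced with real ones.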
- $dir: the folder where the simulated data will be written. It may be a relative path, for example large_dataset/. Default: test.
- $n: number of cells in the tree.
- $X: how many times more CNAs occur on the edge to the root than on other edges. For example, 8.
- $ref (required): reference FASTA file, given as an absolute path.
- $W: whether there are whole-chromosome amplifications, 1 (yes) or 0 (no).
- $C: the probability that a chromosome is amplified when $W is 1.
- $m: minimum copy number size.
- $e: parameter p of the exponential distribution for the copy number size that is added to $m.
- $amp: controls the non-uniformity in selecting the positions of CNAs on the genome during simulation. When set to 0, the CNAs are sampled randomly. Default is 0.
- $SP (required): tree option. With SP = 0, a prompt asks the user to enter the path to a file containing a tree in Newick format to be used for simulation; SP = 1 simulates a tree with a birth-death process before adding CNAs along the branches; SP = 2 simulates a beta-splitting tree and the CNAs along its branches simultaneously.
- $c: event multiplier. The number of events added to each branch is sampled from a Poisson distribution with mean equal to the product of $c and the branch length.
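The roles of $c, $m and $e can be sketched in a few lines of Python. This is an illustrative sketch, not the simulator's code: the function names, branch names and branch lengths are hypothetical, and treating $e as the mean of the exponential term (rather than its rate) is an assumption that should be checked against the source.

```python
import random


def sample_event_count(c, branch_length, rng):
    """Draw a Poisson(c * branch_length) count.

    Counts the arrivals of a unit-rate Poisson process that fall in
    [0, c * branch_length), which gives a Poisson variate with that mean.
    """
    mean = c * branch_length
    if mean <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_cna_size(m, e, rng):
    """Return a CNA size: the minimum size m plus an exponential term.

    Assumption: e is the mean of the exponential distribution, so the
    rate passed to expovariate is 1 / e.
    """
    return m + int(rng.expovariate(1.0 / e))


rng = random.Random(0)
# Hypothetical branches and lengths, for illustration only.
branches = {"root->A": 0.5, "A->leaf1": 1.2}
for name, length in branches.items():
    k = sample_event_count(c=3.0, branch_length=length, rng=rng)
    sizes = [sample_cna_size(20_000_000, 100_000_000, rng) for _ in range(k)]
    print(name, k, sizes)
```

Under these assumptions, a longer branch or a larger $c yields more events on average, and every sampled CNA is at least $m long.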
This step is the same as in the original simulator. The commands used in step 2 are summarized in gen_bam/simulate.sh and gen_bam/make_bam_from_fq.sh.