This analysis compares the Pediatric Brain Tumor samples of this dataset to the adult brain tumor samples from The Cancer Genome Atlas.
To run this from the command line, first run the snv-callers
scripts on a machine that has at least 256 GB of RAM:
bash analyses/snv-callers/run_caller_consensus_analysis-pbta.sh
bash analyses/snv-callers/run_caller_consensus_analysis-tcga.sh
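Because the two scripts above need a large amount of memory, it can help to check the machine before starting them. The following is a minimal sketch, not part of this repository: `has_enough_ram` is a hypothetical helper, and reading `MemTotal` from `/proc/meminfo` assumes a Linux system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): succeeds when the total
# memory in kB is at least the required number of GB.
has_enough_ram() {
  local total_kb="$1" required_gb="$2"
  [ "$total_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  if has_enough_ram "$total_kb" 256; then
    echo "Memory check passed: ${total_kb} kB available in total."
  else
    echo "Warning: only ${total_kb} kB of RAM; the snv-callers scripts expect at least 256 GB." >&2
  fi
fi
```

The kernel usually reports a `MemTotal` slightly below the installed RAM, so the warning is best treated as advisory rather than as a hard stop.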
The results from those scripts are saved to the snv-callers/results/consensus/
folder and used here.
Some versions of these files are incorporated into the data release and saved to the data
folder, so this module could be altered to use the file versions in the data
folder instead.
Then you can run this module's analysis with the following command to create the plot:
Rscript -e "rmarkdown::render('analyses/tmb-compare/compare-tmb.Rmd',
clean = TRUE)"
These steps are also run from start to finish in the figures script: figures/scripts/fig2-mutational-landscape.R.
The results of this analysis are the TMB calculations for the PBTA and TCGA datasets plotted side by side.
Additionally, the resulting TCGA TMB calculations used are saved to results/brain_related_tcga_tmb.tsv in this folder.
Overall, tumor mutation burden (TMB) for both brain tumor datasets is calculated using mutations from exonic regions of the genome only.
The TMB calculations for the pediatric brain tumor set were carried out in the snv-callers analysis in this repository. In brief, tumor mutation burden is calculated using all SNV calls in coding sequences that were found by both Mutect2 and Strelka2. The total number of coding sequence consensus SNVs is used as the numerator and the effective size of the genome surveyed is used as the denominator.
TMB = (total # coding sequence consensus snvs) / (size of effectively surveyed genome)
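As a worked illustration of the formula above, the sketch below computes TMB for one sample. The SNV count and surveyed size are made-up numbers, and expressing the result per megabase is an assumption of this example, not something stated in this section.

```shell
#!/usr/bin/env bash
# Illustrative values only (hypothetical, not taken from the PBTA or TCGA data).
consensus_snvs=75        # coding sequence SNVs called by both Mutect2 and Strelka2
surveyed_bp=30000000     # effective size of the surveyed coding regions, in bp

# Assumption: report TMB per megabase by converting the denominator from bp to Mb.
tmb=$(awk -v n="$consensus_snvs" -v bp="$surveyed_bp" \
  'BEGIN { printf "%.2f", n / (bp / 1e6) }')

echo "TMB: ${tmb} mutations per Mb"   # prints "TMB: 2.50 mutations per Mb"
```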
For more details, see the snv-callers README.
For calculating TCGA tumor mutation burden, only TCGA brain-related tumor projects were used.
The exome BED regions file included with the MC3 project was overlapped with the same coding sequences used for PBTA data, and the size of the resulting regions was used as the denominator.
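The size of a set of BED regions can be obtained by summing the interval lengths. The sketch below shows the idea with a hypothetical helper, `bed_size_bp`, and a small made-up BED file. It assumes the intervals are already merged so that none overlap, and it is not necessarily how the snv-callers module computes this value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total number of bases covered by a BED file.
# Assumes tab-separated, 0-based half-open intervals that do not overlap.
bed_size_bp() {
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 }
              END { print total + 0 }' "$1"
}

# Made-up example regions, written to a temporary file for the demonstration.
demo_bed=$(mktemp)
printf 'chr1\t100\t200\nchr1\t300\t450\nchr2\t0\t50\n' > "$demo_bed"

bed_size_bp "$demo_bed"   # prints 300 (100 + 150 + 50)
rm -f "$demo_bed"
```

If the regions might overlap, they would need to be merged first (for example with a tool such as bedtools), otherwise shared bases are counted more than once.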