We developed MutScape, a user-friendly Python toolkit that provides a comprehensive pipeline for filtering, combining, transforming, analyzing, and visualizing somatic mutation data, so that cancer genomics researchers can easily explore cohort-based mutational characterization. MutScape preprocesses millions of mutation records in a few minutes and offers various analyses simultaneously. It supports somatic variant data in both Variant Call Format (VCF) and Mutation Annotation Format (MAF), and leverages caller combination strategies to quickly eliminate false positives. With only two simple commands, robust results and publication-quality images are generated automatically.
Before running the quick installation, make sure that you have installed Miniconda3, created a new conda environment, and activated it. To keep the installation running smoothly, also confirm that the Internet connection stays up throughout and that the server or computer has enough free disk space.
git clone https://github.com/anitalu724/MutScape.git
bash MutScape/mutscape/installation/quickInstall_1.sh
bash vcf2maf-1.6.20/MutScape/mutscape/installation/quickInstall_2.sh
The latest tested versions are given in parentheses:
- Using Miniconda (py37_4.9.2) to install samtools (v1.10), ucsc-liftover (v377), bcftools (v1.10.2), htslib (v1.10.2) and ensembl-vep (v102.0)
- Download vcf2maf (v1.6.20) and git clone MutScape (v1.0)
- Download the VEP cache data of GRCh37 and the reference FASTA (v102.0)
Numerous modules for this toolkit will be installed by conda.
If you have never installed conda, please refer to the Miniconda website. For high compatibility, we recommend installing Miniconda3-py37_4.9.2 (SHA256 hash 79510c6e7bd9e012856e25dcb21b3e093aa4ac8113d9aa7e82a86987eabe1c31).
There is a script for users to install Miniconda quickly.
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
sha256sum Miniconda3-py37_4.9.2-Linux-x86_64.sh
bash Miniconda3-py37_4.9.2-Linux-x86_64.sh
export PATH="$HOME/miniconda3/bin:$PATH"
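The `sha256sum` step above only prints a digest; it still has to be compared with the published hash by eye. As a minimal sketch, the comparison can be automated in Python. It assumes that the SHA256 hash quoted above belongs to the Linux x86_64 installer downloaded by these commands; the helper names `sha256_of` and `installer_is_intact` are hypothetical and not part of MutScape.

```python
import hashlib
from pathlib import Path

# Hash quoted in this README for Miniconda3-py37_4.9.2
# (assumed here to refer to the Linux x86_64 installer).
EXPECTED_SHA256 = "79510c6e7bd9e012856e25dcb21b3e093aa4ac8113d9aa7e82a86987eabe1c31"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installer_is_intact(path, expected=EXPECTED_SHA256):
    """True when the file's digest matches the expected hash (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    installer = Path("Miniconda3-py37_4.9.2-Linux-x86_64.sh")
    if not installer.exists():
        print(f"{installer} not found; download it first")
    elif installer_is_intact(installer):
        print("SHA256 matches; the installer can be run")
    else:
        print("SHA256 mismatch; do not run this installer")
```

Run it from the directory that contains the downloaded installer, before the `bash Miniconda3-py37_4.9.2-Linux-x86_64.sh` step.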
We recommend running MutScape in a brand-new conda environment.
conda create --name MutScape
conda activate MutScape
If you have already installed Ensembl's VEP, you may skip this part and go directly to the next part to install vcf2maf. (However, you must confirm that your VEP version is compatible with vcf2maf. We recommend installing ensembl-vep=102.0.)
conda install -c bioconda -c conda-forge samtools=1.10 ucsc-liftover=377 bcftools=1.10.2 htslib==1.10.2
conda install -c bioconda -c conda-forge -c defaults ensembl-vep=102.0
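After the two `conda install` commands, it can help to confirm that the tools are actually reachable from the active environment before moving on to vcf2maf. The sketch below only looks up executables on `PATH`. The executable names are assumptions about what these conda packages provide (for example `liftOver` for ucsc-liftover and `vep` for ensembl-vep), and `missing_tools` is a hypothetical helper, not part of MutScape.

```python
import shutil

# Executables assumed to be provided by the conda packages installed above:
# samtools, bcftools, htslib (bgzip, tabix), ucsc-liftover (liftOver), ensembl-vep (vep).
REQUIRED_TOOLS = ["samtools", "bcftools", "bgzip", "tabix", "liftOver", "vep"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found")
```

If anything is reported missing, check that the MutScape conda environment is activated in the current shell.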
The VCF-to-MAF transformation is performed by the vcf2maf utility, which handles variant annotation and transcript prioritization. You can refer to this script or just follow the commands below. (Before this step, make sure that you have installed Ensembl's VEP.)
wget https://github.com/mskcc/vcf2maf/archive/refs/tags/v1.6.20.tar.gz
tar -zxf v1.6.20.tar.gz
cd vcf2maf-1.6.20
perl vcf2maf.pl --man
perl maf2maf.pl --man
Before we start to use vcf2maf, we need to download VEP cache data and the reference FASTA.
ℹ️ We recommend downloading 102_GRCh37.
mkdir -p $HOME/.vep/homo_sapiens/102_GRCh37/
wget ftp://ftp.ensembl.org/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
mv Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/102_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
wget ftp://ftp.ensembl.org/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz
mv homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh37.tar.gz -C $HOME/.vep/
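The commands above leave several files under `$HOME/.vep`, and a failed download or an interrupted `bgzip` step is easy to miss. As a rough sketch, the expected layout can be checked as follows. It assumes that `bgzip -i` and `samtools faidx` write their `.gzi` and `.fai` index files next to the FASTA, and that the cache archive unpacks into `homo_sapiens/102_GRCh37/`; `vep_data_report` is a hypothetical helper, not part of MutScape.

```python
from pathlib import Path


def vep_data_report(vep_dir=None):
    """Map each expected VEP data item to True/False depending on whether it exists."""
    vep_dir = Path(vep_dir) if vep_dir else Path.home() / ".vep"
    ref_dir = vep_dir / "homo_sapiens" / "102_GRCh37"
    fasta = ref_dir / "Homo_sapiens.GRCh37.dna.toplevel.fa.gz"
    expected = {
        "cache directory": ref_dir,
        "bgzipped FASTA": fasta,
        # Index locations are assumptions about where samtools/bgzip write them.
        "faidx index (.fai)": Path(str(fasta) + ".fai"),
        "bgzip index (.gzi)": Path(str(fasta) + ".gzi"),
    }
    return {label: path.exists() for label, path in expected.items()}


if __name__ == "__main__":
    for label, present in vep_data_report().items():
        print(f"{'ok     ' if present else 'MISSING'}  {label}")
```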
MutScape is hosted on GitHub; clone the repository to download it.
git clone https://github.com/anitalu724/MutScape.git
If you have never installed pip, install it with conda.
conda install -c anaconda pip
To make sure all the code runs smoothly, install the modules that MutScape depends on:
cd MutScape/mutscape
bash installation/install_module.sh
MutScape is divided into two main modules: data preprocessing, and analysis and visualization. For the detailed structure, please refer to Fig. 1.
MutScape accepts both VCF and MAF files as input data.
To process multiple VCF/MAF files simultaneously, MutScape requires a TSV file in a specific format as input. For the detailed format, please refer to the example files examples/tsv/testData_vcf.tsv and examples/tsv/testData_maf.tsv, or see the Wiki.
For VCFs as input data, -f, -o and -m are required, while -vf, -ra, -v2m and -mf are optional.
Some simple test commands are displayed below.
See Wiki for detailed information.
python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf CI "*,*,*,6,*,*,*,*"
python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf GI [1,3] \
-v2m 8
python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf GI "{1: [*,*], 2 : [1, 300000]}" CI "15,15,0,6,0,0.05,8,8" PA 0 AV 0.9 \
-v2m
python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-v2m 8 \
-mf GI [1,3]
- Reject and accept list (-ra)
A schematic diagram is shown in S2.
python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-ra examples/test_data/vcf/reject.vcf examples/test_data/vcf/accept.vcf \
-o examples/output \
-m examples/meta \
-vf CI "*,*,*,6,*,*,*,*" \
-v2m 8 \
-mf GI [1,3]
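Filter values such as `*,*,*,6,*,*,*,*` and `[1,3]` contain characters that a shell may expand as glob patterns when they are left unquoted. One way to sidestep this is to assemble the argument list in Python and pass it on without a shell. The sketch below uses only the flags shown in the commands above; the helper name `preprocess_command` and its parameters are hypothetical and not part of MutScape.

```python
import shlex


def preprocess_command(tsv, out_dir, meta_dir, vcf_filters=(),
                       reject_accept=None, v2m=None, maf_filters=()):
    """Build the dataPreprocess.py argument list from the flags used in this README."""
    cmd = ["python3", "dataPreprocess.py", "-f", tsv, "-o", out_dir, "-m", meta_dir]
    if reject_accept is not None:
        reject_vcf, accept_vcf = reject_accept
        cmd += ["-ra", reject_vcf, accept_vcf]
    if vcf_filters:
        cmd += ["-vf", *vcf_filters]
    if v2m is not None:
        cmd += ["-v2m", str(v2m)]
    if maf_filters:
        cmd += ["-mf", *maf_filters]
    return cmd


cmd = preprocess_command(
    "examples/tsv/testData_vcf.tsv",
    "examples/output",
    "examples/meta",
    vcf_filters=["CI", "*,*,*,6,*,*,*,*"],
    v2m=8,
    maf_filters=["GI", "[1,3]"],
)

# A copy-pasteable shell line in which every argument is safely quoted.
shell_line = " ".join(shlex.quote(arg) for arg in cmd)
print(shell_line)

# To run it directly, without a shell and therefore without glob expansion:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` hands each element to the program unchanged, so no quoting is needed in that case; `shlex.quote` is only used here to print a line that is safe to paste into a terminal.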
For MAFs as input data, -f, -o and -m are required, while -mf is optional.
Some simple test commands are displayed below.
python3 dataPreprocess.py \
-f examples/tsv/testData_maf.tsv \
-mf GI [1:3] \
-o examples/output \
-m examples/meta
python3 dataPreprocess.py \
-f examples/tsv/testData_maf.tsv \
-mf GI [1:3] CI "15,15,0,0,0,0.05,8,8" TE [BLCA,5] PAC 1 HY 500 \
-o examples/output \
-m examples/meta
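The CI filter strings in the commands above, such as `*,*,*,6,*,*,*,*` and `15,15,0,0,0,0.05,8,8`, are eight comma-separated values. The following sketch shows one way such a string could be validated before launching a long run. It assumes that `*` marks a position with no threshold, which the examples suggest but this README does not state; the meaning of each position is documented in the Wiki and is not reproduced here. `parse_thresholds` is a hypothetical helper, not MutScape's own parser.

```python
def parse_thresholds(spec, expected_fields=8):
    """Split a comma-separated filter string into floats, with None for '*'.

    Assumption: '*' means "no threshold at this position".
    """
    parts = [part.strip() for part in spec.split(",")]
    if len(parts) != expected_fields:
        raise ValueError(
            f"expected {expected_fields} comma-separated values, got {len(parts)}"
        )
    return [None if part == "*" else float(part) for part in parts]


if __name__ == "__main__":
    for spec in ("*,*,*,6,*,*,*,*", "15,15,0,0,0,0.05,8,8"):
        print(spec, "->", parse_thresholds(spec))
```

A string with a missing field or a non-numeric entry raises `ValueError` immediately, so the mistake surfaces before any data is processed.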
MutScape provides 9 different analyses, some of which also generate plots.
See Wiki for detailed information.
Some simple test commands are displayed below.
- Significantly mutated gene detection
python3 mafAnalysis.py \
-f examples/test_data/maf/TCGA_test.maf \
-smg \
-o examples/output \
-p examples/pic/
- Known cancer gene annotation
python3 mafAnalysis.py \
-f examples/test_data/maf/TCGA_test.maf \
-kcga \
-o examples/output \
-p examples/pic/
- Mutation burden statistics
python3 mafAnalysis.py \
-f examples/test_data/maf/TCGA_test.maf \
-tmb 60456963 \
-o examples/output \
-p examples/pic/
- CoMut plot analysis
The output figure is similar to Fig. 2.
See Wiki for detailed information.
python3 mafAnalysis.py \
-f examples/test_data/maf/TCGA_test.maf \
-cm 60456963 \
-o examples/output \
-p examples/pic/
python3 mafAnalysis.py \
-cmp examples/tsv/comut.tsv examples/tsv/comut_info.tsv 0 comut.pdf \
-o examples/output \
-p examples/pic/
- Mutational signature
Signature refitting: the output figure of -ms 0 is shown in the Wiki.
De novo extraction: the output figures of -ms 1 and -ms 2 are similar to Fig. 3.
python3 mafAnalysis.py \
-f examples/test_data/maf/ms.maf \
-ms 0 "[SBS1, SBS5, SBS40, SBS87]" \
-o examples/output \
-p examples/pic/
python3 mafAnalysis.py \
-f examples/test_data/maf/ms.maf \
-ms 1 "[2,9,10]" \
-o examples/output \
-p examples/pic/
python3 mafAnalysis.py \
-f examples/test_data/maf/ms.maf \
-ms 2 "[3]" \
-o examples/output \
-p examples/pic/
- HRD Score
The output figure is similar to Fig. 4A, B.
python3 mafAnalysis.py \
-hrd examples/tsv/hrd.tsv grch37 \
-o examples/output \
-p examples/pic/
- Whole-genome doubling (WGD) and Chromosome instability (CIN)
The output figure is similar to Fig. 4C, D.
python3 mafAnalysis.py \
-wgdcin examples/tsv/hrd.tsv \
-o examples/output \
-p examples/pic/
- HRD, CIN and WGD Comparison
The output figure is similar to Fig. 5.
python3 mafAnalysis.py \
-hcwc examples/tsv/hcw_comparison.tsv grch37 \
-o examples/output \
-p examples/pic/
- Actionable mutation (drug) annotation
oncokb-annotator was made freely available under the GPL 3.0 license.
[your_oncokb_token] is obtained from the OncoKB website. You must create your own account and get your personal API token.
The output figure is similar to Fig. 6.
python3 mafAnalysis.py \
-f examples/test_data/maf/TCGA_test.maf \
-oncokb ../oncokb-annotator/ [your_oncokb_token] 4 examples/test_data/oncokb/clinical_input.txt \
-o examples/output \
-p examples/pic/
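The mutation burden command above passes a number to `-tmb` (60456963 in the example). Assuming that this argument is the size in base pairs of the sequenced region, tumor mutation burden is conventionally reported as mutations per megabase. The sketch below illustrates that arithmetic only; `tumor_mutation_burden` is a hypothetical helper and is not necessarily how MutScape computes its statistics.

```python
def tumor_mutation_burden(mutation_count, region_length_bp):
    """Mutations per megabase: count divided by the region size in Mb.

    Assumption: region_length_bp is the size of the sequenced region in base pairs,
    e.g. the 60456963 passed to -tmb in the example command.
    """
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return mutation_count / (region_length_bp / 1_000_000)


if __name__ == "__main__":
    # 120 is a made-up mutation count used only to show the calculation.
    print(f"{tumor_mutation_burden(120, 60456963):.3f} mutations/Mb")
```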
Lu, C. H., Wu, C. H., Tsai, M. H., Lai, L. C., & Chuang, E. Y. (2021). MutScape: an analytical toolkit for probing the mutational landscape in cancer genomics. NAR genomics and bioinformatics, 3(4), lqab099.