MitoH3 is a pipeline that calls mitochondrial (MT) variants from a sequence file (CRAM file). In addition to the regular MT reference sequence, the pipeline also applies a shifted MT reference to call variants in the D-loop region more precisely.
The pipeline has two main steps:
- Step 1: Generate a raw VCF file from the CRAM file.
- Step 2: Call the haplogroup from the cleaned VCF file, and split the raw VCF file into a homoplasmic-variant VCF and a heteroplasmic-variant VCF at a given alternative allele frequency cutoff.
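To illustrate the idea of splitting variants by alternative allele frequency, here is a minimal sketch. It is not the code used by MitoH3. It assumes a single-sample VCF with one ALT allele per record and an `AF` key in the FORMAT column, and it assumes two cutoffs: records with AF at or above the upper cutoff are labelled homoplasmic, records between the two cutoffs are labelled heteroplasmic, and records below the lower cutoff are dropped. The function name `classify_by_af` is hypothetical.

```shell
# Hypothetical helper, not part of MitoH3: label each record of a
# single-sample VCF by the AF value found in its sample column.
# Usage: classify_by_af <vcf> <lower_cutoff> <upper_cutoff>
classify_by_af() {
    awk -F '\t' -v lo="$2" -v hi="$3" '
        /^#/ { next }                       # skip header lines
        {
            n = split($9, keys, ":")        # FORMAT keys such as GT:AD:AF
            split($10, vals, ":")           # matching sample values
            af = ""
            for (i = 1; i <= n; i++) if (keys[i] == "AF") af = vals[i]
            if (af == "") next              # no AF value: leave unclassified
            af += 0                         # force a numeric comparison
            if (af >= hi)      label = "homoplasmic"
            else if (af >= lo) label = "heteroplasmic"
            else next                       # below the lower cutoff (assumed dropped)
            print $2 "\t" $4 "\t" $5 "\t" af "\t" label
        }' "$1"
}
```

Each output line holds the position, REF, ALT, AF, and the assigned label, separated by tabs. Whether the pipeline treats the boundary values as inclusive is an assumption of this sketch.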
- build the singularity image from the definition file:
The build below is a remote build, so singularity remote build access has to be set up first.
wget https://raw.githubusercontent.com/MarchOnion/MitoH3/main/MitoH3.def
singularity build --remote MitoH3.sif MitoH3.def
- download the reference files:
wget https://www.dropbox.com/s/4u1mz7h7ws1z89k/MT_WDL_ref.zip
unzip MT_WDL_ref.zip
- download an example cram file and its index from the 1000 Genomes Project (1KG):
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR398/ERR3988882/HG01433.final.cram
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR398/ERR3988882/HG01433.final.cram.crai
- prepare json file:
wget https://raw.githubusercontent.com/MarchOnion/MitoH3/main/input.json
dir=$(pwd)
sed -i "s|your_local_path|${dir}|g" input.json
- step1:
singularity exec --bind "$dir" MitoH3.sif bash /script/run1.sh input.json
- step2:
step1_output=cromwell-executions/MitochondriaPipeline/*/call-SplitMultiAllelicSites/execution/*.final.split.vcf
singularity exec --bind "$dir" MitoH3.sif bash /script/run2.sh $step1_output prefix 0.05 0.95
- an example script for merging vcf files is also provided:
https://raw.githubusercontent.com/MarchOnion/MitoH3/main/merge_vcf_files.sh
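The glob used for `step1_output` in step2 can match several files if step1 has been run more than once in the same directory, because each run gets its own subdirectory under `cromwell-executions`. The sketch below resolves the glob and stops unless exactly one file matches. The function name `find_step1_vcf` is hypothetical and not part of MitoH3; only the directory pattern is taken from step2 above.

```shell
# Hypothetical helper, not part of MitoH3: print the single step1 output VCF
# found under a Cromwell execution root, or fail with a message.
# Usage: find_step1_vcf [root]   (root defaults to cromwell-executions)
find_step1_vcf() {
    root=${1:-cromwell-executions}
    # Expand the glob into the function's positional parameters.
    set -- "$root"/MitochondriaPipeline/*/call-SplitMultiAllelicSites/execution/*.final.split.vcf
    # An unmatched glob stays a literal pattern, so check that the file exists.
    if [ ! -e "$1" ]; then
        echo "no step1 output found under $root" >&2
        return 1
    fi
    if [ "$#" -gt 1 ]; then
        echo "found $# step1 outputs, choose one explicitly:" >&2
        printf '  %s\n' "$@" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Possible use before step2 (left commented out, since it needs the image):
# step1_output=$(find_step1_vcf) || exit 1
# singularity exec --bind "$dir" MitoH3.sif bash /script/run2.sh "$step1_output" prefix 0.05 0.95
```

Because the resolved path is a single file name, it can be quoted when passed to `run2.sh`, which also keeps paths containing spaces intact.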