PDXToolkit is a pipeline for filtering mouse reads out of WGS/WES and RNA-seq data.
- Java (JRE1.8.x)
- conda install -c bioconda bwa=0.7.15
- conda install -c bioconda samtools=1.5
- conda install -c bioconda picard=2.17.11
- conda install -c bioconda bamtools=2.4.0
- conda install -c bioconda ngs-disambiguate=2016.11.10
- conda install -c bioconda star=2.5.4a
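Before running the pipeline it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of PDXToolkit; the function name `check_tools` is hypothetical, and the executable names passed to it are assumptions based on the package names.

```shell
#!/bin/sh
# Report which of the given executables are missing from PATH.
# check_tools is a hypothetical helper, not shipped with PDXToolkit.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Print the missing tools and return non-zero if anything is absent.
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}
```

A possible call, assuming the conda packages install executables under these names: `check_tools java bwa samtools picard bamtools ngs_disambiguate STAR`.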
Version 1.0
Usage: pdx_disam_kit.sh -p <command> [options]
Key commands:
dnaFull Run the full workflow, from *.fq.gz to a new BAM
rnaFull Run the full workflow, from *.fq.gz to new *.fq.gz
humanBam Separate step: create the human BAM only
mouseBam Separate step: create the mouse BAM only
disambiguate Separate step: run disambiguation
Options:
-p <command> Key command
-n <name> Any name; used for the folder that is built
-1 <fastq.gz> Gzipped FASTQ file (first file of the pair)
-2 <fastq.gz> Gzipped FASTQ file (second file of the pair)
-o <directory> Output directory
-d <directory> Directory used by the separate steps
-
WGS/WES full step
sh pdx_disam_kit.sh -p dnaFull -n sample -1 fq1.gz -2 fq2.gz -o outdir
-
RNA full step
sh pdx_disam_kit.sh -p rnaFull -n sample -1 fq1.gz -2 fq2.gz -o outdir
-
Separate steps
-
Create the human BAM
sh pdx_disam_kit.sh -p humanBam -n sample -1 fq1.gz -2 fq2.gz -o outdir
-
Create the mouse BAM
sh pdx_disam_kit.sh -p mouseBam -n sample -1 fq1.gz -2 fq2.gz -o outdir
-
Disambiguate
sh pdx_disam_kit.sh -p disambiguate -n sample -d pdx_wxs_dir
-
-
WGS/WES/RNA-seq
Paired gzipped FASTQ files (e.g. *.1.fq.gz / *.2.fq.gz)
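When several samples follow the paired naming shown above, the dnaFull command can be generated per pair. The sketch below only prints the commands so they can be reviewed before running. The function name `print_dna_full_cmds` is hypothetical, and it assumes the sample name is the part of the file name before `.1.fq.gz` and that all samples share one output directory.

```shell
#!/bin/sh
# Print one dnaFull command per read pair found in a directory.
# Assumes pairs are named <sample>.1.fq.gz / <sample>.2.fq.gz.
print_dna_full_cmds() {
    in_dir=$1
    out_dir=$2
    for fq1 in "$in_dir"/*.1.fq.gz; do
        [ -e "$fq1" ] || continue        # glob matched nothing
        sample=$(basename "$fq1" .1.fq.gz)
        fq2="$in_dir/$sample.2.fq.gz"
        if [ ! -e "$fq2" ]; then
            # Report unpaired files on stderr and move on.
            echo "skip $sample: missing $fq2" >&2
            continue
        fi
        echo "sh pdx_disam_kit.sh -p dnaFull -n $sample -1 $fq1 -2 $fq2 -o $out_dir"
    done
}
```

For example, `print_dna_full_cmds /path/to/fastq /path/to/outdir > run_all.sh` writes the commands to a file that can be inspected and then executed.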
-
WGS/WES data output
*.disam.reAlign.remDup.bam *.disam.reAlign.remDup.bam.bai
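A quick way to confirm that a WGS/WES run produced both files listed above is to look for each BAM and its index. This is a minimal sketch; `check_dna_outputs` is a hypothetical helper, and the directory that holds the final files is passed in by the caller because its exact location is not stated here.

```shell
#!/bin/sh
# Check that every *.disam.reAlign.remDup.bam in a directory has a
# non-empty .bai index next to it.
check_dna_outputs() {
    dir=$1
    status=0
    for bam in "$dir"/*.disam.reAlign.remDup.bam; do
        # If the glob matched nothing, there is no output BAM at all.
        [ -e "$bam" ] || { echo "no BAM found in $dir"; return 1; }
        if [ ! -s "$bam.bai" ]; then
            echo "missing index: $bam.bai"
            status=1
        fi
    done
    return "$status"
}
```

The function prints nothing and returns 0 when every BAM has an index.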
-
RNA-seq data output
*.disam.rna-seq.1.fastq.gz *.disam.rna-seq.2.fastq.gz
-
Docker files (https://github.com/ding-lab/dockers)
-
Usage
* Only Disambiguate
  docker pull hsun9/disambiguate
  docker run hsun9/disambiguate ngs_disambiguate --help
* Full pipeline of mouse filter (wxs/wgs data) docker image
  docker pull hsun9/disambiguateplus
  docker run hsun9/disambiguateplus ngs_disambiguate --help
A CWL version was developed by Matthew Wyczalkowski (https://github.com/ding-lab/MouseTrap2).
It is not included in PDXToolkit.
NOTE: Please refer to "example.mgi.gmt.sh".
To test example.mgi.gmt.sh, you must modify the variables, including name, outDir and bed.
The shell script only runs on MGI servers.
The *.permissive.out file is the final filtered result.
- Test sample
name="example" #-- any name
outDir=/example/outFolder #-- any dir
bed=/example/target.bed #-- any bed file
hg19=/path/GRCh37-lite.fa #-- GRCh37
mm10=/path/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa
chain10=/path/hg19ToMm10.over.chain
gmt somatic filter-mouse-bases --chain-file=$chain10 \
--human-reference=$hg19 \
--mouse-reference=$mm10 --variant-file=$bed \
--filtered-file=$outDir/log/$name.mouse.hg19toMm10.out \
--output-file=$outDir/$name.hg19toMm10.permissive.out --permissive
## target.bed (create based on VCF)
1 121009 121009 C T
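The five-column line above (chromosome, start, stop, reference allele, alternate allele) can be derived from a VCF with awk. The sketch below is one possible way to do it, not the author's method. It assumes tab-separated output, assumes start and stop both equal the VCF POS, and keeps only single-base substitutions; the function name `vcf_to_variant_bed` is hypothetical.

```shell
#!/bin/sh
# Convert VCF records to: chrom, start, stop, ref, alt (tab-separated).
# Assumption: single-nucleotide variants only, so start = stop = POS.
# Indels and multi-allelic records are skipped by the length checks.
vcf_to_variant_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                           # skip VCF header lines
         length($4) == 1 && length($5) == 1 {    # REF and ALT are one base each
             print $1, $2, $2, $4, $5
         }' "$1"
}
```

For example, `vcf_to_variant_bed sample.vcf > target.bed`, where `sample.vcf` is a placeholder for an uncompressed VCF.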
## chain download
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/hg19ToMm10.over.chain.gz
Hua Sun, hua.sun@wustl.edu