SVanGogh

Pixelate SVs!

🚧⚠️ WIP ⚠️🚧

I am currently NOT developing this software. If you use this software or methodology please cite my thesis (soon).

Install

git clone --recursive https://github.com/dantaki/SVanGogh
cd SVanGogh && pip install dist/SVanGogh-0.1.tar.gz

Requires:

Pillow (PIL)
pybedtools
pysam
scipy
tqdm

Usage

$ svangogh --help

---------------------------------------------------------
8""""8  88   8                8""""8                   
8       88   8  eeeee  eeeee  8    "  eeeee  eeeee  e   e 
8eeeee  88  e8  8   8  8   8  8e      8  88  8   8  8   8 
    88  "8  8   8eee8  8e  8  88  ee  8   8  8e     8eee8 
e   88   8  8   88  8  88  8  88   8  8   8  88 "8  88  8 
8eee88   8ee8   88  8  88  8  88eee8  8eee8  88ee8  88  8

pixelate SVs         Author:Danny Antaki dantaki@ucsd.edu
---------------------------------------------------------

Usage: svangogh (-i BAM) [-r REGION] [-t SVTYPE] [-v VCF] [-b BED] 
                         [--ci] [-w WINDOW] [-c CLIP] 
                         [--min-ovr MINOVR] [--min-inv MININV] [--min-indel INDEL]
                         [--max-reads MAXREADS] [--max-mapq MAXMAPQ] [--min-sr MINSR] 
                         [-f FLANK] [-s SCALE] [--hs HSCALE] [--ws WSCALE] 
                         [-V --verbose] [-P] [-o OUT] [-h --help]

  -h --help             show this help message and exit
  --version             print the version number

Required Arguments:
  -i BAM                BAM file

SV Arguments:  
  -r REGION             breakpoint [chr:start-end]
  -t SVTYPE             SV type, required if -r is used. [DEL|DUP|INV|INS]
  -v VCF                VCF file
  -b BED                BED file

SV Options:
  --ci                  flag clips within confidence intervals, requires -v, overrides -c
  -w WINDOW             analyze reads +/- bp from breakpoints [default: 100]
  -c CLIP               max distance of clips to breakpoint [default: 50]
  --min-ovr MINOVR      min SV overlap, flags clips with CIGAR strings [default: 0.8]
  --min-inv MININV      min alignment overlap to inversion, flags supporting reads [default: 0.5]
  --min-indel INDEL     min indel size [default: 7]

Pixelating Options:  
  -f FLANK              flanking bp to paint [default: 250]
  --max-reads MAXREADS  max number of reads to pixelate [default: 10]
  --max-mapq MAXMAPQ    max MAPQ [Maximum in subsample]
  --min-sr MINSR        min number of supporting reads [default: 0]
  -s SCALE              scaling multiplier, adjusts the image size [default: 5]
  --hs HSCALE           height scaling multiplier, adjusts scaled height [default: 5]
  --ws WSCALE           width scaling multiplier, adjusts scaled width [default: 5]

Options:  
  -V --verbose          verbose mode
  -P                    display a progress bar
  -o OUT                output prefix [default: svangogh]

Output

svangogh creates three output files for each SV:

data file containing pixels with RGB values
PNG image of the SV, scaled to 800x300
PNG image of the SV, unscaled

The prefix of the output file defined by the <-o output> argument

Methodology

🚧⚠️ WIP ⚠️🚧

Subsampling Reads

To make images consistent for learning, the default maximum number of reads (rows) to pixelate is 10.

Reads are then ordered, first printing out supporting reads. A more detailed list of ordering is found below:

Deletions, Duplications

Reads with alignments on the same strand. Minimum sum of the difference between both clipped positions to the median clipped positions and MAPQ (maximum 60).

the read has two clipped positions

Alignments on the same strand. Minimum sum of the difference between a clipped position to the median and MAPQ.

the read has one clipped position

Alignments on the same strand. Minimum sum of he MAPQ.

the read has no clipped positions

Alignments on different strands. Minimum sum in 1.

the read has alignments on different strands, which for deletions and duplications are more likely to be erroreous than valid. Two clipped positions.

Similar to 2. but with alignments on different strands.
Similar to 3. but with alignments on different strands.

Inversions

Alignments on different strands. Minimum sum of the differences between both clipped positions to the median and MAPQ (maximum 60).

For inversions, at least one alignment should map to an opposite strand. Two clipped positions

Alignments on different strands. Minimum sum of one clipped positions and MAPQ.
Alignments on the same strand. Minimum sum of the difference between a clipped position to the median and MAPQ

Sometimes the inverted sequence portion of the read is unmapped, but has an informative clipped alignment

Alignments on the same strand. Munimum sum of the MAPQ.

No clips, properly aligned reads

Contact

This is an alpha release of svangogh. Use at your own risk:exclamation:

Author: Danny Antaki dantaki@ucsd.edu

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
SVanGogh.egg-info		SVanGogh.egg-info
dist		dist
svangogh		svangogh
LICENSE		LICENSE
MANIFEST		MANIFEST
MANIFEST.in		MANIFEST.in
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SVanGogh.egg-info

SVanGogh.egg-info

dist

dist

svangogh

svangogh

LICENSE

LICENSE

MANIFEST

MANIFEST

MANIFEST.in

MANIFEST.in

README.md

README.md

setup.py

setup.py

Repository files navigation

SVanGogh

Install

Requires:

Usage

Output

Methodology

Subsampling Reads

Deletions, Duplications

Inversions

Contact

About

Releases

Packages

Languages

License

dantaki/SVanGogh

Folders and files

Latest commit

History

Repository files navigation

SVanGogh

Install

Requires:

Usage

Output

Methodology

Subsampling Reads

Deletions, Duplications

Inversions

Contact

About

Resources

License

Stars

Watchers

Forks

Languages