pauvre custommargin -i custom.tsv --ycol length --xcol qual # Custom tsv input
Pauvre is a plotting package originally designed to help QC the length and
quality distribution of Oxford Nanopore or PacBio reads. The main outputs
are marginplots. Pauvre now also hosts additional data-plotting scripts.
This package currently hosts five scripts for plotting and/or printing stats.
pauvre marginplot
- takes a fastq file as input and outputs a marginal histogram with a heatmap.
pauvre custommargin
- takes a tsv as input and outputs a marginal histogram with custom columns of your choice.
pauvre stats
- Takes a fastq file as input and prints out a table of stats, including how many basepairs/reads there are for a length/mean quality cutoff.
- This is also automagically called when using pauvre marginplot
pauvre redwood
- I am happy to introduce the redwood plot to the world as a method
of representing circular genomes. A redwood plot contains long
reads as "rings" on the inside, a gene annotation
"cambium/phloem", and an RNAseq "bark". The input is .bam
files for the long reads and RNAseq data, and a .gff
file for the annotation. More details to follow as we document this program better...
pauvre synteny
- Makes a synteny plot of circular genomes. Finds the most
parsimonious rotation to display the synteny of all the input
genomes with the fewest crossings-over. Input is one .gff
file per circular genome and one directory of gene alignments.
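The idea of picking a rotation that minimizes crossings can be illustrated for two genomes. This is a minimal brute-force sketch, not pauvre's implementation: the function names and the example gene orders are hypothetical, and it assumes each genome is already reduced to an ordered list of shared gene names.

```python
def count_crossings(top, bottom):
    """Count crossing lines when the same genes in two linear layouts are connected."""
    position = {gene: i for i, gene in enumerate(bottom)}
    mapped = [position[gene] for gene in top if gene in position]
    # Two connecting lines cross when their order differs between the layouts,
    # so the crossing count equals the number of inversions.
    return sum(
        1
        for i in range(len(mapped))
        for j in range(i + 1, len(mapped))
        if mapped[i] > mapped[j]
    )


def best_rotation(top, bottom):
    """Try every rotation of the circular order `bottom`; return (offset, rotated order)."""
    best = min(
        range(len(bottom)),
        key=lambda k: count_crossings(top, bottom[k:] + bottom[:k]),
    )
    return best, bottom[best:] + bottom[:best]


# Hypothetical gene orders for two circular genomes.
genome_a = ["cox1", "cox2", "atp6", "nad1"]
genome_b = ["atp6", "nad1", "cox1", "cox2"]
offset, rotated_b = best_rotation(genome_a, genome_b)
print(offset, rotated_b, count_crossings(genome_a, rotated_b))
```

The brute-force search is quadratic per rotation, which is fine for small organellar genomes; it does not handle strand, reverse orientation, or more than two genomes.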
- You must have the following installed on your system to install this software:
- python 3.x
- matplotlib
- biopython
- pandas
- pillow
- To install from source on a Mac or Linux system (not sure about
Windows!), make sure Python 3 is the active environment, then run:
git clone https://github.com/conchoecia/pauvre.git
cd ./pauvre
pip3 install .
- Or, install with pip
pip3 install pauvre
- Generate basic statistics about the fastq file. For example, if I
want to know the number of bases and reads with AT LEAST a PHRED
score of 5 and AT LEAST a read length of 500, run the program as below
and look at the cells highlighted with <braces>.
pauvre stats --fastq miniDSMN15.fastq
numReads: 1000
numBasepairs: 1029114
meanLen: 1029.114
medianLen: 875.5
minLen: 11
maxLen: 5337
N50: 1278
L50: 296
Basepairs >= bin by mean PHRED and length
minLen         Q0        Q5       Q10       Q15     Q17.5       Q20     Q21.5       Q25     Q25.5       Q30
0         1029114   1010681    935366    429279    143948     25139      3668      2938      2000         0
500        984212  <968653>    904787    421307    142003     24417      3668      2938      2000         0
1000       659842    649319    616788    300948    103122     17251      2000      2000      2000         0
et cetera...
Number of reads >= bin by mean Phred+Len
minLen         Q0        Q5       Q10       Q15     Q17.5       Q20     Q21.5       Q25     Q25.5       Q30
0            1000       969       865       366       118        22         3         2         1         0
500           873     <859>       789       347       113        20         3         2         1         0
1000          424       418       396       187        62        11         1         1         1         0
et cetera...
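To make the N50/L50 values and the two cumulative tables concrete, here is a minimal sketch of how such numbers can be computed. It is not pauvre's code: the function names and the toy reads are hypothetical, and it assumes you already have a length and a mean quality for each read.

```python
def n50_l50(lengths):
    """Return (N50, L50): the read length at which the running total of
    lengths, longest first, reaches half of all bases, and how many reads
    it took to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0


def cumulative_table(reads, len_bins, qual_bins):
    """reads is a list of (length, mean_qual) tuples.
    Returns {(min_len, min_qual): (num_reads, num_basepairs)} counting reads
    with length >= min_len AND mean quality >= min_qual."""
    table = {}
    for min_len in len_bins:
        for min_qual in qual_bins:
            kept = [length for length, qual in reads
                    if length >= min_len and qual >= min_qual]
            table[(min_len, min_qual)] = (len(kept), sum(kept))
    return table


# Toy data (hypothetical): (read length, mean PHRED quality)
reads = [(1200, 9.5), (800, 4.2), (300, 12.0), (2500, 6.1)]
n50, l50 = n50_l50([length for length, _ in reads])
table = cumulative_table(reads, len_bins=[0, 500], qual_bins=[0, 5])
print(n50, l50, table[(500, 5)])
```

Each cell in the tables above corresponds to one `(min_len, min_qual)` key: the basepair table reports the second element of the tuple and the read-count table reports the first.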
- automatically calls pauvre stats for each fastq file
- Make the default plot showing the 99th percentile of longest reads
- Make a marginal histogram for ONT 2D or 1D^2 cDNA data with a lower maxlen and higher maxqual.
- Filter out reads with a mean quality less than 5, and a length less than 800. Zoom in to plot only mean quality of at least 4 and read length at least 500bp.
- Plot ONT 1D data with a large tail
pauvre marginplot --maxlen 100000 --maxqual 15 --lengthbin 500 <myfile>.fastq
- Get more resolution on lengths
pauvre marginplot --maxlen 100000 --lengthbin 5 <myfile>.fastq
- Turn off transparency if you just want a white background
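
The quality and length filtering mentioned in the examples above can be pictured with a small standalone sketch. It is not how pauvre is implemented: the helper names are hypothetical, it assumes a simple four-line-per-record fastq, a plain arithmetic mean of PHRED scores with the common ASCII offset of 33, and that a read is kept only when it meets both thresholds.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record fastq."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Arithmetic mean of the PHRED scores encoded in a quality string."""
    scores = [ord(char) - offset for char in qual]
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(handle, min_qual, min_len):
    """Keep reads with length >= min_len and mean quality >= min_qual."""
    kept = []
    for name, seq, qual in read_fastq(handle):
        quality = mean_phred(qual)
        if len(seq) >= min_len and quality >= min_qual:
            kept.append((name, len(seq), quality))
    return kept


# Tiny in-memory fastq (hypothetical reads) instead of a real file.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\n###\n")
kept = filter_reads(example, min_qual=5, min_len=4)
print(kept)
```

For a real file, replace the `io.StringIO` object with `open("<myfile>.fastq")`.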
@conchoecia (Darrin Schultz) @mebbert (Mark Ebbert) @wdecoster (Wouter De Coster)