Output File Descriptions
Pilon produces a set of output files named pilon.* by default. If the user specified an --output <prefix> argument, the output files will be named <prefix>.*. If the --outdir <dir> option is used, all the output files will be placed in the specified directory.
Pilon will normally output a FASTA file (pilon.fasta by default) which contains a version of the assembly in which errors are fixed as specified by the --fix option. Pilon renames the sequence headers by appending _pilon to each FASTA element name.
If the --iupac argument is given, Pilon will use IUPAC nucleotide codes in the output FASTA file to represent ambiguous bases and/or heterozygous SNPs.
If run with the --changes argument, Pilon produces a file (pilon.changes by default) containing a space-delimited record of every change made in the assembly as instructed by the --fix option. The format for the file is as follows:
<Original Scaffold Coordinate> <New Scaffold Coordinate> <Original Sequence> <New Sequence>
These headers are further described in the table below:
Header | Description | Example |
---|---|---|
Original Scaffold Coordinate | The coordinate of the sequence in the original FASTA file that was flagged by Pilon for a change | scaffold00003:690754 |
New Scaffold Coordinate | The coordinate of the changed sequence in the Pilon FASTA file | scaffold00003_pilon:690809-690853 |
Original Sequence | The sequence in the original FASTA file that was flagged by Pilon for a change | . |
New Sequence | The sequence in the Pilon FASTA file that was inserted or deleted by Pilon to fix the assembly | GGCCAGTCCACAACAAGGCAAACATACCAACGCCCACGGCTATCT |
A period (.) will be used to represent blank fields for original or new sequence. Using the examples in the table above, the Pilon change file record would appear as follows:
scaffold00003:690754 scaffold00003_pilon:690809-690853 . GGCCAGTCCACAACAAGGCAAACATACCAACGCCCACGGCTATCT
In this case, Pilon did not remove any sequence at position 690754 in scaffold00003 of the original assembly, but inserted 45 bases, which appear in the Pilon FASTA file at coordinates 690809-690853 in scaffold00003_pilon.
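The record layout described above can be read with a few lines of Python. This is a minimal sketch, not part of Pilon: the `Change` type and the `parse_coord` and `parse_change` helpers are hypothetical names. It assumes that each record has exactly four whitespace-separated fields and that a coordinate is either `name:pos` or `name:start-end`, as in the example.

```python
from typing import NamedTuple, Tuple


class Change(NamedTuple):
    orig_name: str
    orig_span: Tuple[int, int]   # (start, end), inclusive
    new_name: str
    new_span: Tuple[int, int]    # (start, end), inclusive
    orig_seq: str                # "" when the file shows "."
    new_seq: str                 # "" when the file shows "."


def parse_coord(field: str) -> Tuple[str, Tuple[int, int]]:
    """Split 'name:pos' or 'name:start-end' into (name, (start, end))."""
    # rpartition keeps any ':' inside the sequence name intact.
    name, _, span = field.rpartition(":")
    start, _, end = span.partition("-")
    return name, (int(start), int(end or start))


def parse_change(line: str) -> Change:
    """Parse one record of a .changes file (assumed four fields)."""
    orig, new, orig_seq, new_seq = line.split()
    orig_name, orig_span = parse_coord(orig)
    new_name, new_span = parse_coord(new)
    return Change(
        orig_name, orig_span, new_name, new_span,
        "" if orig_seq == "." else orig_seq,
        "" if new_seq == "." else new_seq,
    )


record = parse_change(
    "scaffold00003:690754 scaffold00003_pilon:690809-690853 . "
    "GGCCAGTCCACAACAAGGCAAACATACCAACGCCCACGGCTATCT"
)
# Net number of bases added by this change.
net_gain = len(record.new_seq) - len(record.orig_seq)
```

For the example record, `record.orig_seq` is empty and `net_gain` equals the length of the new coordinate span, which is one way to sanity-check a parsed file.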
If the --vcf option is specified, Pilon variant output is stored in a file named pilon.vcf by default (if --output <prefix> is specified, the file will be named <prefix>.vcf).
Calls are classified by a small number of VCF FILTER tags:
Filter Tag | Description |
---|---|
PASS | A passing call, either reference confirmation or difference |
Amb | Ambiguous; significant evidence for more than one allele at this position. Meant for haploid genomes, this filter tag is suppressed by the --diploid argument, as it will result in a heterozygous call for a diploid genome |
LowCov | Valid read coverage less than the threshold controlled by the --mindepth argument |
Del | Provides pileup information for loci which were removed by a variation in another line; this gives a sense of the alignment evidence at that locus had the larger variation not been called |
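To get a quick overview of how calls are distributed across these filter tags, the FILTER column can be tallied. The sketch below is illustrative only: `count_filters` is a hypothetical helper, and the sample records are invented for the example rather than taken from real Pilon output. It relies on the standard VCF layout, in which fields are tab-delimited and FILTER is the seventh column.

```python
from collections import Counter
from typing import Iterable


def count_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count FILTER tags over the data lines of a VCF file."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th tab-delimited column; several tags may be
        # joined with ';' according to the VCF specification.
        for tag in fields[6].split(";"):
            counts[tag] += 1
    return counts


# Made-up records, for illustration only.
sample = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10\t.\tA\t.\t50\tPASS\tDP=38",
    "chr1\t11\t.\tC\tT\t12\tAmb\tDP=40",
    "chr1\t12\t.\tG\t.\t3\tLowCov\tDP=2",
    "chr1\t13\t.\tT\t.\t60\tPASS\tDP=41",
]
filter_counts = count_filters(sample)
```

With a real file, the same function can be applied to an open file handle, for example `count_filters(open("pilon.vcf"))`.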
Pilon includes many computed values in the VCF INFO field; here is an example along with a description of the values:
DP=38;TD=55;BQ=25;MQ=19;QD=2;BC=0,22,2,14;QP=0,72,1,27;PC=958;IC=0;DC=0;XC=0;AC=1;AF=0.27
Example | Description |
---|---|
DP=38 | Depth of valid reads in pileup (not invalid pair; not soft-clipped) |
TD=55 | Total Depth, including reads excluded from pileups |
BQ=25 | Mean base quality in pileup |
MQ=19 | Mean mapping quality in pileup |
QD=2 | Quality normalized to depth; meant to give a sense of how confident the base call is. |
BC=0,22,2,14 | Base count in pileups (order A,C,G,T) |
QP=0,72,1,27 | Percentage of weighted evidence for each base (order A,C,G,T) |
PC=958 | Physical coverage of valid reads or pairs spanning this locus |
IC=0 | Number of reads in pileup calling an insertion at this locus |
DC=0 | Number of reads in pileup calling a deletion at this locus |
XC=0 | Number of reads in pileup soft-clipped at this locus |
AC=1 | Alternate allele count, as defined in VCF spec (0=reference call, 1=heterozygous/ambiguous, 2=alternate call) |
AF=0.27 | Fraction of evidence in support of alternate allele |
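The INFO string above is a semicolon-separated list of `KEY=value` pairs, some of which hold comma-separated lists. As a rough sketch, it can be turned into a dictionary as follows. The helper names `parse_info` and `_num` are hypothetical, and the conversion of values to numbers is a convenience assumed here, not something Pilon prescribes.

```python
def _num(text):
    """Convert text to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_info(info):
    """Parse a VCF INFO string into a dict of values."""
    values = {}
    for item in info.split(";"):
        key, sep, raw = item.partition("=")
        if not sep:
            values[key] = True                      # bare flag, e.g. IMPRECISE
        elif "," in raw:
            values[key] = [_num(v) for v in raw.split(",")]
        else:
            values[key] = _num(raw)
    return values


info = parse_info(
    "DP=38;TD=55;BQ=25;MQ=19;QD=2;BC=0,22,2,14;QP=0,72,1,27;"
    "PC=958;IC=0;DC=0;XC=0;AC=1;AF=0.27"
)
# BC and QP are ordered A,C,G,T, so they can be labeled by base.
base_counts = dict(zip("ACGT", info["BC"]))
evidence_pct = dict(zip("ACGT", info["QP"]))
```

Labeling `BC` and `QP` by base makes it easier to see, in this example, that most of the evidence supports C, with T as the alternate.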
Potentially larger events resulting from local reassembly of suspicious regions are represented by VCF Structural Variant records. Pilon assigns SVTYPE=INS if the variant contains more bases than the reference region, otherwise SVTYPE=DEL, even if the events are in the form of block substitutions (not pure insertions or deletions). Example:
gi|395136682|gb|CP003248.1| 2133481 . T TGCCGTCACCTCGCAT . PASS SVTYPE=INS;SVLEN=15;END=2133481 GT 1/1
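The SVTYPE rule stated above can be written as a one-line comparison of allele lengths. This is a sketch of the rule as described in the text, not Pilon's actual code, and `classify_event` is a hypothetical name. For the example record, the difference in length between the alternate and reference alleles matches the reported SVLEN of 15.

```python
def classify_event(ref: str, alt: str) -> str:
    """Return 'INS' if the variant has more bases than the reference
    region, otherwise 'DEL' (this includes equal-length substitutions)."""
    return "INS" if len(alt) > len(ref) else "DEL"


ref, alt = "T", "TGCCGTCACCTCGCAT"
sv_type = classify_event(ref, alt)
length_difference = len(alt) - len(ref)
```

Note that under this rule a block substitution that shortens the region, or leaves its length unchanged, is labeled DEL even though it is not a pure deletion.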
If there are unknown (N) bases in the resulting event, the INFO tag IMPRECISE will be included. This can happen when Pilon partially assembles an event, such as a large insertion, but it cannot resolve the complete event by joining the two flanks. This will only happen if the --fix +breaks option is turned on, which is implied by the --variant option.
Pilon will use SVTYPE=DUP records to indicate possible large segmental duplications, meaning there is read evidence for more copies of this region than appear in the input genome. This feature should be considered experimental and advisory, and is not meant to give a definitive call of such events. Example:
gi|395136682|gb|CP003248.1| 3494060 . T <DUP> . PASS SVTYPE=DUP;SVLEN=218228;END=3712288;IMPRECISE GT ./.
For more information on the VCF file format specification, see Variant Call Format version 4.1.
If run with the --tracks argument, Pilon produces .bed and .wig files that may be viewed in genome browsers such as IGV, GenomeView, and other applications that support these formats. The tracks produced by Pilon are as follows:
Track Name | Filename | Description |
---|---|---|
Pilon | pilonPilon.bed | Several classes of issue found by Pilon in a compact format |
Changes | pilonChanges.wig | Changes made by Pilon |
Unconfirmed | pilonUnconfirmed.wig | Non-zero in regions where the input genome was not confirmed |
Copy Number | pilonCopyNumber.wig | The copy number in the genome of the sequence at a given location |
Coverage | pilonCoverage.wig | The sequence coverage at a given position |
Bad Coverage | pilonBadCoverage.wig | The coverage of reads that do not map logically |
Delta Coverage | pilonDeltaCoverage.wig | A measure of local rate-of-change of valid coverage |
Dip Coverage | pilonDipCoverage.wig | A metric designed to identify local dips in coverage, often indicating a contiguity break |
Frag Coverage | pilonFragCoverage.wig | The fragment read coverage at a given position |
Physical Coverage | pilonPhysicalCoverage.wig | The physical coverage at a given position |
GC track | pilonGC.wig | The percent GC of sequence in a 100bp window centered on the given location |
Pct Bad | pilonPctBad.wig | Percentage of invalid reads (usually bad pairing) compared to total depth |
Weighted Qual | pilonWeightedQual.wig | The weighted base quality of reads at a given position |
Weighted MQ | pilonWeightedMq.wig | The weighted mapping quality of reads at a given position |
Clipped Alignments | pilonClippedAlignments.wig | A count of how many soft-clipping events started at this locus |
The SD tracks express a metric as standard deviations from the mean of the metric across a given input fasta element. The values are integers representing 0.1 sigma, i.e., a value of -21 means 2.1 standard deviations below the mean.
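The integer encoding of the SD tracks can be decoded by dividing by ten. The sketch below shows that conversion and a simple way to flag positions that deviate strongly from the mean. Both `to_sigma` and `flag_outliers` are hypothetical helpers, and the 2.0 sigma default threshold is an arbitrary choice for illustration.

```python
from typing import List, Sequence


def to_sigma(track_value: int) -> float:
    """Convert an SD track value (units of 0.1 sigma) to standard deviations."""
    return track_value / 10.0


def flag_outliers(track_values: Sequence[int], min_sigma: float = 2.0) -> List[int]:
    """Return the indices whose value is at least min_sigma away from the mean."""
    return [
        i for i, value in enumerate(track_values)
        if abs(to_sigma(value)) >= min_sigma
    ]


# A value of -21 means 2.1 standard deviations below the mean.
example_sigma = to_sigma(-21)
outlier_indices = flag_outliers([0, -21, 5, 30])
```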
Many of these tracks were primarily of use in developing Pilon's heuristics for making calls and identifying regions of possible misassembly; they may be removed in a future release.