To output unaligned PacBio reads in FASTA and FASTQ

Vladimir Rainish edited this page May 18, 2016 · 3 revisions

To output unaligned PacBio reads in FASTA format, use the blasr --unaligned option. Example:

$ blasr input.fofn ref.fasta --unaligned unaligned.fasta
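
As a quick sanity check on the file produced above, the following is a minimal sketch that counts the reads and bases in a FASTA file using only the Python standard library. The helper names `read_fasta` and `summarize` are hypothetical and are not part of blasr or pbcore; the file name unaligned.fasta is taken from the example command.

```python
import os


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piecewise.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines):
    """Return (number of records, total number of bases)."""
    count = 0
    total = 0
    for _, seq in read_fasta(lines):
        count += 1
        total += len(seq)
    return count, total


if __name__ == "__main__":
    path = "unaligned.fasta"  # output of the blasr command above
    if os.path.exists(path):
        with open(path) as handle:
            n_reads, n_bases = summarize(handle)
        print("%d unaligned reads, %d bases" % (n_reads, n_bases))
```

If the count is zero, every read aligned or the option was not picked up, and there is nothing to convert to FASTQ.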

blasr can only output unaligned PacBio reads in FASTA format. To get unaligned PacBio reads in FASTQ format, try the following Python script. You will need to install pbcore and have the SMRTCells (input.fofn) available.

#!/usr/bin/env python
from pbcore.io import BasH5Collection, FastaReader, FastqWriter

fofn = "input.fofn" # input fofn of SMRTCells
unaligned_fa = "unaligned.fasta" # input: unaligned PacBio reads in FASTA format (from blasr)
unaligned_fq = "unaligned.fastq" # output: unaligned PacBio reads in FASTQ format

# Open all bas.h5 files listed in the fofn
h5 = BasH5Collection(fofn)

# Output FASTQ writer
fqwriter = FastqWriter(unaligned_fq)

# Iterate over the unaligned reads
for read in FastaReader(unaligned_fa):
    # Look up the read in the bas.h5 files by its name (the FASTA header)
    subread = h5[read.header]
    # Write the read name, basecalls and quality values to the output FASTQ
    fqwriter.writeRecord(subread.readName, subread.basecalls(), subread.qv("QualityValue"))

# Flush and close the output file
fqwriter.close()
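
The lookup `h5[read.header]` only works if each FASTA header is a PacBio read name. The following is a minimal sketch, assuming the headers in unaligned.fasta follow the conventional `movieName/holeNumber/start_end` subread naming. `parse_subread_name` is a hypothetical helper for checking headers before the lookup; it is not part of pbcore.

```python
from collections import namedtuple

SubreadName = namedtuple("SubreadName", ["movie", "hole_number", "start", "end"])


def parse_subread_name(header):
    """Split a header of the form movie/holeNumber/start_end into its parts.

    Raises ValueError if the header does not match that layout.
    """
    fields = header.split()
    if not fields:
        raise ValueError("empty header")
    # Ignore anything after the first whitespace (assumed to be a comment).
    parts = fields[0].split("/")
    if len(parts) != 3:
        raise ValueError("not a subread name: %r" % header)
    movie, hole, span = parts
    start, end = span.split("_")  # raises ValueError if not exactly two fields
    return SubreadName(movie, int(hole), int(start), int(end))
```

A header that fails to parse here would most likely also fail in the bas.h5 lookup, so running this check first can produce a clearer error message.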