A Bank represents a file containing sequences. Accepted formats are Fasta and Fastq, either plain text or gzip-compressed.
Bank is the class you use:
- to open a sequence file
- to iterate over its sequences
- to do some work with Sequence objects
This code snippet illustrates the use of the Bank and Sequence APIs:
    # Import the pyGATB Bank class
    from gatb import Bank

    # We will use a file containing some Fasta sequences
    F_NAME = '../thirdparty/gatb-core/gatb-core/test/db/query.fa'

    # We create the bank representation of the Fasta sequence file
    bank = Bank(F_NAME)
    print("File '%s' is of type: %s" % (bank.uri, bank.type))

    nseqs = 0
    # We iterate over the sequences.
    for i, seq in enumerate(bank):
        # 'seq' is of type 'Sequence'.
        # Accessing 'Sequence' internals is done as follows:
        #   sequence header : seq.comment
        #   sequence quality: seq.quality (Fastq only)
        #   sequence letters: seq.sequence
        #   sequence size   : len(seq)
        # Keep only the first token of the header as the sequence ID
        seqid = seq.comment.decode("utf-8").split(" ")[0]
        # Print details for the first five sequences only
        if i < 5:
            print('%d: %s: %d letters' % (i, seqid, len(seq)))
        nseqs += 1

    print('#sequences: %d' % nseqs)
(This code is taken from here).
This Python 3 pyGATB program produces the following output:
    File '../thirdparty/gatb-core/gatb-core/test/db/query.fa' is of type: fasta
    0: ENSTTRP00000007204: 585 letters
    1: ENSTTRP00000007206: 232 letters
    2: ENSTTRP00000007207: 435 letters
    3: ENSTTRP00000007208: 529 letters
    4: ENSTTRP00000000008: 529 letters
    #sequences: 71
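
Each line of this output shows a bare identifier rather than the full header. The following is a minimal sketch of that header-handling step on its own, assuming `seq.comment` is a bytes object (as the `decode` call in the snippet suggests). The header value is hypothetical: its first token is copied from the output above, and the description after the space is invented for illustration.

```python
# Hypothetical header; only the first token comes from the output above.
comment = b"ENSTTRP00000007204 pep:example description"

# Convert bytes to str, then split the header on single spaces.
fields = comment.decode("utf-8").split(" ")

# The first token is the sequence identifier.
seqid = fields[0]

print(fields)  # all tokens of the header, as a list
print(seqid)   # the identifier alone
```

Formatting the whole list with `%s` would print it in bracketed list form, so indexing with `[0]` is what yields the bare identifier.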