One of my students encountered this, and I was able to reproduce it with biotite 0.29.0 as well as with a fresh checkout of HEAD:
When running a BLAST search against swissprot with `BlastWebApp` and a valid amino acid query sequence, `BlastWebApp` raises an `AlphabetError` while parsing the results.
Example code to reproduce the error:

```python
from biotite.sequence.io import fasta
from biotite.application import blast
import biotite.sequence as seq

# Read the query sequence from sequence.txt
file_name = "sequence.txt"
fasta_file = fasta.FastaFile.read(file_name)
ref_seq = fasta.get_sequence(fasta_file)
print(f"U in sequence? {'U' in ref_seq}")  # > False

# Find homologous proteins using NCBI BLAST
# Search only the UniProt/SwissProt database
ref_seq = seq.ProteinSequence(ref_seq)
blast_app = blast.BlastWebApp("blastp", ref_seq, "swissprot")
blast_app.start()
blast_app.join()  # > AlphabetError
```
Traceback:

```
Traceback (most recent call last):
  File "biotite_bug.py", line 17, in <module>
    blast_app.join()
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/application.py", line 58, in wrapper
    return func(*args, **kwargs)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/application.py", line 153, in join
    self.evaluate()
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/blast/webapp.py", line 346, in evaluate
    seq2 = ProteinSequence(seq2_str.replace("-", ""))
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/seqtypes.py", line 473, in __init__
    super().__init__(sequence)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/sequence.py", line 147, in __init__
    self.symbols = sequence
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/sequence.py", line 183, in symbols
    self._seq_code = alph.encode_multiple(value, dtype)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/alphabet.py", line 390, in encode_multiple
    return encode_chars(alphabet=self._symbols, symbols=symbols)
  File "src/biotite/sequence/codec.pyx", line 74, in biotite.sequence.codec.encode_chars
biotite.sequence.AlphabetError: Symbol 'U' is not in the alphabet
```
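To check a hit's aligned sequence string for the offending symbol, the failing step from `webapp.py` (removing gaps, then encoding each symbol against the protein alphabet) can be mimicked in plain Python. This is only a sketch: `find_invalid_symbols` is a hypothetical helper, and `PROTEIN_SYMBOLS` is an assumption meant to mirror the symbols `ProteinSequence` accepts in 0.29.0.

```python
# Sketch: mimic the failing step from the traceback (webapp.py line 346),
# where a hit's aligned string has its gaps removed and each remaining
# symbol is then looked up in the protein alphabet.

# Assumption: the symbols ProteinSequence accepted in 0.29.0
# (20 standard amino acids plus B, Z, X and the stop symbol '*').
PROTEIN_SYMBOLS = frozenset("ACDEFGHIKLMNPQRSTVWYBZX*")


def find_invalid_symbols(aligned, alphabet=PROTEIN_SYMBOLS):
    """Return (position, symbol) pairs that would make encoding fail.

    Positions refer to the gap-free string, i.e. after '-' is removed,
    which is what gets passed to the ProteinSequence constructor.
    """
    ungapped = aligned.replace("-", "")
    return [(i, s) for i, s in enumerate(ungapped) if s not in alphabet]


# Hypothetical hit string; a single 'U' (selenocysteine) breaks parsing
hit = "MKT-AYIAKQRU--GCK"
print(find_invalid_symbols(hit))  # [(10, 'U')]
```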
It seems that one of the hit sequences returned from swissprot contains a `U` (selenocysteine), which I verified with some debug print statements, and `BlastWebApp` does not handle this yet. Maybe we can move the fix from #246 from the FASTA class into `ProteinSequence`?
Here is the query sequence to reproduce this: sequence.txt.
Also linking #232 here, which was a similar issue.
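A minimal sketch of what normalizing non-standard codes before `ProteinSequence` construction could look like. The helper name `normalize_protein_string` is hypothetical, and the mapping (`U` to `C`, `O` to `K`, i.e. selenocysteine and pyrrolysine to their closest standard analogs) is an assumption; it is not confirmed that #246 uses this exact mapping, and replacing both with `X` would be an equally plausible choice.

```python
# Sketch of one possible fix: normalize non-standard one-letter codes
# before the string reaches the ProteinSequence constructor.
# Assumption: U -> C and O -> K; the actual fix may use a different mapping.

# 'U' = selenocysteine, 'O' = pyrrolysine (IUPAC one-letter codes)
NONSTANDARD_TO_STANDARD = {"U": "C", "O": "K"}


def normalize_protein_string(sequence, mapping=NONSTANDARD_TO_STANDARD):
    """Replace non-standard codes by their closest standard analog."""
    table = str.maketrans(mapping)
    return sequence.translate(table)


# The failing call in BlastWebApp.evaluate() could then become roughly:
#   ProteinSequence(normalize_protein_string(seq2_str.replace("-", "")))
print(normalize_protein_string("MKUGCO"))  # MKCGCK
```

Doing the substitution inside `ProteinSequence` itself, rather than only in the FASTA parser, would cover every code path that builds protein sequences from external strings, including BLAST results.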