Selenocysteine strikes again: AlphabetError in BlastWebApp #344

Closed
dnlbauer opened this issue Aug 4, 2021 · 1 comment · Fixed by #348

Comments

@dnlbauer
Collaborator

dnlbauer commented Aug 4, 2021

One of my students encountered this, and I was able to reproduce it with biotite version 0.29.0 as well as with a fresh checkout from HEAD.

When running a BLAST search against swissprot with BlastWebApp, the query was a valid amino acid sequence, but BlastWebApp raised an AlphabetError while parsing the results.

Example code to reproduce the error:

from biotite.sequence.io import fasta
from biotite.application import blast
import biotite.sequence as seq

# Read the query sequence from a local FASTA file
file_name = "sequence.txt"
fasta_file = fasta.FastaFile.read(file_name)
ref_seq = fasta.get_sequence(fasta_file)
print(f"U in sequence? {'U' in ref_seq}") # > False

# Find homologous proteins using NCBI BLAST
# Search only the UniProt/SwissProt database
ref_seq = seq.ProteinSequence(ref_seq)
blast_app = blast.BlastWebApp("blastp", ref_seq, "swissprot")
blast_app.start()
blast_app.join() # > AlphabetError

Tracelog

Traceback (most recent call last):
  File "biotite_bug.py", line 17, in <module>
    blast_app.join() 
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/application.py", line 58, in wrapper
    return func(*args, **kwargs)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/application.py", line 153, in join
    self.evaluate()
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/application/blast/webapp.py", line 346, in evaluate
    seq2 = ProteinSequence(seq2_str.replace("-", ""))
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/seqtypes.py", line 473, in __init__
    super().__init__(sequence)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/sequence.py", line 147, in __init__
    self.symbols = sequence
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/sequence.py", line 183, in symbols
    self._seq_code = alph.encode_multiple(value, dtype)
  File "/home/bauer/conda/lib/python3.7/site-packages/biotite-0.29.0-py3.7-linux-x86_64.egg/biotite/sequence/alphabet.py", line 390, in encode_multiple
    return encode_chars(alphabet=self._symbols, symbols=symbols)
  File "src/biotite/sequence/codec.pyx", line 74, in biotite.sequence.codec.encode_chars
biotite.sequence.AlphabetError: Symbol 'U' is not in the alphabet

It seems that one of the sequences returned from swissprot contains a U (I confirmed this with some debug print statements), and BlastWebApp does not handle this properly yet. Maybe we can move the fix from #246 from the fasta class to ProteinSequence?

Here is the query sequence to reproduce this: sequence.txt.
Also linking #232 here, which was a similar issue.
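
A check like the debug prints mentioned above could be sketched roughly as follows. It is plain Python and does not touch biotite. The helper name `unsupported_symbols` is hypothetical, and the symbol set is an assumption about what the protein alphabet accepts, so it should be compared against the alphabet of the installed version.

```python
# Assumption: the symbols accepted for protein sequences. Adjust this to the
# alphabet that your biotite version actually reports.
ASSUMED_PROTEIN_SYMBOLS = frozenset("ACDEFGHIKLMNPQRSTVWYBZX*")


def unsupported_symbols(seq_str, allowed=ASSUMED_PROTEIN_SYMBOLS):
    """Return the sorted distinct symbols of seq_str that are not in allowed.

    Hypothetical helper for debugging, not part of biotite.
    """
    return sorted(set(seq_str) - allowed)


print(unsupported_symbols("MKTAYIAK"))   # prints []
print(unsupported_symbols("MKUAY-IAK"))  # prints ['-', 'U']
```

Running this on each raw hit string before it is turned into a sequence object would show which hit carries the offending symbol.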

@padix-key
Member

Thank you for the report. Yes, it is indeed related to the same issue. The lines causing this problem are

seq1 = ProteinSequence(seq1_str.replace("-", ""))
seq2 = ProteinSequence(seq2_str.replace("-", ""))

because the sequence string is passed unchecked to the ProteinSequence constructor, so U is not converted into C.
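
One possible shape for the missing conversion is sketched below, purely as an illustration: the helper name `prepare_hit_string` is hypothetical, and the change that was actually merged may look different. The sketch only assumes what is stated above, namely that gaps are stripped and that U should become C before the string reaches ProteinSequence.

```python
def prepare_hit_string(aligned_str):
    """Strip alignment gaps and map selenocysteine (U) to cysteine (C).

    Hypothetical helper, not part of biotite. It works on plain strings only.
    """
    ungapped = aligned_str.replace("-", "")
    # Assumption taken from the discussion: U is replaced by C so that the
    # protein alphabet accepts the string.
    return ungapped.replace("U", "C")


print(prepare_hit_string("MK-UA--G"))  # prints MKCAG
```

In the two lines quoted above, the result of such a helper would be passed to ProteinSequence in place of the bare `replace("-", "")` string.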

padix-key added a commit to padix-key/biotite that referenced this issue Aug 26, 2021
@padix-key padix-key mentioned this issue Aug 26, 2021
padix-key added a commit that referenced this issue Aug 27, 2021