FASTA parsing error #232
Comments
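Thanks for reporting, I can reproduce the error. One problem is that Selenocysteine (
import biotite.sequence as seq
import biotite.sequence.io.fasta as fasta

# Work-around: replace selenocysteine (U) with cysteine (C)
# before building each ProteinSequence
fasta_file = fasta.FastaFile.read('sequence.fasta')
seq_dict = {
    header: seq.ProteinSequence(seq_str.replace("U", "C"))
    for header, seq_str in fasta_file.items()
}
sequences = list(seq_dict.values())
print(type(sequences[0]))
print(sequences[0])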
Thanks for the work-around, I'll give it a try. Regarding Selenocysteine, I find the error message confusing: the first message points to F when in fact U is the culprit, and the second does not say whether biotite tried to parse the sequence as a nucleic acid or as a protein sequence. Perhaps the first could be improved by mentioning the missing Selenocysteine support, and the second by having individual error messages for nucleotide and protein?
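To make the suggestion concrete, here is a minimal, self-contained sketch of what per-alphabet error messages could look like. It is not biotite's code: the function names, both symbol sets, and the "try nucleotide first, then protein" order are assumptions made for illustration only. Under those assumptions it would also explain why `FU` fails while `MU` does not, since `M` is an IUPAC ambiguity code for nucleotides and `F` is not.

```python
# Hypothetical symbol sets for illustration; not copied from biotite.
NUCLEOTIDE_SYMBOLS = set("ACGTRYSWKMBDHVN")        # IUPAC codes incl. ambiguity
PROTEIN_SYMBOLS = set("ACDEFGHIKLMNPQRSTVWYBZX*")  # 20 standard residues + B, Z, X, *


def first_invalid(sequence, alphabet):
    """Return the first symbol of `sequence` that is not in `alphabet`, or None."""
    for symbol in sequence:
        if symbol not in alphabet:
            return symbol
    return None


def guess_sequence_type(raw):
    """Try the nucleotide alphabet first, then the protein alphabet.

    If both fail, raise a ValueError that names the offending symbol
    separately for each alphabet.
    """
    text = raw.upper()
    # Assumption: uracil (U) is treated as thymine (T) for the nucleotide check.
    bad_nucleotide = first_invalid(text.replace("U", "T"), NUCLEOTIDE_SYMBOLS)
    if bad_nucleotide is None:
        return "nucleotide"

    bad_protein = first_invalid(text, PROTEIN_SYMBOLS)
    if bad_protein is None:
        return "protein"

    hint = ""
    if bad_protein == "U":
        hint = " (selenocysteine is not in the protein alphabet)"
    raise ValueError(
        f"Not a nucleotide sequence: symbol {bad_nucleotide!r} is not allowed. "
        f"Not a protein sequence: symbol {bad_protein!r} is not allowed{hint}."
    )
```

With these assumed alphabets, `guess_sequence_type("MU")` is classified as a nucleotide sequence, while `guess_sequence_type("FU")` raises an error that reports `F` for the nucleotide attempt and `U` for the protein attempt.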
The problem is that the first error message is raised by the
I decided on the solution that selenocysteine is automatically converted into cysteine, when
While parsing a larger FASTA file, I hit an error for a specific protein sequence (see below).
I could reproduce this behaviour with smaller sequences down to FU (not, however, MU). It seems some sequences containing the character U may be recognized as NucleotideSequence instead of ProteinSequence.
Steps to reproduce
Error