Bio::Sequence.guess issue #14
Comments
Bio::Sequence.guess('ACGTCCGGTGGGGGGCACGTAAGATCTCG\n')
Bio::Sequence.guess('ACGTCCGGTGGGGGGCACGTAGCTGATCG\t')
Bio::Sequence.guess('ACGTCC\n')
Current definition in /lib/bio/sequence.rb
So, perhaps the line "total = str.length - cmp['N'] - cmp['n']"
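To make the effect of that `total` line concrete, here is a minimal standalone sketch of a ratio-based guess. It is not the BioRuby source: `guess_kind`, `guess_kind_ignoring_whitespace`, the symbols `:na`/`:aa`, and the 0.9 threshold are assumptions chosen for illustration. It only shows how whitespace counted in `str.length` can push the base ratio under the threshold.

```ruby
# Hypothetical stand-in for a ratio-based guess; NOT the BioRuby implementation.
# Assumption: a string is called nucleic acid when the fraction of
# A/T/G/C/U characters (ignoring N) exceeds the threshold.
def guess_kind(str, threshold = 0.9)
  bases = str.count('ATGCUatgcu')
  # Whitespace is still part of str.length, so it inflates the denominator.
  total = str.length - str.count('Nn')
  bases.to_f / total > threshold ? :na : :aa
end

# Variant that drops whitespace before computing the ratio.
def guess_kind_ignoring_whitespace(str, threshold = 0.9)
  guess_kind(str.gsub(/\s+/, ''), threshold)
end

guess_kind("ACGT")                        # ratio 4/4
guess_kind("ACGT\n")                      # ratio 4/5, below 0.9
guess_kind_ignoring_whitespace("ACGT\n")  # ratio 4/4 again
```

With these assumptions, a single trailing newline is enough to flip a four-base string, while a longer sequence with one trailing newline stays above the threshold, so the result depends on sequence length.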
I think this is meaningful because users may input a triplet sequence like:
Should sequence.composition actually be counting non-sequence characters? (Some users might also end up with numbers in their sequence because of copy-pasting from a GenBank entry.)
In fact, the sequence should not contain non-sequence characters. In fasta.rb, the input is cleaned up before the sequence object is created, like:
Thus, Bio::Sequence::new() and Bio::Sequence::guess() do not account for such characters. Note that guess() is marked as "In general, used by developers only, but if you know what you are doing, feel free." User input needs cleanup before being passed to Sequence::new().
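As a rough illustration of "cleanup before passing to Sequence::new()", something like the helper below could sit in the calling code. The name `clean_sequence_input`, the `strip_digits` option, and the choice to raise `ArgumentError` on blank input are all hypothetical; none of this is part of BioRuby.

```ruby
# Hypothetical helper (not part of BioRuby): normalise raw user input
# before handing it to Bio::Sequence.new or Bio::Sequence.guess.
def clean_sequence_input(raw, strip_digits: false)
  # Always drop whitespace; drop digits only when the caller asks for it,
  # since numbers inside a sequence may be intentional.
  pattern = strip_digits ? /[\s\d]+/ : /\s+/
  cleaned = raw.gsub(pattern, '')
  raise ArgumentError, 'no sequence characters in input' if cleaned.empty?
  cleaned
end

clean_sequence_input("ACGT\n")
clean_sequence_input("  1 acgt 61 ttga\n", strip_digits: true)

# With the bio gem loaded, the cleaned string would then be passed on, e.g.:
#   Bio::Sequence.guess(clean_sequence_input(user_text))
```

Keeping the cleanup outside the Sequence class leaves Sequence itself general, and rejecting blank input at this stage avoids asking guess() to classify an empty string.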
Okay - I hadn't seen the "used by developers only" comment. But without sequence cleaning, the same issues would carry over to Sequence::new(), right?
262: def auto |
Sequence.new() just assigns the input to self without any modification. Since Sequence is a general class, I don't think Sequence::clean() can be defined. For numbers within an AA sequence, someone may want to include 1 and 2 to distinguish serine residues encoded by non-neighboring codons. This is perhaps not supported, but the current implementation does not forbid it either. If the user enters a number in a web form, though, it might be better to interpret it as a gi number or some kind of accession number.
ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT" )
=> Bio::Sequence::NA
ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT\n" )
=> Bio::Sequence::AA
Shouldn't whitespace have no effect on sequence determination?
And perhaps Bio::Sequence.guess(" ") should raise an error instead of returning AA?
cheers,
yannick