fix issue #615 #616
biopython#615 ensure that the DEFINITION field ends with a period, as required by the GenBank format specification.
The unit tests failed (see Travis CI) because we can no longer round-trip an arbitrary description. Should the GenBank parser remove any trailing period, or not? If the parser should stay as it is (probably best), then |
The parser does not modify the definition field.
The problem is in the method which raises the error. I don't know which strategy to use to fix the tests.
Line 80:
    # Just insist on at least one word in common:
    if (old.description or new.description) \
            and not set(old.description.split()).intersection(new.description.split()):
        raise ValueError("%s versus %s"
                         % (repr(old.description), repr(new.description)))
|
Assuming we don't change the parser then the simplest fix in
with:
Do you think changing the parser to remove the trailing |
BioPerl does not care about the period at the end of the definition, but EMBOSS does. Input in FASTA format
After conversion to GenBank
|
And continuing the example, it would appear that (the online version of) EMBOSS will take that GenBank file with the period/dot and convert it to a FASTA file with the period/dot:
Based on this the round trip FASTA -> GenBank -> FASTA with EMBOSS adds a trailing dot/period to the description if not already there. |
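The effect described above can be mimicked in a few lines. This is only a sketch of the observed behaviour (append a period unless one is already there), not EMBOSS's actual code; `ensure_trailing_period` is a hypothetical helper and the sample descriptions are made up.

```python
def ensure_trailing_period(description: str) -> str:
    """Append a period unless the text already ends with one (hypothetical helper)."""
    if description.endswith("."):
        return description
    return description + "."


# Made-up descriptions, for illustration only.
without_dot = "example sequence"
with_dot = "example sequence."

# Both inputs produce the same text, so after FASTA -> GenBank -> FASTA
# it is no longer possible to tell which one we started from.
print(ensure_trailing_period(without_dot))
print(ensure_trailing_period(with_dot))
```

Because the two inputs collapse to the same output, a round trip under this rule cannot restore a description that originally had no trailing period.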
We can consider the ending period as an element belonging to the GenBank syntax. In this case
genbank
fasta again
But in this case, if the FASTA comment ends with a period, we will obtain fasta
genbank
fasta again
|
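A minimal sketch of the "period belongs to the syntax" idea, assuming the writer always appends exactly one period and the parser removes exactly one. The names `write_definition` and `parse_definition` are hypothetical stand-ins, not the actual Biopython scanner or writer methods, and the descriptions are made up.

```python
def write_definition(description: str) -> str:
    """Writer side: the period is syntax, so always append exactly one."""
    return description + "."


def parse_definition(definition: str) -> str:
    """Parser side: drop exactly one trailing period, if present."""
    if definition.endswith("."):
        return definition[:-1]
    return definition


# Made-up descriptions with zero, one and three trailing periods.
for original in ["example sequence", "example sequence.", "example sequence..."]:
    on_disk = write_definition(original)
    restored = parse_definition(on_disk)
    # The description survives the round trip whatever it ended with.
    assert restored == original
    print(repr(original), "->", repr(on_disk), "->", repr(restored))
```

The side effect is that a description already ending with a period is written with two, and one ending with three is written with four.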
Whatever we do will have side effects. Here's another example FASTA input file to consider, using multiple trailing
|
With my last "algorithm" it will produce
with four periods at the end, and
the original when it is converted back to FASTA.
OK, let's try that then - can you commit that change, and retest locally (or push the update to your branch and TravisCI will retest it)? |
Due to issue biopython/biopython#616 the GenBank file in output is not always recognized as GenBank by squizz. Thus integronfinger takes care that the SeqRecord description property ends with a period.
In GenBank format the DEFINITION field must end with a period, see ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt [3.4.5]. So we consider the ending period as belonging to the syntax, not to the data itself. We remove it in the GenBankScanner and add it in the GenBankWriter, see biopython#616
don't trigger travis on master
This reverts commit c899a41.
This reverts commit c899a41.
This reverts commit 121bdb5.
I have patched the GenBank parser and writer as we discussed. The tests pass successfully for the SeqIO module. |
# see ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt [3.4.5]
# and discussion https://github.com/biopython/biopython/pull/616
# So let's add a period
descr += '.'
If the description was the dummy value <unknown description> we don't want to record .. but just .
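One way to read this review comment, as a hedged sketch: treat the dummy value as an empty description before the period is appended, so that only `.` is written. `definition_for_output` and `UNKNOWN_DESCRIPTION` are hypothetical names introduced here for illustration; only the dummy string itself comes from the comment.

```python
# Dummy value quoted in the review comment; the constant name is an assumption.
UNKNOWN_DESCRIPTION = "<unknown description>"


def definition_for_output(description: str) -> str:
    """Return the DEFINITION text to write (hypothetical helper)."""
    if description == UNKNOWN_DESCRIPTION:
        # Do not write the placeholder; the period below is all that remains.
        description = ""
    # The DEFINITION field must end with a period, so add one.
    return description + "."


print(definition_for_output(UNKNOWN_DESCRIPTION))
print(definition_for_output("example sequence"))
```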
Note: the other tests which verify a specific description, and the doctests which verify an exact string output, must be updated manually. |
If we use trim we could remove several periods, like '...', and in the writer we will add only one.
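To illustrate the concern, here is a small comparison, assuming "trim" means something like Python's `str.rstrip(".")`. Both function names are hypothetical and the description is made up.

```python
def strip_all_periods(definition: str) -> str:
    """'Trim' variant: removes every trailing period."""
    return definition.rstrip(".")


def strip_one_period(definition: str) -> str:
    """Removes a single trailing period only."""
    return definition[:-1] if definition.endswith(".") else definition


# A made-up description ending in '...', as written with one extra period added.
original = "example sequence..."
on_disk = original + "."

print(repr(strip_all_periods(on_disk)))  # the trailing '...' is lost
print(repr(strip_one_period(on_disk)))   # the original text is recovered
```

With the trim variant, writing the parsed value again yields a single period, so the description changes across a round trip.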
The DEFINITION field in GenBank is mandatory, so the parser considers that this period belongs to the syntax and removes it; the writer adds an ending period before serialisation. See discussion biopython#616. Fix tests according to this new behavior.
Hi peter, |
Thanks @bneron - I'm currently waiting for the TravisCI tests to finish. Assuming there's nothing further to do, I will probably squash this down into a single commit when I apply it to the master. Would you be happy to be named in the |
On 10/29/2015 11:56 AM, Peter Cock wrote:
thank you, I'm already in CONTRIB file ;-)
Bertrand Néron Institut Pasteur |
Note to self: Examine what changed in |
Rebased/squashed version of this work with related changes here: https://github.com/peterjc/biopython/tree/dots |
Two releases later, I just rebased my branch https://github.com/peterjc/biopython/tree/dots again. If the TravisCI run looks clean I think I'll just merge this... |
#615
ensure that the DEFINITION field ends with a period as in GenBank
format specifications.