GenBankWriter does not use information in the original header #942
Comments
Hi, this is the spec from the above link, so I guess the correct behaviour is a little undefined.
Looking at the GenbankParser, it ignores fields 4 and 5 of the input document's LOCUS row and instead generates them from other information gleaned from the sequence, so it does lose the original MoleculeType and Division information. There is only one field in which the modification date can be stored. You could argue that you've modified the GenBank record, so the modificationDate should reflect that? I suppose a solution might be
Do I understand correctly that #1042 fully fixes this issue?
Yes, it fixes the issue of being able to maintain the locus line details. I need to open another ticket about the other features of a GenBank file header that are not correctly reproduced when using the GenbankWriter, which I discovered while doing this work.
Hello,
I have recently been working on adding an accession ID to a few hundred GenBank files using biojava.
I have been using the following approach:
This works for inserting the new accession ID; however, information is lost in the locus line, because a new locus line is created using default settings rather than using information from the original file.
For example if I update a GenBank file that contains the following original locus line:
LOCUS test_locus_name 9291 BP DS-DNA CIRCULAR SYN 13-JUL-1994
The GenBank file that gets written by the writeNucleotideSequence() method will look like this:
LOCUS new_accession 9291 bp DNA circular 12-Jul-2021
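To make the difference between the two LOCUS lines above explicit, here is a minimal sketch that splits each line on whitespace and reports the fields that changed or went missing. `LocusLineSketch` and its field labels are hypothetical helpers written for this comparison, not part of BioJava, and the whitespace-based tokenizing is an assumption that ignores the fixed-column layout of real GenBank files.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of BioJava: labels the whitespace-separated
// tokens of a LOCUS line so two lines can be compared field by field.
final class LocusLineSketch {

    static Map<String, String> parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 6 || !t[0].equals("LOCUS")) {
            throw new IllegalArgumentException("Not a LOCUS line: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", t[1]);
        fields.put("length", t[2]);
        fields.put("unit", t[3]);
        fields.put("moleculeType", t[4]);
        fields.put("date", t[t.length - 1]);
        // Tokens between the molecule type and the date are assumed to be
        // an optional topology and an optional division code.
        for (int i = 5; i < t.length - 1; i++) {
            String lower = t[i].toLowerCase(Locale.ROOT);
            if (lower.equals("circular") || lower.equals("linear")) {
                fields.put("topology", t[i]);
            } else {
                fields.put("division", t[i]);
            }
        }
        return fields;
    }

    // Returns the original fields that differ (ignoring case) or are absent
    // in the written line, formatted as "old -> new".
    static Map<String, String> lostOrChanged(String original, String written) {
        Map<String, String> before = parse(original);
        Map<String, String> after = parse(written);
        Map<String, String> diff = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            String other = after.get(e.getKey());
            if (other == null || !other.equalsIgnoreCase(e.getValue())) {
                diff.put(e.getKey(),
                        e.getValue() + " -> " + (other == null ? "(missing)" : other));
            }
        }
        return diff;
    }
}
```

Calling `LocusLineSketch.lostOrChanged(...)` with the original and written lines from this report flags the name, molecule type, date, and division fields; length, unit, and topology compare equal once case is ignored.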
We therefore lose the following:
I would argue that there should be another way to write a GenBank file from a DNASequence that can use the original header, so that no information is lost through the process of reading and writing the same file.
I would be interested to know what you think about this.
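As a rough illustration of what "using the original header" could mean, the sketch below keeps every token of the original LOCUS line and swaps only the locus name. `LocusRenameSketch.withName` is a hypothetical helper, not a BioJava API, and it assumes whitespace-separated fields; it joins the tokens with single spaces and does not attempt to reproduce GenBank column alignment.

```java
// Hypothetical helper, not part of BioJava: reuses the original LOCUS line
// and replaces only the locus name, leaving all other fields untouched.
final class LocusRenameSketch {

    static String withName(String originalLocusLine, String newName) {
        String[] tokens = originalLocusLine.trim().split("\\s+");
        if (tokens.length < 2 || !tokens[0].equals("LOCUS")) {
            throw new IllegalArgumentException("Not a LOCUS line: " + originalLocusLine);
        }
        tokens[1] = newName; // only the name changes
        return String.join(" ", tokens);
    }
}
```

With this approach the strandedness, division, and original date would survive the round trip, at the cost of the writer trusting whatever the input header contained.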
Many thanks,
James