I was looking for a fast way to process large numbers of GenBank entries and found your library. It definitely offers an improvement over Biopython, but I'm wondering why you did not include GBParsy in the speed comparison. It is a parser written in pure C, and likely even faster than gb-io.
Lee TH, Kim YK, Nahm BH. GBParsy: a GenBank flatfile parser library with high speed. BMC Bioinformatics. 2008 Jul 25;9:321. doi: 10.1186/1471-2105-9-321. PMID: 18652706; PMCID: PMC2516526.
I did not include GBParsy because I was not aware of the project, and since it's not on PyPI, it's not exactly the most convenient, batteries-included GenBank parser out there. Additionally, I tried to build it from source from the GitHub repository you linked, but the code seems quite outdated (it still uses the PyString_FromStringAndSize C API, which was removed in Python 3)...
Yes, you are right: the code was written in 2008, sixteen years ago, and is probably not compatible with the current Python C API. It has also never been uploaded to PyPI or conda-forge.
Digging a bit deeper, I realized that the code in the GitHub repository is an export of the old Google Code repository and does not represent the latest version. The repository contains v0.5.0, while the supplementary file of the publication itself includes v0.6.0 (2008-07-10) at: