Speed comparison with GBParsy #39

Open
xapple opened this issue Jul 6, 2023 · 2 comments

xapple commented Jul 6, 2023

I was looking for a fast way to process large numbers of GenBank entries and found your library. It definitely offers an improvement over Biopython, but I'm wondering why you did not include GBParsy in the speed comparison. It is a parser written in pure C, and likely even faster than gb-io.

Lee TH, Kim YK, Nahm BH. GBParsy: a GenBank flatfile parser library with high speed. BMC Bioinformatics. 2008 Jul 25;9:321. doi: 10.1186/1471-2105-9-321. PMID: 18652706; PMCID: PMC2516526.
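
For anyone wanting to run such a comparison themselves, here is a minimal sketch of a timing harness, not the benchmark used by gb-io. The helper name `time_parser` is hypothetical; it accepts any callable that takes a file path and returns an iterable of records, and it exhausts that iterable so lazy parsers are measured doing the full work. The wiring shown in the trailing comments (`Bio.SeqIO.parse` and `gb_io.iter`) is an assumption about how each library would be plugged in, and a GBParsy binding would need its own wrapper.

```python
import time
from typing import Callable, Iterable


def time_parser(parse: Callable[[str], Iterable], path: str, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds to fully consume parse(path)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Exhaust the result so that lazy (iterator-based) parsers do all their work.
        for _record in parse(path):
            pass
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical wiring (assumes both packages are installed and "records.gbk" exists):
#
#   from Bio import SeqIO
#   import gb_io
#
#   t_bio = time_parser(lambda p: SeqIO.parse(p, "genbank"), "records.gbk")
#   t_gbio = time_parser(gb_io.iter, "records.gbk")
#   print(f"Biopython: {t_bio:.3f} s, gb-io: {t_gbio:.3f} s")
```

Taking the best of several repeats reduces noise from disk caching and other processes; it says nothing about memory use, which would need a separate measurement.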

althonos (Owner) commented Jul 6, 2023

Hi @xapple,

I did not include GBParsy because I was not aware of the project, and since it is not on PyPI it is not exactly the most convenient, tools-included GenBank parser out there. Additionally, I tried to build it from source from the GitHub repository you linked, but the code seems quite outdated (it still uses the `PyString_FromStringAndSize` C API, which was removed in Python 3)...

xapple (Author) commented Jul 6, 2023

Yes, you are right: the code was written in 2008, which is fifteen years ago, and it is probably not compatible with the current Python C API. Also, it has not been uploaded to PyPI or conda-forge.

Digging a bit deeper, I realized that the code in the GitHub repository is an export of the old Google Code repository and does not represent the latest version. The repository has v0.5.0, while the supplementary file of the publication itself includes v0.6.0 (2008-07-10) at:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2516526/bin/1471-2105-9-321-S1.tgz
