These scripts help you fetch taxonomic information for the hits in a BLAST result file.

`gi_taxid_search.py`: This module contains the `GItaxidFinder` class. This class fetches a taxonid for a given GI number, using the `gi_taxid_nucl.dmp` file stored on your computer. I described this module in detail in my blog post.

`blasthittaxonomy.py`: This module contains the `TaxonomyFetcher` class, which fetches taxonomic information from NCBI using the taxonid found by the `GItaxidFinder` class. I described this module in detail in another blog post.

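
The kind of lookup `TaxonomyFetcher` performs can be approximated with NCBI's E-utilities `efetch` endpoint on the taxonomy database. The sketch below is not the code in `blasthittaxonomy.py`: the function names `build_taxonomy_url`, `parse_lineage` and `fetch_lineage` are hypothetical, and it assumes the usual taxonomy XML layout (a `TaxaSet` of `Taxon` records, each with a `LineageEx` list of ancestors). The `email` query parameter is where your own address would go.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical helpers; this is not the API of blasthittaxonomy.py.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_taxonomy_url(taxid, email):
    """Build an efetch URL that returns the taxonomy record for `taxid` as XML."""
    params = {"db": "taxonomy", "id": str(taxid), "retmode": "xml", "email": email}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_lineage(xml_text):
    """Map rank -> scientific name for the first <Taxon> in an efetch reply."""
    root = ET.fromstring(xml_text)
    taxon = root.find("Taxon")
    if taxon is None:
        return {}
    lineage = {}
    # Ancestors are listed under LineageEx; unranked entries are skipped.
    for ancestor in taxon.iterfind("LineageEx/Taxon"):
        rank = ancestor.findtext("Rank")
        if rank and rank != "no rank":
            lineage[rank] = ancestor.findtext("ScientificName")
    # The queried taxon itself is not part of LineageEx, so add it separately.
    own_rank = taxon.findtext("Rank")
    if own_rank and own_rank != "no rank":
        lineage[own_rank] = taxon.findtext("ScientificName")
    return lineage


def fetch_lineage(taxid, email):
    """Query NCBI over the network and return the parsed lineage."""
    with urllib.request.urlopen(build_taxonomy_url(taxid, email), timeout=30) as resp:
        return parse_lineage(resp.read())
```

Only `fetch_lineage` touches the network; `parse_lineage` works on any saved efetch reply, which keeps it testable offline. Ranks such as `clade` can occur more than once in a lineage, and this sketch simply keeps the last one it sees.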
`Testgi_taxid_search.py`: unit tests for the `GItaxidFinder` class.

`TestTaxonomyFetcher.py`: unit tests for the `TaxonomyFetcher` class. These tests do not work offline, because they actually query the NCBI servers.

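
Because the `TaxonomyFetcher` tests need the NCBI servers, one option is to skip such tests when the servers cannot be reached instead of letting them fail. The following is a minimal sketch with the standard `unittest` module; the helper `host_reachable` and the test class are hypothetical and not part of this repo, and a successful TCP connection is only a rough proxy for a working NCBI service.

```python
import socket
import unittest


def host_reachable(host="eutils.ncbi.nlm.nih.gov", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Checked once at import time so every online test class can reuse it.
NCBI_ONLINE = host_reachable()


@unittest.skipUnless(NCBI_ONLINE, "NCBI servers not reachable, skipping online tests")
class OnlineTaxonomyTests(unittest.TestCase):
    # Hypothetical example test; the real tests live in TestTaxonomyFetcher.py.
    def test_connection_available(self):
        self.assertTrue(host_reachable())


if __name__ == "__main__":
    unittest.main()
```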
`parsexmlblast.py`: this script transforms your BLAST XML file into a TSV file.

`addTaxonomyToBlastOutput.py`: this script reads the TSV file created by `parsexmlblast.py` and adds taxonomic information to it, using the modules in this repo.

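
For orientation, here is a minimal standard-library sketch of an XML-to-TSV step. It is not the logic of `parsexmlblast.py`: the function name `blast_xml_to_tsv`, the four output columns (`query`, `hit_id`, `hit_def`, `evalue`) and the choice to keep only the first HSP listed for each hit are all assumptions. It relies on the element names used in NCBI's BLAST XML output (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_id`, `Hit_def`, `Hsp_evalue`).

```python
import csv
import sys
import xml.etree.ElementTree as ET


def blast_xml_to_tsv(xml_source, out_handle):
    """Write one TSV row per BLAST hit and return the number of hits written.

    xml_source: a path or file object containing BLAST XML output.
    out_handle: a text file object opened for writing.
    Column layout is a hypothetical choice, not the repo's custom format.
    """
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "hit_id", "hit_def", "evalue"])
    root = ET.parse(xml_source).getroot()
    n_hits = 0
    # Each <Iteration> holds the results for one query sequence.
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            # Keep only the first HSP listed for this hit.
            hsp = hit.find("Hit_hsps/Hsp")
            evalue = hsp.findtext("Hsp_evalue", default="") if hsp is not None else ""
            writer.writerow([
                query,
                hit.findtext("Hit_id", default=""),
                hit.findtext("Hit_def", default=""),
                evalue,
            ])
            n_hits += 1
    return n_hits


if __name__ == "__main__":
    # Example: python sketch.py results.xml > results.tsv
    blast_xml_to_tsv(sys.argv[1], sys.stdout)
```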
- You might not be happy with the intermediate TSV file. It is a custom format I used for my own analyses, so feel free to change it. You only need it if you want to run `addTaxonomyToBlastOutput.py` as is. Alternatively, treat `addTaxonomyToBlastOutput.py` as a demonstration of how to use the classes in `gi_taxid_search.py` and `blasthittaxonomy.py`, and write your own wrapper script.
- Be sure to put your own email address in `blasthittaxonomy.py`. When I wrote these scripts, NCBI did not require this, but that may have changed by now, and it is the polite thing to do anyway.
- Make sure to get a copy of the `gi_taxid_nucl.dmp` file (I got it from NCBI's FTP servers) and refer to it in `blasthittaxonomy.py`. By default, `blasthittaxonomy.py` looks for this file in the current working directory, so you can also put the file (or a link to it) there.

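
As a starting point for your own wrapper, here is a minimal sketch that appends a taxid column to a TSV file of BLAST hits. It does not reproduce `GItaxidFinder` (described in the blog post). It assumes, hypothetically, that the TSV has a header row with a `hit_id` column holding old-style identifiers such as `gi|123|gb|AB000001.1|`, and that `gi_taxid_nucl.dmp` is a plain two-column file (GI, taxid) which, as noted above, is looked up in the current working directory by default. All function names are illustrative.

```python
import csv
import re

# Hypothetical default: the dump file in the current working directory.
DEFAULT_DMP = "gi_taxid_nucl.dmp"

# Old-style NCBI identifiers look like "gi|123|gb|AB000001.1|".
GI_PATTERN = re.compile(r"(?:^|\|)gi\|(\d+)(?:\||$)")


def extract_gi(hit_id):
    """Return the GI number in a BLAST hit id as an int, or None if there is none."""
    match = GI_PATTERN.search(hit_id)
    return int(match.group(1)) if match else None


def lookup_taxids(gis, dmp_path=DEFAULT_DMP):
    """Scan the two-column (GI, taxid) dump once and return {gi: taxid}."""
    wanted = set(gis)
    found = {}
    if not wanted:
        return found
    with open(dmp_path) as dmp:
        for line in dmp:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            gi = int(fields[0])
            if gi in wanted:
                found[gi] = int(fields[1])
                if len(found) == len(wanted):
                    break  # every requested GI has been resolved
    return found


def add_taxid_column(in_handle, out_handle, dmp_path=DEFAULT_DMP, hit_column="hit_id"):
    """Copy a TSV of BLAST hits, appending a 'taxid' column (empty if unknown)."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    rows = list(reader)
    gis = {extract_gi(row[hit_column]) for row in rows} - {None}
    taxids = lookup_taxids(gis, dmp_path)
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames + ["taxid"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        row["taxid"] = taxids.get(extract_gi(row[hit_column]), "")
        writer.writerow(row)
```

Collecting all GI numbers first and streaming the dump once avoids rereading a large file for every hit; the trade-off is that the whole TSV is held in memory until the scan finishes.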
- Run `parsexmlblast.py`: `./parsexmlblast.py <blastresultsfile>`. Make sure the BLAST results file is in XML format (with BLAST+, use `-outfmt 5`).
- Run `addTaxonomyToBlastOutput.py` on the TSV file created by `parsexmlblast.py`: `./addTaxonomyToBlastOutput.py <blast output file>`

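
Since `parsexmlblast.py` expects XML, a quick pre-flight check can save you a confusing parse error later. The following is a minimal sketch with a hypothetical helper name; it only checks whether the document's root element is `<BlastOutput>`, the root element of BLAST's XML output (`-outfmt 5` in BLAST+), and does not validate the rest of the file.

```python
import sys
import xml.etree.ElementTree as ET


def looks_like_blast_xml(handle):
    """Return True if the first element in `handle` is <BlastOutput>.

    `handle` is a binary file object; parsing stops at the first element.
    """
    try:
        for _event, element in ET.iterparse(handle, events=("start",)):
            # The first "start" event belongs to the root element.
            return element.tag == "BlastOutput"
    except ET.ParseError:
        # Tabular or plain-text BLAST output is not well-formed XML.
        return False
    return False


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as blast_file:
        if not looks_like_blast_xml(blast_file):
            sys.exit(f"{sys.argv[1]} does not look like BLAST XML output")
    print("OK")
```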
If you want to cite this repository, I recommend using the following information:
- the author's name, as given in the LICENSE file;
- the URL of this repository or, more precisely, of the version you used (see the Releases tab), for instance https://github.com/bartaelterman/BlastTaxonomy/releases/tag/v0.2;
- the name of this repository and/or its description, as stated at the top of the repository's home page.

Thank you for your interest. If you encounter any problems or have optimized something, let me know by reporting an issue or sending a pull request.