Skip to content

bartaelterman/BlastTaxonomy

Repository files navigation

BLAST taxonomy

These scripts will help you to fetch taxonomic information from a blast result file.

Contents

modules

  • gi_taxid_search.py: This module contains the GItaxidFinder class. This class will fetch a taxonid given a ginumber and the gi_taxid_nucl.dmp file located on your computer. I described this module in detail in my blogpost.
  • blasthittaxonomy.py: This module contains the TaxonomyFetcher class which will fetch taxonomic information from NCBI using the taxonid fetched by the GItaxidFinder class. I described this module in detail in another blogpost.

unit tests

  • Testgi_taxid_search.py: unittests for the GItaxidFinder class.
  • TestTaxonomyFetcher.py: unittests for the TaxonomyFetcher class. This one will not work offline as it actually tries to reach the NCBI servers.

scripts

  • parsexmlblast.py: this script will transform your BLAST xml file into a tsv.
  • addTaxonomyToBlastOutput.py: this script will read in the tsv file created by parsexmlblast.py and add taxonomic information to it using the modules in this repo.

Before you run it

  • You might not be happy with the tsv file in between. It was a custom format I used for doing my own stuff so I suggest you change it if you want to. You only need it if you want to run addTaxonomyToBlastOutput.py as is. You could also use addTaxonomyToBlastOutput.py as a demonstration of how to use the classes in gi_taxid_search.py and blasthittaxonomy.py and write your own wrapper script.
  • Be sure to write your own email address in blasthittaxonomy.py. When I wrote these scripts, NCBI did not require you to do so but that could have changed by now. I think it is just a polite thing to do.
  • Make sure to get a copy of the gi_taxid_nucl.dmp file (I got it from NCBI's ftp servers) and refer to it in blasthittaxonomy.py. By default, blasthittaxonomy.py will look for this file in the current working directory, so you can also put it (or a link to it) there.

Run the stuff

  • Run parsexmlblast.py: ./parsexmlblast.py <blastresultsfile> Make sure the blastresultsfile is in xml.
  • Run addTaxonomyToBlastOutput.py: ./addTaxonomyToBlastOutput.py <blast output file>

Cite this repository

If you want to cite this repository, I recommend you use the following information:

  • Author name, as described in the LICENSE file,
  • The url to this repository, or more precisely, to the version of this repository you used. See the tab releases. For instance https://github.com/bartaelterman/BlastTaxonomy/releases/tag/v0.2.
  • You can use the name of this repository and/or its description as stated at the top of this repository's home page.

Thanks

Thank you for your interest. If you encouter any problems or if you have optimized some stuff, let me know by reporting an issue or send a pull request.

About

Fetch taxonomic information from BLAST hits

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages