The_Virus_Index

A Federated Index of Virus Metadata and Hyperdata in Public Repositories

API

Status: Extensible DRAFT API

https://test.pypi.org/project/viral-index/

Requirements:

Developer instructions

  1. Install the viral-index module
    python3 -m venv .env
    source .env/bin/activate
    pip install -q --extra-index-url https://test.pypi.org/simple/ viral-index
  2. Configure BigQuery access credentials

This API requires access to GCP BigQuery. To set up authentication, follow the instructions in the "Setting up authentication" section on this page. Note: when prompted to save the downloaded JSON key file, we suggest you save it under a filename without spaces. That makes it easier to set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

N.B.: You may be charged for using this API. Please learn more about BigQuery pricing.
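
Before creating a client, it can help to confirm that the credentials variable is set and points at a readable JSON file. The following is a minimal sketch, not part of the viral-index module; the helper name check_credentials is hypothetical, and it only checks that the file exists and parses as JSON, not that the key is valid for BigQuery.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the credentials path if it looks usable, else raise ValueError.

    Hypothetical helper for illustration; it does not contact GCP.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"No such credentials file: {path}")
    with open(path) as handle:
        try:
            json.load(handle)  # a service account key is a JSON document
        except json.JSONDecodeError as err:
            raise ValueError(f"{path} is not valid JSON: {err}") from err
    return path
```

Calling check_credentials() with no arguments inspects the current environment; passing a dict instead makes the check easy to try out without touching real settings.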

  3. Write your code to access the index!

Sample code

>>> from viral_index.client import ViralIndex
>>> viral_client = ViralIndex()
>>> cdd_id = 165276
>>> runs = viral_client.get_SRAs_where_CDD_is_found(cdd_id)
>>> print(list(runs))
['SRR2187433', 'SRR533343', 'ERR1915143']
>>> 

>>> pig_taxid = 9823
>>> viruses = viral_client.get_viruses_for_host_taxonomy(pig_taxid)
>>> if viruses is not None:
...     for virus in viruses:
...         print(virus)
...
['Rotavirus C', 36427]
['Porcine rubulavirus', 53179]
['Porcine associated porprismacovirus 7', 2170123]
['Porcine enterovirus b/BEL/15V010', 2017720]
[...]
>>>
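
The rows printed above can be turned into a lookup table for later use. This is a sketch under the assumption, taken from the sample output only, that each row is a [name, taxid] pair; the helper name viruses_by_name is hypothetical and not part of the module. It uses hard-coded sample rows so that it runs without BigQuery access.

```python
def viruses_by_name(rows):
    """Map virus name -> NCBI taxonomy ID.

    Assumes each row is a [name, taxid] pair, as in the sample output.
    Accepts None, which the sample code above also guards against.
    """
    return {name: taxid for name, taxid in (rows or [])}


# Rows copied from the sample output above.
sample_rows = [
    ["Rotavirus C", 36427],
    ["Porcine rubulavirus", 53179],
]
lookup = viruses_by_name(sample_rows)
print(lookup["Rotavirus C"])
```

With a configured client, the same helper could be applied to the result of viral_client.get_viruses_for_host_taxonomy(pig_taxid).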

>>> spacer_seqs = viral_client.get_spacer_seqs(1915496)
>>> print(list(spacer_seqs))
[['112', 'CAGCCATCCGCGACGCCACGACAGCGGCCGAGAGTGT', 'GCF_002508705', 'GTDB'], ['1', 'AATCAGCCCGTCGGGGTAGCCAGGGACGCCCTCCA', 'GCF_002508705', 'GTDB'],
[...]

>>> spacer_seq = 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT'
>>> spacer_tax_ids = viral_client.get_taxid_from_spacer_seq(spacer_seq)
>>> print(list(spacer_tax_ids))
[['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915507], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915502], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915504], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915506], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915510], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915499], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915512], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915500], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915495], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915498], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915505], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915508], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915503]]

Additional sample code can be found in python/sample-viral-index-access.py.
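
The spacer lookup returns one row per matching taxonomy ID, so a common follow-up is to collect just the IDs. A minimal sketch follows, assuming (from the sample output only) that the taxonomy ID is the last element of each row; the helper name taxids_for_spacer is hypothetical. The sample rows are the first three from the output above.

```python
def taxids_for_spacer(rows):
    """Return the sorted, distinct taxonomy IDs found in the rows.

    Assumes the taxid is the last element of each row, as in the
    sample output; the meaning of the other columns is not assumed.
    """
    return sorted({row[-1] for row in rows})


spacer = "CACGAGTGCGAAGCATCCAATCCATATGACTACAT"
sample_rows = [
    ["31", spacer, "GCF_002508705", "GTDB", 1915496],
    ["31", spacer, "GCF_002508705", "GTDB", 1915507],
    ["31", spacer, "GCF_002508705", "GTDB", 1915502],
]
print(taxids_for_spacer(sample_rows))  # [1915496, 1915502, 1915507]
```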

Troubleshooting

  1. If you get an error like the one below, BigQuery is probably not configured properly for your project. See step 2 of the developer instructions above.

    Access Denied: Project {YOUR_PROJECT_HERE}:
    User does not have bigquery.jobs.create permission in project
    {YOUR_PROJECT_HERE}
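
A script that calls the API can recognize this failure and print a pointer to the fix. The sketch below only inspects the error text for the permission name shown above; the helper name bigquery_permission_hint is hypothetical, and which exception class the client raises is not stated here, so the function takes a plain string.

```python
def bigquery_permission_hint(error_message):
    """Return a hint if the message matches the error above, else None."""
    if "bigquery.jobs.create" in error_message:
        return (
            "The active credentials cannot create BigQuery jobs; "
            "check GOOGLE_APPLICATION_CREDENTIALS and the project permissions."
        )
    return None


message = (
    "Access Denied: Project {YOUR_PROJECT_HERE}: User does not have "
    "bigquery.jobs.create permission in project {YOUR_PROJECT_HERE}"
)
print(bigquery_permission_hint(message))
```

In practice this could be called from an except block around a ViralIndex call, passing str(err) as the message.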

Maintainer instructions

Maintainer dependencies

  1. make: Run sudo apt-get -y -m update && sudo apt-get install -y make or equivalent command for your system.
  2. python3
  3. GCP SDK

Instructions

  1. Check out the source code: git clone https://github.com/NCBI-Codeathons/The_Virus_Index.git
  2. Set up the python virtual environment: make .env
  3. Enable python virtualenv: source .env/bin/activate
  4. Set up the GCP credentials: export GOOGLE_APPLICATION_CREDENTIALS=${PATH_TO_CREDENTIALS_JSON_FILE}
  5. Write code that uses viral_index.client.ViralIndex

Automated testing is available in Travis CI.

The Makefile has several targets that may be helpful:

  • .env: initializes the python virtual environment.
  • check_bq: checks command line access to BigQuery (tool availability and authentication).
  • check_python_syntax: checks the syntax of python scripts in this repo.
  • check_taxadb: checks that taxadb was properly installed.
  • check_api: checks that the API can be retrieved from PyPI, runs demo script.
  • init_taxadb: Initializes and configures taxadb (needed for the taxonomy utilities).
  • deploy: Builds a tarball for distribution and uploads it to test.pypi.org (requires twine, contact @christiam).
  • setup_bigquery_authentication: Sample command lines to set up authentication for BigQuery.

The module's version is stored in setup.py.

Bonus: Taxonomy utilities

Dependencies

Initialize taxadb and environment

(Assumes bash and Linux)

  1. Download and set up taxadb: Run make init_taxadb (this will take about 2-3 minutes).
  2. Initialize python virtual environment: Run source .env/bin/activate
  3. Set environment variable: export TAXADB_CONFIG=${PWD}/etc/taxadb.cfg

Available tools

  • python/name2taxid.py: takes scientific names on standard input or input files (spelling is significant) and outputs NCBI taxonomy IDs.
  • python/taxid2lineage.py: takes NCBI taxonomy IDs on standard input (or input files) and outputs the lineage for that given taxid.
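
Both tools read from standard input, so they can also be driven from Python. The following is a sketch only: run_with_stdin is a hypothetical helper, and the output format of the tools is not assumed, so the raw output lines are returned as-is.

```python
import subprocess


def run_with_stdin(command, lines):
    """Send lines on standard input to command and return its stdout lines.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Example (requires taxadb to be initialized and TAXADB_CONFIG to be set):
# run_with_stdin(["python3", "python/name2taxid.py"], ["Sus scrofa"])
```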

Future work

  • Review data in BigQuery and integrate it better with the API