Bioinformatics data-mining
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.

README.md

BioMine

This is a toolkit to handle variant and clinical data (see variant classes). Web accessions are included through the webapi class for

  • ClinicalTrials.gov
  • NCBI Entrez services
  • Ensembl ReST API
  • ExAC (Harvard) ReST API

The webapi parent class can be used to more easily create classes for other URL-based web services as well.

The variant classes can be used independently of the webapi's, and they include many useful functions to conform to HGVS variant representations. The BioMine toolkit is used in other software, such as CharGer.

Installation

This package works with Python 2.7. To install: pip install . All dependencies should automatically install.

Note to developers

Please be aware of site's crawler limitations (robots.txt) & rate limitations when creating new classes for web services. Failure to do so can result in IP banning or denial of service.

Modules

Web API - general

Parent class of the included ReST APIs, web services, and HTML

Example to print this readme from github:

site = webAPI.webAPI( "https://github.com/" , "AdamDS/" )
site.action = "WebAPIs"
site.submit()
page = site.parseHTMLResponse()
readme = page.getElementById( 'readme' )
print readme.innerHTML

Clinical Trials - HTML

Based on ClinicalTrials.gov

NCBI: Entrez - service

Based on NCBI's Entrez Programming Utilities (E-utilities)

Search ClinVar

searchClinVar - search terms for ClinVar

returns requests response object

searchPubMed - search terms for PubMed

returns requests response object

NCBI: PubChem - ReST

Based on NCBI's Power User Gateway for PubChem

Compound Synonyms

compoundSynonyms - single compound lookup

returns tab delimited string (searched compound '\t' synonym)

compoundsSynonyms - array of compounds lookup

returns tab delimited array (searched compound '\t' synonym)

compoundsSynonyms2File - array of compounds lookup and write to file

no return

Ensembl: Variant Effect Predictor - ReST

Based on Ensembl's VEP annotator

HGVS genomic variant annotation

Use GRCh37 or GRCh38

Switch to GRCh37 or GRCh38

useGRCh37 - switches to annotation by GRCh37

useGRCh38 - switches to annotation by GRCh38

Annotate HGVS Notation with Genomic Variant

tab delimited annotation as: HGVS notation, chr, start, stop, ref, var, strand, classification

annotateHGVSScalar2Response - annotate a single mutation

returns request response as JSONP

annotateHGVSScalar2tsv - annotate a single mutation

returns dictionary with tsv annotation and error message

annotateHGVSScalarResponse2tsv - annotate a single mutation from response

returns dictionary with tsv annotation and error message

annotateHGVSArray2Dict - annotate an array of mutations

returns dictionary of HGVS notated mutations (key)/ response as JSONP (values)

annotateHGVSArray2tsv - annotate an array of mutations

returns dictionary with tsv annotations and error messages

annotatedHGVSDict2tsv - annotate a dictionary of mutations of mutations (key)/ responses (values)

returns dictionary with tsv annotations and error messages

annotateHGVSArray2File - annotate an array of mutations

no return

ExAC - HTML

Based on [ExAC Browser Beta](exac.broadinstitute.org/)

Variants

getAlleleFrequency - retrieves allele frequency of a variant

returns a number of the given variant