- Can now take advantage of env var 'TAXADB_CONFIG' or a config path at object construction to set db connection parameters.

- Updated documentation.

Force port number to be int when connecting with peewee

Updated README.md to reflect use of configuration

Test codecov

Cover taxadb only, not test

Added codecov.yml

Remove some pieces of code not used or not reached. Remove some exceptions not thrown by the methods used.

Added .dmp test files

Updated tests and added codeclimate configuration

- Introduced use of config file or environment variable.
- Increased test coverage up to 99%.
- Reviewed some methods according to tests. Updated README

Fixed codeclimate warnings. Refactored method to load taxa table data into memory

do not test coverage for taxadb.app.py

Added configuration file <.landscape.yml> for code health checking

Increased test coverage and cleaned code regarding landscape.io

Adjusted small stuff

Use of configuration file and/or environment variable TAXADB_CONFIG
horkko committed Apr 10, 2017
1 parent 2b51c2b commit 8eb662c
Showing 26 changed files with 942 additions and 179 deletions.
24 changes: 24 additions & 0 deletions .codeclimate.yml
@@ -0,0 +1,24 @@
engines:
  pep8:
    enabled: true
  duplication:
    enabled: true
    checks:
      Similar Code:
        enabled: true
    config:
      languages:
        python:
          python_version: 3
  fixme:
    enabled: true
  radon:
    enabled: true
    config:
      python_version: 3

exclude_paths:
- "taxadb/test/*"
ratings:
  paths:
  - "taxadb/*.py"
3 changes: 3 additions & 0 deletions .codecov.yml
@@ -0,0 +1,3 @@
ignore:
- "**/test/*.py" # ignore test scripts
- "**/app.py" # ignore main application script
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,7 +1,7 @@
# taxa_db specific stuff
*accession2taxid*
taxdump*
*.dmp
n*.dmp
gc.prt
readme.txt

11 changes: 11 additions & 0 deletions .landscape.yml
@@ -0,0 +1,11 @@
doc-warnings: yes
test-warnings: no
strictness: medium
max-line-length: 80
pep8:
  full: true
python-targets:
- 3
ignore-paths:
- taxadb/test
- doc
6 changes: 5 additions & 1 deletion .travis.yml
@@ -3,12 +3,16 @@ python:
- '3.5'
install:
- pip install -r requirements.txt
- pip install codecov
- pip install -e .
script: nosetests --tc-file taxadb.ini
script: nosetests --tc-file taxadb.ini --with-coverage --cover-package=taxadb
notifications:
email: false
slack:
matrix:
- sgbcgroup:LQx8YcBhBqAUTON9eMivOuFr
rooms:
secure: Fdzf2EpffyQCgWKt5iDCIE7SiX2ElX4wtwcZPCtmlBL58JjxW5yQnh4Wgh7/cOdVs2HNzq6ANyFoVUe5S/2+bW9bjcDuvRLzoVy5FYZNoDQiT2/AphvYeZ1SkGJQsqVtnW+llOSteKRe3lJDipby+mtn8Qphiq96890IItAm6MV1MU08O24WwnB+bUt/GpUY907Q2e9CTL6JRt0GSPO3+azg/WtL9f0AxeJFcqpjJ2sSqXAqwQorB5QNAZJtSnsUO7TTlt1PMaaVa/g2x5xXcioABP37k+9XykidHj32fN1SgdJulbJuJQazpme0blgq8pdxp0ECnSBLba5zuuHzQo1e9pXZV7yil6jCaIhZclBKFDuwx+8zT8Dut7RzJZPd85u0pwHLr1UWdmH1xUQKWjCmQFvAl+tNviYTe0kUKK8OHJCTLHaZ6uAPqA5p0mfT21tTKds0OMfk00fTDxJx7D5HDSbaRtDbgprhld8l5HGNcbdjYmARbYruONw5unV8F+lgpt2FcIoXoPUIK0l8oXXf8Q3kQM1ER2IE3uA2o8a5oTuQQp+tQ8/sbnd4JWy7JbWdywsXmAX/OFpoujfRyMIVx/Y2JoFJ1EXxPwvYmd8Vnt9N/D89hw8Gnz1T3jKSJI9x7iOEYe55Ef0QRR73XfF+aBcvImFPVF20TNbMbi4=

after_success:
- codecov
28 changes: 28 additions & 0 deletions README.md
@@ -82,6 +82,34 @@ Get the taxonomic information for accession number(s).
('Z12029', 9915)
```

You can also use a configuration file to set database connection parameters
automatically when an object is built. Either pass the `config` parameter to
the object's `__init__` method:
```python
>>> from taxadb.accessionid import AccessionID

>>> my_accessions = ['X17276', 'Z12029']
>>> accession = AccessionID(config='/path/to/taxadb.cfg')
>>> taxids = accession.taxid(my_accessions)
>>> ...
```

or set the environment variable `TAXADB_CONFIG` to point to the configuration file:
```bash
$ export TAXADB_CONFIG='/path/to/taxadb.cfg'
```
then
```python
>>> from taxadb.accessionid import AccessionID

>>> my_accessions = ['X17276', 'Z12029']
>>> accession = AccessionID()
>>> taxids = accession.taxid(my_accessions)
>>> ...
```

Check documentation for more information.
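
As a rough illustration of the two options above, the lookup could be sketched as follows. `resolve_config_path` is a hypothetical helper, not part of the taxadb API, and it assumes that an explicit `config` argument takes priority over `TAXADB_CONFIG`.

```python
import os


def resolve_config_path(config=None, environ=None):
    """Hypothetical helper: choose which configuration file to load.

    Assumption: an explicit `config` argument wins; otherwise the
    TAXADB_CONFIG environment variable is used; otherwise None is returned.
    """
    # Allow a custom mapping to be injected, which keeps the helper testable.
    environ = os.environ if environ is None else environ
    if config:
        return config
    return environ.get('TAXADB_CONFIG')


# Explicit path given: the environment variable is ignored.
print(resolve_config_path('/path/to/taxadb.cfg',
                          {'TAXADB_CONFIG': '/other/taxadb.cfg'}))
# No explicit path: fall back to the environment variable.
print(resolve_config_path(None, {'TAXADB_CONFIG': '/other/taxadb.cfg'}))
```

Passing the environment as a mapping is only a convenience for testing; in normal use the helper reads `os.environ`.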

### Creating the Database

#### Download data
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -23,7 +23,7 @@ In brief Taxadb:
`PostgreSQL <https://www.postgresql.org>`_.
* has available pre-built SQLite databases (:ref:`Download build databases <download>`).
* has a comprehensive :ref:`API documentation <api>`.
* is actively being developped on `GitHub <https://github.com/HadrienG/taxadb.git>`_ and available under the MIT license. Please see the `README <https://github.com/HadrienG/taxadb>`_ for more information on development, support, and contributing.
* is actively being developed on `GitHub <https://github.com/HadrienG/taxadb.git>`_ and available under the MIT license. Please see the `README <https://github.com/HadrienG/taxadb>`_ for more information on development, support, and contributing.

Quickstart
----------
2 changes: 1 addition & 1 deletion doc/taxadb/api.rst
@@ -1,7 +1,7 @@
.. _api:

API Documentation
==========================
=================

Contents:

70 changes: 70 additions & 0 deletions doc/taxadb/query.rst
@@ -8,6 +8,76 @@ Firstly make sure you have :ref:`downloaded <download>` or :ref:`built <build_ow

Below you can find basic examples. For more complex examples, please refer to the complete :ref:`documentation <api>`.

.. _useconfig:

Using configuration file or environment variable
------------------------------------------------

Taxadb can now take advantage of a configuration file or an environment
variable to set database connection parameters.

* Using configuration file

You can pass a configuration file when building your object:

.. code-block:: python

    >>> from taxadb.taxid import TaxID
    >>> taxid = TaxID(config='/path/to/taxadb.cfg')
    >>> name = taxid.sci_name(33208)
    >>> ...

* Configuration file format

The configuration file must use syntax supported by `configparser object
<https://docs.python.org/3.5/library/configparser.html>`_.
You must set database connection parameters in a section called
:code:`DBSETTINGS` as below:

.. code-block:: bash

    [DBSETTINGS]
    dbtype=<sqlite|postgres|mysql>
    dbname=taxadb
    hostname=taxadb.domain.org
    username=admin
    password=s3cr3T
    port=

Some values will default if they are not set.

**hostname** will be set to value :code:`localhost` and **port** is set to
:code:`5432` for :code:`dbtype=postgres` and :code:`3306` for
:code:`dbtype=mysql`.

* Using environment variable

Taxadb can also use an environment variable to point the application to a
configuration file automatically. To take advantage of it, just set
:code:`TAXADB_CONFIG` to the path of your configuration file:

.. code-block:: bash

    (bash) export TAXADB_CONFIG='/path/to/taxadb.cfg'
    (csh)  setenv TAXADB_CONFIG '/path/to/taxadb.cfg'

Then, just create your object as follows:

.. code-block:: python

    >>> from taxadb.taxid import TaxID
    >>> taxid = TaxID()
    >>> name = taxid.sci_name(33208)
    >>> ...

.. note::

    Arguments passed at object initialisation always override default values
    as well as values set by the configuration file or the environment
    variable :code:`TAXADB_CONFIG`.
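
The defaults and the override rule described above can be pictured with a small sketch. It is only an illustration under stated assumptions: ``build_settings`` is a hypothetical helper, not the taxadb implementation, and it reads the configuration from a string rather than from a file so that it stays self-contained.

```python
import configparser

# Default ports named in the documentation above.
DEFAULT_PORTS = {'postgres': 5432, 'mysql': 3306}


def build_settings(config_text=None, **kwargs):
    """Hypothetical helper, not part of the taxadb API.

    Merge connection settings from three sources, lowest priority first:
    built-in defaults, the [DBSETTINGS] section, then keyword arguments.
    """
    settings = {}
    if config_text is not None:
        # interpolation=None so a '%' in a password is read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(config_text)
        if parser.has_section('DBSETTINGS'):
            # Treat empty values such as "port=" as not set.
            settings.update(
                {k: v for k, v in parser.items('DBSETTINGS') if v != ''})
    # Explicit arguments override anything read from the configuration.
    settings.update({k: v for k, v in kwargs.items() if v is not None})
    # Fill in the documented defaults for values that are still missing.
    settings.setdefault('hostname', 'localhost')
    if 'port' not in settings and settings.get('dbtype') in DEFAULT_PORTS:
        settings['port'] = DEFAULT_PORTS[settings['dbtype']]
    if 'port' in settings:
        # Configuration values are strings; store the port as an integer.
        settings['port'] = int(settings['port'])
    return settings


example = """
[DBSETTINGS]
dbtype=postgres
dbname=taxadb
username=admin
port=
"""
print(build_settings(example))
print(build_settings(example, hostname='db.example.org', port='5433'))
```

Converting the port to ``int`` at the end mirrors the commit note about forcing the port number to be an integer before connecting.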

.. _taxids:

taxids
78 changes: 38 additions & 40 deletions taxadb/accessionid.py
@@ -3,17 +3,15 @@


class AccessionID(TaxaDB):

"""Main accession class
Provide methods to request accession table and get associated taxonomy for
accession ids.
Args:
dbtype (:obj:`str`): Database to connect to
dbtype (:obj:`str`): Database type to connect to (`sqlite`, `postgres`,
`mysql`). Default `sqlite`
**kwargs: Arbitrary arguments. Supported (username, password, port,
hostname)
hostname, config, dbtype, dbname)
Raises:
SystemExit: If table `accession` does not exist
@@ -41,16 +39,16 @@ def taxid(self, acc_number_list):
query = Accession.select().where(
Accession.accession << acc_number_list)
for i in query:
try:
yield (i.accession, i.taxid.ncbi_taxid)
except Taxa.DoesNotExist:
self._unmapped_taxid(i.accession)
yield (i.accession, i.taxid.ncbi_taxid)
# TODO: List accession ID not found?
# for noid in notfound:
# self._unmapped_taxid(noid)

def sci_name(self, acc_number_list):
"""Get taxonomic scientific name for accession ids
Given a list of acession numbers, yield the accession number and their
associated scientific name as tuples
Given a list of accession numbers, yield the accession number and their
associated scientific name as tuples
Args:
acc_number_list (:obj:`list`): a list of accession numbers
@@ -64,10 +62,10 @@ def sci_name(self, acc_number_list):
query = Accession.select().where(
Accession.accession << acc_number_list)
for i in query:
try:
yield (i.accession, i.taxid.tax_name)
except Taxa.DoesNotExist:
self._unmapped_taxid(i.accession)
yield (i.accession, i.taxid.tax_name)
# TODO: List accession ID not found?
# for noid in notfound:
# self._unmapped_taxid(noid)

def lineage_id(self, acc_number_list):
"""Get taxonomic lineage name for accession ids
@@ -87,20 +85,20 @@ def lineage_id(self, acc_number_list):
query = Accession.select().where(
Accession.accession << acc_number_list)
for i in query:
try:
lineage_list = []
current_lineage = i.taxid.tax_name
current_lineage_id = i.taxid.ncbi_taxid
parent = i.taxid.parent_taxid
while current_lineage != 'root':
lineage_list.append(current_lineage_id)
new_query = Taxa.get(Taxa.ncbi_taxid == parent)
current_lineage = new_query.tax_name
current_lineage_id = new_query.ncbi_taxid
parent = new_query.parent_taxid
yield (i.accession, lineage_list)
except Taxa.DoesNotExist:
self._unmapped_taxid(i.accession)
lineage_list = []
current_lineage = i.taxid.tax_name
current_lineage_id = i.taxid.ncbi_taxid
parent = i.taxid.parent_taxid
while current_lineage != 'root':
lineage_list.append(current_lineage_id)
new_query = Taxa.get(Taxa.ncbi_taxid == parent)
current_lineage = new_query.tax_name
current_lineage_id = new_query.ncbi_taxid
parent = new_query.parent_taxid
yield (i.accession, lineage_list)
# TODO: List accession ID not found?
# for noid in notfound:
# self._unmapped_taxid(noid)

def lineage_name(self, acc_number_list):
"""Get a lineage name for accession ids
@@ -120,15 +118,15 @@ def lineage_name(self, acc_number_list):
query = Accession.select().where(
Accession.accession << acc_number_list)
for i in query:
try:
lineage_list = []
current_lineage = i.taxid.tax_name
parent = i.taxid.parent_taxid
while current_lineage != 'root':
lineage_list.append(current_lineage)
new_query = Taxa.get(Taxa.ncbi_taxid == parent)
current_lineage = new_query.tax_name
parent = new_query.parent_taxid
yield (i.accession, lineage_list)
except Taxa.DoesNotExist:
self._unmapped_taxid(i.accession)
lineage_list = []
current_lineage = i.taxid.tax_name
parent = i.taxid.parent_taxid
while current_lineage != 'root':
lineage_list.append(current_lineage)
new_query = Taxa.get(Taxa.ncbi_taxid == parent)
current_lineage = new_query.tax_name
parent = new_query.parent_taxid
yield (i.accession, lineage_list)
# TODO: List accession ID not found?
# for noid in notfound:
# self._unmapped_taxid(noid)
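
The `lineage_id` loop in this file climbs from a taxon to the root by following `parent_taxid`. A minimal in-memory sketch of the same walk is shown here, using a plain dict in place of the `Taxa` table. The helper name `lineage_ids` and the toy ids are made up for illustration and are not part of taxadb.

```python
def lineage_ids(taxid, taxa):
    """Hypothetical helper: list the ids from `taxid` up to, but excluding, root.

    `taxa` maps ncbi_taxid -> (tax_name, parent_taxid) and stands in for
    the Taxa table queried in the diff above.
    """
    lineage = []
    current_id = taxid
    name, parent = taxa[current_id]
    # Stop once the node named 'root' is reached, as the original loop does.
    while name != 'root':
        lineage.append(current_id)
        current_id = parent
        name, parent = taxa[current_id]
    return lineage


# Toy hierarchy: root (1) <- A (10) <- B (20)
toy_taxa = {1: ('root', 1), 10: ('A', 1), 20: ('B', 10)}
print(lineage_ids(20, toy_taxa))
```

As in the original code, a missing parent is not handled: the dict lookup raises `KeyError` here, where `Taxa.get` would raise `Taxa.DoesNotExist`.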
14 changes: 7 additions & 7 deletions taxadb/app.py
@@ -8,13 +8,13 @@

from taxadb import util
from taxadb.parser import TaxaDumpParser, Accession2TaxidParser
from taxadb.schema import *
from taxadb.schema import DatabaseFactory, db, Taxa, Accession


def download(args):
"""Main function for the 'taxadb download' sub-command.
This function downloads taxdump.tar.gz and the content of the accession2taxid
This function downloads taxdump.tar.gz and the content of accession2taxid
directory from the ncbi ftp.
Arguments:
@@ -37,20 +37,20 @@ def download(args):
os.chdir(os.path.abspath(out))

for file in acc_dl_list:
print('Started Downloading %s' % (file))
print('Started Downloading %s' % file)
with ftputil.FTPHost(ncbi_ftp, 'anonymous', 'password') as ncbi:
ncbi.chdir('pub/taxonomy/accession2taxid/')
ncbi.download_if_newer(file, file)
ncbi.download_if_newer(file + '.md5', file + '.md5')
util.md5_check(file)

print('Started Downloading %s' % (taxdump))
print('Started Downloading %s' % taxdump)
with ftputil.FTPHost(ncbi_ftp, 'anonymous', 'password') as ncbi:
ncbi.chdir('pub/taxonomy/')
ncbi.download_if_newer(taxdump, taxdump)
ncbi.download_if_newer(taxdump + '.md5', taxdump + '.md5')
util.md5_check(taxdump)
print('Unpacking %s' % (taxdump))
print('Unpacking %s' % taxdump)
with tarfile.open(taxdump, "r:gz") as tar:
tar.extractall()
tar.close()
@@ -90,7 +90,7 @@ def create_db(args):
# If taxa table already exists, do not recreate and fill it
# safe=True avoids an error if the table already exists
if not Taxa.table_exists():
parser.verbose("Creating table %s" % str(Taxa._meta.db_table))
parser.verbose("Creating table %s" % str(Taxa.get_table_name()))
db.create_table(Taxa, safe=True)
parser = TaxaDumpParser(nodes_file=os.path.join(args.input, 'nodes.dmp'),
names_file=os.path.join(args.input, 'names.dmp'))
@@ -127,7 +127,7 @@ def create_db(args):
Accession.insert_many(data_dict[0:args.chunk]).execute()
inserted_rows += len(data_dict)
print('%s: %s added to database (%d rows inserted)' % (
Accession._meta.db_table, acc_file, inserted_rows))
Accession.get_table_name(), acc_file, inserted_rows))
print('Sequence: completed')
db.close()
