New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add 'sourmash lca' commands for kraken-style lowest-common-ancestor calculations. #367

Merged
merged 83 commits into from Dec 2, 2017

Conversation

Projects
None yet
4 participants
@ctb
Member

ctb commented Nov 5, 2017

New commands:

  • sourmash lca index taxonomy.csv save.db [list of signatures] - build save.db using given taxonomy and list of signatures.
  • sourmash lca classify --db save.db --query [list of signatures] - output a taxonomic classification of query signatures.
  • sourmash lca summarize --db save.db --query [list of signatures] - output a taxonomic summarization of query signatures.
  • sourmash lca rankinfo [list of databases] - output summary of database content by taxonomic level

TODO:

  • allow gzipped .lca.json files
  • support indexing of SBT directly?
  • add --traverse-directory
  • move functions out of command_classify.py
  • share taxlist
  • have classify output summary info as in this blog post
  • add metacodeR-compatible output as here
  • figure out how to detect and report taxonomy mismatches when merging databases

In the future:

  • output k-mers directly, not just hashes
  • include (?) validation/consonance of multiple tax dbs a la genbank double check

Ref #302.

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Nov 5, 2017

Codecov Report

Merging #367 into master will increase coverage by 0.6%.
The diff coverage is 90.14%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #367     +/-   ##
=========================================
+ Coverage   87.42%   88.03%   +0.6%     
=========================================
  Files          15       23      +8     
  Lines        2338     3016    +678     
  Branches       36       36             
=========================================
+ Hits         2044     2655    +611     
- Misses        293      360     +67     
  Partials        1        1
Impacted Files Coverage Δ
sourmash_lib/sourmash_args.py 94.48% <100%> (+0.11%) ⬆️
sourmash_lib/lca/__init__.py 100% <100%> (ø)
sourmash_lib/__main__.py 100% <100%> (ø) ⬆️
sourmash_lib/lca/__main__.py 78.94% <78.94%> (ø)
sourmash_lib/lca/command_summarize.py 83.33% <83.33%> (ø)
sourmash_lib/lca/command_classify.py 87.2% <87.2%> (ø)
sourmash_lib/lca/command_index.py 89.14% <89.14%> (ø)
sourmash_lib/lca/command_rankinfo.py 89.36% <89.36%> (ø)
sourmash_lib/lca/command_compare_csv.py 92.3% <92.3%> (ø)
sourmash_lib/lca/lca_utils.py 95.97% <95.97%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29af52b...cc643fe. Read the comment docs.

codecov-io commented Nov 5, 2017

Codecov Report

Merging #367 into master will increase coverage by 0.6%.
The diff coverage is 90.14%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #367     +/-   ##
=========================================
+ Coverage   87.42%   88.03%   +0.6%     
=========================================
  Files          15       23      +8     
  Lines        2338     3016    +678     
  Branches       36       36             
=========================================
+ Hits         2044     2655    +611     
- Misses        293      360     +67     
  Partials        1        1
Impacted Files Coverage Δ
sourmash_lib/sourmash_args.py 94.48% <100%> (+0.11%) ⬆️
sourmash_lib/lca/__init__.py 100% <100%> (ø)
sourmash_lib/__main__.py 100% <100%> (ø) ⬆️
sourmash_lib/lca/__main__.py 78.94% <78.94%> (ø)
sourmash_lib/lca/command_summarize.py 83.33% <83.33%> (ø)
sourmash_lib/lca/command_classify.py 87.2% <87.2%> (ø)
sourmash_lib/lca/command_index.py 89.14% <89.14%> (ø)
sourmash_lib/lca/command_rankinfo.py 89.36% <89.36%> (ø)
sourmash_lib/lca/command_compare_csv.py 92.3% <92.3%> (ø)
sourmash_lib/lca/lca_utils.py 95.97% <95.97%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29af52b...cc643fe. Read the comment docs.

@luizirber

This comment has been minimized.

Show comment
Hide comment
@luizirber

luizirber Nov 6, 2017

Member

I was thinking about doing a group code review on this with @bluegenes and other people in the lab, do you think it's ready for a first pass?

Member

luizirber commented Nov 6, 2017

I was thinking about doing a group code review on this with @bluegenes and other people in the lab, do you think it's ready for a first pass?

@ctb

This comment has been minimized.

Show comment
Hide comment
@ctb

ctb Nov 6, 2017

Member
Member

ctb commented Nov 6, 2017

Show outdated Hide outdated sourmash_lib/lca/__main__.py Outdated

@ctb ctb referenced this pull request Nov 25, 2017

Open

[WIP] add a 'lca search' command #377

0 of 5 tasks complete
@luizirber

This comment has been minimized.

Show comment
Hide comment
@luizirber

luizirber Nov 27, 2017

Member

@ctb tests/test-data/lca-root/tax.csv is missing

Member

luizirber commented Nov 27, 2017

@ctb tests/test-data/lca-root/tax.csv is missing

@luizirber

This comment has been minimized.

Show comment
Hide comment
@luizirber

luizirber Nov 27, 2017

Member

It is also breaking on py27 because sourmash_lib/lca/command_index.py is opening the delmont supp table file as rt, but it contains windows newline chars (\r\n). One way to fix it is opening the CSV as rtU (for universal newlines) in this line.

Member

luizirber commented Nov 27, 2017

It is also breaking on py27 because sourmash_lib/lca/command_index.py is opening the delmont supp table file as rt, but it contains windows newline chars (\r\n). One way to fix it is opening the CSV as rtU (for universal newlines) in this line.

@ctb

This comment has been minimized.

Show comment
Hide comment
@ctb

ctb Nov 27, 2017

Member
Member

ctb commented Nov 27, 2017

@luizirber

This comment has been minimized.

Show comment
Hide comment
@luizirber

luizirber Nov 27, 2017

Member

Another one missing: tests/test-data/lca-root/TOBG_MED-875.fna.gz.sig

Member

luizirber commented Nov 27, 2017

Another one missing: tests/test-data/lca-root/TOBG_MED-875.fna.gz.sig

@ctb

This comment has been minimized.

Show comment
Hide comment
@ctb

ctb Nov 27, 2017

Member
Member

ctb commented Nov 27, 2017

@luizirber

A few comments (punt to issues?), but that filter_null would be better as a function right now =]

First, install sourmash from the LCA branch:
```
pip install -U https://github.com/dib-lab/sourmash/archive/add/lca.zip

This comment has been minimized.

@luizirber

luizirber Nov 28, 2017

Member

Probably bump version to 2.0.0a3 after merging this, and saying to install a version >=2.0.0a3 instead?

@luizirber

luizirber Nov 28, 2017

Member

Probably bump version to 2.0.0a3 after merging this, and saying to install a version >=2.0.0a3 instead?

Show outdated Hide outdated doc/tutorials-lca.md Outdated
# filter function toreplace blank/na/null with 'unassigned'
filter_null = lambda x: 'unassigned' if x.strip() in \
('[Blank]', 'na', 'null', '') else x
null_names = set(['[Blank]', 'na', 'null'])

This comment has been minimized.

@luizirber

luizirber Nov 28, 2017

Member

make null_names into a constant (frozenset?), use it in filter_null (defined as a function, not as a lambda).

NULL_NAMES = {'[Blank]', 'na', 'null', ''}

def filter_null(x):
    if x.strip() in NULL_NAMES:
        return 'unassigned'
    return x
@luizirber

luizirber Nov 28, 2017

Member

make null_names into a constant (frozenset?), use it in filter_null (defined as a function, not as a lambda).

NULL_NAMES = {'[Blank]', 'na', 'null', ''}

def filter_null(x):
    if x.strip() in NULL_NAMES:
        return 'unassigned'
    return x
def debug(*args):
if _print_debug:
pprint.pprint(args)

This comment has been minimized.

@luizirber

luizirber Nov 28, 2017

Member

Should this debug function be moved into sourmash_lib/logging.py instead?

@luizirber

luizirber Nov 28, 2017

Member

Should this debug function be moved into sourmash_lib/logging.py instead?

lineage_dict[int(k)] = tuple(vv)
# convert hashval -> lineage index keys to integers (looks like
# JSON doesn't have a 64 bit type so stores them as strings)

This comment has been minimized.

@luizirber

luizirber Nov 28, 2017

Member

Wait, what? JS doesn't support 64-bit ints, but the python JSON module does the right thing and save the number anyway (it should be up to JS to do the proper thing).

(NOTE: all signatures would be wrong if this assumption was true...)

@luizirber

luizirber Nov 28, 2017

Member

Wait, what? JS doesn't support 64-bit ints, but the python JSON module does the right thing and save the number anyway (it should be up to JS to do the proper thing).

(NOTE: all signatures would be wrong if this assumption was true...)

@luizirber luizirber merged commit 47e2c31 into master Dec 2, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pyup.io/safety-ci No dependencies with known security vulnerabilities.
Details

@luizirber luizirber deleted the add/lca branch Dec 2, 2017

@luizirber luizirber restored the add/lca branch Dec 6, 2017

@luizirber

This comment has been minimized.

Show comment
Hide comment
@luizirber

luizirber Dec 6, 2017

Member

(Let's keep this branch around because there are public tutorials using it)

Member

luizirber commented Dec 6, 2017

(Let's keep this branch around because there are public tutorials using it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment