New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Create an Index abstract base class #556

Open
wants to merge 11 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@luizirber
Copy link
Member

luizirber commented Oct 5, 2018

From #545 (comment):

After the LCA index was implemented we figured it has a pretty similar API to the SBT 
index, and with the interested in other indexes it seems it's a good time to make a Index 
interface/base class and support more options.

Another way to test this abc is to define a Linear index: save all the signatures in a list/dictionary, search thru it linearly. It's also good as a base case for testing if SBT/LCA are finding all matches correctly (there are some test cases that already do this)

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
@olgabot

This comment has been minimized.

Copy link
Contributor

olgabot commented Oct 12, 2018

Exciting!!

@codecov

This comment has been minimized.

Copy link

codecov bot commented Oct 13, 2018

Codecov Report

Merging #556 into master will decrease coverage by 0.09%.
The diff coverage is 83.82%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #556     +/-   ##
=========================================
- Coverage   89.44%   89.34%   -0.1%     
=========================================
  Files          27       28      +1     
  Lines        4227     4291     +64     
  Branches       39       39             
=========================================
+ Hits         3781     3834     +53     
- Misses        446      457     +11
Impacted Files Coverage Δ
sourmash/commands.py 89.09% <100%> (ø) ⬆️
sourmash/lca/lca_utils.py 94.63% <60%> (-1.4%) ⬇️
sourmash/sbt.py 85.54% <80%> (-0.17%) ⬇️
sourmash/index.py 89.36% <89.36%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ceb9b17...8cd5dd1. Read the comment docs.

@luizirber luizirber referenced this pull request Oct 24, 2018

Merged

[WIP] SBT #17

@luizirber luizirber referenced this pull request Dec 14, 2018

Open

[MRG] Replacing C++ with Rust #424

0 of 5 tasks complete
@ctb

This comment has been minimized.

Copy link
Member

ctb commented Dec 17, 2018

@luizirber once #533 is merged I plan to work on this next & clean up some API stuff. May I haz?

@luizirber luizirber force-pushed the refactor/index branch from e91153f to 9d32f4c Dec 18, 2018

@ctb

This comment has been minimized.

Copy link
Member

ctb commented Dec 20, 2018

Thought: I think we should provide at least two specialized functions on Index objects, equivalent to 'search' and 'gather'. The former would return matches with best Jaccard similarity, the latter would find matches with best containment. Since this is currently 100% of needed functionality it would be (IMO) be better than our current bending-over-backwards approach to working off of a generic 'find' function.

@luizirber

This comment has been minimized.

Copy link
Member

luizirber commented Dec 20, 2018

@ctb I like that! find is still useful for research (it's easier to define a function to pass to it than adding a method, because the DFS/BFS comes for free), but it is quite annoying for 'production' (or, let's say, when you need to cross language boundaries thru FFI =P)

I don't think there is a way of marking a method 'optional' with abc, so maybe remove find and add search and gather?

@ctb

This comment has been minimized.

Copy link
Member

ctb commented Dec 20, 2018

@ctb ctb referenced this pull request Dec 26, 2018

Merged

[MRG] add --scaled to sbt index to downsample signatures #378

5 of 5 tasks complete
@ctb

This comment has been minimized.

Copy link
Member

ctb commented Jan 5, 2019

hey @luizirber your thoughts on LinearIndex.search on this branch would be most welcome. Is that the right way to write an implementation, wrt keyword args etc etc? Inquiring minds...

""" """

@abstractmethod
def search(self, signature, *args, **kwargs):

This comment has been minimized.

@luizirber

luizirber Jan 5, 2019

Member

Or even defining search like this:

def search(self, signature, *, 
                 threshold=None,
                 do_containment=False, 
                 ignore_abundance=False,
                 **kwargs):

without *args, with default keyword arguments.

(the * in the middle is a Python 3 thing, but when this get merged I think we will have dropped Python 2 anyway?)

@ctb ctb changed the title [WIP] Create an Index abc [WIP] Create an Index abstract base class Jan 9, 2019

@luizirber

This comment has been minimized.

Copy link
Member

luizirber commented Jan 11, 2019

Tentative: add stub files with typing information too
https://mypy.readthedocs.io/en/stable/stubs.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment