More consolidation of Hashtable derived types. #1504

Merged
merged 34 commits into from Nov 15, 2016

Conversation

Projects
None yet
4 participants
Owner

ctb commented Nov 3, 2016 edited

Briefly, this PR:

  • interposes a new Hashgraph CPython class that inherits from Hashtable and becomes a new base for Countgraph and Nodegraph (in the CPython interface).
  • articulates the C++ internals so that they use Hashtable and Hashgraph appropriately (as well as Countgraph etc. where absolutely necessary).
  • renames the C++ and CPython API to match: Counttable, Nodetable, Countgraph, Nodegraph.

These changes make a clear distinction between 'tables' and 'graphs' - tables have all of
the basic functionality needed for counting, while graphs support various traversal methods.
This paves the way for:

(a) adding new hash functions, including irreversible ones supporting k > 32 for the *table objects; and
(b) building out a new Counttable CPython object that will support the non-graph operations for k > 32.

(I don't like the 'Counttable' name that much but it fits with Countgraph. I suppose we could do Countstable. But that might engender confusion. Thoughts?)

  • Is it mergeable?
  • make test Did it pass the tests?
  • make clean diff-cover If it introduces new functionality in
    scripts/ is it tested?
  • make format diff_pylint_report cppcheck doc pydocstyle Is it well
    formatted?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)
  • Is the Copyright year up to date?

codecov-io commented Nov 3, 2016 edited

Current coverage is 95.76% (diff: 100%)

Merging #1504 into master will increase coverage by 0.02%

@@             master      #1504   diff @@
==========================================
  Files            36         36          
  Lines          2938       2952    +14   
  Methods           0          0          
  Messages          0          0          
  Branches        449        449          
==========================================
+ Hits           2813       2827    +14   
  Misses           55         55          
  Partials         70         70          

Powered by Codecov. Last update e2d1132...6303cb8

Owner

ctb commented Nov 8, 2016

Ready for initial review, y'all. @betatim @luizirber @camillescott @standage. Might be easier to go commit by commit :(.

Still have to check out my modifications to the abundance dist functions, and I'm not sure if I should add tests for Counttable and Nodetable on this PR, but I think most of it is done.

Member

betatim commented Nov 10, 2016

Prefer Counttable over Countstable. More substantial comments later

Owner

ctb commented Nov 10, 2016

  • look into using our saner inheritance hierarchy to simplify hashtable extraction in the CPython code.
khmer/_cpy_counttable.hh
+ Counttable * counttable;
+} khmer_KCounttable_Object;
+
+static PyMethodDef khmer_counttable_methods[] = {
@betatim

betatim Nov 14, 2016

Member

Do we need this here if it is just empty?

@ctb

ctb Nov 15, 2016

Owner

I went both ways on this - it's nice to have it there for if/when we add methods... but yeah, I guess no need.

khmer/_cpy_hashgraph.hh
+ 0, /*tp_setattro*/
+ 0, /*tp_as_buffer*/
+ Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
+ "hashgraph object", /* tp_doc */
@betatim

betatim Nov 14, 2016

Member

Delete all the following lines

khmer/_cpy_hashgraph.hh
+ 0, /* tp_new */
+};
+
+#define is_hashgraph_obj(v) (Py_TYPE(v) == &khmer_KHashgraph_Type)
@betatim

betatim Nov 14, 2016

Member

delete me

+ ++since;
+ }
+ }
+#else
@betatim

betatim Nov 14, 2016

Member

Should we keep this around or delete?

@camillescott

camillescott Nov 15, 2016

Member

I'd like to see these bits of code detritus go, but I think perhaps that should get its own PR -- there's a fair amount of it.

lib/hashtable.cc
- // Iterate through the reads and consume their k-mers.
- while (!parser->is_complete( )) {
+ BoundedCounterType max_count = 0;
@betatim

betatim Nov 14, 2016

Member

If you move this up to be the first definition in the method you (probably) don't have to pay for a copy when returning it.

@ctb

ctb Nov 15, 2016

Owner

OK, but I don't understand why :)

@betatim

betatim Nov 15, 2016

Member

https://en.wikipedia.org/wiki/Return_value_optimization it probably isn't that exciting in this case but for other/bigger/expensive to copy types it might be more interesting

Member

betatim commented Nov 14, 2016

I assume that the substantial bits of code that were moved didn't get modified when you moved them because the tests still pass.

+
+ if (self != NULL) {
+ WordLength k = 0;
+ PyListObject * sizes_list_o = NULL;
@betatim

betatim Nov 15, 2016

Member

Do you know who owns the reference to the list that comes from PyArg_ParseTuple?

@betatim

betatim Nov 15, 2016

Member

PyArg_ParseTuple doesn't increase the reference count, and presumably the ref count can't decrease to zero while we are using an argument to this function.

@ctb

ctb Nov 15, 2016

Owner

Yep (or at least not with the GIL held).

Owner

ctb commented Nov 15, 2016

Let's ahead and push the magic merge button when the build ends! (but please don't squash commits :)

@betatim betatim merged commit a375468 into master Nov 15, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Member

camillescott commented Nov 15, 2016

Doh, I was too late :(

Owner

ctb commented Nov 15, 2016

wassup @camillescott?

standage deleted the refactor/storage2 branch Nov 15, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment