Think through khmer/oxli 2.0, 3.0 #389
One idea is to take 2.0 as an opportunity to set the Python API for a fairly near (fall 2014?) release, and then work on research for 3.0 in the distant future.
@ctb likes the idea of enabling an API for multiple container types in 2.0, and then going to town and implementing multiple useful container types in 3.0.
@camillescott points out that the Python API is not well done (not well exposed from C++) and is not extensible in Python. Could this be addressed for 2.0, keeping functionality the same but making it extensible? In particular, note that all objects are old-style classes and method inheritance doesn't currently work, so every method is implemented two or three times. Boo, hiss. A good example of how to do this is in LabelHash and Hashbits.
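To make the multiple-container and inheritance points concrete, here is a minimal sketch of one shared interface with two container types. All class names are hypothetical and are not khmer's actual classes. It is written for Python 3, where every class is new-style. The shared `consume` method is written once on the base class and inherited by both containers, rather than being reimplemented per type:

```python
from abc import ABC, abstractmethod


class KmerCounterBase(ABC):
    """Hypothetical common interface that every k-mer container implements."""

    def __init__(self, ksize):
        self.ksize = ksize

    @abstractmethod
    def count(self, kmer):
        """Record one occurrence of ``kmer``."""

    @abstractmethod
    def get(self, kmer):
        """Return the stored value for ``kmer``."""

    def consume(self, sequence):
        """Count every k-mer in ``sequence``; defined once, inherited by all subclasses."""
        n = 0
        for i in range(len(sequence) - self.ksize + 1):
            self.count(sequence[i:i + self.ksize])
            n += 1
        return n


class ExactCounter(KmerCounterBase):
    """Exact abundance counts kept in a dict (hypothetical container)."""

    def __init__(self, ksize):
        super().__init__(ksize)
        self._counts = {}

    def count(self, kmer):
        self._counts[kmer] = self._counts.get(kmer, 0) + 1

    def get(self, kmer):
        return self._counts.get(kmer, 0)


class PresenceTable(KmerCounterBase):
    """Presence/absence only (hypothetical container; an exact set for simplicity)."""

    def __init__(self, ksize):
        super().__init__(ksize)
        self._seen = set()

    def count(self, kmer):
        self._seen.add(kmer)

    def get(self, kmer):
        return 1 if kmer in self._seen else 0
```

A third-party container would only need to subclass `KmerCounterBase` in Python and supply `count` and `get`.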
@qingpeng makes the point that we can implement research stuff in Python initially and then gradually reimplement it in C++ with Python wrappers as we need greater speed, etc. This gives us a nice use-case-driven design.
@mr-c proposes that our scripts become calls to functions.
@mr-c also notes that @ctb and he have talked about having a single point of entry script, like 'oxli', to which we add functions, e.g. 'oxli normalize' and 'oxli load-graph'. This would be a 2.0 or 3.0 addition, although early schemes could be implemented within 1.x as an extension rather than a replacement.
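A single entry point such as `oxli` maps naturally onto argparse sub-commands. Below is a minimal sketch. The options and function bodies are placeholders, not the real scripts' interfaces. Each subcommand registers its handler with `set_defaults(func=...)`, so adding a new subcommand means adding a new function plus a small parser block:

```python
import argparse


def normalize(args):
    # Placeholder body; a real subcommand would call into the library.
    return "normalize: C=%d, inputs=%s" % (args.cutoff, args.inputs)


def load_graph(args):
    # Placeholder body for 'oxli load-graph'.
    return "load-graph: output=%s, inputs=%s" % (args.output, args.inputs)


def build_parser():
    parser = argparse.ArgumentParser(prog="oxli")
    sub = parser.add_subparsers(dest="command")
    sub.required = True  # 'oxli' with no subcommand is an error

    p = sub.add_parser("normalize", help="digital normalization (hypothetical options)")
    p.add_argument("-C", "--cutoff", type=int, default=20)
    p.add_argument("inputs", nargs="+")
    p.set_defaults(func=normalize)

    p = sub.add_parser("load-graph", help="build a graph from reads (hypothetical options)")
    p.add_argument("output")
    p.add_argument("inputs", nargs="+")
    p.set_defaults(func=load_graph)
    return parser


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and dispatch to the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

For example, `main(["normalize", "-C", "5", "reads.fa"])` dispatches to `normalize`. An installed `oxli` command would simply call `main()`.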
Documenting verbal discussion:
To improve the command line interface, @mr-c suggested instituting a git-like subcommand structure (i.e. "khmer normalize [args args args]"). This would mean making a new file in scripts ("khmer.py"); migrating the code currently in each script file into the khmer directory so it can be imported as individual modules; and rewriting the scripts to simply import the relevant module, parse arguments with argparse, and run the code.
This would maintain the current command line API while creating/improving a high-level Python API. It would also encourage/foster the development of the low-level Python API.
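The module-plus-shim split described above might look roughly like this. The module path `khmer/normalize_by_median.py`, the function names, and the options are hypothetical; the body is a placeholder. The point is the layering: the module exposes a parser and a plain Python function, and both the old script and any new subcommand reuse them:

```python
import argparse


# Would live in a hypothetical khmer/normalize_by_median.py module.
def get_parser():
    """Argument parser shared by the old script and any new subcommand."""
    parser = argparse.ArgumentParser(
        description="Normalize reads by median k-mer abundance (sketch).")
    parser.add_argument("-C", "--cutoff", type=int, default=20)
    parser.add_argument("inputs", nargs="+")
    return parser


def normalize_by_median(inputs, cutoff=20):
    """High-level Python API: callable without going through argparse."""
    # Placeholder: a real version would stream reads and keep low-coverage ones.
    return {"inputs": list(inputs), "cutoff": cutoff}


def main(argv=None):
    """Command line front end: parse arguments, then call the Python API."""
    args = get_parser().parse_args(argv)
    return normalize_by_median(args.inputs, cutoff=args.cutoff)


# The shim in scripts/normalize-by-median.py would then shrink to roughly:
#     from khmer.normalize_by_median import main
#     if __name__ == "__main__":
#         main()
```

Command line users keep running the same script, while Python users can call `normalize_by_median(["reads.fa"], cutoff=10)` directly.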
It was suggested that this be a gradual, incremental change, since doing it all at once would likely be A Bad Plan.
A rehash and expansion of @bocajnotnef's comment:
Anti-Goal: the development of the 2.0 release of khmer occurs in a separate branch that is never merged ("bleeding-edge 2.0")
Starting with a single script: refactor the Python code to be more modular. The script should just handle argument parsing and run other methods. Prove this by making a
Collect specific pain points in the low-level Python API and create separate issues for each one.
Each round is a single pull-request. Command line compatibility is maintained in the new shim scripts; innovation happens in the entry-point script (
Review the effort after the first round. Presuming that it is seen to be useful: turn the crank and migrate another script.
As the low-level Python API becomes clearer, those refactorings can happen interspersed with migrating each script.
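Since each round has to keep command line compatibility in the shim scripts, a small regression check could accompany each pull request. This sketch assumes both front ends build their options from one shared helper (hypothetical `add_normalize_args`). It verifies that the legacy script and the `khmer normalize` subcommand parse the same arguments into the same options:

```python
import argparse


def add_normalize_args(parser):
    """Single source of truth for the options (hypothetical helper and flags)."""
    parser.add_argument("-k", "--ksize", type=int, default=20)
    parser.add_argument("-C", "--cutoff", type=int, default=20)
    parser.add_argument("inputs", nargs="+")
    return parser


def shim_parser():
    # Legacy standalone script: normalize-by-median.py ARGS
    return add_normalize_args(argparse.ArgumentParser(prog="normalize-by-median.py"))


def entry_point_parser():
    # New single entry point: khmer normalize ARGS
    parser = argparse.ArgumentParser(prog="khmer")
    sub = parser.add_subparsers(dest="command")
    add_normalize_args(sub.add_parser("normalize"))
    return parser


def options_match(argv):
    """True if both front ends turn the same arguments into the same options."""
    old = vars(shim_parser().parse_args(argv))
    new = vars(entry_point_parser().parse_args(["normalize"] + list(argv)))
    new.pop("command")  # only the entry point records which subcommand ran
    return old == new
```

Running `options_match` over a handful of representative argument lists in the test suite would flag any drift between the two interfaces.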
I like the subcommand and API ideas, with a few caveats.
This leads me to the following proposals:
So, more specifically:
This would mean that our command line users will not notice anything different at all until there's a complete package rename, which I think is a great way to go.
I thought @mr-c's plan would maintain the command line functionality through the scripts that'd remain in the scripts directory. All the code would be moved to modules in the khmer/ directory, and the scripts would just import the relevant functions from there. The subcommand structure would be new functionality, but the current command line interface would remain unchanged in behavior.
C. Titus Brown, firstname.lastname@example.org
On Sat, Nov 15, 2014 at 03:43:18PM -0800, Michael R. Crusoe wrote:
Agreed re same features/options.
We should ask new users to use the new syntax only after we change the name.
API food for thought: http://sweng.the-davies.net/Home/rustys-api-design-manifesto