Think through khmer/oxli 2.0, 3.0 #389

Open
ctb opened this Issue Apr 16, 2014 · 15 comments

4 participants
@ctb
Member

ctb commented Apr 16, 2014

One idea is to take 2.0 as an opportunity to set the Python API for a fairly near (fall 2014?) release, and then work on research for 3.0 in the distant future.

@ctb likes the idea of enabling an API for multiple container types in 2.0, and then going to town and implementing multiple useful container types in 3.0.

@camillescott points out that the Python API is not well done (that is, not well exposed from C++) and is not extensible in Python. Could this be addressed for 2.0? Functionally neutral, but extensible? In particular, note that all objects are old-style classes and method inheritance doesn't currently work, so all methods are implemented two or three times. Boo, hiss. A good example of how to do this is in LabelHash and Hashbits.
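
To make the inheritance point concrete, here is a minimal pure-Python sketch of shared methods written once on a base class and inherited by several container types. Everything in it (`KmerContainer`, `ExactCounter`, `PresenceSet`, `consume`) is a hypothetical name chosen for illustration, not khmer's actual API, and it says nothing about how the C++ extension types would need to change.

```python
from abc import ABC, abstractmethod


class KmerContainer(ABC):
    """Hypothetical base class; not part of khmer's actual API."""

    def __init__(self, ksize):
        self.ksize = ksize

    @abstractmethod
    def count(self, kmer):
        """Record one occurrence of a k-mer."""

    @abstractmethod
    def get(self, kmer):
        """Return the stored value for a k-mer."""

    def consume(self, sequence):
        """Shared logic, written once and inherited by every container."""
        n_kmers = 0
        for i in range(len(sequence) - self.ksize + 1):
            self.count(sequence[i:i + self.ksize])
            n_kmers += 1
        return n_kmers


class ExactCounter(KmerContainer):
    """Stores an exact count per k-mer in a dict."""

    def __init__(self, ksize):
        super().__init__(ksize)
        self._counts = {}

    def count(self, kmer):
        self._counts[kmer] = self._counts.get(kmer, 0) + 1

    def get(self, kmer):
        return self._counts.get(kmer, 0)


class PresenceSet(KmerContainer):
    """Stores only presence/absence of each k-mer."""

    def __init__(self, ksize):
        super().__init__(ksize)
        self._seen = set()

    def count(self, kmer):
        self._seen.add(kmer)

    def get(self, kmer):
        return int(kmer in self._seen)
```

Both subclasses get `consume` for free, so adding a third container type would only require `count` and `get`.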

@qingpeng makes the point that we can implement research stuff in Python initially and then reimplement gradually in C++ w/Python wrapping as we need greater speed etc. This gives us a nice use-case-driven design.

@mr-c proposes that our scripts become calls to functions.

@mr-c also notes that @ctb and he have talked about having a single point of entry script, like 'oxli', to which we add functions, e.g. 'oxli normalize' and 'oxli load-graph'. This would be a 2.0 or 3.0 addition, although early schemes could be implemented within 1.x as an extension rather than a replacement.
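
A single entry point of this kind could be sketched with argparse subparsers roughly as follows. The option names, defaults, and placeholder function bodies are assumptions made for illustration only; they are not the real script interfaces.

```python
import argparse


def normalize(args):
    # Placeholder body; the real work would live in an importable module.
    return "normalize ksize={} input={}".format(args.ksize, args.input)


def load_graph(args):
    # Placeholder body, as above.
    return "load-graph output={} input={}".format(args.output, args.input)


def build_parser():
    parser = argparse.ArgumentParser(prog="oxli")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # fail with a usage error if no subcommand is given

    p_norm = subparsers.add_parser("normalize")
    p_norm.add_argument("-k", "--ksize", type=int, default=32)  # hypothetical option
    p_norm.add_argument("input")
    p_norm.set_defaults(func=normalize)

    p_graph = subparsers.add_parser("load-graph")
    p_graph.add_argument("output")
    p_graph.add_argument("input")
    p_graph.set_defaults(func=load_graph)

    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Calling `main(["normalize", "-k", "20", "reads.fa"])` dispatches to `normalize`; a new subcommand is one more `add_parser` call plus a function.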

mr-c added this to the 2.0 milestone Jul 6, 2014

@bocajnotnef

Contributor

bocajnotnef commented Nov 14, 2014

Documenting verbal discussion:

To improve the command line interface, @mr-c suggested instituting a git-like subcommand structure (e.g. "khmer normalize [args args args]"). This would mean making a new file in scripts ("khmer.py"), migrating the code currently in each script file into the khmer directory so it can be imported as individual modules, and rewriting the scripts to simply import the relevant module, parse arguments with argparse, and run the code.

This would maintain the current command line API while creating/improving a high-level Python API. It would also encourage/foster the development of the low-level Python API.
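
As a rough sketch of that split, shown in one block for brevity: the first function stands in for code that would move into the khmer/ package, and the rest is all that would remain in a script. The function name `normalize`, the `-C/--cutoff` option, and its default are illustrative assumptions, and the body is a stand-in rather than the real normalization logic.

```python
import argparse


# Would live in an importable module inside the khmer/ package.
def normalize(input_filename, cutoff=20):
    """Stand-in for the real work; returns a summary instead of writing files."""
    return {"input": input_filename, "cutoff": cutoff}


# Everything below is all that would remain in the script under scripts/.
def get_parser():
    parser = argparse.ArgumentParser(description="Thin command line shim.")
    parser.add_argument("-C", "--cutoff", type=int, default=20)  # hypothetical option
    parser.add_argument("input_filename")
    return parser


def main(argv=None):
    args = get_parser().parse_args(argv)
    return normalize(args.input_filename, cutoff=args.cutoff)
```

With this layout the same `normalize` function can be called from a notebook, from the existing script, or from a future subcommand without duplicating any logic.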

It was suggested that this would be a gradual, incremental change since doing this all at once would likely be A Bad Plan.

@ctb @camillescott @luizirber, thoughts?

@mr-c

Contributor

mr-c commented Nov 14, 2014

A rehash and expansion of @bocajnotnef's comment:

Pain points:

  1. Can't import key khmer functionality into an IPython/Jupyter notebook. (that is: from khmer import normalize-by-median)
  2. Can't do mock testing of script functionality due to lack of modular code
  3. Low level Python API is a mess / underdocumented
  4. Still a couple command line inconsistencies left
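
To illustrate the second pain point: once script logic lives in an importable function, its dependencies can be replaced with mocks. The sketch below uses `unittest.mock` from the standard library; `ReadSource` and `count_long_reads` are hypothetical names invented for this example, not existing khmer code.

```python
from unittest import mock


class ReadSource:
    """Hypothetical stand-in for code that reads sequences from disk."""

    def reads(self, filename):
        raise IOError("no real file access in this sketch")


def count_long_reads(filename, min_length, source):
    """Importable function: logic kept separate from argument parsing and I/O."""
    return sum(1 for seq in source.reads(filename) if len(seq) >= min_length)


def test_count_long_reads():
    # The mock replaces file access, so the test needs no data files.
    fake = mock.Mock(spec=ReadSource)
    fake.reads.return_value = ["ACGT", "AC", "ACGTACGT"]

    assert count_long_reads("ignored.fa", 4, fake) == 2
    fake.reads.assert_called_once_with("ignored.fa")
```

None of this is possible while the logic sits at the top level of a script, because there is nothing to import and nowhere to inject the fake.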

Anti-Goal: the development of the 2.0 release of khmer occurs in a separate branch that is never merged ("bleeding-edge 2.0")

Proposal:

Starting with a single script: refactor the Python code to be more modular. The script should just handle argument parsing and run other methods. Prove this by making a khmer command with the name of the script as an argparse-based subcommand.

Collect specific pain points in the low-level Python API and create separate issues for each one.

Each round is a single pull-request. Command line compatibility is maintained in the new shim scripts; innovation happens in the entry-point script (khmer.py). The API guarantee will not extend to the new entry script until the 2.0 release (when the shim scripts are deleted).

Review the effort after the first round. Presuming that it is seen to be useful: turn the crank and migrate another script.

As clarity around the low-level Python API emerges, those refactorings can be interspersed with migrating each script.

@mr-c

Contributor

mr-c commented Nov 14, 2014

  • split doxygen into C++ and Python runs
@bocajnotnef

Contributor

bocajnotnef commented Nov 15, 2014

For reference/enhanced understanding, what're we considering to be the low-level python API?

@mr-c

Contributor

mr-c commented Nov 15, 2014

@bocajnotnef khmer/_khmermodule.cc and khmer/__init__.py

@ctb

Member

ctb commented Nov 15, 2014

I like the subcommand and API ideas, with a few caveats.

  • I think we are overestimating scientists' willingness to embrace change. I am very worried about changing the way people interact with khmer's functionality.
  • we need to sand down the corners on both subcommands and API before placing them under semantic versioning.
  • our command line API is considerably more robust than our Python API;
  • we also need to remove/reconsider things in the Python API, especially as we refactor and extend things in khmer.

This leads me to the following proposals:

  • we explore subcommands for ourselves before making them part of semantic versioning guarantees (already part of @mr-c proposal);
  • we do subcommands more quickly than, and as a completely separate process from, the Python API exploration;
  • we hold off on renaming and expecting external people to use the subcommands as long as possible;

So, more specifically:

  • we implement subcommands as 'oxli' commands in 1.0, but not under semver;
  • this is also how we run the cmd-line functions starting as soon as we want (as 'import oxli; oxli.normalize()') -- introducing the oxli package sooner, IOW;
  • we provide oxli subcommands in 2.0, under semantic versioning; package name is still khmer, but we can start to transition the protocols/workflows over to subcommands to try 'em out there;
  • for 2.0, still keep all the existing script functionality working under the same names;
  • through 2.x, we provide all functionality in both scripts and subcommands;
  • we rename the whole package/project to oxli in 3.0 and eliminate 'khmer', at which point the Python API switches over to using 'import oxli' for non-cmd-line functions as well;
  • we provide a document laying out this transition for users and developers, telling users what they can rely on; we make sure there's a tl;dr at the top, and we keep it up to date with the releases;
  • we don't place the Python API under semantic versioning until after oxli 3; tentatively, for 4.0.

This would mean that our command line users will not notice anything different at all until there's a complete package rename, which I think is a great way to go.

Comments?

@bocajnotnef

Contributor

bocajnotnef commented Nov 15, 2014

I thought @mr-c's plan would maintain the command line functionality through the scripts that'd remain in the scripts directory. All the code would be moved to modules in the khmer/ directory and the scripts would just import the relevant functions from there. The subcommand structure would be new functionality, but the current command line interface would remain unchanged in behavior.

@ctb

Member

ctb commented Nov 15, 2014

Yep.


C. Titus Brown, ctb@msu.edu


@ctb

Member

ctb commented Nov 15, 2014

Just to clarify, at some point we will retire the current command line interface; the question is, when? I think the right time is when we change the name from khmer to oxli, i.e. khmer/oxli 3.0.

@mr-c

Contributor

mr-c commented Nov 15, 2014

I'm okay with both versions co-existing as long as they have the same features & options. At some point we may encourage new users to use the new syntax only prior to the full decommissioning.

@ctb

Member

ctb commented Nov 16, 2014

On Sat, Nov 15, 2014 at 03:43:18PM -0800, Michael R. Crusoe wrote:

I'm okay with both versions co-existing as long as they have the same features & options. At some point we may encourage new users to use the new syntax only prior to the full decommissioning.

Agreed re same features/options.

We should ask new users to use the new syntax only after we change the name.

@mr-c

Contributor

mr-c commented Nov 16, 2014

This is a great plan. I will submit a PR with a re-written version to be added to the docs.

mr-c removed this from the 2.0 milestone Nov 16, 2014

@mr-c

Contributor

mr-c commented Nov 19, 2014

@ctb

Member

ctb commented Nov 22, 2014

Also see @tseemann's paper on "how to write CLI" -- http://www.gigasciencejournal.com/content/2/1/15

@mr-c

Contributor

mr-c commented Dec 15, 2014

@bocajnotnef see also #234
