Proposed QIIME2 Roadmap #4

ebolyen · 2014-09-23T19:22:13Z

No description provided.

gregcaporaso · 2014-09-23T19:23:46Z

All, As we discussed in #2, @jairideout, @ebolyen and I have been working on fleshing out some ideas into a proposal for what QIIME2 could look like. We've put together a roadmap document that we'd like input from all developers on. This is currently a pull request (reflecting that it's only a proposal at this stage). Once we bounce ideas back and forth and get to something that we're happy with as a group, we can merge this (or whatever it becomes) and then open it up for feedback from the user community.

Thanks in advance for the input - looking forward to some good discussion about this!

coveralls · 2014-09-23T19:27:05Z

Coverage remained the same when pulling baeac5c on ebolyen:exec_summary into 04c5b1a on biocore:master.

coveralls · 2014-09-23T19:33:26Z

Coverage remained the same when pulling baeac5c on ebolyen:exec_summary into 04c5b1a on biocore:master.

justin212k · 2014-09-23T21:09:45Z

ROADMAP.md

+
+## Aspects of QIIME2
+
+### Client-Server Architecture


Is the idea to have something like an HTTP RESTful API? e.g.:

curl -X POST -H "Content-Type: application/json" -d '
{
"otu_table":"full_otu_table",
"alpha_metrics":"PD"
}' http://qiime-server:8000/otu_tables/alpha_diversity

That is a possibility, but we are currently looking more towards a socket-based protocol (using plain TCP and WebSockets).
Of course the actual implementation is less important than the idea of separating the interface from the server via a protocol, RESTful or not.
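To make the "protocol first, transport second" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the JSON message shape, the `ACTIONS` registry, and the `handle_message` function are hypothetical and are not part of the roadmap. The only point is that the dispatcher does not care whether the bytes arrived over HTTP, plain TCP, or a WebSocket.

```python
import json

# Hypothetical action registry; the names are illustrative, not part of QIIME2.
ACTIONS = {}


def action(name):
    """Register a function as a protocol action under `name`."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("alpha_diversity")
def alpha_diversity(otu_table, alpha_metrics):
    # Placeholder: a real server would hand this off to a worker job.
    return {"otu_table": otu_table, "metric": alpha_metrics, "status": "queued"}


def handle_message(raw):
    """Decode one JSON request, dispatch it, and encode the JSON reply.

    Nothing here knows whether `raw` arrived over HTTP, TCP, or a WebSocket.
    """
    request = json.loads(raw)
    handler = ACTIONS.get(request.get("action"))
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown action"})
    result = handler(**request.get("params", {}))
    return json.dumps({"ok": True, "result": result})
```

A REST endpoint, a TCP line reader, or a WebSocket handler could each pass its payload to `handle_message` and send back whatever it returns.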

Aye, I just wonder if http-ising it would help with a website gui. Though I'm not sure if they need the same interface - the website might want to use http://qiime-server/alpha_diversity?summary=true

I'm not wedded to HTTP though.

Gotcha, no particular details are set in stone at this point (and they don't need to be yet). From a strictly hypothetical perspective, a WebSocket-based protocol would actually make things a bit easier from the web GUI side, as it could just open a persistent connection over which the server could push updates (as opposed to the GUI continuously polling the server). This is also how IPython operates at the moment.
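The push-versus-poll difference can be sketched in-process with `asyncio`. This is an illustration only: the `StatusBroadcaster` class and the status strings are hypothetical, and each `asyncio.Queue` merely stands in for a persistent connection such as a WebSocket.

```python
import asyncio


class StatusBroadcaster:
    """Pushes job status updates to every subscribed client."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        # Each client gets its own queue, standing in for a persistent connection.
        queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    async def publish(self, status):
        for queue in self._subscribers:
            await queue.put(status)


async def client(queue):
    """Collect updates as they arrive; the client never has to ask for them."""
    received = []
    while (status := await queue.get()) != "finished":
        received.append(status)
    received.append("finished")
    return received


async def run_demo():
    server = StatusBroadcaster()
    gui = asyncio.create_task(client(server.subscribe()))
    for status in ("queued", "running", "finished"):
        await server.publish(status)
    return await gui


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

With polling, the GUI would instead need a timer loop that repeatedly asks the server for the current status, most of which would return nothing new.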

The idea (if I am understanding this correctly) is that users would have to start a server process and offload all compute to that server from whatever interface they decide to use? This sounds nice. I wonder if setting up a system like this would perhaps be too ambitious for the average user.

To expand on this, it seems like the client-server architecture is a good solution for a use case where you want to streamline analyzing a high volume of datasets. In much simpler cases, it almost seems like overkill and unnecessary. Clearly, if compute and a deployed installation were to be provided for free to users, then this would make a lot of sense, as that in itself becomes a high volume of datasets to process.

You have the idea correct.

One of our explicit goals is to support everything from a laptop to a cluster. This means progressive enhancement. For the average user, it conceptually isn't much different from an IPython notebook, where you type ipython notebook. Instead you might type qiime start, which could launch the server and a web browser pointed at that server, and the server would just use something like a SQLite database at a path.

In the context of a cluster you could have the qiime-server running as an explicit service (like service qiime start) which users log into using their cluster credentials. The sysadmin can take responsibility for using a different database, managing plugins, ports, etc.

Here's a diagram representing the idea maybe a little better:
https://drive.google.com/file/d/0B_qySw7nb-DKOVU0NFlNRHZvTWc/edit?usp=sharing

In that diagram, each component can exist on its own host, or they can all exist on the same host (like a laptop).

But definitely the goal is for this to be as simple as possible and to work out-of-the-box.

Or the components can exist on any combination of hosts that doesn't literally exceed the number of components being used.
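As a rough illustration of the laptop-to-cluster "progressive enhancement" described above, the same server could start from zero-configuration defaults and let a sysadmin override only what differs. The `ServerConfig` fields, the default port, and the database URLs below are all hypothetical; nothing here is a decided design.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical settings for a qiime-server; none of these names are final."""
    host: str = "127.0.0.1"                    # laptop default: local connections only
    port: int = 8000
    database_url: str = "sqlite:///qiime2.db"  # a SQLite file at a path
    open_browser: bool = True                  # point a web browser at the server
    require_login: bool = False


def laptop_config():
    """Zero-configuration defaults for a single user on one machine."""
    return ServerConfig()


def cluster_config(database_url, host="0.0.0.0", port=8000):
    """Sysadmin-managed service: shared database, logins, no browser launch."""
    return replace(
        laptop_config(),
        host=host,
        port=port,
        database_url=database_url,
        open_browser=False,
        require_login=True,
    )
```

The laptop user never sees any of this, while the cluster deployment changes only the database, the bind address, and the login requirement.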

That means that if I wanted to run something like a batch job (5
commands executed serially), this would require me to create a q2 script
that needs to be processed by the interactive q2 shell and then
gets executed? That seems like re-inventing the wheel, something like what
the Mothur CLI does.

In the hope of reducing the volume of e-mails in everyone's inbox, let's have
this conversation over a call.

On (Sep-23-14|15:08), Evan Bolyen wrote:

+finalized and everything is
+subject to change.** Once we reach agreement on the project's direction and
+vision, we will provide additional documents with further details
+(e.g., requirements and design documents).
+
+The roadmap is meant to provide a high-level view of the QIIME2. It does not
+contain specific implementation details. For example, we may mention the use
+of a database, but we're not yet defining the database schema or assuming use
+of a particular database implementation (e.g., PostgreSQL).
+
+This document was originally prepared based on conversations between
+@gregcaporaso, @ebolyen, and @jairideout.
+
+## Aspects of QIIME2
+
+### Client-Server Architecture


Roger that. Though in your example a batch job would technically just be a workflow which should be as simple as a drag and drop. (which then does get processed by the qiime-server)

josenavas · 2014-09-23T21:40:25Z

There are a lot of common objectives between this document and Qiita.

rob-knight · 2014-09-23T21:42:35Z

Yes, let’s just note that there’s a lot of overlap and that we aim to resolve this overlap (to the maximum extent possible) during the call, which I agree would be good to have soon. I wonder whether it could wait until after Oct 6, which is the launch of the Coursera course and British Gut, though?


ElDeveloper · 2014-09-23T21:50:39Z

ROADMAP.md

+project (QiiTA).** This is a discussion of how data will be organized and stored
+internally in QIIME2.
+
+The database represents a significant departure from the way QIIME currently


Would this mean that the only way to access this data would be to do it via the Q2 interface itself? I try to think of the case where other tools want to access the data generated by QIIME; would this then add a step where you serialize any of the contents of your QIIME study as a regular file?

Yes, there would be an explicit export step, or in the case of a web interface, likely a right-click download.
That is a downside to this approach; however, it does allow more consistent data management and allows the provenance of these artifacts to be maintained and reviewed.

It is also possible that we could provide the ability to export an entire analysis as a tarball, which might look like a well-organized output directory in qiime right now.
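For the sake of discussion, the "artifacts in a database, with provenance and an explicit export step" idea might look something like the sketch below. The schema, the function names, and the use of SQLite are assumptions made purely for illustration; the roadmap deliberately does not define a schema or a database implementation.

```python
import sqlite3
from pathlib import Path


def open_store(path=":memory:"):
    """Create a tiny artifact store; the schema is purely illustrative."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS artifact ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " data BLOB NOT NULL,"
        " created_by TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES artifact(id)"
        ")"
    )
    return db


def record(db, name, data, created_by, parent_id=None):
    """Store a result as an artifact and return its id."""
    cursor = db.execute(
        "INSERT INTO artifact (name, data, created_by, parent_id)"
        " VALUES (?, ?, ?, ?)",
        (name, data, created_by, parent_id),
    )
    db.commit()
    return cursor.lastrowid


def provenance(db, artifact_id):
    """Walk parent links from an artifact back to the original input."""
    steps = []
    while artifact_id is not None:
        name, created_by, artifact_id = db.execute(
            "SELECT name, created_by, parent_id FROM artifact WHERE id = ?",
            (artifact_id,),
        ).fetchone()
        steps.append((name, created_by))
    return steps


def export(db, artifact_id, directory):
    """The explicit export step: write one artifact out as a regular file."""
    name, data = db.execute(
        "SELECT name, data FROM artifact WHERE id = ?", (artifact_id,)
    ).fetchone()
    target = Path(directory) / name
    target.write_bytes(data)
    return target
```

Other tools would only ever see the files produced by `export`, which is the extra step being discussed, while the parent links are what make the provenance reviewable.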

Thanks for expanding on this. I am concerned that this makes the usage
of the software in itself more complicated. Integrating data generated
with qiime1.x or data that was generated with other packages becomes
a burden as the import step would require a variety of validations and
specifications.

On (Sep-23-14|14:56), Evan Bolyen wrote:

+and duplication in defining multiple interfaces. Additionally it will allow
+remote execution over a network barrier (this would have been difficult to
+achieve with pyqi).
+
+### Workers
+Once the qiime-server has received a request via the protocol, it will launch
+a worker job to perform the computation. The qiime-server will provide status
+updates to clients through the protocol. The worker job will record the results
+as an artifact in a database.
+
+### Database
+Note: This is not intended to be a substitute for the QIIME database
+project (QiiTA). This is a discussion of how data will be organized and stored
+internally in QIIME2.
+
+The database represents a significant departure from the way QIIME currently


That is all true, but we do gain the ability of the interface to reason about composition if it has control of the artifacts in an abstract way. Perhaps a compromise is possible where a database simply logs the locations of files, but presently many of the files waste a lot of disk space by repeating what should be a relation to another table.

I think this is definitely something worth talking about over a call.
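One way to picture the compromise mentioned above, where the database only logs file locations and the files themselves stay on disk, is the sketch below. The `file_index` table, the checksum column, and the function names are hypothetical and only meant to seed the discussion for the call.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_index():
    """In-memory index of files that stay on disk; the schema is illustrative."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_index ("
        " path TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " derived_from TEXT REFERENCES file_index(path)"
        ")"
    )
    return db


def register(db, path, derived_from=None):
    """Log where a file lives and what it was derived from, without copying it."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    parent = None if derived_from is None else str(derived_from)
    db.execute(
        "INSERT OR REPLACE INTO file_index VALUES (?, ?, ?)",
        (str(path), digest, parent),
    )
    return digest


def is_unchanged(db, path):
    """True if the file on disk still matches what the index recorded."""
    row = db.execute(
        "SELECT sha256 FROM file_index WHERE path = ?", (str(path),)
    ).fetchone()
    if row is None or not Path(path).exists():
        return False
    return hashlib.sha256(Path(path).read_bytes()).hexdigest() == row[0]
```

The trade-off is that files remain directly accessible to other tools, but the index can only detect that something was moved or edited behind its back; it cannot prevent it.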

Thanks again.

This also seems like a good topic to discuss with an expert in HCI.


Proposed QIIME2 Roadmap

ebolyen and others added 9 commits September 22, 2014 15:01

DOC: updated readme

440e470

Finish semantic type system section

10a7575

DOC: add plugin system section

8dbc37e

DOC: start reformatting beginning sections < 80

8b9ddaa

DOC: Finish reformatting to < 80 chars

086a539

DOC: cleanup to existing text, filled out more details throughout

2b9e5bd

DOC: add intro paragraph to executive summary

546ea45

DOC: clean up roadmap based on comments from @gregcaporaso and @ebolyen

d3aec69

MAINT/DOC: Move roadmap doc into ROADMAP.md.

7bfb297

DOC/MAINT: fix broken section link

baeac5c

justin212k reviewed Sep 23, 2014

ElDeveloper reviewed Sep 23, 2014

gregcaporaso added a commit that referenced this pull request Dec 18, 2014

Merge pull request #4 from ebolyen/exec_summary

088a36f

Proposed QIIME2 Roadmap

gregcaporaso merged commit 088a36f into qiime2-graveyard:master Dec 18, 2014

gregcaporaso deleted the exec_summary branch December 18, 2014 21:58
