Proposed QIIME2 Roadmap #4

Merged
merged 10 commits into from Dec 18, 2014

@ebolyen commented Sep 23, 2014

No description provided.

@gregcaporaso
Contributor

All, as we discussed in #2, @jairideout, @ebolyen and I have been working on fleshing out some ideas into a proposal for what QIIME2 could look like. We've put together a roadmap document that we'd like input on from all developers. This is currently a pull request (reflecting that it's only a proposal at this stage). Once we bounce ideas back and forth and get to something that we're happy with as a group, we can merge this (or whatever it becomes) and then open it up for feedback from the user community.

Thanks in advance for the input - looking forward to some good discussion about this!

@coveralls

Coverage Status

Coverage remained the same when pulling baeac5c on ebolyen:exec_summary into 04c5b1a on biocore:master.


## Aspects of QIIME2

### Client-Server Architecture


Is the idea to have something like an HTTP RESTful API? For example:

```shell
curl -X POST -H "Content-Type: application/json" -d '
{
  "otu_table": "full_otu_table",
  "alpha_metrics": "PD"
}' http://qiime-server:8000/otu_tables/alpha_diversity
```
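For comparison, here is the same call as a minimal Python sketch using only the standard library. It only builds the request and never sends it. The host, port, path, and field names are the hypothetical ones from the curl example above, not an existing qiime-server API.

```python
import json
import urllib.request

# Hypothetical endpoint copied from the curl example; no such API exists yet.
URL = "http://qiime-server:8000/otu_tables/alpha_diversity"


def build_alpha_diversity_request(otu_table: str, metric: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {"otu_table": otu_table, "alpha_metrics": metric}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_alpha_diversity_request("full_otu_table", "PD")
# Sending it would be: urllib.request.urlopen(request)
```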

Contributor Author

That is a possibility, but we are currently leaning more towards a socket-based protocol (using plain TCP and WebSockets).
Of course, the actual implementation is less important than the idea of separating the interface from the server via a protocol, RESTful or not.
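One simple way to frame messages over plain TCP is newline-delimited JSON. The sketch below is only an illustration under that assumption: the message fields are made up, and `socket.socketpair()` stands in for a real client-server connection. It is not the protocol being proposed.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Frame a message as one line of JSON terminated by a newline."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def receive_message(stream) -> dict:
    """Read one newline-terminated JSON message from a file-like stream."""
    line = stream.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


# socketpair() stands in for a TCP connection between a client and the
# qiime-server, so the sketch runs without any network setup.
client, server = socket.socketpair()
server_stream = server.makefile("rb")

send_message(client, {"command": "alpha_diversity",
                      "otu_table": "full_otu_table",
                      "metric": "PD"})
received = receive_message(server_stream)

server_stream.close()
client.close()
server.close()
```

A WebSocket transport would provide its own message framing, so a scheme like this would only matter for the plain-TCP side.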


Aye, I just wonder whether HTTP-ifying it would help with a website GUI. Though I'm not sure the two need the same interface; the website might want to use `http://qiime-server/alpha_diversity?summary=true`.

I'm not wedded to HTTP, though.
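If the web GUI did get its own HTTP endpoints, a URL like the one above could be decoded into the same analysis request the other interfaces send. A minimal sketch, assuming the path names the analysis and `summary` is an optional boolean flag (both hypothetical):

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def parse_request(url: str) -> Tuple[str, bool]:
    """Return (analysis name, whether a summary was requested)."""
    parts = urlsplit(url)
    analysis = parts.path.strip("/")
    query = parse_qs(parts.query)
    # parse_qs maps each key to a list of values; use the last one given.
    summary = query.get("summary", ["false"])[-1].lower() == "true"
    return analysis, summary
```

For example, `parse_request("http://qiime-server/alpha_diversity?summary=true")` yields `("alpha_diversity", True)`.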

Contributor Author

Gotcha. No particular details are set in stone at this point (and they don't need to be yet). Hypothetically, a WebSocket-based protocol would actually make things a bit easier on the web GUI side, as the GUI could just open a persistent connection over which the server could push updates (as opposed to the GUI continuously polling the server). This is also how IPython operates at the moment.
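The difference between pushing and polling can be sketched with `asyncio`, using a queue to stand in for the persistent WebSocket connection. This is an assumption-laden illustration (the job states and function names are hypothetical), not how IPython or QIIME2 implements it. The GUI side simply waits for the next update instead of repeatedly asking for one.

```python
import asyncio


async def run_job(updates: asyncio.Queue) -> None:
    """Hypothetical server-side job that pushes status updates as it runs."""
    for status in ("queued", "running", "completed"):
        await updates.put(status)
        await asyncio.sleep(0)  # yield control, as real work would
    await updates.put(None)  # sentinel: no more updates


async def gui_listener(updates: asyncio.Queue) -> list:
    """Client side: block on pushed updates rather than polling."""
    received = []
    while (status := await updates.get()) is not None:
        received.append(status)
    return received


async def main() -> list:
    # The queue plays the role of the open WebSocket connection.
    updates: asyncio.Queue = asyncio.Queue()
    _, received = await asyncio.gather(run_job(updates), gui_listener(updates))
    return received


statuses = asyncio.run(main())
```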


The idea (if I am understanding this correctly) is that users would have to start a server process and offload all computation to that server from whatever interface they decide to use? This sounds nice, but I wonder if setting up a system like this would be too ambitious for the average user.

To expand on this, it seems like the client-server architecture is a good solution for a use case where you want to streamline the analysis of a high volume of datasets. In much simpler cases, it almost seems like unnecessary overkill. Clearly, if compute and a deployed installation were provided to users for free, this would make a lot of sense, as that in itself becomes a high volume of datasets to process.

Contributor Author

You have the idea correct.

One of our explicit goals is to support everything from a laptop to a cluster, which means progressive enhancement. For the average user, it conceptually isn't much different from an IPython notebook, where you type `ipython notebook`. Instead, you might type `qiime start`, which could launch the server and a web browser pointed at that server, and the server would just use something like an SQLite database at a given path.

In the context of a cluster, you could have the qiime-server running as an explicit service (like `service qiime start`) which users log into using their cluster credentials. The sysadmin can take responsibility for using a different database, managing plugins, ports, etc.
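To make the laptop-versus-cluster split concrete, here is a minimal sketch of how server settings might default for `qiime start` and be overridden by a sysadmin. Every name, the default port, and the `~/.qiime2/qiime.db` location are hypothetical. For simplicity, only SQLite is handled, whereas a cluster deployment might point at a different database entirely.

```python
import sqlite3
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ServerConfig:
    """Hypothetical qiime-server settings; names are illustrative only."""
    host: str = "localhost"
    port: int = 8000  # hypothetical default port
    database_path: str = str(Path.home() / ".qiime2" / "qiime.db")


def laptop_config() -> ServerConfig:
    # Zero-configuration case, e.g. behind something like `qiime start`.
    return ServerConfig()


def cluster_config(host: str, port: int, database_path: str) -> ServerConfig:
    # A sysadmin supplies every setting explicitly for the shared service.
    return ServerConfig(host=host, port=port, database_path=database_path)


def open_database(config: ServerConfig) -> sqlite3.Connection:
    """Create the parent directory if needed, then open the SQLite file."""
    if config.database_path != ":memory:":
        Path(config.database_path).parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(config.database_path)
```

The design point is that the laptop path needs no decisions from the user, while every default remains overridable on a shared install.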

Here's a diagram that may represent the idea a little better:
https://drive.google.com/file/d/0B_qySw7nb-DKOVU0NFlNRHZvTWc/edit?usp=sharing

In that diagram, each component can exist on its own host, or they can all exist on the same host (like a laptop).

But definitely the goal is for this to be as simple as possible and to work out-of-the-box.

Contributor Author

Or the components can exist on any combination of hosts that doesn't literally exceed the number of components being used.
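As a rough illustration of that flexibility, a deployment can be described as a mapping from component to host. The component list below (client, qiime-server, worker, database) is an assumption drawn from the roadmap excerpt quoted in this thread, and the host names are placeholders.

```python
from typing import Dict, Set

# Assumed component list; the real design may differ.
COMPONENTS = ("client", "qiime-server", "worker", "database")


def hosts_used(placement: Dict[str, str]) -> Set[str]:
    """Return the distinct hosts in a component -> host placement."""
    missing = set(COMPONENTS) - set(placement)
    if missing:
        raise ValueError(f"unplaced components: {sorted(missing)}")
    return set(placement.values())


# Everything on one machine.
laptop = {component: "localhost" for component in COMPONENTS}

# One possible cluster layout: server and database share a head node.
cluster = {
    "client": "workstation",
    "qiime-server": "head-node",
    "worker": "compute-node",
    "database": "head-node",
}
```

Since each component sits on exactly one host, a placement can never use more hosts than there are components.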


That means that if I wanted to run something like a batch job (5 commands executed serially), I would need to create a q2 script that is processed by the interactive q2 shell and then executed? That seems like reinventing the wheel, something like what the Mothur CLI does.

In the hope of reducing the volume of e-mails in everyone's inbox, let's have this conversation over a call.

On (Sep-23-14|15:08), Evan Bolyen wrote:

+finalized and everything is
+subject to change.** Once we reach agreement on the project's direction and
+vision, we will provide additional documents with further details
+(e.g., requirements and design documents).
+
+The roadmap is meant to provide a high-level view of the QIIME2. It does not
+contain specific implementation details. For example, we may mention the use
+of a database, but we're not yet defining the database schema or assuming use
+of a particular database implementation (e.g., PostgreSQL).
+
+This document was originally prepared based on conversations between
+@gregcaporaso, @ebolyen, and @jairideout.
+
+## Aspects of QIIME2
+
+### Client-Server Architecture


Contributor Author

Roger that. Though in your example, a batch job would technically just be a workflow, which should be as simple as a drag and drop (and which then does get processed by the qiime-server).
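A workflow in this sense is just an ordered chain of steps, each consuming the previous step's output. A minimal sketch, assuming steps are plain Python callables. The step names and toy functions are hypothetical stand-ins for real QIIME commands.

```python
from typing import Any, Callable, List, Tuple

# A workflow is an ordered list of (name, step) pairs; each step receives
# the previous step's output.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Execute steps serially, returning the final result and a run log."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


# Toy stand-ins for real analysis commands.
workflow = [
    ("filter_low_counts", lambda counts: [c for c in counts if c >= 2]),
    ("cap_counts", lambda counts: [min(c, 5) for c in counts]),
    ("total", sum),
]
result, log = run_workflow(workflow, [1, 3, 8, 2])
```

Here `[1, 3, 8, 2]` is filtered to `[3, 8, 2]`, capped to `[3, 5, 2]`, and summed to `10`. A drag-and-drop editor would essentially be building the `workflow` list.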

@josenavas
Contributor

There are a lot of common objectives between this document and Qiita.

@rob-knight

Yes, let's just note that there's a lot of overlap and that we aim to resolve it (to the maximum extent possible) during the call, which I agree would be good to have soon. I wonder whether it could wait until after Oct 6, which is the launch of the Coursera course and British Gut, though?


project (QiiTA).** This is a discussion of how data will be organized and stored
internally in QIIME2.

The database represents a significant departure from the way QIIME currently


Would this mean that the only way to access this data would be via the Q2 interface itself? I'm thinking of the case where other tools want to access the data generated by QIIME: would this then add a step where you serialize any of the contents of your QIIME study as a regular file?

Contributor Author

Yes, there would be an explicit export step, or in the case of a web interface, likely a right-click download.
That is a downside to this approach; however, it does allow more consistent data management and allows the provenance of these artifacts to be maintained and reviewed.

It is also possible that we could provide the ability to export an entire analysis as a tarball, which might look like a well-organized QIIME output directory does right now.
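A minimal sketch of such an export, assuming artifacts are available as text keyed by their path inside the archive and that provenance is kept as a JSON file alongside them. The directory layout and the `provenance.json` name are hypothetical.

```python
import io
import json
import tarfile


def export_analysis(artifacts: dict, provenance: dict) -> bytes:
    """Bundle artifact files plus a provenance record into an in-memory .tar.gz."""
    files = dict(artifacts)
    files["provenance.json"] = json.dumps(provenance, indent=2)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tarfile needs each member's size up front
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Placeholder contents; a real export would read the stored artifacts.
archive = export_analysis(
    {
        "otu_tables/full_otu_table.tsv": "placeholder table\n",
        "alpha_diversity/pd.tsv": "placeholder results\n",
    },
    {"alpha_diversity/pd.tsv": {"command": "alpha_diversity",
                                "inputs": ["otu_tables/full_otu_table.tsv"]}},
)
```

Shipping the provenance record inside the tarball means the exported directory still says how each file was produced.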


Thanks for expanding on this. I am concerned that this makes using the software in itself more complicated. Integrating data generated with QIIME 1.x, or data generated with other packages, becomes a burden, as the import step would require a variety of validations and specifications.

On (Sep-23-14|14:56), Evan Bolyen wrote:

+and duplication in defining multiple interfaces. Additionally it will allow
+remote execution over a network barrier (this would have been difficult to
+achieve with pyqi).
+
+### Workers
+Once the qiime-server has received a request via the protocol, it will launch
+a worker job to perform the computation. The qiime-server will provide status
+updates to clients through the protocol. The worker job will record the results
+as an artifact in a database.
+
+### Database
+Note: This is not intended to be a substitute for the QIIME database
+project (QiiTA). This is a discussion of how data will be organized and stored
+internally in QIIME2.
+
+The database represents a significant departure from the way QIIME currently


Contributor Author

That is all true, but we do gain the ability for the interface to reason about composition if it controls the artifacts in an abstract way. Perhaps a compromise is possible where a database simply logs the locations of files, but presently many of the files waste a lot of disk space by repeating what should be a relation to another table.

I think this is definitely something worth talking about over a call.
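The compromise above, combined with the worker model from the roadmap excerpt, might look roughly like this: the server hands a request to a worker, and the database records only where each output lives and which artifact it was derived from. The schema, file paths, and `worker_job` are hypothetical, and a thread pool stands in for real worker processes.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema: store each artifact's file location plus a relation
# to its parent artifact, rather than the data itself.
SCHEMA = """
CREATE TABLE artifacts (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    command TEXT NOT NULL,
    parent_id INTEGER REFERENCES artifacts(id)
)
"""


def record_artifact(conn, path, command, parent_id=None) -> int:
    """Log an artifact's location and lineage; return its new id."""
    cur = conn.execute(
        "INSERT INTO artifacts (path, command, parent_id) VALUES (?, ?, ?)",
        (path, command, parent_id),
    )
    conn.commit()
    return cur.lastrowid


def worker_job(command, input_path):
    """Stand-in for real computation; returns where the output was written."""
    return f"{input_path}.{command}.out"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
table_id = record_artifact(conn, "full_otu_table.biom", "import")

# The server hands the request to a worker, then records the result.
with ThreadPoolExecutor(max_workers=1) as pool:
    output_path = pool.submit(worker_job, "alpha_diversity",
                              "full_otu_table.biom").result()
result_id = record_artifact(conn, output_path, "alpha_diversity",
                            parent_id=table_id)

# Provenance is a join instead of data repeated in every file.
lineage = conn.execute(
    "SELECT child.path, parent.path FROM artifacts AS child "
    "JOIN artifacts AS parent ON child.parent_id = parent.id"
).fetchall()
```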


Thanks again.

This also seems like a good topic to discuss with an expert in HCI.


gregcaporaso added a commit that referenced this pull request Dec 18, 2014
@gregcaporaso merged commit 088a36f into qiime2-graveyard:master Dec 18, 2014
@gregcaporaso deleted the exec_summary branch December 18, 2014 21:58
9 participants