Proposed QIIME2 Roadmap #4
All, as we discussed in #2, @jairideout, @ebolyen and I have been fleshing out some ideas into a proposal for what QIIME2 could look like. We've put together a roadmap document that we'd like input on from all developers. This is currently a pull request (reflecting that it's only a proposal at this stage). Once we bounce ideas back and forth and get to something that we're happy with as a group, we can merge this (or whatever it becomes) and then open it up for feedback from the user community. Thanks in advance for the input - looking forward to some good discussion about this!
## Aspects of QIIME2

### Client-Server Architecture
Is the idea to have something like an HTTP RESTful API? e.g.:
curl -X POST -H "Content-Type: application/json" -d '
{
"otu_table":"full_otu_table",
"alpha_metrics":"PD"
}' http://qiime-server:8000/otu_tables/alpha_diversity
That is a possibility, but we are currently looking more towards a socket-based protocol (using plain TCP and WebSockets).
Of course the actual implementation is less important than the idea of separating the interface from the server via a protocol, RESTful or not.
Aye, I just wonder if HTTP-ising it would help with a website GUI. Though I'm not sure if they need the same interface - the website might want to use http://qiime-server/alpha_diversity?summary=true
I'm not wedded to HTTP though.
Gotcha, no particular details are set in stone at this point (and they don't need to be yet). From a strictly hypothetical perspective, a WebSocket-based protocol would actually make things a bit easier from the web GUI side, as the GUI could just open a persistent connection over which the server pushes updates (as opposed to the GUI continuously polling the server). This is also how IPython operates at the moment.
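To make the push-versus-poll distinction concrete, here is a minimal sketch using plain TCP through Python's `asyncio` (a WebSocket version would follow the same pattern but needs a third-party library). The message format (one JSON object per line), the status names, and the function names are assumptions for illustration only; nothing here is a decided QIIME2 protocol.

```python
import asyncio
import json


async def handle_client(reader, writer):
    """Server side: push status updates without waiting to be asked."""
    for status in ("queued", "running", "finished"):
        # Hypothetical wire format: one JSON object per line.
        writer.write((json.dumps({"status": status}) + "\n").encode())
        await writer.drain()
        await asyncio.sleep(0.01)  # stand-in for real work
    writer.close()
    await writer.wait_closed()


async def watch_job(host, port):
    """Client side: open one persistent connection and read what arrives."""
    reader, writer = await asyncio.open_connection(host, port)
    updates = []
    while True:
        line = await reader.readline()
        if not line:  # empty read means the server closed the connection
            break
        updates.append(json.loads(line)["status"])
    writer.close()
    await writer.wait_closed()
    return updates


async def demo():
    # Port 0 asks the OS for any free port, so the sketch runs anywhere.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await watch_job("127.0.0.1", port)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The client never sends a request after connecting; it only reads, which is the property that removes the need for a polling loop in the GUI.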
The idea (if I am understanding this correctly) is that users would have to start a server process and delegate all compute to that server from whatever interface they decide to use? This sounds nice. I wonder if setting up a system like this would perhaps be too ambitious for the average user.
To expand on this, it seems like the client-server architecture is a good solution for a use case where you want to streamline the analysis of a high volume of datasets. In much simpler cases, it almost seems like overkill and an unnecessary thing to have. Clearly, if compute and a deployed installation were to be provided for free to users, then this would make a lot of sense, as that in itself becomes a high volume of datasets to process.
You have the idea correct.
One of our explicit goals is to support everything from a laptop to a cluster. This means progressive enhancement. For the average user, it conceptually isn't much different from an IPython notebook, where you type `ipython notebook`. Instead you might type `qiime start`, which could launch the server and a web browser pointed at that server, and the server would just use something like a SQLite database at a path.
In the context of a cluster, you could have the qiime-server running as an explicit service (like `service qiime start`), which users log into using their cluster credentials. The sysadmin can take responsibility for using a different database, managing plugins, ports, etc.
Here's a diagram representing the idea maybe a little better:
https://drive.google.com/file/d/0B_qySw7nb-DKOVU0NFlNRHZvTWc/edit?usp=sharing
In that diagram, each component can exist on its own host, or they can all exist on the same host (like a laptop).
But definitely the goal is for this to be as simple as possible and to work out-of-the-box.
Or the components can exist on any combination of hosts that doesn't literally exceed the number of components being used.
That means that if I wanted to run something like a batch job (5 commands executed serially), I would have to create a q2 script that is processed by the interactive q2 shell and then executed? That seems like re-inventing the wheel, something like what the Mothur CLI does.
In the hope of reducing the volume of e-mails in everyone's inbox, let's have this conversation over a call.
On (Sep-23-14|15:08), Evan Bolyen wrote:
+finalized and everything is
+subject to change.** Once we reach agreement on the project's direction and
+vision, we will provide additional documents with further details
+(e.g., requirements and design documents).
+
+The roadmap is meant to provide a high-level view of the QIIME2. It does not
+contain specific implementation details. For example, we may mention the use
+of a database, but we're not yet defining the database schema or assuming use
+of a particular database implementation (e.g., PostgreSQL).
+
+This document was originally prepared based on conversations between
+@gregcaporaso, @ebolyen, and @jairideout.
+
+## Aspects of QIIME2
+
+### Client-Server Architecture
Roger that. Though in your example, a batch job would technically just be a workflow, which should be as simple as a drag and drop (and which then does get processed by the qiime-server).
There are a lot of common objectives between this document and Qiita.
Yes, let's just note that there's a lot of overlap and that we aim to resolve this overlap (to the maximum extent possible) during the call, which I agree would be good to have soon. I wonder whether it could wait until after Oct 6, which is the launch of the Coursera course and British Gut, though?
On Sep 23, 2014, at 3:40 PM, josenavas <notifications@github.com> wrote:
> There are a lot of common objectives between this document and Qiita.
project (QiiTA).** This is a discussion of how data will be organized and stored
internally in QIIME2.

The database represents a significant departure from the way QIIME currently
Would this mean that the only way to access this data would be via the Q2 interface itself? I'm thinking of the case where other tools want to access the data generated by QIIME: would this then add a step where you serialize any of the contents of your QIIME study as a regular file?
Yes, there would be an explicit export step, or in the case of a web interface, likely a right-click download.
That is a downside to this approach; however, it does allow more consistent data management and allows the provenance of these artifacts to be maintained and reviewed.
It is also possible that we could provide the ability to export an entire analysis as a tarball, which might look like a well-organized output directory in QIIME right now.
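As a rough illustration of the tarball idea, the sketch below packs a set of in-memory artifacts into a gzip-compressed archive with Python's standard `tarfile` module. The `export_analysis` function, the artifact names, and the directory layout are all hypothetical; the roadmap does not define an export format.

```python
import io
import os
import tarfile
import tempfile


def export_analysis(artifacts, out_path):
    """Write artifacts to a .tar.gz that mimics an output directory.

    `artifacts` maps an archive path to the artifact's bytes. Both the
    mapping and the layout are assumptions made for this sketch.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for name, data in sorted(artifacts.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return out_path


def demo():
    """Export two made-up artifacts and list what ended up in the archive."""
    artifacts = {
        "otu_table/table.txt": b"placeholder table contents",
        "alpha_diversity/pd.txt": b"placeholder metric values",
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = export_analysis(artifacts, os.path.join(tmp, "analysis.tar.gz"))
        with tarfile.open(path) as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(demo())
```

Unpacking such an archive gives other tools ordinary files to read, which is the interoperability concern raised in the question above.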
Thanks for expanding on this. I am concerned that this makes using the software more complicated in itself. Integrating data generated with qiime1.x, or data generated with other packages, becomes a burden, as the import step would require a variety of validations and specifications.
On (Sep-23-14|14:56), Evan Bolyen wrote:
+and duplication in defining multiple interfaces. Additionally it will allow
+remote execution over a network barrier (this would have been difficult to
+achieve with pyqi).
+
+### Workers
+Once the qiime-server has received a request via the protocol, it will launch
+a worker job to perform the computation. The qiime-server will provide status
+updates to clients through the protocol. The worker job will record the results
+as an artifact in a database.
+
+### Database
+Note: This is not intended to be a substitute for the QIIME database
+project (QiiTA). This is a discussion of how data will be organized and stored
+internally in QIIME2.
+
+The database represents a significant departure from the way QIIME currently
That is all true, but we do gain the ability of the interface to reason about composition if it has control of the artifacts in an abstract way. Perhaps a compromise is possible where a database simply logs the locations of files, but presently many of the files waste a lot of disk space by repeating what should be a relation to another table.
I think this is definitely something worth talking about over a call.
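One way to picture the compromise mentioned above (a database that only logs file locations, with relations between artifacts stored once rather than repeated inside files) is the sketch below, which uses Python's built-in `sqlite3`. The schema, file names, and method names are hypothetical and chosen only to show how an interface could reason about provenance.

```python
import sqlite3

# Hypothetical schema: files stay on disk, the database tracks where they
# are and which artifacts each one was derived from.
SCHEMA = """
CREATE TABLE artifact (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    kind TEXT NOT NULL
);
CREATE TABLE provenance (
    child_id INTEGER NOT NULL REFERENCES artifact(id),
    parent_id INTEGER NOT NULL REFERENCES artifact(id),
    method TEXT NOT NULL
);
"""


def register(db, path, kind, parents=(), method=""):
    """Log a file location and link it to the artifacts it came from."""
    cur = db.execute("INSERT INTO artifact (path, kind) VALUES (?, ?)", (path, kind))
    new_id = cur.lastrowid
    for parent_id in parents:
        db.execute("INSERT INTO provenance VALUES (?, ?, ?)", (new_id, parent_id, method))
    return new_id


def ancestors(db, artifact_id):
    """Return the paths of every artifact this one was derived from."""
    rows = db.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT parent_id FROM provenance WHERE child_id = ?
            UNION
            SELECT p.parent_id FROM provenance p JOIN up ON p.child_id = up.id
        )
        SELECT a.path FROM artifact a JOIN up ON a.id = up.id ORDER BY a.id
        """,
        (artifact_id,),
    )
    return [row[0] for row in rows]


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    seqs = register(db, "seqs.fna", "sequences")
    table = register(db, "otu_table.biom", "otu_table", [seqs], "pick_otus")
    alpha = register(db, "alpha.txt", "alpha_diversity", [table], "alpha_diversity")
    return ancestors(db, alpha)


if __name__ == "__main__":
    print(demo())
```

Because the lineage lives in one table, the same query works for any artifact, and no file has to repeat information that belongs to its parents.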
Thanks again.
This also seems like a good topic to discuss with an expert in HCI.