Regraph/Reset user story #280
A few thoughts here:
I'd like to add additional (future) issues we need to factor into this design:
Putting all this together, I am in favor of pushing this to a post-MVP release, and giving us time to think about the UX more. I think I am an advocate for keeping the f/e super simple:
Random thought: as an alternative UX, why not have the user "save" named layouts using the CLI (including all precompute), and then just use the front-end to visualize these "named layouts" (the front-end would only know the name & coordinates)? This is easily done for cell subsets that are defined computationally (e.g., select by metadata field), but harder for lasso-selected sets. We could solve for the latter by implementing the "save selection" feature (which drops a selection list).
I should also mention that, in addition to Dask support, we've been experimenting with Pywren/serverless, which could give lots of parallelism with minimal configuration on the user's end. We're also thinking about a distributed implementation of the kNN computation to make it scalable and, hopefully, fast.
Those future directions are super cool @laserson ! Totally on board with pushing this to post-MVP and dropping the current approach for now. I still lean towards supporting a few targeted forms of async compute eventually, especially if more speedups are coming. One issue with saving named layouts is that the combinatorics just blow up: even for metadata-based (e.g., cluster) selections, users will often pick many different combinations of clusters. In other words, the exploratory nature of the visualization is exactly what gives rise to a wide variety of subsequent computations that are hard to precompute. But this should really be driven by user feedback -- what additional computations do people want to do while using the tool in its current simple form? We can address that post-MVP.
Also, a clarifying comment / question for @bkmartinjr -- when posing the question as "should there be async / batch compute triggered by the front-end (web UI)?", we are already doing this when computing differential expression, right? So we've already gone down this road? It's simply a question of speed, and of when the expected speed does or does not justify exposing the functionality on the front-end.
@freeman-lab - we treat differential expression as a more-or-less synchronous operation (not literally, but in the sense that we expect a server response quickly enough that the UI does not need an explicitly async workflow). It boils down to speed: if the response is "interactive" (i.e., roughly under a second), then we don't need to build in explicit async management. If it could take minutes to hours, then we need to give the user some signal that compute is in progress, a notification when it's done, etc.
Agree with this point re diffexp. The question is not whether we will do arbitrary computation from the client, or whether we have to handle it differently with different computation packages, but how long we are willing to wait for it in various circumstances.
Awesome, that all matches my perspective; just wanted to sanity check.
I should add, I'm not opposed to adding async support if we do it right. We just aren't there yet -- for example, the REST API has no concept of async requests (i.e., no way to know when an async request has completed). This isn't just a UI issue.
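For concreteness, one common way a REST layer could eventually represent async requests is a submit/poll job pattern. The sketch below is purely illustrative: the names (`submit_job`, `poll_job`, the status fields) are hypothetical and are not part of cellxgene's actual API, which, as noted above, has no async concept today.

```python
import threading
import time
import uuid

# Minimal in-memory sketch of a submit/poll job pattern that a REST layer
# could wrap as POST /job and GET /job/<id>. All names here are hypothetical.

_jobs = {}  # job_id -> {"status": "running"|"done"|"failed", "result": ...}
_lock = threading.Lock()


def submit_job(fn, *args):
    """Start fn in the background; return a job id the client can poll."""
    job_id = str(uuid.uuid4())
    with _lock:
        _jobs[job_id] = {"status": "running", "result": None}

    def _run():
        try:
            result = fn(*args)
            with _lock:
                _jobs[job_id] = {"status": "done", "result": result}
        except Exception as exc:  # surface failures to the poller
            with _lock:
                _jobs[job_id] = {"status": "failed", "result": str(exc)}

    threading.Thread(target=_run, daemon=True).start()
    return job_id


def poll_job(job_id):
    """What a GET /job/<id> endpoint might return."""
    with _lock:
        return dict(_jobs[job_id])
```

The key point is that the client gets a handle immediately and learns about completion by polling (or, later, by push notification), which is exactly the machinery the current synchronous API lacks.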
Regraphing and reclustering were among the "most valuable features" identified in today's user feedback session with the Humphreys lab. I agree that we need to be careful about what's feasible at interactive speed vs. as an async computation. FWIW, though, I think that re-embedding at a reasonable (not quite interactive; I was mistaken) speed is possible with scanpy if the subselection is contiguous. I've detailed this more here, including the kinds of selections for which this holds. I believe (worth following up) that this would satisfy at least most user needs. What's your impression on this one, @neuromusic ?
@sidneymbell - what does "contiguous" mean here?
@bkmartinjr - Great question! I should have elaborated. The subselection is contiguous when the neighbor graph connecting all the selected cells doesn't have any breaks in it. I.e., there's a way to "walk" directly between each pair of cells in the subselection (along edges of the neighbor graph).

So, for example, in this notebook I made two different mini datasets representing different ways of subselecting. The granulocytes are all closely related cells that are close to one another in the (contiguous) neighbor graph. In this case, it doesn't make a big difference whether we recompute the neighbor graph ('clean') or subset the existing neighbor graph from the universe ('subset').

As a counterexample, I also made a dataset where I randomly subselected every 5th cell from the entire dataset. Here, the cells come from all over the neighbor graph, and many of the cells between the selected cells got dropped. The result is a ton of tiny neighbor graphs that are disconnected from one another. In this case, it makes a big difference whether we recompute the neighbor graph ('clean') or just subset the universal one ('subset').

Does that help? Happy to stop by and chat.
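The "contiguous" distinction described above can be checked mechanically: a subselection is contiguous exactly when the induced subgraph of the neighbor graph has a single connected component. A small sketch using scipy (the toy chain-shaped adjacency matrix below is made up for illustration; a real kNN graph from scanpy would be used in practice):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric kNN-style graph over 6 cells: a chain 0-1-2-3-4-5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
rows, cols = zip(*(edges + [(j, i) for i, j in edges]))  # symmetrize
graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))


def is_contiguous(graph, selected):
    """True if the selected cells form one connected piece of the graph."""
    sub = graph[selected][:, selected]  # induced subgraph on the selection
    n_components, _ = connected_components(sub, directed=False)
    return n_components == 1


print(is_contiguous(graph, [0, 1, 2]))  # adjacent cells -> True
print(is_contiguous(graph, [0, 2, 4]))  # every other cell -> False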
A couple of thoughts here:
Spyros Darmanis mentioned that his group needs support for trajectory analyses, and that re-embedding + reanalysis of trajectories will be needed to drill into smaller developmental scales. He admits that the trajectory algorithms need significant parameter tuning, and that there wasn't a clear "default" we could execute automatically.
@ambrosejcarr reports that this issue is addressed. Closing during triage.
Agree: this issue is resolved and implemented. If we return to trajectories, that's a new issue.
We need to nail down user stories for Regraph/Reset in the first release.
Issues to consider:
Scanpy re-layout is fast when nearest neighbors (NN) are already computed, and very slow when not. Should the web UI drive both cases, or assume that some NN results already exist & are persisted?
^ @freeman-lab - please add your thoughts.
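The cost asymmetry in the first consideration above can be made concrete with a toy sketch: computing the neighbor table is the expensive all-pairs step, while reusing a persisted table is a cheap lookup. The brute-force kNN below is illustrative only (scanpy uses approximate neighbor methods, and the data here is random), but the asymmetry it demonstrates is the same one driving the design question:

```python
import numpy as np

# Why re-layout is fast when the kNN graph already exists: computing
# neighbors is the O(n^2) step; reusing them afterwards is a lookup.
# Toy data -- 200 "cells" in a made-up 10-dimensional latent space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))


def knn_indices(X, k=5):
    """Brute-force k nearest neighbors: the expensive, precomputable step."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # all-pairs distances
    np.fill_diagonal(d2, np.inf)           # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]   # (n, k) neighbor index table


nn = knn_indices(X)  # do once, persist alongside the dataset
# A later "regraph" request over a cell subset can start from `nn`
# instead of paying the all-pairs distance cost again.
print(nn.shape)
```

If the table is persisted with the dataset, the web UI only ever triggers the cheap path; if not, the UI must either block on the expensive path or treat it as an async job, which is the crux of the question above.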