Datashader as plotting backend #2656

Open
grst opened this issue Sep 6, 2023 · 7 comments

@grst
Contributor

grst commented Sep 6, 2023

What kind of feature would you like to request?

Other?

Please describe your wishes

When dealing with millions of cells, plotting embeddings becomes annoyingly slow. Datashader aggregates data points before plotting, which is much faster than just making a scatterplot in matplotlib.

For instance, a multi-panel UMAP plot of 2M cells that takes 1 min 15 s with sc.pl.umap takes 7 s with datashader+matplotlib.
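The speed-up comes from aggregating first: every point is binned into a fixed pixel grid, and matplotlib then draws one image instead of millions of markers. Below is a minimal NumPy sketch of that count aggregation, using `np.histogram2d` as a stand-in; it is not datashader's implementation, and `aggregate_points` is a hypothetical helper name. With datashader itself, the equivalent step is roughly `ds.Canvas(plot_width=..., plot_height=...).points(df, "x", "y", agg=ds.count())`.

```python
import numpy as np


def aggregate_points(x, y, width=300, height=300, x_range=None, y_range=None):
    """Count how many points fall into each pixel of a height x width grid.

    Hypothetical helper illustrating the aggregate-then-render idea;
    datashader does this far more efficiently (and supports more reductions).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Default to the data extent, like an auto-ranged canvas.
    if x_range is None:
        x_range = (x.min(), x.max())
    if y_range is None:
        y_range = (y.min(), y.max())
    # Pass y first so that rows correspond to y and columns to x (image layout).
    counts, _, _ = np.histogram2d(
        y, x, bins=(height, width), range=(y_range, x_range)
    )
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(1_000_000, 2))  # stand-in for a UMAP embedding
    grid = aggregate_points(coords[:, 0], coords[:, 1])
    print(grid.shape, int(grid.sum()))
```

The resulting array has one cell per pixel regardless of how many cells were in the embedding, which is why the later drawing step stays cheap.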

I know datashader has come up before in different contexts (e.g. #1263), but here I mainly suggest it for speed.


FWIW, I made a prototype implementation of sc.pl.embedding with datashader. It's not feature-complete but covers some common use-cases:
https://gist.github.com/grst/424e3e24bf244820000c33a823a47ec1

@ivirshup
Member

ivirshup commented Sep 7, 2023

See also:

@ivirshup
Member

ivirshup commented Sep 7, 2023

How would you suggest doing the API for this? Another kwarg for backend?

The additional dependencies aren't so bad. They are xarray, dask, and pillow. But still, I probably wouldn't be up for datashader as a required dependency.

@grst
Contributor Author

grst commented Sep 7, 2023

I have just been exploring the holoviz ecosystem a bit and wasn't aware how nice this is! Ideally we could use something like hvPlot and leave it to the user to select a backend.

The problem is that the scanpy plotting functions have way too many parameters. Supporting all of them in different backends sounds daunting if not impossible.

@ivirshup
Member

ivirshup commented Sep 7, 2023

I think datashader would only work in scanpy via the matplotlib rendering backend. I think interactive plotting is definitely out of scope for scanpy. There'd likely be even more options that are interactive specific.
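Rendering through matplotlib could look roughly like the sketch below. The aggregated grid is drawn as a single `imshow` image on an ordinary `Axes`, so it composes with existing static-figure code (titles, colorbars, multi-panel layouts). `draw_aggregate` is a hypothetical helper, and the `log1p` colour scaling is one common choice, not something datashader or scanpy prescribes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: static output only
import matplotlib.pyplot as plt
import numpy as np


def draw_aggregate(ax, counts, x_range, y_range, cmap="viridis"):
    """Draw a pre-aggregated (rows = y, cols = x) count grid onto a matplotlib Axes."""
    return ax.imshow(
        np.log1p(counts),  # compress dynamic range so sparse regions stay visible
        origin="lower",  # row 0 is the smallest y value, as in a scatter plot
        extent=(x_range[0], x_range[1], y_range[0], y_range[1]),
        aspect="auto",
        cmap=cmap,
        interpolation="nearest",  # one grid cell maps to one visible block
    )


if __name__ == "__main__":
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    for ax in axes:  # e.g. one panel per embedding or per colour key
        img = draw_aggregate(ax, np.random.default_rng(0).poisson(3, (50, 50)), (0, 1), (0, 1))
    fig.colorbar(img, ax=axes)
    fig.savefig("embedding_panels.png")
```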

Even then, I'm still not 100% sure this should be in scanpy and not separate.

@grst
Contributor Author

grst commented Sep 7, 2023

It should probably be discussed in the context of whatever the plotting plans are for scanpy 2.0. Maybe worth dedicating a community meeting to that?

@Intron7
Member

Intron7 commented Sep 8, 2023

@grst would you be open to putting this into rsc (rapids-singlecell)? There we could even use cudf and GPU plotting for the dataframe.

@ivirshup
Member

ivirshup commented Sep 8, 2023

I would like this to be somewhere where it'd also work for CPU. I think we can implement a __dataframe__ interface that passes either GPU or CPU memory to datashader, then let datashader handle the rest.
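A minimal sketch of the CPU side of that idea, assuming the Python dataframe interchange protocol: a small wrapper exposes embedding coordinates through `__dataframe__` by delegating to a pandas frame, and any consumer that understands the protocol can read it. `EmbeddingTable` is a hypothetical name, the GPU variant (backing it with a GPU dataframe instead) is an assumption, and whether datashader accepts such objects directly is not confirmed here; the example only demonstrates the round trip through `pandas.api.interchange.from_dataframe` (pandas ≥ 1.5).

```python
import numpy as np
import pandas as pd


class EmbeddingTable:
    """Hypothetical wrapper exposing 2-D embedding coordinates via __dataframe__."""

    def __init__(self, coords, names=("x", "y")):
        coords = np.asarray(coords)
        if coords.ndim != 2 or coords.shape[1] != len(names):
            raise ValueError("coords must have shape (n_cells, len(names))")
        # CPU path: back the table with pandas. A GPU variant could hold a
        # GPU dataframe here instead (assumption, not shown).
        self._frame = pd.DataFrame(coords, columns=list(names))

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        # Delegate to the backing frame's own interchange implementation.
        return self._frame.__dataframe__(nan_as_null=nan_as_null, allow_copy=allow_copy)


if __name__ == "__main__":
    umap_coords = np.random.default_rng(0).normal(size=(5, 2))  # stand-in for X_umap
    df = pd.api.interchange.from_dataframe(EmbeddingTable(umap_coords))
    print(df.head())
```

The consumer only sees the protocol object, so the same plotting code path could in principle receive CPU- or GPU-backed tables.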
