Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Cellxpose returns 400 Bas request for some requests if data is 16-bit #2379

Closed
arogozhnikov opened this issue Aug 12, 2021 · 17 comments · Fixed by #2406
Closed

[BUG] Cellxpose returns 400 Bas request for some requests if data is 16-bit #2379

arogozhnikov opened this issue Aug 12, 2021 · 17 comments · Fixed by #2406
Assignees
Labels

Comments

@arogozhnikov
Copy link

Describe the bug

Try saving test file after converting to fp16:

adata.X = adata.X.astype('float16')

Server normally starts when given this data (but prints some warnigns about conversion from 64bits).

However when you select gene or run differential expression, 400 appears.

Version affected: 0.17.0, within docker

In my case cellxgene first downloads file from the cloud, so 16bit format for storage is very reasonable - almost halves the space.

@bkmartinjr bkmartinjr self-assigned this Aug 19, 2021
@bkmartinjr
Copy link
Contributor

@arogozhnikov - I am exploring options. One important consideration: we use Numba internally for speed, and numba does not support float16.

I see two options - if you have any thoughts, please add them to this issue.

  1. convert X to float32 at startup (conversion in memory - will not change the H5AD). Or,
  2. fall back on somewhat slower Python code that will support float16.

Option 1 is quite simple to implement and we could have it incorporated into our next release. Option 2 will take more exploration to support without a significant performance penalty.

Please let me know if you have any input on the trade-off.

@arogozhnikov
Copy link
Author

In the context of this question, memory was also a big issue (multiple servers on the same host, each responsible for one dataset), so I'd recommend to provide a full support of fp16, even if that takes more time to implement.

@bkmartinjr
Copy link
Contributor

To clarify - full FP16 support it is likely to run slower as well (the aforementioned performance penalty).

@arogozhnikov
Copy link
Author

Still better have proper fp16.

While I did not study cellxgene code in detail, I believe all or almost all operations are column-wise.
So possibly you can convert column to fp32 on-the-fly before passing it further. This should be memory efficient, and overhead is very reasonable. Do you think so?

@bkmartinjr
Copy link
Contributor

I have a PR almost ready for full fp16 support. Tonight or tomorrow, so it will make the next release.

You are right that you will often get better performance if you encode the matrix in a column-friendly format (eg, CSC if it is sparse)

@arogozhnikov
Copy link
Author

@bkmartinjr can you please check this once again?
I was about to transition datasets to fp16, but I'm still seeing same issues (400 on diffexp, and expression of genes can't be loaded).
I've tried both csc and csr formats for X, but neither worked for me

@arogozhnikov
Copy link
Author

cellxgene==1.0.0, reproduced on mac and linux if that helps

@blrnw3
Copy link
Contributor

blrnw3 commented Feb 4, 2022

I can reproduce this too. I think the get_X_array method needs updating to support float16

@blrnw3 blrnw3 reopened this Feb 4, 2022
@blrnw3 blrnw3 self-assigned this Feb 7, 2022
@blrnw3
Copy link
Contributor

blrnw3 commented Feb 7, 2022

According to this scipy issue, float16 is not supported in sparse matrices, which presumably means cellxgene cannot support float16 sparse data. Sure we could convert to dense or to float32, but that would negate all the memory savings immediately.
Flagging @bkmartinjr in case I've missed something here.

@arogozhnikov
Copy link
Author

Good catch. Used float16 sparse for some

but that would negate all the memory savings immediately

h5ads still take enough space + download time, so I see benefits even if it is an immediate conversion

@blrnw3
Copy link
Contributor

blrnw3 commented Feb 7, 2022

h5ads still take enough space + download time, so I see benefits even if it is an immediate conversion

Is it not easy enough to just convert the h5ad to float32 after the download?

@bkmartinjr
Copy link
Contributor

Suggest a cast in diffex code if the set1/set2 slices are fp16, as that is where the dependency lives. Pretty sure the rest of the code base does not have this constraint.

As it is already sliced out of the main matrix, you will retain most of the memory savings (ie, the cast to 32 bit will be only for the duration of the t-test compute)

@blrnw3
Copy link
Contributor

blrnw3 commented Feb 7, 2022

@bkmartinjr the problem is slicing the rows for diffex out of the X matrix. Scipy does not support complex slices for float16.
e.g. X[4] is ok but X[[1,2,4]] is not.

The workaround is to slice individual rows one at a time and then vstack them together, but this is prohibitively slow on anything larger than a few hundred rows.

@arogozhnikov
Copy link
Author

Is it not easy enough to just convert the h5ad to float32 after the download?

Well, I use paths in s3, not downloading myself.

I see the issues you're running into, I agree it would be non-trivial to deal with non-supported scipy format. Probably an optimal one would be to add conversion of X.data somewhere here https://github.com/chanzuckerberg/cellxgene/blob/main/server/data_anndata/anndata_adaptor.py#L325

@blrnw3
Copy link
Contributor

blrnw3 commented Feb 8, 2022

Ah I see. Let's see what @bkmartinjr thinks about just converting the full float16 matrix to float32 up front (or perhaps only when X matrix is requested). So there'd be no memory saving in normal operation but you'd save download bandwidth.

@bkmartinjr
Copy link
Contributor

I think it is a good compromise. Thumbs-up!

blrnw3 added a commit that referenced this issue Feb 9, 2022
Convert to float32 on startup unless backed, in which case error. scipy does not support complex slicing from float16 data so this is the easiest fix for now.
blrnw3 added a commit that referenced this issue Feb 10, 2022
* Fix float16 support [#2379]

Convert to float32 on startup unless backed, in which case error. scipy does not support complex slicing from float16 data so this is the easiest fix for now.

* minor msg change
@blrnw3
Copy link
Contributor

blrnw3 commented Feb 28, 2022

Fixed in 1.0.1

@blrnw3 blrnw3 closed this as completed Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants