Compatibility with VisiumHD #358

Closed
Rafael-Silva-Oliveira opened this issue Apr 16, 2024 · 8 comments
Labels
discussion enhancement New feature or request

Comments

Rafael-Silva-Oliveira commented Apr 16, 2024

Following up with this: #356

Are there any improvements coming soon in cell2location to take in the full data for very large datasets? Splitting the data into batches removes spatial information that could be used for cell deconvolution, so being able to take in the full dataset would be important.

With the new VisiumHD becoming more prominent, the number of spots or bins can range anywhere from 150k to 650k+, so when I test with cell2location I always get memory errors.

Running in batches also doesn't work very well (the full dataset would have to be passed along).

@Rafael-Silva-Oliveira Rafael-Silva-Oliveira added the enhancement New feature or request label Apr 16, 2024
Contributor

vitkl commented Apr 21, 2024

Solving modelling challenges for VisiumHD, NovaST, OpenST and other similar high-resolution technologies is a research project rather than a matter of tweaking models to work on larger data. Subcellular resolution of measurements requires rethinking what it means to analyse this data. I see two major ways forward:

  1. Modelling observations at high resolution but ignoring cell boundaries. We could estimate cell type and subcellular gene programme densities at 1-2um resolution, using CNNs, GNNs or conditional autoregressive models to pool information at subcellular resolution. The model also needs to represent the fact that the scRNA-derived RNA abundance profile has to be decomposed into several subcellular compartments. I implemented a CNN-based approach that considers these factors in this branch (High resolution model & distance function prior [draft] #337), but it likely requires much more compute than people in this community are used to (many GPUs for many days). This approach is also related to https://github.com/HiDiHlabs/ssam by @pjb7687.
  2. Using cell segmentation by modifying the Baysor approach to work with 1000s of genes. Maybe @VPetukhov can weigh in too.

If anyone is interested in collaborating to make this happen please reach out to me, Omer and Oliver by email.

Contributor

vitkl commented Apr 21, 2024

A practical way to analyse VisiumHD right now is to aggregate it at 8um or 20um or 50um resolution (depending on data quality) and use cell2location with tips discussed in #356.
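As a rough illustration of the aggregation step described above, here is a minimal sketch that sums the counts of fine bins into coarser square bins. It assumes bin positions are available in microns and that counts fit in a dense NumPy array; the function name `aggregate_bins` and its arguments are hypothetical and are not part of cell2location. Real VisiumHD matrices are sparse and much larger, so this only shows the logic.

```python
import numpy as np


def aggregate_bins(coords_um, counts, bin_size_um):
    """Sum counts of fine bins that fall into the same coarse square bin.

    coords_um   : (n_bins, 2) x/y positions of the fine bins, in microns
    counts      : (n_bins, n_genes) count matrix (dense, for illustration)
    bin_size_um : edge length of the coarse bins, e.g. 8, 20 or 50
    Returns (coarse_centers_um, coarse_counts).
    """
    coords_um = np.asarray(coords_um, dtype=float)
    counts = np.asarray(counts)

    # Integer grid index of the coarse bin containing each fine bin.
    grid = np.floor(coords_um / bin_size_um).astype(np.int64)

    # One output row per distinct coarse bin; `inverse` maps fine -> coarse row.
    uniq, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = inverse.ravel()

    coarse = np.zeros((len(uniq), counts.shape[1]), dtype=counts.dtype)
    np.add.at(coarse, inverse, counts)  # unbuffered add handles repeated rows

    centers = (uniq + 0.5) * bin_size_um
    return centers, coarse


# Toy example: four 2um bins, two genes, aggregated to 8um.
toy_coords = [[1, 1], [3, 1], [1, 3], [9, 1]]
toy_counts = [[1, 0], [2, 1], [0, 3], [5, 5]]
toy_centers, toy_coarse = aggregate_bins(toy_coords, toy_counts, bin_size_um=8)
```

The aggregated matrix has far fewer rows than the original, which is what makes it tractable as input for a spot-level model.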

Author

Rafael-Silva-Oliveira commented Apr 22, 2024

> A practical way to analyse VisiumHD right now is to aggregate it at 8um or 20um or 50um resolution (depending on data quality) and use cell2location with tips discussed in #356.

Hey, thank you so much for your valuable input! Indeed, this was something I was thinking about: performing clustering in order to achieve 25-55 micron artificial "spots" and performing deconvolution on these. Another thing I was testing was CellTypist, which is commonly used for scRNA annotation. Since the spots are so small now, it's no longer a matter of spot deconvolution. Using the 8 micron resolution and assuming that each bin is a cell, CellTypist seemed to do okay in that mapping, but I'm still figuring out the biological meaning of that mapping (if there is one to begin with). Another option would be to use CellTypist to annotate the 2 micron bins, then use these predictions to predict the 8 micron bins with majority voting, and then use these 8 micron bins to predict the 16 micron bins with majority voting once again. But then again, the goal of this new HD technology would be to avoid performing deconvolution, as that was one of the main problems the technology set out to solve, so I'm also interested in finding a solution to this for my research.
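The hierarchical majority-voting idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: bin positions are given in microns, each fine bin already carries a predicted label, and the helper `majority_vote` is a hypothetical name, not a CellTypist function.

```python
from collections import Counter

import numpy as np


def majority_vote(fine_coords_um, fine_labels, bin_size_um):
    """Assign each coarse bin the most frequent label among its fine bins.

    Returns a dict mapping (grid_x, grid_y) -> label. Ties go to the label
    that was encountered first within the coarse bin.
    """
    grid = np.floor(np.asarray(fine_coords_um, dtype=float) / bin_size_um)
    grid = grid.astype(np.int64).tolist()  # plain Python ints as dict keys

    votes = {}
    for key, label in zip(map(tuple, grid), fine_labels):
        votes.setdefault(key, Counter())[label] += 1

    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Toy example: labels on 2 micron bins voted up to 8 micron bins.
toy_xy = [[1, 1], [3, 1], [5, 5], [9, 1]]
toy_labels = ["T cell", "T cell", "B cell", "B cell"]
toy_votes = majority_vote(toy_xy, toy_labels, bin_size_um=8)
```

Applying the same function again to the 8 micron bin centres and their voted labels would give the 16 micron level.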

Another issue with these probabilistic methods is that they depend heavily on the intersection between the genes present in the scRNA reference and in the ST data. So even when I use the 16 micron bin resolution, which has over 150k bins for a sample I'm using, I'm essentially only using 1500-2000 genes to do the "predictions" on these 150k bins.
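A quick way to check how large that gene overlap is, sketched with plain Python lists standing in for the gene names of the reference and the spatial data (the function name `shared_genes` and the toy gene lists are made up for illustration):

```python
def shared_genes(reference_genes, spatial_genes):
    """Return genes present in both lists, in reference order,
    plus the fraction of reference genes that are retained."""
    spatial = set(spatial_genes)
    shared = [gene for gene in reference_genes if gene in spatial]
    fraction = len(shared) / len(reference_genes) if reference_genes else 0.0
    return shared, fraction


# Toy example with made-up gene lists.
toy_ref = ["CD3E", "MS4A1", "EPCAM", "PTPRC"]
toy_st = ["PTPRC", "CD3E", "KRT8"]
toy_shared, toy_fraction = shared_genes(toy_ref, toy_st)
```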

@LinearParadox

Another possible improvement might come from the PyTorch side. I'm not sure how this would impact scvi-tools, but CUDA currently has a way to treat system memory as GPU memory (https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/). To my knowledge, PyTorch doesn't have this implemented (pytorch/pytorch#104417 (comment)).

Contributor

vitkl commented Jul 14, 2024

Visium HD data can be segmented into cells using https://github.com/Teichlab/bin2cell and then counts aggregated into segmented cells can be analysed using cell2location to examine cell purity and to further decompose areas with spatially interlaced cell types.
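To make the "counts aggregated into segmented cells" step concrete, here is a minimal sketch that sums bin-level counts per cell label. It assumes each bin carries an integer cell label and that label 0 means "not assigned to any cell"; both the convention and the function name `sum_counts_per_cell` are assumptions for illustration and do not describe bin2cell's actual API or output format.

```python
import numpy as np


def sum_counts_per_cell(bin_counts, cell_labels):
    """Sum bin-level counts into one row per segmented cell.

    bin_counts  : (n_bins, n_genes) count matrix (dense, for illustration)
    cell_labels : (n_bins,) integer cell label per bin; 0 = unassigned (assumed)
    Returns (cell_ids, per_cell_counts).
    """
    bin_counts = np.asarray(bin_counts)
    cell_labels = np.asarray(cell_labels)

    assigned = cell_labels > 0  # drop bins outside any segmented cell
    cell_ids, inverse = np.unique(cell_labels[assigned], return_inverse=True)

    per_cell = np.zeros((len(cell_ids), bin_counts.shape[1]), dtype=bin_counts.dtype)
    np.add.at(per_cell, inverse, bin_counts[assigned])
    return cell_ids, per_cell


# Toy example: four bins, two genes, two segmented cells (labels 5 and 9).
toy_bins = [[1, 1], [2, 0], [0, 4], [7, 7]]
toy_cells = [5, 5, 0, 9]
toy_ids, toy_per_cell = sum_counts_per_cell(toy_bins, toy_cells)
```

The resulting cell-by-gene matrix plays the role that the spot-by-gene matrix plays for standard Visium data.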

Author

Rafael-Silva-Oliveira commented Jul 15, 2024

> Visium HD data can be segmented into cells using https://github.com/Teichlab/bin2cell and then counts aggregated into segmented cells can be analysed using cell2location to examine cell purity and to further decompose areas with spatially interlaced cell types.

I've been using bin2cell, but even then it produces a large number of cells: one of the datasets I've been testing still has 350-400k cells after bin2cell processing, down from 9 million+ bins at 2 micron resolution. So I'm afraid cell2location would still throw memory errors, unless we only run cell2location on subsets of the image, but that wouldn't scale well for reproducibility.

Right now I'm using Bin2Cell + CellTypist, which seems to be the most scalable approach for cell type inference using a scRNA reference.

lisch7 commented Oct 23, 2024

@vitkl @Rafael-Silva-Oliveira Hello, I am interested in using Bin2Cell for cell segmentation followed by analysis with cell2location on HD data. I have followed the Bin2Cell demo notebook (https://nbviewer.org/github/Teichlab/bin2cell/blob/main/notebooks/demo.ipynb), but I am unsure of the next steps after completing the pipeline. Could you please direct me to any tutorials or reproducible code that demonstrate this workflow? I would prefer not to use CellTypist, as I already have a fully annotated single-cell dataset. Thank you!
